Line lists

Overview

A line list is a table that contains key information about each case in an outbreak, with each row representing a case and each column representing a variable such as demographic, clinical and epidemiologic information (e.g., risk factors and exposures). Line list information describes an outbreak in terms of person, place and time. The line list may be created before, or at the same time as, the hypothesis generating questionnaire.

A well-constructed line list will summarize the known information, allowing for quick identification of trends, missing information, and errors. This will facilitate the creation of descriptive statistics and the epidemic (epi) curve.
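As a minimal sketch of how a line list supports these summaries, the Python example below counts cases by date of illness onset (the counts behind an epi curve), tallies missing values per variable, and computes a median age. The case records, variable names, and helper functions are hypothetical and are used for illustration only.

```python
from collections import Counter
from datetime import date
from statistics import median

# Hypothetical line list: one dict per case, one key per variable.
line_list = [
    {"case_id": "C001", "age": 34, "sex": "F", "onset_date": date(2015, 3, 2)},
    {"case_id": "C002", "age": 7, "sex": "M", "onset_date": date(2015, 3, 3)},
    {"case_id": "C003", "age": 61, "sex": "F", "onset_date": date(2015, 3, 3)},
    {"case_id": "C004", "age": 45, "sex": "M", "onset_date": None},
]


def epi_curve_counts(cases):
    """Count cases per onset date; cases with no onset date are skipped."""
    return Counter(c["onset_date"] for c in cases if c["onset_date"] is not None)


def missing_by_variable(cases):
    """Count missing (None) values for each variable."""
    missing = Counter()
    for case in cases:
        for variable, value in case.items():
            if value is None:
                missing[variable] += 1
    return dict(missing)


curve = epi_curve_counts(line_list)
median_age = median(c["age"] for c in line_list if c["age"] is not None)

# Print a simple text epi curve, one row per onset date.
for onset in sorted(curve):
    print(onset.isoformat(), "#" * curve[onset])
print("Median age:", median_age)
print("Missing values:", missing_by_variable(line_list))
```

Because each row is one case and each column is one variable, these summaries reduce to simple counts over a column, and the missing-value tally shows which cases need follow-up.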

Every outbreak is unique, so line lists will vary from outbreak to outbreak. Line list construction has some flexibility, as it is driven by the needs of the investigation process. However, the use of a consistent set of data elements and formats helps to ensure data consistency, standardization, accuracy, and reliability.

Variables

Each column of the line list represents one variable (also called a “field”). To facilitate analysis, it is best to use closed answers and limit free text. Key dates are also important: because the date of illness onset is not always known, it is important to collect as much date information as possible. The most common line list variables include, but are not limited to:

  • Administrative variables
    • Unique ID number/case identifier
    • Questionnaire completion status
    • Interviewer information
    • Case status (e.g., not a case, suspect, secondary, probable, or confirmed)
  • Demographic information and outcome
    • Age
    • Sex
    • Geolocator(s) (e.g., province, region, city)
    • Symptoms (one column for each symptom)
    • Date of illness onset
    • Death
    • Hospitalization
  • Laboratory information
    • Pathogen
    • Serovar
    • Typing results (e.g., PFGE, genotype)
    • Date of specimen collection
    • Date reported (by laboratory)
  • Exposures: this text field can be used to capture key food exposures of interest as well as information such as lot codes, purchase dates, and purchase locations that are important for guiding food safety investigations. If there are common exposures identified, it is best to capture each different exposure in a separate column.
  • Notes: this text field can be used to capture information which is believed to be relevant but is not captured in any of the above variables.
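The preference for closed answers and consistent dates can be sketched as a simple row check. In the Python example below, the case status values come from the list above; the other code sets, the field names, the YYYY-MM-DD date format, and the validate_case helper are assumptions made for illustration, not part of the toolkit.

```python
from datetime import datetime

# Allowed answers for each closed variable. Case status values follow the
# list above; the remaining code sets are illustrative assumptions.
ALLOWED = {
    "case_status": {"not a case", "suspect", "secondary", "probable", "confirmed"},
    "sex": {"F", "M", "unknown"},
    "hospitalization": {"yes", "no", "unknown"},
}

# Hypothetical names for the date variables in the line list.
DATE_FIELDS = ("onset_date", "specimen_collection_date", "lab_report_date")


def validate_case(row):
    """Return a list of problems found in one line list row (a dict of strings)."""
    problems = []
    for field, allowed in ALLOWED.items():
        value = row.get(field, "")
        if value not in allowed:
            problems.append(f"{field}: unexpected value {value!r}")
    for field in DATE_FIELDS:
        value = row.get(field, "")
        if value == "":
            continue  # missing dates are allowed; onset is not always known
        try:
            datetime.strptime(value, "%Y-%m-%d")
        except ValueError:
            problems.append(f"{field}: not a YYYY-MM-DD date: {value!r}")
    return problems


good = {"case_status": "confirmed", "sex": "F", "hospitalization": "no",
        "onset_date": "2015-03-02", "lab_report_date": "2015-03-06"}
bad = {"case_status": "Confirmed?", "sex": "F", "hospitalization": "no",
       "onset_date": "03/02/2015"}

print(validate_case(good))
print(validate_case(bad))
```

Free-text fields such as Exposures and Notes are deliberately left unchecked in this sketch, since they are meant to capture information that does not fit a closed list.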

Software considerations

Microsoft Excel is an efficient and easy-to-use tool for outbreak investigations since descriptive statistics can be calculated in the same program, but care must be taken with data entry to ensure that the variables are formatted correctly.

Cautions when using Microsoft Excel for a line list:

  • When sorting data, freeze the column heading row (i.e., variable names), so that the heading does not get sorted in with all the other lines of the spreadsheet.
  • To properly sort data by column, the entire spreadsheet must be selected. Sorting a single selected column reorders only that column, so its values no longer line up with the correct cases.
  • Pay careful attention when the auto fill feature attempts to fill in fields, so that data is accurately entered.
  • Format each column, particularly date columns, to ensure that the data is of consistent type.
  • When possible, limit data entry error by using in-cell drop-down menus and/or field restrictions.
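The two sorting cautions above apply equally when a line list is handled outside Excel. As a minimal sketch, the Python example below reads a small hypothetical line list in CSV form, keeps the heading row separate from the data, and sorts whole rows by date of illness onset so that each case's values stay together. The data and column names are invented for illustration, and dates are assumed to be stored as YYYY-MM-DD text so that they sort chronologically.

```python
import csv
import io

# Hypothetical line list exported as CSV text.
raw = """case_id,onset_date,age
C003,2015-03-05,61
C001,2015-03-02,34
C002,2015-03-03,7
"""

# DictReader treats the first row as variable names, so it is never sorted as data.
reader = csv.DictReader(io.StringIO(raw))
rows = list(reader)

# Sort whole records by onset date; each case's values move together.
rows_sorted = sorted(rows, key=lambda r: r["onset_date"])

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(rows_sorted)
sorted_csv = out.getvalue()
print(sorted_csv)
```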

Line lists can also be extracted from database systems such as Microsoft Access, Epi Info, and EpiData. These database systems offer more features to minimize data entry error when used by well-trained personnel. However, building a database is a more complex process that should not be undertaken for the sole purpose of maintaining a line list.
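To illustrate the general idea of extracting a line list from a database, the sketch below uses Python's built-in sqlite3 module as a stand-in; it does not show the interface of Microsoft Access, Epi Info, or EpiData. The cases table, its columns, and the export_line_list helper are hypothetical.

```python
import csv
import io
import sqlite3

# In-memory database standing in for an outbreak database (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cases (case_id TEXT PRIMARY KEY, case_status TEXT, "
    "age INTEGER, onset_date TEXT, interviewer TEXT, notes TEXT)"
)
conn.executemany(
    "INSERT INTO cases VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("C001", "confirmed", 34, "2015-03-02", "AB", "attended picnic"),
        ("C002", "probable", 7, "2015-03-03", "CD", ""),
        ("C003", "suspect", 52, None, "AB", ""),
    ],
)


def export_line_list(connection, fields):
    """Return the selected fields for every case as CSV text, one row per case."""
    # Field names must come from a fixed list in code, never from user input,
    # because they are inserted directly into the SQL text.
    query = f"SELECT {', '.join(fields)} FROM cases ORDER BY case_id"
    cursor = connection.execute(query)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)   # heading row of variable names
    writer.writerows(cursor)  # missing (NULL) values are written as empty cells
    return out.getvalue()


line_list_csv = export_line_list(conn, ["case_id", "case_status", "age", "onset_date"])
print(line_list_csv)
```

The resulting CSV text can be opened in a spreadsheet program, which keeps the database as the place where data are entered and the line list as a derived view of selected fields.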

Tools

Toolkit line list and data dictionary

  • This Microsoft Excel-based tool is designed to be used as a template for foodborne outbreak investigation line lists. Once data has been entered, common descriptive statistics are automatically calculated. A data dictionary describing each data field in the line list is available in the final tab.

Toolkit outbreak response database

  • This Microsoft Access Database will follow the layout and structure of the PHAC enteric hypothesis generating questionnaires. Users will be able to enter data, export select fields to a Microsoft Excel line list, generate automatic food and risk exposure summary tables and run custom queries. This database tool is expected to be complete by the end of 2015.  
