## Overview

Analytic studies are used during an outbreak investigation if and when there is at least one clear hypothesis regarding the source of the outbreak. By using an analytic study, investigators can quantify the association between illness and a hypothesized source (e.g., a specific exposure). There are two main study designs used for outbreak investigations: retrospective cohort studies and case-control studies.

The choice of study design is specific to each outbreak and depends on a number of factors, including the nature of the population at risk, feasibility in terms of resources and logistics, and timeliness.

## Cohort studies

In the context of an enteric outbreak, a cohort study design may be used if the individuals at risk are members of a defined group, for example, students in one class, employees who attended a company picnic, attendees at a church dinner, or participants at a sporting event. Cohort studies are therefore commonly used when an outbreak occurs at a specific event or venue.

Enteric outbreak cohort studies are retrospective, i.e. “retrospective cohorts”, because illness has already occurred when the study begins: ill and non-ill members of the cohort are identified at the same time, and their exposures are determined subsequently. In contrast, in typical cohort studies in epidemiology, groups of exposed and unexposed people are followed prospectively through time; at the end of the time interval, the proportion of people who develop the outcome in question is compared between the two groups.

The measure of association in a cohort study is the relative risk, or risk ratio (RR): a ratio of the risk of disease in the exposed group to the risk of disease in the unexposed group.

| | Disease | No disease | Totals |
|---|---|---|---|
| Exposed | a | b | a + b |
| Unexposed | c | d | c + d |

a/(a+b) = risk in the exposed group

c/(c+d) = risk in the unexposed group

**Relative risk** = [a/(a+b)]/[c/(c+d)]
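The relative risk formula can be sketched in a few lines of Python. This is a minimal illustration, not part of the toolkit: the function name `risk_ratio` and the cell counts are hypothetical. The 95% confidence interval uses the common log-scale (Wald) approximation, which is an added assumption and is not described above.

```python
import math


def risk_ratio(a, b, c, d, z=1.96):
    """Relative risk with an approximate confidence interval.

    a: exposed and ill        b: exposed and not ill
    c: unexposed and ill      d: unexposed and not ill
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    risk_exposed = a / (a + b)      # a/(a+b)
    risk_unexposed = c / (c + d)    # c/(c+d)
    rr = risk_exposed / risk_unexposed

    # Standard error of ln(RR); assumes no cell count is zero.
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts for one food item served at an event.
rr, lower, upper = risk_ratio(a=30, b=20, c=10, d=40)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With these made-up counts the risk is 0.6 in the exposed group and 0.2 in the unexposed group, so the relative risk is 3. With small or zero cell counts the log-scale interval is unreliable, and exact methods, such as those offered by the tools listed later on this page, may be more appropriate.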

## Case-control studies

The case-control study design is commonly used in outbreaks that are not linked to a specific event or location, and/or when the population at risk is not clearly defined. For example, cases may be identified through a surveillance system from across a large geographic area with no apparent commonalities.

Case-control studies used in outbreak investigations are retrospective: cases (and controls) are identified first; exposures are determined subsequently. The selection of an appropriate comparison group or controls is one of the most difficult aspects of designing and implementing a case-control study. The goal of control selection is to enroll individuals who are as similar to the cases as possible (other than the exposure of interest), in order to minimize potential biases.

The measure of association used to assess exposure and outcome relationships in a case-control design is the odds ratio (OR): the ratio of the odds of disease in the exposed group to the odds of disease in the unexposed group. In the context of a foodborne disease outbreak, the OR is usually interpreted as the ratio of the odds of consumption of a food product in the case group to the odds of consumption of the same food product in the control group. The two ratios are numerically equal, because both reduce to the same cross-product of the cells of the 2 × 2 table.

| | Disease | No disease | Totals |
|---|---|---|---|
| Exposed | a | b | a + b |
| Unexposed | c | d | c + d |

a/b = the odds that an exposed person develops illness

c/d = the odds that an unexposed person develops illness

**Odds ratio** = (a/b)/(c/d) = ad/bc
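The odds a/b and c/d defined above combine into the odds ratio as sketched below in Python. The function names and cell counts are hypothetical, and the confidence interval uses the log-scale (Woolf) approximation, which is an added assumption rather than something this page prescribes. The second function computes the ratio from the exposure odds among cases and controls to show that it gives the same value.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio with an approximate confidence interval.

    a: exposed cases          b: exposed controls
    c: unexposed cases        d: unexposed controls
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_exposed = a / b        # a/b
    odds_unexposed = c / d      # c/d
    estimate = odds_exposed / odds_unexposed   # equals (a*d)/(b*c)

    # Standard error of ln(OR); assumes no cell count is zero.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log_or)
    upper = math.exp(math.log(estimate) + z * se_log_or)
    return estimate, lower, upper


def exposure_odds_ratio(a, b, c, d):
    """Odds of exposure among cases (a/c) over odds among controls (b/d)."""
    return (a / c) / (b / d)


# Hypothetical counts: 50 cases and 50 controls asked about one food item.
estimate, lower, upper = odds_ratio(a=40, b=20, c=10, d=30)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")

# Both formulations reduce to ad/bc.
assert math.isclose(estimate, exposure_odds_ratio(40, 20, 10, 30))
```

With these made-up counts the odds ratio is 6. This sketch applies to unmatched data only; a matched case-control design calls for a matched-pair analysis instead.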

## Examples

- Matched case-control study and recipe-based restaurant cohort study example: Buchholz, U., *et al*. 2011. German outbreak of *Escherichia coli* O104:H4 associated with sprouts. *N Engl J Med*. 365:1763-1770.
- Case-control study example: Middleton, D., *et al*. 2014. Risk factors for sporadic domestically acquired *Salmonella* serovar Enteritidis infections: a case-control study in Ontario, Canada, 2011. *Epidemiol Infect*. 142(7):1411-1421.
- Retrospective cohort study example: Grinberg, A., *et al*. 2011. Retrospective cohort study of an outbreak of cryptosporidiosis caused by a rare *Cryptosporidium parvum* subgenotype. *Epidemiol Infect*. 139(10):1542-1550.
- Case-control study example: Barton Behravesh, C., *et al*. 2012. Multistate outbreak of *Salmonella* serotype Typhimurium infections associated with consumption of restaurant tomatoes, USA, 2006: hypothesis generation through case exposures in multiple restaurant clusters. *Epidemiol Infect*. 140(11):2053-2061.

## Tools

**Toolkit enteric questionnaire repository**

- Questionnaires for various pathogens, to be used at various stages of an outbreak investigation, including hypothesis generation, refinement, and testing.

**OpenEpi**

- OpenEpi is free and open source software for epidemiologic statistics. It provides statistics for counts and measurements such as stratified analysis with exact confidence limits, matched pair and person-time analysis, sample size and power calculations, random numbers, sensitivity, specificity and other evaluation statistics, R x C tables, chi-square for dose-response, and links to other useful sites.

**Statistical Tool Syntax Locator (StatTool)**

- The Statistical Tool Syntax Locator (StatTool) is free and open source software developed by Public Health Ontario, based on the table “What statistical analysis should I use?” developed by the University of California, Los Angeles (UCLA) Statistical Consulting Group. StatTool helps you choose a statistical test and provides syntax information in the software of your choice as an aid for analysis and reporting. It is intended for epidemiologists, analysts, and researchers performing statistical analysis and research.

**EpiSheet**

- EpiSheet is free and open source software written and developed by Dr. Kenneth Rothman. It is a downloadable Microsoft Excel spreadsheet used for analyzing epidemiologic data and is meant to support epidemiologists, analysts, and statisticians in their analytical work.

**EpiData**

- EpiData is free and open source software created for epidemiologists, with two components: EpiData Entry and EpiData Analysis. EpiData Entry is primarily used for simple data entry and data documentation. EpiData Analysis performs basic statistical analysis, graphs, and comprehensive data management.

**Epi Info™**

- Epi Info™ is a public domain suite of software tools designed for public health practitioners and researchers. It provides data entry form and database construction, and data analyses with epidemiologic statistics, maps, and graphs, for public health professionals who may lack an information technology background.
  - Epi Info 7 User Guides
  - Tutorials
  - Epi Info Community Group: This discussion board allows community members to post and reply to questions related to the Epi Info software and share training materials.

**R**

- R is a language and environment for statistical computing and graphics. It is free and open source.

**RStudio**

- RStudio is an integrated development environment (IDE) for R. It includes a console and a syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging, and workspace management. RStudio is available in open source and commercial editions.

## Further reading

Kanchanaraksa, Sukon. 2008. “Estimating risk”. Johns Hopkins Bloomberg School of Public Health. Available at: http://ocw.jhsph.edu/courses/fundepiii/PDFs/Lecture16.pdf