Analytic studies

Analytic studies are used during an outbreak investigation once investigators have at least one clear hypothesis about the source of the outbreak. An analytic study lets investigators test that hypothesis by estimating the risk of illness associated with a specific exposure. Two main study designs are used in outbreak investigations: retrospective cohort studies and case-control studies.

The choice of study design depends on the individual outbreak. It is based on several factors, including the nature of the population at risk, feasibility in terms of resources and logistics, and timeliness.

Cohort studies

In the context of an enteric outbreak, a cohort study design may be used when the individuals at risk are members of a well-defined group. Examples include students in one class, employees who attended a company picnic, attendees at a church dinner, or participants at a sporting event. Cohort studies are therefore commonly used when an outbreak is linked to a specific event or venue.

Cohort studies in enteric outbreaks are retrospective (“retrospective cohorts”). The cohort is defined after the illnesses have already occurred, and then both illness status and exposures are determined for every member of the group. In contrast, a typical prospective cohort study follows groups of exposed and unexposed people forward through time. At the end of the follow-up period, the proportion of people who develop the outcome of interest is compared between the two groups.

The measure of association in a cohort study is the relative risk, or risk ratio (RR): a ratio of the risk of disease in the exposed group to the risk of disease in the unexposed group.

             Disease   No disease   Totals
Exposed      a         b            a + b
Unexposed    c         d            c + d
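In a retrospective cohort, the four cells of this table are usually counted from a line list of everyone in the group. The sketch below is a minimal illustration under stated assumptions. It assumes each record is a Python dict with a boolean "ill" flag and a set of "exposures" (for example, foods eaten). These field names, the `TwoByTwo` type and the `tabulate` helper are hypothetical and are not taken from any particular outbreak tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cell counts of a 2 x 2 table in the a/b/c/d layout above."""
    a: int  # exposed, ill
    b: int  # exposed, not ill
    c: int  # unexposed, ill
    d: int  # unexposed, not ill


def tabulate(records, exposure):
    """Count a, b, c, d for one exposure from a line list.

    Hypothetical record format: {"ill": bool, "exposures": set of str}.
    """
    a = b = c = d = 0
    for rec in records:
        exposed = exposure in rec["exposures"]
        if exposed and rec["ill"]:
            a += 1
        elif exposed:
            b += 1
        elif rec["ill"]:
            c += 1
        else:
            d += 1
    return TwoByTwo(a, b, c, d)


if __name__ == "__main__":
    # Hypothetical line list for five attendees of an event.
    line_list = [
        {"ill": True, "exposures": {"potato salad", "lemonade"}},
        {"ill": True, "exposures": {"potato salad"}},
        {"ill": False, "exposures": {"lemonade"}},
        {"ill": False, "exposures": {"potato salad"}},
        {"ill": False, "exposures": set()},
    ]
    print(tabulate(line_list, "potato salad"))
    # TwoByTwo(a=2, b=1, c=0, d=2)
```

Running `tabulate` once per food item gives one 2 × 2 table per candidate exposure. This is the usual starting point for comparing food-specific attack rates.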


a/(a+b) = risk in the exposed group

c/(c+d) = risk in the unexposed group

Relative risk = [a/(a+b)]/[c/(c+d)]
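The relative risk formula can be computed directly from the four cell counts. The sketch below also attaches an approximate confidence interval. The page does not specify an interval method, so this example assumes the commonly used log (Katz) approximation, SE(ln RR) = sqrt(1/a − 1/(a+b) + 1/c − 1/(c+d)), with z = 1.96 for roughly 95% coverage. The function name and the counts in the example are hypothetical.

```python
import math


def relative_risk(a, b, c, d, z=1.96):
    """Return (RR, lower, upper) for a 2 x 2 table laid out as above.

    RR = [a/(a+b)] / [c/(c+d)]. The interval is the log (Katz) approximation
    SE(ln RR) = sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d)); z = 1.96 gives ~95%.
    """
    # The log-based interval is undefined when a or c is zero,
    # and RR itself is undefined when c is zero.
    if a == 0 or c == 0:
        raise ValueError("need a > 0 and c > 0 for RR and its log-based CI")
    risk_exposed = a / (a + b)      # attack rate among the exposed
    risk_unexposed = c / (c + d)    # attack rate among the unexposed
    rr = risk_exposed / risk_unexposed
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 30 of 40 people who ate an item became ill,
    # versus 5 of 40 people who did not eat it.
    rr, lower, upper = relative_risk(30, 10, 5, 35)
    print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With these hypothetical counts, the risks are 0.75 and 0.125, so the point estimate is 6.0. An RR above 1 means the exposed group had a higher risk of illness than the unexposed group.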

Case-control studies

The case-control study design is commonly used in outbreaks that are not linked to a specific event or location, and/or when the population at risk is not clearly defined. A typical example is when cases are identified through a surveillance system across a large geographic area and have no apparent commonalities.

Case-control studies used in outbreak investigations are retrospective: cases and controls are identified first, and their exposures are determined afterwards. Selecting an appropriate comparison group (the controls) is one of the most difficult aspects of designing and implementing a case-control study. Controls should come from the same population that gave rise to the cases. They should resemble the cases as closely as possible, except that they did not become ill, in order to minimize potential biases.

The measure of association used to assess exposure-outcome relationships in a case-control design is the odds ratio (OR). The OR is the ratio of the odds of disease in the exposed group to the odds of disease in the unexposed group. In a foodborne disease outbreak, the OR is usually framed differently: the odds of having eaten a food product among cases, divided by the odds of having eaten the same product among controls. With the cells labelled as in the table below, the two framings give the same value, because (a/c)/(b/d) and (a/b)/(c/d) both simplify to ad/bc.

             Disease (cases)   No disease (controls)   Totals
Exposed      a                 b                       a + b
Unexposed    c                 d                       c + d


a/b = the odds that an exposed person develops illness

c/d = the odds that an unexposed person develops illness

Odds ratio = (a/b)/(c/d) 
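The odds ratio can be computed the same way as the relative risk. This minimal sketch uses the cross-product form ad/bc, which is algebraically equal to (a/b)/(c/d). It assumes Woolf's log method for an approximate confidence interval, SE(ln OR) = sqrt(1/a + 1/b + 1/c + 1/d). That method is a common choice, but the page itself does not prescribe it. The function name and the example counts are hypothetical.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Return (OR, lower, upper) for a 2 x 2 table laid out as above.

    OR = (a/b)/(c/d), computed as the equivalent cross-product ad/bc.
    The interval uses Woolf's log method,
    SE(ln OR) = sqrt(1/a + 1/b + 1/c + 1/d); z = 1.96 gives ~95%.
    """
    # Woolf's interval requires every cell to be non-zero; exact methods
    # (such as those offered by OpenEpi) are an alternative for sparse tables.
    if min(a, b, c, d) == 0:
        raise ValueError("all four cells must be > 0 for the Woolf interval")
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 40 of 50 cases and 20 of 80 controls ate the item.
    a, b, c, d = 40, 20, 10, 60
    or_, lower, upper = odds_ratio(a, b, c, d)
    # Odds of exposure among cases vs. controls: same value as ad/bc.
    exposure_or = (a / c) / (b / d)
    print(f"OR = {or_:.2f} (95% CI {lower:.2f} to {upper:.2f}); "
          f"exposure-odds form = {exposure_or:.2f}")
```

For these hypothetical counts, ad/bc = 2400/200 = 12.0. The exposure-odds form (40/10)/(20/60) gives the same value.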

Toolkit enteric questionnaire repository

  • Questionnaires for various pathogens, to be used at various stages of an outbreak investigation, including hypothesis generation, refinement, and testing.


  • OpenEpi is free, open-source software for epidemiologic statistics. It provides statistics for counts and measurements, including stratified analysis with exact confidence limits, matched-pair and person-time analysis, sample size and power calculations, random numbers, sensitivity, specificity and other evaluation statistics, R x C tables, and chi-square for dose-response. It also links to other useful sites.


  • EpiSheet is free, open-source software written and developed by Dr. Kenneth Rothman. It is a downloadable Microsoft Excel spreadsheet for analyzing epidemiologic data, intended to support epidemiologists, analysts, and statisticians in their analytical work.

EpiData Software

  • EpiData is free, open-source software created for epidemiologists. It has two components, EpiData Entry and EpiData Analysis. EpiData Entry is used primarily for simple data entry and data documentation. EpiData Analysis performs basic statistical analysis, graphing, and comprehensive data management.

Epi Info

  • Epi Info™ is a public domain suite of software tools designed for public health practitioners and researchers. It supports building data entry forms and databases and analyzing data with epidemiologic statistics, maps, and graphs. It is intended for public health professionals who may lack an information technology background.


  • R is a language and environment for statistical computing and graphics. It is free and open source.


  • RStudio is an integrated development environment (IDE) for R. It includes a console and a syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging, and workspace management. RStudio is available in open-source and commercial editions.
