Hypothesis generation


Generating hypotheses is an important, but often challenging, step in an outbreak investigation. When generating hypotheses, it is best to keep an open mind and to cast a wide net. A good starting place is to identify exposures that have previously been associated with the pathogen under investigation. This can be done by:

  1. searching an outbreak database such as Outbreak Summaries, the Marler Clark database, or the CDC Foodborne Outbreak Online Database (see Tools for links to these databases)
  2. reviewing the published literature using a search engine such as PubMed or Google Scholar.

If the case definition for the illness under investigation includes laboratory subtyping information in the form of a Pulsed-Field Gel Electrophoresis (PFGE) pattern, consider investigating where and when the pattern has been seen before. Provincial and federal public health laboratories maintain databases of PFGE patterns that can contain valuable information for outbreak investigation purposes. PulseNet Canada can provide information about how common or rare the pattern is nationally, where and when it was last seen, and whether it has been detected in any food samples in the past. PulseNet Canada can also check the United States’ PulseNet PFGE databases for matches. FoodNet Canada can provide information about whether the pattern has previously been seen in farm or retail samples from its sentinel sites.

While it is important to gather such historical information, the most effective way to generate a high-quality hypothesis is to identify common exposures amongst cases. This can be achieved by interviewing cases using a hypothesis generating questionnaire and analysing exposures. 

Questionnaires and interviewing

Hypothesis generating questionnaires

Hypothesis generating questionnaires (or shotgun questionnaires) are intended to obtain detailed information on what a person’s exposures were in the days leading up to their illness. They are typically quite long and ask about many exposures such as travel history, contact with animals, restaurants, events attended, and a comprehensive food history. The time period of interest varies between pathogens, as the exposure period is equal to the maximum incubation period of the pathogen.
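
As a rough illustration of how the time period of interest follows from the incubation period, the sketch below works out the window to ask about for one case. The function name, the onset date, and the 7-day incubation value are hypothetical and chosen only for the example; the actual maximum incubation period should come from a reference for the pathogen under investigation.

```python
from datetime import date, timedelta


def exposure_window(onset_date, max_incubation_days):
    """Return (start, end) of the period to ask a case about.

    Assumption for this sketch: the window ends on the symptom onset date
    and reaches back by the pathogen's maximum incubation period.
    """
    if max_incubation_days < 0:
        raise ValueError("max_incubation_days must be non-negative")
    start = onset_date - timedelta(days=max_incubation_days)
    return start, onset_date


# Hypothetical case: onset on 15 June 2015, 7-day maximum incubation period.
start, end = exposure_window(date(2015, 6, 15), max_incubation_days=7)
print(f"Ask about exposures from {start} to {end}")
```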

When designing a questionnaire, it is important to ensure that the questions are gathering the intended information. Questions should be concise, informal, and specific. Before interviewing cases, questionnaires should be tested to ensure clarity and identify any potential errors.

Read more – Questionnaire Design

Case interviewing

Once the questionnaire is developed and piloted, it should be administered to cases in a consistent and unbiased manner. Case interviews can be conducted by a single interviewer or by multiple interviewers. A centralized approach allows a single interviewer to standardize interviews, detect patterns, and probe for items of interest. However, a multiple-interviewer approach is more time-efficient and allows for multiple perspectives when it comes time to identify the source.

Although case interviewing is an important outbreak investigation tool, it is not without its challenges. By the time the outbreak team is ready to conduct the interview, it could be weeks to months after the onset of symptoms. It is difficult for people to recall what they ate over a month ago. Sometimes cases might need to be interviewed multiple times as the hypothesis is developed and refined.

Read more – Case interviews

Exposure analysis

Once the interviews are complete, the data can be entered into a database or line list. The frequency of exposures for the cases is then obtained (e.g., % of cases that consumed each food item).
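
The frequency calculation can be sketched in a few lines. This is a minimal example, assuming a line list held as one record per case with yes/no answers; the field names, the sample data, and the choice to exclude unknown answers from the denominator are assumptions made for illustration, not a prescribed method.

```python
def exposure_frequencies(line_list, exposures):
    """Summarize how many cases reported each exposure.

    Returns {exposure: (n_exposed, n_answered, percent_exposed)}.
    Assumption for this sketch: answers recorded as None (unknown or
    not asked) are left out of the denominator.
    """
    summary = {}
    for exposure in exposures:
        answers = [case.get(exposure) for case in line_list]
        known = [answer for answer in answers if answer is not None]
        exposed = sum(1 for answer in known if answer)
        percent = 100.0 * exposed / len(known) if known else None
        summary[exposure] = (exposed, len(known), percent)
    return summary


# Hypothetical line list: True = yes, False = no, None = unknown.
line_list = [
    {"case_id": 1, "chicken": True, "sprouts": True, "cherries": False},
    {"case_id": 2, "chicken": True, "sprouts": True, "cherries": None},
    {"case_id": 3, "chicken": False, "sprouts": True, "cherries": False},
    {"case_id": 4, "chicken": True, "sprouts": False, "cherries": True},
]

freqs = exposure_frequencies(line_list, ["chicken", "sprouts", "cherries"])
for food, (exposed, answered, percent) in freqs.items():
    print(f"{food}: {exposed}/{answered} cases ({percent:.0f}%)")
```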

It is tempting to conclude that the most commonly consumed food items are the most likely suspects, but it is possible that these foods are commonly consumed amongst the general population as well. A baseline proportion is needed against which to compare the exposure frequencies. Reference population studies, such as the CDC Food Atlas, the Nesbitt Waterloo study and Foodbook (see Tools), can be used for this purpose. These studies provide investigators with the expected food frequencies based on 7-day food histories from thousands of respondents. These data can be used as a point of comparison for questionnaire data to identify exposures, such as food items, with higher than expected frequencies. Statistical tests (e.g., binomial probability tests) can then be used to test whether the proportion of cases exposed is significantly different from the proportion of “controls” (i.e., people included in the population studies) exposed (see Tools).
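
One way to picture a binomial probability test is as the chance of seeing at least the observed number of exposed cases if cases ate the food at the same rate as the reference population. The sketch below computes that one-sided upper-tail probability with the Python standard library. The function name and all numbers are hypothetical, and the one-sided form is an assumption of this example rather than a description of how any particular tool performs the calculation.

```python
from math import comb


def binomial_upper_tail(exposed, cases, expected_proportion):
    """Return P(X >= exposed) for X ~ Binomial(cases, expected_proportion).

    A small value suggests the exposure is reported by more cases than
    the reference population's consumption alone would predict.
    """
    if not 0 <= exposed <= cases:
        raise ValueError("exposed must be between 0 and cases")
    if not 0.0 <= expected_proportion <= 1.0:
        raise ValueError("expected_proportion must be between 0 and 1")
    p = expected_proportion
    return sum(
        comb(cases, k) * p**k * (1 - p) ** (cases - k)
        for k in range(exposed, cases + 1)
    )


# Hypothetical example: 8 of 10 cases report a food that 20% of the
# reference population reports eating over the same recall period.
p_value = binomial_upper_tail(exposed=8, cases=10, expected_proportion=0.2)
print(f"Probability of 8 or more exposed cases by chance: {p_value:.6f}")
```

A low probability flags the exposure for follow-up; it does not by itself identify the source, and the limitations of expected food frequencies still apply.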

There are many limitations to using expected food frequencies, such as not accounting for:

  • Seasonality (e.g., consumption of cherries is higher in the summer, but the expected levels are the same year-round)
  • Differences in consumption between men and women, adults and children
  • Geographic location
  • Various ethnic/religious/cultural groups

Further, since specific questions differ among surveys, it is often difficult to find the most appropriate comparison group. For example, the CDC Atlas of Exposures differentiates between hamburgers eaten at home or outside the home, while questionnaires used in investigations typically do not. Such differences in food definitions can make it challenging to determine which reference variable is the most appropriate to use as an “expected” level.

It is important to keep in mind that some foods with high expected consumption levels (e.g., chicken) may not flag statistically, but could still be potential sources. Further, there are other common exposures amongst cases that can carry important clues about the source of the outbreak. Cases that report common restaurants, events, or grocery stores can be considered sub-clusters. These sub-clusters should be investigated thoroughly by obtaining menus, receipts, or shopper card information if possible.

Tools

Toolkit binomial probability calculation tool for food exposures

  • This Microsoft Excel document allows users to enter outbreak case food exposure numbers for 300 food items. It automatically calculates binomial probabilities using two reference populations and flags exposures of interest for follow-up (Reference populations: CDC Population Survey Atlas of Exposures, 2006-2007 and Waterloo Region, Ontario Food Consumption Survey, November 2005 to March 2006).

Toolkit Outbreak Summaries overview

  • This PDF document provides an overview of the Outbreak Summaries application, its key features and benefits, and an example of how it can be used during an outbreak investigation.

CDC Foodborne Outbreak Online Database (FOOD)

  • The FOOD tool allows users to search and download data on foodborne disease outbreaks reported to CDC from 1998 through 2012. The database is updated periodically as new data become available. Search fields include year, state, location of consumption, and etiology (genus only). The downloaded database includes additional fields: total illnesses, hospitalizations, and deaths; food vehicle; and contaminated ingredient.

Food Consumption Patterns in the Waterloo Region

  • This food frequency study by Nesbitt et al. was conducted in Waterloo, Ontario in 2005-2006. The study collected 7-day food consumption data from 2,332 Canadians.

CDC Food Atlas 2006-2007

  • This study by CDC was conducted in 10 U.S. states in 2006-2007. The study asked 17,000 respondents about their exposure to a comprehensive list of foods as well as animal exposure.

FoodNet Canada Reports and Publications

  • FoodNet Canada reports and publications provide information on the areas of greatest risk to human health to help direct food safety actions and programming as well as public health interventions, and to evaluate their effectiveness.

CDC FoodNet Reports

  • The Foodborne Diseases Active Surveillance Network (FoodNet) Annual Reports are summaries of information collected through active surveillance of nine pathogens.

Marler Clark Foodborne Illness Outbreak Database

  • This database provides summaries of food and water related outbreaks caused by various enteric pathogens dating back to 1984.

FDA Foodborne Illness-Causing Organisms Cheat Sheet

  • A quick summary chart on foodborne illnesses, organisms involved, symptom onset times, signs and symptoms to expect, and food sources.

CFIA: Canada’s 10 Least Wanted Foodborne Pathogens

  • This infographic prepared by the CFIA includes information on symptoms, onset time, transmission, potential sources, and preventative measures for ten foodborne pathogens.


  • This search tool searches for numerical data on the web presented as graphs, tables and charts. It is a useful tool for searching for outbreak and other scientific data.

Foodbook: Canadian Food Exposure Study to Strengthen Outbreak Response

  • Foodbook is a population survey conducted by PHAC that will estimate Canadians’ exposure to select foods over a seven and three-day period. In addition to food exposures, this study will also collect data on the frequency of consumption of select food items; drinking and recreational water exposures; animal-related exposures; consumer food safety knowledge and practices; acute gastrointestinal illness; obesity indicators and demographic factors. A total of 11,016 Canadians from all provinces and territories will be interviewed over 12 calendar months from April 2014 to April 2015. The final Foodbook report is expected to be released in the fall of 2015.

Toolkit outbreak response database

  • This Microsoft Access Database will follow the layout and structure of the PHAC enteric hypothesis generating questionnaires. Users will be able to enter data, export select fields to a Microsoft Excel line list, generate automatic food and risk exposure summary tables and run custom queries. This database tool is expected to be complete by the end of 2015.
