Exercise 2: Food frequency analysis in Excel

 Day 22: Friday May 29, 2020 (n=14)

As of this morning, you have completed seven re-interviews using the hypothesis-generating questionnaire.

You decide to enter your questionnaires into a database and then export the data into Microsoft Excel for analysis. You need to analyze the data to determine the frequency of exposures for cases and whether there are any exposures that are reported at higher than expected frequencies, compared to baseline population data.

Population-based studies such as Foodbook and the CDC FoodNet Population Survey provide investigators with baseline data on the reported frequency of specific food exposures for non-ill individuals over a 7-day period. These data can be used as a point of comparison for questionnaire data to identify food exposures that are reported by outbreak cases more commonly than would be expected.

For more information on expected food frequencies, refer to the background reading below.

Background reading

 This is a three-part exercise.

Part 1

 Examine the initial food exposure data in Table 1 (“Original Data”) of Module 2 – Exercise 2.

Identify the most common food exposures among the cases – look at the proportion of cases that answered “Yes”, as well as those that answered “Yes” or “Probably”.

Calculate the proportion of cases reporting each food item ([Yes + Probably] / [Yes + Probably + No]). Note that ‘Don’t Know’ responses are not included in the food frequency analysis. In practice, investigators look at all exposures no matter how many cases report them, because not all cases will necessarily have been asked about all exposures. As well, some foods (e.g., sprouts, nuts, seeds, flour) may be used as an ingredient in other foods or as a garnish, which makes them more difficult for cases to recall. Even a small proportion of cases reporting an uncommon food or a food that may be used as an ingredient can provide an important clue for investigators during hypothesis generation. Investigators will also look closely at the exposures that ‘flag’ as being reported more frequently than expected based on the population data (at p<0.05), as well as exposures that are reported frequently but do not flag.
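The proportion calculation described above can also be sketched outside Excel. The Python snippet below is only an illustration: the function name and the example answers are made up for demonstration and are not taken from Table 1.

```python
from collections import Counter

def proportion_exposed(responses):
    """Return (exposed, denominator, proportion) for one food item.

    'Yes' and 'Probably' count as exposed. 'Don't Know' (or any other
    answer) is left out of the denominator.
    """
    counts = Counter(responses)
    exposed = counts["Yes"] + counts["Probably"]
    denominator = exposed + counts["No"]
    proportion = exposed / denominator if denominator else None
    return exposed, denominator, proportion

# Made-up answers from 14 cases for one hypothetical food item
example = ["Yes"] * 6 + ["Probably"] * 2 + ["No"] * 4 + ["Don't Know"] * 2
exposed, denominator, proportion = proportion_exposed(example)
print(f"{exposed}/{denominator} = {proportion:.1%}")  # prints "8/12 = 66.7%"
```

Because the two ‘Don’t Know’ answers are dropped, the denominator in this example is 12 rather than 14.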

Part 2

Compare the food frequencies to reference food frequencies in Table 2 (“Food Frequency Reference”) of Module 2 – Exercise 2.

How does the proportion of cases reporting each exposure compare to expected levels (e.g., proportions obtained from population-based food survey data)?

Some items (e.g., any eggs, cucumbers, bell peppers, apples, bananas) are within 10% of expected levels, suggesting they may not be of further interest. However, it is important to still do a binomial probability calculation for these foods before ruling them out.
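As a rough sketch of this comparison, the Python snippet below lines up case percentages against reference percentages and notes which items fall within 10 percentage points of the expected level. It assumes "within 10%" means percentage points. The function name, food labels, and values are invented for illustration and do not come from Table 2.

```python
def compare_to_expected(case_pct, expected_pct, threshold=10.0):
    """Compare observed and expected percentages for each food item.

    Returns one tuple per food: (food, observed, expected, difference, note).
    Foods with no reference value are kept and labelled, not dropped.
    """
    rows = []
    for food, observed in case_pct.items():
        expected = expected_pct.get(food)
        if expected is None:
            rows.append((food, observed, None, None, "no reference value"))
            continue
        diff = observed - expected
        note = "within threshold" if abs(diff) <= threshold else "review"
        rows.append((food, observed, expected, diff, note))
    return rows

# Made-up percentages for three hypothetical foods
case_pct = {"food A": 50.0, "food B": 64.0, "food C": 21.0}
expected_pct = {"food A": 45.0, "food B": 20.0}

rows = compare_to_expected(case_pct, expected_pct)
for row in rows:
    print(row)
```

An item noted as "within threshold" is not ruled out by this step alone; the binomial probability calculation is still needed.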

 Items that were not available on the food survey include:

  • Chia seeds
  • Flax seeds
  • Other seeds
  • Non-dairy milk
  • Pistachios
  • Other nuts

There are many limitations to using expected food frequencies, such as not accounting for:

  • Seasonality (e.g., consumption of cherries is higher in the summer; however, the expected levels are based on the average across the whole year, and therefore do not take into account variations by month),
  • Differences in consumption between men and women, across age groups, or by geographic location, and
  • Various ethnic/religious/cultural diets

Question 2-8: How might you utilize Foodbook data to address these limitations (for example, if cases are located just in Atlantic provinces, and are primarily under 18)?

In these cases, you may want to restrict the Foodbook data to more accurately reflect the case population.

To help address some of these limitations you can restrict your expected food frequencies to better align with your cases. For example, if you have an outbreak in which cases are mostly children under 18, you may want to look at the Foodbook value for the 0-18 age group. To account for some foods that are more seasonal (e.g., fresh berries, watermelon, cherries, etc.) it can be helpful to use the Foodbook values for the same time period as your cases’ illness onset dates (e.g., if your cases’ onsets were from July 15-August 25, look at Foodbook values collected in July and August only). If your cases are located in only a couple of provinces/territories in a certain part of the country (e.g., Eastern Canada, Western Canada, Northern Canada), look at Foodbook values for only those provinces/territories.
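The idea of restricting the reference data can be sketched as a simple filter. In the Python snippet below, the field names (`region`, `age_group`, `month`), the records, and the function name are all hypothetical; they do not reflect the actual layout of the Foodbook dataset, and the sketch ignores any survey weighting the real data may require.

```python
def restricted_expected(records, food, **filters):
    """Proportion of respondents answering 'Yes' for `food`, after
    keeping only records whose fields match the allowed values.

    Each filter is given as field={allowed values}.
    Returns None if no respondent in the subset answered Yes or No.
    """
    subset = [
        r for r in records
        if all(r.get(field) in allowed for field, allowed in filters.items())
    ]
    answered = [r for r in subset if r.get(food) in ("Yes", "No")]
    if not answered:
        return None
    return sum(r[food] == "Yes" for r in answered) / len(answered)

# Made-up survey records, for illustration only
records = [
    {"region": "Atlantic", "age_group": "0-18", "month": 7, "cherries": "Yes"},
    {"region": "Atlantic", "age_group": "0-18", "month": 8, "cherries": "No"},
    {"region": "Atlantic", "age_group": "19+", "month": 7, "cherries": "Yes"},
    {"region": "Western", "age_group": "0-18", "month": 1, "cherries": "No"},
    {"region": "Western", "age_group": "19+", "month": 2, "cherries": "No"},
]

overall = restricted_expected(records, "cherries")
atlantic_children = restricted_expected(
    records, "cherries", region={"Atlantic"}, age_group={"0-18"}
)
summer_only = restricted_expected(records, "cherries", month={7, 8})
print(overall, atlantic_children, summer_only)
```

Restricting the reference data shrinks the number of respondents behind each expected value, so heavily restricted values are less stable.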

The entire Foodbook dataset is available here for such purposes. 

 Another challenge arises when cases have similar food consumption habits. If cases are eating many of the same items, it can be difficult to identify a suspect source. For example, this particular case demographic appears to consume largely plant-based diets. Thus, they all likely consume a lot of fresh produce items of all types, from berries to leafy greens and other vegetables.

 Part 3

One way to assess whether the proportion of cases reporting an exposure is significantly higher than the expected proportion is to calculate the binomial probability.
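A minimal Python sketch of a binomial probability calculation follows. It assumes the quantity of interest is the probability of seeing at least the observed number of exposed cases if each case had the expected (population) probability of exposure. The function name and the example numbers are made up, and the exercise spreadsheet may set up the calculation differently.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    k: number of cases reporting the exposure (Yes + Probably)
    n: number of cases with a Yes, Probably, or No answer
    p: expected proportion from the reference population (0 to 1)
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Made-up example: 8 of 12 cases report a food that 25% of the
# reference population is expected to report
probability = binomial_upper_tail(8, 12, 0.25)
print(f"P(X >= 8) = {probability:.4f}")
print("flags" if probability < 0.05 else "does not flag")
```

A small probability suggests the exposure is reported more often than the reference population alone would explain. With many food items tested at once, some may flag by chance.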

Examine Table 3 (“Binomial Probability”) in Module 2 – Exercise 2. Which food items are reported at a significantly higher frequency than expected?

The following food items are significant at p≤0.05.

  • VEGETABLES: spinach, sprouts and zucchini
  • FRUITS: blueberries, blackberries, mangoes
  • NUTS & SEEDS: almonds, walnuts, and sesame seeds

Items that were not available on the food survey include:

  • Chia seeds
  • Flax seeds

