Exercise 3: Food frequency analysis in Excel

Day 4: Friday July 25, 2014 (n=12)

Questionnaires for eight cases are received throughout the day.

You decide to enter your questionnaires into a Microsoft Access database and then export the data into Microsoft Excel for analysis. You need to analyze the data to determine whether there are any exposures that occur at higher than expected frequencies. For more information on expected food frequencies, refer to the background reading below.

Background reading

This is a three-part exercise.

Part 1

Examine the raw food exposure data in Table 1 (“Original Data”) of Module 2 – Exercise 3. This represents a raw data extract from the Access database where the questionnaires were entered.

Identify the most common food exposures among the cases – look at the proportion of cases that answered “Yes”, as well as those that answered “Yes” or “Probably”.

List the food exposures that are reported by more than 60% of the cases (use the Yes + Probably column). These are the exposures you will examine in further detail in Parts 2 and 3 of this exercise.
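For readers who want to check the spreadsheet arithmetic outside Excel, the screening step can be sketched in Python. This is only an illustration: the food names and responses are hypothetical placeholders rather than values from Table 1, and the sketch assumes that every case (including any who answered "No" or "Don't know") counts in the denominator.

```python
from collections import Counter

THRESHOLD = 0.60  # "more than 60% of the cases", from the exercise text


def exposure_proportions(responses):
    """Return (share answering Yes, share answering Yes or Probably).

    Assumption: all cases are counted in the denominator.
    """
    counts = Counter(r.strip().lower() for r in responses)
    n = len(responses)
    yes = counts["yes"]
    probably = counts["probably"]
    return yes / n, (yes + probably) / n


# Hypothetical responses for 12 cases; NOT the data in Table 1.
raw = {
    "food A": ["Yes"] * 7 + ["Probably"] * 2 + ["No"] * 3,
    "food B": ["Yes"] * 4 + ["No"] * 8,
}

# Keep the foods whose Yes + Probably share exceeds the threshold.
shortlist = [
    food
    for food, answers in raw.items()
    if exposure_proportions(answers)[1] > THRESHOLD
]
print(shortlist)
```

In this made-up example, food A is kept (9 of 12 cases answered Yes or Probably) and food B is dropped (4 of 12).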


The following foods of interest were identified:

  • VEGETABLES: lettuce (any), romaine lettuce, bell peppers (any), red bell peppers, orange bell peppers, yellow bell peppers, fresh tomatoes (not grown at home), cherry tomatoes, cucumbers, broccoli, and mushrooms
  • MEAT: any beef (not including deli meat), hamburgers, bacon, any chicken (not including deli meat), and whole or cut chicken pieces or parts
  • DAIRY & EGGS: ice-cream/gelato, eggs and handling of raw/undercooked eggs
  • FRUIT: strawberries and bananas
  • OTHER: other pre-packaged snack food (e.g., chips, pretzels, crackers, cookies, snack cakes) and cold breakfast cereal


Part 2

Compare the selected food frequencies to reference food frequencies in Table 2 (“Food Frequency Reference”) of Module 2 – Exercise 3.

How does the proportion of cases reporting an exposure compare to expected levels (i.e., proportions obtained from food surveys)? How do the suspect foods change when you consider these expected food frequencies? What are some of the limitations of using food survey data to obtain expected food exposures?


Some items (e.g., bananas, cold breakfast cereal, eggs, ice-cream/gelato, any chicken) are within 10% of expected levels, suggesting they may not be of further interest. Other items, though more than 10% higher than expected, are reasonable to exclude once you consider what the expected values were measuring. For example, the CDC Atlas variable used as a comparison for pre-packaged snack food included only chips, whereas our questionnaire also covered items such as pretzels and cookies, so we would expect our value to be higher.

When focusing on food items that are reported more than 10% above expected levels, the following food items remain (note that the results below include items with no expected values, i.e., those not on available food surveys):

  • VEGETABLES: lettuce (any), romaine lettuce, bell peppers (any), red bell peppers, orange bell peppers, yellow bell peppers, fresh tomatoes (not grown at home), cherry tomatoes, cucumbers, broccoli, and mushrooms
  • MEAT: any beef (not including deli meat), hamburgers, bacon, and whole or cut chicken pieces or parts
  • FRUIT: strawberries
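The comparison against expected levels can be sketched in the same way. The sketch below rests on two assumptions that the exercise does not state outright: that "10%" means 10 percentage points, and that foods with no survey value are carried forward rather than dropped. All names and figures are hypothetical, not values from Table 2.

```python
def flag_above_expected(observed, expected, margin=0.10):
    """Return foods whose observed share exceeds the expected share by more
    than `margin`, plus foods that have no expected value at all.

    Assumption: `margin` is an absolute difference in proportions
    (0.10 = 10 percentage points).
    """
    flagged = []
    for food, obs in observed.items():
        exp = expected.get(food)
        if exp is None or obs - exp > margin:
            flagged.append(food)
    return flagged


# Hypothetical proportions; NOT the values in Table 2.
observed = {"food A": 0.75, "food B": 0.67, "food C": 0.83}
expected = {"food A": 0.50, "food B": 0.62}  # food C is absent from the survey

remaining = flag_above_expected(observed, expected)
print(remaining)
```

Here food A stays (25 points above expected), food B is set aside (5 points above), and food C stays because there is nothing to compare it with. Judgment-based exclusions, such as the snack food example above, still have to be made by hand.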

There are many limitations to using expected food frequencies, such as not accounting for:

  • Seasonality (e.g., consumption of cherries is higher in the summer; however, the expected levels are the same year-round),
  • Differences in consumption between men and women, adults and children,
  • Geographic location (except for the state level that is available in the US Atlas of Exposures), and
  • Various ethnic/religious/cultural groups.

Further, since specific questions differ among surveys, it is often difficult to find the most appropriate comparison group. For example, the CDC Atlas of Exposures differentiates between hamburgers eaten at home or outside the home, while the hypothesis-generating questionnaire used in this investigation does not. Such differences in food definitions make it difficult to determine which variable of hamburger consumption is most appropriate to use as an “expected” value.


Part 3

One way to assess whether the proportion of cases who ate a food is significantly higher than the expected proportion is to calculate the binomial probability.
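As a rough illustration of what such a calculation involves, the sketch below computes the probability of seeing at least k exposed cases out of n if each case had the expected probability p of eating the food. It assumes an upper-tail (one-sided) probability; Table 3 may follow a different convention, and the numbers used here are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    k: number of cases reporting the exposure
    n: number of cases interviewed
    p: expected proportion from a food survey
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: 10 of 12 cases report a food that 50% of the
# general population is expected to eat.
p_value = binomial_upper_tail(10, 12, 0.50)
print(round(p_value, 4))
```

In Excel, the same upper-tail quantity can be obtained as 1 - BINOM.DIST(k-1, n, p, TRUE). A small probability suggests that the observed frequency is unlikely to arise by chance at the expected consumption level.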

Examine Table 3 (“Binomial Probability”) in Module 2 – Exercise 3. Which food items are significantly higher than expected?


The following food items are significant at the p

  • VEGETABLES: romaine lettuce, bell peppers, red bell peppers, cherry tomatoes and mushrooms
  • MEAT: hamburgers and bacon
  • FRUIT: strawberries