Wednesday, April 9, 2014

Calorie Intake and the US Obesity Epidemic

Between 1960 and 2008, the prevalence of obesity in US adults increased from 13 to 34 percent, and the prevalence of extreme obesity increased from 0.9 to 6 percent (NHANES surveys).  This major shift in population fatness is called the "obesity epidemic".


What caused the obesity epidemic?  As I've noted in my writing and talks, the obesity epidemic was paralleled by an increase in daily calorie intake that was sufficiently large to fully account for it.  There are two main sources of data for US calorie intake.  The first is NHANES surveys conducted by the Centers for Disease Control.  They periodically collect data on food intake using questionnaires, and these surveys confirm that calorie intake has increased.  The problem with the NHANES food intake data is that they're self-reported and therefore subject to major reporting errors.  However, NHANES surveys provide the best quality (objectively measured) data on obesity prevalence since 1960, which we'll be using in this post.


The second source of data on calorie intake is the USDA Economic Research Service (1).  The ERS estimates food consumption based on production.  The data are freely available and some of them go all the way back to 1909.  Despite the limitations of this method, I've come to believe that the ERS database is the most accurate and complete record of US food intake available.  After appropriate adjustments*, ERS data show that on average, US adults consumed 363 more calories per day in 2009 than they did in 1960.

If we plot calorie intake and obesity prevalence on the same graph, the correspondence is striking, but it only occurred to me recently to try to put them both into a scatterplot.  Scatterplots directly plot one variable vs. another (i.e., one on the horizontal axis and one on the vertical axis), and they are useful for determining how tightly two variables are correlated.  The more tightly the two variables are correlated, the better the points approximate a diagonal line.  When I plot calorie intake vs. obesity prevalence between 1961 and 2006, the correlation is striking:

The R-squared value quantifies the strength of the correlation on a scale from 0 to 1.  The R-squared value of 0.93 indicates an extremely strong correlation between calorie intake and the prevalence of obesity, and this correlation is highly statistically significant.  The slope of the "best fit line" also allows us to draw another conclusion: each 100-calorie increment corresponds to a 4.2 percentage point increase in the prevalence of obesity.
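For readers who want to see how a best-fit slope and R-squared are obtained, here is a minimal sketch in Python. The numbers in it are invented purely for illustration; they are not the ERS or NHANES values behind the graph, and the function name `linear_fit` is my own.

```python
# Ordinary least-squares fit of y on x, plus R-squared, using only the
# standard library.

def linear_fit(xs, ys):
    """Return (slope, intercept, r_squared) for a straight-line fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # With one predictor, R-squared is the squared Pearson correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# HYPOTHETICAL data: (calories per day, obesity prevalence in percent).
calories = [2100, 2150, 2200, 2300, 2400, 2450]
obesity_pct = [13.0, 14.5, 16.0, 22.0, 27.5, 30.0]

slope, intercept, r_squared = linear_fit(calories, obesity_pct)
print(f"slope per 100 calories: {slope * 100:.2f} percentage points")
print(f"R-squared: {r_squared:.3f}")
```

The slope comes out in units of the vertical axis per unit of the horizontal axis, which is why multiplying by 100 gives percentage points of obesity prevalence per 100 calories.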

When we consider extreme obesity (BMI greater than 40), we see a similar correlation:


The ERS data also allow us to look at the macronutrients carbohydrate, fat, and protein.  Let's see how they correlate with the prevalence of obesity, starting with carbohydrate:


The correlation with carbohydrate isn't quite as strong as with calories, but it's still extremely strong.  How about fat?


Again, the correlation is slightly weaker than with total calories, but still extremely strong.  How about protein?


Surprisingly, this was the strongest correlation of all: at an R-squared value of 0.94, it slightly surpasses the correlation strength of total calories.

Conclusions

Here's what the graphs show:
  • We're eating more calories than we used to.
  • There is a very strong relationship between the number of calories we eat and the prevalence of obesity in the US.
  • The extra calories are coming from carbohydrate, fat, and protein, and increased intake of all three tightly correlates with increased obesity prevalence.
In other words, we're eating more of everything than we did 50 years ago.  

This raises the question: why are we eating more of everything?  There are multiple reasons, but I described some of the most compelling explanations in my talk "Why Do We Overeat?"  These same concepts form the basis of the Ideal Weight Program.


* Gross values reduced by 28.8% to account for waste between production and consumption (adjustment determined by the ERS).  Also adjusted for an artifact in 2000 that results from a change in the liquid oils assessment method and artificially inflates fat intake.

25 comments:

slowfit said...

Stephan,

You have laid out your diet recommendations and practices in the past. What are your current thoughts and recommendations on exercise and lifestyle? Do you use any supplementation?

Thank you.

Teech said...

I remember another post in which you demonstrated that Americans have been consuming more vegetable oils, and that animal fat consumption has stayed fairly constant. So people are eating more products, processed and highly palatable no doubt, that contain "heart healthy" veggie oils. Same stuff farmers use to fatten livestock.

Evgeny Rokhlin said...

Hi Stephan,

I have some understanding of statistics, so maybe you can easily explain to me what I don't understand. It's a serious question:

In your example, we have carbohydrate calories vs obesity, and R squared is 0.9, so carbohydrate intake alone already explains over 90% of obesity increase.

But then we look at the next graph, and now fat calories intake has an R squared over 0.9 too, so fat intake explains over 90% of obesity increase.

So it looks like fat plus carbohydrate intake already explains way over one hundred percent, and we still did not take into account protein calorie intake.

How should I make sense of the three graphs, which tell me that protein, plus carb, plus fat calorie intake explains a whopping 270% of obesity level increase?

Stephan Guyenet said...

Hi Slowfit,

Maybe I'll lay it out in a post (or a book) someday, but it's a bit long for a comment.

Hi Evgeny,

I understand your question, but I'm not knowledgeable enough about statistics to answer it confidently. I want to say that summed R-squared values can exceed 1.0, but I'm not certain. Perhaps it's because carb, protein, and fat are all highly correlated with one another and are therefore not independent variables? If someone else with more stats expertise wants to explain this, please feel free.

I had Excel estimate the best-fit line and automatically calculate R-squared. I believe the values are correct, but I haven't double-checked them using R. I can send you the raw data if you want to play with them. Just send me a personal e-mail.

Adam Čabla said...

Evgeny, it's the problem of mutual correlation of the explanatory variables. I suspect that if you drew regression lines of each of the macros against each other, you would also get R-squared values near 1. You can look at it this way: if fat explains carbs and carbs explain obesity, then you will see BOTH fat and carbs explain obesity to a similar degree. Do not forget that R-squared is just a measure of correlation, so it's totally inappropriate to sum them up.
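A minimal sketch of this point, using invented numbers rather than the post's data: three predictors that all follow the same underlying trend each give an R-squared near 1 against the outcome, so the three values add up to far more than 1. The series names and values below are hypothetical.

```python
# Collinear predictors: each one alone "explains" nearly all of the outcome,
# so summing single-predictor R-squared values is meaningless.

def r_squared(xs, ys):
    """Squared Pearson correlation between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

# HYPOTHETICAL series: a shared upward trend plus small deviations.
trend = list(range(10))
wiggle = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.1, -0.1]

carb = [200 + 10 * t + w for t, w in zip(trend, wiggle)]
fat = [70 + 3 * t - w for t, w in zip(trend, wiggle)]
protein = [80 + 2 * t + 0.5 * w for t, w in zip(trend, wiggle)]
obesity = [13 + 2 * t for t in trend]

r2_carb = r_squared(carb, obesity)
r2_fat = r_squared(fat, obesity)
r2_protein = r_squared(protein, obesity)

print(f"carb: {r2_carb:.3f}  fat: {r2_fat:.3f}  protein: {r2_protein:.3f}")
print(f"sum of the three: {r2_carb + r2_fat + r2_protein:.3f}")
print(f"carb vs fat: {r_squared(carb, fat):.3f}")
```

Each R-squared belongs to its own one-predictor model, so nothing forces the values to sum to 1 across models.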

Gretchen said...

Why don't you ask Ned Kock? http://www.nedkock.com/

He teaches statistics and is interested in health.

Maybe a guest blog on statistics?

Austin Powers said...

I know enough about statistics to know that:

* This data is consistent with the theory that excess calories are the cause of overweight, though it doesn't prove it.

* This data is consistent with the theory that excess carbs are the cause of overweight, though it doesn't prove it.

* Likewise for the theory that excess protein causes overweight.

* Likewise for the theory that excess fat causes overweight.

In other words, this data demonstrates nothing, and in particular does not suggest that excess calories are an issue at all. In an argument with Gary Taubes, in other words, this data could not be used by either of you to support either the calorie or the carb theory.

"As I've noted in my writing and talks, the obesity epidemic was paralleled by an increase in daily calorie intake that was sufficiently large to fully account for it. " Yes, but if you replace the words "daily calorie intake" with "daily carb intake" or "daily fat intake", the second part of the statement remains true. This article has provided no evidence to support one theory over a purely macronutrient-based theory, and so the fact that the obesity epidemic was paralleled by an increase in daily calorie intake still suggests nothing.

I personally would guess that calories are very important, and besides that I would trust your guess more than mine, but not if your guess is based on this data or anything remotely like it!

Stephan Guyenet said...

Hi Adam,

Thanks, that is what I suspected.

Hi Austin,

You said "this data demonstrates nothing". It demonstrates that we're eating more of everything, which is what I stated in my post.

As you know if you're a scientific-minded person, no piece of evidence should be interpreted in isolation. If this were the only data we had, then your statements would be correct. However, there are many other studies pertinent to this question, and they collectively suggest that calorie intake but not macronutrients affects adiposity.

For example, here are the only two controlled trials that compared fat vs. carb overfeeding while holding calorie intake constant. If one reads past the abstract, both studies found that fat gain was virtually identical in the two conditions:

http://ajcn.nutrition.org/content/62/1/19.short
http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=879624

In other words, if the "average person" eats more calories, his body fatness goes up, no matter whether the calories come from fat or carbohydrate. After controlling for calories, the fattening effect of food is independent of its carbohydrate and fat content.

The energy value of food is the only food quality that has been convincingly shown to affect adiposity in humans. There may well be other food qualities that affect adiposity (I suspect there are), but we haven't identified them yet. Therefore, the most rational interpretation of the calorie intake data is that the increase in calories accounts for the increase in weight, and that is exactly what you will hear from most researchers.

Michael Terry said...

One way to answer "Why are we eating more food?" is: Food prices have dropped.

Gretchen said...

I think one factor this discussion doesn't take into account is this.

If you lock people in a metabolic ward, you can show that it's calorie intake that is most important. But most people don't live in metabolic wards.

So the critical factor is how various diets affect how many calories people take in. If people feel full on Diet A but ravenously hungry on Diet B, Diet A will be more successful in the long run even if Diet B resulted in weight loss when people stuck to it.

Stephan Guyenet said...

Hi Michael,

I agree. That's one of the factors I discussed in my talk. In the 1930s, food cost about 25% of disposable income. Today, it costs about 10%.

Hi Gretchen,

I agree, with the caveat that it's not just about hunger. We eat for many reasons that have nothing to do with hunger as well.

slowfit said...

Stephan,

Would love to see a book! But, before then please do lay out the basics (exercise, lifestyle) in a post. I always learn something here.

valerie said...

I would be much more interested in seeing a graph of average weight vs average calories per day per capita. Add to it the average height vs average calories per day per capita, and we might see some interesting trends.

Your graph makes for nice headlines, but think about it: you are looking at the percentage of people whose weight divided by height squared is above 30, vs the average calories consumed per day per person. Proportion of qualitative data vs average of quantitative data. That's getting too close to data dredging to me.

By the way, you have a good regression mostly because you have two clusters and nothing else (except for one lone wandering point). You'll always get an excellent regression when you only have two clusters.

Mirrorball said...

To Evgeny and Stephan,

R^2 just measures the degree of linear correlation between two variables from -1 (perfectly inversely correlated) to 1 (perfectly directly correlated). And it only measures *linear* correlation, it can't tell you if variables are correlated some other way.

So a high R^2 doesn't explain anything necessarily, nor does it have to sum 100%. For instance, if you plot people's weights in kilograms vs. their weights in pounds, you will get R^2 = 1. If you then plot their weights in kilograms vs. their weights in ounces, again you will get R^2 = 1. But nothing explains anything. All you know is that these variables are tied to each other perfectly in a line, which of course they are, since they are the same information in different units.

So what we can learn from Stephan's plots is that obesity, calories, carbohydrate, fat, and protein intake have all increased together.

Sampson Greenovich said...

Over the years I have seen new definitions for the weight of a person. First it was just called overweight, then obese, very obese, and morbidly obese. These terms are medical terms for the weight to fat ratio.
http://www.martinhealth.org/bariatrics

Stephan Guyenet said...

Hi Valerie,

The point of looking at BMI is that it's a better measure of fatness than body weight alone. If you're interested in the relationship between fatness and calorie intake, you want to use BMI rather than weight or height.

Regarding your comment about clustering, I have to disagree. There is really only one cluster in most of the graphs, with the rest of the points falling nicely along the best-fit line. Have another look at the first scatterplot.

Hi Mirrorball,

You're referring to the Pearson correlation coefficient R rather than R-squared. R-squared is never negative, regardless of whether the correlation is positive or negative. R-squared is said to represent the proportion of variance accounted for by a variable.

valerie said...

You are *not* looking at BMI. That would at least have the merit of being a quantitative, continuous variable. You are looking at the percentage of obesity. Obesity is a qualitative variable (yes/no, based on the rather arbitrary threshold of BMI 30). It is not the same at all.

When I look at the first scatterplot, I see a cluster of three points on the left, and a cluster of four points on the right (plus one lost soul in the middle). Within the left cluster, there is a slight positive correlation. Within the right cluster, there is a negative correlation. The overall strong positive correlation is due to the two clusters being separated by nothingness. It turns each cluster into an influential value, which drives the regression.

Seriously, I would be much more interested in seeing a graph of weight (or BMI, if you insist) vs calories. It just makes a lot more sense to run a regression between two quantitative variables.

Chris Wilson said...

Indeed a consistent set of relationships. r-squared can never be greater than 1 for any given model; comparing across models there's no constraint to sum to 1. As Stephan said, it just means the predictors are collinear.

Stephan Guyenet said...

Hi Valerie,

Ah OK, I understand what you're saying now. You're right, it would be better to look at BMI as a continuous variable. I didn't have those data on hand, but I could probably dig them up.

I still disagree about the clustering. If you look at the leftmost "cluster", it is a series of points that fall roughly along the line, not a random cloud of points.

Mirrorball said...

My bad, but in a linear model with one independent variable, one is equal to the other squared.

Adam Čabla said...

R-squared is just the square of the coefficient of multiple correlation between one variable and the "optimized" linear combination of the others. You do not have to compute a regression to obtain it.

But linear combination here means that the regression is linear in its parameters, so the dependence itself can be parabolic, hyperbolic, and many others, because you can simulate those (and it is usual to do so) by adding artificial variables.

Saying that it does "explain variability" can be quite misleading, since it is still only correlation. If you have only two variables and switch their positions (x -> y and y -> x), you will see the same R-squared, and if you add a second variable "z" (y = b0 + b1x -> y = b0 + b1x + b2z), R-squared will not decrease (and will most probably increase) even if "z" is white noise.

Regression is just correlational analysis, even though it is written mathematically like causation.

The second and even larger problem is that with time series you cannot use this level of simplicity, because two things can share a trend even though they are not otherwise correlated, and you cannot tell which is the case. A more econometric approach would be appropriate here, but I see you need to KISS.
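The two R-squared properties mentioned above (the same value when x and y are swapped, and no decrease when a pure-noise variable is added) can be checked with a small sketch. The data are randomly generated for illustration, the helper names are my own, and the two-predictor R-squared uses the standard closed-form expression in terms of pairwise correlations rather than a full regression routine.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def r2_two_predictors(ys, xs, zs):
    """R-squared of the model y = b0 + b1*x + b2*z, from pairwise correlations."""
    r_yx = pearson_r(ys, xs)
    r_yz = pearson_r(ys, zs)
    r_xz = pearson_r(xs, zs)
    return (r_yx ** 2 + r_yz ** 2 - 2 * r_yx * r_yz * r_xz) / (1 - r_xz ** 2)

# SYNTHETIC data: y depends on x plus noise; z is white noise unrelated to y.
rng = random.Random(0)  # fixed seed so the run is repeatable
x = [float(i) for i in range(20)]
y = [2.0 * xi + rng.gauss(0, 3) for xi in x]
z = [rng.gauss(0, 1) for _ in x]

r2_y_on_x = pearson_r(x, y) ** 2
r2_x_on_y = pearson_r(y, x) ** 2
r2_with_noise = r2_two_predictors(y, x, z)

print(f"y on x: {r2_y_on_x:.4f}   x on y: {r2_x_on_y:.4f}")
print(f"y on x and noise z: {r2_with_noise:.4f}")
```

Because R-squared cannot go down when a predictor is added, analysts often look at adjusted R-squared or out-of-sample fit when comparing models of different sizes.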

Jim Oliver said...

I am convinced that we eat more now than 50 years ago because food is more affordable.
The Economics of Obesity
EconTalk Episode with Darius Lakdawalla

zorbeteland said...

You could also compare the number of mules and the prosperity of a country and get a nice correlation. Are the mules the reason?

Stephan Guyenet said...

Hi zorbeteland,

If I were to do a controlled experiment and show that removing mules from a group of people increases prosperity, and adding mules decreases it, then I would have evidence for causality. That is the situation with calories. Many experiments have demonstrated that adding calories increases body fatness, and subtracting them decreases it. Therefore, we can safely interpret the relationship as causal. Make sense?

jennette said...


As someone who has been consuming under 1500 kcal for about a year and NOT losing weight, I can say for sure that exercise would be my only other tool. I only recently realized that body fat needs no calories to stay on the body, hence the reason why I must dip below 1200 kcal net to lose anything, and then it's so slow (0.25 lb per week). Unfortunately, I am caught in a circle of surgeries, injuries and illnesses (broken ankle, etc.) which have both contributed to weight gain and made it very difficult to lose.

So I think it's fair to say that a lot of people who are overweight do not continue to overeat, but still remain in that "overweight" category.

When I studied behavioral genetics a few years back, I was struck by the fact that BMI is strongly heritable. I honestly had no clue that MZ twin studies showed a strong correlation... numbers which rival the heritability of sexual orientation.

"If everyone ate the same amount and exercised the same amount, people would still differ in weight for genetic reasons (Plomin text)."

So, although some of us are consuming more calories, some of us are less active due to the environment (long hours, desk jobs, etc), some of us are older, some of us had spurts of weight gain (and were unable to get it off despite a normalization of caloric intake), and some of us have a genetic disposition to burn less or eat more.

When our bodies force us to crave food for survival, it's difficult to turn it off. There's so much more to this - as a biochemist, I'm sure you know all the hormonal aspects and that we still don't know how each person's unique DNA sequence contributes to the mechanisms behind metabolism. Oh, I'm sure most of it is overeating and the sedentary lifestyle, but having hundreds of minor mutations can contribute to the expressed genotype. I am amazed when I see all the different bone structures, muscle/fat distributions, and facial features; why shouldn't we believe that our individual biochemistries are affected too?
This is not a one-size-fits-all problem that can be solved simplistically.