The troubles of observational epidemiology

As I’ve been working on my term papers, I was reminded of the “chicken and egg” problem that we face in observational epidemiology. One of my papers looks at the association between substance abuse and suicidal tendencies. According to my analysis — and this should be no surprise — kids who use certain drugs of abuse are more likely to attempt suicide. But does that mean that kids would be less likely to attempt suicide if they didn’t use drugs? Or does it mean that kids would be less likely to use drugs if they weren’t prone to attempt suicide?

Do not let him near a gun!

The dynamics of why people attempt suicide are complicated. There’s a mental health component, yes, but there are other things at play as well. People in some cultures or religions are less likely than others to attempt suicide. People in certain social strata are more likely to attempt it. And that’s just the 30,000-ft view of the issue. When it comes down to an individual person’s reasoning for attempting suicide, it gets even more complicated. So much so that I cannot currently take a person’s characteristics, enter them into any kind of mathematical model, and predict with any accuracy whether they are likely to commit suicide or not.

This is true of many other things that we observe in epidemiology. We see these associations between an exposure and an outcome, we report on those associations, and the public and/or media are left to decide if we’re looking at egg-chicken or chicken-egg. It’s not until we do experimental epidemiology that we get a better idea of what the direction of the cause-and-effect arrow is. Throughout all this, we also seem to forget about biological plausibility.

Don’t think too hard about it.

If I told you that ice cream causes drowning, you may or may not be skeptical. (I hope you are.) But if I told you that smoking causes cancer, you would be more inclined to believe me (even if we didn’t know all that we know about smoking). You would be even more inclined to believe me if you have had some lessons in biology. Biologically speaking, the toxins in cigarettes cause irritation and genetic changes in the cells they come into contact with. That triggers cancer. But what is the mechanism by which you’re more likely to drown if you eat ice cream?

What we have, then, is another “glitch” in epidemiology that we epidemiologists have been doing a poor job of explaining to the public. We take these big observational studies done on entire populations (at the 30,000-ft level) and forget to explain to the public and/or media that those observations do not apply at the individual level. It’s called the ecological fallacy.

Not to be confused with an ecological movie full of fallacies which made it a disaster.

An example is someone who lives in a wealthy neighborhood. It would not be unreasonable to assume that the person is wealthy. But are they? Again, we do this all the time. If Hispanics are more likely to develop diabetes, a physician taking care of a Hispanic person is more likely to screen their patient for diabetes even if the patient — by all other measures — is perfectly healthy. The characteristics of a group are applied to an individual.
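If you want to see that fallacy in numbers, here is a quick sketch in Python (the incomes are entirely made up, just for illustration) of how a group average can mislead you about any one individual:

```python
# A minimal sketch of the ecological fallacy with invented numbers:
# a "wealthy" neighborhood's average says little about a given resident.
import numpy as np

rng = np.random.default_rng(1)

# Simulated household incomes (in $1,000s) in a wealthy neighborhood:
# high on average, but with plenty of spread within the group.
incomes = rng.lognormal(mean=np.log(150), sigma=0.6, size=1000)

print(f"Neighborhood mean income: ${incomes.mean():.0f}k")   # the group-level fact
print(f"Residents below $75k: {np.mean(incomes < 75):.0%}")  # the individual-level reality
```

The neighborhood looks rich on average, yet a sizable share of its residents are not, which is exactly why a group-level characteristic cannot simply be pasted onto an individual.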

Back to the ice cream and drowning.

As it turns out, most drownings in the United States happen in the summertime, as people head out to beaches, lakes, rivers, and pools to cool off. People also eat more ice cream in the summer. If you put those two things together on a graph, you’ll see that ice cream consumption correlates with drownings.

Okay, I cheated a little. The example of ice cream and drowning is more an example of confounding, where summer is the true cause of both the drownings and the ice cream consumption, but we’re confounding the whole damn thing. Still, the lesson is the same. Just because A goes along with B, even at a statistically significant level (i.e., not by chance alone), we still cannot say that A causes B or that B causes A without an experiment.
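If you want to watch the confounding happen, here is a small simulation in Python (all the numbers are invented; nothing here is real surveillance data) where “summer” drives both ice cream sales and drownings, and neither one causes the other:

```python
# A minimal sketch of confounding with made-up numbers: a common cause
# ("summer") makes two unrelated outcomes correlate with each other.
import numpy as np

rng = np.random.default_rng(0)
n_weeks = 520  # ten years of weekly data

# Seasonal driver: flag the warm weeks of each year.
week_of_year = np.arange(n_weeks) % 52
is_summer = ((week_of_year >= 22) & (week_of_year <= 35)).astype(float)

# Both outcomes depend on summer, and not on each other.
ice_cream = 100 + 80 * is_summer + rng.normal(0, 10, n_weeks)  # cones sold per week
drownings = rng.poisson(1 + 4 * is_summer)                     # drownings per week

# The crude correlation looks impressive...
print(f"Crude correlation: {np.corrcoef(ice_cream, drownings)[0, 1]:.2f}")

# ...but within each season (holding the confounder fixed) it disappears.
for label, mask in [("summer", is_summer == 1), ("not summer", is_summer == 0)]:
    r = np.corrcoef(ice_cream[mask], drownings[mask])[0, 1]
    print(f"Correlation within {label}: {r:.2f}")
```

The crude correlation is strong, but once you hold the season fixed it vanishes, which is exactly what a confounder does to an association.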

In the case of ice cream and drowning, the experiment could be to see if people who drowned ate more ice cream than people who did not drown. A less ethical approach would be to randomize people to ice cream and no ice cream, then throw them all in a lake and see if one group drowns at a faster rate than the other. (Talk about survival analysis.)

These are all the things that the health and science consumer needs to keep in mind when looking at the latest, greatest study. Even if the statistical significance is strong, the implied causation can be misleading, and the personal/medical/legal decisions that are made can be quite, in a word, wrong.

Consider the case of saccharin back in the 70s, 80s, and 90s. We were convinced that it caused cancer because some rats were given saccharin and then got cancer. Pretty scary, right? The rats were cancer-free, then given saccharin, then got cancer. (Bladder cancer, to be exact.) And the rats that were not given saccharin didn’t develop cancer. It wasn’t until the year 2000 that the FDA and state governments eased their labeling requirements for saccharin, when it was discovered that rats handle saccharin in their bodies differently than humans do. In essence, it wasn’t the saccharin. It was how the rats handled the saccharin.

Hey, he likes the coffee sweet, okay?

At the end of the day, when you listen to the latest, greatest revolutionary findings, you have to keep in mind several things, including biological plausibility, causation, statistical significance, confounding, effect modification, etc. Take another example. This one is about a “cancer cluster” in Florida. A bunch of people living close to each other developed different kinds of cancers, and the search was on to find a cause. After exhaustive environmental testing and interviewing of the cases and their families, epidemiologists came up empty. All except one.

Which one?

That one epidemiologist saw that most of the cases were not native to the area where the “cluster” showed up. In fact, they were from all over the country and had only moved there in the last few years as part of their retirement. So you had cancer cases whose likely exposures happened all over the country all moving into one area. When you removed the “immigrants” from the equation, the people who had lived there all their lives had lower cancer rates than the rest of the country. Then, when you looked at the immigrants’ origins, they all came from places with low levels of cancer. The only real difference was that older people, with a higher propensity for cancer, had moved from all over the country into one tight place.
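Here is a back-of-the-envelope version of that story (the rates and age mixes below are hypothetical, chosen only to make the arithmetic visible) showing how a crude rate can scream “cluster” even when every age group carries perfectly ordinary risk:

```python
# A minimal sketch with hypothetical numbers: a retirement destination can
# look like a "cancer cluster" on crude rates alone, purely because of who
# moved in, not because of any local exposure.

# Assumed national age-specific cancer incidence per 100,000 per year.
national_rate = {"under_50": 150, "50_to_69": 800, "70_plus": 2200}

# National age mix vs. the town's age mix after retirees move in.
national_mix = {"under_50": 0.65, "50_to_69": 0.25, "70_plus": 0.10}
town_mix = {"under_50": 0.35, "50_to_69": 0.30, "70_plus": 0.35}

def crude_rate(mix, rates):
    """Crude rate: age-specific rates weighted by the population's age mix."""
    return sum(mix[age] * rates[age] for age in rates)

print(f"National crude rate: {crude_rate(national_mix, national_rate):.0f} per 100,000")
print(f"Town crude rate: {crude_rate(town_mix, national_rate):.0f} per 100,000")
# The town's crude rate is roughly double, even though every age group has
# the same underlying risk as the rest of the country.
```

Nothing about the town is dangerous in this toy example; the “cluster” is an artifact of its age structure.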

Grandpa?

The example above is one that most epidemiology and environmental health professors will tell in lecture to remind us to think thoroughly about what we’re seeing and to take into account things that are not readily visible. As I move forward in my studies toward that “golden” DrPH degree, I’ll have to keep in mind these and about 3,576 other things when planning my thesis, carrying out my thesis project, and then defending it. And God forbid I miss something that’s invisible. That could unwind the whole thing.

I'm a fourth-year doctoral candidate in the Doctor of Public Health program at the Johns Hopkins University Bloomberg School of Public Health. All opinions posted here are my own, of course, and they do not necessarily reflect the opinions of my school, employers, friends, family, etc. Feel free to follow me on Twitter: @EpiRen
