This is the first part of a series of posts where I will try to explain to anyone willing to listen why “simple” statistical analyses of complex data are not the kind of thing you want to do. You end up looking like a fool, and possibly getting people hurt, if you do. So let’s do a mental exercise and place you in the position of an epidemiologist who is tasked with deciding whether or not to recommend fluoridating a town’s water.
It goes without saying that all the data presented here, like the simulation itself, are fake and should not be taken as actual evidence of anything. It’s an example. (And you should know better… Fluoride is good for teeth.)
Imagine that you’re looking at two towns. Town A is composed of 50,000 people, and Town B is composed of 52,000. Town A has well water, and Town B has a modern water treatment plant. Town B uses fluoride in its water treatment, and Town A has been looking at installing a water treatment plant like Town B’s, only without the fluoride. But then you arrive as the health officer in Town A and hear that they want to treat the water but not add fluoride. So you decide to compare the rates of tooth decay for the last year between the two towns in order to show that fluoride is a good thing, that it protects teeth from tooth decay.
Here are your results:
Town A (no fluoride) – The one dentist practice sees 1,364 patients with cavities in two or more teeth. That’s a proportion of 2.73% of the town with cavities in two or more teeth. Per 10,000 population, that’s a rate of 272.8.
Town B (with fluoride) – The three dentist practices see 834 patients with cavities in two or more teeth. That’s a proportion of 1.6% of the town with cavities in two or more teeth. Per 10,000 population, that’s a rate of 160.4.
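The arithmetic behind those two rates is easy to check by hand or in a few lines of code. Here is a minimal sketch in Python using the made-up counts from the two towns; the function name `rate_per_10k` is my own and purely illustrative.

```python
def rate_per_10k(cases: int, population: int) -> float:
    """Return the number of cases per 10,000 residents."""
    return cases / population * 10_000


# Fake counts from the example: patients with cavities in two or more teeth.
rate_a = rate_per_10k(1_364, 50_000)  # Town A, no fluoride
rate_b = rate_per_10k(834, 52_000)    # Town B, with fluoride

print(f"Town A: {rate_a:.1f} per 10,000")
print(f"Town B: {rate_b:.1f} per 10,000")
```

Dividing by the population before comparing is the whole "adjustment" at this stage: it only puts two towns of slightly different sizes on the same scale.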
So far, we have only adjusted (or accounted) for the differences in the total population. This adjustment shows that the rates of dental cavities in two or more teeth per 10,000 people are higher in the town with no fluoride (Town A). Because you lack the budget for an advanced statistical software package, you use an online one and get these results:
When doing statistical analyses, there are two hypotheses (theories) to test. The null hypothesis is that there is no difference in the rates between the two groups. The alternative is that there is a difference between the two. Here, we have rejected the null in favor of the alternative. The “<0.0000001” in the output above is the p-value: if the null hypothesis were true, there would be a less than one in ten million chance of seeing a difference at least this large. In other words, it is very unlikely that chance alone produced the gap between the two towns.
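If you want to see where a number like that could come from, here is a rough sketch of one common way to compare two proportions: a pooled two-proportion z-test, written with nothing but Python’s standard library. I am assuming this is roughly what an online calculator would do; which test the tool actually ran is not stated, and the function name `two_proportion_z_test` is mine.

```python
import math


def two_proportion_z_test(cases_a: int, n_a: int, cases_b: int, n_b: int):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = cases_a / n_a
    p_b = cases_b / n_b
    # Under the null hypothesis both towns share one underlying proportion.
    pooled = (cases_a + cases_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided tail probability of a standard normal beyond |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Fake counts from the example (assumed test, not the online tool's output).
z, p_value = two_proportion_z_test(1_364, 50_000, 834, 52_000)
print(f"z = {z:.2f}, p = {p_value:.3g}")
print("p < 0.0000001:", p_value < 1e-7)
```

With these counts the p-value lands far below the 0.0000001 cutoff, which matches the direction of the result described above.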
Is this enough to prove that fluoride is a good thing for teeth?
As an aspiring epidemiologist, what other things would you look at to come up with a recommendation of whether or not to fluoridate? Would this very simple adjustment to your observations be enough? Tell me in the comments below.
Finally, a friend pointed out to me that simple statistical analyses are a good starting point for what you’re trying to figure out. However, you can’t just compare two rates (or averages, or whatever) and go from there without looking at the whole universe in which your data exist. (It could get your paper retracted.) We’ll talk more about all of that in the second and third parts.
Spoiler alert: No, this is not enough, and we’ll look into more data analysis next time.