In the last two blog posts (here and here), I told you about the history of fluoridation of water in the United States and other parts of the world. I told you how a dentist traveled out west to Colorado and, through observation and laboratory study, helped determine that adding fluoride to drinking water prevented tooth decay. Of course, the level of fluoride added had to be non-toxic, but it also had to be low enough to not cause the brown stains of teeth seen in people living in places where fluoride is in high concentrations naturally in their drinking water.
Since the 1940s, towns and municipalities in the United States have been adding fluoride to their water because the evidence is clear that — on average — people who live in those communities are less likely to have tooth decay leading to poor oral health. The key word in that last sentence and in this blog post is “average.” On average, there was a decreased risk. On average, people were less likely to have tooth decay. There will still be people with tooth decay, but the overall effect on human health of fluoride in water is observable and measurable.
As I also explained previously, as long as there have been public health interventions, there have been those who are against those interventions. They will take the slightest bit of information about a perceived harm from the intervention, disregard the average response to the intervention, and exaggerate the harms. The best example I can give you is anti-vaccine activism. One flawed and fraudulent study in the 1990s has fueled the campaigns against the measles vaccine for 20 years now. In that time, measles has made a return to places where it was once eliminated, hurting tens of thousands of children and killing thousands as well… All because one fraudulent study claimed that the MMR vaccine caused autism.
Since that 1998 study, public health and academics have failed to find a true causal association between vaccines and autism. In fact, respectable theories based on proper science are converging on the idea that autism is more genetic than environmental, and that any environmental exposures causing autism do not include vaccines. If anything, vaccines prevent diseases and conditions in expectant mothers and their newborns that would cause autism. Rubella, for example, causes physiological deformities and intellectual disability if a fetus becomes infected and is born with Congenital Rubella Syndrome (Rubella is one of the Rs in MMR).
The same has happened to fluoride as has happened with vaccination, though the consequences have not been as acute, in my opinion. When children develop measles, we see them go through a hellish infection and disease for a couple of weeks. One out of a thousand will develop a severe complication, and one out of a thousand will die. That’s pretty acute, if you ask me. On the other hand, if a child goes without fluoride in the water, the effects will probably take years to manifest themselves, and said effects would likely be measured more at the population level than at the individual level.
Or the effects can be mitigated through access to toothpaste or proper dental care, something not everyone in the United States of America in 2019 enjoys. So we need to look at individual and population level characteristics. If a group develops cavities at a higher rate than another group, and both groups have fluoridated water, what is it about one group that increases their risk compared to the other? Diet (e.g. more sugar)? Culture? Access to oral healthcare, or lack thereof?
As epidemiologists, we ask these questions and take measurements to answer them, and we also make sure that we account for all the other factors that could explain what we are seeing. Then we hand off the data to biostatisticians, or we do the work with biostatisticians. Doing this assures us that we are measuring our variables correctly and that all associations we see are not due to chance. Or, if chance had something to do with it, we recognize it and minimize the factors that lead to chance being a factor in and of itself.
Back To The Study
This all leads me back to the Green et al. study, a study whose main finding was that children of mothers who ingested fluoride during pregnancy scored about 4 IQ points lower for each 1 mg of fluoride consumed by the mother. If you’re asking yourself, “Compared to whom?” you are on the right track. There was no comparison group. Women who did not consume tap water or lived outside a water treatment zone were not included, and that’s something I discussed in the previous post. What the authors did was a linear regression based on the data, and not much more.
What is a linear regression? A linear regression takes two sets of data represented as points on a graph. One set of data on this study was the IQ level of children. The other set was the estimated fluoride ingestion by their mothers. Put on a graph, the two sets of data look like a bunch of points, and it might be a little hard to understand what is going on.
To understand the relationship between the two sets of data, statisticians will draw a line through the data in a way that accounts for the points on the graph. That line is then the average of Y (the vertical axis) for each value of X (the horizontal axis). You see these graphs in figure 3, with X being the fluoride consumed by mothers and Y being the IQ measurements for the respective children. That line can go up and to the right, or down and to the right. If there is no statistical association between the two variables (X and Y) then the line will be perfectly horizontal. This is very hard to do manually, but it can be done.
It will take you a long time to do it by hand if you have to account for all sorts of other variables, like age and sex or place of residence, but it can be done. To do it quickly, you program a computer to do it for you, but that’s for a different conversation at a later time. For now, all you need to know is that the line on that graph represents the average IQ value for a child whose mother consumes whatever the fluoride level is on the X axis (the horizontal axis).
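If you are curious what "programming a computer to do it" looks like, here is a minimal sketch of a one-variable least-squares fit. The function name `fit_line` and the five data points are my own inventions for illustration; they are not the study's data, and the authors' actual model also adjusted for covariates, which this sketch does not.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How X and Y vary together, divided by how much X varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up numbers for illustration only; NOT data from the study.
fluoride_mg = [0.2, 0.4, 0.6, 0.8, 1.0]
child_iq = [108, 104, 107, 105, 106]

slope, intercept = fit_line(fluoride_mg, child_iq)
print(f"Fitted line: IQ = {intercept:.1f} + ({slope:.1f} x fluoride in mg)")

# The line gives the average IQ the model expects at a given X value.
expected_iq_at_half_mg = intercept + slope * 0.5
print(f"Expected average IQ at 0.5 mg: {expected_iq_at_half_mg:.1f}")
```

A slope of zero would give the perfectly horizontal line described above; a negative slope gives a line that goes down and to the right.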
For example, in Figure 3A, the graph on the left, you can see that the average IQ of a child for a mother consuming 1.5 mg of fluoride is about 100. You also see that only ONE point is representing that average. That in itself is a huge problem because the sample size is small, and these individual measurements are influencing the model a lot, especially if their value is extreme. Because we’re dealing with averages, any extreme values will have a disproportionate influence on the average value.
Averages and Outliers
Let’s say that you’re trying to figure out the average income of the people in your office. You send out a questionnaire and see the results coming back in real time. There are also only 10 people in your office. First person makes $12 an hour. So your average pay at that moment is $12 per hour. The next person makes $18. Your average is now $15 per hour. Third person, $15, so your average is still $15 per hour… You know where this is going. After 9 responses, the average is $16 per hour. Then your boss responds with $50 per hour. At that moment, your average went from $16 per hour to $19.4 per hour. ($16 times 9, add $50 and divide by 10.)
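The office arithmetic can be checked in a few lines. This is just a sketch; `updated_mean` is a helper name I made up, and it only uses the numbers given in the example above.

```python
def updated_mean(old_mean, old_count, new_value):
    """Average after one more response arrives."""
    return (old_mean * old_count + new_value) / (old_count + 1)


# The first three responses: $12, then $18, then $15 per hour.
after_two = updated_mean(12, 1, 18)
after_three = updated_mean(after_two, 2, 15)
print(after_two, after_three)

# Nine responses averaging $16, then the boss reports $50.
after_boss = updated_mean(16, 9, 50)
print(after_boss)
```

One response out of ten moved the average by $3.40 per hour, even though nobody else's pay changed.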
When biostatisticians see these extreme values popping up, we start to think that the sample is not what you would call “normally distributed.” If that is the case, then a linear regression is not exactly what we want to do. We want to do other statistical analyses and present them along with the linear regressions so that we can account for a sample that has a large proportion of extreme values influencing the average. Is that the case with the Green study? I don’t know. I don’t have access to the full dataset. But you can see that there are some extreme values for fluoride consumption and IQ. A child had an IQ of 150, for example. And a mother consumed about 2.5 milligrams of fluoride per liter of beverage. Municipal water systems aim for 0.7 mg per liter in drinking water, making this 2.5 mg/L really high.
Again, we don’t have access to the data, so we don’t know why this woman was consuming that much. Also, because she and she alone had a child with an IQ below normal (less than 100), she had a huge influence in both the average fluoride consumption and the average IQ. That mother-child pair is what we call an outlier, and it should be looked at closely, through statistical analysis that is not just a linear regression. (We don’t get rid of outliers just because, remember?)
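To see why one extreme pair deserves a close look, here is a sketch that fits the same kind of line with and without a single outlier. Every number below is invented for illustration, including the outlier at 2.5 with an IQ of 85. I don't have the study's data, so this shows the mechanism and says nothing about the size of the effect in the actual paper.

```python
def slope_of(xs, ys):
    """Least-squares slope for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical mother-child pairs; NOT data from the study.
fluoride = [0.2, 0.4, 0.6, 0.8, 1.0]
iq = [108, 104, 107, 105, 106]

# One hypothetical extreme pair: very high fluoride, below-average IQ.
outlier_fluoride, outlier_iq = 2.5, 85

slope_without = slope_of(fluoride, iq)
slope_with = slope_of(fluoride + [outlier_fluoride], iq + [outlier_iq])

print(f"Slope without the outlier: {slope_without:.2f}")
print(f"Slope with the outlier:    {slope_with:.2f}")
```

In this made-up sample, a single point makes the downward slope several times steeper. Refitting without a suspicious point is one simple sensitivity check; it is a way of understanding the outlier, not a license to delete it.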
P Hacking? I Hope Not
The authors also did something that is very interesting. They left covariates (the “other” factors) in their model if their p-value was below 0.20. A p-value tells you, loosely, how likely it is that you would see results like these by chance alone. In this case, they allowed variables to stay in their mathematical model if the model said that there was as much as a 1 in 5 chance that the association being seen is due to chance alone. The usual p-value cutoff for taking out variables is 0.05, and even that might be a little too liberal.
Not only that, but the more variables you have in your model, the more you mess with the overall p-value of your entire model because you’re going to find a statistically significant association (p-value less than 0.05) if you throw enough variables in there. Could this be a case of P Hacking, where researchers allow more variables into the model to get that desired statistical significance? I hope not.
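Here is a rough sketch of why throwing more variables at a problem makes a "significant" result more likely. It assumes every test is independent and that there is no real effect anywhere, which is a simplification of mine and not a description of what the authors did. The function name is also mine.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests
    comes out 'significant' when no real effect exists."""
    return 1 - (1 - alpha) ** num_tests


for num_tests in (1, 5, 14, 20):
    probability = chance_of_false_positive(num_tests)
    print(f"{num_tests:>2} tests: {probability:.0%} chance of at least one false positive")
```

Under those assumptions, by the time you run 14 tests you are more likely than not to see at least one p-value under 0.05 from chance alone.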
When it comes to the analysis itself, they report:
“Regression diagnostics confirmed that there were no collinearity issues in any of the IQ models with MUFSG or fluoride intake (variance inflation factor <2 for all covariates). Residuals from each model had approximately normal distributions, and their Q-Q plots revealed no extreme outliers. Plots of residuals against fitted values did not suggest any assumption violations and there were no substantial influential observations as measured by Cook distance. Including quadratic or natural-log effects of MUFSG or fluoride intake did not significantly improve the regression models. Thus, we present the more easily interpreted estimates from linear regression models.”
That last part is also worrisome because that is all they presented. They didn’t present the results from other models or from their sensitivity analysis. Or, rather, they did, but it was an electronic supplement to their paper. As I don’t want to speculate too much, I’ll just let that part sit. Let’s dive into their results…
The first part of the results section has something interesting but not stunning in it. Girls had significantly higher IQ than boys, but not very impressively so. The average girl’s IQ was 109.6 while the average boy’s IQ was 104.6. That’s 5 IQ points. IQ is pretty much the mental age divided by the physical age of a person, multiplied by 100. (It’s a little more complicated than that, but not a lot.) An IQ of 100 means that a person’s mental age is the same as their physical age. An IQ of 150 means that their mental age is 1.5 times their physical age. Someone with a developmental delay or intellectual disability might have an IQ lower than 100 because their mental age would be less than their physical age. (We can discuss later how some people with developmental delays show higher-than-normal IQ numbers in some categories and not others.)
What these results are saying is that, in this sample, girls had about 5% higher mental age than boys, but, on average, boys had 5% higher mental age than the general population. Again, that is in this sample, and not the whole population. Remember that last point. The next part, “Fluoride Measurements,” shows something that is expected. Women in fluoridated areas had higher measured fluoride levels than those who lived in non-fluoridated areas. (I’m still curious about that outlier. Where did she live? Wouldn’t it knock your socks off if she lived in a non-fluoridated area? We discussed those biases in the last post, by the way.) Next is “Maternal Urinary Fluoride Concentrations and IQ,” where we get into the main finding of this study:
“Adjusting for covariates, a significant interaction (P for interaction = .02) between child sex and MUFSG (B = 6.89; 95% CI, 0.96-12.82) indicated that an increase of 1 mg/L of MUFSG was associated with a 4.49 (95% CI, −8.38 to −0.60; P = .02) lower FSIQ score for boys. An increase from the 10th to 90th percentile of MUFSG was associated with a 3.14 IQ decrement among boys (Table 2; Figure 3). In contrast, MUFSG was not significantly associated with FSIQ score in girls (B = 2.43; 95% CI, −2.51 to 7.36; P = .33).”
Let’s break this down…
The Thing About Confidence Intervals
After taking into account the other factors in the model, for every 1 milligram per liter increase in Maternal Urinary Fluoride adjusted for Specific Gravity (MUFSG), the average IQ score for boys fell by 4.49 points. Based on this sample, the researchers are 95% confident that the true drop in IQ in the population they’re studying is between 0.6 points and 8.38 points. (That’s what the 95% CI, confidence interval, means.) Also, for every 1 mg/L increase, the average IQ score for girls increased by 2.43 points, and they were 95% confident that the true effect of a 1 mg/L increase in fluoride concentration in the population (not this sample, the whole population) of girls is between -2.51 (a decrease) and 7.36 (an increase). It is because of that last 95% CI that they say that fluoride ingestion is not associated with a drop in IQ in girls. In fact, they can’t even say it’s associated with an increase. It might even be a 0 IQ change in girls. In boys, the change is as tiny as 0.6 and as huge as 8.38 IQ points.
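The "does the interval include zero?" check is simple enough to write down. This sketch uses the two confidence intervals quoted from the paper above; the function name `excludes_zero` is my own.

```python
def excludes_zero(lower, upper):
    """True when the whole interval sits on one side of zero."""
    return lower > 0 or upper < 0


# 95% confidence intervals as quoted from the paper above (IQ points per 1 mg/L).
boys_ci = (-8.38, -0.60)
girls_ci = (-2.51, 7.36)

for label, (lower, upper) in [("boys", boys_ci), ("girls", girls_ci)]:
    verdict = "excludes 0" if excludes_zero(lower, upper) else "includes 0"
    width = upper - lower
    print(f"{label}: {lower} to {upper}, {verdict}, width {width:.2f} IQ points")
```

The boys' interval stays below zero, but only barely at its upper end, and both intervals are several IQ points wide.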
Is this conclusive? In my opinion, no. It is not conclusive because that is a huge range for both boys and girls, and the range for girls overlaps 0, meaning that there is a ton of statistical uncertainty here. How can we take care of it? We could increase the sample size. However, with the epidemiological design the authors used here, I’m afraid we’d only increase uncertainty and/or complicate matters more. The whole thing about not including women who did not drink tap water is troubling since we know that certain drinks have higher concentrations of fluoride in them. If they didn’t drink tap water, what are the odds that they drank those higher-fluoride drinks, and what was the effect of that?
The Big Conclusion
Finally, we have “Estimated Fluoride Intake and IQ”:
“A 1-mg increase in fluoride intake was associated with a 3.66 (95% CI, −7.16 to −0.15; P = .04) lower FSIQ score among boys and girls (Table 2; Figure 3). The interaction between child sex and fluoride intake was not statistically significant (B = 1.17; 95% CI, −4.08 to 6.41; P for interaction = .66).”
Again, look at those confidence intervals, especially the boys’. Look at the p-value for boys at 0.04. That’s borderline statistically significant. It means that there is a 4% probability that the value they saw (a 3.66-point IQ drop for every increase of 1 mg of fluoride) is due to chance alone and not any of the factors they saw. Yeah, it’s below 0.05, but that P Hacking we probably saw in keeping covariates with p-values below 0.20 and not 0.05 could have had something to do with it. I really wish they had not done that. Now, look at those confidence intervals between boys and girls in Table 2. They overlap.
This means that there is a better than 5% probability that the IQ measurements in boys and girls are not different from each other. If there is no statistically significant difference between boys and girls, and girls don’t show a decrease in IQ, then what are the chances that boys also did not decrease in IQ in the population (not just this sample)? You see this same overlap in Figure 3A. The discussion section is standard jargon that researchers include in their papers where they basically acknowledge that their study is based on a limited sample of the entire population and that more research is needed. (That’s the standard line for asking for more cash to do another study.) They also state clearly: “Nonetheless, despite our comprehensive array of covariates included, this observational study design could not address the possibility of other unmeasured residual confounding.”
Then, in the conclusions, the bait and switch:
“In this prospective birth cohort study from 6 cities in Canada, higher levels of fluoride exposure during pregnancy were associated with lower IQ scores in children measured at age 3 to 4 years. These findings were observed at fluoride levels typically found in white North American women. This indicates the possible need to reduce fluoride intake during pregnancy.”
I call it a bait and switch because we’ve been baited into thinking that fluoride intake lowers IQ, but you can see that the probability of the observations being by chance is actually quite high. Furthermore, we see the message switched to “lower IQ scores in children” when the results show that it wasn’t all children. It was boys, and that change in boys was not different from the change in girls who, as a group, showed an increase in IQ as their mothers reported higher consumption of fluoride.
The Big Idea
The big idea of these three blog posts was to point out to you that this study is just the latest study that tries very hard to tie a bad outcome (lower IQ) to fluoride, but it really failed to make that case from the epidemiological and biostatistical approaches that the researchers took, at least in my opinion. Groups were left out that shouldn’t have been. Outliers were left in without understanding them better. A child with an IQ of 150 was left in, along with one mother-child pair with a below-normal IQ and very high fluoride, pulling the averages in their respective directions. The statistical approach was a linear regression that lumped in all of the variables instead of accounting for different levels of those variables in the study group. (A multi-level analysis that allowed for the understanding of the effects of society and environment along with the individual factors would have been great. The lack of normality in the distribution of outcome and exposure variables hints at a different analysis, too.)
Also, we shouldn’t base the entirety of our understanding of fluoride in water to prevent caries solely on these epidemiological studies. We need to understand the nature of fluorine, what levels of it are toxic, and how fluoride is not fluorine in the way that table salt is not chlorine. We need to look at the entire population of places that have had fluoride in their water systems for decades. Are they collapsed civilizations with people of low IQ? No, not really. Though some may make the “snowflake” argument that fluoride could affect someone differently based on their individual physiology, the truth is that we’re not that different from each other. We have the same machinery to deal with chemicals and compounds in our environment.
Finally, I hope that public health policy is not based on this paper. It would be a terrible way to do public health policy. Scientific discovery and established scientific facts are reproducible and verifiable, and they are based on better study designs and stronger statistical outcomes than this. Unfortunately, as we have seen with the MMR vaccine and other pseudoscience, all that denialism needs to seed itself in a group or individual is something to agree with preconceived and erroneous notions, no matter how flawed that something is. Thank you for your time.
René F. Najera, DrPH
I'm a Doctor of Public Health, having studied at the Johns Hopkins University Bloomberg School of Public Health.
All opinions are my own and in no way represent anyone else or any of the organizations for which I work.
About History of Vaccines: I am the editor of the History of Vaccines site, a project of the College of Physicians of Philadelphia. Please read the About page on the site for more information.
About Epidemiological: I am the sole contributor to Epidemiological, my personal blog to discuss all sorts of issues. It also has an About page you should check out.