In the last blog post, I told you all about how fluoride came to be added to potable water systems in the United States and elsewhere, based on the observations of one Dr. Frederick McKay. Dr. McKay went to a town in Colorado and noticed that many of the townspeople had stained teeth. Although their teeth were stained, they were remarkably resistant to tooth decay. The ultimate reason turned out to be fluoride (the ion of the element fluorine, present in the water as dissolved fluoride compounds) in the water supply. How McKay and his colleagues reached that conclusion is a good example of an epidemiological study:
“Black investigated fluorosis for six years, until his death in 1915. During that period, he and McKay made two crucial discoveries. First, they showed that mottled enamel (as Black referred to the condition) resulted from developmental imperfections in children’s teeth. This finding meant that city residents whose permanent teeth had calcified without developing the stains did not risk having their teeth turn brown; young children waiting for their secondary set of teeth to erupt, however, were at high risk. Second, they found that teeth afflicted by Colorado Brown Stain were surprisingly and inexplicably resistant to decay…
The water-causation theory got a gigantic boost in 1923. That year, McKay trekked across the Rocky Mountains to Oakley, Idaho to meet with parents who had noticed peculiar brown stains on their children’s teeth. The parents told McKay that the stains began appearing shortly after Oakley constructed a communal water pipeline to a warm spring five miles away. McKay analyzed the water, but found nothing suspicious in it. Nonetheless, he advised town leaders to abandon the pipeline altogether and use another nearby spring as a water source.
…The two discovered something very interesting: namely, the mottled enamel disorder was prevalent among the children of Bauxite, but nonexistent in another town only five miles away. Again, McKay analyzed the Bauxite water supply. Again, the analysis provided no clues. But the researchers’ work was not done in vain.
McKay and Kempf published a report on their findings that reached the desk of ALCOA’s chief chemist, H. V. Churchill, at company headquarters in Pennsylvania. Churchill, who had spent the past few years refuting claims that aluminum cookware was poisonous, worried that this report might provide fresh fodder for ALCOA’s detractors. Thus, he decided to conduct his own test of the water in Bauxite, but this time using photospectrographic analysis, a more sophisticated technology than that used by McKay. Churchill asked an assistant to assay the Bauxite water sample. After several days, the assistant reported a surprising piece of news: the town’s water had high levels of fluoride. Churchill was incredulous. “Whoever heard of fluorides in water,” he bellowed at his assistant. “You have contaminated the sample. Rush another specimen.”
Shortly thereafter, a new specimen arrived in the laboratory. Churchill’s assistant conducted another assay on the Bauxite water. The result? Photospectrographic analysis, again, showed that the town’s water had high levels of fluoride tainting it. This second and selfsame finding prompted Churchill to sit down at his typewriter in January, 1931, and compose a five-page letter to McKay on this new revelation. In the letter, he advised McKay to collect water samples from other towns “where the peculiar dental trouble has been experienced… We trust that we have awakened your interest in this subject and that we may cooperate in an attempt to discover what part ‘fluorine’ may play in the matter.”
McKay collected the samples. And, within months, he had the answer and denouement to his 30-year quest: high levels of water-borne fluoride indeed caused the discoloration of tooth enamel.”
As with all science, these findings had to be replicated, studied again, and discussed in the scientific community. It wouldn’t be until the 1940s that a more controlled experiment would take place in Michigan:
“This finding sent Dean’s thoughts spiraling in a new direction. He recalled from reading McKay’s and Black’s studies on fluorosis that mottled tooth enamel is unusually resistant to decay. Dean wondered whether adding fluoride to drinking water at physically and cosmetically safe levels would help fight tooth decay. This hypothesis, Dean told his colleagues, would need to be tested.
In 1944, Dean got his wish. That year, the City Commission of Grand Rapids, Michigan, after numerous discussions with researchers from the PHS, the Michigan Department of Health, and other public health organizations, voted to add fluoride to its public water supply the following year. In 1945, Grand Rapids became the first city in the world to fluoridate its drinking water.
The Grand Rapids water fluoridation study was originally sponsored by the U.S. Surgeon General, but was taken over by the NIDR shortly after the Institute’s inception in 1948. During the 15-year project, researchers monitored the rate of tooth decay among Grand Rapids’ almost 30,000 schoolchildren. After just 11 years, Dean, who was now director of the NIDR, announced an amazing finding. The caries rate among Grand Rapids children born after fluoride was added to the water supply dropped more than 60 percent. This finding, considering the thousands of participants in the study, amounted to a giant scientific breakthrough that promised to revolutionize dental care, making tooth decay for the first time in history a preventable disease for most people.”
The study of fluoride and its effects on human health did not stop there, of course. Many other places that adopted water fluoridation were studied as well, and the findings were consistent: low, non-toxic levels of fluoride in the water significantly reduced the population’s risk of cavities. As with all science, there was discussion about what is toxic and what is not. More recently, the recommended level of fluoride in U.S. drinking water was lowered a little (in 2015, the U.S. Public Health Service set it at 0.7 mg/L) because we’re trying to hit the sweet spot between lowering the risk of cavities and preventing excess fluoride consumption.
But, alas, anti-fluoridation activists have wiggled their way into the discussion and managed to squeeze in some junk science. By “junk” I mean studies that are poorly designed, poorly analyzed, or whose conclusions are exaggerated or misinformed. As with anti-vaccine “studies,” these groups and individuals take one observation from one or two studies and run with it.
Recently, a study titled “Association Between Maternal Fluoride Exposure During Pregnancy and IQ Scores in Offspring in Canada” was published in JAMA Pediatrics, a journal of the American Medical Association. Rivka Green, the study’s lead author, has been making the rounds online and in the media talking about it. She keeps repeating the study’s main finding: a 1-mg increase in fluoride intake by mothers is associated with a 4-point drop in IQ in boys, but no drop in IQ in girls. (Though I don’t really see her emphasize that difference.)
In this, the second part of the series, I will explain how the study was designed from an epidemiological point of view. In the third part, I will explain the biostatistical analysis and why, no, the study doesn’t show that boys are going to have lower IQs if their mothers consume fluoride. (Shocking, I know.)
So what do I mean by epidemiological? When we epidemiologists study the distribution and causes (or risk factors) of a disease or condition in a population, we make sure that our observations are as accurate as possible, that our results are grounded in reality and scientific plausibility, and that the chance our observations are just a fluke is minimal.
To do this, we make sure that the groups we are comparing are alike in as many factors as possible except for the risks and exposures we are analyzing. For example, if we are investigating an outbreak among people who got sick after eating at a wedding, we want the cases (the people who got sick) and the controls (the people who didn’t) to be as similar to each other as possible. We’re talking similar ages, genders, backgrounds. Why? Well, if we theorize that the beef brisket made people sick, we wouldn’t compare meat eaters with vegetarians. If we suspected the margaritas, we wouldn’t compare children with adults.
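To make that comparison concrete, here is a minimal Python sketch of the number a case-control outbreak investigation usually produces: the odds ratio. The guest counts are hypothetical, made up for illustration, and `odds_ratio` is just a helper name I chose, not something from a particular package.

```python
from typing import Optional


def odds_ratio(exposed_cases: int, exposed_controls: int,
               unexposed_cases: int, unexposed_controls: int) -> Optional[float]:
    """Odds ratio from a 2x2 case-control table.

    OR = (odds of exposure among cases) / (odds of exposure among controls)
       = (exposed_cases / unexposed_cases) / (exposed_controls / unexposed_controls)

    Returns None when a zero cell means the ratio cannot be estimated.
    """
    denominator = exposed_controls * unexposed_cases
    if denominator == 0:
        return None
    return (exposed_cases * unexposed_controls) / denominator


# Hypothetical wedding: 30 guests got sick (cases), 40 did not (controls).
# 24 of the cases and 12 of the controls ate the brisket.
brisket = odds_ratio(exposed_cases=24, exposed_controls=12,
                     unexposed_cases=6, unexposed_controls=28)
print(f"Brisket odds ratio: {brisket:.2f}")  # → Brisket odds ratio: 9.33

# If every control were a vegetarian, no control could have eaten brisket,
# so there is no usable comparison and the ratio cannot be estimated.
print(odds_ratio(exposed_cases=24, exposed_controls=0,
                 unexposed_cases=6, unexposed_controls=40))  # → None
```

In this made-up example, the sick guests had about nine times the odds of having eaten the brisket. The vegetarian case shows the comparison breaking down: when the control group could never have had the exposure, the table can’t tell you anything about the brisket.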
Of course, it is not always possible to compare people who are alike in every way except for their exposure to a risk factor. This is why we have biostatistical analyses that account for the many ways in which people differ and for the chance that those differences are influencing what we see. Again, we’ll discuss this in the third part.
The Green et al. study took 2,001 women from different parts of Canada who were enrolled in the Maternal-Infant Research on Environmental Chemicals (MIREC) cohort. The MIREC cohort takes samples from the participants and their environment and makes the results of those laboratory analyses available to researchers for studying. As the authors write: “Women who could communicate in English or French, were older than 18 years, and were within the first 14 weeks of pregnancy were recruited from prenatal clinics. Participants were not recruited if there was a known fetal abnormality, if they had any medical complications, or if there was illicit drug use during pregnancy.”
This is good. If you’re going to look at outcomes in children, you want to get rid of as many confounders as possible before comparing exposures. That way, whatever differences we see between exposure groups are less likely to be explained by things we already know cause developmental delays or intellectual disability in children. So no problems so far.
The children were recruited into the study thus: “A subset of 610 children in the MIREC Study was evaluated for the developmental phase of the study at ages 3 to 4 years; these children were recruited from 6 of 10 cities included in the original cohort: Vancouver, Montreal, Kingston, Toronto, Hamilton, and Halifax. Owing to budgetary restraints, recruitment was restricted to the 6 cities with the most participants who fell into the age range required for the testing during the data collection period.”
This is okay, but recruiting from only some of the cities can open the study up to bias, because fluoridation may differ between those cities (and even within them, as some households might get water from both fluoridated and unfluoridated sources).
Figure 1 from the study shows how the final study group was defined. What troubles me about how they defined their final group is that a lot of mother-child pairs were excluded, and for reasons we need to understand if we are to grasp the true extent of fluoride exposure and its effects on children’s intellectual ability. For example, some were excluded because they did not drink tap water or lived outside a water treatment zone. Wouldn’t you want to know whether not drinking tap water or living outside a water treatment zone led to children with normal-to-high IQs compared to the others?
This raised flags with me because I don’t exclude someone from an outbreak investigation if they don’t have a desired exposure. In fact, I want to know if someone who is not exposed to something is less likely to develop the disease or have the condition I’m studying. It would be like saying that I don’t want women who live in air-conditioned apartments in a city included in a study on Zika because they are not likely to have been exposed to mosquitoes like women living in huts in the jungle.
In the end, they had 369 mother-child pairs with mean urine fluoride (MUF) measurements, IQ measurements and water fluoride data and 400 mother-child pairs with fluoride intake and IQ measurements. But that’s 769 pairs when 610 children were originally considered? Yes, there is some overlap between the two groups. No big deal if they do their biostats right. (Spoiler alert for Part Three: They didn’t.)
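If you want to check the arithmetic on that overlap yourself, inclusion-exclusion gives a floor. This is a small sketch that assumes both analytic groups were drawn from the same 610 children; `minimum_overlap` is a hypothetical helper, not something from the paper.

```python
def minimum_overlap(group_a: int, group_b: int, pool: int) -> int:
    """Smallest number of people who must belong to both groups.

    Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|, and |A or B|
    cannot exceed the pool both groups came from, so
    |A and B| >= |A| + |B| - pool (and never less than zero).
    """
    return max(0, group_a + group_b - pool)


# Assumes both analytic groups came from the 610 children evaluated.
print(minimum_overlap(369, 400, 610))  # → 159
```

So, under that assumption, at least 159 mother-child pairs appear in both analyses, which means the two groups are not independent samples.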
They then used data on mean urine fluoride concentrations from spot (one-time) urine samples taken at different points in the mothers’ pregnancies, and they only included women who had been tested throughout (i.e., didn’t miss a test). The problem with this is that the gold standard for knowing how much fluoride someone is exposed to, when testing urine, is a 24-hour urine collection. In that test, a person collects all of their urine for 24 hours, and then we measure the fluoride (or many other chemicals) in that pooled sample. This is because urine concentrations of chemicals vary throughout the day. If you drink a lot of fluoridated water in the morning, your urine is likely to have higher concentrations shortly afterward than in the evening, after you’ve been drinking bottled water without fluoride. Or, if you worked out in the morning and drank energy drinks but stuck to tap water in the evening, your urine fluoride will differ depending on when the sample is taken.
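Here is a toy illustration of why the timing of a spot sample matters. The voids, volumes, and concentrations below are invented for the example (they are not MIREC data), and the pooled value is simply total fluoride divided by total urine volume, which is what a 24-hour collection captures.

```python
# Hypothetical urine voids over one day: (time, volume in mL, fluoride in mg/L).
# Invented numbers for illustration only; these are not MIREC data.
VOIDS = [
    ("07:00", 400, 0.9),  # after plenty of fluoridated tap water in the morning
    ("11:00", 300, 0.7),
    ("15:00", 250, 0.5),
    ("19:00", 300, 0.3),  # after an afternoon of unfluoridated bottled water
    ("23:00", 350, 0.3),
]


def pooled_24h_concentration(voids):
    """Concentration of a 24-hour collection: total fluoride / total volume."""
    total_fluoride_mg = sum(volume_ml / 1000 * conc for _, volume_ml, conc in voids)
    total_volume_l = sum(volume_ml for _, volume_ml, _ in voids) / 1000
    return total_fluoride_mg / total_volume_l


pooled = pooled_24h_concentration(VOIDS)
print(f"24-hour collection: {pooled:.3f} mg/L")
for time, _, conc in VOIDS:
    # A spot sample is whatever that single void happened to contain.
    print(f"Spot sample at {time}: {conc:.1f} mg/L ({conc - pooled:+.3f} vs. 24-hour)")
```

In this made-up day, a single spot sample reads anywhere from 0.3 to 0.9 mg/L depending on when it was taken, while the whole-day value is about 0.56 mg/L.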
Yes, the researchers were using data already collected, so they were limited to the data they had. But this gets lost in the message they’re putting out about their study, although they do acknowledge in the discussion section that the study has some limitations, without going into a broader discussion of them.
From the discussion section:
“Nonetheless, despite our comprehensive array of covariates included, this observational study design could not address the possibility of other unmeasured residual confounding. Fourth, fluoride intake did not measure actual fluoride concentration in tap water in the participant’s home; Toronto, for example, has overlapping water treatment plants servicing the same household. Similarly, our fluoride intake estimate only considered fluoride from beverages; it did not include fluoride from other sources such as dental products or food. Furthermore, fluoride intake data were limited by self-report of mothers’ recall of beverage consumption per day, which was sampled at 2 points of pregnancy, and we lacked information regarding specific tea brand.17,18 In addition, our methods of estimating maternal fluoride intake have not been validated; however, we show construct validity with MUF. Fifth, this study did not include assessment of postnatal fluoride exposure or consumption.”
As I’ll show you in the third part, there is a lot of residual confounding in this study. What is residual confounding, you ask? It’s the confounding that remains after adjustment: all of the other stuff your mathematical model fails to account for, whether because it wasn’t measured or because it was measured poorly. But, again, we’ll talk about it in the next post.
Next, they also assessed the mothers’ daily fluoride intake through a survey. The survey was, as the authors point out, not validated, which means it is hard to know whether it really measures what it is supposed to measure. Still, they used it, and it leaves the study wide open to recall bias, something you want to minimize as much as possible. They could have minimized it with a more valid survey or a prospective design.
First, what is a prospective design? Well, this is when you take a group of women and sign them up for the study, then you carefully measure their fluoride intake with more validated laboratory assays and questionnaires, and then you follow their children and measure their IQ periodically. You don’t do it all retrospectively with already collected data. But, sometimes, what you have is what you have.
Next, what is recall bias? Recall bias is an interesting phenomenon we see when we rely on people telling us their story in order to ascertain exposures and outcomes. We epidemiologists have noticed that people who have bad outcomes are more likely to remember significant exposures. For example, parents of children with birth defects are more likely to remember things like exposures to chemicals or a history of disease in the family, while parents of typical children don’t recall similar exposures as often because, well, they aren’t looking to connect any dots.
(You see this all the time in anti-vaccine circles, where parents of autistic children are more likely to recall bad reactions to vaccines in their children.)
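Here is a back-of-the-envelope sketch of how recall bias can manufacture an association out of nothing. It assumes, hypothetically, that cases and controls had exactly the same true exposure rate (30%), that cases remember 90% of their real exposures while controls remember only 60%, and that nobody “remembers” an exposure they never had. The function name is my own.

```python
def observed_odds_ratio(true_exposure_rate: float,
                        recall_rate_cases: float,
                        recall_rate_controls: float) -> float:
    """Odds ratio computed from *recalled* exposures.

    Assumes cases and controls share the same true exposure rate, that people
    forget some real exposures, and that nobody recalls an exposure that
    never happened.
    """
    reported_cases = true_exposure_rate * recall_rate_cases
    reported_controls = true_exposure_rate * recall_rate_controls
    odds_cases = reported_cases / (1 - reported_cases)
    odds_controls = reported_controls / (1 - reported_controls)
    return odds_cases / odds_controls


# Hypothetical: 30% truly exposed in both groups, so the true odds ratio is 1.0.
print(f"{observed_odds_ratio(0.30, 0.90, 0.60):.2f}")  # → 1.68
print(f"{observed_odds_ratio(0.30, 0.90, 0.90):.2f}")  # → 1.00
```

Even though there is no real difference by construction, the recalled data suggest that cases had about 1.7 times the odds of exposure. When both groups remember equally well, the spurious association disappears.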
Not only did they use the intake survey, but the authors also did something very interesting. They multiplied the intake of certain drinks by some factors in order to estimate fluoride intake:
“To estimate fluoride intake from tap water consumed per day (milligrams per day), we multiplied each woman’s consumption of water and beverages by her water fluoride concentration (averaged across pregnancy) and multiplied by 0.2 (fluoride content for a 200-mL cup). Because black tea contains a high fluoride content (2.6 mg/L), we also estimated the amount of fluoride consumed from black tea by multiplying each cup of black tea by 0.52 mg (mean fluoride content in a 200-mL cup of black tea made with deionized water) and added this to the fluoride intake variable. Green tea also contains varying levels of fluoride; therefore, we used the mean for the green teas listed by the US Department of Agriculture (1.935 mg/L). We multiplied each cup of green tea by 0.387 mg (fluoride content in a 200-mL cup of green tea made with deionized water) and added this to the fluoride intake variable.”
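As I read that passage, the intake estimate boils down to a few multiplications. Here is a sketch of it in Python; the function name and the example inputs are mine (hypothetical), while the constants come straight from the quote. Note what it leaves out: cooking water, food, and dental products never enter the calculation.

```python
# Constants taken from the quoted Methods passage.
CUP_LITERS = 0.2               # one 200-mL cup
BLACK_TEA_MG_PER_CUP = 0.52    # 2.6 mg/L x 0.2 L
GREEN_TEA_MG_PER_CUP = 0.387   # 1.935 mg/L x 0.2 L (USDA mean for green teas)


def estimate_daily_fluoride_mg(cups_water_and_beverages: float,
                               water_fluoride_mg_per_l: float,
                               cups_black_tea: float = 0.0,
                               cups_green_tea: float = 0.0) -> float:
    """Hypothetical re-creation of the study's beverage-only intake estimate.

    Only beverages count: fluoride from food, cooking water, and dental
    products is not part of this calculation.
    """
    from_tap = cups_water_and_beverages * water_fluoride_mg_per_l * CUP_LITERS
    from_black_tea = cups_black_tea * BLACK_TEA_MG_PER_CUP
    from_green_tea = cups_green_tea * GREEN_TEA_MG_PER_CUP
    return from_tap + from_black_tea + from_green_tea


if __name__ == "__main__":
    intake = estimate_daily_fluoride_mg(5, 0.6, cups_black_tea=1)
    print(f"{intake:.2f} mg/day")  # → 1.12 mg/day
```

For a hypothetical woman drinking five cups of tap water and beverages a day at 0.6 mg/L, plus one cup of black tea, that works out to 1.12 mg/day, and none of the fluoride from her cooking water or toothpaste is in that number.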
This complicates things because, as you saw above, they excluded women who were not in places where the water was being treated and women who didn’t consume tap water. But, come on, have you ever met someone who never consumed tap water? Don’t we use tap water to cook food all the time? What about the fluoride from that? And why count fluoride only in beverages and not in, say, that delicious Canadian cheese soup I’ve heard good things about?
Finally, they used a standardized test to measure the children’s IQ. I’ve asked some friends of mine who are experts in childhood development, and they are skeptical that IQ can be measured accurately in young children, because children develop at different rates depending on a variety of variables. You may have seen this in a classroom or at a school play: children are on a big spectrum of development, and milestones are really more like averages. But, okay, we’ll take this IQ test for what it is.
Speaking of other factors, the paper lists the covariates the authors adjusted for. But I’ll hold off on discussing those until the third post, when we get into the biostatistics. For now, there are a few key things you need to take with you in closing.
The sample used in this study is not at all representative of all mothers and their children in Canada, not even close. As we saw in the paper, many women were left out of the study for a variety of reasons, and mother-child pairs were also excluded. I want to believe that there were good reasons for this, but I could not find them in the paper. The authors do mention that they wanted to look only at mothers consuming fluoride, but why not include those who were not expected to consume fluoride, or who outright did not, in order to really compare two populations of interest?
Finally, the authors mention other studies, some with rats and others purely environmental, that found some association between fluoride intake and lowered IQ or other harm to neurodevelopment. The thing is, public health agencies around the world have been looking at these claims and not finding them to be true in their populations. It would appear that what happens in Petri dishes doesn’t happen so easily in complex biological systems, and that studies looking at whole populations without knowing all the variables behind the observed outcomes are not as good as proper epidemiological studies.
I’m sure you’re shocked by this. And you won’t be shocked to hear that there is some exaggeration of studies’ results: