The Hijacking of Fluorine 18.998, Part Two

NOTE: This is the second of a three-part series on fluoride in drinking water and a recent study about it. You can read part 1 here. You can read part 3 here.

In the last blog post, I told you how fluoride came to be added to potable water systems in the United States and elsewhere, based on the observations of one Dr. McKay. Dr. McKay went to a town in Colorado and noticed that the townspeople tended to have stained teeth, yet those stained teeth were remarkably resistant to decay. The ultimate reason for this was fluoride (an ion of the element fluorine) in the water supply. How they reached this conclusion is one example of an epidemiological study:

“Black investigated fluorosis for six years, until his death in 1915. During that period, he and McKay made two crucial discoveries. First, they showed that mottled enamel (as Black referred to the condition) resulted from developmental imperfections in children’s teeth. This finding meant that city residents whose permanent teeth had calcified without developing the stains did not risk having their teeth turn brown; young children waiting for their secondary set of teeth to erupt, however, were at high risk. Second, they found that teeth afflicted by Colorado Brown Stain were surprisingly and inexplicably resistant to decay…

The water-causation theory got a gigantic boost in 1923. That year, McKay trekked across the Rocky Mountains to Oakley, Idaho to meet with parents who had noticed peculiar brown stains on their children’s teeth. The parents told McKay that the stains began appearing shortly after Oakley constructed a communal water pipeline to a warm spring five miles away. McKay analyzed the water, but found nothing suspicious in it. Nonetheless, he advised town leaders to abandon the pipeline altogether and use another nearby spring as a water source.

…The two discovered something very interesting: namely, the mottled enamel disorder was prevalent among the children of Bauxite, but nonexistent in another town only five miles away. Again, McKay analyzed the Bauxite water supply. Again, the analysis provided no clues. But the researchers’ work was not done in vain.

McKay and Kempf published a report on their findings that reached the desk of ALCOA’s chief chemist, H. V. Churchill, at company headquarters in Pennsylvania. Churchill, who had spent the past few years refuting claims that aluminum cookware was poisonous, worried that this report might provide fresh fodder for ALCOA’s detractors. Thus, he decided to conduct his own test of the water in Bauxite, but this time using photospectrographic analysis, a more sophisticated technology than that used by McKay. Churchill asked an assistant to assay the Bauxite water sample. After several days, the assistant reported a surprising piece of news: the town’s water had high levels of fluoride. Churchill was incredulous. “Whoever heard of fluorides in water,” he bellowed at his assistant. “You have contaminated the sample. Rush another specimen.”

Shortly thereafter, a new specimen arrived in the laboratory. Churchill’s assistant conducted another assay on the Bauxite water. The result? Photospectrographic analysis, again, showed that the town’s water had high levels of fluoride tainting it. This second and selfsame finding prompted Churchill to sit down at his typewriter in January, 1931, and compose a five-page letter to McKay on this new revelation. In the letter, he advised McKay to collect water samples from other towns “where the peculiar dental trouble has been experienced… We trust that we have awakened your interest in this subject and that we may cooperate in an attempt to discover what part ‘fluorine’ may play in the matter.”

McKay collected the samples. And, within months, he had the answer and denouement to his 30-year quest: high levels of water-borne fluoride indeed caused the discoloration of tooth enamel.”

As with all science, these findings had to be replicated, studied again, and discussed in the scientific community. It wouldn’t be until the 1940s that a more controlled experiment would take place in Michigan:

“This finding sent Dean’s thoughts spiraling in a new direction. He recalled from reading McKay’s and Black’s studies on fluorosis that mottled tooth enamel is unusually resistant to decay. Dean wondered whether adding fluoride to drinking water at physically and cosmetically safe levels would help fight tooth decay. This hypothesis, Dean told his colleagues, would need to be tested.

In 1944, Dean got his wish. That year, the City Commission of Grand Rapids, Michigan, after numerous discussions with researchers from the PHS, the Michigan Department of Health, and other public health organizations, voted to add fluoride to its public water supply the following year. In 1945, Grand Rapids became the first city in the world to fluoridate its drinking water.

The Grand Rapids water fluoridation study was originally sponsored by the U.S. Surgeon General, but was taken over by the NIDR shortly after the Institute’s inception in 1948. During the 15-year project, researchers monitored the rate of tooth decay among Grand Rapids’ almost 30,000 schoolchildren. After just 11 years, Dean, who was now director of the NIDR, announced an amazing finding. The caries rate among Grand Rapids children born after fluoride was added to the water supply dropped more than 60 percent. This finding, considering the thousands of participants in the study, amounted to a giant scientific breakthrough that promised to revolutionize dental care, making tooth decay for the first time in history a preventable disease for most people.”

The study of fluoride and its effects on human health did not stop there, of course. Many other studies took place in other locales that adopted water fluoridation. The findings were all the same: low, non-toxic levels of fluoride in the water significantly reduced the population risk of dental cavities. As with all science, there was discussion of what is toxic and what is not. Recently, the recommended level of fluoride in drinking water in the United States was lowered slightly, to hit the sweet spot between reducing the risk of cavities and preventing excess fluoride consumption.

But, alas, the anti-fluoridation activists wiggled their way into the discussion and have managed to squeeze in some junk science. By “junk” I mean studies that are poorly designed, poorly analyzed, or whose conclusions are exaggerated or misinformed. As with anti-vaccine “studies,” these groups and individuals take one observation from a study or two and run with it.

Recently, a study was published in the Journal of the American Medical Association (JAMA) titled “Association Between Maternal Fluoride Exposure During Pregnancy and IQ Scores in Offspring in Canada.” Rivka Green, the primary author of the study, has been making the rounds online and in the media talking about it. She keeps repeating the study’s main finding: a 1-mg increase in fluoride intake by mothers is associated with a 4-point drop in IQ in boys, but no drop in IQ in girls. (Though I don’t really see her emphasize that difference.)

In this second part of this series, I will explain to you how the study was designed from an epidemiological point of view. In the third part, I will explain the biostatistical analysis and why, no, the study doesn’t show that boys are going to have lower IQs if their mothers consume fluoride. (Shocking, I know.)

So what do I mean by epidemiological? When we epidemiologists study the distribution and causes of (or risks for) a disease or condition in a population, we make sure that our observations are as accurate as possible, that our results are grounded in reality and scientific plausibility, and that the chance that our observations were just a fluke of how the universe works is minimal.

To do this, we make sure that the groups we are comparing are comparable in as many factors as possible except for the risks and exposures we are analyzing. For example, if we are investigating an outbreak of illness among people who ate at a wedding, we want to make sure that the cases — the people sick — and the controls — the people not sick — are as similar to each other as possible. We’re talking similar ages, genders, backgrounds. Why? Well, if we theorize that it was the beef brisket that caused people to be sick, we wouldn’t compare meat-eating people with vegetarians. If we suspected that it was the margaritas, we wouldn’t compare children with adults.
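To make the comparison concrete, here is a minimal sketch, with invented counts, of the odds ratio calculation an investigator would run on the brisket hypothesis:

```python
# Invented counts for the wedding outbreak: exposure = ate the brisket,
# outcome = got sick.
ate_sick, ate_well = 30, 10        # ate brisket: 30 sick, 10 well
notate_sick, notate_well = 5, 35   # skipped it:   5 sick, 35 well

# Odds ratio: odds of illness among the exposed over odds among the
# unexposed. A ratio far above 1 points the finger at the brisket.
odds_ratio = (ate_sick / ate_well) / (notate_sick / notate_well)
print(round(odds_ratio, 2))  # 21.0
```

With numbers like these, the sick wedding guests have 21 times the odds of having eaten the brisket, which is the kind of signal an outbreak team then tries to confirm.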

Of course, it is not always possible to compare people who are alike in every way except for their exposure to a risk factor. This is why we have biostatistical analyses that take into consideration all the different ways in which people can be different and the different chances that their differences could be influencing what we are seeing. Again, we’ll discuss this in the third part.

The Green et al. study took 2,001 women from different parts of Canada who were enrolled in the Maternal-Infant Research on Environmental Chemicals (MIREC) cohort. The MIREC cohort takes samples from the participants and their environment and makes the results of those laboratory analyses available to researchers for study. As the authors write: “Women who could communicate in English or French, were older than 18 years, and were within the first 14 weeks of pregnancy were recruited from prenatal clinics. Participants were not recruited if there was a known fetal abnormality, if they had any medical complications, or if there was illicit drug use during pregnancy.”

This is good. If you’re going to look at outcomes in children, you want to get rid of as many confounders as possible when you’re comparing exposures. This way, the few things that explain the observed differences between exposure groups are not things that we know cause developmental delays or intellectual disability in children. So no problems so far.

The children were recruited into the study thus: “A subset of 610 children in the MIREC Study was evaluated for the developmental phase of the study at ages 3 to 4 years; these children were recruited from 6 of 10 cities included in the original cohort: Vancouver, Montreal, Kingston, Toronto, Hamilton, and Halifax. Owing to budgetary restraints, recruitment was restricted to the 6 cities with the most participants who fell into the age range required for the testing during the data collection period.”

This is okay, though recruiting from only some of the cities can open the study up to bias, because fluoridation may differ between the cities (and even within a city, as some households might get water from both fluoridated and unfluoridated sources).

Figure 1 from the study shows how the final study group was defined. What troubles me about how they defined their final group is that a lot of mother-child pairs were excluded, and they were excluded for reasons we need to understand if we are to understand the true extent of fluoride exposure and its effects on children’s intellectual ability. For example, some were excluded because they did not drink tap water or lived outside a water treatment zone. Wouldn’t you want to know whether not drinking tap water or living outside a water treatment zone led to children with normal-to-high IQs compared to the others?

This raised flags with me because I don’t exclude someone from an outbreak investigation if they don’t have a desired exposure. In fact, I want to know if someone who is not exposed to something is less likely to develop the disease or have the condition I’m studying. It would be like saying that I don’t want women who live in air-conditioned apartments in a city included in a study on Zika because they are not likely to have been exposed to mosquitoes like women living in huts in the jungle.

In the end, they had 369 mother-child pairs with mean urine fluoride (MUF) measurements, IQ measurements and water fluoride data and 400 mother-child pairs with fluoride intake and IQ measurements. But that’s 769 pairs when 610 children were originally considered? Yes, there is some overlap between the two groups. No big deal if they do their biostats right. (Spoiler alert for Part Three: They didn’t.)
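A quick sanity check on those counts: since both analysis groups are drawn from the same 610 children, simple arithmetic gives the minimum number of pairs the two groups must share.

```python
# The paper reports 369 pairs with urinary fluoride data and 400 pairs
# with intake data, all drawn from the same pool of 610 children, so the
# two groups must overlap by at least this many pairs:
muf_pairs, intake_pairs, children = 369, 400, 610
min_overlap = muf_pairs + intake_pairs - children
print(min_overlap)  # 159
```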

They then used data on mean urine fluoride concentrations from spot (one-time) urine samples taken at different points in the mothers’ pregnancies, and they only accepted those who had been tested throughout (i.e. didn’t miss a test). The problem with this is that the gold standard for knowing how much fluoride someone is exposed to — by testing their urine — is a 24-hour collection. In that test, the person collects all of their urine over 24 hours, and then the fluoride (or many other chemicals) is measured in that pooled sample. This is done because urine concentrations of chemicals vary throughout the day. If you drink a lot of fluoridated water in the morning, your urine is likely to have higher concentrations shortly thereafter than in the evening, when you’ve been drinking bottled water without fluoride. Or, if you worked out in the morning and drank energy drinks but stuck to tap water in the evening, your urine fluoride will be different.
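A toy sketch of why spot sampling is shaky (the readings below are invented): the number you get depends heavily on when during the day you happen to sample.

```python
# Invented spot readings (mg/L) across one day for one person: high after
# morning tap water, low in the evening after bottled water.
readings_mg_per_L = [1.4, 1.1, 0.8, 0.5, 0.3, 0.2, 0.2, 0.3]

# A 24-hour collection approximates the daily average...
daily_mean = sum(readings_mg_per_L) / len(readings_mg_per_L)

# ...while a spot sample is whatever the clock happened to say.
morning_spot = readings_mg_per_L[0]
evening_spot = readings_mg_per_L[5]

print(round(daily_mean, 2), morning_spot, evening_spot)  # 0.6 1.4 0.2
```

Depending on timing, the same person looks like a heavy fluoride consumer or barely exposed, which is exactly the noise a 24-hour collection is designed to smooth out.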

Yes, the researchers were using data already collected, so they were limited to the data they had. But this is kind of lost in the message they’re putting out about their study. They do acknowledge in the discussion section that there are some limitations, albeit without going into a broader discussion of them.

From the discussion section:

“Nonetheless, despite our comprehensive array of covariates included, this observational study design could not address the possibility of other unmeasured residual confounding. Fourth, fluoride intake did not measure actual fluoride concentration in tap water in the participant’s home; Toronto, for example, has overlapping water treatment plants servicing the same household. Similarly, our fluoride intake estimate only considered fluoride from beverages; it did not include fluoride from other sources such as dental products or food. Furthermore, fluoride intake data were limited by self-report of mothers’ recall of beverage consumption per day, which was sampled at 2 points of pregnancy, and we lacked information regarding specific tea brand.17,18 In addition, our methods of estimating maternal fluoride intake have not been validated; however, we show construct validity with MUF. Fifth, this study did not include assessment of postnatal fluoride exposure or consumption.”

As I’ll show you in the third part, there is a lot of residual confounding in this study. What is residual confounding, you ask? It’s all of the other stuff that your mathematical model fails to account for. But, again, we’ll talk about it in the next post.

Next, they also assessed the daily fluoride intake of the mothers through a survey. The survey was, as the authors point out, not validated. This means that it is hard to know whether the survey really measures what it is supposed to measure. Still, they used it, and it leaves the study wide open to recall bias, something you want to minimize as much as possible. They could have minimized it by using a more validated survey, or a prospective design for their study.

First, what is a prospective design? Well, this is when you take a group of women and sign them up for the study, then you carefully measure their fluoride intake with more validated laboratory assays and questionnaires, and then you follow their children and measure their IQ periodically. You don’t do it all retrospectively with already collected data. But, sometimes, what you have is what you have.

Next, what is recall bias? Recall bias is an interesting phenomenon we see when we rely on people telling us their story in order to ascertain risks and outcomes of exposures. We epidemiologists have noticed that people who have bad outcomes tend to be more likely to remember significant exposures. For example, parents of children with birth defects are more likely to remember things like exposures to chemicals or a history of disease in the family. Parents of typical children, meanwhile, don’t recall similar exposures as often because, well, they aren’t looking to connect any dots.

(You see this all the time in anti-vaccine circles, where parents of autistic children are more likely to recall bad reactions to vaccines in their children.)

Not only did they use the intake survey, but the authors also did something very interesting. They multiplied the intake of certain drinks by some factors in order to estimate fluoride intake:

“To estimate fluoride intake from tap water consumed per day (milligrams per day), we multiplied each woman’s consumption of water and beverages by her water fluoride concentration (averaged across pregnancy) and multiplied by 0.2 (fluoride content for a 200-mL cup). Because black tea contains a high fluoride content (2.6 mg/L), we also estimated the amount of fluoride consumed from black tea by multiplying each cup of black tea by 0.52 mg (mean fluoride content in a 200-mL cup of black tea made with deionized water) and added this to the fluoride intake variable. Green tea also contains varying levels of fluoride; therefore, we used the mean for the green teas listed by the US Department of Agriculture (1.935 mg/L). We multiplied each cup of green tea by 0.387 mg (fluoride content in a 200-mL cup of green tea made with deionized water) and added this to the fluoride intake variable.”
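Their estimation procedure can be sketched as a small function (the function name and example inputs are mine; the constants are the ones quoted above):

```python
def estimated_fluoride_intake_mg(cups_beverages, cups_black_tea,
                                 cups_green_tea, water_fluoride_mg_per_L):
    """Sketch of the paper's intake estimate (the function name is mine).

    Each 200-mL (0.2-L) cup of tap-water-based beverage contributes
    cups * concentration * 0.2; black tea adds a flat 0.52 mg per cup and
    green tea 0.387 mg per cup, using the constants quoted in the paper.
    """
    intake = cups_beverages * water_fluoride_mg_per_L * 0.2
    intake += cups_black_tea * 0.52
    intake += cups_green_tea * 0.387
    return intake

# Hypothetical mother: 6 cups/day of tap-water beverages at 0.6 mg/L,
# plus 2 cups of black tea.
print(round(estimated_fluoride_intake_mg(6, 2, 0, 0.6), 2))  # 1.76
```

Note how quickly tea dominates the estimate under these assumptions: two cups of black tea contribute more fluoride than a litre of fluoridated tap water.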

This complicates things because, as you saw above, they excluded women who were not in places where the water was being treated and women who didn’t consume tap water. But, come on, have you ever met someone who never consumed tap water? Do we not use tap water to cook foods all the time? What about that fluoride intake? And why just multiply for fluoride in beverages and not, say, that delicious Canadian cheese soup I’ve heard good things about?

Finally, they used a standardized test for the IQ of the children. I’ve asked some friends of mine who are experts in childhood development, and they are skeptical that IQ can be measured accurately in children this young, because children develop at different rates depending on a variety of variables. You may have seen this when you look at a classroom or a school play. Children are on a big spectrum of development, with milestones really being more like average moments. But, okay, we’ll take this IQ test for what it is.

Speaking of other factors, these are the covariates mentioned in the paper. But I’ll hold off on the discussion of those for the third post, when we discuss the biostatistics. For now, there are a few key things you need to take with you in closing.

The sample used in this study is not at all representative of all mothers and their children in Canada, not even close. As we saw in the paper, many women were left out of the study for a variety of reasons, and mother-child pairs were also excluded. I want to believe that there were good reasons for this, but I could not find them in the paper. The authors do mention that they wanted to look only at mothers consuming fluoride, but why not include those who were not exposed to (or outright did not consume) fluoride in order to really compare two populations of interest?

Finally, the authors mention other studies — some with rats, others purely environmental — where there is some association between fluoride intake and lowered IQ or some sort of negative neurodevelopmental impact. The thing is, public health agencies around the world have been looking at these claims and not finding them to be true within their populations. It would appear that things that happen in Petri dishes do not happen as readily in complex biological systems, and that studies looking at whole populations without knowing all the variables behind the observed outcomes are not as good as proper epidemiological studies.

I’m sure you’re shocked by this. And you won’t be shocked to hear that there is some exaggeration of studies’ results.

The Hijacking of Fluorine 18.998, Part One

NOTE: This is the first of a three-part series on fluoride in drinking water and a recent study about it. You can read part 2 here. You can read part 3 here.

There is a cultural controversy dating back several decades that is very similar to the anti-vaccine controversy that we are dealing with today. Back in the early 1900s, a dentist in a Colorado town noticed that many of the town’s native-born people had brown-stained teeth. Not only that, but their teeth were also very resistant to decay. Through observations and testing of different water supplies in different parts of the country — where people had the brown stains and not — Dr. McKay and colleagues discovered that high levels of fluoride in water accounted for both the staining and the resistance to decay. (Read more about this by clicking here.)

As it became more and more settled science that fluoride in water led to teeth resistant to decay (after adjusting the level to prevent staining), towns and cities began to add fluoride to their water supplies. Of course, not all of them did. As is human nature, some people in those towns were suspicious of anything being put into their water supply by the government, beneficial or not.

Interestingly, this allowed for natural experiments on the effects of fluoridated water. Time after time, epidemiological studies have shown that fluoridated water leads to less tooth decay. Less tooth decay leads to better health outcomes as poor oral health is a risk factor for a variety of conditions. At the same time, all of these studies failed to see any association between bad outcomes and fluoridation done correctly.

In the modern era, where toothpaste is easily attained by most of us, the level of fluoride in potable water has been adjusted down by environmental health authorities. However, doing away with fluoridation completely would have an adverse effect on those who do not have access to toothpaste and/or dental care. Heck, I have access to toothpaste and fluoridated water, and I still managed to get cavities. Can you imagine if I didn’t?

Anyway, those people who were suspicious of putting fluoride in the water did what people who are suspicious of public health interventions often do: they heard of some bad outcome of ingesting fluoride (an ion of the chemical element fluorine), amplified it, exaggerated it and held it up as the ultimate example of what fluoride consumption at any concentration can do to a person. If this sounds familiar, it’s because you’ve been reading this blog and you know the shenanigans of the anti-vaccine crowd. They took the effects of methylmercury on health, especially at high levels, then amplified it, exaggerated it and used it as an example of what happens to everyone who gets a flu shot. Never mind that the mercury in vaccines has always been ethylmercury, at concentrations so low as to be harmless. Never mind that saying that ethylmercury is mercury is like saying that salt is chlorine or sodium. (You don’t want pure sodium to come into contact with water, even the humidity in the air. But you’ll be fine if you add salt to your broth.)

This leads us to the modern anti-fluoride movement, a group of people who are convinced that fluoride in water is, well, a sort of poisoning of the population. They are so vocal in their opposition to fluoride in the water that they actually have managed to convince some towns and cities to stop fluoridation altogether. While some scientists do push back against them, most of us don’t because we don’t see it as such a huge problem as, say, measles. We forget that not everyone can afford toothpaste, and that not everyone can afford a trip to the dentist.

Worst of all, we forget that children can and do die of tooth decay.

To make things even more complicated, some of these anti-fluoridation activists have made their way into scientific studies and publications. Like with anti-vaccine researchers and the junk studies they produce, the studies showing that fluoride causes this or that adverse event at a population level are also pretty junky.

In the next part of this short blog series, I will talk to you about a recent study from the Faculty of Health, York University, Toronto, Ontario, Canada. The study, published in the Journal of the American Medical Association, contends that increased fluoride intake by expectant mothers leads to a drop in IQ in boys. The study was so sensational that there are rumblings about removing fluoride from even more towns solely because of it. JAMA editors even had to publish an editor’s note on why they accepted the paper, and there are plenty of junk news articles that are more alarmist than informing, like this one, this one and this one. Even the American Dental Association (ADA) put out a statement asking everyone to pump the brakes on running with the conclusions of the paper.

And don’t get me started on the conspiracy sites and their take on it.

In consultation with friends and colleagues, we found a lot to be worried about in the epidemiological design of the study, in the biostatistical analysis of the resulting data and, of course, in the conclusions reached by the authors and the press (with some help from the authors). In the next part, I will lay out the epidemiological design of the study and how it is flawed both in terms of its internal and external validity. In the third part, I will lay out the biostatistical analysis issues we observed.

So, just like we had to do in the late 1990s with the Wakefield Fraud “study” that was not a study, here we go fighting a new fight against misinformation…

Sample Sizes and Statistical Significance

Not too long ago, a certain anti-vaccine individual decided that he was going to get his Master of Public Health degree in epidemiology at the same university where I got mine. (It was yet another troubling indicator that has since led me to believe that he wanted to emulate me, be the anti-Ren.) Over the years that he has been active online, writing incredibly racist and troubling things about people who promote the safe and rigorous use of vaccines, this individual has also mocked my epidemiological and biostatistical abilities.

At one point, in a discussion about p-values, he decided that increasing the sample size of a very small study would make the apparent association between a vaccine and a condition more statistically significant. The p-value was borderline significant, and I pointed out that increasing the sample size was necessary, and that it might even make the p-value not statistically significant (greater than 0.05). To this, he commented that I was somehow embarrassing the university where I was working on my DrPH because, in his mind, increasing the sample size of a study only makes the observed p-values more significant.

I didn’t have time back then to educate him on the nature of biostatistics and how there is a very real possibility that increasing the sample size of a study may make the p-value of an association less significant. The concept is actually easy to understand if you think about it. Let me explain with an example.


Suppose that you walk into a bar and there are 14 people there. You know from your reading about the bar that people there can wear either a red or a green shirt, and you theorize that people who wear red are more likely to be drinking beer. Under random conditions, the proportion wearing red would be 50%, and the proportion wearing green would of course be 50%. Similarly, the proportions drinking beer and drinking mixed drinks would be 50% each.

So you walk in and see that nine people are drinking beer, and seven of them are wearing red shirts. Of the five who are drinking mixed drinks, four are wearing green shirts. Thus, 7 of 8 (87.5%) wearing red are drinking beer, and 2 of 6 (33.3%) not wearing red are drinking beer. Your theory has seemingly been proven correct, and you decide to publish your findings. But is that really all that influenced the observed association? What about all the other variables?

A quick chi-square analysis, just for kicks.
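That quick chi-square can be done by hand for the 2x2 table in the example (red vs. green shirts by beer vs. mixed drinks); with one degree of freedom, the p-value even has a closed form:

```python
import math

# The bar example as a 2x2 table: rows = red/green shirts,
# columns = beer/mixed drinks.
observed = [[7, 1],
            [2, 4]]

row_totals = [sum(row) for row in observed]        # [8, 6]
col_totals = [sum(col) for col in zip(*observed)]  # [9, 5]
n = sum(row_totals)                                # 14

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected.
chi2 = 0.0
for i in range(2):
    for j in range(2):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (observed[i][j] - expected) ** 2 / expected

# For 1 degree of freedom, the chi-square p-value reduces to erfc.
p_value = math.erfc(math.sqrt(chi2 / 2))
print(round(chi2, 3), round(p_value, 4))  # 4.381 0.0363
```

With only 14 patrons, the p-value squeaks under 0.05, which is exactly the kind of borderline result that a bigger (or better-stratified) sample can overturn.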

You ask the bartender about drinking habits, and he tells you that women tend to drink mixed drinks while men tend to drink beer. You look around, and you see that all fourteen of your subjects are men. That changes the math. Out of fourteen men, nine are drinking beer and five are not. Not only that, but eight are wearing red and six are not. Maybe there’s a preponderance of red not because of the beer but because men tend to wear red?

Not only that, but the bartender tells you that you’ve come in on a weekday. “Come in on Saturday,” he says. “That is ladies’ night.” And you do because who doesn’t like ladies’ night? That night, there are 100 people in the bar, the male to female ratio is 1 to 1, and you see that most men are wearing red while most women are wearing green. Furthermore, you see that most women are drinking mixed drinks while most men are drinking beer. So was it the shirts that predicted the drinking? Or was it gender?

Or was it the day (or night) of the week? The bartender also tells you that beers are $1 on Tuesday nights starting at 6pm, just after the nearby [stereotypical male-oriented worksite] lets out. But mixed drinks are $1 on Saturdays starting at 1pm, just after the nearby [place where women congregate]. So maybe gender had nothing to do with it? (You might suggest to the bartender to flip the script so you can see if men will drink mixed drinks when they’re $1 and women will drink beers when they’re $1.)

My examples are ridiculous, I know.

In the ladies’ night scenario, you’ve increased your sample size from 14 to 100, and you’ve washed out the apparent influence of shirts on drinking habits, replacing it with the true (maybe?) influence of gender. It happens all the time. It’s the reason why you can’t go with small studies, and the reason why increasing the sample size will not always strengthen the statistical significance (or lack thereof) of your observation. You dig?
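That wash-out can be sketched numerically. The counts below are invented for illustration: within each gender, shirt color and drink choice are independent, yet pooling everyone together manufactures a strong shirt-beer association.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = [a, b, c, d]
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Invented counts: within each gender, shirt color and drink are
# independent, but men favor red shirts AND beer, women green AND mixed.
#                        (beer, mixed)
men_red, men_green = (32, 8), (8, 2)        # 40 red-shirted men, 10 green
women_red, women_green = (2, 8), (8, 32)    # 10 red-shirted women, 40 green

# Pooling the 100 patrons while ignoring gender: a big chi-square...
crude = chi2_2x2(men_red[0] + women_red[0], men_red[1] + women_red[1],
                 men_green[0] + women_green[0], men_green[1] + women_green[1])
print(round(crude, 2))  # 12.96

# ...but within each gender stratum the shirt-drink association is zero.
print(chi2_2x2(*men_red, *men_green))      # 0.0
print(chi2_2x2(*women_red, *women_green))  # 0.0
```

The pooled table looks wildly “significant,” but stratifying by gender shows the shirts had nothing to do with it: the confounder was doing all the work.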

A Video Says It All

This video explains this and other sources of bias in studies that we are grappling with as scientists. (The effect I mentioned above is explained around 5 minutes and 55 seconds into the video.)


The online programs in public health have multiplied exponentially, it seems, in the last few years. More and more students are going straight from their undergraduate degree into a public health master’s degree without ever really having much experience in public health (or in life). This is why I’ll be the first one to remind you that an MPH or MHS or MS (or whatever) in public health is meaningless if the person holding it decides to be anti-science and a vitriolic antivaxxer.

Should you trust someone with such a degree? Sure, but verify that what they say and do jives (or jibes) with what they claim to know. The particular antivaxxer’s criticism of me went unanswered because I really don’t have time for his shenanigans, and anyone who wants to verify my epidemiological and biostatistical knowledge can just come to my blog or get to know me.

My work speaks for me… And, boy, have I been working a lot lately. But that’s for a later post.

What if Vaccines Are Harmful?

I’m in the process of reviewing an article on the “gut microbiome” and its alleged connection to Autism Spectrum Disorder (ASD). The more I review it, the more I keep reading little hints and dog whistles harking back to Andrew Wakefield’s fraudulent study. More on that once I’m done reading and analyzing what they did…

Anyway, I got to thinking about the whole “debate” about vaccines lately, and I was thinking: If vaccines really are harmful, what exactly would we be seeing? Let me start with a story or two, and then maybe you’ll see what I mean.

Kaposi’s Sarcoma and HIV/AIDS

Back in the late 1970s and early 1980s, physicians treating gay men noticed that some of them were presenting with Kaposi’s sarcoma, a rare kind of cancer of the skin. Kaposi’s is caused by a herpes virus that is usually kept in check by a healthy immune system. If something happens to the immune system, the virus multiplies like crazy on the skin and causes the lesions typical of the disease.

At that time, the disease was rare, seen only in a few people with conditions that affected their immune systems. All of a sudden, relatively healthy young men were coming down with it. The physicians treating them shared their observations through case studies and with colleagues. Soon thereafter, public health authorities caught on that something was going on, mostly in gay men at first, and then in the general population. As we know now, what was going on was the emergence of HIV as the cause of AIDS, and of AIDS as a condition that leaves a compromised immune system incapable of keeping the virus at bay and preventing the cancer.

Tampons and Toxic Shock Syndrome

Also in the 1980s, young women started to be seen at emergency rooms and at physician offices with symptoms of what turned out to be toxic shock syndrome. Those physicians reported the cases to public health, and epidemiologists put the data together to figure out that something was happening to those women around the time that they were getting their period. A case-control study followed, and it was discovered that a specific brand of tampons was at fault.

See the Pattern?

In both of those instances, the first signs that something was going on required contact with healthcare. Physicians made the diagnosis/observation of the disease and then reported it to their peers. Those reports were compiled, and public health authorities launched an investigation. Studies were done, both observational (like a case-control study) and experimental, and conclusions were reached. Those conclusions met strict criteria for what is biologically plausible and statistically sound.

So what would it be like if vaccines caused harm?

It’s Not Just VAERS

You might have heard of the Vaccine Adverse Events Reporting System, or VAERS. It is a passive reporting system where anyone and everyone can go and report an adverse event from a vaccine. You can go look at the data right now if you want to. However, it’s not just VAERS that helps see the first “signals” of something being wrong with a vaccine.

There are other systems, like the Vaccine Safety Datalink (VSD) and the systems that exist at the state and local level in some jurisdictions. Then there is competition between pharmaceutical companies: if vaccine A causes harm, the manufacturer of vaccine B will be all over it, telling the public that vaccine A is harmful and selling them vaccine B instead. And then there is the Food and Drug Administration, working in conjunction with local, state and federal public health agencies, academic researchers and private industry to keep an eye on all, uh, foods and drugs.

If something were to happen with vaccines, red flags would go up all over the place. And red flags have very much gone up.

Into… Hold On… Inter… Intussusception?

In the late 1990s, a vaccine against rotavirus (an intestinal virus that causes diarrhea) was put on the market. The phase 1, 2 and 3 trials were done. The data checked out. The Advisory Committee on Immunization Practices looked at the data and recommended the vaccine be added to the schedule. Babies all over the United States were given the vaccine.

Just as it happened with Kaposi’s and toxic shock syndrome, babies with intussusception (a condition in which the intestine folds in on itself) were taken to emergency rooms and treated for the potentially life-threatening condition. Intussusception is itself a complication of rotavirus infection, and almost 2,000 children would get it each year in the United States.

However, as part of the medical investigation, astute healthcare providers noticed that the children had been recently vaccinated with a particular brand of the rotavirus vaccine. They shared that information among their peers, others reported the same thing, public health was brought in… You know the drill. At the end, the vaccine was pulled from the market.

What had happened was that intussusception was so common before the vaccine that it was hard to distinguish intussusception caused by the vaccine from intussusception caused by the disease in the population where the clinical trials were done. One group got the vaccine and the other got a placebo, but the two groups were not large enough to detect the “signal” of intussusception in the vaccine group. The excess just looked like the background/expected rate of the condition. So it wasn’t until the vaccine was given to a much, much larger group that the signal rose above the noise.
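To see why the trial couldn’t catch it, here’s a back-of-the-envelope sketch. The rates below are invented round numbers, not the actual trial figures; the point is only that a small excess of a rare condition drowns in background noise at trial scale.

```python
import math

# Invented round numbers (NOT the real trial figures):
background = 1 / 2000       # assumed yearly background risk of intussusception
vaccine_excess = 1 / 10000  # assumed extra risk added by the vaccine

def expected_cases(n, risk):
    """Expected number of cases among n babies at a given risk."""
    return n * risk

def noise(expected):
    """Counts of rare events are roughly Poisson, so the
    run-to-run noise is about the square root of the mean."""
    return math.sqrt(expected)

for n in (5_000, 1_000_000):
    base = expected_cases(n, background)
    extra = expected_cases(n, vaccine_excess)
    print(f"n={n:>9,}: ~{base:.1f} background cases, "
          f"~{extra:.1f} excess from the vaccine, noise about ±{noise(base):.1f}")
```

At trial size, the handful of excess cases is smaller than the random wobble in the background count; at post-licensure scale, the excess towers over the noise. Same vaccine, same risk, very different visibility.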

But, again, the vaccine was pulled from the market because it became clear that something was wrong. A different manufacturer learned from that mistake and designed their clinical trials accordingly, so they could see the signal in the noise and detect any problems. That is the vaccine we now use in the United States. And, guess what? Not only did cases of rotavirus drop precipitously after the vaccine, but so did cases of intussusception. Why? Because natural rotavirus infection causes intussusception at a much higher rate than the vaccine. No rotavirus, no intussusception.

Ah, but most cases of intussusception are now being caused by the vaccine, right? Just like most polio cases are being caused by the polio vaccine? Well, yeah, sure, but you have to look at the alternative. The alternative is a massive increase in both rotavirus infection (diarrhea, and lots of it) and intussusception from rotavirus infection. Uh, no thanks!

Back to the Original Thesis of This Blog Post, if There Was One

So we hear a lot of stories from people who are generally anti-vaccine telling us that vaccines cause this or that. They point at general prevalence numbers on autism or food allergies and they say, “See? These things used to be rare, and now we see them everywhere because there are so many vaccines!”

It’s the organic food, not the vaccines, right?

However, time after time, study after study, nothing is really found to be caused by vaccines at a population level except the decrease in the incidence (new cases) of diseases that used to kill children by the thousands in the pre-vaccine era. So I ask again: What would we be seeing?

We would see healthcare providers — lots of them — sharing case reports of children (since they’re the ones getting most vaccines) developing some otherwise rare condition in the hours or days after being vaccinated. Those reports would be compiled and reported to public health. Case control studies would be done. Competing pharmaceutical companies would jump in. Academic researchers would also start looking at things closely.

There would be, literally, hundreds of thousands of people looking into what was going on, and we would all know the results because you can’t really keep a secret between hundreds of thousands of people.

This is exactly what happened when the first accusations of autism being caused by vaccines came out in the 1990s. Andrew Wakefield’s fraudulent study was impressive because he was claiming something that was unheard of. Pediatricians the world over looked at each other and were like, “Uh, I’ve seen autistic children be autistic before they were vaccinated.” Others, like me (a humble medical technologist at the time) looked at the biologic plausibility of what that fraudulent study was proposing. It didn’t make any sense that, first, the measles virus from the vaccine would migrate from the injection site to the intestine without being torn to shreds by the immune system and without causing classic measles disease and, second, that anything happening in the gut would trigger the set of behavioral deviations-from-the-norm that we see in autism.

I mean, we’ve seen the commercials about “you’re not you when you’re hungry,” but to take it a million steps forward and say that something in the gut messes with your brain development?

That’s a stretch.

But We Kept Looking, Goddammit!

Just this year, in 2019, twenty years after Wakefield’s fraudulent study, a study from Denmark came out looking, once again, at the possible link between vaccines and autism. Guess what? No link. Instead of investing that money into worthwhile things, like accommodations and services for autistic children and adults, we are still chasing the biologically questionable and statistically improbable link between vaccines and autism.

And I say statistically improbable because about 20% of the US population is between 0 and 14 years of age. That’s over 60 million children! With vaccination rates higher than 90% in the whole country, that’s over 54 million children who received a vaccine at one point in their lives. If autism was really caused by vaccines, we would be seeing millions upon millions of new autism cases. Instead, we see a prevalence estimate that is still around 3% of all children, something the experts have long claimed is the true prevalence of autism in the human population throughout time.
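The arithmetic above is easy to check. A quick sketch, assuming a round 300 million for the US population (roughly what the 20% and 60-million figures imply):

```python
# Quick check of the back-of-the-envelope numbers above.
# Assumes a round US population of ~300 million (an approximation).
us_population = 300_000_000
children = us_population * 0.20   # ~20% of the population is 0-14 years old
vaccinated = children * 0.90      # vaccination coverage above 90%

print(f"{children / 1e6:.0f} million children")
print(f"{vaccinated / 1e6:.0f} million vaccinated at some point")
```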

Yes, there have been autistic people throughout time.

Can We Move On?

It’s going to be hard to move on and move away from the vaccine “debate” as long as certain things are true. First, as long as we have celebrities claiming to know more than scientists, and people willing to follow the celebrities instead of the science, we are going to have anti-vaccine people debating the science. Second, as long as scientists continue to be careful and indecisive in their response to questions about vaccine safety, we are going to have distrustful parents. And, third, as long as we have a pharmaceutical industry that operates like a black box where drugs and vaccines come out without much explanation of the science that went into making them, we are going to have people not trusting “Big Pharma.”

Can we move on? Yes. Will we?

Which Is Better? A False Positive? A False Negative? A True Positive? Or a True Negative?

It’s the age-old question of laboratory tests and analyses: “How accurate is this?” The answer is always, “It depends…,” followed by some lengthy explanation of what is best for the person being tested. When it comes to individual medical decisions, these discussions are best had between a healthcare provider and the patient, not the patient and Google. But what about a question at the population level?

Take, for example, influenza surveillance. When I started working at a state health department, one of the first things I did was to reach out to clinical laboratories and ask that they provide the number of rapid influenza tests and their results. This would help me inform the public and public health workers of when and where influenza was active. But I had to keep in mind the performance of these tests as well as the prevalence (the existing cases of a disease) of influenza in the places where the tests were being done.

The rule of thumb is: if prevalence is low, a large share of the positive results will be false positives; if prevalence is high, a larger share of the negative results will be false negatives. It all comes down to math and how that math breaks down on a 2×2 table based on a test’s sensitivity and specificity. Sensitivity is the probability that the test will be positive when the disease is there. Specificity is the probability that the test will be negative when there is no disease.

Let’s say that a test is 99% sensitive and 99% specific. That’s pretty good, right? It will catch 99% of all true cases with a positive test, and it will rule out 99% of non-cases with a negative test. If you have ten minutes, here’s how I explain it…

If you don’t have the ten minutes, then just know that there are four categories being looked at: TRUE positives, FALSE positives, TRUE negatives and FALSE negatives. As prevalence increases, the chance that a positive test is true increases. You have more true positives. The chance of a false positive decreases. Likewise, the chance of a negative result being a true negative decreases as prevalence increases.
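Those four categories boil down to two formulas, a straight application of Bayes’ theorem. Here’s a minimal sketch for the hypothetical 99%/99% test (the prevalence values are made-up examples):

```python
def ppv(sensitivity, specificity, prevalence):
    """Probability that a positive result is a true positive."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

def npv(sensitivity, specificity, prevalence):
    """Probability that a negative result is a true negative."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)

# The same 99%/99% test in two very different populations:
print(f"PPV at 0.1% prevalence: {ppv(0.99, 0.99, 0.001):.0%}")  # ~9%
print(f"PPV at 10%  prevalence: {ppv(0.99, 0.99, 0.10):.0%}")   # ~92%
```

Same test, radically different meaning of a positive result, and the only thing that changed was the prevalence of disease in the group being tested.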

So we go back to the question of what you want to achieve… If you are a physician and you want to catch the most number of cases, then you want the patients that you’re testing to be in a group with the highest prevalence. This is why healthcare providers will ask you all sorts of questions before you get tested. They want to make sure you fall into the categories for testing that will yield the highest POSITIVE PREDICTIVE VALUE. They want that positive test to have the highest chance of being a true positive. They also want to miss the fewest number of cases possible by increasing the chances that a negative test is negative, or having the highest NEGATIVE PREDICTIVE VALUE. There is a “sweet spot” when it comes to prevalence where this happens, but that’s for a whole other lecture.

Now, if you are an epidemiologist working an Ebola outbreak, then you don’t want to have false negatives that end up being sent home to infect others. You want that number low. Do you care about false positives? Well, maybe not if the therapy won’t kill someone, or maybe you do if a positive test means being put into a ward with people who are sick. It’s a delicate balancing act.

What about pregnancy tests to take at home? You probably don’t worry too much about false negatives (pregnant women who test negative) because those women will still be pregnant and probably take the test again if they continue to miss their period or feel other signs/symptoms of pregnancy. And you maybe care about false positives because a positive test means a trip to the obstetrician, blood work, and (if you’re anything like me) an ensuing panic of epic proportions for the would-be dad.

If you’re me and you just want to keep tabs on flu activity, you don’t declare that the flu has arrived based on a screening test. You use a gold standard test for influenza, like a viral culture or a polymerase chain reaction test. Once the gold standard is positive, you know the virus has arrived, and the chance that positive screening (aka “rapid”) tests represent true influenza cases rises to tolerable levels. Once you stop seeing positives on gold standard tests, or you see that a lot of the rapid positives were in people without symptoms, you stop using the rapid tests as a marker of influenza activity.

Again, it’s all a balancing act. It’s kind of like the justice system. You want the chances of an innocent person going to jail to be as low as possible, so you set up all sorts of safeguards. You also want the chances of a guilty person being convicted to be as high as possible to protect the population from criminals, so you set up systems for that, too. You’re still going to have innocent people going to jail and guilty people walking free; it’s all about minimizing both. (Don’t get me started on how the current justice system in the United States is failing at this.)

Now you know why a test that is 99% accurate (99% sensitive and 99% specific) can still throw out a lot of false positives or false negatives: it comes down to prevalence. If you’re a healthy person in the middle of the summer in the United States, you haven’t traveled abroad, and you don’t work with pigs or chickens, then you probably won’t get tested for the flu, and if you were tested anyway, a positive result would very likely be a false positive. On the other hand, if you’re feeling miserable, it’s the middle of winter in the United States, and you have been around other sick people, then a positive test is very likely a true positive.

These are the kinds of things one needs to think about very, very carefully when using a screening test or device. You also need to think about the population you’re testing in general, the individuals you’re testing in particular, how they would benefit or be hurt by the test results, and whether you should just use the gold standard or diagnostic (not screening) test instead if your degree of suspicion is high enough to warrant it.

What worries me is a researcher who sees too many false positives or too many false negatives and gets all riled up over them without seeing the bigger picture. Maybe, in a given situation, a lot of either is not bad. Maybe the proportion of each (i.e., the positive or negative predictive value) is what you should really be worried about. Context matters when dealing with these things, and context is something epidemiologists need to keep in mind when interpreting the results of their research, especially if they’re calling for any kind of action.

Don’t you love thinking of all the possible scenarios?

I do.

A Quick Geographical Analysis of Homicides in Baltimore Before and During the Current Homicide Epidemic

There was a lot of police movement yesterday as I headed home. By the time I did get home, I found out via the news that a homicide detective had been shot in the head. His condition is very severe, and the prognosis is poor according to all reports. What a lot of people — from talking heads to politicians — are saying today is that “something” needs to be done about the violence in Baltimore. Indeed, homicides are at an all-time high in terms of per capita rates.

Baltimore Homicides per 10,000 Residents 1970-2016
There has been both a population decline and an increase in homicides, leading to the elevated rates.

When you look at the rolling average number of homicides per day for the previous 365 days, you can clearly see when the acceleration started (around April 2015) and how it has not stopped:

The red dot represents the day of the Freddie Gray riots.

The graph above is a little hard to understand, so let me tell you how I created it. For any given day, I took the number of homicides in the 365 days leading up to it and divided by 365. So, for example, if there were 300 homicides from April 15, 2014, to April 15, 2015, then the number plotted for April 15, 2015, is 300/365, or about 0.82. Then, for April 16, 2015, I look at the number of homicides from April 16, 2014, to April 16, 2015. If a homicide dropped out of the window and none occurred on the new day, the number plotted is 299/365, or about 0.819… And so on, each day.

As you can see, sometime around April of 2015, shortly before the Freddie Gray riots, this rolling average began to increase. It then increased sharply the rest of the year and into 2016. At one point, there were almost 365 homicides in the previous 365 days. Then there was a decline until early 2017, when the number climbs again. This rolling average eliminates the artificial boundary of date/time conventions in the calendar. Instead of comparing the homicides from, say, January to March of one year to the previous year, this average allows you to see the previous 365 days in context and without the calendar as a “bin” in which to classify the numbers.
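For what it’s worth, the rolling average described above is easy to compute. Here’s a minimal sketch with toy data (the real input would be daily homicide counts built from the Baltimore Sun’s list):

```python
from collections import deque

def rolling_daily_average(daily_counts, window=365):
    """Trailing average of homicides per day over the previous `window`
    days. `daily_counts` has one entry per calendar day. (A toy sketch
    of the method described above, not the actual analysis code.)"""
    averages = []
    recent = deque(maxlen=window)  # the deque drops the oldest day itself
    for count in daily_counts:
        recent.append(count)
        if len(recent) == window:  # only report once a full year is in view
            averages.append(sum(recent) / window)
    return averages

# Toy data: exactly one homicide per day for a year, then one quiet day.
avgs = rolling_daily_average([1] * 365 + [0])
print(avgs[0], round(avgs[1], 3))  # 1.0, then 364/365
```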

Interestingly, but not surprisingly, the shooting happened a couple of blocks from an area of Baltimore that has seen a marked increase in the rate of homicide per 10,000 residents since the violence started to pick up in 2015. Before this shooting happened, I created a map using some publicly available data to look at what places in Baltimore had seen an increase or decrease in homicide rates since the epidemic began. (I’m defining the epidemic period as 2015 to the present. For the map, I defined it as 2015 and 2016.) The map is on the header to this blog post and linked below. It’s a big file, so I couldn’t fit it into the body of this blog post.

Click here for the full map (15MB)

So here’s what I did…

  1. I took the list of homicides since 2005 provided to me by a Baltimore Sun reporter. The Baltimore Sun has been tracking these homicides since 2004 or so, keeping close tabs on the outcomes of investigations. They have a very nice, accessible web page with the cases and mapping of their locations.
  2. I geocoded the list to make sure the addresses of the incidents were plotted correctly on a map. Since most addresses (about 99%) are at the block level, meaning that we only get the block of the location of the incident and not the actual address, there may be a slight error of a few meters on where the incident occurred. Still, this should not affect the results as very few incidents happened on a boundary line where I would have to make an arbitrary decision on where exactly the incident happened (which side of the line?).
  3. I then took information from the Baltimore Neighborhood Indicators Alliance (BNIA) and the US Census Bureau to look at the demographics — i.e. the population count — of the different Community Statistical Areas (CSAs). The BNIA uses CSAs instead of individual neighborhoods to promote more homogeneity in the area that they’re analyzing. For example, a neighborhood might overlap two census tracts, so it is easier to lump it with the rest of the neighborhoods on those tracts and get better numbers from the Census Bureau.
  4. I took the homicide counts per year in those CSAs and derived an average per year for the two time periods: the pre-epidemic period of 2005-2014 and the epidemic period of 2015-2016.
  5. I then took the population size for the CSAs and determined the average rate of homicides per CSA for the two time periods.
  6. I then subtracted the rate in the pre-epidemic period from the rate of the epidemic period, resulting in an absolute change in homicide rates per year per 10,000 residents.
  7. Finally, I color-coded the CSAs according to how much that rate changed between the two time periods.
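Steps 4 through 7 boil down to a little arithmetic per CSA. Here’s a sketch with made-up numbers (not the actual Poppleton figures, and with the population held fixed across periods for simplicity):

```python
def rate_per_10k(avg_per_year, population):
    """Average yearly homicide rate per 10,000 residents."""
    return avg_per_year / population * 10_000

def rate_change(pre_counts, epi_counts, population):
    """Absolute change in the yearly homicide rate between the
    pre-epidemic and epidemic periods for one CSA. Population is held
    fixed here for simplicity; the real analysis used period-specific
    Census/BNIA population figures."""
    pre = rate_per_10k(sum(pre_counts) / len(pre_counts), population)
    epi = rate_per_10k(sum(epi_counts) / len(epi_counts), population)
    return epi - pre

# Hypothetical CSA: ~3 homicides/year before 2015, ~10/year after,
# with about 5,500 residents (made-up numbers, not the real data).
change = rate_change([3, 2, 4, 3], [9, 11], 5_500)
print(round(change, 2))
```

The color coding is then just a matter of binning that change into the green/cream/yellow/orange/red categories listed below.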

Here are my results:

Out of 55 CSAs:

  • 16 declined in the yearly homicide rate (Green on the map)
  • 21 had an increase between 0 and 2.0 homicides per 10,000 residents (Cream on the map)
  • 9 had an increase between 2.1 and 4.0 homicides per 10,000 residents (Yellow on the map)
  • 8 had an increase between 4.1 and 8.0 homicides per 10,000 residents (Orange on the map)
  • 1 CSA (“Poppleton/The Terraces/Hollins Market”) had the highest increase. (Red on the map)
    • 2005-2014: the homicide rate was 5.31 homicides per 10,000 residents per year (3 homicides per year).
    • 2015-2016: the homicide rate was 19.66, for a change of 14.35 homicides per 10,000 residents per year (10 homicides per year).

The shooting of the detective was two blocks north of the red area on the map, an area that has seen plenty of violence in the last few years. This morning, the Mayor, State’s Attorney, Police Chief and other authorities were all talking about how “something” needs to be done. This confuses me because it makes me think that nothing was being done in the previous days, months, or years where crime in the area has been very high… Or that those deaths and those shootings didn’t matter as much as the shooting of a detective. (Though, let’s face it, society places a higher value on the detective than, say, a teenage gang-banger.)

The map does give a ray of hope, though. The Cherry Hill CSA has historically been a violent place with plenty of gang activity and youths involved in violence, among other criminal activity. Between the two time periods I looked at, the average homicide rate per year dropped slightly. This is significant in that many other similar CSAs — similar in size, demographics, socioeconomics, etc. — saw an increase or no change between the two time periods. So something happened in Cherry Hill. Something worked.

What that something was is for another blog post at another time. For now, you have the results of a very quick and somewhat dirty (I’m using not-so-clean data from news reports, not “clean” data from, say, the Maryland Violent Death Reporting System) analysis of the situation. There is still plenty of work to be done to fully understand what is happening and what is being done about it. Some of these things are happening at the individual level, others at the neighborhood level, and others at the institutional level. Whatever the interplay is, it’s costing a lot of lives.

What we should have been doing all along: Translational Epidemiology

When I was applying to get into the DrPH program, the interviewer — who would later become my academic advisor — asked me for my thoughts on Translational Epidemiology. Translational Epidemiology (TE) is the use of epidemiology at every stage between identifying a population-level problem, identifying a solution for it, and evaluating whether that solution worked. It is presented in four phases:

“In T1, epidemiology explores the role of a basic scientific discovery (e.g., a disease risk factor or biomarker) in developing a “candidate application” for use in practice (e.g., a test used to guide interventions). In T2, epidemiology can help to evaluate the efficacy of a candidate application by using observational studies and randomized controlled trials. In T3, epidemiology can help to assess facilitators and barriers for uptake and implementation of candidate applications in practice. In T4, epidemiology can help to assess the impact of using candidate applications on population health outcomes.”

Take this a little further, and a little to the left or right, and you have epidemiology and epidemiologists who guide policy. They identify the problem, look at the evidence for solutions, and then they evaluate the implementation of those solutions. When done right, the decisions made are science-based, and they are the right decisions for the right time.

Dr. Moyses Szklo, a professor at Johns Hopkins and also the editor-in-chief of the American Journal of Epidemiology, gave a talk at Harvard about translational epidemiology:

“Epidemiology is the study of patterns, causes, and effects of health in defined populations. Szklo defined “translational epidemiology” as the effective transfer of new knowledge from epidemiologic studies into the planning of population-wide and individual-level disease control programs and policies.

In addition to Snow’s famous work, Szklo cited a number of other public health policies influenced by epidemiologic findings, including cigarette advertising bans, food labeling requirements, and air pollution standards.

Szklo also discussed a variety of issues to think about when “translating” epidemiologic knowledge into interventions, programs, or policies. For example, he said, it is important to consider whether or not a particular association between one risk factor and a disease is “confounded”—if it is to some extent questionable because there are one or more other risk factors also at play.

“Translational epidemiology is not an exact science,” Szklo noted. “It’s judgment.”

In a question-and-answer session at the end of the presentation, HSPH’s Walter Willett, chair of the Department of Nutrition and professor of epidemiology and nutrition, asked what Szklo thought of the notion that epidemiologists should not become involved in policy because it makes them less objective in evaluating their data.

Szklo acknowledged that while such involvement might pose a problem, “I don’t think it’s possible to talk about development of [health-related] policies without strong input from epidemiologists.””

I agree. It may be difficult to separate subjective judgment from objective evidence, but there’s really no good way around it. Policymakers need to act on the best science and evidence instead of acting on their gut instinct. (Gut instincts are notoriously off most of the time.) They need to be surrounded by people who know how to collect, analyze and interpret the wealth of information out there on a myriad of issues.

Unfortunately, there are plenty of policymakers who want nothing to do with science, evidence, or even with reality. That, or we epidemiologists shy away from having conversations with policymakers. We think that policy is not our purview, and that our purview is just to apply for research grants, do the research, get it published, and move on to the next bit of research… When, for all that time, we should have been doing the research, publishing the results, and then advocating for policy changes based on those results. We should have been calling our members of congress and telling them to do something based on what we found.

Because we haven’t been doing that, we ended up with a crappy healthcare system in the beginning and a very imperfect solution in the Affordable Care Act. Furthermore, because the policymakers think they know better and won’t listen to experts, we have the atrocity of a bill that cleared the House of Representatives recently. And don’t get me started on how the US Government has responded to Ebola, Zika and during the H1N1 influenza pandemic. Very few epidemiologists were being listened to then.

Non-Biostatistician, Non-Epidemiologist Tries to Complain About Biostats and Epi

Don’t you love it when people who don’t know better think that they know better, and then they end up making fools of themselves? There is a particularly interesting anti-vaccine man by the name of Brian S. Hooker. He has a doctorate in biochemical engineering, according to his Wikipedia page. Maybe you remember BS Hooker from his foray into epidemiology, which went fantastically badly. So bad was his “re-analysis” of a study looking into the MMR vaccine and its association with autism that the journal in which it was published had to retract the paper and apologize for ever letting it into the wild.

I also tore the paper a new one here, here, and here.

Anyway, BS Hooker has decided to dive into biostatistics this time. He wrote a letter to the editor about a study looking into the influenza vaccine given to pregnant women and autism diagnoses in children born to those women. Here’s what he wrote. It’s a bit long, so bear with me:

“To the Editor: The JAMA Pediatrics article by Zerbo et al reported a statistically significant association between the administration of the maternal influenza vaccine in the first trimester of pregnancy and the incidence of autism spectrum disorder. The authors stated that the analysis adjusted for covariates yielded a P value of .01 when applying a Cox proportional hazards regression model to the data.

However, this P value was erroneously adjusted to reduce the possibility of type I errors by applying the Bonferroni adjustment for 8 separate analyses completed on the data sampling. Using this adjustment, the authors stated that this association “could be due to chance (P = .10).” In this instance, it is inappropriate to apply a Bonferroni adjustment because the associations were highly interdependent, contrary to the independence assumption used by the adjustment. This can be seen by the fact that knowing the results for each trimester will yield the result for the total period.

In the Zerbo et al article, comparison is made of the autism spectrum disorder incidence in each of 3 groups depending on the trimester in which the mother received the influenza vaccination against the autism spectrum disorder incidence in a “zero exposure” control group. Rather than a set of independent tests where “set A” is compared with “set B,” “set C” is compared with “set D,” and so on, in this instance, all maternal vaccinated data sets were compared with the same control set (ie, the unvaccinated sampling). In addition, in a fourth comparison, 3 sets were combined for a comparison of vaccination in any time during pregnancy to the unvaccinated control set. Thus, the full data set in this case was a dependent combination of the data from the first, second, and third trimesters in pregnancy.

Bland and Altman 1995 warned against the use of the Bonferroni adjustment when associations are correlated and cite the danger of missing “real differences.” The study authors apply a degree of caution regarding the autism spectrum disorder finding for influenza vaccination in the first trimester of pregnancy by stating that the findings “suggest the need for additional studies on maternal influenza vaccination and autism.” However, the application of the Bonferroni adjustment in this instance is inappropriate. Furthermore, the use of any adjustment for the first trimester is especially questionable because it has long been suspected a priori that an effect, if any, is likely to be concentrated in that trimester.”

My emphasis in bold.

The explanation to a lay audience is the following… There are two big types of errors you can make in conducting a study, Type I and Type II. A Type I error is when you reject the null hypothesis that there is no association between an exposure and an outcome when, in fact, there is no association. In essence, you have a false positive. A Type II error is when you fail to reject that null hypothesis when, in fact, there is an association. In essence, you have a false negative.

There is always a balance between these two errors, but it’s the Type I error that you want to avoid the most. (This all depends on the impact of your study, but, for academic purposes, it’s the Type I error that is the big one.) If you commit a Type II error, well, you might get to try again at a later time. In the gibberish above, BS Hooker is trying to say that, in making their adjustment, the authors of the study not only did away with a statistically significant result (p-value less than 0.05), but they also increased the chances of false negatives happening. (They did increase that chance of false negatives. More on that later.)
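The two error types boil down to a simple decision table: what the truth is versus what the study concluded. A minimal sketch (the function name and labels are mine, purely for illustration):

```python
# The four possible outcomes of a hypothesis test, as a decision table.
# "null_is_true"  = in reality, there is no association.
# "rejected_null" = the study concluded there is an association.
def classify_outcome(null_is_true: bool, rejected_null: bool) -> str:
    if null_is_true and rejected_null:
        return "Type I error (false positive)"
    if not null_is_true and not rejected_null:
        return "Type II error (false negative)"
    return "correct decision"

print(classify_outcome(True, True))    # Type I error (false positive)
print(classify_outcome(False, False))  # Type II error (false negative)
```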

Furthermore, BS Hooker warns that there was a violation of the assumption of independence between the observations. The observations, in this case, were giving the influenza vaccine in trimester 1, 2, or 3. As you can imagine, there is a dependence between these three observations since, if you don’t give the vaccine in trimester 1, then you must give it in 2 or 3. If you don’t give it in 2, then you must give it in 1 or 3. If you don’t give it in 3, then you must give it in 1 or 2. However, the problem with the last two statements is that you cannot go back in time. That is, if you don’t give it in 2, there’s no way you can go back and give it in 1. If you don’t give it in 3, you’re not giving it at all.

Thus, there is independence, of sorts. The analysis is valid. (More about the “dependence/independence” thing later.)

The other thing that I found interesting was that BS Hooker wanted to compare one group to another, one by one. This is the same mistake he made in his “re-analysis” of the MMR-autism study. Doing it that way misses the interactions between different factors in the analysis. That’s why you do the more complex analyses, the less “simple” statistics that give you more realistic results.

What is that Bonferroni Adjustment he speaks of, though?

In a study, you want to keep the chance of a Type I error below 5%. That’s where the usual p-value cutoff comes from. It’s basically saying that, if there really were no association and you replicated the study 100 times, you’d expect about 5 of those replications to show a “significant” result by chance alone. If your p-value falls below that cutoff, the probability that your association is a false positive is low, so your results are “statistically significant.”

But what if you’re doing a bunch of different comparisons at the same time with the big dataset? This paper explains it very well:

“Say you have a set of hypotheses that you wish to test simultaneously. The first idea that might come to mind is to test each hypothesis separately, using some level of significance α. At first blush, this doesn’t seem like a bad idea. However, consider a case where you have 20 hypotheses to test, and a significance level of 0.05. What’s the probability of observing at least one significant result just due to chance?

P(at least one significant result) = 1 − P(no significant results) = 1 − (1 − 0.05)^20 ≈ 0.64

So, with 20 tests being considered, we have a 64% chance of observing at least one significant result, even if all of the tests are actually not significant. In genomics and other biology-related fields, it’s not unusual for the number of simultaneous tests to be quite a bit larger than 20… and the probability of getting a significant result simply due to chance keeps going up. Methods for dealing with multiple testing frequently call for adjusting α in some way, so that the probability of observing at least one significant result due to chance remains below your desired significance level.”
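The quoted calculation is easy to reproduce; here is a quick sketch (the function name is mine):

```python
# Probability of at least one false positive across m independent tests,
# each run at significance level alpha:
#   1 - P(no false positives) = 1 - (1 - alpha)^m
def prob_at_least_one_false_positive(alpha: float, m: int) -> float:
    return 1 - (1 - alpha) ** m

print(round(prob_at_least_one_false_positive(0.05, 20), 2))  # 0.64
```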

The Bonferroni Adjustment takes care of that by dividing 0.05 (or whatever your desired level of probability is) by the number of comparisons (hypotheses) being tested. In the case of the paper that BS Hooker seems to be trying to discredit, the formula is more like this:

P(at least one significant result) = 1 – P(no significant results) = 1 – (1 – 0.05)^8 ≈ 0.34

So, in this study, you’d have about a 34% chance of committing at least one Type I error across the eight comparisons. That’s pretty high. Imagine the consequences of a false positive in this case. Influenza can kill a pregnant woman and her child. At the very least, influenza in a pregnant woman is serious business. Using the Bonferroni Adjustment, the authors correctly diminished the probability of a false positive. Yes, they increased the probability of a false negative, but what’s the harm in that? What’s the harm in seeing no association between the influenza vaccine and autism when there might be one? Probably none, given that autism is nowhere near as bad as, say, death… Or all the other complications from influenza.
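Mechanically, the Bonferroni Adjustment is just division. A sketch (function names mine; the numbers are illustrative, not the study’s exact figures):

```python
# Bonferroni: to keep the familywise error rate at alpha across m
# comparisons, judge each individual test at alpha / m ...
def bonferroni_threshold(alpha: float, m: int) -> float:
    return alpha / m

# ...or, equivalently, multiply each raw p-value by m and compare the
# result to the original alpha (capped at 1.0, since it's a probability).
def bonferroni_adjusted_p(p: float, m: int) -> float:
    return min(1.0, p * m)

print(bonferroni_threshold(0.05, 8))   # 0.00625
print(bonferroni_adjusted_p(0.01, 8))  # 0.08
```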

But the true sign of an anti-vaccine believer is to compare autism to death, to say that autistic children might as well be dead. That’s where they make their bread and butter. It’s a trope as old as the false association between vaccines and autism.

You don’t have to take my word for it, though. The authors of the study slapped down BS Hooker’s assertions themselves in a response to his letter to the editor. A response that, in my opinion, didn’t need to be done. BS Hooker is not a biostatistician, nor is he an epidemiologist. Why he continues to dabble in these disciplines is beyond me, though some have suggested to me that he’s doing it because vaccines causing autism are his only lifeline to a cash reward in the vaccine court, a claim denied last year. If he can somehow tie his child’s autism to a vaccine — any vaccine, at this point, given how he’s gone after the MMR and now influenza vaccines — maybe he can revive his claim?


Anyway, here’s the authors’ response, my emphasis in bold:

“In Reply: We appreciate the comments presented by Donzelli and colleagues and Hooker about our study titled “Association Between Influenza Infection and Vaccination During Pregnancy and Risk of Autism Spectrum Disorder.” Statisticians and epidemiologists have debated at length whether this type of epidemiologic study should adjust for multiple testing, and no consensus has been reached. We used the conservative Bonferroni adjustment following suggestions received from JAMA Pediatrics reviewers. We agree with Donzelli et al and Hooker that the 3 trimesters are not independent of the entire pregnancy period. However, a less-conservative adjustment for multiple testing, accounting for the dependence of the entire pregnancy on the trimesters, would still yield a P value of .07 or higher, which should not change interpretations of our findings.

We do not see enough evidence of risk to suggest changes in vaccination guidelines and policies, but additional studies of maternal influenza vaccination during pregnancy are needed.”

(Donzelli et al, by the way, wrote a letter to the editor that was less fallacious than BS Hooker’s, in my opinion. You can read it here.)

But wait, the authors admit that there was dependence. Yeah, that’s why I wrote that there is independence “of sorts.” See, the design of this study leads to some dependence between the time periods when you give the vaccine, but, because of temporality, it leads to independence because you can’t say that women were given the vaccine in the first or second trimester because they were not given it in the third. Likewise, you can’t say that giving the vaccine in the third trimester caused them not to get it in the first or second… Or that not giving it in the third assured that they got it in the first or second. And so on.

Policy is not only about statistical significance.

In the end, good policy decisions are not made solely based on one scientific study. Heck, good policy decisions sometimes are not made based on a hundred studies. Good policy decisions require people who can see the forest for the trees, the big picture, if you will. When the government was looking at the anthrax vaccine for use in children, Dr. Paul Offit (a “vaccine industrialist,” according to his detractors) opposed using the vaccine in children. It’s not that the vaccine wouldn’t be safe or effective in children. It’s just that the risk of them catching anthrax is negligible compared to, say, a soldier on the front line of a war where the opposing army is known to have a bioweapons program.

In essence, you weigh the pros and the cons of a vaccine both under ideal conditions (i.e. clinical trials and such) and under real-world conditions (i.e. taking into account the risk of the disease in the general population). You certainly don’t do it based on one study, Bonferroni adjustment or not, and you certainly don’t do it based on the thoughts of someone who is not a biostatistician nor an epidemiologist, and someone who likes to do biostatistics the “simple” way.

(Special thanks to The Spaniard for his review of this blog post for accuracy regarding the biostats.)

Another day, another bad anti-vaccine study

Let’s say that you think food A caused disease B. To test your theory, you get cases of people who got B and controls of people who did not get B. Then you compare the odds of exposure to A in each group. The ratio between those odds is called the odds ratio, and anything significantly different from 1.0 means there’s some sort of association (if the two odds are equal, dividing one by the other gives exactly 1.0). For these case-control studies, it is very important to choose your cases and controls carefully. If you are not careful, you distort the odds of exposure, and with them the odds ratio. When that happens, you can make it look like two things are associated when they’re not.
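As a toy example (all numbers made up), the odds ratio from a 2×2 case-control table works out like this:

```python
# Hypothetical 2x2 table for the A-causes-B question:
#                 exposed to A   not exposed to A
#   cases (B)          30               70
#   controls           20               80
def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    # Odds of exposure among cases (a/b) divided by odds of exposure
    # among controls (c/d) -- the familiar cross-product ratio (a*d)/(b*c).
    return (a * d) / (b * c)

print(round(odds_ratio(30, 70, 20, 80), 2))  # 1.71
```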

This is what we in the biz call “bias.”

Now, let’s say that you’re dead-set on blaming A for B. But, as you get older, you’re more likely to be exposed to A, and you only get diagnosed with B as you get older. Can you see where, even if there is no association between A and B, you could see one because time moves forward? You see an example of this in real life with Nobel Prize awards. You would think that only older adults get Nobel Prizes, or that being old is associated with getting a Nobel Prize. The problem is that all the work you need to do to earn the prize takes time, so you can only get it after working in your field a long time. Very rarely will someone very young get it.

Get it?

So why am I writing all this? I got a tip from Todd W. (you should go read his blog) of a study by some known anti-vaccine luminaries. The study, published in Brain Sciences (an online, open-access journal), attempts to link the Hepatitis B vaccine given at a young age (usually between birth and six months in American children, but available to people of any age) and Hyperkinetic Syndrome of Childhood (HKSoC). Guess what the anti-vaccine activist “researchers” found?

Go ahead and guess. I’ll give you a few seconds.

Surprise! They found an association between getting the dreaded “mercury” (aka thimerosal) in Hepatitis B vaccines to a diagnosis of HKSoC later. My guess is that they would have published nothing if there was no observed association, given that at least one of the authors — Brian S. Hooker — has an ongoing case in the vaccine court. In my opinion, it is essential for BS Hooker to find an association between vaccines and almost anything so that his case can have a little more heft to it. A couple of the other authors bill themselves as vaccine experts, although they have some questionable credentials. So let’s start there.

Not too long ago, David Geier (“The Son”) and Mark Geier (“The Father”) got into hot water in Maryland because of The Father’s medical misadventures. For a while, The Father and The Son diagnosed children with autism as having an overabundance of testosterone and treated them accordingly. That is, mistreated them. There is no evidence that testosterone causes any aggravated symptoms of autism. As Todd W. reports:

“Dr. Geier, through his Institute of Chronic Illness and Genetic Centers of America, misdiagnosed autistic children with precocious puberty so he could claim that he was using Lupron on label, rather than for an unapproved, experimental indication (i.e., autism). This also allowed him to bill insurance companies for the lupron. His actions got him into hot water with various state medical boards, starting with his medical license in Maryland being suspended on April 27, 2011. Since then, one by one, 11 of his 12 medical licenses were suspended, an application for a thirteenth license in Ohio was denied, and some of those suspensions became complete revocations. The last actions I wrote about were the revocation of his license in Missouri and suspension of his Illinois license. At the time, the only state left in which Dr. Geier could practice was Hawaii.

As of April 11, 2013, that is no longer the case.

Although his license listing on the Hawaii state Professional and Vocational Licensing search has yet to be updated as of this writing, searching the state’s RICO Complaint History database reveals that the board revoked his license last month. The case number is MED 2011-79-L (if the link to the PDF doesn’t work, go to the OAH Decisions web site, click on OAH Final Orders and search for “Geier”). According to the Final Order, a petition for disciplinary action against Dr. Geier was filed on July 17, 2012, Geier was notified on November 19, and a hearing was held on February 5 of this year. Dr. Geier failed to appear for the hearing and did not file for exceptions or extensions to delay the hearing.”

Not only that, but The Son (David Geier) was charged with and found guilty of practicing medicine without a license.

In other words, the authors have more conflicts of interest than they let on in the study’s COI statement:

“All of the investigators on the present study have been involved in vaccine/biologic litigation.”

To say the least.

But what about the study? Is it any good? No, it’s not good. Before I tell you why it’s not good, let’s talk about their conceptual framework. These anti-vaccine luminaries’ theory is that exposure to the Hepatitis B vaccine and the thimerosal therein leads to HKSoC. To test their theory, they used the Vaccine Safety Datalink (VSD), a database maintained by the Centers for Disease Control and Prevention (CDC) where vaccine outcomes are tracked. Basically, if you are part of a healthcare system that reports to VSD, the vaccines you get (their lot numbers, dose, etc.) are reported to the database. If you ever get sick and go to the same healthcare system, those following visits (for whatever cause) are also reported to VSD. Anyone with the time and money can then “dumpster dive” through the data and try to come up with something to publish.

This is not to say that the VSD is a bad idea or a waste of time. Serious scientists who account for all possible sources of bias — and thus all possible confounders — can get a lot from the data in it. It’s actually the reason we know that HPV doesn’t cause all the horrible things that anti-vaccine people attribute to that vaccine, or any vaccine. There are millions of doses given to millions of people and reported in the VSD, and there is yet to be any credible sign of anything really bad coming from all those vaccinations. There is also no evidence of autism rising from increased vaccination. (Note that I didn’t lump autism with “anything really bad” because autism is not bad. I know. I know. Antivaxxers won’t believe this.)

Anyway, as I told you in the opening paragraph, when you choose your cases and controls, you have to choose them at random if possible, and without any kind of bias as to how you classify them, except for whether or not they’re cases or controls. You don’t say, “Okay, you are a case if you have the disease and…” There is no “and.” You’re a case if you have the disease. You’re a control if you don’t have the disease. Injecting any more requirements introduces bias. Again, if I think that A causes B, and I only choose cases who have been exposed to A, how do I know that B isn’t caused by something else if I left everyone exposed to that something else outside of my study?

But that is exactly what these people did. From their methods section, with my emphasis:

“To locate the initial cases of a diagnosis that fell within the HKSoC spectrum (ICD-9 code: 314.xx), including the following subtypes: attention deficit disorder without mention of hyperactivity (314.00), psychiatrically known as ADD; attention deficit disorder with hyperactivity (314.01), psychiatrically known as ADHD; hyperkinesis with developmental delay (314.1); hyperkinetic conduct disorder (314.2), and other specified manifestations of hyperkinetic syndrome; (314.8) and unspecified hyperkinetic syndrome (314.9), the outcome files were examined. This included both outpatient and inpatient diagnoses. When multiple cases of HKSoC umbrella in a child were discovered, only the initial one was used. Table 1 summarizes the year of birth of the children diagnosed with HKSoC identified in the present study. Among those children diagnosed with an HKSoC identified in Table 1, only children where the HKSoC diagnosis came after they received a HepB vaccine were allowed in the analyses. This step was incorporated to be sure of the necessary temporal cause and effect relationship.”

Read that again. Only children where the diagnosis came AFTER the exposure were allowed into the analysis.

“But the Hepatitis B vaccine is given immediately after birth, so of course the diagnosis will come after the exposure,” you say. Well, not for everyone. And even if those numbers are relatively small, it is important to know if and how many cases of HKSoC are occurring in the yet-to-be-vaccinated or unvaccinated.

Then this is how they got their controls (children without the HKSoC diagnosis), with my emphasis:

“To find control children who did not have an HKSoC diagnosis and only a low probability of getting that diagnosis later as they were followed-up, the control children had to be enrolled continuously from after birth up until they were at least 7.55 years old (mean age of initial HKSoC diagnosis + SD of mean age of initial HKSoC diagnosis). When this rule was applied, it left a group of control children numbering 20,584, with males = 10,303, females = 10,281, and male/female ratio = 1.002. Their year of birth ranged from 1991 to 1993. Thus, the exclusion criteria for the control children (those without a diagnosis of HKSoC) were lack of continuous enrollment, lack of record of the child’s gender, and an age of less than 7.55 years. The year of birth of the control children used in the analyses are summarized in Table 1.”

Reading how they got cases and how they got controls, do you think that their cases and controls are comparable in their chances of making it into the study? No. The answer is no. Then, if you remember what I said about the Nobel Prize, and what I’ve told you about the vaccine, and what we see about HKSoC diagnosis… What can you conclude about the vaccine and the chances of diagnosis?

Yes, the older you are, the more likely you are to have been diagnosed if you were going to develop HKSoC. And, the older you are, the more likely you are to have been exposed to more thimerosal because you’re more likely to have all three doses of the vaccine. But these “researchers” didn’t control for that. They didn’t even take into account that maybe there were cases out there who were vaccinated post-diagnosis. By inserting that requirement that you had to be diagnosed AFTER AND ONLY AFTER being vaccinated, they wiped out an entire universe of possible cases that we don’t get to know in this “research.”

So, do exposed cases and unexposed cases have an equal chance of making it into the study? No, because of the requirement of being vaccinated before the diagnosis. And do exposed controls and unexposed controls have an equal chance of making it into the study? No, because most children will be vaccinated (exposed) by age 7, raising the chances of the exposed to be in the study more so than the unexposed.

To seal the deal of doing epidemiology and biostatistics in a really weird way, the only statistical analysis the authors report doing is a Fisher’s Exact Test, which is fine for 2×2 tables, but it’s not nearly as informative as a regression model (e.g., logistic regression for a binary outcome), especially with all the possible covariates that exist in the databases (e.g. age, gender, ethnicity, location). They did a Fisher’s on the gender strata, but that is as far as they controlled for covariates.
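For what it’s worth, Fisher’s Exact Test itself is simple to compute for a 2×2 table. A self-contained sketch using the standard hypergeometric formulation (the example table is Fisher’s classic tea-tasting data, not anything from this study):

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]:
    the sum of probabilities of all tables (with the same margins) that are
    no more probable than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # P(top-left cell = x) with the table's margins held fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs + 1e-12)

print(round(fisher_exact_2x2(3, 1, 1, 3), 4))  # 0.4857
```

The point is that the test only looks at one 2×2 table at a time; it has no way to hold age, ethnicity, or anything else constant, which is exactly the complaint above.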

Really sloppy work.

So what do we have in this new anti-vaccine piece of propaganda disguised as “research”? We have:

  • Authors with serious conflicts of interest.
  • An author whose previous “research” has been retracted because it was just as bad as this one.
  • An author who conducted very questionable “medicine” on children and lost his medical license for it.
  • An author who was found guilty of practicing medicine without a license.
  • And funding by an anti-vaccine foundation, the Dwoskin Family Foundation.

So does the Hepatitis B vaccine cause HKSoC? Not according to this analysis, and using this analysis to say that it does is irresponsible at best and dangerous at worst, given the seriousness of a hepatitis B infection if the vaccine is not given. I cannot tell you what the intentions of the authors were in writing this and submitting it for publication. Nor can I tell you why it got published. But I can tell you that, based on the authors’ previous and current anti-vaccine projects, my money’s on them trying to associate yet another vaccine with yet another neurological condition. There’s money in it, after all.

(Featured image via Spry on Flickr, CC BY-NC-ND 2.0)

Where do you begin to understand Zika?

It’s all the rage these days to get worked up about Zika. Just like last year with Ebola, this year we’re freaking out over a disease from “over there” coming “over here” and hurting Americans. Also, the observed association between Zika infection in pregnancy and microcephaly is scaring the crap out of people. (It’s really scaring the far-right, anti-abortion people, because women will start thinking of abortion as an alternative to having a microcephalic or anencephalic child.)

A letter went out to students and faculty at the school the other day asking for students to help do research to pin down the incubation period of Zika. The incubation period is the time from initial exposure/infection to the time of initial symptoms. It’s somewhat hard to pin down this time with Zika because it is transmitted primarily by mosquitoes. If you go to a place teeming with mosquitoes, it’s hard to figure out if yesterday’s exposure led to infection, or the one from last week.

This problem is the same one we see with foodborne diseases. We eat a few times a day, so our opportunities for exposure are many, and they are continuous. But we figure out the likely culprit when different people start reporting the same exposure, e.g. eating the same food at the same event or from the same restaurant. So what do you do when the exposure is mosquito bites all the time, every day, all over the place?

For those, you look at people who traveled into and then out of the areas with heavy mosquito presence and then got sick. You determine the last day they were there and count from there to symptom onset to get the shortest possible incubation time; then you determine when they arrived in the endemic area and count from there to get the longest. You do this over and over again with as many travelers as possible, and then you figure it out.
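The traveler logic can be sketched with made-up dates (every record below is hypothetical, purely to show the arithmetic):

```python
from datetime import date

# Hypothetical traveler records: when each person arrived in and left the
# mosquito-endemic area, and when their symptoms began after leaving.
travelers = [
    {"arrived": date(2016, 1, 2), "left": date(2016, 1, 9), "onset": date(2016, 1, 14)},
    {"arrived": date(2016, 1, 5), "left": date(2016, 1, 8), "onset": date(2016, 1, 16)},
]

def incubation_window(t):
    # Shortest possible incubation: they were infected on their last day there.
    shortest = (t["onset"] - t["left"]).days
    # Longest possible incubation: they were infected on the day they arrived.
    longest = (t["onset"] - t["arrived"]).days
    return shortest, longest

for t in travelers:
    print(incubation_window(t))  # (5, 12) then (8, 11)
```

Pool enough of these windows across travelers and the overlap narrows down the plausible incubation period.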

As it turns out, African, Asian, and South American researchers have done this. Even some European scientists who responded to a large outbreak in French Polynesia in the South Pacific have come out with a good estimate of the incubation period. They all agree that it’s between five days and two weeks, and that the disease lasts about one week (as long as two weeks). However, for some reason, the school is recruiting students to do a literature review to figure this out. (I cheated. I contacted tropical disease epidemiologists who’ve already done the work.)

And this is the thing about epidemiology education in the United States. As I mentioned before in “The Two Kinds of Epidemiologists“:

“The research and academic epidemiologist looks at a public health problem and designs a study to better understand it. He or she makes sure that the measurements are valid and that the information collected from the study is reliable. They take good care to choose the subjects carefully so as to not introduce bias into the study. With data in hand, they test several hypotheses about the mechanisms that cause whatever disease or condition that they’re studying. They use the “dark arts” — as one frequent reader/commenter has called biostatistics — to make sure that their observations are not due just by chance, or that they’re not being influenced by things seen or unseen. Finally, they put all of their findings in a research article and get it published at one of many reputable scientific journals.”

And then all that information sits in a journal, waiting to be used. Take, for example, the story of Brian Foy. From the Washington Post:

“Brian Foy, a researcher who studies mosquito-borne diseases, said in a 2011 paper that he had found likely evidence of a little-known virus spreading through sex. If true, it would be the world’s first documented case of sexual transmission of the virus, he said at the time.

Foy wanted to study it further, but no one would give him the funding he needed: He had found just one example, and the virus — known as Zika — was too obscure, he was told.”

I read that the other day, and my head exploded.

She gets me.

My head exploded because it was yet another example of how public health is failing to put all that knowledge into action. In 2010, two researchers published a really good paper on “Present and Future Arboviral Threats.” (Arboviruses are viruses transmitted by arthropods. ARthropod-BOrne VIRUSes, get it?) They wrote:

“Perhaps the greatest health risk of arboviral emergence comes from extensive tropical urbanization and the colonization of this expanding habitat by the highly anthropophilic (attracted to humans) mosquito, Aedes aegypti. These factors led to the emergence of permanent endemic cycles of urban DENV and chikungunya virus (CHIKV), as well as seasonal interhuman transmission of yellow fever and Zika viruses.”

Had I been sitting at the White House and read that paper, I would have convened a panel from CDC, NIH, etc., to come up with an immediate plan to survey for these infections the world over and both track them and combat them… Years before they came to the United States.

But that’s the thing. There are plenty of us doing research and writing papers, and few of us working to put that knowledge into action. What is worse is that those of us who are working on it are not communicating well with each other. We either want to be protective of our work, or we just plain don’t know how to communicate it. (Look at how we fail miserably to communicate the importance of vaccination in, say, Orange County, California. Effective communication would make anti-vaccine advocates no more credible to the general public than people who believe the Earth is flat.)

We desperately need a Neil deGrasse Tyson of Public Health.

So the answer to the question I posed in the title is not “in the literature,” to be honest. The answer is “from each other… And now, before the next thing comes.”

“This is the time when things must be done before their time.” – 1949, Vol 5. No II of the Bulletin of the Atomic Scientists.