It’s the age-old question of laboratory tests and analyses: “How accurate is this?” The answer to this question is always, “It depends…” That answer is then followed by some lengthy explanation of what is best for the person being tested. When it comes to individual medical decisions, these discussions are best had between a healthcare provider and the patient, not the patient and Google. But what about a question at the population level?
Take, for example, influenza surveillance. When I started working at a state health department, one of the first things I did was to reach out to clinical laboratories and ask that they provide the number of rapid influenza tests and their results. This would help me inform the public and public health workers of when and where influenza was active. But I had to keep in mind the performance of these tests as well as the prevalence (the existing cases of a disease) of influenza in the places where the tests were being done.
The rule of thumb is: If prevalence is low, then a large share of the positive results will be false positives. If prevalence is high, then a larger share of the negative results will be false negatives. It’s all based on math and how that math breaks down on a 2×2 table based on a test’s sensitivity and specificity. Sensitivity is the probability that the test will be positive when the disease is there. Specificity is the probability that the test will be negative when there is no disease.
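For anyone who likes to see it written down, here is how those two definitions fall out of the 2×2 table. The symbols are my shorthand, not anything official: TP and FN are the people with the disease who test positive and negative, and TN and FP are the people without the disease who test negative and positive.

```latex
\[
\text{Sensitivity} = \frac{TP}{TP + FN},
\qquad
\text{Specificity} = \frac{TN}{TN + FP}
\]
```

Notice that neither formula mixes the two groups. Sensitivity only looks at people who have the disease, and specificity only looks at people who don’t, which is why prevalence has to be brought in separately.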
Let’s say that a test is 99% sensitive and 99% specific. That’s pretty good, right? It will catch 99% of all true cases with a positive test, and it will rule out 99% of non-cases with a negative test. If you have ten minutes, here’s how I explain it…
If you don’t have the ten minutes, then just know that there are four categories being looked at: TRUE positives, FALSE positives, TRUE negatives and FALSE negatives. As prevalence increases, the chance that a positive test is true increases. You have more true positives. The chance of a false positive decreases. Likewise, the chance of a negative result being a true negative decreases as prevalence increases.
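If you’d rather see the four categories as numbers, here is a minimal Python sketch. The population of 100,000, the list of prevalence values, and the function name `two_by_two` are all my own assumptions for illustration; only the 99% sensitivity and 99% specificity come from the example above.

```python
def two_by_two(population, prevalence, sensitivity, specificity):
    """Fill in the four cells of a 2x2 table for a hypothetical population."""
    diseased = population * prevalence
    healthy = population - diseased
    tp = diseased * sensitivity   # diseased people who test positive
    fn = diseased - tp            # diseased people who test negative
    tn = healthy * specificity    # healthy people who test negative
    fp = healthy - tn             # healthy people who test positive
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


if __name__ == "__main__":
    # Hypothetical prevalence values, from rare to very common.
    for prevalence in (0.001, 0.01, 0.10, 0.50):
        cells = two_by_two(100_000, prevalence, 0.99, 0.99)
        share_true = cells["TP"] / (cells["TP"] + cells["FP"])
        print(
            f"prevalence {prevalence:>6.1%}: "
            f"TP={cells['TP']:.0f} FP={cells['FP']:.0f} "
            f"TN={cells['TN']:.0f} FN={cells['FN']:.0f} "
            f"share of positives that are true={share_true:.1%}"
        )
```

At 1% prevalence this works out to 990 true positives and 990 false positives, so only half of the positive results are real, even though the test is 99% sensitive and 99% specific.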
So we go back to the question of what you want to achieve… If you are a physician and you want to catch the greatest number of cases, then you want the patients that you’re testing to be in a group with the highest prevalence. This is why healthcare providers will ask you all sorts of questions before you get tested. They want to make sure you fall into the categories for testing that will yield the highest POSITIVE PREDICTIVE VALUE. They want that positive test to have the highest chance of being a true positive. They also want to miss the fewest cases possible by increasing the chances that a negative test is a true negative, that is, by having the highest NEGATIVE PREDICTIVE VALUE. There is a “sweet spot” when it comes to prevalence where this happens, but that’s for a whole other lecture.
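The two predictive values can be computed directly from prevalence, sensitivity and specificity using Bayes’ rule. What follows is a rough sketch in Python, not a clinical tool; the function names `ppv` and `npv` and the prevalence values in the loop are my own choices for illustration.

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value: P(disease | positive test)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(prevalence, sensitivity, specificity):
    """Negative predictive value: P(no disease | negative test)."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    # Hypothetical 99% sensitive, 99% specific test at several prevalences.
    for prevalence in (0.001, 0.01, 0.10, 0.50, 0.90):
        print(
            f"prevalence {prevalence:>6.1%}: "
            f"PPV={ppv(prevalence, 0.99, 0.99):.3f} "
            f"NPV={npv(prevalence, 0.99, 0.99):.4f}"
        )
```

Running it shows the trade-off in one glance: as prevalence goes up, PPV climbs while NPV slips, which is why choosing who gets tested matters as much as the test itself.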
Now, if you are an epidemiologist working an Ebola outbreak, then you don’t want to have false negatives that end up being sent home to infect others. You want that number low. Do you care about false positives? Well, maybe not if the therapy won’t kill someone, or maybe you do if a positive test means being put into a ward with people who are sick. It’s a delicate balancing act.
What about pregnancy tests to take at home? You probably don’t worry too much about false negatives (pregnant women who test negative) because those women will still be pregnant and probably take the test again if they continue to miss their period or feel other signs/symptoms of pregnancy. And you maybe care about false positives because a positive test means a trip to the obstetrician, blood work, and (if you’re anything like me) an ensuing panic of epic proportions for the would-be dad.
If you’re me and you just want to keep tabs on flu activity, you don’t say that the flu has arrived based on a screening test. You use a gold standard test for influenza, like a viral culture or a polymerase chain reaction test. Once the gold standard is positive, then you know the virus has arrived, and the chances of positive screening (aka “rapid”) tests being true influenza cases rise to tolerable levels. Once you stop seeing positives on gold standard tests, or you see that a lot of the rapid tests were in people without symptoms, then you stop using the rapid tests as a marker of influenza activity.
Again, it’s all a balancing act. It’s kind of like the justice system. You want the chances of an innocent person going to jail to be as low as possible, so you set up all sorts of systems. You also want the chances of a guilty person going to jail to be as high as possible to protect the population from criminals, so you set up those systems. You’re still going to have innocent people going to jail and criminals getting out, but it’s all about minimizing it. (Don’t get me started on how the current justice system in the United States is failing at this.)
Now you know why a test that is 99% accurate (99% sensitive and 99% specific) is still going to throw out a lot of false positives or false negatives, because it’s about prevalence. If you’re a healthy person in the middle of the summer in the United States, and you haven’t traveled abroad or worked with pigs/chickens, then you probably should not get tested for the flu. If you do test positive, there’s a very high chance that the result is a false positive. On the other hand, if you’re feeling miserable, it’s the middle of winter in the United States, and you have been around other sick people, then a positive result is very likely to be a true positive, and it’s the negative result that deserves a second look.
These are the kinds of things that one needs to think about very, very carefully when using a screening test or device. But you also need to think about the population you’re testing in general, the individuals you’re testing in particular, how they would benefit or be hurt by the test results, and whether or not you should just use the gold standard or diagnostic (not screening) test instead if your degree of suspicion is high enough to warrant it.
What worries me is a researcher who sees too many false positives or too many false negatives and gets all riled up over them without seeing the bigger picture. Maybe, in the situation you are describing, too many of either is not bad. Maybe the proportion of each (i.e. the Positive/Negative Predictive Value) is really what you should be worried about? Context matters when dealing with these things. And context is something epidemiologists need to have in mind when interpreting results of their research, especially if they’re calling for any kind of action.
Don’t you love thinking of all the possible scenarios?