Statistics can be misleading and sometimes mind-boggling. They can be used to hide the true nature of a relationship, or stated in a way that falsely supports or undermines a position.
Here, I will explore some common pitfalls in interpreting statistics. I am not an expert in statistics. I’ve taken the usual college courses in classical statistics and was later trained in geostatistics, which differs from classical statistical methods in that one has to pay attention to the context, not just the pure mathematics. (For any statisticians out there, geostatistics is akin to Bayesian methods of statistical analysis.)
In this article, I present three examples of how statistics can be misleading.
1. Relative risk versus absolute risk
Relative risk is the risk in relation to something else. It can sound scary, but by itself it tells you nothing about the actual risk. Absolute risk is simply the probability of something happening. For example, in newspaper stories we frequently read something like this: If you use substance X, you double your chances of contracting dread condition Y. That’s a relative risk.
Let’s say that the incidence of condition Y in the general population is 1 in 100,000. Among long-time users of substance X, the incidence of condition Y is 2 in 100,000. The relative risk says you double your chances of getting Y; sounds scary. But the absolute risk or chance of contracting condition Y rises from a risk of 0.00001 to a risk of 0.00002. Not so scary.
The reasoning in this example can be applied in reverse. For example, have you seen claims that dietary supplement X or drug Z cuts the incidence of condition Y in half? Again, this is relative risk, while the real benefit might actually be very, very small.
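The substance X arithmetic above can be checked in a few lines of Python. This is only a sketch of the example's numbers; the function names are mine, chosen for illustration.

```python
def relative_risk(exposed_risk, baseline_risk):
    """Ratio of the two risks: how many times more likely."""
    return exposed_risk / baseline_risk


def absolute_increase(exposed_risk, baseline_risk):
    """Difference of the two risks: the added probability itself."""
    return exposed_risk - baseline_risk


# Figures from the example: cases per 100,000 people.
baseline = 1 / 100_000   # general population
exposed = 2 / 100_000    # long-time users of substance X

print(f"Relative risk: {relative_risk(exposed, baseline):.1f} times baseline")
print(f"Absolute risk: {baseline:.5f} rises to {exposed:.5f}")
print(f"Absolute increase: {absolute_increase(exposed, baseline):.5f}")
```

The same two functions work for the reverse case: a supplement that "cuts the incidence in half" has a relative risk of 0.5, but the absolute change may be just as tiny.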
P.S. to this section. According to the Arizona Lottery website, your chances of winning the Powerball jackpot are 1 in 195,249,054. Written as a decimal, the chance of winning is about 0.000000005. Since this number is very close to zero, a cynic might say your chances of winning are almost the same whether or not you buy a ticket. (But you can double your chance of winning if you buy two tickets!) Risking a dollar on Powerball is a fair bet only when the jackpot reaches $195,249,054, because at that point the expected payout equals the dollar risked. Only above that amount do the numbers favor the buyer.
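Here is a rough sketch of the Powerball arithmetic. It uses only the odds quoted above and a one-dollar ticket, and it assumes a single winner who collects the whole jackpot; smaller prizes, taxes, and shared jackpots are ignored, so treat it as an illustration rather than betting advice.

```python
ODDS = 195_249_054     # "1 in ODDS" chance of the jackpot, as quoted above
TICKET_PRICE = 1.00    # dollars risked per ticket, as in the text


def win_probability(tickets=1):
    """Chance of hitting the jackpot with this many distinct tickets."""
    return tickets / ODDS


def expected_payout(jackpot, tickets=1):
    """Average jackpot winnings per play (simplified: jackpot prize only)."""
    return win_probability(tickets) * jackpot


print(f"One ticket:  {win_probability(1):.9f}")
print(f"Two tickets: {win_probability(2):.9f}")

# Break-even: the expected payout of one ticket equals its price.
break_even_jackpot = TICKET_PRICE * ODDS
print(f"Break-even jackpot: ${break_even_jackpot:,.0f}")
```

Doubling the number of tickets does double the probability, but twice almost nothing is still almost nothing.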
2. What are the odds?
(Adapted from an essay by Tom Siegfried, Science News, 27 March 2010)
Let’s pose a hypothetical example and say that our favorite baseball player, “Slugger Bob,” is one of a group of 400 players who were tested for steroid use, and that Slugger Bob tested positive. We will stipulate that the test correctly identifies steroid users 95% of the time. The test also has a 5% incidence of false positives. So, what are the chances that Slugger Bob is a steroid user?
Most people might say that there is a 95% chance that Slugger Bob is a steroid user, and perhaps classical statistics would agree.
But here is where the real world collides with classical statistics and where context matters. Let’s say that we know from prior testing and other experience that about 5% of all baseball players are actually steroid users. We would expect, therefore, that out of the 400 players, 20 are users (5 percent) and 380 are not users.
Of the 20 users, 19 (95 percent of 20) would be identified correctly as users.
Of the 380 nonusers, 19 (5 percent false positives) would incorrectly be indicated as users.
If 400 players were tested under these conditions, 38 would test positive. Of those, 19 would be guilty users and 19 would be innocent nonusers. So if any single player’s test is positive, the chance that he really is a user is 50%, since an equal number of users and nonusers test positive.
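The head count above is an instance of Bayes' rule, and it can be reproduced in a short Python sketch. The function and variable names are my own; the three input numbers are the ones stipulated in the example.

```python
def chance_positive_is_user(prevalence, detection_rate, false_positive_rate):
    """Probability that a player who tests positive really is a user."""
    true_positives = prevalence * detection_rate
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


players = 400
prevalence = 0.05           # share of players who actually use steroids
detection_rate = 0.95       # users correctly flagged by the test
false_positive_rate = 0.05  # nonusers incorrectly flagged by the test

users = players * prevalence
nonusers = players - users
print(f"Users flagged:    {users * detection_rate:.0f} of {users:.0f}")
print(f"Nonusers flagged: {nonusers * false_positive_rate:.0f} of {nonusers:.0f}")

chance = chance_positive_is_user(prevalence, detection_rate, false_positive_rate)
print(f"Chance a positive test means a real user: {chance:.0%}")
```

Changing the prevalence shows how much the answer depends on context: if half of all players were users, the same test would make a positive result 95% reliable.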
3. Clusters and Patterns
Geologists look for patterns in nature because patterns can give clues to special structural situations and mineral deposits. Other kinds of clusters or patterns may be of concern also, for instance, the apparent high incidence of childhood leukemia at Fort Huachuca. Such clusters must be investigated to see if a cause can be identified. But sometimes, such clusters occur just by chance.
In the figure below, we see an array of red dots superimposed over a geologic map of Arizona. Let’s say for now that the dots represent high copper values obtained from assays of stream sediment samples. The array of dots shows some apparent clusters and patterns that may indicate some cause of interest.
What might catch a geologist’s eye is the line of dots extending from the northwest to the southeast, exactly along the Mogollon Rim, which is a structural separation between the lowlands of the southwest and the highlands of the Colorado Plateau. There also is a cluster near Ajo, site of a copper deposit. There is a dot near Rosemont, another copper deposit, and a dot in the Galiuro Mountains, again an area with copper deposits. Dots also occur near uranium and coal deposits on the Colorado Plateau and near gold deposits in western Arizona.
So, do the dots actually have significance? No, the dots are not copper assays; they represent random numbers. On my computer I generated 100 random numbers and normalized them to values between 1 and 100. I used the first 50 as the X-coordinates and paired them with the second 50 for the Y-coordinates and made a scatter plot of the data. The dots have no significance at all. The patterns occur just by chance, but our rationalizations can give them meaning when there is none.
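For readers who want to try this themselves, here is a minimal Python sketch of the same idea. It is not the procedure I actually ran: the min-max scaling, the seed, and the 5-by-5 grid used to count dots are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_dots(n_points=50, seed=None):
    """Make n_points random (x, y) pairs scaled to the range 1 to 100."""
    rng = random.Random(seed)
    raw = [rng.random() for _ in range(2 * n_points)]
    low, high = min(raw), max(raw)
    # Assumed normalization: min-max scaling onto the interval [1, 100].
    scaled = [1 + 99 * (value - low) / (high - low) for value in raw]
    # First half supplies the X-coordinates, second half the Y-coordinates.
    return list(zip(scaled[:n_points], scaled[n_points:]))


def grid_counts(points, cells=5):
    """Count the dots falling in each cell of a cells-by-cells grid."""
    width = 99 / cells
    counts = Counter()
    for x, y in points:
        col = min(int((x - 1) / width), cells - 1)
        row = min(int((y - 1) / width), cells - 1)
        counts[(col, row)] += 1
    return counts


dots = random_dots(seed=2010)   # the seed is arbitrary
counts = grid_counts(dots)
print("Busiest cell holds", max(counts.values()), "dots")
print("Empty cells:", 5 * 5 - len(counts))
```

With 50 dots spread over 25 cells, an even spread would put two dots in each cell. Random numbers rarely oblige: some cells typically end up crowded and others empty, and those are the "clusters" and "gaps" the eye picks out.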
So, be skeptical. Nothing can be proved with statistics, but sometimes statistics help us look in the right direction.