Prepared by J. Laurie Snell, Bill Peterson, Jeanne Albert, and Charles Grinstead, with help from Fuxing Hou and Joan Snell.
We are now using a listserv to send out Chance News. You can sign on or off or change your address at this Chance listserv. This listserv is used only for mailing and not for comments on Chance News. We do appreciate comments and suggestions for new articles. Please send these to:
jlsnell@dartmouth.edu
The current and previous issues of Chance News and other materials for teaching a Chance course are available from the Chance web site.
Chance News is distributed under the GNU General Public License (so-called
'copyleft'). See the end of the newsletter for details.
Our new Constitution is now established, and has an appearance that promises permanency; but in this world nothing can be said to be certain, except death and taxes.
Benjamin Franklin (1706-90) in a letter to Jean-Baptiste
Leroy, 1789.
Contents of Chance News 11.01
1. Forsooth.
2. Court statistics.
3. St. John's wort vs Placebo for depression: A not so definitive
definitive clinical trial.
4. Height and earnings: Walk tall.
5. New book uses statistical methods to analyze avant-garde art.
6. Fingerprints.
7. Are you suffering from Security Obsession Syndrome?
8. The violence connection.
9. Error in study on soot in air and deaths--know what your statistical
package does!
10. Gerd Gigerenzer's new book: Calculated Risks: How To Know When Numbers Deceive You.
11. Experts strive to put diseases in proper perspective.
12. Roger Johnson sends a probability problem from Dean Koontz's novel From the Corner of His Eye.
13. Another great "Beyond the Formula Statistics Conference", August 8-9, 2002.
14. Edward Kaplan sends a correction and two interesting articles.
15. Don Poe comments on Marilyn's discussion of tossing coins in sports.
Here are some Forsooth items from the May 2002 issue of RSS News.
Richard Davidson, European strategist at Morgan Stanley, says the correlation between Europe and the US stock markets is now 0.75; - in other words, for every 10 per cent the S&P 500 moves, the Eurotop 300 moves 7.5 per cent.
Financial Times, 12 January 2002
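Note: The confusion here is between a correlation and a regression slope. The slope for predicting Eurotop returns from S&P 500 returns is the correlation times the ratio of the two standard deviations, so a correlation of 0.75 corresponds to a 7.5% move only if the two markets happen to be equally volatile. Here is a minimal sketch with simulated data (all of the numbers are invented purely for illustration):

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical daily returns: correlation about 0.75, but the European
    # series is given (arbitrarily) half the volatility of the S&P 500.
    n = 10_000
    sp500 = rng.normal(0, 1.0, n)
    noise = rng.normal(0, 1.0, n)
    eurotop = 0.5 * (0.75 * sp500 + np.sqrt(1 - 0.75**2) * noise)

    r = np.corrcoef(sp500, eurotop)[0, 1]
    slope = r * eurotop.std() / sp500.std()   # regression slope of Eurotop on S&P

    print(f"correlation ~ {r:.2f}")           # ~0.75
    print(f"regression slope ~ {slope:.2f}")  # ~0.38, not 0.75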
Voters consider Labour's image to be sleazier than that of the Tories, according
to a survey. The Sunday Times poll said 60% thought Labour gave the impression
of being 'sleasy and disreputable' compared to 40% who thought that of the Tories.
Ceefax (BBC) p122, 17 February 2002
Note: The poll referred to here was carried out by the British internet polling organization YouGov. Those who register on its web site and answer a poll are paid 50p for each poll or survey they answer. On some polls they are also eligible for a prize. For example, on the current poll, which happens to be on the British National Lottery, they can win a £500 prize (or one of two £50 runner-up prizes).
In the survey related to the above Forsooth item, they asked: "Do you agree or disagree with the following statement: The Conservatives these days give the impression of being very sleazy and disreputable?" and then asked the same question about Labour.
DISCUSSION QUESTIONS:
(1) Why the Forsooth!?
(2) What do you think about this kind of polling?
The first article discusses the ruling of the United States Court of Appeals for the Ninth Circuit that the words "under God" in the Pledge of Allegiance are unconstitutional. It discusses why this court is so often "wrong". It begins with:
Over the last 20 years, the Court of Appeals for the Ninth Circuit has developed a reputation for being wrong more often than any other federal appeals court.
It then discusses two possible explanations: (a) the court is especially liberal and (b) it is unusually big.
Court statistics is a letter to the editor from a judge on the Ninth Circuit responding to this explanation.
To the Editor:
Re "Court That Ruled on Pledge Often Runs Afoul of Justices" (front page, June 30), about the United States Court of Appeals for the Ninth Circuit:
If a fallible human being has a 1 percent error rate and does 100 problems, he will get 1 problem wrong. If he does 500 problems, he will get 5 wrong. If a second person does only 100 problems, he will make four fewer mistakes than a person who does 500 problems. This does not make him more accurate.
In the calendar year 2001, the Ninth Circuit terminated 10,372 cases, and was reversed in 14, with a correction rate of 1.35 per thousand. The Fourth Circuit, reputedly the most conservative circuit and the circuit with the second-largest number of cases reviewed by the Supreme Court, terminated 5,078 cases and was reversed in 7, making a correction rate of 1.38 per thousand.
JOHN T. NOONAN JR.
U.S. Circuit Judge, 9th Circuit
San Francisco, July 1, 2002
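A quick check of the arithmetic in the letter, using the figures Judge Noonan quotes:

    # Reversal ("correction") rates per thousand terminated cases in 2001.
    ninth = 14 / 10_372 * 1000    # ~1.35 per thousand
    fourth = 7 / 5_078 * 1000     # ~1.38 per thousand
    print(f"Ninth Circuit:  {ninth:.2f} per thousand")
    print(f"Fourth Circuit: {fourth:.2f} per thousand")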
In 1993 Congress established the National Center for Complementary and Alternative Medicine (NCCAM) within the National Institutes of Health (NIH) for the purpose of supporting clinical trials to evaluate the effectiveness of alternative medicine.
The news release and the JAMA article report the results of a study designed to see if the herb St. John's wort is effective in treating moderately severe cases of depression. It was a $6-million study and the first multi-center clinical trial funded by the National Center for Complementary and Alternative Medicine, the National Institute of Mental Health (NIMH), and the Office of Dietary Supplements (ODS). In the October 1, 1997 NIH news release announcing the study, the Director of the NIMH stated:
This study will give us definitive answers about whether St. John's wort works for clinical depression. The study will be the first rigorous clinical trial of the herb that will be large enough and long enough to fully assess whether it produces a therapeutic effect.
In the April 9, 2002 NIH news release we read:
An extract of the herb St. John's wort was no more effective for treating major depression of moderate severity than placebo, according to research published in the April 10 issue of the Journal of the American Medical Association. The randomized, double-blind trial compared the use of a standardized extract of St. John's wort (Hypericum perforatum) to a placebo for treating major depression of moderate severity. The multi-site trial, involving 340 participants, also compared the FDA-approved antidepressant drug sertraline (Zoloft®) to placebo as a way to measure how sensitive the trial was to detecting antidepressant effects.
The Coordinating Center for the St. John's wort trial was led by Dr. Davidson
and his colleagues at the Duke University
Medical Center, Department of Psychiatry and Behavioral Sciences, the Duke
Clinical Research Institute, and the Research Triangle Institute.
Lilly's Prozac and Pfizer's Zoloft are trade names for two of the most popular drugs approved by the FDA to treat depression. The article uses the technical names "sertraline" for Zoloft and "Hypericum perforatum" for St. John's wort, but we often replace these with the popular names in discussing the results.
Here is an abbreviated summary of the design of the clinical trial as presented in the JAMA article:
Context Extracts of Hypericum perforatum (St John's wort)
are widely used for the treatment of depression of varying severity. Their efficacy
in major depressive disorder, however, has not been conclusively demonstrated.
Objective To test the efficacy and safety of a well-characterized
H perforatum extract (LI-160) in major depressive disorder.
Design and Setting Double-blind, randomized, placebo-controlled trial conducted in 12 academic and community psychiatric research centers.
The outcome measures are scores based on a number of different self-assessment and clinician-assessment tools, indicated by acronyms such as HAM-D, CGI-I, GAF, CGI, BDI, and SDS. For all but the GAF, lower scores are better.
Efficacy End Points The prospectively defined primary efficacy measures were the change in the 17-item HAM-D total score from baseline to week 8 and the incidence of full response at week 8 or early study termination. Full response was defined as a CGI-I score of 1 (very much improved) or 2 (much improved) and a HAM-D total score of 8 or less. Partial response was defined as a CGI-I score of 1 or 2, a decrease in the HAM-D total score from baseline of at least 50%, and a HAM-D total score of 9 to 12. Secondary end points comprised the GAF, CGI, BDI, and SDS scores.
The authors give the following graph of the weekly HAM-D total scores:
Graphed values are means with vertical bars extending to one SD.
The authors report that the linear trends with time did not differ significantly by treatment (St. John's wort vs. placebo p = .59, Zoloft vs. placebo p = .18).
The results of the three treatments as related to clinical response were:
Response            St. John's wort (n = 113)   Placebo (n = 116)   Zoloft (n = 109)
Any response        43 (38.1%)                  50 (43.1%)          53 (48.6%)
Full response       27 (23.9%)                  37 (31.9%)          27 (24.8%)
Partial response    16 (14.2%)                  13 (11.2%)          26 (23.9%)
No response         70 (61.9%)                  66 (56.9%)          56 (51.4%)
The authors report that there were no significant differences in full response between St. John's wort and placebo (p = .21) or between Zoloft and placebo (p = .26).
Finally, the authors give the estimated mean changes in the assessment scores, along with p values for comparing each treatment with placebo. Lower scores denote greater improvement except for the GAF.
Summary measure                          St. John's wort   Placebo   Zoloft    P (SJW vs placebo)   P (Zoloft vs placebo)
HAM-D total score                        -8.68             -9.20     -10.53    .59                  .18
GAF score (week 8 - baseline)            12.31             11.76     14.16     .75                  .17
BDI total score (week 8 - baseline)      -7.84             -6.83     -8.75     .43                  .15
SDS total score (acute exit - baseline)  -3.34             -3.19     -4.01     .88                  .39
CGI-S total score (week 8 - baseline)    -1.07             -1.19     -1.47     .48                  .11
CGI-I score                              2.50              2.45      2.04      .75                  .02
On all but one of the measures, neither St. John's wort nor Zoloft performed significantly better than placebo. The exception was that Zoloft performed significantly better than placebo on the CGI-I score.
The authors state their conclusion from this study as:
Conclusion This study fails to support the efficacy of St John's wort in moderately severe depression. The result may be due to low assay sensitivity of the trial, but the complete absence of trends suggestive of efficacy for St John's wort is noteworthy.
This conclusion and the headline of the NIH news release, "Study shows St. John's wort ineffective for major depression of moderate severity," led to similar headlines in the newspapers. Given the authors' finding of no significant differences in the primary measures of success, one could just as well replace St. John's wort by Zoloft in both the authors' conclusion and the NIH headline.
Of course, the NIH should know better: the authors' conclusion that the study failed to support the efficacy of St. John's wort is not the same as saying that St. John's wort is ineffective. Since studies have shown that Zoloft is more effective than placebo (see for example "Sertraline safety and efficacy in major depression," Biological Psychiatry, 1995) and the same is true for St. John's wort ("St. John's wort for depression--an overview and meta-analysis of randomised clinical trials," British Medical Journal 1996;313:253-258), the most likely explanation is that the study did not have sufficient power to detect the differences.
The way this study was reported makes one worry about the role of subjectivity in presenting the results of a study. The close connections that researchers have these days with the subject of their research make this especially worrisome. For example, from the Financial Disclosure section of the study we find the following disclosure for the primary investigator Jonathan R.T. Davidson:
Financial Disclosure: Dr Davidson holds stock in Pfizer, American Home Products, GlaxoSmithKline, Procter and Gamble, and Triangle Pharmaceuticals; has received speaker fees from Solvay, Pfizer, GlaxoSmithKline, Wyeth-Ayerst, Lichtwer, and the American Psychiatric Association; has been a scientific advisor to Allergan, Solvay, Pfizer, GlaxoSmithKline, Forest Pharmaceuticals Inc, Eli Lilly, Ancile, Roche, Novartis, and Organon; has received research support from the National Institute of Mental Health (NIMH), Pfizer, Solvay, Eli Lilly, GlaxoSmithKline, Wyeth-Ayerst, Organon, Forest Pharmaceuticals Inc, PureWorld, Allergan, and Nutrition 21; has received drugs for studies from Eli Lilly, Schwabe, Nutrition 21, PureWorld Botanicals, and Pfizer; and has received royalties from MultiHealth Systems Inc, Guilford Publications, and the American Psychiatric Association.
Other investigators for the study have similar financial disclosures.
DISCUSSION QUESTIONS:
(1) The authors say:
Sample-size calculations were based on detecting a difference in full-response rates at 8 weeks, assuming full-response rates of 55% for hypericum and 35% for placebo. Accordingly, a sample size of 336 patients (112 per group) was specified to ensure 85% power with a type I error rate of 5% (2-sided).
Does this suggest that the study should have had sufficient power to detect significant differences between the performance of St. John's wort and placebo and of Zoloft and placebo? (See the sketch after these questions.)
(2) The authors introduced Zoloft only to evaluate the study's sensitivity, so they did not include a comparison of the performance of St. John's wort and Zoloft. If they had, what do you think the result would have been?
(3) The design of the study "Sertraline safety and efficacy in major depression: a double-blind fixed-dose comparison with placebo," Fabre et al., Biological Psychiatry, 1995;38:592-602, is quite similar to that of the present study, and yet it did show a significant difference between Zoloft and placebo. Look at this study and see if you can explain how the difference might have arisen.
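To follow up on question (1), here is a minimal sketch of the power calculation, using a normal approximation for comparing two proportions. This is our own calculation, not the authors', and the second set of rates is simply the observed full-response rates from the table above.

    from math import sqrt
    from statistics import NormalDist

    def power_two_proportions(p1, p2, n, alpha=0.05):
        """Approximate power of a two-sided z test comparing two
        proportions with n subjects per group."""
        z = NormalDist()
        z_alpha = z.inv_cdf(1 - alpha / 2)
        p_bar = (p1 + p2) / 2
        se0 = sqrt(2 * p_bar * (1 - p_bar) / n)            # SE under H0
        se1 = sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)  # SE under H1
        return z.cdf((abs(p1 - p2) - z_alpha * se0) / se1)

    # Rates assumed in the authors' sample-size calculation:
    print(power_two_proportions(0.55, 0.35, 112))    # roughly 0.85

    # Rates actually observed (St. John's wort 23.9%, placebo 31.9%):
    print(power_two_proportions(0.239, 0.319, 112))  # far lower

With the assumed 55% vs. 35% response rates the power is indeed about 85%, but for differences as small as those actually observed the power is much lower.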
Al Gore was taller than George W. Bush, but in 10 of the last 13 presidential elections the taller candidate has won. The article features a picture of Abe Lincoln, our tallest president, along with a series plot comparing the heights of US presidents with the average heights of white males at the corresponding points in history. It does appear that presidents tend to be taller than average.
In the past, research has indicated that height is also an advantage in the workplace. Three University of Pennsylvania economists recently investigated this phenomenon using two large datasets, Britain's National Child Development Study and America's National Longitudinal Study for Youth. You can download the full 38 page pdf version of their research report. As expected, height and earnings were positively associated. Overall, for white British males, an additional inch of height was associated with a 1.7% increase in wages; the corresponding advantage for white American males was 1.8%. In both countries, the shortest quarter of workers averaged 10% lower earnings than the tallest quarter.
The intriguing twist in the new study is suggested by the article's subtitle: "Why it pays to be a lanky teenager." The researchers discovered that most of the earnings advantage can be attributed to height at age 16. Differences in height later in life or in childhood do not add much additional explanatory power. The researchers proposed social adjustment as a possible explanation. Taller teens may have an easier time participating in athletics and other activities, where they develop positive self-esteem and social skills that contribute to later success.
You can also listen to an interview from ABC Perth (Australia) with Andrew Postlewaite, one of the study's three authors. Postlewaite explains that the study focused on white males to prevent confounding with gender and race effects. He reports that a height effect was also found for women, but that the teenage height differences do not completely explain adult earnings differences. For black males, the height effect was also observed, but the sample was not large enough to give statistical significance.
(1) The article concludes with the following observation: "The differences in schools and family backgrounds of tall and short youths are tiny compared with those of white and black youngsters. If a teenage sense of social exclusion influences future earnings, it may have great implications for youngsters from minority groups." What implications do you see for minority groups?
(2) (inspired by Rick Cleary) If the taller candidate has won ten of the last thirteen elections, what do you suspect might be true of the last fourteen? What are the implications for assessing whether there really is a height effect?
What happens when statistics meets art history? University of Chicago economist David Galenson found out that a quantitative perspective isn't always appreciated. His new book Painting Outside the Lines: Patterns of Creativity in Modern Art (Harvard University Press) is a statistical analysis of creativity in avant-garde painting. He tried to publish his research in art journals, but he reports that editors rejected his papers without even sending them out for review.
Using regression analysis, Galenson identified two distinct forms of innovation: "experimental" and "conceptual". The innovations of experimentalists, such as Paul Cezanne, evolve gradually over time, as new techniques are developed and refined. By contrast, artists like Pablo Picasso seem to innovate in quantum leaps. Using the language of market economics, Galenson views artistic creativity and skill as goods, for which artists are compensated with recognition in their community. Many years later, such recognition translates into value placed on the paintings (the artists themselves presumably may never see this). Galenson found that the highest valuations for experimentalist paintings tended to be for work produced towards the end of the artist's career, whereas conceptualists tended to produce highly valued work earlier.
Predicting that critically acclaimed works will eventually command higher prices may not sound controversial. The real shock comes from Galenson's claim that by looking at price data he is able to use his model to "predict what the artist said about his work and how he made preparatory drawings." This may be too radical a conceptual shift for the art community. University of Kentucky art historian Robert Jensen explains that scholars in his field see their work as "the study of incommensurable art objects, these unique things in space and time... We've lost the capacity to generalize about the whole history of art."
People want to be viewed as individuals, not as "statistics", which can lead them to resist findings from epidemiological research. How does this compare with the reaction that art historians are having to Galenson's work?
As a result of this ruling, lawyers may request a "Daubert hearing" if they believe that the above standards should apply in a given case but one or more of the criteria have not been met. This is precisely what Mitchell's defense lawyers did, arguing that fingerprint matching did not meet the standards--particularly number 5. Although Pollak initially sided with Mitchell, he later reversed his decision. Nevertheless, the reliability of fingerprint evidence has never received such public scrutiny and criticism. To this day there is no accepted estimate for the probability that a portion of one person's fingerprint matches a portion of another person's print, nor is there a known error rate in the ability of fingerprint experts to identify a match.
Perhaps the most influential figure in the history of fingerprints is Sir Francis Galton. To this day, the portions of a fingerprint deemed the most important for identification purposes are called "Galton characteristics" or "Galton minutiae". There are three main ridge pattern types that appear on fingers: loops (~65%), whorls (~30%), and arches (~5%), and these are further divided into subcategories. Roughly speaking, the minutiae are features at the borders of the above types and subtypes, and are formed when ridges are added or lost. For example, a ridge line may end abruptly, fork into two or more lines, or join another line via a bridge. The presence (or absence, for that matter) of Galton characteristics has been at the heart of virtually all fingerprint identification methods.
Galton was also the first to quantify the uniqueness of fingerprints. In an attempt to determine the chance that two fingers exhibit the same overall pattern, he proceeded as follows. Using enlargements of fingerprint photographs, he let square pieces of paper of various sizes fall at random onto the photographs. Galton then attempted to fill in the squares, using only the ridge patterns that he could see at the edges. His goal was to determine the size of a square so that he could correctly infer the ridge pattern inside with probability .5. Through repeated experiments he determined that the appropriate size was five ridges on a side. There were 24 such regions per enlargement, so assuming independence of the square regions, he calculated that the chance of obtaining a specific print configuration, given the surrounding ridges, was P(C/R) = (1/2)^24.
Galton next somewhat arbitrarily determined that P(R) = the chance that the
surrounding ridge pattern would occur = (1/16)(1/256); the first factor estimates
general fingerprint pattern type, and the second estimates the chance that the
appropriate number of ridges enter and exit the 24 regions. Using Galton's model
it follows that the probability of finding any given fingerprint is
P(FP) = P(C/R) P(R) = [(1/2)^24](1/16)(1/256) = 1.45 * 10^(-11).
In Galton's day the world population was approximately 1.6 billion, yielding
16 billion fingers. Using the above figure he concluded that given a particular
finger, the chance of finding another finger with the same ridge pattern was
1/4.
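A quick check of Galton's arithmetic, using only the figures quoted above:

    # Galton's estimate of the probability of a particular fingerprint.
    p_config_given_ridges = (1 / 2) ** 24   # 24 independent five-ridge squares
    p_ridges = (1 / 16) * (1 / 256)         # pattern type and ridge-count factors
    p_fingerprint = p_config_given_ridges * p_ridges
    print(f"P(FP) ~ {p_fingerprint:.2e}")   # ~1.45e-11

    fingers = 1.6e9 * 10                    # ~16 billion fingers in Galton's day
    print(f"expected matches ~ {fingers * p_fingerprint:.2f}")  # ~0.23, about 1/4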
A more recent approach (Osterburg et al., JASA, December 1977, Vol. 72, No. 360) uses a similar overall method: a fingerprint is overlaid with a grid of one-millimeter squares, and each cell is either empty or contains one or more minutiae (13 possibilities in all). The difference here is that probabilities for each of the 13 possible "states" for the cells are carefully estimated through examination of actual prints. For example, an empty cell is by far the most likely (76.6%), while a fork has observed frequency 3.82%. Again assuming independence (this assumption is discussed later in the article), the probability of a particular configuration may be found by multiplying the appropriate probability estimates.
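For illustration, here is how such a product is formed under the independence assumption. The two cell frequencies are the ones quoted above; the cell counts are invented purely for illustration and are not from the article.

    # Probability of one hypothetical fragment: 60 empty cells and 4 forks,
    # with the cell states treated as independent.
    p_empty, p_fork = 0.766, 0.0382
    n_empty, n_fork = 60, 4
    p_configuration = p_empty**n_empty * p_fork**n_fork
    print(f"P(configuration) ~ {p_configuration:.2e}")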
Apparently most of the more recent studies have been ignored by the criminal justice community.
(1) Galton's model has been criticized by several authors (see below). Most
find fault with Galton's assignment of a 1/2 probability of a particular ridge
configuration given a surrounding, five ridge pattern. What do you think? What
happens if the region is shrunk to, say, one ridge square?
(2) Do you think Galton accepted the 1/4 figure? Do you?
(3) The Mitchell case raised questions concerning the abilities of fingerprint "experts". (In fact, in some studies a 22% false positive rate has been observed.) How would you devise a test to determine a person's ability to match whole or partial fingerprints?
(4) Do you think fingerprint evidence should be admissible in court? Why or
why not?
Although violent crime has remained relatively rare in society, investments in home alarm systems and self-defense training are on the rise. Spectacular media coverage of crime may be partly to blame for fueling people's perception that they are ever more at risk. According to the article, one in seven people fear they will be mugged, while the British Crime Survey shows that only one in 200 will actually become a victim.
Psychologists have coined the term "Security Obsession Syndrome" (SOS) for the new preoccupation with safety. People are literally worrying themselves sick. Health officials report that the stress associated with being constantly on guard ultimately leads to physical symptoms, such as high blood pressure and cholesterol, backache, insomnia, and skin problems.
The article concludes with a number of recommendations to help people keep a healthy perspective on crime. One victims' support group urges people to be aware of their body language, because "We know from anecdotal evidence that people who look vulnerable are more likely to attract victims and become victims of crime."
(1) Over what time period do you think the one in 200 risk of being mugged
applies? Why does it matter?
(2) Do we ever "know" anything from anecdotal evidence? Can you suggest any way to get a more scientific perspective on the last point?
Children's groups have long warned of links between watching violent television programs and violent behavior in real life. Research by L. Rowell Huesmann, a social psychologist at the University of Michigan, confirms that these concerns are well-founded. Moreover, Huesmann found that the adverse effects now apply to girls as well as boys.
Huesmann's study began in 1977 with a group of 329 boys and girls, who were then in elementary school. The investigators interviewed the children, and developed a measurement scale for television viewing that combined both time spent watching and level of violence. Children in the top 20% were designated "heavy watchers." In 1992, the investigators followed up with new interviews of the subjects, who were now in their twenties. They also searched to find if any of the subjects had criminal records.
After controlling for such variables as socioeconomic status and intelligence, the investigators found that the "heavy watchers" had tended to become more aggressive adults. As described in the article, women who had been heavy watchers as children were twice as likely as the other women in the group to have hit or choked another adult in the last year. Men who had been heavy watchers were three times as likely to have been convicted of a crime. Among the women who had married, the heavy watchers were twice as likely to have thrown an object at their spouse in the last year. Among the men who had married, heavy watchers were almost twice as likely to have grabbed or pushed their spouse in the last year.
In 1999, Huesmann presented his case before a US Senate Committee on Commerce, Science and Transportation. You can download a pdf transcript of his testimony, "Violent video games: Why do they cause violence and why do they sell?" There he draws parallels between violent TV programming and tobacco. For example, he notes that both are widely distributed and attractive to children; both have been linked to undesirable effects in numerous longitudinal studies; and for both a dose-response relationship has been observed. Nevertheless, in both cases our inability to conduct a controlled experiment has allowed industry advocates to perpetuate doubts about any causal link.
The Health Effects Institute (HEI) was established in 1980 as an unbiased research group to look into the health effects of motor vehicle emissions. Its funding typically comes half from the Environmental Protection Agency (EPA) and half from the automobile companies. Its research has concentrated on determining the health effects of small particles, called particulate matter (PM), in the air that we breathe. These particles come primarily from industrial plants such as steel mills and power plants, from automobiles and trucks, and from coal and wood burners. Under the Clean Air Act established by Congress in 1970, the EPA has developed regulations limiting the amount of particulate matter in the air. The size of the particles involved is measured in microns (a micron is a millionth of a meter). The regulations involve particulate matter up to 10 microns (PM10), called "coarse particles", and up to 2.5 microns (PM2.5), called "fine particles". 10 microns is about 1/7 the diameter of an average human hair, and 2.5 microns is about 1/30 that diameter. Both coarse and fine particles have been shown to cause health problems, but the fine particles are particularly dangerous because they easily enter the lungs. More information about particulate matter and the EPA regulations can be found here.
The studies that led up to the 1997 EPA regulations were typically time-series studies of specific cities which identified an association between daily changes in the concentration of particulate matter and the daily number of deaths (mortality). They also showed increased hospitalization (a measure of morbidity) among the elderly for specific causes associated with particulate matter. However, they did not demonstrate causality, and the limitation to specific cities left questions about generalization to the general public. To help answer these questions the HEI carried out a large study involving the 90 largest cities. They included data on the amounts of other pollutants and on the weather to determine whether the health effects could have other explanations. The results of their study were reported in articles in the Journal of the Royal Statistical Society, Series A, the American Journal of Epidemiology, the New England Journal of Medicine, and JASA. Detailed references can be found here. They concluded:
(1) There is strong evidence of an association between acute exposure to particulate air pollution (PM10) and daily mortality one day later.
(2) This association is strongest for respiratory and cardiovascular causes of death.
(3) This association cannot be attributed to other pollutants, including NO2, CO, SO2, or O3, nor to weather.
(4) The average particulate pollution effect across the 90 largest U.S. cities was a 0.41% increase in daily mortality per 10 micrograms per cubic meter of PM10.
Recently, while looking again at their data, Scott Zeger found that in cases where he expected the statistical software to give different results, it did not. He looked into this, and here is his explanation of what was wrong, as reported in the Science article. The researchers were using the Generalized Additive Model (GAM), a multivariate regression model particularly well suited to their problem of disentangling the role of the particles from other confounding factors. The software carried out an iterative fitting process that was supposed to stop when successive results differed by less than a tolerance epsilon. The default value for epsilon was 1/1000. However, the researchers were dealing with tiny changes in daily rates, so they should have used a smaller value of epsilon. When they changed epsilon to a much smaller number, they got slightly different results. Thus they had to redo all their calculations. When they did this, they found that the 0.41% increase in finding (4) became 0.27%. Thus they had a 34% decrease in their estimate of the overall increase in daily mortality per 10 micrograms per cubic meter of PM10.
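The effect of a loose convergence criterion can be illustrated with a toy iterative fit. This is only a sketch, not the software the researchers used: the point is that when the quantity being estimated is itself tiny, stopping as soon as successive iterates differ by less than 1/1000 can leave an error of the same order as the estimate.

    def iterate_until_converged(update, start, eps):
        """Run the fixed-point iteration x <- update(x) until successive
        iterates differ by less than eps."""
        x = start
        while True:
            x_new = update(x)
            if abs(x_new - x) < eps:
                return x_new
            x = x_new

    # Toy problem whose exact answer is 0.0041, the same order of magnitude
    # as the pollution effect being estimated.
    update = lambda x: 0.5 * x + 0.5 * 0.0041

    loose = iterate_until_converged(update, start=1.0, eps=1e-3)
    tight = iterate_until_converged(update, start=1.0, eps=1e-8)
    print(f"loose tolerance (1e-3): {loose:.5f}")  # noticeably above 0.0041
    print(f"tight tolerance (1e-8): {tight:.5f}")  # essentially 0.0041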
The authors have re-analyzed the data, and they report that the results related to mortality reported in (1), (2), and (3) above remain valid. You can see the results of this re-analysis here.
The following graphic from this report shows the city-by-city estimates; its caption reads:
Maximum likelihood estimates and 95% confidence intervals of the log-relative rates of mortality per 10 micrograms per cubic meter increase in PM10 for each location. The solid squares with the bold segments denote the posterior means and 95% posterior intervals of the pooled regional effects. At the bottom, marked with a triangle, is the overall effect of PM10 for all the cities, for the 88 U.S. cities (Honolulu and Anchorage are excluded).
Note that the overall effect is just barely significant. The authors also present a second, more convincing graphic illustrating the results of a Bayesian analysis. Its caption reads:
Marginal posterior distributions for the re-analyzed pooled effects of PM10 at lag 1 for total mortality, cardiovascular-respiratory mortality and other causes mortality, for the 90 U.S. cities. The box at the top right provides the posterior probability that the overall effects are greater than 0.
The Science article comments that this error gives new ammunition to industry groups that have criticized the science behind the present federal air pollution rules. It also serves as a warning to researchers that they need to completely understand how a statistical package carries out its calculations. Obviously, this experience of the Johns Hopkins researchers will lead others to take a second look at their results if they used the same GAM software with its default settings.
(1) Will this discovery give support to those who claim that caution should be used with Excel because of known errors in its calculations?
(2) The Times article writes
In the original analysis, the rise was .4 percent above the typical mortality rate for each jump of 10 micrograms of soot per cubic meter of air. In the new analysis, the increase is half that.
What would you like to know to get a better idea about the risk to you of particulate matter? (See the next item.)
There are two popular ideas as to why the general public has trouble understanding the probability and statistics that they meet in their daily lives. One is that people who employ statistics often have clever ways to lie with statistics. Another is that statistical concepts are just too difficult for the general public to understand. The first has led to a number of books that tell you how people lie with statistics, though none quite as good as Huff's classic book How to Lie with Statistics. The second has led to a number of books that try to teach the general public standard probability and statistical concepts the way they are taught in statistics books, but using English instead of formulas when possible.
This book takes a different approach. The author suggests that the real problem is not that those who report statistical results generally lie, but that they simply do not present their results in the most informative manner, and that those who try to explain probability and statistics do not do so in the way that is easiest for the public to understand. We found a striking example of this while researching our previous article. We went to the researchers' web site, where we found an FAQ page related to the error in determining the risk of small particles. Recall that the researchers had reported a relative risk of .41% for mortality, which became .27% when the error was corrected. In the FAQ we read:
Question: But doesn't the decrease from 0.41% to 0.27% represent a 35% decrease in your assessment of the effect of PM10 on mortality?
Answer: Yes, this is one way to look at our result. However, the implications may be less than is implied by a 35% reduction in the relative rate. Although we use relative changes to measure the association of air pollution with mortality, for public health purposes, it is the change in absolute risk associated with pollution that is more important. In our case, the change in risks due to implementing the improved algorithm is 0.41-0.27=0.14% per 10 micrograms per m3 of PM10.
To understand these changes, consider the city of Baltimore where there are roughly 20 deaths per day or 7,300 deaths per year. If we could reduce PM10 in Baltimore from the current average value of 35 down to 25 micrograms per m3, our prior estimate of 0.41% corresponds to saving 30 lives per year from the acute effects alone. Our updated estimate would correspond to 20 deaths, a change of 10 deaths per year.
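The Baltimore arithmetic is easy to reproduce (a sketch using only the figures quoted in the FAQ):

    deaths_per_year = 20 * 365       # roughly 7,300 deaths per year in Baltimore
    pm10_reduction = 35 - 25         # assumed drop of 10 micrograms per cubic meter

    for effect in (0.0041, 0.0027):  # 0.41% and 0.27% per 10 micrograms/m^3
        lives = deaths_per_year * effect * (pm10_reduction / 10)
        print(f"effect {effect:.2%}: about {lives:.0f} deaths per year")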
Readers of this book would naturally ask: why not use the clearer method of explaining the results for public health purposes? A key theme of this book is that absolute risks are often more informative than relative risks. For example, the author mentions that women in Britain have gone through several "Pill scares." In one, an official statement said that combined oral contraceptives containing desogestrel and gestodene are "associated with around a two-fold increase in the risk of thromboembolism" (blockage of a blood vessel by a clot). In terms of absolute risk, the chance of thromboembolism would be reported to increase from 1 to 2 in 14,000 women.
Note that the Johns Hopkins researchers also realize that results are better understood when given in terms of frequencies rather than probabilities. This is an important theme of this book. Here is an example included in the book, based on an experiment carried out by Gigerenzer and Hoffrage (Academic Medicine, Vol. 73, No. 5, May 1998). The authors of this study asked 48 doctors to answer four different classical false-positive questions that would arise naturally in their work. One of these related to screening for breast cancer. The doctors were all told:
To facilitate early detection of breast cancer, women are encouraged from a particular age on to participate at regular intervals in routine screening, even if they have no obvious symptoms. Imagine you conduct in a certain region such a breast cancer screening using mammography. For symptom-free women aged 40 to 50 who participate in screening using mammography, the following information is available for this region.
The doctors were divided into two equal groups and assigned basically the same question in two different forms. Half were given it in the "probability format" and the other half in the "frequency format":
(probability format)
The probability that one of these women has breast cancer is
1%. If a woman has breast cancer, the probability is 80% that she will have
a positive mammography test. If a woman does not have breast cancer, the probability
is 10% that she will still have a positive mammography test. Imagine a woman
(aged 40 to 50, no symptoms) who has a positive mammography test in your breast
cancer screening. What is the probability that she actually has breast cancer?________%
(frequency format)
Ten out of every 1,000 women have breast cancer. Of these 10 women with breast cancer, 8 will have a positive mammography test. Of the remaining 990 women without breast cancer, 99 will still have a positive mammography test. Imagine a sample of women (aged 40 to 50, no symptoms) who have positive mammography tests in your breast cancer screening. How many of these women do actually have breast cancer?_____out of______
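Neither format states the answer, which is the same either way: only about 7.5% of the symptom-free women who test positive actually have breast cancer. Here is a minimal sketch of the calculation, using only the numbers given in the two formats:

    # Probability format: Bayes' theorem.
    p_cancer = 0.01
    p_pos_given_cancer = 0.80
    p_pos_given_no_cancer = 0.10

    p_pos = p_cancer * p_pos_given_cancer + (1 - p_cancer) * p_pos_given_no_cancer
    print(f"probability format: {p_cancer * p_pos_given_cancer / p_pos:.3f}")  # ~0.075

    # Frequency format: 8 true positives out of 8 + 99 positive tests.
    print(f"frequency format:  8 out of {8 + 99} = {8 / (8 + 99):.3f}")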
Only 8% of those given the probability format got it right, while 46% of those given the frequency format got it right. The frequency format was the clear winner on the other questions as well. In discussing this experiment in another article (The psychology of good judgment, Medical Decision Making, 1996;3:273-280), Gigerenzer included an amusing cartoon of the two methods for solving the problem.
Another theme of the book is the illusion of certainty. This occurs in the law, with the idea that matches of ordinary fingerprints and DNA fingerprints demonstrate with certainty that the accused was at the scene of the crime; in medicine, with doctors' diagnoses of ailments; and with presidents who say the economy is in good shape, etc. It is interesting to note that the frequency method, when it uses statements like the one above ("Ten out of every 1,000 women have breast cancer. Of these 10 women with breast cancer, 8 will have a positive mammography test"), might itself be contributing to the illusion of certainty.
Another major theme in this book is risk and making decisions in the face of uncertainty. The author illustrates the many issues here with a case study of screening for breast cancer. The author has clearly done his homework, presenting very up-to-date information on the many studies that have been carried out related to screening for breast cancer. We can think of no better example to show the difficulties in making decisions in the face of uncertainty. Here a false positive can lead to serious depression, further invasive testing, and unnecessary medical treatments. This example provides a wonderful laboratory for all the author's ideas on how best to communicate to doctors and patients what studies have and have not shown about breast cancer screening. See the next item for an elegant application of the methods discussed in this book to presenting cancer risks.
If this book is widely read, it will make a significant contribution to statistical literacy. The author has made a strong case that probability and statistical information is not presented to the general public in the right way, and he has made recommendations on how this can be improved. The book is completely accessible to the general public and will give readers a new appreciation of the role of probability and statistics in their daily lives. It will help them ask the right questions when they have to make decisions under uncertainty. Reading this book will also help researchers, doctors, and news writers transmit information in a way that will be most helpful to the general public. Finally, it will give teachers a new perspective on ways to teach probability and statistics so that the lessons stay with students when they face decisions under uncertainty.
Of course, no single book can expect to solve this problem. However, it is surprising how little has been done to improve statistical literacy in the general public, and this book should make a significant contribution to the important task that we all share.
Postscript:
The author starts and finishes his book with two famous quotations:
...in this world there is nothing certain but death and taxes
Benjamin Franklin
Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write
attributed to H. G. Wells.
About the second quotation Gigerenzer writes in his notes:
This statement is quoted from How to Lie with Statistics (Huff, 1954/1993), where it serves as an epigraph. No reference is given. I have searched through scores of statistical textbooks in which it has since been quoted and found none where a reference was given. I could not find this statement in Wells's work either. Thus, the source of this statement remains uncertain, another example of Franklin's law (our first quote).
We could not resist the challenge to find the origin of this famous quotation. Its origin is described by James W. Tankard (The H.G. Wells quote on statistics: a question of accuracy, Historia Mathematica 6 (1979), 30-33). It turns out to be another example of statistics stealing from mathematics. Tankard states that the earliest occurrence of the quotation in writings about statistics is as an epigraph at the beginning of Helen M. Walker's Studies in the History of Statistical Method. Walker gave the quotation as
The time may not be very remote when it will be understood that for complete initiation as an efficient citizen of one of the new great complex world-wide States that are now developing, it is as necessary to be able to compute, to think in averages and maxima and minima, as it is now to be able to read and write.
and attributed it to H.G. Wells's book Mankind in the Making. Tankard states that this is part of a longer sentence in that book:
The great body of physical science, a great deal of the essential fact of financial science, and endless social and political problems are only accessible and only thinkable to those who have had a sound training in mathematical analysis, and the time may not be very remote when it will be understood that for complete initiation as an efficient citizen of one of the new great complex world-wide States that are now developing, it is as necessary to be able to compute, to think in averages and maxima and minima, as it is now to be able to read and write.
In other words, Walker left out the reference to the need for a sound training in mathematical analysis. Apparently, "statistical thinking" first occurred in the quote in Sam Wilks's presidential address to the American Statistical Association (Undergraduate statistical education, JASA, Vol. 46, pp. 1-18). Here we find the statement:
Perhaps H. G. Wells was right when he said "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write."
Thus the takeover was completed! Tankard remarks that statistics does not occur in the index of any of Wells's autobiographies, and Wells's biographer Lovat Dickson told him that he could not recall any place in Wells's writings where he dealt specifically with statistics.
Incidentally, it is interesting to compare the discussion of undergraduate statistics courses in Wilks's presidential address with that in David Moore's 1998 presidential address, Statistics Among the Liberal Arts, on a similar topic. For example, the word "data" occurs once in Wilks's address and 53 times in Moore's.
(1) Some say that stating a problem using the frequency format is, to some extent, giving away the answer. What do you think about this claim?
(2) Would you be concerned that giving results of experiments in the frequency format would give the public a false sense of the variation in statistics? If so, how could we guard against this and still use the frequency format?
(3) Why do you think researchers and news writers tend to prefer relative risk to absolute risk?
The New York Times article has a good discussion of the need for better ways to present statistical information to the public. This includes issues like relative risk vs. absolute risk, how population risks should be interpreted by individuals, probability vs. frequency methods of giving odds, etc. The commentary by Woloshin and his colleagues uses the frequency approach to provide an impressive chart to help people assess risks of cancer and other causes of death. This is reproduced in the New York Times article in the following elegant form:
Looking at the column for your age, you can find the number of people per 1000 who would be expected to die from a variety of causes, depending on whether or not you are a smoker. Looking at age 80, Laurie Snell finds that his biggest worry is a heart attack, and the same is true for Joan Snell at 75. Bill Peterson, looking at age 45, finds his biggest worry is lung cancer, and Jeanne Albert, looking at age 40, really doesn't have much to worry about. So Joan and Laurie will quit playing tennis in 90-degree weather, Bill will resolve not to take up smoking, and Jeanne will enjoy the good life while it lasts.
(1) The Times article quotes Steven N. Goodman, an epidemiologist at Johns Hopkins School of Medicine, as saying:
One way to think about risk is that each of us has a little ticking time bomb, a riskometer inside us. It says that everyone is at risk and that eating and health habits can raise or lower risk.
The other way comes closer to the way many people experience risk. It says that the risk is not to them but that each individual has a fate: I am either going to survive this cancer or not. Probabilities are uncertainties that the doctor has about what is going to happen to us. And that is a profoundly different way of thinking about risk.
The complication is that probability has both of these meanings simultaneously.
Which way is closest to how you think about risk? How would you describe the way you look at risk?
(2) The Times article reports that Dr. David McNamee, an editor at the Lancet, sent an e-mail to Woloshin saying the charts work for him and commenting:
Foolishly, I started smoking again last year after quitting for about 20 years. I can see it is time to stop again (I am 50 in November), and get back to the gym. I have put the charts on the office wall.
What would be your estimate for the number of people who would quit smoking after looking at this chart in the New York Times article?
We received the following suggestion from Roger Johnson at the South Dakota School of Mines & Technology for Chance News:
In Dean Koontz's novel From the Corner of His Eye Jacob shuffles four decks of new cards. Maria then draws 12 cards - 8 aces followed by 4 jacks of spades. What is the chance of this? Later, we find that Jacob is a "card mechanic" and has deliberately shuffled the decks to produce the first 8 aces but not the 4 jacks of spades:
"The odds against drawing a jack of spades four times in a row out of four combined and randomly shuffled decks were forbidding. Jacob didn't have the knowledge necessary to calculate those odds, but he knew they were astronomical." [p. 234]
With the understanding that the aces were arranged to be drawn first but that, beyond this, the cards were randomly arranged, what is the chance that the 4 jacks of spades are then drawn? In the novel, 2 cards are set aside between drawings so that 22 cards have been drawn at the time the eighth ace is drawn (and, so, 4*52 - 22 or 186 cards remain).
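Under the reading suggested above, the 186 remaining cards are in random order and the dealing scheme fixes the four positions at which the jacks would have to appear; by symmetry the answer does not depend on which four positions those are. A sketch of the calculation:

    from math import comb

    # Chance that 4 specified positions in a random ordering of 186 cards
    # hold exactly the 4 jacks of spades.
    p = 1 / comb(186, 4)
    print(f"probability ~ {p:.1e}  (about 1 in {round(1 / p):,})")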
Regarding this strange draw of cards the author writes:
"Not every coincidence, however, has meaning. Toss a quarter one million times, roughly half a million heads will turn up, roughly the same number of tails. In the process, there will be instances when heads turn up thirty, forty, a hundred times in a row. This does not mean that destiny is at work or that God - choosing to be not merely his usual mysterious self but utterly inscrutable - is warning of Armageddon through the medium of the quarter; it means the laws of probability hold true only in the long run, and that short-run anomalies are meaningful solely to the gullible." [pp. 200-201]
Schilling's (1990) article "The longest run of heads" in the College Mathematics Journal (vol. 21, pp. 196-207) gives the result that in n tosses of a fair coin the expected longest run of heads is nearly ln n/ln 2 - 2/3 with a standard deviation of nearly 1.873 (constant in n). Given this, what do you think of the quote above from Koontz's novel about seeing heads "thirty, forty, a hundred times in a row" in a million tosses? Would you say that Koontz is correct "in spirit"? Why (read Schilling's article)?
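Schilling's approximation makes it easy to see how extreme Koontz's runs would be. A sketch for one million tosses of a fair coin:

    from math import log

    n = 1_000_000
    expected_longest_run = log(n) / log(2) - 2 / 3
    print(f"expected longest run of heads ~ {expected_longest_run:.1f}")  # ~19.3

    # With a standard deviation of about 1.873, a run of 30 heads would be
    # more than 5 standard deviations above the mean; 40 or 100 in a row is
    # essentially impossible in a million tosses.
    print(f"(30 - mean) / sd ~ {(30 - expected_longest_run) / 1.873:.1f}")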
Bob Johnson sent us the following note about the "Beyond the Formula Statistics Conference" held every year at Monroe Community College in Rochester, New York.
We note that this year's keynote speaker is Joan Garfield, a long-time member of the "Chance team." Bob writes:
August 8 & 9, 2002. Save those dates!!
Plan to attend the 6th annual Beyond The Formula Statistics Conference,
Constantly Improving Introductory Statistics: The Role of Assessment.
As always the program features topics in the areas of curriculum, teaching techniques,
technology usage, and applications, along with this year's thread topic
Assessment.
Speakers include: Joan Garfield (Keynote), James Bohan, Beth Chance, Mark Earley, Patricia Kuby, Allan Rossman, John Spurrier
For detailed Information, please visit our web site: http://www.monroecc.edu/go/beyondtheformula/
Professor Kaplan sent us a correction to our discussion in Chance News 11.02 of his article on the risks of travel to Israel. He writes:
Hi -- by "chance" I came across your newsletter, which just happened to mention my Op Ed on travel risks to Israel.
You have one of the numbers wrong -- as did the Jerusalem Post -- the copyeditor changed some figures AFTER I had returned my last version. They corrected it, so now some online versions of the J. Post have the right numbers and some the wrong! However, the numbers in the OR/MS Today version were correct.
I bet some of your readers saw the inconsistency -- 120 deaths over 442 days in a population of 6.3 million does NOT work out to a death risk of 19 per million (per year!). It works out to a risk of 120 * (365.25/442) / 6.3 = 16 per million per year.
At the time I wrote this, I believed I was being conservative -- the 442 days I included were during a period of above-average terror attacks (in both frequency and casualties). Sadly things have gotten worse -- but even if you were to substitute more recent statistics, you would still find the death risk from driving in the US to be just as scary. There is no question that people overestimate the risk of being a terror victim -- just like people overestimate the chance of winning the lottery. The nature of rare events is that, while one can happen every day, the likelihood of one happening to YOU on a given day is very, very small.
Edward H. Kaplan, Ph.D.
William N. and Marie A. Beach Professor of Management Sciences
Professor of Public Health
Yale School of Management
Professor Kaplan has also sent us two papers that we think Chance readers and students would enjoy. The first is a working paper entitled A New Approach to Estimating the Probability of Winning the Presidency, written with Arnold Barnett.
Abstract:
As the 2000 election so vividly showed, it is Electoral College standings rather
than national popular votes that determine who becomes President. But current
pre-election polls focus almost exclusively on the popular vote. Here we present
a method by which pollsters can achieve both point estimates and margins of
error for a presidential candidate's electoral-vote total. We use data from
both the 2000 and 1988 elections to illustrate the approach. Moreover, we indicate
that the sample sizes needed for reliable inferences are similar to those now
used in popular-vote polling.
You can obtain the full text version of this paper here.
Readers may recall that Arnold Barnett gave one of our first and favorite Chance Lectures.
The second paper is March Madness and the Office Pool, co-authored with Stanley J. Garstka, which appeared in Management Science, Vol. 47, No. 2, March 2001, pp. 369-382.
Abstract
March brings March Madness, the annual conclusion to the US men's college basketball season with two single elimination basketball tournaments showcasing the best college teams in the country. Almost as mad is the plethora of office pools across the country where the object is to pick a priori as many game winners as possible in the tournament. More generally, the object in an office pool is to maximize total pool points, where different points are awarded for different correct winning predictions. We consider the structure of single elimination tournaments, and show how to efficiently calculate the mean and the variance of the number of correctly predicted wins (or more generally the total points earned in an office pool) for a given slate of predicted winners. We apply these results to both random and Markov tournaments. We then show how to determine optimal office pool predictions that maximize the expected number of points earned in the pool. Considering various Markov probability models for predicting game winners based on regular season performance, professional sports rankings, and Las Vegas betting odds, we compare our predictions with what actually happened in past NCAA and NIT tournaments. These models perform similarly, achieving overall prediction accuracies of about 58%, but do not surpass the simple strategy of picking the seeds when the goal is to pick as many game winners as possible. For a more sophisticated point structure, however, our models do outperform the strategy of picking the seeds.
You can obtain the full text of this paper here.
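To give a feel for the kind of calculation the abstract describes, here is a toy sketch that computes the expected number of correctly predicted winners in a four-team single-elimination bracket by enumerating all outcomes. The teams, win probabilities, and picks are all invented, and this is not the authors' algorithm, which handles full tournaments efficiently.

    from itertools import product

    # p[(x, y)] is the assumed probability that x beats y.
    p = {("A", "B"): 0.7, ("C", "D"): 0.6,
         ("A", "C"): 0.65, ("A", "D"): 0.75, ("C", "B"): 0.55, ("D", "B"): 0.5}

    def prob(x, y):
        return p[(x, y)] if (x, y) in p else 1 - p[(y, x)]

    picks = ["A", "C", "A"]   # predicted winners: two semifinals, then the final

    expected_correct = 0.0
    for w1, w2 in product(["A", "B"], ["C", "D"]):   # semifinal outcomes
        p_semis = prob(w1, "B" if w1 == "A" else "A") * prob(w2, "D" if w2 == "C" else "C")
        for champ in (w1, w2):                       # final outcome
            p_outcome = p_semis * prob(champ, w2 if champ == w1 else w1)
            correct = (w1 == picks[0]) + (w2 == picks[1]) + (champ == picks[2])
            expected_correct += p_outcome * correct

    print(f"expected number of correct picks ~ {expected_correct:.2f}")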
Don Poe at the American
University sent us the following note:
Laurie -
I love Chance News and often use it as the source of materials
for my statistics courses. I do, however, want to comment on an item in the
current edition (11.02).
Item #10 contains a brief comment by Marilyn vos Savant about the fairness of deciding important sports outcomes by a coin toss. In an addendum she says that such decisions could still be fair, "even if the coin were two-headed or two-tailed, as long as the team that chooses has no information about the coin." Loathe though I am to ever disagree with Marilyn (statisticians who do usually end up with egg on their face), in this case I think I would at least want to add a qualification to her answer. That is, this odd situation would be fair if the team doing the choosing did so completely randomly. But I personally doubt that this is true of human nature.
Imagine the following situation (it's more than a bit contrived,
but bear with me): Dartmouth and Princeton are playing football for the Ivy
League championship. The game is tied after regulation and the rules require
a sudden death playoff where the first team to score wins the game. The teams
are about to have a coin toss to see who gets the ball first, in this case obviously
a tremendous advantage. The unscrupulous referee secretly has a huge bet on
the game's outcome and will stand to win big if Dartmouth wins the game. In
his pocket he has 3 coins - a fair coin, a two-headed coin and a two-tailed
coin. When he is not refereeing Ivy League football games this man is a sports
psychologist who recently read an empirical article that shows that in coin
tosses, the team doing the choosing tends to pick heads significantly more often
than tails. If Dartmouth is calling the coin toss, he uses the two-headed coin
under the assumption that Dartmouth is at least slightly more likely to call
heads. If Princeton calls the toss, he uses the two-tailed coin for the same
reason. According to Marilyn, the referee's choice of the coin to toss could
not possibly have an impact on the outcome since the person calling the toss
has no information. If the empirical article is right (i.e., people do not call
coin tosses randomly), then she is wrong and the referee has changed the odds
in favor of Dartmouth.
This reminded us of an activity that our colleague, psychologist George Wolford, did in our Dartmouth Chance class. He said that he understood that the class had tried ESP experiments and seemed not to have extrasensory perception. However, he would show them that they did. He put down five coins in a row and wrote the sequence of heads and tails on a piece of paper that he put in a sealed envelope. He then asked the students to think very hard and write down what they thought his sequence was. Then he wrote each person's sequence on the board and, lo and behold, the correct sequence was the most popular answer.
This work is freely redistributable
under the terms of the GNU
General Public License published
by the Free Software Foundation.
This work comes with ABSOLUTELY NO
WARRANTY.