Prepared by J. Laurie Snell, Bill Peterson, Jeanne Albert, and Charles Grinstead, with help from Fuxing Hou and Joan Snell.
Please send comments and suggestions for articles to
Back issues of Chance News and other materials for teaching a Chance course are available from the Chance web site:
Chance News is distributed under the GNU General Public License (so-called 'copyleft'). See the end of the newsletter for details.
Chance News is best read with Courier 12pt font and 6.5" margin.
They (Authors of "The Game of Life") are the sort of people who think that no observation is so intuitive that it can't be improved by regression analysis.
The New Yorker 22 Jan 2001
If you're an affected individual here, the statistics about how many it affects or doesn't affect somewhere else in the culture don't mean much to you because for you it's 100 percent.
Attorney General John Ashcroft
discussing racial profiling.
Contents of Chance News 10.03
Note: The chance of getting at least 2200 heads when tossing a coin 2900 times is 2.97*10^(-179) which is indeed pretty close to (1/1000000)^30 = 10^(-180).
Statistics point to more than random error in Florida vote:
Economics professor Tom Carroll began running statistical equations on Thursday on the net gains that both Gore, who gained more than 2200 votes, and Bush, who added about 700 votes, had made in the recount. He found that the statistical chances for such large and different totals to occur as a result of random glitches was less than infinitesimal.
"The probability of being struck by lightning is about one in a million", Carroll said. "The same person would have to be hit by lightning 30 times to compare with what we've seen in this recount."
Las Vegas Sun
10 Nov 2000
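The figure in the note can be checked with exact integer arithmetic. The sketch below assumes a fair coin and independent tosses; the function name prob_at_least is ours and is used only for illustration.

```python
from math import comb, log10

def prob_at_least(heads, tosses):
    """Exact P(at least `heads` heads in `tosses` fair-coin tosses), as a float."""
    # Count the favorable outcomes exactly, then divide by the 2**tosses
    # equally likely sequences. Python integers do not overflow.
    favorable = sum(comb(tosses, k) for k in range(heads, tosses + 1))
    return favorable / 2 ** tosses

p = prob_at_least(2200, 2900)
print(f"P(at least 2200 heads in 2900 tosses) = {p:.3g}")
print(f"log10 of that probability = {log10(p):.2f}")
print(f"log10 of (1/1000000)^30   = {log10(1e-6) * 30:.0f}")
```

Comparing the two logarithms shows how close the coin-tossing probability is to thirty independent one-in-a-million events.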
On average our customers saved 25% on their home insurance.
(Small print on back: figures based on independent research with customers who made a saving by switching their insurance to ....)
Insurance flyer from Lloyd's TSB
(summarizing the conclusions from two linear regression analyses on factors affecting the song repertoire of warblers) "That is, feather length significantly predicts repertoire size, while body mass almost does so".
Proc. Roy. Soc.
(1) Do you think there are more Republican or more Democratic members of Congress who are members of Phi Beta Kappa?
(2) In the Senate, 4 Republicans and 8 Democrats are members of Phi Beta Kappa. Are there significantly more Democrats than Republicans who are members of Phi Beta Kappa?
(3) In the House of Representatives, 7 Republicans and 17 Democrats are members of Phi Beta Kappa. Is this difference significant? Are the differences for Congress as a whole significant?
(4) Is the proportion of senators that are members of Phi Beta Kappa significantly higher than the proportion of members of the House that are members of Phi Beta Kappa?
(5) Which Senators do you think are members of Phi Beta Kappa? Check your answers with the correct answer at the end of this Chance News. Did you do better than guessing?
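One possible way to approach questions (2) and (3) is an exact binomial (sign) test. The sketch below assumes that, if party made no difference, each Phi Beta Kappa member would be equally likely to be a Democrat or a Republican. That is only reasonable if the two parties hold about the same number of seats, which is our assumption and is not stated in the questions. The function name upper_tail is hypothetical.

```python
from fractions import Fraction
from math import comb

def upper_tail(successes, trials):
    """P(X >= successes) for X ~ Binomial(trials, 1/2), as an exact fraction."""
    favorable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return Fraction(favorable, 2 ** trials)

senate = upper_tail(8, 12)      # 8 Democrats among 12 members in the Senate
house = upper_tail(17, 24)      # 17 Democrats among 24 members in the House
congress = upper_tail(25, 36)   # both chambers pooled

for label, p in [("Senate", senate), ("House", house), ("Congress", congress)]:
    print(f"{label}: one-sided p = {float(p):.4f}")
```

Doubling each value gives a two-sided p-value, since the distribution is symmetric when the success probability is 1/2.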
Car Talk recently asked the following question: You are told that each of three pieces of paper has one of three distinct positive numbers written on it. They are placed face down on the table so you cannot see the numbers. You are allowed to pick up the pieces of paper one at a time and your objective is to stop when you think you have the biggest number. Once you have rejected a number you cannot go back to it. Is there a strategy that gives you better than the 1/3 chance you have when you just choose a piece of paper at random and say this is the biggest (the naive strategy)?
Car Talk says the answer is yes and here is their rule for stopping: randomly choose a piece of paper. Then randomly choose a second piece of paper. If the number on the second piece is bigger than that on the first piece stop. If not then choose the third piece of paper and you have to stop.
On the Car Talk web site you can contribute to a simulation of this problem using the naive strategy or Car Talk's solution and see if those who chose the Car Talk solution are doing better than those who chose the naive solution.
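With only three slips of paper there is no need to simulate: the six possible orderings can simply be listed. The sketch below assumes that every ordering is equally likely and that only the relative ranks of the numbers matter, so the slips are labeled 1, 2, 3 with 3 the largest. The function names are ours.

```python
from fractions import Fraction
from itertools import permutations

def naive_pick(order):
    """Naive strategy: keep the first slip drawn."""
    return order[0]

def car_talk_pick(order):
    """Car Talk rule: reject the first slip, keep the second if it beats
    the first, otherwise take the third."""
    first, second, third = order
    return second if second > first else third

def win_probability(strategy):
    """Fraction of the six equally likely orderings in which the
    strategy ends up holding the largest number (rank 3)."""
    orders = list(permutations([1, 2, 3]))
    wins = sum(1 for order in orders if strategy(order) == 3)
    return Fraction(wins, len(orders))

print("naive:", win_probability(naive_pick))        # prints naive: 1/3
print("Car Talk:", win_probability(car_talk_pick))  # prints Car Talk: 1/2
```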
This is a special case of a famous problem called by various names: the secretary problem, the beauty contest, the dowry problem and the Googol problem. As formulated by Car Talk, it is the Googol problem. Martin Gardner gave it the name Googol in his discussion of the problem (see Chapter 3 of "New Mathematical Diversions", Martin Gardner, MAA). For an elegant discussion of this problem see "Recognizing the maximum of a sequence", Gilbert and Mosteller, J. Amer. Statist. Assoc. 61, 35-73 (available from JSTOR).
(1) What is the probability of getting the biggest number using the naive strategy and using the Car Talk solution?
(2) Does the Car Talk solution maximize the chance that you get the largest number?
(3) Suppose that Car Talk said that they chose distinct positive integers to put on the slips of paper. Would their solution still be correct?
(4) Assume that you are told that the three numbers were chosen randomly from the unit interval (0,1). What would be your best strategy now? Hint: work backwards.
(5) As the last two problems show, knowing something about what the actual possible numbers are or how they are chosen can make a difference. How could you state the original problem to avoid this?
The Savvy traveler discusses the NTSB report. This report found that the survival rate for accidents involving Part 121 carriers (large scheduled airliners) from 1983 through 2000 was 95.7 percent. For serious Part 121 accidents (those involving fire, serious injury, and either substantial aircraft damage or complete destruction), the survival rate was 56 percent. The report suggests that the public perception of survivability may be substantially lower than the 95.7 percent survival rate found by the study.
The Savvy traveler then introduces Paul Bailey who has developed a web site where you can estimate the chance that you do not survive a flight you are going to make. Paul remarks that he started his web site because he was tired of the media's attention to air accidents. He wanted to show how safe air travel is. On his web site you can put in information about a flight you plan to take and obtain the chance of your dying on this flight.
Laurie checked out his forthcoming March flight from Boston to San Francisco. He found that if he went on a United Airline Boeing 777 flight his chance of dying is 1 in 12,872,967. Switching to a Boeing 727 flight would reduce this to 1 in 19,503,394. Switching to a 727 flight on Northwest would reduce this further to 1 in 22,678,360.
(1) If you were Laurie, would you pay attention to the results of amigoingdown.com in planning your trip?
(2) From the NTSB web site we find that in 1999, for large scheduled airliners, there were .43 accidents per 100,000 flights. Using the result of the new NTSB study, what would you estimate to be the probability you do not survive your next airplane trip? How does this compare with the estimates Laurie got from amigoingdown.com? Recall that Arnold Barnett estimated that if you choose a random flight your chances are about 1 in 7 million of dying (See Chance News 8.09). Would you expect these three methods of estimating your chance of dying would give approximately the same answers?
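One rough way to combine the two NTSB figures in question (2) is to multiply the chance that a flight has an accident by the chance that an occupant of such a flight does not survive. The sketch below assumes the 95.7 percent survival rate applies equally to every accident and every passenger, which is a simplification of ours; the function name is hypothetical.

```python
def one_in_chance_of_dying(accidents_per_100k_flights, survival_rate):
    """Return N such that the estimated chance of dying on a flight is 1 in N."""
    p_accident = accidents_per_100k_flights / 100_000
    p_death_given_accident = 1 - survival_rate
    return 1 / (p_accident * p_death_given_accident)

estimate = one_in_chance_of_dying(0.43, 0.957)
print(f"about 1 in {estimate:,.0f}")
```

The result can be set beside the figures from amigoingdown.com and Arnold Barnett's estimate of about 1 in 7 million.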
About a decade ago a reader posed the following problem to the Ask Marilyn column of Parade Magazine.
Suppose you're on a game show, and you're given the choice of three doors. Behind one door is a car, behind the others, goats. You pick a door but do not open it, say number 1, and the host, who knows what's behind the doors, opens another door, say number 3, which has a goat. He says to you, "Do you want to pick door number 2?" Is it to your advantage to switch your choice of doors?
The answer "yes" produced a flurry of responses, mainly from people who disagreed. This problem achieved notoriety as the "Monty Hall Problem" (A quick search on Google yielded about a hundred sites). Subsequent magazine issues contained some of the more interesting letters, many of which may be found in "Power of Logical Thinking" (by Marilyn vos Savant, St. Martin's Press, 1996). The problem is an excellent illustration of the mysteries of conditioning when used to model a phenomenon.
Today we have reality television and, similarly, a probability problem and answer, this time on the internet, about the finalists in Survivor II, which generated numerous e-mails complaining about flawed logic. This problem arose from the following article in the Boston Herald:
Inside track: Newton "Survivor" goes forth
The authors of this article write:
Don't breathe a word of this. Because, of course nobody's supposed to know. But our spies Down Under swear that, despite the plot against her, local chick Elisabeth Filarski, the "Survivor II" contestant with a heart of gold, makes it into the FINAL FOUR!
The web site www.survivorsucks.com spreads the news on so-called "spoilers" related to the CBS Survivor II program.
Commenting on the prediction in the above article they wrote:
The Boston Herald thinks they know one of the Final Four. Do we believe them? Of course not, silly. We sleep with the list of the Final Four under our pillow, and we haven't shown anyone. With only 12 little Indians left, they've got a one in three shot at this, so they might get lucky. You've been warned. Don't go if you don't want to know...
Readers took exception with the 1/3 probability and wrote in with a variety of solutions, some with products of combinatorial terms. Hundreds of e-mails were sent with both simple and complicated solutions, both correct and incorrect. Some of these are on the above web site.
The original problem was not crisply presented and this may have caused some confusion. What was wanted was the probability of guessing the identity of one of the final four survivors with one guess -- that should be 4/12.
One person wrote,
The chance of randomly picking the final four from the current 12 is quite a bit less than 1 in 3.
Another wrote,
Just saw your message stating that the Boston Herald has a 1 in 3 chance of getting the final four right, since there are 12 contestants 4/12 = 1/3. Actually, this is not correct ...
A third wrote
... but choosing 4 finalists out of 12 is not a one in three chance.
Evidently people interpreted the question in different ways.
Most interesting are the complex ways in which the right answer was achieved. Here is one of these:
Total Combinations                                    479001600
Combinations where Liz comes first                     39916800
Combinations where Liz comes second                    39916800
Combinations where Liz comes third                     39916800
Combinations where Liz comes four                      39916800
Combinations where Liz comes 1st, 2nd, 3rd or 4th     159667200
Odds of Liz coming in top four                       0.33333333
1. If you pick four contestants at random, what is the probability that you pick all four correctly?
2. If you pick four contestants at random, what is the probability that you pick at least one correctly?
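The reader's counts, and the two questions above, can be checked with a few lines of exact arithmetic. The sketch below assumes that all 12! finishing orders are equally likely and that picks are made at random; the variable names are ours.

```python
from fractions import Fraction
from math import comb, factorial

# The reader's table: orderings of 12 contestants, and those with Liz
# in one fixed place among the top four.
total_orders = factorial(12)           # 479001600
liz_in_fixed_place = factorial(11)     # 39916800
liz_in_top_four = Fraction(4 * liz_in_fixed_place, total_orders)

# Question 1: a random set of four names matches the final four exactly.
all_four_correct = Fraction(1, comb(12, 4))

# Question 2: at least one of four random names is in the final four,
# computed as one minus the chance that all four come from the other eight.
at_least_one_correct = 1 - Fraction(comb(8, 4), comb(12, 4))

print("Liz in top four:", liz_in_top_four)
print("all four correct:", all_four_correct)
print("at least one correct:", at_least_one_correct)
```

The long way around through 12! agrees with the direct answer 4/12, which is why the complicated solutions arrived at the same 1/3.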
Mind-exercising hobbies can help stave off Alzheimer's.
Minneapolis Star Tribune, 6 March, 2001
Patients with Alzheimer's disease have reduced activities in
midlife compared with healthy control-group members.
Proceedings of the National Academy of Sciences
13 March 2001, 3440-3445
Friedland and others
This is a good article to compare a news account of a study with the original journal article where the study was reported.
The Star Tribune article reports that a study shows that adults with hobbies that exercise their brains--such as reading, jigsaw puzzles or chess--are two and a half times less likely to have Alzheimer's disease, while leisure limited to TV watching could increase the risk.
This article reports that the investigators surveyed people in their 70's to get information about their leisure activities in young adulthood--ages 20 to 39--and middle adulthood--ages 40-60. They obtained this information on 193 Alzheimer's patients (cases) and 358 people who did not have Alzheimer's disease (controls).
Joan asked: How can people with Alzheimer's report their leisure activities in their young or middle adulthood? Of course, one might also ask how people in their 70's can remember how much TV they watched in their late 40's and early 50's. We certainly can't.
This mystery was solved by looking at the original paper. We find there that information about 26 different types of activities was obtained through a questionnaire. The control-group answered the questionnaire but, for the Alzheimer-group, the answers were provided by family and friends.
The activities were divided into three groups described as passive, intellectual, and physical. The word television does not appear in the journal article though we assume it was one of the passive activities.
The newspaper article emphasized the protective effect of the intellectual activities, but, in fact, the study showed that the physical and the passive activities also had a protective effect, though less than that of the intellectual activities.
(1) Does it seem strange to you the Star Tribune editors were not concerned about the same question Joan was? (The original Associated Press article did say that the questionnaire for those with Alzheimer's disease was filled out by family and friends, but this was omitted from most newspaper reports of the study.)
(2) In their journal article, the authors say "We cannot exclude the possibility that our data reflect the very early effects of the disease, several decades before symptom onset." How would this affect their conclusions?
(3) Do you think Laurie's brothers (78 and 80 years old) could give an accurate account of Laurie's activities when he was in his twenties?
This year is the 50th anniversary of "The Shot Heard Round the World." For those readers who are not baseball fans (are there any statistical types who are not baseball fans?), on October 3, 1951, Bobby Thomson hit a home run with two men on base in the bottom of the ninth inning of the last playoff game between the Giants and the Dodgers. The home run won the game by a score of 5-4, and propelled the Giants into the World Series against the Yankees. (That year was the only year when all three New York baseball teams finished in first place.)
The fact that it was the bottom of the ninth is important; it means that the Giants were playing at home. Over the years, it has emerged that in the last months of the season, when they were playing at home, the Giants were stealing the opposing teams' pitching signs. They did this by stationing a team member, with binoculars, in the clubhouse, which was behind the centerfield fence, and signaling to their bullpen whether the next pitch was going to be a fastball or a curve. A team member in the bullpen would then make one of two motions, depending upon the type of pitch. The bullpen was visible to the batters, since it was behind left field. This story has been corroborated by many members of the Giants' team.
The initial statistics that one might look at certainly tend to support this story. The sign stealing started on July 20. On August 11, the Giants were 13 1/2 games behind the Dodgers. They proceeded to win 16 straight games, of which 13 were at home. At the end of this streak, they were only 5 games behind the Dodgers. In the last 18 games of the season, they won 14. They finished tied with the Dodgers, forcing a three-game playoff.
The New York Times article gives some statistics that make one wonder whether the sign stealing helped the Giants. The statistics cover the period from August 11 to the end of the season, during which time the team won 39 games and lost only 9. (The Giants were on a long road trip before August 11.) During this period, the Giants averaged 4.95 runs per game on the road, versus 4.79 runs per game as a visiting team before this period. At home, they averaged only 4.27 runs per game during this period, versus 5.54 runs per game before this period. After August 11, five of their eight regular players batted for a higher average on the road than at home. Before August 11, the Giants had hit 130 home runs in 109 games; in their last 48 games, they hit 49.
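The home run totals are easier to compare as rates per game. The short sketch below uses only the figures quoted above; the function name per_game is ours.

```python
def per_game(total, games):
    """Average number of events per game."""
    return total / games

hr_before = per_game(130, 109)   # home runs before August 11
hr_after = per_game(49, 48)      # home runs in the last 48 games

print(f"home runs per game before August 11: {hr_before:.2f}")
print(f"home runs per game from August 11:   {hr_after:.2f}")
```

Like the runs-per-game figures at home, the home run rate went down rather than up after August 11.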
The Wall Street Journal article gives a wonderful account of the history of this event and how the signal was finally verified. Readers of the Journal praised it as one of the all time great baseball articles. Here are two excerpts from the beginning and the end of the story.
Early in the story we read:
The answer begins with a small leather case on a boy's desk in Westford Mass. Robert Henry Ehasz, 16, pops open a shiny steel snap engraved with the maker's name, Wollensak, and lifts out a telescope. "Papa use it to spy on the Germans," he says. "No, not the Germans," corrects his mother, Susan. "An opposing baseball team."
And at the end of the story:
But did he take the sign?
"I'd have to say more no than yes," he says. "I don't like to think of something taking away from it."
Pressed further, Mr. Thomson later says, "I was just being too honest and too fair. I could easily have said, 'No, I didn't take the sign.'"
He says. "It would take a little away from me in my mind if I felt I got help on the pitch."
But did he take the sign?
"My answer is no," Mr. Thomson says.
(1) Did he take the sign?
(2) Are there other statistics that one might want to look at to determine whether the Giants were helped by the sign stealing?
This article reports conclusions from a study by the National Highway Traffic Safety Administration that examined death rates of car drivers involved in an accident during the period from 1991 to 1997. The death rates for car drivers varied from 10 per 1000 crashes with a Ford Explorer to 5 to 7 per 1000 for other SUVs, such as the Jeep Grand Cherokee, Toyota 4Runner, and Chevrolet Blazer. In crashes with other cars, the driver death rate was .6 per 1000.
There is also some discussion of the variation in rates, not all of it clear. For example, the article states that "limited numbers of crashes in the database for each model created a fairly wide range of errors in the calculations. For the Explorer, for example, there was a 95 percent chance that the true death rate of car drivers was 7 to 13 per 1000 crashes.... The study also cautioned that the error ranges themselves might be imprecise."
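To see how a limited number of crashes produces a wide error range, here is a minimal sketch using the usual normal approximation to a binomial proportion. The crash counts are hypothetical and chosen only for illustration; the article does not say how many crashes were in the database or how the study computed its intervals.

```python
from math import sqrt

def death_rate_interval(rate_per_1000, crashes, z=1.96):
    """Approximate 95% interval (per 1000 crashes) for an observed death rate,
    treating each crash as an independent trial (an assumption of ours)."""
    p = rate_per_1000 / 1000
    half_width = z * sqrt(p * (1 - p) / crashes)
    return 1000 * (p - half_width), 1000 * (p + half_width)

# Hypothetical crash counts, not taken from the study.
for crashes in (1000, 4000, 16000):
    low, high = death_rate_interval(10, crashes)
    print(f"{crashes:>6} crashes: {low:.1f} to {high:.1f} per 1000")
```

Under these assumptions, an interval of roughly 7 to 13 per 1000 is what one would get from something on the order of 4000 crashes, and quadrupling the number of crashes only halves the width.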
Although the study failed to achieve its primary objective--to establish the contributions of vehicle weight, stiffness, and height in causing damage to other vehicles in a crash--the article suggests that the higher death rate for the Explorer may arise because its front steel bumper sits sufficiently above (2.6 inches) those of many cars. Interestingly, even though "Ford dismissed the study as meaningless because of the wide ranges of error", they apparently have redesigned the Explorer for 2002 so that its bumper is within millimeters of those of cars.
(1) In discussing "error ranges," Bradsher tries to explain the meaning of a 95 percent confidence interval. Do you think he does a good job? Why or why not? How would you explain it?
(2) Bradsher also writes, "The error range meant that it was statistically possible, although unlikely, that one or more of the other midsize sport utilities was deadlier than the Explorer." What evidence is there to support this statement? What additional information would you like to know?
(3) According to Hans Joksch, the statistician who did the analysis for the study, although "error ranges may be wide for individual models, the errors in comparing models are likely to be smaller." What do you think he means?
After a brief biography, this article presents an interview with Robert Fenichel, who retired last year as deputy director of the drug-evaluating branch of the F.D.A. that tests high blood pressure, congestive heart failure, and kidney disease medicines, among others. Most of the questions concern the perception and evaluation of the risks in taking drugs and the "abiding truth [that] any medicine strong enough to have good effects on the body is strong enough to have bad effects as well."
Here are some excerpts (most suggested by Norton Starr):
Q. When a drug is said to be safe, does that mean 100 percent safe?
A. Everything has some adverse effects. When people need medications, that's fine. For instance, you need penicillin for bacterial endocarditis, which has 100 percent mortality if you don't treat it. True, you might die from an allergic reaction to the antibiotic, but it doesn't matter. The calculation has already been done. You're still better off taking the thing, and if you decline you're just stupid.
Q. Do you think most people know how to put medication risks in perspective?
A. Of course not. With drugs, it's not obvious what the risk is. People may feel, "I didn't really grasp this," and that's probably true often. Does that mean an inadequate effort was made to transmit the information? Maybe and maybe not. Do I really understand a one-in-a-million risk of hepatitis from taking something? I'm not sure I do. These numbers when they're very small are hard to comprehend. They don't correspond to human experience. People don't even know that 25 percent means 1 in 4. They worry much more about safety in airplanes than in cars. They worry more about mad cow disease than about smoking. Go to France and see what people worry about. They're all smoking, and they've given up on beef.
(1) Why do you think people are more concerned about airplane safety than automobile safety?
(2) What does a "one-in-a-million" risk mean to you?
Q. Are symptoms ever bad enough to justify taking a drug with considerable risks?
A. The most dramatic case, I think, was a drug called flosequinan, made by Boots Pharmaceuticals in the United Kingdom, for congestive heart failure. People with severe congestive heart failure are terribly disabled. Some cannot even walk across a room without becoming desperately short of breath, and their median survival from diagnosis is only two or three years. Flosequinan really made patients feel lots better. They stayed out of the hospital and they could move around. And the drug increased mortality, by about 50 percent. I mean really a lot. They died of their congestive failure sooner than people who weren't taking the drug. Well, we thought about this, and the results with respect to feeling better were so impressive that people in the division thought, Gee, if I had that disease, I would want that drug. Now not everyone said that. But many people in the division thought, I would want that drug. And so it was approved. And the company finally lost its nerve, and they never marketed it.
Would you take the drug if you had congestive heart failure and only two to three years to live?
Q. Why do some bad effects not turn up before marketing, even though drugs are tested in thousands of people?
A. If you have 3,000 patients, which is a typical number of patients exposed to a drug before marketing, and something on average will occur 1 in 1,000 times in people using the drug, it's 95 percent likely that you will see such a thing in those 3,000 patients. Five percent of the time you won't see this thing. But if something is happening 1 in every 5,000, your chance of seeing it is pretty small. The only real hope you have is if it's something that is so distinctive, like the thalidomide babies, even if it's happening one in a million, you say, "My God, what's that?" and then if you see another one you say, "This can't be a fluke." But most things are very difficult to detect. And that means plenty of small effects will never be detected.
(1) How was the 95 percent arrived at?
(2) What are the chances of seeing one or more occurrences among 3,000 subjects, if the average incidence is one in 5000?
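One plausible reading of Fenichel's calculation is that each patient independently has the same small chance of the adverse event, so the chance of seeing at least one case is one minus the chance of seeing none. The sketch below makes that independence assumption explicit; the function name is ours.

```python
def chance_of_seeing(incidence, patients):
    """P(at least one occurrence) when each of `patients` people independently
    has probability `incidence` of the event."""
    return 1 - (1 - incidence) ** patients

print(f"1 in 1000, 3000 patients: {chance_of_seeing(1 / 1000, 3000):.3f}")
print(f"1 in 5000, 3000 patients: {chance_of_seeing(1 / 5000, 3000):.3f}")
```

For rare events the result is close to 1 - exp(-patients * incidence), which is where the familiar "rule of three" comes from.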
John R. Lott is a senior research scholar at Yale School of Law. In his book "More Guns, Less Crime" Lott claims to show that states which have laws that require officials to issue concealed weapon permits have less crime. He does so by using multiple regression with dependent variable crime rate and independent variables including many factors that might be related to crime rates including concealed gun laws.
Paulos sees little problem with the formal statistical calculations. On the other hand he is not convinced by the explanation Lott gives for why having concealed guns prevents crimes. Lott argues that the prospect of guns being used defensively scares some criminals into pursuing less violent careers.
Paulos mentions other well-known concerns about using linear regression to make future predictions. He asks: Who is to say that the linear relation will continue if the number of people with concealed weapons falls outside the range of present data? Paulos also mentions the problem of going from correlations to causation. He remarks that consumption of hot chocolate is also associated with less crime and both are brought about by cold weather.
Paulos also wonders how comfortable we would feel if we knew that just about everyone was walking around with a concealed weapon.
Of course gun control is a controversial topic and so, not surprisingly, a number of others have criticized Lott's book. David Hemenway of the Harvard School of Public Health, in his review of Lott's book in the New England Journal of Medicine, writes:
The central problem is that crime moves in waves, yet Lott's analysis does not include variables that can explain these cycles. For example, he uses no variables on gangs, drug consumption, or community policing. As a result, many of Lott's findings make no sense. He finds, for example, that both increasing the rate of unemployment and reducing income reduces the rate of violent crimes and that reducing the number of black women 40 years old or older (who are rarely either perpetrators or victims of murder) substantially reduces murder rates. Indeed, according to Lott's results, getting rid of older black women will lead to a more dramatic reduction in homicide rates than increasing arrest rates or enacting shall-issue laws.
You can find a link to Hemenway's article plus other critiques of Lott's book at: Critiques of Libertarianism: Gun Control
DISCUSSION QUESTION: What would you need to do to show that more liberal concealed weapons laws cause a decrease in crime?
The question of whether higher levels of salt in the diet lead to higher levels of blood pressure has been hotly debated. The history of this debate was described by Gary Taubes in his article: "(Political) science of salt", Science, 14 August, 1998, pp. 898-90. See Chance News 7.08.
In this technical report, Freedman and Petitti give a critique of a major study, called Intersalt, that was designed to test the hypothesis that higher levels of salt in the diet lead to higher levels of blood pressure. The authors of this technical report conclude that the data in this study do not support the hypothesis. This is in contrast with the authors of Intersalt, who claimed the opposite. The present article is well-written, with many interesting comments about Intersalt, but it is written at a fairly high statistical level. Even so, we recommend that anyone interested in issues such as publication bias and how one might critically examine the results of statistical studies should at least skim this paper.
Intersalt was an observational study. It was conducted at 52 centers in 32 countries. At each center, about 200 subjects, ages 20 to 59, were recruited. Blood pressure and urinary sodium (and potassium) levels were measured. The reason that sodium was measured in this way is that very little sodium is retained by the body, or excreted in ways other than through the kidneys. It is also much more problematic to measure the amount of sodium which is in one's diet.
At each center, the blood pressures of the subjects were regressed on their ages. The slopes of the regression lines indicate how rapidly blood pressure increases with age. These slopes were then correlated with salt levels across centers. The correlation was significant and positive, which means that at those centers with higher average salt levels, the rate of increase of blood pressure with age was greater than at those centers with lower average salt levels.
In each center, the subjects' blood pressure was regressed on their urinary salt levels. It is interesting that these regression coefficients were of varying sign, and some were significant while others were insignificant. However, once the data were pooled between the centers, the results were significant.
At this point, a paradox emerges. It is best to quote from the paper: "In more detail, suppose (i) there is a linear relationship between age (x) and blood pressure (y) for subjects within each of the 48 centers [4 centers were removed by the present authors as being outliers]; (ii) across the centers, as average salt intake goes up, the slope of the line goes up; (iii) subjects in all 48 centers have the same average age (x-bar) and average blood pressure (y-bar). As always, the regression line for each center has to go through the point of averages (x-bar, y-bar) for that center. The point of averages is the same for all the centers - assumption (iii). Therefore, the lines for the high-salt centers have to start lower than the lines for the low-salt centers, in order not to pass over them at x-bar."
Now, here is the paradox. Estimated systolic blood pressure at age 20 (the lower age of the subjects in the study) is plotted against the level of urinary salt. The relationship is negative and significant. The authors cannot resist a bit of wit: "If dietary advice is to be drawn from these data, it would seem to be the following. Live the early part of your life in a high-salt country, so your blood pressure will be low as a young adult; then move to a low-salt country, so your blood pressure will increase slowly. The alternative position, which seems more realistic, is that differences in blood pressures among the Intersalt study populations are mainly due to uncontrolled confounding -- not variations in salt intake."
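The geometry behind the paradox can be seen with two lines through the same point of averages. In the sketch below the common averages and the two slopes are hypothetical numbers chosen only for illustration; they are not taken from Intersalt or from the technical report.

```python
def blood_pressure_at(age, slope, mean_age, mean_bp):
    """Value at `age` of a regression line with the given slope that
    passes through the point of averages (mean_age, mean_bp)."""
    return mean_bp + slope * (age - mean_age)

# Hypothetical values, shared by both centers as in assumption (iii).
mean_age, mean_bp = 40.0, 120.0
low_salt_slope, high_salt_slope = 0.3, 0.6   # mmHg per year, hypothetical

for age in (20, 40, 59):
    low = blood_pressure_at(age, low_salt_slope, mean_age, mean_bp)
    high = blood_pressure_at(age, high_salt_slope, mean_age, mean_bp)
    print(f"age {age}: low-salt {low:.1f}, high-salt {high:.1f}")
```

Because both lines are pinned at the same point, the steeper (high-salt) line must lie below the other at age 20 and above it at age 59.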
It is interesting to the authors that the data in the Intersalt study have not been made publicly available. They quote the Intersalt authors as saying that they are not publishing the data "because of the need to preserve the independence of scientific investigation, the integrity of the data, and the confidentiality of information..." The present authors are mystified by this statement. They also give other quotes from people who are sympathetic to the validity of the salt hypothesis. Some of these quotes are quite surprising. They certainly cause this reviewer to wonder about vested interests and the making of public policy.
Another interesting statistical idea that is discussed in this paper is that of publication bias. The authors plot, for many different studies, the change in blood pressure (due to a decrease in salt intake) on the vertical axis, versus the square root of the sample size, on the horizontal axis. If all studies are published, the points corresponding to these studies should have the shape of a funnel with the wide part to the left and the thin part to the right. The funnel should be horizontal, since the size of the study should have no effect on the average effect of reducing salt intake by a specified amount. In fact, the actual funnel plot that is shown in this paper, which shows the results of more than 30 studies, shows a clear bias towards studies with large negative changes in blood pressure.
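A small simulation shows how selective publication can tilt a funnel plot. Everything in the sketch below is an assumption made for illustration: the true effect, the standard deviation, the range of sample sizes, and the rule that a study is published only when it shows a statistically significant drop in blood pressure. None of these values come from the paper.

```python
import random
from statistics import mean

def simulate_studies(num_studies=2000, true_effect=-2.0, sd=10.0, seed=0):
    """Simulate (sample size, estimated effect, published?) for each study."""
    rng = random.Random(seed)
    studies = []
    for _ in range(num_studies):
        n = rng.randint(10, 400)
        se = sd / n ** 0.5                       # standard error shrinks with n
        estimate = rng.gauss(true_effect, se)
        published = estimate / se < -1.96        # hypothetical publication rule
        studies.append((n, estimate, published))
    return studies

def mean_effect(studies, max_n, published_only):
    """Average estimated effect among studies with fewer than max_n subjects."""
    picked = [est for n, est, pub in studies
              if n < max_n and (pub or not published_only)]
    return mean(picked)

studies = simulate_studies()
small_all = mean_effect(studies, 100, published_only=False)
small_pub = mean_effect(studies, 100, published_only=True)
print(f"small studies, all:       {small_all:.2f}")
print(f"small studies, published: {small_pub:.2f}")
```

Among the small studies, the published ones average a much larger drop than the full set, which is the lopsided funnel the authors describe.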
The authors distributed their paper to other experts in the field. Here are some of the responses they got:
Which of these comments would you agree with?
Epidemiologists can never wait for final proof. Instead, recommendations must be made in the interest of promoting good health for the public.
The effect of salt reduction may be detectable only in hypertensives, but today's normotensives are tomorrow's hypertensives.
Public health guidelines to reduce sodium consumption from three grams to one gram will hurt no one, and may benefit thousands.
Access to data can distort, confuse, intimidate, and muddy the waters of medical care and public health.
The reform movement for the teaching of statistics began in 1992 with the publication of the recommendations of the ASA/MAA Joint Curriculum Committee chaired by George Cobb. These recommendations were:
1. Emphasize the elements of statistical thinking:
(a) the need for data
(b) the importance of data production
(c) the omnipresence of variability
(d) the measuring and modeling of variability
2. Incorporate more data and concepts, fewer recipes and derivations. Wherever possible, automate computations and graphics. An introductory course should:
(a) rely heavily on real (not merely realistic) data
(b) emphasize statistical concepts, e.g., causation vs. association, experimental vs. observational, and longitudinal vs. cross-sectional studies
(c) rely on computers rather than computational recipes
(d) treat formal derivations as secondary in importance
3. Foster active learning, through the following alternatives to lecturing:
(a) group problem solving and discussion
(b) laboratory exercises
(c) demonstrations based on class-generated data
(d) written and oral presentations
(e) projects, either group or individual
There has been such general agreement with these principles that statistical reform talks have become downright monotonous. However, as they say, the devil is in the details, and this book is meant to be a teaching manual for this statistical reform.
The first chapter is called Hortatory Imperatives (a title surely suggested by George Cobb). It includes an article by George, "Teaching Statistics: More Data, Less Lecturing," which expands on the recommendations, and a Bibliography on Resources for Teaching Statistics.
It is interesting to ask how well the book covers the 1992 recommendations. Since this book contains articles by many of the leaders in the statistical reform movement, this will also indicate how much agreement there still is with these recommendations.
Consider first the recommendations relating to real data. Robin Lock is the guru for the use of data in elementary statistics courses and his article "WWW Resources for Teaching Statistics" shows how to obtain useful data from the web. This includes the Journal of Statistics Education which, under Robin's direction, provides real data sets along with articles on how they have been used in the classroom. In her article, "Real Data in Classroom Examples," Karla Ballman tells the reader not only how to find good data sets but also how she uses them in her teaching. She suggests ways to establish goals for learning from data and to develop discussion questions, and she provides examples to illustrate these processes.
The recommendation to teach data production would seem to go along with the recommendation to emphasize causation vs. association, experimental vs. observational, etc. We feel that a good resource for this is newspaper articles that report the results of surveys, polls, and studies, and we were, of course, pleased that several authors mentioned Chance News as a useful resource.
Issues of data production arise naturally when case studies are employed in a course. Norean Sharpe contributed a chapter describing sources for case studies and how she uses them in her courses here.
Student projects also provide a way to get students to think about the kind of data they need to obtain meaningful results. Robert Wardrop contributes an article on how he uses small projects in his teaching, and Katherine Halvorsen and Tom Moore discuss how they use the more traditional final project in their courses. Katherine and Tom have had a great deal of experience with projects and provide detailed information about how they guide the students through their projects and also describe interesting examples of projects that their students have produced. Their projects also satisfy the committee's recommendation that students have experience with writing and oral reports.
The recommendation for more active learning led to two very influential books: Workshop Statistics by Allan Rossman et al. and Activity-Based Statistics by Scheaffer. Excerpts from these books are provided. Michael Seyfried writes about his experience using Workshop Statistics in his courses and Bruce King writes about his experience using Activity-Based Statistics. These are great resources to show the "omnipresence of variability." They also provide a wonderful way to satisfy the recommendation to provide demonstrations based on class-generated data.
The Elementary Statistics Laboratory Manual by Spurrier and others is used to illustrate how a statistics lab can be used to add to students' understanding of statistical concepts. An excerpt from this manual is provided. Sneh Gulati discusses how he has used this manual in his teaching.
The recommendations refer to the use of computers only in the recommendation to "rely on computers rather than computational recipes" and the recommendation for statistical labs. Robin Lock, Tom Moore, and Rosemary Roberts describe how to evaluate statistical packages and Patrick Hopfensperger describes how to use graphing calculators in teaching statistics.
The members of the committee could not anticipate the development of the web and the ease of making interactive teaching tools. To show the possibilities of such tools, the authors include discussions of Paul Velleman's "ActivStats," George Cobb's "An Electronic Companion to Statistics," David Doane's "Visual Statistics," and "StatConcepts" by Newton and Harvill. All of these are illustrated, and teachers who have used them in their courses discuss how they can be used.
These resources have also been a tremendous help in implementing the recommendation: treat formal derivations as secondary in importance. Especially important has been their ability to simulate experiments. This has made it possible to give students an understanding of such basic theorems of probability as the law of large numbers and the central limit theorem without understanding the concepts of sample space and random variables which often seem mysterious to students.
Of course even with all these wonderful resources one still has to choose a text. Bob Hayden contributes an article "Advice to Mathematics Teachers on Evaluating Introductory Statistics Textbooks." Katherine Halvorsen provides a similar article "Assessing Mathematical Statistics Textbooks"-- but there is nothing similar about the recommended books for these two courses!
The MAA/ASA recommendations do not say anything about the role of student assessment in statistics reform. On the other hand, educational research has shown that the way students are assessed plays an important role in the learning process. The importance of this is recognized by the editor's choice to end this teacher's manual with an article by Joan Garfield, "Beyond testing and grading: new ways to use assessment to improve student learning," and an article by Beth Chance, "Experiences with authentic assessment techniques in an introductory statistics course."
So we see that the authors who contributed to this book still support the 1992 MAA/ASA recommendations. Editor Tom Moore deserves great credit for putting together this teaching guide which will surely make it possible for many more teachers of statistics to have the fun and their students the rewards of participating in the statistical reform movement.
Footnote: In the introduction, Tom writes that "we will maintain a web site for this volume that allows us to provide information on good ideas that come along later." See the web site.
Encouraged by this invitation we mention two suggestions that we would have for a teacher's manual for statistics education reform.
(1) We would like to see more emphasis on the MAA/ASA committee's recommendation to employ group problem solving and discussion. The Journal of Statistics Education recognized the importance of this recommendation by including, in their first issue, the article: "Teaching Statistics Using Small-Group Cooperative Learning," Joan Garfield, v. 1, n.1 1993. Perhaps a link to this article could be placed on the web site.
(2) We admit to a bias here, but think that statistical reform should encourage the use of media accounts of statistical information and studies. We think that students learn a lot from discussing news articles and get a better appreciation of the role of statistics in the real world. We would encourage those who have had experience using news articles in their introductory statistics courses to contribute a discussion of their experiences to the statistics reform web site. Again we would turn to Joan Garfield who regularly uses news articles in her statistics courses with great success. Better yet, perhaps the web site should have an "Ask Joan Garfield" column!
"The Shape of the River" and "The Game of Life" are two books that provide wonderful examples of the role of data in public policy decisions. The Shape of the River considers the experience of African American students in highly selective colleges and their impact on the educational objectives of these colleges and universities. The Game of Life does the same for athletes.
The Science article has interesting commentaries on The Shape of the River and the New Yorker article provides a good overview of The Game of Life.
The two groups typically enjoy an advantage in admission to selective colleges. Arguments for and against this policy often involve claims such as: minority students who need an advantage in admission would do better in schools that do not have highly selective admission, football pays for itself, winning teams increase alumni giving, etc. The books use data to test these arguments and to help schools make rational policy decisions regarding admission policies for athletes and minorities. Both books use data from the same data set assembled by the Mellon Foundation and called the College and Beyond (C&B) Database.
The C&B database begins with admission and transcript records for 93,660 full-time students who entered thirty-four colleges with selective admission policies in the fall of 1951, 1976, and 1989. The colleges include large public universities such as University of Michigan and Penn State; private universities such as Columbia, Duke, Stanford, Princeton, and Yale; coeducational liberal arts colleges such as Denison, Oberlin, Swarthmore, and Williams; and women's colleges such as Smith and Wellesley.
Information about the students before they entered college was obtained from two different sources. The College Board provided the student's response to the Student Descriptive Questionnaire that students taking the SAT test are required to provide in their junior or senior year in high school. The Higher Education Research Institute provided results of their annual freshman questionnaire given to approximately 350,000 entering freshmen at a nationally representative sample of colleges and universities. The Questionnaire and the survey provided information about academic performance and interests, career goals, extracurricular activities, demographic information, etc.
Information about the students after they left college was obtained by a survey. This survey, carried out in 1996, asked students in the C&B database questions related to their educational and occupational histories, retrospective views of college, personal and household income, civic participation, and satisfaction with life. The number of student respondents to this survey was 45,184. As a control, the same questions were asked of a nationally representative sample of students who were approximately eighteen years of age in either 1951 or 1976. A total of 4,036 individuals responded to this survey.
The authors were able to match information from these various sources with specific students so they did not have to rely on summary data. Unfortunately, this also means that they cannot make the dataset publicly available.
In most cases, the authors make their points by looking at histograms giving summary information as to how one group of students compares with other groups. Each book has about 70 such histograms. Comparisons are made between groups within a specific entering cohort and specific groups are compared through time by looking at the three different cohorts. Also many subdivisions are made: athletes are divided into those participating in high profile and low profile sports, colleges are divided by the degree of selectivity, etc. These histograms provide readers with a gentle way to see what the data says. But as the New Yorker quote suggests, the authors do not hesitate to use regression analysis to try to get a deeper understanding of what is going on. A detailed description of the database and the methods used is provided in appendices of The Shape of the River.
What do the data show? Looking at the most recent cohort one is struck by similarities between the African American students and the athletes in their academic preparation and performance in college. For example, both groups have average SAT scores and high school grade point averages that are significantly lower than those of the student population as a whole. Both groups under-perform in their college work, meaning that their college grades are not as high as their previous academic record would predict. Both groups tend to major more in the social sciences than does the class as a whole. And both groups are given an advantage in admission.
Despite these similarities the two books reach very different conclusions about the contributions of these two groups to the academic life of the colleges. More specifically, the authors of The Shape of the River make a strong case for continuing affirmative action while the authors of The Game of Life make a strong case for decreasing the emphasis on high profile athletics. To understand why, one has to look more carefully at their arguments, some of which are based on a more detailed look at the data and others on value judgments.
The authors of The Shape of the River base their support for the continuation of affirmative action on looking at the data for minority students both during and after admission to college. They observe that the minority students do well in selective colleges and that through time the gap in academic indicators is decreasing. They find that students, black and white, feel that a diverse student body contributes significantly to their educational experience.
The data shows that upon graduation African American students are accepted in top professional and graduate schools. A higher percentage of black graduates than white graduates enter medical and law schools. About the same percentage enter business schools and there is only about a one percent difference in the percentage going on to PhD programs. Many go on to have distinguished careers in their chosen field and provide role models for minorities.
The Game of Life tells a very different story. About the same time that the colleges in this study were making increased effort to attract minorities they were also making a significant increase in their recruitment of athletes. In 1989, roughly 90 percent of the athletes who played in the high profile sports said that they had been recruited, a much higher percent than in the '76 cohort. The advantage given to admission of athletes also increased significantly from '76 to '89 and then from '89 to '99. The authors write:
At a representative non-scholarship school for which we have complete data on all applicants, recruited male athletes applying for admission to the '99 entering cohort had a 48 percent greater chance of being admitted than did male students at large, after taking differences in SAT scores into account. The admission advantages enjoyed by minority students and legacies were in the range of 18 to 24 percent.
This recruiting brought in athletes who were very different from the student body as a whole. We have already remarked on their academic credentials. Their goals in life tended to center on financial success, which led them to choose business school for their graduate work. Thus the case against the athletes is not against students participating in athletics but rather against bringing students to the school whose first priority is athletics rather than academics. Of course, this is not entirely new. Bill Pruden, a college counselor at the Ravenscroft School in Raleigh, N.C., pointed out that the first college football game between Princeton and Rutgers in 1869 was won by a Rutgers team that included 10 freshmen, three of whom were failing algebra.
A recent study at Swarthmore College estimated that, in order to be competitive in all 24 intercollegiate sports on campus, 30 percent of each incoming freshman class of 375 students would have to be recruited essentially as athletes. The study concluded that recruiting more than 15 percent of the freshman class as athletes jeopardized the overall diversity of the student body. This led Swarthmore to drop three sports, football, wrestling and badminton, to allow them to decrease the number of students recruited as athletes and to still keep the other sports competitive.
Along the way, these two books provide a number of examples where common beliefs are shown not to be supported when one looks at the data. For example the authors find no evidence that a winning team increases alumni giving. And they find that even successful football programs do not typically pay their way. A common argument against affirmative action is that minority students who are admitted to a select college because of an admission advantage would do better at a less competitive school. The authors provide evidence that this is not the case.
This last question as well as a number of others are studied using regression techniques. Here is another such question the authors asked: Can the under-performance of athletes be attributed to the time needed to participate in sports? To answer this they used the students who participated in other time-consuming activities such as editing the newspaper, playing in the orchestra, etc. as a control group. Controlling for SAT scores, field of study, and socioeconomic status they found that these students over-performed in their academic work.
The authors also used regression analysis to show that the under-performance of athletes was not affected by the number of years they participated in athletics, i.e., by the dosage. This led them to suggest that the students' choice of priorities of athletics over academics probably started well before entering college.
The authors are dealing with really difficult issues that directly affect students' college life. It would be good experience for statistics students to look carefully at some of the authors' conclusions to see what statistical methods were used to support these conclusions and how convincing the statistical arguments are.
(1) Note that the claim that athletes have a 48% advantage in admission was based on only one college. Would you expect much variation in this figure between colleges?
(2) The Science article notes that an important kind of data missing from the C&B database was financial aid information. Why do you think the authors did not include this? What might they have learned from including this information?
(3) It might be argued that it is just the prestige of the select colleges that gets students into good graduate schools. How could you test this hypothesis?
Science magazine reported on a talk at the AAAS meeting on apportionment that praised the Webster method, and two of our readers sent us additional notes about this method.
A house divided.
At the AAAS meeting in San Francisco last month Peyton Young reported that the current method of apportionment (the Hill method) is biased, giving less populous states 3% to 4% more seats than they deserve. He gives an example of this. Recall that for both the Webster and the Hill methods we start with a divisor d and determine each state's quota as its population divided by d. Then the resulting quota is rounded up or down. The divisor d is chosen so that the resulting number of representatives is 435.
The difference between the Webster and Hill methods is the way the quotas are rounded up or down. For the Webster method the quota is rounded in the usual manner, using the average of the two nearest integers to the quota as the dividing point. The Hill method rounds by using the geometric mean of the two nearest integers to the quota as the dividing point. Suppose that the quota for state A is 1.45 and for state B it is 54.45. Then by Webster's method they both will be rounded down, and A will be given 1 representative and B will be given 54. On the other hand, using the Hill method the geometric mean of 1 and 2 is sqrt(2) = 1.414, so A's quota 1.45 will be rounded up, giving A 2 representatives. But the geometric mean of 54 and 55 is sqrt(54*55) = 54.498, so B's quota 54.45 will be rounded down, giving B only 54 representatives.
Young says he thinks it is time for Congress to switch back to Webster's method but Steven Brams, an expert on voting systems, does not see any groundswell movement for this. However he agrees that "Webster's scheme--or even the older method--would be superior to what's in place today."
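The two rounding rules are easy to compare directly. The following is a minimal sketch: the function names are ours, and rounding an exact tie upward is an assumption, since the text does not say how ties are handled.

```python
import math


def webster_round(quota):
    """Round at the arithmetic mean of the two nearest integers."""
    lower = math.floor(quota)
    return lower + 1 if quota >= lower + 0.5 else lower


def hill_round(quota):
    """Round at the geometric mean of the two nearest integers."""
    lower = math.floor(quota)
    return lower + 1 if quota >= math.sqrt(lower * (lower + 1)) else lower


# The two quotas used in the example above.
for state, quota in [("A", 1.45), ("B", 54.45)]:
    print(state, quota,
          "Webster:", webster_round(quota),
          "Hill:", hill_round(quota))
```

For state A this prints 1 seat under Webster and 2 under Hill; for state B both methods give 54. Note that the geometric mean of 0 and 1 is 0, so under the Hill rule any positive quota below 1 is rounded up to 1.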
We also received two letters from our readers both defending the Webster method. Maybe we will start the required groundswell movement.
Stan Seltzer and John Maceli write:
Your readers may be left with the impression that Balinski and Young's Quota Method is an apportionment panacea. Although it *is* house monotone (no Alabama paradox) and satisfies quota, Balinski and Young subsequently proved that there is no method that satisfies a stronger monotonicity criterion and satisfies quota. The stronger condition (which they call "population monotonicity") means that the method avoids the population paradox; i.e., it guarantees that if state A grows faster than state B, then A will not lose a seat while B does not and that B will not gain a seat while A does not.
As for your illustrious Dartmouth alum, Daniel Webster felt quite strongly that obeying quota was important. Here's what he had to say in 1832:
"The House is to consist of 240 members. Now, the precise portion of power, out of the whole mass presented by the number of 240, to which New York would be entitled according to her population, is 38.59; that is to say, she would be entitled to thirty-eight members, and would have a residuum or fraction; and even if a member were given her for that fraction, she would still have but thirty-nine. But the bill gives her forty .... for what is such a fortieth member given? Not for her absolute numbers, for her absolute numbers do not entitle her to thirty-nine. Not for the sake of apportioning her members to her numbers as near as may be because thirty-nine is a nearer apportionment of members to numbers than forty. But it is given, say the advocates of the bill, because the process [Jefferson's method] which has been adopted gives it. The answer is, no such process is enjoined by the Constitution."
Balinski and Young are quite impressed by Webster. In their book, they note that "it is said that no man could be as great as Webster looked."
Finally, turning to your question, "But will it [discussion of apportionment] ever end?", the answer seems clear: No. Walter F. Willcox-- a fierce advocate of Webster's Method (major fractions) for much of his professional life -- wrote "Last Words on the Apportionment Problem" in 1951. This paper closes with graphs of average district sizes (color coded showing large states, small states, and very small states) under Jefferson's Method (rejected fractions), Huntington-Hill (equal proportions), and his proposed method (included fractions, generally known as John Quincy Adams' Method). Of course, this was not his last word on apportionment; in 1954, at age 93, he published "Methods of Apportioning Seats in the House of Representatives" (which appears in JASA).
Note: The Balinski and Young book that they refer to is "Fair representation: meeting the ideal of one man, one vote," Michel L. Balinski and H. Peyton Young, Yale University Press, 1982.
This is a very readable book on the subject of apportionment. The history and the general principles are presented informally and then the mathematics behind it all is developed in appendices.
David Rothman sent us a note on the Webster method. He remarked that he had demonstrated two properties that the Webster method has that he regards as important. One is suggested by properties of our standard method of rounding numbers. The other has to do with minimizing the variance of the vote when the representative bodies vote on an issue--for example the house and senate of New Hampshire voting on a new method of financing the schools. Rothman writes:
We round reals the way we do because, with probability one, if any two reals sum to an integer, their rounded values must add to the same integer. If any two legislatures' sizes sum up to a size permitting perfect apportionment by population, the two apportionments add up to that perfect apportionment. This is true only for Webster (bicameral rule). Here is an example to show how Webster works:
Suppose we have two states, with populations 69 and 40. If our legislature has 105 seats, Webster and Hill both indicate 66, 39 as the representation vector. But if we have a legislature of 4 seats, only Webster indicates a vector 3,1. Since the total seats in the two chambers, 109, allows perfect representation 69,40 to occur, we'd expect our two vectors to give this sum. The bicameral condition is satisfied in general if and only if we use Webster.
If we model the function of a legislature as estimation of a parameter, then suppose we adopt the 3,1 solution. The representatives then have a district size vector of 23,23,23,40. The variance of the opinion average would then be
(k/23 + k/23 + k/23 + k/40)/16 = 143k/14720 = .00971k.
If we adopt the 2,2 solution, the variance would be
(2k/69 + 2k/69 + k/20 + k/20)/16 = 109k/11040 = .00987k,
a larger value.
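Rothman's two-state example can be checked numerically. The sketch below is ours, not his: it uses the "highest averages" form of a divisor method (seats are handed out one at a time), which gives the same result as searching for a divisor d, and the function names are hypothetical. The variance is reported as the coefficient of k, following the model in the letter in which each representative contributes k divided by the district size.

```python
import math
from fractions import Fraction


def apportion(populations, seats, threshold):
    """Highest-averages form of a divisor method.

    threshold(s) is the rounding point between s and s + 1 seats.
    Each seat goes to the state with the largest population / threshold.
    """
    alloc = [0] * len(populations)
    for _ in range(seats):
        def priority(i):
            t = threshold(alloc[i])
            # A zero threshold (Hill with no seats yet) means top priority.
            return math.inf if t == 0 else populations[i] / t
        best = max(range(len(populations)), key=priority)
        alloc[best] += 1
    return alloc


def webster(s):
    return s + 0.5                     # arithmetic mean of s and s + 1


def hill(s):
    return math.sqrt(s * (s + 1))      # geometric mean of s and s + 1


def variance_coefficient(populations, alloc):
    """Coefficient of k in the variance of the average of all opinions."""
    members = sum(alloc)
    # A state with population P and s seats has s districts of size P / s,
    # so its representatives contribute s * k / (P / s) = s*s*k / P in total.
    total = sum(Fraction(s * s, p) for p, s in zip(populations, alloc))
    return total / (members * members)


pops = [69, 40]
for size in (105, 4):
    print(size, "seats  Webster:", apportion(pops, size, webster),
          " Hill:", apportion(pops, size, hill))

for alloc in ([3, 1], [2, 2]):
    coeff = variance_coefficient(pops, alloc)
    print(alloc, "variance =", coeff, "* k =", round(float(coeff), 5), "* k")
```

With 105 seats both methods give 66, 39. With 4 seats Webster gives 3, 1 and Hill gives 2, 2, so only the Webster chambers add up to the perfect 69, 40. The variance coefficients come out as 143/14720 and 109/11040, matching the figures in the letter.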
Source: The Key Reporter Winter 2000-01 p 9. You can find there the Phi Beta Kappa members who are in the House and the Cabinet. The only member of Phi Beta Kappa who was a candidate for President on the ballot in 2000 was Ralph Nader.
Susan Collins, Maine (R)
Jon Corzine, New Jersey (D)
Russell Feingold, Wisconsin (D)
Bob Graham, Florida (D)
Tim Johnson, South Dakota (D)
Jon Kyl, Arizona (R)
Joseph Lieberman, Connecticut (D)
Richard Lugar, Indiana (R)
Paul Sarbanes, Maryland (D)
Charles Schumer, New York (D)
Arlen Specter, Pennsylvania (R)
Paul Wellstone, Minnesota (D)
This work is freely redistributable under the terms of the GNU General Public License as published by the Free Software Foundation. This work comes with ABSOLUTELY NO WARRANTY.