CHANCE News 4.12
(19 August 1995 to 7 September 1995)
Prepared by J. Laurie Snell, with help from William Peterson, Fuxing Hou, Jeanne Albert and Joan Snell, as part of the CHANCE Course Project supported by the National Science Foundation.

Please send comments and suggestions for articles to jlsnell@dartmouth.edu.

Back issues of Chance News and other materials for teaching a CHANCE course are available from the Chance Web Data Base http://www.geom.umn.edu/locate/chance.

If you would like to join a discussion group to share experiences using current chance events in class please send a note to jlsnell@dartmouth.edu.

I am a chemistry teacher at Round Rock High School. Each fall my students do an activity where they find the percentage of each color M&M in a bag of M&M's. Since blue M&M's are now being included I need the new color percentages for both plain and peanut. -- Debby Reddig (Austin American-Statesman, July 1, 1995)

Ans. M&M Plain: 30% Brown, 20% Yellow, 20% Red, 10% Orange, 10% Green, 10% Blue.
M&M Peanut: 20% each of brown, yellow, red, blue; 10% each of green, orange.
M&M Peanut butter and almond: 20% each of brown, yellow, red, green, blue.



We had a record number of suggestions for this Chance News and that is great! Here is an amusing article suggested by Yolanda Baumgartner.

Twin brothers beat long odds: They both bowl perfect games.
Newsday, 2 Sept. 1995, Nassau and Suffolk Edition, A7
Marshal Lubine and Andrew Smith

Identical twins Jeff and Jim Lizzo both bowled perfect 300 games within minutes of each other. Jeff averaged 221 before this game and had 9 previous perfect games, but Jim had previously rolled only one unsanctioned 300 game.

We read: "Considering that the probability for an average league bowler having a perfect game is one in 34,000, the odds against twins doing it at the same time are almost incalculable. The chances of identical twins soar to one in 385 billion - high odds, considering there are a mere 5 billion people on the planet, and not all of them bowl."

Mathematician Richard Shelp of the University of Memphis is quoted as saying "Maybe this would happen once in 10 million years. This time it happened early in the 10 million years."

The article points out several related "coincidences": Jim picked up his ball first, and it had serial number 5S53552; Jeff picked up his three weeks later, and it had serial number 5S53553. Jim bowled his perfect game first, and he was also born first (six minutes before Jeff).


(1) How do you think the probability for an average league bowler having a perfect game was estimated?

(2) How did the authors obtain the estimate one in 385 billion for the odds that identical twins would both have a perfect game?
(If it is any help, elsewhere in the article they give these odds as one in 385,333,333,333.)
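One way into question (2) is plain arithmetic: square the one-in-34,000 figure and see what is left over. The sketch below only does that arithmetic; what the leftover factor stands for is not stated in the passage quoted here, and reading it as, say, a rough chance of an identical-twin birth is a guess for discussion, not the article's stated method.

```python
# Arithmetic check on the Newsday figures (interpretation left open for discussion).
single = 34_000              # "one in 34,000" for one league bowler's perfect game
both = single ** 2           # both twins, if the two games are treated as independent
quoted = 385_333_333_333     # the figure given in the article

leftover = quoted / both     # the extra factor the article's figure implies
print(f"34,000 squared     = {both:,}")
print(f"quoted / 34,000^2  = {leftover:.2f}")
```

The leftover factor comes out at about 333, so the article's figure is the square of 1/34,000 times one further "one in about 333" probability.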

(3) Does the fact that there are a mere 5 billion people on the earth make odds of 1 in 385 billion meaningless?

John Mathias writes about the survey carried out by the fourth-grade girl who wanted to see if her allowance was too low (See Chance News 4.10.) I especially liked the observation that the average allowance could not be correct!

"Girl finds salary gap could begin at home" inspired a very good class discussion on the first day of my elementary statistics course here at Bethel College. The questions that I asked the students to consider were:

(1) What was Beth Peres' original question? What was the process through which the original question (hypothesis?) changed?

(2) What questions might you have about the survey?

[Leads into a good discussion of surveys: trustworthiness of participants? who responded? what was population? what was sample?]

(3) What is your impression of the action by the National Committee on Pay Equity?

(4) Are the conclusions of Kelly Jenkins supported by the evidence found by Beth Peres?

Finally, I've not seen it mentioned anywhere else, but the average for the seven boys as reported in this article is incorrect. It is impossible for the highest allowance to be 10 dollars, the lowest to be 3 dollars, and for the average to be 3.18 dollars.
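Mathias's point needs only one step of arithmetic: with seven boys, a top allowance of $10 and a bottom allowance of $3, the average is smallest when the other six boys all receive $3. A minimal check:

```python
# Smallest possible mean for seven allowances when the largest is $10 and none is below $3.
n_boys = 7
highest, lowest = 10.00, 3.00

# The mean is minimized when every boy except the top one gets the lowest allowance.
min_total = highest + (n_boys - 1) * lowest
min_mean = min_total / n_boys
print(f"smallest possible average: ${min_mean:.2f}")            # $4.00
print("is a $3.18 average possible?", 3.18 >= min_mean)         # False
```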

(5) What do you make of the fact that the average computed for the boys is mistaken?

Carolyn Dobler writes: "I recently ran across an interesting article that I thought would be useful in a CHANCE course."

The association between alcohol and breast cancer: popular press coverage of research.
American Journal of Public Health, August 1995, Vol 85, No 8, 1082-1086
Florence Houn and others

This is a report of a study designed to see how well the popular press reports a scientific issue. As a case study, the authors chose the association between alcohol and breast cancer. This is an issue of considerable interest to the public and one where there have been conflicting studies. The authors searched for articles on this topic in scientific journals and newspapers and magazines published between January 1, 1985 and July 1, 1992. The press articles were analyzed to find which medical articles were publicized and what information was reported.

The authors found 58 scientific articles, 64 newspaper articles and 23 magazine articles related to this issue. The press cited 11 (19%) of the scientific articles. Three studies were discussed in 77% of the press stories. No scientific review articles were reported. Sixty-three percent of the stories gave behavioral recommendations.

The authors discuss difficulties that the press had with different ways of representing risk, bias in the journals cited, and other issues that would interest anyone teaching a Chance course.

Science writers often remark that it is important to look at all studies but, at least on this issue, they did not practice what they preach. The authors encourage the press to give the public a broader understanding of public issues.


(1) The authors found that the way risk was conveyed was not always clear. For example, one expert is quoted as saying: "Sixty percent is a big-sounding effect, but a small number in epidemiological terms." What does this mean?

Another paper writes: "An American woman faces a 10% chance of developing cancer at some time in her life; a 50% increase in that risk would mean her chances of developing the disease were 15%." How is this?

One magazine explained the added risk of breast cancer found in a study as: "The increased risk in breast cancer found is roughly comparable to the elevated risk associated with having a first baby after age thirty rather than before twenty." Is this comparison method a useful technique for explaining risk?
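Both quotations turn on the difference between a relative and an absolute change in risk. A minimal sketch: the 10% baseline comes from the newspaper example, and applying the expert's 60% to that same baseline is only an illustrative assumption; the helper name is hypothetical.

```python
def with_relative_increase(baseline: float, relative_increase: float) -> float:
    """Return the new absolute risk after a relative increase (0.5 means +50%)."""
    return baseline * (1 + relative_increase)

baseline = 0.10  # lifetime risk used in the newspaper example
for rel in (0.5, 0.6):
    new = with_relative_increase(baseline, rel)
    print(f"+{rel:.0%} relative: {baseline:.0%} -> {new:.0%} "
          f"(absolute change {100 * (new - baseline):.0f} percentage points)")
```

A "50% increase" moves a 10% risk to 15%, an absolute change of only 5 percentage points.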

(2) Given that this is an issue of concern to the public and there have been contradictory studies, why do you think none of the review articles were discussed in the popular press? (There were 15 such articles among the 58 scientific articles found.)

(3) Articles in two journals were cited in the press much more frequently than those in any other journal. Which journals were these and why were they cited so frequently?

Bob Griffith suggested the following article and discussion questions relating to it.

Using your head may not always be a good thing.
Milwaukee Journal Sentinel, Monday August 28, 1995, G8
John Fauber

A study, reported at the American Psychological Association annual convention earlier this month, found that young soccer players who frequently (more than 10 times a game) used their head to hit a soccer ball took significantly more time to finish a test measuring attention, visual searching and mental flexibility when compared with players who headed the ball infrequently or not at all.

The study reported that the frequent headers also had lower average IQ scores (103 vs. 112). Their scores on the ability to hold their attention seemed to suffer the most. Ten of the 17 players who were frequent headers scored in the "impaired" range, compared with only 3 of the 19 who headed infrequently or not at all. Despite the lower scores, the frequent headers still scored within the normal range on their tests.
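A natural classroom follow-up is to ask whether 10 of 17 versus 3 of 19 could easily be a chance difference. The sketch below applies Fisher's exact test, our choice of method rather than anything reported by the study, to the counts quoted above; the function name is hypothetical.

```python
from math import comb

def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups, columns are (impaired, not impaired). With both margins
    fixed, this is the chance that the first group has a or more impaired
    players if group membership had nothing to do with impairment.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    tables = comb(n, row1)
    # Sum hypergeometric probabilities for k = a, a+1, ... up to the largest possible count.
    extreme = sum(comb(col1, k) * comb(n - col1, row1 - k)
                  for k in range(a, min(row1, col1) + 1))
    return extreme / tables

# Frequent headers: 10 of 17 impaired; infrequent or non-headers: 3 of 19 impaired.
p = fisher_one_sided(10, 7, 3, 16)
print(f"one-sided p-value: {p:.4f}")
```

The p-value comes out below 0.01, so chance alone is an unlikely explanation of the gap. It says nothing, however, about whether heading causes the difference, which is what the questions below ask about.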

When this study was reported in the news, it caused concerned parents from all over the world to write and call the investigators and others for advice.

This article gives a number of reactions to the report from Milwaukee parents of children who play soccer. Here is one reaction:

Gita Baruah closes her eyes and turns the other way every time her 12-year-old son tries to hit the ball with his head. Baruah has reason to be concerned. As a doctor specializing in physical medicine and rehabilitation, she works with patients recovering from brain injuries. The recent soccer study only reinforced her concerns about heading.

But she said she is not about to pull her son out of soccer, especially given the small number of subjects in the study (60) and the lack of evidence that heading causes any permanent impairment.


(1) The author of this newspaper column uses anecdotes and quotations to show that people should still take due caution in acting on the results of this study. In fact, one of the study's two authors is quoted as saying that: "More study is needed. It's too soon to pull your child out of soccer." What kind of additional evidence would be necessary to show that "heading" a soccer ball causes impairment of intellectual functions?

(2) What alternative explanations can you think of that would account for a relationship between heading a soccer ball and mental impairment? The author of this column does not explain any such reasons to readers. How would you explain them, in everyday terms?

(3) One father says that his two sons "head" soccer balls but show no intellectual impairment, perhaps because they "head" the ball correctly. By itself, does such evidence show that "heading" does not impair intellectual functions?

Tom Moore suggested this article.

Rape is still underreported.
The New York Times, 26 Aug. 1995, I19
Lynn Hecht Schafran

In passing the crime bill last year, the Democratic Congress relied on a 1992 study called "Rape in America." This study, financed by a grant from the National Institute on Drug Abuse, reported 683,000 rapes a year. Justice Department data for the same period estimated only 150,000 rapes a year.

A new Justice Department report, covering 1992-1993 and released after the crime bill was signed, concluded that there were 500,000 incidents of sexual assault a year that included 310,000 rapes or attempted rapes. The Justice Department survey, conducted annually, interviews 100,000 people. Before 1992, the survey had asked only general questions about attacks and threats, leaving it to each interviewee to mention rape. The new survey asks whether the person was raped or sexually assaulted in the prior year and whether the assailant was a stranger, a casual acquaintance or someone the victim knew well.

Besides doubling the estimated number of rapes and attempted rapes, the revision raised the estimated share of rapes committed by someone known to the victim from 50% to 80%.

While the improved Justice Department survey brings its results more in line with those of the "Rape in America" survey, the "Rape in America" survey had other methodological advantages suggesting that its figures are the most accurate. The author remarks that, while discussing the methodological differences is important, the real problem is that there are too many rapes.


(1) Why do you think asking more direct questions significantly changed the estimate for the percentage of rapes that were committed by someone known to the victim?

Ruma Falk enjoyed the following paper and thought our readers would also.

Confessions of a coin flipper and would-be instructor.
The American Statistician, Vol. 49, No. 2, May 1995, 203-209
Clifford Konold

Cliff Konold discusses the value of simulation in teaching probability. The emphasis is not on just getting an answer to a probability problem by simulation but rather on motivating students to raise problems about probability experiments and to think about how they might be solved. The basic example used to illustrate this is the following coin tossing question:

A coin is tossed until either the pattern HHHHH or HTHHT turns up. Which, if either, pattern is more likely to appear first?

Cliff describes in detail how he explored this question with a student, Kim Davis. Initially, Cliff thought the two patterns were equally likely to appear first, while the student thought that HTHHT was more likely to appear first. They discuss the reasons for their beliefs and carry out simulations: they toss a coin until HHHHH occurs, then toss it again until HTHHT occurs, and record the number of tosses required for each pattern. They repeat this a number of times, with Kim betting that HTHHT will require fewer tosses.

There is sufficient variation in the results to allow them to hold their different beliefs for quite a while. Eventually, the issue is resolved by simulation and by theoretical reasoning. Along the way they learn a lot about raising questions, exploring answers, and changing beliefs about the solution to a probability problem.

In the end, Cliff learns that he was wrong. However, he remarks that it is hard to change your original conjecture, and he ends the article worrying about the following: if a coin is tossed a large number of times, a window covering only the last 5 tosses should show each pattern about the same proportion of the time. So why don't HHHHH and HTHHT occur about equally often? Of course, to fit their original question, only patterns that do not overlap are counted, and this rules out more HHHHH patterns than HTHHT patterns, giving another way to see that Kim had the correct answer.
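The race question can be explored the way Cliff and Kim did, by simulation. The sketch below (the function name and seed are our own choices) races the two patterns within a single sequence of fair-coin tosses, which is the question as posed above, rather than the separate-sequences comparison they ran.

```python
import random

def first_to_appear(patterns, rng):
    """Toss a fair coin until one of the patterns appears; return that pattern."""
    width = max(len(p) for p in patterns)
    recent = ""
    while True:
        # Keep only the last `width` tosses; that is all the patterns can look at.
        recent = (recent + rng.choice("HT"))[-width:]
        for p in patterns:
            if recent.endswith(p):
                return p

rng = random.Random(1995)   # fixed seed so the run is repeatable
trials = 20_000
wins = sum(first_to_appear(("HHHHH", "HTHHT"), rng) == "HTHHT" for _ in range(trials))
print(f"HTHHT came first in {wins / trials:.3f} of {trials} races")
```

The estimate should settle near 5/8 = 0.625, the exact value given by Conway's leading-number formula for pattern races (a different technique, not used in the article).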

My favorite way to solve this pattern problem is due to S-Y. R. Li ("A Martingale Approach to the Study of Occurrence of Sequence Patterns in Repeated Experiments," Annals of Probability, vol. 8 (1980), pp. 1171-1176). Here is how Li solves the problem.

Suppose you want to find how long, on average, it takes to get the pattern HHH for the first time. Consider the following casino game. A coin is tossed a sequence of times, and before each toss a gambler with $1 enters the game. He bets $1 that heads will come up on the next toss. If he loses, he leaves. If he wins, he bets his $2 that heads will come up on the next toss. If he loses, he leaves. If he wins, he bets his $4 that heads will come up next time. If he loses, he leaves. If he wins, the pattern HHH has occurred, the game is over, and he has won $8.

If the pattern HHH occurs for the first time on the kth toss, the gambler who entered just before the kth toss has won $2, the gambler who entered before the (k-1)st toss has won $4, and the gambler who entered before the (k-2)nd toss has won $8. Everyone else lost, but only their original dollar. Thus the casino paid out $2 + $4 + $8 = $14.

Now each bet of every gambler is a perfectly fair bet, so the overall game is fair. Thus the expected amount the casino takes in equals the expected amount it pays out. (This is where martingale theory is used.) The casino took in $1 from each gambler, so the expected amount it took in equals the expected number of tosses required to first get the pattern HHH. Since it paid out $14, this expected time is 14. If we had been trying to get the pattern HTH, the next-to-last gambler would have bet on heads and lost, so the casino would have had to pay out only $8 + $2 = $10. Thus the expected time to reach the pattern HTH for the first time is only 10. For THH, both of the last two gamblers would bet on tails and lose, so the expected time to reach THH is only 8.
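Li's casino argument turns into a short rule for a fair coin: the expected waiting time is the sum of 2^k over every k for which the last k letters of the pattern match its first k letters (those are exactly the gamblers still holding money when the game stops). A minimal sketch, with hypothetical function names and a fair coin assumed, plus a simulation as a sanity check:

```python
import random

def expected_wait(pattern: str) -> int:
    """Casino argument for a fair coin: add 2**k for every k such that the
    last k letters of the pattern equal its first k letters."""
    n = len(pattern)
    return sum(2 ** k for k in range(1, n + 1) if pattern[n - k:] == pattern[:k])

def simulated_wait(pattern: str, runs: int, rng: random.Random) -> float:
    """Average number of fair-coin tosses until the pattern first appears."""
    total = 0
    for _ in range(runs):
        recent, tosses = "", 0
        while not recent.endswith(pattern):
            recent = (recent + rng.choice("HT"))[-len(pattern):]
            tosses += 1
        total += tosses
    return total / runs

for pat in ("HHH", "HTH", "THH", "HHHHH", "HTHHT"):
    print(pat, expected_wait(pat))        # 14, 10, 8, 62, 36

sim = simulated_wait("HHH", 20_000, random.Random(0))
print(f"simulated mean wait for HHH: {sim:.2f}")
```

Applied to the Konold patterns, the rule gives 62 tosses on average for HHHHH but only 36 for HTHHT, which fits Kim's answer.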

Here is another tossing coins issue:

Game of chance.
New Scientist, 26 Aug. 1995, Letters, 52
Robert Beech

In a letter to the "New Scientist" (July 29, 1995, 50), Andrew Chester defends the use of coins to determine random assignments. He remarks that, while he has used tables and computer-generated random numbers, as a last resort he has also used his favorite coin to allocate different experimental treatments. In describing what he means by a "good coin" he remarks:
Imagine now that I attach a sticky object, of appreciable size and mass, to one side--a 5p coin stuck onto a disc similar to a 10p coin will do. Do you now expect the outcome of repeated tossing of the "doctored coin" to be statistically "correct"? No, of course not.

In replying to this letter, Robert Beech writes:

Philosophy untested by experiment is dangerous. I make no claim to statistical significance, but, until boredom set in, I tossed such a combination 250 times. The scores were: 5p coin uppermost, 125 times; 10p coin uppermost, 125 times.
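Beech's disclaimer about statistical significance can be made concrete. A rough sketch using the normal approximation to the binomial (our choice of method, not Beech's): even a perfect 125-125 split leaves a wide range of plausible probabilities.

```python
from math import sqrt

heads, n = 125, 250                        # Beech: 5p side up 125 times in 250 tosses
p_hat = heads / n
se = sqrt(p_hat * (1 - p_hat) / n)         # standard error of the observed proportion
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se   # approximate 95% interval
print(f"estimate {p_hat:.3f}, approximate 95% interval ({low:.3f}, {high:.3f})")
```

The interval runs from about 0.44 to 0.56, so 250 tosses could not distinguish a fair coin from one that lands 5p-up, say, 55% of the time.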


(1) J. L. Doob once remarked to the writer: "If you believe Newton's laws have anything to do with what happens when you toss a coin, then having one side of a coin heavier than the other should not change the probability of getting heads." Why do you think he said this? Is he correct?

(2) Make your own experiment and let us know the result.

Wringing "The Bell Curve".
Chance, Summer 1995, 27-36
Bernie Devlin, Stephen E. Fienberg, Daniel P. Resnick, Kathryn Roeder

This is a review of the book "The Bell Curve" by Richard Herrnstein and Charles Murray (referred to as H&M). I will refer to the authors of the review as DFR&K. In addition to a summary of what is in the book, these well-known statisticians set out their concerns about the statistical methods employed in the book. A more technical article is to appear in the "Journal of the American Statistical Association." While the amusing title might suggest otherwise, this is not the usual "Bell Curve" bashing but rather a serious attempt to evaluate the findings in this controversial book.

DFR&K start with the observation that the distribution of intelligence scores is not even approximately normal. They remark that the transformations H&M made to force the scores into a normal shape are unnecessary and, in fact, misleading for the statistical techniques used. They suggest that the transformations were made to justify a catchy title for the book, to make its graphs more attractive, and to allow H&M to pretend they are explaining the complex statistics needed for their book.

Next DFR&K review the notion of general intelligence and then address the question: Is IQ inherited? They first point out that estimates of heritability from twin studies have a bias that leads to overestimation. This is because of the shared environment of the womb before birth and, in some studies, additional environmental effects after birth.

They also state that geneticists distinguish two versions of heritability, called narrow sense and broad sense. DFR&K assert that twin studies report the broad sense, while H&M need the narrow sense for their arguments. Going back to the original studies, they calculated the narrow-sense heritability and found it closer to .3 than to the .6 to .8 suggested by H&M. They give reasons why even this modest .3 may be too high an estimate.
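For reference, here are the standard textbook definitions behind the two terms (stated from general quantitative-genetics usage, not from the review, and in the simplest model that ignores gene-environment correlation and interaction). The phenotypic variance V_P of a trait is split into a genetic part V_G and an environmental part V_E, and V_G is split further into additive (V_A), dominance (V_D) and interaction (V_I) components:

```latex
\begin{aligned}
V_P &= V_G + V_E, \qquad V_G = V_A + V_D + V_I,\\
H^2 &= \frac{V_G}{V_P}\ \text{(broad sense)}, \qquad h^2 = \frac{V_A}{V_P}\ \text{(narrow sense)}.
\end{aligned}
```

Because V_A is only part of V_G, h^2 can never exceed H^2, which is why a narrow-sense figure can sit well below broad-sense figures from twin studies.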

DFR&K turn next to the regression analysis H&M applied to the NLSY (National Longitudinal Survey of Youth) data to show that IQ is the best predictor for most outcomes of interest to them. Here they state that detailed analyses of the appropriateness of H&M's models are being carried out by other statisticians, and preliminary indications are that there are real problems with these models. For example, they remark that "H&M omitted the variable education and when it is added it appears to swamp the effects of IQ." They also criticize H&M for not making clear the difference between association and causation.

DFR&K conclude their article with the remark that Congress should not rely on the validity of the arguments advanced in "The Bell Curve" in its consideration of welfare and other related legislation.


(1) What do you think the distribution of raw intelligence test scores looks like?

(2) DFR&K remark that "The Bell Curve" ends with a set of social conclusions that could not have been timed any better for publication. What do you think they meant by this?

Statistically speaking, it's unbreakable.
Baltimore Sun, 5 Sept. 1995, 8C
Buster Olney

This is the first of two articles in the current news that discuss the probability that the Cal Ripken streak of playing in 2,131 consecutive major league baseball games will occur again.

The author discusses an estimate suggested by Dr. Richard Larsen, a statistics teacher at Vanderbilt and an avid baseball fan.

Larsen remarks that there are many ways one might try to compute this probability and suggests that "one way to judge the unlikelihood of Ripken's streak is to calculate the probability that today's next most durable shortstop, San Francisco Giants shortstop Royce Clayton, would replicate the feat." This season Clayton appeared in 111 of the Giants' first 113 games (as of August 28), or 98.2 percent of the games. Treating each game as an independent trial, Larsen then estimates that the probability of appearing in the next 2,130 games is .982 to the 2,130th power, which makes Clayton's chances less than 2 in 10,000,000,000,000,000.

Larsen remarks that this computation somewhat overstates the unlikelihood of Ripken's streak, since the streak could occur anytime during Clayton's career and could be achieved by another player. He observes that 2,130 games constitute a major portion of a player's total number of games in the majors; so, unlike a hitting streak, it could not occur at too many places during a player's career. On the second point he remarks that, even if you considered the 6 or 7 thousand position players since the game began, you would still have an astronomically small number after multiplying his estimate by 7000.
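Larsen's calculation is easy to reproduce. A minimal sketch of his independent-games model; the exact 111/113 rate is included only to show how much the rounding matters, and the factor of 7,000 follows his second point.

```python
# Larsen's model: every game is an independent trial with the same chance of playing.
p_per_game_rounded = 0.982       # the rounded rate used in the article
p_per_game_exact = 111 / 113     # Clayton's actual 111 of 113 games
games = 2_130

p_rounded = p_per_game_rounded ** games
p_exact = p_per_game_exact ** games
print(f"0.982 ** 2130     = {p_rounded:.2e}")
print(f"(111/113) ** 2130 = {p_exact:.2e}")
# Larsen's second adjustment: allow for about 7,000 position players.
print(f"times 7,000       = {7_000 * p_rounded:.2e}")
```

Both single-player values lie below the 2 in 10,000,000,000,000,000 (2 × 10^-16) quoted above, and even after multiplying by 7,000 the result stays astronomically small, as Larsen says.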


(1) What do you think of Larsen's method of estimating the probability that Ripken's streak will be beaten?

(2) Do you think that Ripken's streak was an event of probability less than 2 in 10,000,000,000,000,000?

(3) A colleague said "a model that estimates such impossible odds for an event that has happened twice in our lifetime cannot be a reasonable model." Do you agree?

(4) How would you estimate the probability that Ripken's streak will be beaten?

The last Ripken; After 2,131 articles, the statistical truth, he's impossibly good.
Washington Post, 10 Sept. 1995, C5
David Leonhardt

Leonhardt discusses Cal Ripken's streak from a more philosophical view. He starts by asking us to decide "which, among the records we most admire, are, from a statistical viewpoint, truly amazing?"
He comments that Stephen Jay Gould (The New York Review of Books, v. 35, Aug. 18, 1988, pp. 8-10) claimed that DiMaggio's 56-game hitting streak in 1941 is the greatest accomplishment in the history of baseball. Editor's comment: Scott Berry (Chance Magazine, Fall 1991, pp. 8-11) argued that the feats of Joe DiMaggio's hitting streak and Ted Williams's .406 batting average are comparable.

Leonhardt then asks "where does Ripken fit in?" He comments that the independent-trials model used in the calculations for the DiMaggio streak is not appropriate, since injuries cause players to miss games in bunches. This leads him to suggest estimating the probability that someone will play a full season. Here he gets some help from John Rickert, a mathematician at the Rose-Hulman Institute of Technology in Indiana.

Looking at the past century, Rickert estimates that, among the serious players, about 1 in 10 completes a season without missing a game. Thus the probability that such a player goes thirteen seasons without missing a game is 1/10 raised to the 13th power, or about one in 10 trillion. This is much smaller than the estimates for the DiMaggio streak. The most recent estimate of the probability of the DiMaggio streak is presented by Giles Warrack in the current Chance Magazine (Chance, Summer 1995, pp. 41-51); Warrack estimates this probability to be about 1 in 3,700.
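Rickert's figure comes from multiplying thirteen season probabilities, which treats the seasons as independent (the assumption question (2) below asks about). A short exact-arithmetic sketch comparing it with Warrack's DiMaggio estimate:

```python
from fractions import Fraction

p_season = Fraction(1, 10)       # Rickert: about 1 in 10 serious players plays every game in a season
seasons = 13
p_ripken = p_season ** seasons   # multiplying assumes the seasons are independent
p_dimaggio = Fraction(1, 3_700)  # Warrack's estimate for the 56-game hitting streak

print(f"thirteen full seasons: 1 in {int(1 / p_ripken):,}")
print(f"DiMaggio's streak:     1 in {int(1 / p_dimaggio):,}")
print(f"DiMaggio's streak is about {float(p_dimaggio / p_ripken):.1e} times as likely")
```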

Leonhardt remarks that a player's attitude appears to play a bigger role in a game streak than in a hitting streak. For example, a player might well take a day off voluntarily, ending a game streak, but would not be apt to end a hitting streak voluntarily.

He asks the question "Is Cal the king of the record book?" and answers it by saying "Unfortunately, it turns out, there is no answer. Ripken's record is too different from the others to be evaluated in the same way."

However, as his title suggests, Leonhardt clearly thinks that Ripken is "impossibly good."


(1) What do you think are the differences between the problem of assigning a probability for a game streak record as compared to that for a streak of hits in consecutive games?

(2) Do you think it is reasonable to assume that the event "Prosser plays in every game next year" and the event "Prosser plays in every game the year after next" are independent?

(3) How do the odds for Ripken's streak, given in this article, compare with those given in the previous article?

The A-level question; do recent record passes reflect a rise in standards?
Daily Mail, 22 Aug. 1995, 46
Alan Smithers

The pass rate for the A-level exam in England set a record for the 14th year in a row. In the early years the pass rate was set at 70 percent and, even after this strict quota was removed, the examiners aimed at this rate. It is suggested that in the 1980s more "positive grading" was started by examiners who felt that it was not reasonable that 30% of the students who had already done well in difficult exams at age 16 should fail at age 18.

Another reason given for the increase is that the examining boards are businesses and compete for business. "As one examination board has left out some of the harder bits in syllabuses, so the others have tended to follow suit. In mathematics, for example, mechanics has largely given way to statistics."

All this has led to an inquiry into the boards' procedures by the Office for Standards in Education.


(1) Do you think that mechanics is harder than statistics?

(2) Would you expect university mathematics teachers in England to agree that they are getting better-prepared students each year?

US News & World Report's annual college rankings under attack.
Boston Globe, 8 Sept. 1995, 5
Alice Dembner

The new version of the U.S. News & World Report's annual college rankings is out. Critics argue that the ranking is arbitrary, based partly on subjective judgments and partly on imperfect measures such as SAT scores.

Last spring the "Wall Street Journal" exposed blatant inflation of SAT scores, graduation rates and other data by colleges attempting to improve their ranking and image (See Chance News 4.06). Under pressure from the U.S. News, about 50 schools changed the way they reported SAT scores. This led to lower scores for some schools. For example, Northeastern reported an SAT average of 942, rather than 996, last year. About 50 others, who were not reporting as requested, held out and were chastised in footnotes and saw their ratings drop.

As usual, Harvard led the universities, followed by Princeton and Yale, tied for second, and then Stanford and M.I.T. Amherst topped the liberal arts colleges.


(1) How important were ratings like those of the U.S. News in your choice of college?

(2) Do you think that it is reasonable to try to give a ranking to colleges and universities?

(3) In a new category, teaching, Dartmouth was ranked highest. How do you think they measured good teaching?

Drug combination may offer abortion option.
Los Angeles Times, 31 Aug. 1995, A1
Terence Monmaney

A study reported in the current "New England Journal of Medicine" showed that a combination of two drugs was 96% effective in inducing abortion in the first nine weeks of pregnancy. The drugs are methotrexate, an anti-cancer drug that is toxic to fast-multiplying tissue, including fetal tissue, and misoprostol, an anti-ulcer drug that causes uterine contractions, which can expel an embryo. Both drugs are widely available, having been approved by the FDA for other purposes, and the FDA does not stop physicians from prescribing drugs for reasonable "off-label" purposes.

Those who favor abortion rights approve of methods that would allow women to obtain abortions through visits to their family doctors or internists as well as to their ob/gyns. Anti-abortion forces are very critical of the drug approach.

The author of the study, Dr. Richard Hausknecht, reported last fall that he had performed more than a hundred abortions using the two drugs. Some doctors criticized him for not doing this under the conditions of a properly supervised study. The study Hausknecht now reports on was overseen by the Mt. Sinai School of Medicine and the FDA. Hausknecht administered the drugs to 178 healthy women who had been pregnant for no more than nine weeks. Of these, 88% aborted within 24 hours, 8% required a second dose of misoprostol several days later, and the 4% who did not abort even after the repeated dose underwent a surgical abortion.

The Deputy Commissioner of the FDA, Mary Pendergast, said: "We encourage women and their doctors who want to administer the drugs for abortion to do so in a scientific trial. This is still experimental."

The life-saver docs ignore; the miraculous properties of aspirin.
Daily Mirror, 28 Aug. 1995, 17
Jill Palmer

The author reports that Professor Richard Peto of Oxford University says that aspirin is not regularly prescribed, despite drug trials which showed it improved survival rates of stroke and heart-attack victims by as much as a quarter. Peto headed the largest such study.

Peto feels that if aspirin were 100 times its present price it would be heavily promoted and its use much more widely encouraged.

Small doses of aspirin have been found to benefit other conditions, including angina, mini-strokes, irregular heartbeat, diseases of the veins, and colon cancer.

Doctors continue to warn that healthy people should not take aspirin simply as a precautionary measure because of the possibility of rare side effects such as internal bleeding.

In West, fight on speed laws has few limits.
The Boston Globe, 22 August 1995, 1
Brian McGrory

The article focuses on Montana, where, if Congress decides next month to repeal the 65 mph national speed limit, state law allows daytime speed limits to be completely abolished. (Nighttime limits would remain at 65 mph.) Chuck Hurley, senior vice president with the Insurance Institute for Highway Safety in Arlington, VA, insists that increased speeds will bring increased highway fatalities. He states that about 400 deaths a year are attributable to the 1987 increase in rural speed limits from 55 mph to 65 mph; this death toll represents an increase of 15-20% from the period before the law.

Hurley's figures are presumably national. A data graphic accompanying the article shows annual highway deaths in Montana for the years since 1972, as reported by the Montana Highway Patrol. The first and last years' totals are labeled exactly; the rest of the figures below are my best reading of the graph (vertical axis spacing is 50):

Year   Deaths      Year   Deaths
1972    395        1984    240
1973    325        1985    225
1974    300        1986    225
1975    295        1987    235
1976    300        1988    200
1977    320        1989    185
1978    275        1990    220
1979    330        1991    200
1980    325        1992    190
1981    340        1993    195
1982    255        1994    202
1983    285

During 1972-73, there was no speed limit; from 1974-1987 there was a 55 mph limit; since 1987 the limit has been 65 mph. Enforcement is not strict: a speeding ticket carries only a $5 fine, and by state law insurance companies are not allowed to use speeding violations in setting rates.
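For question (3), a first step is to average the graph readings within each speed-limit period. A minimal sketch: the figures are the approximate readings listed above (only 1972 and 1994 are exact), and leaving out 1987 as a changeover year is our assumption, not the article's.

```python
# Montana highway deaths as read from the Globe graphic (only 1972 and 1994 are exact).
deaths = {
    1972: 395, 1973: 325, 1974: 300, 1975: 295, 1976: 300, 1977: 320,
    1978: 275, 1979: 330, 1980: 325, 1981: 340, 1982: 255, 1983: 285,
    1984: 240, 1985: 225, 1986: 225, 1987: 235, 1988: 200, 1989: 185,
    1990: 220, 1991: 200, 1992: 190, 1993: 195, 1994: 202,
}

def mean_deaths(first: int, last: int) -> float:
    """Average annual deaths over the inclusive year range first..last."""
    values = [deaths[y] for y in range(first, last + 1)]
    return sum(values) / len(values)

# 1987 is left out as a changeover year (an assumption for this sketch).
for label, first, last in [("no limit", 1972, 1973),
                           ("55 mph", 1974, 1986),
                           ("65 mph", 1988, 1994)]:
    print(f"{label:>8} ({first}-{last}): {mean_deaths(first, last):.1f} deaths/year")
```

The averages fall from period to period, but raw death counts do not account for changes in traffic volume, vehicles, or the other safety measures Mr. Hurley lists, so by themselves they cannot settle the question.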


(1) Col. Craig Reap, head of the Montana highway patrol, says studies have shown that the average car on a Montana highway travels 71.6 mph. How do you think such an estimate is made?

(2) Mr. Hurley is quoted as saying: "There have been some remarkable gains in highway safety in the last few years. We've had a drunk driving campaign, more seat belt laws and use, child safety seats, air bags. A lot of these hard-won gains are being given up to higher speeds, as if speed was a safe thrill." How could these gains be documented if fatalities are in fact up?

(3) Compare the pre-1987 years with the post-1987 years in Montana. Is lax enforcement a boon to safety? How do you think Mr. Hurley would explain these figures?


SAT scores rise strongly after test is overhauled.
Washington Post, 24 Aug. 1995, B1
Steve Stecklow

Scores on the SAT had the largest increase in a decade this year. The average verbal score increased 5 points to 425 while the average math score rose 3 points to 482.

The new averages for the high school class of 1995 are the first results from the new "SAT I: Reasoning Test," which represents a radical overhaul of the SAT. The new test has fewer questions, longer reading passages, fewer multiple-choice math questions and no antonym section in the verbal part. Students have an extra 30 minutes to take the test and are encouraged to use calculators during the math section.

College Board officials say that they scaled scoring on the new test so that the results are comparable with those of prior years, and they attribute the increase to students working harder and taking harder courses. Others have different explanations. The U.S. Department of Education, in its 1995 report, asserts that only the math scores on the new SAT are comparable with those of past years, noting that the big changes were in the verbal part of the test. Most of the increase came from higher scores among students near the top of their class. The Board says that these are the students who are working harder; critics say that top students did not have time to show their stuff on the old exam but, with the extra half hour, they now do. Average scores on the American College Test (ACT) for the class of 1995 showed no increase at all.

Coaching companies say that the increase is due to their efforts. Kaplan, for example, saw a 50% increase in its SAT business when the test changed, presumably because students were more worried than usual about facing a new kind of exam.


(1) Which of the reasons given do you think best explains the increase? What other explanations can you think of?

(2) How can scores on the new SAT I be compared with scores on the SAT over the last ten years?

(3) What will happen to such comparisons when the scores are recentered for the high school class of 1996?

Ask Marilyn
Parade Magazine, 20 August 1995, 16
Marilyn vos Savant

A reader asks:

After a wonderful life on earth, you find yourself standing at the Pearly Gates in front of St. Peter. He tells you that, to get inside, you must predict the outcome of the next spin of a black/red roulette wheel. The last 20 spins have ended in black. Although I know you're correct when you say the odds are still 50/50 on each spin, I'd side with probability here and choose red. Which would you choose?

Marilyn explains (implicitly assuming a Bernoulli trials process with p = 0.5) that probability in fact does not point to red; the chances are equal on the next spin. However, she adds, knowing that, she would pick black. If we really are looking at independent 50-50 trials, then the choice doesn't matter. On the other hand, if there really is information to be gleaned from the history so far, it would seem to indicate a bias towards black.
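Marilyn's second point can be made quantitative. The sketch below uses one standard approach, swapped in here and not taken from the column: Laplace's rule of succession, which assumes a uniform prior on the wheel's unknown chance of black and, like the letter, ignores the green zero(s) of a real roulette wheel. It compares how unlikely the run is for a fair wheel with the updated chance of black on the next spin.

```python
from fractions import Fraction

n_black = 20  # consecutive blacks observed

# Chance of 20 blacks in a row from a fair wheel (p = 1/2 on each spin).
p_run_if_fair = Fraction(1, 2) ** n_black

def rule_of_succession(k, n):
    """Laplace's rule: with a uniform prior on the chance of black, after
    k blacks in n spins the probability that the next spin is black is
    (k + 1) / (n + 2)."""
    return Fraction(k + 1, n + 2)

p_next_black = rule_of_succession(n_black, n_black)
print(p_run_if_fair, p_next_black)  # 1/1048576 21/22
```

Under these assumptions a run of 20 blacks pushes the estimated chance of black on the next spin to 21/22; a more skeptical prior, concentrated near 1/2, would move it much less. Either way the history points toward black, not red.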


(1) What would you do?

Survey raises estimate on toddler vaccinations.
The New York Times, 25 August 1995, A14
Warren E. Leary

A survey by the federal Centers for Disease Control and Prevention (CDC) in Atlanta has found that, nationally, 75% of children 19-35 months old are properly vaccinated. The data are based on more than 25,000 telephone interviews, conducted from April to December 1994, in which parents were questioned and immunization records were verified. To obtain these interviews, more than 1.2 million telephone numbers were called.

Previous estimates of immunization rates were based on data from the National Health Interview Survey, an ongoing study based on household interviews, which had suggested a national rate of 67%. Dr. Walter Orenstein of the CDC said that much of the rise in the new estimate can be attributed to improved sampling techniques.


(1) Improved technique?! Getting 25,000 interviews from 1.2 million phone numbers sounds like an abysmal response rate. What do you think is going on here?

(2) How do you think Dr. Orenstein knows that the larger estimate is due to sampling technique and not to an actual surge in immunizations?

Health Sense: The day that many women dread the most.
The Boston Globe, 14 August 1995, 25
Judy Foreman

"The day" is the yearly mammogram screening for breast cancer. The article presents statistical evidence intended to help women cope with the anxiety surrounding this event. Dr. William Black, a radiologist at Dartmouth-Hitchcock Medical Center, sent a questionnaire to 145 women in their 40's and documented a wide variety of hopes and fears. Women overestimated their risk of dying of breast cancer in the next ten years more than twenty-fold, and they thought that screening was six times more effective at reducing risk than it actually is.

Cindy Pearson of the National Women's Health Network says the anxiety is worth it: regular screening of women over 50 cuts the chance of dying of breast cancer by one-third. She says: "Nothing else we know how to do saves that many lives."

The article encourages women to "put statistics in their place...remember that the numbers, overall, are on your side." As for the famous figure that breast cancer will strike one woman in eight, the article says that many women interpret this to mean that, if they are sitting in the waiting room with seven other women, one will get bad news. Noting that the one-in-eight is a cumulative lifetime risk, the article suggests looking at the numbers in a less threatening way, in terms of the number of new cases diagnosed each year. For example, a woman between 50 and 54 has one chance in 454 of being diagnosed with breast cancer in the next year (the risk increases slightly each year). A complete chart of these risks appears in the article:
Age      Annual risk: one in
40-44    770
45-49    625
50-54    454
55-59    384
60-64    303
65-69    256
70-74    238
75-79    217
80-84    222
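The annual figures can be combined into a risk over a longer span. The sketch below rests on simplifications the article does not make: each band's annual risk is treated as constant for all five years of the band, years are treated as independent, and deaths from other causes are ignored.

```python
# Annual "one in N" chances of diagnosis, from the article's first chart.
annual_one_in = {
    "40-44": 770, "45-49": 625, "50-54": 454, "55-59": 384,
    "60-64": 303, "65-69": 256, "70-74": 238, "75-79": 217, "80-84": 222,
}

def risk_over_bands(bands, years_per_band=5):
    """Chance of at least one diagnosis across consecutive age bands,
    assuming a constant annual risk within each band and independent years."""
    p_free = 1.0
    for band in bands:
        p_free *= (1 - 1 / annual_one_in[band]) ** years_per_band
    return 1 - p_free

for bands in (["40-44", "45-49"], ["50-54", "55-59"]):
    p = risk_over_bands(bands)
    print(f"ages {bands[0][:2]}-{bands[-1][-2:]}: about 1 in {1 / p:.0f}")
```

This prints about 1 in 69 for ages 40-49 and about 1 in 42 for ages 50-59.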
A companion chart shows the cumulative risk, for a woman age 20 today, of developing breast cancer by various later points in life:
By age     Cumulative risk of diagnosis
25         1 in 19,608
30         1 in 2,525
35         1 in 622
40         1 in 217
45         1 in 93
50         1 in 50
55         1 in 33
60         1 in 24
65         1 in 17
70         1 in 14
75         1 in 11
80         1 in 10
85         1 in 9
Lifetime   1 in 8
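The cumulative chart can be turned into the risk over an interval for a woman not yet diagnosed at its start: subtract the two cumulative risks and divide by the chance of being cancer-free at the start. This sketch treats the chart's rounded one-in-N figures as exact probabilities.

```python
# Cumulative "one in N" chances (for a woman aged 20 today) of a diagnosis
# by the given age, from the article's second chart.
by_age_one_in = {25: 19608, 30: 2525, 35: 622, 40: 217, 45: 93, 50: 50,
                 55: 33, 60: 24, 65: 17, 70: 14, 75: 11, 80: 10, 85: 9}

def conditional_risk(start, end):
    """P(diagnosed by age `end` | not diagnosed by age `start`)."""
    p_start = 1 / by_age_one_in[start]
    p_end = 1 / by_age_one_in[end]
    return (p_end - p_start) / (1 - p_start)

p = conditional_risk(50, 55)
print(f"50 to 55: {p:.4f}, about 1 in {1 / p:.0f}")
```

For ages 50 to 55 this prints about 0.0105 (1 in 95). For comparison, five years at the first chart's 1 in 454 annual rate gives 1 - (453/454)^5, about 0.011.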
Another set of figures puts the day of the exam into perspective. Of 1000 women (all ages, never previously screened) waiting in a room to have mammograms, about 70 will be recalled because the radiologist notes something of concern and wants a second look. Of these 70 women, 50 will be found healthy and the remaining 20 will undergo biopsies. Five or six of these biopsies will find cancer.
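The recall figures can be read as conditional probabilities. This small sketch uses the counts given above and tries both 5 and 6 cancers, since the article gives "five or six."

```python
def funnel(cancers, screened=1000, recalled=70, biopsied=20):
    """Return (P(cancer | recalled), P(cancer | biopsied), P(cancer | screened))."""
    return cancers / recalled, cancers / biopsied, cancers / screened

for c in (5, 6):
    given_recall, given_biopsy, overall = funnel(c)
    print(f"{c} cancers: {given_recall:.0%} of recalled women, "
          f"{given_biopsy:.0%} of biopsies, {overall:.1%} of all screened")
```

A woman who is called back for a second look still has well under a one-in-ten chance of having cancer: 5/70 is about 7% and 6/70 about 9%.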


(1) The final set of figures indicates that of 1000 women never previously screened, only about 6 will have breast cancer. Dr. Norman Sadowsky, director of breast imaging at Faulkner Hospital, says that this figure drops to 2 per thousand among women screened every year. Does this mean that screening actually prevents breast cancer? If not, what does it mean?

(2) Chance News 4.08 reported a 1 in 41 chance that a woman free of breast cancer at age 40 will have the disease by age 50. Can you reconcile this with the 1 in 454 annual risk given above for ages 50-54 and the 1 in 384 for ages 55-59?

(3) The first table in the article gives a 1 in 454 annual risk for ages 50-54. The second table gives cumulative risks of 1 in 50 by age 50 and 1 in 33 by age 55. Can you reconcile these figures?

Who's number 1 in college football?...and how might we decide?
Chance Magazine, Summer 1995, 7-14
Hal S. Stern

Stern remarks that college football is perhaps the only major sport that does not determine a unique champion through competition each year. There are two polls, one a survey of journalists and the other a survey of coaches. Sometimes they agree and sometimes they don't. Last season both polls ranked Nebraska ahead of Penn State, but others disagreed with this choice.

Stern suggests that statistics should be able to provide some insight into how to construct a rating. He observes that it is first necessary to decide what the rating is meant to measure. For example, is the goal to decide which team would win a playoff, or which team had the best overall record in the season under consideration? A team with weak opponents might have a great record but not do well in a playoff against other teams.

Stern suggests trying a least-squares rating. Here is the model he considers. We want a rating x(i) for the ith team and a home-team advantage H such that, when team i is at home against team j, the expected difference between the home and away scores is d = H + x(i) - x(j). If the actual difference (home score minus away score) when these teams played was y, then (y - d)^2 measures the error made by using the ratings to estimate the score difference. Summing this over all games gives the total error, and the ratings x(i) and the home advantage H are chosen to minimize the total error over all games played during the season. When last year's college scores are used to compute this rating, Penn State is the top team, Florida and Florida State are second and third respectively, and Nebraska drops to fourth.
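The least-squares rating described above can be sketched in pure Python by solving the normal equations. Assumptions of this sketch, not details from Stern's article: score differences are home minus away; every team is linked to every other through a chain of games; and, because adding the same constant to every x(i) leaves every d unchanged, the ratings are normalized to sum to zero. The three-team schedule at the end is hypothetical, chosen so that ratings 5, 0, -5 and H = 3 fit it exactly.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[k]] for k, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: are all teams linked by games?")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def least_squares_ratings(teams, games):
    """Fit ratings x(i) and home advantage H by least squares.

    games: (home, away, home_pts, away_pts) tuples. Minimizes the sum of
    (y - d)^2 with y = home_pts - away_pts and d = H + x(home) - x(away).
    """
    n = len(teams)
    index = {team: k for k, team in enumerate(teams)}  # column n holds H
    rows, targets = [], []
    for home, away, home_pts, away_pts in games:
        row = [0.0] * (n + 1)
        row[index[home]] += 1.0
        row[index[away]] -= 1.0
        row[n] = 1.0
        rows.append(row)
        targets.append(float(home_pts - away_pts))
    # Extra equation sum of x(i) = 0 pins down the additive constant; its
    # residual can always be made zero, so the fitted d's are unaffected.
    rows.append([1.0] * n + [0.0])
    targets.append(0.0)
    # Normal equations: (A^T A) beta = A^T y.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n + 1)]
           for i in range(n + 1)]
    aty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n + 1)]
    beta = solve_linear(ata, aty)
    return {team: beta[index[team]] for team in teams}, beta[n]


# Hypothetical three-team schedule (not real data).
teams = ["A", "B", "C"]
games = [("A", "B", 28, 20), ("B", "C", 24, 16),
         ("C", "A", 10, 17), ("B", "A", 14, 16)]
ratings, home_adv = least_squares_ratings(teams, games)
print(ratings, home_adv)
```

On the toy schedule the fit recovers ratings of about 5, 0 and -5 and a home advantage of about 3. For a full season with many teams, a library routine such as NumPy's `numpy.linalg.lstsq` would be the more practical way to solve the same least-squares problem.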

Stern mentions a number of possible modifications. For example, to reduce the effect of lopsided scores, he considers a modified model in which score differences greater than 20 are replaced by 20 plus the square root of the excess over 20 (so a 36-point margin counts as 20 + 4 = 24). A number of other refinements are discussed.
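The margin adjustment just described fits in a small function. Applying the same rule to negative differences (away-team wins) is an assumption of this sketch; in the modified model the damped difference would take the place of y in the least-squares fit.

```python
import math

def damp_margin(margin, cap=20):
    """Replace a score difference beyond `cap` by cap + sqrt(excess over cap).

    Negative differences are treated symmetrically (an assumption).
    """
    excess = abs(margin) - cap
    if excess <= 0:
        return float(margin)
    return math.copysign(cap + math.sqrt(excess), margin)

print(damp_margin(14), damp_margin(36), damp_margin(-29))  # 14.0 24.0 -23.0
```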

Stern recommends testing a rating system by seeing how well it predicts the outcomes of future games. He tested his proposed methods on data from professional football, predicting scores in the second half of the season with ratings computed from earlier games in the season. He finds that the first model is pretty good, predicting the winner 63.6 percent of the time. Most of the modifications he proposed reduced the rating's predictive accuracy. The Las Vegas point spread predicted the winner 66.3% of the time, even though that is not the spread's purpose: it is set to keep the money bet on the two teams more or less equal.
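The evaluation criterion above, the percentage of winners predicted correctly, can be sketched as follows: the predicted winner is the home team when d = H + x(home) - x(away) is positive. Skipping tied games and games with d exactly zero is a choice of this sketch; the ratings and scores in the usage line are hypothetical.

```python
def winner_accuracy(ratings, home_adv, games):
    """Fraction of games whose winner matches the sign of the predicted margin.

    games: (home, away, home_pts, away_pts) tuples. Ties, and games where
    the predicted margin is exactly zero, are skipped.
    """
    hits = total = 0
    for home, away, home_pts, away_pts in games:
        y = home_pts - away_pts                          # actual margin
        d = home_adv + ratings[home] - ratings[away]     # predicted margin
        if y == 0 or d == 0:
            continue
        total += 1
        if (y > 0) == (d > 0):
            hits += 1
    return hits / total if total else float("nan")

# Hypothetical ratings and later-season games (not real data).
ratings = {"A": 5.0, "B": 0.0, "C": -5.0}
later_games = [("A", "B", 27, 10), ("C", "A", 20, 17),
               ("B", "C", 14, 14), ("B", "C", 21, 7)]
print(winner_accuracy(ratings, 3.0, later_games))
```

Here the upset by C and the tie leave two correct picks out of three decided games. In a real test the ratings would be computed only from games played before those being predicted.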


(1) Why do you think the Las Vegas spread does a better job of predicting the winner than the rating schemes proposed by Stern?
