CHANCE News 4.12
(19 August 1995 to 7 September 1995)
Prepared by J. Laurie Snell, with help from William Peterson, Fuxing Hou, Jeanne
Albert and Joan Snell, as part of the CHANCE Course Project supported by the National
Science Foundation.
Please send comments and suggestions for articles to firstname.lastname@example.org.
Back issues of Chance News and other materials for teaching a CHANCE course are available
from the Chance Web Data Base http://www.geom.umn.edu/locate/chance.
If you would like to join a discussion group to share experiences using current chance
events in class please send a note to email@example.com.
I am a chemistry teacher at Round Rock High School.
Each fall my students do an activity where they find
the percentage of each color M&M in a bag of M&M's.
Since blue M&M's are now being included I need the
new color percentages for both plain and peanut.
-- Debby Reddig (Austin American-Statesman, July 1, 1995)
Ans. M&M Plain: 30% Brown, 20% Yellow, 20% Red, 10% Orange,
10% Green, 10% Blue.
M&M Peanut: 20% each of brown, yellow, red, blue; 10% each of green, orange.
M&M Peanut butter and almond: 20% each of brown, yellow, red, green, blue.
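For a classroom check like the one Debby Reddig describes, the plain-M&M percentages above can be used to simulate drawing a bag and tallying the colors. This is a sketch only; the bag size of 50 and the tally code are my own additions, not part of the activity.

```python
import random

# Stated color percentages for plain M&M's
dist = {"brown": 0.30, "yellow": 0.20, "red": 0.20,
        "orange": 0.10, "green": 0.10, "blue": 0.10}

random.seed(1995)  # reproducible classroom demo
colors = list(dist)
weights = [dist[c] for c in colors]

# Simulate one bag of 50 candies and tally the colors
bag = random.choices(colors, weights=weights, k=50)
counts = {c: bag.count(c) for c in colors}
percentages = {c: 100 * counts[c] / 50 for c in colors}
print(percentages)
```

Comparing several simulated bags against the stated percentages gives students a feel for how much a single bag can differ from the advertised distribution.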
We had a record number of suggestions for this Chance News and that is great! Here
is an amusing article suggested by Yolanda Baumgartner.
Twin brothers beat long odds: They both bowl perfect games.
Newsday, 2 Sept. 1995, Nassau and Suffolk Edition, A7
Marshal Lubine and Andrew Smith
Identical twins Jeff and Jim Lizzo both bowled perfect 300 games within minutes of
each other. Jim averaged 221 before this game and had 9 previous perfect games,
but Jeff had previously rolled only one unsanctioned 300 game.
We read: "Considering that the probability for an average league bowler having a perfect
game is one in 34,000, the odds against twins doing it at the same time are almost
incalculable. The chances of identical twins soar to one in 385 billion - high odds,
considering there are a mere 5 billion people on the planet, and not all of them bowl."
Mathematician Richard Shelp of the University of Memphis is quoted as saying "Maybe
this would happen once in 10 million years. This time it happened early in the 10
million years."
The article points out several related "coincidences": Jim picked up his ball first
and it had serial number 5S53552. Jeff picked up his three weeks later, and it had
serial number 5S53553. Jim bowled his perfect game first, and was born first (six
minutes before Jeff).
(1) How do you think the probability for an average league bowler having a perfect
game was estimated?
(2) How did the authors obtain the estimate one in 385 billion for the odds that
identical twins would both have a perfect game?
(If it is any help, in another place in the article they give these odds as one in
(3) Does the fact that there are a mere 5 billion people on the earth make odds of
1 in 385 billion meaningless?
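For question (2), one natural guess is that the authors squared the one-in-34,000 figure. A quick check of that naive computation (a sketch of one possible method, not the article's actual one) shows it gives about one in 1.2 billion, not one in 385 billion, so the published figure must fold in other factors.

```python
# Naive independence: square the one-in-34,000 chance
p_single = 1 / 34_000
p_both = p_single ** 2            # assumes the two perfect games are independent
print(f"odds against: one in {1 / p_both:,.0f}")  # about one in 1.16 billion
```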
John Mathias writes about the survey carried out by the fourth grade girl who wanted
to see if her allowance was too low (See Chance News 4.10.) I especially liked the
observation that the average allowance could not be correct!
"Girl finds salary gap could begin at home" inspired a very good class discussion
on the first day of my elementary statistics course here at Bethel College. The
questions that I asked the students to consider were:
(1) What was Beth Peres' original question? What was the process through which the
original question (hypothesis?) changed?
(2) What questions might you have about the survey?
[Leads into a good discussion of surveys: trustworthiness of participants? who responded?
what was population? what was sample?]
(3) What is your impression of the action by the National Committee on Pay Equity?
(4) Are the conclusions of Kelly Jenkins supported by the evidence found by Beth Peres?
Finally, I've not seen it mentioned anywhere else, but the average for the seven
boys as reported in this article is incorrect. It is impossible for the highest
allowance to be 10 dollars, the lowest to be 3 dollars, and for the average to be
(5) What do you make of the fact that the average computed for the boys is mistaken?
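The impossibility John Mathias points out can be checked with a one-line bound: with seven boys whose highest allowance is $10 and lowest is $3, the average cannot fall below (10 + 6×3)/7 = $4, so any reported average under $4 is arithmetically impossible. A sketch:

```python
# Minimum possible average for 7 boys given the reported extremes:
# one boy at the $10 maximum, the other six at the $3 minimum.
n_boys, high, low = 7, 10, 3
min_average = (high + (n_boys - 1) * low) / n_boys
print(min_average)  # 4.0 -- any reported average below this is impossible
```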
Carolyn Dobler writes: "I recently ran across an interesting article that I thought
would be useful in a CHANCE course."
The association between alcohol and breast cancer: popular press coverage of research.
American Journal of Public Health, August 1995, Vol 85, No 8, 1082-1086
Florence Houn and others
This is a report of a study designed to see how the popular press does in reporting
a scientific issue. As a case study, the authors chose the association between alcohol
and breast cancer. This is an issue of considerable interest to the public and one
where there have been conflicting studies. The authors searched for articles on this
topic in scientific journals and newspapers and magazines published between January
1, 1985 and July 1, 1992. The press articles were analyzed to find which medical
articles were publicized and what information was reported.
The authors found 58 scientific articles, 64 newspaper articles and 23 magazine articles
related to this issue. The press cited 11 studies, or 19%, of the scientific articles.
Three studies were discussed in 77% of the press stories. No scientific review articles were reported. 63% of the stories gave behavioral recommendations.
The authors discuss difficulties that the press had with different ways of representing
risk, bias in the journals cited, and other issues that would interest anyone teaching
a Chance course.
Science writers often remark that it is important to look at all studies but, at least
on this issue, they did not practice what they preach. The authors encourage the
press to give the public a broader understanding of public issues.
(1) The authors found that the way risk was conveyed was not always clear. For example,
one expert is quoted as saying: "Sixty percent is a big-sounding effect, but a small
number in epidemiological terms." What did this mean?
Another paper writes: "An American woman faces a 10% chance of developing cancer at
some time in her life; a 50% increase in that risk would mean her chances of developing
the disease were 15%." Is this a clear way to explain the risk?
One magazine explained the added risk of breast cancer found in a study as: "The increased
risk in breast cancer found is roughly comparable to the elevated risk associated
with having a first baby after age thirty rather than before twenty." Is this comparison method a useful technique for explaining risk?
(2) Given that this is an issue of concern to the public and there have been contradictory
studies, why do you think none of the review articles were discussed in the popular
press? (There were 15 such articles among the 58 scientific articles found.)
(3) Articles in two journals were cited in the press much more frequently than those
in any other journal. Which journals were these and why were they cited so frequently?
Bob Griffith suggested the following article and discussion questions.
Using your head may not always be a good thing.
Milwaukee Journal Sentinel, Monday August 28, 1995, G8
A study, reported at the American Psychological Association annual convention earlier
this month, found that young soccer players who frequently (more than 10 times a
game) used their head to hit a soccer ball took significantly more time to finish
a test measuring attention, visual searching and mental flexibility when compared with players
who headed the ball infrequently or not at all.
The study reported that the frequent headers also had lower average IQ scores (103 vs. 112). Their
scores on the ability to hold their attention seemed to suffer the most. Ten of the
17 players who were frequent headers scored in the "impaired" range compared with
only 3 of 19 who headed infrequently or not at all. Despite the lower scores, the frequent
headers still scored within the normal range on their tests.
When this study was reported in the news, it caused concerned parents from all over
the world to write and call the investigators and others for advice.
This article gives a number of reactions to the report, from interviewing Milwaukee
parents of children who play soccer. Here is one reaction:
Gita Baruah closes her eyes and turns the other way every time her 12-year-old son
tries to hit the ball with his head. Baruah has reason to be concerned. As a doctor
specializing in physical medicine and rehabilitation, she works with patients recovering from brain injuries. The recent soccer study only reinforced her concerns about heading.
But she said she is not about to pull her son out of soccer, especially given the
small number of subjects in the study (60) and the lack of evidence that heading
causes any permanent impairment.
(1) The author of this newspaper column uses anecdotes and quotations to show that
people should still take due caution in acting on the results of this study. In
fact, one of the study's two authors is quoted as saying that: "More study is needed.
It's too soon to pull your child out of soccer." What kind of additional evidence would
be necessary to show that "heading" a soccer ball causes impairment of intellectual
functioning?
(2) What alternative explanations can you think of that would account for a relationship
between heading a soccer ball and mental impairment? The author of this column does
not explain any such reasons to readers. How would you explain them, in everyday
language?
(3) One father says that his two sons "head" soccer balls but show no intellectual
impairment, perhaps because they "head" the ball correctly. By itself, does such
evidence show that "heading" does not impair intellectual functions?
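The 10-of-17 versus 3-of-19 "impaired" comparison in the article invites a quick significance check. Here is a sketch of a permutation test; the group sizes and counts are from the article, but the test itself and the seed are my additions.

```python
import random

random.seed(0)

# 17 frequent headers, 10 scored "impaired"; 19 infrequent, 3 impaired
frequent = [1] * 10 + [0] * 7
infrequent = [1] * 3 + [0] * 16
observed_gap = sum(frequent) / 17 - sum(infrequent) / 19

# Permutation test: shuffle the 36 labels and recompute the gap
pooled = frequent + infrequent
count = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    gap = sum(pooled[:17]) / 17 - sum(pooled[17:]) / 19
    if gap >= observed_gap:
        count += 1
p_value = count / trials
print(round(observed_gap, 3), p_value)  # gap of about 0.43; p is small
```

A small p-value says the imbalance is unlikely under chance alone, but of course it says nothing about whether heading, rather than something associated with it, is the cause.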
Tom Moore suggested this article.
Rape is still underreported.
The New York Times, 26 Aug. 1995, I19
Lynn Hecht Schafran
In passing the crime bill last year, the Democratic Congress relied on a 1992 study
called "Rape in America". This study, financed by a grant from the National Institute
on Drug Abuse, reported 683,000 rapes a year. Justice Department data for the same
time estimated only 150,000 rapes in a year.
A new Justice Department report, covering 1992-1993 and released after the crime bill
was signed, concluded that there were 500,000 incidents of sexual assault a year
that included 310,000 rapes or attempted rapes. The Justice Department survey, conducted
annually, interviews 100,000 people. Before 1992, the survey had asked only general
questions about attacks and threats, leaving it to each interviewee to mention rape.
The new survey asks whether the person was raped or sexually assaulted in the prior
year and whether the assailant was a stranger, a casual acquaintance or someone the
victim knew well.
Besides roughly doubling the estimated number of rapes and attempted rapes, the revision
changed the estimated share of rapes committed by someone known to the victim from
50% to 80%.
While the improved Justice Department survey brings its results more in line with
those of the "Rape in America" survey, the "Rape in America" survey had other methodological
advantages suggesting that its figures are the more accurate. The author remarks that,
while discussing the methodological differences is important, the real problem is that
there are too many rapes.
(1) Why do you think asking more direct questions changed significantly the estimate
for the percentage of rapes that were committed by someone known to the victim?
Ruma Falk enjoyed the following paper and thought our readers would also.
Confessions of a coin flipper and would-be-instructor.
The American Statistician, Vol. 49, No. 2, May 1995, 203-209
Cliff Konold discusses the value of simulation in teaching probability. The emphasis
is not on just getting an answer to a probability problem by simulation but rather
on motivating students to raise problems about probability experiments and to think
about how they might be solved. The basic example used to illustrate this is the following
coin tossing question:
A coin is tossed until either the pattern HHHHH or HTHHT turns up. Which, if either,
pattern is more likely to appear first?
Cliff describes in detail how he explored this question with a student, Kim Davis.
Initially, Cliff thought the two patterns were equally likely to appear first, and
the student thought that HTHHT was more likely to appear first. They discuss the
reasons for their beliefs and carry out simulations. They toss a coin until HHHHH occurs
and then toss it again until HTHHT occurs, recording the number of tosses required
for each pattern. They repeat this a number of times, with Kim betting that HTHHT
will require fewer tosses.
There is sufficient variation in the results to allow them to hold their different
beliefs for quite a while. Eventually, the issue is resolved by simulation and by
theoretical reasoning. Along the way they learn a lot about raising questions, exploring
answers, and changing beliefs about the solution to a probability problem.
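Their experiment is easy to reproduce in a few lines. This sketch records, for each repetition, how many tosses each pattern needs and tallies how often HTHHT needs fewer (the seed and trial count are arbitrary choices of mine).

```python
import random

def tosses_until(pattern, rng):
    """Toss a fair coin until `pattern` appears; return the number of tosses."""
    window = ""
    n = 0
    while window != pattern:
        # keep only the last len(pattern) outcomes in view
        window = (window + rng.choice("HT"))[-len(pattern):]
        n += 1
    return n

rng = random.Random(42)
wins_hthht = 0
trials = 2000
for _ in range(trials):
    if tosses_until("HTHHT", rng) < tosses_until("HHHHH", rng):
        wins_hthht += 1
print(wins_hthht / trials)  # well above 1/2: HTHHT tends to need fewer tosses
```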
Of course, Cliff learns that he was wrong. However, he remarks that it is hard to
change your original conjecture, and he ends the article worrying about the following
fact: if a coin is tossed a large number of times and you look through a window covering
only the last 5 tosses, you should see each pattern about the same proportion of times.
So why don't HHHHH and HTHHT occur about the same number of times? Of course, to fit
their original question, only patterns that do not overlap are counted, and this rules
out more HHHHH patterns than HTHHT patterns, giving another way to see that Kim had
the correct answer.
My favorite way to solve this pattern problem is due to S-Y. R. Li ("A Martingale
Approach to the Study of Occurrence of Sequence Patterns in Repeated Experiments,"
Annals of Probability, vol. 8 (1980), pp. 1171-1176). Here is how Li solves the problem.
Suppose you want to find how long, on average, it takes to get the pattern HHH for
the first time. Consider the following casino game. A coin is tossed a sequence of
times and before each toss a gambler with $1 enters the game. He bets $1 that heads
will come up on the next toss. If he loses, he leaves. If he wins, he bets his $2 that
heads will come up on the next toss. If he loses, he leaves. If he wins, he bets
his $4 that heads will come up next time. If he loses, he leaves. If he wins the
pattern HHH has occurred, the game is over and he has won $8.
If the pattern HHH occurred for the first time on the kth toss, the gambler who entered
before the kth toss won $2, the gambler who entered before the (k-1)st toss won $4, and
the gambler who entered before the (k-2)nd toss won $8. Everyone else lost, but only
their original dollar. Thus the casino paid out $2 + $4 + $8 = $14.
Now each bet of every gambler is a perfectly fair bet so the overall game is fair.
Thus the expected amount the casino takes in equals the expected amount they pay
out. (Here's where you use Martingale theory). The casino took in $1 from each gambler,
so the expected amount the casino took in equals the expected number of tosses required
to first get the pattern HHH. Since they paid out $14 this expected time is 14.
If we had been trying to get the pattern HTH the next to last gambler would have
bet on heads and lost, and so the casino had to pay out only $8 + $2 = $10. Thus the expected
time to reach the pattern HTH for the first time is only 10. For THH both of the
last two gamblers would bet on tails and lose, so the expected time to reach THH
is only 8.
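Li's payout accounting reduces to a short formula for a fair coin: the expected waiting time for a pattern is the sum of 2^k over every length-k suffix of the pattern that is also a prefix, since each such overlap corresponds to a gambler the casino must pay. A sketch:

```python
def expected_wait(pattern):
    """Expected tosses of a fair coin until `pattern` first appears (Li's method)."""
    return sum(2 ** k
               for k in range(1, len(pattern) + 1)
               if pattern[-k:] == pattern[:k])

# The worked examples from the text:
print(expected_wait("HHH"))  # 14
print(expected_wait("HTH"))  # 10
print(expected_wait("THH"))  # 8
# And the two patterns from Konold's problem:
print(expected_wait("HHHHH"), expected_wait("HTHHT"))  # 62 36
```

The last line shows why Kim's side of the bet was the good one: HTHHT's expected waiting time is 36 tosses against 62 for HHHHH.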
Here is another coin-tossing item:
Game of chance.
New Scientist, 26 Aug. 1995, Letters. 52
In a letter to the "New Scientist" (July 29, 1995, 50) Andrew Chester defends the use
of coins to determine random assignments. He remarked that, while he has used tables
and computer-generated random numbers, as a last resort he has also used his favorite
coin to allocate different experimental treatments. In describing what he means by
a "good coin" he remarks:
Imagine now that I attach a sticky object, of
appreciable size and mass, to one side--a 5p
coin stuck onto a disc similar to a 10p coin
will do. Do you now expect the outcome of
repeated tossing of the "doctored coin" to be
statistically "correct"? No, of course not.
In replying to this letter, Robert Beech writes:
Philosophy untested by experiment is dangerous.
I make no claim to statistical significance, but,
until boredom set in, I tossed such a combination
250 times. The scores were: 5p coin uppermost, 125
times; 10p coin uppermost, 125 times.
(1) J. L. Doob once remarked to the writer that "If you believe Newton's laws have
anything to do with what happens when you toss a coin, then having one side of a
coin heavier than the other should not change the probability of getting heads."
Why do you think he said this? Is he correct?
(2) Make your own experiment and let us know the result.
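Question (2) suggests trying the experiment; short of taping coins together, one can at least simulate 250 fair tosses to see how much spread an honest coin produces, for comparison with Beech's exactly even 125-125 split (the seed and bookkeeping below are my own).

```python
import random

rng = random.Random(7)
heads = sum(rng.random() < 0.5 for _ in range(250))
print(heads, 250 - heads)

# The standard deviation of the head count for a fair coin is
# sqrt(250 * 0.5 * 0.5), about 7.9, so counts from roughly 117 to 133
# would be unremarkable; an exactly even split is actually somewhat unusual.
```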
Wringing "The Bell Curve".
Chance, Summer 1995, 27-36
Bernie Devlin, Stephen E. Fienberg, Daniel P. Resnick, Kathryn Roeder
This is a review of the book "The Bell Curve" by Richard Herrnstein and Charles Murray
(referred to as H&M). I will refer to the authors of the review as DFR&K. In addition
to a summary of what is in the book, these well-known statisticians provide their
concerns about the statistical methods employed in the book. A more technical article
is to appear in the "Journal of the American Statistical Association." While the amusing
title might suggest otherwise, this is not the usual "Bell Curve" bashing but rather
a serious attempt to evaluate the findings in this controversial book.
DFR&K start with the observation that the distribution of intelligence scores is not
even approximately normal. They remark that the transformations made by the authors
to make them normal are not necessary and, in fact, misleading for the statistical
techniques used. They suggest that the transformations were made to justify a catchy title for their
book, to make their graphs more attractive, and to allow H&M to pretend they are
explaining the complex statistics needed for their book.
Next DFR&K review the notion of general intelligence and then address the question:
Is IQ inherited? They first point out that the estimates of heritability from twin
studies have a bias that leads to overestimation. This is because of the environmental
womb effect before birth and, in some studies, additional environmental effects after
birth.
They also state that geneticists have two versions of heritability, called narrow sense
and broad sense. DFR&K assert that twin studies report the broad sense while H&M
need the narrow sense for their arguments. They went back to the original studies
and calculated the narrow-sense heritability, finding it more like .3 rather than the
.6 to .8 suggested by H&M. They give reasons why even this modest .3 may be too high
an estimate.
DFR&K turn next to the regression analysis used by H&M on the NLSY data to show that
IQ is the best predictor for most outcomes of interest to them. Here they state
that detailed analyses of the appropriateness of the models used by H&M are being
carried out by other statisticians, and preliminary indications are that there are
real problems with the models used. For example they remark that, "H&M omitted the variable education
and when it is added it appears to swamp the effects of IQ". They also criticize
H&M for not making clear the difference between association and causation.
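The "swamping" remark is an instance of omitted-variable bias, which is easy to demonstrate on synthetic data. The sketch below illustrates the statistical point only; it is not a reanalysis of the NLSY, and all coefficients and distributions are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data: IQ and education are correlated; the outcome is
# driven mostly by education (all numbers are made up for illustration).
iq = rng.normal(0.0, 1.0, n)
education = 0.7 * iq + rng.normal(0.0, 1.0, n)
outcome = 1.5 * education + 0.1 * iq + rng.normal(0.0, 1.0, n)

def ols(predictors, y):
    """Least-squares coefficients, with an intercept column prepended."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_iq_alone = ols([iq], outcome)[1]
b_iq_with_educ = ols([iq, education], outcome)[1]
print(b_iq_alone, b_iq_with_educ)  # IQ's apparent effect shrinks sharply
```

When education is omitted, IQ soaks up education's effect; once education enters the regression, the IQ coefficient collapses toward its small true value.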
DFR&K conclude their article with the remark that Congress should not rely on the
validity of the arguments advanced in "The Bell Curve" when considering welfare
and other related legislation.
(1) What do you think the distribution of raw intelligence test scores looks like?
(2) DFR&K remark that "The Bell Curve" ends with a set of social conclusions that
could not have been timed any better for publication. What do you think they meant
by this?
Statistically speaking, it's unbreakable.
Baltimore Sun, 5 Sept. 1995, 8C
This is the first of two articles in the current news that discuss the probability
that the Cal Ripken streak of playing in 2,131 consecutive major league baseball
games will occur again.
The author discusses an estimate suggested by Dr. Richard Larsen, a statistics teacher
at Vanderbilt and an avid baseball fan.
Larsen remarks that there are many ways one might try to compute this probability
and suggests that "one way to judge the unlikelihood of Ripken's streak is to calculate
the probability that today's next most durable shortstop, San Francisco Giants' shortstop
Royce Clayton, would replicate the feat." Based on his performance this season,
Clayton appeared in 111 of the Giants' first 113 games (as of August 28), or 98.2 percent
of the games. Larsen then estimates that the probability of appearing in the next
2,130 games is .982 to the 2,130th power which makes his chances less than 2 in 10,000,000,000,000,000.
Larsen remarks that this computation somewhat overstates the unlikelihood of Ripken's
streak, since the streak could occur anytime during Clayton's career and could be
achieved by another player. He observes that 2,130 games constitute a major portion
of a player's total number of games in the majors; so, unlike a hitting streak, it could
not occur at too many places during a player's career. On the second point he remarks
that, even if you considered the 6 or 7 thousand position players since the game
began, you would still have an astronomically small number after multiplying his estimate
by the number of players.
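Larsen's arithmetic is easy to check directly; the .982 and 2,130 are from the article, and everything rests on his assumption that appearances in successive games are independent.

```python
# Probability Clayton appears in a given game, from 111 of 113 games
p_game = 0.982
# Chance of appearing in the next 2,130 straight games, assuming independence
p_streak = p_game ** 2130
print(p_streak)  # about 1.6e-17, i.e. less than 2 in 10,000,000,000,000,000
```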
(1) What do you think of Larsen's method of estimating the probability that Ripken's
streak will be beaten?
(2) Do you think that Ripken's streak was an event of probability less than 2 in
10,000,000,000,000,000?
has happened twice in our lifetime cannot be a reasonable model." Do you agree?
(4) How would you estimate the probability that Ripken's streak will be beaten?
The last Ripken; After 2,131 articles, the statistical truth, he's impossibly good.
Washington Post, 10 Sept. 1995, C5
Leonhardt discusses Cal Ripken's streak from a more philosophical view. He starts
by asking us to decide "which, among the records we most admire, are, from a statistical
viewpoint, truly amazing?"
He comments that Stephen Jay Gould (The New York Review of Books, v. 35, Aug. 18, 1988,
pp. 8-10) claimed that Dimaggio's 56-game hitting streak in 1941 is the greatest accomplishment
in the history of baseball. Editor's comment: Scott Berry (Chance Magazine, Fall 1991,
pp. 8-11) argued that the feats of Joe Dimaggio's hitting streak and Ted Williams'
.406 batting average are comparable.
Leonhardt then asks "where does Ripken fit in?" He comments that the independent
trials model used in the calculations for the Dimaggio streak is not appropriate
since injuries cause players to miss games in bunches. This leads him to suggest
estimating the probability that someone will play a full season. Here he gets some help from
John Rickert, a mathematician at the Rose-Hulman Institute of Technology in Indiana.
Looking at the past century, Rickert estimates that, among the serious players, about
one in ten completes a season without missing a game. Thus the probability that such
a player goes thirteen seasons without missing a game is 1/10 raised to the 13th
power, or about one in 10 trillion. This is much smaller than the estimates for the
Dimaggio streak. (The most recent estimate of the probability of the Dimaggio streak
is presented by Giles Warrack in the current Chance Magazine (Chance, Summer 1995,
pp. 41-51). Warrack estimates the probability of the Dimaggio streak to be about
1 in 3,700.)
Leonhardt remarks that a player's attitude about the event appears to play a bigger
role in a game streak than in a hitting streak. For example, a player might well
voluntarily take a day off ending a game streak but would not be apt to voluntarily
end a hitting streak.
He asks the question: "Is Cal the king of the record book?" and answers it by saying
"Unfortunately, it turns out, there is no answer. Ripken's record is too different
from the others to be evaluated in the same way."
However, as his title suggests, Leonhardt clearly thinks that Ripken is "impossibly good."
(1) What do you think are the differences between the problem of assigning a probability
for a game streak record as compared to that for a streak of hits in consecutive games?
(2) Do you think it is reasonable to assume that the event "Prosser plays in every
game next year" and the event "Prosser plays in every game the year after next" are
independent?
(3) How do the odds for Ripken's streak, given in this article, compare with those
given in the previous article?
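For question (3), the figures from the two Ripken articles can be put side by side; the numbers below are taken from the two summaries, and the comparison itself is straightforward arithmetic.

```python
p_larsen = 0.982 ** 2130   # Clayton repeating the streak, first article
p_rickert = 0.1 ** 13      # 1/10 per complete season, 13 seasons, this article
p_dimaggio = 1 / 3700      # Warrack's estimate for the Dimaggio streak

print(p_larsen, p_rickert, p_dimaggio)
# Rickert's one-in-10-trillion is thousands of times larger than Larsen's
# figure, and both are dwarfed by the Dimaggio estimate.
```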
The A-level question; do recent record passes reflect a rise in standards?
Daily Mail, 22 Aug. 1995, 46
The pass rate for the A-level exam in England set a record for the 14th year in a
row. In the early years the pass rate was set at 70 percent and, even when this strict
quota was removed, the examiners aimed at this rate. It is suggested that in the
80's more "positive grading" was started by examiners who felt that it was not reasonable
that 30% of the students who had already done well in difficult exams at age 16,
should fail them at age 18.
Another reason given for the increase is that the examining boards are businesses
and compete for business. "As one examination board has left out some of the harder
bits in syllabuses, so the others have tended to follow suit. In mathematics, for
example, mechanics has largely given way to statistics."
All this has led to an inquiry into the boards' procedures by the Office for Standards
in Education.
(1) Do you think that mechanics is harder than statistics?
(2) Would you expect mathematics teachers in the Universities in England to agree
that they are getting, each year, better prepared students?
US News & World Report's annual college rankings under attack.
Boston Globe, 8 Sept. 1995, 5
The new version of the U.S. News & World Report's annual college rankings is out.
Critics argue that this ranking is arbitrary, based partly on subjective judgments
and partly on imperfect measures such as SAT scores.
Last Spring the "Wall Street Journal" exposed blatant inflation of SAT scores, graduation
rates and other data by colleges attempting to improve their ranking and image (See
Chance News 4.06). Under pressure from U.S. News, about 50 schools changed the way
they reported SAT scores. This led to lower scores for some schools. For example,
Northeastern reported an SAT average of 942, rather than 996, last year. About 50
others, who were not reporting as requested, held out, were chastised in footnotes,
and saw their ratings drop.
As usual, Harvard led the universities, with Princeton and Yale tied for second,
followed by Stanford and M.I.T. Amherst topped the liberal arts colleges.
(1) How important were ratings like those of the U.S. News in your choice of college?
(2) Do you think that it is reasonable to try to give a ranking to colleges and universities?
(3) In a new category, teaching, Dartmouth was ranked highest. How do you think
they measured good teaching?
Drug combination may offer abortion option.
Los Angeles Times, 31 Aug. 1995, A1
A study, reported in the current "New England Journal of Medicine", showed that a
combination of two drugs was 96% effective in inducing abortion in the first nine
weeks of pregnancy. The drugs are methotrexate, an anti-cancer drug that is toxic
to fast-multiplying tissue, including fetal tissue, and misoprostol, an anti-ulcer drug that
causes uterine contractions, which can expel an embryo. Both drugs are widely available,
having been approved by the FDA for other purposes. This agency does not stop physicians from prescribing drugs for reasonable "off-label" purposes.
Those who favor abortion approve of methods that allow women to have abortions
through visits to their family doctors or internists as well as to their ob/gyns. Anti-abortion
forces are very critical of the drug approach.
The author of the study, Dr. Richard Hausknecht, reported last fall that he had performed
more than a hundred abortions using the two drugs. Some doctors criticized him for
not doing this under the conventions of a study with proper overseers. The study
Hausknecht now reports on was overseen by the Mt. Sinai School of Medicine and the
FDA. Hausknecht administered the drugs to 178 healthy women who had been pregnant
for no more than nine weeks. 88% of the women aborted within 24 hours. Eight percent
required a second dose of misoprostol several days later, and the 4% who did not abort
even after a repeated dose underwent a surgical abortion.
The Deputy Commissioner of the FDA, Mary Pendergast, said: "We encourage women and
their doctors who want to administer the drugs for abortion to do so in a scientific
trial. This is still experimental."
The life-saver docs ignore; the miraculous properties of aspirin.
Daily Mirror, 28 Aug. 1995, 17
The author reports that Professor Richard Peto of Oxford University says that aspirin
is not regularly prescribed, despite drug trials which showed it improved survival
rates of stroke and heart-attack victims by as much as a quarter. Peto headed the
largest such study.
Peto feels that if aspirin were 100 times its present price it would be heavily promoted
and its use much more widely encouraged.
Small doses of aspirin have been found to benefit other conditions, including angina,
mini-strokes, irregular heartbeat and diseases of the veins and colon cancer.
Doctors continue to warn that healthy people should not take aspirin simply as a precautionary
measure because of the possibility of rare side effects such as internal bleeding.
In West, fight on speed laws has few limits.
The Boston Globe, 22 August 1995, 1
The article focuses on Montana, where, if Congress decides next month to repeal the
65 mph national speed limit, state law allows daytime speed limits to be completely
abolished. (Nighttime limits would remain at 65). Chuck Hurley, senior vice president
with the Insurance Institute for Highway Safety in Arlington, VA, insists that increased
speeds will bring increased highway fatalities. He states that about 400 deaths
a year are attributable to the 1987 increase in rural speed limits from 55 mph to
65 mph; this death toll represents an increase of 15-20% from the period before the law.
Hurley's figures are presumably national. A data graphic accompanying the article
shows annual highway deaths for Montana for the years since 1974, as reported by
the Montana Highway Patrol. The first and last years' totals are labeled exactly;
the rest of the figures below are my best reading of the graph (vertical axis spacing is 50):
During 1972-73, there was no speed limit; from 1974-1987 there was a 55 mph limit;
since 1987 the limit has been 65 mph. Enforcement is not strict: a speeding ticket
carries only a $5 fine, and by state law insurance companies are not allowed to use
speeding violations in setting rates.
Year  Deaths    Year  Deaths
1973   325      1984   240
1974   300      1985   225
1975   295      1986   225
1976   300      1987   235
1977   320      1988   200
1978   275      1989   185
1979   330      1990   220
1980   325      1991   200
1981   340      1992   190
1982   255      1993   195
1983   285      1994   202
(1) Col. Craig Reap, head of the Montana highway patrol, says studies have shown
that the average car on a Montana highway travels 71.6 mph. How do you think such
an estimate is made?
(2) Mr. Hurley is quoted as saying: "There have been some remarkable gains in highway
safety in the last few years. We've had a drunk driving campaign, more seat belt
laws and use, child safety seats, air bags. A lot of these hard-won gains are being
given up to higher speeds, as if speed was a safe thrill." How could these gains be
documented if fatalities are in fact up?
(3) Compare the pre-1987 years with the post-1987 years in Montana. Is lax enforcement
a boon to safety? How do you think Mr. Hurley would explain these figures?
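Question (3)'s comparison can be made concrete with the figures read from the graph (my readings, as noted above, except for the labeled endpoint years):

```python
# Annual Montana highway deaths as read from the article's graphic
deaths = {
    1973: 325, 1974: 300, 1975: 295, 1976: 300, 1977: 320, 1978: 275,
    1979: 330, 1980: 325, 1981: 340, 1982: 255, 1983: 285, 1984: 240,
    1985: 225, 1986: 225, 1987: 235, 1988: 200, 1989: 185, 1990: 220,
    1991: 200, 1992: 190, 1993: 195, 1994: 202,
}

# 55 mph era (1974-1987) versus 65 mph era (1988-1994)
era_55 = [v for y, v in deaths.items() if 1974 <= y <= 1987]
era_65 = [v for y, v in deaths.items() if y >= 1988]
print(sum(era_55) / len(era_55), sum(era_65) / len(era_65))
# roughly 282 vs. 199: average deaths fell after the limit was raised
```

Of course, the averages alone do not settle the causal question; the downward trend was already under way before 1987, and national gains in seat-belt use, air bags, and drunk-driving enforcement affect both eras.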
SAT scores rise strongly after test is overhauled.
Washington Post, 24 Aug. 1995, B1
Scores on the SAT had the largest increase in a decade this year. The average verbal
score increased 5 points to 425 while the average math score rose 3 points to 482.
The new averages for the high school class of 1995 are the first results from the
"New SAT-1: Reasoning Test," which represents a radical overhaul of the SAT exams.
The new SAT has fewer questions, longer reading passages, fewer multiple-choice
math questions and no antonym section in the verbal part. Students have an extra 30 minutes
to take the test and are encouraged to use calculators during the math section.
The College Board officials say that they scaled scoring on the new test in such a
way that the results are comparable with prior years. They attribute the increase
to students working harder and taking harder courses. Others have different explanations.
The U.S. Department of Education in its 1995 report asserts that only the math scores
on the new SAT are comparable with past years and that the big changes were in the
verbal part of the test. Most of the increase came from increased scores for students
near the top of their class. The Board says that these are the ones working harder
while critics say top students did not have time to show their stuff on the old exam,
but with the extra half hour they do. Average scores for the class of 1995 for the
American College Test showed no increase at all.
The coaching companies say that the increase is due to their efforts. Kaplan, for
example, saw a 50% increase in its SAT business when the test changed. Presumably
the students were more worried than usual because they were facing a new kind of test.
(1) Which of the reasons given for the increase do you think is most likely to explain
the increase? What other reasons can you think of for the increase?
(2) How can scores on the new SAT-I be compared to the scores over the last ten years
on the SAT?
(3) What will happen to such comparisons when the scores are recentered for the high
school class of '96?
Parade Magazine, 20 August 1995, 16
Marilyn vos Savant
A reader asks:
After a wonderful life on earth, you find yourself
standing at the Pearly Gates in front of St. Peter.
He tells you that, to get inside, you must predict
the outcome of the next spin of a black/red roulette
wheel. The last 20 spins have ended in black. Although
I know you're correct when you say the odds are still
50/50 on each spin, I'd side with probability here and
choose red. Which would you choose?
Marilyn explains (implicitly assuming a Bernoulli trials process with p = 0.5) that
probability in fact does not point to red; the chances are equal on the next spin.
However, she adds, knowing that, she would pick black. If we really are looking
at independent 50-50 trials, then the choice doesn't matter. On the other hand, if there
really is information to be gleaned from the history so far, it would seem to indicate
a bias towards black.
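Marilyn's point, that under independent fair spins a run of blacks says nothing about the next spin, is easy to check by simulation. A sketch (it conditions on a run of 5 rather than 20, purely so the conditioning event is common enough to sample; the logic is identical):

```python
import random

# Monte Carlo check of the "gambler's fallacy": among sequences of
# fair independent spins that start with RUN blacks in a row, what
# fraction have black on the next spin?
random.seed(1)

RUN = 5
hits = next_black = 0
for _ in range(500_000):
    spins = [random.random() < 0.5 for _ in range(RUN + 1)]
    if all(spins[:RUN]):           # first RUN spins were all black
        hits += 1
        next_black += spins[RUN]   # was the following spin black too?

print(f"P(black | {RUN} blacks in a row) is about {next_black / hits:.3f}")
```

The estimate hovers around 0.5, as Marilyn says, whatever the length of the run.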
(1) What would you do?
Survey raises estimate on toddler vaccinations.
The New York Times, 25 August 1995, A14
Warren E. Leary
A survey by the federal Centers for Disease Control in Atlanta has found that, nationally,
75% of children 19-35 months old are properly vaccinated. The data are based on
more than 25,000 telephone interviews from April to December 1994 in which parents
were questioned and immunization records verified. The interviews resulted from more
than 1.2 million telephone numbers being called.
Previous estimates of immunization rates were based on data from the National Health
Interview Survey, an ongoing study based on household interviews. Those data had
suggested a national rate of 67%. Dr. Walter Ornstein of the CDC said that much
of the rise found in the new estimate can be attributed to improved sampling techniques.
(1) Improved technique?! Getting 25,000 interviews from 1.2 million phone calls sounds
like an abysmal response rate. What do you think is going on here?
(2) How do you think Dr. Ornstein knows the larger estimate is due to sampling technique
and not an actual surge in immunizations?
Health Sense: The day that many women dread the most.
The Boston Globe, 14 August 1995, 25
"The day" is the yearly mammogram screening for breast cancer. The article presents
statistical evidence intended to help women cope with the anxiety surrounding this
event. Dr. William Black, a radiologist at Dartmouth-Hitchcock Medical Center, sent
a questionnaire to 145 women in their 40's and documented a wide variety of hopes and
fears. Women overestimated their risk of dying of breast cancer in the next ten
years more than twenty-fold, and they thought that screening was six times more effective
at reducing risk than it actually is.
Cindy Pearson of the National Women's Health Center says the anxiety is worth it:
regular screenings for women over 50 cut the chance of dying of breast cancer by
one-third. She says: "Nothing else we know how to do saves that many lives."
The article encourages women to "put statistics in their place...remember that the
numbers, overall, are on your side." As for the famous figure that breast cancer
will strike one woman in eight, the article says that many women interpret this to
mean that, if they are sitting in the waiting room with seven other women, one will get bad
news. Noting that the one-in-eight is a cumulative lifetime risk, the article suggests
looking at the numbers in a less threatening way, in terms of the number of new cases diagnosed each year. For example, a woman between 50 and 54 has one chance in 454
of being diagnosed with breast cancer in the next year (the risk increases slightly
each year). A complete chart of these annual risks appears in the article.
A companion chart shows the cumulative risk, for a woman age 20 today, of developing
breast cancer by various later points in life:
By the age of:   Number diagnosed with cancer:
25               1 in 19,608
30               1 in 2,525
35               1 in 622
40               1 in 217
45               1 in 93
50               1 in 50
55               1 in 33
60               1 in 24
65               1 in 17
70               1 in 14
75               1 in 11
80               1 in 10
85               1 in 9
Lifetime         1 in 8
Another set of figures is presented for putting the day of the exam into perspective.
Of 1000 women--all ages, never previously screened--waiting in a room to have mammograms,
about 70 will be recalled because the radiologist notes something of concern and wants
a second look. Of these 70 women, 50 will be found healthy, and the remaining 20 will
undergo biopsies. Five or six of these biopsies will find cancer.
(1) The final set of figures indicates that of 1000 women never previously screened,
only about 6 will have breast cancer. Dr. Norman Sadowsky, director of breast imaging
at Faulkner Hospital, says that this figure drops to 2 per thousand among women screened every year. Does this mean that screening actually prevents breast cancer?
If not, what does it mean?
(2) In Chance News 4.08, a 1 in 41 chance was given for a woman without breast cancer
at age 40 to have the disease by age 50. Can you reconcile this with the 1 in 454
annual risk given above for ages 50-54?
(3) The first table in the article gives a 1 in 454 annual risk for ages 50-54.
The second table gives a cumulative risk of 1 in 50 by age 50 and of 1 in 33 by
age 55. Can you reconcile these figures?
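One way to attempt the reconciliation in (3) is to chain the annual risk onto the cumulative risk, assuming (this is an assumption, not something the article states) that the 1-in-454 annual risk holds constant over ages 50-54 and that years are roughly independent:

```python
# Start from the 1-in-50 cumulative risk at age 50, then apply the
# 1-in-454 annual risk for each of the five years 50 through 54.
p_by_50 = 1 / 50
annual = 1 / 454

# Probability of still being undiagnosed at 55: must have been clear
# at 50 and then escape diagnosis in each of the next five years.
p_clear_at_55 = (1 - p_by_50) * (1 - annual) ** 5
p_by_55 = 1 - p_clear_at_55
print(f"Implied cumulative risk by age 55: 1 in {1 / p_by_55:.0f}")
```

The result comes out close to the table's 1 in 33, so the two charts are at least roughly consistent.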
Who's number 1 in college football?...and how might we decide?
Chance Magazine, Summer 1995, 7-14
Hal S. Stern
Stern remarks that college football is perhaps the only major sport that does not
determine a unique champion through competition each year. There are two polls, one a
survey of journalists and the other a survey of coaches. Sometimes they agree and
sometimes they don't. Last season both preferred Nebraska to Penn State but others disagreed
with this choice.
Stern suggests that statistics should be able to provide some insight into obtaining
a rating. He observes that it is necessary first to decide what you are trying to
measure. For example, are you trying to decide which team would win a playoff or
are you trying to decide which team has the best overall record in the season under consideration?
Teams with weak opponents might have great records but might not do well in a playoff
with other teams.
Stern suggests trying a least squares rating. Here is the model that Stern considers.
We want to come up with a rating x(i) of the ith team and a home team advantage H
so that, when i is the home team and j the away team, the expected difference in scores
of the home and away teams is d = H + x(i) - x(j). If the actual score difference when
these teams played was y, then (y - d)^2 is a measure of the error caused by using the
rating to estimate the difference in scores. Summing this over all games gives the total
error. The rating function x(i) and the home team advantage H are chosen to minimize
the total error over all games played during the season. If last year's college
scores are used to determine this rating, then Penn State is the top team and Nebraska
drops to fourth place with Florida and Florida State second and third respectively.
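The mechanics of this fit can be sketched in a few lines. The three-team schedule below is hypothetical (not Stern's college data); the point is just that each game contributes one row of a design matrix, and ordinary least squares recovers the ratings up to an additive constant:

```python
import numpy as np

# Least-squares rating in the spirit of Stern's model: choose team
# ratings x(i) and a home advantage H minimizing the sum over games
# of (actual margin - (H + x(home) - x(away)))^2.
teams = ["A", "B", "C"]
# (home team, away team, home margin of victory) -- hypothetical data
games = [("A", "B", 8), ("B", "C", 8), ("C", "A", -7),
         ("A", "C", 13), ("B", "A", -2), ("C", "B", -2)]

idx = {t: k for k, t in enumerate(teams)}
n = len(teams)
X = np.zeros((len(games), n + 1))
y = np.zeros(len(games))
for r, (home, away, margin) in enumerate(games):
    X[r, idx[home]] = 1.0   # + x(home)
    X[r, idx[away]] = -1.0  # - x(away)
    X[r, n] = 1.0           # + H
    y[r] = margin

# Ratings are only identified up to an additive constant; lstsq
# returns the minimum-norm solution, so report differences from C.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
ratings, H = beta[:n], beta[n]
print("home advantage H =", round(H, 2))
for t in sorted(teams, key=lambda t: -ratings[idx[t]]):
    print(t, "rating relative to C:", round(ratings[idx[t]] - ratings[idx["C"]], 2))
```

With real season data the system is overdetermined and the residuals are not zero; the fitted ratings are then the ones Stern uses to rank the teams.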
Stern mentions a number of possible modifications. For example, to reduce the effect
of large differences in scores he considers a modified model where differences in
scores greater than 20 are replaced by 20 plus the square root of the excess over 20.
A number of other refinements are discussed.
Stern recommends testing a particular rating system by seeing how it predicts the
outcomes of future games. He tested the methods that he proposed using data from
professional football. He predicted scores in the second half of the season using
previous scores in the season to determine the rating. He finds that the first model
is pretty good, predicting the winner 63.6 percent of the time. Most of the
modifications he proposed decrease the predictive accuracy of the rating method. The Las Vegas spread
predicted the winner 66.3% of the time even though this is not the purpose of the spread ---
its purpose being to keep the money bet on the two teams more or less equal.
(1) Why do you think the Las Vegas spread does a better job of predicting the winner than
the ratings schemes proposed by Stern?