CHANCE News 3.13
(2 Sept to 21 Sept 1994)

Prepared by J. Laurie Snell, with help from Jeanne
Albert, William Peterson and Fuxing Hou, as part of the
CHANCE Course Project supported by the National Science
Foundation.

jlsnell@dartmouth.edu

Back issues of Chance News and other materials for
teaching a CHANCE course are available from the
Chance Web Data Base in the Multimedia Online
Document Library at the Geometry Center
(http://geom.umn.edu/) or from their Gopher
(geom.umn.edu) in Geometry Center Resources.

Statistics are like alienists - they will
testify for either side.
Fiorello LaGuardia

OTHER INTERNET SOURCES

Peter Doyle's Web

Roger Johnson
Anonymous

ARTICLES ABSTRACTED

1.  Rating quarterbacks: an amplification.
2.  How numbers can trick you.
3.  When is a coincidence too bad to be true?
4.  In the war against grade inflation, Dartmouth scores a hit.
5.  DNA testing raises questions in county.
6.  Competition marked fervid race for cancer gene.
7.  Moderate exercise tied to drop in breast cancer.
8.  Music to operate with.
9.  Against the odds.
10.  What the polls say--and what they mean.
11.  Minorities, women lag on medical exams.
13.  Exploring baseball hitting data.
14.  Estimating with selective binomial information.

OTHER INTERNET SOURCES

In the last Chance News we discussed the game of
frustration solitaire that came up in a recent letter to
Marilyn vos Savant.  Recall that this game is played as
follows: you shuffle a deck of cards and then run
through the deck, turning over the cards one at a time
as you call out, `Ace, two, three, four, five, six,
seven, eight, nine, ten, jack, queen, king, ace, two,
three, ...,' and so on, so that you end up calling out
the thirteen ranks four times each. If the card that
comes up ever matches the rank you call out as you turn
it over, then you lose.
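
Readers who want a rough check on any exact answer can estimate the
chance of winning by simulation.  The following is a minimal Python
sketch, not the elementary solution mentioned in this section; the
function names, the number of trials and the seed are our own choices.

```python
import random

RANKS = 13  # ace through king
SUITS = 4

def play_once(rng):
    """Play one game; return True if no card matches the rank called out."""
    deck = [rank for rank in range(RANKS) for _ in range(SUITS)]
    rng.shuffle(deck)
    # The card in position i is turned over while calling rank i mod 13.
    return all(card != i % RANKS for i, card in enumerate(deck))

def estimate_win_probability(trials=20_000, seed=1994):
    """Estimate the chance of winning as the fraction of simulated wins."""
    rng = random.Random(seed)
    wins = sum(play_once(rng) for _ in range(trials))
    return wins / trials

if __name__ == "__main__":
    print(f"Estimated chance of winning: {estimate_win_probability():.4f}")
```

Since wins are rare, a run of 20,000 games gives only a rough estimate;
increasing the number of trials tightens it.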

We promised to provide a derivation of the probability of
winning that used only very elementary methods.  We have
done so, and it is available from Peter Doyle's Web.
You will also find solutions by Peter and friends to other
interesting problems there.

We hope by next time to have added the solution to the
much more difficult problem of finding the value of the game
of Treize, attempted by Montmort in the early 1700's but not
completed.

Rating quarterbacks: an amplification
The College Mathematics Journal, September 1994
(Original article in the November 1993 issue.)
Roger W. Johnson

The National Football League (NFL) has established a
rating for "All-Time Leading Passers" based on
percentage of completions, percentage of touchdown
passes, percentage of interceptions, and average gain per
pass attempt. This rating is used to gauge the relative
performance of quarterbacks for a current season.

For example, in the September 13 Seattle Times, in
discussing the upcoming game between the Seattle
Seahawks and the San Diego Chargers, we read that
"Just like Seahawk quarterback Rick Mirer, Humphries
hasn't thrown an interception. He has the highest
quarterback rating in the NFL - 127.2. Mirer is fourth
at 115.1."

The formula that is used is not publicized, but Professor
Johnson found that a least squares analysis yields the
formula:

Rating = [25 + 10(%Compl) + 40(%TDs) - 50(%Inter) + 50(Yards/Att)]/12

which is used in most cases.

The National Collegiate Athletic Association (NCAA) uses
the simpler rating:

Rating = (%Comp) + 3.3(%TD) - 2(%Int) + 8.4(Yds/Att).

Professor Johnson has provided us with the data he used
to obtain these formulas. We have put the NFL data and
NCAA data in the Chance Data Base, in Teachers Aids.
Students might enjoy trying to obtain these formulas.
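
For readers who want to experiment before fitting the data, the two
formulas above can be written as short Python functions.  This is only
a sketch of the linear formulas as stated here; the function names and
the sample passing line (60% completions, 5% touchdowns, 2%
interceptions, 7.5 yards per attempt) are made up for illustration and
do not come from Professor Johnson's data.

```python
def nfl_rating(pct_comp, pct_td, pct_int, yards_per_att):
    """Linear NFL formula quoted above; percentages are on a 0-100 scale."""
    return (25 + 10 * pct_comp + 40 * pct_td
            - 50 * pct_int + 50 * yards_per_att) / 12

def ncaa_rating(pct_comp, pct_td, pct_int, yards_per_att):
    """NCAA formula quoted above; percentages are on a 0-100 scale."""
    return pct_comp + 3.3 * pct_td - 2 * pct_int + 8.4 * yards_per_att

if __name__ == "__main__":
    # Hypothetical passing line, chosen only to exercise the formulas.
    line = dict(pct_comp=60.0, pct_td=5.0, pct_int=2.0, yards_per_att=7.5)
    print(f"NFL rating:  {nfl_rating(**line):.1f}")
    print(f"NCAA rating: {ncaa_rating(**line):.1f}")
```

For this made-up line the NFL formula gives about 91.7 and the NCAA
formula gives 135.5, which shows that the two scales are not directly
comparable.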

DISCUSSION QUESTIONS

(1) What other factors do you think should be taken into
account in rating quarterbacks?

(2) Who do you think has the highest rating recorded so
far in the NFL?  Who do you think has the second highest
rating?

The reader who suggested the following article has
to be listed as anonymous since the name disappeared
in transit.

How numbers can trick you; the six deadly sins
of statistical misrepresentation.
Technology Review, October 1994, pp 39-45
Arnold Barnett
Barnett proposes six deadly sins of statistical
misrepresentation.  We give one example of each.  The
article provides more discussion of these and other
examples.  You will find Barnett also quoted in the
New York Times article on the statistics of the
USAir accidents discussed in the next abstract.

1. Generalizing from non-random samples.

Researchers at the Harvard Medical School found, in
interviews with 1,500 people who had suffered heart
attacks in the previous few days, that a
disproportionate number reported episodes of extreme
anger in the two hours preceding the attack. They were
led to an estimate that anger was associated with 2.3
times the usual heart attack risk. The Boston Globe
generalized this to all of us by simply reporting that
anger "can double the chance for heart attack".

2. Look! a Trend.

In 1993 the International Airline Passenger Association
began rating airlines in terms of safety.  An airline
having the fewest deaths over a five-year period might
be considered a particularly safe airline.  However, the
data show that the "safest" airline in one period is
apt to be the least safe in another period, suggesting
that the apparent trend is normal chance fluctuation
having nothing to do with differences between
airlines.

3. Unjust Law of "Averages".

In 1987, the Department of Transportation required U.S.
airlines to report each month the percentage of their
flights into the nation's 30 busiest airports that
arrived on time.  This information has been used in
claims to be "the number one on-time airline".  If an
airline has a large percentage of its flights into a city
like Seattle, with lousy weather, it is at a disadvantage
in such a contest with an airline that has a large
percentage of its flights into Phoenix.

4. Verbal imprecision

A statistical study reported that the odds of a death
sentence in a white-victim case were 4.3 times the odds
in a black-victim case in Georgia. The New York Times
reported this as 4.3 times as likely, and the Supreme
Court used this interpretation.  Under this incorrect
interpretation, a 99% probability of a death sentence
when the victim is white corresponds to a 23% chance
when the victim is black.  Under the correct interpretation
in terms of odds, the 23% becomes 96%.
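
The arithmetic behind the 23% and 96% figures can be checked with a
few lines of Python.  This is a sketch of the calculation as we read
it from the example; the function names are our own, and the 99%
starting probability is the illustrative value used above, not a
figure from the Georgia study.

```python
def odds_from_prob(p):
    """Convert a probability to odds in favor."""
    return p / (1 - p)

def prob_from_odds(odds):
    """Convert odds in favor back to a probability."""
    return odds / (1 + odds)

p_white_victim = 0.99   # illustrative probability from the example
ratio = 4.3             # reported ratio of the odds

# Incorrect reading: treat 4.3 as a ratio of probabilities.
p_wrong = p_white_victim / ratio

# Correct reading: divide the odds by 4.3, then convert back.
p_right = prob_from_odds(odds_from_prob(p_white_victim) / ratio)

if __name__ == "__main__":
    print(f"Ratio of probabilities: {p_wrong:.0%}")
    print(f"Ratio of odds:          {p_right:.0%}")
```

The gap between the two readings is largest when the probabilities are
near 1, as here; when both probabilities are small, odds and
probabilities nearly coincide and the two readings give similar answers.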

5. The Unsound Comparison.

In early 1992, the New York Times reported that a record
number of killings occurred in 1991 in four of the
nation's ten largest cities: Los Angeles, San Diego,
Dallas, and Phoenix.  The Times failed to point out that all
four of these cities also reached new highs in
population in 1991.

6. The Hidden Defect.

An article in the journal "Risk Analysis" in 1991
reported that a U.S. driver -- age 40, sober, wearing a
seat belt and driving a heavier than average car -- has
a "slightly less" mortality risk on a 600-mile trip than
a person who takes the same trip by air.

The analysis began with the overall death rate per mile
driven on rural interstate highways.  This was
multiplied by the risk factors for age, wearing a seat
belt, and driving a heavier car.  Multiplying these risk
factors led to a much smaller final risk than was
justified, since the factors are not independent.

DISCUSSION QUESTIONS

(1)  Why is the subpopulation of the study in Example 1
not representative of the population as a whole?

(2)  Why are the risk factors used in Example 6 not
independent?

(3)  Can you provide your own examples of the six
deadly sins?

When is a coincidence too bad to be true?
The New York Times, 11 September 1994, Section 4, Pg. 4
Gina Kolata
The recent crash of a USAir plane near Pittsburgh has
prompted many questions about the safety of air travel
and the relative risks between airlines.  According to
the article, USAir's accident record does look grim:
among major commercial carriers, the last three fatal
crashes in the United States were USAir planes; the
airline has been involved in four out of the last seven
major air disasters; and USAir has had five fatal
crashes in the past five years. Ms. Kolata asks: "Would
it be a rational decision to avoid flying USAir in favor
of its competitors?  Or, considering the vast number of
passengers carried by airlines, can USAir's tragic
losing streak be attributed to the vagaries of chance?"

The question of safety seems to arouse little
controversy:  by all accounts, air travel, as far as
major accidents are concerned, is an extremely low risk.
Dr. Arnold Barnett of M.I.T. makes this very clear:
"roughly speaking, if you were to board a jet flight at
random every day, it would take 26,000 years on average
before you succumb to a major crash."

But what about the relative risks between airlines? Here
the answer is not so straightforward.  Using safety
records, Dr. Barnett ranked eight major airlines over
several ten-year periods and found that, not only was
the first-ranked airline different each time, but this
same airline finished in the bottom half of the other
rankings.

As for USAir, Barnett says that, while there is a 2 to
10 percent chance that the airline's crash record is due
to chance alone, at the same time, if you were to board
a USAir jet at random in the 1990's, your chances of
being killed would be nine times higher than on any
other airline.

DISCUSSION QUESTIONS:

1.  If you had to fly today, would you choose USAir?

2.  According to Dr. Barnett, media coverage of fatal
air crashes can influence people's perception of the
risks of flying.  He claims that over a two-year period
there was 8,100 times as much coverage per death for
commercial jet accidents as there was for cancer.  He
suggests that such reporting tends to make flying seem
more dangerous than it actually is.  What do you think
of this argument?

3.  Dr. Brad Efron of Stanford points out that USAir
carries 20 percent of all domestic flights and says this
should be taken into account.  He calculates that the
chance that one major airline has four out of seven
fatal crashes, assuming it has a 20 percent market
share, is about 10 to 15 percent.  What does this mean?
How would you explain the difference between this
conclusion and Dr. Barnett's 2 to 10 percent figure?
Dr. Efron says that his findings are "enough to begin
getting suspicious but not enough to hang them."  Do you
agree?

4.  The Dartmouth Math Department softball team makes
a lot of errors. Or rather, the players make errors. Say
that over the course of the game the team makes 20 errors,
and that each time an error is made, it is equally likely to
be made by each of the 10 players.  How likely do you think
it is that at some point in the game, a single player will be
responsible for the last three errors made, or for four out
of the last seven? How could you get a good idea of what
these unknown probabilities are?

5.  The following statement appeared in the July 20 issue
of "Flight International": "More people died in airline
accidents during the first half of 1994 than in the same
period of any other year in the last decade -- except for
the record year of 1985, according to the Flight
International Airline Safety Review published this week."
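
One way to approach the softball problem in question 4 is simulation.
The sketch below is our own and rests on the assumptions stated in the
question (20 errors per game, 10 equally likely players); the function
names, the number of simulated games and the seed are arbitrary
choices.

```python
import random

def streaks(errors):
    """Return (three_in_a_row, four_of_seven) flags for one game.

    errors is the sequence of players charged with each error, in order.
    """
    three = four = False
    for end, player in enumerate(errors):
        # Did this player also make the two errors just before this one?
        if end >= 2 and errors[end - 1] == player == errors[end - 2]:
            three = True
        # Has this player made at least four of the last seven errors?
        if errors[max(0, end - 6): end + 1].count(player) >= 4:
            four = True
    return three, four

def simulate(games=20_000, n_errors=20, n_players=10, seed=1994):
    """Estimate how often each kind of streak shows up in a game."""
    rng = random.Random(seed)
    three_count = four_count = 0
    for _ in range(games):
        errors = [rng.randrange(n_players) for _ in range(n_errors)]
        three, four = streaks(errors)
        three_count += three
        four_count += four
    return three_count / games, four_count / games

if __name__ == "__main__":
    p_three, p_four = simulate()
    print(f"Some player makes three errors in a row: {p_three:.3f}")
    print(f"Some player makes four of seven errors:  {p_four:.3f}")
```

Only the player who made the latest error needs to be checked at each
step, because any streak must end on an error by the player who holds
it.  Changing n_players or n_errors shows how sensitive such streaks
are to the assumptions.
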

In the war against grade inflation, Dartmouth scores a hit.
Wall Street Journal, 8 September 1994
Jackson Toby
This article discusses grade inflation at
colleges and universities during the past several
decades and some recent attempts to reverse the trend.
The author, a professor at Rutgers University, claims
that, 43 years ago, 13 percent of the grades at Rutgers
were A's, and 29 percent were B's. Today these two
account for two-thirds of all grades awarded at the
university.  Private institutions have even higher
A's and B's, 73 percent at Harvard.  Several colleges
have abolished the F grade, and, at Oberlin, the D

What is being done to reduce the number of higher
grades?  Some universities are reintroducing the F
showed that 48 percent were opposed to the idea, in the
1995-96 academic year the university will include a
failing grade--now called "NP", or "not-passed".

Dartmouth has taken a different approach.  On
transcripts this fall the college will include the size
and median grade of the class along with a student's
grade for the course.  Professor Toby claims this will
tackle the grade inflation problem on two fronts by
reducing the likelihood of professors giving so many A's
and B's, and by giving students less incentive to choose
courses and professors because of their grading
policies.  Presumably, these two effects would reinforce
each other, as well.

DISCUSSION QUESTIONS:

1. Professor Toby says that not only do students need
grades but that "society--and society includes parents--

2. What do you think of Dartmouth's new grading
strategy?  Do you think including the class size and
median grade on transcripts will reduce grade
inflation, and, if so, how?  Why do you think the median
is being used instead of, say, the mean?

DNA testing raises questions in county.
The New York Times, 11 September 1994, Pg. WC19
Fay Ellis

Simpson team taking aim at DNA laboratory.
The New York Times, 7 September 1994, Pg. B10
Barry Meier
Both of these articles focus on the questions raised by
the use of DNA testing in criminal court cases.  The two
most pressing issues are the accuracy of the tests and
what role the results of such tests should play in
determining guilt or innocence. As the articles point
out, the two issues are of course related.

The first article discusses the significant impact that
DNA testing has had in recent years: in Westchester
County, New York, prosecutors have used DNA analysis to
link a suspect to a crime in nearly 100 cases since
1988. On the other hand, if, in addition to other
"substantial doubts" of a suspect's involvement in a
crime, there is no DNA match, the case will not go to
trial.

The accuracy and reliability of the tests appear to be
the more significant issue, especially in relation to
the current O.J. Simpson case.  The second article
focuses primarily on the credibility of Cellmark
Diagnostics, the company which performed the DNA tests
for this case. According to the article, defense
attorneys attack DNA laboratories in their statistical
analyses as well as in the quality of their work, and
there is ample discussion here of both of these issues.
The current examination of Cellmark arose after
prosecutors said DNA tests showed that "a sample of Mr.
Simpson's blood closely resembled blood recovered by
investigators".
Fierce competition marked fervid race for cancer gene.
The New York Times, 20 September 1994, C1
Natalie Angier
The race to find the cancer gene called BRCA1 is finally
over, with victory going to the team at the University of
Utah headed by Dr. Mark H. Skolnick.  Dr. Skolnick
attributed their success to an extraordinary genetic
resource: Utah's large, stable families and the huge
genealogical archives of the Mormon church.  Others on
his team also gave "luck" a lot of credit.

Researchers now can concentrate on studying how the gene
works and on developing a screening test to check for
mutations in the gene.  It is estimated that as many as
5% of all cases of breast cancer might be due to
inherited defects in the genes. Frances Visco, president
of the National Breast Cancer Coalition, is quoted as
saying "Women will have to be very careful.  You're
talking about giving them a test telling them they have
an 85 percent chance of getting a disease that we don't
know how to prevent, and for which there is no known
cure."
Regimen of moderate exercise tied to drop in breast cancer.
The New York Times, 21 Sept. 1994, C10
Jane E. Brody
A new study of more than 1000 California women, reported
in the current issue of the "Journal of the National
Cancer Institute", has found that moderate exercise can
reduce a woman's risk of developing premenopausal breast
cancer by as much as 60%. Lifetime exercise habits and
other relevant factors were determined through personal
interviews with 545 women aged 40 or younger with newly
diagnosed breast cancer and an equal number of women who
did not have cancer but matched those who did in other
respects.

If further studies confirm this finding, this will be
the first risk factor for breast cancer that women can
control.  Other risk factors that have been identified
are: family history -- risk is lowest among those who do
not have a family history of breast cancer; age at onset
of menstruation -- risk is lowest among those who start
menstruating late; age at first pregnancy -- risk is
lowest for those who have their first child by age 20
and who have the largest number of pregnancies; and
finally socioeconomic status -- higher status is
associated with higher risk.

Music to operate with.
The New York Times, 21 Sept. 1994, C10
AP
Another triumph for music! An article in the current
"Journal of the American Medical Association" reports that
surgeons who have background music for their
operations are apt to do a better job.

The study tested 50 men, 31 to 61 years old, all of whom
regularly listened to music when they operated.  They
were hooked up to a polygraph and asked to count
backward by 13's, 27's, etc., from a five-digit number.
This task was repeated while they were listening to no
music, while they were listening to special
stress-reduction music, and while they were listening to
music of their choice.  The quickest, most accurate, and
least stressful results were obtained with the music of
their choice, and the worst results were obtained with no
music.

Against the odds.
The Economist, 20 Aug. 1994, pp. 59-60
No author given.
Two economists, William Christie and Paul Schultz, wrote a
forthcoming article in the "Journal of Finance" titled
"Why do NASDAQ market makers avoid odd-eighth quotes?"  The
absence of odd-eighth quotes is illustrated by comparing a
histogram of bid/offer spreads for 100 stocks in the NASDAQ
with a corresponding histogram for 100 stocks in the
NYSE/AMEX.

The authors suggest that major dealers in NASDAQ are in
collusion to round figures to the nearest 1/4, to
keep the bid/offer spreads at least 1/4, thereby making
their deals more profitable.  This has led to lawsuits
against these dealers.  The dealers have an explanation
for the scarcity of 1/8 prices, but, since the
Christie-Schultz article, 1/8 bid/offer quotes have become
much more common.

What the polls say--and what they mean.
New York Times, 17 September, 1994, Section 1 Page 23.
Daniel Yankelovich
Daniel Yankelovich is well-known for his theories of
the meaning of polls, and he has explained these theories
in his recent book "Coming to Public Judgment."

Here he uses the health care issue to show that, while
polls faithfully represent what people say, only the
most sophisticated tell what they believe.  He suggests
that the large  public support (average over polls of
It only means that people do not think anyone should be
deprived of health care, but only if the country can
afford it, choice of doctors is not limited, etc.  People
have not thought through the consequences of their opinions.
Yankelovich calls such polls "raw opinions".  He feels
that, as public debate proceeds, people's opinions will
become based on more solid information and the polls
will be more meaningful.
Minorities, women lag on medical exams; research: study says
gaps are linked to students' education, not race or sex.
Los Angeles Times, 7 September 1994, Pg. 14
Thomas H. Maugh II
A study of medical board exams--standardized tests taken
by second-year medical students nationwide--has shown
that, while white males score higher than women and
ethnic minorities, the difference is due more to students'
education than to sex or race.  According to the article, on a
test with a mean score of 500, Asian Americans scored 15
to 20 points lower than white males, Latinos scored 60
points lower, and African Americans scored 100 to 120
points lower.  On average, women scored 30 points lower
than males from the same ethnic group.  The pass rates
for the exam also varied according to race, with 88% for
whites, 84% for Asian Americans, 66% for Latinos, and
49% for African Americans.

For all groups except Asian Americans and women, the
study found that board exam scores could be predicted
from undergraduate GPA's, the number of science courses
taken, and MCAT scores.  Asian Americans and women
"scored lower on the boards than would have been

The article also addresses the issue of the usefulness
of the exam in determining which students are likely to
become good doctors, and offers some differing
viewpoints.  What is not disputed is that many of the
students with low scores on the boards do not graduate
from medical school or become practising physicians.  By
contrast, the article notes that it is well-known that,
while high MCAT scores predict good performance in
medical school, low scores are not so useful, especially
for ethnic minorities.  In particular, African Americans
with a low MCAT score are more likely to succeed than
whites with a similar score.
The New York Times, 11 September 1994, Sec. 6, Pg. 60
Gregg Easterbrook
"Throughout the world, many more people die each year
from filthy air and dirty water than from asbestos,
PCB's, pesticide residues, and ultraviolet rays," Mr.
Easterbrook writes.  Although these problems are "real
enough and must be dealt with", the author vehemently
asserts that air and water pollution, especially in
developing countries, is of primary concern.

Citing the World Health Organization and Unicef, Mr.
Easterbrook reports that last year, 4 million children
under the age of five died from diseases stemming
primarily from air pollution, and that 3.8 million under
five died from diseases, such as diarrhea, caused mostly
by impure water.  In the developing world, diarrhea
kills far more people than cancer, he writes.

Mr. Easterbrook, a contributing editor to Newsweek and
The Atlantic Monthly, takes many Western
environmentalists to task for concentrating on problems
which he implies are minor compared to the lack of clean
air and water in many areas of the world.  His data can
indeed be alarming:

1.3 billion people in the developing world live in zones
of dangerously unsafe air;

meets the crudest safety standards;

In 1991 there was more toxic water pollution in China
alone than in the whole of the Western world, after an
estimated 25 billion tons of unfiltered industrial
pollutants went directly into the waterways.

The article also contains Mr. Easterbrook's suggestions
for solving these problems, mostly in the form of hydro-
electric dams, petroleum refining, and high-efficiency
power plants for the clean combustion of coal.  Many
environmental groups oppose these large-scale projects,
and there is ample discussion on this issue.
The Journal of the American Statistical Association has
established a new section, "Statistics in Sports".  The
current issue (September 1994) of JASA has a collection
of papers from this section.  The introductory remarks
of the editor, Donald Guthrie, include "The popularity
of statistical analysis of sports offers another
opportunity -- education of spectators, particularly
young people, in the principles of sound statistical
reasoning. A statistical argument presented in the
context of one's experience is far more likely to be
retained than one presented in the context of a
hypothetical situation."

We mention two of the articles that we found
interesting.  The analysis here naturally gets a little
technical, but the problems are ones that students might
enjoy exploring on their own.
Exploring baseball hitting data.
JASA, September 1994, pp. 1066-1074.
Jim Albert
A recent book by Cramer and Dewan ("STATS 1993 player
profiles" published in 1992 by STATS Inc.) provides data
on aspects of a player's performance, say his
batting average, in different "situations".  This
article uses these data to look for situations that
significantly affect a player's batting average.

The author starts with Wade Boggs' performance in 1992,
to see how he performed against left- and right-handed
pitchers, pitchers that induce mainly groundballs as
compared to flyball pitchers, night games as compared to
day games, grass as compared to artificial turf, and
home games as compared to away games.  Some seem to have
an effect and others not. This gives a chance to
illustrate that if you look at enough features of a data
set, just by chance, one or more will seem unusual.  The
author then looks at a whole group of players asking
the same kind of questions and finally looks to see if
differences established here carry over to different
seasons.
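
The point about looking at many features of a data set can be made
concrete with a little arithmetic.  As a sketch, assume (this is our
assumption, not the article's) that each comparison is an independent
test with a 5% chance of looking unusual when nothing is really going
on.  The function name is our own.

```python
def chance_of_false_alarm(n_comparisons, alpha=0.05):
    """Chance that at least one of n independent comparisons looks
    unusual at level alpha when there is no real effect."""
    return 1 - (1 - alpha) ** n_comparisons

if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(f"{n:2d} comparisons: {chance_of_false_alarm(n):.3f}")
```

With the five situations listed for Boggs, the chance of at least one
spurious "effect" under these assumptions is about 23%; with twenty
comparisons it rises to about 64%.  Real situational splits are not
independent, so these figures are only a rough guide.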

Albert found that "variation in batting averages by
pitch count is dramatic--batters generally hit 123
points higher when ahead in the count than with 2
strikes."  Smaller, but significant, differences appear
when facing a pitcher of the opposite arm, facing a
groundball pitcher rather than a flyball pitcher, and
playing at home.  These do seem to carry over to
different seasons.

Professor Albert has made the data used in this
study available by ftp from isds.duke.edu in
pub/albert/situation_data
Estimating with selective binomial information.
JASA, September 1994, pp. 1080 to 1089.
George Casella and Roger L. Berger
When Dave Winfield is batting, rather than giving his
batting average, the announcer might say "he's really
hitting well these days; he's 8 for the last 17."
This is selectively reported data and the question is,
what can we learn from it about Winfield's true batting
average?

Of course, we see more serious examples of this problem
all the time -- for example,  people who do meta-studies
may choose only those studies that have been published
but then want to make inferences about all studies.

The authors show that we cannot do a lot with the small
amount of data we have for Dave Winfield, but we can
make quite good estimates in similar situations with a
larger data set which results from selective reporting.
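
To see why selectively reported streaks are hard to interpret, it may
help to simulate an announcer who always quotes the most flattering
recent stretch.  The sketch below is not Casella and Berger's
estimator; the true average of .280, the announcer's rule (quote the
best "k for the last m" with m between 10 and 30) and all function
names are our own assumptions.

```python
import random
from math import comb

def binomial_tail(n, k, p):
    """P(at least k hits in n at-bats) for a hitter with true average p."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def mean_reported_average(p, trials=5_000, min_m=10, max_m=30, seed=1994):
    """Average of the best 'k for the last m' ratio the announcer could quote."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Simulated at-bats, most recent last; True means a hit.
        at_bats = [rng.random() < p for _ in range(max_m)]
        best = max(sum(at_bats[-m:]) / m for m in range(min_m, max_m + 1))
        total += best
    return total / trials

if __name__ == "__main__":
    p = 0.280  # hypothetical true batting average
    print(f"P(at least 8 hits in 17 at-bats): {binomial_tail(17, 8, p):.3f}")
    print(f"Mean of the best quoted average:  {mean_reported_average(p):.3f}")
```

Under these assumptions a .280 hitter goes 8 for 17 or better in a
fixed stretch only about 7% of the time, yet the best stretch an
announcer can pick will on average look better than .280.  This is the
selection effect that any estimate of the true average has to allow for.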