CHANCE News 5.12
(9 October 1996 to 10 November 1996)
Prepared by J. Laurie Snell, with help from Bill Peterson,
Fuxing Hou, Ma.Katrina Munoz Dy, and Joan Snell, as part of the
CHANCE Course Project supported by the National Science Foundation.
Please send comments and suggestions for articles to
Back issues of Chance News and other materials for teaching a
CHANCE course are available from the Chance web site:
Note: We had a number of great contributions from others and even this does not include
them all. We will incorporate them in the next issue which we will try to send out
By a small sample we may judge the whole piece.
Miguel de Cervantes (1547-1616)
Professor Nancy Reid is teaching her course "Lies, Damned Lies and Statistics" again
this year and keeps materials for her course on the web site.
Her course is based on current articles in the news. She does not use a traditional text but rather books such as Tufte's "Visual Display of Quantitative
Information" and "Tainted Truth" by Cynthia Crossen. Each week Professor Reid puts
technical notes for the students and brief summaries of articles from the major Canadian newspaper "The Globe and Mail" on her web page. Here are her summaries of articles
for the week of November 5 to 12.
In the Globe & Mail this week
"Librarians take on Internet", (S. Strauss), Nov. 6,
A8. The Metro Reference Library is cataloging
internet sites on astronomy using the Dewey Decimal
System. As you'll have noticed, search engines
aren't really the answer to finding what you want
on the Internet. This should help.
"U.S. Election", (B. Bennell), Nov. 6, A6. A map of
the U.S. showing states' votes for Clinton and Dole.
Gives a very nice quick summary of the electoral
college vote. [The map is provided on her web version]
"Doubt cast on gene linked to behavior", (N. Angier,
NY Times), Nov. 6, A8. A new study has cast doubt on
a widely heralded finding reported earlier this year
that there is a gene that controls a personality trait
for novelty seeking. There is a very readable and
persuasive account of the earlier work in "American
Scientist" (sometime in spring 96). The new study
appeared in the November issue of "Molecular Psychiatry".
Note from Editor: The "American Scientist" has full text versions of most of their recent articles available
from the Sigma Xi homepage.
The article referred to in this review is "Reward Deficiency Syndrome" by Kenneth Blum, John G. Cull,
Eric R. Braverman and David E. Comings (March-April issue). In the July-August 1995 issue there is an article "The Role of Intelligence in Modern Society" by Earl Hunt that would be useful in discussing issues raised by "The Bell Curve".
"Sibling fights worsen if parents 'lose it'" (V. Galt),
Nov. 9, A9. If you think we're overwhelmed by numbers,
you might be right. This study is reported as stating
that "the younger sibling tattles 34.5 percent of the
time and the older sibling tattles 17.2 per cent of
the time. This makes mothers feel angry 50 per cent of
the time, upset 16.7 per cent of the time, not bothered
16.7 per cent of the time and worn out 10 per cent of
the time." I'm worn out just thinking about it.
"Walking reduces risk of heart attack, study says"
(W. Immen), Nov. 11, A1. This is a report on a study
presented at the American Heart Association. A group
of 238 women in their late fifties, with no immediate
risk factors for heart disease and no current exercise
regime, was identified. Half of the women were
encouraged to make walking a part of their social activity
and the other half were allowed to remain inactive. Ten
years later, only 3 in the active group had suffered
heart attacks, compared with 18 in the inactive group.
About 70 per cent of the active group reported still being
active, while the inactive group was generally even less
"Drugs, surgery equally good, heart study says" (Reuters),
Nov. 11, A6. This study was published in last month's
"New England Journal of Medicine". It is based on
an analysis of 3,145 heart-attack victims treated in 19
Seattle hospitals. The article points out that the study
"is not a tightly controlled comparison of the two
techniques" and reports later in the article that two
earlier studies, in which 344 heart attack patients were
randomly assigned to get either drugs or surgery, concluded
that there was a benefit for surgery.
Now that the recentered SAT has become a reality in college admissions, we have
been asked a number of questions about why the recentering was done and how it works.
Here is what we know. The reason for recentering was explained in a letter to
the editor from the president of the College Board, which we reproduce verbatim.
Why we centered.
Washington Post, 14 September, 1996, A24
Letter to the editor.
The Scholastic Assessment Test (SAT) is good at
detecting changes in students' academic preparation
for college, but that is not why students take it or
why colleges use the scores ["Are Test Scores Improving?",
editorial, Aug. 31]. The test's major value is its
ability to predict the success of individual students
in the first year or two of college. Its primary assets
are its predictive validity and reliability, which help
colleges be objective and fair as they sort through
various, more subjective admissions criteria.
We decided to recenter the SAT score scale because our
first obligation is to score and scale the SAT so that it
will most fairly and accurately predict students'
prospects in college. Recentering does this by
distributing scores to reflect the composition of
the million-plus college-bound seniors who take the
SAT today, not the 10,654 who took it in 1941 -- mostly
men (62 percent) and many from independent schools (41
percent). Yet some would index today's students' scores
to that small and unrepresentative group of students
who took the SAT prior to World War II. In 1996,
1,084,725 students took the test; 53 percent were women,
30 percent minorities and 83 percent from public high
Anyone concerned about score trends should know that
all trends remain clear after recentering because
concordance tables distributed to schools and colleges
make it easy to translate old scores into recentered
scores for individuals and groups and to track average
scores over time.
DONALD M. STEWART
The College Board
We found on the College Board web page
the following table resulting from a study done by ETS to evaluate the effect of
recentering on the validity of the SAT in predicting the freshman grade point average. ("Effects of Scale Choice on Predictive Validity" by R. Morgan,
This table gives the correlations of SAT scores and high school (HS) grade-point
averages with college freshman grade-point averages. The correlations are averages
over 75 colleges and universities, using the original scale (O) and then the recentered scale (R).
                      Total        Male        Female
                     O     R     O     R     O     R
SAT Verbal          .42   .43   .40   .40   .45   .46
SAT Math            .46   .46   .44   .44   .48   .49
SAT Total           .50   .51   .49   .49   .53   .54
HS GPA              .48   .48   .47   .47   .49   .49
SAT plus HS GPA     .59   .59   .57   .58   .61   .62
SAT Increment       .10   .11   .11   .11   .12   .13

While, in each case, the average correlation with the recentered scale is at least
as big as it is with the old scale, the differences are at most .01.
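
The observation that the recentered correlations are never smaller, and never more than .01 larger, can be checked mechanically. The sketch below simply re-enters the table values in hundredths (to avoid floating-point rounding) and computes every recentered-minus-original difference. The names CORRELATIONS and scale_differences are ours, introduced only for illustration.

```python
# Average correlations with freshman GPA, in hundredths, copied from the table.
# Each row holds (original, recentered) pairs for Total, Male, Female.
CORRELATIONS = {
    "SAT Verbal":      [(42, 43), (40, 40), (45, 46)],
    "SAT Math":        [(46, 46), (44, 44), (48, 49)],
    "SAT Total":       [(50, 51), (49, 49), (53, 54)],
    "HS GPA":          [(48, 48), (47, 47), (49, 49)],
    "SAT plus HS GPA": [(59, 59), (57, 58), (61, 62)],
    "SAT Increment":   [(10, 11), (11, 11), (12, 13)],
}


def scale_differences(table):
    """Return recentered-minus-original differences (in hundredths) for every cell."""
    return [r - o for pairs in table.values() for (o, r) in pairs]


diffs = scale_differences(CORRELATIONS)
print("smallest difference:", min(diffs) / 100)  # 0.0
print("largest difference:", max(diffs) / 100)   # 0.01
```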
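From our admissions office we learned that the conversion table is as follows. The
first column is the original score; the second and third columns are the corresponding
recentered verbal and math scores.
Original Verbal(R) Math(R)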
800 800 800
790 800 800
780 800 800
770 800 790
760 800 770
750 800 760
740 800 740
730 800 730
720 790 720
710 780 700
700 760 690
690 750 680
680 740 670
670 730 660
660 720 650
650 710 650
640 700 640
630 690 630
620 680 620
610 670 610
600 670 600
590 660 600
580 650 590
570 640 580
560 630 570
550 620 560
540 610 560
530 600 550
520 600 540
510 590 530
500 580 520
490 570 520
480 560 510
470 550 500
460 540 490
450 530 480
440 520 480
430 510 470
420 500 460
410 490 450
400 480 440
390 470 430
380 460 430
370 450 420
360 440 410
350 430 400
340 420 390
330 410 380
320 400 370
310 390 350
300 380 340
290 370 330
280 360 310
270 350 300
260 340 280
250 330 260
240 310 240
230 300 220
220 290 200
210 270 200
200 230 200
(1) Why do you think some people will have a lower math score when rescaled than they
would have had if the old scale had been used?
(2) The College Board also provides a table to convert mean SAT scores for previous
years into a mean score using the recentered scale. This conversion depends on both
the mean and the standard deviation of the original scores. What is being assumed
about the actual distribution of scores in this conversion?
(3) Do you think that rescaling the scores was a good idea? What are the arguments
that might be given pro and con?
An example of Dartmouth putting a good spin on a news article, suggested by Bob Norman,
has to do with a note in the November issue of the "Dartmouth Alumni Magazine".
After discussing all the exciting talks going on at Dartmouth they remark:
With that brand of headlines, you can see why it's
no wonder that Hanover was recently the only New
England community to be rated among the top 20 of the
"101 Smartest Spots" in the United States, according
to the magazine "American Demographics". People holding
bachelor degrees or higher average 20 percent of the
U.S. population but for Hanover (and Norwich as well)
the figure is three times that.
The study referred to was reported in "American Demographics", October, 1995. This
study looked at communities with at least 2,500 people and, using the 1990 census
ranked them according to the percentage of the residents 25 years or over who had
a bachelor's degree or higher. In this ranking Hanover N.H. is in 9th place with 73%, and
indeed no other New England town is in the top 20. Stanford CA is in first place
with 90.9% having a bachelor's degree.
Bob remarked that the data looked as if zip codes might have been used to define the
communities. He found that the zip code 94305 consists of the Stanford estate. In this
area there are only Stanford University buildings, the hospital, and the homes of Stanford
faculty, who are permitted to build in this area when they get tenure. If Bob's conjecture is
correct it is not surprising that Stanford was number one. Using the 1990 census
lookup on the census bureau web site, we found that, in the 1990 census for zip code
94305, there were 6090 people 25 years or over. This was exactly the number listed for Stanford
in the study. We consider that this verifies Bob's conjecture and his explanation
why Stanford was number one. In Hanover the majority of the people over 25 would
be expected to be connected with Dartmouth College or the Hospital so, again, it is not
too surprising that Hanover fared well.
Dan Velleman wrote to us about the problem that we discussed last time, from the Aug. 18, 1996 NPR Weekend Edition. Recall that the problem was:
There are four colored balls in a bag; two red,
one black, and one blue. If you draw two balls
at random, and then you're told that one of them
is red, what is the likelihood that the other ball
is also red?
In a letter to Will Shortz about the solution posted on the NPR web site, Dan writes:
Your explanation of the problem with the four colored
balls is very nice until near the end. But after
explaining why some listeners who got an answer of
1 in 3 were wrong, you incorrectly stated the
assumption you used in your solution. You said
that "you have to assume that the color revealed to
you has been randomly picked from the colors of the
two selected balls."
Dan goes on to point out that this assumption would lead to the very answer, 1/3, that
Shortz wanted to show was wrong.
Dan also did not think much of the experiment that Shortz suggested to verify that
the answer 1/5 was correct. Shortz suggested that people do the following experiment:
Put the 4 balls in a bag, pick two out at random,
then pick one of the two chosen balls at random and
see if it's red. If it is, look to see if the other
is also red.
Conditional probabilities of this type continue to cause trouble because people do
not realize how carefully they have to specify how they got the given information.
Faithful readers of Chance News will have seen many discussions of this problem.
A good reference for this kind of problem is:
Bar-Hillel, M., Falk, R. (1982) Some teasers concerning conditional probability. Cognition,
(1) Do you agree that Will Shortz's explanation was wrong?
(2) If you carried out the simulation he proposed what estimate would you get for
(3) Does the problem give enough information to determine an experiment, or computer
simulation to estimate the answer? If so, what experiment or computer program would
you use? What answer would you get?
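
To make concrete how much the answer depends on the way the information "one of them is red" is obtained, here is a minimal sketch that enumerates the six equally likely pairs exactly, with no simulation noise. It compares two possible protocols, both of which are our assumptions about how the problem could be read: (a) you are told only that at least one of the two balls is red; (b) one of the two drawn balls is chosen at random, its color is revealed, and it happens to be red. The function names are ours.

```python
from fractions import Fraction
from itertools import combinations

# Two red balls, one black, one blue, as in the puzzle.
BALLS = ["red1", "red2", "black", "blue"]
PAIRS = list(combinations(BALLS, 2))  # the 6 equally likely draws


def is_red(ball):
    return ball.startswith("red")


def both_red_given_at_least_one_red():
    """Protocol (a): condition on the event 'at least one drawn ball is red'."""
    with_red = [p for p in PAIRS if any(is_red(b) for b in p)]
    both_red = [p for p in with_red if all(is_red(b) for b in p)]
    return Fraction(len(both_red), len(with_red))


def both_red_given_random_reveal_is_red():
    """Protocol (b): one of the two drawn balls is shown at random and is red."""
    p_shown_red = Fraction(0)
    p_shown_red_and_both_red = Fraction(0)
    for pair in PAIRS:
        for shown in pair:  # each ball of the pair is shown with probability 1/2
            if is_red(shown):
                weight = Fraction(1, len(PAIRS)) * Fraction(1, 2)
                p_shown_red += weight
                if all(is_red(b) for b in pair):
                    p_shown_red_and_both_red += weight
    return p_shown_red_and_both_red / p_shown_red


print(both_red_given_at_least_one_red())      # 1/5
print(both_red_given_random_reveal_is_red())  # 1/3
```

Under protocol (a) the answer is 1/5, while protocol (b), which matches the experiment Shortz proposed, gives 1/3. This is consistent with Dan's point that the stated assumption leads to 1/3 rather than 1/5.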
Jessica Utts writes:
The following article is from the "Consumer Reports Travel Letter" Vol. 12, No. 10,
October 1996, p. 217. After quoting the article verbatim, I will also suggest some
How Useful Are Airline 'Safety' Records?
Under the Travelers' Rights Bill (S. 2023) introduced recently by Senator Harry Reid
(D-NV), airlines would be required to provide passengers, on request, with information
about an aircraft's safety record and the competency of its flight crew. The same
bill would allow interested persons to obtain from the FAA safety information about
an airline's fleet. Beyond that, the FAA would be required to supply Congress with
an annual report on the airline industry, describing accidents for each airline and
including the names of the aircraft manufacturers involved.
That sounds like a great idea - but what meaningful data could the FAA provide? As
far as we can tell, none of the available statistics can serve as a reliable predictor
of future crashes. No historical data, for example, could possibly have predicted
that an American Airlines 757 would fly into a mountain in Colombia, or that somebody
would load fully charged oxygen generators onto a ValuJet flight (if that finally
turns out to be the cause of that crash).
Historical statistics related to safety (or to any other aspect of airline operations,
for that matter) are of use to consumers only if they can reliably predict future
performance. Since they can't, we're afraid that the current effort will be ineffectual.
1. Do you agree that safety records from the past are little use in predicting future
safety? Why or why not?
2. If a particular airline such as ValuJet had a poor safety record in the past,
would it matter when trying to decide which airline to fly that the exact cause of
a future crash could not be predicted?
3. Are there airline records that might be reliably used to predict future performance,
such as on-time statistics? Why is that different from trying to predict future
Jessica Utts also suggests the following:
E-mail users earn more.
Chronicle of Higher Education, 1 Nov 1996, A25
A new study conducted by a professor of economics and business administration at Ursinus
College shows that workers who use e-mail earn, on average, 7.4% more than colleagues
in similar situations who don't. The data comes from a 1993 U.S. Census Bureau survey of nearly 10,000 workers. The study showed that the discrepancy between wages
of e-mail users and non-users was greatest among service workers. Executives who
used e-mail out-earned their non-wired peers by almost 10%.
Do you think companies which have e-mail facilities would tend to be wealthier than
companies which do not? If so, would that cause problems with such a study? What
do you think is meant by "colleagues in similar situations"?
Where on Earth? The GPS solution. Scrambling policy irks many experts.
The Boston Globe, 7 October 1996, pC1
Peter J. Howe
The Defense Department's Global Positioning System (GPS) is based on a network of
24 satellites, through which users can find where they are anywhere on earth to within
100 yards. Commercial applications for sailing, aviation and surveying represent
a $1 billion global business. Critics say the Pentagon could make available information
accurate to within 15 or 20 yards but has been deliberately scrambling information
available to civilians under a policy of "selective availability". (Military personnel can receive more accurate encrypted information.) They note that, even as the
Defense Department is working to scramble the signal, other arms of government, including
the FAA, are spending tens of millions on their own systems to circumvent the scrambling.
The article includes web addresses for a number of web sites with information on how
GPS works. Among these:
Trimble Navigation, a GPS supplier, publishes a primer on how GPS works
The following web pages describe applications to earthquake mapping:
The article notes that the potential improvement from 100 yards error to 15-20 yards represents "about 95 percent greater precision measured as a circle of error." What does this mean?
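
One possible reading of "circle of error" is the area of the circle within which the true position is known to lie. Under that assumption (ours, not the article's), shrinking the error radius from 100 yards to 15 or 20 yards shrinks the area by the square of the ratio of the radii. The helper name area_reduction is hypothetical.

```python
import math


def area_reduction(old_radius, new_radius):
    """Fractional reduction in the area of the error circle (assumed interpretation)."""
    old_area = math.pi * old_radius ** 2
    new_area = math.pi * new_radius ** 2
    return 1 - new_area / old_area


# Radii in yards, taken from the article: 100 now, 15 to 20 if unscrambled.
for new_radius in (15, 20):
    percent = 100 * area_reduction(100, new_radius)
    print(f"{new_radius} yards: error area shrinks by {percent:.2f} percent")
```

On this reading the reduction is roughly 96 to 98 percent, which is in the neighborhood of the article's "about 95 percent". Measured by radius alone, the improvement would be only 80 to 85 percent.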
Are you part of growing population?
The Boston Globe, 16 October 1996, pA12.
This short piece notes that the government considers a person with a body mass index
(BMI) over 25 to be too fat. It includes the following formula for readers to compute
their own BMI:
First, multiply your weight in pounds by .45 to
get kilograms. Next, convert your height to
inches. Multiply this number by .0254 to get
meters. Multiply that number by itself. Then
divide this into your weight in kilograms.
1. What is being measured here?
2. The article notes that your BMI will "probably be a number in the 20s or low 30s."
What fraction of the population do you suppose is included in this range?
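
The Globe's recipe can be written out step by step. The sketch below follows the article's rounded conversion factors (.45 kilograms per pound and .0254 meters per inch) exactly as printed. The function name and the sample person (150 pounds, 68 inches) are our own, chosen only for illustration.

```python
def bmi_from_pounds_and_inches(weight_lb, height_in):
    """Body mass index computed with the article's conversion factors."""
    weight_kg = weight_lb * 0.45      # pounds to kilograms (article's rounded factor)
    height_m = height_in * 0.0254     # inches to meters
    return weight_kg / (height_m * height_m)  # kilograms per square meter


# Hypothetical example: a person weighing 150 pounds who is 5 feet 8 inches tall.
example = bmi_from_pounds_and_inches(150, 68)
print(round(example, 1))  # 22.6
```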
A special news report about life on the job--and trends taking shape there.
The Wall Street Journal, 29 October 1996, pA1
A collection of short news items. The following one presents some curious data.
Diversity study's printing error lives on as fact.
A 1987 study entitled "Workforce 2000," compiled by the Hudson Institute in Indianapolis,
reported that the number of non-Hispanic white males entering the work force would
drop to 15% from 47% by the turn of the century. This figure has been frequently
quoted in support of diversity programs. The study's authors acknowledge that the report
caused confusion because the word "net" was dropped in an explanation of the statistic.
The present article notes that "The US Bureau of Labor Statistics recently forecast
that the number of non-Hispanic white males in the work force will decline by
only 3% between 1994 and the year 2005, to 38% from 41%."
1. Where would you insert the word "net" in the original report in order to make
sense of these numbers?
2. What exactly is dropping by 3% in the BLS forecast?
Here are two pieces on the PSAT.
Closing the gender gap on PSATs.
US News & World Report, 14 October 1996, p28.
This is a report on the recent decision by the Department of Education to change the
format of the PSAT (Preliminary Scholastic Aptitude Test) in response to charges
by civil rights groups that the tests are gender biased. While 55% of the students
who take the test are girls, only 39% of the National Merit Scholarships (for which the
PSAT is a qualifying test) ultimately go to girls. Furthermore, because girls actually
receive, on average, higher grades than boys in both high school and college, critics
have argued that the PSAT is underpredicting their performance.
As a remedy, the PSAT plans to add a test of writing skills, an area in which girls
tend to outperform boys. The boys have traditionally had the edge in the math section.
Don't assume bias in testing.
Voice of the People (letter to the editor)
Chicago Tribune, 11 November 1996, p18.
Mr. Chrzanowski takes exception to an earlier Tribune editorial that had praised
the decision to revise the PSAT. He argues that one might expect boys' average PSAT
scores to be higher precisely because fewer boys take the test: perhaps boys with
marginal ability are less likely to take the test than girls of comparable ability. As
for the fact that boys have higher PSAT averages, while girls have higher high school
grades, he notes that the difference could equally well be attributed to bias in
high school grading as to bias in the PSAT.
(1) Do you agree with Chrzanowski's arguments? How would you respond?
(2) Girls tend to have a smaller variance in their test scores than boys do. Could
this be an explanation for the apparent bias in the PSAT scores?
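
As an aid to thinking about question (2), here is a purely illustrative sketch of how a difference in variance alone could affect who lands above a high cutoff. Everything except the 55% share of girls among test-takers is an assumption made for illustration: scores are taken to be normally distributed, boys and girls are given the same mean, and the standard deviations (1.0 for girls, 1.1 for boys) are hypothetical numbers, not estimates from PSAT data.

```python
from statistics import NormalDist


def share_of_girls_above(cutoff, frac_girls=0.55, sd_girls=1.0, sd_boys=1.1, mean=0.0):
    """Fraction of test-takers above `cutoff` who are girls, under the stated assumptions."""
    girls = NormalDist(mean, sd_girls)
    boys = NormalDist(mean, sd_boys)
    girls_above = frac_girls * (1 - girls.cdf(cutoff))
    boys_above = (1 - frac_girls) * (1 - boys.cdf(cutoff))
    return girls_above / (girls_above + boys_above)


# Cutoffs are in units of the girls' standard deviation above the common mean.
for cutoff in (0.0, 1.0, 2.0, 3.0):
    print(cutoff, round(share_of_girls_above(cutoff), 3))
```

With identical means, the share of girls above the cutoff starts at 55% at the mean and falls as the cutoff rises, so a selection rule based on a very high cutoff can favor the higher-variance group even when averages are equal. Whether this accounts for the actual National Merit figures cannot be settled by a sketch with made-up parameters.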