!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

CHANCE News 5.12

(9 October 1996 to 10 November 1996)

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Prepared by J. Laurie Snell, with help from Bill Peterson, Fuxing Hou, Ma.Katrina Munoz Dy, and Joan Snell, as part of the CHANCE Course Project supported by the National Science Foundation.

Please send comments and suggestions for articles to
jlsnell@dartmouth.edu.

Back issues of Chance News and other materials for teaching a CHANCE course are available from the Chance web site:

http://www.geom.umn.edu/locate/chance

Note: We received a number of great contributions from others, and this issue does not include them all. We will incorporate the rest in the next issue, which we will try to send out soon.

=====================================================

By a small sample we may judge the whole piece.

Miguel de Cervantes (1547-1616)

=====================================================

Contents

1. Lies, Damned Lies and Statistics.
2. Dartmouth's SAT scores have gone up!
3. Hanover is one of the "smartest" towns.
4. Another conditional probability disaster.
5. How Useful Are Airline 'Safety' Records?
6. E-mail users earn more.
7. Where on Earth? The GPS solution.
8. Are you part of growing population?
9. Life on the job.
10. Closing the gender gap on PSATs.
11. Don't assume bias in testing.

<<<========<<

>>>>>==============>
Professor Nancy Reid is teaching her course "Lies, Damned Lies and Statistics" again this year and keeps materials for her course on the web site. Her course is based on current articles in the news. She does not use a traditional text but rather books such as Tufte's "Visual Display of Quantitative Information" and "Tainted Truth" by Cynthia Crossen. Each week Professor Reid puts technical notes for the students and brief summaries of articles in the major Canadian newspaper "the Globe and Mail" on her web page. Here are her summaries of articles for the week of November 5 to 12th.

In the Globe & Mail this week

"Librarians take on Internet", (S. Strauss), Nov. 6, A8. The Metro Reference Library is cataloging internet sites on astronomy using the Dewey Decimal System. As you'll have noticed, search engines aren't really the answer to finding what you want on the Internet. This should help.

"U.S. Election", (B. Bennell), Nov. 6, A6. A map of the U.S. showing states' votes for Clinton and Dole. Gives a very nice quick summary of the electoral college vote. [The map is provided on her web version]

"Doubt cast on gene linked to behavior", (N. Angier, NY Times), Nov. 6, A8. A new study has cast doubt on a widely heralded finding reported earlier this year that there is a gene that controls a personality trait for novelty seeking. There is a very readable and persuasive account of the earlier work in "American Scientist" (sometime in spring 96). The new study appeared in the November issue of "Molecular Psychiatry".

Note from Editor: The "American Scientist" has full text versions of most of their recent articles available from the Sigma Xi homepage. The article referred to in this review is "Reward Deficiency Syndrome" by Kenneth Blum, John G. Cull, Eric R. Braverman and David E. Comings (March-April issue). In the July-August 1995 issue there is an article "The Role of Intelligence in Modern Society" by Earl Hunt that would be useful in discussing issues raised by "The Bell Curve".

"Sibling fights worsen if parents 'lose it'" (V. Galt), Nov. 9, A9. If you think we're overwhelmed by numbers, you might be right. This study is reported as stating that "the younger sibling tattles 34.5 percent of the time and the older sibling tattles 17.2 per cent of the time. This makes mothers feel angry 50 per cent of the time, upset 16.7 per cent of the time, not bothered 16.7 per cent of the time and worn out 10 per cent of the time." I'm worn out just thinking about it.

"Walking reduces risk of heart attack, study says" (W. Immen), Nov. 11, A1. This is a report on a study presented at the American Heart Association. A group of 238 women in their late fifties with no immediate risk factors for heart disease and no current exercise regime was identified. Half of the women were encouraged to make walking a part of their social activity and the other half were allowed to remain inactive. Ten years later, only 3 in the active group had suffered heart attacks, compared with 18 in the inactive group. About 70 per cent of the active group reported still being active, while the inactive group was generally even less active.

"Drugs, surgery equally good, heart study says" (Reuters), Nov. 11, A6. This study was published in last month's "New England Journal of Medicine". It is based on an analysis of 3,145 heart-attack victims treated in 19 Seattle hospitals. The article points out that the study "is not a tightly controlled comparison of the two techniques" and reports later in the article that two earlier studies, in which 344 heart attack patients were randomly assigned to get either drugs or surgery, concluded that there was a benefit for surgery.

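The counts in the walking study summarized above can be turned into risks with a few lines of Python. This is only illustrative arithmetic: it assumes that "half of the women" means two groups of 119, and the function name is ours.

```python
# Illustrative arithmetic for the walking study summarized above.
# Assumption: the 238 women were split into two equal groups of 119.

def risk_summary(events_a, n_a, events_b, n_b):
    """Return (risk_a, risk_b, risk_ratio, risk_difference) for two groups."""
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    return risk_a, risk_b, risk_a / risk_b, risk_b - risk_a

active, inactive, ratio, difference = risk_summary(3, 119, 18, 119)
print(f"active group risk:   {active:.3f}")
print(f"inactive group risk: {inactive:.3f}")
print(f"risk ratio (active / inactive): {ratio:.2f}")
print(f"risk difference (inactive - active): {difference:.3f}")
```

The risk ratio by itself says nothing about chance variation or about how the two groups were formed, so it is a starting point for discussion rather than a conclusion.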
<<<========<<

>>>>>==============>
Now that the recentered SAT has become a reality in college admissions, we have been asked a number of questions about why the recentering was done and how it is done. Here is what we know. The reason for recentering was explained in a letter to the editor from the president of the College Board, which we reproduce verbatim.

Why we centered.
Washington Post, 14 September, 1996, A24
Letter to the editor.

The Scholastic Assessment Test (SAT) is good at detecting changes in students' academic preparation for college, but that is not why students take it or why colleges use the scores ["Are Test Scores Improving?", editorial, Aug. 31]. The test's major value is its ability to predict the success of individual students in the first year or two of college. Its primary assets are its predictive validity and reliability, which help colleges be objective and fair as they sort through various, more subjective admissions criteria.

We decided to recenter the SAT score scale because our first obligation is to score and scale the SAT so that it will most fairly and accurately predict students' prospects in college. Recentering does this by distributing scores to reflect the composition of the million-plus college-bound seniors who take the SAT today, not the 10,654 who took it in 1941 -- mostly men (62 percent) and many from independent schools (41 percent). Yet some would index today's students' scores to that small and unrepresentative group of students who took the SAT prior to World War II. In 1996, 1,084,725 students took the test; 53 percent were women, 30 percent minorities and 83 percent from public high schools.

Anyone concerned about score trends should know that all trends remain clear after recentering because concordance tables distributed to schools and colleges make it easy to translate old scores into recentered scores for individuals and groups and to track average scores over time.

DONALD M. STEWART

President
The College Board
New York

We found on the College Board web page the following table resulting from a study done by ETS to evaluate the effect of recentering on the validity of the SAT in predicting the freshman grade point average. ("Effects of Scale Choice on Predictive Validity" by R. Morgan, ETS, 1994.)

The table gives the correlations of SAT scores and high school (HS) grade-point averages with the college freshman grade-point average. The correlations are averages over 75 colleges and universities, using first the original scale (O) and then the recentered scale (R).

                  Total         Male        Female
                  O    R       O    R       O    R

SAT Verbal       .42  .43     .40  .40     .45  .46
SAT Math         .46  .46     .44  .44     .48  .49
SAT Total        .50  .51     .49  .49     .53  .54
HS GPA           .48  .48     .47  .47     .49  .49
SAT plus HS GPA  .59  .59     .57  .58     .61  .62
SAT Increment    .10  .11     .11  .11     .12  .13

In each case the average correlation with the recentered scale is at least as large as with the old scale, but the differences are at most .01.

From our admissions office we learned that the conversion table is as follows.

Old                  New
               Verbal     Math

800              800      800
790              800      800
780              800      800
770              800      790
760              800      770
750              800      760
740              800      740
730              800      730
720              790      720
710              780      700
700              760      690
690              750      680
680              740      670
670              730      660
660              720      650
650              710      650
640              700      640
630              690      630
620              680      620
610              670      610
600              670      600
590              660      600
580              650      590
570              640      580
560              630      570
550              620      560
540              610      560
530              600      550
520              600      540
510              590      530
500              580      520
490              570      520
480              560      510
470              550      500
460              540      490
450              530      480
440              520      480
430              510      470
420              500      460
410              490      450
400              480      440
390              470      430
380              460      430
370              450      420
360              440      410
350              430      400
340              420      390
330              410      380
320              400      370
310              390      350
300              380      340
290              370      330
280              360      310
270              350      300
260              340      280
250              330      260
240              310      240
230              300      220
220              290      200
210              270      200
200              230      200

DISCUSSION QUESTIONS:

(1) Why do you think some people will have a lower math score on the recentered scale than they would have had on the old scale?

(2) The College Board also provides a table to convert mean SAT scores for previous years into a mean score using the recentered scale. This conversion depends on both the mean and the standard deviation of the original scores. What is being assumed about the actual distribution of scores in this conversion?

(3) Do you think that rescaling the scores was a good idea? What are the arguments that might be given pro and con?
<<<========<<

>>>>>==============>
An example of Dartmouth putting a good spin on a news article, suggested by Bob Norman, has to do with a note in the November issue of the "Dartmouth Alumni Magazine". After discussing all the exciting talks going on at Dartmouth they remark:

With that brand of headlines, you can see why it's no wonder that Hanover was recently the only New England community to be rated among the top 20 of the "101 Smartest Spots" in the United States, according to the magazine "American Demographics". People holding bachelor degrees or higher average 20 percent of the U.S. population but for Hanover (and Norwich as well) the figure is three times that.

The study referred to was reported in "American Demographics", October, 1995. This study looked at communities with at least 2,500 people and, using the 1990 census, ranked them according to the percentage of residents 25 years or over who had a bachelor's degree or higher. In this ranking Hanover, N.H., is in 9th place with 73%, and indeed no other New England town is in the top 20. Stanford, CA, is in first place with 90.9% having a bachelor's degree.

Bob remarked that the data looked as if zip codes might have been used to define the communities. He found that the zip code 94305 consists of the Stanford estate. In this area there are only Stanford University buildings, the hospital and Stanford faculty, who are permitted to build in this area when they get tenure. If Bob's conjecture is correct, it is not surprising that Stanford was number one. Using the 1990 census lookup on the Census Bureau web site, we found that, in the 1990 census for zip code 94305, there were 6090 people 25 years or over. This was exactly the number listed for Stanford in the study. We consider that this verifies Bob's conjecture and his explanation of why Stanford was number one. In Hanover the majority of the people over 25 would be expected to be connected with Dartmouth College or the hospital so, again, it is not too surprising that Hanover fared well.
<<<========<<

>>>>>==============>
Dan Velleman wrote us about the problem that we discussed last time, from the Aug. 18, 1996 NPR Weekend Edition. Recall that the problem was:

There are four colored balls in a bag; two red, one black, and one blue. If you draw two balls at random, and then you're told that one of them is red, what is the likelihood that the other ball is also red?

In a letter to Will Shortz about the solution posted on the NPR website, Dan writes:

Your explanation of the problem with the four colored balls is very nice until near the end. But after explaining why some listeners who got an answer of 1 in 3 were wrong, you incorrectly stated the assumption you used in your solution. You said that "you have to assume that the color revealed to you has been randomly picked from the colors of the two selected balls."

Dan goes on to point out that this assumption would lead to the very answer, 1/3, that Shortz wanted to show was wrong.

Dan also did not think much of the experiment that Shortz suggested to verify that the answer 1/5 was correct. Shortz suggested that people do the following experiment:

Put the 4 balls in a bag, pick two out at random, then pick one of the two chosen balls at random and see if it's red. If it is, look to see if the other is also red.

Conditional probabilities of this type continue to cause trouble because people do not realize how carefully they have to specify how they got the given information. Faithful readers of Chance News will have seen many discussions of this problem. A good reference for this kind of problem is:

Bar-Hillel, M., Falk, R. (1982) Some teasers concerning conditional probability. Cognition, 11, 109-122

DISCUSSION QUESTIONS:

(1) Do you agree that Will Shortz's explanation was wrong?

(2) If you carried out the simulation he proposed what estimate would you get for the answer?

(3) Does the problem give enough information to determine an experiment, or computer simulation to estimate the answer? If so, what experiment or computer program would you use? What answer would you get?
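
Readers who want to check their answers can enumerate the six equally likely pairs of balls directly. The Python sketch below does this under two different assumptions about how the information "one of them is red" was obtained; the labels Protocol A and Protocol B and the function names are ours, not Dan's or Shortz's.

```python
from fractions import Fraction
from itertools import combinations

BALLS = ["red1", "red2", "black", "blue"]

def is_red(ball):
    return ball.startswith("red")

def prob_both_red_given_at_least_one_red():
    """Protocol A: we are told only that at least one of the two balls is red."""
    pairs = list(combinations(BALLS, 2))  # 6 equally likely pairs
    with_red = [p for p in pairs if any(is_red(b) for b in p)]
    both_red = [p for p in with_red if all(is_red(b) for b in p)]
    return Fraction(len(both_red), len(with_red))

def prob_other_red_given_random_ball_red():
    """Protocol B: one of the two drawn balls is picked at random and is red."""
    shown_red = 0
    other_also_red = 0
    for pair in combinations(BALLS, 2):
        # Each of the two balls is equally likely to be the one shown.
        for shown, other in (pair, pair[::-1]):
            if is_red(shown):
                shown_red += 1
                if is_red(other):
                    other_also_red += 1
    return Fraction(other_also_red, shown_red)

print("Protocol A:", prob_both_red_given_at_least_one_red())
print("Protocol B:", prob_other_red_given_random_ball_red())
```

The two protocols give different answers, which is the point of Dan's letter: the answer depends on how the given information was obtained.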
<<<========<<

>>>>>==============>
Jessica Utts writes:

The following article is from the "Consumer Reports Travel Letter", Vol. 12, No. 10, October 1996, p. 217. After quoting the article verbatim, I will also suggest some discussion questions.

How Useful Are Airline 'Safety' Records?

Under the Travelers' Rights Bill (S. 2023) introduced recently by Senator Harry Reid (D-NV), airlines would be required to provide passengers, on request, with information about an aircraft's safety record and the competency of its flight crew. The same bill would allow interested persons to obtain from the FAA safety information about an airline's fleet. Beyond that, the FAA would be required to supply Congress with an annual report on the airline industry, describing accidents for each airline and including the names of the aircraft manufacturers involved.

That sounds like a great idea - but what meaningful data could the FAA provide? As far as we can tell, none of the available statistics can serve as a reliable predictor of future crashes. No historical data, for example, could possibly have predicted that an American Airlines 757 would fly into a mountain in Colombia, or that somebody would load fully charged oxygen generators onto a ValuJet flight (if that finally turns out to be the cause of that crash).

Historical statistics related to safety (or to any other aspect of airline operations, for that matter) are of use to consumers only if they can reliably predict future performance. Since they can't, we're afraid that the current effort will be ineffectual.

DISCUSSION QUESTIONS:

1. Do you agree that safety records from the past are little use in predicting future safety? Why or why not?

2. If a particular airline such as ValuJet had a poor safety record in the past, would it matter when trying to decide which airline to fly that the exact cause of a future crash could not be predicted?

3. Are there airline records that might be reliably used to predict future performance, such as on-time statistics? Why is that different from trying to predict future safety?
<<<========<<

>>>>>==============>
Jessica Utts also suggests the following:

E-mail users earn more.
Chronicle of Higher Education, 1 Nov 1996, A25

A new study conducted by a professor of economics and business administration at Ursinus College shows that workers who use e-mail earn, on average, 7.4% more than colleagues in similar situations who don't. The data comes from a 1993 U.S. Census Bureau survey of nearly 10,000 workers. The study showed that the discrepancy between wages of e-mail users and non-users was greatest among service workers. Executives who used e-mail out-earned their non-wired peers by almost 10%.

DISCUSSION QUESTION:

Do you think companies which have e-mail facilities would tend to be wealthier than companies which do not? If so, would that cause problems with such a study? What do you think is meant by "colleagues in similar situations"?
<<<========<<

>>>>>==============>
Where on Earth? The GPS solution. Scrambling policy irks many experts.
The Boston Globe, 7 October 1996, pC1
Peter J. Howe

The Defense Department's Global Positioning System (GPS) is based on a network of 24 satellites, through which users anywhere on earth can find their position to within 100 yards. Commercial applications for sailing, aviation and surveying represent a $1 billion global business. Critics say the Pentagon could make available information accurate to within 15 or 20 yards but has been deliberately scrambling information available to civilians under a policy of "selective availability". (Military personnel can receive more accurate encrypted information.) They note that, even as the Defense Department is working to scramble the signal, other arms of government, including the FAA, are spending tens of millions on their own systems to circumvent the scrambling.

The article includes addresses for a number of web sites with information on how GPS works. Among these:

Trimble Navigation, a GPS supplier, publishes a primer on how GPS works

The following web pages describe applications to earthquake mapping:

DISCUSSION QUESTION:

The article notes that the potential improvement from 100 yards error to 15-20 yards represents "about 95 percent greater precision measured as a circle of error." What does this mean?
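
One way to read the article's "circle of error" phrase, offered here as our assumption rather than anything the article spells out, is to compare the areas of the circles within which the true position is expected to lie. The Python sketch below does that arithmetic for the 100 yard radius and the 15 and 20 yard radii quoted above; the function names are ours.

```python
import math

def circle_area(radius_yards):
    """Area, in square yards, of a circle with the given radius."""
    return math.pi * radius_yards ** 2

def area_reduction(old_radius, new_radius):
    """Fractional reduction in the area of the error circle."""
    return 1 - circle_area(new_radius) / circle_area(old_radius)

for new_radius in (15, 20):
    reduction = area_reduction(100, new_radius)
    print(f"100 yd -> {new_radius} yd: error circle shrinks by {reduction:.1%}")
```

Under this reading the reduction depends only on the square of the ratio of the radii, which students can compare with the article's "about 95 percent".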
<<<========<<

>>>>>==============>
Are you part of growing population?
The Boston Globe, 16 October 1996, pA12.
Globe Staff

This short piece notes that the government considers a person with a body mass index (BMI) over 25 to be too fat. It includes the following formula for readers to compute their own BMI:

First, multiply your weight in pounds by .45 to get kilograms. Next, convert your height to inches. Multiply this number by .0254 to get meters. Multiply that number by itself. Then divide this into your weight in kilograms.
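
The recipe above can be written as a short Python function. This is a minimal sketch that uses the article's rounded factors (.45 and .0254) as given; the function name and the sample reader are hypothetical.

```python
def bmi_from_pounds_and_inches(weight_lb, height_in):
    """Body mass index computed with the conversion factors quoted in the article."""
    weight_kg = weight_lb * 0.45     # article's rounded factor (about 0.4536 more precisely)
    height_m = height_in * 0.0254    # inches to meters
    return weight_kg / height_m ** 2  # kilograms divided by meters squared

# Hypothetical reader: 150 pounds, 5 feet 8 inches (68 inches) tall.
print(round(bmi_from_pounds_and_inches(150, 68), 1))
```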

DISCUSSION QUESTIONS:

1. What is being measured here?

2. The article notes that your BMI will "probably be a number in the 20s or low 30s." What fraction of the population do you suppose is included in this range?
<<<========<<

>>>>>==============>
A special news report about life on the job--and trends taking shape there.
The Wall Street Journal, 29 October 1996, pA1

A collection of short news items. The following one presents some curious data.

Diversity study's printing error lives on as fact.

A 1987 study entitled Workforce 2000, compiled by the Hudson Institute in Indianapolis, reported that the number of non-Hispanic white males entering the work force would drop to 15% from 47% by the turn of the century. This figure has been frequently quoted in support of diversity programs. The study's authors acknowledge that the report caused confusion because the word "net" was dropped in an explanation of the statistic.

The present article notes that "The US Bureau of Labor Statistics recently forecast that the number of non-Hispanic white males in the work force will decline by only 3% between 1994 and the year 2005, to 38% from 41%."

DISCUSSION QUESTIONS:

1. Where would you insert the word "net" in the original report in order to make sense of these numbers?

2. What exactly is dropping by 3% in the BLS forecast?
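
A small Python sketch, under our assumption that the 41% and 38% in the BLS quotation are shares of the work force, shows the difference between a change measured in percentage points and a change measured relative to the starting share. The function name is ours.

```python
def share_change(old_share, new_share):
    """Return (change in percentage points, relative change in percent)."""
    point_change = new_share - old_share
    relative_change = 100 * point_change / old_share
    return point_change, relative_change

points, relative = share_change(41, 38)
print(f"change in share: {points} percentage points")
print(f"relative change in share: {relative:.1f}%")
```

Neither figure says anything about the number of people, which also depends on how the size of the whole work force changes over the period.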
<<<========<<

>>>>>==============>
Here are two pieces on the PSAT.

Closing the gender gap on PSATs.
US News & World Report, 14 October 1996, p28.
Editorial

This is a report on the recent decision by the Department of Education to change the format of the PSAT (Preliminary Scholastic Aptitude Test) in response to charges by civil rights groups that the tests are gender biased. While 55% of the students who take the test are girls, only 39% of the National Merit Scholarships (for which the PSAT is a qualifying test) ultimately go to girls. Furthermore, because girls actually receive, on average, higher grades than boys in both high school and college, critics have argued that the PSAT is underpredicting their performance.

As a remedy, the PSAT plans to add a test of writing skills, an area in which girls tend to outperform boys. The boys have traditionally had the edge in the math section.

Don't assume bias in testing.
Voice of the People (letter to the editor)
Chicago Tribune, 11 November 1996, p18.
Peter Chrzanowski

Mr. Chrzanowski takes exception to an earlier Tribune editorial that had praised the decision to revise the PSAT. He argues that one might expect boys' average PSAT scores to be higher precisely because fewer boys take the test: perhaps boys with marginal ability are less likely to take the test than girls of comparable ability. As for the fact that boys have higher PSAT averages, while girls have higher high school grades, he notes that the difference could equally well be attributed to bias in high school grading as to bias in the PSAT.

DISCUSSION QUESTIONS:

(1) Do you agree with Chrzanowski's arguments? How would you respond?

(2) Girls tend to have a smaller variance in their test scores than boys do. Could this be an explanation for the apparent bias in the PSAT scores?
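
Question (2) can be explored with a toy model. The Python sketch below assumes, purely for illustration, that boys' and girls' scores are normally distributed with the same mean but that the girls' standard deviation is 10% smaller; only the 55% figure comes from the report above, and every other number and name is hypothetical. It computes the share of girls among test takers who score above a given cutoff.

```python
from statistics import NormalDist

def share_of_girls_above(cutoff, girls_fraction=0.55, girls_sd=0.9, boys_sd=1.0):
    """Share of girls among test takers scoring above `cutoff`.

    Hypothetical model: both groups have mean 0 and normal scores.
    Only girls_fraction (55%) is taken from the report; the standard
    deviations are made-up values chosen for illustration.
    """
    girls_tail = 1 - NormalDist(0, girls_sd).cdf(cutoff)
    boys_tail = 1 - NormalDist(0, boys_sd).cdf(cutoff)
    girls_above = girls_fraction * girls_tail
    boys_above = (1 - girls_fraction) * boys_tail
    return girls_above / (girls_above + boys_above)

for cutoff in (0.0, 1.0, 2.0):
    share = share_of_girls_above(cutoff)
    print(f"cutoff {cutoff:.1f}: girls are {share:.1%} of those above")
```

In this toy model the girls' share among those above the cutoff falls as the cutoff rises, even though the two groups have the same mean. Whether real PSAT scores behave this way is an empirical question that the sketch cannot answer.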
