CHANCE News 6.11

(10 September to 9 October 1997)


Prepared by J. Laurie Snell and Bill Peterson, with help from Fuxing Hou, Ma.Katrina Munoz Dy, and Joan Snell, as part of the Chance Course Project supported by the National Science Foundation.

This issue and back issues of Chance News are available from the Chance web site:


Comments and suggestions can be left on the Chance discussion page on the Chance web site or sent to jlsnell@dartmouth.edu

It would appear that, for whatever purpose, be it for internal or external consumption, the management of NASA exaggerates the reliability of its product, to the point of fantasy.
Richard Feynman




(1) Suggested by Roger Pinkham:

The recent military airplane accidents beg for an answer to the following. Given an estimate for the annual rate of military plane accidents, is a bunching of 6 reasonable?

A back-of-the-envelope computation indicates, barring conceptual errors, that an annual rate of around 70-73 per year would render a bunch of 6 acceptable (from a probability point of view). What thinkest thou?

(2) Goran Djuknic asks: What would your students say about this quote?

In a survey taken several years ago, all incoming freshmen at MIT were asked if they expected to graduate in the top half of their class. Ninety-seven percent responded that they did.

(3) Norton Starr provided this postscript to his article on the 1970 draft lottery in the current issue of The Journal of Statistics Education.

Selective Service System
Mobilization Readiness

[SOURCE: Budget Justifications FY 1993 (SSS) p.4 ]

The registration form includes information needed in case of later induction into military service. To be sure it is ready for action in an emergency, each year the SSS practices running a draft lottery -- i.e., filling capsules with numbers and loading and sealing the drums.

This statement is from a budget database developed by Colossus, Inc.

How much do you think was requested for this exercise? (The total 1993 SSS budget was 28 million dollars.)

(4) Jean V. Adams writes:

Did any one else notice? The cover photo on the spring 1997 issue of Chance (vol. 10 no. 2) is reversed!

I became suspicious when Siskel was shown sitting on the right (the viewer's right, that is) and Ebert on the left, since I think it's usually the other way around on their show. Then I noticed what looks like a wedding ring on Ebert's RIGHT hand. Finally, the buttons on their shirts clinched it. (Men's shirts are usually made so that you can tuck your RIGHT hand in, Napoleon style.)

Estimate the number of other readers who noticed this.

The following article was suggested by Norton Starr who learned of it from George Cobb who learned of it from X.

Homogenized damages: Judge suggests statistical norms to determine whether pain and suffering awards are excessive.
ABA (American Bar Association) Journal, September 1997, p22
Michael Higgins

This article is based on a case in the United States District Court for the Eastern District of New York with the decision made by Judge Jack B. Weinstein. The case number is 94-CV-1427 and is easily available from Lexis-Nexis (U.S. District Lexis 14332). We strongly recommend that you read the original. Judge Weinstein is an innovative and widely respected judge. You will really enjoy his discussion of the role that statistics can and should play in an important legal problem.

In suits beginning in March 1994, three women, Patricia Geressy, Jill M. Jackson, and Jeannette Rotolo, sued Digital Equipment claiming that Digital's computer keyboard caused repetitive stress injuries (RSI). A jury awarded them:

Geressy: Economic loss        $1,855,000
         Pain and suffering   $3,490,000
         Total                $5,345,000

Jackson: Economic loss          $259,000
         Pain and suffering      $43,000
         Total                  $302,000

Rotolo:  Economic loss          $174,000
         Pain and suffering     $100,000
         Total                  $274,000

Digital appealed these awards and Judge Weinstein made the following rulings:

The Geressy case should be retried because new evidence had been uncovered since the trial suggesting that her medical problems were not the result of using the keyboard.

The Jackson award was thrown out on statute of limitations considerations.

The award in the Rotolo case was considered reasonable and allowed to stand.

Even though he had to determine, for only one case, the reasonableness of the award, Weinstein used this case as an excuse to give his opinions on how a judge might go about assessing the reasonableness of such a jury award under appeal.

Weinstein first observes that there are usually agreed upon ways to assess economic loss, so he will not consider the economic loss part of the award. There are not usually agreed upon ways to assess pain and suffering awards. Referring to pain and suffering awards, he writes:

These awards rest on the legal fiction that money damages can compensate for a victim's injury.... We accept this fiction, knowing that although money will neither ease the pain nor restore the victim's abilities, this device is as close as the law can come in its effort to right the wrong.

He quotes another court as saying:

The law does not permit a jury to abandon analysis for sympathy for a suffering plaintiff and treat an injury as though it were a winning lottery ticket. Rather, the award must be fair and reasonable, and the injury sustained and the amount awarded rationally related. This remains true even where intangible damages, such as those compensating a plaintiff for pain and suffering, cannot be determined with exactitude.

Weinstein explains why he feels that the present system of determining these awards is not rational. He refers to a study that looked for a correlation between the amount of the award and the length of time suffered and found none. The author of the study was unable to find any rationale for jury awards for pain and suffering. Another researcher wrote:

Both anecdotal and empirical evidence indicates that the disparity between awards for pain and suffering among apparently similar cases defies rational explanation.

Weinstein proposes that, in considering the reasonableness of a jury verdict, the court should gather a group of similar cases as a comparison group. He comments:

The imprecision inherent in simply making a vague estimate by looking at a comparative group turns the court toward a statistical analysis.

Weinstein illustrates the method he proposes by judging the amounts the jury awarded for pain and suffering in all three of the cases under consideration, even though other circumstances made this necessary only for the Rotolo case.

He describes how he would find a comparison group of similar cases. He first gives careful consideration to what constitutes similar cases. He feels that the injuries in the various cases can have different causes but should have similar symptoms. However, for these RSI cases, he would rule out a case where, even though the injury resulted in similar symptoms, the injury was the result of a traumatic experience such as an airplane crash. Weinstein finds a group of 84 cases with symptoms similar to at least one of the three RSI cases. 27 of these cases were similar to the Geressy case, 16 to the Jackson case, and 21 to the Rotolo case.

Weinstein proposes determining the mean and standard deviation for each of these three comparative groups. Then, for the jury award to be considered reasonable, it should not be more than two standard deviations away from the mean of the comparison group. If it is more than two standard deviations above the mean of the comparison group, the award should be reduced to make it two standard deviations above the mean. Similarly, if the jury award is more than two standard deviations below the mean of the comparison group, it should be increased to make it two standard deviations below the mean.

He then judges the amounts awarded in the three RSI cases. For the 27 cases in the Geressy comparative group the average award was $747,000 and the standard deviation $606,873. The amount awarded for pain and suffering by the jury was $3,490,000. Two standard deviations above the mean of the comparison group is about $2,000,000, so Weinstein would reduce the award from $3,490,000 to $2,000,000.

The mean for the comparison group in the Jackson case was $147,925 and the standard deviation was $119,371. The jury awarded $43,000 for pain and suffering. This is about one standard deviation below the mean, and so Weinstein considers the jury award reasonable and would let it stand.

In the Rotolo case, the mean of the comparison group is $404,214 and the standard deviation is $465,489. The jury awarded $100,000 for pain and suffering. This is less than one standard deviation below the mean so Weinstein would let it stand.
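The arithmetic behind these three judgments can be checked with a short sketch. It uses only the awards, means, and standard deviations quoted above; the function names `two_sd_band` and `adjust_award` are our own labels, not terms from the decision, and the exact (unrounded) bounds are shown where Weinstein rounds.

```python
def two_sd_band(mean, sd, k=2):
    """Return the (lower, upper) bounds of mean +/- k standard deviations."""
    return mean - k * sd, mean + k * sd


def adjust_award(award, mean, sd, k=2):
    """Move an award to the nearest edge of the band if it falls outside."""
    lower, upper = two_sd_band(mean, sd, k)
    return min(max(award, lower), upper)


# Pain-and-suffering figures quoted above: (jury award, group mean, group SD).
cases = {
    "Geressy": (3_490_000, 747_000, 606_873),
    "Jackson": (43_000, 147_925, 119_371),
    "Rotolo": (100_000, 404_214, 465_489),
}

for name, (award, mean, sd) in cases.items():
    z = (award - mean) / sd  # distance from the group mean, in SD units
    lower, upper = two_sd_band(mean, sd)
    adjusted = adjust_award(award, mean, sd)
    print(f"{name}: z = {z:+.2f}, band = (${lower:,.0f}, ${upper:,.0f}), "
          f"adjusted = ${adjusted:,.0f}")
```

For the Jackson and Rotolo groups the lower bound comes out negative, so with these figures the rule could never raise an award in those groups. That is one symptom of the asymmetry in the comparison groups.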

In discussing his method, Weinstein remarks that the distributions of the comparison groups are not symmetric, so perhaps it would be more reasonable to use the median rather than the mean. As to his choice of two standard deviations he writes:

Using two standard deviations supports the judiciary's efforts to sustain jury verdicts whenever possible. This approach is consistent with the federal and New York state constitutions that guarantee the right to trial by jury in civil cases. Narrowing the range to figures that fall within one standard deviation, however, speaks to the state policy of controlling jury verdicts.

We have given only a small part of a wonderfully thorough discussion of an interesting application of statistics to the law. You should read the whole decision!


These questions were suggested by questions raised by Robert H. Carver relating to the Higgins article.

(1) The Higgins article quotes a plaintiff's attorney as saying "We are too individual for these statistical models. Comparing two people who lose a leg, for example, is extremely difficult. Some people may be devastated and never leave their bedrooms for the rest of their lives. Some people may go on to compete in athletic events." Is the attorney suggesting the jury try to predict the future?

(2) How do you think Judge Weinstein's proposal would affect the distribution of the comparison groups? Would these distributions become clustered around two standard deviations from the mean? What would happen to the mean and standard deviation of future comparison groups?

Here are two articles on HIV research, which present an interesting contrast in reporting.

Medical Notebook. The placebo question: Can research kill?
The Boston Globe, 18 September 1997, A3.
Richard A. Knox

The opening of this article states the issue quite bluntly: Should some Third World babies be denied a drug known to prevent HIV infections as part of a study to see if other treatments are more effective? Most people would instinctively say no, and this is the conclusion of two critics writing in the New England Journal of Medicine. The actual scenario involves transmission of HIV from pregnant women to their unborn children. In research trials now underway, the effectiveness of a proposed new treatment regime is being compared to a placebo, even though an effective treatment program is already available.

Is this ethical? The critics charge it will result in the needless infection of children. They draw comparisons with the notorious Tuskegee syphilis study, in which 412 black patients were denied penicillin so that doctors would have the opportunity to observe the normal course of the disease. In the present case, they feel the results should be compared not to placebo, but rather to a 1994 US study in which the drug zidovudine was shown to reduce mother-to-child transmission of HIV.

But according to Dr. Joseph Saba of the United Nations AIDS Agency, the issue is not so clear-cut. In particular, he says that the US study does not give the proper comparison. First, the US treatment regimen involves having women enter prenatal care earlier than most Third-World women do. Second, most HIV-positive women in these countries breast-feed their babies, which has a 14% infection risk. Third, studies that do not use placebo would take longer to complete. And if they found only that the new treatment was less effective than the US standard, we would still not know if it was better than nothing at all.

Medical Notebook articles are typically brief synopses. This one may be misleading, in that it doesn't make clear that zidovudine (more commonly known as AZT) was the drug used both in the 1994 US trial and in the research described here. The difference is that US care includes a longer period of AZT use. The treatment being tested in the Third World is a "short course" of AZT treatment.

For more detailed discussion on some of these points, see the next article.


(1) This article does not mention the rationale behind the shorter regimen. Can you anticipate what it is? Do you agree that a shortened regimen is tantamount to "withholding" the drug?

(2) Do you agree that without a placebo we could not tell if the new treatment was better than nothing? Is it possible to use some sort of observational design to partly alleviate the ethical concerns?

(3) With regard to breast-feeding, do you think the data should be adjusted to take Dr. Saba's 14% figure into account?

(4) Why would a study without a placebo take longer?

AIDS research in Africa: Juggling risks and hopes.
The New York Times, 8 October 1997, A1.
Howard W. French

This story begins by describing the case of Cecile Guede, a 23-year-old HIV-infected mother, who participated in an Ivory Coast study of the type described in the previous article. She may have received the new treatment to prevent transmission during pregnancy (a "short course" of the AIDS drug AZT), or she may have been in the placebo group. She still doesn't know which. Moreover, she still does not know if her son, now one year old, is infected.

Interviews reveal that Guede doesn't really comprehend that she was part of a controlled experiment. She received a number of medications during her pregnancy--some for malaria and other conditions, some related to the HIV trial--and assumes that all were good for something. She enrolled in the trial in part because it was her only access to health care, so her participation cannot be counted as "voluntary." These themes run throughout the article, as the general plight of women faced with the decision to participate is discussed. In addition to the problems of poverty and low literacy, we learn that women must often make the decision to participate shortly after receiving the devastating news that they are themselves infected with HIV.

While some ethicists view this scenario as blatantly exploitative, others argue that there is really no alternative. Third-world countries simply cannot afford the extensive AZT program (known as the 076 regimen) that was tested in the US. Thus it becomes essential to search for low-cost methods. The alternative, according to one doctor in the program, amounts to giving everyone the placebo treatment--that is, nothing at all! Even so, say critics, wouldn't it at least be better to compare several low-cost treatments? No, answer others, because this would require longer trial periods and in the end produce less reliable results.


(1) Why would a study without a placebo take longer to complete? How would it be less reliable?

(2) The article notes that US doctors feel it would be nearly impossible at this point to get approval for placebo trials in this country, given that there is a known effective treatment. It is interesting that this point is phrased in terms of getting approval. If there were approval, do you think there would be any volunteers for such a study?

(3) Which article do you find more persuasive on the ethical issues? In the final analysis, do you believe that the research should be allowed to go on?

(4) Suppose the short regimen has one-tenth the cost, and is found to be X% as effective (i.e., prevents X% of the cases that would be prevented with the 076 regimen). It's unlikely that X=100. How big does X have to be before it is ethical to recommend the use of the short regimen in third-world countries?

Parental age gap skews child sex ratio.
Nature, 25 Sept. 1997, p344
J. T. Manning, R. H. Anderton, M. Shutt

It is well known that animals have some control over the sex ratio (the ratio of male to female offspring) of their offspring. Examples were reported in a recent issue of Nature (2 October 1997, p 442). That article reports a recent study of a species of parrot that produces long runs of one sex: one female produced 20 sons in succession followed by a run of 13 daughters. An earlier study on the Seychelles warbler showed a large variation in the sex ratio. In this species of warbler, young females often remain on their parents' territory to help produce subsequent offspring, while young males move away. On high-quality territories, daughters are an advantage, and about 77 percent of the offspring are female. On low-quality territories, they are a disadvantage because of the resources they use up and, as a result, only about 13 percent of the offspring are female.

There have been many attempts to show that similar variation can occur in humans. Studies of German, British and US editions of Who's Who have found that men register far more sons than daughters. Despite the contribution of the Clintons, American presidents have produced 50 percent more sons than daughters. In addition it has been observed that a higher proportion of boys are born during and shortly after wars.

Manning and his colleagues show that the sex ratio is correlated with the spouse age difference (age of husband - age of wife). Families in which the husband is significantly older than the wife tend to produce more boys, while those in which the wife is significantly older than the husband produce more girls. The authors based these conclusions on a study of 301 families whose children attended a secondary school in Liverpool which recruited students from a wide range of socio-economic groups.

The authors also found that, in England and Wales, the mean spouse age difference increased during and immediately after the two world wars and was strongly correlated with the sex ratio during the period 1911-1952. Thus the increase in the proportion of boys during the two wars can be explained by the increase in the spouse age difference during these periods.


(1) Can you explain why the spouse age difference should increase during and immediately after a war?

(2) Can you think of any reason why the sex ratio should be correlated with the spouse age difference?

(3) Does the fact that those in Who's Who register more sons mean that they had more sons?

Portrait of the electorate.
The New York Times on the Web
Marjorie Connelly

The New York Times has put together several exit polls to show how demographic groups have voted in presidential elections since 1972. For example, in the most recent election women, blacks, young voters, Democrats and liberals all gave Clinton a majority of their votes, permitting him to win without a majority of the popular vote. Democrats who described themselves as conservatives stayed with their party while Republicans who described themselves as liberals were closely divided between Clinton and Dole.

We have made this data available on the Chance web site in a form that can be imported into a statistical package. The data is broken up according to: age, economic status, educational background, geography, marital status, political ideology, race and religion, and voting status.


(1) What is the most useful way to present data of this kind? For example, should it be in ten separate files according to demographic groups, as it appears in the New York Times, or in one large file?

(2) Given this data, what kind of questions would you like to explore?

Perspective on space exploration; probability, plutonium don't mix.
The Los Angeles Times, 10 October 1997, B9
Op/Ed piece by Najmedin Meshkati

Meshkati is an Associate Professor of civil/environmental engineering and industrial and systems engineering at the University of Southern California. Here he questions the safety of the Cassini mission to Saturn, due to launch Wednesday carrying 72 pounds of plutonium-238 batteries. The craft will fly by Earth in August 1999, at an altitude of 500 kilometers, and use Earth's gravity to accelerate towards Saturn.

NASA has estimated the chances of an "Earth impact" accident at 8 in ten million. Meshkati finds this reminiscent of the optimistic reliability figures given for the space shuttle Challenger. As many of us will recall, Challenger crashed in 1986 when an O-ring failure caused a solid rocket booster to explode. (A case study of the O-ring failure data, suitable for an introductory class, is given in Chatterjee, Handcock and Simonoff's "A Casebook for a First Course in Statistics and Data Analysis.") Meshkati quotes Nobel laureate Richard Feynman, a member of the commission that investigated the Challenger disaster, who concluded: "It would appear that, for whatever purpose,...the management of NASA exaggerates the reliability of its product, to the point of fantasy." In the present case, Meshkati charges that NASA has discounted spacecraft software errors, errors in ground commands, and navigation design errors as contributors to the chance of Earth impact.


(1) In the case of Challenger, at least there were available data on O-ring failures. (They were apparently not correctly interpreted.) How do you think the chance of earth impact for Cassini was estimated?

(2) In another article we read that

The space agency says the chances of a radioactive release during the first 3 1/2 minutes after blastoff are 1 in 1,400. The chances of a release later in the rocket's climb to orbit are put at 1 in 476. The odds of the spacecraft falling to Earth during an August 1999 speed-boosting flyby are 1 in a million.

How do you think these odds were calculated and how reliable do you think they are?

Conquering Statistics: Numbers without the Crunch.
Plenum, New York 1997
ISBN 0-306-45572-2, $13.77 from Amazon.com
by Jefferson Hane Weaver

Why Flip a Coin? The Art and Science of Good Decisions.
John Wiley and Sons, 1997
ISBN-0-471-16597-2, $19.87 from Amazon.com
by H. W. Lewis

These two books are intended to give the lay public insight into topics usually thought to be too technical for general appreciation. Weaver is an attorney and Lewis a physicist and probably, in their own fields, they are as stuffy as most of the rest of us. However, writing outside their field, they bring a fresh approach that is fun to read. Both avoid mathematical notation and formulas, but they approach their subjects very differently. Weaver discusses absolutely standard and basic concepts of probability and statistics, interjecting humor to lighten up the discussion. Lewis discusses decision theory and uses famous problems such as the secretary problem, the prisoner's dilemma, voting paradoxes, and the infamous Monty Hall problem to keep the readers' interest.

Whereas Weaver's explanations of basic concepts are accurate and well done, they are not very different from what most of us would try to do in explaining statistics to our Uncle Charley. However, Lewis does better than most of us would do with Uncle Charley. His trick is simply to require the reader to ask: "What are the possible outcomes of a decision? What are the assumptions made in calculating the probabilities of these outcomes? And what new information do you get at each step of the decision process?" To show how effective this is, consider his explanation of the envelope paradox.

In this paradox, a genie has put money in two envelopes, with twice as much money in one envelope as in the other. Mary chooses an envelope at random and her brother John gets the other envelope. Mary finds that she has $100 in her envelope and so knows that John has in his envelope either $50 or $200. Since she chose an envelope at random, Mary reasons that the expected amount in John's envelope is $125 and so offers to switch with John. The paradox comes from the fact that the same argument seems to suggest that John should also want to switch envelopes. Lewis resolves this apparent paradox by asking Mary to consider what she has really learned by finding $100 in her envelope. She has learned that the genie put either a total of $150 or $300 in the envelopes. Mary's argument assumes that these two possibilities are equally likely when clearly she has no reason to believe this. In fact, she has not even been told how the genie chose the amounts to put in the envelopes.
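To see how much Mary's conclusion depends on that hidden assumption, here is a minimal sketch that computes the expected amount in John's envelope exactly. The function `expected_other` and both priors are our own illustrations; the puzzle itself never says how the genie chooses the amounts.

```python
from fractions import Fraction


def expected_other(observed, prior):
    """Expected amount in the other envelope, given the amount observed.

    `prior` maps each possible smaller amount to the probability that the
    genie chose it (a hypothetical input; the puzzle does not specify it).
    """
    # Case 1: the observed amount is the smaller one, so the other is double.
    w_small = prior.get(observed, Fraction(0))
    # Case 2: the observed amount is the larger one, so the other is half.
    w_large = prior.get(Fraction(observed, 2), Fraction(0))
    total = w_small + w_large
    if total == 0:
        raise ValueError("observed amount is impossible under this prior")
    # The 1/2 chance of picking either envelope cancels in the ratio.
    return (w_small * 2 * observed + w_large * Fraction(observed, 2)) / total


# Mary's implicit assumption: totals of $150 and $300 are equally likely.
equal = {50: Fraction(1, 2), 100: Fraction(1, 2)}
# A hypothetical genie who usually prefers the smaller total.
stingy = {50: Fraction(4, 5), 100: Fraction(1, 5)}

print(expected_other(100, equal))   # 125
print(expected_other(100, stingy))  # 80
```

Under the equal prior Mary's $125 is correct, but under the second prior switching loses money on average. The $125 figure is a statement about the genie, not about the envelopes.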

The chapter on applications to the stock market discusses the random walk model, but Lewis might also have discussed more specific models really used in day-to-day transactions, such as the Black-Scholes model for pricing options. Of course, he could not have anticipated Scholes getting a Nobel prize!

In the chapter on sports, Lewis remarks that baseball is a subject that loves statistics but yet seems not to make use of probability models to help with strategic decisions. For example, what is the optimal batting order? (One of John Kemeny's first computer programs studied this problem.) Should the batter make a sacrifice bunt with players on first and second and no outs (an example considered by Hal Stern in his article in Chance Magazine Vol. 10, No. 1)?

We were pleased that Lewis discusses a strategy we use to win the Dartmouth Math department football pool. (We have already won 3 times in the first 7 weeks.) This is the "evil twin strategy" discovered at Dartmouth (see Chance News 2.08).


Discussing the problem of throwing a pair of dice 12 times, Lewis observes that the expected number of 7's is 2 but there will be fluctuation in the number of sevens when this experiment is repeated many times. He writes:

But how much will it actually fluctuate? Here there is a magic rule that tells you that for such a case, whatever the expected number of events may be, the average fluctuation is just about the square root of that number. (That magic number is called the standard deviation. The rule doesn't apply in all cases, but in most.)

Does it apply in this case? Lewis uses this "rule of thumb" throughout the book. Is it a reasonable approximation?
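One way to start on this question is to compare the rule with the exact binomial standard deviation. The sketch below assumes 12 independent trials, each producing the event with probability 1/6 (which gives the expected count of 2 quoted above); the helper name `binomial_mean_sd` is ours.

```python
import math


def binomial_mean_sd(n, p):
    """Mean and standard deviation of a binomial count with n trials."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return mean, sd


mean, sd = binomial_mean_sd(12, 1 / 6)
rule = math.sqrt(mean)  # Lewis's rule of thumb: square root of the mean

print(f"expected count     = {mean:.3f}")
print(f"exact binomial SD  = {sd:.3f}")
print(f"square-root rule   = {rule:.3f}")
print(f"ratio exact / rule = {sd / rule:.3f}")  # equals sqrt(1 - p)
```

The ratio of the exact value to the rule is always sqrt(1 - p), so the rule is close when the event is rare on each trial and increasingly overstates the spread as p grows.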

Deadly deception.
The Boston Globe, 15 September 1997, C1
Richard A. Knox

In 1972, the journal "Pediatrics" announced that SIDS (Sudden Infant Death Syndrome) can run in families. A key part of the story was the record of the Hoyt family, in which 5 children's deaths had been attributed to SIDS. Dr. Alfred Steinschneider, who authored the report, postulated that a physiological defect caused babies to stop breathing. There followed years of searching for a biological risk factor, as well as ever-changing recommendations as to whether infants were safer sleeping on their sides, stomachs or backs.

Twenty-three years after the report, Waneta Hoyt was convicted of murdering her five children. She was diagnosed with a psychological disorder known as Munchausen syndrome by proxy, which involves faking symptoms of illnesses in one's children, even by inflicting physical harm. According to Dr. Marc Feldman of the University of Alabama, smothering deaths are "more common than anything else" among Munchausen by proxy cases.

But how common is smothering? Phipps Cohe of the SIDS Alliance cites a 1994 estimate by the American Academy of Pediatrics that 95 to 98% of all SIDS cases are correctly diagnosed. Other experts estimate that up to 10% of presumed SIDS cases may be murders. All of this discussion must be horrifying for parents whose children have died of SIDS. Still, child abuse experts warn that society must be vigilant in asking these questions.


Does Feldman's phrase "more common than anything else" imply that most Munchausen by proxy cases involve smothering incidents? What does it say, if anything, about the number of smotherings attributable to Munchausen by proxy?

The next article was suggested by Milt Eisner.

The hidden truth about liberals and affirmative action.
The Washington Post, 21 Sept. 1997, C5
Richard Morin

Surveys generally show that Democrats favor affirmative action while Republicans oppose it. In their new book "Reaching Beyond Race," published by Harvard University Press, Paul Sniderman and Edward Carmines say that these surveys do not reveal the true beliefs of the Democrats.

To show this, they randomly divided a representative sample of the national population into two groups. One group was told: I'm going to read you a list of three items that sometimes make people angry or upset. After I read you the list, just tell me how many upset you. I don't want to know which ones, just how many.

Then the interviewer read a list of three items: the federal government increasing the tax on gasoline; professional athletes getting million-dollar-plus salaries; large corporations polluting the environment.

The second group was presented with the same three items. But a fourth item was added to the list: Black leaders asking the government for affirmative action.

Since both groups got the same first three items, any difference between groups must be caused by the response of the second group to the fourth item. According to the article:

When researchers analyzed the results, they found the political divisions over affirmative action found in other polls were conspicuously missing. Liberals were as angry as conservatives, [57 percent versus 50 percent] and Democrats [65 percent] were as angry as Republicans [64 percent].

The article is not very clear about where these percentages come from, but the book suggests that the authors arrived at them along the lines of discussion question 2.

In another national survey the experimenter asked about attitudes toward affirmative action together with questions about attitudes toward racial stereotypes: blacks are lazy, blacks are irresponsible, etc. The percentage of blacks who identified blacks as "lazy" was 20% when the affirmative action question was asked after the attitude question and 31% when it was asked before. When the affirmative action question was asked before the question "are most blacks irresponsible," 43% agreed with this statement, but when it was asked after, only 26% agreed with it.


(1) Jim Baumgartner says this whole study upsets him. Why do you think he said this?

(2) Consider a particular group, say Democrats. Because the choice of who got three items and who got four was random, we can assume that about half the Democrats were given the three items and the other half were given the same three items plus the affirmative action item. Suppose that the average number of items that angered or upset the three item group was found to be 1 and for the four item group it was found to be 1.5. Then the authors would claim that about 50 percent of the Democrats must have said they were angry or upset about the affirmative action item. Why?

(3) When people indicate, in a poll like this, that they are angry about, for example, athletes getting million-dollar-plus salaries, do you think that they really are angry?

(4) Is it reasonable to assume that the responses to the three items would be the same as they would be if the items were asked together with a fourth item?
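The calculation in question (2) can be sketched as follows. Each respondent in the four-item group reports a count on the three shared items plus 0 or 1 for the extra item; if randomization makes the shared-item averages equal in the two groups, the difference in mean counts estimates the proportion upset by the extra item. The function name and the individual counts below are hypothetical, chosen only to reproduce the averages of 1 and 1.5 used in the question.

```python
from statistics import mean


def extra_item_share(three_item_counts, four_item_counts):
    """Estimate the share upset by the fourth item as a difference in means.

    Relies on the assumption raised in question (4): answers to the three
    shared items are unaffected by the presence of the fourth.
    """
    return mean(four_item_counts) - mean(three_item_counts)


# Hypothetical counts of "items that upset you" for eight respondents.
three_item_group = [0, 1, 1, 2]  # average 1
four_item_group = [1, 1, 2, 2]   # average 1.5

share = extra_item_share(three_item_group, four_item_group)
print(f"Estimated share upset by the fourth item: {share:.0%}")
```

Note that the method gives only a group-level estimate. No individual's reaction to the fourth item can be recovered, which is the point of asking "how many" rather than "which ones."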

Breast implants don't foster cancer, US study finds.
The Boston Globe, 17 September 1997, A8
Associated Press.

A report in the "Journal of the National Cancer Institute" finds that silicone breast implants do not cause breast cancer. The report is based on a review of over 100 studies on the effects of the implants. A slight association with connective tissue disorder was found, but the co-author of the review insists that this is a "borderline" result that needs to be interpreted with caution. On the other side of the coin, there was some suggestion that implants might actually prevent breast cancer, but researchers said more studies were needed to support any such conclusion.

The report, of course, runs against the claims of thousands of women who have joined lawsuits against implant manufacturers. The ongoing saga has been discussed in earlier issues of Chance News.


Do you think this review will lead to dismissal of any of the pending suits? Do you think it should?

Global warming confusion.
The Christian Science Monitor, 24 September 1997, p1
Brad Knickerbocker

With an upcoming White House conference on global warming, and an international conference scheduled for December in Japan, the public is becoming heavily involved in what had previously been a scientific debate. Industry and labor groups have engaged in an ad campaign warning that a proposed treaty limiting carbon-dioxide emissions will lead to a loss of 1.5 million jobs and cause drastic increases in consumer prices. Environmental groups, on the other hand, see potential economic benefits in moving away from fossil fuels, including 800,000 new jobs by 2010 and average annual savings to American households of $530.

Information related to global warming issues is available on the Web at the Environmental Protection Agency's pages.

There are links to other related Web sites. And, of course, many other web sites can be expected to emerge in the course of the debate.


Does it seem clear that any change in energy usage patterns will lead to creation of some jobs and the loss of others? How precise do you think any estimate of the magnitude of these numbers can be? How do you think the estimates are made?

Bob Griffin suggested the following article and the first two discussion questions. Bob points out that newspapers have gotten quite good at explaining the meaning of margin of error in a poll and wonders what they should do when dealing with statistical significance of means in surveys like the one in this article.

Driver satisfaction falls in DOT survey.
Milwaukee Journal Sentinel, 12 Sept. 1997, p1
Larry Sandler

The Wisconsin Survey Research Laboratory surveyed 845 Milwaukee residents last spring for the Transportation Department. This is the second such annual survey designed to help the Transportation Department decide how to spend state money to accommodate the needs of the citizens.

Survey respondents were asked to rate their satisfaction with department performance in operating and maintaining state highways. On a scale of 1 to 10, the department scored a 6.4, down from 7.1 in 1996.

The article reports:

When respondents were asked to list things the department should do to maintain the state highways in their area, 46 percent called for fixing up potholes, up sharply from 26 percent a year ago.
In both surveys drivers rated snow and ice removal as a top priority, leading the department last year to shift resources into snow and ice removal even though it meant delaying other maintenance projects. The harsh winter and delayed maintenance can contribute to pothole formation.

Survey respondents were also asked to rate the importance of 16 department services. Readable highway signs and working traffic lights rated high and, down at the bottom, we find eliminating weeds and planting grass and flowers along the roads. As a result, roadside gardening was among the first services to be cut back when spending rose on winter maintenance.


(1) What other information would you need to determine whether the percentage of motorists who cited the department for not fixing potholes increased to a statistically significant degree since last year? How would you explain sampling error for percentages (the margin of error at the 95% level of confidence) to a reader of a daily newspaper in no more than a few brief and clear sentences?

(2) What other information would you need to determine whether the average satisfaction rating respondents gave the department actually decreased from last year to a statistically significant degree? What statistical test might you run, given that the previous year's respondents are not the same as this year's? How would you explain this test to a reader of a daily newspaper in no more than a few brief and clear sentences?

(3) Is this a good way to decide whether you should have flowers along the roadsides?
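
For question (1), here is a minimal sketch of the usual margin-of-error and two-proportion calculations. The article does not give the 1996 sample size, so we assume, purely for illustration, that it was also 845; we also assume simple random sampling in both years. The function names are ours.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion p with n respondents."""
    return z * math.sqrt(p * (1 - p) / n)


def two_proportion_z(p1, n1, p2, n2):
    """z statistic for the difference between two independent sample proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


n_1997 = 845  # reported in the article
n_1996 = 845  # ASSUMPTION: last year's sample size is not reported

moe = margin_of_error(0.46, n_1997)
z = two_proportion_z(0.46, n_1997, 0.26, n_1996)
print(f"margin of error for 46%: +/- {moe:.1%}")
print(f"z for 46% vs 26%: {z:.1f}")
```

Under these assumptions the margin of error is a little over three percentage points, so a 20-point jump is far outside sampling error. The same comparison for the mean satisfaction ratings in question (2) would also need the standard deviations of the ratings, which the article does not report.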

Two-thirds with HIV know it, CDC reports.
The Boston Globe, 29 September 1997, A12
Associated Press.

Researchers at the US Centers for Disease Control and Prevention estimate that about 775,000 Americans carry HIV, and that of these, at least 500,000 have been tested and know their status. The finding is based on cases of infections filed with the CDC through June, from 25 states where doctors are required to report names of patients with the virus. In these states, 240,000 people had AIDS, and another 76,000 were HIV-positive but did not yet have symptoms. These data were used to estimate numbers of diagnosed infections in states without mandatory reporting.

Researchers add that the two-thirds estimate is conservative, because it does not include people who learned their status through anonymous testing. Among cases of infection known to authorities, 80% were diagnosed in hospitals, doctor's offices and clinics.


(1) How do these data lead to estimates for the number of infected people who don't know their status?

(2) Do you think people are more likely to be tested in states where doctors are not required to report names? In what direction would this bias the findings?

(3) Accepting the two-thirds estimate, how might one use the 80% figure to adjust it upwards?
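
As a quick check on the headline, the figures quoted above can be combined directly. This is only arithmetic on the numbers in the article; the variable names are ours, and nothing here reproduces the CDC's actual extrapolation to states without mandatory reporting.

```python
estimated_infected = 775_000   # CDC estimate of Americans carrying HIV
know_status = 500_000          # "at least" this many have been tested

share_aware = know_status / estimated_infected
unaware = estimated_infected - know_status

# Cases reported from the 25 states with name reporting
reported_25_states = 240_000 + 76_000

print(f"share who know their status: at least {share_aware:.1%}")
print(f"implied number unaware: at most {unaware:,}")
print(f"cases on file in the 25 reporting states: {reported_25_states:,}")
```

The "two-thirds" in the headline is thus a rounding of roughly 65 percent, and it is a lower bound because the 500,000 figure is itself stated as a minimum.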

Dueling data: Citing newer studies, some doubt PCBs cause cancer in humans.
The Boston Globe, 29 September 1997, C2
Michael Cohen

For four decades (1930s to 1970s), PCBs were widely used as electrical insulators. For the last 20 years, however, they have been listed as probable carcinogens, and their use was banned in 1976. Studies have shown that animals develop liver cancer and other serious disease when fed high doses of PCBs. However, the human data are less conclusive.

Attention has focused on electrical workers, who tend to have the highest exposure to PCBs. A 1987 federal study of 2588 workers at two plants found 2.5 times the expected rate of liver cancer, but not an overall increase in cancers compared to the general population. A 1992 study of 3588 workers at a plant in Illinois found a four times higher than expected rate of melanoma. A 1987 study of 2100 exposed workers in Italy found three times the regional average rate of cancers of the gastro-intestinal system.

Dr. Philip Guzelian of the University of Colorado Health Sciences Center says such findings do not provide a clear cancer link because in each study the type of cancer was different. Dr. Renate Kimbrough, who did the original rat studies on PCBs, agrees. She recently completed a GE-funded study of workers at two GE plants, which found less cancer of all types than in the general population. She attributes this to the "healthy worker syndrome", which refers to the fact that the working population is healthier overall than the general population.


(1) In light of the "healthy worker syndrome," should we worry about situations in which workers are no healthier than the general population?

(2) Dr. John Villars of the EPA says:

I think Americans expect the EPA to err on the side of caution. There are always skeptics who need to see incontrovertible proof of cause and effect. But this issue, and many others we face, don't lend themselves to absolute judgments. And in the face of uncertainty, it's our legal and ethical responsibility to err on the side of caution.

Comment on this point of view.

Jean Adams suggested the following article and provided the discussion questions for it.

Delaying start of school may hurt child.
Chicago Tribune, 7 October 1997, p4
Brenda C. Coleman (The Associated Press)

A study in the October issue of the journal Pediatrics found that 12 percent of the children who started school when they were a year or more older than their classmates displayed extreme behavior problems, compared with 7 percent of children whose ages were normal for their grade. It is said that the problems become more apparent as the children grow older.

The study found that 19% of the children who had been retained while going through school displayed extreme behavior problems.

The article states that doctors had already known that adolescents who are older than most of their classmates are more likely to smoke, drink alcohol, use drugs, engage in risky sexual behavior, think about suicide and be violent. They have not known whether the problems are linked to delayed school entry or failing a grade or both.


(1) What factors might influence the decision to delay a child's entry to school?

(a) How does this affect the interpretation given in the article?

(b) The lead researcher Dr. Robert Byrd said: "We need to concentrate our efforts on getting kids ready to enter school at the age that they're supposed to." Do you agree?

(2) What factors might influence the decision to retain a child while going through school?

(a) How might the child react to repeating a grade? How might the other children react?

(b) How does this affect the interpretation given in the article?

(3) The article states that children who started school when they were a year or more older than their classmates were 70 percent more likely to display extreme behavior problems. How was this figure calculated?

(4) Can you conclude from this study that delaying entry into school or retaining students causes behavioral problems?
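
For question (3), one plausible reading is that the "70 percent" is a relative risk computed from the two percentages quoted above. The sketch below uses the rounded figures from the article, so it can only approximate whatever the study itself reported; the variable names are ours.

```python
rate_older = 0.12    # extreme behavior problems, children a year or more older
rate_normal = 0.07   # extreme behavior problems, children of normal age for grade

relative_risk = rate_older / rate_normal
percent_more_likely = (relative_risk - 1) * 100

print(f"relative risk: {relative_risk:.2f}")
print(f"percent more likely: {percent_more_likely:.0f}%")
```

The ratio comes out a little above 1.7, which a newspaper would reasonably report as "70 percent more likely". Note that the absolute difference is only five percentage points.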

Dr. Ian C. McKay sent the following contribution:

These are the first two paragraphs of an article in the Times of 26th September 1997.

School test marks higher, but boys could do better. CONCERN over slow progress in improving boys' literacy overshadowed better results overall in national tests for 7-, 11- and 14-year-olds yesterday. For the first time more than 60 per cent of 11-year-olds reached the average score or better in English, Mathematics and Science. However, far fewer boys than girls made the grade in English at all three ages for national testing.

The figures were hailed by the Government as a major step towards its target that by 2002 some 80 per cent of 11-year-olds should reach the average level in English and 75 per cent in Mathematics.


(1) What average levels are referred to? Are these the averages of the population for the age group being tested? What policies, if any, are likely to succeed in ensuring that "80 per cent of 11-year-olds should reach the average"? Can this be done by teaching them better?

(2) Why stop at 80 per cent? Why not try to get 100 per cent of pupils above the average?

(3) Is the British Government trying to ensure that the distribution of marks should have a strong negative skew?
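
If "average" is read as the mean of the pupils' own marks, the target is not impossible, but it does require a skewed distribution. Here is a small made-up example (the marks are invented purely for illustration) in which 80 per cent of pupils reach the mean because two very low marks pull it down.

```python
from statistics import mean, median

# INVENTED marks for ten pupils: two very low scores, eight bunched near the top
marks = [10, 30, 75, 78, 80, 82, 85, 88, 90, 92]

average = mean(marks)
share_at_or_above = sum(m >= average for m in marks) / len(marks)

print(f"mean = {average}, median = {median(marks)}")
print(f"share at or above the mean: {share_at_or_above:.0%}")
```

The mean falls well below the median here, which is the signature of the negative skew mentioned in question (3). No distribution can put 100 per cent of pupils strictly above its own mean.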

We received the following response from Jeff Norman to a question raised by Richard Brucker in Chance 6.09.

Dear Chance Editor,

In Chance 6.09, Richard Brucker raised the issue of the relationship between intuition and chance. As someone who played poker for a living for many years I would like to add a few thoughts.

The poker game of Texas Hold'em requires the most intuition and provides good examples of the process. Each player receives two cards and uses them in any combination with five common cards to make the best five-card hand. There is a round of betting after each player receives two cards, a round after the first three common cards, and then another round after each of the last two common cards.

Since each player has only two cards, there is a relatively small number of possible hands (13 paired hands, 78 suited hands, and 78 unsuited unpaired hands). One simple model is to assume each player has an initial likelihood of each combination of two cards equal to the probability of that hand being dealt. Each time a player takes an action (bets, raises, folds, checks) all of the probabilities are adjusted accordingly, taking into account the possibility of actions deliberately intended to be misleading.
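
The hand counts and the starting probabilities in this simple model can be checked by brute force. The sketch below enumerates all two-card holdings from a standard 52-card deck; the labels ("AKs" for suited, "AKo" for unsuited) and the function name are our own conventions, not the letter writer's. The updating step after each bet or raise is not attempted here, since it would need likelihoods for each action that the letter does not specify.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [rank + suit for rank in RANKS for suit in SUITS]


def hand_class(card_a, card_b):
    """Label a two-card holding: 'AA' (pair), 'AKs' (suited) or 'AKo' (unsuited)."""
    rank_a, suit_a = card_a[0], card_a[1]
    rank_b, suit_b = card_b[0], card_b[1]
    high, low = sorted((rank_a, rank_b), key=RANKS.index, reverse=True)
    if rank_a == rank_b:
        return high + low
    return high + low + ("s" if suit_a == suit_b else "o")


# Count how many of the C(52, 2) deals fall in each class
counts = Counter(hand_class(a, b) for a, b in combinations(DECK, 2))
total_deals = sum(counts.values())

pairs = [h for h in counts if len(h) == 2]
suited = [h for h in counts if len(h) == 3 and h.endswith("s")]
unsuited = [h for h in counts if len(h) == 3 and h.endswith("o")]

# Initial likelihood of each class = probability of being dealt it
prior = {hand: Fraction(n, total_deals) for hand, n in counts.items()}

print(len(pairs), len(suited), len(unsuited), total_deals)
print("P(AA) =", prior["AA"], " P(AKs) =", prior["AKs"], " P(AKo) =", prior["AKo"])
```

Note that the 169 classes are not equally likely: each unsuited hand can be dealt in 12 ways, each pair in 6, and each suited hand in only 4.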

Naturally this is not possible to do accurately. This is where intuition comes in. Sometimes one realizes "in a flash" exactly what one's opponent has (with probability close to 1). Is this intuition, or has one logically determined that only one holding is consistent with all of your opponent's actions and table mannerisms? I would say this is intuition, aided by logic, and that this happens all the time. Expert poker players have highly developed intuition which also tells them when someone is bluffing or how a player will act in a given situation.

Players frequently make comments like "I knew a heart was coming," "I know I am going to win this hand," etc., but I agree with Richard Brucker that these statements are almost certainly invalid.

Intuition plays a role in other games, such as bridge. Recently my partner gave a loud snort while sorting his hand. One of the opponents asked "What does that mean?" I replied, half-jokingly, "It means he thought he had a six-card suit with honors and he realized he only has a five-card suit without honors!" After the hand was over my partner confirmed that was exactly what had happened!

One of the hardest things to do in poker is to recognize when that inner voice is based on accurate intuition and when it is based on fear or greed, and to trust it only when it's based on accurate intuition.

Jeff Norman


John Finn, our expert on "abuse of language" in the press, gave us another example from our last Chance News.

Dear Chance News

In Chance News 6.10, in the report on the article

Is using a car phone like driving drunk?
Chance Magazine, Spring 1997, pp.5-9
Donald A. Redelmeier and Robert J. Tibshirani

we read:

For example, a drunk driver's alcohol content may be significantly above the amount required to be legally drunk.


(1) Do you think the authors should be surprised at the media's interpretation of their statement in the NEJM article that compared the risk of using a cellular phone while driving to that of having a blood level of alcohol corresponding to being legally drunk?

Having a BAC (blood alcohol content) level that constitutes driving under the influence (and that is the name of the statute in most states; not "drunk driving" or "driving while intoxicated") does not constitute being "legally drunk".

That having this BAC does constitute being "legally drunk", and that being "legally drunk" is itself some sort of violation of the law, are two widespread misconceptions.

The facts, though, are that in many if not most states

(1) the statute prohibiting driving under the influence (of alcohol or other drugs) does not even mention intoxication, and

(2) the legal definition of intoxication is entirely in terms of behavior, and does not even mention BAC. Moreover, in both New Hampshire and Vermont (whose statutes I have handy) there is a very explicit statute that prohibits making intoxication itself a criminal offense. That is, in these states (and certainly others),

(3) being intoxicated is itself emphatically not a violation of any law.

(It's interesting to note here that the Vermont DUI statute applies to any drug whatever; if aspirin impairs your driving, then that's a violation of Vermont's DUI law. This is the case in about 30 states. In New Hampshire, on the other hand, the DUI law applies only to alcohol and controlled drugs, that is, "illegal drugs". Thus while in Vermont the emphasis is on driving in an impaired condition, in New Hampshire it's on driving in a sinful condition; if you're weaving all over the road because of the effects of the penicillin you took, that's no violation, since penicillin's not a bad drug that people take to get high.)

Surely we want to prohibit driving at a level of alcohol influence far short of drunkenness; it's not just drunk drivers we need to keep off the road, but anyone who's had enough to drink so that their driving is impaired.

In some European countries, Norway for instance, a BAC of .02 constitutes driving under the influence. That's a single drink for the average-sized man; in the U.S., .08, or 4 drinks, is common. If a Norwegian is convicted of DUI, can we say he was guilty of drunk driving? Well, American newspapers seem to think so; a couple of years ago the Mayor of Oslo had a drink at some ceremony, and shortly thereafter drove into another car and was arrested for DUI; but according to the American press he was arrested for "drunk driving" because "having a BAC of .02 is legally drunk in Norway".

This is nonsense, of course, but unfortunately this sort of supermarket tabloid hyperbole seems inevitable these days whenever the topic is alcohol or controlled drugs; other examples include using the term "binge drinking" to mean having as few as 4 drinks in an evening, and, of course, "drug abuse" to mean any use whatever of a controlled drug. The idea seems to be that unless we resort to this sort of hyperbole, others may not be convinced that we're on the Right Side in the battle against Demon Rum and the Devil's Drugs.

Of course, vilifying anyone charged with DUI as a "drunk driver" affords the age-old pleasure of stoning sinners, so I can hardly expect many to be inclined to desist on the grounds that this is slander.

But there's another good reason not to indulge in the hyperbole: as long as we keep current the idea that the offense is "drunk driving", those who have had a few drinks, but are not drunk, will think they can safely drive, not realizing that they are risking serious criminal penalties, and, more importantly, endangering themselves and others. As David W. Kelley, a California Highway Patrolman, puts it in his wonderful little book, "How to Talk Your Way Out of a Traffic Ticket",

Let me clear up one common misconception. "Drunk Driving" is inaccurate terminology. You do not have to be drunk to be a menace on the highway or even to be arrested. You only have to be influenced to some degree by the alcohol you have consumed...Often, the person who has had "only two or three beers" is more dangerous behind the wheel than the one who is obviously "drunk" -- because the drunk knows he is drunk and knows he has to use some caution or he won't make it home. The person who has had "only two or three drinks" usually thinks he or she is still fully capable of driving -- then causes others to suffer for this careless and foolish mistake.

There is also a national organization, C.A.N.D.I.D., Citizens Against Drug Impaired Drivers, that endeavors to bring some rationality to the matter of driving under the influence of alcohol and other drugs. C.A.N.D.I.D. emphasizes that driving under the influence of "good" drugs, of prescription and over-the-counter medications having nothing to do with getting high, is dangerous and irresponsible driving, and a criminal violation in some 30 states.

Driving under the influence is not drunk driving, and relinquishing the indulgence of calling it that can save lives.

John Finn
Math Department, Dartmouth College
Hanover, NH

Editor's comment.
We searched on Lexis-Nexis both for "driving under the influence of alcohol" and "drunk driving". Almost all notices of the results of court cases use "driving while under the influence of alcohol", while articles that refer not to a specific court case but to the problem in general say "driving while drunk". This is true even in discussing a bill in Congress that would make the legal limit .08 nationally. The American Medical Association has recommended that it be .05.

Norton Starr wrote:

Regarding "Rich students deserting small private colleges" (article 13 of Chance News 6.10), I wonder whether the data were adjusted for inflation. Presumably the numbers (and perhaps even the proportion) of families in the richest group (annual income over $200,000) increased between 1980 and 1994, and the same could well have been true for the "upper-middle income families" ($100,000 to $200,000). Without an inflation adjustment this could be an important determinant of the percentage increases given. While I believe a decline in available financial aid, the intimidating effect of today's (and 1994's) levels of tuition (not adjusted for inflation), and the altered perceptions of public institutions have all played a role in the percentages reported, the inflation issue is at least as important a consideration.

The book to be published will, I hope, include more demographic information than was reported from the Globe article. Finally, some discussion of what is regarded as a desirable goal here would be helpful: does one seek the same proportion of students from each income category, or certain percentages or absolute numbers from select income categories, or stability over time of percentages in place as of a baseline date, etc?

We received the following correction from David Jackson, co-author of the article:

Heavy defeats in tennis: psychological momentum or random effect?
Chance Magazine, Spring 1997, pp 27-34
David Jackson and Krzysztof Mosurski

that we reviewed in the last Chance News.

In the review in Chance News 6.10 it is stated that

the authors observe that the excessive occurrence of heavy defeats could also be explained by a model in which the outcomes of the sets are independent but a player's probability of winning a set is not constant throughout the sets of the match, as it would be in a Bernoulli trials model.

This is not what the authors state. For the alternative random effects model, which they eventually reject, the probability of winning a set remains constant within a match but may vary from match to match for players where the difference in ability is similar, i.e., the ratio of ranks is similar, due to the random match effect. For the random effects model, matches between the same two players on different days may have very different probabilities of winning a set but that probability remains fixed for each match.
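
To see why a random match effect of this kind produces extra heavy defeats, consider a toy calculation. We assume a best-of-five match, call a 3-0 result a heavy defeat, and pick the set-winning probabilities ourselves; none of these numbers come from the Jackson and Mosurski article.

```python
def p_straight_sets(p):
    """Chance a best-of-five match ends 3-0 (either way) when sets are
    independent and the stronger player wins each set with probability p."""
    return p**3 + (1 - p) ** 3


# Bernoulli trials model: the same p in every match between these players
fixed_p = 0.6
bernoulli = p_straight_sets(fixed_p)

# Random effects model: p is fixed within a match but varies between matches.
# HYPOTHETICAL match-day values, equally likely, averaging to the same 0.6.
match_day_p = [0.4, 0.6, 0.8]
random_effects = sum(p_straight_sets(p) for p in match_day_p) / len(match_day_p)

print(f"P(3-0 result), fixed p:        {bernoulli:.2f}")
print(f"P(3-0 result), random effects: {random_effects:.2f}")
```

In both models the sets within a match are independent with a constant probability. The excess of straight-set results in the second model comes entirely from mixing over matches, because the chance of a 3-0 result is a convex function of p.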


Finally, even though Gigerenzer's way of presenting false positive problems that we discussed in Chance News 6.10 makes the computations easier, as a number of readers observed, you still have to do the arithmetic correctly. We have fixed some arithmetic errors in our calculations in the version of Chance News 6.10 on the web.

