CHANCE News 5.04

(28 February 1996 to 28 March 1996)


Prepared by J. Laurie Snell, with help from William Peterson, Fuxing Hou, Ma. Katrina Munoz Dy, and Joan Snell, as part of the CHANCE Course Project supported by the National Science Foundation.

Please send comments and suggestions for articles to jlsnell@dartmouth.edu.

Back issues of Chance News and other materials for teaching a
CHANCE course are available from the Chance web site:


Chance favors the prepared mind.
from the movie "Under Siege 2"



About Bill Massey's quote ("when you listen to popcorn pop are you hearing the central limit theorem?"): Erol Pekoz remarked, "I think that you are really hearing the Glivenko-Cantelli theorem." (The empirical distribution is a good approximation to the true distribution when the sample size is large.)

In a first, 2000 census is to use sampling.
The New York Times, 29 February 1996, A16
Steven A. Holmes

For the year 2000 census, the Census Bureau will not attempt the traditional complete enumeration of the population but rather will incorporate sampling into the enumeration. It plans to obtain information from about 90% of the households through interviews and questionnaires that can be returned by mail, telephone, and possibly even the web. A 10% sample will then be used to estimate the remaining population. It is stated that this will decrease the projected cost of the 2000 census from $4.8 billion to $3.9 billion and could result in a more accurate count.

Some are concerned that sampling violates the Constitution's requirement for an "actual enumeration". Here is what the Constitution says:

Representatives and direct taxes shall be apportioned among the several states which may be included within this union, according to their respective numbers, which shall be determined by adding to the whole number of free persons, including those bound to service for a term of years, and excluding Indians not taxed, three fifths of all other Persons. The actual Enumeration shall be made within three years after the first meeting of the Congress of the United States, and within every subsequent term of ten years, in such manner as they shall by law direct.
It is well known that attempts at complete enumeration lead to an undercount of the population. This undercount is more severe for certain groups, including minorities. This is an important issue since the census count of the population can change representation in congress, and the amount of federal money available to a state.
In the last census, after the official count was presented, the Census Bureau estimated the undercount, and arguments were made to adjust the official count to take the undercount estimate into account. After much debate, the government decided not to do this. The decision was challenged in the courts by states adversely affected by it. The government's decision has just recently been supported by a ruling of the Supreme Court.

To avoid this problem in the year 2000 Census, the Bureau is planning to do all of the statistical analysis before reporting an integrated final census count by the deadline needed for forming new districts. The Census Bureau is currently experimenting with the most efficient method to evaluate the undercount using a practice 1995 census count.

They are considering two possible methods for estimating the undercount: one, called the capture-recapture method, has been used in previous censuses; the other is a new method called CensusPlus. A working paper by Tommy Wright of the Census Bureau suggests the following kind of simulation to compare the two methods.

It is desired to find the number who live in a six-block area A. The census carries out an initial enumeration in this area such that each person is counted independently with probability .85. To estimate the true number in this area, a second independent enumeration is made of an area B consisting of 2 blocks chosen at random. More effort is put into this enumeration so that each person is now counted with probability .98.

The capture-recapture method estimates the true population of the area A by multiplying the number counted in A by the first enumeration by the factor:

(number counted in B by the second enumeration)/(number counted in B by both enumerations).

The CensusPlus method estimates the number in the area A by multiplying the number found in A by the first enumeration by the factor:

(number found in B by either the first or second enumeration)/(number found in B by the first enumeration).

We assume that we know the true number in each of the blocks and carry out a number of simulations to see which method results in the better estimate for the true population.

Charles Grinstead wrote two Mathematica programs, Census and Goldfish, to do this for us. Using these programs we found that the two methods give very similar results, suggesting that the decision about which to use will depend upon practical matters relating to actually carrying out the sampling.
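The simulation can be sketched in a few lines of Python. This is a hypothetical re-implementation, not Grinstead's Census program, and the block populations passed to it are made up for illustration:

```python
import random

def simulate(true_counts, p1=0.85, p2=0.98, trials=10000, seed=1):
    """Compare capture-recapture and CensusPlus estimates of the
    population of area A.  true_counts[i] is the (assumed known) true
    population of block i of the six blocks."""
    rng = random.Random(seed)
    cr_est, cp_est = [], []
    for _ in range(trials):
        # First enumeration: each person in A is counted with prob p1.
        first = [[rng.random() < p1 for _ in range(n)] for n in true_counts]
        n1_A = sum(sum(block) for block in first)
        # Second, more careful enumeration of 2 randomly chosen blocks B.
        B = rng.sample(range(len(true_counts)), 2)
        n1_B = n2_B = both_B = either_B = 0
        for i in B:
            for c1 in first[i]:
                c2 = rng.random() < p2       # counted by second enumeration
                n1_B += c1
                n2_B += c2
                both_B += c1 and c2
                either_B += c1 or c2
        if both_B:                            # capture-recapture factor
            cr_est.append(n1_A * n2_B / both_B)
        if n1_B:                              # CensusPlus factor
            cp_est.append(n1_A * either_B / n1_B)
    mean = lambda xs: sum(xs) / len(xs)
    return sum(true_counts), mean(cr_est), mean(cp_est)
```

Running it with, say, six blocks of 100 people each gives average estimates for both methods very close to the true total of 600, in line with the finding that the two methods behave similarly.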


(1) The article actually said "after census-takers contact about 90 percent of the population, the bureau will use sampling techniques to round out the total". How would they know when they had 90% of the population? If they did know they had 90% of the population, why would they have to do more?

(2) Do you think the constitution permits sampling?

(3) If constitutionality is not an issue, do you think it would be a good idea to just take a 10% sample of the entire population? What are some of the problems in doing this?

(4) The classical application of the capture-recapture method is to count fish in a lake. You select c fish, tag them, throw them back, and then recapture r fish and count the number t of tagged fish in this recaptured sample. The number in the lake is then estimated as cr/t. What assumptions are made in this model? What could go wrong with these assumptions? What would be the corresponding things that could go wrong using the capture-recapture method in the census? Would the same problems occur if the CensusPlus method were used?

(5) We recently tried in class to use the capture-recapture method to estimate the number of crackers in a bag of Pepperidge Farm goldfish crackers (about 330). We used c = 50 and r = 40. When we simulated this 100 times using Grinstead's capture-recapture program, the histogram of the estimates was concentrated on very few numbers, and one estimate was 2000. What is going on here?
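A short simulation (a Python sketch, not Grinstead's program) shows what is happening in question (5):

```python
import random
from collections import Counter

def goldfish_estimates(N=330, c=50, r=40, trials=100, seed=2):
    """Capture-recapture on a bag of N crackers: tag c of them, recapture
    r, and estimate N by c*r/t, where t is the number of tagged crackers
    in the recaptured sample."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        tagged = set(rng.sample(range(N), c))
        recaptured = rng.sample(range(N), r)
        t = sum(x in tagged for x in recaptured)
        if t > 0:
            estimates.append(round(c * r / t))
    return Counter(estimates)

# With c = 50 and r = 40 the estimate can only be 2000/t for an integer
# t, i.e. 2000, 1000, 667, 500, 400, 333, ..., which is why the
# histogram is concentrated on a handful of numbers; a trial with t = 1
# produces the outlier 2000.
```

Because t is a small integer (its mean here is about 6), the estimate cr/t takes only a few widely spaced values, and small values of t produce wild overestimates.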

Are all those tests really necessary?
Washington Post, 5 March 1996, Z7
Sally Squires

The number of standardized tests given in elementary and secondary schools more than doubled between 1960 and 1989, while student enrollments increased by only 15 percent. The increase was especially large in public schools, which want to show legislators that millions of dollars in education money are being used effectively and which also need to meet the federal guidelines of Goals 2000. This article discusses how these tests are used, some of the benefits from their use, and some of the concerns that educators have about them. Here are some of the experts' concerns:

Test results can set unreasonable expectations for children or diminished expectations, both of which are not good.
It is not a rare occurrence to find test scores going up and erroneously infer from that kids are learning more when in fact they are just doing better on tests.
Tests are like taking a child's temperature. They don't provide enough information to know if a child is sick or not, they are just one piece of a diagnostic.
Children who score high on tests are tracked into more stimulating classroom environments where they learn more, which leads to a kind of self-fulfilling prophecy.
This article was apparently inspired by the following report just released by the National Academy Press. It did not say much about the report so we will comment on the report separately.

The use of IQ tests in special education.
Board on testing and assessment.
National Research Council
Edited by Patricia Morrison, Sheldon H. White, and Michael J. Feurer, and published by the National Academy Press 1996.

The use of IQ tests in decisions about the placement of children in special education programs has long been controversial. More than 20 years ago, a landmark case, Larry P. v. Riles, resulted in the prohibition of the use of IQ tests with African American children in California. At the time of this case, IQ tests were used in determining if a child should be classified EMR (educably mentally retarded) and placed in separate special classes. The court found that IQ tests were biased against minorities and that the separate classes for EMR students were dead ends. The issues raised are still current and at the center of another case now being tried in California. The U.S. Department of Education requested this study to help in making policy decisions relating to special education.

The report summarizes the comments of a number of experts in the field of testing who made presentations at two workshops. Richard Snow described 10 categories of validity evidence needed for cognitive ability tests. Daniel Reschly looked at data on the proportion of students classified as mildly disabled. He observed that African American students are still overrepresented, but the gap between these students and others is narrowing. He presented the results of a study that lend support to the hypothesis that poverty is a plausible explanation for much of this overrepresentation.

Early research on students at the lower tail of the distribution of reading skills suggested that they could be meaningfully divided into two groups: one with a specific reading disability identified by having IQ test scores significantly higher than achievement scores and the other "slow learners" with IQ scores consistent with achievement scores. Those in the first group are then identified as having a specific learning disability. Jack Fletcher presented evidence challenging the validity of this "two group" hypothesis and its use in identifying students with specific reading disability.

The report states that there has been 20 years of special education policies that encouraged the use of assessment for classification and eligibility decisions in a process quite separate from instructional planning. However, according to the report, recent research has shown that assessment can and should be directly linked with instruction. In addition, research shows that children can be identified early, before school failure and low achievement become entrenched patterns. The report states that the challenge is to implement these findings on a larger scale.

Ask Mr. Statistics.
Fortune 18 March, 1996, p. 137
Daniel Seligman, David C. Kaufman

Mr. Statistics discusses computer solitaire, which is packaged with Microsoft Windows and has gotten millions of PC users hooked. One version you can play is standard solitaire embellished with payoffs, called Vegas solitaire. The player is asked to imagine that he is paying $1 for each card, or $52 per game, and is returned $5 for each card that can be "put away" on the piles you develop starting with the aces. If you get all the cards played out you make $208. Unfortunately, this does not happen very often. In an attempt to see how unfavorable the game is, Mr. Statistics played and recorded 2,100 games. The most frequent outcome was a loss of $27, corresponding to only 5 cards put away. Mr. Statistics estimated that the house edge is about 11%, corresponding to an average loss of $5.75 per game, or an expected 9.25 cards played out. He estimated a 1 in 33 chance of getting all the cards out.
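The reported figures are mutually consistent, as a quick check shows (the $1-per-card cost and $5-per-card payoff are from the article):

```python
# You pay $52 a game and get back $5 per card put away, so a game in
# which k cards are played out nets 5k - 52 dollars.
COST, PER_CARD = 52, 5

def net(k):
    return PER_CARD * k - COST

print(net(52))              # all 52 cards out: 208
print(net(5))               # the most frequent outcome: -27

avg_loss = 5.75             # Mr. Statistics' estimated average loss
expected_cards = (COST - avg_loss) / PER_CARD
house_edge = avg_loss / COST
print(expected_cards)       # 9.25 cards put away on average
print(round(house_edge, 3)) # 0.111, i.e. roughly an 11% edge
```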

Mr. Statistics observes that these are tentative estimates since there are certain elements of strategy to the game and he is only human. He requested help from ProfNet and did not find anyone who had analyzed this game.


How do you think a more accurate estimate of the value of this game could be obtained?

A treatment for a cancer risks another.
The New York Times, 21 March, 1996, A18
Associated Press

Girls who survive Hodgkin's disease face an exceptionally high risk of breast cancer later in life, a study has shown.

More than 90 percent of victims of Hodgkin's disease in childhood are cured when they get chemotherapy and low-dose radiation. But the treatment itself raises the risk of cancer in later years.

These statements refer to a study in the current New England Journal of Medicine that followed up 1,380 children treated for Hodgkin's disease between 1955 and 1986. There were 88 second cancers. Only 4 would have been expected in this age group. The risk was highest in those who were older when treated.

In the latest study, doctors estimate that women treated for Hodgkin's disease as youngsters face a 35 percent risk of developing breast cancer by age 40. By age 45, this may reach 55 percent.

An accompanying editorial noted that the cancer has a higher rate of fatality than the complications of treatment so "maintaining high cure rates remains the highest priority to the management of childhood Hodgkin's disease."


(1) Norton Starr suggested this article and remarked that the headline is not supported by the article. Why do you think he says this?

(2) Starr finds other problems with the account of the study as reported in the Times. What other problems do you see?

(3) Neither the New York Times article nor the original article says anything about the children who have Hodgkin's disease and are not treated by chemotherapy and low-dose radiation. Should they have?

Scientific American, April 1996, 104-105
Ian Stewart

Stewart suggests that the popular game of Monopoly is a fair game, based on a Markov chain analysis. He looks at a single player's movement around the board as a random walk on a circle with forty points. The length of a step is determined by the outcome of the roll of a pair of dice. Standard Markov chain theory then shows that the long-range probability that the player is at any given point on the board is 1/40. He concludes from this that Monopoly can be considered a fair game.

Stewart comments briefly on the fact that the first player is more apt to get properties like Oriental Avenue or Vermont Avenue but suggests that winnings will get evened out in the long run.

A much more realistic calculation of the limiting distribution can be found in "Monopoly as a Markov Process", by Robert B. Ash and Richard L. Bishop (1972), "Mathematics Magazine" Vol 45, 26-29. These authors make only very minor simplifying assumptions. In their more realistic model, the limiting probabilities are not equal. They observe that much of the variation in the limiting probabilities is due to the effect of going to jail. They also find the expected income from holding a group with hotels. Green should be your favorite color.


(1) Why might it be advantageous to go first in Monopoly? Do you think there is much of an advantage in going first?

(2) Assume that there are three players of equal abilities. Estimate the probability that each player wins.

(3) Which limiting probabilities do you think are most affected by going to jail?

Hawking fires a brief tirade against the lottery.
The Daily Telegraph, 4 February, 1996 p. 7.
Robert Uhlig

Hawking writes in "Radio Times" that he thinks that gambling profits are a pretty sleazy way to raise money, even for good causes. The real interest in this article is the comment that "Statisticians have determined that if you buy a National Lottery ticket on a Monday you are 2,500 times more likely to die before the Saturday draw than land the jackpot."
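A back-of-the-envelope check of the 2,500 figure, under two assumptions not stated in the article: that the jackpot odds are those of a 6-of-49 game, and that "before the Saturday draw" means roughly a five-day window:

```python
from math import comb

jackpot_odds = comb(49, 6)       # 13,983,816 tickets in a 6-of-49 game
p_jackpot = 1 / jackpot_odds
p_die_5days = 2500 * p_jackpot   # what the "2,500 times" claim implies
annual = p_die_5days * 365 / 5   # scale a 5-day risk up to a year
print(round(annual, 3))          # prints 0.013
```

The implied annual death probability of about 1.3% is roughly in line with typical all-cause mortality rates, so the statisticians' comparison is at least plausible.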

In long running wolf-moose drama wolves recover from disaster.
The New York Times, 19 March, 1996, C1
Les Line

Isle Royale is an island about 35 miles long and 7 miles wide in the middle of Lake Superior. Moose have been on the Island for at least the last sixty summers. In 1949, nine years after Isle Royale became a National Park, the lake between Isle Royale and Canada froze completely and wolves came across the ice to the Island. Since that time biologists have had a wonderful place to study a predator-prey relationship.

Biologist Rolf Peterson has been studying this relationship since 1970. The park is completely closed during the winter except for his annual trip to observe the state of the wolf and moose herds.

Peterson's studies provide a wealth of data on this predator-prey relationship. From his most recent annual report you can see a graph of the populations of wolves and moose from 1960 to 1994. In his new book "The Wolves of Isle Royale: A Broken Balance", Peterson remarks that "for the period from 1959 to 1980 the wolf and moose population appeared to cycle in tandem, with wolves peaking about a decade after moose." In 1980 there were 50 wolves and about 1000 moose. This was followed by a dramatic drop in the number of wolves, believed to be caused by a disease brought to the Island by visitors to the park. Recently the number of wolves dropped to as low as 10, and this led to speculation that the wolves would die out. In addition to the low numbers, concern about the future of the wolves comes from DNA studies that have verified that all the wolves on the island have a single recent ancestor.
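The cycling in tandem that Peterson describes is qualitatively what the classical Lotka-Volterra predator-prey model produces. A minimal sketch, with purely illustrative parameters that are not fitted to the Isle Royale data:

```python
def lotka_volterra(prey0=10.0, pred0=5.0, a=1.1, b=0.4, c=0.4, d=0.1,
                   dt=0.001, steps=60000):
    """Semi-implicit Euler steps for the classical model
    dPrey/dt = a*prey - b*prey*pred,  dPred/dt = d*prey*pred - c*pred.
    All parameters here are illustrative, not estimates."""
    prey, pred = prey0, pred0
    path = [(prey, pred)]
    for _ in range(steps):
        prey += (a * prey - b * prey * pred) * dt  # prey grow, get eaten
        pred += (d * prey * pred - c * pred) * dt  # predators eat, die off
        path.append((prey, pred))
    return path

path = lotka_volterra()
prey = [x for x, _ in path]
pred = [y for _, y in path]
# Both populations oscillate, with predator peaks lagging prey peaks,
# and neither dies out.
print(min(prey), max(prey), min(pred), max(pred))
```

The model predicts perpetual cycles with the predator peak lagging the prey peak, which is the pattern Peterson reports for 1959-1980; question (2) below asks how well this idealization should be expected to match the real data.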

This winter, when Rolf made his annual seven-week study, he found that the wolves had increased to 22 and the moose population was reduced to an estimated 1200, about half the number of the previous year. Like the rest of us, the moose suffered a severe winter this year. In addition they suffered from a heavy winter tick infection that caused hair loss and made them more vulnerable to the cold.

Despite the continued concern about inbreeding, the possibility that this predator-prey relationship will continue has dramatically improved.


(1) How do you think they estimate the number of moose on the Island?

(2) Would you expect the graphs of the moose and wolf populations through the years to look like those produced by the classical predator-prey mathematical model?

(3) If the wolves die out, do you think they should be replaced?

Silent sperm.
New Yorker, January 15, 1996, 42-55
Lawrence Wright

Sperm counts: some experts see a fall, others poor data.
The New York Times, 19 March 1996, C10
Gina Kolata.

The "New Yorker" article is a typically long and very thorough discussion of the apparent decrease in the quality and quantity of sperm produced by men. A number of studies in the last decade claim to show a dramatic drop in sperm counts, and some suggest that the quality of the sperm is also decreasing. A meta-study published in the British Medical Journal in 1992 reviewed 61 papers published between 1938 and 1991. The authors reported that the average sperm count had declined from 113 million per milliliter to 66 million per milliliter. The randomness of the movement of the sperm and the number of things that have to go right for successful fertilization suggest that, if these drops are real and continue, they will have a serious effect on the fertility rate, to say nothing of the survival of the human race.

The "New Yorker" article discusses the many theories that have been put forth to explain this drop in sperm count. Some of the more interesting theories have to do with trying to explain why the sperm counts for the Finnish men have not decreased while those for the neighboring Danish men have. The theory that modern industrialization is the culprit receives support from the fact that Finland was industrialized much later than Denmark. A related theory, that the damage is done by synthetic chemicals in the environment from industries, is being popularized in a new book called "Our Stolen Future".

The impression that one gets from reading the "New Yorker" article and other recent articles is that the evidence for a dramatic worldwide decrease in sperm counts is very convincing, with obvious disaster if it continues. In her article, Gina Kolata gives us some hope. She states that a significant number of experts have challenged the methodology of many of the studies showing a decrease in the sperm count. One of the principal concerns about earlier studies comes from recent studies showing that there is considerable variation in average sperm counts not only between countries but also within the same country. For example, the average sperm count in New York is much higher than in Los Angeles. This causes problems for some of the previous studies that compared data from one country at one time with data from different countries at a later time.

The author of the New York Times article evidently had trouble converting the numbers, stating that the meta-study had shown that sperm counts dropped from 1,130,000 per milliliter to 660,000 per milliliter. An astute student noticed that both these numbers are well below 20 million per milliliter, the level below which fathering a child is considered difficult.


(1) How do you think they estimate the number of sperm per milliliter?

(2) Most studies have been carried out by clinics where it is natural to collect sperm counts -- fertility clinics, sperm banks, etc. Do you see any problems with this?

(3) What might be confounding factors in a study that compares sperm counts at different times?

(4) If it could be shown that fertility rates have decreased, would it follow that sperm counts are down?

(5) How would you design a study to help settle this question?

Evaluation of the military's twenty-year program on psychic spying.
Skeptical Inquirer, March/April 1996, 20(2), 21-23
Ray Hyman.

"An Assessment of the Evidence for Psychic Functioning" by Jessica Utts, available from her home page on the web.

As discussed in Chance News 4.16, in the early 70's, the CIA supported a program to see if ESP could help in intelligence gathering. Laboratory studies were done at the Stanford Research Institute. In addition to this research, psychics were employed to provide information on targets of interest to the CIA. The program was abandoned by the CIA in the late 70's but taken over by the Defense Intelligence Agency (DIA) until 1995, when it was suspended. The DIA studied foreign use of ESP, employed psychics, and continued laboratory research at SRI and later at the Science Applications International Corporation (SAIC) in Palo Alto, California.

The program was declassified in 1995 to allow an outside evaluation. The evaluation was carried out by Ray Hyman and Jessica Utts. Hyman is a psychologist known for his skepticism of psychic behavior, and Utts is a statistician known to support the reality of psychic behavior. Utts and Hyman concentrated their efforts on experiments carried out after 1986 that were designed to meet the objections that the National Research Council and other critics had aimed at previous studies. Hyman describes the typical experiment in the following way:

A remote viewer would be isolated in a secure location. At another location, a sender would look at a target that had been randomly chosen from a pool of targets. The targets were usually pictures taken from the "National Geographic". During the sending period the viewer would describe and draw whatever impressions came to mind. After the session, the viewer's description and a set of five pictures (one of them being the actual target picture) would be given to a judge. The judge would then decide which picture was closest to the viewer's description. If the actual target was judged closest to the description, this was scored as a "hit".
Hyman and Utts agreed that these experiments seem to have eliminated obvious defects of previous experiments. They also agreed that the results of the ten best experiments could not reasonably be accounted for by chance.
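With five pictures in each judging set, a hit occurs with probability 1/5 by pure guessing, so "could not reasonably be accounted for by chance" amounts to a one-sided binomial tail probability. A sketch, with a hypothetical hit count chosen only for illustration:

```python
from math import comb

def p_value(hits, sessions, p=0.2):
    """One-sided binomial P(X >= hits) when each session has chance
    p = 1/5 of a hit by guessing (one target among five pictures)."""
    return sum(comb(sessions, k) * p**k * (1 - p)**(sessions - k)
               for k in range(hits, sessions + 1))

# Hypothetical example: 33 hits in 100 sessions, against an expected
# 20 hits under pure guessing.
print(p_value(33, 100))
```

A result like 33 hits in 100 sessions has a tail probability well under 1%, which is the sense in which Hyman and Utts could agree the results are not chance while still disagreeing about what causes them.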

These articles by Utts and Hyman explain why, despite this agreement, Hyman remains a skeptic and Utts a believer.


(1) What would it take to convince you that ESP was a real phenomenon?

(2) Hyman states that, even if one is convinced that the results are statistically significant, this does not mean there is such a thing as extrasensory perception. What do you think he means by this?

How safe are Tylenol and Advil? Helping patients sort out risks.
The New York Times, 27 March 1996, C11
Philip J. Hilts

Recent television attacks between Tylenol and Advil have led to confusion about the risks associated with taking either drug. The risks being debated are small: 1/1000 to 1/100,000 people, depending on the side effect, for Tylenol (acetaminophen) and Advil (ibuprofen). In addition, the risks seem to be confined to people who regularly drink high levels of alcohol and take higher-than-recommended doses of pain relievers.

Acetaminophen has been named in several studies in the last eight years as the cause of sudden liver failure in rare instances. In most of these cases, overdoses of acetaminophen were taken (above the recommended 4 grams or 8 tablets a day). In many cases, heavy drinking (6 drinks a day over a long period of time) was also found.

Ibuprofen also has a serious side effect -- it causes stomach irritation. Taken in larger than recommended doses and in combination with alcohol, it can produce stomach or intestinal bleeding severe enough to require hospitalization.

Dr. Brian Strom, the author of an editorial on the relative risk of the two drugs in last December's issue of "The Journal of the American Medical Association", said that a rough estimate would be that the number of cases of gastrointestinal bleeding from ibuprofen was 50 to 100 times greater than the number of cases of liver disease caused by acetaminophen. The reason for this is that the risk of bleeding is 50 to 100 times greater than the risk of liver disease to begin with, and it is known that use of ibuprofen can double or triple the risk of bleeding. There is insufficient data to say how much acetaminophen increases the risk of liver disease, but even if it is assumed that it doubles or triples the risk, the number of cases is still far lower than the number of cases of bleeding.
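Dr. Strom's argument can be put in numbers. The 50-fold baseline gap and the doubling or tripling come from the article; the absolute rates below are hypothetical, chosen only to make the arithmetic concrete:

```python
# Hypothetical baseline rates (cases per 100,000, invented for
# illustration); only their 50-to-1 ratio comes from the editorial.
baseline_bleeding = 50.0
baseline_liver = 1.0

ibuprofen_rr = 3        # "double or triple"; take the high end
acetaminophen_rr = 3    # assume the same tripling, as the article does

bleeding_cases = baseline_bleeding * ibuprofen_rr
liver_cases = baseline_liver * acetaminophen_rr
print(bleeding_cases / liver_cases)   # prints 50.0
```

Multiplying both baselines by the same relative risk leaves the 50-to-1 ratio unchanged, which is why the bleeding cases dominate even under the most pessimistic assumption about acetaminophen.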

Dr. Debra Bowen of the FDA said the risks of the two products should not be compared since the risks are not the same for everyone. She suggests that those with liver problems should be more concerned about acetaminophen while those with ulcers and other gastrointestinal problems should worry about ibuprofen.


Do you agree with Dr. Bowen's comment that risks of the two products should not be compared since the risks are not the same for everyone?

Neyer's stats class back in session.
ESPNET SportsZone 0318
ESPNET SportsZone 0325

The article examines three factors relating to NCAA tournament performance -- coaching experience, point-guard experience, and playing in one's home state.

Neyer gives the following data for first-time tourney coaches from 1986 through 1995. The "Points For" column is the average number of points scored by the higher seed, and the "Points Against" column is the average number of points scored by the lower seed.

                  Games   Points   Points    Differ-   Effect
                           for     against   ential
All matchups
 -- overall         630    78.7     71.2       7.6
First time
 -- high seed        37    74.8     71.2       3.7      -3.9
First time
 -- low seed         92    78.8     69.2       9.5      -1.9
Overall                                                 -2.5
Note that the differential between points for and points against is low for first-time coaches on high-seed teams and high for those on low-seed teams, suggesting that teams with first-time tourney coaches do not do as well as the group as a whole.

Neyer gives similar data for coaches in their first year at a school. He concludes that, based on the tournament data, a first-time tourney coach has a 2.5-point disadvantage and a coach in his first year at a school has a 1.5-point disadvantage in the tourney. He concludes that you might not want a coach who is in both his first year at a school and his first tourney. The odds would be against him.
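The overall -2.5 figure is just the game-weighted average of the two first-time effects in the table:

```python
# Reproducing Neyer's overall effect for first-time tourney coaches.
games = [37, 92]          # first-timers as high seed, as low seed
effects = [-3.9, -1.9]    # per-game point effects from the table

overall = sum(g * e for g, e in zip(games, effects)) / sum(games)
print(round(overall, 1))  # prints -2.5
```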

Again using data of the same form, Neyer shows that the experience of the point-guard seems to help up through the junior year, but then, mysteriously, senior point-guards did not help as much as junior point-guards. Finally, he shows that teams playing in their home state have about a 4.8-point advantage, comparable to that found in studies of home-team advantage.


(1) What about the interaction between first year coach and first tourney? Is there interaction? That is, maybe a first year coach in the tourney that is his first tourney might be an advantage.

(2) Are we losing anything by collapsing the data into the two groups: high seed (top 8) and low seed (bottom 8)? How about 8th and 9th seeds? Is there really a difference? This year 3 of the 4 9th seeds won. What are the advantages and disadvantages of collapsing data?

(3) Might the fact that the best sophomore and junior point-guards usually go pro early help explain why the senior point-guards did not help as much as the junior point-guards? Also, is there really such a thing as a freshman point-guard?

Fetal heart monitor called not helpful.
The Boston Globe, 7 March 1996, p 3
Alison Bass

It has been standard medical practice to electronically monitor fetal heartbeats during delivery, and to perform Cesarean sections when the monitors detect certain abnormal patterns. The goal is to reduce the risk of cerebral palsy, a disability resulting from damage to the brain's motor centers. However, in a California study of 156,000 live births, researchers from the National Institute of Neurological Disorders and Stroke found that this practice does not help prevent cerebral palsy. While some of the heartbeat abnormalities detected were associated with cerebral palsy, the article reports that children delivered by Cesarean section did not have a lower frequency of cerebral palsy than those delivered vaginally.


(1) If it has been standard medical practice to perform C-sections whenever certain patterns are observed, then what groups do you think are being compared in the last sentence above?

(2) The article goes on to say that "the researchers also found that in the vast majority of babies, abnormal heartbeats did not indicate cerebral palsy: 99.8% of babies in whom such heartbeats were detected, and who were delivered by C-section, did not develop cerebral palsy." What else would you like to know?

Intelligence: knowns and unknowns.
American Psychologist, 51(2),77-101
Ulric Neisser et al.

In the Fall of 1994, the book "The Bell Curve" by Herrnstein and Murray discussed the concept of intelligence, its measurement, and its relation to human behavior and political decisions. This book led to heated discussions in the press. Believing that when science and politics are mixed, scientific studies tend to be evaluated in relation to their political implications rather than to scientific merit, the "American Psychological Association" appointed a task force to provide an authoritative report on the present state of knowledge on intelligence.

While the report was inspired by "The Bell Curve", the authors make no attempt to analyze the arguments in this book. Rather they follow their charge: "to prepare a dispassionate survey of the state of the art: to make clear what has been scientifically established, what is presently in dispute, and what is still unknown." While less lively reading than the "The Bell Curve" and its critics, the report is well written with a minimum of jargon. It provides an admirably balanced view of the present state of knowledge on the issues that it addresses:

What are the significant conceptualizations of intelligence at this time?

What do intelligence test scores mean, what do they predict, and how well do they predict it?

Why do individuals differ in intelligence and especially in their scores on intelligence tests? In particular, what are the roles of genetic and environmental factors?

Do various ethnic groups display different patterns of performance on intelligence tests, and, if so, what might explain these differences?

What significant scientific issues are presently unresolved?

It is a pity that readers of "The Bell Curve" did not have this article as a reference for checking some of the claims made in that book.

An interesting analysis of arguments in "The Bell Curve" by four statisticians can be found in "Galton Redux: Eugenics, Intelligence, Race, and Society: A Review of 'The Bell Curve: Intelligence and Class Structure in America'", Journal of the American Statistical Association, 90(432), 1483-1488, by Devlin, Fienberg, Resnick, and Roeder, and in the related article by the same four authors: "Wringing 'The Bell Curve'", Chance 2, 27-36.

HMO prescription limit found to result in more doctor visits.
The Boston Globe, 20 March 1996, p4
Richard A. Knox

In efforts to control costs, an estimated 3 out of 4 HMOs place limitations on drugs that their participating physicians can prescribe. A study appearing in the "New England Journal of Managed Care" asserts that such practices may cost more than they save, because they ultimately lead to more visits to doctors.

The study followed 13,000 subscribers to 6 unnamed HMOs in locations from New England to the Southwest. Among patients with arthritis, asthma, high blood pressure and stomach ulcers, a positive association was found between greater limitations on prescriptions and more doctor visits, emergency room use and hospitalizations. Patients in the most restrictive plan saw the doctor 83% more often than did those in the plan without restrictions. Also, the use of generic drugs rather than brand-name drugs was associated with higher yearly drug costs and more doctor visits.


(1) Comparing the HMO with no prescribing limits to the most restrictive one, researchers reported that doctors in the latter actually prescribed more than twice as many drugs, at more than twice the total cost. How can this be?

(2) Critics have pointed out that the $500,000 cost of the study was funded by a branch of the pharmaceutical industry. Why should we be concerned about this?

'Seeding' prostate cancer away: radioactive implants prove effective.
The Boston Globe, 25 March 1996, p3
Richard Saltus

An improved procedure for treating prostate cancer is becoming competitive with the common surgical procedures, which often lead to side-effects such as impotence and incontinence. The new procedure, called "brachytherapy", involves implanting tiny radioactive seeds throughout the prostate. This was formerly done through an abdominal incision, placing the seeds by feel; the latest improvement uses needles injected from outside the body, guiding the seed placement with ultrasound images.

In a study of 320 men with early prostate cancer (confined to the gland), only six patients experienced a recurrence during the 7-year follow-up to the implant treatment. Only seven patients suffered incontinence following the implant; the impotence rate was 25 to 30 percent, and increased with age.

According to Dr. Haakon Ragde, director of the Pacific Northwest Cancer Foundation, surgical removal of the gland results in incontinence in about 20% of patients, and a "higher rate of impotence," although he could not give an exact figure.


(1) What do you make of the fact that no exact figure could be given for the impotence rate?

(2) The above data indicate about a 10-fold reduction in the incontinence rate. Let's suppose the impotence rate actually increases with the new procedure. How would one trade this off against the improvement in incontinence rate?

Unconventional Wisdom.
The Washington Post, Pg. C05, 10 March 1996
Richard Morin

Love, Marriage, and the IRS

Morin reports that a significant percentage of Americans are delaying marriage or speeding up divorce in response to the "marriage tax." This marriage tax is the amount in federal taxes that married couples pay above what they would have paid if they were single (it was imposed three years ago).

Economists James Alm and Leslie A. Whittington have found that the probability of marriage falls and that of divorce rises with an increase in the marriage tax. They calculate that a 20% reduction in the tax would produce a 1% increase in the number of marriages. They also found a small, but statistically significant increase in the divorce rate following an increase in the marriage tax.

In addition, increases in the marriage tax cause some couples to delay the timing of their marriage from one tax year to another. Economists David Sjoquist and Mary Beth Walker analyzed four decades of data and found small but statistically significant shifts in the number of late-year weddings following changes in tax policy.

Economists Daniel Feenberg and Harvey Rosen calculated the cost of marital bliss in 1994:

52% of American couples paid an average marriage tax of $1,244.

38% of American couples received an average subsidy of $1,399.

10% of American couples broke even.

The marriage tax was usually paid by couples in which both partners worked, while the marriage subsidy was collected by single-earner households.

Taken together, all married couples paid an average of $124 in extra taxes in 1994.

Unconventional Wisdom.
The Washington Post,24 March 1996, p. C5
Richard Morin

A. Majoring in Money

In the latest issue of "Monthly Labor Review", Daniel Hecker, an economist for the federal Bureau of Labor Statistics, computed the annual earnings of everyone who had graduated from college before 1991 and had a full-time job in 1993. He found that the least lucrative majors for both men and women are philosophy, religion, and theology.

Women college graduates in mid-career (between the ages of 35 and 44) made an average of $32,155 a year while men made $43,199 a year. Economists cite sex discrimination, family and lifestyle choices, and the fact that many women choose or are pushed toward majors that lead to less lucrative careers such as social work, home economics, and teaching as reasons for this income gap between the sexes.

Morin reports that the top five most lucrative majors for men are engineering, mathematics, computer science, pharmacy, and physics. For women, the top five most lucrative majors are economics, engineering, pharmacy, architecture, and computer science.

B. Tips and Smiley Faces

Temple University psychologists, Bruce Rind and Prashant Bordia, have found that penning a smiley face or scrawling "thank you" on a customer's bill can boost a waiter's or waitress' earnings.

At an upscale Philadelphia restaurant, a waitress and waiter drew happy faces on the checks to half their customers before presenting the bills (a total of 89 dining parties). There was a 19% increase in tips for the waitress, but a slight decrease in tips for the waiter. Rind and Bordia said that the difference in tips resulted from the fact that such expressive behaviors are acceptable from females. On the other hand, when a male server tries to be friendly, customers perceive him as strange.


Does it surprise you that mathematics is ahead of computer science in the top five most lucrative majors?

Does the explanation for the difference in customers' reactions to men and women drawing faces seem reasonable to you?

Why does toast always land butter-side down?
Sunday Telegraph, 17 March, 1996, p 4
Robert Matthews

Robert Matthews has a mission to explain that Murphy's law is not just selective memory: rather, there are good scientific reasons for most of the things you blame on Murphy's law. He starts his story with the history of the Law, which has been traced to a Captain Ed Murphy of the US Air Force in the late 1940's. Murphy was involved in a project of experiments on the effects of rapid deceleration on the human body. His project director, announcing the results at a press conference, joked with reporters that, if there was a wrong way to do something, Captain Murphy would always find it: "We call it Murphy's law."

Examples of phenomena often attributed to Murphy's law can be found in earlier literature. For example, James Payn wrote: "I had never had a piece of toast / Particularly long and wide / But fell upon the sanded floor / And always on the buttered side." Matthews points out that simple experiments or calculations show that a typical table is just high enough that toast sliding off it acquires just enough angular velocity to make about half a turn before hitting the floor, so it lands butter side down. (You can verify this with a paperback book if you don't have toast handy.)

Another example he gives has to do with odd socks. Murphy's Law of Odd Socks is that if an odd sock can be created, it will be. If you start with a drawer of 10 complete pairs and you lose just six socks at random, then it is about 100 times more likely that you will be left with the worst possible outcome -- six odd socks -- than with a drawer free of odd socks. And if you have ten pairs of socks in your drawer and they get mixed up, you will have to rummage through about 30 percent of them to find one matching pair. If you can add examples of Murphy's law that may be scientifically based, you are invited to write to Robert Matthews at The Sunday Telegraph, 1 Canada Square, Canary Wharf, London E14AR. If it is easier to send them to me I will pass them on.
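The "100 times more likely" claim is easy to check by counting. A minimal sketch in Python (the drawer sizes are from the article; everything else is standard combinatorics):

```python
from math import comb

# Drawer: 10 complete pairs = 20 socks; 6 socks are lost at random.
total = comb(20, 6)            # all equally likely ways to lose 6 socks

# Best case: the 6 lost socks form 3 complete pairs (no odd socks left).
best = comb(10, 3)

# Worst case: the 6 lost socks come from 6 different pairs
# (choose the 6 pairs, then one sock from each), leaving 6 odd socks.
worst = comb(10, 6) * 2**6

print(best, worst, worst / best)   # 120 13440 112.0
```

So the worst outcome is 112 times more likely than the best one, consistent with Matthews's "about 100 times".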


(1) Are Robert Matthews's sock odds correct?

(2) Can you give another good example of Murphy's law that can be explained "scientifically"?

(3) Murphy's supermarket law says that whatever line you get in another will go faster. Can this be explained scientifically?

(4) And what about Murphy's bus law that says that every time I wait for a bus going South the next bus that comes by is going North?

Ask Marilyn.
Parade Magazine, 3 March 1996, p 14
Marilyn vos Savant

A reader asks:

"My dad heard this story on the radio. At Duke University, two students had received A's in chemistry all semester. But on the night before the final exam, they were partying in another state and didn't get back to Duke until it was over. Their excuse to the professor was that they had a flat tire, and they asked if they could take a make-up test. The professor agreed, wrote out a test and sent the two to separate rooms to take it. The first question (on one side of the paper) was worth 5 points, and they answered it easily. Then they flipped the paper over and found the second question, worth 95 points: 'Which tire was it?' What was the probability that both students would say the same thing? My dad and I think it's 1 in 16. Is that right?"
Marilyn says the chances are better--1 in 4. This is correct if we assume the students were lying about the flat and each now independently guesses a tire at random. Marilyn's solution notes that, for each possible choice by the first student, the second student has a 1/4 chance of matching. This implicitly invokes the Law of Total Probability. The following is an equivalent, but more direct approach. The answer is the sum of the probabilities of four (disjoint) events: both guess front right, both guess front left, both guess rear right, both guess rear left. Each of these has the 1/16 chance the reader apparently had in mind; summing them gives 1/4.
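The 1/4 answer can also be confirmed by brute-force enumeration of the 16 equally likely pairs of guesses (a quick sketch; the tire labels are just names):

```python
from itertools import product

tires = ["front left", "front right", "rear left", "rear right"]

# All 16 (guess1, guess2) pairs are equally likely under random guessing;
# count how many of them agree.
matches = sum(1 for a, b in product(tires, tires) if a == b)
print(matches, len(tires)**2)   # 4 16, so the probability is 4/16 = 1/4
```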

Editor: This story appeared in the April 4, 1994 edition of the San Francisco Chronicle in an article written by Jon Carroll. He said he got it from jimklent@aol.com, who said he heard it from a former student who in turn heard it from a good friend who swears that it is true. The professor involved was named Bonk, and so I sent an e-mail message to a Professor Bonk at Duke and got the following reply.


The story is part truth and part urban legend. It is based on a real incident and I am the person who was involved. However, it happened so long ago that I do not remember the exact details anymore. I am sure that it has been embellished to make it more interesting.

J. Bonk

Professor Bonk included a related e-mail message he had received from Professor Roger Koppl, an economist in the J. Silberman College of Business Administration at Fairleigh Dickinson University. The survey referred to in this note was sent to faculty in FDU's Becton College.

Dear bc-faculty,

Nine people answered my query. I thank them all. Now, what was I up to? When I read the story of Professor Bonk I thought immediately of the right front tire. I was then reminded of something economists call a "Schelling point," after the Harvard economist Thomas Schelling. Schelling had the insight that certain places, numbers, ratios, and so on are more prominent in our minds than others. He asked people to say where they would go to meet someone if they were told (and knew the other was told) only the time and that it would be somewhere in New York. Most chose Grand Central Station. How to divide a prize? 50-50. And so on. The existence of these prominent places and numbers and such permits us to coordinate our actions in contexts where a more "pure" and "formal" rationality would fail. These prominent things are called "Schelling points." It turns out that "right front" was indeed the most popular answer.
Of the nine respondents, seven named a tire. I had a tire in mind too. Let's think of the sample size, then, as eight. Here is the distribution:

                          number      per cent

          Right Front:       5         62.5

          Left Front:        0          0

          Right Rear:        2         25

          Left Rear:         1         12.5
If these percentages held for the whole population, then the probability that the two students would give the same answer would be about .47. This is considerably greater than the probability of .25 that would hold if each tire scored 25%. The expected value of each student's grade on the final would be 45, not 29. My calculations are shown in the PS.

Thanks again to all who indulged me. I enjoyed testing to see if "right front" is a Schelling point.
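Editor: Koppl's .47 is the sum of the squared shares in his table. Since his PS with the calculations is not reproduced here, the grade figures below read the problem as 95 points credited only when the two answers match (a sketch of that reading, not necessarily his exact bookkeeping):

```python
# Observed shares of the four tire answers in Koppl's poll (n = 8)
shares = {"right front": 5/8, "left front": 0/8,
          "right rear": 2/8, "left rear": 1/8}

# Two independent guesses drawn from this distribution agree with
# probability equal to the sum of the squared shares.
match = sum(p * p for p in shares.values())
print(round(match, 4))     # 0.4688, i.e. about .47

# Expected credit on the 95-point question:
print(95 * match)          # about 44.5 under the Schelling distribution
print(95 * 0.25)           # 23.75 under uniform guessing
```

Adding the sure 5 points from the first question gives roughly 49.5 versus 28.75; Koppl's 45 and 29 appear to round these figures with the 5 points handled slightly differently.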


To: J. Laurie Snell:

If you are considering the Bonk story, you might like my own small contribution, which I picked up from my local tire salesman: that is, the tire most likely to be punctured is the right rear one. That is because road debris tends to accumulate at the gutter side of the road. The front tire is usually not damaged when it runs over something like a nail, because nails normally lie flat on the pavement. However, the front tire may kick up the nail, so that before the nail has time to fall back, it is caught by the right rear tire. My salesman tells me that he used to enjoy "psychically" telling customers which tire had been punctured, until he grew tired of this game for lack of sport. If both students in Dr. Bonk's story were really road-knowledgeable, they might both guess right rear, which would give them a win in the make-up test. --Paul S. Boyer
Department of Chemistry & Geology

Dear Prof. Snell,

Prof. Koppl told me that you were interested in his result on the Prof. Bonk story. I don't know if you are still interested, but I also told the story to my students and ran an informal experiment. In three undergraduate classes, the results were as follows:
out of 16 students,  8 chose the right front tire
out of 24 students, 11 chose the right front tire
out of 20 students, 12 chose the right front tire
In all 3 classes the right front (not the driver side) was systematically the "most popular" tire.

I hope it helps! Regards,

Maria Minniti

Why such a different result in the two polls?

Send comments and suggestions to jlsnell@dartmouth.edu

