CHANCE News 6.09

(9 July 1997 to 9 August 1997)


Prepared by J. Laurie Snell and Bill Peterson, with help from Fuxing Hou, and Joan Snell, as part of the Chance Course Project supported by the National Science Foundation.

Please send comments and suggestions for articles to jlsnell@dartmouth.edu.

Back issues of Chance News and other materials for teaching a Chance course are available from the Chance web site:



I cannot seriously believe in the [quantum theory] because it cannot be reconciled with the idea that physics should represent a reality in time and space, free from spooky action at a distance.
Albert Einstein




We have established a discussion site where readers can discuss Chance News items, make suggestions for future issues, and share experiences using news items in teaching. We encourage you to check this out and participate. You can access this discussion page from the Chance Web site.

Our colleague Dana Williams found an exam from his introductory probability course. Dana recalls that his instructor said that he had always wanted to make an exam in the form of a story and so he had done so. This instructor was Jack Kiefer, a truly remarkable statistician and friend of many of us. You can look at his story exam under "teaching aids" on our web site. You will find here also some remarks by Jerry Sacks about Jack Kiefer presented at a memorial event after his untimely death in 1981.

Journal of Statistics Education, Vol. 5, No. 2, July 1997
Carl James Schwarz

StatVillage is a hypothetical city on the web consisting of 128 blocks with 8 houses in each block and designed to permit students to carry out real-life surveys. Students decide on questions they want to ask, choose their sample, click on the houses in their sample, and get the results of their survey. The questions they can ask are ones that can be answered by census information and the answers they get are based on real census data.

The author illustrates the use of StatVillage in terms of having the class design and carry out a survey to determine if StatVillage could sustain a day care center. The fact that the students can easily carry out the same survey using their own samples allows the class to observe and study the variability in the individual results.
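The sample-to-sample variability mentioned above can be previewed with a small simulation. This is only a sketch and does not use StatVillage or its census data: the village is synthetic, and the 30% share of houses with a preschool child, the sample size of 50, and the names `build_village` and `survey` are all assumptions made for illustration.

```python
import random
import statistics

BLOCKS, HOUSES_PER_BLOCK = 128, 8  # layout described in the article


def build_village(rng, p_child=0.3):
    """Synthetic village: True marks a house with a preschool child (made-up share)."""
    return {
        (block, house): rng.random() < p_child
        for block in range(BLOCKS)
        for house in range(HOUSES_PER_BLOCK)
    }


def survey(village, sample_size, rng):
    """Simple random sample of houses; returns the sample share answering True."""
    sample = rng.sample(sorted(village), sample_size)
    return sum(village[house] for house in sample) / sample_size


rng = random.Random(1997)
village = build_village(rng)
true_share = sum(village.values()) / len(village)

# Thirty "students" each draw their own sample of 50 houses.
estimates = [survey(village, 50, rng) for _ in range(30)]
spread = statistics.stdev(estimates)

print(f"true share {true_share:.3f}")
print(f"estimates range from {min(estimates):.2f} to {max(estimates):.2f}")
print(f"standard deviation of the estimates {spread:.3f}")
```

Each student gets a somewhat different answer to the same question, which is the point the class is meant to observe.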

The author describes his experience using StatVillage in introductory courses, with students using simple sampling, and in more advanced courses using more complex sampling designs.

Bob Griffin suggested the following article:

The best places to live today.
Money, July 1997, pp. 133-157
Carl Fried with Jeanhee Kim and Amanda Walmac

Why Money magazine's "best places" keep changing.
Public Opinion Quarterly
Vol. 61, no. 2 Summer 1997, pp. 339-355
Thomas M. Guterbock

The ratings game.
Philadelphia Inquirer, 8 July, 1997, A13
Thomas M. Guterbock

Places Rated Almanac
Macmillan, 1997
David Savageau and Geoffrey Loftus

The Money Magazine article provides the results of the magazine's 1997 survey on the "best places to live today." The winner this year is our own Nashua, New Hampshire. Most of the article is devoted to describing the ten best metropolitan areas. These, together with their rankings in 1996 and 1995, are:

1. Nashua, N.H. (42), (19)
2. Rochester, Minn. (3), (2)
3. Monmouth/Ocean counties, N.J. (38), (167)
4. Punta Gorda, Fla. (2), (61)
5. Portsmouth, N.H. (44), (119)
6. Manchester, N.H. (50), (12)
7. Madison, Wis. (1), (16)
8. San Jose (19), (44)
9. Jacksonville, Fla. (20), (3)
10. Fort Walton Beach, Fla. (18), (28)

This is the 11th year Money Magazine has provided a ranking of the country's 300 largest metropolitan areas. The annual report receives wide coverage in the press. For example, about 200 newspapers had articles related to this year's study. Not surprisingly, the majority are from New Hampshire papers. Wisconsin papers are also well represented, claiming that the first place rating received by Madison, Wisconsin, last year is still quite valid. One California paper's writer wondered what the weather is like in the winter in Nashua.

In his article in the "Public Opinion Quarterly" Thomas Guterbock gives a critique of the Money Magazine 1996 survey from the point of view of a professional survey researcher. Guterbock assesses the survey as a whole but is particularly interested in trying to understand why the ratings of the cities vary so much from year to year. He notes, for example, that from 1995 to 1996 Norfolk/Virginia Beach jumped from 283 to 117 and Monmouth, NJ, from 167 to 38, while Benton Harbor, MI, dropped from 47 to 249. The correlation between the ranks from 1995 to 1996 was only .74 (r^2 = .55).

The "Places Rated Almanac" provides a similar ranking of 351 metropolitan areas in the United States and Canada updated every four years. The correlation between the ranks for 1989 and 1993 was .90 (r^2 = .82), a significantly higher correlation over 4 years than Money Magazine over 1 year.

Both Money Magazine and Places Rated Almanac base final ratings on 9 demographic traits of a city called "factors." For Money Magazine these are: Economy, Health, Crime, Housing, Education, Weather, Transit, Leisure, Arts. For the Places Rated Almanac they are very similar: Cost of Living, Transportation, Jobs, Higher Education, Climate, Crime, The Arts, Health Care and Recreation.

From government and private statistical sources, each city is assigned a rating between 0 and 100 for each of the nine factors. The ratings are attempts to give "objective" ratings for each of the factors. For example Money Magazine assigns Nashua, NH the ratings: Economy 92, Health 79, Crime 81, Housing 36, Education 31, Weather 20, Transit 12, Leisure 44, Arts 82. The details of how these ratings are arrived at are considered proprietary information by Money Magazine.

The Places Rated Almanac tells their readers how they arrived at their objective ratings. For Nashua, NH they gave ratings: Cost of Living 13, Transportation 42, Jobs 33, Education 64, Climate 32, Crime 96, Arts 47, Health Care 46, Recreation 30. The lower ratings they gave for economic and health factors resulted in Nashua being ranked 221 in their final ranking.

The Places Rated Almanac simply weights their 9 factors equally to arrive at their final ratings. Readers are invited to determine their personal rating by filling out a questionnaire they provide.

Money Magazine tries to determine a weighting that they feel reflects the preferences of their readers. They do this by means of a poll of their readers asking them to evaluate the importance of 41 factors which may be considered subfactors of the basic 9 factors. For example, low property taxes, inexpensive living, and a low unemployment rate are subfactors of the economy factor. This year they sampled 502 readers who were asked to rate, on a scale of 1 to 10 (10 most important), each of these 41 subfactors.

From the means of the poll ratings Money Magazine arrived at weights for the 9 factors, which it used to obtain a final rating for each city. Again, exactly how this is done is considered proprietary information.

In his paper in the Public Opinion Quarterly, Guterbock tried to make some intelligent guesses on missing proprietary information. He used these to try to determine what in the process might cause the variability of the ratings from year to year that he regards as excessive. He used data from the years 1996 and earlier. During that time the sample was even smaller (250 in 1996). Despite the small sample, Guterbock showed that sampling error could not reasonably cause the amount of variation from year to year.

Guterbock felt that the explanation must be that certain of the factors or subfactors that were particularly volatile were being weighted especially heavily. By looking at the 1996 and earlier surveys he was led to the conclusion that the volatility resulted from the method by which Money Magazine determined weights for the 9 factors from the poll ratings of the subfactors. The number of subfactors of a given factor varied. For example, there were about 10 subfactors that could be considered economic factors and only 5 that could be considered health factors. Guterbock estimated that, in determining the final ranking, Money Magazine weighted the economic factor about twice as much as the health factor -- in other words some kind of additive effect appeared in determining the factor weightings from the subfactor ratings. Since, in general, economic factors are more volatile than the other factors, Guterbock concluded that this is what causes the excessive volatility in these ratings.

In his column in the Philadelphia Inquirer, Guterbock observes that Money Magazine has invited its readers to go to their web site (www.money.com) where they can make their own weighting of the 9 basic factors to get their own personal ratings.

This gives Guterbock a chance to see what the results of the poll would have been if the factor weights were determined as the average of the subfactor ratings (clearly a more reasonable choice than an additive model). When he does this he gets a rating of the cities that is very different from that obtained by Money Magazine. For example, Rochester, Minnesota, is first, and Nashua is not even in the first ten. Washington, which was 162 in the original rating, becomes second.
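To see why the choice between adding and averaging subfactor ratings matters, here is a minimal sketch. It is not Money Magazine's method, which is proprietary, nor Guterbock's reconstruction. The subfactor importance ratings (all set to 8.0) are hypothetical, only two of the nine factors are used, and the function names are ours. The two city ratings are the Economy and Health ratings the article reports for Nashua.

```python
# Hypothetical mean importance ratings (1-10 scale) from a reader poll.
# These are NOT Money Magazine's data: every subfactor is given the same
# rating so that only the number of subfactors differs between factors.
subfactor_means = {
    "Economy": [8.0] * 10,  # about 10 economic subfactors
    "Health": [8.0] * 5,    # about 5 health subfactors
}

# Nashua's 1997 factor ratings as reported in the article.
nashua = {"Economy": 92, "Health": 79}


def mean(values):
    return sum(values) / len(values)


def factor_weights(subfactors, combine):
    """Turn subfactor ratings into factor weights that sum to 1."""
    raw = {factor: combine(ratings) for factor, ratings in subfactors.items()}
    total = sum(raw.values())
    return {factor: value / total for factor, value in raw.items()}


def city_score(ratings, weights):
    return sum(weights[factor] * ratings[factor] for factor in weights)


additive = factor_weights(subfactor_means, sum)   # weight grows with subfactor count
averaged = factor_weights(subfactor_means, mean)  # weight ignores subfactor count

print("additive weights:", additive)
print("averaged weights:", averaged)
print("Nashua, additive:", round(city_score(nashua, additive), 1))
print("Nashua, averaged:", round(city_score(nashua, averaged), 1))
```

Under the additive rule the economy counts twice as much as health simply because it has twice as many subfactors, even though readers rated every subfactor the same.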

We talked to the person who set up the ranking on the web for Money Magazine, and he said that the ranking method there was not the same as that carried out by the magazine. He said that he did not know the method and remarked: "frankly I don't want to know it."

Incidentally, the top ten metropolitan areas according to the Places Rated Almanac were quite different from the top ten in the Money Magazine survey. They are:

1. Orange County, Ca.
2. Seattle-Bellevue-Everett, Wash.
3. Houston, Texas
4. Washington D.C.
5. Phoenix-Mesa, Arizona
6. Minneapolis and St. Paul, Minn.
7. Atlanta, Georgia
8. Fort Lauderdale, Fl.
9. San Diego, Ca.
10. Philadelphia, Pa.


(1) What are some of the problems in obtaining an objective rating of a city on a specific factor, say economics?

(2) What are some of the problems in determining, by a poll, a subjective ranking of sub-factors such as pollution?

(3) Like college ratings, these ratings have a large influence on people's decisions. Do you think they should?

(4) Money Magazine also has a rating of colleges (see their web site). This rating is based on what you get for your money and as a result no Ivy League college is in the top ten. Does this suggest that Guterbock is correct in concluding that their "best place to live" is also basically an economic rating and they should admit this?

(5) Why do you think Money Magazine is so secretive about how they determined their ranking of the metropolitan areas?

(6) Which of the two different "ten best places to live" do you find more reasonable?

Particles respond faster than light.
The New York Times, 22 July, 1997, C1
Malcolm Browne

Quantum physics
Audio from NPR program Science, Friday, August 1, 1997.

It has always seemed strange to us that introductory probability books rarely mention quantum mechanics, which is based on probability theory. Perhaps recent popular articles on the use of quantum theory in computing and cryptography will encourage some discussion of quantum theory in our classes.

Some time ago physicist David Bohm proposed a "thought experiment" in which you create a pair of photons moving in different directions but with total spin 0. You then measure the spin of one photon in the vertical direction. Quantum theory says that the direction of the spin is determined by the measurement and will be up with probability 1/2 and down with probability 1/2. Suppose you find this spin up. Since the total spin is 0, you now know, even without measuring it, that the spin of the other photon in the vertical direction must be down. This seemed to be some kind of communication between the two photons that would take place instantaneously and thus faster than the speed of light. Einstein had worried about this kind of experiment and referred to the outcome as "spooky behavior" which he felt should discredit quantum theory.

It has been known for some time that Bohm's "thought experiment" can be achieved by a real experiment but had been verified only over a very short distance. The New York Times article describes the details of a new experiment that verified this could also be done over a significantly longer distance than had been previously demonstrated. The experimenters, led by Nicholas Gisin, sent a pair of photons, split off from a single photon, in opposite directions along optical fibers to villages north and south of Geneva about 7 miles apart. Measurements of the photons verified the expected correlations.

In 1990 A.K. Ekert proposed that this should make it possible to use quantum theory to obtain a secure code for encrypting messages. The idea is to send a sequence of, say 10,000, pairs of such "entangled photons" from, for example, a bank in Chicago to a bank in New York. Each bank will measure the spin of their photons in a horizontal or vertical direction determined by the toss of a coin. Chicago would make these measurements on their photons before sending the twins off to New York. Then by an insecure method, say telephone, the banks will communicate and tell each other which directions they measured but not the outcome of the measurements.

Then, for the photons the two banks measured in the same direction, they agree to assign a 0 if the measurement in Chicago was down and 1 if it was up. Now New York knows that their measurements on these photons were exactly opposite those of Chicago. Hence the sequence of 0's and 1's obtained by Chicago is known to both New York and Chicago. Notice that, in effect, we have sent a random sequence of 0's and 1's of length approximately 5,000 from Chicago to New York. It is agreed that this sequence will be used to encode and decode messages between the two banks. Of course, we have to show that this sequence could not be intercepted by an eavesdropper.

Why could this code not be detected by an eavesdropper, say Eve? If quantum physics were like classical physics, then Eve could just intercept the photons on their way from Chicago to New York, measure the spin in both the vertical and horizontal directions, and intercept the phone conversation to learn which photons were being used for the code. She would then have the code. But the story in quantum physics is not so simple. Suppose you measure the spin in the vertical direction and find that it is up. This fixes it in the up position, and if you measure it again in the vertical direction you will definitely find it up. But suppose you measure the spin in the vertical direction and then measure it in the horizontal direction. Now you know the spin in the horizontal direction but you no longer know the spin in the vertical direction. In fact, if you measure it again in the vertical direction it will be equally likely to be up or down.

Now assume that Eve intercepts the photons and tries to measure the spin in one or both of the directions. Then on about half of the photons that Chicago and New York measured in the same direction, she will have measured in the other direction. Thus, on these New York will get the complementary states only half the time instead of all the time. This in turn will lead to an incorrect code. That the code is not correct can easily be detected by sending a trial message known to both Chicago and New York. If it is found that the code has been compromised they just try again.
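The bookkeeping in this argument can be checked with a toy simulation. This is a classical sketch that only imitates the measurement rules described above, not a quantum simulation. It assumes that a photon measured in a direction other than the one in which its spin is currently fixed gives a coin-flip result, and that Eve measures every photon in a single direction chosen at random. The function name `simulate_key_exchange` and its parameters are ours.

```python
import random


def simulate_key_exchange(n_pairs=10_000, eavesdrop=False, seed=0):
    """Return (number of photons usable for the code, fraction of them in error)."""
    rng = random.Random(seed)
    usable = 0
    errors = 0
    for _ in range(n_pairs):
        # Chicago measures its photon in a direction chosen by a coin toss.
        chicago_dir = rng.choice("HV")
        chicago_up = rng.random() < 0.5
        # The twin now has the opposite spin, fixed in Chicago's direction.
        twin_dir, twin_up = chicago_dir, not chicago_up

        if eavesdrop:
            eve_dir = rng.choice("HV")
            if eve_dir != twin_dir:
                # Measuring in the other direction gives a random result and
                # leaves the spin fixed in Eve's direction instead.
                twin_dir, twin_up = eve_dir, rng.random() < 0.5

        # New York measures in its own randomly chosen direction.
        ny_dir = rng.choice("HV")
        ny_up = twin_up if ny_dir == twin_dir else rng.random() < 0.5

        # Only photons measured in the same direction by both banks are kept.
        if ny_dir == chicago_dir:
            usable += 1
            if ny_up == chicago_up:  # results should be opposite
                errors += 1
    return usable, errors / usable


print(simulate_key_exchange(eavesdrop=False))
print(simulate_key_exchange(eavesdrop=True))
```

Without Eve about half of the 10,000 pairs are usable and none disagree. With Eve about a quarter of the usable photons disagree (half are disturbed, and half of those come out wrong), which is what the trial message detects.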

For a more authoritative but still gentle discussion of the peculiar behavior of quantum mechanics and its relation to probability we recommend the following two books:

The Quantum World
J. C. Polkinghorne
Princeton Press

Where does the Weirdness Go?
David Lindley
BasicBooks, 1996.

You can hear David Lindley discussing these matters on the NPR program mentioned above.

Mammogram push tied to cut in deaths; State's free cancer screening part of solution, study indicates.
The Boston Globe, 8 July 1997, pA1.
Frank Phillips

Massachusetts' breast cancer mortality rate dropped 11% from 1991-1995, compared to a 5.5% drop nationally. Specialists at the American Cancer Society said the decline was attributable in part to the state's screening program, known as the Breast and Cervical Cancer Initiative. Under this initiative, launched in 1992, Massachusetts became the first state to offer free mammograms to uninsured women.

So far, 30,000 uninsured and under-insured women have been screened, resulting in the detection of 200 cases of breast cancer and 187 cases of cervical cancer. Early detection is widely cited as the key to surviving breast cancer. The article notes that, in 1992, 70% of all breast cancers in Massachusetts were diagnosed in the earliest stage.


(1) Given the widespread fear of breast cancer, do you find it surprising that there has been a nationwide decline in breast cancer mortality? Does it surprise you that the program found nearly as much cervical cancer as breast cancer?

(2) The article quotes cancer officials as saying Massachusetts' advantage is "in part" attributable to the free screenings. What other factors might make Massachusetts stand out?

(3) How would you go about estimating how much of Massachusetts' advantage is due to the free screenings? Here is a start. Let's say all 200 cases caught by the initiative would have resulted in death within the five-year period. That adds 40 deaths a year. How big an effect do you think this could have on the overall mortality rate?

Michael Olinick suggested the following article.

Just how bad are economists at predicting interest rates.
The Journal of Investing, Summer 1997 (vol.6, no.2), p8
Kevin Stephenson

Kevin taught Economics for several years at Middlebury College before moving on to financial consulting. In this article he argues that actively managed bond funds offer no advantage over passive (index) funds, because the managers have no demonstrated ability to forecast interest rates. As recent evidence, he cites the Wall Street Journal's semiannual survey of economists. Results of the December 1996 survey were published January 2 of this year. Most of the 57 participating economists had predicted that the yield on the 30-year Treasury bond, which was then 6.64%, would drop by July 1; the average estimate was 6.52%. But by mid-April, when Kevin was writing this article, the rate was over 7%.

Is this failure unusual? Since December 1981, the WSJ has compiled 30 6-month surveys, in which economists have predicted yields on 3-month Treasury bills and 30-year Treasury bonds. Each participant is asked for estimates for the two rates, and the mean response for each is calculated to give a "consensus estimate." A table accompanying the article shows the consensus estimates and actual yields. Rates on the 3-month Treasury bills moved in the opposite direction of the consensus 16 times in the 30 periods (16/30 = 53%!). The situation on the 30-year Treasury is worse, with the consensus estimate in the wrong direction 20 times (20/30 = 67%).
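One way to put these counts in perspective is to ask how often a coin would do as badly. The sketch below computes the chance of at least 16, and of at least 20, wrong calls in 30 periods if each call were an independent fair coin toss. The coin model is our assumption, not part of Stephenson's analysis, and the function name `binomial_tail` is ours.

```python
from math import comb


def binomial_tail(n, k, p=0.5):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


p_bill = binomial_tail(30, 16)  # 3-month bill: wrong direction 16 times
p_bond = binomial_tail(30, 20)  # 30-year bond: wrong direction 20 times

print(f"P(16 or more wrong out of 30) = {p_bill:.3f}")
print(f"P(20 or more wrong out of 30) = {p_bond:.3f}")
```

Under this model the bill record is what coin tossing would give more than 40% of the time, while the bond record is worse than a coin would manage in all but about 5% of cases.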

Kevin reports that the average error for the consensus estimate was 79 basis points for the Treasury bill and 86 points for the bond. If you had simply assumed the rates would stay the same in each period, your average errors would have been 74 and 78 basis points. Since missing on small changes is not disastrous for investors, one might hope that at least the economists could forecast big swings correctly. The 3-month bill moved 100 basis points or more on ten occasions. While the consensus got the direction right on six of these, it underestimated the size of the change by an average of 99 basis points. Again, the situation on the 30-year Treasury is worse. Of the 10 occasions when that rate moved more than 100 points, the consensus was wrong on 8. Moreover, one of the "correct" predictions called for a 19-basis point drop, when the true drop was 102 points!

Having discredited the consensus estimate, Kevin wonders if there are individual forecasters who consistently do better than others. Alas, the answer still appears to be no. Kevin looked at the records of the 44 economists who had participated in 10 or more surveys. Only 13 of these got the 30-year direction right more than half the time, and none had better than 60% success. On the 3-month rate, 24 economists did better than 60%, and one was correct 67% of the time. But even the most accurate forecasters were unable to perform consistently. By tracking the three economists in each period who came closest on the 30-year rate, Kevin found that only 44% ranked in the top half in accuracy for the next period.


In a previous Chance News, we mentioned that a reasonably accurate way to forecast weather is to guess that tomorrow's weather will be the same as today's. How do you think the analogous method would work for predicting the direction of change in interest rates for the next period?

The kind of sweep that is hard to come by.
The New York Times, 22 July, 1997, B9
Murray Chass

Mitchell Laks wrote the following letter to the New York Times about this article. His letter both reviews the article and shows once more how well a coin tossing model fits observed streaks.

To the Editor, New York Times   7/24/97

Dear Sirs,

On 7/22/97, Murray Chass, in his column "On Baseball" entitled "The Kind of Sweep That's Hard to Come By", commented on the recent sweep by the Mets of a 4-game series with the Reds. He pointed out what he perceived to be the special difficulty of this accomplishment, indicating that this is only the 12th sweep of a 4-game series between two teams so far this year in both baseball leagues. This is the second sweep of a 4-game series by the Mets this year. Only three other teams in baseball have accomplished this, and only 4 other teams have succeeded in sweeping a single series. The remaining 20 major league teams have not swept a single 4-game series. These results are out of a total of 79 such series to date this year.

He contrasted this with the relatively more frequent occurrence of winning streaks of at least 4 games, which has occurred some 85 times. He mentioned that only 2 teams in baseball, the hapless Phillies and Athletics, have not had winning streaks of 4 or more games this season. He belittled these two teams, suggesting obliquely that they were possibly worthy of demotion to the minor leagues.

I would like to point out that, statistically speaking, all of the occurrences that Mr. Chass cites in his article are more likely indicative of chance phenomena rather than reflective of any particular prowess of the teams involved. In fact if we were to replace all of the sports teams and games involved by coins and coin flips, substituting the 50% probability of heads or tails for wins and losses, then the exact phenomena cited by Mr. Chass would be reproduced as the expected outcome.

Thus, since in a series of 4 coin flips the probability of the outcome of "either all heads or all tails" is 1/8, it would be expected that in approximately 10 of the 79 four-game series there would be a sweep (close to the 12 observed). Mr. Chass records also that 37 times this year the result of a 4-game series was 3-1 and 30 times the result was 2-2. In fact the corresponding expected numbers for random coin flips would be 40 and 30 times. Additionally, the observed distribution of 4 teams with two sweeps, 4 teams with one sweep and 20 teams with no sweeps is close to that predicted by the Poisson distribution governing such phenomena (2, 8, and 18 teams, respectively).

Moreover, on the date of Mr. Chass' article, the average number of games played by each of the major league teams was approximately 97. For this number of games for each team, it can be proven that the expected number of winning streaks of 4 games or greater is approximately 3. (The exact formula for the expected number of 4 game or more winning streaks over N games is (N-2)/32; thus, for example, for a full season of 162 games we would expect an average of 5 such winning streaks). Thus for the 28 major league teams, we would expect 28x3 = 84 streaks, remarkably close to the 85 observed to date. In fact, using the Poisson distribution again, we find that approximately 2 of the 28 teams are expected to have no streaks of at least 4 games, accounting for the bad luck of the Phillies and Athletics.

Thus, based upon the evidence presented by Mr. Chass, we have no reason to assume that the major league teams are other than evenly matched. Therefore, perhaps we really should not fault the Phillies and Athletics for falling victim to the laws of averages. If it wasn't these two teams, it would likely be another pair of teams who had this outcome. Correspondingly, perhaps we should not make too much of the special prowess of the Mets, Braves, Anaheim and Chicago in sweeping two series. I say this despite my status as a devoted Mets fan. Professor William Feller has observed that very often to the untrained eye, randomness appears as regularity or as a tendency to cluster.


Mitchell P. Laks, M.D., PhD
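Readers who want to check the figures in the letter can do so with a short script. What follows is a sketch under the letter's coin-tossing model. The counts (79 series, 12 sweeps, 28 teams, about 97 games per team) come from the letter. The Poisson mean of 12/28 sweeps per team is our guess at what the letter used, and the function names are ours. The streak formula (N-2)/32 is checked by brute force on a short 12-game "season", since enumerating 97 games is not feasible.

```python
from itertools import product
from math import comb, exp, factorial

N_SERIES = 79      # four-game series played so far (from the letter)
N_TEAMS = 28       # major league teams (from the letter)
SWEEPS_SEEN = 12   # observed sweeps (from the letter)
GAMES_PLAYED = 97  # approximate games per team (from the letter)


def split_probability(wins, games=4):
    """Chance a fair-coin series ends wins-to-(games - wins), for either team."""
    ways = comb(games, wins)
    if wins != games - wins:
        ways *= 2  # e.g. a 3-1 result can favor either team
    return ways / 2**games


def poisson_pmf(k, mean):
    return exp(-mean) * mean**k / factorial(k)


def count_streaks(results, min_len=4):
    """Count maximal runs of wins (True) that reach min_len games."""
    streaks = run = 0
    for won in results:
        run = run + 1 if won else 0
        if run == min_len:
            streaks += 1
    return streaks


def expected_streaks_brute_force(n_games, min_len=4):
    """Average streak count over all 2**n_games equally likely seasons."""
    seasons = product((False, True), repeat=n_games)
    return sum(count_streaks(s, min_len) for s in seasons) / 2**n_games


def expected_streaks_formula(n_games):
    return (n_games - 2) / 32


expected_series = {w: N_SERIES * split_probability(w) for w in (4, 3, 2)}
sweep_mean = SWEEPS_SEEN / N_TEAMS  # assumed Poisson mean: sweeps per team
teams_by_sweeps = {k: N_TEAMS * poisson_pmf(k, sweep_mean) for k in (0, 1, 2)}
streaks_per_team = expected_streaks_formula(GAMES_PLAYED)
teams_without_streak = N_TEAMS * poisson_pmf(0, streaks_per_team)

print("expected series by winner's wins:", expected_series)
print("expected teams by number of sweeps:", teams_by_sweeps)
print("expected streaks per team:", streaks_per_team)
print("expected teams with no streak:", teams_without_streak)
print("12-game check:", expected_streaks_brute_force(12), expected_streaks_formula(12))
```

The series counts and the sweep distribution round to the numbers quoted in the letter. The Poisson estimate of teams with no streak of four or more comes out between 1 and 2.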

Paul Alper suggested the following article as another example of the problems caused by financial arrangements between drug companies and medical researchers.

Embattled U-M chief urologist resigns.
Detroit Free Press, 17 July, 1997

The University of Michigan has accepted the resignation of Dr. Joseph Oesterling. Dr. Oesterling has been Professor of Surgery and Director of the Michigan Prostate Institute. He is a nationally recognized prostate cancer surgeon and researcher and editor in chief of the journal Urology. His opinions on the methods of treatment for prostate problems are widely quoted in the media.

The University of Michigan, like many other universities, has rules on how much a faculty member can make on outside consulting. The University also has disclosure requirements related to possible conflicts of interest.

The article reports that Dr. Oesterling ignored these rules and requirements while earning large sums of money beyond his $400,000 U-M salary with lucrative business contracts with more than a dozen companies and legal fees from testifying as an expert witness.

Dr. Lorris Betz, U-M Medical School Dean, in a letter to President Lee Bollinger wrote: "The conclusion I have reached is that Dr. Oesterling's conduct is so egregious and inconsistent with standards expected of faculty ... that termination proceedings must be implemented against him."

Faced with this, Dr. Oesterling chose to resign.


According to the article, Oesterling wrote officials of the drug company VidaMed last year stating that he hoped his "positive presentations" would help sales of TUNA, a new technique to treat enlarged prostate conditions that was developed by VidaMed. The article goes on to say that VidaMed paid Oesterling, or his prostate foundation, $35,000 between March and August 1996, and spent another $60,000 to buy copies of a new prostate cancer book he edited. A spokesman for VidaMed defended this, saying that they needed to get the best possible advice.

(1) Is something wrong with this or is this just free enterprise at work?

(2) How many copies of the book do you think they got for $60,000?

Low-control job called heart risk.
The Chicago Tribune, 25 July 1997, p.3
Associated Press

In what the article bills as more bad news for "the Dagwoods and Dilberts of the world", a study in the British medical journal "The Lancet" finds that feelings of lack of control on the job are associated with increased risk of heart disease.

An earlier study of British bureaucrats, started in the 1960s, found those in low status jobs to be at increased risk for heart disease. In general, these workers had poorer health, were more likely to smoke and less likely to exercise, and died younger. The present study followed 7372 men and women in British civil service jobs from 1985 to 1993, looking in detail at the effects of smoking, inactivity, high blood pressure, and feeling of loss of control.

At the highest grades of civil servants, feelings of lack of control were reported by 8.7% of the men and 10.1% of the women. At the lowest grades, the corresponding figures were 77.9% and 75.3%. Overall, those reporting little or no control had a 50% higher risk of heart disease as compared to executives. When researchers statistically adjusted to account for feeling out of control, the increased risk for low status workers was only 18%. This makes feeling out of control the largest single risk factor.

Physiologically, lower control was found to be associated with higher concentrations of plasma fibrinogen, a protein that binds blood cells into clots, which could increase heart attack risk.


What does it mean to "statistically adjust" the data for feeling out of control? How do you think levels of "feeling out of control" were compared across different jobs?

Job injuries, illnesses found costly.
The Boston Globe, 28 July 1997, p.3
Brenda C. Coleman

In 1992, there were 6500 work-related deaths and 13.2 million injuries in the US, according to research published in "Archives of Internal Medicine." This works out to 18 deaths and 36,000 injuries per day. Previous government estimates were 17 deaths and 9000 injuries per day.

The researchers worry that the impact this has on US health care costs is not fully appreciated. The direct cost of injuries and illness in 1992 was $65 billion. Indirect costs, including lost wages, were $106 billion, bringing the total cost to $171 billion. Excluding the costs of administering workers' compensation, Social Security and health insurance benefits leaves a cost of $151 billion. By comparison, the corresponding figure for AIDS was $30 billion; for Alzheimer's it was $67.3 billion.

The researchers expressed hope that their findings would lead to greater emphasis on workplace safety.


The figures are intended to show that occupational health costs are of the same magnitude as other major health problems. But do you think it makes sense to compare the total cost of workplace injuries with a specific disease?

Medical Notebook: Study links backache to job dissatisfaction.
The Boston Globe, 31 July 1997, pA3
Peter J. Howe

Here are some data on just what some of those costly workplace injuries are. This article reports that as many as one in every four workers' compensation claims, and as much as 40% of the money spent by the program, are attributable to lower back injuries. To combat this problem, many businesses and government agencies have sent employees to "back school", where they learn to lift heavy objects safely.

Unfortunately, the prevention programs don't seem to be working, according to a new study of the back school programs run at two large Boston mail facilities. The study followed 2534 postal workers sent to back school, and compared them with a control group of 1500 who did not get the training. Over a 5 1/2 year period, no "statistically meaningful" differences were found; in fact, injury rates were slightly higher among the workers who got the training.

Writing in the "New England Journal of Medicine" about the study, Dr. L.H. Daltroy said that job satisfaction is one of the things most closely associated with lower back pain. People who dislike their jobs may rebel against any of their employer's demands--including exhortations to lift things more carefully.


(1) If workers' reactions to bosses' demands really are a problem, should we expect there to be significantly more injuries among the workers who went to back school?

(2) What do you think is the basis for Daltroy's comments?

BC researchers find boys' spatial skills a plus on math SAT.
The Boston Globe, 31 July 1997, pA1

In previous issues of Chance News, we have seen numerous studies reporting gender differences on standardized math tests. There is a long-standing debate on the reasons that boys as a group outperform girls. On the one hand, boys' better spatial skills have sometimes been cited as evidence for a biological gender connection. On the other hand, girls' lower confidence and higher math anxiety have been taken as indicating that differences in socialization are to blame for lower scores.

Now a Boston College study, based on research in Melrose (MA) schools, has directly investigated these conjectures in the same group of students. The study looked at the top third of college bound students, since previous studies found the most pronounced differences at the higher grades and among the top students. Within this group, the researchers found that 64% of the measurable gender difference on the math SATs was attributable to boys' better spatial skills, and 36% to girls' lower self-confidence. Girls' higher math anxiety was found to have no effect!

But Prof. M. Beth Casey, the study's leader, says the message is clearly "Wake up and think about spatial skills." Other researchers in the field are split on whether the findings point towards heredity or environment as an explanation for the differences. Julian Stanley of Johns Hopkins University is intrigued by the results but cautions that he has yet to see proof that spatial skills can be taught or that doing so will eliminate gender differences in math.


(1) What does it mean to say that 64% of the difference is attributable to spatial skills?

(2) How do you think the researchers distinguished low self-confidence from math anxiety?

(3) The article notes that a yet-to-be-published study by Boston College graduate Lorrie Kirchner has found that women who competed in sports requiring a skill such as good aim scored higher than non-athletes on math SATs. Do you think this will be seen as an argument for heredity or environment, or is it neutral?

Cover story.
Parade Magazine, 9 August 1997, p.1
Marilyn vos Savant

Marilyn is featured on the cover of the August 9, 1997 issue of Parade Magazine. Here she poses the question:

Your dog has a litter of four. Is it most likely that two are males and two are females?

In her column she gives the answer:

Nope! The most likely split is three males and one female, or three females and one male. The same is true for families with four children. They're more likely to have three boys and a girl, or three girls and a boy.


What do you think of her answer to this question?
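
One way to examine an answer like this is to list the possibilities. The sketch below assumes that each pup is independently male or female with probability 1/2 (an idealization that the column does not state) and tabulates the chance of each split in a litter of four. The function name split_probabilities is ours, chosen only for illustration.

```python
from fractions import Fraction
from math import comb

def split_probabilities(n=4):
    """Return {(larger, smaller): probability} for unordered splits by sex.

    Assumes each of the n offspring is independently male or female
    with probability 1/2 (an idealization, not a claim from the column).
    """
    probs = {}
    for males in range(n + 1):
        females = n - males
        # Treat "3 males, 1 female" and "1 male, 3 females" as the same split.
        split = (max(males, females), min(males, females))
        probs[split] = probs.get(split, Fraction(0)) + Fraction(comb(n, males), 2 ** n)
    return probs

if __name__ == "__main__":
    for split, p in sorted(split_probabilities(4).items()):
        print(f"{split[0]}-{split[1]} split: {p}")
    # 2-2 split: 3/8
    # 3-1 split: 1/2
    # 4-0 split: 1/8
```

Under these assumptions a 3-1 split in either direction has probability 1/2 and a 2-2 split has probability 3/8. Note, however, that "three males and one female" taken by itself has probability only 1/4, which is less than 3/8, so the conclusion depends on whether the two 3-1 outcomes are counted together.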

From our readers:

Marilyn vos Savant came in for some criticism from our readers. Jeff Simonoff writes:

This week's Ask Marilyn column (Parade Magazine, 27 March 1997, p. 6) gives more attention to the boy/girl problem discussed earlier in Chance News 6.05 (and earlier). She gives more examples of outraged response to her answer of 1/3, and then proposes to "prove" which answer is correct. May I make a few comments?

(1) I know that in earlier issues of Chance News you've argued that the question is ill-posed. I see your point in general, but I have to say that I don't really agree for this particular case. The question, as posed, was as follows:

A woman and a man (unrelated) each have two children. At least one of the woman's children is a boy, and the man's older child is a boy. Do the chances that the woman has two boys equal the chances that the man has two boys?

To me, at least, it is clear that for the man, by specifying the gender of the older child we have fixed an ordering, and hence the probability is 1/2 (MM out of MM or FM). In the woman's case, by not specifying which child is male, we have not fixed an ordering, and hence the probability is 1/3 (MM out of MM, FM, or MF). This is akin to your example from Chance 6.05 of flipping two coins, being told that one was heads and asking what the probability was that both were heads. This is not the same as being asked, given that one coin is heads, what the probability is that the other coin was heads.

Gee, after going through it like that, maybe I understand the confusion a little better!
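
The counting argument in comment (1) can be sketched in a few lines. The code assumes that the four (older, younger) families are equally likely and that the stated information simply screens out the families that do not fit it; both are modeling assumptions of ours, and the second is exactly the point on which the question can be considered ill-posed. The names FAMILIES and prob_two_boys are hypothetical.

```python
from fractions import Fraction
from itertools import product

# All (older, younger) families, assumed equally likely:
# independent births with P(boy) = 1/2 (an idealization).
FAMILIES = list(product("MF", repeat=2))

def prob_two_boys(condition):
    """P(both boys | condition), by counting equally likely families."""
    eligible = [f for f in FAMILIES if condition(f)]
    both = [f for f in eligible if f == ("M", "M")]
    return Fraction(len(both), len(eligible))

# The man: his older child is a boy.
man = prob_two_boys(lambda f: f[0] == "M")
# The woman: at least one of her children is a boy.
woman = prob_two_boys(lambda f: "M" in f)

if __name__ == "__main__":
    print("man:", man)      # man: 1/2
    print("woman:", woman)  # woman: 1/3
```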

(2) In the recent column, Marilyn tries to "prove" which answer is right in order to settle a bet with a reader. Here is what she proposes:

Readers, here's how you can help prove which answer about the woman is correct. To my women readers: If you have exactly two children (no more), and at least one of them is a boy (either child or both of them), write -- or send e-mail -- and tell me the sex of both of your children. Don't consider their ages.

In other words, it's fine to write if your older child is a boy and your younger child is a girl. It's also fine to write if your older child is a girl and your younger child is a boy. And it's fine to write if both your children are boys. I need to hear from all of you (but only if you have two children and no more).

We'll publish the results in an upcoming column.

From this description, it's obvious that Marilyn is thinking of the unspecified ordering version of the problem, with probability 1/3. Unfortunately, she has done statisticians a grave disservice by proposing to "prove" herself correct this way. This will give her readers the impression that self-selected polls of this type have some scientific validity, which they of course do not (saying "I need to hear from all of you" doesn't get her off the hook, either; she will obviously NOT hear from all of them). The problem of abuse of these self-selected polls is a lot more serious than the problem of people having trouble with conditional probability, I would say, so that on balance her proposal to "prove" the probability question has done more harm than good.


(1) We think that Jeff is right in his comments under (1). Do you?

(2) In which direction would you expect Marilyn's self-selected poll to be biased?

And Domenico Rosa sent us the following item about Marilyn's treatment of the boy/girl problem that he posted on the math-teach listserve, whose archives are available at

The July 27, 1997 issue of Parade Magazine (p. 6), which is carried by many Sunday newspapers, provides another glaring example of the highly unethical conduct of Marilyn vos Savant.

Vos Savant's latest column contains the sixth installment of the second-sibling paradox, which was discussed in her columns of March 30, 1997 (p. 16), December 1, 1996 (p. 19), and May 26, 1996 (p. 17). This problem, involving two baby beagles instead of two children, had appeared originally in her columns of October 13, 1991 (p. 24), with a follow-up on January 5, 1992 (p. 22).

It is unfortunate that vos Savant keeps recycling probability paradoxes without informing readers about any references. Some members of this listserve may be familiar with the publicity that vos Savant received in 1991 over the game-show problem involving a car and two goats.

Following this controversy, Ed Barbeau prepared two lengthy lists of references to the handful of paradoxes that are used to teach the concept of conditional probability. These references were published in The College Mathematics Journal (March 1993, pp. 149-154; March 1995, pp. 132-134).

One of the earliest discussions, by Martin Gardner, of the second-sibling paradox appeared in Scientific American (October 1959, p. 180).

In recent years, vos Savant has used three full-page columns to promote her flim-flam books. In view of this fact, her ongoing failure to provide her readers with appropriate references is highly unethical.


(1) If you picked at random a beginning probability book published after 1993, what would you estimate for the probability that it would discuss the Monty Hall problem?

(2) Estimate the probability that if the book you chose did discuss the Monty Hall problem it would have a reasonable history of the problem.

We had two responses to our article relating to girls' social graces. From Peter Doyle we heard:

Dear CHANCE News,

Martha and I were astonished when we saw the Times article claiming, `Parental origin of chromosome may determine social graces'. The report in Chance News confirms what one would immediately suspect: This conclusion is based on a ludicrously small sample. In fact, the study relies on behavior differences between 55 girls in group A and 28 girls in group B. The researchers apparently consider it significant that 40% of group A (i.e. 22 girls) did such-and-such, as opposed to only 16% of group B (i.e. almost exactly 4.5 girls).

The whole scenario here sounds like a made-up example of abuse of statistics. Some jerks come up with a fundamentally improbable conclusion (an X chromosome "imprinted" to behave differently depending on whether it comes from the father or the mother), and lo and behold, when you look into the matter, their conclusion turns out to be based on a pathetically small sample. This is the kind of story that should have been laughed out of (in chronological order) Nature, the Times, and Chance News. Maybe you hoped that readers would do the laughing for themselves? If so, I think it would have been safer to stick with a single Discussion Question: `What's to discuss?'
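
Readers who want to check the arithmetic in this letter, and to see what a test on counts this small would say, might start from the sketch below. The figures 55, 28, 40% and 16% come from the letter; because 16% of 28 is not a whole number, the sketch tries both 4 and 5 girls in group B. The one-sided Fisher exact test is our choice for illustration and is not necessarily the analysis the researchers used. The function name fisher_one_sided is hypothetical.

```python
from math import comb

def fisher_one_sided(a, n1, b, n2):
    """One-sided Fisher exact p-value for a 2x2 table.

    a of the n1 girls in group A and b of the n2 girls in group B
    showed the behavior. Returns the hypergeometric probability that
    group A's count is a or larger, given the table margins.
    """
    successes = a + b
    total = n1 + n2
    tail = sum(
        comb(successes, k) * comb(total - successes, n1 - k)
        for k in range(a, min(n1, successes) + 1)
    )
    return tail / comb(total, n1)

group_a_count = round(0.40 * 55)   # 22 girls
group_b_exact = 0.16 * 28          # 4.48, not a whole number

if __name__ == "__main__":
    print("group A:", group_a_count, " group B:", group_b_exact)
    for b in (4, 5):               # both plausible roundings of 4.48
        print(f"b = {b}: one-sided p = {fisher_one_sided(group_a_count, 55, b, 28):.4f}")
```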



Closer to home we had the following response to this article:

Dear Chance News:

How wonderful you have included an article trying to define social graces of young girls. I am glad we are concerned with why certain girls are better at "making friends, have more awareness of others' feelings, and have better relationships with teachers and families." Thank you for examining such an important and relevant study for the female gender.

Yours, delicately and sincerely, Paige Snell


Rodger Pinkham wrote about the problem of finding the number of people you need to sample to have a 99% chance of getting all 365 birthdays. He observed that not all birthdays are equally likely and that, while this helps in the classical birthday problem, it hurts in our problem. In particular, he was worried that Dartmouth might not be admitting any Geminis and that Laurie would lose his bet, in which he gave 10 to 1 odds that all birthdays would be represented in the current Dartmouth student body. Fortunately, Laurie won his bet. This was despite the fact that there were only 3226 students in the Registrar's records, because the seniors had left and the freshmen had not yet arrived. There were two students with birthdays on 29 February.
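
For the idealized version of this problem, the chance that n people cover every birthday can be computed exactly by inclusion-exclusion over the days that are missed. The sketch below assumes 365 equally likely birthdays and ignores 29 February; as Pinkham points out, real birthdays are not uniform, so the true coverage probability is lower than this calculation suggests. The function names are ours.

```python
from math import comb

DAYS = 365  # assumption: 365 equally likely birthdays, 29 February ignored

def prob_all_days_covered(n, days=DAYS):
    """Exact P(every day is someone's birthday) among n people.

    Inclusion-exclusion over the set of missed days, carried out in
    integer arithmetic to avoid floating-point cancellation.
    """
    favorable = sum(
        (-1) ** k * comb(days, k) * (days - k) ** n
        for k in range(days + 1)
    )
    return favorable / days ** n

def smallest_group(target=0.99, days=DAYS):
    """Smallest n whose coverage probability reaches target."""
    lo, hi = days, days
    # Double hi until the target is reached, then binary search;
    # this works because the probability never decreases as n grows.
    while prob_all_days_covered(hi, days) < target:
        lo, hi = hi, 2 * hi
    while lo < hi:
        mid = (lo + hi) // 2
        if prob_all_days_covered(mid, days) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo

if __name__ == "__main__":
    print("P(all covered) with 3226 students:", prob_all_days_covered(3226))
    print("people needed for a 99% chance:", smallest_group(0.99))
```

The first printed value can be compared with the 10 to 1 odds Laurie offered, which break even at a probability of 10/11.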

Gary King wrote about our New York Times article on his book: A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data (Princeton University Press).

You mentioned, correctly of course, that readers aren't told in the article what my method is. Unfortunately, the reason I wrote a book is that I couldn't figure out how to explain the method fully in less than 350 pages! So I can't offer a two-paragraph summary that would be useful for students. However, I can offer a few related discussion questions on aggregation.

The ecological inference problem was originally raised in the first multivariate analysis of politics published in a political science journal. In 1919, William Ogburn and Inez Goltra recognized that making inferences about individual behavior from aggregate data, with the methods then available, could lead to inaccurate conclusions. Their data are no longer available, but in my book, I reconstruct the following hypothetical example from their verbal descriptions:

Consider two equal-sized precincts voting on Proposition 22, an initiative by the radical `People's Power League' to institute proportional representation in Oregon Legislative Assembly elections: 40% of voters in precinct 1 are women and 40% of all voters in this precinct oppose the referenda. In Precinct 2, 60% of voters are women and 60% of the precinct opposes the referenda. Precinct 2 has more women and is more opposed to the referenda than precinct 1, and so it certainly seems that women are opposing the proportional representation reform. (p.3-4)


(1) Can we be sure that women oppose the reform? To answer this question, compute the minimum and maximum fraction of women who could have opposed the referenda, and the minimum and maximum fraction of men who could have opposed it.

(2) How might our goal of learning about individual voting behavior be aided by having the data come from two separate precincts rather than one larger geographic area? Would precincts with different fractions of women and support for the reform have helped our cause? What types of information do we learn from each precinct separately as compared to the patterns in observed data across the precincts?

(3) Given the secret ballot, and given that surveys are not possible 75 years after the fact, what information might we collect to learn more about whether men or women supported the referenda at higher rates?

More information on the book and free software to implement the method is available at http://GKing.Harvard.Edu.
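
The minimum and maximum fractions asked for in discussion question (1) follow from the precinct totals alone. The sketch below computes these logical bounds for each precinct separately; it is only the elementary bounds calculation, not the method developed in King's book, and the function name opposition_bounds is hypothetical. It assumes each precinct contains both women and men.

```python
from fractions import Fraction

def opposition_bounds(frac_women, frac_opposed):
    """Logical bounds on the share of women, and of men, who opposed.

    Uses only the precinct totals. Opposed women cannot outnumber the
    women or the opponents, and must account for any opponents beyond
    the number of men. Assumes 0 < frac_women < 1.
    """
    x, t = Fraction(frac_women), Fraction(frac_opposed)
    zero, one = Fraction(0), Fraction(1)
    women = (max(zero, (t - (1 - x)) / x), min(one, t / x))
    men = (max(zero, (t - x) / (1 - x)), min(one, t / (1 - x)))
    return women, men

# Precinct 1: 40% women, 40% opposed. Precinct 2: 60% women, 60% opposed.
precinct_1 = opposition_bounds(Fraction(2, 5), Fraction(2, 5))
precinct_2 = opposition_bounds(Fraction(3, 5), Fraction(3, 5))

if __name__ == "__main__":
    for name, (women, men) in (("precinct 1", precinct_1), ("precinct 2", precinct_2)):
        print(f"{name}: women {women[0]} to {women[1]}, men {men[0]} to {men[1]}")
    # precinct 1: women 0 to 1, men 0 to 2/3
    # precinct 2: women 1/3 to 1, men 0 to 1
```

In precinct 1 the data are consistent with anywhere from none to all of the women opposing the reform, which shows how little the aggregate figures pin down by themselves.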

Richard Brucker raised the question of the relation between intuition and chance. He writes:

For some years I've been researching gambling processes whose outcomes are dependent on chance causes.

Consider this: we know that some bettors come to the casino craps table proclaiming their intuition that they're going to "make a million"; they have "gut feelings" they're going to win, etc.

However, the output of casino craps is chance-oriented due to the random fall of the dice. So, by definition I theorize that intuition is not applicable to any process whose output is based solely on chance.

I believe intuition works best with information not consciously available, that may have been stored in the past or acquired through subliminal or other non-sensory means.


(1) A gambler through years of playing craps or blackjack gets a feeling for what the chances are and what bets to make. Would you consider this intuition?

(2) Your friendly weather predictor starts with an objective probability for rain provided by the National Weather Service, combines it with his or her own local knowledge and anything else he or she wants to consider, and comes up with a subjective probability for rain. Would you expect intuition to play a role in this subjective probability?


Please send comments and suggestions to jlsnell@dartmouth.edu.

