Prepared by J. Laurie Snell and Bill Peterson, with help from Fuxing Hou, and Joan Snell, as part of the Chance Course Project supported by the National Science Foundation.
Please send comments and suggestions for articles to
Back issues of Chance News and other materials for teaching a Chance course are available from the Chance web site:
Statistics is alien to everyday concerns and of little use for judging individual persons.
Heads or tails, boy or girl, the odds are the same. But earlier this month, Nicholas H. Noyes hospital in Dansville had an unusual run of births: a string of 12 straight girls. According to Nurse Manager Amy Nasca, it wasn't something she and her staff were actively keeping track of, but she said it was very interesting to watch such an unusual trend develop. "We didn't realize we had so many girls in a row until we actually counted them up," said Nasca. "It would have been fun to make it to 13 in a row, but what we were actually more concerned about was having healthy babies be born." With the normal odds running at 50-50 to have either a boy or girl, the odds of having 12 of either sex in a row are 4,096-to-1. But according to Nasca, things have a way of evening out. "Earlier this year, a string of 11 baby boys were born," she said. "(But) it seems by the end of the year our statistics always even out." According to hospital statistics, of the 168 babies born so far this year, 80 have been girls. "We do have times where we will have several boys or girls in a row," said Nasca. "It's fun, but we are really just happy to have healthy babies born here."

DISCUSSION QUESTIONS:
(1) Is the statement "the odds of having 12 of either sex in a row are 4,096-to-1" accurate?
(2) What is the chance that, of the 168 babies born so far this year, 80 are girls?
(3) A student answers question 2 by: the probability is 1 since it has already happened. Is she correct?
(4) Many people believe that families "tend to have boys" or "tend to have girls", so the coin tossing model would not be appropriate. Do you think there is any basis for this?
(5) About how many times do you think you would have to toss a coin to get 12 heads in a row?
(6) A recent article in the New York Times (Students' test scores show slow but steady gains at Nation's Schools, Sept 3, 1997, B8, Peter Applebome) tries to convince us that the slow but steady improvement of scores is significant. In discussing the A.C.T. scores we read:
Scores announced in August rose for the fourth time in the last five years, only the second time since A.C.T. scores were first reported in 1960 that the national average increased four times in five years.

Model the ups and downs of the A.C.T. averages in the last 37 years as 37 tosses of a fair coin. The current streak in the average A.C.T. scores corresponds to a streak of 4 or 5 heads in five consecutive tosses. Let's find the chance that two such streaks occur in 37 tosses. Such a streak could begin at any one of the first 33 tosses. What is the probability that a streak begins on a particular toss? What is the expected number of such streaks in 37 tosses of a coin? Do you think it is unlikely that 2 or more such streaks would occur in 37 tosses?
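Readers who want to check their answers empirically can simulate the coin-tossing model. The sketch below is our own; in particular, counting every five-toss window containing at least four heads as a "streak" is one reading of the question.

```python
import random

def streak_windows(n=37):
    """Toss a fair coin n times; count the 5-toss windows with 4 or 5 heads."""
    tosses = [random.randint(0, 1) for _ in range(n)]  # 1 = heads
    return sum(1 for i in range(n - 4) if sum(tosses[i:i + 5]) >= 4)

random.seed(1)
trials = 20_000
counts = [streak_windows() for _ in range(trials)]

# Each of the 33 windows contains 4+ heads with probability 6/32, so by
# linearity the expected number of such windows is 33 * 6/32, about 6.2.
print(sum(counts) / trials)

# The fraction of simulated 37-toss sequences with 2 or more such windows.
print(sum(1 for c in counts if c >= 2) / trials)
```

The simulation suggests that two or more such streaks in 37 tosses is not unusual at all, which bears on how impressed we should be by the A.C.T. record.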
There was an interesting report from The Associated Press last week that might be useful in a Chance class. It appeared in the Rochester (NY) Democrat and Chronicle on 10 Sept. 1997.
The premise of the article was that women who bear a child after the age of 40 are more likely to live to the age of 100 than women who do not have a child at that age.
The next to last paragraph makes the following, truly remarkable, statement:
About 19 percent of the centenarians had given birth after their 40th birthdays, versus just 5.5 percent for the other group. Analysis found that women with the late births were four times more likely to live to 100 than 73.
I'm not quite sure what they were trying to say, but our class got a kick out of it.
The study referred to is reported in the Sept. 11 issue of Nature. Evidently the researchers considered 78 women who were born around 1896 and lived at least 100 years and 54 who were born in 1896 and died in 1969 at age 73. They indeed found that 19% of the centenarians gave birth after age 40, compared to 5.5% in the group that died at age 73.

DISCUSSION QUESTION:

Can you conclude from this that women who bear a child after the age of 40 are 4 times as likely to live to 100 as those who do not have children after 40?
The psychology of good judgment.
Medical Decision Making, 1996, vol. 16, no.3, pp. 273-280
The fact that a test for HIV can appear to be extremely accurate and yet a person in a low-risk group who tests positive can have only a 10% chance of having the virus has been considered paradoxical and, indeed, used in elementary probability and statistics classes to show the need for understanding conditional probability. Unfortunately, such examples, as well as other conditional probability paradoxes such as the infamous Monty Hall problem, have led some influential statistics educators to conclude that conditional probability is just too hard to teach in a first statistics course.
However, as these articles point out, people's lives may depend on their understanding these tests. For example, it has been claimed that positive tests for AIDS have led to suicides. Thus it would seem that we should not give up on getting future doctors and other health workers to understand how to interpret the results of medical testing.
Gigerenzer argues that physicians and their patients will better understand the chance of a false positive result if we replace the conventional conditional probability analysis by an equivalent frequency method. The success of this method is illustrated in terms of an experiment that Gigerenzer and his colleague Ulrich Hoffrage carried out by asking 48 physicians in Munich to answer questions relating to four different medical-diagnosis problems. For the four questions given to each physician, two were given using the probability format and two using the frequency format. These two formats refer to whether the information is given in terms of probabilities or frequencies.
One of the four diagnostic problems was a mammography problem stated as follows:
To facilitate early detection of breast cancer, women are encouraged from a particular age on to participate at regular intervals in routine screening, even if they have no obvious symptoms. Imagine you conduct in a certain region such a breast cancer screening using mammography. For symptom-free women aged 40 to 50 who participate in screening using mammography, the following information is available for this region:
The probability that one of these women has breast cancer is 1%. If a woman has breast cancer, the probability is 80% that she will have a positive mammography test. If a woman does not have breast cancer, the probability is 10% that she will still have a positive mammography test. Imagine a woman (aged 40 to 50, no symptoms) who has a positive mammography test in your breast cancer screening. What is the probability that she actually has breast cancer?_____%
Ten out of every 1,000 women have breast cancer. Of these 10 women with breast cancer, 8 will have a positive mammography test. Of the remaining 990 women without breast cancer, 99 will still have a positive mammography test. Imagine a sample of women (aged 40 to 50, no symptoms) who have positive mammography tests in your breast cancer screening. How many of these women do actually have breast cancer?____out of______
In a classic study by D. M. Eddy, essentially this same question, with just the probability format, was given to 100 physicians; 95 of them gave an answer of approximately 75% instead of the correct answer, which in Eddy's version of the problem is 7.8%. (The study is reprinted in: Dowie J, Elstein A (eds), "Professional Judgment: A Reader in Clinical Decision Making," Cambridge University Press, 1988, pp. 45-59.)
In their study, Gigerenzer and Hoffrage found that, when the information was presented in the probability format, only 10% of the physicians reasoned with the Bayes computation

(.01)(.80) / ((.01)(.80) + (.99)(.10)) = .075.
For the group given the frequency format, 46% employed the simpler Bayes calculation 8/(8 + 99) = 8/107, which is the same .075.
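The two formats are arithmetically equivalent, as a few lines of Python using the screening problem's figures (1% base rate, 80% sensitivity, 10% false positive rate) make explicit:

```python
# Probability format: Bayes' rule applied directly.
p_cancer = 0.01              # base rate of breast cancer
p_pos_given_cancer = 0.80    # sensitivity of the test
p_pos_given_healthy = 0.10   # false positive rate
p_cancer_given_pos = (p_cancer * p_pos_given_cancer) / (
    p_cancer * p_pos_given_cancer + (1 - p_cancer) * p_pos_given_healthy)

# Frequency format: count the cases in a population of 1,000 women.
with_cancer_pos = 8    # of the 10 women with cancer, 8 test positive
healthy_pos = 99       # of the 990 women without cancer, 99 test positive
freq_answer = with_cancer_pos / (with_cancer_pos + healthy_pos)

print(round(p_cancer_given_pos, 3))  # 0.075
print(round(freq_answer, 3))         # 0.075
```

The point of Gigerenzer's argument is that although both routes give the same 7.5%, the second is the one physicians can actually carry out in their heads.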
The article discusses some of the reactions of the physicians to even considering such problems. Here are some quotes:
On such a basis one can't make a diagnosis. Statistical information is one big lie.
I never inform my patients about statistical data. I would tell the patient that mammography is not so exact, and I would in any case perform a biopsy.
Oh, what nonsense. I can't do it. You should test my daughter. She studies medicine.
Statistics is alien to everyday concerns and of little use for judging individual persons.
Some doctors commented that getting the answer in the frequency form was a piece of cake (or some equivalent phrase). A more detailed analysis of this kind of study can be found in the article "How to improve Bayesian reasoning without instruction: Frequency formats" by Gigerenzer & Hoffrage (1995), Psychological Review, 102, 684-704.
Gerd Gigerenzer, Ulrich Hoffrage, and Axel Ebert have done another
study, on how counselors do in answering questions about HIV
testing, that will appear in the journal AIDS Care. We will
review this when it comes out.
Is using a car phone like driving drunk?
Chance Magazine, Spring 1997, pp.5-9
Donald A. Redelmeier and Robert J. Tibshirani
The authors discuss a study they carried out which was reported in the New England Journal of Medicine (Feb 17, 1997) and discussed in Chance News 6.03. This article discusses interesting aspects of the study that would not appear in a technical article. For example, they were cautioned by friends about carrying out the study since it could affect large companies' sales. They point out that the cellular phone companies in North America have significantly greater daily revenues than Microsoft.
The design the authors used for their study, called the case cross-over design, is relatively new. It is a case control method where the controls are the same people as the cases. The authors considered 699 drivers who had had an accident. They compared the proportion of those who used their phones in the ten-minute period before their accidents (24%) with the proportion of those who used them while driving during the same time period the day before the accident (5%). Summary statistics led to a relative risk of 6.5 for using a phone while driving. The authors explain why they rejected the use of more standard methods that had been used in previous studies and which, they felt, led to biased results.
They also discuss some issues involved in the media attention that the study received. They provide a cartoon from the Philadelphia Inquirer, suggesting that the danger of driving with a telephone should be compared to driving while drunk. Most writers included a statement similar to that of Gina Kolata in her article about the study in the Times. Referring to the risk of driving while talking on the telephone she writes:
Their paper, published today in The New England Journal of Medicine, said it was the same risk as when a person's blood alcohol level was at the legal limit.

The authors did say in their article "the relative risk is similar to the hazard associated with driving with a blood alcohol level at the legal limit." However, they point out in this article that the dangers of alcohol are, in fact, quite a bit larger. For example, a drunk driver's alcohol content may be significantly above the amount required to be legally drunk. Also, the effects of alcohol are likely to last significantly longer than the time the driver is on the phone.
The authors remark that the fact that cellular phone calls tend to be brief and infrequent accounts for the lack of a dramatic increase in the number of accidents at a time when the use of cellular phones increased rapidly. In addition, there are some benefits of cellular phones, for example, in reporting an emergency.
The authors also say that newspaper reporters wanted them to give their opinion on regulating the use of cellular phones while driving, which the authors did not feel was their field of expertise.
This is a great article from which to get some additional insight into what goes on in carrying out a study, especially a study that receives media attention.
DISCUSSION QUESTIONS:

(1) Do you think the authors should be surprised at the media's interpretation of their statement in the NEJM article that compared the risk of using a cellular phone while driving to that of having a blood alcohol level corresponding to being legally drunk? Why do you think the authors made the comparison in their NEJM article?
(2) Writing about an earlier study carried out in 1978, the authors say: "This survey of 498 individuals found that the overall frequency of traffic violations was marginally lower among the mobile telephone subscribers than among members of the general public (11% vs. 12%)." Why do you think the authors were suspicious of the results of this survey?
(3) Writing about another study carried out in 1985, the authors say: "This study of 305 individuals found a significantly lower collision rate in the year following the purchase of a cellular telephone (8.2% vs. 6.6%)." They were "impressed" by this study but also "worried". Why?
Norton Starr, a frequent contributor to Chance News, has a fascinating story in the latest issue of "Statistics Education" on the 1970 draft lottery.
Nonrandom risk: The 1970 draft lottery.
Journal of Statistics Education, Vol. 5 No. 2 July 1997
In this article Norton reviews the history of the discovery of non-randomness in the 1970 draft lottery, provides the data, and discusses how he and others have used this dataset in classes.
We recommend that you read Norton's interesting article but, since it is available electronically, we thought it would be more fun to review the 1970 New York Times article that broke the story and include, as discussion questions, the discussion questions that Norton suggested in his article.
Readers of the article referred Norton to two other web sites
where this data is discussed:
Michael Friendly: Exploratory and Graphical Methods of Data Analysis
President Richard Nixon signed an executive order on Nov. 26 to "establish a random selection sequence" for induction. The order stipulated that the lottery would be based on birthdays but did not say how the dates should be chosen.
After a staff meeting it was decided that the 366 dates of a year should be placed in capsules and then be drawn one by one from a large bowl. A man's draft number would then correspond to the order in which his birthday was drawn. For example, Sept. 14 was the first date drawn and June 8 was the last. Thus a man with birthday Sept. 14 would have draft number 1 and someone born June 8 would have draft number 366. Pentagon manpower specialists believed that those in the last third of the numbers (200 to 366) would escape the draft entirely.
A knowledgeable White House official said this week that "discussions that the lottery was not random are purely speculative." He added that there was no possibility of another drawing.
Senator Edward Kennedy asked the National Academy of Sciences last month to analyze the "apparent lack of randomness" in the selection. The Academy has not yet decided whether to do this.
The challenge to the randomness is being brought by Mr. Stodosky, a 24-year-old doctoral student in computer planning. The challenge is based on the average draft numbers for the men in the lottery for each month. If the system were random, each month could be expected to average around 183 or 184. Each of the first six months averaged above this and each of the last six months averaged below it. Statisticians who have studied the lottery say that this could occur if the capsules with later months were not mixed as thoroughly as those with early months.
Two graduate students at the University of Wisconsin have estimated that the odds against obtaining the results of the drawings by a truly random process are 50,000 to 1. Other statisticians arrived at similar results.
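The article does not say which statistic the Wisconsin students used, but a Monte Carlo check of how surprising the monthly pattern is can be sketched as follows. This simulation (our own) tests only the "first six months high, last six low" pattern reported above, not the students' particular calculation:

```python
import random

# Days per month in a leap year: the 1970 lottery used all 366 birthdays.
MONTH_LENGTHS = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def monthly_averages(draft_numbers):
    """Average draft number for each month, numbers listed in calendar order."""
    averages, start = [], 0
    for length in MONTH_LENGTHS:
        averages.append(sum(draft_numbers[start:start + length]) / length)
        start += length
    return averages

def pattern_observed(averages, overall=183.5):
    """First six months all above the overall mean, last six all below it."""
    return (all(a > overall for a in averages[:6]) and
            all(a < overall for a in averages[6:]))

random.seed(1970)
trials = 50_000
hits = 0
for _ in range(trials):
    numbers = list(range(1, 367))
    random.shuffle(numbers)  # a truly random lottery
    if pattern_observed(monthly_averages(numbers)):
        hits += 1

# The fraction of random lotteries showing the 1970 pattern is very small.
print(hits / trials)
```

A truly random drawing almost never produces the clean monthly split seen in 1970, which is consistent with the long odds the students reported.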
The article states:

Statisticians usually work on the principle that a random test should produce results that occur at least once in 20 times under the laws of probability. If the results occur less frequently, then the statisticians conclude that some causative factor was involved.

The lottery was set up over the weekend before the Dec. 1 drawing by Capt. Pascoe and Col. Charles R. Fox, under the observation of John H. Adams, an editor of U.S. News & World Report.
The article provides a detailed description of how they put the capsules in the box and mixed them up and finally put them in a two-foot-deep bowl for the public drawing. The persons who drew the capsules generally picked ones from the top, although once in a while they would reach into the middle or the bottom of the bowl.
In his article Starr comments:
It is not widely known that there was a second drawing on December 1, 1969, held to rank the twenty-six letters of the alphabet. "The order of selection from among men born on the same date would be determined by the order in which the first letters of their last, first and middle names were drawn."

DISCUSSION QUESTIONS (These are from Norton Starr's article.)
(1) Assuming males are born with equal likelihood throughout the year, was the lottery really necessary? Or was it carried out largely to convey a sense of fairness in an essentially stochastic context where the stakes happened to be very serious?
(2) Should attention have been given to twins, triplets, and other siblings to avoid multiple impacts on a given family?
(3) Were those born on February 29 treated unfairly in the 1970 lottery?
(4) The alphabetic data for 1970 seem, to the naked eye, to have their own significant bias: only three letters from the first half of the alphabet were among the first thirteen chosen. Is this apparent lack of randomness statistically significant? If so, is it of practical significance? If the answer to the latter is "no," then why was a permutation of the alphabet used in the first place?
(5) How should men lacking a middle or even a first name be
handled? (This is a real-life missing data issue! Students might
be encouraged to find out what was actually done.)
How many people were here before Columbus?
US News & World Report, 25 August 1997, pp. 68-70
It is generally agreed that the populations native to North and South America declined drastically after the arrival of Columbus. But estimates of the actual size of the pre-Columbian populations vary widely. George Catlin, a 19th-century artist who painted nearly 600 portraits and other scenes of Indian life, wrote in his diary that in the time before settlers arrived the large tribes totaled "16 millions in numbers." On the other hand, the US Census Bureau warned in 1894 against believing Indian "legends," and claimed that investigations showed "the aboriginal population at the beginning of the Columbian period could not have exceeded much over 500,000."
Although the question is still unsettled, modern estimates are more in line with Catlin. One view holds that, due to lack of natural immunity, most of the native population was wiped out by epidemics of smallpox and measles carried from Europe. Anthropologist Henry Dobyns argues that disease resulted in a 95% reduction in native population. Assuming that the population north of the Rio Grande had bottomed out when the Census Bureau made its 500,000 estimate, Dobyns multiplies by a factor of 20 to arrive at pre-disease estimates in the 10,000,000 range. (By comparison, this is twice as many as lived in the British Isles at the time.)
Other new estimates are based on a painstaking study of documents such as Spanish reports of baptisms, marriages and tax collection. New methods of inference include adjusting reports of explorers, who tended to report only the number of warriors. Such figures are now multiplied by factors to account for women, children and elderly men. Archaeological data have been used to estimate amounts of foods, such as oysters, consumed as a basis for population estimates.
Can we ever know the real numbers? Historian William Borah is quoted as predicting that, with decades of careful research, scholars may eventually produce an estimate with a margin of error of 30-50%.
What does Borah's "margin of error" mean? How do you suppose he
might have arrived at his value for it?
Truly, madly, randomly.
New Scientist, 23 August 1997, pp. 32-35
This article compares the Chaitin-Kolmogorov complexity definition of randomness with the more recent approximate entropy definition of randomness given by Steve Pincus and discussed in Chance News 6.07. The author thinks that the complexity definition is not useful for practical work but that the approximate entropy definition is. He describes some practical applications of approximate entropy that Pincus has been exploring.
Every year in the US some 7000 apparently healthy babies suddenly die in their sleep (sudden infant death syndrome, or SIDS). Many of these deaths could be prevented if help arrived in time, but the problem is to recognize when an infant is at risk for SIDS. Casti states that doctors believe SIDS occurs when an infant's heartbeat descends into a pattern of regularity. A healthy heart beats in a complex, irregular rhythm as it responds to signals from the brain, muscles and digestive system. Pincus is working with medical researchers to see if his measure of randomness can be used to prevent SIDS. The idea would be to screen infants for a tendency to show episodes of extreme regularity and to alert medical personnel or parents. They tested this by looking at individuals who, because of non-fatal SIDS episodes earlier in their lives, are known to be subject to the condition. Compared with those of normal infants, the heartbeats of the SIDS infants frequently lapsed into periods of enhanced regularity, as measured by low approximate entropy.
In another application, Pincus has worked with medical researchers to study the variation of testosterone levels in healthy men of various ages. Since testosterone is associated with sexual desire, researchers wonder why many men lose their sexual drive as they age, while their level of testosterone does not decrease. One theory is that the level of testosterone fluctuates more randomly as a man ages and it is this that accounts for the decrease in sexual drive. Again, a measure of randomness might be used to detect the increase in randomness in the level of testosterone. And finally there is always the question of the randomness in the stock market to explore.
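Approximate entropy itself can be computed in a few lines. The sketch below is our own implementation of Pincus's definition, with the conventional choices of pattern length m = 2 and a tolerance r; it illustrates why a regular signal scores near zero while an irregular one scores higher:

```python
import math
import random

def approx_entropy(series, m=2, r=0.2):
    """Approximate entropy (ApEn) of a time series, after Pincus.
    m is the pattern length; r is the similarity tolerance."""
    n = len(series)

    def phi(m):
        # All length-m windows of the series.
        windows = [series[i:i + m] for i in range(n - m + 1)]
        total = 0.0
        for w1 in windows:
            # Fraction of windows within tolerance r (Chebyshev distance);
            # each window matches itself, so the count is never zero.
            c = sum(1 for w2 in windows
                    if max(abs(a - b) for a, b in zip(w1, w2)) <= r)
            total += math.log(c / (n - m + 1))
        return total / (n - m + 1)

    return phi(m) - phi(m + 1)

random.seed(0)
regular = [i % 2 for i in range(100)]              # perfectly periodic signal
irregular = [random.random() for _ in range(100)]  # noisy signal
print(approx_entropy(regular))    # near 0: highly regular
print(approx_entropy(irregular))  # larger: more irregular
```

A low ApEn score flags exactly the kind of "enhanced regularity" described above, whether in heartbeats or hormone levels.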
The digits of pi seem pretty random by Pincus's definition though
not at all random by the complexity definition. Does the fact that
the definitions are so different suggest that one might be good
for medical applications and the other not, or do you think any
reasonable definition of randomness would do, and it is just a
question of which can most practically be implemented?
The next two articles seem to go well together. The first article was suggested by Dan Rockmore.
Keeping score: big social changes revive the false god of numbers.
The New York Times, 17 Aug. 1997, 4-1
John M. Broder
It was recently announced that there had been a 1.4-million-person drop in welfare rolls nationwide over the past year. This led President Clinton to state: "I think it's fair to say the debate is over. We know now that welfare reform works."
This article discusses why it is difficult to draw such conclusions from a single number about complex political issues. It is pointed out that there are obviously many possible explanations for the drop in welfare rolls, some attributable to government policy and many others wholly unrelated. Welfare expert Wendell Primus remarked: "Those figures do not tell how many former recipients moved from welfare to work, or simply from dependency to despondency."
Bruce Levin, a statistician at Columbia University, remarked: "This is the glory and the curse of the one-number summary. You take a hundred-dimensional problem like welfare reform and reduce it to a number."
Robert Reischauer of the Brookings Institution said: "We live in a society where political evaluations have to fit into a sound bite."
Reischauer mentioned the following example. According to a 1996 study, the average one-way commuting time lengthened by 40 seconds between 1986 and 1996, to 22.4 minutes. The widely reported conclusion was that since more time was spent on the freeway the American quality of life was diminishing. This disregards the fact that many commuters voluntarily moved further from their jobs to bigger homes, greener lawns and better schools.
Professor Levin remarked that the physical sciences can use
controlled experiments, medicine uses longitudinal studies and
clinical trials, but "when numbers are crunched in politics, axes
are usually grinding, too."
Wage difference between women and men widens.
New York Times, 15 Sept. 1997, A1
This article seems to provide an example where people do not want to draw a conclusion from a single number.
The gap between men's and women's wages has been steadily narrowing for the last two decades. The median weekly earnings of full-time working women rose from 62% of men's in 1979 to 77% in 1993. This year the median is just under 75%.
Some economists attribute this to a natural slowing down after a rapid change. It is suggested that the social movements that caused the narrowing of the gap are a little quieter now.
Other economists suggest that it may be linked to the flood of unskilled women unleashed on the market by changes in welfare.
Another suggestion is that a good part of the narrowing came from the fact that men's wages were dropping during harder times. Now during the good times men's wages are increasing faster than women's.
Experts warn against any conclusion that the earnings numbers are
evidence of growing discrimination against women. However, women
may not be convinced and apparently have been well aware of this
slowing down in the narrowing of the gap. In a nationwide survey
of women workers, released last week, the A.F.L.- C.I.O found that
equal pay was the top concern cited. 94% of the women felt that
equal pay for equal work was very important for them and 1/3 said
that they did not get such equal pay in their current jobs.
Nature rarely repeats itself: Japan's earthquake-prediction programme, the last in the world, is about to get the chop.
The Economist, 2 August 1997, p 63
Several years ago, considerable attention was focused on the Parkfield, California site along the San Andreas fault. Having observed that the area tended to experience a magnitude 6.0 earthquake every 20 years, US scientists predicted the next one would occur before 1993. But there has still not been a quake in Parkfield.
The present article focuses on Japan's Geodesy Council, whose earthquake prediction program is now viewed as a failure. The consensus is that the 160 billion yen devoted to the program since 1965 would have been better spent on building design and disaster relief measures. Over the years, the hope had been to find some pattern in data about phenomena that precede quakes, such as physical bulging in the earth's crust, electromagnetic field variations, or chemical changes in ground water composition. It was thought that detecting such changes might give hours or days of advance warning of a disaster.
There was also hope that larger cycles of seismic activity could be identified, like the Parkfield example, that would allow long- range predictions. The prime Japanese example is an apparent trend of quakes in the Tokyo area every 69 years or so. But there has yet to be a successor to the 1923 quake there (see Chance News 6.05). Furthermore, the Council failed to give any warning for four large Japanese quakes in the early 1990s. These were in sparsely populated regions, so the failures were not glaring. Unfortunately, the 1995 Kobe quake that took 6400 lives also came as a surprise.
The article attributes the difficulty in forecasting to the fact that fault lines are rarely found in isolation. Rather there are networks of interconnected faults that redistribute seismic stress in unpredictable ways. Thus tremors are often experienced far from where rocks originally moved.
One remaining quandary for the Geodesy Council is a 1978 Japanese
law requiring that adequate warnings be given for quakes. Indeed,
polls indicate that more than 50% of Japanese still expect to
receive advance warning of a catastrophic quake.
The wallet paradox.
The American Mathematical Monthly, August-September 1997, p. 647
Kent G. Merryfield, Ngo Viet, and Saleem Watson.
Suppose Bill and Laurie wonder whose wallet contains more money. They agree that, if they have different amounts of money in their wallets, they each have an equal chance of having the larger amount. They decide to take out their wallets and, if one of the wallets has less money than the other, its owner gets the contents of both wallets. This provides a game that both Bill and Laurie feel is favorable. Each of them can argue that, if they end up exchanging money, they have an equal chance of losing what they have or of getting more than twice what they had. Such a game cannot, of course, be favorable to both players.
This is, of course, a mild variation of the envelope paradox (see Section 4.3 of "Introduction to Probability" by Grinstead and Snell on the Chance web site). The authors formulate the problem in terms of random variables and show that the game is fair if the amounts of money in the wallets are independent random variables with a common distribution. The authors discuss other examples where the game is also fair.
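A quick simulation illustrates the fairness result. The sketch below is our own, with the wallet amounts drawn independently and uniformly from the same distribution (the uniform choice is ours; the theorem holds for any common distribution):

```python
import random

def net_gain(a, b):
    """Player A's net gain: the owner of the lesser wallet takes both."""
    if a == b:
        return 0
    return b if a < b else -a  # win B's money, or lose your own

random.seed(1)
trials = 100_000
total = 0
for _ in range(trials):
    a = random.randint(1, 100)  # independent amounts with a
    b = random.randint(1, 100)  # common distribution
    total += net_gain(a, b)

# Average net gain per play is close to 0: the game is fair.
print(total / trials)
```

By symmetry neither player has an edge, even though each can make the seductive "lose what I have or more than double it" argument.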
(1) Assume that Bill has, with equal probabilities, 1 or 5 dollars in his wallet and Laurie has, with equal probabilities, 2 or 4 dollars. Show that the conditions of our game are satisfied and find the expected value for each player when the game is played.
(2) How would you explain to your uncle Joe, who does not know
anything about random variables, what is wrong with the argument
that the game is favorable to both players?
Rich students turning away from small private colleges.
The Boston Globe, 26 August 1997, pA8
In their book "The Aid Game" (to be published this fall), Michael McPherson and Morton Schapiro argue that affluent students are increasingly choosing to attend public colleges, avoiding the high cost of small private colleges. In 1980, 31% of college freshman from the richest families--those with annual incomes over $200,000--attended public institutions. But in 1994, that figure had risen to 38%. For upper-middle income families--defined as those with incomes between $100,000 and $200,000--the corresponding percentages rose from 42% to 48%.
McPherson is the president of Macalester College, a small private college in St. Paul, Minnesota. He is quoted in the article as saying: "The idea was always that you would create these great institutions that would then be accessible to anyone who had the motivation and promise to go. That kind of fundamental promise is being thrown into question."
Beyond the cost issue, McPherson observes that students are making less of a distinction between the quality of public and private institutions. He attributes this in part to magazine ratings of colleges.
Would you agree with McPherson that magazine ratings make public
institutions look more favorable relative to private institutions?
Scores on SAT inch up statewide: Mass. leads nation in rate of test-taking, lags in math.
The Boston Globe, 27 August 1997, pA1
For the second straight year, Massachusetts led the nation in percentage of graduating seniors taking the SAT; the state's 80% figure is nearly twice the national rate. Scores on the verbal section rose one point to 508, compared to a national average of 505. Math scores rose four points to 508, just below the national average of 511.
Broken down by gender, however, the results continue to be troubling. The average math score for girls was 39 points below that for boys. This is in spite of the fact that, of test-takers statewide with A+, A or A- averages, more than 60% in each category were girls. Also, 55% of students enrolled in honors science classes were girls. Laura Barnett of FairTest in Cambridge, an organization that monitors standardized tests, sees this as one more indication of bias in the SAT. She points out that, while SATs are promoted as a predictor of first-year college grades, women score higher than men when matched with identical undergraduate courses.
College Board president Donald Stewart sees high school grade inflation as the problem. Since 1987, the percentage of students with an A+, A or A- average has increased from 28% to 37%, even as the SAT averages have fallen 13 points on the verbal section and 1 point on the math. Stewart asserts that teachers who give high grades for average performance are guilty of creating a detrimental "just-good-enough" attitude.
(1) The article noted that state officials were especially pleased by the scores because "generally the higher number of students taking the test the lower the average score." Does this mean that Massachusetts should expect a lower average score than New Hampshire, its less populous neighbor? What does it mean?
(2) What factors contribute to the association implied by the (correctly interpreted) statement in the previous question? What factors in Massachusetts would tend to mitigate this effect?
(3) Given that statewide the average women's math SAT score is lower than the men's, and that women score higher on average in identical college math courses, does it necessarily follow that SATs are not predicting college performance? What else should the article have said needs to be matched?
(4) What do you think of Stewart's argument? Is he asserting
that grade inflation is favoring girls?
Heavy defeats in tennis: psychological momentum or random effect?
Chance Magazine, Spring 1997, pp. 27-34
David Jackson and Krzysztof Mosurski
Tversky and Gilovich (Chance Magazine 1989, vol 2 no. 1, pp. 16- 21) made streaks a household word when they looked at data on professional basketball players and showed that the apparent streaks of players were not sufficient to reject the simple Bernoulli trials model for the successes or failures of the players' shots. Albright (JASA, 88, 1184-1188) showed the same was true for hits by professional baseball players.
These results seemed to suggest that the usual idea that "success follows success" did not apply, at least very strongly, to the shots of basketball players or the hits of batters.
The authors of this article consider the game of tennis. They look at the outcome of each set in a match when the first player to win 3 sets wins the match. One of the authors, Jackson, had shown previously (Chance Magazine, 1995, Vol. 8, No. 3, pp. 7-40) that a "success-breeds-success" model provides a much better fit to the outcomes in the 1987 Wimbledon and the U.S. Open tennis tournaments than a Bernoulli trials model (independent trials with a fixed probability p for success on each trial). The evidence against the Bernoulli trials model was the larger number of heavy defeats (for example 3 to 0) than would be expected from such a model.
Jackson called the model he found to fit the data the "odds" model. In this model, the odds that a player ranked i wins the first set against a player ranked j are (j/i)^a, where a is a parameter of the model. Then, for the player who won the last set, the odds of winning the next set increase by a constant positive factor b, the model's second parameter.
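As an illustration, the odds model is easy to simulate. The sketch below is our own, not the authors' code; the ranks (i = 1 vs. j = 4) and the parameter values a = 0.5 and b = 2.0 are invented for the example, and odds o are converted to a win probability in the standard way, p = o/(1 + o). Setting b = 1 recovers the Bernoulli trials model, so the two can be compared directly.

```python
import random

def play_match(i, j, a, b, rng):
    """Best-of-5 match under the odds model: player ranked i starts with
    odds (j/i)**a of winning a set; after each set, the winner's odds of
    taking the next set are multiplied by b."""
    odds = (j / i) ** a          # odds in favor of player i
    wins_i = wins_j = 0
    while wins_i < 3 and wins_j < 3:
        p = odds / (1 + odds)    # convert odds to probability
        if rng.random() < p:
            wins_i += 1
            odds *= b            # success breeds success for player i
        else:
            wins_j += 1
            odds /= b            # equivalently, player j's odds grow by b
    return wins_i, wins_j

rng = random.Random(0)
a, b = 0.5, 2.0                  # hypothetical parameter values
n = 100_000
heavy = sum(play_match(1, 4, a, b, rng) in [(3, 0), (0, 3)] for _ in range(n))
print("fraction of 3-0 matches:", heavy / n)
```

Running the simulation once with b = 2 and again with b = 1 (same a) shows the excess of heavy 3-0 defeats that the authors use as evidence against the Bernoulli trials model.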
In this article, the authors observe that the excessive occurrence of heavy defeats could also be explained by a model in which the outcomes of the sets are independent but a player's probability of winning a set is not constant throughout the sets of the match, as it would be in a Bernoulli trials model. The authors construct such an independent trials model along the lines of their odds model and show that it cannot be made to fit the data as well as the odds model. They conclude that this provides evidence for a "success-breeds-success" model for tennis.
Does the fact that the odds model fit the data well suggest that
the rankings of the players are reliable?
Chance Magazine, Spring 1997, p 58
Howard Wainer discusses Edward Tufte's book "The Visual Display of Quantitative Information". We reviewed this book in Chance News 6.04, but you will want to read this enjoyable review from another expert in graphic presentations.
Wainer himself has recently provided us with a handsome book on the uses and misuses of graphics:
Visual Revelations: Graphical Tales of Fate and Deception from
Napoleon Bonaparte to Ross Perot.
Howard Wainer, 1997.
Springer-Verlag, New York ($35). ISBN 0-387-94902-X.
We just got our copy of this book and have read only the first chapter. This chapter, "How to display data badly," uses examples from The New York Times, the Washington Post, and a book, "Social Indicators III," written and published by the U.S. Bureau of the Census.
Wainer explains that he chose these newspapers because they are the ones he admires and has read regularly at different times during the 20 years that he has been collecting these examples. He chose the Census book because, like the newspapers he chose, one expects good graphics, which makes the occasional error stand out more clearly.
This first chapter would make wonderful reading for Chance
students to help them recognize the misuse of graphics. Later
chapters would show them what can be learned from good graphics.
Study links parental bond to teenage well-being.
The Boston Globe, 10 September 1997, A1
A study published in the Journal of the American Medical Association finds that strong emotional connection to a parent is the factor most strongly associated with teenagers' "well-being", as measured by health, school performance, and avoidance of risky behavior. The correlations were found to hold regardless of family income, education, race, the specific amount of time a parent spends with a child, or family structure.
From an initial 1995 survey of 90,000 students in grades 7 through 12, the study focused on 12,000 teenagers, who were interviewed individually at home in 1995 and again in 1996. The study was praised for its breadth and depth, and the data are expected to be a continuing source of material for investigation.
Among the findings already reported here are the following. High parental expectations for school performance were associated with lower incidence of risky behavior. Feeling that at least one adult at school treats them fairly was associated with lower risk in every health category studied except for pregnancy. Students with easy access to guns, alcohol, or tobacco at home were more likely to use them or to engage in violence.
(1) When the article reports that the correlations were found to "hold" at all levels of factors such as income, etc., do you think this means that the correlations were found to be the same at all levels?
(2) The article states that "Although the researchers measured associations, not cause and effect, they found that being on close terms with a parent reduces the odds that an adolescent will suffer from emotional stress, have suicidal thoughts or behavior, engage in violence..." Do you find that this sentence clearly explains the issue of association vs. causation?
(3) Later in the article, it is reported that "researchers found
that a solid connection with a parent delays the age at which
teenagers initiate sex." Comment in light of the last question.
One of us (Laurie) recently spent a week at his cabin on Isle Royale, Michigan, which is now a National Park. As we reported in Chance News 5.04, Isle Royale has had wolves and moose since 1949, when Lake Superior froze completely and wolves came over on the ice.
Biologist Rolf Peterson has been studying the wolf-moose relationship since 1970. The park is closed during the winter except for his annual trip to observe the state of the wolves and moose. Rolf writes an annual report on this predator-prey relationship that includes pictures of the wolves and of the moose, and graphics tracing the history of this relationship during the past 27 years. His report is a good place to see how well a real predator-prey situation fits the classical predator-prey model. Last winter, Rolf found that the moose population had dropped from about 1200 in 1996 to about 500 in 1997. The number of wolves increased from 22 to 24.
Rolf and fellow researchers estimate the number of moose by flying
over the island in the winter, counting the number of moose
observed in a sample of zones across the island. This is a real-
life example of the problem of estimating animal populations,
discussed in "Sampling Wildlife Populations" by Bryan F. J. Manly
and Lyman L. McDonald, Chance Magazine, Vol. 9, No. 2, Spring 1996
(see Chance News 5.08). We will put Rolf Peterson's 1997 report
on the Chance web site under teaching aids. It should be there
later this week.
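The idea behind the zone counts can be shown with a toy calculation. This is our own sketch with invented numbers, not the researchers' actual method: count the moose seen in a random sample of (assumed equal-area) zones and scale the sample count up by the fraction of zones surveyed.

```python
import random

random.seed(42)

# Toy setup: the island is divided into equal-area zones. In reality the
# per-zone counts are unknown; we invent them here so we can check the
# estimate against the true total.
num_zones = 40
true_counts = [random.randint(5, 25) for _ in range(num_zones)]

# Fly over a random sample of 10 zones and count the moose observed.
sampled = random.sample(range(num_zones), 10)
sample_count = sum(true_counts[z] for z in sampled)

# Expansion estimator: scale the sample count up by num_zones / sample size.
estimate = sample_count * num_zones / len(sampled)
print("estimated total moose:", estimate)
print("true total (known only in this toy example):", sum(true_counts))
```

In practice the zones differ in area and visibility, so real surveys weight the counts accordingly; the Manly and McDonald article cited above discusses these refinements.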
Please send comments and suggestions to firstname.lastname@example.org.