Prepared by J. Laurie Snell, Bill Peterson and Charles Grinstead, with help from Fuxing Hou, Pamela J. Lombardi, Ma. Katrina Munoz Dy, Meghana Reddy and Joan Snell.
Please send comments and suggestions for articles to firstname.lastname@example.org.
Back issues of Chance News and other materials for teaching a Chance course are available from the Chance web site:
Better you should have an approximate answer to the right problem than an exact answer to the wrong problem.
Contents of Chance News 7.01
A faithful reader, Roger Pinkham, often sends us interesting comments on items in Chance News. Here are his comments on the last issue.
The question of 1/4 vs. 1/3 for the probability of 2 tails, say, brought to mind something that, if I remember correctly, happened in 1955. There was a proposed problem in the American Mathematical Monthly that asked, "A cube is positioned in the first octant. If a pellet of lead is positioned in the cube at x=a, y=b, and z=c, what are the respective probabilities of the various faces turning up?" [That is obviously not a direct quote but only the way I remember it.] I was amazed. To the best of my knowledge no solution was ever published!
It reminds me of von Neumann's allegedly figuring out in his head the probability that a randomly tossed circular cylinder having height h and diameter d comes to rest on a face. [See problem 38, "The thick coin," in Mosteller's little book of 50 problems.] I once had a student make cylinders of different materials and of various h:d ratios to determine the answers empirically.
On the birthday problem: if you use the Poisson as an approximation, the mean number of coincident pairs is m = choose(n,2) * 1/365, the product of the number of pairs times the probability that both members of a pair have the same birthday. The probability of zero occurrences for the Poisson is exp(-m). Substitute this expression for m, solve exp(-m) = .5 for n, and you find the number n* needed for a fair bet. (This approximation gives the correct answer, 23, for the classical problem.)
Now this approach is also useful, though not quite as accurate, for more complicated problems such as "How many people do you need in a room to have a 1/2 chance of at least three people with the same birthday?" (For a further discussion of the Poisson approximation see W. Schwarz, The American Statistician, August 1988, Vol. 42, No. 3, pp. 195-196.)
Lastly, on the census and sampling, I wonder if your readers and William Safire (recall that Safire argued against using sampling in the 2000 census) are aware that, until quite recently, receipts from airfares were apportioned by means of sampling. Even though their economic well-being was at stake, the airlines were quite happy with this. The point is that if you fly from Boston's Logan Airport to Monterey, CA, there may be 3 or 4 carriers involved. Only one ticket is issued and one sum collected. To go through all tickets and determine who gets what is a horrendous job. Hence the sampling. (For a discussion of this use of sampling, see "How accountants save money by sampling" by John Neter in "Statistics: A Guide to the Unknown," edited by Judith Tanur et al.)
DISCUSSION QUESTIONS:
(1) Why do you suppose the airlines no longer use sampling to divide the receipts from tickets that involve several airlines?
(2) When the birthday problem is illustrated in class, there is often more than a simple duplication, for example, two duplications or a triple. Use the Poisson approximation to find the smallest class size that makes it a fair bet that something more surprising than a single duplication will occur.
(3) The Monthly problem is really an attempt to invent crooked dice. How would you interpret the problem to try to solve it? Do you think you could determine the probabilities on a priori grounds? Do you think that von Neumann solved "the thick coin problem" on a priori grounds?
(4) In a Chance course, two students, for their final project,
wanted to estimate the probability that a coin would stand on end.
They did the simulations that Roger's student did for a number of
cylinders (thick coins) and then fit a curve to these
probabilities. They then extrapolated to find the probability that
a real coin would land on edge. Do you think this would give a
reasonable estimate for this probability?
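Roger's Poisson recipe for the birthday problem is easy to check numerically. The sketch below is a minimal Python illustration, assuming 365 equally likely birthdays (Feb. 29 ignored); the function names are our own, not from any source. It finds the smallest n for which exp(-choose(n,2)/365) drops to 1/2 or below, and compares the approximation with the exact probability that all n birthdays differ.

```python
import math

def poisson_no_match(n, days=365):
    """Poisson approximation to P(no shared birthday among n people)."""
    m = math.comb(n, 2) / days  # mean number of coincident pairs
    return math.exp(-m)

def exact_no_match(n, days=365):
    """Exact P(all n birthdays differ), assuming equally likely days."""
    p = 1.0
    for k in range(n):
        p *= (days - k) / days
    return p

def fair_bet_size(days=365):
    """Smallest n whose approximate P(no match) is at most 1/2."""
    n = 2
    while poisson_no_match(n, days) > 0.5:
        n += 1
    return n

n_star = fair_bet_size()
print(n_star)                              # 23
print(round(poisson_no_match(n_star), 3))  # 0.5
print(round(exact_no_match(n_star), 3))    # 0.493
```

For the three-people variant Roger mentions, the same idea applies with the mean number of matching triples, choose(n,3)/365^2, though, as he notes, the approximation is less accurate there.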
We get a chance to use what we learned from Roger in our next item.
Parade Magazine, 11 Jan. 1998, p.8
Marilyn vos Savant
Three readers responded to a previous discussion of the famous birthday paradox. Evidently, Marilyn had suggested that her readers do a survey.
The first reader reports:
I am currently working on a systems project where we have a database of employees and their birthdays, and I was astonished when I checked them. Out of 112 people, we have nine sets of people who share the same birthday!!! I apologize for not believing you.
Let's see if the reader should be astonished. With a group of 112 people, the expected number of pairs with the same birthday is choose(112,2)/365 = 17.03. Using the Poisson approximation with m = 17.03, we can find the probability of 9 or fewer duplications; it is about .026. In other words, with probability .974 we will find 10 or more pairs with the same birthday in a group of 112 people. Thus this reader, far from being astonished that she got nine pairs, should be surprised that she got so few.
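The first reader's calculation can be reproduced with a short Python sketch (the helper name poisson_cdf is ours, not from the column). It sums Poisson probabilities term by term, using P(X = j) = P(X = j - 1) * m / j, with m the expected number of matching pairs.

```python
import math

def poisson_cdf(k, m):
    """P(X <= k) for X ~ Poisson(m), building each term from the previous one."""
    term = math.exp(-m)  # P(X = 0)
    total = term
    for j in range(1, k + 1):
        term *= m / j    # P(X = j) from P(X = j - 1)
        total += term
    return total

m = math.comb(112, 2) / 365           # expected number of matching pairs
p_nine_or_fewer = poisson_cdf(9, m)
print(round(m, 2))                    # 17.03
print(round(p_nine_or_fewer, 3))      # 0.026
print(round(1 - p_nine_or_fewer, 3))  # 0.974
```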
The second reader writes:
I started counting from the first person who came into the office. I counted until I found a matching birthday in the group. Then I started a new survey with the next person. In eight surveys, the smallest number of people it took before I found a matching birthday was 12. The largest number was only 54.
Let us see if a minimum of 12 is surprising. If we ask 12 people, the expected number of pairs is choose(12,2)/365 = .181. Thus the probability of no match is exp(-.181) = .835. The probability that there is no match among the first 12 people in all 8 surveys is then .835^8 = .236. Thus the probability that in 8 surveys the minimum number required is 12 or less is .764, so the observed minimum of 12 is not surprising. We have left the question of whether the maximum of 54 is surprising as a discussion question.
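The same steps in a minimal Python sketch, assuming the eight surveys are independent and using the Poisson approximation throughout. Carrying full precision instead of rounding .835 first gives .235 and .765, which changes nothing in the conclusion.

```python
import math

m12 = math.comb(12, 2) / 365      # expected matching pairs among 12 people
p_no_match_12 = math.exp(-m12)    # Poisson: no match among the first 12
p_all_eight = p_no_match_12 ** 8  # no survey found a match within 12 people
p_min_at_most_12 = 1 - p_all_eight

print(round(m12, 3))               # 0.181
print(round(p_no_match_12, 3))     # 0.835
print(round(p_all_eight, 3))       # 0.235
print(round(p_min_at_most_12, 3))  # 0.765
```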
The third reader writes:
I was born on Jan. 30. When my wife next went to the mall, I tagged along to take you up on the random survey. I spoke with 100 people, and 28 gave me the cold shoulder. Of the remaining 72, no one shared Jan. 30 as a birthday. What happened?
Of course, this reader has made the familiar mistake of interpreting the problem to mean that someone else has his birthday.
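A short sketch contrasts the two questions for this reader's 72 respondents, assuming 365 equally likely birthdays and ignoring Feb. 29. It computes the chance that nobody shares one particular birthday (exactly, and by the Poisson approximation), and the chance that some pair among the 72 shares a birthday, using the Poisson approximation from Roger's letter.

```python
import math

n = 72
p_nobody_shares_mine = (364 / 365) ** n  # exact, ignoring Feb. 29
p_poisson = math.exp(-n / 365)           # Poisson approximation, mean n/365
p_some_pair_matches = 1 - math.exp(-math.comb(n, 2) / 365)

print(round(p_nobody_shares_mine, 3))  # 0.821
print(round(p_poisson, 3))             # 0.821
print(round(p_some_pair_matches, 3))   # 0.999
```

So failing to find anyone born on Jan. 30 was the likely outcome, while a match of some kind among the 72 was nearly certain.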
(1) How would you use a Poisson approximation to find the distribution for the number of people you would have to ask to find someone with your birthday?
(2) How could you use the Poisson distribution to find the distribution for the number of people you would have to ask to find two people who have the same birthday? What do you think the shape of this distribution is?
(3) Ignoring the information about the minimum, how would you use
the Poisson approximation to determine if a maximum of 54 in eight
surveys, as described in the second letter, is unusual? Would the
knowledge that the minimum obtained was 12 affect this estimate?
Jim Baumgartner observed that Car Talk has a discussion of the Monty Hall problem on their web site.
You will find there a historical solution to the Monty Hall problem by Steve Selvin (American Statistician, August 1975, Vol. 29, No. 3) and a response to Steve from Monty Hall. Car Talk asks you to play the Monty Hall game, and they collect statistics. After we played, the statistics were:
Number of...               Winning Pct
Initially correct   3191    33.675%
Stickers            4292    32.968%
Switchers           5184    67.072%
Total trials        9476
DISCUSSION QUESTION:
Do you consider it encouraging that slightly over 50% of the
players so far have chosen to switch?
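The Car Talk figures can be compared with a small simulation. This is a minimal sketch under the standard assumptions, which may not match exactly how the Car Talk game is programmed: the prize is placed at random, the host always opens a door that hides no prize and was not picked, and the player is always offered the switch. The function names are our own.

```python
import random

def play(switch, rng):
    """One round of the game; returns True if the player wins the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Move to the one door that is neither the original pick nor the open one.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

def win_rate(switch, trials=100_000, seed=1):
    """Fraction of simulated rounds won with a fixed strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print(f"stick:  {win_rate(False):.3f}")  # near 1/3
print(f"switch: {win_rate(True):.3f}")   # near 2/3
```

Sticking wins exactly when the first pick was right, which is why the Stickers' winning percentage sits close to the "Initially correct" figure.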
Study suggests light to back of the knees alters master biological clock.
New York Times, 16 Jan. 1998, A20
Extraocular Circadian Phototransduction in Humans.
Science, 16 January 1998, Vol. 279 pp. 396-397
Scott S. Campbell and Patricia J. Murphy
Circadian rhythms are oscillations in various bodily functions and are found in a wide variety of animals. The periods of these rhythms are usually close to, but not exactly, 24 hours and thus require a daily adjustment to synchronize the functions with the external environment. In fish, birds, amphibians, and reptiles, researchers have found a variety of photoreceptor systems that make this synchronization possible.
In mammals, it is not clear that the eye is the system that synchronizes the body with the environment. Blind people still suffer jet lag. Also, experiments done with mice that are 'retinally degenerate' have shown that such mice can synchronize their bodies to a light-dark cycle.
In the present article, the authors report on a study that shows that there are routes not involving the eye that allow the circadian rhythms in humans to be synchronized with the environment. The study involved 15 individuals, in a total of 33 phase-shifting trials. The participants were assigned to the control group or the active group at random. In the active group, the individuals were subjected to a 3-hour pulse of light that was shined on the back of the knee. The participants were not told whether they were in the control or the active group.
Much care was taken to make sure that the participants could not distinguish, by the presence of heat or light, whether or not they were in the active group.
It was found that a significant shift in the phase of the circadian rhythm could be achieved by the light pulse. If the light was presented before the time of minimum body temperature, a delay in the phase of the circadian rhythm was produced, while light presented after the minimum produced an advance. The average delay was 1.43 hours, and the average advance was 0.58 hours.
The New York Times article quotes experts as saying that the results are very surprising, but they can see no flaw in the very good research. They say that, if the research holds up, it may lead to new treatments for seasonal depression, sleep disorders, and jet lag.
Editor's note: Before rushing out to buy one of the lights used in the study, the reader should understand that the least expensive model sold by the company that makes these lights costs $3,200. However, the authors of the study have told us that their experiment has been replicated with less expensive "bright light boxes" used for SAD (Seasonal Affective Disorder) therapy. Unfortunately, this being the middle of the winter, all the lights available for SAD patients at our hospital are in use.
(1) It is well-known that one can feel even a modest amount of heat if it is directed at the skin. One way to introduce heat to the skin is by infrared electromagnetic radiation. Thus, the skin is detecting the presence of electromagnetic radiation of a certain wavelength. It is therefore possible to imagine that the skin can also detect the presence of visible light, if it is strong enough. It would be interesting to carry out such an experiment, but much care would have to be taken to avoid the possibility that the light is also transmitting heat to the skin. The reader is invited to try to carry out this experiment.
(2) The New York Times article suggests that airline passengers
could wear a knee brace with a light source that would reset their
biological clocks as they sleep during the flight. Can you see the
end of jet lag?
Environmental scares: Plenty of gloom.
The Economist, 20 December 1997 - 2 January 1998, pp 19-21
In a famous pronouncement in 1798, Thomas Malthus predicted that the population of Great Britain would increase geometrically, leading to starvation because the food supply was only increasing arithmetically. Starting with the Malthus example, this essay chronicles some lesser-known failed predictions: an 1865 book that argued Britain would run out of coal in a few years, a 1941 prediction by the US Bureau of Mines that American oil reserves would last ten years, and pronouncements in both 1939 and 1951 by the US Department of the Interior stating that American oil would last 13 years. The author of this essay wonders why, with such a dismal track record, forecasters of environmental doom are still taken seriously: "These people seem to feel that being wrong in the past makes them more likely to be right in the future."
The article focuses on two famous forecasts in more recent memory: the Club of Rome's 1972 "Limits to Growth" report and Paul Ehrlich's "Population Bomb". The Club of Rome foresaw drastic price increases as a range of critical natural resources approached depletion. The essay takes the Club to task for having underestimated reserves of oil and natural gas, and also of many minerals. In every case except tin, known reserves have actually grown since "Limits to Growth" was published. Paul Ehrlich lost a bet with economist Julian Simon over the price change in five minerals (tungsten, nickel, copper, chrome and tin) over the decade of the 1980s. After they agreed on how much of each mineral $1000 would buy in 1980, Ehrlich agreed to pay the difference in 1990 if the inflation-adjusted prices fell, with Simon to pay if they rose. Apparently Simon has offered to repeat the bet but has had no takers.
On the subject of the population explosion, the article quotes Paul Ehrlich in the 1970s: "The battle to feed humanity is over. In the 1970s the world will undergo famines--hundreds of millions of people are going to starve to death." The author cites similar quotations over the years, noting that none square with the facts on world food production, which has risen 20% per capita since 1961. Even in the third world, the Food and Agriculture Organization finds that calorie consumption per capita is 27% higher today than in 1963. Later in the article, the author asserts that previous alarming projections on population growth have quietly been scaled down. The "explosion" was replaced by an asymptotic approach to 15 billion, a figure more recently reduced to 12 billion and then less than 10 billion. The inference drawn is that world population may never double again.
The essay points out that environmentalists have turned their attention from resource depletion to pollution, raising alarms about acid rain, global warming, and the extinction of plant and animal species. But the author finds that the only scare for which predictions were borne out was the pesticide DDT. Noting that business people who oppose environmental measures are accused of having vested interests, the essay concludes with the thought that perhaps environmentalists owe their careers to publicizing the most extreme scenarios in every situation.
(1) The essayist expresses shock that, rather than recanting its position, the Club of Rome remains pessimistic about the future. But does the fact that natural resource reserves were underestimated in 1972 discredit models for the consequences of their depletion?
(2) A 1960 article in "Science" magazine, entitled "Doomsday: Friday 13 November A.D. 2026" (H. von Foerster, P.M. Mora, L.W. Amiot, Science 132, p. 1291), predicted that world population would go to infinity in the year 2026. The projection was based on the equation
$$p(t) = \frac{1.79 \times 10^{11}}{(2026.87 - t)^{0.99}}$$
where p(t) denotes population in year t, which was derived from looking at thousands of years of population data. This would seem to be the sort of dire prediction the current article rails against, and indeed it was criticized at its time of publication. Yet in a 1987 letter to Science, (Science, 25 Sept. 1987, vol. 237, pp 1555-1556), Stuart A. Umpleby noted that we were still ahead of schedule! The world population in 1980 was reported to be 4.414 billion, nearly half a billion ahead of the equation's predicted 3.969 billion. World population passed 5 billion in 1986, three years ahead of the 1989 date predicted by the equation. From the census bureau homepage, we find the current world population is estimated at 5,892,392,619. Check out how the equation is doing currently and comment on how these observations relate to the remarks in the article.
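To follow up this question, here is a minimal Python sketch of the widely quoted form of the 1960 fit, p(t) = 1.79 x 10^11 / (2026.87 - t)^0.99; its constants reproduce the 1980 prediction of 3.969 billion and the 1989 date for 5 billion mentioned above. The function names are hypothetical. Evaluate doomsday_population at the current year and compare it with the Census Bureau estimate.

```python
def doomsday_population(t):
    """Population predicted for year t by the fit p(t) = 1.79e11 / (2026.87 - t)**0.99."""
    return 1.79e11 / (2026.87 - t) ** 0.99

def year_reaching(pop):
    """Invert the fit: the (fractional) year at which p(t) equals pop."""
    return 2026.87 - (1.79e11 / pop) ** (1 / 0.99)

print(round(doomsday_population(1980) / 1e9, 3))  # 3.969
print(int(year_reaching(5e9)))                    # 1989
```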
(3) Entertainer Victor Borge noted that forecasting is difficult,
especially forecasting the future. In Chance News 6.09, we
reported the dismal track record of bond market forecasters. How
does this critique compare with the present story? More generally,
does the difficulty in making forecasts mean that we should just
Breast-fed youth found to do better in school.
The Boston Globe, 6 January 1998, pA3
A New Zealand study that followed 1000 children through age 18 has found that those who were breast-fed as children fared better in school, both in teacher ratings and in performance on standardized tests. The authors of the study conjecture that fatty acids found in breast milk, but not in formula, may boost brain development.
Critics of the study observed that the breast-fed children tended to have mothers who were older, better educated and wealthier; such factors by themselves could account for the differences in academic achievement. But the authors counter that they adjusted for these factors and still found "small but consistent tendencies for increasing duration of breast-feeding to be associated with increased IQ" and other measures of performance.
(1) Does the phrase "small but consistent" say anything about statistical significance? What do you think it means?
(2) Suppose we accept the researchers' position that a longer
period of breast-feeding produces benefits in later academic
performance. Does it follow that something in breast milk itself
Tobacco smoke harms arteries, study finds.
Los Angeles Times, 14 Jan. 1998, A1
Thomas H. Maugh II
An article in the Jan. 14, 1998 issue of the Journal of the American Medical Association reports new findings on the relationship between tobacco smoke and atherosclerosis, more commonly known as hardening of the arteries. A research team at Wake Forest University studied 10,914 people over a three-year period. At the beginning of the study, researchers measured the thickness of artery walls; at the end of the three-year period, they measured it again. The researchers concluded that there is indeed a relationship between hardening of the arteries and exposure to tobacco smoke.
The artery walls of smokers thickened 50% more than those of nonsmokers, and those of ex-smokers 25% more. Even more disturbing, nonsmokers exposed to secondhand smoke on a regular basis had 20% more thickening than nonsmokers who weren't regularly exposed.
Smokers and ex-smokers who had accumulated the same number of pack-years exhibited the same rates of thickening. This suggested to researchers that smoke triggers an irreversible process that continues long after the smoker has quit smoking.
These findings lend support to efforts to ban smoking in public places.
We read: "This is a good study...that went right to the heart of the matter," said Dr. Aubrey Taylor of the University of South Alabama College of Medicine in Mobile. "It will be very difficult for the tobacco companies to argue it away," added Taylor, who has prepared the American Heart Association's position papers on secondhand smoke. But Tom Laurie, a spokesman for the tobacco industry's Tobacco Institute, argued that earlier studies "do not show any increased risk for nonsmokers. We consider the science to be inconclusive."
Is it that easy to argue it away? What studies do you think Tom
Laurie is referring to?
Unconventional Wisdom; New facts and hot stats from the social sciences.
The Washington Post, 11 January 1998, C5
A Nation of Extremists.
Richard Morin's article explores why we consider people who disagree with our views as not only incorrect but also extreme, unreasonable, and blinded by ideology. This occurs because most people wildly exaggerate the magnitude of difference between themselves and their opponents.
Psychologists Dacher Keltner of the University of California at Berkeley and Robert Robinson of Harvard University conducted a series of studies exploring how a person views someone who holds a different opinion. They interviewed people who had strong feelings on issues like abortion and race relations. They made sure that people with opposing views were represented on each issue. They asked them about their positions and then asked them to predict how their opponents would answer the same questions. Keltner and Robinson discovered that the study participants "thought there was twice to four times as much disagreement between their position and their opponents' position than there actually was."
Keltner and Robinson's findings support a concept called "naive
realism" which describes the tendency of people to believe they
see the world objectively and that those who disagree judge the
issue through ideology. They also discovered that most people
think that even those who agree with their views are more extreme
than they themselves are. This "Lone Moderate" phenomenon,
according to Keltner, is the sense people have that "they alone
got the facts right" and that "they're the balance between extreme
Really 'risky' times.
The Washington Post, 11 January 1998, C9
George F. Will
Noting that news stories following Sonny Bono's death could not resist mentioning the "risks of skiing," Will worries about the media's endless fascination with risk. He objects to newscasters blindly reciting the latest figures on risk in everyday life; for example, we hear that the odds of getting cancer from an average number of X-rays are one in 100, while the odds of dying in a home accident are one in 130. He complains that every story with good news seems to contain an obligatory reference to some downside risk. He points out that the elderly are at increased risk for many diseases precisely because medicine has successfully reduced the risk of others.
So what is a clear-thinking citizen to do? Will explains that "rational worrying requires two primary variables, the probability of an event and the magnitude of that event. Living next to a nuclear power plant involves a minuscule likelihood of a large disaster. Driving to the grocery store in this age of road rage-- that is worrisome."
Ultimately, Will has a political message, namely that irrational response to risk leads to demands for public expenditures to solve perceived problems. He faults television for blurring distinctions between "measured risks" and "perceived risks."
(1) What concept is Will trying to explain in the passage about rational worrying?
(2) Do you think the power plant vs. grocery trip comparison helps his case?
(3) There were of course two famous victims of skiing accidents
in the news recently. Should we look forward to a piece by Will on
Sex, education: Of distinctions in intimate life.
The Boston Globe, 15 January 1998, A3
Two demographic researchers had these discouraging words for those who would pursue advanced degrees: "Americans who have attended graduate school may have the money and the smarts, but they report being the least sexually active." The study, published in the journal "American Demographics", was based on 10,000 interviews over the last ten years. It found that high school graduates average 58 sexual contacts a year, compared with 62 for people with some college education, 56 for people with 4-year college degrees, but only 50 for those with post-graduate education.
Tom Smith, an official at the National Opinion Research Center, offered a simple explanation: people with two-year degrees tend to be younger, and therefore more sexually active.
Aren't there people of all ages with two-year degrees? What did
Smith mean to say?
A revamped student test reduces the gap between sexes.
New York Times, 14 Jan. 1998, B7
Karen W. Arenson
The Preliminary S.A.T. (PSAT) is the sole criterion for choosing semifinalists for the prestigious National Merit Scholarships. Although girls do as well as or better than boys in their high school and college courses, they do less well on the math and verbal parts of these tests.
In 1996 the Department of Education, acting on a sex bias complaint from the organization Fairtest, arrived at a settlement with the College Board. Under this settlement the Board agreed to add a writing component to the exam, the one area in which girls traditionally do better than boys. In the October tests, taken by over 1.2 million students, the girls did, indeed, do better than the boys on the writing component, with an average of 49.8 compared to the boys' average of 49.0. However, they continued to do less well on math: the girls had an average of 47.6 compared to 50.9 for the boys. On the verbal part of the test, the girls had an average of 48.7 and the boys an average of 48.9.
How the girls do in getting scholarships will not be known until later in the year. Last year 44% of the National Merit Scholarships went to girls.
While admitting the tests are now less biased, Fairtest has called for a new approach to choosing national scholarships. A spokesman for Fairtest commented that "Researchers like Carol Gilligan of Harvard have shown that females process and express knowledge differently and more subtly. They look for nuances, shades of gray, different angles. They want to be certain. One could argue these are good skills for life, but not for zipping through tests like these."
(1) The article states that fewer than 2 tenths of 1% of the students who take the PSAT are awarded scholarships. Another statistical difference between boys and girls on S.A.T. tests is that the scores for girls have a smaller variance than the scores for boys (see Chance News 4.10). Does this suggest that, even if the girls' averages were as high as the boys', they would not get as many scholarships?
(2) If research suggests that the girls might do as well as the
boys if more time were given for the exam, do you think more time
should be given for the tests?
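To see why variance matters when awards go to a tiny upper tail, here is a minimal sketch assuming normally distributed scores. The means, standard deviations, and cutoff below are hypothetical numbers chosen only for illustration, not PSAT data: both groups get the same mean, and only the spread differs.

```python
import math

def upper_tail(cutoff, mean, sd):
    """P(score > cutoff) for a normal distribution with the given mean and sd."""
    z = (cutoff - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical illustration: equal means, slightly different spreads.
mean = 50.0
sd_boys, sd_girls = 11.0, 10.0
cutoff = 80.0  # a cutoff far out in the upper tail

boys = upper_tail(cutoff, mean, sd_boys)
girls = upper_tail(cutoff, mean, sd_girls)
print(f"boys above cutoff:  {boys:.5f}")
print(f"girls above cutoff: {girls:.5f}")
print(f"ratio: {boys / girls:.2f}")
```

Under these assumptions a 10% difference in standard deviation produces a more than twofold difference in the fraction above the cutoff, and the farther out the cutoff, the larger the ratio.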
William Hochstin suggested the following article, which was also featured in the January 15 issue of Rachel's ENVIRONMENT & HEALTH WEEKLY. This is a weekly electronic newsletter that discusses environmental issues in the news. In particular, it comments on all environmental articles in the New York Times. The articles provide quite complete references for other articles related to the issue being discussed. You'll find back issues of the newsletter, and instructions for joining the mailing list, at their web site.
Study finds conflicts in medical reports.
Boston Globe, 8 Jan. 1998, A12
Richard A. Knox
Conflict of interest in the debate over calcium-channel antagonists.
New England Journal of Medicine, 8 Jan 1998, p 101
H.T. Stelfox, G. Chua, K. O'Rourke, A.S. Detsky
In 1995, a case-control study suggested that the use of calcium-channel blockers to treat hypertension led to an increased risk of heart disease. This led to an intense debate both in the technical journals and in the press.
The authors of this study asked 87 authors of 70 reports that took part in this debate in 1995 and 1996 about their financial ties to drug companies. Each report was classified as favorable to the drugs, neutral, or critical: 30 of the reports were favorable, 17 neutral, and 23 critical.
Of the authors of favorable reports, 96% had received money from manufacturers of calcium-channel blockers. By comparison, 60% of the authors of neutral reports, and 37% of the authors of critical reports, had accepted money from these manufacturers.
The authors also asked whether authors of critical reports were more likely to have financial ties to companies making competing products. The answer was no: 88% of the authors of favorable reports, 53% of the authors of neutral reports, and 37% of the authors of critical reports had financial ties to such companies.
The authors also found a widespread failure of medical journals to disclose authors' financial ties.
(1) In discussing limitations of their study, the authors report that they did not know if the researchers, who received support from drug companies, obtained this support before or after they wrote their reports. Which would be worse?
(2) The authors say "we believe that the authors surveyed
expressed their own opinion and were not influenced by financial
relationships with the pharmaceutical manufacturers." Do you
The role of numeracy in understanding the benefit of screening mammography.
Annals of Internal Medicine, 1 Dec. 1997, 127 pp. 955-972
Lisa M. Schwartz et al.
A number of breast cancer experts have recommended that, instead of a blanket recommendation on the advisability of mammograms for women between the ages of 40 and 49, women in this age group should be given the relevant information from the studies that have been done and encouraged to make their own decision. The article asks whether this is realistic, given the general public's numeracy.
The authors sent questionnaires testing numeracy and understanding of risk reduction to 500 women from a registry maintained by the V.A. Hospital in White River Junction, Vt. Of these women, 61% returned questionnaires that could be used in the study.
In the first part of the questionnaire the women were asked to answer the following three questions:
(1) Imagine that we flip a coin 1000 times. What is your best guess about how many times the coin would come up heads in 1,000 flips? ________ times out of 1000.
(2) In the BIG BUCKS LOTTERY, the chance of winning a $10 prize is 1%. What is your best guess about how many people would win a $10 prize if 1000 people each buy a single ticket to BIG BUCKS? _______ person(s) out of 1000.
(3) In ACME PUBLISHING SWEEPSTAKES, the chance of winning a car is 1 in 1,000. What percent of tickets to ACME PUBLISHING SWEEPSTAKES win a car? ________%
The subjects were then given a scenario randomly chosen from the following four possible scenarios.
Scenario 1: 12 in 1,000 women will die from breast cancer in 10 years without mammography. Mammography will reduce breast cancer deaths by 33%.
Scenario 2: Mammography will reduce breast cancer deaths by 33%.
Scenario 3: 12 in 1000 women will die from breast cancer without mammography. Mammography will reduce breast cancer deaths by 4 in 1000.
Scenario 4: Mammography will reduce breast cancer by 4 in 1000.
The women were then asked to answer the following two questions:
Imagine 1,000 women exactly like you. Of these women, what is the best guess about how many will die from breast cancer during the next 10 years if...
they are not screened every year for breast cancer by mammogram. __________ out of 1000
they are screened every year for breast cancer by mammogram. ____________ out of 1000
For the three preliminary numeracy questions the percentages of correct answers were:
30% got 0 correct
28% had 1 correct
26% had 2 correct
16% had 3 correct
For the final two questions the percentages of correct answers were:
Scenario 1   17%
Scenario 2   10%
Scenario 3   33%
Scenario 4    7%
For the two cases where no baseline figures were given, the answer was considered correct if the difference between the number of deaths with mammograms and those without was estimated correctly.
The authors consider these results quite discouraging. However, they were encouraged by the fact that those who did better on the three-question "numeracy" test also did better on the two real-life questions.
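The arithmetic behind the "correct" answers is short, and a sketch makes the link between the relative and absolute scenarios explicit. It assumes "33%" is read as exactly one third, which is what makes the 12-in-1000 baseline and the 4-in-1000 reduction agree; the variable names are our own.

```python
from fractions import Fraction

# Scenarios 1 and 3: deaths per 1000 women over 10 years without screening.
baseline = 12
relative_reduction = Fraction(1, 3)  # "reduce breast cancer deaths by 33%"

with_screening = baseline * (1 - relative_reduction)
absolute_reduction = baseline - with_screening
print(with_screening)      # 8
print(absolute_reduction)  # 4

# Expected answers to the three preliminary numeracy questions.
print(1000 * Fraction(1, 2))           # 500 heads
print(1000 * Fraction(1, 100))         # 10 winners
print(float(Fraction(1, 1000) * 100))  # 0.1 (percent)
```

Without the baseline (Scenarios 2 and 4), only the difference of 4 in 1000 can be checked, which is why those answers were scored on the difference alone.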
(1) How do you think doctors should explain risks for breast cancer of the type considered in this survey?
(2) The authors state that about 1/3 of the respondents estimated
that the number of heads that turn up in 1000 tosses of a coin
will be less than 300. Meghana remarked: We should not be
surprised by this. The average person is used to probabilities in
everyday life--for example in lotteries--not turning out as
favorably as they expect them to. What do you think of Meghana's
theory? How could you test it?
Joan Garfield suggested the following article. The Star Tribune is the first newspaper to agree to let us link to their articles, and they will be kept at their site for a year.
Report cards for teachers.
Star Tribune, 21 Jan., 1998, A1
William Sanders has studied achievement scores for students in the Tennessee school system for two decades. He used statistical models to predict the academic gain a child could be expected to make in the next year based on traditional factors such as race, socio-economic status, class size, etc. and also a factor not normally considered: the effectiveness of the teacher. Sanders claims that the effectiveness of the teacher is by far the most important factor in predicting achievement of the students.
Sanders is quoted as saying: "The variability among teachers is tremendously bigger than among schools. And it's only when I look at the very top teachers do I see children making gains above expectation."
The idea of considering the effectiveness of individual teachers is controversial, but Sanders's work has had significant influence on education in Tennessee. Berg reports that in 1994 the state Education Department began publishing school-by-school comparisons statewide based on gains students were making on standardized tests. In 1996 teachers of grades two through eight began receiving yearly reports showing how much their students progressed compared with expectations, and compared with other kids in the school system and the state.
One critic of Sanders's work stated: "People love numbers, I love
numbers. It would be great if we could rate everything 1 through
10. But we're dealing with human systems that don't fit into such
a neat pattern." What do you think about this?
We got behind in our reading of Chance Magazine. Here are two articles in the Summer issue that we found particularly interesting. We'll get to the current issue by the next Chance News.
A long line of dead men.
Chance Magazine, Summer 1997, pp. 36-39
Borko D. Jovanovic, Paul S. Levy, Jacob A. Brody
The article was inspired by a mystery novel "A Long Line of Dead Men" by L. Block (New York, Avon Books, 1995). In the novel, Detective Scudder is told by a client that, in 1961, the client was approached by an elderly man who asked him to join a club of 29 other men in their twenties. Members of this group would meet once a year for dinner in a specific restaurant. The sole activity of the meeting was to read the names of members who had died in the past year. When the group had only one surviving member it was the duty of this last member to recruit 30 young men in their twenties to form the next group.
Detective Scudder's client said that, at the 1993 meeting, there were only 14 surviving members of the original 30. The client felt that 16 deaths was more than should be expected by natural causes. He consulted an insurance expert who told him that this number was unusual but not significant because of the small numbers involved. He was told that significance could only be established when the number of cases is large.
The authors of this article were offended by the lack of faith in the statistics of small numbers. They felt that the problem raised here was similar to the familiar cluster problem that occurs when a cluster of people in a certain area are found to have a certain disease. In such a situation an event has already happened, but one still wishes to assign a p value to the occurrence of the event. They consider an example where 4 out of 5 members of a group, who worked on a project involving radiation, died of cancer and the 5th person died of another disease. Assume that nationally 20% of people die of cancer. Then, assuming no association with the radiation, the chance that 4 or more of the 5 die of cancer can be found using the binomial distribution and gives a p-value of .0067, apparently a significant result.
The problem is that the cluster that has been observed is not a random sample of five people, because a cluster with many cancer deaths is more likely to be detected than a group of five people with only one cancer death. This leads the authors to define an adjusted p value for a cluster with k cancer deaths, given that it has been detected. They begin by assigning a probability that a cluster of k cancer deaths will be detected. They set this probability to 0 if k is less than or equal to the expected number of cancer deaths, which is 5 * .2 = 1; in this example this occurs if k = 0 or 1. They next set the probability of being detected to 1 if the number of deaths is 3 or more standard deviations above the mean. Again using the binomial distribution, the standard deviation for our example is sqrt(5 * .2 * .8), or about .9. Since 1 + 3(.9) is about 3.7, 4 deaths are more than 3 standard deviations above the mean, and so the probability that a cluster of k cancer deaths will be detected is 1 if k = 4 or 5. The probability of being detected for the intermediate values k = 2 and 3 is then determined by linear interpolation between k = 1 and k = 4, giving 1/3 and 2/3.
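Then the probability that a group of 5 is detected as a cluster is obtained by adding the probabilities p(j)p(detected|j) for j = 1 to 5. Using this, we can obtain the probability of a cluster with k cancer deaths, given that it has been detected, by Bayes' formula:

$$ p(k \mid \text{detected}) = \frac{p(k)\, p(\text{detected} \mid k)}{\sum_{j=1}^{5} p(j)\, p(\text{detected} \mid j)} $$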
We have seen that the unadjusted p-value for our example is p(4) + p(5) = .0067. The authors define the adjusted p value to be p(4|detected) + p(5|detected) = .0587 + .0027 = .0614. Thus a cluster with 4 cancer deaths, which appeared to be significant at the 1% level, is not even significant at the 5% level once you take into account the fact that it was detected.
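The adjusted p value is easy to reproduce. Here is a minimal sketch, assuming the detection rule described above (probability 0 at or below k = 1, 1 at or above k = 4, linear in between); the function names are ours, not the authors'. Under these assumptions it gives the unadjusted .0067, p(4|detected) of about .0587 as quoted, and an adjusted p value of about .062.

```python
from math import comb

def binom_pmf(n, p, k):
    """Binomial probability of exactly k cancer deaths among n people."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def detection_prob(k, lower, upper):
    """Assumed detection rule: 0 for k <= lower, 1 for k >= upper, linear in between."""
    if k <= lower:
        return 0.0
    if k >= upper:
        return 1.0
    return (k - lower) / (upper - lower)

def p_values(n, p, k_obs, lower, upper):
    """Return (unadjusted, adjusted) p-values for observing k_obs or more deaths."""
    pmf = [binom_pmf(n, p, k) for k in range(n + 1)]
    # p(k) * p(detected | k) for every possible k
    weights = [pmf[k] * detection_prob(k, lower, upper) for k in range(n + 1)]
    p_detected = sum(weights)                      # denominator of Bayes' formula
    unadjusted = sum(pmf[k_obs:])                  # p(k_obs) + ... + p(n)
    adjusted = sum(weights[k_obs:]) / p_detected   # sum of p(k | detected), k >= k_obs
    return unadjusted, adjusted

unadj, adj = p_values(n=5, p=0.2, k_obs=4, lower=1, upper=4)
print(f"unadjusted p = {unadj:.4f}, adjusted p = {adj:.4f}")
```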
The authors carry out the same kind of analysis for the data from "A Long Line of Dead Men." Using standard mortality tables, they estimate the probability that a twenty-year-old will die within the next 30 years. Again using the binomial distribution, now for the group of 30 twenty-year-olds, the authors find that the expected number of deaths is about 5 and the standard deviation about 2. The traditional p value is p(16 or more deaths) = .00001. The probability that k deaths would be detected is 0 for k less than or equal to 5 and 1 for 11 or more deaths (5 + 3 * 2 = 11). Other values are obtained by linear interpolation between these values. From this the authors obtain the adjusted p value by adding p(k|detected) for k greater than or equal to 16. This results in an adjusted p value of .00010, still highly significant. Thus Detective Scudder was justified in springing into action. You'll have to read the book to discover what he finds.
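The same calculation for the club of 30 can be sketched as follows. The text does not give the mortality-table death probability, so P_DIE = 1/6 below is our assumption, chosen only so that the expected number of deaths is 5; the detection thresholds 5 and 11 are the ones quoted above. The printed p values will therefore be of the same order as, but not identical to, the authors' .00001 and .00010.

```python
from math import comb, sqrt

N = 30                # members of the original group
P_DIE = 1 / 6         # ASSUMED 30-year death probability (gives 5 expected deaths)
LOWER, UPPER = 5, 11  # detection thresholds quoted in the text
K_OBS = 16            # deaths observed by the 1993 meeting

def binom_pmf(n, p, k):
    """Binomial probability of exactly k deaths among n members."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def detection_prob(k):
    """0 for k <= LOWER, 1 for k >= UPPER, linear interpolation in between."""
    if k <= LOWER:
        return 0.0
    if k >= UPPER:
        return 1.0
    return (k - LOWER) / (UPPER - LOWER)

pmf = [binom_pmf(N, P_DIE, k) for k in range(N + 1)]
weights = [pmf[k] * detection_prob(k) for k in range(N + 1)]
p_detected = sum(weights)

mean = N * P_DIE
sd = sqrt(N * P_DIE * (1 - P_DIE))
unadjusted = sum(pmf[K_OBS:])                 # p(16 or more deaths)
adjusted = sum(weights[K_OBS:]) / p_detected  # sum of p(k | detected), k >= 16

print(f"expected deaths = {mean:.2f}, sd = {sd:.2f}")
print(f"unadjusted p = {unadjusted:.2e}, adjusted p = {adjusted:.2e}")
```

Because p(detected) is well below 1, the adjusted p value is larger than the unadjusted one, but both remain tiny.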
Does the authors' method of calculating the adjusted p values seem
reasonable? Are these subjective probabilities?
A statistician reads the sports page: shooting darts.
Chance Magazine Summer 1997, pp. 16-19
Hal S. Stern and Wade Wilcox
Hal Stern likes to find situations in sports where statistical analysis can help in making strategic decisions. Here with Wade Wilcox he gives an example from the game of darts.
Visualize a dart board as a pie which has been cut into 20 slices numbered, not in consecutive order, from 1 to 20. The slice marked 20 has neighboring slices marked 1 and 5 and the slice with 19 has neighboring slices marked 3 and 7.
There is a narrow ring partway out from the center of the pie (dart board), and if your dart lands in this ring you get 3 times the value on the slice you hit. While there are other ways to get points that we have not described, the fastest way to accumulate points is to hit this triple ring on the 20 slice, getting 60 points. However, if you miss fairly often and end up on a neighboring slice, it is not clear whether you are better off aiming at the 20 slice or the 19 slice since, when you miss, you get more points from the neighbors of the 19 slice (3 and 7) than from the neighbors of the 20 slice (1 and 5). The authors are interested in answering this question of strategy. They model the situation as follows:
Assume that the player aims at the center c of the triple ring on the 20 slice. Let x be the vector connecting c with the point where the dart lands. Assume that the horizontal and vertical components of the vector x are independent and normally distributed with mean 0 and standard deviation s. Under these assumptions the radial distance r of the dart from c has density

$$ f(r) = \frac{r}{s^2}\, e^{-r^2/(2s^2)}, \qquad r \ge 0. $$

(There is a typo in this density in the article.) This density is known as a Rayleigh density and is also a member of the class of Weibull densities. The standard deviation s is then a measure of the accuracy of the thrower. s can be estimated using the fact that the mean of r^2 is 2s^2. Stern provides data showing that this model seems to fit experience quite well.
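Since r^2 is the sum of two squared N(0, s^2) components, its mean is 2s^2, so s can be estimated as the square root of half the average squared miss distance. A minimal sketch on simulated throws (the data are simulated, and the function names are ours, not from the article):

```python
import math
import random

def simulate_radial_distances(s, n, seed=0):
    """Distances from the aim point when both error components are N(0, s^2)."""
    rng = random.Random(seed)
    return [math.hypot(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n)]

def estimate_s(radii):
    """Estimate s from E[r^2] = 2 s^2."""
    mean_sq = sum(r * r for r in radii) / len(radii)
    return math.sqrt(mean_sq / 2)

radii = simulate_radial_distances(s=17.0, n=50_000)  # s in millimeters
print(f"estimated s = {estimate_s(radii):.2f} mm")
```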
From the model, the expected score on a throw of a dart when you aim at 20 or at 19 can be calculated. The model leads to the conclusion that, if your accuracy suggests that your value of s is less than 17, you should aim at the 20 ring, but if it is greater than 17 you are better off aiming at the 19 ring. Distance is measured in millimeters.
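The comparison can be approximated by simulation. This is a Monte Carlo sketch under the model above, not the authors' calculation: the board radii below (bull 6.35 mm, outer bull 15.9 mm, triple ring 99 to 107 mm, double ring 162 to 170 mm) are our assumed approximations of regulation dimensions, and the sector order is the standard clockwise order with 20 at the top. Because of these assumptions, the crossover it shows need not fall exactly at s = 17.

```python
import math
import random

# Standard sector numbers, clockwise starting from the top.
SECTORS = [20, 1, 18, 4, 13, 6, 10, 15, 2, 17, 3, 19, 7, 16, 8, 11, 14, 9, 12, 5]

# ASSUMED board radii in millimeters (approximate regulation dimensions).
BULL, OUTER_BULL = 6.35, 15.9
TRIPLE_IN, TRIPLE_OUT = 99.0, 107.0
DOUBLE_IN, DOUBLE_OUT = 162.0, 170.0

def score(x, y):
    """Points for a dart at (x, y); board center at the origin, 20 at the top."""
    r = math.hypot(x, y)
    if r <= BULL:
        return 50
    if r <= OUTER_BULL:
        return 25
    if r > DOUBLE_OUT:
        return 0  # off the board
    # Angle measured clockwise from straight up, in degrees.
    theta = math.degrees(math.atan2(x, y)) % 360
    number = SECTORS[int(((theta + 9) % 360) // 18)]  # each slice spans 18 degrees
    if TRIPLE_IN <= r <= TRIPLE_OUT:
        return 3 * number
    if DOUBLE_IN <= r <= DOUBLE_OUT:
        return 2 * number
    return number

def triple_center(number):
    """Aim point c: middle of the triple ring on the given slice."""
    theta = math.radians(18 * SECTORS.index(number))
    r = (TRIPLE_IN + TRIPLE_OUT) / 2
    return r * math.sin(theta), r * math.cos(theta)

def expected_score(number, s, n=20_000, seed=0):
    """Monte Carlo mean score aiming at the triple `number`, errors N(0, s^2)."""
    rng = random.Random(seed)
    cx, cy = triple_center(number)
    total = 0
    for _ in range(n):
        total += score(cx + rng.gauss(0, s), cy + rng.gauss(0, s))
    return total / n

for s in (5, 10, 17, 25):
    print(f"s = {s:2d} mm: aim 20 -> {expected_score(20, s):5.1f}, "
          f"aim 19 -> {expected_score(19, s):5.1f}")
```

Using the same seed for both aim points makes the comparison at each s less noisy, since both strategies see the same simulated errors.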
There are many websites devoted to darts. For a site that considers strategic decisions like those discussed here go to Darts.
(1) The authors suggest estimating s from the mean squared radial distance. The expected radial distance is sqrt(pi/2)*s, so this would provide another way to estimate s. Which method do you think would give the better estimate?
(2) If you estimate your value of s as 17, what is your average
radial distance -- i.e. your average error in inches?