Prepared by J. Laurie Snell, Bill Peterson and Charles Grinstead, with help from Fuxing Hou and Joan Snell.
Please send comments and suggestions for articles to
Back issues of Chance News and other materials for teaching a Chance course are available from the Chance web site:
Chance News is distributed under the GNU General Public License (so-called 'copyleft'). See the end of the newsletter for details.
Chance News is best read using Courier 12pt font.
The Mind, like the sense of sight, has its illusions; and just as touch corrects those of the latter, so thought and calculations correct the former.
After we went to press an article on the Chance Project appeared in the Washington Post, Sunday, April 4, 1999.
You can read it here.
Contents of Chance News 8.03
Note: We are updating the Chance web site. Any suggestions for
new material related to understanding chance in the news,
including data sets, videos, applets, activities, links, etc.
would be appreciated.
Here are four web sites that you might enjoy visiting.
Tools for Teaching and Assessing Statistical Inference
Joan Garfield and Robert C delMas, Beth Chance
This is an NSF project to construct modules based on simulations to help students understand basic statistical concepts such as: sampling distribution, confidence intervals, p-values and power and to assess the students' understanding before and after using these tools.
Each instructional unit consists of:
A teacher's guide that describes prerequisite knowledge students need to complete the unit, common misconceptions that students display when reasoning about the statistical concepts presented in the unit, and the goals of the unit.
A pretest based on the prerequisite knowledge for a topic. The pretest can be used as a diagnostic test aimed at correcting misunderstandings before students engage in the instructional unit.
An activity that is based on the simulation software.
A posttest to evaluate students' understanding of key concepts and their ability to use these concepts in solving problems.
This is the result of a three-year NSF project started in 1997 that provides:
A. A HyperStat Online statistics book
This is an online statistics book for a first course in statistics. In addition to its own text, each topic includes links to other statistics resources on the web related to the topic. This is an excellent way to find out what else is on the web on a given topic.
Java applets that demonstrate probability and statistical concepts encountered in a first statistics course. They are beautifully done. For example, the applet for the normal approximation to the binomial distribution allows students to vary n and p to see what an amazing theorem the Central Limit Theorem is. Students can also calculate probabilities for intervals, explore the effect of the continuity correction, etc.
C. Case studies
Case studies provided in a standard form illustrated by the first case study: Smiles and Leniency
Instructions for JMP
Instructions for SAS
Hear the author speak (2MB)
D. A virtual analysis lab
This is an on-line statistical package that allows you to analyze your own data or other data provided by the lab itself. At the moment it includes the data sets from:
The Rice Virtual Lab in Statistics case studies. Introduction to the Practice of Statistics by Moore and McCabe. Statistical Methods for Psychology by David Howell.
More data sets are planned and you are invited to submit data sets to the lab. Students can take an on-line course for credit using all these resources by enrolling in the Rice Summer School.
This is an NSF project to "create a collection of modules that surprise and engage students - the 'Gotcha!' of probability." It has the nicest illustration of the birthday problem we have seen. Random birthdays float onto a year's calendar with date boxes indicating the numbers of birthdays for that date. Other applets simulate the coupon problem, the matching problem, a tree problem, and an experiment to see if you can tell the difference between a Bernoulli trial sequence and a dependent sequence. You will also find here a detailed syllabus for a first probability course based on Introduction to Probability by Grinstead and Snell. This text is available on the web (www.dartmouth.edu/~chance/teaching_aids). Holmes has added her own materials and links to other sites for each class in the course.
We find here a comparison of the average age at death of rockstars with that of the general population. The owner of the web site claims to have made an honest search for the age at death of rockstars and found the average to be 36.9 years, compared to 75.8 years for the U.S. population. You will find a list of the 317 rockstars that he used for his estimate, including cause of death. The most frequent cause of death was heart attack (41) followed by drug overdose (40).
(1) The web site is sprinkled with quotes from the Bible such as: The fear of the LORD prolongeth days: but the years of the wicked shall be shortened (Proverbs 10:27). Does this suggest a bias of the owner of the web page?
(2) Do you believe that this is a reliable estimate for the average lifetime of rockstars?
Leap years are determined in a way which depends on the number of the year modulo 400. But in 400 years, including 97 leap years, there are exactly 20,871 weeks. Hence the calendar repeats every 400 years. In 400 years the thirteenth of the month occurs just 4800 times, distributed among the different days of the week as follows:

Sunday  Monday  Tuesday  Wednesday  Thursday  Friday  Saturday
687     685     685      687        684       688     684
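The counts quoted above can be checked by brute force. The following is a minimal sketch that uses Python's standard calendar arithmetic; the starting year 2000 and the function names are our own choices for illustration, and any 400 consecutive Gregorian years should give the same totals.

```python
from collections import Counter
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def thirteenth_counts(start_year=2000):
    """Tally the weekday of every 13th over one 400-year Gregorian cycle."""
    counts = Counter()
    for year in range(start_year, start_year + 400):
        for month in range(1, 13):
            # date.weekday() returns 0 for Monday through 6 for Sunday
            counts[DAY_NAMES[date(year, month, 13).weekday()]] += 1
    return counts


def days_in_cycle(start_year=2000):
    """Number of days in 400 consecutive Gregorian years."""
    return (date(start_year + 400, 1, 1) - date(start_year, 1, 1)).days


counts = thirteenth_counts()
total_days = days_in_cycle()

print("days in 400 years:", total_days, "remainder mod 7:", total_days % 7)
for name in DAY_NAMES:
    print(name, counts[name])
```

Because the number of days in the cycle is divisible by 7, the pattern repeats exactly and the small excess for Friday never averages out.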
(1) Robinson remarks: If there were not an exact integral number of weeks in 400 years, there would be an equal chance for the thirteenth of the month to fall on each day of the week. Is this correct?
(2) (Bob Drake) Is Friday the 13th also the most likely combination of a day of the month and day of the week?
Could it be that our capacity for happiness is, in large part, genetically determined? That is the thesis of the book "Happiness," by David Lykken. A retired University of Minnesota psychologist, Lykken is best known for his studies of twins. Such studies provide evidence for a "happiness set point," around which our emotional state fluctuates during good and bad times, but to which we eventually return in the long run. Lykken's research found that one identical twin's sense of well-being could be predicted from the other twin's score, or from the first twin's score at an earlier time.
Lykken's conclusions of course run contrary to popular opinion. Most people feel that winning the lottery or getting a big promotion would make them happier. Lykken would agree in the short term. But in the longer term, he sees it as part of "nature's plan" that a person gradually returns to his or her original happiness set point. Similarly, after a period of grieving following the loss of a loved one, people return to being their old selves. He describes this ability to adapt to changing conditions as a selective advantage, improving our chances of producing viable offspring. Nevertheless, Lykken believes people can live happier lives. The key is to tune in to the small things that make us happy. The accumulation of many happy moments is the basis for a happier life.
Lykken's views open up a nature versus nurture debate, and some professional psychologists disagree with his conclusions. Byron Egeland, a professor of child development at the University of Minnesota, argues that our experiences at an early age can strongly affect whether we view the world as a supportive place in which we can ultimately be happy.
(1) Do you think that happiness is quantifiable?
(2) Lykken proposes a measurement scale in units called haps. "A good meal is worth about one hap; if you prepared the meal yourself and others share (and appreciated) it, it might yield two haps...A hole in one would be worth one hap to me, because of the novelty of the experience, but perhaps several haps to a real golfer, who would regard it as proof of skill." What do you think of this?
(3) See if you can find how large the correlation was for the twin study. It would be nice to have this data as an example of correlation.
There has been a recent scare in Britain over the safety of genetically modified foods. Without taking a position on this issue, this article urges readers to be cautious when evaluating risks, because probabilities can be counterintuitive. Three classic examples from the literature are cited as evidence: the birthday problem, the Monty Hall problem and the false-positive problem.
The version of the birthday problem presented here asks for the chance that in a group of 25 randomly selected people, two or more will have the same birthday (correct answer: about 0.57). One respondent at the Economist reportedly said: "I know this. It's much bigger than you think. One in four." So maybe even seeing the answer once doesn't make it stick!
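The figure of about 0.57 can be reproduced with a short calculation. This is a minimal sketch under the usual textbook assumptions, which the article does not spell out: 365 equally likely birthdays, no leap days, and independent people. The function name is ours.

```python
def prob_shared_birthday(n, days=365):
    """Chance that at least two of n people share a birthday,
    assuming `days` equally likely, independent birthdays."""
    p_all_distinct = 1.0
    for k in range(n):
        # the (k+1)-th person must avoid the k birthdays already taken
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


for group_size in (10, 23, 25, 40):
    print(group_size, round(prob_shared_birthday(group_size), 3))
```

For 25 people the result agrees with the value quoted above, well above the respondent's guess of one in four.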
Here is the Economist's explanation for why it makes sense to switch doors when Monty offers. "The point is, your chance of winning the car was one in three to begin with--and after Monty reveals a goat, the probability that your box has the car is still just one in three. Because Monty's choice was not random (he didn't open just any box, he revealed a goat) the remaining probability of two-thirds gets squeezed, as it were, into the third box." But, noting that "discussions of this point sometimes prove violent," the article provides a tree diagram at the bottom of the page.
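For readers who would rather count cases than draw a tree, here is a minimal sketch that enumerates every possibility exactly. It assumes the standard rules, which are our reading of the puzzle rather than something the article states in full: the car is equally likely to be in any of three boxes, Monty always opens a box hiding a goat that you did not pick, and when he has two goats to choose from he picks either with equal chance.

```python
from fractions import Fraction


def monty_hall_win_probabilities():
    """Return exact (stay, switch) winning probabilities for three boxes."""
    stay = Fraction(0)
    switch = Fraction(0)
    boxes = range(3)
    for car in boxes:
        for pick in boxes:
            weight = Fraction(1, 9)  # car position and first pick are uniform
            # Monty may open any box that is neither your pick nor the car
            openable = [b for b in boxes if b != pick and b != car]
            for opened in openable:
                w = weight / len(openable)
                # the one box left after your pick and Monty's opened box
                other = next(b for b in boxes if b not in (pick, opened))
                if pick == car:
                    stay += w
                if other == car:
                    switch += w
    return stay, switch


stay, switch = monty_hall_win_probabilities()
print("stay:", stay, "switch:", switch)
```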
Finally, here is the statement of the false-positive puzzle. "You are given the following information. (a) In random testing, you test positive for a disease. (b) In 5% of cases, this test shows positive even when the subject does not have the disease. (c) In the population at large, one person in 1000 has the disease. What is the probability that you have the disease?" The article states that most people answer 95%, when the answer is really 2%. The article explains that in a population of 1000 people, one will really have the disease, but another 50 will also test positive.
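The 2% answer follows from Bayes' rule. The sketch below is ours, not the article's; the puzzle gives the prevalence and the false-positive rate, but the `sensitivity` parameter (the chance that a person with the disease tests positive) is not stated, and the default of 1.0 is an assumption made here to match the article's count of one true positive per 1000 people.

```python
def prob_disease_given_positive(prevalence, false_positive_rate,
                                sensitivity=1.0):
    """Bayes' rule for P(disease | positive test).

    sensitivity=1.0 is an assumption; the puzzle does not give it.
    """
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


p = prob_disease_given_positive(prevalence=0.001, false_positive_rate=0.05)
print(round(p, 4))
```

In a population of 1000 this corresponds to roughly 1 true positive among about 51 positive tests.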
(1) Do you find the article's solution to Monty's problem convincing enough to ward off violent responses?
(2) What unstated assumption was used in the article's solution to
the false-positive problem?
Bottom line: Is it good for you? Or bad?
USA Weekend (Sunday supplement), 26-28 February 1999, pp. 8-9
In Chance News 8.02, we discussed the controversy surrounding a study published in the New England Journal of Medicine, which found no significant benefit from a high-fiber diet in terms of preventing colon cancer. This appears to contradict years of recommendations from medical experts, and is the kind of story that contributes to the public's confusion and skepticism about health advice.
The present article highlights several such problems (a sidebar is entitled "Would you like a contradiction with your coffee?"). It reports that a meeting between journalists and researchers was organized by the Harvard School of Public Health and the Society for Epidemiological Research, to discuss how to deal with these issues. The attendees noted that there were many places to lay the blame:
1. Some peer-reviewed journals now use public relations firms to promote their latest issues. Similarly, universities have an incentive to hype their own research in order to attract more funding.
2. The media feel pressure to pursue what Richard Harris of NPR calls the "news-you-can-use angle." As a result, "...every little twitch like oat bran becomes a huge trend. Science moves more slowly."
3. Scientists who have worked for years in relative obscurity may be seduced into overstating their case when they find themselves in the media spotlight.
4. The public's obsession with new findings inevitably leads to overreaction.
What is to be done about this? The article recommends that we educate everybody--scientists, media and the public. Journal articles are now appearing that teach scientists how to present findings to the media. The American Council on Science and Health publishes a series of booklets entitled "The Causes Of" intended to distinguish fact from speculation. The Council is working on a book for journalists. For consumers, moderation and eating a balanced diet is the most important message. It's not flashy news, but in the long run it may be the best response to all the controversies.
(1) Related to the idea of daily health updates, we see daily public opinion reports based on call-in polls and other forms of voluntary response. Do you think there is public demand for these, or are the news media foisting them on us because such polls are cheap to produce?
(2) Who do you think is most to blame: the journalists or the researchers?
Those who shop online for books at Amazon.com know that the company reports sales rankings for each item. How much sales activity does it take to create significant movement in the ranks? The author of this article knew the story of an eight-year-old book whose rank jumped from beyond 30,000 to 166 in a matter of days, apparently as a result of a passing reference made to it in an online column.
Obviously, this called for a test. The author selected Thomas Carlyle's 1837 "The French Revolution," whose rank was 92,010, as a sufficiently obscure candidate. The test consisted of buying one copy of the book each day for a week. The first purchase was made on a Monday. Tuesday, the book had moved to 77,392, and the second purchase was made. As the process continued through the week, the book ranked 69,967 on Wednesday, 62,741 on Thursday and 57,982 on Friday. On Saturday, the author forgot to log on, but the rise continued anyway. On Sunday, the book's rank was 54,362, and another copy was purchased. As it turns out, this was Superbowl Sunday, which the author speculates must be a bad day for booksellers. That's because Monday the rank had hit 2923!
Curiously, this was the highest rank obtained. Despite a purchase on Monday, the book had fallen to 39,338 by Tuesday. Subsequent purchases failed to raise it above 30,000--even a last-ditch attempt by purchasing five copies at once.
A phone call to Amazon shed a little light on the process. A company representative brought up sales figures for the book in question. Although company policy forbids disclosing these, she did acknowledge that an incremental change in sales activity can have a strong impact on a lower-ranked title. Then she added, "Hmmm. I can see your traffic pattern."
(1) Based on this evidence, how do you think Amazon's ranking system might work?
(2) What do you think of this experiment? Do you think the author should have followed (without purchase) some other works on the French Revolution?
The placebo controlled experiment is the so-called "gold standard" in clinical trials research. Yet such procedures are not free from ethical controversies. The article describes the unfortunate effects of using placebos in three studies of new anti-psychotic drugs over the last decade. Overall, more than 850 seriously schizophrenic patients were given sugar pills instead of existing drugs known to be effective. Up to 70% of these patients experienced deterioration in their condition severe enough that they had to be removed from the trial. One patient committed suicide while on a placebo. Critics have objected that the mentally ill are being used "like lab rats."
In fact, for tests of new drugs current FDA policy requires the use of placebo rather than existing drugs. But a Boston Globe investigative report last November highlighted the problems with this policy in the case of psychiatric research. It found that many of the patients enrolled in the study were not mentally competent to give "informed consent," since they could not understand the implications of being put on placebo treatment.
The Massachusetts Department of Public Health has responded by putting on hold two studies involving placebos. The American College of Neuropsychopharmacology, an international research group, has begun a debate on drug-testing practices in general. Referring to the experiments described above, Dr. Charles Weijer of Canada's Dalhousie University says "This is probably the most harmful ongoing research abuse in North America."
(1) The ethical issues here are quite compelling, of course. But in other situations, what do you think is the reason for preferring placebos rather than comparing new treatments to existing drugs?
(2) What do you think the rules are as to when you can use a placebo in a controlled experiment? Whose rules are these?
We have a new Chance Video of a talk given by epidemiologist John Baron who reviews the studies on the benefits and risks of daily aspirin. The Washington Post article in November explains why we need such a review.
In November the American College of Chest Physicians (ACCP) recommended that those 50 or older with at least one risk factor for heart disease take aspirin daily to decrease their chance of having a heart attack. This came only one month after the Food and Drug Administration explicitly told healthy Americans not to take aspirin daily to prevent heart attacks.
The ACCP based its recommendation on a review of findings from the ongoing Harvard Nurses Health Study. This study showed a 30 percent decrease in heart attacks for women aged 50 and over who took aspirin daily compared to a similar group who did not take aspirin daily.
This is consistent with the finding of the classic Physicians Health Study. This study involved 22,000 US doctors randomized to take aspirin every other day or placebo. The physicians study was stopped early because of the dramatic results observed. While there was no significant decrease in deaths for the aspirin group, those 50 years or older who took aspirin had 33% fewer heart attacks than those who had the placebo.
According to this article, much of the FDA concern comes from the fact that regular use of aspirin can cause bleeding throughout the body. In the brain this bleeding is the cause of hemorrhagic stroke which accounts for 20 to 25 percent of all strokes.
The FDA, the American College of Cardiology, and the American Heart Association all are waiting for more data to make a recommendation for healthy patients. Aspirin is recommended by the FDA for the treatment of heart disease.
In his talk, John Baron reviews the studies to date. He reported that new studies indicate that daily aspirin can also prevent certain types of cancer including colon cancer. However, unlike the benefit for preventing heart disease, the cancer protection comes only after taking aspirin regularly for ten to fifteen years.
Based on his review, Baron concludes that he would recommend regular aspirin for those who have had heart problems, or have significant risk factors for heart disease. For others, the protection of daily aspirin against dying of heart disease is just about balanced out by the risk of dying from bleeding caused by the aspirin. Baron concludes that he would not recommend daily aspirin for those without risk factors for heart disease. He remarks that while even small amounts of aspirin (baby aspirin) daily or every other day seem to be effective, how small the dose can be has not been settled. Obviously, the smaller the dose the better in protecting against bleeding problems.
(1) The organizations that do not make a recommendation on whether healthy people over 50 should take aspirin regularly say that such people should consult their doctor. Do you think this justifies their not making a recommendation?
(2) Read the Washington Post article and listen to John Baron's Chance Video and comment on what your recommendation would be for a member of your family 50 years or older.
Is the war on cancer in the United States being won? How would one decide which statistics to use in trying to decide the answer to this question? In this article, the first statistic cited is that during the 1991-1995 period, the death rate for all cancers combined fell by 2.6%. But is this a trend, or a fluke? One might argue that surely, with the large sample size under consideration (namely the U. S. population), this result is statistically significant. However, this article goes on to point out that many cancers remain undiagnosed unless an autopsy is performed, and the rate at which autopsies are performed has dropped in recent years.
It is claimed by the 'cancer establishment' that half of this overall improvement in cancer death rates is attributable to changes in 'life style' factors: smoking, diet, exercise. However, Richard Clapp, at the Boston University School of Public Health, notes that heart disease is, to a large extent, attributable to smoking, poor diet, and lack of exercise. Yet, over the past 25 years, the heart disease death rate has declined by 49%. It is not clear how to reconcile these two statements.
Finally, there is a discussion of carcinogens, and the problems associated with their identification. First, in a typical test for carcinogenicity, groups of about 50 mice or rats of each sex are exposed to various levels of the chemical, after which the animals are killed and examined for cancer. In the human population, both the dosage and the population size are very different than in the test, and it is very hard to extrapolate from the test results. For example, if a chemical causes cancer in 1 out of 10,000 humans who are exposed to it, one would probably not be able to see any effect of this chemical on the animals in the test. Nevertheless, in a population as large as the United States, such a rate translates into thousands of cases.
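The arithmetic behind this example can be sketched as follows. The risk of 1 in 10,000 and the group size of 50 come from the text. The exposed population of 270 million is purely our illustrative assumption (roughly the U.S. population at the time, with everyone exposed), and the sketch also assumes, for simplicity, that the animals face the same per-individual risk as humans.

```python
def prob_at_least_one_case(rate, group_size):
    """Chance of seeing one or more cases among independent individuals."""
    return 1.0 - (1.0 - rate) ** group_size


RATE = 1 / 10_000                # from the text: 1 cancer per 10,000 exposed
GROUP_SIZE = 50                  # from the text: animals per test group
ASSUMED_EXPOSED = 270_000_000    # hypothetical: everyone in the U.S. exposed

p_detect = prob_at_least_one_case(RATE, GROUP_SIZE)
expected_cases = RATE * ASSUMED_EXPOSED

print("chance a test group shows any case:", round(p_detect, 4))
print("expected cases in the assumed population:", round(expected_cases))
```

Under these assumptions a test group would show even a single case only about one time in two hundred, while the same rate applied to the whole population gives tens of thousands of cases.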
In addition, chemicals that cause liver cancer in rats might cause some other type of cancer in humans. Thus, if we see a rise in the rate of a certain type of cancer, it is extremely difficult to attribute that rise to any particular carcinogen. According to the National Cancer Institute, `there is no adequate evidence that there is a safe level of exposure for any carcinogen.'
The article concludes that if one excludes lung cancer (the incidence rate of which has fallen, due to fewer people smoking cigarettes), then the incidence rate of all other cancers, taken as a group, has increased by an average of 0.8% per year over the past 45 years, even though the death rate has fallen. Is it fair to say, given this statement, that we are at a turning point in the war on cancer, as Donna Shalala, the U. S. Secretary of Health and Human Services, declared in 1996?
(1) Is 1 cancer case out of 10,000 exposures a worrisome rate? If so, how would you decide what rate would be acceptable? How should society decide how much money is worth spending to lessen the exposure rate to an acceptable level? Is it possible to lower this rate to zero for a given chemical?
(2) If you were to try to decide whether the war on cancer was being won, would you look at the incidence rate or the death rate?
(3) The incidence rates of many other causes of death have decreased over the last 45 years. Should this be taken into account when trying to think about the war on cancer? If so, how would you propose doing this?
Passive smoking is the term used to describe the exposure of humans to the cigarette smoke of others. The most basic kind of question that is raised concerning passive smoking is whether it increases the risk of certain types of illnesses, such as lung cancer or heart disease. This is an important question from the point of view of public health, since, if there is an increased risk for the non-smoking population, then it becomes incumbent upon public health officials to drastically lessen the exposure of the population to second-hand smoke.
This article describes a 'meta-analysis' of the risk of coronary heart disease and passive smoking. A meta-analysis consists of the drawing together of data from many studies that have already been carried out. As might be imagined, such an analysis is a very tricky thing to carry out correctly. In an editorial in the issue of the New England Journal of Medicine that contains the present article, John Bailar III, a medical doctor at the University of Chicago, describes some of the general problems that can plague meta-analyses. These include publication bias, i. e. the possibility that only studies that lean in one direction on a specific question are published, and bias on the part of the team carrying out the meta-analysis. Another possible problem is that different studies concerning a given question may not be homogeneous; they might be radically different in their size, one might be a prospective cohort study, while another might be a case-control study. The studies might not be of equal quality; in this case, how does one measure the quality of a study, and does one weight the results of the higher quality studies more heavily in the meta-analysis?
The most important result claimed by the authors is that non-smokers who are exposed to second-hand smoke have a relative risk of 1.25 as compared with non-smokers who are not exposed to second-hand smoke. The 95% confidence interval for this estimate is stated to be (1.17, 1.32). In his editorial, Bailar states that the generally reported increase in incidence of coronary heart disease among smokers is 75% (i.e. the relative risk is 1.75). He goes on to say that he is troubled about the above figure of 1.25 for non-smokers, since the non-smokers are exposed to far more diluted smoke than are the smokers. He compares these two relative risk figures to the corresponding ones for lung cancer, which he says are 1.25 for non-smokers exposed to second-hand smoke, and 13.0 for smokers. He says that another way to understand these last two figures is to say that 'the added risk of lung cancer that is due to environmental tobacco smoke may be about 2% of the risk associated with active smoking.' Note that in the present article, the corresponding percentage is 33% (using the two relative risk figures of 1.25 and 1.75).
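The 2% and 33% figures come from comparing excess risks, that is, relative risk minus 1. A minimal sketch of that comparison, with a function name of our own choosing:

```python
def excess_risk_share(rr_passive, rr_active):
    """Added risk from passive smoking as a fraction of the added risk
    from active smoking, where added risk = relative risk - 1."""
    return (rr_passive - 1.0) / (rr_active - 1.0)


lung_cancer = excess_risk_share(rr_passive=1.25, rr_active=13.0)
heart_disease = excess_risk_share(rr_passive=1.25, rr_active=1.75)

print("lung cancer:   {:.1%}".format(lung_cancer))
print("heart disease: {:.1%}".format(heart_disease))
```

The contrast between the two shares is what troubles Bailar: passive smoke would have to be about a third as harmful as active smoking for heart disease, but only about a fiftieth as harmful for lung cancer.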
Bailar concludes by stating that in his opinion, 'we still do not know, with accuracy, how much or even whether exposure to environmental tobacco smoke increases the risk of coronary heart disease.'
(1) Another concern about meta-studies is that, in a significant number of cases where there has also been a large traditional study, the two reach opposite conclusions. If you saw two such studies that disagreed, which would you believe?
(2) Of course critics of a meta-study can also be biased. Would you suspect the authors or the critic to be biased in this case?
City Centre Mall, a huge, gray, pedestrian mall in downtown Middletown, was built in the 70's and has long been little-to-unused. The article describes the painful politicking concerning just what is to be done and how much money the project will consume. Of interest to the city commissioners is a plan that the good people of Middletown will support.
In order to measure the pulse of the populace, the commissioners asked for volunteer suggestions. Over a course of 90 days, "about" 100 responded. The commissioners are planning a course of action based on these 100 responses.
More recently, The Journal asked for a similar volunteer response. Over 10 days, 65 responded with an entirely different replacement plan. An article about the survey appeared last Sunday.
On Tuesday, the commissioners publicly criticized the article. "They said the limited number of respondents--65 in 10 days' time; in comparison, the city had about 100 respondents in 90 days--makes the survey worthless.
'In my opinion, and I took statistics in college, it's very unscientific and unfair,' (Commissioner Paul) Nenni said. 'In my opinion, that article was not worth publishing.'"
(1) Do you agree with the commissioners? Is the time span for the collection of the responses to the two surveys important?
(2) Did Commissioner Nenni take a Chance course? Should statistics instructors now be worried that they may be named in future articles? Would you like to know what grade Commissioner Nenni received in his statistics course?
This article reports an interview with Warren Weaver. Weaver is best known in our field for his book "Lady Luck". At the time of this article Weaver was working for the Rockefeller Foundation and gave the Dartmouth mathematics department its first grant. It was a small grant by modern standards but had a huge effect on the development of the Dartmouth math department.
In this interview he tells the public how they can make money by probability shell games. For example, he suggests that you take three cards and mark a red dot on both sides of the first one, a black dot on both sides of the second card and a red dot on one side and a black dot on the other side of the third one. Then ask your friend to mix up the cards and put one down. If Weaver sees a red dot on the side of the card facing up he bets that it is red on the other side. If it is black he bets it is black on the other side. Of course, you are correct two-thirds of the time and if you can get your friend to bet even money you are in business.
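The two-thirds figure can be confirmed by listing the six equally likely ways a card face can end up showing. This is a minimal sketch; it assumes the card and the side facing up are both chosen uniformly at random, and the names are ours.

```python
from fractions import Fraction

# Each card is a pair of face colours, as described in the text
CARDS = [("red", "red"), ("black", "black"), ("red", "black")]


def prob_back_matches_front():
    """Exact chance that the hidden side has the same colour as the
    visible side, over all six equally likely (card, side up) outcomes."""
    outcomes = [(card[i], card[1 - i]) for card in CARDS for i in (0, 1)]
    matches = sum(1 for up, down in outcomes if up == down)
    return Fraction(matches, len(outcomes))


print(prob_back_matches_front())
```

The tempting argument that a red face leaves two candidate cards, and so an even chance, overlooks that the red-red card shows red twice as often as the mixed card does.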
Another proposal is that you suggest to your friend that you choose a set of random numbers. "Let him take the census counts since 1790 in the Queens if he likes (it's on page 158 of the new World Almanac)." Now you lay a dollar on what number appears first on each figure in the column. If the first figure is 68,388 then the first number of that figure is 6. To give your friend a break you will bet it is one of the numbers 1,2,3,4 and let your friend win if it is 5,6,7,8,9. If he gets suspicious when you seem to be winning too often, offer to switch with him and this time choose the first digits of the numbers in the telephone book.
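One common explanation for bets of this kind, which the summary above does not name, is Benford's law: in many collections of real-world figures the leading digit d occurs with probability log10(1 + 1/d). The sketch below only computes what that law would imply for the digits 1 through 4. Whether the census column follows it is our assumption, and nothing here addresses the telephone book.

```python
import math


def benford_prob(digit):
    """Probability of a given leading digit (1-9) under Benford's law."""
    return math.log10(1.0 + 1.0 / digit)


p_low = sum(benford_prob(d) for d in range(1, 5))    # leading digit 1-4
p_high = sum(benford_prob(d) for d in range(5, 10))  # leading digit 5-9

print("P(first digit is 1-4):", round(p_low, 3))
print("P(first digit is 5-9):", round(p_high, 3))
```

If the law applied, the bettor taking four of the nine digits would still win roughly seven times in ten.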
(1) Why does Weaver win in his card game?
(2) Why does he win before and after switching in his first digit game? How much should he win in each case?
A Washington-based nonprofit organization, the Surface Transportation Policy Project (STPP) has released two reports: The Texas Transportation Mobility Study and Aggressive Driving: Are you at risk?. The LA Times article discusses the aggressive driving report.
Using federal data, STPP ranked metro areas with a million residents or more. They found that among large metropolitan areas, the one with the highest fatality rate due to aggressive driving crashes was Riverside-San Bernardino, California, with a rate of more than 13 deaths per 100,000 residents. Their results showed that driving deaths are much higher in places with uncontrolled sprawl development, where the car is the only way to get around. The places with the least aggressive driving tend to be older areas with more neighborhoods having grid street patterns, sidewalks, and more developed transit systems. The large metro areas with the lowest aggressive driving death rates include Boston, with two deaths per 100,000 people due to aggressive driving. This will come as a shock to anyone who has driven in Boston. New York and Minneapolis had similar low death rates.
The LA Times article discusses primarily the California problem and the reasons for their winning rage record.
The Mobility study is a long report which ranks areas according to the following indices:
Roadway Congestion Index--cars per road space
Travel Rate Index--amount of extra travel time
Delay per eligible driver--annual time per driver
Delay per capita--annual time per person
Wasted fuel per eligible driver
Wasted fuel per capita
Congestion cost per eligible driver
Congestion cost per capita
From this report we read:
The annual traffic congestion study is an effort to monitor roadway congestion in major urban areas in the United States. The comparisons to other areas and to previous experiences in each area are facilitated by a database that begins in 1982 and includes 70 urbanized areas.
(1) What kinds of driving behavior do you think the authors of the report considered as aggressive driving?
(2) What do you think are some of the reasons for the record established by the Riverside-San Bernardino drivers? What could they do to improve the situation?
(3) The Mobility report made a number of recommendations to decrease congestion. What recommendations do you think they made?
Statistics, the King's Deer And Monsters Of the Sea
How many species of sea monsters are left to be discovered? The correct answer is 47, give or take a couple, claims Charles Paxton of Oxford University.
Paxton isn't a prophet. He's a marine biologist who has created a statistical model that, he claims, predicts how many species of creatures measuring more than two meters--about six feet--still swim, slither or crawl undiscovered in the world's oceans.
Of course his prediction may be off a bit, he allows in a recent issue of the Journal of the Marine Biological Association of the United Kingdom. It also may just be possible that there's nothing new of any size lurking in the depths.
But Paxton doesn't think so. Buoyed by the results of his statistical model, he's confident that there are probably many more marine mammoths out there--including, perhaps, "a couple of new totally weird sharks"--and maybe even another Moby Dick. "Field observations suggest one or two species of odontocetes [toothed whales] probably await capture and formal description," he reported in his article.
His confidence is based on the results of a statistical technique frequently used by ecologists to estimate the diversity of populations, based on only a small sampling. It's derived from the work of the English mathematician and biologist Sir Ronald Aylmer Fisher.
But Paxton's work is reminiscent of a tale, perhaps apocryphal, told about the famed French mathematician Simon Denis Poisson (1781-1840), whose achievements were so notable that he's honored with a plaque on the first level of the Eiffel Tower. The late Stephen Withey of the University of Michigan liked to tell the story to skeptical graduate students (including your Unconventional Wiz) in trying to prove that it really is possible to count the uncountable.
In the early 1800s, the king of France summoned Poisson. His highness wanted to know how many deer there were in the Royal Forest. It seemed that the royal gamekeeper had a sharp eye but a dull mind: He could easily recognize individual deer after seeing them just once (yeah, we didn't believe it, either--but it made the story better). When could the poor rustic stop counting the deer, Poisson was asked. And how certain could the king be that his count was accurate?
Poisson came up with this solution: He told the gamekeeper to search the forest every day and tally every new deer he found. He cautioned the man not to add to the count when he came across any deer he had previously seen. Of course, there were lots of first-time sightings on the first day. The cumulative total increased only modestly on the second day. Every day thereafter, the total grew by smaller and smaller amounts.
At the end of each day, Poisson dutifully plotted each updated total on a graph. Then he connected the dots, creating a line that grew at a progressively slower rate and tended to reach what mathematicians call a "fixed upper limit"--the point at which one could safely predict that the chances of finding a previously undiscovered deer would be overwhelmingly small.
Roughly speaking, what Poisson is said to have done with the king's deer, Paxton has done with creatures of the sea. Paxton consulted the 10th edition of Carolus Linnaeus's "Systema Naturae," first published in 1758, and identified all saltwater animals more than two meters long. He found about 100 species. Then he searched subsequent scientific literature through 1995 and recorded the discovery by year of new species that met his "monster" test.
He found that, by 1995, the overall number of really big sea creatures had reached 217. But the rate of discovery had dropped dramatically (we're currently discovering one new big marine animal on average every 5.3 years, Paxton reported). He then graphed the data between 1830 and 1995 and estimated its upper limit, which suggests there are probably about 47 more species of whoppers still eluding us in the world's oceans.
In reading the Poisson story, the first thing that came to our mind was that the remarkable ability of the gamekeeper to distinguish every deer he has seen suggests that the King might also have used the capture-recapture method, which the Census Bureau uses for the undercount problem, as follows. The King asks the gamekeeper to make a serious effort to see as many deer as he can on a specific day. Think of the deer he sees on this first day as tagged. Then on the next day he makes an even greater effort to see as many as possible. Suppose the gamekeeper sees 35 deer on the first day and 60 on the second day, noting that 20 of these 60 deer he also saw on the first day. Then, assuming there are N deer walking around the forest more or less at random, the proportion of tagged deer in the second day's sample, 20/60, should be approximately the proportion of tagged deer in the forest, 35/N. Thus the King can estimate the number of deer in the forest to be 3*35 = 105. Thinking about the validity of the assumptions made in this hypothetical problem suggests problems that the Census Bureau must adjust for in their real world use of the capture-recapture method in Census 2000.
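The arithmetic in this capture-recapture estimate can be sketched in a few lines of Python. The function name below is our own choice for illustration. The estimate is the one usually called the Lincoln-Petersen estimate, and it rests on the assumption stated above that the deer mix more or less at random between the two days.

```python
def capture_recapture_estimate(first_count, second_count, seen_both):
    """Estimate the population size N from two days of sightings.

    Sets seen_both / second_count equal to first_count / N and solves for N.
    """
    if seen_both <= 0:
        raise ValueError("need at least one animal seen on both days")
    return first_count * second_count / seen_both


# The gamekeeper's numbers: 35 deer on day one, 60 on day two, 20 seen on both.
print(capture_recapture_estimate(35, 60, 20))  # 105.0
```

If no deer is seen on both days the method gives no estimate at all, which is why the sketch raises an error in that case.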
Let's turn now to solving the problems that the King and Paxton posed. In the deer problem, it might be reasonable to assume that each deer has the same probability of being seen in a day. However, for the monster fish problem the probability of seeing a particular species of large fish in a given year would surely depend on the species. Thus to have a solution that will apply to both problems we will let the probability of seeing a particular deer differ from deer to deer.
Poisson suggested plotting the numbers that the gamekeeper saw each day for a series, say, of 50 days. When you do this you get an increasing set of points that suggest that a curve could be fitted to them that looks a bit like a parabola. This curve would be expected to level off when all the deer had been seen and the value at which it levels off would be our estimate for the total number of deer in the forest. Our problem is to determine the curve that best fits this data.
Assume there are M deer in the forest, which we label 1,2,3,...,M. Let's assume that on a given day the gamekeeper sees deer i with probability p(i), independently from day to day. Then the probability that he has not seen the ith deer after n days is (1-p(i))^n, and so the probability that he has seen this deer after n days is 1-(1-p(i))^n. Thus the expected number of deer the gamekeeper has seen after n days is the sum of 1-(1-p(i))^n for i = 1 to M. If we could estimate the probabilities p(i) this would provide a natural curve to fit to the data given by the gamekeeper.
The original deer problem would have all the p(i)'s the same, say p. In this case our expected curve is M(1-(1-p)^n). Of course, we do not know M and p. We can choose the curve that best fits the data by considering a range of M and p values and choosing the pair that minimizes the sum of the squares of the differences between the observed and predicted values. We carried out this procedure by simulation, assuming M = 100 and p = .05 and that the gamekeeper reported for 50 days. The best fit curve predicted the values M = 100 and p = .05 quite accurately.
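A simulation of this kind might look like the following Python sketch. It is not the program we used. The function names, the random seed and the grid of candidate M and p values are all assumptions made for illustration.

```python
import random


def simulate_cumulative_counts(M, p, days, rng):
    """Cumulative number of distinct deer seen after each day.

    Each of the M deer is seen on a given day with probability p.
    """
    seen = [False] * M
    totals = []
    for _ in range(days):
        for deer in range(M):
            if rng.random() < p:
                seen[deer] = True
        totals.append(sum(seen))
    return totals


def expected_seen(M, p, n):
    """Expected number of distinct deer seen after n days: M(1-(1-p)^n)."""
    return M * (1 - (1 - p) ** n)


def fit_by_grid(totals, M_values, p_values):
    """Return the (M, p) pair on the grid with the smallest sum of squares."""
    best = None
    for M in M_values:
        for p in p_values:
            sse = sum((obs - expected_seen(M, p, n)) ** 2
                      for n, obs in enumerate(totals, start=1))
            if best is None or sse < best[0]:
                best = (sse, M, p)
    return best[1], best[2]


# Hypothetical search grid: M from 60 to 180, p from 0.01 to 0.15 in steps of 0.002.
M_GRID = range(60, 181)
P_GRID = [k / 500 for k in range(5, 76)]

rng = random.Random(1)  # fixed seed so the run is repeatable
totals = simulate_cumulative_counts(100, 0.05, 50, rng)
M_hat, p_hat = fit_by_grid(totals, M_GRID, P_GRID)
print("estimated M:", M_hat, "estimated p:", p_hat)
```

Changing the seed plays the role of sending out a different gamekeeper, so rerunning with several seeds gives a feeling for how much the estimates vary.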
For the species problem the p(i)'s are different, and so the problem is more difficult. Now we have M different values to estimate and fewer than that number of data points. Here researchers take advantage of the shape of the predicted curves, which rise and then level off. They replace the complicated expected value curve by the much simpler hyperbola of the form y = sx/(b+x), which levels off at s, where x is the time observed and y is the total number of species seen by this time. Now there are again only two parameters, s and b, to estimate, and again we can do this by choosing s and b to minimize the sum of the squares of the differences between observed and predicted values. To see how this works we simulated this experiment. We assumed that there are 100 different species and chose their probabilities of being seen in a given year at random between 0 and .1. We had the program find the best fit curve and estimate the total number of species from this curve. Using 100 species, we found that the resulting estimates were not too bad. However, the resulting hyperbolic curve did not fit the data very well. Even if it did fit the observed points, different models could fit just as well but give quite different limits. Curve fitting without an underlying model is a risky business.
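The hyperbola experiment can be sketched in Python as follows. This is an illustration rather than our original program. It assumes the saturating form y = s*x/(b + x), whose limit for large x is s, and the 50 years of observation, the seed and the grid of b values are our own choices. For each candidate b the best s has a closed form, so only b needs to be searched.

```python
import random


def simulate_species_counts(probabilities, years, rng):
    """Cumulative number of distinct species seen after each year."""
    seen = [False] * len(probabilities)
    totals = []
    for _ in range(years):
        for i, p in enumerate(probabilities):
            if rng.random() < p:
                seen[i] = True
        totals.append(sum(seen))
    return totals


def fit_hyperbola(totals, b_values):
    """Least-squares fit of y = s*x/(b + x) to the yearly totals.

    For a fixed b the model is linear in s, so the best s is
    sum(y*f) / sum(f*f) with f = x/(b + x). Returns (s, b).
    """
    best = None
    for b in b_values:
        f = [x / (b + x) for x in range(1, len(totals) + 1)]
        s = sum(y * fx for y, fx in zip(totals, f)) / sum(fx * fx for fx in f)
        sse = sum((y - s * fx) ** 2 for y, fx in zip(totals, f))
        if best is None or sse < best[0]:
            best = (sse, s, b)
    return best[1], best[2]


rng = random.Random(2)  # fixed seed so the run is repeatable
probs = [rng.uniform(0, 0.1) for _ in range(100)]   # 100 species
totals = simulate_species_counts(probs, 50, rng)     # assumed: 50 years
B_GRID = [k / 2 for k in range(1, 401)]              # b from 0.5 to 200
s_hat, b_hat = fit_hyperbola(totals, B_GRID)
print("estimated total number of species:", round(s_hat, 1))
```

The fitted s is the estimate of the total number of species. Comparing it with the true value of 100 over several seeds shows how sensitive the limit is to the particular run.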
The work of Fisher referred to by Richard Morin was for a slightly different problem. His problem can be described as follows: the King now wants to know how many species of butterflies there are in his forest. He asks the gamekeeper to go out one day with his net and see how many species of butterflies he catches. The gamekeeper does this and reports that he caught 32 butterflies of 5 different species, with counts (12,7,5,3,5) for the different species. Fisher asked: how many new species would the gamekeeper expect to find on a second day of catching butterflies?
I. J. Good later gave an elegant new solution of Fisher's problem. Here is Good's argument.
It is natural to assume that the number of butterflies of a particular species that enter the gamekeeper's net has a Poisson distribution, since typically there are a large number of butterflies of a particular species and a small probability that any one would enter the gamekeeper's net. Let m be the mean number of butterflies of species s caught in a day. Then the probability that no butterfly of species s is caught on the first day is e^(-m) and the probability that at least one butterfly of species s is captured on the second day is 1-e^(-m). Therefore the probability that no butterfly of species s is captured on the first day and at least one is captured on the second day is:
e^(-m)(1 - e^(-m))
Using the series expansion for e^(x) in the second term of this product we can write this probability as:
e^(-m)(m - m^2/2! + m^3/3! - ...)
Now Good assumes that the mean m for a specific species is itself a chance quantity with unknown density f(m). Integrating our last expression with respect to f(m) we find that the probability that species s is not seen on the first day and at least one is captured on the second day is:
P(X = 1) - P(X = 2)+ P(X = 3) - ...
where X is the number of times species s is represented in the first day's capture. Summing this expression over all species we find that:
The expected number of new species found on the second day is
e(1) - e(2) + e(3) - ...
where e(j) is the expected number of species with j representatives on the first day. The e(j)'s can be estimated by the number of species represented j times on the first day's capture. Doing this we obtain an estimate for the number of new species we will find on the second day.
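As a small illustration, here is Good's alternating sum applied to the gamekeeper's catch, with each e(j) estimated by the number of species caught exactly j times. The function name is our own.

```python
from collections import Counter


def expected_new_species(species_counts):
    """Good's estimate e(1) - e(2) + e(3) - ... from one day's catch.

    species_counts holds the number of butterflies caught for each species.
    """
    freq = Counter(species_counts)  # freq[j] = number of species caught j times
    return sum((-1) ** (j + 1) * n_j for j, n_j in freq.items())


# The gamekeeper's catch: five species with counts 12, 7, 5, 3 and 5.
print(expected_new_species([12, 7, 5, 3, 5]))  # 3
```

Here e(3) = 1, e(5) = 2 and e(7) = 1 enter with a plus sign and e(12) = 1 with a minus sign, giving 3. With only five species the individual e(j)'s are very rough estimates, so the answer should be read as an illustration of the arithmetic and not as a reliable prediction.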
We have assumed that the sampling time was the same on the two days. If it is different, then our final sequence becomes
we(1) - w^2e(2) + w^3e(3) - ...
where w is the ratio of the time spent sampling on day 2 to the time spent on day 1.
The same procedure can be used to find an expression for the expected number of species in the second sample that occurred k times in the first sample. The result is
wC(k+1,1)e(1+k) - w^2C(k+2,2)e(2+k) + w^3C(k+3,3)e(3+k) - ...
where C(n,j) is the number of ways to choose j elements from a set of size n.
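The general series can be sketched in Python in the same way. The function name is hypothetical, and the e(j)'s are again estimated by the observed number of species caught exactly j times on the first day. The jth term is written with C(k+j,j), which for j = 1 equals k+1.

```python
from collections import Counter
from math import comb


def expected_seen_k_times_before(species_counts, k, w):
    """Estimate the number of species in the second sample that
    occurred exactly k times in the first sample.

    Sums (-1)^(j+1) * w^j * C(k+j, j) * e(k+j) for j = 1, 2, ...
    where w is the ratio of second to first sampling time.
    """
    freq = Counter(species_counts)  # freq[j] estimates e(j)
    if not freq:
        return 0.0
    total = 0.0
    for j in range(1, max(freq) - k + 1):
        total += (-1) ** (j + 1) * w ** j * comb(k + j, j) * freq[k + j]
    return total


# With k = 0 and w = 1 this reduces to e(1) - e(2) + e(3) - ...
print(expected_seen_k_times_before([12, 7, 5, 3, 5], k=0, w=1))  # 3.0
```

Setting k = 0 and w = 1 recovers the earlier estimate for new species, which is a convenient check on the signs and coefficients.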
Thisted and Efron used this method to determine if a newly found poem of 429 words, thought to be written by Shakespeare, was in fact written by him.
They interpreted a species as a word in Shakespeare's total works. The sample on the first day corresponds to the 884,647 words in known Shakespeare works. The words in the new poem constituted the second sample. They then used Good's method to estimate the number of words in the poem that were used k times in the previous works. Then they compared these estimated values to the actual values.
In this example, w = 429/884647 is sufficiently small that only the first term wC(k+1,1)e(1+k) = w(k+1)e(1+k) in our series need be used.
From this formula, we see that the expected number of words in the poem that were not used at all in previous works is we(1). There were 14,376 words in Shakespeare's original works that were used exactly once, so we estimate e(1) by 14376. This gives we(1) = 429*14376/884647, or an estimate of about 7 new words in the poem (the actual number was 9). In the same way we estimate that there should be 4 words that were used once in the previous works (the actual number was 7), and 3 words that were used twice (the actual number was 5). The authors concluded that this fit suggested that the poem was written by Shakespeare.
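The estimate of new words can be checked with a few lines of Python. The sketch uses only the figures quoted above and takes e(1) to be the count of words used exactly once, 14,376. The variable and function names are our own.

```python
POEM_WORDS = 429        # words in the newly found poem
CANON_WORDS = 884647    # words in the known Shakespeare works
USED_ONCE = 14376       # words used exactly once in the known works, e(1)

w = POEM_WORDS / CANON_WORDS


def first_term(k, e_k_plus_1):
    """First term of the series, w*(k+1)*e(k+1), for words used k times before."""
    return w * (k + 1) * e_k_plus_1


new_words = first_term(0, USED_ONCE)
print(round(new_words, 2))  # 6.97, which rounds to the estimate of 7
```

The same function gives the estimates for words used once or twice before when called with k = 1 or k = 2 and the corresponding counts e(2) and e(3), which are not quoted here.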
They made similar estimations for other Elizabethan poems by Shakespeare and other authors to see how well this method identified a work of Shakespeare. Their results showed that their method did distinguish works of Shakespeare from those of Donne, Marlowe and Jonson.
The use of simulation in these models highlights the fact that in statistics we are usually trying to estimate a quantity given incomplete information. If we set a lot of gamekeepers to gather the data we would typically get slightly different answers using the data from the different gamekeepers. We can see how much variation to expect by simulating this experiment.
While this example would take a couple of days to discuss in a Chance course, it uses nothing more difficult than the coin-tossing probability model and the Poisson approximation of this model.
This work is freely redistributable under the terms of the GNU General Public License as published by the Free Software Foundation. This work comes with ABSOLUTELY NO WARRANTY.