!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

CHANCE News 6.13

(10 November 1997 to 25 December 1997)

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Prepared by J. Laurie Snell and Bill Peterson, with help from Fuxing Hou, Ma.Katrina Munoz Dy, and Joan Snell, as part of the Chance Course Project supported by the National Science Foundation.

Please send comments and suggestions for articles to jlsnell@dartmouth.edu.

Back issues of Chance News and other materials for teaching a Chance course are available from the Chance web site:

http://www.geom.umn.edu/locate/chance


For best results use 12 point Courier font.
===========================================================

Where are they?

Enrico Fermi
===========================================================

Contents

Note:

There will be a Chance Workshop at Dartmouth College this summer from July 7 to July 11, 1998. The workshop is for college teachers interested in learning how we teach a quantitative literacy course called Chance. This workshop is supported by the NSF. You will find a description of the workshop and an application form at the end of this newsletter and also on the Chance web site.
<<<========<<




>>>>>==============>
Jane Millar called our attention to a nice set of probability and statistics lessons for K-12 students. These lessons can be found at the web site of the Office for Mathematics, Science and Technology Education (MSTE), maintained by the University of Illinois Education Department. These exercises make use of the resources of the web. For example, lessons on weather use graphs obtained from the National Center for Atmospheric Research.
<<<========<<




>>>>>==============>
In the last Chance News (Chance News 6.12) we discussed Will Lassek's use of census data to verify the probability of 1/3 that a two-child family has two boys, given that it has at least one boy. Italian statistician Paolo Fina sent us the following interesting comment on this problem.
When I studied demography at the University they told me that there are about 105 male newborns for each 100 females. This ratio should be constantly observed in the long run (is it really true?) in different races, countries, social conditions etc. Hence the probability of having a male rather than a female is not exactly 50 vs 50 but about 51.2 vs 48.8, and consequently the woman's probability of having two boys should be 34.4%.

In addition, taking into account the different mortality rates among males and females, we might estimate a value a little bit lower than 34.4 but probably greater than 33.3. In conclusion, in my opinion, Lassek's data are really very close to those predicted. The smaller-than-expected percentage (22%) of two-children families with two girls could be partially explained too. In fact, the expected percentage is not 25% but about 23.8%.

Regards

Paolo

The sex ratio does vary a bit from country to country and through time. We happened to have U.K. data and note that the ratio there was 1.052 in 1995.
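
For readers who want to check these figures, here is a small computational sketch (in Python, our own illustration) that reproduces Paolo's numbers under the assumption of 105 male births per 100 female births and independent sexes for the two children:

# A quick check of Paolo Fina's figures, assuming 105 boys per 100 girls
# (so p = 105/205) and independent sexes for the two children.

p = 105 / 205          # probability a child is a boy, about 0.512
q = 1 - p              # probability a child is a girl, about 0.488

# P(two boys | at least one boy) = p^2 / (1 - q^2)
two_boys_given_one = p**2 / (1 - q**2)

# Unconditional P(two girls)
two_girls = q**2

print(f"P(boy) = {p:.3f}, P(girl) = {q:.3f}")
print(f"P(two boys | at least one boy) = {two_boys_given_one:.3f}")  # about 0.344
print(f"P(two girls) = {two_girls:.3f}")                             # about 0.238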
<<<========<<




>>>>>==============>
California goes to war over math instruction.
New York Times, 27 Nov. 1997, A1
Milt Freudenheim

This article discusses the controversy in California over whether it is better to go back to the basics, with lots of drill and practice, or on to the new-math approach of encouraging students to think about what mathematics is, how it is done, and how it relates to the real world. Of interest to us is their example of a typical question in the new math. The student is shown a picture of three pairs of pennies: the first pair showing both heads, the second a head and a tail, and the third two tails. The following problem is posed:

Binky says that if you flip a coin twice, there are exactly three possible outcomes -- two heads, one head and one tail, and two tails -- and so the probability of getting two heads is 1/3.

Explain why she is wrong. Make your explanation as clear as you can, using diagrams as needed.

The answer given was:
It helps to visualize the coin-toss problem by using two different coins, say a penny and a nickel. Toss the penny once and the nickel once, and there are four possible outcomes: two heads; two tails; penny head, nickel tail; and penny tail, nickel head. Each outcome has an equal chance: one in four, not one in three.
The famous 18th century scientist Jean d'Alembert wrote several treatises on probability, but, unfortunately, he is best remembered in the history of probability as the one who agreed with Binky that the chance of getting two heads when you toss a coin twice is 1/3.

The author of the solution to this new-math problem missed a chance to emphasize the relation of mathematics to the real world by pointing out that the choice of probability measure should reflect information about the particular physical model studied, in this case tossing a coin twice. For example, it can be observed that, if you toss a pair of coins many times, two heads will turn up about 1/4 of the time. Galileo was led to describe the right model for rolling a die three times by trying to explain why gamblers had found that betting on a sum of 10 was more advantageous than betting on a sum of 9 even though each sum is produced by 6 triples. In his classic probability book, William Feller, after showing that different probability measures were needed to describe the behavior of photons and protons, remarks:

We have here an instructive example of the impossibility of selecting or justifying probability models by a priori arguments. In fact, no pure reasoning could tell that photons and protons would not obey the same probability laws.
In an attempt to illustrate Feller's principle in terms of the coin-tossing example, we asked a physicist to give us an example which might look similar to the coin tossing problem but where the correct answer is 1/3,1/3,1/3 for the analogs of 0 heads, 1 head and 2 heads. He provided the following example:
In the nucleus of a deuterium atom, there is one proton and one neutron. If one picks a z-direction, then each of these two particles has either spin up or spin down in that direction. It turns out that, if the spins of the two particles are measured, then there are only three possibilities, namely both spins up, both spins down, or one spin in each direction. Furthermore, the probabilities of these three possibilities are observed to be 1/3, 1/3, 1/3.
At first glance, this seems to be just what we asked for. If we had assigned the coin-tossing measure, the probabilities of the three observed states for the pair of spins would have been 1/4, 1/2, 1/4. However, a knowledge of the laws of quantum mechanics suggests that the probabilities 1/3, 1/3, 1/3 are more appropriate. Unfortunately, further discussions with colleagues about this example generated more heat than light.
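
As a numerical footnote (our own sketch, not part of the article), the following short Python enumeration shows why counting ordered outcomes gives 1/4, 1/2, 1/4 for two coins, and why Galileo's gamblers were right to prefer a sum of 10 to a sum of 9:

# A small sketch that checks the classical "equally likely ordered outcomes"
# model against Binky's three-outcome count, and reproduces Galileo's
# comparison of dice sums 9 and 10.

from itertools import product
from fractions import Fraction

# Two coins: count ordered outcomes with 0, 1, or 2 heads.
coin_counts = {0: 0, 1: 0, 2: 0}
for pair in product("HT", repeat=2):
    coin_counts[pair.count("H")] += 1
total = sum(coin_counts.values())
for heads, count in coin_counts.items():
    print(f"P({heads} heads) = {Fraction(count, total)}")   # 1/4, 1/2, 1/4

# Three dice: number of ordered triples summing to 9 versus 10.
triples = list(product(range(1, 7), repeat=3))
for target in (9, 10):
    n = sum(1 for t in triples if sum(t) == target)
    print(f"sum {target}: {n}/216 = {n/216:.4f}")           # 25/216 vs 27/216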

DISCUSSION QUESTIONS:

(1) What do you think about the solution given by the author of the two-coin problem?

(2) Do you think that you could justify the probability measure for coin tossing "on a priori grounds"?

(3) A physicist argued that the deuterium example bears no relation to the two-coin problem. In the two-coin problem it is possible to distinguish between the two coins spatially or otherwise. In the deuterium problem the uncertainty principle makes it absolutely impossible to distinguish which particle is a proton and which is a neutron. Thus this example is more like the problem of spinning an architect's ruler and seeing which of the three possible faces it comes to rest on. What do you think of this?
<<<========<<




>>>>>==============>
Fred Hoppe wrote us that the famous "birthday problem" appeared on a CBC radio program called Quirks and Quarks. Fred wrote to Quirks and Quarks giving the answer and pointing out that, in general, for n possible birthdays, you need on the order of sqrt(n) people for a duplication. (In fact, 1.2 sqrt(n) is quite a good approximation; for n = 365 it gives 22.9.) Fred got the following reply:

Dear Professor Hoppe,

I think you might have done us a large favor. Perhaps you can confirm. I've received a lot of answers; about 40% of them have corresponded to yours (i.e., 23, apparently calculated the same way). Another 40% have been 20, and this is the answer the proposer has given us. He calculated it as follows:

After the first person has told me their birth date, the second person I ask will have one chance in 365 of having the same date as the first. The third person could have the same birthday as the first, or the second person, so with them I have 2 in 365 chances. Added to the chance from the second person, I have now had a total of 3 in 365 chances. By the same logic the fourth person has three chances, and we need to add the three from the previous one to make six, and so on. By the time we exceed 183 chances in 365, which is pretty much our 50-50 probability, we will have asked just 20 people.

I had just assumed that we'd got a lot of people making the same error (getting 23). But when a PhD in Stats comes up with the same answer, I think I might have a problem.

Can you just let me know what's going on here? Has the proposer made a mistake? If so, I'd like to save him (and us) some embarrassment.

Fred explained what was going on and for his efforts he received two CBC t-shirts for himself and his wife (the prize for the best answer was such a t-shirt). Fred asked the host of the program to send him the solutions when they were all in so that we might see why the answer of 20 was so prevalent.

Fred received the solutions and commented:

After adjusting for correct reasoning but with numerical errors: 24 responded with the answer 20, 33 responded with the answer 23, and 23 responded with different answers of which the plurality came up with something near 183.

Of course 183 is a familiar wrong answer. However, it is interesting to ask why the answer 20 was so prevalent. Those who arrived at 20 as the solution calculated the probability in essentially the same way that the proposer did, finding the smallest n such that (1 + 2 + 3 + ... + (n-1))/365 > 1/2. The most likely explanation for this error is that they were using the inclusion-exclusion formula and used only the first term. However, Peter Doyle suggested a more charitable explanation. He writes:

20 is the number of people so that the expected number of birthday-coincidences is > 1/2 and someone who sets out to solve the birthday problem might easily solve the birthday-coincidences problem instead.
Several of the solvers remarked that they had fond memories of this problem from a probability course that they had taken long ago.
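
For the record, here is a short Python sketch (ours, assuming 365 equally likely birthdays) that recovers both the correct answer of 23 and the proposer's 20, the latter via Doyle's expected-number-of-coincidences reading:

import math

def prob_no_match(n, days=365):
    """Exact probability that n people all have different birthdays."""
    p = 1.0
    for k in range(n):
        p *= (days - k) / days
    return p

# Smallest n with P(at least one match) > 1/2.
n = 1
while prob_no_match(n) > 0.5:
    n += 1
print("exact answer:", n)                      # 23

# Smallest n with expected number of coincidences C(n,2)/365 > 1/2.
m = 1
while m * (m - 1) / 2 / 365 <= 0.5:
    m += 1
print("expected-coincidences answer:", m)      # 20

print("1.2*sqrt(365) =", round(1.2 * math.sqrt(365), 1))  # about 22.9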

DISCUSSION QUESTIONS:

(1) What would be your explanation why there were so many answers of 20?

(2) Why is the proposer's solution the correct solution for the smallest number n such that the expected number of matches is greater than 1/2?

(3) Fred pointed out that if you calculate the probability of no match for n people by assuming (incorrectly) that the pairs are independent, you get .5243 for n = 22 and .4995 for n = 23 showing that 23 is sufficient for a favorable bet that two people have the same birthday. Why does this, in fact, prove that n = 23 is sufficient for a favorable bet?
<<<========<<




>>>>>==============>
Charles Grinstead suggested the following article and provided the abstract of the article.

Ruling out a meteorite.
The New York Times, 12 Dec. 1997, B6
Matthew L. Wald

This article discusses the possibility that T.W.A. Flight 800 was brought down by a meteorite. Before proceeding further, the reader might want to think about how one might estimate the probability of a given plane being hit by a meteorite.

Dr. William A. Cassidy, a professor of geology and planetary science at the University of Pittsburgh, was asked by the National Transportation Safety Board to estimate this probability. He found 14 known incidents in this century of houses being hit by meteorites. The article is not completely clear on this, but it appears this means 14 houses in the U.S. Next, he calculated the total roof area, to estimate how many meteorites hit a given area in a given year. Finally, he calculated, from data provided by the safety board, the target area of all planes in the air, taking into account the average number of hours that each plane is in flight.

Dr. Cassidy concluded that "the expected frequency of meteorites hitting planes in flight is one such event every 59,000 to 77,000 years."

DISCUSSION QUESTIONS:

(1) Using Dr. Cassidy's estimate, can you give a rough estimate of the probability that a given plane will be disabled by a meteorite? What other pieces of information would one need to make this estimate? Do you think that this is a more relevant probability when considering the fate of a specific flight such as the T.W.A. 800 flight?

(2) This problem was addressed previously in Chance News 5.11 (New York Times, 19 Sept. 1996, A26). In a letter to the editor Charles Hailey and David Helfand wrote:

Approximately 3,000 meteorites a day with the requisite mass (to cause a crash) strike Earth. There are 50,000 commercial airline takeoffs a day worldwide. Adopting an average flight time of two hours, this translates to more than 3,500 planes in the air; these cover approximately two-billionths of Earth's surface.

Multiplying this by the number of meteorites per day and the length of the era of modern air travel (30 years) leads to a 1-in-10 chance that a commercial flight would have been knocked from the sky by meteoric impact in the last 30 years.

Estimate the fraction of the earth that is covered by rooftops. Assume that meteorites with sufficient mass to cause a crash would also cause enough damage to a house to be reported. Using the estimate of 3,000 such meteorites a day, estimate the number of houses hit per day by meteorites. Is this consistent with Cassidy's estimation?

Assume that the Hailey and Helfand estimate for the coverage of the earth by airplanes is reasonable and assume also that Cassidy's estimate of 14 for the number of houses hit by meteorites in the past 100 years in the U.S. is correct. From this, estimate the number of planes hit by meteorites in the last 100 years. Is this consistent with Cassidy's estimation?

What can you conclude from all this?
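
As a rough plausibility check on the Hailey and Helfand arithmetic, here is a small Python sketch (ours, using their round numbers; not the computation from either article):

import math

meteorites_per_day = 3000      # meteorites massive enough to down a plane
fraction_covered = 2e-9        # fraction of Earth's surface covered by planes aloft
years = 30                     # era of modern air travel, per the letter

expected_hits = meteorites_per_day * fraction_covered * 365 * years
prob_at_least_one = 1 - math.exp(-expected_hits)

print(f"expected plane hits in {years} years: {expected_hits:.3f}")
print(f"P(at least one hit): about 1 in {1/prob_at_least_one:.0f}")
# This comes out near 1 in 16, the same order of magnitude as the quoted 1 in 10.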
<<<========<<




>>>>>==============>
Earth's past offers chilling global warming scenarios.
The Boston Globe, 2 December 1997, pA1.
David L. Chandler

The article reports that a number of recent climatological studies have used the earth's geologic record to predict the effects of future global warming. Emerging evidence indicates that at the end of the last ice age, about 11,600 years ago, worldwide temperatures changed at a much faster rate than that predicted by today's computer forecasts for global warming. In some locations the average temperatures changed by as much as 18 degrees Fahrenheit over a period of several decades. This contrasts sharply with computer models that show global warming from greenhouse gases unfolding gradually over the next century. The record from the past shows that some sort of threshold was crossed, beyond which changes of devastating proportions took place very rapidly.

In a recent article in "Science", climatologist Wallace Broecker of Columbia University argues that abrupt changes in ocean currents are the only possible explanation for such dramatic changes. This point of view is consistent with an MIT study published a few weeks ago in "Nature", which analyzed sediment cores drilled from the ocean floor off Bermuda. It concluded that climatic shifts are related to "deep water reorganizations" and that these can occur in a few hundred years or less. And three months ago, an article in "Nature" reported computer simulations showing that greenhouse gas buildup in the range projected for the next century could be sufficient to shut down an ocean circulation pattern known as the Atlantic Conveyor.

What would be the effects of such a shutdown? So far, no models have been developed that include such complexities--in fact, none can reproduce the dramatic changes now apparent in the past record. But Ronald Prinn of MIT says that such a shutdown would mean that "all bets are off" as to the trajectory of global climate change. Noting that some ice age models start with warming, he adds that it is even conceivable that the shutdown could trigger a new ice age. Wallace Broecker concludes: "There's no way to predict whether this is going to happen. We can get some indication that we're approaching the edge of the cliff, but there's no way to say if it's 1 chance in 2 ... or 1 in 200. It's Russian roulette where you don't know how many chambers are in the gun."

DISCUSSION QUESTIONS:

(1) Some groups opposed to curbs on greenhouse emissions have argued that adjustments to gradual warming would be manageable; some have gone so far as to suggest that the warming might be beneficial. Do you think the more dramatic scenarios described here will change any minds?

(2) Would it make a difference if we knew for sure how many chambers were in the gun?
<<<========<<




>>>>>==============>
Jeanne Albert suggested the next item and provided the abstract and discussion questions.

Sampling is not enumerating.
The New York Times, 7 December 1997, A23
William Safire

William Safire is not happy about the Census Bureau's upcoming plans for using sampling as part of the year 2000 census.

He starts by observing that "as elections demonstrate, a poll is an educated guess and not a hard count," and that, "Often pollsters are mistaken." He then describes how "polling warps politics", and for evidence cites the results of several polls taken prior to the 1996 Presidential election that put Clinton from seven points ahead of Dole (Zogby for Reuters), to 12 points ahead (ABC and NBC/Wall Street Journal), on up to 18 points ahead (New York Times/CBS). Mr. Safire contends that early polls such as the latter two kept Republicans from voting. "On Election Day," he writes, "the actual enumeration showed Zogby alone to be within one point of accuracy."

He then turns to the upcoming year 2000 census, and claims that "Liberals want to replace, or 'augment,' laborious counting with the educated guesswork of sampling." It is clear that he believes that the decision to use sampling techniques is politically motivated, and that the goal is to "pick up a dozen House seats and increase spending on the poor." It is not clear, however, if he has a complete understanding of how this will be done, or indeed of how sampling will be used at all. He merely states that after doing a sloppy head count, Democrats will somehow "redo slums with a vengeance" and then "extrapolate those redone samples to skew--or 'weight'--the earlier count." Meanwhile, he gives several suggestions for improving an enumerative count, such as improving mailing lists and training census takers to be better at finding the homeless.

DISCUSSION QUESTIONS:

(1) Do you think a poll is an "educated guess"? Are the poll results cited evidence that "Often pollsters are mistaken"? How would knowing the margins of error affect your answer?

(2) Since Mr. Safire claims that the early polls showing Clinton far ahead kept Republicans from voting, in what sense was the Zogby poll "within one point of accuracy"?

(3) Should sampling techniques be used as part of a census? If you assume that an accurate head-count is impossible, how might you augment an enumeration with sampling to improve the results?

(4) Do you think that the errors involved in polling for public opinion and in sampling to improve a census are comparable?

(5) In a letter to the editor (New York Times, 9 Dec., 1997, A28) James Beniger writes:

The Oxford English Dictionary traces the word "enumeration" from 1577 and defines it as "the action of ascertaining the number of something," without further specifying what that action might be. Article I, Section 2 of the Constitution leaves the methods for the "enumeration" it mandates open to "such manner as they, Congress, shall by law direct".
Based on this, Beniger comments that Safire's definition of enumeration, expressed when he writes:
This (sampling) flies in the face of the U.S. Constitution, which in Article I calls for an "actual Enumeration," with a capital E -- which means "counting one by one."
"doesn't wash".

Do you agree?

In another letter to the editor Joseph Raben writes:

William Safire's attempt to discredit census sampling by comparing it to pre-election polling is based on an inaccurate comparison. A more reasonable association would be with exit polling, where information is collected about actual behavior, not prediction about future actions (which those polled may not wish to disclose).

Do you think that exit poll comparison is reasonable?
<<<========<<




>>>>>==============>
DNA fingerprinting comes of age.
Science, 21 November 1997, 1407

This article states:

At a press conference last week, the Federal Bureau of Investigation announced that, for the first time, their experts will be permitted to testify that DNA from blood, semen, or other biological evidence at a crime scene came from a specific person. ... The new policy states that if the likelihood of a random match is less than one in 260 billion, the examiner can testify that the samples are an exact match.

The FBI lab provides the probability that a randomly chosen person has the same DNA fingerprint as that found at the scene of the crime. By the assumption of independence between alleles, numbers as small as 1 in 260 billion can easily occur.

As the famous Collins case showed, the use of such probabilities in a court case has to be done with care.

Used as evidence, the probability of a random match should mean the probability that, in the population being sampled, there are two or more people with a particular DNA fingerprint, given that there is at least one person with this DNA fingerprint. This probability can be obtained from the Poisson approximation as (1 - e^-m - m*e^-m)/(1 - e^-m), where m = np, with n the number of people in the population being sampled and p the probability that a randomly chosen person's DNA fingerprint will match that found at the scene of the crime.

For example, suppose that we are considering a case in Los Angeles and assume the relevant population is n = 10 million and the probability p that a random person's DNA matches the DNA found at the scene of the crime is 1 in 260 billion. Then m is about .00004, and the Poisson approximation gives an estimate of roughly 1 in 50,000 (about m/2) for two or more matches given at least one match. While this is small, it would not be considered impossible according to the FBI criterion (a probability less than 1 chance in 260 billion).
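
Here is a short Python sketch of the Poisson calculation just described (our illustration; the population size and match probability are the hypothetical values used above):

import math

n = 10_000_000          # assumed relevant population (Los Angeles example)
p = 1 / 260e9           # assumed probability of a random match
m = n * p               # expected number of matching people in the population

# P(two or more matches | at least one match)
prob = (1 - math.exp(-m) - m * math.exp(-m)) / (1 - math.exp(-m))
print(f"m = {m:.2e}")
print(f"P(2 or more | at least 1) = {prob:.2e}")   # roughly m/2, about 2e-5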

DISCUSSION QUESTIONS:

(1) The article states that:

Even in O.J. Simpson's trial, prosecutors could only say that the odds were billions to one that blood found at the scene was not O.J.'s.
Do you really think the prosecutors said that!

(2) Is the FBI saying, in effect, that something that happens with probability less than 1 in 260 billion is impossible? How do you think the probability 1 in 260 billion was chosen?

(3) Assume that the probability a randomly chosen person has the same DNA fingerprint as that found at the scene of the crime is 1 in 260 billion. What is the probability that there is a random match in the whole world, assuming a world population of about 6 billion?

(4) What effect do you think this FBI policy will have on actual court trials? Should it have this effect?
<<<========<<




>>>>>==============>
Norton Starr wrote us about an interesting item that he came across in his copy of RSS News, a publication of the Royal Statistical Society. He writes:

In their Forsooth! column, November issue, the following text from the June, 1997 Good Housekeeping (British edition) is given:

Women may be shorter and lighter than men but they're no longer the weaker sex. In every age group 100 men die for every 64 women.

The quoted material appears as the very first sentences in a multipage article entitled "Why Women are Different". Moreover, just after "for every 64 women", the second sentence continues with "and life expectancy is now 72 years for men and 78 for women." Two questions come to mind: Can one, and if so how would one, estimate the differing life expectancy given the data provided? What are likely explanations for the real meaning or intent of the 64 and 100 deaths - i.e. to what do those data actually refer, assuming they are accurate mortality numbers of some sort?

Charles Grinstead suggested that the authors of the Good Housekeeping article were really saying that, for all age groups, the death rates for women are 64% of those for men. He interpreted Norton's first question as asking: if the death rates for men are known and we assume the women's death rates are 64% of those of the men, do we get expected lifetimes for males and females in agreement with their true values? Charles verified this using the data for the U.S. from the 1990 census given in Grinstead and Snell's book "Introduction to Probability". (He replaced the 64% by 60%, which was more appropriate for the U.S. data.)

We checked the 64% assumption for U.K. data by looking at a mortality table for the U.K. in 1995. We give below such a table for five-year age groups. A mortality table starts with 100,000 people at age 0 and estimates the number that will be alive at each successive age. The five-year male and female death rates (mdr and fdr) are easily obtained from such a mortality table. For example, the death rate for the five-year period starting at age forty for males is (96710-95725)/96710 = .01. Calculating these mortality rates and their ratio we have:

age   male   female   mdr   fdr  fdr/mdr

 0   100000  100000  .008  .006  .795 
 5    99185   99352  .001  .001  .736 
 10   99105   99293  .001  .001  .695 
 15   99003   99222  .003  .001  .438 
 20   98714   99095  .004  .002  .362 
 25   98296   98943  .004  .002  .421 
 30   97864   98760  .005  .003  .515 
 35   97360   98498  .007  .004  .601 
 40   96710   98103  .010  .007  .666 
 45   95725   97438  .016  .011  .682 
 50   94223   96396  .028  .018  .643 
 55   91607   94675  .047  .028  .601 
 60   87315   92010  .080  .048  .594 
 65   80293   87617  .135  .080  .591 
 70   69449   80627  .210  .128  .608 
 75   54848   70313  .315  .201  .638 
 80   37573   56187  .449  .317  .706 
 85   20691   38354     
We see that the assumption that the death rates for women are about 64% of those for men is not a bad one. It is not good for women in their early 20's, where the death rates for men are about three times those for women, but there are not many deaths here anyway, so this is not too important. From these data, our estimate for the life expectancy of men is 73 years and for women it is 78 years, in reasonable agreement with the estimates given by the authors of the Good Housekeeping article.
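
For those who want to reproduce the life-expectancy estimates from the table, here is a rough Python sketch (ours; the trapezoidal rule and the treatment of survival beyond age 85 are simplifying assumptions we add, so the answers are only approximate):

def life_expectancy(survivors, step=5, tail_years=5):
    """survivors[i] = number alive at age step*i, starting from 100,000 at age 0."""
    person_years = 0.0
    for a, b in zip(survivors, survivors[1:]):
        person_years += step * (a + b) / 2      # trapezoid rule on each age band
    person_years += survivors[-1] * tail_years  # assumed mean years lived past 85
    return person_years / survivors[0]

males = [100000, 99185, 99105, 99003, 98714, 98296, 97864, 97360, 96710,
         95725, 94223, 91607, 87315, 80293, 69449, 54848, 37573, 20691]
females = [100000, 99352, 99293, 99222, 99095, 98943, 98760, 98498, 98103,
           97438, 96396, 94675, 92010, 87617, 80627, 70313, 56187, 38354]

print(f"male life expectancy:   {life_expectancy(males):.1f}")    # roughly 73-74
print(f"female life expectancy: {life_expectancy(females):.1f}")  # roughly 78-79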

DISCUSSION QUESTIONS:

(1) In both the U.S. data and the U.K. data there are roughly 3 times as many deaths for males in their early twenties as there are deaths for females in this period. Why do you think this is?

(2) Grinstead remarked that, if the death rates for men were constant and the death rates for women were 60% of those for men, then the expected lifetime for men would also be 60% of the expected lifetime for women. Why is this true? Of course, with any actual data set for a human population, it will never be true that the death rates for either gender will be constant. (For example, one can certainly see that the death rate is not constant for the U.K. data set.) For the U.K. data, the expected lifetime for men (73 years) is 94% of that for women (78 years). How might you explain this small difference despite the large differences in death rates?
<<<========<<




>>>>>==============>
Too few women are aware of their heart risk, study finds.
The Boston Globe, 19 November 1997, pA1.
Reuters

A joint study by the National Council on the Aging and the Center for Risk Communication finds that women remain unaware that heart disease represents their leading cause of death and consequently miss opportunities for prevention. The findings are based on work by the Center for Risk Communication with six focus groups of women across the country, and also on a telephone survey of 1000 women conducted by Opinion Research Corporation. While 61% of women surveyed said cancer was the disease they feared most, only 9% named heart disease. According to Vincent Covello of the Center for Risk Communication, "the anxiety and dread of breast cancer encourages many women to neglect other serious risks to their health and life."

The study faults the media and doctors for not getting the right information to women. More than 80% of women in the survey said they got their information on health from the news media, and fewer than 30% asked doctors or nurses for additional information.

DISCUSSION QUESTIONS:

(1) Does the fact that women fear cancer more than heart disease necessarily mean that they don't understand the relative mortality risks?

(2) The article gives the following figures: "Women have a one-in-eight lifetime risk of getting breast cancer if they live beyond age 85. The risk is one in 17 at age 65 and one in 50 at age 50. One in three women over the age of 65 have clinical evidence of heart disease." How does this clarify the issues for women?

(3) How do you think information received from the focus groups differs qualitatively from that received in the telephone survey?
<<<========<<




>>>>>==============>
The myth of millions of annual self-defense gun uses:
a case study of survey overestimates of rare events.
Chance Magazine, Vol. 10, No. 3, Summer 1997, pp. 6-10
David Hemenway

G. Kleck and M. Gertz, in their article "Armed resistance to crime: the prevalence and nature of self defense with a gun" (Journal of Criminal Law and Criminology, vol. 86, no. 1), estimate that civilians use guns against offenders more than 2.5 million times each year. This figure resulted from a national random-digit-dial survey of 5,000 dwelling units. Slightly over 1% of the respondents reported that they had used a gun in self-defense in the past year. This leads to the 2.5 million estimate for the entire population of 200 million adults.

Hemenway points out a number of reasons that this figure is absurdly high. He uses this survey to motivate his discussion of how polls, related to the occurrence of rare events, can be significantly biased by reporting errors. He points out that the situation is essentially the same as the well known false positive paradox in medical testing.

The false positive paradox for medical testing results from the following situation. A medical test can appear to be very accurate in the sense that the chance of a false positive or a false negative is only 1%. Yet if the disease is rare, a positive test can indicate only a 50% chance that the person has the disease. Assume that in a population of 20,000,000 people, 200,000 have the disease being tested for. Then the positive tests will be made up of the approximately 200,000 (1% of the 19,800,000 who do not have the disease) who test positive falsely and the approximately 200,000 (99% of 200,000) who do have the disease and test positive correctly. Thus, a randomly chosen person from the population with a positive test has only a 50% chance of having the disease.

Hemenway illustrates the corresponding situation in polling for a rare event. He assumes that a random sample of 2000 is chosen from a population of 200,000,000 adults and each person is asked if he or she used a gun in self defense during the past year. Assume that the truth is that .1% of the population (200,000) used a gun in self defense in the past year. Then we would expect about 2 in the sample of 2000 to answer yes and the rest to answer no. Assume now that there is a 1% chance of a reporting error, both for those who should say yes and for those who should say no. With this reporting error, we would still expect about 2 from those who should answer yes and an additional 20 from the 1998 who should say no, giving a total of 22 who answer yes. This would lead to an estimate of 200,000,000*22/2000 = 2.2 million, which is more than a ten-fold increase over the number 200,000 who actually used a gun in self defense in the past year.
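
Hemenway's arithmetic is easy to reproduce. Here is a small Python sketch (ours, using the hypothetical rates above, not the survey's actual estimates):

population = 200_000_000   # adults
true_rate = 0.001          # 0.1% actually used a gun in self defense
sample_size = 2000
error_rate = 0.01          # 1% chance of answering incorrectly, in either direction

true_yes = sample_size * true_rate                      # about 2 people
false_yes = sample_size * (1 - true_rate) * error_rate  # about 20 people
observed_yes = true_yes * (1 - error_rate) + false_yes

estimate = population * observed_yes / sample_size
print(f"observed 'yes' answers: about {observed_yes:.1f}")
print(f"extrapolated estimate:  about {estimate:,.0f}")            # roughly 2.2 million
print(f"true number:            {int(population * true_rate):,}")  # 200,000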

The author discusses other examples. For example, in May 1994 ABC and the Washington Post carried out a national random-digit-dial telephone survey of over 1,500 adults that led to the extrapolation that 20 million Americans have seen alien spacecraft, and 1.2 million have been in actual contact with beings from other planets!

DISCUSSION QUESTIONS:

(1) In the gun example, would you expect more reporting errors among those who had not used a gun in self defense or among those who had used a gun in self defense?

(2) What other kinds of polls of rare events can you think of that might result in significant biases because of reporting errors?

(3) Do you think margin of error in a poll should take into account reporting errors?
<<<========<<




>>>>>==============>
Behavior briefs: Getting theirs.
The Boston Globe, 24 November 1997, pC3.
Dolores Kong

The November issue of the "Journal of Personality and Social Psychology" reports research into greedy behavior--situations where individuals or groups take more than their fair share of limited resources. One goal was to determine whether greed is intentional or an instinctive part of human nature.

In one part of the experiment, a group of college students were told that they were sharing a box of sand and were instructed to take out what they considered a fair portion. However, they were also told that they would receive $2 for every pound they scooped. This was a test for intentional greed. Another part of the experiment looked for unintentional greedy behavior. Students were told that they would receive $10 if they accurately scooped a specified amount of sand. No mention was made of sharing with others.

It turned out that in either scenario, students took more than their portion of sand. Apparently people tend to take too much whether they intend to or not. In the first scenario, some students justified their behavior with comments like "I need the money." And some, when they sensed that others were over-scooping, decided to take even more.

The researchers draw implications for policy-making. They suggest that rationing is likely to be inferior to "resource credit" schemes that encourage people to consume less.

DISCUSSION QUESTIONS:

(1) If you were told to collect 10 lbs of sand for some reason, why might you conclude that erring on the low side was worse than erring on the high side? Would you necessarily describe this as "unintentional greed"?

(2) How do you suppose "accuracy" was defined for the second part of the experiment? Make some suggestions, and consider how they might affect the results.
<<<========<<




>>>>>==============>
AIDS scourge is growing, UN reports.
The Boston Globe, 26 November 1997, pA1.
Richard A. Knox

The United Nations agency UNAIDS estimates that 31 million people worldwide are infected with HIV, 27 million of whom are unaware of their infection. [Contrast this with the US situation reported in Chance News 6.11: the Centers for Disease Control and Prevention estimate that 775,000 Americans carry HIV, of whom 500,000 know their status.] The virus is spreading at the rate of 16,000 new infections per day. This year, 2.3 million people will die of AIDS, which now surpasses malaria as the leading cause of death.

The agency laments an apparent complacency about AIDS by western countries, where the perception is that the epidemic is now in check and that the disease itself is becoming increasingly manageable. However, the number of new US infections in 1997 is expected to be 44,000, unchanged from 1996. Furthermore, for most of the world's infected people, the expensive drug regimens used in the US are hopelessly out of reach.

DISCUSSION QUESTION:

Reporting what it calls "the most striking AIDS disparity between developed and developing nations" the article notes that worldwide there are 1600 new infections each day among children under the age of 15, whereas in the US preventative drug treatment has reduced infections among newborns to under 500 per year. What do you think of this comparison?
<<<========<<




>>>>>==============>
Mammogram debate seen as divisive, unproductive: Journal says rivalries mar screening policy.
The Boston Globe, 1 December 1997, pA3.
Reuters.

An article in the "Annals of Internal Medicine" warns that the scientific discussion about mammograms for women in their 40s has been distorted by gender rivalry and single-issue advocacy groups.

There is general agreement that mammograms for women over 50 are effective. The resulting early detection of cancer is credited with reducing breast cancer mortality by 30%. The picture is much less clear for women 40-49, and the last several years have seen bitter debate over recommendations for screening and insurance coverage. In March, the National Cancer Institute recommended screening every one to two years in this age bracket, but other groups, including members of the American College of Physicians, maintain that the recommendation is not backed by available evidence. Some health officials say the debate has become so divisive that medical trials to compare effectiveness of mammograms in different age groups could not be carried out in the US.

The article reports that a California benefit-cost study, based on a cost of $106 for a mammogram, estimated that annual screening for women aged 50-69 costs $21,400 for every life saved, compared with $150,000 for women in their 40s.

DISCUSSION QUESTIONS:

(1) The article gives the following figures from the Centers for Disease Control and Prevention: 180,000 US women will be diagnosed with breast cancer this year; 31,000 of these will be among the 20 million women aged 40-49; in all, 44,000 will die from the disease. What else needs to be reported for this to be meaningful?

(2) What do you think of dollars per life saved as a measure of effectiveness? What do you think would happen if years of life expectancy were used as the denominator?
<<<========<<




>>>>>==============>
A reader asked us what we knew about estimates for the probability of life on other planets. The answer was very little, but we found the following informative article on this topic.

The biological universe; book review.
Sky and Telescope, June 1997
Robert Jastrow

This is a review of:

Steven J. Dick, The Biological Universe: The Twentieth-Century Extraterrestrial Life Debate and the Limits of Science. Cambridge University Press, 1996. ($54.95 at Amazon.com.)

This book discusses much more than the question of estimating the probability of extraterrestrial life, but we limit our discussion to this question.

Evidently, very different answers are obtained for the probability of extraterrestrial life depending on whether one is thinking as a biologist or as an astronomer. Jastrow writes:

Dick cites an estimate by physicist Harold Morowitz that the probability of creating a bacterium -- the simplest living organism -- through random molecular collisions is 1 in 10^100,000,000. Fred Hoyle raises this chance to a more optimistic 1 in 10^40,000. Biochemist Robert Shapiro estimates that the probability of chance formation of a short strand of self-replicating RNA is considerably greater -- as "large" as 1 in 10^992.

Jastrow remarks that such numbers would force us to believe that we are indeed "alone".

Jastrow then discusses estimates by astronomers, who argue statistically from the "Principle of Mediocrity" that there is nothing special about the earth, which is made up of common materials that can be found in many solar systems. Astronomers' attempts to estimate the probability of life on other planets have been inspired by the "Drake equation". The following discussion of this equation is taken from the homepage of the SETI Institute.

How can we estimate the number of technological civilizations that might exist among the stars? While working as a radio astronomer at the National Radio Astronomy Observatory in Green Bank, West Virginia, Dr. Frank Drake (now President of the SETI Institute) conceived an approach to bound the terms involved in estimating the number of technological civilizations that may exist in our galaxy. The Drake Equation, as it has come to be known, was first presented by Drake in 1961 and identifies specific factors thought to play a role in the development of such civilizations. Although there is no unique solution to this equation, it is a generally accepted tool used by the scientific community to examine these factors. The equation is
                   N=R*fp*ne*fl*fi*fc*L 

    where, 

     N = The number of communicative civilizations.
         The number of civilizations in the Milky Way 
         Galaxy whose radio emissions are detectable. 

     R = The rate of formation of suitable stars.
         The rate of formation of stars with a large 
         enough "habitable zone" and long enough 
         lifetime to be suitable for the development 
         of intelligent life. 

     fp = The fraction of those stars with planets.
          The fraction of Sun-like stars with planets is
          currently unknown, but evidence indicates that
          planetary systems may be common for stars like 
          the Sun. 

     ne = The number of "earths" per planetary system.
          All stars have a habitable zone where a planet 
          would be able to maintain a temperature that 
          would allow liquid water. A planet in the 
          habitable zone could have the basic conditions 
          for life as we know it. 

     fl = The fraction of those planets where life develops.
          Although a planet orbits in the habitable zone of 
          a suitable star, other factors are necessary for 
          life to arise. Thus, only a fraction of suitable
          planets will actually develop life. 

     fi = The fraction of life sites where intelligence develops. 
          Life on Earth began over 3.5 billion  years ago.
          Intelligence took a long time to develop. On other 
          life-bearing planets it may happen faster, it may 
          take longer, or it may not develop at all. 

     fc = The fraction of planets where technology develops.
          The fraction of planets with intelligent life that
          develop technological civilizations, i.e., technology
          that releases detectable signs of their existence 
          into space. 

     L = The "Lifetime" of communicating civilizations. 
         The length of time such civilizations release
         detectable signals into space.
 
You can find numerous web sites that give their own estimates for these parameters and permit you to put in your own choices for the parameters. One of the most interesting is the student web site SEDS.
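
If you would rather experiment offline, here is a minimal Drake-equation sketch in Python (ours; the sample values are placeholders to be replaced with your own guesses, not Drake's or the SETI Institute's):

def drake(R, fp, ne, fl, fi, fc, L):
    """N = R * fp * ne * fl * fi * fc * L"""
    return R * fp * ne * fl * fi * fc * L

# Example: plug in your own guesses for each factor.
N = drake(R=10,    # suitable stars formed per year in the galaxy
          fp=0.5,  # fraction of those stars with planets
          ne=2,    # "earths" per planetary system
          fl=0.5,  # fraction of those planets where life develops
          fi=0.2,  # fraction of life sites where intelligence develops
          fc=0.2,  # fraction of those that develop detectable technology
          L=1000)  # years a civilization remains detectable
print(f"N = {N:.0f} communicative civilizations")  # 200 with these guesses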

It is said that Drake's current estimate for the parameters leads to an estimate for N of 10,000. The organization SETI (Search for Extraterrestrial Intelligence) searches the skies, attempting to find radio signals from one of these civilizations. You can help in this search by going to The Search for Extraterrestrial Intelligence at Home and putting your name on a list to receive a special screensaver that will analyze data captured from the world's largest radio telescope. Your computer could be the first to record a message from another world!

An interesting account of the search for extraterrestrial life is provided by two astronomers engaged in the search, Dennis Mammana and Donald McCarthy, in their recent book "Other Suns, Other Worlds?" (St. Martins Press, $17.47 from Amazon.com).

DISCUSSION QUESTIONS:

(1) Physicist Enrico Fermi supported the idea of extraterrestrial life, but, at lunch with a friend, he made the offhand comment: "Where are they?" This came to be called the Fermi paradox and caused a major reconsideration of the issue of extraterrestrial life. It even resulted in a conference called "Where are they?" How could such an offhand comment have so large an effect?

(2) Steven Dick writes:

Fully 58% of highly educated Americans, responding to a Gallup poll in 1973, affirmed their belief in intelligent life on other planets, a figure substantially unchanged by 1990.
In addition, thousands of people claim to have seen UFO's, and many of these report that they have been carried off by these invaders (and returned). How should this affect our estimate of the probability of extraterrestrial life?
<<<========<<




>>>>>==============>
¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤

CHANCE WORKSHOP APPLICATION

Dartmouth College

July 7 to 11, 1998

¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤ ¤

Name:

University or College:

Address:

Phone:

E-mail:

The purpose of the Chance Workshop is to assist other faculty interested in teaching a quantitative literacy course, such as our Chance course, based on current events in the news that use probability or statistics concepts. For more information about the Chance course see the Chance web site.

Please type a short account of your own background in probability and statistics, your interest in teaching a Chance course, and the support you would have from your institution to do so. For the latter, a supporting letter from a chair or a dean would be very helpful.

The Chance course uses a variety of teaching techniques, including group learning, hands-on activities, journal writing, and work with Internet resources. It would be useful to know about any interest in or experience you have had with any of these. Of course, we do not expect that you have had experience with all of them, or else we would not be offering the workshop! Also, please describe the availability of computer resources for students at your institution, including statistics and simulation software packages and Internet access tools. As far as possible, we want to tailor the presentation to be applicable at your home institution.

Send this application by E-mail, FAX or ordinary mail to

J. Laurie Snell
Department of Mathematics
6188 Bradley Hall
Dartmouth College
Hanover, NH 03755-3551

E-mail: jlsnell@dartmouth.edu

FAX: 603-646-1312

The deadline for applications is March 15, 1998
Participants will be notified by April 15, 1998

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

CHANCE News 6.13

(10 November 1997 to 25 December 1997)

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!