CHANCE News 9.09

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

CHANCE News 9.09

August 10, 2000 to September 12, 2000

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Prepared by J. Laurie Snell, Bill Peterson, Jeanne Albert, and Charles Grinstead, with help from Fuxing Hou and Joan Snell.

Please send comments and suggestions for articles to
jlsnell@dartmouth.edu

Back issues of Chance News and other materials for teaching a Chance course are available from the Chance web site:

Chance News is distributed under the GNU General Public License (so-called 'copyleft'). See the end of the newsletter for details.

Chance News is best read with Courier 12pt font and 6.5" margin.

The following quotes resulted from vice presidential candidate Joseph Lieberman's comment in a speech that George Washington warned us never to indulge the supposition 'that morality can be maintained without religion.'

===========================================================

There is no easy correlation between public piety and civic morality.
Ira Glasser

I know religious people who I consider not to be moral, and I also know people who are not religious who I consider to be extremely moral. So, you know, I'm talking here about probabilities.
Joseph Lieberman

===========================================================

Contents of Chance News 9.09

1. Margin of error in a crossword puzzle.

2. It's poll time again!

3. Paulos on elections.

4. When is a poll a "dead heat"?

5. Vegetarians are more likely to have daughters.
6. Test scores rise, surprising critics of bilingual ban.
7. Marilyn returns to a previous problem.
8. U.S. News's college evaluations questioned.
9. Positive trends hidden in SAT and ACT scores.
10. Which pattern would you bet on coming up first TT or HT?
11. Measuring the risk of the West Nile virus.
12. Is the pattern on the tile floor random?

Note: If you would like to have a CD-ROM of the Chance Lectures that are available on the Chance web site, send a request to

jlsnell@dartmouth.edu

with the address where it should be sent. There is no charge. If you have requested this CD-ROM and it has not come, please write us again.

We have discovered that the Chance CD-ROMs do not work correctly on the Macintosh if you have G2 or later versions of Real Player. If you have a problem, the following should work. If, for example, you want to see the lecture on polls, "How polls are really done" by David Moore, go to the folder called "polls", find the file "POLLS.RM", and double click on it. This will bring up separate windows for the video and the slides, so you will have to position things to view them properly. Everything still seems to work on a PC, but we would appreciate hearing from anyone who has problems. Everything also still seems to work fine off the web. Of course, if anyone has figured out a way to make the Mac version work on newer versions of Real Player, we would really like to know about it!
<<<========<<

>>>>>==============>
John Lamperti sent us the following comment:

The August 13 New York Times Sunday crossword puzzle included a clue "Statistician's margin of error", with an 11-letter answer. The best idea I could come up with was PLUSORMINUS. Unfortunately, the NYT's opinion of us is somewhat different from our own self image. Last week I discovered that their answer was FUDGEFACTOR!
<<<========<<

>>>>>==============>
Poll-Axed?
STATS Statistical Assessment Service, September, 2000

Telling polls apart.
The Washington Post, 16 August, 2000, A35
Richard Morin

Iowa election markets.

Latest Gallup polls.

It's that time again, when statisticians and pollsters get a chance to talk about their favorite subjects. Every four years, people get excited by polls and predictions, as the presidential race lurches towards a conclusion. The first of these two articles points out that shoddy and misleading polling can confuse the public. As an example, they cite two polls, one showing George W. Bush with a margin of 50 to 41 percent over Al Gore, the other showing Bush ahead by 48 to 46 percent. The difference, in this case, was that the first sample was chosen among all adults, while the second was chosen among "likely voters."

The article also points out that a typical sample size is well under 1000, which leads to 95 percent confidence intervals of radius around 5 percentage points. This means that, for example, if such a poll shows Bush ahead of Gore by 48 to 46 percent, then 95 times out of 100, such sampling would yield point estimates within 5 percentage points of the actual percentage who favor each candidate. But this means that we can only be fairly sure that the actual percentage that favor Bush is somewhere between 43 and 53 percent; the corresponding range for Gore is between 41 and 51 percent. So, it is entirely possible, and not even terribly unlikely, that Gore is actually favored over Bush in this example.
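
As a rough check on the 5 percent figure, here is a minimal sketch of the usual normal-approximation margin of error for a single proportion. It assumes simple random sampling, and the sample sizes are our own illustrative choices (the article says only that samples are well under 1000).

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Half-width of the normal-approximation 95% interval for a proportion."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)

# ASSUMPTION: illustrative sample sizes, not taken from the article.
for n in (400, 600, 1000):
    moe = margin_of_error(0.48, n)
    print(f"n = {n}: 48 percent +/- {100 * moe:.1f} points")
```

Under these assumptions a sample of about 400 gives a radius close to 5 points, while a sample of 1000 gives a radius close to 3 points.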

One point that is not made by this article in the above situation, but is worth noting, is that, as long as the point estimate for Bush is higher than for Gore, then one can say that it is more likely that Bush is favored over Gore than vice versa.

The article also points out that when pollsters break down their sample into subsets, such as 18-29-year-olds, then the sample size becomes much smaller in some cases, which can lead to large margins of error.

The second article tries to explain why there were such enormous swings during the summer in the polls in the presidential race. The easy explanation is that this is a "volatile" time in the race, with both of the major parties (and some minor parties as well) holding conventions. Morin says that the pollsters (including himself) shouldn't be let off the hook so easily. In these times of instant access to (and demand for) news, many polls are conducted overnight, or even in a shorter time. This leads in general to smaller sample sizes, and, more importantly, it skews the samples because at certain times during the day only certain types of people are accessible to the pollsters. Other surveys, such as those conducted on the Internet, do not even qualify as polls, since the participants are self-selected. Morin concludes by noting that, after Labor Day in a typical presidential race, the numbers calm down. Also, larger samples are indeed usually better than smaller ones; one good way to make your own larger poll is to combine the results of the smaller polls.

DISCUSSION QUESTIONS:

(1) What possible problems do you see in Morin's suggestion to combine a number of smaller polls?

(2) What kind of an analysis would you have to do to justify our statement: "As long as the point estimate for Bush is higher than for Gore, then one can say that it is more likely that Bush is favored over Gore than vice versa"?

(3) The latest Gallup poll reported September 17 that among the likely voters, if the election were held today 48 percent say they would vote for Gore and 42 percent for Bush. On the same day at the Iowa election market you can buy a share of stock for Gore for 51.7 cents and for Bush for 46.4 cents. (You get $1 for each share if your candidate wins.) Are the prices on the Iowa election market consistent with the Gallup poll? Do you think they should be?

(4) The Iowa election market has done a pretty good job in previous elections of predicting the final vote. This is certainly not a random sample. Why do you think it might still be a pretty good predictor?
<<<========<<

>>>>>==============>
Parties, platforms, and politics.
ABCNews.com, September, 2000
http://abcnews.go.com/sections/science/
John Allen Paulos

This article discusses some of the perils that candidates face when they decide whether to push a certain issue (and even, perhaps, which side to take on an issue). One might think that one would focus on issues where the candidate agrees with the largest subsets of voters. For example, suppose that a candidate is in favor of a certain side of an issue, and that 80 percent of the voters favor the same side of the issue. It would seem safe for the candidate to weigh in on this issue. However, suppose that most of the voters who oppose the candidate on this issue are "single-issue voters" on this issue, meaning that they do not care about any issue except this one. In this case, the candidate has lost almost 20 percent of the voters by taking a stand on this issue. It might be smarter for the candidates to try to determine which issues have a high number of single-issue voters, and if the numbers are large (on the opposing side), then the candidate should perhaps ignore the issue entirely.

Another problem that parties and candidates confront is that rankings of issues, in order of importance, by groups of voters are not necessarily transitive. This is a paradox that has been known for a long time. Suppose that there are three issues, labelled A, B, and C, say. Now suppose that a third of the voters think that these issues, in decreasing order of importance, are A, B, C. Suppose another third of the voters think that the order of issues is B, C, A. Suppose that the last third of the voters think the order is C, A, B. Then clearly two-thirds of the voters think that A is more important than B, and two-thirds of the voters think that B is more important than C. Thus, it would seem to be reasonable to assume that a majority of the voters think that A is more important than C. However, in fact two-thirds of the voters think that C is more important than A. The point here is that it might not be obvious to candidates or parties how they should rank the issues, in terms of how important these issues are to the voters.
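
The three-bloc example can be checked mechanically. The sketch below counts, for each ordered pair of issues, the fraction of blocs that rank the first above the second; the function name is our own.

```python
from itertools import permutations

# Three equal blocs of voters, each ranking the issues from most to least important.
rankings = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]

def share_preferring(x, y, rankings):
    """Fraction of blocs that rank issue x above issue y."""
    wins = sum(1 for r in rankings if r.index(x) < r.index(y))
    return wins / len(rankings)

# Print every pairwise comparison that a majority supports.
for x, y in permutations("ABC", 2):
    share = share_preferring(x, y, rankings)
    if share > 0.5:
        print(f"{x} over {y}: {share:.2f} of voters")
```

The majorities form a cycle (A over B, B over C, C over A), so no issue can be called most important by majority rule.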

DISCUSSION QUESTIONS:

(1) Are there any examples of issues which are being ignored by a candidate because of the "single issue voters" in the present presidential election?

(2) Paulos starts his article with the comment:

First, let's consider the median theory of elections. One version maintains that, if one issue or one set of highly related issues dominates an election, there is a strong tendency for both candidates to move toward the position of the median voter. That position is such that half the voters are on one side of it, half on the other side.

Do you think this has occurred in the current presidential election? If so, for which issue or set of issues?
<<<========<<

>>>>>==============>
Bush takes a beating
Salon
Alicia Montgomery

Milton Eisner wrote us commenting on the following statement in this article which summarizes the results of the latest polls relating to the presidential race:

The vice president is up in Michigan 45 to 37 percent, still a statistical dead heat when considering the four-point margin of error.

Milton commented that he did not believe it correct to call this a statistical dead heat. To test this, he reported the following simulation:

Let G = proportion of the population favoring Gore.
      B = proportion of the population favoring Bush.
      U = proportion of the population undecided or other.
Let g = proportion of the sample favoring Gore.
      b = proportion of the sample favoring Bush.
      u = proportion of the sample undecided or other.

I assumed G = .45, B = .45, U = .10, and took 100 samples of 1000 "voters" each. I did a scatterplot of the resulting (g, b) pairs. For no pair was ABS(g - b) greater than about .06. In summary, I found the mean of the 100 values of g - b to be 0.004030 and the standard deviation to be 0.029592. This is consistent with MAX(g - b) being about 0.06 and MIN(g - b) about -0.06.

Therefore, I think I am justified in rejecting Salon's comment that g - b = 0.08 is a statistical dead heat. A difference this large did not occur in 100 simulations.
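
Readers who want to repeat this experiment could use something like the sketch below. It is not Milton's program; it simply draws 100 samples of 1000 voters under his assumptions (G = .45, B = .45, U = .10), using a seed of our own choosing, and compares the result with the theoretical standard deviation of g - b for multinomial sampling.

```python
import math
import random
import statistics

def simulate_differences(G=0.45, B=0.45, n=1000, trials=100, seed=0):
    """Return the values of g - b from repeated samples of n voters."""
    rng = random.Random(seed)  # ASSUMPTION: arbitrary seed, for repeatability only
    diffs = []
    for _ in range(trials):
        g = b = 0
        for _ in range(n):
            r = rng.random()
            if r < G:
                g += 1          # voter favors Gore
            elif r < G + B:
                b += 1          # voter favors Bush
            # otherwise undecided or other
        diffs.append((g - b) / n)
    return diffs

diffs = simulate_differences()
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)

# Theoretical SD of g - b: sqrt((G + B - (G - B)^2) / n)
theory_sd = math.sqrt((0.45 + 0.45 - 0.0 ** 2) / 1000)

print("mean of g - b:", round(mean_diff, 4))
print("sd of g - b:  ", round(sd_diff, 4))
print("theoretical sd:", round(theory_sd, 4))
print("largest |g - b|:", max(abs(d) for d in diffs))
```

The theoretical standard deviation is 0.03, in line with the 0.0296 reported above, so a difference of 0.08 lies between 2 and 3 standard deviations from zero.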
<<<========<<

>>>>>==============>
Vegetarians are more likely to have daughters.
The Times (London), August 7, 2000
Helen Rumbelow

and

Plant chemicals could play small part in deciding gender.
The Times (London), August 7, 2000
Dr. Thomas Stuttaford

Both articles concern a 1998 study of nearly 6000 pregnant women in Nottingham, England, five percent of whom did not eat meat or fish. Among this group, the ratio of boy babies to girl babies was 85 to 100, while for the meat and fish eaters, the ratio was 106 to 100. According to the articles, the second ratio represents the overall national average in Great Britain.
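
To get a feeling for how surprising a ratio of 85 to 100 would be, here is a hedged back-of-the-envelope test. It assumes about 300 births to vegetarians (5 percent of roughly 6000 women, one baby each); the articles do not give the exact count, so this number is our assumption.

```python
import math

# Ratios of boys to girls reported in the articles.
veg_boys, veg_girls = 85, 100
nat_boys, nat_girls = 106, 100

p_veg = veg_boys / (veg_boys + veg_girls)   # share of boys among vegetarians' babies
p_nat = nat_boys / (nat_boys + nat_girls)   # national share of boys

# ASSUMPTION: about 300 births to vegetarians; not stated in the articles.
n_veg = 300

# One-sample z test of the vegetarian share against the national share.
se = math.sqrt(p_nat * (1 - p_nat) / n_veg)
z = (p_veg - p_nat) / se
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"share of boys: vegetarians {p_veg:.3f}, national {p_nat:.3f}")
print(f"z = {z:.2f}, two-sided p-value = {p_two_sided:.3f}")
```

With this assumed sample size, z falls a little short of 2 in absolute value, so the conclusion is sensitive to the actual number of births.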

The only other study on how diet affects gender has shown that high levels of magnesium, potassium, and calcium produce more boys, but there is no evidence that vegetarian diets are low in these elements. The study also found that vegetarians are less likely to smoke during pregnancy than non-vegetarians (10 percent versus 25 percent). But curiously, other research has found that nonsmokers tend to produce more boys. "If diet is the factor that lowers the sex ratio in vegetarians it would appear that this effect overrides the effect of vegetarians' low smoking rate in pregnancy," one of the researchers is quoted as saying.

The second article looks at some beliefs during the past century on what affects the gender of a child. Apparently, at the beginning of the 20th century "it was firmly believed that an older father, or one married to a very strong, domineering woman, was likely to have boys." The fact that service men returning from World War II tended to father more boys than other men is considered to be support for this idea.

Support is also cited from studies of Burke's Peerage. "The habit for centuries was for the rich to take a bride considerably younger than themselves, whereas in poor families marriage was earlier, and the ages closer. As recorded in Burke's, the aristocracy had more sons."

DISCUSSION QUESTIONS:

(1) The above study received very little attention in the United States, where apparently only USA Today ran even a brief paragraph on the story. Why do you think this is so?

(2) The fact that men in the aristocracy tended to marry later and to younger wives, and that they had more sons, is also viewed as support for the "age of the father" hypothesis. How might diet have played a role? [One answer is given in the article.]
<<<========<<

>>>>>==============>
Test scores rise, surprising critics of bilingual ban.
The New York Times, August 20, 2000, page 1
Jacques Steinberg

Two years ago Californians voted to end bilingual education and many observers predicted dire consequences. This article reports that standardized test scores in reading and math in fact have increased since 1998, and that this information is likely to affect the future of bilingual education in Arizona, Colorado, Massachusetts, and New York.

Some impressive increases in test scores are reported. For example, the average reading score for second graders classified as "limited in English" increased from the 19th to the 28th percentile, while the average math score increased from the 27th to the 41st percentile. The article compares two nearby school districts that are comparable in size and economic background. One district eliminated bilingual education for all students, while the other granted waivers to half the students with limited proficiency in English. In nearly every grade, the increases in the former were at least twice those in the latter district. On the other hand, the article concedes that in some places many other changes accompanied the elimination of bilingual education, some of which may have helped boost test scores. For example, in the lower elementary grades class sizes have been reduced from more than 30 students to 20, and in many places the phonics method--as opposed to whole language--has been used to teach students who have had the most persistent difficulty learning to read. There has also been little or no attempt to determine to what degree schools are complying with the bilingual education ban.

DISCUSSION QUESTION:

Do you agree with Kenji Hakuta, a professor of education at Stanford University quoted in the article, that, "with so many variables introduced at once, few conclusions about bilingual education could be drawn from the results other than that 'the numbers didn't turn negative', as many had feared." Why or why not? What other information would help you answer this question?
<<<========<<

>>>>>==============>
Ask Marilyn.
Parade Magazine, 20 August 2000, 7
Marilyn vos Savant

A reader writes about a previous question (discussed in Chance News 9.06):

I was all set to call you wrong about your answer to this recent problem: "Brian drops his key into a box with six others. He'll remove the keys one at a time randomly and try to start his car. If a key does not work, he'll discard it. What are the chances that he'll retrieve his key on the second try?"

You said 1 out of 7. Then I noticed you didn't say, "Given that he didn't get his key the first time, what are his chances the second time?" If you had said that, the answer would have been 1 out of 6.

The chances of him retrieving his key on the second try are contingent on him not receiving it on the first try. So the answer is the chances that he did not select it the first time (6 out of 7) multiplied by the chances that he did select it the second time (1 out of 6). That translates to 1 out of 7--just what you said. This one almost got me. What a trick question!
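
The reader's multiplication can be carried out exactly for every try, not just the second. This is a small sketch with a function name of our own; it multiplies the chances of drawing a wrong key on each earlier try by the chance of drawing the right key on try k.

```python
from fractions import Fraction

def prob_found_on_try(k, n_keys=7):
    """P(the right key is drawn on try k) when keys are drawn without replacement."""
    p = Fraction(1)
    remaining = n_keys
    for _ in range(k - 1):
        p *= Fraction(remaining - 1, remaining)  # a wrong key is drawn and discarded
        remaining -= 1
    return p * Fraction(1, remaining)            # the right key is drawn on try k

for k in range(1, 8):
    print("try", k, "->", prob_found_on_try(k))
```

Every try has unconditional probability 1/7, while the conditional probability on the second try (given a first failure) is the factor 1/6 that appears inside the product.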

DISCUSSION QUESTIONS:

(1) In the language of conditional probability, what is the reader saying?

(2) In everyday language, do you see why one might be tempted to answer the question with a conditional rather than an unconditional probability? Do you think this is a "trick question?"
<<<========<<

>>>>>==============>
Putting rankings to the test; U.S. News's college evaluations questioned.
The Washington Post, 25 August 2000, C1
Jay Mathews

Playing with numbers: How U.S. News mismeasures higher education and what we can do about it.
The Washington Monthly, September 2000
Nicholas Thompson

The annual US News & World Report college rankings issue is now out, an event which the Post describes as "a source of gastrointestinal distress for college presidents, alumni fundraisers and professional statisticians." According to the Post, internal documents from US News indicate that the magazine's own consultants have been privately raising the same concerns that have long been voiced by outside critics.

The story stems from a 1997 report prepared for US News by the National Opinion Research Center (NORC) in Chicago. According to that report, "the principal weakness of the current approach is that the weights used to combine the various measures into an overall rating lack any defensible empirical or theoretical basis." This reminded us of the comments by former president Gerhard Casper of Stanford, who in a 1997 letter to US News wrote: "I hope I have the standing to persuade you that much about these rankings -- particularly their specious formulas and spurious precision -- is utterly misleading" (see Chance News 6.02). The choice of factors to include in the formulas was also criticized. For example, there are no direct measures of quality of the curriculum or of student experience.

The Post received a copy of the report from Nicholas Thompson of the Washington Monthly. Thompson's own commentary can be read on-line at the URL cited above. There you will also find Casper's original letter (now in the public domain), the NORC report, and US News's own response to that report.

Elaborating on its critique of the rating formula, the NORC report states that "Apart from the weights, however, we were disturbed by how little was known about the statistical properties of the measures or how knowledge of these properties might be used in creating the measures. For example, the simple correlation matrix among the variables has apparently not been computed. This would tell us whether some of the present measures are redundant, or whether some are contributing more to the discrimination among colleges and universities than others." The report adds that graduation rate appears in the formula twice, once explicitly and once as a component of the "value-added" measure, so its contribution is greater than it might first appear.

Thompson himself laments the "beauty contest" mentality promoted by the rankings issue. He further worries about what he calls the "Heisenberg effect;" that is, colleges may actually institute changes in response to the US News rating categories. For example, admissions yield (the percentage of accepted students who choose to enroll) is one component of the formula. Thompson explains that a college looking to enhance this measure might decide to admit a larger fraction of its class during the early decision phase, since these applicants commit to enroll if accepted.

Admissions officials are certainly aware of the power of the rankings. They see application rates and yield go up when their school rises in the rankings; drops in the rankings have the opposite effect. Thompson cites findings by the National Bureau for Economic Research (NBER) that quantify the yield effect: to compensate for the decreased yield resulting from a one-place drop in the rankings, a school would need to increase its admissions rate by 0.4 percentage points the following year.

Readers of last year's rankings may recall Caltech's surprising leap to the number one spot among national universities. Thompson provides some details on how this came about. The key shift was the spending-per-student measure. In prior years, it had entered the formula through ranks. But last year, absolute dollar numbers were used. To illustrate, Thompson gives data from 1997. Caltech led the group, spending $74,000 per student. Yale was fourth with $45,000 and Harvard seventh with $43,000. In terms of ranks, Caltech's advantage over Yale is the same as Yale's over Harvard, namely three positions. But in absolute numbers, Caltech spent roughly 70 percent more than Harvard, whereas Harvard and Yale were nearly the same. The switch to absolute dollars last year gave Caltech a dramatic advantage over the pack. Thompson reports that the change provoked considerable internal debate at US News.

DISCUSSION QUESTIONS:

(1) Despite concerns of statisticians, it seems the US News rankings are here to stay. By way of improvement, can you suggest a reasonable way for US News to measure "academic rigor"?

(2) Explaining the NBER finding, Thompson writes: "In other words, if a school that needs to admit 15 percent of its applicants to fill its class moves from 5th place to 10th place, it will need to admit 17 percent the next year." What kind of assumptions are being made here?

(3) You can read the details of US News' methodology on their web site. Can you see any rationale for when absolute numbers are used and when ranks are used?

(4) This year US News has Princeton, Harvard and Yale as the top three universities, followed by Caltech and MIT. What do you think happened?
<<<========<<

>>>>>==============>
Positive trends hidden in SAT and ACT scores.
New York Times, 30 August 2000, B10
Richard Rothstein

This is an interesting article in which Rothstein makes a hypothesis, and then looks at the data only to find that his data does not settle his question. The article relates to the recent announcements of the year 2000 SAT and ACT scores.

Both the College Board and the ACT provided news releases. The headline of the College Board report was:

SAT math scores for 2000 hit 30-Year High: Reflect gains for American education.

The headline of the ACT report is more modest:

ACT Scores for 2000 Maintain Gains of the '90s.

Rothstein asks: do rising SAT scores mean that schools are doing a better job? He answers "no," saying that this conclusion would be justified only if all 18-year-olds took the test. For example, if a higher percentage of bright students took the test this year, the average scores could increase with no improvement in the schools.

However, Rothstein suggests that a more careful look at the scores might indicate improvements in public education. While the SAT scores get the most publicity, he remarks that only about 1/2 the students applying to selective colleges take the SAT exam. The other half, mostly from the Midwest, take the ACT exam. Thus it is necessary to look at data from both tests. To illustrate this he mentions the classic example of comparing Iowa and New York just by SAT scores. Iowa typically has the highest average SAT score among all states (this year they were number 2 with North Dakota number 1). This does not imply that Iowa does a better job teaching students than New York.

The reason that Iowa has such a high average was discussed by Paul Alper in a letter to Chance News 9.05. In his letter Paul also remarked that Iowa's average ACT score was "way down on the list". Loyal Iowans wrote us that this is not the case. For example, this year they were seventh in the ranking of states by average ACT scores.

Rothstein remarks that disadvantaged black students who take these exams have lower scores, on average, than the white students, who are mostly from the middle class. As the black students become a larger proportion of those taking the exam, only a modest increase in the average scores for all exam takers is possible even when the scores of both black and white students are increasing at a more significant rate. In this case, the schools may well be doing a better job but it would not show up in the average SAT scores. Rothstein looks at the data available on the College Board web site and the ACT web site.

Scores have been reported by race since 1976. Since then the combined SAT score for all students has risen by only 1 percent, from 1006 to 1019.

The combined SAT scores for blacks have increased by 9 percent from 790 in 1976 to 860 today. The average black ACT score has increased by 13 percent from 15.1 to 17.0 during this time. In 1976, 29 percent of the black 18-year-olds took either the SAT or the ACT exam. Now 38 percent take one of these exams.

The combined SAT score for whites during this period has increased by 1 percent from 1043 to 1058 and their average ACT scores have increased by 3 percent from 21.1 to 21.8. The percentage of white 18-year-olds taking the SAT or ACT exam has increased from 32 percent in 1976 to 57 percent today.
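
The composition effect Rothstein describes can be illustrated with the group scores quoted above. In the sketch below the shares of test takers are hypothetical (they are not from the article) and only two groups are considered, so the overall averages will not match the published ones.

```python
# Average combined SAT scores by group, as quoted above.
scores_1976 = {"black": 790, "white": 1043}
scores_2000 = {"black": 860, "white": 1058}

# HYPOTHETICAL shares of test takers, chosen only to show the effect of a
# growing lower-scoring group. They are not taken from the article.
shares_1976 = {"black": 0.10, "white": 0.90}
shares_2000 = {"black": 0.20, "white": 0.80}

def overall(scores, shares):
    """Weighted average score over the groups."""
    return sum(scores[g] * shares[g] for g in scores)

def pct_change(old, new):
    """Percent change from old to new."""
    return 100.0 * (new - old) / old

all_1976 = overall(scores_1976, shares_1976)
all_2000 = overall(scores_2000, shares_2000)

for g in scores_1976:
    print(f"{g}: {pct_change(scores_1976[g], scores_2000[g]):.1f} percent")
print(f"all: {pct_change(all_1976, all_2000):.2f} percent")
```

With these made-up shares, each group's average rises by more than 1 percent while the combined average barely moves.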

Looking at all these statistics Rothstein remarks:

More 18-year-olds are taking these tests, but that is not sufficient proof that more lower-ranking seniors do so. Perhaps many good students did not take the tests in 1976 because they could not afford college, and now such students can.

But this is an implausible explanation for what seems to be a democratizing trend, with rising achievement. Scores for blacks and whites have both grown over the last 24 years, and the gap between them has narrowed, even while test-takers probably become less elite.

DISCUSSION QUESTIONS:

(1) Why do you think Iowa and North Dakota have such high average SAT scores?

(2) Do you think the statistics given relating to SAT and ACT scores between the years 1976 and 2000 indicate an improvement in the schools? What additional information would help you decide this?

(3) The SAT and ACT web sites have much more data about the results of their examinations over time. Look these over and see if you can get a better idea from this if schools are doing a better job.

(4) We remarked that the percentage of white 18-year-olds taking the SAT or ACT exam has increased from 32 percent in 1976 to 57 percent today. Is this a 57-32 = 25 percent increase or a (25/32) = 78 percent increase?
<<<========<<

>>>>>==============>
Ask Marilyn.
Parade Magazine, 10 September 2000, page 24
Marilyn vos Savant

Marilyn gets the following letter:

Say, we're going to toss a coin repeatedly until we get one of these sequences in order: heads/tails (HT) or tails/tails (TT). If my sequence comes up first, you pay me a dollar; but if your sequence comes up first, I pay you a dollar. At that point we stop and start all over again. Given that we're going to play this game repeatedly, which sequence would you choose? Or doesn't it matter?

Barney Bissinger,
Hershey, Pa.

Marilyn says that she would choose HT. She points out that the only way you can win if you pick TT is for the first two tosses to be tails, which occurs with probability 1/4. If this does not happen, the next TT must be preceded by an H and so she will already have won. Thus by choosing HT she will win with probability 3/4.
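
Marilyn's answer is easy to check by simulation. The sketch below plays the game repeatedly with a fair coin; the number of games and the seed are arbitrary choices of ours.

```python
import random

def first_pattern(rng, a="HT", b="TT"):
    """Toss a fair coin until pattern a or b appears; return the one that came first."""
    tosses = ""
    while True:
        tosses += rng.choice("HT")
        if tosses.endswith(a):
            return a
        if tosses.endswith(b):
            return b

rng = random.Random(2000)   # ASSUMPTION: arbitrary seed, for repeatability only
games = 20000
ht_wins = sum(first_pattern(rng) == "HT" for _ in range(games))
ht_share = ht_wins / games
print("fraction of games won by HT:", ht_share)
```

The fraction should come out close to 3/4.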

This is a special case of a game called Penny-Ante proposed by W. Penney, Journal of Recreational Math, vol.2 (1969), p. 241. In Penny-Ante there are two players; the first player picks a pattern A of H's and T's and the second player, knowing the choice of the first player, picks a different pattern B. A coin is tossed a sequence of times and the player whose pattern turns up first is the winner. It is assumed that neither pattern is a sub-pattern of the other.

John Conway found a very simple formula for calculating the probability that player A wins. This formula is described in M. Gardner, "Mathematical Games," Scientific American, vol. 10 (1974), pp. 120-125.
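
For readers who do not have the Gardner column at hand, here is a sketch of Conway's rule as we understand it, usually stated in terms of "leading numbers". The function names are ours, and the coin is assumed to be fair.

```python
from fractions import Fraction

def leading_number(x, y):
    """Sum 2**(k-1) over every k where the last k tosses of x match the first k of y."""
    total = 0
    for k in range(1, min(len(x), len(y)) + 1):
        if x[-k:] == y[:k]:
            total += 2 ** (k - 1)
    return total

def prob_b_beats_a(a, b):
    """Probability that pattern b appears before pattern a in fair coin tosses.

    The odds in favor of b are (aa - ab) : (bb - ba), where xy denotes
    leading_number(x, y).
    """
    for_b = leading_number(a, a) - leading_number(a, b)
    for_a = leading_number(b, b) - leading_number(b, a)
    return Fraction(for_b, for_a + for_b)

print(prob_b_beats_a("TT", "HT"))    # Marilyn's game
print(prob_b_beats_a("HHH", "THH"))  # a well-known three-toss example
```

For Marilyn's game the rule gives odds of 3 to 1 in favor of HT, in agreement with her answer of 3/4.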

A proof of Conway's formula and other interesting aspects of Penny Ante can be found in the exercises of Chapter 11 Section 2 of "Introduction to Probability Theory" by Grinstead and Snell.

DISCUSSION QUESTIONS:

(1) In the September 20th 1992 issue of Parade Magazine Marilyn received the following question, also from Barney Bissenger in Hershey Pennsylvania.

I am asked to select one of two envelopes and told only that one contains twice as much money as the other. I find $100 in the envelope I select. Should I switch to the other one to improve my worldly gains?

Marilyn says that she does not think there is any advantage in switching. An argument for switching is that the other envelope must have either $50 or $200. Thus your expected winning if you switch is 1/2*50+ 1/2*200 = $125. Is this argument correct? If not, why not?

A much more mysterious envelope paradox is the following: I put two distinct numbers a and b in an envelope. You pick one of the numbers at random. Show that you can decide if you have the bigger or smaller number with a probability greater than 1/2 of being correct. These two famous envelope paradoxes are discussed in Chapter 4 of the Grinstead-Snell probability book.

(2) There is no telephone listed in Hershey Pennsylvania for a Barney Bissenger and we were not able to find this name anywhere searching on the internet. Do you think there is such a person?
<<<========<<

>>>>>==============>
Chances are, mosquito's bite just itchy. West Nile virus may be deadly, but it's also rare.
Courier Monitor, Sept. 4, 2000
Sarah M. Earle

Risk management people worry about the Nile virus.
Boston Globe, 20 August, 2000 A1
Patricia Wen

The Monitor article resulted from the first appearance of the West Nile virus in dead crows in New Hampshire. This article, like most articles on this subject, says that the odds of being killed by the West Nile virus are about 1 in a million. This is based on the fact that last year in New York, among a population of 8 to 10 million, there were 7 deaths.

Unlike most articles that give these odds, in this article it is stressed that this statistic may not mean much in other areas since there has been very little experience with this virus in the United States. Epidemiologist Jesse Greenblatt remarks "it is hard to make any comparison. We just don't know enough."

We saw in our discussion of the risk of being killed by lightning in Chance News 9.08 that men are much more likely to be killed by lightning than women. Also, if you are a golf player you are more likely to be killed by lightning, and so on. There will certainly be similar factors to consider with the West Nile virus. For example, Greenblatt says "this mosquito tends to like urban areas." Also, all of those who died as a result of the virus in New York were over 75, so age is probably a factor.

The Boston Globe article deals with the reasons that the public is so much more concerned about being killed by the West Nile virus than they are about being killed by other events which are much more likely to occur. One explanation given is that people are less frightened of things which they have control over. Thus they worry less about driving than flying even though the risk of driving is greater. Also people are frightened by cluster deaths related to some kind of new event. Also, just as the public plays the lottery "because someone has to win" they are sure that someone will die of the virus and "with my luck it will be me".

Psychology professor Steve Pinker said that people find it difficult to ignore the threat of blood-sucking insects that deposit illness beneath our skin, regardless of statistical odds.

Psychologist Scott Geller refers to the "Just World" hypothesis in which people believe bad things happen for a reason. For example, if Jones is killed in a car accident it is because he was driving too fast. Referring to the randomness of the West Nile virus, Geller remarks that "When something looks random like that, people worry that even the good guys can get it."

DISCUSSION QUESTIONS:

(1) Would you expect a gender difference in the risk of death from the West Nile virus? If so, which way?

(2) The Globe article refers to the risk of 1 in a million as the statistical cutoff point for saying something has almost no risk at all. Where do you think this comes from?
<<<========<<

>>>>>==============>
The bathroom tiles problem
ASA 2000 meeting in Indianapolis, August 2000
Bill Finzer and Laurie Snell

The recent meeting of the American Statistical Association was held in the Indianapolis Convention Center. Bill Finzer from Key Curriculum Press was at the meeting to demonstrate their new statistical package Fathom (See Chance News 8.08). Bill noticed that the men's bathrooms had one inch square blue and white tiles on the floor laid out in an apparently random pattern. The floor was about 24ft x 24ft so there were 288^2 = 82,944 such squares. For a picture of the floor go here.

Bill proposed testing to see if the pattern is indeed random. Looking at a few randomly chosen 10in x 10in squares suggested that it was at least reasonable to assume that about half the squares are white and half are blue.

This suggested making the hypothesis that the colors were determined by the equivalent of a coin tossing process. We proposed to test this hypothesis using the longest horizontal or vertical run of a single color as a statistic. This longest run turned out to be 17. You can see a photo of this longest run here.

In a second similar men's bathroom the longest such sequence was also 17 though the overall pattern was quite different.

Bill agreed to try to determine the distribution for the longest run by simulation while Laurie offered to try to determine the exact distribution. Bill showed the power of his new statistical package by doing his part before breakfast the next day. Laurie was still working on his assignment at the end of the meeting. You can also see Bill's simulations including the relevant graphics here.

Bill first produced a single simulation which would be the result of tossing a coin to determine the color of each of 82,944 squares. The graphical representation of the simulation made a floor that looked remarkably like the floor in the bathroom. Then, assuming that the rows and columns are independent, Bill simulated 288 rows and 288 columns and found the longest run in any of these. While the independence assumption is not quite true we shall see that this does not affect the outcome very much. He then simulated this 100 times and obtained the following results for the 100 simulations:

longest run	frequency
14	1
15	11
16	20
17	33
18	19
19	5
20	4
21	6
22	1

From this we see that 17 occurred 33 times suggesting that it is the most likely value for the maximum horizontal or vertical run in a tiling. Thus we certainly cannot reject the hypothesis that the tiling is random on the basis of this test.
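
The kind of simulation described above can be sketched in a few lines of Python. This is not Bill's Fathom document; it is a rough stand-in under the same assumption that the 288 rows and 288 columns behave like 576 independent sequences of 288 coin tosses. The function names, the seed, and the choice of 20 repetitions in the demonstration are ours.

```python
import random
from collections import Counter

def longest_run(seq):
    """Length of the longest run of equal adjacent values in seq."""
    best = 0
    current = 0
    previous = None  # None never equals a tile color, so the first tile starts a run
    for value in seq:
        current = current + 1 if value == previous else 1
        previous = value
        best = max(best, current)
    return best

def longest_run_independent_lines(rng, n_lines=576, length=288):
    """Longest single-color run over n_lines independent lines of coin tosses."""
    return max(
        longest_run([rng.getrandbits(1) for _ in range(length)])
        for _ in range(n_lines)
    )

def simulate(reps=100, seed=0):
    """Tabulate the longest run over reps simulated floors."""
    rng = random.Random(seed)
    return Counter(longest_run_independent_lines(rng) for _ in range(reps))

# A small demonstration; use reps=100 to mimic the table above (slower).
for run_length, frequency in sorted(simulate(reps=20).items()):
    print(run_length, frequency)
```

With only 20 or 100 repetitions the frequencies will vary noticeably from one seed to the next, so the table above should be read as one sample and not as the exact distribution.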

Laurie started his consideration of an exact solution by looking at the longest run in the rows instead of both rows and columns. This avoids the issue of the dependence between rows and columns. For a single row the length of the longest run is the length of the longest run of heads or tails when we toss a coin 288 times.

This problem has a long history and is beautifully explained in an article by Mark Schilling, The College Mathematics Journal, Vol. 21, No 2, May 1990, pp. 196-207.

Schilling starts his article with an old activity that is still popular in statistics classes. This activity is described in a book by Revesz and Csorgo (Strong Approximations in Probability and Statistics, Academic Press 1981, p. 97). Here we find a teaching experiment of T. Varga described as follows:

His class of secondary school children is divided into two sections. In one of the sections each child is given a coin which they then throw two hundred times, recording the resulting head and tail sequence on a piece of paper. In the other section the children do not receive coins but are told instead that they should try to write down a "random" head and tail sequence of length two hundred. Collecting these slips of paper, he then tries to subdivide them into their original groups. Most of the time he succeeds quite well.

The usual way to try to divide the results into their original groups is to choose those with the longest run of heads or tails 6 or greater as the ones for which a coin was tossed.

While this experiment has been carried out by many of us with great success, it failed once for Bill Peterson when he was teaching an alumni class at Middlebury. Bill asked the students to do the experiment during the lunch break. When he looked at the results he suspected that the students might have tried to outsmart him. He asked one of them if they had and got the answer: Age and cunning beat youth and brains!

Schilling first gives a simple recursion equation to compute the distribution for the length of the longest head run in n tosses of a coin. He lets A(n,k) be the number of sequences of length n for which the longest head run is at most k.

Consider k = 3. If n <= 3 then all sequences are favorable. If n > 3 every favorable sequence must begin with either T, HT, HHT, or HHHT corresponding to 0, 1, 2, or 3 heads before the first tail. This divides the favorable sequences into four groups so that

A(n,3) = A(n-1,3) + A(n-2,3) + A(n-3,3) + A(n-4,3).

Thus to compute A(n,3) for all n we start with the first four powers of 2 and then compute successive values as the sum of the previous four values giving:

n	0	1	2	3	4	5	6	7	8	...
A(n,3)	1	2	4	8	15	29	56	108	208	...

The same procedure works for any k. You determine the first k+1 values as powers of 2 and then each later value is the sum of the previous k+1 values. The probability that the longest head run is at most k is then A(n,k)/2^n, since there are 2^n equally likely sequences of length n, and this determines the distribution of the length of the longest head run.
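
The recursion is short enough to sketch in Python. This is our own illustration, not Schilling's code or the Mathematica program mentioned later; the function name count_at_most is an assumption. It reproduces the k = 3 row of the table above and turns a count into a probability by dividing by 2^n, the number of sequences of length n.

```python
def count_at_most(n, k):
    """A(n, k): number of H/T sequences of length n whose longest head run is at most k."""
    # The first k+1 values (n = 0, ..., k) are powers of 2: every sequence is favorable.
    a = [2 ** m for m in range(min(n, k) + 1)]
    # Each later value is the sum of the previous k+1 values.
    for _ in range(k + 1, n + 1):
        a.append(sum(a[-(k + 1):]))
    return a[n]

# Reproduce the k = 3 row of the table.
print([count_at_most(n, 3) for n in range(9)])

# Probability that the longest head run in 8 tosses is at most 3.
print(count_at_most(8, 3) / 2 ** 8)
```

Python integers have arbitrary precision, so the counts stay exact even for n in the hundreds.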

Now we are interested in sequences for which the longest head or tail run in n tosses of a coin is at most x. But the probability of this is the same as the probability that the longest head run in n-1 tosses of a coin is at most x-1. To see this consider the following two sequences. The first is the result of tossing a coin 10 times. The second indicates, for each toss after the first, whether it was the same as the previous toss (S) or different from the previous toss (D).

T H T T T H T T H H

D D S S D D S D S

Then the second sequence can also be considered as the outcome of tossing a coin 9 times, and the length of the longest S run is one less than the length of the longest head or tail run in the original sequence. Thus, if B(n,k) is the number of sequences of length n for which the longest head or tail run is at most k, then B(n,k) = 2*A(n-1,k-1) (the two comes from the fact that reversing all the outcomes in the first sequence gives the same second sequence). This means that the distribution of the length of the longest head or tail run in n tosses of a coin is the same as the distribution of the longest head run in n-1 tosses of a coin shifted up by one.
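
For readers who want to check this identity, here is a small brute-force sketch in Python. All function names are ours. It recodes the 10-toss example as S's and D's and then compares a direct enumeration of sequences whose longest head or tail run is at most k with 2*A(n-1,k-1) for small n.

```python
from itertools import groupby, product

def count_at_most(n, k):
    """A(n, k): sequences of length n whose longest head run is at most k."""
    a = [2 ** m for m in range(min(n, k) + 1)]
    for _ in range(k + 1, n + 1):
        a.append(sum(a[-(k + 1):]))
    return a[n]

def same_different(tosses):
    """Recode tosses as S (same as previous toss) or D (different)."""
    return "".join("S" if b == a else "D" for a, b in zip(tosses, tosses[1:]))

def brute_force_either(n, k):
    """Count sequences of length n whose longest run of H or of T is at most k."""
    return sum(
        1
        for seq in product("HT", repeat=n)
        if max(len(list(group)) for _, group in groupby(seq)) <= k
    )

print(same_different("THTTTHTTHH"))

# Compare enumeration with the formula for every n up to 10 and every k up to n.
matches = all(
    brute_force_either(n, k) == 2 * count_at_most(n - 1, k - 1)
    for n in range(1, 11)
    for k in range(1, n + 1)
)
print("identity holds for n <= 10:", matches)
```

Enumeration is only practical for short sequences, which is exactly why the recursion is useful for n = 200 or n = 288.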

Schilling's recursion equation made it easy to write a Mathematica program to compute the distribution of the longest head or tail run in n tosses of a coin. Running the program for n = 200 we obtain a distribution for the length of the longest head or tail run in 200 tosses of a coin. From the distribution we find that there is about a 97 percent chance that the longest run of either heads or tails in 200 tosses is 6 or greater, and about an 80 percent chance that it is 7 or greater. This explains the success of the Varga activity.
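
We do not reproduce the Mathematica program here, but the same calculation can be sketched in Python under the relation described above: the longest head or tail run in n tosses is at most x exactly when the longest head run in the n-1 same/different symbols is at most x-1. The function names are ours.

```python
def count_at_most(n, k):
    """A(n, k): sequences of length n whose longest head run is at most k."""
    a = [2 ** m for m in range(min(n, k) + 1)]
    for _ in range(k + 1, n + 1):
        a.append(sum(a[-(k + 1):]))
    return a[n]

def prob_either_at_most(n, x):
    """P(longest head or tail run in n >= 1 tosses is at most x)."""
    if x < 1:
        return 0.0
    # Exact integer counts; the division rounds once, to the nearest float.
    return count_at_most(n - 1, x - 1) / 2 ** (n - 1)

# Chance that the longest run in 200 tosses reaches each threshold.
for threshold in range(5, 10):
    p = 1 - prob_either_at_most(200, threshold - 1)
    print(f"longest run of heads or tails >= {threshold}: {p:.3f}")
```

Printing several thresholds makes it easy to see how quickly the probability falls off, which is what a cutoff for the Varga activity depends on.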

Schilling states that, for large n, the expected length of the longest head run is approximately log(n) - 2/3 where the log is to the base 2. The standard deviation, remarkably, is essentially independent of n and is about 1.873. This means that the expected value of the longest head or tail run in n tosses of a coin is approximately log(n-1) + 1/3 and the standard deviation is also approximately 1.873. For example, for 200 tosses of a coin the exact values for the mean and standard deviation of the longest head or tail run are 7.977 and 1.828 respectively, and the approximations would be 7.970 and 1.873.
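
The comparison between exact and approximate values can be sketched as follows. This is our own Python illustration, with hypothetical function names; it builds the distribution from Schilling's recursion and compares its mean with log(n-1) + 1/3, using base 2 logarithms.

```python
import math

def count_at_most(n, k):
    """A(n, k): sequences of length n whose longest head run is at most k."""
    a = [2 ** m for m in range(min(n, k) + 1)]
    for _ in range(k + 1, n + 1):
        a.append(sum(a[-(k + 1):]))
    return a[n]

def prob_either_at_most(n, x):
    """P(longest head or tail run in n >= 1 tosses is at most x)."""
    if x < 1:
        return 0.0
    return count_at_most(n - 1, x - 1) / 2 ** (n - 1)

def mean_and_sd(n):
    """Mean and standard deviation of the longest head or tail run in n tosses."""
    mean = second_moment = previous = 0.0
    for x in range(1, n + 1):
        cumulative = prob_either_at_most(n, x)
        p = cumulative - previous
        mean += x * p
        second_moment += x * x * p
        previous = cumulative
        if cumulative == 1.0:
            break  # remaining probabilities are zero to floating-point precision
    return mean, math.sqrt(second_moment - mean ** 2)

n = 200
mean, sd = mean_and_sd(n)
print(f"from the recursion: mean {mean:.3f}, sd {sd:.3f}")
print(f"approximation:      mean {math.log2(n - 1) + 1 / 3:.3f}, sd 1.873")
```

The value 1.873 for the standard deviation is the asymptotic figure quoted from Schilling; it is not computed by this sketch.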

Returning now to the bathroom tile problem, we can use n = 288 and compute the probability F(x) that the length of the longest blue or white run in a row of our floor is less than or equal to x. Now we have 288 such rows in our square. Since the lengths of the longest runs in the rows are independent, the probability G(x) that the longest blue or white run in all 288 rows is at most x is G(x) = F(x)^288. If we further assume independence of rows and columns we get H(x) = F(x)^576 for the distribution of the longest run of blue or white squares in the entire floor. Again, this is easy to compute using our Mathematica program.
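
Here is a rough Python counterpart of that calculation, again our own sketch and not the Mathematica program. It computes F(x) from Schilling's recursion for a line of 288 tiles and raises it to the power 576, which relies on the independence assumption just described. The function names and the default side of 288 are ours.

```python
def count_at_most(n, k):
    """A(n, k): sequences of length n whose longest head run is at most k."""
    a = [2 ** m for m in range(min(n, k) + 1)]
    for _ in range(k + 1, n + 1):
        a.append(sum(a[-(k + 1):]))
    return a[n]

def line_cdf(x, side=288):
    """F(x): P(longest single-color run in one line of `side` tiles is at most x)."""
    if x < 1:
        return 0.0
    return count_at_most(side - 1, x - 1) / 2 ** (side - 1)

def floor_cdf(x, side=288):
    """H(x) = F(x)^(2*side), treating all rows and columns as independent."""
    return line_cdf(x, side) ** (2 * side)

# Probability that the longest run on the whole floor is exactly x.
for x in range(14, 25):
    print(x, f"{floor_cdf(x) - floor_cdf(x - 1):.2f}")
```

Replacing the exponent 2*side by side gives G(x), the distribution for the rows alone, where no independence assumption is needed.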

To see if the assumption of independence causes problems we did our own simulation in which we did not assume independence. We made 500 tile floors, coloring each square blue or white with probability 1/2, and found the longest run of a single color in either the rows or the columns. Here is a comparison of the exact probabilities calculated assuming independence with the estimates from our simulation, which does not assume independence.

Longest run	Probability assuming independence	Simulated not assuming independence
14	.01	.01
15	.08	.09
16	.21	.20
17	.25	.25
18	.19	.19
19	.12	.11
20	.07	.08
21	.04	.04
22	.02	.02
23	.01	.00
24	.00	.01

From this we can see that the assumption of independence does not seem to be a problem.
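
A simulation of whole floors, with no independence assumption, can be sketched in Python as follows. This is our own illustration and not the program used for the table; the function names, the seed, and the small number of floors in the demonstration are assumptions.

```python
import random
from collections import Counter
from itertools import chain, groupby

def longest_run(line):
    """Length of the longest run of equal adjacent values in one row or column."""
    return max((sum(1 for _ in group) for _, group in groupby(line)), default=0)

def longest_run_in_grid(grid):
    """Longest single-color run over all rows and all columns of a grid."""
    columns = zip(*grid)  # transposing the rows gives the columns
    return max(longest_run(line) for line in chain(grid, columns))

def random_floor(side, rng):
    """A side x side floor; each tile is 0 or 1 with probability 1/2."""
    return [[rng.getrandbits(1) for _ in range(side)] for _ in range(side)]

def simulate_floors(reps, side=288, seed=0):
    """Estimate the distribution of the longest run from reps random floors."""
    rng = random.Random(seed)
    counts = Counter(longest_run_in_grid(random_floor(side, rng)) for _ in range(reps))
    return {run: count / reps for run, count in sorted(counts.items())}

# A small demonstration; the table above used 500 floors.
for run, estimate in simulate_floors(reps=20).items():
    print(run, f"{estimate:.2f}")
```

Because each column here shares its tiles with the rows, the rows and columns are dependent in exactly the way the real floor would be under the coin tossing hypothesis.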

Note that if we had done the simulation right in the first place, without assuming independence, we would have seen that 17 was the most probable number for the length of the longest run and we would not have had to tell the Schilling story. However, it is a great story, and problems related to runs occur frequently, as they did in Varga's activity, so if you got this far you learned something important.

DISCUSSION QUESTIONS:

(1) Bill Peterson suggested considering the rows as one long sequence by just continuing each row starting with the beginning of the next row. What would the expected length of the longest blue or white run be in such a sequence?

(2) When we returned to Dartmouth we were reminded that people often thought the tiles on the side of the Mathematics department looked random. After walking by them for forty years it was pointed out to us that this is far from true. In fact the pattern is periodic with period 3. For a photograph of the tiles and the three parts go here. How would you go about testing if within each third the pattern is random?
<<<========<<

>>>>>==============>

Note: Chance News Copyright (c) 2000 Laurie Snell This work is freely redistributable under the terms of the GNU General Public License as published by the Free Software Foundation. This work comes with ABSOLUTELY NO WARRANTY.

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

CHANCE News 9.09

August 10, 2000 to September 12, 2000

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!