CHANCE News 9.11

October 21, 2000 to November 27, 2000


Prepared by J. Laurie Snell, Bill Peterson, Jeanne Albert, and Charles Grinstead, with help from Fuxing Hou and Joan Snell.

Please send comments and suggestions for articles to

Back issues of Chance News and other materials for teaching a Chance course are available from the Chance web site:

Chance News is distributed under the GNU General Public License (so-called 'copyleft'). See the end of the newsletter for details.

Chance News is best read with Courier 12pt font and 6.5" margin.


Dilbert cartoon 11/13/2000:

Manager: We have a gigantic database full of customer behavior information.
Dilbert: Excellent, we can use non-linear math and data mining technology to optimize our retail channels!
Manager: If that's the same thing as spam, we're having a good meeting here.


The majority of Dartmouth students drink 4,5,6 or more drinks
when they party.

Seen on the t-shirt of a Dartmouth student
(See Chance News 9.09)                


Contents of Chance News 9.11

Note: If you would like to have a CD-ROM of the Chance Lectures that are available on the Chance web site, send a request to jlsnell@dartmouth.edu with the address where it should be sent. There is no charge.

We are sending separately an invitation to attend our Third Chance Lecture series to be held at Dartmouth from 5:00 P.M. Friday, 15 Dec. 2000, to 5:00 P.M. Saturday, 16 Dec. 2000. We hope that some of you will be able to come to the lectures and others will enjoy viewing them when they have been put on the Chance website.

For more information about these lectures and to register to attend them go to: The Chance Lecture Series

Please mention this to others whom you think might be interested in attending these lectures.

Thanks to our readers we have our own version of Forsooth for this issue of Chance News:

Maya Bar-Hillel heard the following comment on CNN during the discussion of the 2000 Presidential election:
The voter turnout for this election is much larger than was expected. That will help both candidates.
Tom Shortridge sent us the following:
The Incredible Shrinking Office

The average office space for middle managers shrank from 151 square feet in 1994 to 142 square feet in 1998. That meant going from a 10-by-11 foot office to cubicles half that size.

Bucks County Courier Times
Nov. 9, 2000

Incredible, indeed!
Charles Grinstead contributed the following Forsooth!

Calculating one kind of middle class.
The New York Times, 29 October 2000, Week in Review p 5
Tom Zeller

Zeller remarks that, despite politicians talking about the Middle Class all the time, there is no official definition. He proposes as a definition the middle quintile (middle 20%) of the U.S. income distribution. For the U.S. this means between $30,000 and $50,000 annually. He then provides a map of the U.S. colored with four shades of gray to show the regional distribution of the middle class households. Among other things the map shows that the middle class tend to cluster on the coasts and the midwest. Four types of regions are indicated by:

very light gray 0-4.99%
light gray 5-9.99%
middle gray 10-14.99%
dark gray 15-19.99%

and therein lies the Forsooth!

Of course we must start with the election.

We remember stories by John Tukey's students about working with him for the networks on election night; and so we cannot resist providing for background the following wonderful tribute to John Tukey that appeared in the Washington Post two days before the elections. Be sure to get to the last paragraph!

Insider's Guide, Many happy returns.
Washington Post, 5 Nov 2000, F2
David Alan Grier
Sure the election will be close, but will it be good television?

I'm going to go out on a limb here and suggest that we all might want to pull up a comfy chair Tuesday night and stay tuned for the kind of electoral cliffhanger that went out with coonskin caps. When pressed for my reasons, I give only one. This will be the first election since the Second World War without the presence of John Tukey, the man who destroyed the suspense of election night.

Tukey, a professor of statistics at Princeton University, was--among an astonishingly long list of other credits--a psephologist (se-fo-lo-gist), an expert on the mathematical prediction of elections. Every fall, he would sit down to plan how to predict the outcome of that season's election, be the election presidential, midterm congressional or the off-year election of a local mayor.

The country got its first taste of the power of mathematical election prediction in 1952, when CBS television used the first commercial computer, the Univac I, to predict the vote in the Eisenhower-Stevenson election.

By early evening, the computer model predicted a landslide for Eisenhower. The production staff, used to the late drama of election nights, resisted using the numbers. One producer recalled thinking they "just couldn't be right." Only four years before, in 1948, Harry Truman had gone to sleep without knowing the results of the election that would return him to the White House. He learned that he had retained the presidency only when the state of Ohio announced its vote at 8:30 a.m. the following day.

But the 1952 numbers proved to be accurate and with them began the decline of election night.

This would have happened without Tukey. The election of 1952 and its rematch in 1956 were easy tests for psephology, for in both the leading candidate was a popular war hero and even those who did not have electronic computers predicted an easy win by Gen. Eisenhower. The real test of the mathematical crystal ball came in 1960, when Kennedy went against Nixon.

This is where Tukey came in. In a close election, the built-in weakness of sampling technique--how do you know that the opinions of 1,000 voters selected for a poll don't vary from those of the 100 million who will vote?--can provide fatally flawed predictions.

In 1960, Tukey joined NBC as a consultant. He had been trained as a classical mathematician, but began focusing on statistics during World War II when he contributed to the war effort by applying statistical analysis to the problems of warfare. Tukey's wide-ranging genius would eventually provide science with an abundance of mathematical tools, including a fast way to analyze the spectrum of starlight. But his most far-reaching contribution to statistical sampling was a development called "robust analysis," an approach that guards against wrong answers in situations where a randomly chosen sample of data happens to poorly represent the rest of the data set.

As it turned out, this was just the approach needed to transform exit polling from an educated guess to the almost infuriatingly unerring predictor that makes election nights so, um, predictable. Over the years, Tukey built a mathematical model that was increasingly efficient. As he refined his equations, we were able to predict the results ever earlier in the day.

By 1980, he claimed that he could accurately forecast the vote "by 9:30 Eastern Time"--before the polls even closed in California. By then, election evenings had all the suspense of pizza delivery. Every hour would bring a new round of polls closed, a new round of commercials and a new round of predictions that had long been known to network producers. Unless election night is filled with a round of creative advertisements, such as those that make even Super Bowls bearable, watching past prime time is a wash.

But now we are living in a post-Tukey age. He departed this Earth just before the nominating conventions last July. As much as his genius and energy will be missed, I cherish a small hope that--absent Tukey--the psephologists will be wrong this year, that their equations will fail and that we will see just a bit more drama when they report the votes.
Professor Grier is director of the George Washington University Honors program and writes on the history of computing.

The story of the 2000 presidential election starts with the networks' prediction, shortly before the 8:00 P.M. closing of the polls in Florida, that Gore would win Florida. At around 10:00 P.M. the networks withdrew this prediction. At about 2:00 A.M. they projected that Bush had won Florida and the election, and around 3:00 A.M. they withdrew that projection as well. Here are some accounts related to this screw-up.

National Public Radio.
All things considered. 7 Nov. 1996

National Public Radio.
All things considered, 8 Nov. 2000

Bad call in Florida.
Washington Post, 13 Nov. 2000, OP-ED, A27
Richard Morin

Before 1988 the networks had their own experts to carry out exit polls and make projections on election night. In 1988 the networks consolidated their once separate exit poll operations into one and, in 1990, established the Voter News Service (VNS) to carry out the exit polls and provide the projections for all the major networks on election night. However, in 1994 ABC set up a decision desk with experts whose job it was to make their own projections based on information from the Voter News Service. ABC scooped the other networks that year and this led other networks to follow suit and set up their own decision desks.

The first NPR program is an interview with Murray Edelman, editorial director of VNS, after the incorrect prediction by the networks in the 1996 election that Republican Bob Smith had been defeated by his Democratic opponent Dick Swett. Asked what an exit poll is, Edelman explains that, based on information from the previous election, a sample of precincts is chosen and then about 1000 voters from these precincts are asked who they are, how they voted, why they voted that way, their age, and other demographic questions. Asked what is the biggest potential problem with exit sampling, Edelman answers: those who are willing to respond to the exit poll may not be representative of the population of voters.

The second NPR program is an interview about the Florida 2000 vote with Warren Mitofsky, the founding father of VNS and currently an election consultant for CBS and CNN. Mitofsky also explained the exit poll and said that VNS uses statistical models based on information from the exit polls and the counts so far at the precinct and county level. He remarked that a precinct model and a county model both erred in the same direction and predicted that Gore would win Florida. He said that the later prediction that Bush would win was based solely on the vote count. He remarked: Bush was ahead by 56,000 votes with almost all the votes counted and we never dreamed that it would close all the way. Mitofsky said that he has made 5 mistakes in about 3000 elections in five different countries but had never seen anything like this election night.

In his article, Morin claims that VNS never called Florida for Bush. He writes:

Fox News was the first to declare Florida for the Republicans as vote counts supplied by VNS continued to show Bush with a substantial lead. Fox announced the call at 2:16. (Ironically, the decision to declare Bush the winner was made by John Ellis, who headed the call desk at Fox and happens to be Bush's cousin.) The other networks quickly followed suit, only to take the call back around 3 a.m. But VNS officials confirm they never issued a prediction.

Morin recommends that the networks abolish their decision desks and make VNS solely responsible for making all state and national Election-Night calls. He also suggests that it would be nice if VNS had some competition though these two recommendations do not seem to mesh too well. At any rate, there will almost surely be a congressional investigation of all this and so we will learn more later what really happened.


(1) Do you think the network errors might have influenced the voting? If so, how?

(2) What changes in the way networks project the outcomes of elections do you think will result from this year's experience?

(3) It has been suggested that the networks are forced to go along with the early projections of other networks, often against their better judgment, because of the natural competition between networks. They deny this. What do you think about this?

The next act in this drama started when voters in Palm Beach, Florida, complained that the ballot used, called the butterfly ballot, caused some of them to vote for Buchanan when they intended to vote for Gore. You can see this ballot and get a chance to see if it would confuse you.

Pat Buchanan himself said that he did not believe that he could have received as many votes as the count indicated in Palm Beach -- a county with a history of supporting Democratic candidates.

Our first article describes an experiment to see if the butterfly ballot was really confusing.

An electoral butterfly effect.
Nature, 7 Dec., 2000, Vol. 408, pp. 665-666
Robert Sinclair, et al. University of Alberta, Psychology Department

This article describes two experiments performed by researchers at the University of Alberta and at Pennsylvania State University. In the first study, conducted the day after the U.S. election, 324 introductory psychology students were randomly assigned to one of two groups and asked to participate in a mock election for Prime Minister of Canada. The treatment group used a ballot that was similar to the now infamous butterfly ballot used in Palm Beach county, while the control group used a single-column format. All the students in both groups were also asked to report the degree to which their ballot was confusing (two items on a seven point scale), and to write the name of the person for whom they had intended to vote.

Sinclair predicted that the students in the treatment group would rate their ballot as significantly more confusing than those in the control group; the researchers report that this hypothesis was confirmed (p<.0001). On the other hand, no errors were made in either group, which they attribute to the fact that "college students are quite skilled at completing confusing optical scoring sheets." To examine whether members of the general public would have difficulty using the butterfly ballot, they conducted virtually the same study off-campus the next day. By then they were also able to approximate the Palm Beach ballot more closely (since pictures of the ballot had been widely distributed). They again report a significantly higher rate of confusion with the butterfly ballot, and also report that 4 of the 116 participants made errors, all of them on the suspect ballot. Moreover, 3 of these people unintentionally voted for the candidate who was listed in the same position as Buchanan, when they had intended to vote for the candidate in the same position as Gore. They conclude: "Thus, the results suggest that the butterfly ballot, as used in Palm Beach county, does result in systematic errors."


The researchers state that "the current findings may underestimate the magnitude of the bias" toward errors of the Gore/Buchanan type, "because the candidate in the first position on our butterfly ballot (analogous to Bush) received 49.9% of the vote in Study 2 and no errors occurred in this position (the candidate in the second position, corresponding to Gore, received 21.4% of the vote and the remaining 8 candidates shared 29.5%.)" Why do the figures presented suggest an underestimate of the described bias?

This possible effect of the butterfly ballot also led to a large number of papers posted on the internet aimed at demonstrating statistically that the number of votes Buchanan obtained could not reasonably be attributed to chance variation and to estimate the number of votes that Gore lost because of the confusion caused by the butterfly ballot. Such losses would occur when a voter punches the hole for Buchanan thinking it is the one for Gore and also when a voter notices this error and tries to correct it, resulting in multiple holes and the vote not being counted. The next two articles appeared to us to be two of the most convincing of the many articles written.

Voting irregularities in Palm Beach county.
Jonathan N. Wand, Kenneth W. Shotts, Jasjeet S. Sekhon, Walter R. Mebane, Jr. and Michael C. Herron.
18 Nov. 2000

This paper analyzes voting in Palm Beach county in order to determine if Buchanan received an unexpectedly high number of votes. Their overall conclusion is that, using "several different analyses of presidential voting in Palm Beach county, each analysis leads to the same result: the vote totals in Palm Beach county are irregular." In particular, their findings include: (1) Buchanan received far more votes in Palm Beach county than we should expect, given the county's characteristics and historical voting patterns, and (2) patterns of voting within the county indicate that excess votes for Buchanan came primarily from Gore supporters.

The authors describe their analysis in some detail and provide technical descriptions in an appendix. Using a generalized linear model, for each county in Florida they compute the expected number of votes for Buchanan given the Bush and Nader vote shares. They conduct a similar analysis on the national level and provide compelling graphs displaying the distribution of discrepancies between the expected vote and the actual vote for both the statewide and national data. They also conduct more in-depth county-level analysis. In addition they determine that the Buchanan vote share in the current election is positively correlated with the Jeb Bush-Frank Brogan share in the 1998 gubernatorial race in Florida. Further, they state that, if the relationship between Buchanan's vote share and the Bush-Brogan vote share in Palm Beach county were the same as the relationship between these two variables in the other 66 Florida counties, then Buchanan should have received 0.196% plus or minus 0.447% of the vote share in Palm Beach. Since the observed Buchanan vote share of 0.789% is greater than 0.196 + 0.447 = 0.643%, they conclude that the level of support received by Buchanan in Palm Beach county was not consistent with the county's level of political conservatism as measured by its support for Bush-Brogan in 1998.
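The comparison in the last step is simple arithmetic, and it can be checked in a few lines. The following Python sketch is ours, not the authors'; it uses only the three percentages quoted above, and the variable names are made up for illustration.

```python
# Figures quoted in the summary above, all in percent of the Palm Beach vote.
# The variable names are illustrative only.
predicted_share = 0.196   # predicted Buchanan share from the Bush-Brogan relationship
half_width = 0.447        # the quoted "plus or minus"
observed_share = 0.789    # Buchanan's reported share in Palm Beach

upper_bound = predicted_share + half_width
lower_bound = max(0.0, predicted_share - half_width)  # a vote share cannot be negative

# How far the observed share lies above the top of the predicted range.
excess = observed_share - upper_bound

print(f"predicted range: {lower_bound:.3f}% to {upper_bound:.3f}%")
print(f"observed share exceeds the upper bound by {excess:.3f} percentage points")
```

Note that the lower end of the range has to be cut off at zero, since the quoted "plus or minus" is larger than the predicted share itself.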

The authors also discuss several questions related to the butterfly ballot. For example, they consider the argument that the Reform Party is popular in Palm Beach county, so that a good showing there by Buchanan should not be surprising. Their findings, which they say are inconsistent with this argument, show that Buchanan did better in precincts that strongly supported Gore. They also discuss the problem of "over-voting": ballots on which more than one choice for president was selected. To counter the argument that such ballots did not significantly affect the outcome in Palm Beach, or in any event did not harm Gore more than Bush, they note that they found a strong positive correlation between the number of non-counted ballots and the number of Gore votes in a given precinct. By contrast, in Leon county, where the butterfly ballot was not used, they found no meaningful relationship between the number of non-counted ballots and the number of Gore votes.

Finally, they rebut the rather simplistic argument (put forth by, among others, commentators at the National Review website) that Buchanan's 3,407 votes in Palm Beach county this election are not necessarily unusual, since he received 8,788 votes there in the 1996 primary. What supporters of this argument fail to point out is that the 1996 figure represented 5.4% of Buchanan's Florida vote, while his current vote share in Palm Beach represents 19.6% of his statewide Florida vote.


(1) In their analysis of the expected vote for Buchanan in 4481 reporting units in 46 states, the authors find three other counties in New Hampshire, South Carolina, and Arizona with greater discrepancies between the expected vote and the observed vote for Buchanan. How would you answer a critic of this study who might say: this shows that it is reasonable to expect such discrepancies by chance?

(2) The authors find that, in Palm Beach County, almost half of the precincts had more votes counted for Senate than for President. In Leon County, where the butterfly ballot was not used, 102 out of 103 precincts had more votes for President than for Senate. Why do you think they looked at this?

(3) The authors also investigate how Buchanan's vote is related to demographic variables including income, race, education, military service, and age. Which of these variables would you predict were associated with a low Buchanan vote?

(4) In a methodological caveat, the authors say that, in a study like theirs, one must be concerned about the "ecological inference problems." What does this mean?

Papers written to estimate Buchanan's vote typically used linear regression but differed in their choice and the form of the explanatory variables and the predicted variable. Initial studies simply used Gore's votes in the counties to predict Buchanan's vote. With this choice, Palm Beach turned out to be an obvious outlier. On the other hand, economist Robert Shimer at Princeton showed that, if one plots Gore's vote share against Buchanan's vote share, Palm Beach is not such an outlier. To counter this, others showed that if you add other explanatory variables, Buchanan's vote share again appears as a significant outlier. A list of the papers written to date can be found here.

You will find that these papers tend to be written by economists, political scientists and other social scientists. An exception is a recent paper by statistician Richard L. Smith at the University of North Carolina. Smith reviewed the work of others and wrote his own evaluation paying special attention to the assumptions made in a regression model.

A statistical assessment of Buchanan's vote in Palm Beach County.
Richard L. Smith
20 Nov. 2000

Smith reviews the previous studies and suggests that in many cases the choice of dependent variable and explanatory variables is such that the assumptions of regression theory (that the variances should be constant across counties and that the errors should be normally distributed) are not satisfied. Consideration of these requirements leads Smith to choose the cube root of Buchanan's vote as the predicted variable. As explanatory variables Smith chooses both the votes of other candidates and a number of demographic variables. From his study Smith concludes:

If the regression model is fitted to the 66 counties (not including Palm Beach), we obtain a point prediction of 326 (Buchanan's vote) and a 95% prediction interval of (181,534), compared with the actual vote reported in initial returns of 3,407. These results demonstrate conclusively that the Palm Beach County vote was indeed anomalous.
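One way to see the effect of the cube-root transformation is to look at Smith's quoted numbers on the cube-root scale. The Python sketch below is ours and does not reproduce Smith's regression; it only takes cube roots of the four figures quoted above. Our reading, which is an assumption, is that an interval computed on the transformed scale and then cubed back would be roughly symmetric about the point prediction on that scale.

```python
# Numbers quoted from Smith's conclusion above.
point_prediction = 326
interval_low, interval_high = 181, 534
observed_votes = 3407


def cube_root(x: float) -> float:
    """Real cube root of a non-negative number."""
    return x ** (1.0 / 3.0)


center = cube_root(point_prediction)
lower_gap = center - cube_root(interval_low)    # distance from center to lower limit
upper_gap = cube_root(interval_high) - center   # distance from center to upper limit
observed_gap = cube_root(observed_votes) - center

print(f"cube-root scale: center {center:.3f}, "
      f"{lower_gap:.3f} down to the lower limit, {upper_gap:.3f} up to the upper limit")
print(f"the observed vote lies {observed_gap / upper_gap:.1f} "
      f"such gaps above the center")
```

On the original scale the interval (181, 534) looks lopsided around 326; on the cube-root scale the two gaps come out nearly equal, and the reported vote of 3,407 sits several interval half-widths above the prediction.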


(1) Why do you think statisticians are so slow to contribute to this issue?

(2) Do you think that statistical studies should play a role in the court decisions related to voting issues in Florida?

(3) Were the conditions for regression emphasized in your statistics course?

The next thought that occurred to writers about the presidential election 2000 was that this election was simply a dead heat and the outcome might well be considered to be due to chance.

Talk about slim margins....
The New York Times, 12 Nov. 2000, Week in Review, Page 3
David Leonhardt

This brief article uses close Olympic sports contests to make the case that "for all meaningful statistical purposes, the Florida vote was a tie." Leonhardt points out that after a gold medal in swimming, in 1972, was awarded to an athlete who had won by a few thousandths of a second in a 400-meter race, "officials decided that such a tiny margin was an unfair and potentially dangerous way to separate the two performances." Now races are timed to only the hundredth of a second. Because of this change, two swimmers this summer shared a gold medal in the 50 meter freestyle. Leonhardt writes, "The difference between them--as much as almost 0.01 seconds--could be nearly 10 times larger than the difference between Gore and Bush in Florida." Later he adds: "And what are the chances that such a close race will happen again in the next century? Using most mathematical models, the probability is just a handful out of a million, or less, for a given election."


(1) How do you think the "handful out of a million" probability was computed?

(2) Do you think that close swimming races are an appropriate analogy to close elections?

The election; Surprise! Elections have margins of error, too.
The New York Times, 19 Nov. 2000, Section 4 Page 3
George Johnson

The author suggests that the difference in the vote between two candidates in an election has a margin of error just like a poll. If the margin of error is larger than the difference, then the outcome could be assumed to be due to chance, suggesting that the president of the United States was chosen at random. He suggests that this is the case in the current election.

The author says that the random errors range from mechanical glitches such as the "hanging chad" to the zigzag trail of political and legal decisions which might have gone one way or the other.

It is suggested that, in fact, the election might less accurately represent the people's choice than a good poll would. If we really trusted statistics over counting we could dispense with elections and just use polls.

Referring to the Florida vote, the author comments that industry officials say the accuracy of the punch-card machines used ranges from 99 to 99.9 percent, and that this could result in an error of thousands of votes in a state where a few hundred votes decided the presidency. He uses this example to suggest that, while the Electoral College may have other advantages, it does tend to magnify chance errors.
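To get a feel for "thousands of votes," here is a rough Python calculation. It is our own sketch and rests on an assumption that is not in Johnson's article: a round figure of 6,000,000 ballots cast in Florida, the same N used in the letter quoted later in this issue.

```python
# Assumption (not from the article): about 6,000,000 ballots cast in Florida.
ballots_cast = 6_000_000

# Accuracy range attributed to industry officials in the article.
accuracy_range = (0.99, 0.999)

# Expected number of misread ballots at each end of the accuracy range.
expected_misreads = {acc: ballots_cast * (1 - acc) for acc in accuracy_range}

for accuracy, misreads in expected_misreads.items():
    print(f"accuracy {accuracy:.1%}: about {misreads:,.0f} misread ballots")
```

These are counts of misread ballots, not changes in the margin. Misreads shift the margin only to the extent that they fall unevenly on the two candidates.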


(1) Do you think that the use of a sample would ever be accepted by the voters? If not, why not?

(2) In an Op-Ed article ("We're Measuring Bacteria With a Yardstick", New York Times, 22 Nov 2000, A27) John Paulos suggests that the election should be settled by the toss of a coin. Charles Grinstead agrees remarking "And Persi Diaconis should toss the coin." Why Persi?

(3) Settling a vote by the toss of a coin is an old idea and has been applied to a number of less important elections. If you were consulted on how the coin should be tossed, what conditions would you impose? In particular would you allow spinning the coin?

Our next article suggests the New York Times needs an editor who can check articles for statistical accuracy.

Analyze this: A physicist on applied politics.
The New York Times, 21 Nov. 2000, Science Times, F4
Lawrence M. Krauss, Case Western Reserve University

Our friend Mitchell Laks wrote a letter to the editor of the New York Times about this article. Unfortunately, it was not accepted, but here is what he wrote:

To the Editor,

Professor Lawrence Krauss, in his 11/21/2000 Science Times essay, "Analyze This: A Physicist on Applied Politics" overstates the ability of statistics to predict a-priori the variability of the results in the presidential election. His assertion that "the law of Large Numbers suggests that for roughly 68 percent of the time, if one performed the same experiment on precisely the same system over and over again the total number of events counted would be expected to vary by at least 2,000 events." is incorrect. There are two major errors in this broad assertion. Firstly, if the standard deviation (SD) of the election experiment could be demonstrated to be 2,000, as Professor Krauss believes, then in fact only 32% (the complementary probability) of the time would the variability be at least 2,000 votes, a far cry from the 68% that Krauss asserts.

However, and this is even more significant, Professor Krauss is also incorrect in asserting without evidence that the SD is roughly 2,000. He seems to believe that this follows from an a-priori argument. He asserted in private communication that this follows from the standard Poisson model for experimental error. He asserted that the standard deviation is the square root of N, (where N is the number of experiments) (his argument being that for N = 6,000,000, the SD is 2,449, which he approximated to 2,000 in the essay). However, this is incorrect. The standard deviation of the Poisson Distribution is in fact the square root of (N * p) where p is the probability of each individual error in election vote determination. Professor Krauss offers no a priori determination of the error rate p. In asserting that the SD is roughly 2000, Professor Krauss seems to have confused proportionality to equality, unless he has some a-priori reason to assume that p = 1. The standard deviation will always be less than 2,449, depending upon the value of p. For a more plausible value of p closer to 1/1,000, the SD would be closer to 77.

If the SD of the Poisson Distribution for error in repeated 'experiments' was square root of N, as Professor Krauss seems to believe, then we would be unable to clap our hands 4 times without 2 errors, type paragraphs without error, or successfully complete virtually any repetitive task. If the statistics of the Poisson distribution operated in the real world according to the conception of the author of "The Physics of Star Trek", we would not be able to drive our cars to work in the morning, much less contemplate sending out explorers on the Starship Enterprise.

Mitchell P. Laks, MD, PhD
136-05 71st Rd Flushing NY 11367
718-268-0708, beeper 917-424-7480, email : mplaks@mail.com
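The arithmetic in this letter is easy to reproduce. The following Python sketch is ours, not part of the letter. It computes the square root of N for N = 6,000,000, the Poisson standard deviation sqrt(N * p) for the letter's illustrative p = 1/1,000, and, under a normal approximation that we are assuming here, the chance of falling within or beyond one SD of the mean. The variable names are our own.

```python
import math

n_ballots = 6_000_000   # N used in the letter

# sqrt(N): the standard deviation one gets only if p = 1.
sd_if_p_is_one = math.sqrt(n_ballots)

# Poisson standard deviation sqrt(N * p) for the letter's example p = 1/1,000.
p_error = 1 / 1000
sd_poisson = math.sqrt(n_ballots * p_error)

# Normal approximation: probability of landing within one SD of the mean,
# and the complementary probability of varying by at least one SD.
within_one_sd = math.erf(1 / math.sqrt(2))
beyond_one_sd = 1 - within_one_sd

print(f"sqrt(N)          = {sd_if_p_is_one:,.0f}")
print(f"sqrt(N * p)      = {sd_poisson:,.0f}")
print(f"within one SD    = {within_one_sd:.1%}")
print(f"at least one SD  = {beyond_one_sd:.1%}")
```

The last two lines show the 68 percent and 32 percent figures the letter refers to: 68 percent is the chance of staying within one SD, so the chance of varying by at least one SD is the complement.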


(1) At the time this article was written Bush had a lead of 930 votes. What error rate would be necessary to conclude that this could reasonably be the result of chance errors?

(2) Assume that we model the number of votes for Bush in Florida as the count proceeds as a random walk with 6 million steps. As each ballot is counted Bush's lead increases by 1 or decreases by 1 with probability 1/2. Could his lead of 930 after all the ballots are counted reasonably be accounted for by this model? What is the most likely number of times that you would find Bush in the lead?

Finally people are beginning to ask about possible biases in recounts. Matthew Spiegel at the Yale School of Management provided the following discussion of this question.

Are chads Democrats?
15 Nov. 2000
Matthew Spiegel

Florida ballots are made out of punch cards. In order to vote people poke out little paper dots from the cards. These little paper dots are called chads. A chad sometimes hangs on by one or more corners, and is then known as a hanging chad. If a chad blocks its hole the counting machine assumes the chad was not punched and does not register a vote. As the ballots are handled some hanging chads will fall off, while others will close.

Press reports indicated that the cards were passed through the machines multiple times to "ensure an accurate recount." In general the counts go up but they also sometimes go down. Increases in the vote occurred about four times as often as decreases. There appeared to be the possibility of optional stopping to choose the most favorable time to stop. The author states that he is not saying that this is done deliberately but that it's just human nature to adopt the rules that favor one's own side.

Based on the recount throughout Florida there were 4245 "chad events" and Gore gained 1225 votes. Since each candidate got about the same number of votes, if the chads open or close at random each candidate should have an equal chance of gaining by a chad changing its state (open or closed). The author computes that, to have this great a difference by chance at the 5% level, there would have to have been about 400,000 chad events.
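We do not have the details of the author's calculation, but a simple coin-tossing model gives a figure of the same size. The Python sketch below is our own reading, under two assumptions: the 1225 is Gore's net gain, and each chad event independently moves Gore's margin up or down by one vote with probability 1/2, so that the net change after n events has mean 0 and standard deviation sqrt(n).

```python
import math

chad_events = 4245      # chad events reported in the statewide recount
gore_net_gain = 1225    # Gore's gain, taken here (our assumption) as a net gain

# Assumed model: each event is +1 or -1 for Gore with probability 1/2,
# so the net change over n events has mean 0 and SD sqrt(n).
sd_net_change = math.sqrt(chad_events)
z_score = gore_net_gain / sd_net_change

# Number of events at which a net gain of 1225 would sit exactly at the
# two-sided 5% cutoff (1.96 SDs under the normal approximation).
z_cutoff = 1.96
events_needed = (gore_net_gain / z_cutoff) ** 2

print(f"SD of net change with {chad_events} events: {sd_net_change:.1f} votes")
print(f"observed gain is {z_score:.1f} SDs from zero")
print(f"events needed to reach the 5% level: about {events_needed:,.0f}")
```

Under these assumptions the number of events needed comes out a little under 400,000, in line with the figure quoted above.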

The author considers other possible explanations for the difference but, as the title of the paper suggests, he clearly believes that chads tend to be Democratic. When this was written, the hand recount had just begun and the author suggests that the chance for bias in these recounts is even greater.

Chance Magazine has a new web site.
This site includes highlights from previous issues, a topical index for Volumes 1-10 (1988-1997) and indexes for later volumes. It also includes the cover article for the current issue. This article shows the application of statistical techniques to archaeology.

The Pots of Phylakopi.
Chance Magazine, Fall 2000, Vol. 13, No. 4, pp 8-15
Ina Berg and Selwyn Blieden

The authors discuss the use of statistics in excavations made on the Greek island of Melos by the Cambridge archaeologist Lord Renfrew between 1974 and 1977. They do this in terms of a study that used fragments of ancient pots found in seven trenches scattered through the old town of Phylakopi to determine the chronological order of the ancient settlements on the island of Melos.

It is expected that settlements grew in layers as old houses were demolished and new houses were built on the ruins. The seven trenches each have different patterns of layers corresponding to different patterns of occupation, demolition and abandonment. To understand the historical development it was necessary to know how the varying patterns of layers were related to each other chronologically.

Successive layers were merged if necessary so that each layer contained 100 or more pieces of pottery. The pottery itself was divided into 8 different categories reflecting different styles for different time periods. Then the data was presented as seven matrices, one for each trench. The rows represented the levels for a particular trench in their natural order and the columns represented the 8 categories of pottery. The ijth entry of the matrix is the percentage of the pottery pieces found at the ith level that were of type j.

The authors use a statistical technique called seriation to put the entire list of levels in chronological order. The method of seriation was invented in 1899 for use in an archaeological study but has also been used in other contexts.

To carry out seriation in this example, a measure of distance between two layers is determined. The authors choose this distance to be the sum of the squares of the differences between the proportions of the 8 types of pottery found in the two levels. Then the levels are ordered in such a way as to minimize the sum of the distances between neighboring levels.
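The idea can be illustrated with a toy example. The sketch below uses made-up proportions for three pottery types in five layers, not the Phylakopi data, and finds the best ordering by brute force over all permutations. The function names are ours. It also ignores the constraint that levels within one trench already have a natural order, and the authors' actual optimization method may well differ.

```python
from itertools import permutations

def distance(a, b):
    """Sum of squared differences between two layers' pottery proportions."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def path_cost(order, layers):
    """Total distance between neighboring layers in the given ordering."""
    return sum(distance(layers[i], layers[j]) for i, j in zip(order, order[1:]))

def seriate(layers):
    """Return an ordering of layer indices that minimizes the path cost.

    Brute force over all permutations, so only practical for a handful of layers.
    """
    return min(permutations(range(len(layers))),
               key=lambda order: path_cost(order, layers))

# Hypothetical proportions of three pottery types in five layers.
layers = [
    (0.5, 0.4, 0.1),
    (0.9, 0.1, 0.0),
    (0.1, 0.3, 0.6),
    (0.7, 0.3, 0.0),
    (0.3, 0.4, 0.3),
]
order = seriate(layers)
print(order)
```

An ordering and its reverse have the same cost, so the pottery alone cannot say which end is the oldest; that has to come from other evidence, such as the known order of levels within a trench.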

Thus the pottery is used to order the levels found in the different trenches and this gives a picture of the evolution of the city of Phylakopi.

Of course, replacing the information in these matrices by a linear list loses information. The authors discuss more sophisticated statistical techniques that can be used to keep more of this information.

Food news can get you dizzy, so know what to swallow.
US News & World Report, 13 Nov. 2000, 68-72.
Linda Kulman

In previous issues of Chance News, we have presented many discussions of the challenges in conducting observational studies on diet, and the problems the press and the public have interpreting the results of such studies. The present article synthesizes stories from many of these studies and compiles a number of quotes from prominent commentators.

The public complains that planning a healthy diet is like trying to hit a moving target. The article highlights conflicting results over the years regarding coffee, eggs, butter/margarine, salt and fiber. Part of the problem is that the public doesn't appreciate the slow process by which hypotheses are tested and revised before a scientific consensus is reached. According to Harvard epidemiologist Walter Willett, "If things didn't shift, it would be bad science." David Murray of the Statistical Assessment Service (STATS) sees a conflict between the timetable of careful research and the deadlines faced by news reporters. He says "While science is contingent and unfinished, journalists want something definitive. They impose closure by the deadline. Out of that, the public thinks they are always changing direction." Nevertheless, says Gina Kolata of the New York Times, the blame does not rest solely with reporters. She points out that the scientists are themselves quite enthusiastic about their own findings: "They say, 'I myself am taking folic acid.' I used to feed off their enthusiasm. But when you see one [study] after another fall, I've become much more of a skeptic." Another issue we've become attuned to over the years is the source of funding for the research. The article cites research partially funded by the Chocolate Manufacturers Association which found that certain compounds in chocolate may be good for the arteries.

The author of the article offers the following questions that critical readers should bear in mind when reading about a health study.

For explanation of the last item, the article quotes Dr. Marcia Angell, the former editor of the New England Journal of Medicine: "The breakthroughs are in the first paragraph and the caveats are in the fifth."


(1) Do you agree with the suggestions on the list? Would you add anything?

(2) Do you think the caveats are deferred to later paragraphs because they seem somehow less newsworthy or because they are less well understood? You might follow up on this by comparing some recent news articles with the reports of the original studies in medical journals. Do you find that the caveats are more prominent in the originals?

The fatal flaw in 'random' polls.
CBS MarketWatch, 2 Nov. 2000
Paul Erdman

Erdman is an economist who writes regular on-line columns for MarketWatch. His commentaries can be found at Erdman's World.

His piece on polls appeared just before the presidential election. Erdman states bluntly that "paying attention to political polls in the Year 2000 is a total waste of time. Their methodology is fatally flawed."

He is not talking about the statistical theory of sampling, but rather the methods that polling organizations use to contact their random sample, which he describes as throwing several thousand darts at the nation's phone books and then making calls. Gone are the good old days when most people would answer their phone and politely respond to questions. Nowadays, says Erdman, people use caller ID or answering machines to screen incoming calls, hang up on callers who appear to be soliciting, or simply refuse to pick up the phone during dinner (a popular time for interviewers trying to catch people at home). In light of this, can we really expect that the group ultimately contacted will be representative of the voting population?

Erdman concludes that this "will make watching TV all the more interesting when the real results start rolling in." With tongue in cheek, he calls the election for Ralph Nader.


(1) Polls leading up to the election regularly found that the race was "too close to call." This now sounds almost prophetic. Given the problems cited above, how do you think the polls got it right?

(2) Who do you think is most likely to pick up the phone and answer questions? (Erdman declined to share his own views on this, saying they were not politically correct.) There have been a number of published breakdowns of the vote by age, gender, socioeconomic status and educational level. Should these trends have shown up in the telephone polls?

(3) This reviewer (Bill Peterson) was called three times by the Gallup organization during the weeks prior to the election. Each time the interviewer consulted the computer and then apologized, saying that the quota for New England males had already been reached. What does this tell you about the polling technique beyond the dart-throwing analogy?

Speeding object may have a date with Earth.
The Washington Post, 4 Nov. 2000, A6
William Harwood

Risk to Earth from object in space is lowered; lab recalculates threat of impact.
The Washington Post, 7 Nov. 2000, A9

On September 29, astronomers using the Canada-France-Hawaii telescope on the island of Hawaii discovered an object in space moving on a "near-earth orbit." Researchers at NASA's Jet Propulsion Laboratory in Pasadena, California speculated that the object, named 2000 SG344, might be a small asteroid or part of a discarded rocket. They estimated that the object's trajectory would bring it within 3.8 million miles of Earth in the year 2030. Taking into account the uncertainties in its exact orbit, they calculated a 1-in-500 chance that the object would actually hit the Earth.

The first article quotes Dan Yeomans, the manager of NASA's Near-Earth Orbit project, as saying that, although the calculated probability was small, 2000 SG344 has the best chance of hitting Earth of any object detected to date. He added that "if future observations show the impact probability is increasing rather than decreasing as we expect, then we'll have to make some decisions as to whether we should mount some mitigating campaign."

In fact, the second article reports that the estimated probability of a collision in 2030 had been rather drastically adjusted downward--to zero! Additional observations of the orbit showed that the object would come no closer than 2.7 million miles to the earth in 2030, so there is no chance of a collision then. However, there is now an estimated 1-in-1000 chance of a collision in 2071.


(1) Commenting on the original 1-in-500 estimate, Yeomans said that this figure "is less than the likelihood of us getting hit that same year by an [unknown] object of comparable size." How could he know this if 2000 SG344 has the best chance of hitting Earth of any object yet detected?

(2) Yeomans is quoted in the second article as saying: "As we noted, the most likely scenario was that we will find additional observations that would render this prediction invalid. If there are 499 chances it won't hit and one that it will, new data will almost every time render it invalid." What do you make of this comment?

(3) Should we be worried about 2071?

Fred Hoppe suggested that our readers might be interested in the mathematics of Lottery Wheels sold on numerous lottery websites. Charles Grinstead wrote the following introduction to Lottery Wheels.

Each year, billions of dollars (and pounds, marks, etc.) are spent on lottery tickets throughout the world. One of the most common types of lotteries is called Lotto and is run as follows. The agency running the lottery chooses a set L of p integers from a set S of n integers. For our purposes, we will assume that this last set is the set S = {1, 2, ..., n}. A player then picks a set B of k integers from the set S (typically k = p, but sometimes k < p). The payoff depends upon the size, call it m, of the intersection of B and L. Unless m is at least some number t, the payoff is 0. Of course, since the purpose of the lottery (from the point of view of the agency running it) is to make money, the expected payoffs are less than the price of a ticket.

A lottery wheel is a set W of tickets, each bearing k integers from the set S, with the property that the purchase of the whole set W will guarantee that the player will win some payoff. In other words, no matter what the set L is, at least one ticket will intersect with L in at least t numbers.
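For small parameters the defining property of a wheel can be checked by brute force. The following sketch uses a function name, `is_wheel`, and a toy lottery of our own choosing (n = 6, p = k = 3, t = 2); it simply tries every possible draw L.

```python
from itertools import combinations

def is_wheel(tickets, n, p, t):
    """Check that every possible draw L of p numbers from 1..n
    shares at least t numbers with at least one ticket."""
    ticket_sets = [set(ticket) for ticket in tickets]
    for draw in combinations(range(1, n + 1), p):
        drawn = set(draw)
        if not any(len(drawn & ticket) >= t for ticket in ticket_sets):
            return False
    return True

print(is_wheel([(1, 2, 3), (4, 5, 6)], n=6, p=3, t=2))  # True
print(is_wheel([(1, 2, 3)], n=6, p=3, t=2))             # False
```

The two tickets form a wheel because any three numbers drawn from 1 to 6 must put at least two in one of the halves {1,2,3} or {4,5,6}. The single ticket fails on the draw {4,5,6}. For a real Lotto such as n = 49, p = 6 the number of draws is about 14 million, so this check is slow but still feasible; finding a small wheel is the hard part.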

This brings up a question of some mathematical interest. Given the numbers n, p, k, and t, what is the smallest size of a wheel W? Let B(a,b) denote the binomial coefficient 'a choose b.' Using some results from the area of design theory, one can show that the smallest size of a wheel is at least


As is almost always the case in design theory, this lower bound is not tight, and the proof that this is a lower bound gives no hint about how to construct a wheel that is anywhere near this size. Many people work on constructions of designs, and some of their constructions have been used to make wheels for various actual Lottos. For example, in the United Kingdom, there is a contest to construct a wheel for the case n = 49, p = k = 6, t = 3. The lower bound given above gives a value of 86.24, but the current record-holder among wheels that have actually been constructed is a wheel of size 154.

It is important to note that the purchase of a wheel does not increase the expected value of the game to the player, since each ticket has a certain expected value, and thus the expected value of a wheel with w tickets is just w times the expected value of each ticket, while the wheel costs w times the cost of a single ticket.

You can read more about the mathematics of lottery wheels and find other references in an article by Charles Colbourn, "Winning the lottery," included in The CRC Handbook of Combinatorial Designs, edited by Charles J. Colbourn and Jeffrey H. Dinitz, pp. 578-584.

Combinatorial problems are notorious for generating wrong arguments. But sometimes there is serendipity in an incorrect solution which may provide an opportunity to introduce some truly interesting probability. This is illustrated in the following example sent to us by Fred Hoppe who recently received some email sent to McMaster University by a student (all correspondence is abbreviated and edited below).

I am a senior high school student taking a finite math course. My teacher has given our class the probabilities for rolling each sum from 1 to 12 on a fair set of dice. According to him, the probability of rolling a seven is three in twenty-one. I am in disagreement and since that class about a week and a half ago, we have been at WAR. He believes in the twenty-one point sample space and I think a thirty-six point sample space is the only possibility and the probability is six in thirty-six. The teacher is saying that his probabilities were based on the fact that one die remains constant. As he presents it, if you were to roll a (1,4) say, that is the same as (4,1). But I don't think that that is consistent. It means that it is just as easy to roll combinations with different numbers i.e. (1,4) as it is to roll same number combinations i.e. (1,1). Everyone except myself and another student agrees with him. Here is his sample space. There are 21 points, three of which add up to seven.
           (1,1)
           (1,2) (2,2)
           (1,3) (2,3) (3,3)
           (1,4) (2,4) (3,4) (4,4)
           (1,5) (2,5) (3,5) (4,5) (5,5)
           (1,6) (2,6) (3,6) (4,6) (5,6) (6,6)
Is there ANY way that conditions would make the sample space out of 21 to roll a seven on a fair set of dice?

Fred thought that this was a great question and felt that this student showed a remarkable maturity in asking whether a 21 point sample space could exist under some circumstances, rather than merely asking which sample space was correct. It touched an area in which he has some expertise (see Super Model), and he responded as follows.

Your last question asking whether there are ANY conditions which would make the sample space out of 21 is a great one and I'm going to give you an answer which may surprise you.

YES, there is a way to generate "dice" which behave this way through something called a POLYA URN. First let's look at ordinary dice in a slightly different way. Instead of dice use a box with six ping pong balls which are identical except for their numbering, which is 1, 2, 3, 4, 5, and 6, one of each. Shake the box and select one ball at random. Return the ball to the box, shake the box again and again select a ball at random (possibly the same one). I think you'll agree that this sampling with replacement represents the tossing of two dice.

Now I'll give you another set of experimental conditions based on the same box with the same initial six balls. Again, shake and select one ball at random. Return the ball but this time ADD ONE more ball to the box identical (SAME NUMBER) to the ball you selected. For example if you selected a ball numbered 1 then you would return it AND another ball numbered 1 and the box would subsequently contain seven balls, two with the number 1 and one each numbered 2, 3, 4, 5, 6. Shake the box again and make a second drawing (any of the seven balls at random). You might draw the first ball, or you might draw the additional ball you added, or you might draw one of the balls not selected the first time. IT TURNS OUT THAT THE DISTRIBUTION OF THE NUMBERS ON THE TWO BALLS SELECTED IN THIS WAY OBEYS THE SAMPLE SPACE OF 21 ABOVE. This process can be continued indefinitely and is a special case of a Polya urn, named after George Polya.

You may be able to verify the distribution depending on how much probability you've learned. Here is a proof. To achieve the point (1,1) in your sample space you must select the ball numbered 1 the first time, an event which has probability 1/6. After you return it together with a second ball numbered 1 there are now seven balls in the box of which two are numbered 1. The conditional probability of getting a ball numbered 1 on the second draw given that a ball numbered 1 was obtained on the first draw is therefore 2/7. The probability attached to the point (1,1) is therefore (1/6) x (2/7) = 2/42 = 1/21.

What about the probability attached to the point (1,2)? There are two disjoint possibilities to consider: ball numbered 1 followed by ball numbered 2 or ball numbered 2 followed by ball numbered 1. The first case has probability (1/6) x (1/7) = 1/42 because after the first ball numbered 1 is drawn there will still only be one ball numbered 2 out of the seven (remember you have increased the number of balls by adding one more numbered 1 after the first draw). Likewise the second case has probability (1/6) x (1/7) = 1/42. Because the two cases are disjoint (mutually exclusive) the probabilities add and you get 1/42 + 1/42 = 1/21. In this fashion every one of the 21 outcomes has the same probability.
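The argument above can be checked for all 21 points at once with exact arithmetic. This is a small sketch of our own, not part of Fred's reply; the function name `polya_two_draws` is hypothetical.

```python
from fractions import Fraction
from collections import defaultdict

def polya_two_draws(faces=6):
    """Exact distribution of the unordered pair from two Polya-urn draws."""
    probs = defaultdict(Fraction)
    for first in range(1, faces + 1):
        for second in range(1, faces + 1):
            # After the first draw the urn holds faces + 1 balls,
            # two of which carry the number drawn first.
            copies = 2 if second == first else 1
            p = Fraction(1, faces) * Fraction(copies, faces + 1)
            probs[tuple(sorted((first, second)))] += p
    return dict(probs)

dist = polya_two_draws()
print(len(dist), set(dist.values()))   # 21 {Fraction(1, 21)}
p_seven = sum(p for (a, b), p in dist.items() if a + b == 7)
print(p_seven)                         # 1/7
```

Each of the 21 unordered outcomes has probability 1/21, and the three outcomes summing to seven give 3/21 = 1/7, which is the teacher's figure.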

This sample space also arises in a beautiful way in PHYSICS where it is known as the BOSE-EINSTEIN distribution. In this model the faces of the dice are supposed to represent regions of space and the outcome of each toss represents the location of a particle (a photon or light particle). The 21 point space thus represents two particles distributed into six regions. In contrast the 36 point dice sample space leads to what is called the Maxwell-Boltzmann distribution in physics. Of course in physics the numbers involved are immense.

I commend you and your friend for wanting to get to the bottom of this and for your curiosity in wondering whether the 21 point sample space could somehow exist.

Readers may recognize the dice as being two exchangeable random variables. Exchangeability is often introduced in a probability course using {0,1} valued random variables, for instance the outcomes of the successive tosses of a coin having a random probability of landing heads. An infinite sequence of exchangeable random variables has a representation as conditionally independent and identically distributed random variables (this is De Finetti's theorem) and an elementary proof is possible for {0,1} valued random variables by moment calculations (William Feller -- "An Introduction to Probability Theory and Its Applications" Vol II, 2nd edition, pp 230 ff). Feller's Volume I has an excellent discussion of the Bose-Einstein and Maxwell-Boltzmann distributions. This sample space was the first time Fred had thought about exchangeability in the context of dice and he felt the example had great pedagogic value, especially in view of the incorrect counting argument.

Unfortunately he then received this final reply.

Dear Prof. Hoppe,

Thank you very much for all your time. It was more than expected and we appreciate it enormously. My teacher is quite a character and I don't necessarily mean that in a good way, but we won't get into the details. I think he is nearing retirement and his exhaustion is beginning to show. I just wanted to say that I am thankful for all you've done, but although I can see what I want him to see, I must not be explaining it properly, because he won't budge. I hate to say it, but I give up. He is a little too unprofessional for me. Half the time he won't even let me speak.


(1) Formulate a similar problem where the number of outcomes is smaller, say the tossing of two fair coins. The analogue of the 21 point dice space is the three point coin sample space (coding 1 for heads and 2 for tails)

        (1,1)
        (1,2) (2,2)

while the analogue of the 36 point dice sample space is

        (1,1) (2,1)
        (1,2) (2,2)

The teacher's and the student's arguments should not depend on how many "sides" the dice have. The chance of getting two heads is 1/3 in the first case and 1/4 in the second case. Do you think that this example is more compelling than the dice and would make a difference in ending the war?

(2) Does the color of the dice make a difference?

(a) What do you think of the following argument the student might make? The probabilities should not depend on the color of the dice so for instance (red 1, green 6) is different than (red 6, green 1), meaning a 36 point sample space is appropriate.

(b) On the other hand suppose that the dice were absolutely identical, that you rolled them with your eyes closed and then opened your eyes to observe the outcome. Is (1,6) still different than (6,1)?

(3) With a spreadsheet such as Excel you could simulate and keep track of the cumulative proportion of "sevens". How many simulations would you need to make in order to decide empirically as a scientist which sample space is the more appropriate?

(4) How would the probabilities for various games of chance change with such dice? For instance, calculate the probability of winning in craps for a pass line bet and a no pass line bet using such dice. These are just a shade under 0.5 for fair dice.
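For readers who would rather script the simulation in question (3) than use a spreadsheet, here is a minimal sketch. It assumes ordinary dice are modeled as two independent uniform draws and that the 21 point space is generated by the urn scheme in Fred's reply; the function names and the fixed seed are our own choices. It reports the proportion of sevens for increasing numbers of trials and leaves the question of how many trials are enough to the reader.

```python
import random

def roll_fair(rng):
    """Two independent fair dice (the 36 point sample space)."""
    return rng.randint(1, 6), rng.randint(1, 6)

def roll_polya(rng):
    """Two draws from the urn scheme (the 21 point sample space)."""
    first = rng.randint(1, 6)
    urn = [1, 2, 3, 4, 5, 6, first]   # the drawn ball is returned with a copy
    return first, rng.choice(urn)

def proportion_of_sevens(roll, trials, seed=0):
    """Fraction of simulated rolls whose two numbers sum to seven."""
    rng = random.Random(seed)
    sevens = sum(1 for _ in range(trials) if sum(roll(rng)) == 7)
    return sevens / trials

for trials in (100, 1_000, 10_000, 100_000):
    print(trials,
          proportion_of_sevens(roll_fair, trials),
          proportion_of_sevens(roll_polya, trials))
```

The two candidate values for the proportion are 6/36 = 1/6 and 3/21 = 1/7, which differ by only 1/42, so small runs will not separate them.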

Chance News
Copyright (c) 2000 Laurie Snell

This work is freely redistributable under the terms of the GNU General Public License as published by the Free Software Foundation. This work comes with ABSOLUTELY NO WARRANTY.

