CHANCE News 10.04

March 12, 2001 to April 18, 2001


Prepared by J. Laurie Snell, Bill Peterson, Jeanne Albert, and Charles Grinstead, with help from Fuxing Hou and Joan Snell.

Please send comments and suggestions for articles to

Back issues of Chance News and other materials for teaching a Chance course are available from the Chance web site:

Chance News is distributed under the GNU General Public License (so-called 'copyleft'). See the end of the newsletter for details.

Chance News is best read with Courier 12pt font and 6.5" margin.


That the ten digits do not occur with equal frequency must be evident to any one making much use of logarithmic tables, and noticing how much faster the first pages wear out than the last ones.

Simon Newcomb 1881


Contents of Chance News 10.04

The 2000 Chance Lectures can now be viewed from the Chance web site under Video and Audio. The lectures are:

Robert Hamman, SCA Promotions, Inc.
Risky Business

Mark Nigrini, Southern Methodist University
Benford's Law - It's a Secret Law of Numbers
(increasingly being used by auditors)

Susan Ellenberg, Food and Drug Administration
Coincidence, Bad Science, and Other Pitfalls in the Regulation of Medical Products

Matthew McGue, Department of Psychology, U. of Minnesota
Quantifying the Nature-Nurture Debate

Jonas H. Ellenberg, Westat, Rockville, Maryland
Medical Research - Study Results vs. Study Conclusions:
A cautionary guide

Susan Holmes, Statistics Department, Stanford University
Probability by Surprise: Teaching with Paradoxes and animations

Bill Rhodes, ABT Associates Inc., Cambridge, Mass.
How Many Hardcore Drugs Users? Counting Hard to Reach Populations.

We will not be able to provide a CD-ROM of these lectures, but we will make it possible to download the videos for those with a PC and a reasonably fast internet connection.

If you would like to have a CD-ROM of the 1997 and 1998
Chance Lectures that are also available on the Chance web site
send a request to


with your address. There is no charge.

It was pointed out to us that our translation of Forsooth! as "for shame!" is not correct. Here is what the OED says.

Obs a. In truth, truly. Also in phrase, forsooth to say, forsooth and forsooth (cf. verily, verily), forsooth and God. Obs. b. Now only used parenthetically with an ironical or derisive statement.

Here are some Forsooth! items from the March 2001 issue of RSS News:

Stuart Wheeler's gift of pounds 5m to the Conservative Party ... more than matches the gifts of pounds 2m each pledged by three donors to the Labour Party.

21 January 2001

What are his (Jack Nicklaus') chances? Mathematicians claim that if you toss a penny and it comes down 'tails' 10 times in a row the odds on it coming down 'tails' on the eleventh occasion are still 50/50. If this is true, which I humbly beg to doubt, then the chances for Nicklaus must be quite good. He will after all, be a clear favorite for all four events.

Mathematically then, if he wins the Masters he will have just as good a chance of winning the US Open, and these titles under his belt will in no way diminish his prospects at Muirfield, and so on. On that reasoning the odds against his winning all four are the same as against his winning any one of them, which is to say about seven to one, For myself, I might be tempted to risk a few of these horrible new pence at 200-1.

21 Jan 1972 (reprinted 7 Jan 2001)

The number of Elvis look-alikes is spiraling out of control. US film maker Gordon Forbes said: The number of impersonators has risen from 150 in 1977 to 85,000 this year. At this rate a third of the world's population could be Elvis by 2019.

Charles Grinstead suggested the following Forsooth item:

Psychologists are split on the notion that bullying can be an underlying cause to a violent response... J. Brien O'Callaghan, a clinical psychologist in Bethel, Conn., cited studies that showed 15 percent of schoolchildren were routinely bullied in any year. "So if bullying were the cause," Dr. O'Callaghan said, "we would have millions of school shootings each year."

New York Times
19 March 2001


Joan Garfield wrote us about an interesting web site:
Stat Labs: Mathematical Statistics through applications

The typical mathematical statistics course does not seem to be consistent with the recommendations of the statistical reform program for the introductory statistics course. This web site provides materials for a course called Stat Lab, developed by Professors Nolan and Speed at Berkeley, which is more in the spirit of the statistical reform program. Here is their description of the course:

Stat Lab integrates the theory of statistics with the practice of statistics through a collection of case studies, which we call labs. Each lab introduces a problem, provides some scientific background, suggests investigations for the data, and provides a summary of the theory used in the investigations.

Kevin Eva (McMaster University) sent us the following example of hindsight bias. Kevin writes:

Tiger Woods just won the Masters tournament in golf. He became the first to win all 4 major championships in a row, 3 last season and now the Masters this past weekend. Debate has swirled regarding whether this should qualify as a "Grand Slam" or doesn't count because it didn't happen within one season.

CNNSI.com polled its readers before and after the tournament regarding whether or not they believed this feat should count as a grand slam. Here are the results.

Q. If Tiger Woods wins the Masters, should he be credited with a Grand Slam? 26117 votes: Yes 38%, No 62%


Q. Now that Tiger Woods has won the Masters, should he be credited with a Grand Slam? 26719 votes: Yes 58%, No 42%

The tree of me.
The New Yorker, March 26, 2001, pp. 58-71
John Seabrook

John Seabrook gives an entertaining account of present methods for determining one's ancestry--including his own--using DNA. He briefly describes recent uses of a Y-chromosome test (the Y-chromosome is passed through the male line and rarely mutates), including identifying descendants of Thomas Jefferson and Sally Hemings, and members of the ancient Jewish priestly class, the Cohanim. In much of the article Seabrook describes his efforts to discover his own genetic family tree by using a commercial product, Y-Line (found at "We put the gene in genealogy"), that was developed and sold by Oxford geneticist Bryan Sykes. More information on Sykes' methods and analysis of his own genetic family tree is in Chance News 9.06.

One can also trace ancestry through the maternal line by using mitochondrial DNA (mtDNA), the method used to show that skeletons found in Siberia in 1991 were members of the Romanov royal family. In theory, Seabrook writes, one's mtDNA could be used to determine both living relatives, "and most weirdly, mtDNA data offer living people connections to female ancestors who lived tens of thousands of years ago." It is mtDNA analysis that is behind the "Eve ancestor" theory proposed in 1987: that all humans share mitochondrial DNA that traces back to a single African woman who lived 200,000 years ago. In addition, 28 major branches of the mitochondrial family tree--"Daughters of Eve"--have been identified. If your background is European, you can even get Bryan Sykes (for $180) to determine which of seven European Daughters of Eve is your ancestor.


(1) Seabrook writes, "The rate of false paternity in the United States is estimated to be between two and five percent--not large, but over ten generations the likelihood that a bloodline suffers what geneticists refer to as a 'non-paternity event' could approach fifty per cent." Explain.

(2) How do you think the Eve ancestor theory was developed?

(3) Seabrook says that, when he first went to Sykes' website and looked over the material on the Daughters of Eve, his reactions were, in quick succession:

(a) What a scam!
(b) How many people are actually paying for this?
(c) It would be kind of interesting to know which of the Daughters I am descended from.
What do you think? Would you pay for this service?
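
For question (1), here is a rough sketch of the arithmetic. It assumes (our simplification, not Seabrook's) that each generation independently carries the same false-paternity rate, so a bloodline stays intact over ten generations with probability (1 - rate)^10. The function name is ours, chosen for illustration.

```python
def prob_nonpaternity_event(rate: float, generations: int) -> float:
    """Chance of at least one non-paternity event in a bloodline.

    Assumes independent generations with a constant rate; this is an
    illustrative assumption, not something stated in the article.
    """
    return 1 - (1 - rate) ** generations


for rate in (0.02, 0.05):
    p = prob_nonpaternity_event(rate, 10)
    print(f"rate {rate:.0%}: {p:.1%} over 10 generations")
```

Under these assumptions the chance runs from about 18% at the low estimate to about 40% at the high estimate, so "approach fifty per cent" corresponds to the upper end of the quoted range.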

John Turner (US Naval Academy) sent the following contribution including the discussion questions:

Testing missile defense.
Washington Post Magazine, 18 Feb, 2001
Letter from Christopher B. Earls, National Research Inc. Washington

Commenting on an article by Bradley Graham about the nuclear missile defense system (Washington Post Magazine 10 Dec.) Earls writes:

Even with an interceptor enjoying only 60% accuracy, firing four interceptors would reduce a warhead's chance of penetration to 2.5%.


(1) Can you confirm his calculation?

(2) What assumptions is he making in his calculation?

(3) How might these assumptions be false and how might this affect the final probability of a kill?
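
As a starting point for question (1), the following sketch reproduces the figure under the assumption that the four interceptors hit or miss independently, each with a 60% chance of a hit. That independence is our reading of the letter, not something Earls states, and the function name is ours.

```python
def penetration_probability(hit_rate: float, interceptors: int) -> float:
    """Chance a warhead gets past every interceptor.

    Assumes each interceptor succeeds or fails independently of the
    others, which is exactly the assumption question (2) asks about.
    """
    return (1 - hit_rate) ** interceptors


p = penetration_probability(0.60, 4)
print(f"{p:.2%}")  # 0.4**4 = 0.0256
```

If the failures are positively correlated (for example, a decoy that fools one interceptor is likely to fool the others), the true penetration probability would be higher than this product.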

David Siev (USDA Center for Veterinary Biologics) suggested the next topic:

Five Britons' mad cow deaths traced to butchering methods.
New York Times, 21 March 2001, A7
Warren Hoge

UK investigators put forward theory for vCJD cluster.
The Lancet, 24 March, 2001
Haroon Ashraf

Five people in the village of Queniborough in Leicestershire, England, died from variant Creutzfeldt-Jakob disease (vCJD) between August 1998 and October 2000.

The incurable Creutzfeldt-Jakob disease is a brain-wasting malady that humans get from eating the meat of animals with bovine spongiform encephalopathy (BSE), or mad cow disease. The disease was identified in 1995 and there have been 81 deaths attributed to this disease since then.

The Lancet article reports the results of a study carried out by the Leicestershire Health Authority to try to find an explanation for the apparent cluster of deaths in Queniborough.

In the first part of their study, researchers surveyed people involved in the local meat and dairy trade in the 1980's and spoke to parents of children of similar age to the vCJD cases about their children's eating habits during the 1980's. They then spoke to relatives of the vCJD cases to examine possible sources of exposure to the BSE agent.

These investigations suggested that there was an association between cases of vCJD and the consumption of beef purchased from butchers who sold meat butchered using an older slaughtering technique that is no longer in use. This traditional technique worked with carcasses that still had the animal's head attached and could allow brain matter from animals with mad cow disease to pass to cuts of meat through the use of common knives, etc.

The investigators then tested this theory with a case-control study. They gave a questionnaire to a relative of each of the five cases. They gave the same questions to each of 30 age-matched controls--six for each case. They then interviewed all the sources of meat identified by the controls to see how the meat was slaughtered.

They found that four of the five people with vCJD bought and consumed beef from one of two butchers during the early 1980s, both of which sold meat prepared by the traditional method. They found that the control group used 20 butchers and only three of these butchers used the traditional slaughtering method, one of which was used by those with vCJD.

While this finding is being taken seriously, experts point out the limitations of the study. These include the small numbers involved and possible "recall bias" when people are asked about eating habits 20 years ago.

Critics also said that it is hard to understand why there was only one such cluster, since 9% of UK butchers removed brain tissue from carcasses in the early 1980's.

One explanation given was that the incubation period before an infected person becomes ill is 10 to 15 years and most of the cases considered occurred before the epidemic peaked. This led one investigator to suggest that the future will bring other clusters linked to the older methods of slaughtering.

We asked David Siev what a cluster means and how one determines whether it is due to chance. He sent the following interesting reply:

What does a cluster mean is always an interesting question. I don't think it can be answered by statistical methods alone, which is what is implied by the report's statement that "this result is statistically significant and is therefore very unlikely to be a chance finding." In my view, the possibility that something is a chance finding is never ruled out by "statistical significance", as many seem to believe. But the more serious concern in observational studies is bias due to unobserved confounders which can lead to an apparent effect that may be opposite the true effect (Simpson's paradox). Shoe leather epidemiology is an art relying on good judgment and a multidisciplinary approach.


How would you answer the question: What is a cluster and how do we determine if it is due to chance?

Index charts growth in diversity;
Despite 23% jump, segregation is still going on, researchers say.
USA Today, 15 March 2001. 3A
Haya El Nasser and Paul Overberg

USA Today has developed a number called the Diversity Index, which it computes from Census data to measure the racial and ethnic diversity of the population. In 1990, the index was 40; in 2000 it is 49. The article interprets this to mean there is "a 49% chance that two individuals are different." USA Today attributes the increase over 1990 to the growth of the Hispanic population. Unfortunately there is not much detail on how the index is actually calculated. The article states that:

The USA TODAY Diversity Index is based on each of the five race categories recognized by the federal government and what percentage they are of the total population.

The categories are: white, black, Asian, American Indian and Native Hawaiian. The index also is based on the percentages of Hispanics and non-Hispanics, who can be of any race.

We found a much more complete discussion of the index on a web site entitled "Reporting Census 2000: A Guide for Journalists," which is maintained by Professor Stephen K. Doig of the Cronkite School of Journalism at Arizona State University.

There you can find a wealth of useful on-line tools and references for working with Census data, as well as commentaries on current issues such as congressional redistricting and reapportionment. For the diversity story, see Diversity Index, which links to a discussion entitled "Updating the USA TODAY Diversity Index," by Phil Meyer and Paul Overberg (January 2001). We learn here that the index was introduced by USA Today after the 1990 Census ("Analysis puts a number on population mix," USA Today, 11 April 1991). The equation applied to the 1990 data was:

    1 - ((W^2 + B^2 + AmInd^2 + API^2) * (H^2 + NonH^2))

We can now get some idea of what is going on. The formula gives the probability that two randomly chosen individuals are different by computing one minus the probability that they are the same. The variables 'W', 'B', 'AmInd', and 'API' are the probabilities of white, black, American Indian, and Asian/Pacific Islander, and are computed as the respective proportions of the total population. Then 'H' is the probability of Hispanic, and 'NonH' = 1 - H. The first sum of squares computes the probability that two people are of the same race; the second computes the probability that they are either both Hispanic or both not. But multiplying the two sums treats Hispanic ethnicity as if it were independent of race, which is a questionable assumption.

There is one last modification: there was an "other races" category. This was handled by scaling the race probabilities W, B, AmInd and API to sum to 1 (in other words, these are actually conditional probabilities). As described in the article, this amounts to applying the known categories proportionally to the unreported cases. Armed with this information, we can now reproduce the index calculated for 1990. The national breakdown reported in the 1991 USA Today article was:

 White  80.3%
 Black  12.1%
 Hispanic  9.0%
 Asian  2.9%
 Native American  0.8%
 Other races  3.9%

Thus H = .09, W = (80.3/96.1), B = (12.1/96.1), AmInd = (0.8/96.1), and API = (2.9/96.1). Substituting these into the preceding formula gives an index of .402, which USA Today converted to a percentage and reported as 40.
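
The substitution can be checked with a short sketch. The function and argument names are ours; the percentages are the 1991 USA Today figures listed above, and the "other races" share is handled by rescaling the four known race categories so they sum to 1, as described.

```python
def diversity_index_1990(race_pcts, hispanic_pct):
    """1 - P(same race) * P(same Hispanic status), per the 1990 formula.

    race_pcts holds the percentages for white, black, American Indian,
    and Asian/Pacific Islander; "other races" is excluded and the rest
    are rescaled to sum to 1.
    """
    known = sum(race_pcts)  # 96.1 for the 1990 national figures
    race_probs = [pct / known for pct in race_pcts]
    same_race = sum(p * p for p in race_probs)
    h = hispanic_pct / 100
    same_ethnicity = h * h + (1 - h) ** 2
    return 1 - same_race * same_ethnicity


index = diversity_index_1990([80.3, 12.1, 0.8, 2.9], 9.0)
print(round(index, 3))  # 0.402
```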

Census 2000 presented two changes. First, the API group was split into two subcategories: Asian (A) and Native Hawaiian/Other Pacific Islander (PI). This is easily accommodated by moving from four to five terms in the race factor. The added complication is that the Census now allows respondents to report more than one race. The discussion on the Cronkite School web site describes three possible options that were considered: (1) including 63 terms in the race factor, one for each possible multi-race combination; (2) adding a sixth term to the race factor, representing a single "multi-race" category; and (3) simply applying the formula with 5 race categories, ignoring the fact that the race "probabilities" could now sum to more than 1. Option (3) would inflate the calculated probability of a match, thus decreasing the estimated probability that two randomly chosen people are different.

The actual 2000 formula uses none of the preceding options. The description on the web says it lets mixed race people "default to diversity." The formula is:

    1 - ((W^2 + B^2 + AmInd^2 + A^2 + PI^2) * (H^2 + NonH^2 ))

Now 'W', 'B', 'AmInd', 'A' and 'PI' represent the proportions of single-category responses. There is no term for the multi-race response; this effectively declares a pair of multi-race persons to be a non-match. Option (2) is the other extreme, always treating such a pair as a match. The web site presents some sample calculations to show that if a small percentage of the population declares itself multiracial (5 percent in the example), the diversity index is not greatly affected (first two decimal places unchanged). The authors point out the additional attraction that the 2000 formula is only a minor modification of the 1990 formula.
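
To see why the choice matters so little, here is a sketch comparing the 2000 formula with Option (2). The population shares below are hypothetical numbers we made up for illustration (they are not Census figures and not the web site's own example); only the 5 percent multi-race share is taken from the description above.

```python
def diversity_index_2000(single_race_shares, hispanic_share,
                         multi_share=0.0, multi_pair_matches=False):
    """Diversity index with single-race shares taken as given.

    multi_pair_matches=False follows the 2000 formula (a pair of
    multi-race persons never counts as a match); True follows Option (2),
    which adds a sixth squared term for a single multi-race category.
    """
    same_race = sum(s * s for s in single_race_shares)
    if multi_pair_matches:
        same_race += multi_share ** 2
    same_ethnicity = hispanic_share ** 2 + (1 - hispanic_share) ** 2
    return 1 - same_race * same_ethnicity


# Hypothetical shares for W, B, AmInd, A, PI (sum 0.95) plus 5% multi-race.
shares = [0.72, 0.13, 0.01, 0.08, 0.01]
default = diversity_index_2000(shares, 0.12, multi_share=0.05)
option2 = diversity_index_2000(shares, 0.12, multi_share=0.05,
                               multi_pair_matches=True)
print(round(default, 4), round(option2, 4))
```

The two versions differ only by the multi-race share squared times the ethnicity factor, which is at most 0.05^2 = 0.0025 here.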


(1) As described in the on-line paper: "The index uses two basic principles of probability theory: (a) to obtain the probability that all of several independent events will occur, multiply their separate probabilities and (b) to obtain the probability that at least one of several independent events will occur, add their separate probabilities." This is not quite right. Where is the error?

(2) The first option considered for the 2000 update refers to 63 race combinations. How was the 63 computed?

(3) An article in the St. Louis Post-Dispatch (Diversity eludes most of region; Census 2000: who we are. 16 March 2001, p. A11) gave the following characterization of the diversity index:

The Diversity Index, developed by USA Today, takes racial and ethnic proportions for any area and computes a single number from 0 to 100. The index shows the probability that two people, chosen randomly, are different racially and ethnically. An area made up of all one race and ethnicity would have a diversity index of 0. An area that is 50 percent white and 50 percent black would have a diversity index of 25. An index of 100 would only occur if every person was of a different race and ethnicity.

Do you agree with these calculations?

(4) The San Francisco Chronicle (S.F.'s diversity comeuppance, 1 April 2001, p. A1) lamented the city's #13 ranking in diversity among the 65 largest US cities. Particularly galling was the #1 ranking of Long Beach, CA, whose diversity index was 79.4. The Chronicle described the index by saying that "essentially, it predicts how often the next person you meet walking down the street will be different from you." What do you think of this characterization?

(5) What is the largest possible value of the Diversity Index using the 2000 formula?

Admissions test courses help, but not so much, study finds.
The New York Times, March 25, 2001, page 16
Gina Kolata

The effect of admissions test preparation: Evidence from NELS:88
Chance Magazine, Vol. 14, No. 1, 2001
Derek C. Briggs

The New York Times article is a brief summary of the information presented in the SAT study, along with comments from various interested parties. (For other commentary on this topic, see Chance News 7.11.)

Derek Briggs is a graduate student at the UC Berkeley School of Education and apparently is not affiliated with either the ETS/College Board or any test preparation company. Most previous studies have been conducted by researchers with such affiliations, and their results have not been surprising. [However, for an interesting account of coaching research commissioned by ETS that was suppressed because parts of the math SAT were found to be coachable, see the book None of the Above: The Truth Behind the SATs, by David Owen and Marilyn Doerr, September 1999, Rowman & Littlefield Publishers. Also, the book Standardized Minds: the High Price of America's Testing Culture and What We Can Do to Change It, by Peter Sacks, Perseus Books, Cambridge, Mass., February 2000, considers standardized testing of all kinds.] As the title indicates, Briggs found quite modest test score gains: about 14-15 points for the math portion, and only 6-8 points on the verbal portion.

Briggs used survey data collected by the National Education Longitudinal Survey of 1988 (NELS:88), which tracks a "nationally representative" sample of students from the 8th grade through high school and beyond. He writes,

The test preparation indicators used in this study were created from the following item in the NELS second follow-up questionnaire:

To prepare for the SAT and/or ACT, did you do any of the following?

A. Take a special course at your high school.
B. Take a course offered by a commercial test preparation service
C. Receive private one-to-one tutoring
D. Study from test preparation books
E. Use a test preparation video tape
F. Use a test preparation computer program

When I (Jeanne Albert) read this question, it was clear that it was included on the survey only to determine if students were preparing in some way for the SAT/ACT. In particular, one certainly can't tell how long a student spent doing any of these things. Even Donald Powers of ETS, who wrote a response to the study at the request of Chance Magazine, admits that, "No distinction is made among the wide variety of commercial test preparation services, which clearly differ with respect to cost, emphasis, and time required of students."

Briggs further concentrates on only those students who took a commercial course, because he considers other types of preparation not "coaching". His outcome of interest is the difference between PSAT and SAT/ACT scores--he uses the PSAT as a proxy for the SAT/ACT--given that a student has or has not been coached. One problem here that caught my attention is that students who weren't coached may well have studied in some other way for the test, and Briggs doesn't appear to take this into account.


(1) Briggs includes in his analysis students who took the PSAT, prepared for the SAT, but then didn't take the test (or hadn't at the time they completed the survey), as well as students who didn't take either test. I agree with Briggs's comment that these "populations are of interest if there is reason to believe that some or many of these students had college aspirations but self-selected themselves out of the other sample populations because they expected to do poorly on the SAT or ACT. In theory, at least, if test preparation activities are effective in the short run, these are the students who might have had the most to gain from them." However, he later writes: "...any study seeking to evaluate the effectiveness of test preparation activities using only the sample of students taking admissions tests is likely to be biased upward, depending on the number who opt out of such tests." What is he saying here? Aren't these statements contradictory?

(2) The SAT (more precisely, the SAT I) has changed since the NELS survey information was gathered, and presumably so have the test preparation companies. Although Briggs acknowledges this, he still states that his analysis "suggests unequivocally (!) that the average effect of coaching is nowhere near the levels previously suggested by commercial test preparation companies", and that "with respect to the NELS dataset, there is no evidence that commercial test preparation makes much of a difference..."

(3) For his study, Briggs considers students to have been coached if they "received systematic instruction over a short period of time." This he interprets to mean they enrolled (completed?) in a commercial test preparation service designed especially for the SAT or ACT, but not offered by their school. He excludes books, videos, etc, because there's no time constraint, and he excludes private tutoring because it might not be systematic. He doesn't explain why he excludes test preparation given by a school. In any event, his "definition" seems rather arbitrary.

(4) What would constitute a well-designed study of the effects of coaching on SAT scores? Why aren't they being done?

David Dorman suggested the following article.

Companies turn to grades and employees go to court.
The New York Times, 19 March 2001, A1
Reed Abelson

The article reports that a number of companies have adopted grading systems to rank employees, often using a bell curve or other scale that forces a certain percentage of employees to be placed in each category. Supervisors at General Electric are required to identify the top 20 percent and the bottom 10 percent of their employees. Receiving the bottom ranking is especially grim in light of the following comment by GE's top executive, John Welch: "A company that bets its future on its people must remove that lower 10 percent and keep removing it every year--always raising the bar of performance..."

One feature of the bell curve noted by critics is that only a small percentage get high or low rankings, while the majority get grouped in the middle. Ford Motor Company, for example, grades employees on an A-B-C scale. Last year, 10 percent got A's, 10 percent got C's, while 80 percent got B's. Older workers have claimed in lawsuits that the system is being used to discriminate against them.

Peter Browne, a former Microsoft executive, is now suing the company for discrimination. He explains that he once had to rate a group of five employees on a curve. The results were used to determine who got stock options. Browne claims he was given no guidance other than to meet the curve.

The article concludes with a quote from a consultant who said that companies "do need something in making pay decisions, downsizing decisions," but noted that "they can get it very wrong."


If your boss demanded that you rate five people, giving 10 percent A's, 80% B's and 10% C's, what would you do?

Why mathematicians now care about their hat color.
New York Times, Science Times D5, 10 April 2001
Sara Robinson

This is a full-page story about a simple probability puzzle stated as:

Three players enter a room and a red or blue hat is placed on each person's head. The color of each hat is determined by a coin toss.

No communication of any sort is allowed, except for an initial strategy session before the game begins. Once they have had a chance to look at the other hats, the players must simultaneously guess the color of their own hats or pass.

The puzzle is to find a group strategy that maximizes the probability that at least one person guesses correctly and no-one guesses incorrectly.

The naive strategy would be for the group to agree that one person should guess and the others pass. This would have probability 1/2 of success. Of course it would not be a puzzle if this were the best you can do. Readers who like puzzles should not read further and try to find a strategy with a greater than 50% chance for success.

For those still reading, the following strategy achieves a probability of 3/4 of success.

Each player looks at the hats of the other two players. If they are different colors he passes. If they are the same color he guesses that his hat is the other color.

We show that this strategy has probability 3/4 of success.

If all three have the same color everyone will guess the other color and be wrong, so the strategy will fail. This will happen in 2 out of the 8 possible ways the hats can be assigned. Thus it happens with probability 1/4.

Assume that two people have the same color and one has a different color, for example there are two red hats and one blue hat. Using our strategy, the one with the blue hat will see two red hats and say, correctly, that he has a blue hat. The two with red hats will see two hats with different colors and pass. Thus the strategy will be successful. This will happen in 6 out of the 8 ways the hats can be assigned so the probability of success using this strategy is 3/4.
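
The count of 6 winning arrangements out of 8 can be confirmed by brute force. The sketch below, with function names of our own choosing, applies the strategy described above to every assignment of hats.

```python
from itertools import product

COLORS = ("red", "blue")


def guess(seen):
    """Return a player's guess given the two hats he sees, or None to pass."""
    first, second = seen
    if first != second:
        return None  # different colors: pass
    return "blue" if first == "red" else "red"  # same color: guess the other


def team_wins(hats):
    """True if at least one player guesses and every guess made is correct."""
    outcomes = []
    for i, own in enumerate(hats):
        g = guess(hats[:i] + hats[i + 1:])
        if g is not None:
            outcomes.append(g == own)
    return bool(outcomes) and all(outcomes)


wins = sum(team_wins(hats) for hats in product(COLORS, repeat=3))
print(wins, "of 8")  # 6 of 8
```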

The article goes on to say that the same problem with n members in the group has been solved for n = 2^k - 1 and that the solution was suggested by coding theory--specifically by the Hamming code. Peter Winkler provided us with the following proof for the case n = 2^k - 1. We illustrate the solution for the case n = 7 corresponding to k = 3.

Consider the eight ways you can write down a sequence of three binary digits: 000, 001, 010, 011, 100, 101, 110, 111. Peter calls these "nimbers". Define the addition of two nimbers as digit-by-digit addition mod 2, with no carrying. Thus, for example, 001 + 101 = 100, 010 + 101 = 111, etc.

In their strategy session each player is assigned one of the seven non-zero nimbers and it is put on his hat for the others to see. Now use the following strategy:

Each member of the group adds the nimbers of those he sees wearing red hats. If the sum of these nimbers is 000 then he says he has a red hat. If the sum equals his own nimber he says he has a blue hat. Otherwise he passes.

Assume first that the sum of the nimbers for all those who have red hats is 000. Then a person with a blue hat gets 000 when he adds the nimbers of all those he sees with red hats. Thus he says that he has a red hat and is wrong. For a person with a red hat, the sum of the nimbers on the red hats he sees will equal his own nimber and so he will guess that he has a blue hat and also be wrong. Thus, if the sum of the nimbers for those with red hats is 000 everyone guesses incorrectly. The probability this happens is 1/8 (see the discussion question).

Assume now that the sum of all the nimbers on red hats is not 000, say it is 010. Consider a person with a red hat. If the sum of the nimbers on the red hats he sees is 000 he will guess red and be correct. Note in this case his nimber must be 010. The only other case in which he would guess would be if this sum is his own nimber. But this cannot happen since it would mean that the sum of all nimbers of those with red hats is 000 and we have assumed this is not the case.

Consider next a person with a blue hat. Then the sum of the nimbers on the red hats he sees is 010 so the only case in which he would guess would be if he has the nimber 010. In this case he would guess blue and be correct.

For the strategy to be successful we must show that at least one person will guess correctly. But looking at these last two cases we see that whoever has the nimber 010 will be correct whichever color his hat is.
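
The argument can also be checked by enumerating all 128 arrangements. In this sketch (names are ours) the nimbers 001 through 111 are stored as the integers 1 through 7, and Python's XOR operator `^` plays the role of nimber addition.

```python
from itertools import product

NIMBERS = list(range(1, 8))  # the seven non-zero nimbers 001..111


def guess(own, others):
    """Apply the strategy; others is a list of (nimber, color) pairs seen."""
    seen_sum = 0
    for nimber, color in others:
        if color == "red":
            seen_sum ^= nimber  # XOR is digit-by-digit addition mod 2
    if seen_sum == 0:
        return "red"
    if seen_sum == own:
        return "blue"
    return None  # pass


def count_outcomes():
    """Return (winning arrangements, arrangements where all seven are wrong)."""
    wins = all_wrong = 0
    for colors in product(("red", "blue"), repeat=7):
        hats = list(zip(NIMBERS, colors))
        results = []
        for own, color in hats:
            others = [(n, c) for n, c in hats if n != own]
            g = guess(own, others)
            if g is not None:
                results.append(g == color)
        if results and all(results):
            wins += 1
        if len(results) == 7 and not any(results):
            all_wrong += 1
    return wins, all_wrong


wins, all_wrong = count_outcomes()
print(wins, all_wrong)  # 112 16
```

The 16 losing arrangements are exactly those in which the red nimbers sum to 000, which gives the failure probability 16/128 = 1/8 and the success probability 112/128 = 7/8.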

We will now give an argument that shows that the strategy we have just given is optimal (in the sense that no other strategy can give a higher probability of winning).

There are 128 different arrangements of colors of hats. We assume that S is any strategy which prescribes a specific response for each player. Under S, given any arrangement, some players will guess their hat colors, while others will remain silent. Under any strategy, any player who guesses his hat color will be right half of the time, since he only knows the colors of the other hats, and the hat colors are independent.

Suppose that among all 128 arrangements, and under a strategy S exactly x players make correct guess about their hats' colors (by the argument above, this means that exactly x players make an incorrect guess as well). Then there can be at most x arrangements which result in a win, and this can happen if and only if each correct guess occurs on an arrangement in which, under strategy S, all of the other players are silent. Similarly, the smallest number of arrangements which result in a loss is x/7, and this can happen if and only if, in any arrangement in which one player guesses wrong, they all guess wrong, and there are no arrangements in which all players are silent.

Thus, under any such strategy S, the best that can be done is to have exactly x arrangements that result in a win, and exactly x/7 arrangements that result in a loss. Therefore, the probability of a win is at most x / (x + x/7) = 7/8. Since the strategy we proposed achieves this probability, it is optimal. The same argument works using expected values for strategies such as the naive strategy in which players' choices depend on a chance experiment.

The same arguments we used for n = 7 work for any n of the form n = 2^k - 1.

The article states that there is no known solution for general n but other values of n have been solved using other coding techniques.


(1) In order to determine the probability of "winning" for the strategy given, it suffices to show that all possible sums of distinct nimbers are equally likely (why?). Specifically, one can show that the number of sums that equal 000 is the same as the number of sums that equal any other nimber (say 101) as follows: Given a sum of distinct nimbers that equals 000, either 101 is in the sum or it isn't. If it is, then the remaining nimbers sum to 101; if it isn't, then adding 101 to the given sum also yields a sum of 101. Now use these observations to set up a one-to-one correspondence between: the set of all sums of distinct nimbers that sum to 000, and the set of all sums of distinct nimbers that sum to 101.

(2) How much of all this do you think you could explain to your Uncle George?

Gregory Kohs, Editor of the American Cynic, sent us the following remarks about a new Powerball option:
The Powerball lottery has a new feature called "Power Play". Excerpted below is the description from their website. I'm interested especially in the statement, "Using the '1' allows us to add twice as many '5's'!" Because the lottery can basically control ANY of the prize-levels it sets, isn't this a bit preposterous to make a statement like that? I strongly suspect that taking the Power Play option is to the DISADVANTAGE of the player, just as the lottery is, unless the payout moves above $80 million, where it would "pay" to just buy all 80 million ticket combinations.

Here is the Powerball description of their new option.

For an extra $1 per Powerball play, players can now multiply their Powerball prizes by 1,2,3,4 or 5 times the original prize amount (for all prizes except the jackpot). A spinning wheel with 12 slots numbered 1 through 5 (there are two 1's, two 2's, two 3's, two 4's, and four 5's) has been added to Powerball's Wednesday and Saturday drawings to select the Power Play number. If any of the player's numbers match the winning numbers and the Power Play was purchased, the set Powerball prize amount (except the jackpot) will be multiplied by the Power Play number. If the Power Play number is "1", the prize amount does not increase. Using the "1" allows us to add twice as many "5's"!

In the Powerball lottery you buy a ticket by choosing five distinct numbers from 1 to 49 and a bonus number from 1 to 42. The bonus number does not need to be distinct from the other five numbers. The lottery officials then choose 5 balls from a drum containing white balls numbered from 1 to 49 and one ball from a drum containing red balls numbered from 1 to 42. The amount you win depends on how your numbers match those on the balls drawn.

We wrote a lottery profile for our Chance web site (see Teaching Aids/Profiles of Topics). We gave there the following results for the probabilities for various ways that you can win in the Powerball lottery. Let C(n,r) be the number of ways to choose r objects from n. Then the number of possibilities for the winning numbers chosen by the Lottery officials is 42*C(49,5) = 80,089,128. To find the probability for each possible way to win we have only to count the number of outcomes that give the necessary matches and divide by this number. Doing this we obtain:

 Need to match   Prize      Number of ways to win   Probability of winning
 5W+R            Jackpot                        1   .00000001248
 5W              $100,000                      41   .00000051193
 4W+R            $5,000                       220   .00000274693
 4W              $100                        9020   .00011262452
 3W+R            $100                        9460   .00011811840
 3W              $7                        387860   .00484285458
 2W+R            $7                        132440   .00165365766
 1W+R            $4                        678755   .00847499551
 0W+R            $3                       1086008   .01355999281
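The counts in this table can be reproduced with binomial coefficients. This is a minimal sketch; the function name `ways` and the constant names are ours, and the counting assumes one fixed ticket measured against all equally likely drawings.

```python
from math import comb

WHITE, PICKED, RED = 49, 5, 42

def ways(white_matches, red_match):
    """Number of drawings that match a fixed ticket in the stated way."""
    # Choose which of our 5 white numbers are drawn, then fill the rest
    # of the drawing from the 44 white numbers we did not pick.
    white = comb(PICKED, white_matches) * comb(WHITE - PICKED, PICKED - white_matches)
    red = 1 if red_match else RED - 1
    return white * red

TOTAL = RED * comb(WHITE, PICKED)  # 42*C(49,5)

CASES = [(5, True), (5, False), (4, True), (4, False), (3, True),
         (3, False), (2, True), (1, True), (0, True)]
for w, r in CASES:
    n = ways(w, r)
    label = f"{w}W+R" if r else f"{w}W"
    print(f"{label:<5} {n:>8}  {n / TOTAL:.11f}")
```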

Leaving out the jackpot we find that the expected winning from the smaller prizes is .20805703. If you pay the extra $1 the expected amount that this will be multiplied by is:

1*1/6 + 2*1/6 + 3*1/6 + 4*1/6 + 5*1/3 = 10/3

Thus your expected winning from the extra dollar invested is .20805703*10/3 = .69352343. Gregory asks if this is a better payoff than that from your original dollar spent on a ticket. The expected payoff from the original dollar depends on the size of the jackpot.
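The arithmetic above can be retraced directly from the prize table and the description of the wheel. This is a sketch with names of our own choosing; working from the exact counts rather than the rounded probabilities, the small-prize figure agrees with the one quoted above to about six decimal places.

```python
from fractions import Fraction

TOTAL = 80_089_128
# (prize in dollars, number of ways to win), jackpot excluded
PRIZES = [(100_000, 41), (5_000, 220), (100, 9_020), (100, 9_460),
          (7, 387_860), (7, 132_440), (4, 678_755), (3, 1_086_008)]
# Twelve slots: two each of 1, 2, 3, 4 and four 5's
WHEEL = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 5, 5]

small_prize_value = Fraction(sum(p * w for p, w in PRIZES), TOTAL)
expected_multiplier = Fraction(sum(WHEEL), len(WHEEL))

print("small prizes:", float(small_prize_value))
print("multiplier:  ", expected_multiplier)
print("product:     ", float(small_prize_value * expected_multiplier))
```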

From our lottery profile we find that your expected winning, if you just buy a $1 ticket, as a function of size of the jackpot is:

               Jackpot        Expected winning
               in millions

                 10                .333
                 20                .457
                 30                .583
                 40                .707
                 50                .832
                 60                .957
                 70               1.082
                 80               1.207
                 90               1.332
                100               1.457
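The table appears to be consistent with a simple model: the small-prize expectation plus the jackpot divided by the number of possible drawings. The sketch below rests on that assumption (a sole winner, no taxes, jackpot taken at face value), which is our reading and not a formula stated in the profile; it matches the tabulated values to within about .001.

```python
TOTAL = 80_089_128
# Small-prize expectation computed from the prize table (jackpot excluded)
SMALL_PRIZES = (100_000 * 41 + 5_000 * 220 + 100 * 9_020 + 100 * 9_460
                + 7 * 387_860 + 7 * 132_440 + 4 * 678_755 + 3 * 1_086_008) / TOTAL

def expected_winning(jackpot_millions):
    """Expected return on a $1 ticket, assuming no sharing and no tax."""
    return SMALL_PRIZES + jackpot_millions * 1_000_000 / TOTAL

for millions in range(10, 101, 10):
    print(f"{millions:>4} {expected_winning(millions):.3f}")
```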

Thus for jackpots less than 40 million the extra dollar spent on the Power Play is a more favorable bet than buying a single ticket. Of course both bets are unfavorable.

The suggestion that games with jackpots of more than 70 million are favorable is a little misleading, since we have to pay income tax on the jackpot. For example, if we pay 28% income tax then an 80 million jackpot is really 57.6 million, which makes the lottery again an unfavorable game. Also, when the jackpot gets this large you have to take into account the possibility that you will have to share the jackpot. When everything was taken into account we estimated that a jackpot of 160 million was still a slightly unfavorable game. In fact, by our calculations, there has not yet been a jackpot big enough to make the Powerball a favorable game.


(1) Do you agree with Gregory that the lottery's claim, "Using the '1' allows us to add twice as many '5's'!", is kind of silly?

(2) Why do you think people play the lottery?

How many shuffles to randomize a deck of cards?
Lloyd N. Trefethen and Lloyd M. Trefethen
Proc. Roy. Soc. Lond. A (2000) 456, 2561-2568

Information loss in card shuffling.
Dudley Stark, A. Ganesh, and Neil O'Connell
HP Labs Technical report

In 1992, Bayer and Diaconis showed that, in a fairly precise sense, the number of riffle shuffles needed to randomize a deck of n cards is (3/2)log(n). (All logarithms in this discussion are to the base 2.) For a 52 card deck this suggested the famous: seven shuffles suffice. Their paper appeared in the Annals of Applied Probability, volume 2:294. One can also read a proof of this result in Section 3.3 of the book Introduction to Probability, by Grinstead and Snell (this book can be found on the Chance website under Teaching Aids/Books and Articles).

How do we measure how random a deck of cards is? In fact, we don't measure the amount of randomness in a particular arrangement of the deck; rather, we measure how random the process is that we are using to rearrange the deck. In the present case, the process consists of k riffle shuffles. Although we will not give it here, there is a beautiful mathematical model that seems to do a fairly good job of describing what actually happens when someone riffle shuffles a deck of cards (see the above references for a description of this model). A process that outputs an ordering of a deck of n cards can be said to be perfectly random if each possible ordering has the same probability of occurring, namely 1/n!. If one starts with a deck in some given order and performs k riffle shuffles, it is easy to believe that this process will not produce every possible ordering with equal probability. To measure how far this process is from perfect randomness, we sum, over all possible orderings, the absolute value of the difference between the actual probability of the ordering and 1/n!.

Bayer and Diaconis showed that, as we increase the number of riffle shuffles, the process does not march steadily towards perfect randomness. Rather, there is a "cut-off" phenomenon, which can be described as follows. For large n, if k is around 1.4*log(n), the process is not even close to being random. However, if k is around 1.5*log(n), each subsequent riffle shuffle brings the process twice as close to perfect randomness as it was before. Thus, as k increases through the interval [0, 1.5 log(n)], the distance from perfect randomness is essentially constant, but near the end of this interval, this distance falls off exponentially towards 0.
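The distance described above can be computed exactly for a 52 card deck. The sketch below relies on a formula from the Bayer and Diaconis paper that is not derived in this review: after k riffle shuffles, an ordering with r rising sequences has probability C(2^k + n - r, n)/2^(kn), and the number of orderings with r rising sequences is an Eulerian number. The function names are ours. The printed value is half of the sum of absolute differences, which is the usual total variation distance.

```python
from fractions import Fraction
from math import comb, factorial

def eulerian_row(n):
    """Return [A(n,0), ..., A(n,n-1)]; entry r-1 counts orderings with r rising sequences."""
    row = [1]
    for size in range(2, n + 1):
        row = [(size - m) * (row[m - 1] if m > 0 else 0)
               + (m + 1) * (row[m] if m < size - 1 else 0)
               for m in range(size)]
    return row

def riffle_distance(n, k):
    """Sum over all orderings of |P_k(ordering) - 1/n!|, computed exactly."""
    uniform = Fraction(1, factorial(n))
    total = Fraction(0)
    for m, count in enumerate(eulerian_row(n)):
        rising = m + 1
        prob = Fraction(comb(2 ** k + n - rising, n), 2 ** (k * n))
        total += count * abs(prob - uniform)
    return total

for k in range(1, 11):
    print(k, round(float(riffle_distance(52, k)) / 2, 3))
```

The printout stays near 1 for the first few shuffles and then drops quickly, which is the cut-off behavior described above.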

The two articles being reviewed here use a different measure of how close a process is to the perfectly random process. This measure is the entropy of the process, a quantity that we will now define. If the process always produces the same outcome, there is no uncertainty, and we would like to define the entropy, or uncertainty, in this case to be 0. On the other hand, in the perfectly random process, each outcome is equally likely to occur, so this process should have the largest possible entropy. To make this quantitative, let p(1), p(2), ..., p(m) be the vector whose entries are the probabilities that each of the m possible outcomes occurs. Then the entropy of the process is defined by

     Entropy = Sum(j = 1 to m) (- p(j)log(p(j))).

In the case of shuffling a deck of n cards, the entropy of the perfectly random process is log(n!).

Repeated riffle shuffling can be thought of as the progressive loss of information; the question then becomes: How does the entropy of the process increase as k increases? The first paper gives some numerical results that suggest that there is no sharp cut-off phenomenon, as there was using the previous distance function. Specifically, the authors give evidence that each shuffle increases the entropy by about n, until k is about log(n). Since log(n!) is approximately n log(n), this means that, after log(n) shuffles, the process has almost reached perfect randomness.

The authors call this phase the linear phase of the shuffling process. For k between log(n) and (3/2)log(n), each time k is increased by one, the distance to perfect randomness under this measure is decreased by roughly a factor of 4. The authors call this phase the exponential phase.
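The two phases can be seen numerically by computing the entropy of k riffle shuffles of a 52 card deck. This sketch again assumes the Bayer and Diaconis formula C(2^k + n - r, n)/2^(kn) for the probability of an ordering with r rising sequences, together with the Eulerian numbers that count such orderings; neither is stated in the review itself, and the function names are ours.

```python
from fractions import Fraction
from math import comb, factorial, log2

def eulerian_row(n):
    """Return [A(n,0), ..., A(n,n-1)]; entry r-1 counts orderings with r rising sequences."""
    row = [1]
    for size in range(2, n + 1):
        row = [(size - m) * (row[m - 1] if m > 0 else 0)
               + (m + 1) * (row[m] if m < size - 1 else 0)
               for m in range(size)]
    return row

def riffle_entropy(n, k):
    """Entropy, in bits, of the ordering produced by k riffle shuffles."""
    bits = 0.0
    for m, count in enumerate(eulerian_row(n)):
        ways = comb(2 ** k + n - (m + 1), n)
        if ways:
            # Total probability of all orderings with this many rising sequences
            weight = float(Fraction(count * ways, 2 ** (k * n)))
            # log2 of a single ordering's probability is log2(ways) - k*n
            bits -= weight * (log2(ways) - k * n)
    return bits

n = 52
print("log2(52!) =", round(log2(factorial(n)), 2))
for k in range(1, 13):
    print(k, round(riffle_entropy(n, k), 2))
```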

The second paper proves all of the results described above.

It is worth noting that most of these ideas were known to Peter Doyle of Dartmouth College in 1996 and were communicated by him to one of the authors of the first paper at that time. (The authors graciously acknowledge this fact in their paper.)

Digit Analysis Using Benford's Law: Tests & Statistics for Auditors,
Global Audit Publications, Vancouver, BC
Mark J. Nigrini

It is often rewarding though less often enjoyable to read a technical book. But this book is both technical and a pleasure to read. It tells the story of Mark Nigrini's crusade to convince the world of accountants that Benford's distribution for the leading digit of "natural" data can be useful in detecting fraud.

Nigrini tells us how he learned about Benford's distribution, how he came to write his PhD thesis on the use of this distribution to detect fraud, and how, in the last twenty years, he has developed and applied methods for using digital analysis for fraud detection. While Benford's distribution is at the heart of this analysis, Nigrini incorporates other kinds of digital irregularities in his analysis. For example, when Dartmouth reimburses us for travel, we do not have to provide receipts for items that are less than $25. So, most of our meals end up costing $24.50. Nigrini's digital analysis would have no trouble detecting this fraud.

The leading digit L of a positive number is the first non-zero digit in its decimal representation. So the leading digits of .0023, 2.23, and .234 are all 2. The second digits of these numbers are 3, 2, and 3 respectively.

As our quote for this Chance News suggests, the famous astronomer Simon Newcomb was led, by looking at tables of logarithms, to the belief that the leading digits of data are not typically uniformly distributed. In his 1881 article Newcomb (1) (see end of the article for references) gave an argument to show that the mantissas of the logarithms of the numbers should be uniformly distributed, which leads to

      Prob(L = j) = log(j+1) - log(j) for j = 1,2,..,9

where the logarithm is to the base 10.

This gives the distribution

     1     2     3     4     5     6     7     8     9

   .301  .176  .125  .097  .079  .067  .058  .051  .046
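These values follow directly from the formula. A minimal sketch (the function name `benford` is ours):

```python
from math import log10

def benford(digit):
    """Prob(L = digit) = log(digit + 1) - log(digit), logarithms to the base 10."""
    return log10(digit + 1) - log10(digit)

for d in range(1, 10):
    print(d, f"{benford(d):.3f}")
```

The nine probabilities telescope to log(10) - log(1) = 1, as they should.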

The physicist Frank Benford (2) independently discovered this distribution and showed that the distributions of the leading digits for a number of "natural" data sets reasonably fit this distribution.

Here are some data whose leading digits are reasonably approximated by the Benford distribution:

 Digit  Benford  Dow Jones  Rivers  News-   County       Electricity
                 1900-1993  Area    papers  populations  consumption
   1     .301      .289      .310    .30       .316         .316
   2     .176      .168      .164    .18       .170         .167
   3     .125      .124      .107    .12       .134         .116
   4     .097      .100      .113    .10       .083         .087
   5     .079      .085      .072    .08       .073         .085
   6     .067      .072      .086    .06       .067         .064
   7     .058      .062      .055    .06       .055         .057
   8     .051      .053      .042    .05       .056         .050
   9     .046      .047      .052    .05       .046         .057

The Dow Jones numbers are from an article by Ley (3). The rivers and newspaper data are from Benford's article. The county populations are the populations of 3,141 counties as reported in the 1990 census and analyzed by Nigrini in this book. The electricity consumption data appear in an excellent survey article by Raimi (4) and represent the electricity consumption of 1243 users in one month in the Solomon Islands.

In keeping with his desire to keep the mathematics informal, Nigrini justifies the Benford distribution in terms of simple examples. For example, he considers a town that starts with population 10,000 and increases 10 percent per year. Then (ignoring compounding) the population increasing from 10,000 to 20,000 is a 100 percent increase and so takes about 10 years. But a population change from 50,000 to 60,000 is only a 20 percent increase and so takes only about 2 years. Thus the town will have a population with leading digit 1 for about five times as long as it has leading digit 5.
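The town example can be pushed a little further by letting the growth compound and following the leading digit over a long period. This is a sketch under our own assumptions (1000 years, 10 percent growth compounded annually, function names invented for the illustration); it is not a calculation from the book. With compounding, the share of years spent at each leading digit comes out close to the Benford values.

```python
from math import log10

def leading_digit_after(years, start=10_000, rate=0.10):
    """Leading digit of start*(1+rate)**years, found from the mantissa of its logarithm."""
    mantissa = (log10(start) + years * log10(1 + rate)) % 1
    return int(10 ** mantissa)

def digit_shares(total_years=1000):
    """Fraction of years in which the population has each leading digit 1..9."""
    counts = {d: 0 for d in range(1, 10)}
    for year in range(total_years):
        counts[leading_digit_after(year)] += 1
    return {d: c / total_years for d, c in counts.items()}

for digit, share in digit_shares().items():
    print(digit, round(share, 3))
```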

We found Newcomb's argument a little mysterious and so will describe an argument that justifies the Benford distribution as the only distribution that is invariant under a change of scale, i.e. multiplying the data by a constant c. This was first proved by Pinkham(5) and extended by Hill(6).

To know the leading digit, the second digit, etc. we need only know the data modulo powers of 10. This is the same as saying we need only know the mantissas of their logarithms. We represent the mantissas as points on the unit circle. Following Hill we assume there is some chance process producing the data which in turn determines a probability measure for their mantissas.

For a data point to have leading digit L = j it must have a mantissa between log(j) and log(j+1). Thus

P(L = j) = P(log(j) <= mantissa < log(j+1))

Since multiplying the numbers by c amounts to adding log(c) to their mantissas, this means that the probability measure for the mantissas should be invariant under a rotation of the circle. It is well known that the only measure with this property is the uniform measure. Thus

P(L = j) = log(j+1) - log(j)
which is Benford's distribution for the leading digit.
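The invariance can be illustrated by simulation. The sketch below generates data whose mantissas are uniform (an assumption we impose by drawing the base 10 logarithm uniformly from an interval of whole-number length), rescales it by a few arbitrary constants, and tabulates the leading digits each time. The seed, sample size, and constants are arbitrary choices of ours.

```python
import random
from collections import Counter

def leading_digit(x):
    """First non-zero digit of a positive number, read off its scientific notation."""
    return int(f"{x:.15e}"[0])

def digit_shares(values):
    """Proportion of values with each leading digit 1..9."""
    counts = Counter(leading_digit(v) for v in values)
    return [counts[d] / len(values) for d in range(1, 10)]

rng = random.Random(2001)
data = [10 ** rng.uniform(0, 6) for _ in range(100_000)]  # uniform mantissas

for c in (1, 2.54, 7):
    print(c, [round(s, 3) for s in digit_shares([c * v for v in data])])
```

Each row should be close to the Benford values whatever the scale factor.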

Nigrini's book contains a wide variety of case studies, ranging from his students' projects analyzing data from their relatives' mom-and-pop stores to his own digital analysis of data from major corporations. Along the way there is an interesting discussion on the best way to test the fit of a distribution when you have many thousands of data elements.

Here is an example Nigrini gives from his student projects. For his project a student, Sam, used data from his brother's small store in Halifax, Nova Scotia. Throughout the day family members would ring up sales in the normal fashion. At night before closing, Sam's brother would go downstairs and ring up fictitious sales on a similar register so that the basement total was less than the upstairs total, to evade income and sales taxes. On one day there were 433 authentic sales totaling $4,038.32. On the fake register printout used for tax purposes there were 245 sales totaling $1,947.29. The leading digit distributions from the two registers' printouts and, for comparison, the Benford distribution are:

 Digit  Benford  Actual sales  Fake sales
   1     .301       .290         .17
   2     .176       .230         .06
   3     .125       .075         .05
   4     .097       .100         .02
   5     .079       .170         .38
   6     .067       .025         .17
   7     .058       .024         .02
   8     .051       .024         .01
   9     .046       .040         .12

The fit to the Benford distribution for the actual sales is not great, but the biggest difference, the excess of leading digit 5, could be explained by the large number of sales of cigarettes, which sold for somewhat more than $5 at that time. Thus Nigrini would pass the actual sales data. However, his tests would certainly detect fraud in the fake sales data.

The obvious test to use for deciding if an empirical distribution fits the theoretical Benford distribution would be the chi-square test. However, Nigrini rejects the use of this test for the very large data sets obtained from analyzing data from a major company. The reason is that, when the true distribution is only slightly different from the theoretical Benford distribution, the large sample would lead to rejection of the Benford distribution even though there may be no fraud. This is like the observation that with enough tosses of a coin it is possible to reject just about any coin as a fair coin since coins are never exactly fair.
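The effect of sample size is easy to see numerically. In the sketch below the observed proportions are the Dow Jones column from the table given earlier, while the two sample sizes are hypothetical values chosen only for illustration. The critical value 15.507 is the standard 5% point of the chi-square distribution with 8 degrees of freedom.

```python
from math import log10

BENFORD = [log10(1 + 1 / d) for d in range(1, 10)]
# Leading digit proportions for the Dow Jones data in the table given earlier
OBSERVED = [.289, .168, .124, .100, .085, .072, .062, .053, .047]
CRITICAL_5PCT_8DF = 15.507

def chi_square(shares, n):
    """Chi-square statistic for observed proportions based on a sample of size n."""
    return n * sum((o - e) ** 2 / e for o, e in zip(shares, BENFORD))

for n in (1_000, 250_000):  # hypothetical sample sizes
    stat = chi_square(OBSERVED, n)
    verdict = "reject" if stat > CRITICAL_5PCT_8DF else "do not reject"
    print(f"n = {n:>7}: chi-square = {stat:8.1f}  ({verdict} Benford at the 5% level)")
```

The same proportions pass easily with a thousand observations and fail badly with a quarter of a million.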

For this reason Nigrini recommends a test that he calls the Mean Absolute Deviation (MAD) test. This test computes the average of the 9 absolute differences between the empirical proportion of a digit and the proportion predicted by the Benford distribution. Based on his experience Nigrini suggests the following guidelines for measuring conformity of the first digits to Benford using MAD:

MAD:    0 to .004  (close conformity)
MAD: .004 to .008  (acceptable conformity)
MAD: .008 to .012  (marginally acceptable conformity)
MAD:  greater than .012 (nonconformity)
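A minimal sketch of the MAD computation and these guidelines follows. The function names are ours, the treatment of values falling exactly on a boundary is our assumption, and the example data are the county population proportions from the table given earlier.

```python
from math import log10

BENFORD = [log10(1 + 1 / d) for d in range(1, 10)]

def mad(shares):
    """Average absolute difference between observed and Benford proportions."""
    return sum(abs(o - e) for o, e in zip(shares, BENFORD)) / 9

def conformity(value):
    """Classify a first-digit MAD using the guidelines quoted above."""
    if value <= 0.004:
        return "close conformity"
    if value <= 0.008:
        return "acceptable conformity"
    if value <= 0.012:
        return "marginally acceptable conformity"
    return "nonconformity"

# Leading digit proportions for the 1990 county populations
COUNTY = [.316, .170, .134, .083, .073, .067, .055, .056, .046]
m = mad(COUNTY)
print(round(m, 4), conformity(m))
```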

Thus a single deviation of more than 5% would rule out close conformity and more than 10% would suggest nonconformity. Nigrini gives a graph of the fit for a data set with 250,000 data entries from a large Canadian oil company. There are noticeable deviations for at least half of the digits. However, the deviations are small and the MAD was computed at .0036 indicating close conformity. A chi-square test would undoubtedly have rejected the Benford distribution.

Nigrini emphasizes that deviation from Benford's distribution and other digital irregularities do not by themselves demonstrate fraudulent behavior. There may be good reasons for these deviations. They only suggest that an investigator might want to look further into how the data is collected, both for evidence of fraud and also for more efficient ways to run the business.

While we have assumed that natural data is the result of chance outcomes, it has not been necessary to know the probability distribution that produces them. However, we would hope that some of our standard distributions would be appropriate for fitting natural data. Both Hill and Nigrini remark that it would be interesting to know which standard distributions produce data with the leading digits having at least approximately a Benford distribution.

In a recent paper Leemis, Schmeiser, and Evans (7) have looked at this problem for survival distributions. These distributions include, among others, the well-known exponential, gamma, Weibull and lognormal distributions. All these distributions, for some parameter values, produce data with leading digits that reasonably fit Benford's distribution. However, since the fit is sensitive to the parameter values, the authors also warn that relying completely on the Benford distribution to detect fraud could lead to a significant number of false positives.

Mark Nigrini was one of our year 2000 Chance Lecturers, so you can see the movie before or after you read the book.

References for this review:

(1) Newcomb, Simon, (1881), Note on the frequency of use of the different digits in natural numbers. Amer. J. Math. 4, 39-40.

(2) Benford, Frank, (1938), The law of anomalous numbers. Proc. Amer. Phil. Soc. 78, 551-72.

(3) Ley, Eduardo, (1996), On the peculiar distribution of the U.S. stock indexes' digits. The American Statistician 50, 311-313.

(4) Raimi, R., (1976), The first digit problem. Amer. Math. Monthly 83, 521-38.

(5) Pinkham, Roger, (1961), On the distribution of first significant digits. Ann. Math. Statist. 32, 1223-1230.

(6) Hill, Theodore P., (1995), The significant-digit phenomenon. Amer. Math. Monthly 102, No. 4, 322-327.

(7) Leemis, Larry, Schmeiser, B., and Evans, D., (2000), Survival distributions satisfying Benford's law. The American Statistician 54, No. 4. Available from Larry Leemis's home page.

The final word on the Car Talk puzzle.

In the last Chance News we discussed a Car Talk puzzle as a variation of the famous secretary problem. In the secretary problem you have n candidates for a secretarial position who arrive in a random order. As you interview them you can rank the ones you have seen so far but if you reject a candidate you cannot go back to that one. How should you choose a candidate to maximize the chance of getting the best secretary?

The optimal strategy is to reject the first s-1 secretaries and then accept the first, if any, that is better than all of those you have interviewed so far. The value of s depends on the number of candidates n. For large n, s is approximately n/e and so you should pass over about 36.8% of the candidates. The probability of getting the best candidate is then also approximately 1/e = .368...
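For a given n the best value of s can be found by direct calculation. The sketch below uses the standard formula for the chance of success when the first s-1 candidates are rejected, namely ((s-1)/n) times the sum of 1/(k-1) for k from s to n (and 1/n when s = 1). The formula is not derived in this article, and the function names are ours.

```python
from fractions import Fraction

def win_probability(n, s):
    """Chance of picking the best of n candidates when the first s-1 are rejected."""
    if s == 1:
        return Fraction(1, n)
    return Fraction(s - 1, n) * sum(Fraction(1, k - 1) for k in range(s, n + 1))

def best_cutoff(n):
    """The value of s that maximizes the chance of success."""
    return max(range(1, n + 1), key=lambda s: win_probability(n, s))

for n in (3, 10, 100):
    s = best_cutoff(n)
    print(n, s, round(float(win_probability(n, s)), 4))
```

For n = 3 the best choice is s = 2, with probability exactly 1/2 of getting the best candidate.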

In the Car Talk version, numbers are written on three slips of paper which are then shuffled and placed face down on the table. You are to open them one at a time and stop when you think you have the largest number. How should you stop to give the highest probability of getting the biggest number? This version of the secretary problem was called by Martin Gardner the "Googol problem". We can obtain the secretary version of the Car Talk problem by assuming that the slips are turned over by a referee who just tells you the relative ranks of the numbers turned over so far.

Jeanne Albert suggested that you might be able to do better when you saw the numbers themselves instead of just knowing their relative rank as in the secretary problem. It turns out that she is correct and this problem has an interesting history.

Assume that it is known that the numbers on the slips were chosen with a known distribution F(x) which, without loss of generality, can be assumed to be a uniform distribution on the interval (0,1). Under these assumptions Gilbert and Mosteller (J. Amer. Statist. Assoc. 61, 35-73) found the optimal strategy and the probability of getting the biggest number. Their optimal strategy with n slips of paper is determined by an increasing sequence of numbers b(i), i = 0,1,...,n-1 that they show how to calculate. Then the optimal strategy is: When you have observed a number x and there are i slips remaining, choose x if it is the largest number you have seen so far and is bigger than b(i). The authors also show how to calculate the probability that you get the biggest number under this optimal strategy.

Consider the Car Talk problem when you know the distribution by which the numbers are chosen. Then Gilbert and Mosteller would find that b(0) = 0, b(1) = 1/2 and b(2) = .690. Thus we should look at the first number and choose it if it is bigger than .690. Otherwise, if the second number is bigger than the first and bigger than 1/2, we should choose the second number. Otherwise we have to choose the third number. Using this strategy gives a probability of .684 of getting the biggest number. The best we could do if we only knew the relative values of numbers (the secretary problem) would be .5. Thus the information about how the numbers were chosen does help us.
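A simulation makes the comparison concrete. This sketch assumes three independent uniform numbers on (0,1) and applies the general rule stated earlier (with i slips remaining, accept a number that is the largest so far and exceeds b(i)) using the thresholds 1/2 and .690. The seed, number of trials, and function names are our own choices.

```python
import random

B1, B2 = 0.5, 0.690  # thresholds with one and two slips remaining

def threshold_pick(slips):
    """Use the actual values: stop at a record that exceeds its threshold."""
    first, second, third = slips
    if first > B2:
        return first
    if second > first and second > B1:
        return second
    return third

def rank_pick(slips):
    """Use relative ranks only: skip the first slip, then take the first record."""
    first, second, third = slips
    return second if second > first else third

def simulate(pick, trials=200_000, seed=1):
    """Estimate the chance that the given rule selects the largest slip."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        slips = [rng.random() for _ in range(3)]
        wins += pick(slips) == max(slips)
    return wins / trials

print("using the values:", round(simulate(threshold_pick), 3))
print("ranks only:      ", round(simulate(rank_pick), 3))
```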

Of course, in the real Car Talk problem we have no idea how the numbers were put on the slips. What can we say about the best we can do here? The approach to this problem has been to consider Googol as a two person game in which the first player tries to pick a distribution for choosing the numbers that makes it as difficult as possible for the second player to choose the largest number. The solution to the secretary problem shows that the second player can achieve a probability of 1/2 of getting the biggest number no matter how the numbers are chosen. Can the first player choose a distribution that limits the second player to 1/2, even if the second player knows this distribution? This problem was posed by Ferguson in his article "Who solved the secretary problem?" Statistical Science, Vol 4, 282-289.

Reader Ronald Fagin reminded us that our favorite "envelope paradox" shows that the answer is "no" for the case of two slips. For this case the secretary version has only two possible strategies: accept the first number or reject the first number. In either case the probability of getting the bigger number is 1/2. Now assume that you can see the numbers on the slips. The following strategy will give you a probability greater than 1/2 no matter how the numbers were put on the two slips. Choose an auxiliary number x according to a normal distribution. If the number on the first slip is bigger than x, choose this number. Otherwise choose the second number. If x lies between the two numbers on the slips you will be sure to get the bigger number. If it does not, you still have a 1/2 probability of getting the bigger number. Thus you have a probability greater than 1/2 of getting the bigger number no matter how the numbers were assigned to the slips.
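The size of the advantage can be worked out for any particular pair of numbers. The sketch below assumes the auxiliary number is standard normal and that the two slips carry distinct numbers a and b presented in random order; the function name is ours. The chance of success is 1/2 plus half the probability that x falls between a and b.

```python
from statistics import NormalDist

def win_probability(a, b):
    """Chance of ending up with the bigger of two distinct numbers a and b."""
    low, high = sorted((a, b))
    standard = NormalDist()  # assumed: auxiliary x is standard normal
    between = standard.cdf(high) - standard.cdf(low)
    # x between the numbers: certain win; otherwise an even chance.
    return between + 0.5 * (1 - between)

for pair in [(1, 2), (-1, 1), (3, 4)]:
    print(pair, round(win_probability(*pair), 4))
```

The advantage is always positive in principle but can be very small when both numbers lie far out in the tail of the normal distribution.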

However, it has been proven that for n > 2 it is possible to find a method of assigning the numbers such that you cannot do better than for the secretary version even knowing this distribution.

For n = 3 this was proven by Silverman and Nadas (Contemp. Math. 125, 1992, 77-83). Their solution requires that the numbers be written on the slips as a two-step process. First choose a number X using the density

    g(x) = 1/8                  0 <= x < 1

    g(x) = (7/24)x^(-4/3)       x >= 1

Then put the numbers on the slips using three independent random numbers chosen from the interval (0,X).
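The two-step recipe is easy to carry out by inverting the distribution function of X, which is x/8 for x below 1 and 1 - (7/8)x^(-1/3) from 1 onward. The sketch below does this; it assumes that the three numbers are drawn uniformly from (0,X), which is our reading of the description above, and the function names are ours.

```python
import random

def sample_x(u):
    """Inverse of the distribution function of g, evaluated at u in [0, 1)."""
    if u < 1 / 8:
        return 8 * u                   # flat part of the density on [0, 1)
    return ((7 / 8) / (1 - u)) ** 3    # tail with density (7/24)x^(-4/3)

def write_slips(rng):
    """Draw X from g, then three numbers from (0, X), assumed uniform."""
    x = sample_x(rng.random())
    return x, [rng.uniform(0, x) for _ in range(3)]

rng = random.Random(3)
for _ in range(5):
    x, slips = write_slips(rng)
    print(round(x, 3), [round(s, 3) for s in slips])
```

Because the tail of g is so heavy, an occasional run will produce very large values of X.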

In his article, "A solution to the game of googol", Annals of Probability, Vol. 22, No 3 1588-1595 Gnedin showed that for any n > 2, player 1 can assign the numbers in a way to limit player 2 to the probability of getting the biggest number achieved by the secretary problem.


The Car Talk brothers often comment on the fact that they like the math puzzlers the best of all their puzzles. Yet on NPR's All Things Considered, 4 April 2000, brother Tom Magliozzi joined the math bashers with comments like:

The purpose of learning math, which most of us will never use, is only to prepare us for further math courses... which we will use even less frequently than never.

You can read his comments or listen to the program here.


Why do you think they have the double standard?

Chance News
Copyright (c) 2001 Laurie Snell

This work is freely redistributable under the terms of the GNU General Public License as published by the Free Software Foundation. This work comes with ABSOLUTELY NO WARRANTY.

