Best of Chance News 13.03

APRIL 17, 2004

Prepared by J. Laurie Snell, Bill Peterson, Jeanne Albert, Charles Grinstead, and Myles McLeod with help from Fuxing Hou and Joan Snell. We are now using a listserv to send out notices that a new Chance News has been posted on the Chance Website. You can sign on or off or change your address here. This listserv is used only for this posting and not for comments on Chance News. We do appreciate comments and suggestions for new articles. Please send these to jlsnell@dartmouth.edu. Chance News is based on current news articles described briefly in Chance News Lite.

The current and previous issues of Chance News and other materials for teaching a Chance course are available from the Chance web site.

Chance News is distributed under the GNU General Public License (so-called 'copyleft'). See the end of the newsletter for details.

Contents of Best of Chance News

(1) Benford's Law

(2) Estimating the number of species

(3) Oscar winners live longer

(4) Weather forecasting

(5) The Bible Codes


 

BENFORD'S LAW

He's got their number: Scholar uses math to foil financial fraud.
The Wall Street Journal, 10 July 1995
Lee Berton


Mark Nigrini, who teaches accounting at St. Mary's University in Halifax, wrote his PhD thesis on "The detection of income evasion through an analysis of digital distributions". He has persuaded business and government people to use Benford's law to test suspicious financial records such as bookkeeping entries, checks, and tax returns. The article states that Benford's law "lays out the statistical frequency with which the numbers 1 through 9 appear in any set of random numbers". Actually, Benford's law states that the distribution of the leading digit in data sets is typically not equi-distributed but rather given by the distribution p(k) = log(k+1) - log(k) for k = 1,2,...,9. (The leading digit of .0034 is 3, that of 243 is 2, etc.) Numerous explanations for this have been given, but perhaps the most persuasive is that Benford's distribution is the unique distribution for the leading digits that is not changed by a change of units, i.e. by multiplying the data by a constant c.

(For a recent discussion of Benford's distribution and further references see: Theodore P. Hill, The significant-digit phenomenon, "The American Mathematical Monthly", April 1995.)

Nigrini's idea is that, if we are honest, the numbers on our tax returns and on our checks should satisfy Benford's law, and if they do not there may be some skullduggery.

The article states that "Mr. Nigrini has also lent his expertise to federal and state tax authorities, officials in Denmark and the Netherlands and to several companies. He has even put President Clinton's tax returns to the Benford's Law test. When he analyzed the president's returns for the past 13 years he found that 'the returns by Clinton follow Benford's Law quite closely'".

DISCUSSION QUESTIONS:

1. Would you expect Benford's distribution to apply to the number of hits a baseball player gets in a year? To the prices of stocks on a given day? To the populations of cities in the United States?

2. Mr. Smith is quite wealthy and makes over a hundred charitable contributions each year. Do you think the distribution of the leading digits of these contributions would have a Benford distribution if he is honest? If he cheats, why might they not have a Benford distribution?

3. Compute the first 100 powers of 2 and show that the leading digits have a Benford distribution.
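For readers who want to try question 3, here is a minimal Python sketch (our own, not part of the original article) that tallies the leading digits of the first 100 powers of 2 and compares them with Benford's probabilities:

```python
import math
from collections import Counter

def leading_digit(n: int) -> int:
    """First digit of a positive integer."""
    return int(str(n)[0])

# Tally the leading digits of 2^1, 2^2, ..., 2^100.
counts = Counter(leading_digit(2 ** k) for k in range(1, 101))

# Benford's probabilities log10(1 + 1/d) for comparison.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d in range(1, 10):
    print(d, counts[d] / 100, round(benford[d], 3))
```

The leading digit 1 turns up 30 times out of 100, close to Benford's .301.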


On the peculiar distribution of the U.S. stock indexes' digits.
American Statistician, Vol. 50, No. 4, Nov. 1996, pp. 311-313
Eduardo Ley


As we have discussed in previous issues of Chance News (see Chance News 4.10), Benford's distribution for leading digits is supposed to fit "natural data." The Benford distribution assigns probability log((i+1)/i) to digit i = 1,2,...,9. Ley asks if this distribution fits the leading digits of the one-day return on stock indexes, defined as r(t) = 100*(ln(p(t+1)) - ln(p(t)))/d(t), where p(t) is the value of the index on the tth trading day and d(t) is the time between the tth and (t+1)st trading days -- usually 1. Since p(t+1) = p(t)exp{r(t)*d(t)/100}, r(t) is the continuous-time return rate for the period between the tth and (t+1)st trading days.
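To make the definition of the one-day return concrete, here is a small Python sketch; the geometric random walk standing in for an index is our own assumption for illustration, not Ley's data:

```python
import math
import random

def one_day_returns(prices):
    """r(t) = 100*(ln p(t+1) - ln p(t))/d(t), taking d(t) = 1 throughout."""
    return [100 * (math.log(prices[t + 1]) - math.log(prices[t]))
            for t in range(len(prices) - 1)]

def leading_digit(x):
    """First nonzero decimal digit of |x|; e.g. the leading digit of .0034 is 3."""
    x = abs(x)
    if x == 0:
        raise ValueError("zero has no leading digit")
    e = math.floor(math.log10(x))
    return int(x / 10 ** e)

# A toy index: a geometric random walk (an invented stand-in for real data).
random.seed(1)
prices = [100.0]
for _ in range(5000):
    prices.append(prices[-1] * math.exp(random.gauss(0.0002, 0.01)))

returns = one_day_returns(prices)
digits = [leading_digit(r) for r in returns if r != 0]
```

Tabulating `digits` and comparing the proportions to log((i+1)/i) mimics Ley's analysis on synthetic data.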

Ley finds that the leading digits of r(t) fit the Benford distribution remarkably well for both the Dow-Jones and the Standard and Poor's index.
He obtained the following distributions for the leading digits for the one-day return rate for the Dow-Jones from January 1900 to June 1993 and the S&P from January 1926 to June 1993.

 Digit  Dow Jones  S&P   Benford
   1      .289     .292   .301
   2      .168     .170   .176
   3      .124     .134   .125
   4      .100     .099   .097
   5      .085     .078   .079
   6      .072     .071   .067
   7      .062     .056   .058
   8      .053     .054   .051
   9      .047     .047   .046


As you can see, in both cases the approximation is very good.

Despite this apparent good fit, in both indices a chi-squared test would reject Benford's distribution. Ley attributes this to the very large power of the test that comes with so large a sample. He remarks that if you take just the last ten years this does not happen, and says:

If one takes models as mere approximations to reality, not as perfect data-generating mechanisms, then this can only be viewed as a weakness in the Neyman-Pearson theory (hypothesis testing).

Ley's data is available from http://econwpa.wustl.edu.

Comment: This example is discussed in an article "First Significant Digit Patterns from Mixtures of Uniform Distributions" by Ricardo J. Rodriguez in The American Statistician, Vol. 58, No. 1, Feb. 2004. There it is shown that this data is better fit by another law called "Stigler's law."

DISCUSSION QUESTIONS:

(1) Would you expect leading digits of the Dow-Jones values themselves to have Benford's distribution?

(2) Do you agree with Ley's remark about the weakness in the theory of hypothesis testing?


Digit Analysis Using Benford's Law: Tests & Statistics for Auditors,
Global Audit Publications, Vancouver, BC
Mark J. Nigrini

It is often rewarding though less often enjoyable to read a technical book. But this book is both technical and a pleasure to read. It tells the story of Mark Nigrini's crusade to convince the world of accountants that Benford's distribution for the leading digit of "natural" data can be useful in detecting fraud.

Nigrini tells us how he learned about Benford's distribution, how he came to write his PhD thesis on the use of this distribution to detect fraud, and how, in the last twenty years, he has developed and applied methods for using digital analysis for fraud detection. While Benford's distribution is at the heart of this analysis, Nigrini incorporates other kinds of digital irregularities in his analysis. For example, when Dartmouth reimburses us for travel, we do not have to provide receipts for items that are less than $25. So, most of our meals end up costing $24.50. Nigrini's digital analysis would have no trouble detecting this fraud.

The leading digit L of a positive number is the first non-zero digit in its decimal representation. So the leading digits of .0023, 2.23, and .234 are all 2. The second digits of these numbers are 3, 2, and 3 respectively.
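The definition above is easy to mechanize. Here is a small Python helper (our own sketch) that returns the first k significant digits of a positive number:

```python
def significant_digits(x: float, k: int = 2):
    """First k significant digits of a positive number.
    E.g. for .0023 the leading digit is 2 and the second digit is 3.
    We format with extra precision and then truncate, so rounding cannot
    promote the last kept digit (a sketch, not airtight for every float)."""
    mantissa = f"{x:.{k + 5}e}".split("e")[0].replace(".", "")
    return [int(c) for c in mantissa[:k]]
```

For the text's examples, `significant_digits(.0023)`, `significant_digits(2.23)`, and `significant_digits(.234)` give [2, 3], [2, 2], and [2, 3].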

As our quote for this Chance News suggests, the famous astronomer Simon Newcomb was led, by looking at tables of logarithms, to the belief that the leading digits of data were not typically uniformly distributed. In his 1881 article Newcomb (1) (see the end of the article for references) gave an argument to show that the mantissas of the logarithms of the numbers should be uniformly distributed, which leads to

      Prob(L = j) = log(j+1) - log(j) for j = 1,2,..,9

where the logarithm is to the base 10.

This gives the distribution

First digit    1     2     3     4     5     6     7     8     9
Probability  .301  .176  .125  .097  .079  .067  .058  .051  .046

The physicist Frank Benford (2) independently discovered this distribution and showed that the distributions of the leading digits for a number of "natural" data sets reasonably fit this distribution.

Here are some data whose leading digits are reasonably approximated by the Benford distribution:

 Digit  Benford  Dow Jones 1900-1993  River areas  Newspapers  County populations  Electricity consumption
 1  .301  .289  .310  .30  .316  .316
 2  .176  .168  .164  .18  .170  .167
 3  .125  .124  .107  .12  .134  .116
 4  .097  .100  .113  .10  .083  .087
 5  .079  .085  .072  .08  .073  .085
 6  .067  .072  .086  .06  .067  .064
 7  .058  .062  .055  .06  .055  .057
 8  .051  .053  .042  .05  .056  .050
 9  .046  .047  .052  .05  .046  .057

The Dow Jones numbers are from an article by Ley (3). The rivers and newspaper data are from Benford's article. The county populations are the populations of 3,141 counties as reported in the 1990 census and analyzed by Nigrini in this book. The electricity consumption data appears in an excellent survey article by Raimi (4) and represents the electricity consumption of 1,243 users in one month in the Solomon Islands.

In keeping with his desire to keep the mathematics informal, Nigrini justifies the Benford distribution in terms of simple examples. For example, he considers a town that starts with population 10,000 and increases 10 percent per year. Then (ignoring compounding) growing from 10,000 to 20,000 is a 100 percent increase and so takes about 10 years, but growing from 50,000 to 60,000 is only a 20 percent increase and so takes only about 2 years. Thus the city will have a population with leading digit 1 about five times longer than it has leading digit 5.
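The growing-town argument can be checked in a few lines of Python; this sketch uses continuous compounding rather than the ignore-compounding shortcut in the text:

```python
import math

rate = math.log(1.10)  # continuous growth rate equivalent to 10% per year

def years_at_leading_digit(k, rate):
    """Years a quantity growing as e^(rate*t) spends going from
    k*10^a up to (k+1)*10^a, for any fixed power a."""
    return (math.log(k + 1) - math.log(k)) / rate

times = [years_at_leading_digit(k, rate) for k in range(1, 10)]
total = sum(times)               # years needed to grow by a full factor of 10
shares = [t / total for t in times]
# shares[k-1] reduces to log10((k+1)/k): exactly Benford's probabilities.
```

Note that the growth rate cancels out of `shares`, which is why the argument does not depend on the 10 percent figure.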

We found Newcomb's argument a little mysterious and so will describe an argument that justifies the Benford distribution as the only distribution that is invariant under a change of scale, i.e. multiplying the data by a constant c. This was first proved by Pinkham(5) and extended by Hill(6).

To know the leading digit, the second digit, etc. we need only know the data modulo powers of 10. This is the same as saying we need only know the mantissas of their logarithms. We represent the mantissas as points on the unit circle. Following Hill we assume there is some chance process producing the data which in turn determines a probability measure for their mantissas.

For a data point to have leading digit L = j it must have a mantissa between log(j) and log(j+1). Thus

P(L = j) = P(log(j) <= mantissa < log(j+1))

Since multiplying the numbers by c amounts to adding log(c) to their mantissas, this means that the probability measure for the mantissas should be invariant under a rotation of the circle. It is well known that the only measure with this property is the uniform measure. Thus

P(L = j) = log(j+1) - log(j)
which is Benford's distribution for the leading digit.

Nigrini's book contains a wide variety of case studies, ranging from his students' projects analyzing data from their relatives' mom-and-pop stores to his own digital analysis of data from major corporations. Along the way there is an interesting discussion of the best way to test the fit of a distribution when you have many thousands of data elements.

Here is an example Nigrini gives from his student projects. For his project a student, Sam, used data from his brother's small store in Halifax, Nova Scotia. Throughout the day family members would ring up sales in the normal fashion. At night before closing, Sam's brother would go downstairs and ring up fictitious sales on a similar register so that the basement total was less than the upstairs total, to evade income and sales taxes. On one day there were 433 authentic sales totaling $4,038.32. On the fake register printout used for tax purposes there were 245 sales totaling $1,947.29. The leading digit distributions from the two registers' printouts and, for comparison, Benford distribution are:

 Digit  Benford  Actual sales  Fake sales
 1  .301  .290  .17
 2  .176  .230  .06
 3  .125  .075  .05
 4  .097  .100  .02
 5  .079  .170  .38
 6  .067  .025  .17
 7  .058  .024  .02
 8  .051  .024  .01
 9  .046  .040  .12

The fit to the Benford distribution for the actual sales is not great, but the biggest difference, the excess of leading digit 5, could be explained by the large number of sales of cigarettes, which sold for somewhat more than $5 at that time. Thus Nigrini would pass the actual sales data. However, his tests would certainly detect fraud in the fake sales data.

The obvious test to use for deciding if an empirical distribution fits the theoretical Benford distribution would be the chi-square test. However, Nigrini rejects the use of this test for the very large data sets obtained from analyzing data from a major company. The reason is that, when the true distribution is only slightly different from the theoretical Benford distribution, the large sample would lead to rejection of the Benford distribution even though there may be no fraud. This is like the observation that with enough tosses of a coin it is possible to reject just about any coin as a fair coin since coins are never exactly fair.
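The sample-size effect described here is easy to exhibit numerically. In the Python sketch below, the "true" distribution is an invented slight perturbation of Benford's, and 15.507 is the familiar 5% chi-square critical value for 8 degrees of freedom:

```python
import math

benford = [math.log10(1 + 1 / d) for d in range(1, 10)]

# A distribution only slightly different from Benford's: shift a little
# mass onto digit 1 (an invented perturbation for illustration).
true = [p + (0.002 if i == 0 else -0.002 / 8) for i, p in enumerate(benford)]

def chi_square_stat(n, observed, expected):
    """Chi-square statistic if the observed proportions held exactly
    in a sample of size n."""
    return sum(n * (o - e) ** 2 / e for o, e in zip(observed, expected))

CRIT_8DF_5PCT = 15.507  # 5% critical value, chi-square with 8 df

small_sample = chi_square_stat(1_000, true, benford)      # not rejected
large_sample = chi_square_stat(1_000_000, true, benford)  # rejected
```

With a thousand records the tiny deviation is invisible to the test; with a million records the same deviation is decisively "significant," though it signals no fraud.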

For this reason Nigrini recommends a test that he calls the Mean Absolute Deviation (MAD) test. This test computes the average of the 9 absolute differences between the empirical proportion of each digit and the proportion predicted by the Benford distribution. Based on his experience, Nigrini suggests the following guidelines for measuring conformity of the first digits to Benford using MAD:

MAD:    0 to .004  (close conformity)
MAD: .004 to .008  (acceptable conformity)
MAD: .008 to .012  (marginally acceptable conformity)
MAD:  greater than .012 (nonconformity)

Thus a single deviation of more than 5% would rule out close conformity and more than 10% would suggest nonconformity. Nigrini gives a graph of the fit for a data set with 250,000 data entries from a large Canadian oil company. There are noticeable deviations for at least half of the digits. However, the deviations are small and the MAD was computed at .0036 indicating close conformity. A chi-square test would undoubtedly have rejected the Benford distribution.
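The MAD statistic and the guidelines above can be sketched in Python as follows (the function names are ours):

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def mad(first_digit_counts):
    """Mean absolute deviation between the empirical first-digit
    proportions and Benford's probabilities.
    first_digit_counts[d-1] is the number of records with first digit d."""
    n = sum(first_digit_counts)
    return sum(abs(c / n - p) for c, p in zip(first_digit_counts, BENFORD)) / 9

def conformity(m):
    """Nigrini's first-digit guidelines as quoted above."""
    if m <= 0.004:
        return "close conformity"
    if m <= 0.008:
        return "acceptable conformity"
    if m <= 0.012:
        return "marginally acceptable conformity"
    return "nonconformity"
```

For counts that track Benford's proportions almost exactly, such as [301, 176, 125, 97, 79, 67, 58, 51, 46] out of 1,000 records, the MAD is about .0001 and the data passes as close conformity.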

Nigrini emphasizes that deviations from Benford's distribution and other digital irregularities do not by themselves demonstrate fraudulent behavior. There may be good reasons for these deviations. They only suggest that the investigator might want to look further into how the data is collected, both for evidence of fraud and also for more efficient ways to run the business.

While we have assumed that natural data is the result of chance outcomes, it has not been necessary to know the probability distribution that produces them. However, we would hope that some of our standard distributions would be appropriate for fitting natural data. Both Hill and Nigrini remark that it would be interesting to know which standard distributions produce data with the leading digits having at least approximately a Benford distribution.

In a recent paper Leemis, Schmeiser, and Evans (7) have looked at this problem for survival distributions. These distributions include, among others, the well-known exponential, Gamma, Weibull, and lognormal distributions. All these distributions, for some parameter values, produce data with leading digits that reasonably fit Benford's distribution. However, since the fit is sensitive to the parameter values, the authors also warn that relying completely on the Benford distribution to detect fraud could lead to a significant number of false positives.
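A quick way to get a feel for this observation is to sample from one survival distribution and tabulate leading digits. The Python sketch below uses an exponential with mean 1, one arbitrary parameter choice; as the paper warns, the quality of the fit genuinely changes with the parameter:

```python
import math
import random

def leading_digit(x):
    """First nonzero decimal digit of a positive number."""
    e = math.floor(math.log10(x))
    return int(x / 10 ** e)

random.seed(0)
# Exponential survival times with mean 1 (an illustrative parameter choice).
sample = [random.expovariate(1.0) for _ in range(100_000)]

counts = [0] * 9
for x in sample:
    counts[leading_digit(x) - 1] += 1
proportions = [c / len(sample) for c in counts]

benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
# Compare proportions with benford digit by digit to judge the fit.
```

For this parameter the digit-1 proportion comes out a few percentage points above Benford's .301, a rough but recognizable fit.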

Mark Nigrini was one of our year 2000 Chance Lecturers, so you can see the movie before or after you read the book.

References for this review:

(1) Simon Newcomb, (1881), Note on the frequencies of the different digits in natural numbers. Amer. J. Math 4, 39-40.

(2) Benford, Frank,(1938), The law of anomalous numbers. Proc. Amer. Phil. Soc. 78, 551-72.

(3) Ley, Eduardo, (1996), On the peculiar distribution of the U.S. stock Indices. The American Statistician, 1996, 50, 311-313.

(4) Raimi, R. (1976), The first digit problem, Amer. Math. Monthly 83, 521-38.

(5) Pinkham, Roger, (1961), On the distribution of first significant digits, Ann. Math. Statist., 32, 1223-1230.

(6) Hill, Theodore P., (1995), The Significant-Digit Phenomenon. American Mathematical Monthly, Vol. 102, No. 4. pp. 322-327.

(7) Leemis, L., Schmeiser, B., and Evans, D., (2000), Survival distributions satisfying Benford's law. The American Statistician, November 2000, Vol. 54, No. 4. Available from Larry Leemis's home page.
                                  

Google numbers.
Sample project for Dartmouth Chance Course
Gregory Leibon

Greg Leibon taught the Dartmouth Chance Course this winter and made this sample project for his students. Since we have discussed Benford's Law a number of times, we thought readers might also enjoy his project. Recall that Benford's Law says that the first (leading) digit in "natural data" should have approximately the distribution given by the base-10 logarithm of (1 + 1/d) for d = 1,2,...,9. Thus the leading digit is 1 with probability log(2) = .301, 2 with probability log(1.5) = .176, etc. The complete distribution is:

First digit   Benford's probability
     1              30.1%
     2              17.6
     3              12.5
     4               9.7
     5               7.9
     6               6.7
     7               5.8
     8               5.1
     9               4.6

For his project, Greg wanted to see if "natural numbers" on the web satisfied Benford's Law. He writes:

I wanted to understand numbers on the World Wide Web in which real live people were actually interested. In particular, I did not want to accidentally include numbers from data sets intended only for data mining purposes. To accomplish this, I included a piece of text in my search. I wanted to choose a natural piece of text, hence (for lack of a better idea) I used the word “nature”. Thus, my Google Numbers are numbers that occur on a web page that also includes the word “nature”.


I wanted my search to produce robust but reasonable numbers of results. This is because I wanted to leave myself in a position to actually examine the resulting hits in order to achieve a sense for how the numbers were derived.

A little experimenting led Greg to the conclusion that searches for six-digit numbers and the word "nature" resulted in a reasonable number of hits. So he chose nine random five-digit numbers, and to each of these he prepended all possible leading digits. His first five-digit number was x = 13527, giving him the 9 six-digit numbers 113527, 213527, 313527, ..., 913527. He then searched for each of these numbers and the word "nature" in Google and recorded the number of hits. Here is what he found:

 

x = 13527   occurrences
  113527        136
  213527         44
  313527         35
  413527         30
  513527         27
  613527         15
  713527          9
  813527         13
  913527          8

 

He repeated this for his 8 other random five-digit numbers and combined the results to obtain:

Leading digit   count   Empirical percent   Benford
      1          645        31.65%           30.1%
      2          342        16.78            17.6
      3          262        12.86            12.5
      4          181         8.88             9.7
      5          164         8.05             7.9
      6          143         7.02             6.7
      7          115         5.64             5.8
      8          105         5.15             5.1
      9           81         3.97             4.6

This is a remarkably good fit. Here is his graphical comparison:


Greg wondered if he was just lucky or if there was some explanation for such a good fit. Looking for an explanation, he found that many of the numbers he observed could be considered the result of a growth process. As an example of such a growth process, consider the money you have in the bank that is continuously compounded. Then it is easy to check that the percent of time your money has leading digit k for k = 1,2,3,...,9 fits the Benford distribution. Greg remarks:

Hence, we would expect Google numbers to have a Benford distribution if they satisfied two criteria: first that every Google Number behaves like money with interest continuously compounded, and, second that the probability that a Google number is posted on the web is proportional to how long that quantity is meaningful.

We gave Greg an A on his project but you should read it here yourself. You can also see what Greg did in the Chance Course and some student projects here.

DISCUSSION QUESTION:

Repeat Greg's experiment replacing "nature" by a different word. Do you get similar results?


ESTIMATING THE NUMBER OF SPECIES

Hidden truths
New Scientist, 23 May, 1998, p 28
Robert Matthews

This article deals with the topic of missing data. Two major subtopics are discussed; one is the capture-recapture method, and the other is bias in clinical studies. As an example of the capture-recapture method, consider the following scenario. Suppose that we are trying to estimate the number of people who live in a certain area. We can pick a random sample from the area and consider the people in the sample to be 'captured'. Then we can pick another random sample from the same area and count how many of the captured people are picked the second time (i.e. 'recaptured'). Suppose that we pick a sample of 100 people the first time, and then we find that, in our second sample of size 100, 10 people have been recaptured. This tells us that the captured set is about 1/10 the size of the population, leading to an estimate of 1000 for the population size. This method has been applied in ecology, national defense, and public health, to estimate sizes of populations that are hard to count directly.
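The arithmetic in the example above is the classical Lincoln-Petersen estimate, sketched here in Python:

```python
def capture_recapture_estimate(captured, second_sample, recaptured):
    """Lincoln-Petersen estimator: the marked fraction of the second
    sample, recaptured/second_sample, approximates the marked fraction
    of the whole population, captured/N. Solve for N."""
    return captured * second_sample / recaptured

# The article's example: 100 captured, 100 sampled again, 10 recaptured.
print(capture_recapture_estimate(100, 100, 10))  # → 1000.0
```

The estimator breaks down when no one is recaptured, and in practice it is biased for small samples; refinements such as the Chapman correction exist but are beyond this sketch.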

The second subtopic concerns missing data. Suppose, for example, that a drug has been developed to treat a certain condition. Before it can be introduced into the marketplace, it must be tested to see if it is safe and if it works. It is fairly clear that if one tests a drug sufficiently often, it is probable that at least one of the tests will show that the drug is effective. Unfortunately, there may be many other studies of the same drug that do not show this, and these latter studies are less likely to be published than the former one. The question in this case is whether the fact that these negative studies exist can be discerned from the studies that have been published.

It turns out that the answer to this question is 'sometimes'. One can use what is known as a 'funnel plot' to help determine whether there is missing data. One begins with the data from many small studies concerning the drug's efficacy. Since these studies are small, they tend to vary widely in their predictions concerning the drug. Larger studies are needed to establish statistically significant results. If one compares the results of the published larger studies with the smaller studies, two possibilities exist. The first one is that, while the larger studies tend to cluster more closely to a certain figure than the smaller ones, there is no bias in the larger studies with respect to this figure. The second possibility is that the larger studies are skewed (presumably in the direction that shows the drug is beneficial). This latter situation suggests that some larger studies have been conducted and have remained unpublished because they do not show that the drug is effective at a statistically significant level.

Matthias Egger and some colleagues at Bristol University have used funnel plots in this way to show that, in at least one quarter of 75 published medical studies, there were significant signs of missing data. These results have given more ammunition to medical scientists who have called for access to all study results.

DISCUSSION QUESTIONS:

(1) The Census Bureau plans to use the capture-recapture method to determine the undercount in Census 2000. This is done by carrying out two surveys (Mass Enumeration & PES Sample Enumeration) of a population of N blocks at about the same time. The unknown total population is

M = M11 + M10 + M01 + M00

where M11 is the number enumerated on both occasions; M10 the number enumerated on occasion one but not on occasion two, M01 the number enumerated on occasion two but not one, and M00 the number not enumerated on either occasion. Everything but M00 is known here. Assume the two surveys are independent and each individual has the same probability of being counted. Then you can estimate M00 by

M00' = M01M10/M11

and from this obtain an estimate of M as

M' = M11 + M10 + M01 + M00'.

How realistic do you think the assumptions are in this situation?
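The dual-system estimator in question (1) can be written out directly; the counts in this Python sketch are invented for illustration:

```python
def dual_system_estimate(m11, m10, m01):
    """Estimate a total population from two independent enumerations.
    m11: counted in both surveys; m10, m01: counted in exactly one.
    Under independence the missed cell is estimated by m00' = m01*m10/m11,
    giving M' = m11 + m10 + m01 + m00'."""
    m00 = m01 * m10 / m11
    return m11 + m10 + m01 + m00

# Hypothetical block: 900 people counted twice, 80 and 60 counted once each.
estimate = dual_system_estimate(900, 80, 60)
```

Here the estimated miss is 80*60/900, about 5 people, on top of the 1,040 actually enumerated.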

(2) Matthews mentions the following problem: Jones, a butterfly researcher, captures butterflies for a period of time m. He captures 5 different species with 4,1,10,20,2 representatives of these 5 species. Jones asks: if I were to return the next day and catch butterflies for a length of time n, how many new species of butterflies should I expect to get?

This is a famous and very difficult estimation problem. There is one elegant solution, due to I. J. Good (Biometrika, Vol. 40, 1953, pp. 237-264), which works sometimes. Assume that butterflies of species s enter the net according to a Poisson process with rate v which depends on s. Then the distribution of the number of butterflies of species s caught in time t is Poisson with mean tv. Thus, before he starts his research, the probability that Jones catches no s butterfly on the first day but at least one s butterfly on the second day is

e^(-vm)*(1-e^(-vn)).

Expanding the second factor in powers of vn, and noting that e^(-vm)(vm)^k/k! = P(X = k), permits us to write this product as:

P(X = 1)(n/m) - P(X = 2)(n/m)^2 + P(X = 3)(n/m)^3 - + ...

where X is the number of s butterflies caught on the first day. Summing these equivalent expressions over all species shows that

Expected number of new species caught on the second day =

r(1)(n/m) - r(2)(n/m)^2 + r(3)(n/m)^3 - + ...

where r(j) is the expected number of species represented j times in the first day's catch.

(3) Assume that n = m. Estimate the number of new species of butterflies that Jones will capture on the second day.

Thisted and Efron (Biometrika, 1987, 74, 3, pp. 445-55) used this method to determine if a newly found poem of 429 words had about the right number of new words to be consistent with being written by Shakespeare. They interpreted a species as a word in Shakespeare's total works, discovered or not discovered. They took the time m to be 884,647, the number of words in existing Shakespeare works, and n to be 429, the number of words in the new poem. Using the above result they got an estimate of about 7 for the number of new words that should occur in a new work of 429 words, and since the new poem had 9 such new words they concluded that this was consistent with the poem being a work of Shakespeare.
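Good's alternating series above is straightforward to code. In this Python sketch (our own), r[j] holds the number of species seen exactly j+1 times on the first day:

```python
def expected_new_species(r, n_over_m):
    """Good's estimate r(1)(n/m) - r(2)(n/m)^2 + r(3)(n/m)^3 - ...
    of the expected number of new species in a second period of
    relative length n/m. Here r[j] = number of species represented
    exactly j+1 times in the first period."""
    return sum((-1) ** j * r[j] * n_over_m ** (j + 1) for j in range(len(r)))

# Jones's butterflies: species counts 4, 1, 10, 20, 2, so one species each
# at frequencies 1, 2, 4, 10 and 20.
r = [0] * 20
for count in (4, 1, 10, 20, 2):
    r[count - 1] += 1

estimate = expected_new_species(r, 1.0)  # n = m, as in question (3)
# The series gives -3.0 here: a reminder that the method "works sometimes".
```

When n/m is small, as in the Shakespeare application, the higher-order terms shrink rapidly and the series is much better behaved.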

(4) What do you think about applying the Poisson model to this kind of problem?

(5) It would be interesting to check this model on bird data. A natural thing to try would be the Christmas Bird Count (CBC). I'm told the data is available.


In his column, Unconventional Wisdom, Richard Morin included an account of the sea monster study that we mentioned in Chance News. As we read his account it occurred to us that this was a good topic for a Chance profile on sampling methods related to the capture-recapture method. We wrote the beginning of such a module and are including it in this Chance News with the hope of getting some suggestions for improvement or additions. We plan to include the simulation programs mentioned. Morin's discussion is such a nice introduction that we include it as he wrote it.

Unconventional Wisdom
Washington Post, 7 March, 1999

Statistics, the King's Deer And Monsters Of the Sea

How many species of sea monsters are left to be discovered? The correct answer is 47, give or take a couple, claims Charles Paxton of Oxford University.

Paxton isn't a prophet. He's a marine biologist who has created a statistical model that, he claims, predicts how many species of creatures measuring more than two meters--about six feet--still swim, slither or crawl undiscovered in the world's oceans.

Of course his prediction may be off a bit, he allows in a recent issue of the Journal of the Marine Biological Association of the United Kingdom. It also may just be possible that there's nothing new of any size lurking in the depths.

But Paxton doesn't think so. Buoyed by the results of his statistical model, he's confident that there are probably many more marine mammoths out there--including, perhaps, "a couple of new totally weird sharks"--and maybe even another Moby Dick. "Field observations suggest one or two species of odontocetes [toothed whales] probably await capture and formal description," he reported in his article.

His confidence is based on the results of a statistical technique frequently used by ecologists to estimate the diversity of populations, based on only a small sampling. It's derived from the work of the English mathematician and biologist Sir Ronald Aylmer Fisher.

But Paxton's work is reminiscent of a tale, perhaps apocryphal, told about the famed French mathematician Siméon Denis Poisson (1781-1840), whose achievements were so notable that he's honored with a plaque on the first level of the Eiffel Tower. The late Stephen Withey of the University of Michigan liked to tell the story to skeptical graduate students (including your Unconventional Wiz) in trying to prove that it really is possible to count the uncountable.

In the early 1800s, the king of France summoned Poisson. His highness wanted to know how many deer there were in the Royal Forest. It seemed that the royal gamekeeper had a sharp eye but a dull mind: He could easily recognize individual deer after seeing them just once (yeah, we didn't believe it, either--but it made the story better). When could the poor rustic stop counting the deer, Poisson was asked. And how certain could the king be that his count was accurate?

Poisson came up with this solution: He told the gamekeeper to search the forest every day and tally every new deer he found. He cautioned the man not to add to the count when he came across any deer he had previously seen. Of course, there were lots of first-time sightings on the first day. The cumulative total increased only modestly on the second day. Every day thereafter, the total grew by smaller and smaller amounts.

At the end of each day, Poisson dutifully plotted each updated total on a graph. Then he connected the dots, creating a line that grew at a progressively slower rate and tended to reach what mathematicians call a "fixed upper limit"--the point at which one could safely predict that the chances of finding a previously undiscovered deer would be overwhelmingly small.

Roughly speaking, what Poisson is said to have done with the king's deer, Paxton has done with creatures of the sea. Paxton consulted the 10th edition of Carolus Linnaeus's "Systema Naturae," first published in 1758, and identified all saltwater animals more than two meters long. He found about 100 species. Then he searched subsequent scientific literature through 1995 and recorded the discovery by year of new species that met his "monster" test.

He found that, by 1995, the overall number of really big sea creatures had reached 217. But the rate of discovery had dropped dramatically (we're currently discovering one new big marine animal on average every 5.3 years, Paxton reported). He then graphed the data between 1830 and 1995 and estimated its upper limit, which suggests there are probably about 47 more species of whoppers still eluding us in the world's oceans.

In reading the Poisson story, the first thing that came to our mind was that the remarkable ability of the gamekeeper to distinguish every deer he has seen suggests that the King might also have used the capture-recapture method used by the Census Bureau in the undercount problem in this way. The King asks the gamekeeper to make a serious effort to see as many deer as he could on a specific day. Think of the deer he saw the first day as tagged. Then on the next day he could make an even greater effort to see as many as possible. Suppose the gamekeeper sees 35 deer on the first day and 60 on the second day, noting that 20 of these 60 deer he also saw on the first day. Then, assuming there are N deer walking around the forest more or less at random, the proportion of tagged deer in the second day's sample, 20/60, should be approximately the proportion of tagged deer in the forest, 35/N. Thus the King can estimate the number of deer in the forest to be 3*35 = 105. Thinking about the validity of the assumptions made in this hypothetical problem suggests problems that the Census Bureau must adjust for in their real world use of the capture-recapture method in the Census 2000.

Let's turn now to solving the problems that the King and Paxton posed. In the deer problem, it might be reasonable to assume that each deer has the same probability of being seen in a day. However, for the monster fish problem the probability of seeing a particular species of large fish in a given year would surely depend on the species. Thus to have a solution that will apply to both problems we will let the probability of seeing a particular deer differ from deer to deer.

Poisson suggested plotting the cumulative number of distinct deer the gamekeeper had seen by the end of each day for a series of, say, 50 days. When you do this you get an increasing set of points to which a curve looking a bit like a parabola could be fitted. This curve would be expected to level off when all the deer had been seen, and the value at which it levels off would be our estimate for the total number of deer in the forest. Our problem is to determine the curve that best fits this data.

Assume there are M deer in the forest, which we label 1, 2, 3, ..., M. Let's assume that on a given day the gamekeeper sees deer i with probability p(i). Then the probability that he has not seen the ith deer after n days is (1-p(i))^n, and so the probability that he has seen this deer after n days is 1-(1-p(i))^n. Thus the expected number of deer the gamekeeper has seen after n days is the sum of 1-(1-p(i))^n for i = 1 to M. If we could estimate the probabilities p(i), this would provide a natural curve to fit to the data reported by the gamekeeper.

The original deer problem would have all the p(i)'s the same, say p. In this case our expected curve is M(1-(1-p)^n). Of course, we do not know M and p. We can choose the curve that best fits the data by considering a range of M and p values and choosing the pair that minimizes the sum of the squares of the differences between the observed and predicted values. We carried out this procedure by simulation, assuming M = 100 and p = .05 with the gamekeeper reporting for 50 days. The best-fit curve recovered the values M = 100 and p = .05 quite accurately.
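A minimal sketch of this simulation and grid-search fit (the random seed, the grid ranges and the step sizes are our own choices, not the authors'):

```python
import random

random.seed(1)
M_true, p_true, days = 100, 0.05, 50

# Simulate the cumulative number of distinct deer seen by the end of each day.
seen, counts = set(), []
for day in range(days):
    for deer in range(M_true):
        if random.random() < p_true:
            seen.add(deer)
    counts.append(len(seen))

# Sum of squared differences between the data and the curve M*(1-(1-p)^n).
def sse(M, p):
    return sum((counts[n] - M * (1 - (1 - p) ** (n + 1))) ** 2
               for n in range(days))

# Grid search over plausible (M, p) pairs for the least-squares fit.
best = min(((M, p / 1000.0) for M in range(80, 131) for p in range(20, 81)),
           key=lambda mp: sse(*mp))
print(best)  # typically recovers a pair close to (100, 0.05)
```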

For the species problem the p(i)'s are different, and so the problem is more difficult. Now we have M different values to estimate and fewer than that number of data points. Here researchers take advantage of the fact that the predicted curves look like parabolas. They replace the complicated expected-value curve by a much simpler hyperbola of the form y = sx/(b+x), where x is the time observed and y the total number of species seen by this time; as x grows, y levels off at s, which becomes the estimate of the total number of species. Again there are only two parameters, s and b, to estimate, and again we can do this by choosing s and b to minimize the sum of the squares of the differences between observed and predicted values. To see how this works we simulated the experiment. We assumed that there are 100 different species and chose their probabilities of being seen in a given year at random between 0 and .1. We had the program find the best-fit curve and estimate the total number of species from it. Using 100 species, we found that the resulting estimates were not too bad. However, the hyperbolic curve did not fit the data very well. And even if it did fit the observed points, different models could fit just as well but give quite different limits. Curve fitting without an underlying model is a risky business.
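A sketch of that simulation, with a grid-search fit of the hyperbola (we assume the form y = sx/(b+x), so that the asymptote s estimates the species total; the seed and the grid are our own choices):

```python
import random

random.seed(2)
S_true, years = 100, 100
probs = [random.uniform(0, 0.1) for _ in range(S_true)]  # per-species yearly sighting chance

# Cumulative number of distinct species seen by the end of each year.
seen, counts = set(), []
for year in range(years):
    for sp, p in enumerate(probs):
        if random.random() < p:
            seen.add(sp)
    counts.append(len(seen))

# Least-squares fit of the hyperbola y = s*x/(b + x) by grid search.
def sse(s, b):
    return sum((counts[x] - s * (x + 1) / (b + x + 1)) ** 2 for x in range(years))

best = min(((s, b) for s in range(60, 161, 2) for b in range(1, 61)),
           key=lambda sb: sse(*sb))
print(counts[-1], best[0])  # species actually seen, and the fitted asymptote s
```

As the article warns, the fitted asymptote is only a rough estimate: quite different models can fit the observed points about equally well yet level off at very different values.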

The work of Fisher referred to by Richard Morin was for a slightly different problem. His problem can be described as follows: the King now wants to know how many species of butterflies there are in his forest. He asks the gamekeeper to go out one day with his net and see how many species of butterflies he catches. The gamekeeper does this and reports that he caught 32 butterflies belonging to 5 different species, with counts (12, 7, 5, 3, 5). Fisher asked: how many new species would the gamekeeper expect to find on a second day of catching butterflies?

I. J. Good later gave an elegant new solution of Fisher's problem. Here is Good's argument.

It is natural to assume that the number of butterflies of a particular species that enter the gamekeeper's net has a Poisson distribution, since typically there are a large number of butterflies of a particular species and a small probability that any one of them enters the net. Let m be the mean number of butterflies of species s caught in a day. Then the probability that no butterfly of species s is caught on the first day is e^(-m), and the probability that at least one butterfly of species s is captured on the second day is 1-e^(-m). Therefore the probability that no butterfly of species s is captured on the first day and at least one is captured on the second day is:

e^(-m)(1-e^(-m))

Using the series expansion for e^(x) in the second term of this product we can write this probability as:

e^(-m)(m - m^2/2! + m^3/3! - ...)

Now Good assumes that the mean m for a specific species is itself a chance quantity with unknown density f(m). Integrating our last expression with respect to f(m) we find that the probability that species s is not seen on the first day and at least one is captured on the second day is:

P(X = 1) - P(X = 2)+ P(X = 3) - ...

where X is the number of times species s is represented in the first day's capture. Summing this expression over all species we find that:

The expected number of new species found on the second day is

e(1) - e(2) + e(3) - ...

where e(j) is the expected number of species with j representatives on the first day. The e(j)'s can be estimated by the number of species represented j times on the first day's capture. Doing this we obtain an estimate for the number of new species we will find on the second day.
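Here is a minimal sketch of that estimate in Python, applied to the gamekeeper's hypothetical catch, with each e(j) estimated by the observed number of species caught j times:

```python
from collections import Counter

def expected_new_species(counts_per_species):
    # Good's alternating series e(1) - e(2) + e(3) - ..., with e(j)
    # estimated by the number of species represented j times on day one.
    freq = Counter(counts_per_species)  # j -> number of species seen j times
    return sum((-1) ** (j + 1) * n for j, n in freq.items())

# Species counts (12, 7, 5, 3, 5) from the first day's catch.
print(expected_new_species([12, 7, 5, 3, 5]))  # 3
```

With so few species the alternating series is very noisy; it works much better for large samples, such as the Shakespeare word counts discussed later.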

We have assumed that the sampling time was the same on the two days. If it is different, then our final sequence becomes

we(1) - w^2e(2) + w^3e(3) - ...

where w is the ratio of the time spent sampling on day 2 to the time spent on day 1.

The same procedure can be used to find an expression for the expected number of species in the second sample that occurred k times in the first sample. The result is

wC(k+1,k)e(1+k) - w^2C(k+2,2)e(2+k) + w^3C(k+3,3)e(3+k) - ...

where C(n,j) is the number of ways to choose j elements from a set of size n.

Thisted and Efron used this method to determine if a newly found poem of 429 words, thought to be written by Shakespeare, was in fact written by him.

They interpreted a species as a word in Shakespeare's total works. The sample on the first day corresponds to the 884,647 words in known Shakespeare works. The words in the new poem constituted the second sample. They then used Good's method to estimate the number of words in the poem that were used k times in the previous works. Then they compared these estimated values to the actual values.

In this example, w = 429/884647 is sufficiently small that only the first term wC(k+1,k)e(1+k) in our series need be used.

From this formula, we see that the expected number of words that were not used at all in previous works is we(1). There were 14,376 words in Shakespeare's known works that were used exactly once, so we estimate e(1) = 14376 and we(1) = (429/884647)*14376, which is about 7 (the actual number of new words in the poem was 9). In the same way we estimate that there should be 4 words that were used once in the previous works (the actual number was 7) and 3 words that were used twice (the actual number was 5). The authors concluded that this fit suggested that the poem was written by Shakespeare.
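The first of these numbers is easy to reproduce, using only the figures quoted above:

```python
# With w this small, only the first term w*e(1) of Good's series matters.
w = 429 / 884647   # ratio of poem length to the known Shakespeare canon
e1 = 14376         # number of words used exactly once in the canon
print(round(w * e1))  # 7 expected never-before-used words (9 were observed)
```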

They made similar estimations for other Elizabethan poems by Shakespeare and other authors to see how well this method identified a work of Shakespeare. Their results showed that their method did distinguish works of Shakespeare from those of Donne, Marlowe and Jonson.

The use of simulation in these models highlights the fact that in statistics we are usually trying to estimate a quantity from incomplete information. If we sent out many gamekeepers to gather data, we would typically get slightly different answers from the different gamekeepers' reports. We can see how much variation to expect by simulating the experiment.

While this example would take a couple of days to discuss in a Chance course, it uses nothing more difficult than the coin-tossing probability model and the Poisson approximation of this model.


David Rutherford sent us remarks on the Economist article on estimating the number of large salt-water species. He writes:

On the estimation of the number of as yet undiscovered salt-water species with length or width of at least 2 meters (6.6 feet), the author does not state that we are sampling only from that portion of the sea which trawlers fish (the bottom of sea trenches is relatively undersampled, for example, while the top portions of the sea are relatively oversampled). This is equivalent to sampling only the top half of the chocolate box. Here the analogy breaks down, however, because the conditions at the bottom of the sea are markedly different (in terms of pressure, temperature and light) from conditions at the top of the sea, whereas the conditions in the chocolate box are fairly uniform. Presumably this qualifier appears in the original paper, by either Fisher or Paxton (i.e., 47 species remaining in that part of the sea which is, or has in the past been, sampled).

David Rutherford
Melbourne, Australia

Editors' comments: Fisher and his colleagues (R. A. Fisher, A. S. Corbet, C. B. Williams, Journal of Animal Ecology, 1943, Vol. 12, pp. 42-58) state that their estimates apply only to a region for which the sample is representative. So, for example, estimates for the number of species of butterflies based on a sample at the bottom of a mountain would not apply to an area of higher elevation. Likewise, the estimates for one season need not apply to another. Paxton does not discuss this issue directly, but after remarking that his model assumes a constant sampling effort, which may well not be satisfied, he writes:

The validity of the analysis presented here would also be in doubt if many new species are split from existing species by information gained by molecular techniques or new methods of biological sampling found large numbers of new species of large marine fauna in as yet unexplored habitats.


OSCAR WINNERS LIVE LONGER

Study: Oscar winners tend to live longer.
Nando Times, 14 May 2001
Michael Rubinkam
Available at Nando Times until 27 May 2001

Survival in Academy Award--winning actors and actresses.
Annals of Internal Medicine, Vol. 134, No 10, 15 May 2001
Donald A Redelmeier, Sheldon M. Singh

The abstract for this study as presented in the Annals of Internal Medicine described the study as follows:

Background: Social status is an important predictor
of poor health. Most studies of this issue have focused
on the lower echelons of society.

Objective: To determine whether the increase in status
from winning an academy award is associated with
long-term mortality among actors and actresses.

Design: Retrospective cohort analysis.

Setting: Academy of Motion Picture Arts and Sciences.

Participants: All actors and actresses ever nominated
for an academy award in a leading or a supporting role
were identified (n = 762). For each, another cast member
of the same sex who was in the same film and was born
in the same era was identified (n = 887). Measurements:
Life expectancy and all-cause mortality rates.

Results: All 1649 performers were analyzed; the median
duration of follow-up time from birth was 66 years, and
772 deaths occurred (primarily from ischemic heart disease
and malignant disease). Life expectancy was 3.9 years
longer for Academy Award winners than for other, less
recognized performers (79.7 vs. 75.8 years; P = 0.003).
This difference was equal to a 28% relative reduction in
death rates (95% CI, 10% to 42%). Adjustment for birth
year, sex, and ethnicity yielded similar results, as did
adjustments for birth country, possible name change,
age at release of first film, and total films in career.
Additional wins were associated with a 22% relative
reduction in death rates (CI, 5% to 35%), whereas
additional films and additional nominations were not
associated with a significant reduction in death rates.

Conclusion: The association of high status with increased
longevity that prevails in the public also extends to celebrities,
contributes to a large survival advantage, and is partially
explained by factors related to success.

Those who won more than one academy award are counted only once, which explains the difference between the number nominated (762) and the number of controls (887).

As seen above, Oscar winners have a life expectancy 3.9 years greater (79.7 vs. 75.8 years) than the matched controls from the same movies who were never nominated for an Oscar. The researchers report that the winners have about the same advantage over those who were nominated but did not win -- 79.7 vs. 76 years.

The researchers comment that this increase of about 4 years in longevity is equal to the estimated societal consequence of curing all cancers in all people for all time. They do not have a simple explanation for the increase, but they suggest several possibilities: Oscar winners are under greater scrutiny, which may lead to a more controlled life to maintain their image; they may have managers with a vested interest in their reputation who enforce high standards of behavior; and they may have more resources, allowing them to avoid stress and giving them access to privileges that others do not have.

The usual explanations for the longer life expectancy of the rich over the poor -- better schooling, better health care, etc. -- do not seem to apply here.

The Nando Times remarks:

Examples of long-lived Oscar-winners abound. John
Gielgud, who won for "Arthur," was 96 when he died
last year. George Burns ("The Sunshine Boys") lived
to 100. Leading lady Greer Garson ("Mrs. Miniver")
reached 92. So did Helen Hayes ("The Sin of Madelon
Claudet," "Airport"). Katharine Hepburn - with a record
four Academy Awards - turned 94 on Saturday.

DISCUSSION QUESTIONS:

(1) What reasons can you think of for this increased longevity?

(2) In discussing the limitations of their study the authors say that they should have had more biographical information about those in the study. What would they have looked for if they had?


We usually do not go into the technical details of a study that we discuss in Chance News but decided that it might be fun to try this. We chose the study "Survival in academy award-winning actors and actresses," Annals of Internal Medicine, Vol. 134, No. 10, by Donald A. Redelmeier and Sheldon M. Singh, discussed in Chance News 10.05. Redelmeier was also the lead author of the interesting study on the danger of using a cell phone while driving, discussed in Chance News 6.03 and 6.10 and in an article "Using a car phone like driving drunk?" in Chance Magazine, Spring 1997.

Recall that for their Oscar study the authors identified all the actors and actresses who had been nominated for an award for a leading or supporting role since the Oscar awards were started 72 years ago. For each of them, they identified another cast member of the same sex in the same film and born in the same era. This provided a group of 887 actors to be used as a control group for their study of Oscar winners. Among those nominated there were 235 Oscar winners. The authors wanted to determine whether Oscar winners tend to live longer than comparable actors who were not winners. Thus the key question is: how do you decide if there is a significant difference in the life expectancy of members of two different groups?

A similar problem arises in a medical trial in which one group of patients is given a new treatment and a second group is given a placebo or a standard treatment, and the researchers are interested in the expected time until a particular "end event" occurs, such as death, the disappearance of a tumor, the occurrence of a heart attack, etc. The test generally used for such studies, and used in the Oscar study, is called the "Kaplan-Meier survival test". It was developed by Kaplan and Meier in 1958 ("Nonparametric Estimation from Incomplete Observations," Journal of the American Statistical Association, Vol. 53, No. 282, 457-481). The importance of this method is suggested by the fact that the Science Citation Index shows that over 22,000 papers have cited the 1958 Kaplan and Meier paper since 1974. We are willing to bet that no reader can, without help from the Internet or a friend, name a scientific paper with a larger number of citations.

A good description of how the Kaplan-Meier test is carried out can be found in Chapter 12 of the British Medical Journal's on-line statistics book "Statistics at Square One."

The Kaplan-Meier test requires that we construct a life table for the Oscar winners and for the control group. We start by reminding our readers how life tables are constructed for the US population. The most recent US life tables can be found in the CDC's National Vital Statistics Report, Volume 48, Number 18 (http://www.cdc.gov/nchs/products/pubs/pubd/nvsr/48/lifetables98.htm), and are based on 1998 data. Life tables are given separately by sex as well as for the total US population. The following table shows the first 10 rows of the life table for the 1998 US population.

Table 1. From the 1998 US Population Life Table.

The first column indicates the first 10 age intervals.

The second column gives the proportion q(x) dying in each age interval, determined as the number who died in this interval in 1998 divided by the US 1998 population at the midpoint of the age interval.

The third column starts with a cohort of 100,000 at birth and gives the number expected still to be alive at the beginning of each age interval. To compute these numbers we start with l(1) = 100000 and then use the recursion relation l(x+1) = l(x)(1-q(x)) for x >= 1. (Note that 1-q(x) is the proportion of those alive at time x who survive at least one more year.) For example, l(2) = 100000(1-.00721) = 99279, l(3) = 99279(1-.00055) = 99224, etc. The quantity l(x)/100000 can be interpreted as the probability that a newborn child will live to year x. For any year t greater than or equal to x, the quantity l(t)/l(x) can be interpreted as the probability that a person who has lived to year x will live to year t.

To determine the life expectancy of a newborn baby we need only sum l(x)/100000 over all x. (Recall that for a discrete random variable X the expected value can be computed as the sum over all x of Prob(X >= x).) To find the life expectancy for a person who has reached age x we add the values of l(t)/l(x) for t greater than or equal to x. From the table we see that the life expectancy for a person at birth is 76.7, while for a person who has survived 9 years it is 70.4, making a total life expectancy of 79.4. Thus there is a 2.7-year bonus for having survived 9 years. You can view the entire life table here and check your own bonus. We have a 10-year bonus for surviving so long but, alas, only a 10.7-year additional expected lifetime.
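The recursion and the expectancy sum are one-liners. Here is a sketch using just the two q(x) values quoted above; the full table would simply extend the list of q's out to the oldest age:

```python
# Build the l(x) column from the death proportions q(x):
# l(x+1) = l(x) * (1 - q(x)), starting from a cohort of 100,000.
qs = [0.00721, 0.00055]      # q(1), q(2) quoted from the 1998 US life table
l = [100000]
for q in qs:
    l.append(round(l[-1] * (1 - q)))   # rounding to whole persons
print(l)  # [100000, 99279, 99224]

# With the full table, life expectancy at birth is sum(l(x) for all x)/100000,
# and expectancy at age x instead sums l(t)/l(x) over t >= x.
```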

A survival curve is a plot of the probability l(x)/100000 of living x or more years as a function of age x. Using the full life table we obtain the following survival curve for the US population.

Figure 1 Survival curve for US population 1998

For a discussion of the technical problems in producing and interpreting life tables see the article "A method for constructing complete annual U.S. life tables" by R. N. Anderson.

We return now to the Oscar study. In studies like this, and others where the Kaplan-Meier test is used, we have incomplete information about the time until the end event -- death, in the case of the Oscar study. Some of the Oscar winners will have died by the time the study is completed but others will not; still others may have been lost to the researchers after being followed for some time. For those who have not died we know only that they lived to a certain age, and at this point we say that they have "left the study." The Kaplan-Meier procedure allows us to use this information, along with the information about those who have died, in making life tables.

Dr. Redelmeier provided us with the data needed to carry out the Kaplan-Meier test. In Table 2 we show the first 10 entries in his data set for the members of the control group:

0 means died in year x
1 means lost to the study in year x

Table 2: How the data is presented.

The first column gives the number of years that it is known the actor survived. For the first actor this was 56 years and the 0 in column 2 indicates that this actor died in his 56th year. The second actor survived 78 years and the 1 in column 2 indicates that this actor was lost to the study in his 79th year.

Recall that in the actual study there were 235 Oscar winners and 887 controls. To discuss how the Kaplan-Meier test works with a manageable number of subjects, we chose a random sample of 30 Oscar winners and a random sample of 100 controls from the authors' data set.

We first want to determine a survival curve for each of the two sample groups. Recall that the key information for determining a survival curve for the US population was the estimated chance of surviving each year of age. We use the same idea to construct the survival curve when we have incomplete information. This is done by constructing Table 3 below. In column 2 of Table 3 we have listed, in increasing order, the years in which the 30 Oscar winners died or were lost to the study.

In column 3 we put a 0 if the actor died and a 1 if the actor was lost to the study. Note that in some years more than one actor died or was lost. For example, three actors either died or were lost to the study in their 54th year -- one died and two were lost to the study.

In column 4 we put the number n(x) of Oscar winners known to be alive at the beginning of year x.

In column 5 we put the number of Oscar winners d(x) who died in year x. Then (n(x)-d(x))/n(x) is the proportion q(x) of those alive at the beginning of year x who survived this year. These values appear in column 6.

Finally we use q(x) to estimate the probability that an Oscar winner will live at least to the beginning of year x. As in the case of the traditional life table, l(0) = 1 and, for larger values of x, l(x) is calculated by the recursion equation l(x) = l(x-1)*q(x). (Note that here q(x) is the proportion surviving year x, whereas in the US life table q(x) denoted the proportion dying.) The values of l(x) appear in the last column.
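The whole procedure fits in a few lines of Python. This sketch uses a handful of made-up (year, flag) records in the format of Table 2 (flag 0 = died in that year, 1 = lost to the study), not the study's actual data:

```python
# Kaplan-Meier estimate of l(x) = P(survive at least x years) from
# records of the form (year, flag), flag 0 = died, 1 = lost to the study.
records = [(56, 0), (78, 1), (61, 0), (61, 0), (54, 1), (90, 0)]  # made up

def kaplan_meier(records):
    curve, alive = {}, 1.0
    for x in sorted(set(y for y, _ in records)):
        n = sum(1 for y, _ in records if y >= x)             # n(x): still at risk
        d = sum(1 for y, f in records if y == x and f == 0)  # d(x): deaths in year x
        alive *= (n - d) / n                                 # multiply by q(x)
        curve[x] = alive
    return curve

for year, l in kaplan_meier(records).items():
    print(year, round(l, 3))   # survival probability at each event year
```

Summing l(t) over all years t (with l constant between event years), as described for the US life table, then gives the estimated expected lifetime for the group.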

The columns of Table 3 below are: (1) the case number; (2) the year x in which the Oscar winner died or left the study; (3) a 0 if the winner died in year x, a 1 if the winner was lost to the study in year x; (4) the number n(x) of Oscar winners known to be alive at the beginning of year x; (5) the number d(x) of Oscar winners who died in year x; (6) the proportion q(x) of those alive at the beginning of year x who survived this year; (7) the proportion l(x) of Oscar winners who survived at least x years.
0           1
1 27 1 30 0 1 1.000
2 37 1 29 0 1 1.000
3 38 1 28 0 1 1.000
4 45 1 27 0 1 1.000
5 50 1 26 0 1 1.000
6 51 1 25 0 1 1.000
7 54 1 24 1 23/24 0.958
8 54 1        
9 54 0        
10 57 1 21 1 20/21 0.913
11 57 0        
12 61 0 19 2 17/19 0.817
13 61 0        
14 63 1 17 0 1 0.817
15 65 0 16 1 15/16 0.766
16 69 1 15 0 1 0.766
17 73 1 14 0 1 0.766
18 73 1        
19 74 1 12 0   0.766
20 75 1 11 0   0.766
21 77 0 10 2 8/10 0.612
22 77 0       0.612
23 78 0 8 1 7/8 0.536
24 81 0 7 1 6/7 0.459
25 82 1 6 0   0.459
26 84 0 5 1 4/5 0.367
27 85 0 4 1 3/4 0.276
28 92 1 3     0.276
29 93 0 2 1 1/2 0.138
30 96 1 1 0   0.138

Table 3 Determining the survival probabilities.

Plotting l(x) we obtain the following survival curve for the Oscar winners.

Figure 2 The Survival curve for the sample of Oscar winners

Making a similar table for the sample control group and plotting the survival curve we obtain the survival curve for this group:

Figure 3 The Survival curve for the sample control group

Putting the two together for comparison we obtain:

Figure 4 Survival curve for the sample of 30 Oscar winners and 100 controls

The upper curve is the survival curve for the winners and the lower curve is for the controls. These curves suggest that the winners do tend to live longer than the controls. We can get another indication of this by computing the expected lifetime for members of each group. As for the US population, we simply sum l(t) over all years t. Doing this we find that the expected lifetime for an Oscar winner is 78.7 while for members of the control group it is 74.7, again indicating that the Oscar winners live longer on average than the controls.

We wrote a program for the above calculations. We then used this program to compute the survival curves for the original study with 235 Oscar winners and 887 controls. Here are the results:


Figure 5 Survival curves for all 235 Oscar Winners and 887 controls

Again the top curve is the survival curve for the Oscar winners and the bottom curve is for the control group. Using all the data we find that the expected lifetime for an Oscar winner is 79.7 and for the controls it is 75.8, reflecting the same difference we saw in our samples.

We must now tackle the question of how the authors concluded that the approximately four years difference found in the study was significant, i.e., could not be accounted for by chance. For this, the authors used a test called the "log-rank test".

We say an event happened in age year x if at least one subject either died or was lost to the study during this year. For each group and each age year we count the number still being followed at the beginning of the year and the number of deaths during the year. For example, in our sample we find that in age year 61 we were still following 19 Oscar winners, 2 of whom died in this year. For the control group we were still following 65 controls, one of whom died this year. Thus a total of 84 subjects were still being followed at the beginning of age year 61, and there were a total of 3 deaths during this year. Now, under the hypothesis that there is no difference between Oscar winners and the controls, these 3 deaths should be randomly chosen from the 84 people still being followed. Thus we can imagine an urn with 84 balls, 19 marked O for Oscar winner and 65 marked C for control. Father Death chooses three balls at random from this urn to determine the deaths. Then the number of deaths among the Oscar winners has a hypergeometric distribution. The probability that any particular death is an Oscar winner is 19/84, so the expected number of Oscar-winner deaths in year 61 is 3*(19/84) = 0.679. The observed number was 1. We also need the variance of the number of deaths among the Oscar winners. This is a bit more complicated because the balls are drawn without replacement, so we need the hypergeometric variance formula.

Assume that you have n balls in an urn, of which k are red and n-k are black. Then if you draw m balls at random the expected number of red balls is

                                                                           e = m(k/n)
and the variance is

                                                   v = m*(k/n)*((n-k)/n)*((n-m)/(n-1)).

In our example the red balls are the Oscar winners, so the variance of the number of Oscar winners who died in the 61st year is

                                                  v = 3*(19/84)*(65/84)*(81/83) = .512
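These two numbers can be checked directly (a quick sketch of the hypergeometric mean and variance used above):

```python
# Mean and variance of the number of red balls when m balls are drawn
# without replacement from n balls, k of which are red (hypergeometric).
def hypergeom_mean_var(n, k, m):
    e = m * k / n
    v = m * (k / n) * ((n - k) / n) * ((n - m) / (n - 1))
    return e, v

# Age year 61: 84 subjects still followed, 19 of them Oscar winners, 3 deaths.
e, v = hypergeom_mean_var(n=84, k=19, m=3)
print(round(e, 3), round(v, 3))  # 0.679 0.512
```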

Then to carry out the log-rank test we do the above calculations for each age year for one of the groups; we chose the Oscar winners. Let O be the sum over all years of the observed numbers of Oscar winners who died, E the sum of the expected values, and V the sum of the variances. Then the statistic

                                                    S = (O - E)/sqrt(V)

will, by the Central Limit theorem, be approximately normal so S^2 will have approximately a chi-square distribution with 1 degree of freedom. Note that in summing the variances we are assuming that the observed numbers of Oscar winners who die in different years are independent. This is true because we are conditioning on knowing the number in each group that we are still watching and the total number of deaths for a given year. These determine the distribution of the number of Oscar winners who die in this year under our assumption that there is no difference between the two groups.

Using our program to carry out these calculations for the data for this study we found S^2 = 9.1246. The probability of finding a chi-squared value greater than this is .0025 so this indicates a significant difference between the Oscar winners group and the control group.

To check our program we carried out the calculations for the Kaplan-Meier procedure, as did the authors, using the SAS statistical program. SAS yielded the same survival curves as our program and for the significance tests SAS reported:

 

Table 4. Significance tests produced by SAS

Thus the log-rank test agreed with our calculation. SAS also provided two other tests that might have been used both of which would result in rejecting the hypothesis that there was no difference between the two groups. This ends our saga.


WEATHER FORECASTING

Here is an account by Dan Rockmore of another chance exploration carried out recently by Dan and Laurie. This is a sequel to a previous exploration, when they visited local weather forecaster Mark Breen (see Chance News 8.04).

Whither the weather balloon?

Recently, our quest to understand the process of weather prediction led us to take a trip to visit the National Weather Service (NWS) Forecast Office in Gray, Maine. Our goal was to witness statistics in action, in the form of the launch of a real, live weather balloon.

Some time ago, our friend Mark Breen, the local Vermont Public Radio forecaster, had shown us how some of his forecasting depended on the regional forecast generated from this NWS office. In turn, their forecast uses the data generated by weather balloons, which are launched every day at seven o'clock in the morning and seven o'clock in the evening at about 70 sites around the country. We couldn't understand how we had never seen one of these balloons. This was clearly a job for the Chance News action team, so off we went to witness a launch, ask some questions and shed some light on the mystery of the missing balloons.

We arrive in nearby Portland, Maine on the evening before our scheduled morning meeting at NWS, just in time to take in a Portland SeaDogs baseball game and have a great dinner at the Fore Street restaurant (order the Loup de Mer if it is on the menu!). We awake at 5:30 AM the next day and are on the road by 6 heading for the weather station. Weather prediction is clearly a coffee-intensive activity.

We wind our way over the local highways on a drizzly and appropriately gray morning in Gray. Our final destination is a medium-sized reddish brick, official-looking building, which is the regional weather prediction center of the National Oceanic and Atmospheric Administration (NOAA). You can see pictures of the office at

http://www.seis.com/~nws/officepix.html.

For a brief history of the office see

http://www.seis.com/~nws/historyPWM.html.

You'll be interested to discover that the National Weather Service traces its origins to President Ulysses S. Grant, who gave responsibility for its creation to the Secretary of War.

We ring the bell and are let into the center. Weather prediction is a 24 by 7 job and some night-shift scientists are around, as well as the scientist in charge of the sacred balloon launch, Art Lester, one of the hydro-meteorological technicians. In fact Art has already been there for about an hour, since it is his responsibility to prepare the balloon at around 6 AM for its 7 AM launch. Art gives us a tour of the operations room, which is basically a computer room where computers are running weather models and weather maps are displayed on every monitor. We ask a bunch of questions about the pictures. The scientists can't help but look and sound like weather forecasters as they respond. Their hands sweep across the monitors as they trace out fronts moving this way and that, lows evolving into highs, and temperature isotherms. Other maps show clouds of precipitation and locate lightning strikes. The weather prediction models are being run in preparation for the construction of the morning regional forecast, which is put together by our host John Jensenius, the Warning Coordination Meteorologist, who arrives at about 6:40.

John takes us aside to show us a weather balloon up close, along with the instrument that it carries into the sky, a radiosonde. The balloon itself is a silky, yellowish-brown sac which will be filled with helium, released into the air, and should rise about 17 miles. The accompanying radiosonde carries sensing instruments encased in styrofoam. The instruments measure temperature, humidity, and barometric pressure at different altitudes. Wind speed and direction are also inferred by tracking the signal, thereby monitoring the movement of the balloon and radiosonde. The readings are sent back via radio waves; separate frequencies are reserved for the different variables. In fact, as a way of checking that the radiosonde is on-line, the measurements are "played" as a set of tones of different frequencies. This gives new meaning to songs like "You Are My Sunshine" and "Thunder and Lightning".

Launch time is approaching. Everything checks out at the office and now it's time to launch! We go out and get into the car for a quick drive up to the launch site -- it looks like a little observatory, really just a largish tall garage. We go inside and there is the balloon, tied to the table, with the radiosonde tied on to it. The garage door is opened and final preparations are made, checking again to see that the radiosonde is secured and transmitting. The balloon is walked outside and, with no ceremony at all, released, and it speeds into the sky. Its rate of ascent makes it clear to us why we have never seen one floating lazily overhead: with the low ceiling on this cloudy day it disappears in about ten seconds, and we imagine that even on a clear day it would be invisible in less than 30 seconds. As it rushes up to the clouds, the highly malleable skin flattens out in the early morning wind, looking more like a large lumpy pillow (UFO?) than a balloon, and then it is gone, swallowed by the low clouds on this rainy morning. The radiosonde continues to transmit its song of the weather.

The readings will continue to be sent for about two hours. After this the balloon usually bursts (this was tested on the ground) and begins to descend rapidly to earth, often ending up in the ocean or a tree somewhere, which is probably why we have yet to come across one in our daily wanderings. In general, those launched on the east coast are rarely found, although in other places, like the landlocked Midwest with its wide-open plains, they are found relatively often. The radiosonde has a little notice on it, assuring anyone who finds it that the equipment is perfectly safe and asking that it be mailed back (for free) to the regional office whose address is on the label. Each launch costs about 200 dollars, including the money spent for the scientists' time. It is believed that this process will be automated in the near future, and that ultimately, as satellite imaging gets better (possibly as soon as ten years from now), there may very well come a day when the balloons are unnecessary.

We get back into the car to return to the main building for our debriefing. We continue our discussion with John about the role that the forecast plays and the many tools now available for predicting the weather. John will be using the radiosonde data as well as the most recent model output and satellite data (animated as brief movies made by looping the pictures taken over time by the satellites) to prepare the day's area forecast discussion, which is posted on the web (see

http://www.seis.com/~nws/mesnhs.html

for the appropriate link as well as other related links). Comparison of the model predictions with incoming weather data helps to generate the forecast. We remember that the regional forecast is used by our own Mark Breen on VPR to help create his local forecast.

John explains that there have been great improvements in the 3-, 4-, and 5-day forecasts, but that beyond that accurate predictions remain elusive. Once again we discuss the age-old bugbear, "What does probability of precipitation mean?" as well as its corollary, "If a 20 percent chance of rain is predicted, and it rains, is the forecaster correct?" Here is our e-exchange:

Dan: Given some initial conditions, the model either outputs precipitation, or no precipitation - the algorithm then goes back and checks to see historically what percent of time these initial conditions actually did produce precipitation. Is that right?

John: Yes. The machine-generated probabilities are based on equations developed with forward stepwise multiple linear regression on a historical sample of past observed data (the predictand) and past model forecasts (the predictors). The regression can select from a considerable number of predictors, but usually equations are limited to between 10 and 12 predictors. For the Probability of Precipitation (PoP), the most important predictors are generally the model-forecast relative humidity, model-forecast precipitation occurrence, model-forecast precipitation amount, and model-forecast vertical motion. Some of these are usually transformed into binary and grid-binary formats in addition to being offered in the raw formats. In addition, the model forecasts are space-smoothed to help incorporate some of the uncertainty inherent in the computer-generated forecasts.
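John's description of how a PoP equation is built can be illustrated in miniature. The sketch below fits a single-predictor equation by ordinary least squares; the humidity and rain data are invented for illustration, and real MOS equations use 10 to 12 predictors chosen by stepwise regression:

```python
# Toy version of fitting a PoP equation: regress past rain outcomes (0/1)
# on a past model-forecast predictor, here relative humidity.
# All numbers are made up; this is not NWS data.
humidity = [30, 45, 60, 70, 80, 90, 95, 55, 85, 40]   # model-forecast RH (%)
rained   = [ 0,  0,  0,  1,  1,  1,  1,  0,  1,  0]   # observed outcome

n = len(humidity)
mean_x = sum(humidity) / n
mean_y = sum(rained) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(humidity, rained)) \
        / sum((x - mean_x) ** 2 for x in humidity)
intercept = mean_y - slope * mean_x

def pop(rh):
    """Clamp the regression output into [0, 1] so it reads as a probability."""
    return min(max(intercept + slope * rh, 0.0), 1.0)

print(f"PoP for 75% humidity: {pop(75):.0%}")
```

The clamping step reflects a real quirk of linear-regression probabilities: the raw equation can produce values below 0% or above 100%, which must be truncated before issue.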

Dan: If the weatherman predicts a 20% chance of rain, and it rains, is he right?

John: Only two forecasts can be absolutely wrong. A forecast of a 0% probability of rain when it rains and a forecast of 100% probability when it doesn't rain. All other probabilities must be evaluated in terms of the forecast reliability. To evaluate the reliability, you must gather a sufficient sample of cases of each probability (for example 20%). Then you must determine the observed relative frequency of precipitation for those specific cases. If the observed relative frequency closely matches the forecast probability, then the forecast is reliable. By looking at the observed relative frequencies for each of the forecast probabilities, you can see whether the forecasts are reliable over the entire forecast range of probabilities. One other factor to consider is the ability of a forecast system to discriminate between the rain/no rain cases. To simply always forecast the climatic observed relative frequency of precipitation will give you reliable forecasts given a sufficiently large sample, but will do nothing to discriminate between the rain/no rain cases. Both reliability and discrimination factor into the Brier Score.
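John's two criteria, reliability and the Brier score, can each be computed in a few lines. The forecasts and outcomes below are invented for illustration:

```python
from collections import defaultdict

# Made-up probability-of-precipitation forecasts and outcomes
# (1 = it rained, 0 = it did not).
forecasts = [0.2, 0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8, 0.8]
outcomes  = [0,   0,   1,   0,   0,   1,   1,   1,   0,   1]

# Brier score: mean squared difference between forecast and outcome.
# 0 is perfect; lower is better.
brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Reliability: for each forecast value, compare it with the observed
# relative frequency of rain among the cases given that forecast.
cases = defaultdict(list)
for f, o in zip(forecasts, outcomes):
    cases[f].append(o)
for f in sorted(cases):
    freq = sum(cases[f]) / len(cases[f])
    print(f"forecast {f:.0%}: observed frequency {freq:.0%}")
print(f"Brier score: {brier:.3f}")
```

Here the 20% forecasts verify 20% of the time and the 80% forecasts 80% of the time, so the forecaster is perfectly reliable, yet the Brier score is not zero: as John notes, reliability alone does not measure how well the forecasts discriminate rain from no-rain cases.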

Finally it's time to leave. We've tracked down the elusive weather balloon, yet unbelievably the joys of discovery are not yet over. As we make our way back to the interstate, we decide to stop for a proper breakfast at a roadside diner, Stones Grove. The six pick-up trucks parked in front are a good omen. We are not disappointed as we enjoy the best home fries we've ever tasted. The diner is full of regulars. We join in as the gang all sing Happy Birthday to Tom the cook, who receives an 8-pound torque wrench, a sweater, a pineapple upside-down cake, and a gift certificate to L.L. Bean for his birthday. As we climb back in the car and head home, we congratulate each other on a perfect trip. Now what were the chances of that?!

DISCUSSION QUESTION:

Read how the Brier Score is defined in an article by Harold Brooks at:

http://www.nssl.noaa.gov/~brooks/media/okcmed.html

Read how Peter Doyle would measure the validity of a weather forecaster at:

http://math.dartmouth.edu/~doyle/docs/email/forecast.txt

What is the difference between these two approaches? Which do you think is better? Compare the two methods as applied to the data given in Harold Brooks's article.


THE BIBLE CODES

Bible's word patterns suggest divine writing.
The Valley News, 3 Nov. 1995
Associated Press

An article in the October issue of "Bible Review" has renewed interest in research by three statisticians, Doron Witztum, Eliyahu Rips and Yoav Rosenberg, published in "Statistical Science" (1994, Vol. 9, No. 3, pp. 429-438). These authors claim to show that the book of Genesis contains information about events that occurred long after it was written that cannot be accounted for by chance. The editors of "Statistical Science" commented that the referees doubted this was possible but could not find anything wrong with the statistical analyses. So they published it for the rest of us to try to discover what is going on.

The authors chose 32 names from the "Encyclopedia of Great Men of Israel" and formed word pairs (w, w') where w is one of the names and w' a date of birth or date of death of the person with this name. We say a word w is "embedded" in the text if its letters appear in the text (not counting spaces) at the positions of an arithmetic sequence, i.e., separated by intervals of a fixed number of letters. For example, the word "has" is embedded in the sentence "The war is over." since the letters h, a, and s occur in the sentence separated in each case by two letters. The authors showed that the names and dates they chose appeared in Genesis (which is not surprising) but that the names were nearer their matching dates than could be accounted for by chance (p = .00002). All this was done in Hebrew.
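The definition of an embedded word is easy to make concrete in code. The sketch below is our own illustration, not the authors' program; it searches a text for every occurrence of a word as an equidistant letter sequence:

```python
def find_els(text, word, max_skip=None):
    """Return (start, skip) pairs where `word` occurs in `text`
    (letters only, case-insensitive) as an equidistant letter sequence.
    `skip` is the distance between successive letters of the word, so a
    skip of 3 means two letters of text lie between each pair."""
    letters = [c.lower() for c in text if c.isalpha()]
    word = word.lower()
    n, k = len(letters), len(word)
    top = (n - 1) // (k - 1) if k > 1 else 1
    if max_skip is not None:
        top = min(top, max_skip)
    hits = []
    for skip in range(1, top + 1):
        for start in range(n - (k - 1) * skip):
            if all(letters[start + i * skip] == word[i] for i in range(k)):
                hits.append((start, skip))
    return hits

# The example from the text: "has" is embedded in "The war is over."
print(find_els("The war is over.", "has"))
```

Running this finds "has" starting at the second letter with a skip of 3, matching the example in the article.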

At the suggestions of a referee the authors tried the same tests using other Hebrew works and even Tolstoy's "War and Peace" translated into Hebrew as controls. They did not find any similar unlikely events in these controls.

The article in "Bible Review" gives an interesting account of this research and how it has been received. The authors first announced similar results in the "Journal of the Royal Statistical Society A" (155:1, 1988, pp. 177-178) while commenting on an article, "Probability, Statistics and Theology," by D. J. Bartholomew. After this announcement, a public statement was made by well-known mathematicians, including H. Furstenberg at Hebrew University and Piatetski-Shapiro at Yale, saying that these results "represented serious research carried out by serious investigators."

DISCUSSION QUESTIONS:

(1) What do you think could be going on here?

(2) Do you think your name is embedded in Hamlet?

(3) The authors restrict themselves to words that are embedded in the text with separation between letters at most a specified number D. They estimate the expected number of times a word w is embedded in Genesis by taking the product of the relative frequencies (within Genesis) of the letters constituting w multiplied by the total number of equidistant letter sequences in the text having separation at most D. Is the independence between letters assumed in this calculation reasonable?
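The estimate described in question (3) can be written out explicitly. This is our own toy rendering of the calculation, using an invented text, not the authors' program:

```python
from collections import Counter

def expected_els_count(letters, word, max_d):
    """Crude expected number of times `word` appears as an equidistant
    letter sequence with separation at most `max_d`, assuming each text
    position is an independent draw from the text's letter frequencies."""
    n = len(letters)
    freqs = Counter(letters)
    p = 1.0
    for c in word:                 # product of relative frequencies
        p *= freqs[c] / n
    k = len(word)
    # number of equidistant letter sequences of length k with skip d
    num_seqs = sum(max(n - (k - 1) * d, 0) for d in range(1, max_d + 1))
    return p * num_seqs

# A text of 100 letters, half 'a' and half 'b':
print(expected_els_count("ab" * 50, "ab", max_d=2))
```

For this text the word "ab" has letter-frequency product 0.25 and there are 99 + 98 = 197 candidate sequences, so the estimate is 49.25. Whether the independence assumption behind the product is reasonable for Hebrew text is exactly the point of the question.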


The Bible Code.
Simon and Shuster, N.Y., 1997
Michael Drosnin

In Chance News 4.15, we discussed an article by Witztum, Rips and Rosenberg (WR and R) published in "Statistical Science" (1994, Vol. 9, No. 3, pp. 429-438). In this article, the authors claim to show that the Hebrew version of Genesis contains information about events that occurred long after the Bible was written that cannot be accounted for by chance.

Orthodox Jews believe that the Torah, consisting of the first five books of the Bible, represents the word of God unmediated by human beings. This article has been widely quoted as providing evidence for this belief.

For their study, WR and R chose two lists of names of prominent Rabbis, born thousands of years after the Bible was written. There were 34 Rabbis in their first list and 32 in the second. With each of these names they associated a date representing the date of birth and/or death. Dates, in Hebrew, are written using letters only, no numbers. Actually each Rabbi's name and date were represented by WR and R in more than one way, corresponding to the different ways they might occur in scholarly books.

WR and R represented the book of Genesis as one long string of Hebrew characters with no spaces. A word was said to be "found" in Genesis if its letters appeared equispaced within this string.

The two lists of Rabbis were tested separately. For each test, a computer program searched for the Rabbi's names and dates in Genesis. The authors defined an overall measure, Omega, of the distance of the names of the Rabbis from their dates. To test for significance, they computed Omega for a large number of permutations of the names. They found, from this, that it was very unlikely to get a value of Omega as small as that obtained for the original order of the names.

In accepting the paper, the editors of "Statistical Science" commented that the referees doubted the conclusions of the study but could not find anything wrong with the statistical analyses. They were publishing it for others to try to discover what might be going on.

For the first 2 to 3 years after the paper's publication, there was no formal scientific response, though religious groups used the results as new evidence of divine intervention and, of course, there were a number of lively discussions of the work on the web.

The editors could hardly have anticipated that the response to this article would be a best-seller and possibly a movie. However, when you think about the current public attitude towards science, this response is not such an unlikely event.

Drosnin is a writer best known for his best seller, "Citizen Hughes." He became interested in the WR and R study and learned from Rips how to do his own search for hidden messages. Drosnin then did what is called "data mining", looking for important modern events encoded in Genesis. He found Rabin's name and, next to his name, the phrase "assassin that will assassinate". He sent a message to Rabin, telling him of this finding. A year later, when Rabin was assassinated, Drosnin says he became a believer.

Maya Bar-Hillel tells us that the Hebrew for "assassin that will assassinate" is exactly the same as "assassin that will be assassinated" and remarks: "while Drosnin has the ears of the world, I suggest he should use his fame to warn the prison authorities in Israel that Igal Amir (Rabin's assassinator) is in danger for his life!"

Drosnin shows that it is possible to find just about every great moment in recent history including the Gulf War, Watergate, the collision of the comet Shoemaker-Levy with Jupiter, Clinton's election, the Holocaust, etc. He shows us evidence for the final atomic war in the year 2000 or 2006. He covers himself slightly by saying that future events are really probabilities.

While a serious examination of the WR and R study has been slow in coming, Maya Bar-Hillel and Dror Bar-Natan at the Hebrew University, Brendan McKay at the Australian National University, and Arie Levitan and Alec Gindis have now accepted the challenge of the Statistical Science editors and have made serious studies of the WR and R results.

Maya recently visited Dartmouth and talked about this work. She remarked that anyone who understands Hebrew realizes that, along the way, many rather arbitrary decisions had to be made. For example, as in English, there are various ways in Hebrew that Rabbis are addressed, and different forms for their names as well as for the dates of their birth or death. There are many ways that the distance between words could be measured. Maya reported that there is a large (and growing) list of felicitous choices that WR and R made, many of which would have detracted from the final result if they had been made differently (even though, a priori, you would think it shouldn't matter). It is all too common for experiments to be designed, consciously or unconsciously, by making those choices which produce the results being looked for.

Maya remarked that WR and R were happy to try the same experiment with works where you would not expect to find the names of the Rabbis. For example, they did their experiment on the Hebrew version of "War and Peace" and found nothing significant there. But they were less willing to carry out experiments that would replicate their discovery. They did do one replication, or cross validation, using their second list of Rabbis. They again found significant results. However, they declined to do the more convincing replication of using one of the other four books in the Torah. Since all five of these books are believed to be the direct words of God, similar evidence of divine writing would be expected in the other four. When Maya and her colleagues tried this replication, using the same words used by WR and R, no significant results were found.

McKay and Bar-Natan are preparing a paper for publication that reports the results of a series of experiments that attempted to replicate the WR and R experiment. One was only a slight variation of the WR and R experiment. In a more substantial variation, they replaced the date associated with each Rabbi by the name of the most famous book written by that Rabbi. All versions of their experiments were carried out for each of the five books of the Torah. They used the same method for computing distances and determining significance and also another method suggested by Persi Diaconis. They could not find anything that could not be attributed to chance in any of their experiments. You can see a detailed description of these experiments and their results at: Report on new ELS tests of Torah

Maya reports that, while the original WR and R list found nothing significant in "War and Peace," a list identical to the original in everything except for some playing around with the choice of names and titles (in a manner that would probably be undetectable to us and possibly even to experts in the field) did yield significance values in "War and Peace" that match or surpass those reported by WR and R in their original article.

McKay and his colleagues have also provided amusing examples of what can be done by the type of data mining employed by Drosnin. A computer search in the "Law of the Sea Treaty" found such phrases as: "Hear all the law of the sea" and "safe UN ocean convention to enclose tuna" which, as usual, if predicted in advance would have been difficult to attribute to chance. Also they found 59 words related to Chanukah "closely clustered" in a small segment of the Hebrew version of "War and Peace." You can find a discussion of this experiment at: Astounding Discoveries in War and Peace!

DISCUSSION QUESTIONS:

(1) Brendan McKay says: Sorry Maya, but your theory is quite wrong. The code is referring to the intending assassin being assassinated, i.e. that Amir would be killed before carrying out his deed. Who do you think got it right? What other interpretation can you give to this phrase?

(2) Professor Rips has made the following public statement:

I have seen Michael Drosnin's book The Bible Code. There is indeed serious scientific research being conducted with regard to the Bible codes. I do not support Mr Drosnin's work on the codes, or the conclusions he derives. The book gives the impression that I have done joint work with Mr Drosnin. This is not true. There is an impression that I was involved in finding the code relating to Prime Minister Yitzhak Rabin's assassination. I did witness in 1994 Mr Drosnin 'predict' the assassination of Prime Minister Rabin. For me, it was a catalyst to ask whether we can, from a scientific point of view, attempt to use the codes to predict future events. After much thought, my categorical answer is no. All attempts to extract messages from Torah codes, or to make predictions based on them, are futile and are of no value. This is not only my own opinion, but the opinion of every scientist who has been involved in serious codes research.

If there are divine words in the Bible, why should they not be able to be used to predict future events? Why did Rips have to think about it so hard if it is the opinion of every scientist involved in serious codes research?

(3) It often appears that believers in ESP are able to get significant results while non-believers, carrying out the same experiments, fail to get significant results. Do you think this will be the case with Bible codes?

(4) Drosnin gives the impression that he feels that God has encoded an unlimited amount of knowledge in Genesis. Can you see how God might have done this using only finitely many words?

(5) If I showed you that a minor change in one of the subjective decisions made by the experimenters in the WR and R study would make their results not statistically significant, would you reject the results of their study? Would you be influenced by the number of minor changes I had to look at to find a change where the results would not be significant?


Michael Olinick sent us a number of interesting web references to the Bible code story. Among them was the following amusing proof that kiddie TV character Barney is actually the devil:

Barney wants to corrupt our children!

 
Given: Barney is a CUTE PURPLE DINOSAUR
Prove: Barney is satanic

        The Romans had no letter 'U', and used 'V' instead for
        printing, meaning the Roman representation for Barney
        would be: CVTE PVRPLE DINOSAVR

        CVTE PVRPLE DINOSAVR

        Extracting the Roman numerals, we have:

        CVTE PVRPLE DINOSAVR
        CV    V  L  DI    V

        And their decimal equivalents are:

        CVTE PVRPLE DINOSAVR
        CV    V  L  DI    V
       / |    |  |  | \   |
     100 5    5 50 500 1  5

        Adding those numbers produces: 666.

        666 is the number of the Beast.

Proved: BARNEY IS SATAN!

Michael points out that the same strategy works (modulo M) for

   LAURIE SNELL, DARTMOUTH, HANOVER, NH

   LAVRIE SNELL,   DARTMOVTH, HANOVER, NH 
   L V I     LL    D  [M]V        V   
  /  | |     |\    |     |        |
50   5 1    50 50  500   5        5

Adding these numbers again gives 666. 

There are codes in War and Peace too.
Galileo, vol. 24, November-December 1997
Maya Bar-Hillel, Dror Bar-Natan, Brendan McKay

This article reports on new work related to the Witztum, Rips and Rosenberg (WRR) paper on Bible codes (also called Torah codes) that appeared in Statistical Science (1994, Vol. 9, No. 3, pp. 429-438) (Chance News 6.07). We shall refer to the authors of the Galileo article as BBM and the authors of the Statistical Science article as WRR. WRR reported on the following experiment.

The names of 34 famous Rabbis, born long after Genesis was written, were chosen from an encyclopedia of famous Rabbis. For each Rabbi, WRR chose a set of names and titles that would identify the Rabbi and a set of dates that represent the date of birth or death of the Rabbi.

WRR then considered a Hebrew version of Genesis as a string of 78,064 Hebrew letters with no spaces. They defined an equi-letter-skip (ELS) word as a word whose letters occur in this string, separated by sequences of letters of equal length. An elementary probability calculation shows that we can expect, just by chance, that most of the names and dates of the Rabbis will appear as ELS words.

WRR then defined a notion of distance between two ELS words and hypothesized that the names and dates of the Rabbis would be closer together than could occur by chance. This hypothesis was tested and the results were highly significant (p = .000016).

The referees suggested that the authors choose a completely new set of famous Rabbis and test their hypothesis again. They did and again obtained highly significant results. Finally the referees asked the authors to test their hypothesis in the Hebrew version of another work of similar size. They did so, using the first 78,064 letters of the Hebrew translation of War and Peace. The Rabbis' names and dates again appeared as ELS words, but the degree of closeness between them was not significant. On the basis of these tests the referees accepted the paper for publication in Statistical Science.

We now have a situation very similar to a well-designed experiment in extra-sensory perception with highly significant results that a skeptic just doesn't believe. What does the skeptic do? He looks for something in the experiment that was not done quite right. BBM play the role of skeptics. They believe that recent experiments, carried out by Dror Bar-Natan and Brendan McKay, cast considerable doubt on WRR's claim that the choice of names and dates to use for the Rabbis was made before any consideration of where the names and dates occurred as ELSs in Genesis. WRR claim, in fact, that they (WRR) did not even make these choices, but rather that they were made for them by other historical scholars.

So what did Bar-Natan and Brendan McKay (BM) show? They showed first that whoever chose the names and dates to use for the Rabbis for the WRR article made a significant number of rather arbitrary choices, especially for the names of the Rabbis. BM then asked if they could find a significant result in War and Peace if they were allowed to make judicious choices to help their cause. BM considered the second list of Rabbis chosen by WRR at the suggestion of the referees. They kept the same dates but made modifications in the choice of names. Specifically, they dropped 20 of the 90 names used by WRR for the Rabbis and added 30 new ones. The new names were ones that research suggested to BM could equally well have been chosen by WRR. With these changes, BM found the same kind of significant results in War and Peace that WRR found in Genesis.

The authors conclude that one explanation for the significant results of WRR is that they "cooked" their data. The authors report additional evidence for this cooking. They state that almost every one of the apparently quite arbitrary choices WRR made in choosing the names increased the significance of the result. Of course, this itself could be considered evidence of divine intervention.

Well, that leaves us with the familiar ESP situation: the believer continues to believe and the skeptic continues to be skeptical. (See the reports by Jessica Utts and Ray Hyman (Chance News 5.04) assessing the research on extra-sensory perception sponsored by the Defense Intelligence Agency during the cold war.)

DISCUSSION QUESTIONS:

(1) Harold Gans, former senior cryptologist with the Department of Defense, took the names of all 66 Rabbis and replaced the various spellings of the dates of birth or death used by WRR with the spellings of the cities where the Rabbis were born or died. Gans again obtained a highly significant result (p < 1/143,000). If you were asked to referee papers by Gans, purporting to confirm the WRR results, or by BM, purporting to show how the WRR results could have been flawed, how would you decide whether one or both of these papers should be accepted?

(2) What would you estimate to be your a priori probability for the hypothesis proposed by WRR before their experiment? How would the results of WRR change this a priori probability?


Another skeptical look at the Torah Codes has been provided by Barry Simon of the California Institute of Technology. Simon is one of the country's leading mathematical physicists and is himself an Orthodox Jew. His article "A skeptical look at the Torah codes" will be published in the March issue of Jewish Action. A preliminary version is posted on the web: Barry Simon on Torah Codes.

Simon has an interesting discussion, based on his considerable experience as an editor of a scientific journal, about the overstated claims regarding the WRR article that have been made, especially by religious groups, based on acceptance of this work by Statistical Science and supporting comments by leading mathematicians.

Simon discusses some of the same issues raised by BBM in their Galileo article. Another concern Simon mentions is that the definition of distance between ELSs provided by WRR is extremely complicated and not a natural definition that other mathematicians would have been likely to choose. The statistical significance of the WRR result could be quite sensitive to the form of this definition, and Simon suggests that, without even realizing it, WRR could have been influenced by what works in choosing this rather unnatural definition of distance.

DISCUSSION QUESTION:

Simon says that he believes it would be impossible to disprove the WRR claim that the Torah has hidden codes. He writes:
I explicitly asked Professor Rips this question and he admitted it was an interesting question to which he didn't have an answer. If it isn't possible to disprove, then the hypothesis is not a scientific hypothesis. This is not to say that statistical analysis can't be a valid way to analyze what might be going on, but without the possibility of disproving a hypothesis, that hypothesis is outside the realm of science as we understand it.
What do you think about this?


We asked Bible Code expert Brenden McKay to see if the Bible could be of any help to Mr. Starr in his investigation. Here is his answer:

Dear Laurie,

As you requested, I consulted the Bible Code for hints as to the outcome of The Bill and Monica Show. First I tried the Hebrew text of Genesis, but without much success. "Clinton" does not appear there as an Equidistant Letter Sequence (ELS) and "Bill" is so pervasive that it appears in close conjunction to practically everything...

Then I thought, "Americans speak a rough approximation of English, so clearly I should consult an English text". So I tried the King James Authorized translation of Genesis instead. Sure enough, there were "Clinton", "Monica", and all the rest of them. Still, things didn't seem quite right. For one thing, "Clinton" was closer to "Hillary" than he was to "Monica"! Then I noticed something curious indeed. Time after time, whenever "White" was close to "House", nearby was "Snell". I'm sorry, Laurie, but you're going to have to explain these:

             o a b r a M 
             e s f r O m
             t h i N e t
             n d I t c a
             e C a n a a
             A r e u n t
             a n d w h E    LAURIE and MONICA
             e t h e f I
             e r s f o R
             d v e n t U
             m t h e f A
             f h i s o L
Do you know how fantastically unlikely it is to find "Laurie" and "Monica" in the same 12x6 rectangle?

DISCUSSION QUESTION:

How likely is it?


A number of new articles on the Bible Code controversy can be accessed from Brendan McKay's Torah Codes page. Here are four that we found interesting.

A lecture by Eliyahu Rips. 

On the Witztum-Rips-Rosenberg sample of nations.
Dror Bar-Natan, Brendan McKay, and Shlomo Sternberg

Torah codes: reality or illusion
A.M. Hasofer

The case against the codes. 
Barry Simon

The lecture by Rips was given in about 1985 in Russian and is provided with an English translation by McKay to help clarify the history of the Bible Code controversy.

In the Statistical Science article (See Chance News 4.15), the authors, Witztum, Rips and Rosenberg, gave no explanation for how the codes got in the Bible though the reader is surely meant to infer that they came from God. In this lecture, given some 10 years before the Statistical Science article appeared, Rips explains what he had already done and hoped to do to establish the presence of the codes in the Torah and what this will mean. We found his discussion of the effect on free will particularly interesting:

Everything is foreseen but man is still free to exercise his will. How is that possible? Let me put it this way: supposing that today we are watching the repeat of a game that took place last week. Now, you know by now that the game ended with the score 3:2, that the first goal was scored fifteen minutes into the game, and so on. Did our present knowledge hamper the players? No, of course not. Yet to the Almighty, who is outside of time, there is no gap between past and future, that is to say, He is as cognizant of the future as He is of the past. But his knowledge lies outside of our world, and it does not prevent us, those living in this world, from exercising our individual free will. That is a crucial point.

In "On the Witztum-Rips-Rosenberg sample of nations" McKay and his colleagues analyze a sequel to the Statistical Science paper, by the same authors, which has been circulating in preprint form for several years. Instead of matching Rabbis and their dates, this article considers pairs of the form (N,X), where N is the name of a nation and X is some related word or phrase. It is much easier for this study than it was in the Rabbi study to show that many subjective choices were made which, if chosen otherwise, would not have led to a significant result.

Hasofer is an Emeritus Professor of Statistics at the University of New South Wales and has analyzed previous statistical claims for hidden codes in the Torah. In this article, Hasofer supports the criticisms of McKay and others of the WRR studies and adds some of his own. Hasofer states that this study was a test of hypothesis for which the authors provided a null hypothesis but no alternative hypothesis. He points out that the obvious alternative hypothesis -- that the codes were put there by God -- is clearly not testable. He states that when a testable alternative hypothesis is not given, all that can be concluded from rejection of the null hypothesis is that the null hypothesis is unlikely to account for the results. (There seems to be general agreement, both among the proponents and the critics, that the results of the Bible Codes studies did not occur by chance!)

Barry Simon's article is a revision of his previous article "A Skeptical Look at the Torah Codes," which was published in the March 1998 issue of Jewish Action. Simon states that, after considerable study of the evidence and of the proponents' replies to his original article, he has gone from being skeptical about the arguments presented for the validity of the codes to being certain that none of the evidence presented so far has any legitimacy. This is an excellent up-to-date discussion of the case against the Bible Codes. Simon provides a detailed discussion of why a study whose design involves subjective choices that cannot be replicated by others cannot be considered a scientific study.

For a lighter but no less interesting discussion of what science is all about, we recommend the new Richard Feynman book "The Meaning of it All: Thoughts of a Citizen-Scientist." (Addison Wesley 1998, Hard cover, 133 pp, $15.40 from Amazon.) This book consists of three lectures Feynman gave at the University of Washington in Seattle in April 1963. In the first lecture Feynman talks about the nature of science with particular emphasis on doubt and uncertainty. In the second he talks about the impact of science on politics and religion. By the third lecture Feynman says that he has run out of organized ideas and so we are treated to some of his "unorganized ideas" on "this unscientific age".

DISCUSSION QUESTIONS:

(1) Hasofer remarks that if there is a null hypothesis and an alternative hypothesis, a small probability for the data under the null hypothesis might not lead to accepting the alternative hypothesis, because the probability of the data under the alternative hypothesis might be even smaller. Is this relevant for the Bible Codes study?

(2) A coin is tossed 10,000 times and comes up heads 5,150 times. Could you reject, at the 5% significance level, the null hypothesis that the coin is fair? If the alternative hypothesis is that the coin is biased with a 53% chance of heads, which hypothesis would you accept? It is often suggested that you will be able to reject any null hypothesis with a sufficiently large sample. Why? Do you think this is what Hasofer is really worried about?

(3) In his Jewish Action article, Simon reminds us that a scientific hypothesis must, at least in principle, be capable of being disproved. However, it is hard to see how the hypothesis "There are Bible Codes in the Torah" can be disproved. If it cannot, this hypothesis is not a scientific hypothesis. What do you think about this? Does this tell us anything about the statistical study of Bible Codes?
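A quick computation for question (2), using a two-sided normal-approximation test and an exact binomial likelihood comparison (our own sketch, not part of the original article):

```python
from math import erf, exp, lgamma, log, sqrt

n, heads = 10_000, 5_150

# z-test of the null hypothesis p = 0.5 (normal approximation).
p0 = 0.5
z = (heads - n * p0) / sqrt(n * p0 * (1 - p0))    # (5150 - 5000)/50 = 3.0
p_value = 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))  # two-sided tail probability
print(f"z = {z:.2f}, two-sided p-value = {p_value:.4f}")

# Exact binomial log-likelihood of the observed count under each hypothesis,
# computed with lgamma to avoid overflow in the binomial coefficient.
def log_binom_pmf(k, n, p):
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

ratio = exp(log_binom_pmf(heads, n, 0.53) - log_binom_pmf(heads, n, 0.50))
print(f"likelihood ratio (p = 0.53 vs p = 0.50) = {ratio:.3f}")
```

The observed count of 5,150 is about three standard deviations above the fair-coin mean of 5,000, so the null hypothesis is rejected; but it is also about three standard deviations below the 5,300 expected under the 53% alternative, so the likelihood ratio comes out close to 1. The data are roughly equally improbable under both hypotheses, which is exactly Hasofer's point about rejecting a null without a credible alternative.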


The Spring issue of Chance Magazine arrived and, as usual, has a number of interesting articles. Maya Bar-Hillel, Dror Bar-Natan, and Brendan McKay write on their research on the Bible Codes, which we have reviewed in previous issues of Chance News.


We have added two new videos to our collection of Chance Lectures which can be accessed from the Chance website. These are:

The Bible Code: A series of three talks on Bible Codes given by Brendan McKay, Maya Bar-Hillel and Jeffrey H. Tigay at Princeton University, Tuesday April 28, 1998. McKay and Bar-Hillel give a critical analysis of evidence for Bible Codes. Tigay discusses how the most authentic version of the Bible known today could differ from the original.


Bible codes mystery explained
Statistical Science press release

The next issue of Statistical Science will contain the paper: "Solving the Bible Code Puzzle", by Brendan McKay, Dror Bar-Natan, Maya Bar-Hillel, and Gil Kalai. This paper is available at Bible Codes debunked in Statistical Science.

Statistical Science recently provided a press release related to this paper, which included the following introduction to the paper written by Robert E. Kass, former Executive Editor of Statistical Science.

One of the fundamental teachings in statistical training is that probability distributions can generate seemingly surprising outcomes much more frequently than naive intuition might suggest. For good reason, experienced statisticians have long been skeptical of claims based on human perception of extraordinary occurrences. Now that computer programs are widely available to help nearly anyone mine available data, there are wonderful new possibilities for discovering misleading patterns.

In this context, when the article Equidistant Letter Sequences in the Book of Genesis, by Witztum, Rips and Rosenberg, was examined by reviewers and editorial board members for Statistical Science, none was convinced that the authors had found something genuinely amazing. Instead, what remained intriguing was the difficulty of pinpointing the cause, presumed to be some flaw in their procedure, that produced such apparently remarkable findings. Thus, in introducing that paper, I wrote that it was offered to readers as a challenging puzzle.

Unfortunately, though perhaps not so surprisingly, many people outside our own profession interpreted publication of the paper as a stamp of scientific approval on the work. However, even though the referees had thought carefully about possible sources of error, no one we asked was willing to spend the time and effort required to reanalyze the data carefully and independently. Rather, we published the paper in the hope that someone would be motivated to devote substantial energy to figuring out what was going on and that the discipline of statistics would be advanced through the identification of subtle problems that can arise in this kind of pattern recognition.

In this issue, Brendan McKay, Dror Bar-Natan, Maya Bar- Hillel and Gil Kalai report their careful dissection and analysis of the equidistant letter sequence phenomenon. Their explanations are very convincing and, in broad stroke, familiar. They find that the specifications of the search (for hidden words) were, in fact, inadequately specific: just as in clinical trials, it is essential to have a strict protocol; deviations from it produce very many more opportunities for surprising patterns, which will no longer be taken into account in the statistical evaluation of the evidence. Choices for the words to be discovered may seem innocuous yet be very consequential. Because minor variations in data definitions and the procedure used by Witztum et al. produce much less striking results, there is good reason to think that the particular forms of words those authors chose effectively tuned their method to their data, thus invalidating their statistical test.

Considering the work of McKay, Bar-Natan, Bar-Hillel, and Kalai as a whole, it indeed appears, as they conclude, that the puzzle has been solved.

There has not yet been much press response. Science (Bible Code Bunkum, Science 1999 September 24; 285: 2057a in Random Samples) had a brief discussion based on this press release but the most complete discussion we found was the following:



The Torah Codes, Cracked
Slate, (Online magazine) posted 6 Oct. 1999
Benjamin Wittes

Wittes is an editorial writer for the Washington Post. This is a well-written discussion of why Statistical Science believes the Bible Codes puzzle has been solved. We will not try to summarize it, since Wittes's article is available on the web. You will also find there a response from Michael Drosnin which will sound all too familiar to those who have followed this controversy.

Of course, this is not the end of the Bible Code. The movie "The Omega Code" will be opening October 15 in a theater near you. From its synopsis we read:

A prophetic code hidden within the Torah. A sinister plot sealed until the end of the Age. Two men caught up in an ancient supernatural struggle to determine the fate of the next millennium...

For thousands of years, mystics and scholars alike have searched for a key to unlock the mysteries of our future. The key has been found. The end is here.

Never before has our distant past so collided with our coming future. Never before has modern technology uncovered such profound mathematical complexities as revealed within the Bible Code. Never before has the world seen so many ancient prophesies falling into place.



Maya Bar-Hillel sent us an article that she wrote with Avishai Margalit which tells, both from a scientific and a human point of view, the story of the Bible Code controversy. As you know, Maya was one of the main characters in this story and stars in one of our Chance Lectures on the Bible Codes. Avishai is Professor of Philosophy at The Hebrew University, author of The Decent Society and, with Moshe Halbertal, of Idolatry, and has collaborated with Maya on papers on logical puzzles. This story, "Madness in the method," is available from the Chance web site.

Madness in the method.
Maya Bar-Hillel and Avishai Margalit

Readers of Chance News need no introduction to the Bible Codes controversy. Six years after he accepted the paper that started it all, Robert E. Kass wrote an introduction to the paper "Solving the Bible Code Puzzle", by Brendan McKay, Dror Bar-Natan, Maya Bar-Hillel, and Gil Kalai (Statistical Science, May 1999) which ended with: "It indeed appears, as they conclude, that the puzzle has been solved."

We all learned a lot from this controversy and it will remain a wonderful case study for understanding statistical tests and the study of coincidences. Reading this article, you will see why Maya and Laurie have both tried to get their theatrical daughters interested in producing a serious play about the Bible Codes -- the characters are great, it is a real life mystery, and the topic has already shown it can sell millions of books.

Of course the Conan Doyle quote at the beginning of this Chance News is perfect for this article. It was also used by the authors to introduce their article -- a coincidence? We could equally well have used the following quote from Maya and Avishai's fascinating article:

Suppose you see a magician cutting the pretty lady in two, separating her smiling face, upper torso and all, from her twitching toes, long-booted legs and all, and then putting them together again. It is wrong to ask: "How can the lady be cut in half and glued back together again?", because the obvious answer to that question is: She cannot! But you are on your way if you ask: "How can one appear to be cutting the lady in half and then putting her together again?"


Bible Code II: The countdown.
Viking Press, 2002
Michael Drosnin


Sadly, despite all the debunking of the Bible Codes by Brendan McKay and his colleagues (documented here), the Bible Codes refuse to go away.
Recall that the Bible Codes refer to the Torah, the first five books of the Bible: Genesis, Exodus, Leviticus, Numbers and Deuteronomy. Those who look for Bible Codes first eliminate all spaces in the Torah and consider the text as one long sequence of about 300,000 letters. A computer then looks for words formed with equal skips between letters, called "equidistant letter sequences," or ELSs for short. The text is displayed on a sequence of rows, each with the same number of letters; ELSs then appear as vertical, horizontal or slanted lines. If you are not familiar with the Bible Code controversy we recommend you read the article: "The Torah Codes: Puzzle and Solution," Maya Bar-Hillel, Dror Bar-Natan, and Brendan McKay, Chance Magazine Vol. 11, No. 2, Spring 1998, pp. 13-19, available here.
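The ELS search described above can be sketched in a few lines. The text and target word below are hypothetical stand-ins, not the Torah text:

```python
def find_els(text, word, max_skip=50):
    """Return (start, skip) pairs at which `word` occurs in `text`
    as an equidistant letter sequence, after stripping non-letters."""
    letters = "".join(ch.lower() for ch in text if ch.isalpha())
    word = word.lower()
    hits = []
    for skip in range(1, max_skip + 1):
        for start in range(len(letters)):
            # The slice picks up every `skip`-th letter starting at `start`.
            if letters[start : start + skip * len(word) : skip] == word:
                hits.append((start, skip))
    return hits

# "code" hidden with a skip of 2 in an otherwise meaningless string:
print(find_els("cxoxdxex", "code"))  # [(0, 2)]
```

A real search would run this over some 300,000 letters for many words and skips at once, which is exactly why, with enough freedom in the choice of words, surprising-looking hits are essentially guaranteed.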

Drosnin's "Bible Code II" begins with the September 11 tragedy. On the cover of the book we see the author's evidence that this tragedy was anticipated in the Bible.

Readers of Chance News will recall the following challenge made by Michael Drosnin relating to his first book on the Bible Codes:
When my critics find a message about the assassination of a prime minister encrypted in Moby Dick, I'll believe them. (Newsweek, June 9, 1997)

McKay accepted the challenge and found messages about the assassinations of Indira Gandhi, Leon Trotsky, Martin Luther King, John Kennedy, Abraham Lincoln and others in Moby Dick. You can see these messages here.

On October 8, McKay provided evidence that Moby Dick also predicted the "War on Terrorism" writing:

Afghanistan is mentioned exactly once in Melville's classic Moby Dick. Imagine our amazement to find clear indications of the attack on the World Trade Center and the subsequent American attacks on Afghanistan encoded nearby!

  In the picture, apart from the obvious "bloody battle in Affghanistan" and "swift destruction", we can see "World", "Trade" and "Center". To make this clear, "twin" crosses "building" which is overlaid by "WTC". "It's NYC" is nearby. In addition, alleged key figure "Osama", Taliban leader "Omar", Osama's military chief "Atef" and alleged chief hijacker "Atta" are named. Also predicted is a "siege" and "ground war" involving "NATO" and covered by "CNN".

In a brief review of Bible Codes II, posted on Amazon before the book was available in Australia, McKay wrote:

I am the author of the "Moby Dick" codes mentioned by an earlier reviewer. To see that codes appear anywhere, we can use Drosnin's own book. Go to the extract that appears on this site and locate the words "Lower Manhattan" a few paragraphs down. Starting at the R of "Lower", count forward 32 letters at a time to find the hidden message "R.U. FOOLED?" encoded in the text exactly the same way that Drosnin's hidden messages are encoded in the Bible. Moreover, the chance that this message appears so close to the start of the book is less than 1 in 200,000, so, according to Drosnin's logic, it must be there by design and not by accident.

In Bible Codes II Drosnin explains that Bible Codes indicate that the world will come to an end in 2006. He writes:

The more closely we looked at the warnings in the Bible code, the clearer it became that the ultimate danger centered on 2006. That is the year most clearly encoded with "Atomic Holocaust" and "World War," and also with the "End of the Days."

He also found a suggestion that if the world acts now it might be able to prevent this. This led him to two goals, and the account of his attempts to achieve these goals is the main content of the book.

His first goal was to meet personally with world leaders to warn them of the coming Apocalypse. He attempted to arrange such a meeting with Clinton, Bush, Barak, Sharon, and Arafat. He was able to meet with people close to these leaders but Arafat was the only one who would speak to him personally.

His second major goal was to determine the key to the Bible code. Throughout the book Drosnin reports his discoveries to Eliyahu Rips and asks for his reaction. Recall that Rips is a well-known mathematician and one of the authors of the Statistical Science article that claimed to have verified that the Bible Codes could not be due to chance. Rips's comments are usually limited to stating the probability that Drosnin's latest finding could have happened by chance. But when Drosnin asks him if we will ever know the full Bible Code, Rips replies: "This will only happen if we find the key to these codes."

So Drosnin sets out to find this key. He need only search on "key" and "Bible code" to find that the key is engraved on stone pillars which were enclosed in an "ark of steel" (brought to our world by aliens) that ended up in the Dead Sea. Drosnin reports that he had been given a written permit for archaeological expeditions to attempt to find the code key, but unfortunately this permission was withdrawn with no explanation.
So, alas, Drosnin achieved neither of his two goals.

In support of his alien theory, Drosnin reports that Francis Crick, who with James Watson discovered the structure of DNA, believes that DNA also was brought to our world by aliens.

Crick did discuss this idea in his book "Life Itself," published by Simon and Schuster in 1981. However, we found that reading this book after reading Drosnin's was a breath of fresh air.

Crick writes about DNA, which is generally agreed to exist and to be the code of life. He observes that it is not too much of a stretch to imagine that bacteria could one day be carried on a space ship from earth to some other planet where the conditions for life were favorable. Thus we might one day be responsible for establishing life on another planet. But if this is possible for us, it also might have happened the other way around -- an advanced form of life might have developed on another planet before it did on ours and then been transplanted to earth by a space ship. He remarks that his wife regarded this as science fiction, but he at least felt that it made some sense scientifically.

Since there is no scientific evidence for the existence of Bible Codes, Drosnin's suggestion that the Bible Codes key came on the same trip is not very convincing.

DISCUSSION QUESTIONS:

(1) The New York Times listed the original Bible Codes book in the non-fiction category of its best seller list. Do you think that was appropriate? If so, and if Bible Codes II also becomes a best-seller, should it also be listed under non-fiction?

(2) If you did a survey, what proportion of the people in the U.S. would you expect to believe in Bible Codes?




Copyright (c) 2004 Laurie Snell
This work is freely redistributable under the terms of the GNU General Public License published by the Free Software Foundation. This work comes with ABSOLUTELY NO WARRANTY.
