CHANCE News 7.07

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

CHANCE News 7.07

(27 June 1998 to 8 August 1998)

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Prepared by J. Laurie Snell, Bill Peterson and Charles Grinstead, with help from Fuxing Hou, and Joan Snell.

Please send comments and suggestions for articles to
jlsnell@dartmouth.edu.

Back issues of Chance News and other materials for teaching a Chance course are available from the Chance web site:

Chance News is distributed under the GNU General Public License (so-called 'copyleft'). See the end of the newsletter for details.

Our quote comes from the June 1998 issue of FORSOOTH!

===========================================================

Tea accounts for 42 per cent of everything drunk in Britain, except tap water. Every man, women and child over ten drinks 3.4 cups of tea per day.

Britannia Magazine
February 1998

===========================================================

Contents of Chance News 7.07

1. Additions to the Chance web site.
2. Is your ticket a winner? Odds are, it's not.
3. A lottery with no nines.
4. A probability puzzle.
5. Will there be a new home run record this season?
6. Berkeley's toolkit for interactive learning and education.
7. America under the gun.
8. Ask Marilyn: Using coin tossing to explain lottery odds.
9. Increasingly, couples aren't the marrying types.
10. Following Benford's law, or looking out for No. 1.
11. The quantitative leap.
12. The dedication of the Shakespeare's sonnets.
13. Reckonitis: A cognitive deficit of social origin.
14. Probabilistic proofs.

<<<========<<

>>>>>==============>
We had our last Chance Workshop from July 7 to July 11. The participants were great, and some of our articles came from the participants' presentations of a typical Chance class. You can see more details on the workshop and these presentations by going to the Chance web site under "Chance Course".

We have added to our video collection three videos of the visiting speakers at the workshop: "How to Display Data Badly" by Howard Wainer, "Codes in Chance Magazine" by Stephen Samuels and "Bible Codes" by Maya Bar-Hillel. Videos of these lectures can be accessed from the Chance Web site under "Chance Lectures". In addition we have added a short news segment from our local TV on the day of the 250-million-dollar Powerball jackpot.

We have also put on the Chance web site, under "Teaching Aids", the "Proceedings of the 1996 IASE Round Table Conference on Research on the Role of Technology in Teaching and Learning Statistics", edited by Joan Garfield and Gale Burrill. This is a fine set of lectures and discussions with an international cast.

<<<========<<

>>>>>==============>
Here are a few new NPR reports we have added to our Audio series.

They can be accessed from "Teaching Aids".

INFLUENCING THE WEATHER Morning Edition, August 6, 1998

NPR's David Kestenbaum reports on a new study indicating that humans may influence the weather. This study appears in the current issue of Nature and purports to show that, on the East Coast, there is more rain on weekends than on weekdays and that storms are less severe on weekends than on weekdays. The authors suspect that this is caused by pollution. A skeptic doubts that pollution could cause such a difference and offers another explanation. For more information see the discussion on ABC news:

BRITAIN'S BET SHOPS. All Things Considered July 30, 1998

NPR's Michael Goldfarb reports from London on the British mania for betting. In London you can walk into one of 8300 shops and make a bet on just about anything you wish. In the shop Goldfarb visited, you can bet immediately on a sports event, but a bet on whether Clinton will finish his term requires a phone call to Ed Nicholson, who sets the odds. Nicholson says he accepts most any bet. But he did turn down a bet that the world would come to an end as not fair to the proposer.

Perhaps one of our U.K. readers can solve our betting problem. When we were invited to dine at one of the Cambridge colleges, the usual after-dinner betting took place. In discussing the college's betting history, a member said "for example, we can tell you what Darwin bet". Foolishly, we did not say o.k. what did he bet? As a result we have wondered all these years what Darwin bet. Can anyone help?

Powerball! All Things Considered July 28, 1998

An amusing discussion on All Things Considered of what you might and might not be able to buy with your 137 million in cash if you win the July 29 250-million jackpot. You would not be on the list of Forbes Magazine's 400 richest Americans. You would have trouble buying a baseball team, but you would have more than twice Michael Jordan's salary for last year.

Morning Edition July 19, 1998

A report from Iowa, where the Powerball lottery began. Officials claim that they had no idea the prizes would get so big. Hal Stern mentions the possibility of buying enough tickets to cover most numbers, as was successfully done some years ago in the Virginia lottery.

Finally, we mention two web sites that we have added to the Chance web site under "other related web sites"

(1) http://www.bus.utexas.edu/~koehlerj

This is the web site of Jonathan Koehler, Professor of Behavioral Decision Making at the Graduate School of Business and Law School, University of Texas at Austin. Koehler has written a number of articles on the use of DNA in the courts and you can find references to these articles at his homepage. There is an audio report in the AAAS Science Update Series in which Koehler discusses an experiment he carried out relating to how jurors react to the way that DNA evidence is presented. You will also find an amusing correspondence with US News and World Report in which Koehler tries to correct an article they wrote about the reliability of DNA fingerprinting in paternity cases.

(2) Intellectual Capital.com

What is interesting to us about this web site is the column written by Howard Wainer on the use and misuse of graphics. His current column "Scaling the Market" discusses the need to ask why a graph is being displayed. He shows the Dow Jones as displayed by the New York Times with various time scales: minutes, days, years etc. and discusses when each is appropriate. You can find this column and previous columns by choosing "archives" and searching on Wainer.

<<<========<<

>>>>>==============>
Is your ticket a winner? Odds are, it's not
Valley News 29 July 1998, A2
Sarah M. Earle

At the time of the 250-million-dollar Powerball jackpot, most newspapers felt that they had to help the public think about what the 1-in-80 million chance of winning the jackpot really means. We got two calls for help. The first was from our local newspaper, the Valley News. We gave two suggestions for thinking about these odds. The first we learned from Fred Hoppe: if you toss a coin 26 times, your chance of getting 26 heads in a row is greater than your chance of winning the Powerball jackpot. The second we learned from Arnold Barnett in his Chance Lecture "Risk in everyday life". Arnold was talking about helping people understand the chance of being killed on an airplane flight. He estimated that, if you go on a randomly chosen airplane flight, you have a 1-in-7 million chance of being killed. He said his first attempt to explain these odds was to say that this chance was less than the chance of winning the Massachusetts lottery. This, he said, did not work because people think they have a pretty good chance of winning the lottery. After some experimenting, he found more success telling them that they would have to take a randomly chosen airplane flight every day for 19,000 years to have a reasonable chance of being killed in an airplane accident. Based on this, we suggested that you would have to buy a lottery ticket twice a week for 800,000 years to have a significant chance of winning a jackpot.
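These comparisons are easy to check with a little arithmetic. The Python sketch below uses the 1-in-80 million and 1-in-7 million figures quoted above. Reading a "reasonable chance" as "one expected occurrence over the whole period" is our assumption for illustration, not necessarily how Barnett arrived at his figure, and the function name is hypothetical.

```python
POWERBALL_ODDS = 80_000_000  # 1-in-80 million, as quoted in the text
FLIGHT_ODDS = 7_000_000      # Barnett's 1-in-7 million estimate


def years_until_one_expected(odds, tries_per_year):
    """Years of repeated tries until the expected number of successes is 1.

    Assumption: 'a reasonable chance' is read as one expected occurrence.
    """
    return odds / tries_per_year


# Hoppe's comparison: 26 heads in a row versus the jackpot.
p_26_heads = 0.5 ** 26
p_jackpot = 1 / POWERBALL_ODDS

# One flight a day, and two lottery tickets a week.
flight_years = years_until_one_expected(FLIGHT_ODDS, 365)
lottery_years = years_until_one_expected(POWERBALL_ODDS, 2 * 52)

# Chance of at least one win after buying POWERBALL_ODDS independent tickets.
p_at_least_one_win = 1 - (1 - p_jackpot) ** POWERBALL_ODDS

print(f"26 heads in a row: 1 in {1 / p_26_heads:,.0f}")
print(f"Daily flights:     about {flight_years:,.0f} years")
print(f"Two tickets/week:  about {lottery_years:,.0f} years")
print(f"Chance of at least one jackpot by then: {p_at_least_one_win:.2f}")
```

Under this reading, even after the full period the chance of at least one success is only about 1 - 1/e, so "reasonable chance" should not be confused with certainty.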

Our second opportunity for fame came when the TV station WNNE wanted to know if a player's choice of numbers could affect the chance of winning the jackpot. We said no, but it might affect the chance of having to divide the jackpot if you did win. You can view the WNNE discussion of the Powerball lottery.

To know how much choosing your own numbers can help avoid having to split the pot, you have to know something about the numbers people choose. You also need to know what proportion of the people choose their own numbers rather than have the computer choose them. In the Powerball lottery, only about 30% of the players choose their own numbers, while in the UK National Lottery about 80% do. We discussed these issues in our article: "Using the Lottery in Teaching a Chance Course" (on the Chance Web site under "Teaching Aids").

We also learned from reader John Haigh that this problem of what numbers players choose has been studied in some depth by a number of authors. A good summary of this work, and some of John's own work, can be found in his recent article: Statistics of National Lottery, Journal of the Royal Statistical Society, Series A, June 1997, Vol 160. John also told us about a new book: How to Win More, by Norbert Henze and Hans Riedwyl, A.K. Peters, 1998, $15. These authors discuss the numbers lottery players choose based on a huge amount of lottery data. It is a nice succinct book on all aspects of lotteries that holds out no false hopes.

DISCUSSION QUESTIONS:

(1) Dan Rockmore wrote us:

I was watching some news show last night which gave the usual statements about how you are more likely to

(1) get a hole-in-one playing golf than win the P-ball
(2) get struck by lightning than win the P-ball
(3) win 1M at roulette than win P-ball
(4) win 1M in the stock market than win the P-ball

and it occurred to me that all of these statements are nonsense. For the Powerball lottery all players have equal chances. For (1) and (2) these are only sensible as conditional probabilities - they depend on the player, as does (4) sort of. Certainly (3) and (4) depend on the bet. If I have a million dollars and play black or red, or even-odd then sure I have a much better chance of winning 1M.

Do you agree with Dan? How could you make (3) more precise so that it is comparable to winning the jackpot?

(2) How would you describe to your Uncle Joe what his chances are in winning the Powerball lottery?

(3) Powerball officials said they had no idea that the jackpot would get so big and perhaps it was time to think about setting a limit to how big it can be. Do you believe they had no idea how big it would get? Should there be a limit to the size of the jackpot?

(4) John Haigh remarked: It is a nice curiosity that the higher the use of genuine random choice, the more predictable the distribution of the numbers of winners in all categories is! What does he mean by this?

<<<========<<

>>>>>==============>
Beth Chance told us about another amusing lottery story. The Arizona Lottery (www.arizonalottery.com) started a new game "Pick 3" on May 3, 1998. In Pick 3, you choose 3 numbers from 0 to 9. Three ordered winning numbers are picked and prizes awarded depending on how your numbers match the winning numbers. New winning numbers are picked every day of the week except Sunday. By June 9th, after 32 picks, there were still no nines in any of the 96 winning numbers. During this period, a woman whose son was born on September 7th and who always chose 9 0 7 called to complain that something must be wrong since she had not seen any 9's since Pick 3 began. She was assured that all was well, but then it was discovered that the company that supplied the random number generator had indeed provided a program that omitted 9's. The company officials said they were concerned about the 0's since they had not done these before and so only checked numbers from 0 to 8.

About 1.2 million tickets with 9's had been sold during this period. Lottery officials agreed to refund the money of anyone who sent in their tickets. Since few people keep losing lottery tickets, this did not quell the storm. Therefore, for the period July 15 to July 31, lottery officials ran the same game but with all the prizes doubled. They also went back to the old-fashioned method of using numbered balls to determine the winning numbers.
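To get a feeling for how surprising the missing nines were, here is a minimal Python sketch. It assumes that a properly working generator produces each of the ten digits independently with probability 1/10. The 1% threshold used to flag a problem is an arbitrary choice for illustration, and the function names are hypothetical.

```python
DIGITS_PER_DRAWING = 3


def prob_no_nines(n_drawings):
    """Chance that no 9 appears in n_drawings of a fair Pick 3 game."""
    return 0.9 ** (DIGITS_PER_DRAWING * n_drawings)


def first_suspicious_drawing(threshold=0.01):
    """First drawing at which 'no 9 so far' has probability below threshold."""
    n = 1
    while prob_no_nines(n) >= threshold:
        n += 1
    return n


p_32_drawings = prob_no_nines(32)        # the 96 winning digits in the story
flag_after = first_suspicious_drawing()  # with the illustrative 1% threshold

print(f"No 9 in 96 digits: {p_32_drawings:.6f}")
print(f"Below 1% after {flag_after} drawings")
```

Note that the same calculation applies to any single digit, so a careful check would allow for the fact that ten different digits could have gone missing.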

DISCUSSION QUESTIONS:

(1) A ten-sided die has numbers from 0 to 9 on it. What is the probability that, in 96 rolls of the die, no 9 turns up? When do you think the error could reasonably have been detected?

(2) When you play the game, you declare one of four options: Exact Order, Any Order, Front Pair or Back Pair. Three ordered winning numbers are then drawn. If you choose "Exact Order" and your numbers agree with the winning numbers, counting order, you win $500. If you choose "Any Order" and pick three distinct numbers, you win $160 if you match the winning numbers, not counting order. If only 2 of your 3 numbers are distinct, you win $80. If you choose "Front Pair" and your first two numbers match the first two numbers of the winning numbers, counting order, or you choose "Back Pair" and your last two numbers match the last two of the winning numbers, counting order, you win $50.

DISCUSSION QUESTIONS:

(1) Find your expected winning under each choice. Which choices have the highest expected winnings? Were any of the choices favorable games when the prizes were doubled?

(2) The Arizona Pick 3 game, from July 15 to July 31, was essentially a fair game. How much do you think this affected the number of people who played the game?
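For readers who want to check their answers to the expected-winnings question, the following Python sketch enumerates all 1000 equally likely ordered outcomes. The prizes are taken exactly as stated above. The sample picks (9 0 7 and 1 1 2) are arbitrary, and the $1 price of a play is our assumption, since the description above does not give the cost of a ticket.

```python
from itertools import product

TICKET_PRICE = 1.00  # assumption: the price of a play is not stated above

ALL_DRAWS = list(product(range(10), repeat=3))  # 1000 ordered outcomes


def win_probability(matches):
    """Fraction of the 1000 equally likely draws for which matches(draw) is true."""
    return sum(1 for draw in ALL_DRAWS if matches(draw)) / len(ALL_DRAWS)


pick_distinct = (9, 0, 7)  # three distinct digits
pick_pair = (1, 1, 2)      # only two distinct digits

# option name -> (prize as stated in the text, rule for winning)
options = {
    "Exact Order": (500, lambda d: d == pick_distinct),
    "Any Order, 3 distinct": (160, lambda d: sorted(d) == sorted(pick_distinct)),
    "Any Order, 2 distinct": (80, lambda d: sorted(d) == sorted(pick_pair)),
    "Front Pair": (50, lambda d: d[:2] == pick_distinct[:2]),
    "Back Pair": (50, lambda d: d[1:] == pick_distinct[1:]),
}

expected = {
    name: prize * win_probability(rule) for name, (prize, rule) in options.items()
}

for name, value in expected.items():
    print(f"{name:22s} expected prize ${value:.2f}, with doubled prizes ${2 * value:.2f}")
```

Comparing each expected prize with TICKET_PRICE shows which options would be favorable, fair or unfavorable under the assumed price.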

<<<========<<

>>>>>==============>
During the week of our Chance Workshop, the following puzzle was proposed by Will Shortz on NPR's July 5, 1998 Weekend Edition Sunday program.

Here's a mathematical puzzle by Sam Loyd, America's all-time greatest puzzlemaker. His most famous puzzle column began exactly 100 years ago this year in the old New York Journal. This challenge is called "The Puzzled Puzzler", and Loyd wrote: "A letter was received a few days ago from one of our puzzlists who neglected to attach his name and address, and strange to say, the postmark was imperfectly printed, so that only the consecutive letters EST were decipherable. It being known, however, that the letter came either from CHESTER or WESTCHESTER, the question is to determine from a mathematical standpoint the chances in favor of one place or the other." So to paraphrase, the postmark has the consecutive letters EST. This is the only information you have to go on. Is it more likely that the letter came from CHESTER or WESTCHESTER? And what are the respective odds for the two places?

We proposed that participants of the workshop, as a group, submit a solution. Individual participants proposed different solutions. The discussion then turned to which of our solutions the group felt, based on psychological considerations, was most likely to be considered the "right" solution by Loyd. One participant proposed the solution: the probability that the letter is from WESTCHESTER is 2/3 because WESTCHESTER has two EST's and CHESTER has only one. Another said that this would be just what Loyd would be looking for. She offered to accept bets up to a total of $100 that this would be the winning solution. Strangely, she was only able to collect $15 in bets.

Another participant also argued that the answer will be WESTCHESTER but based on the following argument: there are 5 ways of getting 3 consecutive letters out of CHESTER, only one of which is EST, but there are 9 ways of getting 3 consecutive letters out of WESTCHESTER, two of which are EST. Thus the probabilities for CHESTER and WESTCHESTER are 1/5 and 2/9 respectively, making the odds favor WESTCHESTER by a factor of 10 to 9. The person who proposed this solution claimed that Loyd would like the closeness of these odds.
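The counting in this argument can be written out as a short Python sketch. It assumes, as the argument implicitly does, that the two towns are equally likely a priori and that each run of three consecutive letters is equally likely to be the one that survives on the postmark. The function name is hypothetical.

```python
from fractions import Fraction


def fragment_likelihood(name, fragment="EST"):
    """Chance that a randomly chosen run of consecutive letters equals fragment."""
    size = len(fragment)
    windows = [name[i:i + size] for i in range(len(name) - size + 1)]
    return Fraction(windows.count(fragment), len(windows))


like_chester = fragment_likelihood("CHESTER")          # 1 window in 5
like_westchester = fragment_likelihood("WESTCHESTER")  # 2 windows in 9

# With equal prior chances for the two towns, Bayes' rule gives:
posterior_westchester = like_westchester / (like_chester + like_westchester)
odds_for_westchester = like_westchester / like_chester

print(f"P(EST | CHESTER)     = {like_chester}")
print(f"P(EST | WESTCHESTER) = {like_westchester}")
print(f"P(WESTCHESTER | EST) = {posterior_westchester}")
print(f"Odds for WESTCHESTER = {odds_for_westchester}")
```

Changing the prior (for example, weighting the towns by how much mail each sends) would change the answer, which is one reason the workshop participants found it hard to agree.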

We found it easier to make bets than to agree on a solution to submit. Those who bet on the 10 to 9 solution were glad that they did because this was Loyd's solution.

DISCUSSION QUESTION:

How would you solve this problem?

<<<========<<

>>>>>==============>
The question of whether McGwire will establish a new home run record was the topic of one of our workshop presentations. The presenters based it on the following article:

McGwire gets better, and a record looks more vulnerable
New York Times, 9 July, 1998, A1
Buster Olney

This article discusses the chances of the leading contenders establishing a new home run record this year. As of July 9 the contenders were: St. Louis Cardinals' first baseman Mark McGwire with 37 home runs, Seattle Mariners' center fielder Ken Griffey with 35, and Chicago Cubs' right fielder Sammy Sosa with 33. Sixty-two home runs are needed to break Roger Maris' 1961 record of 61 home runs.

McGwire got 37 home runs in the first 80 games. Assuming the same rate per game in the remaining 82 games the article projects that he will end up with 75 home runs. Similar projections predict 64 home runs for both Sosa and Griffey.
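The projection for McGwire is a simple proportion, sketched below in Python. Only McGwire's game count is given above, so we do not try to reproduce the projections for Sosa and Griffey. The function name is hypothetical.

```python
SEASON_GAMES = 162  # 80 games played plus 82 remaining


def project_season_total(home_runs, games_played, season_games=SEASON_GAMES):
    """Project a season total by assuming the same home run rate per game."""
    return home_runs / games_played * season_games


mcgwire_projection = project_season_total(37, 80)
print(f"McGwire projected total: {mcgwire_projection:.1f}")  # 74.9, i.e. about 75
```

The projection treats the first-half rate as if it were the player's true rate, which is exactly the assumption that the regression-to-the-mean discussion calls into question.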

In the workshop presentation, a table was provided showing how players who had 30 or more home runs before the All Star game did after the All Star game. Here is such a table:

Player             Year   Before   After   Total

Reggie Jackson     1969     37       10      47
Frank Howard       1969     34       14      48
Roger Maris        1961     33       28      61
Mark McGwire       1987     33       16      49
Kevin Mitchell     1989     31       16      47
Mike Schmidt       1979     31       14      45
Willie Mays        1954     31       10      41
Ken Griffey        1997     30       26      56
Brady Anderson     1996     30       20      50
Harmon Killebrew   1964     30       19      49
Willie Stargell    1971     30       18      48
Willie McCovey     1969     30       15      45
Willie Stargell    1973     30       14      44
Dave Kingman       1976     30        7      37

You can see from this table that those with a large number of home runs before the All Star game did not do as well after the All Star game. It would appear that this is a classic case of regression to the mean. To test this, it was suggested that we also look at how those who had 30 or more home runs after the All Star game did before it. Here is what we found:

Player             Year   Before   After   Total

Albert Belle       1995     14       36      50
Ralph Kiner        1949     23       31      54
Ralph Kiner        1947     20       31      51
Harmon Killebrew   1962     18       30      48

The regression to the mean effect still seems to apply, but the fact that there are so few examples in this direction suggests that the All Star game may not really be in the middle of the season. Checking this for the particular players and years is difficult, but we can check it to some extent by looking at historical records for current players available on the remarkable CNN-Sports Illustrated sports database. For the three leading contenders we find from their historical records:

Mark McGwire      Games   At Bat   Home runs   At Bat/Home runs
Pre-All Star       769     2592       228            11.4
Post-All Star      583     1977       156            12.7

Sammy Sosa        Games   At Bat   Home runs   At Bat/Home runs
Pre-All Star       638     2429       118            20.6
Post-All Star      450     1582        89            17.9

Ken Griffey Jr.   Games   At Bat   Home runs   At Bat/Home runs
Pre-All Star       662     2485       164            15.2
Post-All Star      562     2108       130            15.6

All three players played significantly more games before the All Star game and so we should certainly be comparing the At Bat/Home runs Ratio when illustrating the regression to the mean effect.

Scott Page, professor of Economics at the University of Iowa, writes on sports under the name Orie Glen. His analysis of McGwire's chances, as of July 15, is available at Sportsjones.

Orie observes that McGwire's record back to 1986 shows a home run on average every 11.58 at bats. As of August 9 McGwire has come to bat 358 times and had 46 home runs, so he has averaged a home run every 7.8 at bats this season. Orie thinks that a home run every 9.1 at bats is a reasonable estimate for McGwire for the rest of the season. He estimates that McGwire gets an average of 3.2 at bats per game. With 47 games remaining, McGwire, using Orie's estimates, should come to bat 150 times and hit 17 more home runs, giving him 63--enough for a record.
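Orie's estimate can be reproduced with the short Python sketch below. It reads the rates quoted above as at bats per home run (358 at bats and 46 home runs give 7.8), and it uses only the figures in the text. The function name is hypothetical.

```python
def projected_total(current_hr, games_left, at_bats_per_game, at_bats_per_hr):
    """Current home runs plus the expected number in the remaining games."""
    remaining_at_bats = games_left * at_bats_per_game
    return current_hr + remaining_at_bats / at_bats_per_hr


season_rate = 358 / 46  # at bats per home run so far this season

# Orie's assumptions: 3.2 at bats per game, one home run every 9.1 at bats.
orie_estimate = projected_total(46, 47, 3.2, 9.1)

# The same method with McGwire's lifetime 3.7 at bats per game.
lifetime_estimate = projected_total(46, 47, 3.7, 9.1)

print(f"At bats per home run this season: {season_rate:.1f}")
print(f"Projected total with 3.2 at bats per game: {orie_estimate:.1f}")
print(f"Projected total with 3.7 at bats per game: {lifetime_estimate:.1f}")
```

The result is sensitive to both assumed rates, so it is worth rerunning the calculation with a range of plausible values rather than a single pair.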

DISCUSSION QUESTIONS:

(1) From McGwire's lifetime record we find that he has come to bat an average of 3.7 times per game. Why do you think Orie chose 3.2? What would Orie's estimate be currently if he used 3.7 at bats per game?

(2) Using Orie's method of estimating the number of home runs McGwire will get, how would you estimate the probability that McGwire will set a new record?
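One possible approach to the second question, offered only as a sketch: treat each remaining at bat as an independent trial with a fixed chance of a home run, so that the number of additional home runs is binomial. The independence and constant-rate assumptions are ours, not Orie's, and the figures (46 home runs so far, 47 games left, 3.2 at bats per game, one home run every 9.1 at bats) are read from the text above.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) when X is binomial with n trials and success chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


RECORD_TO_BEAT = 61
current_hr = 46
needed = RECORD_TO_BEAT + 1 - current_hr  # home runs still needed for 62
remaining_at_bats = round(47 * 3.2)       # about 150 at bats
p_home_run = 1 / 9.1                      # assumed chance per at bat

prob_record = prob_at_least(needed, remaining_at_bats, p_home_run)
print(f"Needs {needed} more in {remaining_at_bats} at bats")
print(f"Estimated chance of a new record: {prob_record:.2f}")
```

A fuller answer would also account for uncertainty in the assumed rate itself, which this sketch ignores.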

<<<========<<

>>>>>==============>
John Emerson recommended the following link to the Computing and Graphics Newsletter, which is jointly produced by the Computing and Statistical Graphics Sections of the American Statistical Association.

Statistical Computing & Graphics

The Summer 1998 edition has a special feature article entitled "Interactive Education: A Framework and Toolkit". This article describes a project initiated at Berkeley called TILE (Toolkit for Interactive Learning and Education). The authors describe their pedagogical approach as motivated by the Freedman, Pisani and Purves "Statistics" text.

There is an interesting lab proposed based upon students using studies as reported in the news. Further information on the TILE project is available at A Toolkit for an Interactive Learning Environment (TILE)

(By the way, while you are browsing the Newsletter, you should also have a look at a nice article on Mosaic Displays in S-PLUS by John Emerson!)

<<<========<<

>>>>>==============>
America under the gun
Time, 6 July 1998, pp. 34-63.

This issue of Time Magazine features a special report on America's "gun culture". While crime rates are down overall, the school killings over the last year have focused attention on gun violence. Is easy availability of guns the culprit? Or is this just one more attempt to argue for restrictions on the constitutional right to bear arms? Various sections in this report portray a broad range of views.

Of particular interest for a Chance course is the section "Should you carry a gun? A new study argues for concealed weapons" (p. 48, by Romesh Ratnesar). This is a review of a controversial new book "More Guns, Less Crime: Understanding Crime and Gun Control Laws", written by University of Chicago economist John Lott (University of Chicago Press, 1998).

Lott's thesis is that criminals prey on the unprepared, so that allowing citizens ready access to guns is essential for deterring crime. He sees an analogy with the substitution effect in economics. When the price of apples rises relative to oranges, people will buy fewer apples and more oranges. Armed citizens represent a higher cost target for criminals. Lott favors allowing citizens to carry concealed weapons, on the grounds that leaving criminals unsure of who is armed results in greater protection for all. Such effects are known as "external benefits" in the language of economics.

The dust cover to the book touts Lott's study as "the most rigorously comprehensive data analysis ever done on crime". He begins with a comparison of crime rates in states with and without concealed handgun laws. But he finds considerable variability within states, and reports substantial county by county differences in how easy it actually is to get a gun permit. His analysis is intended to show that not only do the average crime rates fall, when concealed handgun laws are adopted, but furthermore that counties with the greatest increase in permits experience the largest reductions in crime.

Lott also disputes the conventional wisdom that more gun ownership leads to law-abiding citizens harming themselves. He criticizes what he calls "possibly the best known paper" in this area: Kellermann et al., "Gun Ownership as a Risk Factor for Homicide in the Home", New England Journal of Medicine, 7 Oct 1993, pp. 1084-91. This article reported that the presence of a gun in the house was strongly associated with risk of homicide. This finding was based on a case control study in three counties, which compared 444 victims of homicides in the home with 388 individuals who lived near the victims and were matched on sex, race and age. Gun ownership data was then ascertained for both groups, which led to the association described above. Disputing Kellermann's methodology, Lott asks the reader to consider an analogous (hypothetical) study on hospital care. He considers contacting the relatives of all residents of a particular county who died during a given year and asking whether the deceased had been admitted to a hospital during the year. Next, one would assemble a control sample matched on sex, race, age and neighborhood, and ask whether these people had been in a hospital during the previous year. Lott says we should expect to find a strong positive association between having visited the hospital and dying, but this would never be construed as evidence that hospitalization was killing people.

There are many other issues for discussion raised in the book. For an indication of the kind of debate it has stirred up, you can find a transcript of an on-line discussion from the TIME.com web site.

Here Lott squares off with Douglas Weil, director of research for Handgun Control, Inc. and the Center To Prevent Handgun Violence. Weil cites criticisms of Lott's work by criminologist Gary Kleck. Chance magazine readers may remember Kleck as the co-author of the article "The Myth of Millions of Annual Self-defense Gun Uses: A Case Study of Survey Overestimates of Rare Events" in the Summer 1997 issue (see also Chance News 6.13 for discussion of this article). In the on-line discussion, Weil disputes Lott's survey data showing increases in gun ownership. He maintains that other factors are responsible for the reported decreases in crime rates.

DISCUSSION QUESTIONS:

(1) What do you think of Lott's analogy between gun ownership and hospitalization?

(2) Lott asks "would removing all guns successfully discourage crimes because criminals would find knives and clubs poor alternatives?" He suggests that in this case the weakest members of society might find it even more difficult to defend themselves. Turning this around, suppose that everyone carries a concealed weapon. What methods or activities do you suppose criminals would "substitute"?

<<<========<<

>>>>>==============>
One of Marilyn vos Savant's readers also thought of using the coin example to explain the odds in a lottery.

Ask Marilyn
Parade Magazine, 12 July 1998, p10
Marilyn vos Savant

Reader Robert Gannett writes:

I know that each time you flip a coin, there's a fifty-fifty chance of it landing heads (or tails). The odds quoted for a state lottery are often around a million to one. Equating this to a coin flip, how many times must a coin come up heads consistently to equate to the odds of a million to one?

"Consistently" here is taken to mean "consecutively."

Marilyn answers:

Surprisingly, only 20. But this doesn't show how "easy" it is to win a lottery. Instead, it shows how hard it is to get heads (or tails) consistently. Try it and see. Even if every one of PARADE'S 82 million readers flips a coin 20 times we would expect only 82 of them to get a consistent string of 20 heads.

DISCUSSION QUESTIONS:

(1) Do you understand Marilyn's comments about "easy" vs. "hard"?

(2) What number of consecutive tosses would correspond to a thousand-to-one odds? a hundred thousand to one? Does the coin toss analogy help you appreciate the magnitudes?
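The conversion between lottery odds and runs of heads is just a base 2 logarithm, as in the Python sketch below. The function name is hypothetical, and the 82 million readership figure is the one Marilyn quotes.

```python
import math


def heads_in_a_row_for(odds_against):
    """Smallest n for which n heads in a row is at least as unlikely as 1 in odds_against."""
    return math.ceil(math.log2(odds_against))


for odds in (1_000, 100_000, 1_000_000):
    n = heads_in_a_row_for(odds)
    print(f"1 in {odds:>9,}: {n} heads in a row (1 in {2 ** n:,})")

# Expected number of PARADE readers who would get 20 heads in 20 flips.
expected_readers = 82_000_000 / 2 ** 20
print(f"Expected readers with 20 straight heads: {expected_readers:.0f}")
```

Because 2 to the 20th power is 1,048,576 rather than exactly a million, the expected number of readers comes out slightly below Marilyn's 82.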

<<<========<<

>>>>>==============>
Increasingly, couples aren't the marrying types
The Boston Globe, 27 July 1998, A3
Barbara Vobejda, Washington Post

In 1960, the Census Bureau estimated that fewer than half a million couples lived together outside of marriage. The total passed 1 million in 1978, 2 million in 1986 and 3 million in 1991. The Bureau's newly released report on "Marital Status and Living Arrangements" estimates that as of March, 1997 the number was 4.13 million, passing the 4 million mark for the first time. The report is updated annually; the estimate for 1996 was 3.96 million.

The article says that the Census Bureau attributed the "steady increase" to the tendency of young people to delay marriage. Later, it is reported that the average age of women at first marriage was 25, up from 24.8 last year. But men's average age at first marriage was 26.8, down from 27.1 last year.

DISCUSSION QUESTIONS:

(1) How would you describe the trend in the number of couples living together outside of marriage. What other data would you want to have?

(2) Are you convinced that delaying marriage is responsible for the trend? What else would you need to know?
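As a starting point for describing the trend, the figures quoted in the article can be turned into rough rates of change, as in the Python sketch below. Treating the years in which the total "passed" each million as if the count were exactly at that level is our simplification, and the function name is hypothetical.

```python
# (year, millions of unmarried couples), as quoted in the article
milestones = [(1978, 1.0), (1986, 2.0), (1991, 3.0), (1997, 4.13)]


def average_annual_increase(start, end):
    """Average yearly increase, in millions of couples, between two data points."""
    (year0, count0), (year1, count1) = start, end
    return (count1 - count0) / (year1 - year0)


for start, end in zip(milestones, milestones[1:]):
    rate = average_annual_increase(start, end)
    print(f"{start[0]}-{end[0]}: about {rate:.3f} million more couples per year")

# Relative change between the two most recent annual estimates.
growth_1996_to_1997 = (4.13 - 3.96) / 3.96
print(f"1996 to 1997: {100 * growth_1996_to_1997:.1f}% increase")
```

Since these are survey estimates, one would also want their margins of error, and the size of the adult population in each year, before calling the increase "steady".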

<<<========<<

>>>>>==============>
Frank Winkler suggested the following article:

Following Benford's law, or looking out for No. 1
The New York Times, 4 August 1998, F4
Malcolm W. Browne

This article is based on:

The first digit phenomenon
American Scientist, Vol. 86, July-August 1998, 358-363
T.P. Hill

The article begins with a description of a classroom experiment conducted by Professor Theodore Hill of Georgia Tech. For homework, Hill asks those students whose mother's maiden name begins with A through L to flip a coin 200 times and record the results and the rest of the students to imagine the outcomes of 200 flips and write them down. We first learned of this experiment in an article by Mark Schilling [The longest run of heads. The College Mathematics Journal 21 (1990), 196-207]. The Times article points out that, when a fair coin is tossed 200 times, the odds are overwhelming that there will be a run of at least six consecutive heads or tails somewhere in the sequence; following Schilling's algorithm, we find that the actual probability is about 0.965. Most people find this result surprising. Thus, in a class experiment, an instructor can achieve a high rate of success at detecting the fraudulent sequences by flagging those which fail to contain a run of length six or greater.
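The probability of a long run can be computed exactly by keeping track of the length of the current run. The Python sketch below is our own small dynamic program for a fair coin, not a transcription of Schilling's algorithm, and the function name is hypothetical. It should agree with the value of about 0.965 quoted above.

```python
def prob_run_at_least(n_tosses, run_length):
    """Chance that n_tosses of a fair coin contain a run (of heads or of
    tails) of at least run_length consecutive equal outcomes."""
    if run_length <= 1:
        return 1.0 if n_tosses >= 1 else 0.0
    if n_tosses < run_length:
        return 0.0
    # dist[j] = P(no long run so far and the current run has length j)
    dist = [0.0] * run_length
    dist[1] = 1.0  # after the first toss the current run has length 1
    for _ in range(n_tosses - 1):
        new_dist = [0.0] * run_length
        for j in range(1, run_length):
            half = dist[j] / 2
            new_dist[1] += half          # outcome changes: a new run starts
            if j + 1 < run_length:
                new_dist[j + 1] += half  # outcome repeats: the run grows
            # otherwise the run reaches run_length and this mass is dropped
        dist = new_dist
    return 1 - sum(dist)


p_run_of_six = prob_run_at_least(200, 6)
print(f"P(run of 6 or more in 200 tosses) = {p_run_of_six:.3f}")
```

Calling the function with other run lengths, for example prob_run_at_least(200, 7), gives a feeling for how long a run a student's sequence "should" contain.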

This example is not an application of Benford's law but rather an example to show that people have a hard time acting "randomly". Hill presents this example after Benford's law has been explained, where it makes sense in relation to applications of Benford's law.

The Times article then states that Hill and others have successfully applied a statistical phenomenon known as Benford's Law to detect problems ranging from fraud in accounting and tax data to bugs in computer output. The law is named for Dr. Frank Benford, who was a physicist at General Electric and in 1938 verified it in many data sets. In fact, as Hill's article explains, what is called Benford's law was discovered in 1881 by the famous astronomer Simon Newcomb. Newcomb noticed that tables of logarithms showed greater wear on the pages for lower leading digits. Apparently, the calculations people were performing involved relatively more numbers with lower leading digits. Naive intuition might suggest that leading digits should be distributed uniformly. But Newcomb conjectured instead that, for many collections of numbers, the chance of leading digit d is given by the base 10 logarithm of (1 + 1/d) for d = 1,2,...,9. Thus the chance of a leading 1 is not one in nine, but rather log(2) = .301--nearly one in three! This has come to be called Benford's Law.
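Newcomb's formula is easy to tabulate. The Python sketch below prints the chance of each leading digit next to the uniform value of 1/9 that naive intuition suggests. The function name is hypothetical.

```python
import math


def benford_probability(d):
    """Chance of leading digit d under Benford's law: log10(1 + 1/d)."""
    return math.log10(1 + 1 / d)


benford = {d: benford_probability(d) for d in range(1, 10)}

print("digit  Benford  uniform")
for d, p in benford.items():
    print(f"  {d}     {p:.3f}    {1 / 9:.3f}")
```

The nine probabilities add up to 1 because the sum of the logarithms is the logarithm of the product (2/1)(3/2)...(10/9) = 10.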

Benford's law has been empirically verified for a wide range of data sets, including the numbers on the front page of the New York Times, tables of molecular weights of compounds, and random samples from a day's stock quotations. Dr. Mark Nigrini, an accounting consultant now at Southern Methodist University, wrote his Ph.D. dissertation on using Benford's distribution to detect tax fraud. He recommended auditing returns on which the distribution of digits failed to conform to Benford. In a test on data from Brooklyn, his method correctly flagged all cases in which fraud had been admitted. We discussed Nigrini's work in Chance News 4.10 and there you can see his graph indicating that the data from 13 years of Clinton's income tax records fit Benford's distribution very well suggesting that Clinton did not make up numbers on his income tax.

Nigrini points out that the test is not infallible. For example, analyses of corporate accounting data often turn up too many 24's, apparently because business travelers have to produce receipts for expenses of $25 or more. He also notes that the law won't help you pick lottery numbers. He says "the balls are not really numbers; they are labeled with numbers, but they could just as easily be labeled with name of animals. The numbers they represent are uniformly distributed."

DISCUSSION QUESTIONS:

(1) Do you understand the distinction between balls in the lottery actually being numbers vs. merely being labeled with numbers?

(2) The discoverers of Benford's law said that it should be satisfied for "natural" data. What do you think was meant by that?

(3) See if the law holds for the leading digits of the first 20 powers of 2.

(4) Hill says that it has been proven that Benford's distribution is the only distribution that is scale invariant. What does this mean?
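One way to get started on question (3) is to tally the leading digits by machine. This is only a sketch, and it assumes that "the first 20 powers of 2" means 2^1 through 2^20; the names used are our own.

```python
import math
from collections import Counter

# Leading digit of 2^n for n = 1, ..., 20 (assumed reading of the question).
leading_digits = [int(str(2 ** n)[0]) for n in range(1, 21)]
counts = Counter(leading_digits)

# Compare observed proportions with Benford's log10(1 + 1/d).
for d in range(1, 10):
    observed = counts[d] / len(leading_digits)
    expected = math.log10(1 + 1 / d)
    print(d, counts[d], round(observed, 2), round(expected, 2))
```

With only 20 numbers the agreement can be rough at best; changing the range to include more powers lets you see how the fit changes.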

<<<========<<

>>>>>==============>
The quantitative leap
New York Times, 21 July, 1998
Joseph Kahn

This article discusses the success of mathematicians and physicists who have gone to work in the stock market to try to make money using mathematical models for the movements of stocks. Such people are called quants. One such quant is David Shaw, a computer scientist who left academia 20 years ago to become a Wall Street trader. Shaw himself has been successful and heads his own company that employs some 1,050 people. They spend their time trying to locate and trade on price anomalies among two or more securities, called inefficiencies, before the market corrects the imbalance. According to the article, the company trades shares in volumes that can account for more than 5 percent of the daily turnover of the New York Stock Exchange.

Shaw believes that each time their computers identify an inefficiency, the chances of making money on it are only marginally better than coin flipping. This is in contrast to the high hopes that people had for mathematical models and new theories such as chaos. Whatever secrets their computers uncover are short-lived, as the market also discovers them. Shaw says that a few quants have done well but the majority have been forced out.

Eric Sorensen, head of quantitative research for Salomon Smith Barney, claims that this kind of "statistical arbitrage" works well in a stable market, but when the market begins to fall apart you can lose a lot of money.

Some say that it is only a matter of time until the mathematicians figure it all out but others say that the lack of real progress is just another indication that the market is really a random walk or something very close to it.

<<<========<<

>>>>>==============>
The dedication of Shakespeare's sonnets

Elizabethan Review, autumn 1997, p. 93 to 122
John M. Rollett

This article has provoked a controversy on the internet somewhat like that of the Bible codes. In his article, Rollett claims to have found a cryptogram in the dedication to Shakespeare's Sonnets that he thinks sheds some light on the Shakespeare authorship problem. This dedication is:

TO.THE.ONLIE.BEGETTER.OF.

THESE.INSVING.SONNETS.

Mr.W.H. ALL.HAPPINESSE.

AND.THAT.ETERNITIE

PROMISED.

BY.

OVR.EVER-LIVING.POET

WISHETH.

THE.WELL-WISHING.

ADVENTVRER.IN.

SETTING.

FORTH.

T.T.

The T.T. stands for the publisher Thomas Thorpe. However, the identity of Mr. W.H. has been a mystery. In part I of his paper Rollett purports to solve this mystery by showing that it is Henry Wriothesley, 3rd Earl of Southampton. Wriothesley is regarded by many commentators as the "onlie begetter" and the young man to whom many of the sonnets are addressed.

Rollett displays the 144-letter dedication (omitting the initials T.T.) in a rectangle with 8 rows of 18 letters:

t o t h e o n l i E b e g e t t e r

o f t h e s e i n S v i n g s o n n

e t s m r w h a l L H a p p i n e s

s e a n d t h a t E T e r n i t i e

p r o m i s e d b Y O v r e v e r l

i v i n g p o e t w I s h e t h t h

e W e l l w i s h i n g a d v e n t

v R e r i n s e t t i n g f o r t h

Here he found WRIOTHESLEY broken up into three parts WR-IOTH-ESLEY in columns 2, 11, and 10 reading down, up, and then down. Rollett found further verification of the cryptogram when he found HENRY along a diagonal in a similar rectangle with 9 rows of 16 characters.
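Readers who want to check the layout themselves can rebuild the rectangles by machine. The sketch below is ours, not Rollett's procedure: it assumes the dedication is reduced to its 144 letters with punctuation and the initials T.T. dropped, and it uses the 18-column layout displayed above. The helper names grid and column are hypothetical.

```python
# The dedication as plain letters; punctuation and "T.T." are left out (assumption).
DEDICATION = (
    "TO THE ONLIE BEGETTER OF THESE INSVING SONNETS MR W H ALL HAPPINESSE "
    "AND THAT ETERNITIE PROMISED BY OVR EVERLIVING POET WISHETH "
    "THE WELL WISHING ADVENTVRER IN SETTING FORTH"
)
letters = DEDICATION.replace(" ", "")


def grid(text, width):
    """Break text into consecutive rows of the given width."""
    return [text[i:i + width] for i in range(0, len(text), width)]


def column(rows, col):
    """Read column col (counting from 1) from top to bottom."""
    return "".join(row[col - 1] for row in rows)


rows = grid(letters, 18)
for row in rows:
    print(" ".join(row))

wr = column(rows, 2)[6:8]            # rows 7-8 of column 2, reading down
ioth = column(rows, 11)[2:6][::-1]   # rows 3-6 of column 11, reading up
esley = column(rows, 10)[0:5]        # rows 1-5 of column 10, reading down
print(wr, ioth, esley)

# In a 16-column rectangle, stepping 15 letters at a time runs down a diagonal.
henry = letters[21:82:15]
print(henry)
```

Trying other widths in grid(letters, width) is a natural way to explore how many other rectangles one might have searched.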

In the second part of his article Rollett argues that another part of the cryptogram identifies the author of the Sonnets. He observes that the dedication is written in the form of three triangles with 6, 2 and 4 rows respectively. This suggests looking at the 6th, 2nd, 4th, 6th, 2nd, 4th words, etc. Doing this he obtained the phrase:

THESE SONNETS ALL BY EVER

Thus Rollett argues that the author is Edward de Vere, 17th Earl of Oxford, one of the leading contenders to have been the real author of the Shakespeare works. He notes that the 6 2 4 code used is also suggested by the number of letters in the three parts of the name Edward de Vere.
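The 6 2 4 rule is easy to try out by machine. The sketch below makes assumptions that are ours rather than Rollett's stated ones: Mr. W. H. is counted as three words, and the hyphenated compounds EVER-LIVING and WELL-WISHING are each counted as two words. The function name pick_words is hypothetical.

```python
from itertools import cycle

# Word list under the tokenization assumptions described above.
WORDS = (
    "TO THE ONLIE BEGETTER OF THESE INSVING SONNETS MR W H ALL HAPPINESSE "
    "AND THAT ETERNITIE PROMISED BY OVR EVER LIVING POET WISHETH "
    "THE WELL WISHING ADVENTVRER IN SETTING FORTH"
).split()


def pick_words(words, steps):
    """Take the steps[0]-th word, then the steps[1]-th word after it, and so on,
    cycling through steps until the word list runs out."""
    picked = []
    position = -1  # index of the last word taken; -1 means none yet
    for step in cycle(steps):
        position += step
        if position >= len(words):
            break
        picked.append(words[position])
    return picked


selected = pick_words(WORDS, (6, 2, 4))
print(" ".join(selected))
```

Under these assumptions the rule does not stop after five words, and changing the tokenization changes the output, which bears on the question of whether the key is specified unambiguously.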

Rollett estimates the probability of his finding these names by chance to be in the area of 1 in 10 billion.

Of course the real problem is to decide if it makes sense to try to assign a probability for these findings just on the information given in his article, and, if so, how do you do it? A fine question for a Chance class discussion!

DISCUSSION QUESTIONS:

(1) If you were a Bayesian, how would your a priori probability that the Bible contained information about the future compare with your a priori probability that the author of the dedication of the Sonnets would use a cryptogram to indicate who he was and the person to whom he dedicated the sonnets?

(2) Do you think you have enough information to estimate the probabilities of obtaining these results by chance? If not, what other information would you want?

(3) The famous cryptologists William and Elizebeth Friedman, in their book "The Shakespearean Ciphers Examined", Cambridge University Press, 1957, state that "the plain-text solution must make sense; it must be grammatical and it must mean something." They also say that "the decipherer must be told unambiguously, either in the message itself or in some other way, which key is actually being used; and unlike the encipherer he must not be allowed to exercise his judgment at all." Does this cipher fit these requirements? What is the key?

<<<========<<

>>>>>==============>
Our next article was suggested by Norton Starr.

Reckonitis: A cognitive deficit of social origin
Perspectives in Biology and Medicine, Spring 1998, pp. 349-358
Mikel Aickin

Aickin proposes a new class of diseases that he calls "epidiseases", which are afflictions of those who study diseases. An example is "reckonitis", which is the inability to carry out the computations of disease incidence or probability. In this paper Aickin examines one particular manifestation of reckonitis: the inability to compute the lifetime risk of breast cancer.

Aickin starts with examples where he feels that reckonitis is present: scientific journals (like Science), advocacy groups (like the Women's Health Letter), newspaper columnists (like Ann Landers), neurologists (like William Landau), and popular health writers (like James Walsh). Aickin provides quotations from these sources and explains what is wrong with each one. For example, from Walsh's book "Cancermania", Aickin provides the following quote relating to the infamous Cancer Society claim that 1 woman in 9 will suffer from breast cancer:

Technically, the Cancer Society's 1-in-9 number is misleading because it is an incidence rate. Most people confuse the incidence rate with a mortality rate--which it isn't. Even accepting the Cancer Society's questionable assumptions, the mortality risk for breast cancer would be 1 in 28.

Aickin comments:

This confabulation confuses a lifetime probability with an incidence rate (a factual mistake) but places it next to a correct statement (a breast cancer incidence rate is not a breast cancer mortality rate). The author then goes on to assert an undocumented "mortality risk" which is completely undefined even in a technical sense, although it is again true that a woman's lifetime breast cancer mortality probability is about 1 in 28.

After similar complaints about other writers, Aickin goes on to show how easy it is to compute the one-year probability of breast cancer incidence and mortality, given survival to each age, and, from these, to find the lifetime probability of incidence and mortality of breast cancer, given survival to each age. We include, in the web version of this Chance News, his graphs exhibiting these probabilities.
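To give a feel for the kind of arithmetic involved, here is a minimal sketch of one standard way to turn one-year probabilities into a lifetime probability. It is not necessarily Aickin's exact calculation, and the inputs in the usage lines are made-up toy values, not real breast cancer data. The function name and its parameters are hypothetical.

```python
def lifetime_probability(incidence, other_exit):
    """Probability of ever being diagnosed, starting from the first listed age.

    incidence[i]  : one-year probability of diagnosis, given that the woman
                    is alive and undiagnosed at the start of year i
    other_exit[i] : one-year probability of dying of something else first
    Both lists are assumed to be supplied by the user, one entry per year.
    """
    still_at_risk = 1.0  # probability of reaching year i alive and undiagnosed
    total = 0.0
    for q, d in zip(incidence, other_exit):
        total += still_at_risk * q
        still_at_risk *= 1.0 - q - d
    return total


# Toy illustration only: two years, 1% incidence each year, no other deaths.
toy = lifetime_probability([0.01, 0.01], [0.0, 0.0])
print(round(toy, 4))
```

To get the probability "given survival to each age", call the function on the portions of the two lists that start at that age.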

Aickin comments that the calculations are quite easy and could be understood and carried out by a high school student with a $10 calculator and a little knowledge of probability. He suggests that this form of reckonitis could be cured by simply teaching basic probability in high school or early college.

DISCUSSION QUESTIONS:

(1) Do you understand Aickin's criticism of Walsh's remarks? Do you agree with it?

(2) Sometimes it is said that women interpret the Cancer Society statement as meaning that 1 in 9 of the women in their bridge club will get breast cancer. Is the probability of a lifetime incidence of breast cancer the same as the probability that a randomly selected woman from those who do not have breast cancer will develop breast cancer during her lifetime?

(3) Do you agree that this kind of reckonitis could be cured by teaching basic probability in high school?

<<<========<<

>>>>>==============>
Probabilistic proofs
Mathematical Intelligencer, Vol. 20, No. 3, 1998
Alexander Shen

This article provides a number of examples of ways to use probability in proving results that do not seem to involve probability. Here are three of Shen's examples that we found interesting, in increasing order of difficulty.

(1) It is known that oceans cover more than one half of the earth's surface. Prove that there are two symmetric points covered by water.

To prove this, choose a point x on the surface of the earth at random, and denote by x' the point antipodal to x. Then P(x is in the ocean) is greater than 1/2, and P(x' is in the ocean) = P(x is in the ocean). But P(x or x' is in the ocean) = P(x is in the ocean) + P(x' is in the ocean) - P(x and x' are in the ocean). If the last term were zero, P(x or x' is in the ocean) would be greater than 1, which is impossible. Thus the probability that both x and x' are in the ocean is greater than zero, so there must be at least one point x for which this is true.

(2) A sphere is colored in two colors: 10% of its surface is white, the remaining part is black. Prove that there is a cube inscribed in the sphere such that all its 8 vertices are black.

This is proved by a method similar to (1). Choose a random inscribed cube. The probability that any one corner is white is .1, so the probability that at least one corner is white is at most 8 x .1 = .8. Thus the probability that no corner is white is at least .2, so there must be such inscribed cubes.
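The bound can be illustrated by simulation for one particular coloring. The sketch below assumes the white region is a single spherical cap covering 10% of a unit sphere; the problem allows any white region, so this is only one case. Rather than rotating the cube, it fixes the cube and places the cap's center uniformly at random, which is equivalent by symmetry. All names are our own.

```python
import math
import random

# Vertices of a cube inscribed in the unit sphere.
S = 1 / math.sqrt(3)
VERTICES = [(sx * S, sy * S, sz * S)
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]

# A cap {p : p . u > 0.8} covers (1 - 0.8) / 2 = 10% of the unit sphere.
CAP_COSINE = 0.8


def random_unit_vector(rng):
    """Uniform random point on the unit sphere (normalized Gaussian vector)."""
    x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    norm = math.sqrt(x * x + y * y + z * z)
    return (x / norm, y / norm, z / norm)


def all_black_fraction(trials, seed=0):
    """Estimate the chance that no cube vertex falls in the white cap."""
    rng = random.Random(seed)
    black = 0
    for _ in range(trials):
        u = random_unit_vector(rng)
        if all(u[0] * v[0] + u[1] * v[1] + u[2] * v[2] <= CAP_COSINE
               for v in VERTICES):
            black += 1
    return black / trials


estimate = all_black_fraction(50000)
print(estimate)
```

Apart from sampling error, the estimate should not fall below .2, the lower bound given by the argument above.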

These two are considered toy examples by the author, and the following is an example from his non-toy examples.

(3) A piece of paper has area 10 square centimeters. Prove that it can be placed on the integer grid (the side of whose square is 1 cm) so that at least 10 grid points are covered.

The author suggests the following proof:

Place a piece of paper randomly on the grid. The expected number of grid points covered by it is proportional to its area (because this expectation is an additive function). Moreover, for big pieces the boundary effects are negligible, and the number of covered points is close to the area. So the coefficient is 1, and the expected number of covered points is equal to the area. If the area is 10, the expected number of points covered is 10, so there must be at least one way to put the paper down to cover at least 10 points.
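The claim about the expected number of covered points can be checked numerically for a particular shape. The sketch below assumes the paper is a disk of area 10, which is our choice for convenience since a disk needs no random rotation; only its center is placed at random within one grid square. The function names are hypothetical.

```python
import math
import random

AREA = 10.0
RADIUS = math.sqrt(AREA / math.pi)  # disk of area 10 (assumed shape)


def covered_points(cx, cy, radius=RADIUS):
    """Count integer grid points within distance radius of (cx, cy)."""
    reach = math.ceil(radius) + 1
    count = 0
    for i in range(math.floor(cx) - reach, math.floor(cx) + reach + 1):
        for j in range(math.floor(cy) - reach, math.floor(cy) + reach + 1):
            if (i - cx) ** 2 + (j - cy) ** 2 <= radius ** 2:
                count += 1
    return count


def simulate(trials, seed=0):
    """Drop the disk at random positions; return the mean and maximum count."""
    rng = random.Random(seed)
    counts = [covered_points(rng.random(), rng.random()) for _ in range(trials)]
    return sum(counts) / trials, max(counts)


mean_count, max_count = simulate(20000)
print(mean_count, max_count)
```

The average should come out close to 10, and since the counts are whole numbers, some placement must then cover at least 10 points.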

John MacKey and John Lamperti provided us with more direct proofs involving the theorem on changing the order of integration. These solutions, while more convincing, are a bit technical for Chance News. We will put John Lamperti's proof on the web version of this Chance News.

DISCUSSION QUESTIONS:

(1) In problem (2), we are given a sphere and need to choose an inscribed cube at random. How do we do this?

(2) Give a proof of (3) that convinces you.

This work is freely redistributable under the terms of the GNU General Public License as published by the Free Software Foundation. This work comes with ABSOLUTELY NO WARRANTY.
