CHANCE News 6.12

(10 October 1997 to 9 November 1997)


Prepared by J. Laurie Snell and Bill Peterson, with help from Fuxing Hou and Joan Snell, as part of the Chance Course Project supported by the National Science Foundation.

Please send comments and suggestions for articles to jlsnell@dartmouth.edu.

Back issues of Chance News and other materials for teaching a Chance course are available from the Chance web site:


Note: We got Bible code fever and fell behind. We will send out part 2 of Chance News 6.12 later.


DNA neither knows nor cares. DNA just is.
And we dance to its music.
Richard Dawkins
River out of Eden p. 133

Contents of Chance News 6.12 Part 1

Chance News 6.12 Part II

We are pleased to announce the (first?) "Chance Lectures"
at Dartmouth College, December 12 and 13, 1997.

The Chance project is devoted to the use of current news items in the teaching of probability and statistics. The goal of this lecture series is to bring together experts in a variety of subject areas that regularly appear in Chance News. We've asked the speakers to present the basic ideas of their subjects in a manner which will be accessible to an inquisitive and intelligent audience with "newspaper knowledge" of their area. The lectures will eventually be put on the Chance web site and are intended to be an educational resource.

This Lecture Series is supported in part by the National Science Foundation.

Schedule for Chance Lectures

Bob Hayden writes:

Your Chance acquaintances may be interested to hear that I found the hard cover version of Tainted Truth for $4 in a Book Warehouse. They had MANY copies. Presumably this means it is out in paperback, which may also be of interest.


From our long dormant Chance Discussion listserv we received the following amusing contribution. On the Oct. 13 Nightline, Ted Koppel was discussing the various natural disasters that were being predicted because of El Nino. While he was questioning Dr. Richard Andrews, director of the California Office of Emergency Services, we heard:

TED KOPPEL: Dr. Andrews, I'm sure you have heard such cautionary advice before so on what basis is the assumption being made that this is the one that's going to have the kind of impact on southern California in particular that's being predicted?

RICHARD ANDREWS: Well, in the business that I'm in and that local government and state government is in, which is to protect lives and property, we have to take these forecasts very seriously. We have a lot of forecasts about natural hazards in California and we have a lot of natural events here that remind us that we need to take these forecasts seriously. I listen to earth scientists talk about earthquake probabilities a lot and in my mind every probability is 50-50, either it will happen or it won't happen. And so we're trying to take the past historical record, our own recent experience of the last, two of the last three years and make the necessary preparedness measures that can help protect us as much as we can from these events.


Referring to our article in Chance News 6.11 on the sex-ratio, Maya Bar-Hillel suggested that readers would enjoy Richard Dawkins's discussion of R. A. Fisher's theory of how changes in the sex-ratio occur. This appears in Dawkins's book "The Garden of Eden" (Basic Books, 1995) in Chapter 4 entitled "God's utility function". Maya said that it is "fabulous reading" and we agree.

Maya also sent us an English translation of the following article on the Bible codes that appeared in Galileo, a Hebrew science journal. You will be able to obtain this translation from the web version of this issue of Chance News.

There are codes in War and Peace too.
Galileo, vol. 24, November-December 1997
Maya Bar-Hillel, Dror Bar-Natan, Brendan McKay

This article reports on new work related to the Witztum, Rips and Rosenberg (WRR) paper on Bible codes (also called Torah codes) that appeared in Statistical Science 1994, Vol. 9, No. 3, pp. 429-438 (Chance News 6.07). We shall refer to the authors of the Galileo article as BBM and the authors of the Statistical Science article as WRR. WRR reported on the following experiment.

The names of 34 famous Rabbis, born long after Genesis was written, were chosen from an encyclopedia of famous Rabbis. For each Rabbi, WRR chose a set of names and titles that would identify the Rabbi and a set of dates that represent the date of birth or death of the Rabbi.

WRR then considered a Hebrew version of Genesis as a string of 78,064 Hebrew letters with no spaces. They defined an equi-letter-skip (ELS) word as a word whose letters occur in this string of letters of Genesis, separated by sequences of letters of equal length. An elementary probability calculation shows that we can expect, just by chance, that most of the names and dates of the Rabbis will be ELS words.
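To make the ELS definition concrete, here is a minimal sketch of a brute-force search. It is our own illustration, not the WRR software: the function names `normalize` and `find_els` and the `max_skip` cutoff are hypothetical, and we assume the word has at least two letters. A negative skip means the word is read backwards in the text.

```python
def normalize(raw):
    """Keep letters only and upper-case them, so the text is one long string."""
    return "".join(ch for ch in raw.upper() if ch.isalpha())


def find_els(text, word, max_skip):
    """Return (start, skip) pairs where `word` occurs as an ELS in `text`.

    `start` is the index of the first letter of the word; `skip` is the
    distance between successive letters (negative when read backwards).
    """
    hits = []
    n, k = len(text), len(word)
    backwards = word[::-1]
    for skip in range(1, max_skip + 1):
        span = (k - 1) * skip              # distance from first to last letter
        for start in range(n - span):
            letters = text[start:start + span + 1:skip]
            if letters == word:
                hits.append((start, skip))
            elif letters == backwards:
                hits.append((start + span, -skip))
    return hits


# Toy example: "ABC" hidden with two letters of filler between its letters.
toy = normalize("x A yy B zz C w")
print(find_els(toy, "ABC", max_skip=5))
```

The running time grows with the text length times `max_skip`, which is fine for a toy text but slow for a whole book; the sketch is meant only to pin down the definition.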

WRR then defined a notion of distance between two ELS words and hypothesized that the names and dates of the Rabbis would be closer together than could occur by chance. This hypothesis was tested and the results were highly significant (p = .000016).

The referees suggested that the authors choose a completely new set of famous Rabbis and test their hypothesis again. They did and again obtained highly significant results. Finally the referees asked the authors to test their hypothesis in the Hebrew version of another work of similar size. They did so, using the first 78,064 letters of the Hebrew translation of War and Peace. The Rabbis' names and dates again appear as ELS words but the degree of closeness of their names and dates was not significant. On the basis of these tests the referees accepted the paper for publication in Statistical Science.

We have now a situation very similar to a well designed experiment in extra-sensory perception with highly significant results that a skeptic just doesn't believe. What does the skeptic do? He looks for something in the experiment that was not done quite right. BBM play the role of skeptics. They believe that recent experiments, carried out by Dror Bar-Natan and Brendan McKay, cast considerable doubt on the claim of WRR that the choice of names and dates to use for the Rabbis was made before any consideration of where the names and dates occurred as ELSs in Genesis. WRR claim, in fact, that they (WRR) did not even make these choices, but rather that they were made for them by other historical scholars.

So what did Bar-Natan and McKay (BM) show? They showed first that whoever chose the names and dates to use for the Rabbis for the WRR article made a significant number of rather arbitrary choices, especially for names of the Rabbis. BM then asked if they could find a significant result in War and Peace if they were allowed to make judicious choices to help their cause. BM considered the second list of Rabbis chosen by WRR at the suggestion of the referees. They kept the same dates but made modifications in the choice of names. Specifically they dropped 20 of the names from the 90 names used by WRR for the Rabbis and added 30 new ones. The new names were ones that research suggested to BM could equally well have been chosen by WRR. With these changes, BM found the same kind of significant results in War and Peace that WRR found in Genesis.

The authors conclude that one explanation for the significant results of WRR is that they "cooked" their data. The authors report on additional evidence for this cooking. They state that almost every one of the apparently quite arbitrary choices WRR made in choosing the names increased the significance of the result. Of course, this itself could be considered evidence of divine intervention.

Well, that leaves us with the familiar ESP situation: the believer continues to believe and the skeptic continues to be skeptical. (See the reports by Jessica Utts and Ray Hyman (Chance News 5.04) as they assessed the research on extra-sensory perception sponsored by the Defense Intelligence Agency during the cold war.)


(1) Harold Gans, former senior cryptologist with the Department of Defense, took the names of all 66 Rabbis and replaced the various spellings of dates of birth or death of the Rabbis used by WRR by the spellings of the cities where the Rabbis were born or died. Gans again obtained a highly significant result (p < 1/143,000). If you were asked to referee papers by Gans, purporting to confirm the WRR results, or by BM, purporting to show how the WRR results could have been flawed, how would you decide whether one or both of these papers should be accepted?

(2) What would you estimate to be your a priori probability for the hypothesis proposed by WRR before their experiment? How would the results of WRR change this a priori probability?

Another skeptical look at the Torah Codes has been provided by Barry Simon of the California Institute of Technology. Simon is one of the country's leading mathematical physicists and himself an Orthodox Jew. His article "A skeptical look at the Torah codes" will be published in the March issue of Jewish Action. A preliminary version is posted on the web at Barry Simon on Torah Codes.

Simon has an interesting discussion, based on his considerable experience as an editor of a scientific journal, about the overstated claims regarding the WRR article that have been made, especially by religious groups, based on acceptance of this work by Statistical Science and supporting comments by leading mathematicians.

Simon discusses some of the same issues raised by BBM in their Galileo article. Another concern Simon mentions is that the definition of distance between ELSs provided by WRR is extremely complicated and not a natural definition that other mathematicians would have been likely to choose. The statistical significance of the WRR results could be quite sensitive to the form of this definition and Simon suggests that, without even realizing it, WRR could have been influenced by what works in choosing this rather unnatural definition of distance.


Simon says that he believes it would be impossible to disprove the WRR claim that the Torah has hidden codes. He writes:

I explicitly asked Professor Rips this question and he admitted it was an interesting question to which he didn't have an answer. If it isn't possible to disprove, then the hypothesis is not a scientific hypothesis. This is not to say that statistical analysis can't be a valid way to analyze what might be going on, but without the possibility of disproving a hypothesis, that hypothesis is outside the realm of science as we understand it.
What do you think about this?

For a serious discussion by a believer in the existence of Torah codes, you can read another new book on the Bible codes: "Cracking the Bible Code" by Jeffrey Satinover (William Morrow, 1997, $23).

Satinover is a psychiatrist who is currently studying physics at Yale. His book is meant to be a complete story of the search for Bible codes from the point of view of a "believer" with an "open" mind. We were rewarded by reading this book in finding ourselves quoted as saying: "the Chance course at Dartmouth College is designed to train students to be suspicious of what look like 'meaningful coincidences' when careful statistical analysis will show they are not meaningful at all." This appears in a discussion of how things will change if the claim of the existence of Torah codes becomes widely accepted by the scientific community.


What evidence would be needed to convince the scientific community of the existence of Torah codes?

Finally, what about Michael Drosnin's best-selling book "The Bible Code", suggested by the work of WRR (Chance News 6.07)? Barry Simon remarks that it "at one point was on the top ten best sellers list simultaneously in New York, London, Paris and Rome."

This book has been ridiculed by everyone, including Witztum and Rips, except Warner Brothers who have signed the book up saying that it has the potential for a fine movie. Perhaps the most amusing put down of this book resulted from a challenge made by Drosnin:

When my critics find a message about the assassination of a prime minister encrypted in Moby Dick, I'll believe them. (Newsweek, Jun. 9, 1997)
Of course critic Brendan McKay could not resist the challenge. McKay found in Moby Dick "Gandhi" near "the bloody deed", "Trotsky" near "executed", "M L King" near "to be killed by them", "Kennedy" near "shoot", "Lincoln" near "killed" and "Princess Di" near "mortal in these jaws of death." You can find these results and much more about the work of Brendan McKay and his colleagues related to Bible codes at the web site Torah Codes.
You will find here also the full text of Moby Dick in case you want to do your own exploring.


Closeness of two ELSs is shown, at least informally, as follows: Imagine putting the whole text of Moby Dick (without spaces) on a single page. Then ELSs appear as lines, vertical, horizontal, or slanted somewhere on this page. The text can appear on the line in either of the two possible directions. Two ELSs are considered close if you can choose the line length of the page containing the entire text so that these two ELS lines are confined within a rectangle that is small compared to the rectangle (page) containing the entire book. For example, by a judicious choice of line length, McKay finds "Kennedy" and "shoot" in Moby Dick within a 32 by 15 rectangle of letters. McKay also finds five other words including "rifle" and "coffins" or phrases including "has been so killed" within this same rectangle that could be associated with Kennedy's assassination!
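The informal notion of closeness just described can be sketched in a few lines. This is only our illustration, not McKay's procedure: the function names are hypothetical, each ELS is given as a (start, skip, length) triple of positions in the spaceless text, and "small" is measured simply by the area of the smallest rectangle of letters containing both ELSs when the text is wrapped at a given line length.

```python
def els_positions(start, skip, length):
    """Indices in the text of the letters of an ELS."""
    return [start + i * skip for i in range(length)]


def bounding_box(positions, width):
    """(rows, columns) of the smallest rectangle holding all positions
    when the text is written out with `width` letters per line."""
    rows = [p // width for p in positions]
    cols = [p % width for p in positions]
    return (max(rows) - min(rows) + 1, max(cols) - min(cols) + 1)


def best_line_length(els_a, els_b, widths):
    """Try each candidate line length; return (area, width, rows, columns)
    for the one that gives the smallest rectangle around both ELSs."""
    positions = els_positions(*els_a) + els_positions(*els_b)
    best = None
    for width in widths:
        n_rows, n_cols = bounding_box(positions, width)
        area = n_rows * n_cols
        if best is None or area < best[0]:
            best = (area, width, n_rows, n_cols)
    return best


# Two made-up three-letter ELSs, each with skip 10, starting one letter apart.
print(best_line_length((0, 10, 3), (1, 10, 3), range(2, 30)))
```

In the toy example a line length of 10 stacks each ELS into a vertical line, and the two lines sit side by side in a rectangle 3 letters high and 2 wide. Line lengths near a skip, or near a divisor or multiple of one, are the natural candidates to try.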

(1) If you find two ELSs by searching in a single string of letters representing Moby Dick, how would you decide if you could choose a line length that would make them appear in a relatively small rectangle?

(2) Using the full text of Moby Dick, Albin Jones provided us with the following distribution for the occurrences of the 26 letters in Moby Dick.

A: 75566  (8.169677%)
B: 16413  (1.774461%)
C: 21755  (2.352001%)
D: 37243  (4.026457%)
E: 113660 (12.288139%)
F: 20286  (2.193183%)
G: 20265  (2.190913%)
H: 60966  (6.591225%)
I: 63761  (6.893402%)
J: 1043   (0.112762%)
K: 7843   (0.847931%)
L: 41683  (4.506480%)
M: 22700  (2.454168%)
N: 63918  (6.910375%)
O: 67454  (7.292663%)
P: 16631  (1.798030%)
Q: 1494   (0.161521%)
R: 50466  (5.456037%)
S: 62477  (6.754584%)
T: 85524  (9.246268%)
U: 25874  (2.797319%)
V: 8344   (0.902096%)
W: 21546  (2.329406%)
X: 998    (0.107897%)
Y: 16417  (1.774893%)
Z: 629    (0.068003%)
Total number of letters = 925,141

If a book with about a million letters is written, with letters chosen randomly according to this distribution, what is the expected number of times that the ELS "Kennedy" would appear in the book? Would it be reasonable to use the Poisson approximation to estimate the probability that this ELS occurs at all?

(3) See if you can find another estimate for the distribution of the frequency of letters in a typical English text. See for example, "Secret and Urgent" by Fletcher Pratt, Bobbs-Merrill, 1939. Is this second estimate reasonably consistent with the distribution found from Moby Dick? If not, why not? (If you know of a better source of letter frequencies please let us know about it.)
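For readers who want to check their answer to question (2), here is one way the calculation might be set up. It is a sketch under our own assumptions: letters are independent draws from the Moby Dick distribution, every (start, skip) pair that fits inside the book counts as a possible position, and the function name `expected_els_count` is ours. The frequencies are the table values for the letters of "Kennedy", written as fractions.

```python
import math

# Frequencies of the letters in "KENNEDY", copied from the Moby Dick table.
FREQ = {
    "K": 0.00847931,
    "E": 0.12288139,
    "N": 0.06910375,
    "D": 0.04026457,
    "Y": 0.01774893,
}


def expected_els_count(word, n_letters, freq, both_directions=True):
    """Expected number of ELS occurrences of `word` in a random text of
    `n_letters` letters drawn independently with probabilities `freq`."""
    k = len(word)
    # Chance that one particular set of k equally spaced positions spells the word.
    p = math.prod(freq[ch] for ch in word)
    # For skip d there are n - (k - 1) * d possible starting points.
    max_skip = (n_letters - 1) // (k - 1)
    positions = sum(n_letters - (k - 1) * d for d in range(1, max_skip + 1))
    if both_directions and word != word[::-1]:
        positions *= 2          # the word may also be read backwards
    return positions * p


lam = expected_els_count("KENNEDY", 1_000_000, FREQ)
print("expected number of occurrences:", lam)
# Poisson approximation: P(at least one occurrence) = 1 - exp(-lambda)
print("Poisson estimate of P(at least one):", 1 - math.exp(-lam))
```

Under these assumptions the expected count comes out in the tens, so the Poisson estimate of the chance of at least one occurrence is essentially 1. Whether the Poisson approximation is reasonable is still worth discussing, since overlapping positions share letters and so are not independent.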

The jungles of randomness.
Ivars Peterson
Wiley, 1997
ISBN 0471164496

We often say that probability plays a key role today in almost every field of knowledge. On the other hand, reading a typical probability book, you would never realize this. It has taken mathematician-science writer Ivars Peterson to show us that what we have been claiming on some kind of blind faith is really true.

This final paragraph of the preface sets the stage and the tone of the book.

The trek through the jungles of randomness starts with games of chance. It proceeds across the restless sea of life, from the ebb and flow of human concourse to the intricacies of biological structure and the dynamics of flashing fireflies. It wanders into the domain of sounds and oscillations and the realm of fractals and noise. Glimpses of gambling lead to a lifetime of chance.

The central theme of this book is the interplay of order and disorder, or determinism and randomness. We can be sure that any pattern of heads and tails we prescribe will occur if we toss a coin enough times, and yet we feel the outcomes are purely random. Our beloved Big Dipper in the sky is perfectly compatible with a model of random placement of the stars in the sky. It can also be explained by Ramsey's combinatorial theorem, which is a purely logical result. Erdos's random graph theory brings these two approaches together and permits Peterson to introduce his readers to Paul Erdos.

The middle third of the book is centered around the many manifestations of oscillations. We learn how studying "the bounds of a kangaroo, the graceful leaps of a gazelle, the rocking gait of a cockroach and the slithering of a snake" can help us build robots that will wander over the surface of Mars. Another marriage of probability and mathematics is presented with the discussion of the solutions of Mark Kac's problem: Can you hear the shape of a drum?

The study of the synchronous behavior of the lights of fireflies provides us with a dramatic example of interacting particle systems, a major branch of modern probability theory.

The fact that Peterson is a science writer (mathematics and physics editor at Science News www.sciencenews.org) makes him aware of the most recent developments in science that use probability concepts. His discussions of the recent applications of Levy flights and of the new ways to let nature provide us with truly random numbers were completely new to us and sent us off to the library to look up the references provided for these topics. References are given by chapter at the end of the book.

Even though this book surveys many areas, Peterson provides a level of detail that makes the reader become involved in the phenomena being discussed. Even in reading about such a mundane object as a slot machine, we learn that the spinning wheels of the old slot machines determined the random final combination and told us if we had won a prize, but, with the modern electronic slot machines, before the wheels even start spinning the random number generator has determined where they should stop. The wheels are there just for nostalgia!

Each year in June we have spectacular displays of fireflies in our back yard. In future years watching these displays will remind us of our enjoyment reading this fascinating book.

Prenatal care better at HMOs, study finds.
The Boston Globe, 21 October 1997, pA12
Associated Press

This article reports on a review of 8000 birth records for babies born in the Seattle area during 1992-3. Half the babies were born to women in an HMO and half to women with private insurance. The study found that women enrolled in HMOs were 40% less likely to receive inadequate prenatal care and 30% more likely to give birth to a baby weighing more than 5.5 lbs -- usually a marker for good health.

On the other hand, the article points out that "for reasons not understood" HMO patients with no obstetrical risk factors were 40% more likely to have labor and delivery complications. However, the authors of the report stressed that the incidence of complications was less than one percent in both groups, affecting only a few of the 8000 patients in the study. Moreover, the differences might be attributable to more thorough record-keeping at HMO hospitals, so that details on complications would be more likely to be found there.


(1) What do you suppose constitutes "adequate" prenatal care? How would you interpret the claim that HMO patients are "40% less likely to receive inadequate prenatal care"? What else do you need to know?

(2) If anything near 1% of the 8000 had complications, wouldn't you regard this as more than "a few" patients? Why do you think the result was reported in this way?

(3) If the number of complications really is small, why didn't the authors simply state that the differences might represent chance variation?

(4) What do you think of the explanation that HMO hospitals are doing more thorough record-keeping?

AIDS researchers drop placebo plan.
The Boston Globe, 24 October 1997, pA1
Richard A. Knox

This updates a story from Chance News 6.11, where we reported on the ethical controversy surrounding the use of placebo in a study of "short course" AZT treatment to prevent mother-to-fetus transmission of HIV. The present article reports that Johns Hopkins University researchers have dropped plans to use the placebo design in Ethiopia.

However, according to Dr. Joseph Saba, who oversees the studies worldwide, the decision is not a response to the criticism. There are five studies currently underway in Africa and Asia, and he insists that these are not unethical. If, however, these begin to show positive results, then the newer trials would have to be stopped prematurely. Dr. Jack Killen, of the National Institute of Allergy and Infectious Diseases, adds that it had been part of the larger plan all along to rethink the design if positive results started to appear.

US advocacy groups are pressuring the US Dept. of Health and Human Services to end placebo use in any further US-funded studies of mother-infant transmission. They maintain that 1993 studies in the US already demonstrated that the short course is better than no treatment at all. Dr. Saba and other officials dispute this interpretation of the data.


The article does not actually say that positive results have appeared so far. If none do, do you believe it will be feasible to start new trials with placebos?

Star watch: Gum wrapper astrology.
The Boston Globe, 3 November 1997, pC6.
Alan M. McRobert

McRobert points out that the horoscopes in today's "Globe" actually derive from ancient interpretations of gods walking among the constellations. Yet astrologers don't seem to mind that the positions of the constellations have changed in the last 2000 years, due to the astronomical phenomenon of precession. So it's likely that a person classified today as a Gemini was actually born while the sun was in Taurus. So why aren't Geminis complaining about inaccuracies in their horoscopes? Because, says McRobert, there's no way to notice: any horoscope will work for you as well as any other!

Any piece of advice, he explains, can make you look at your life in a new way, thereby leading you to some insight. He claims that, when people are shown several personality readings based on Zodiac signs, they can't pick their own more often than would be expected by chance. Similarly, if key words in a person's forecast are replaced by their opposites, the modified ones will be rated as being just as insightful as the originals. He doesn't provide references for these results, but the results suggest some interesting activities for student projects for a CHANCE course.

The title of the article is a reference to the comics that appear on the inside of Bazooka bubble gum wrappers. The author recalls how he and his childhood friends would pick up wrappers they found discarded on playgrounds and try to interpret the comics as parables about their lives. It always seemed to work.


How would you design an experiment to test whether switching words with their opposites affected people's feelings as to how "insightful" their horoscopes are?

Correction: In Part I we referred to a book by Richard Dawkins which had a chapter "God's Utility Function" in which Dawkins discusses the evolution of the sex-ratio. The name of the book should have been "River out of Eden" instead of "The Garden of Eden".

Joan Garfield, our colleague on the Chance Project, has been busy this year editing two books. Both books will be of interest to our readers.

The assessment challenge in statistics education.
Edited by I. Gal and Joan Garfield
IOS Press, 1997
ISBN 90 5199 333 1, $65 (US)

In this book, international experts in statistical education discuss assessment in statistics courses at the pre-college and college level. Particular attention is paid to a first course in statistics, but the methods described apply more generally.

There is remarkable agreement among these authors on what should be taught in an introductory statistics course and the role of assessment in teaching such a course. The authors almost all feel that a modern introductory statistics course should be data driven and emphasize the understanding of what statistics is and how it is used in everyday life, as opposed to the traditional course which emphasizes special statistical techniques and their mathematical basis.

The contributors stress that how you assess students depends upon what you are trying to teach them. In dealing with data, if you just want to know if the students can compute means and medians and draw histograms, then traditional assessment methods work fine. If you want to know if students understand what these descriptive quantities tell you about the data, you must develop new kinds of questions. If you want students to understand connections between statistical concepts, then they should have practice in making these connections. In Chapter 8, Candice Schau and Nancy Mattern discuss how this can be done using "concept maps". If you want students to understand how statistics is used in everyday life, then asking them to read and critically assess current issues in the news is a good way to achieve this. In Chapter 9, Jane Watson discusses how she does this.

Learning by doing statistics lends itself to active rather than passive learning. This can be accomplished by having students work co-operatively in groups on activities and projects. Several chapters are devoted to the new assessment problems this poses. For example, how do we assess individual contributions of members of the group? Again, the assessment process itself should be carried out in a way that enhances the effectiveness of the co-operative learning process.

That new assessment problems may be difficult, but can be solved, is dramatically illustrated by Peter Holmes's story in Chapter 9. Holmes tells us about the task that outside examiners faced in assessing the statistical knowledge of 18 year old students in schools and colleges in the United Kingdom. The examiners realized that the traditional examinations were not assessing things the examiners felt most important for the students to know and, as a result, their examinations were distorting the learning and teaching process itself. This led the examiners to add a compulsory project as a significant part of their evaluation of the students. Such courage should inspire all of us to examine our own assessment methods and then run to buy this book.

Research on the Role of Technology in Teaching and Learning Statistics.
Edited by Joan B. Garfield and Gail Burrill
ISI (International Statistical Institute) $30 soft-cover

These are the proceedings of the 1996 IASE round table conference held at the University of Granada, Spain, 23-27 July 1996. We had the pleasure of participating in this roundtable conference. The participants were 36 researchers from 13 different countries who had done serious work on the use of technology in teaching statistics. It was a charmed meeting in a beautiful setting with a very congenial group of people.

The articles in this book are grouped according to: (1) How technology is changing the teaching of statistics at the secondary level, (2) Developing exemplary software, (3) What we are learning from empirical research, (4) How technology is changing the teaching of statistics at the college level, and (5) Questions to be addressed on the role of technology in statistics education.

Of course standard statistical packages play an important role in teaching a modern statistics course. However, the technology discussed at this conference was software used to enhance the understanding of particular statistical topics. You will not find here the same consensus that you will find in the assessment book. Researchers in this area are still experimenting and trying to find the most user-friendly and effective software. Those reading these contributions will have their own favorites. We particularly enjoyed Gail Burrill's discussion of how she uses graphing calculators in her teaching. It is quite amazing what Gail could do with this small tool. The ability to write small procedures to plot custom-made graphs and to carry out simulations is especially useful.

Demonstrating the Central Limit Theorem by plotting the distribution of sample means seemed to be a favorite of developers. However, our favorite was a regression demo presented by John Behrens. In this demo, a population scatterplot was provided and the user was allowed to take repeated samples and to see the regression lines. This nicely shows how the variation of these lines depends on the sample size.
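The idea behind such a regression demo can be imitated with a short simulation. The sketch below is ours and is not Behrens's software: the population (y = 2x plus normal noise), the sample sizes, and the function names are all made up for illustration. It draws repeated samples from a fixed population, fits a least-squares line to each, and measures how much the slopes vary.

```python
import random
import statistics


def make_population(size, rng):
    """A hypothetical population scatterplot: y = 2x + noise."""
    xs = [rng.uniform(0, 10) for _ in range(size)]
    return [(x, 2.0 * x + rng.gauss(0, 3)) for x in xs]


def slope(sample):
    """Least-squares slope of the regression line of y on x."""
    mean_x = statistics.fmean(x for x, _ in sample)
    mean_y = statistics.fmean(y for _, y in sample)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in sample)
    sxx = sum((x - mean_x) ** 2 for x, _ in sample)
    return sxy / sxx


def slope_spread(population, n, repeats, rng):
    """Standard deviation of the fitted slope over repeated samples of size n."""
    slopes = [slope(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(slopes)


rng = random.Random(1997)
population = make_population(5000, rng)
for n in (10, 100):
    print("sample size", n, "spread of slopes", slope_spread(population, n, 300, rng))
```

The spread of the fitted slopes shrinks as the sample size grows, which is the point the demo makes visually with the regression lines themselves.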

Assessment was sometimes lacking but not so in the work carried out at Tufts and reported by Steve Cohen and Richard Chechile. This was an impressive program developed jointly by several departments at Tufts. This software allows students in an introductory statistics course to experiment with a wide variety of statistical concepts. Assessment is based on recording every click the student makes, classified in terms of its purpose: how often did the student click on the "help button"? How many mistakes did the student make? etc. Our former colleague John Kemeny would be horrified by this since he used to tell his students: "The computer is your friend. It will never care how many mistakes you make!"

A huge amount of statistical software has been developed in recent years but, until now, it has been difficult to assess the quality of the software and how it can be used to improve our courses. In this book Joan and Gail have brought us a terrific resource to answer these questions.

Not surprisingly, our contribution to the IASE roundtable discussion was a paper on statistical resources on the internet. Also, not surprisingly, this paper is already out of date, and so we will add here some interesting new internet resources we have found since writing this paper.

WebStat 1.0 Beta
Webster West, Department of Statistics, University of South Carolina

Webster West wrote the elegant applets for illustrating statistical concepts that we have referred to many times. His latest product, WebStat, is a statistical package designed to allow the user to analyze data on the web with the usual graphic tools and statistical tests. It works on any platform and is free.


Statlets. Java Applets for statistical analysis and graphics.
NWP Associates, Inc., Princeton NJ

Like WebStat, Statlets allows you to analyze data on the internet. It provides the standard types of graphical output and statistical tests. You can also download Statlets to your machine and run it locally. The academic version is free and permits data with 100 rows and 10 columns. The commercial version ($195) permits up to 20,000 rows and 100 columns.


HyperStat Online.
David Lane, Departments of Statistics and Psychology, Rice University

This is an introductory-level hypertext statistics book that can be read on the web. Alongside each chapter there are links to related material on the web including demos and related excerpts from other on-line text materials.


Introductory Statistics: Concepts, Models, and Applications.
David W. Stockburger, Psychology Department, Southwest Missouri State University

This is an introductory text by David Stockburger that is freely available on the web and can be downloaded as a zip file. This book is written for psychology and other behavioral science students with an emphasis on understanding the relation between statistics and models and measurement as a part of modeling.


Journal of Statistical Software.
UCLA Department of Statistics.

This journal publishes software and descriptions of software useful for statisticians. The Journal is peer-reviewed, electronic, and free. Articles are mostly appropriate for statistical research and advanced courses but it is where we learned about WebStat.


Virtual Laboratories in Probability and Statistics.
Kyle Siegrist, Department of Mathematical Sciences, University of Alabama in Huntsville

This is an NSF project to develop interactive, web-based modules in probability and statistics. Each module explores a topic by means of expository text, exercises, graphics, and interactive applets written in Java. This is an excellent site for those who still want to have their students understand, in a painless way, the probability theory behind some of the basic statistical concepts and tests. Kyle Siegrist is also the author of "Interactive Probability", Wadsworth 1997 (See Chance News 6.06).


Lies, Damn Lies, and Psychology.
David Howell, Department of Psychology, University of Vermont

This is the homepage for a course modeled after the Chance course but adapted for psychology students. This course was taught in the Fall Term 1997.


Seeing Statistics.
Gary H. McClelland, Department of Psychology, University of Colorado at Boulder.

This is an overview of a project to develop an interactive elementary statistics book using Java Applets. This book is under development in conjunction with Duxbury Press. You will find here a discussion of the design of the book and a sample chapter.


Carl James Schwarz, Department of Mathematics and Statistics, Simon Fraser University

StatVillage is a hypothetical city on the web consisting of 128 blocks with 8 houses in each block and designed to permit students to carry out real-life surveys. Students decide on questions they want to ask, choose their sample, click on the houses in their sample, and get the results of their survey. The questions they can ask are ones that can be answered by census information and the answers they get are based on real census data. We tried it in a class and it was a great success (See Chance News 6.09).


Gary C. Ramseyer, Department of Psychology, Illinois State University

A self-explanatory site!

David Smith called our attention to a special section in the November Issue of American Psychologist.

Current Issues: Student Ratings of Professors.
American Psychologist, November 1997

This is a collection of papers dealing with student evaluation of teaching:

Validity concerns and usefulness of student ratings of instruction. Anthony G. Greenwald, University of Washington

Greenwald served as editor for this collection of articles. In his introductory piece, he relates an experience he had during 1989-90. In 1989, he received the highest student ratings he ever received; a year later, teaching the same course with only slight modifications to the syllabus, he received his lowest ratings. The two sets of scores were 2.5 standard deviations apart, representing an 8-decile separation according to the university's norms.

The experience spurred him to read up on the literature about SETs (student evaluations of teaching) and to collect more data of his own. He begins here by summarizing historical trends in research on ratings, from the early 1970s to date. Electronic searches of the PsycINFO and ERIC data bases indicate that activity in this area peaked with 71 publications in the 5-year period 1976-80, shrinking to a low of 8 publications in 1991-95. The 1976-80 period saw the largest proportion of publications critical of the validity of the SETs.

During the 1970s, a major source of concern was the possibility that grading practices were biasing the evaluations. Some experimental tests were done to show that manipulating grades upwards or downwards indeed influenced ratings. Although later authors criticized the methodology of these studies, Greenwald maintains that their conclusions have never been empirically refuted.

Publications in the 1980s focused on the "convergent validity" of student ratings; that is, the extent to which the ratings are correlated with other measures of teaching effectiveness. Among other things, this research pointed out that correlation between grades and ratings did not necessarily represent contamination of the ratings by grading practices but might be attributable to third variables such as student motivation.

One might conclude from the publication record that earlier concerns about validity had been settled. But the four articles which follow each address different points of concern in this regard.


(1) Would you expect the overall distribution of teaching ratings at a University to follow a normal curve? Assuming that it did, what would the fact that a 2.5 standard deviation swing corresponded to 8 deciles tell you about the positions of these rankings on the curve?

(2) What are some of the factors that could lead to such a dramatic swing?
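
For question (1), a quick calculation can help students see how the position of a 2.5 standard deviation swing on a normal curve affects the number of deciles it covers. The sketch below assumes, purely for illustration, that ratings follow a standard normal curve; the function name `deciles_spanned` and the two example placements of the swing are our own choices, not anything taken from Greenwald's article.

```python
from statistics import NormalDist

# Assumption: ratings follow a standard normal curve (mean 0, SD 1).
std_normal = NormalDist()


def deciles_spanned(low_z, high_z):
    """Number of deciles (tenths of the population) lying between two z-scores."""
    return 10 * (std_normal.cdf(high_z) - std_normal.cdf(low_z))


# A 2.5 SD swing centered on the mean: from -1.25 SD to +1.25 SD.
centered = deciles_spanned(-1.25, 1.25)

# The same 2.5 SD swing placed off-center: from the mean to +2.5 SD.
off_center = deciles_spanned(0.0, 2.5)

print(f"centered swing:   {centered:.1f} deciles")
print(f"off-center swing: {off_center:.1f} deciles")
```

Students can try other placements of the swing and compare the results with the 8-decile separation Greenwald reports.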

What follows are brief comments on the other articles. All contain postscripts where authors can criticize and respond to each other, in much more detail than can be indicated here.


Making Students' evaluations of teaching effectiveness effective: The critical issues of validity, bias and utility.
Herbert W. Marsh and Lawrence A. Roche, University of Western Sydney

The authors address issues of validity and reliability of SETs. They point out that teaching is a complex multidimensional activity, which makes SETs challenging to validate.

Overall, class-average SETs are found to be reliable, as indicated by correlations which range from 0.95 for averages of 50 students to 0.60 for averages of 5 students. Studies attempting to separate teacher effects from course effects have found -0.05 correlation between overall ratings of several instructors teaching the same course, compared to 0.61 for the same instructor in two different courses and 0.72 for the same instructor in two separate offerings of the same course.

In contrast to student evaluators, ratings by colleagues and administrators based on classroom visits have not been found reliable and do not correlate well with SETs or with each other.


Why do you think ratings by colleagues and administrators based on classroom visits are not reliable?


Navigating student ratings of instruction.
Sylvia d'Apollonia and Philip C. Abrami, Concordia University

Agreeing that teaching effectiveness is multidimensional, the authors find that SETs do a good job measuring general instructional skill, which is a composite of delivering instruction, facilitating interactions, and evaluating student learning. Nevertheless, research summarized here indicates that the instructor's rank, experience and autonomy, the class size, and the instructor's grading practices are all factors that work to diminish the validity of the ratings. Particular note is made of the "Dr. Fox" effect, named for a famous study in which students gave high ratings to an actor playing the role of professor (Dr. Fox). The concern here is that enthusiastic or expressive instructors can receive higher evaluations even though their demeanor does not make any measurable contribution to learning.


Grading leniency is a removable contaminant of student ratings.
Anthony G. Greenwald and Gerald M. Gillmore, University of Washington

The authors state that it is well-established that SETs are positively correlated with expected course grades. Here they investigate possible explanations for that correlation and focus specifically on leniency in grading as a contaminant of SETs. They also propose a scheme for adjusting ratings to account for this problem.


Student ratings: the validity of their use.
Wilbert J. McKeachie, University of Michigan

The author is in general agreement with the earlier themes that (1) SETs are valid but that (2) certain external factors such as grading leniency moderate this validity. But the focus here is not on the ratings themselves but on their ultimate use in tenure and promotion decisions. This addresses the "consequential validity" of SETs. The author feels that personnel committees lack the statistical sophistication to properly use the ratings.

While noting that it will probably be difficult to do observational studies on such committees in action, the author said he expects the results would be similar to studies done on medical diagnoses or mortality predictions, namely that a combination of computer diagnostic programs and the pooled judgment of physicians is superior to individual predictions.

Priscilla Bremser suggested the following and asked if the CHANCE team ever writes Op/Ed pieces. I (Bill Peterson) have used Letter to the Editor format for a writing assignment in versions of the Chance course. Perhaps it's time to start mailing them in!

GOP just wants to check whether anybody likes the IRS.
The New York Times, 7 November, 1997, pA28
Richard W. Stevenson

For the last several months, Congress has been debating an overhaul of the IRS. The House just approved a taxpayer "bill of rights", and similar legislation will next be taken up by the Senate. Speaker of the House Newt Gingrich is now proposing that a 14-question voluntary response survey be mailed to every taxpayer next year. While the total cost of the plan--estimated at $30-35 million--seems excessive to some critics, Gingrich points out that it comes to less than 50 cents a return, a small price to pay for the chance to tell the Government how the IRS is doing.

Democrats counter that Congress already commissioned a professional poll on attitudes towards the IRS, and it cost only $20,000. 48% of respondents rated customer service by the IRS as excellent or good, compared with 44% who found it not so good or poor. 58% said tax forms were difficult to complete because of complexities in the tax code. Only 10% attributed their difficulties to IRS inefficiency.

In a letter to the House Appropriations Committee, Treasury official Linda Robinson criticized the Gingrich plan, arguing that it "would ill serve the American taxpayer to spend an inordinately large amount of money on an unscientific survey whose results could provide misleading guidance on how to improve the tax system."


(1) Gingrich is certainly savvy enough to know voluntary response won't give accurate results. Why might Republicans nevertheless want to back such a plan?

(2) What do you think of the findings from the poll? Do you think the quality of "customer service" provided by the IRS is people's biggest concern with the US tax system?

Ask Marilyn.
Parade Magazine, 9 November 1997, p16
Marilyn vos Savant

A reader asks:

I've often heard that a person has a better chance of being struck by lightning than of winning the lottery. One can mathematically determine a person's chance of winning the lottery; any set of numbers has an equal chance of winning. But how are the chances of being struck by lightning determined? It seems like there are too many other factors involved in this probability, such as one's location and existing weather conditions, and that this occurrence isn't a totally random one. I can say that I have just as good a chance of winning the lottery as another player, but I wouldn't say my chances of being hit by lightning are the same as those of a person standing under a tree during a thunderstorm. How can this comparison be made?
Marilyn agrees that the comparison is worthless. She points out that the probability of a lightning strike is computed by dividing the number of people struck over a given period of time by the total number of people alive during that period.


(1) Do you agree that the comparison is meaningless? Can you suggest a better way to explain to the "person in the street" just how unlikely winning the lottery really is?

(2) Do you think that most victims of lightning strikes were standing under trees?

Unconventional Wisdom.
The Washington Post; C05; 19 October 1997
Richard Morin

How To Raise Your I.Q.

In the new American Psychologist, Stephen Ceci and Wendy Williams of Cornell University have assembled data showing that one's I.Q. increases by going to school with every year completed. They analyzed several studies on diverse groups: Appalachian children, kids on summer vacation, and high school dropouts. According to Ceci and Williams, no single study is definitive, but the data taken together is convincing.

In one 1932 study of children living in Virginia's Blue Ridge Mountains, the I.Q.s of 6-year-olds were not much below the national average, but, by age 14, the children's I.Q.s had plummeted into the "mentally retarded range," with the degree of falloff directly related to the years of school the child had missed. Likewise, a study done in the 1980s shows that I.Q. scores for kids on summer vacation drop by a statistically significant amount, as compared to their I.Q. scores before vacation. Swedish psychologists found that finishing high school bumps up I.Q. by about 8 points over what it would be if the same child had dropped out after junior high. Likewise, an American research team found that every year of schooling increases I.Q. by about 3.5 points.

Ceci and Williams have suggested that schooling increases I.Q. because I.Q. tests reward modes of thinking emphasized by schools.


The average I.Q. has been increasing consistently from generation to generation. This is known as the Flynn effect. In the United States this increase has amounted to about 3 points per decade. The cause is not known and there have been a number of possible explanations. Since the increase has been primarily in the non-verbal puzzle-solving aspects of the tests, it has been suggested that the increase comes, in part, from the increased exposure to visual information, television, computer games etc. This article suggests that increased educational opportunities might be a big factor. What kind of studies would be needed to try to settle this question?

Personal accounts and folded paper.
Christian Science Monitor, 3 October 1997
Frank Morgan

Frank Morgan reports interesting results from his Math Chat every other Friday in the Christian Science Monitor. He starts with an "Old challenge". This week it was:

Send in a true personal account with an estimate of its mathematical probability of occurring. The least likely will be the winner. Morgan reports:

In the best answer, Fred Wedemeier reports that his grandfather lost his wedding ring working in the field on the family farm. One day Fred's father got down off the tractor and found the ring stuck in a small crack in one of the tractor's tires.

Fred estimated that the probability that the crack on the tire would hit the ring in the field and his father would get off at that time and find it to be 1 in 10^22. Math Chat estimates at least a 1 in 1,000 probability that the tire would catch the ring some time over the years and 1 in 10 probability someone would notice, for an overall probability of about 1 in 10,000.

In the second-best answer, Steve Gluck reports: "In a seemingly irresolvable dispute between my son and daughter, it was proposed we flip a coin. My son said coin flipping was unfair because his sister always won. [After he chose tails,] 13 consecutive flips were heads." Steve gave the probability of this happening as 1/2^13 = 1/8192. Actually, since such opportunities might have arisen, say, on 8 occasions, Math Chat puts the probability at about 1 in 1,000.
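
The arithmetic behind Math Chat's two revised estimates can be reproduced in a few lines. This is only a sketch of our reading of the column: it assumes that the two ring events are independent, that the coin is fair, and that the 8 occasions are independent of one another. The variable names are ours.

```python
from fractions import Fraction

# Ring story: Math Chat's two rough factors, multiplied together
# (multiplying assumes the two events are independent).
p_tire_catches_ring = Fraction(1, 1000)
p_someone_notices = Fraction(1, 10)
p_ring = p_tire_catches_ring * p_someone_notices

# Coin story: 13 heads in a row with a fair coin.
p_13_heads = Fraction(1, 2) ** 13

# Assumption (Math Chat's "say, on 8 occasions"): 8 independent occasions
# on which such a run could have happened; chance of at least one run.
occasions = 8
p_at_least_one_run = 1 - (1 - p_13_heads) ** occasions

print("ring:", p_ring)
print("13 heads once:", p_13_heads)
print("at least one run in 8 occasions:", float(p_at_least_one_run))
```

The last figure is very close to 8/8192 = 1/1024, which is where the "about 1 in 1,000" comes from.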


(1) Do the final estimates determined by Math Chat for the probability of the two winning answers seem reasonable?

(2) Do the odds of the winning contestants give you any idea how many people read this challenge question?

Colleges look for answers to racial gaps in testing.
New York Times, 8 Nov. 1997, A1
Ethan Bronner

Bronner writes: "Universities around the country are facing an agonizing dilemma: If they retain their affirmative-action admission policies, they face growing legal and political challenges, but, if they move to greater reliance on standardized tests, the results will be a return to virtual racial segregation."

The Journal of Blacks in Higher Education states that, if admission to the nation's top colleges and universities becomes based primarily on test scores, black enrollments at these institutions will drop by at least one half and in many cases by as much as 80 percent. A study published by the New York University Law Review in April came to the same conclusion for law schools.

The result of this dilemma is that the FairTest organization, which has long argued that SAT scores play too prominent a role in admission, is not feeling so lonely out there. It is well known that SAT scores are poor predictors of even the first-year grade-point average and worse for each succeeding year. At one New England college, where we have data for a class of about 1000, the correlation of the total SAT score with freshman grade-point average was .2. Researchers at ETS have found that this drop in validity for their tests can be blamed, to some extent, on grade inflation and differences in grading between departments. Controlling for these gives their scores a higher validity.

The reasons for the difference in test scores between blacks and whites continue to be debated. According to the article, the current feeling is that it is a complex blend of psychology and culture. We have reported before on the work of Claude Steele, Stanford psychologist, who showed that, when blacks taking a test were told that such tests showed no distinction in white-black scores, they did as well as the white students, while, if they were not told this, they did not do as well. A similar phenomenon was found for women. (see Chance News 4.13)

In addition, studies have shown that those admitted with lower test scores and grades under affirmative action programs end up equally successful as measured by income, professional achievements etc. We were once told about a study to see which component of admission information used by a New England college was the best predictor of success in life. It turned out to be the alumni interview!

A recent book: "A is for admission" by Michele A. Hernandez, Warner Books 1997, written by a former Dartmouth admissions officer, describes the admission procedure typically used by Ivy League schools. You will find quite detailed information about how students are considered for admission, especially in the Ivy League and especially at Dartmouth.

In particular, you can find how the SAT scores are used. Students are assigned two stanines, which are numbers from 1 to 9. One is called the academic stanine and the other the extra-curricular/personal stanine. The academic stanine is based on the Academic Index. The Academic Index is the average of three factors: (1) the average of the student's highest SAT I math and verbal scores, using the first two digits only, (2) the average of the student's three highest SAT II subject tests (again using only the first two digits) and (3) the student's rank in high school, converted to take into account differences in the schools and with highest score 80. At Dartmouth the academic stanine is determined completely by this Academic Index. A student with an academic stanine of 7 and a personal stanine of 5 is referred to as a 7 over 5 or 7/5. Decisions are made using this quantitative information and other material in the student's folder.
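
To make the description of the Academic Index concrete, here is a minimal sketch that follows the three-factor average exactly as summarized above. The function names and the example scores are hypothetical. The conversion of class rank to a score out of 80 is not spelled out here, so the sketch simply takes that converted score as an input, and it does not attempt the mapping from index to stanine.

```python
def two_digit(score):
    """Keep the first two digits of a three-digit test score (e.g. 720 -> 72)."""
    return score // 10


def academic_index(sat1_math, sat1_verbal, sat2_scores, converted_rank_score):
    """Average of the three factors described in the text.

    converted_rank_score is the already-converted class rank (maximum 80);
    how that conversion is done is not specified, so it is an input here.
    """
    sat1_part = (two_digit(sat1_math) + two_digit(sat1_verbal)) / 2
    top_three = sorted(sat2_scores, reverse=True)[:3]
    sat2_part = sum(two_digit(s) for s in top_three) / len(top_three)
    return (sat1_part + sat2_part + converted_rank_score) / 3


# Hypothetical applicant: SAT I 720 math / 680 verbal, four SAT II scores,
# and a converted class-rank score of 70.
print(academic_index(720, 680, [700, 650, 750, 600], 70))
```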


Do you think that recent court rulings against using affirmative action will increase or decrease the use of SAT scores in admission? Why?

Ask Marilyn.
Parade Magazine, 19 Oct. 1997
Marilyn vos Savant

In a previous column (see Chance News 6.09), Marilyn asked the famous two-boy problem in the form:

A woman and a man (unrelated) each have two children. At least one of the woman's children is a boy, and the man's older child is a boy. Do the chances that the woman has two boys equal the chances that the man has two boys?
She answered correctly that the probability was 1/2 for the man and 1/3 for the woman. A reader wrote:
I will send $1,000 to your favorite charity if you can prove me wrong. The chances of both the woman and the man having two boys are equal.
Marilyn writes that she took this reader up on his bet and asked her readers to help her settle this bet by a survey. She said to her readers:
If you have exactly two children (no more), and at least one of them is a boy (either child or both of them), write -- or send e-mail -- and tell me the sex of both of your children. Don't consider their ages.
When we reported this survey earlier (see Chance News 6.09), reader Jeff Simonoff wrote that Marilyn was doing a disservice to statistics by her suggestion that such a self-selected survey has scientific validity.

In this column, Marilyn reports the results of her survey. She received 17,946 responses. 35.9% of the respondents wrote that they had two boys. Marilyn claims this is close enough to 1/3 so she wins the bet.

She also presents a letter from a reader from the Center for Health Data who writes:

The idea of defending impeccable logic with empirical data is irresistible. The U.S. Census Bureau interviews a random sample of families each year for the National Interview Survey. From 1987 to 1993, it interviewed 342,018 households, including 42,888 families with exactly two children. Of these, 9523 had two girls, leaving 33,365 with at least one boy: 11,118 with boy-girl, 10,913 with girl-boy and 11,334 with boy-boy. Thus 22,031 of the 33,365 (at least one boy group) had a child of each sex (66%), while 11,334 of the 33,365 (at least one-boy group) had two boys (34%). Close enough I'd say.

Limiting the samples to families with two children (as you did in your survey) introduces some biases; note for example, the smaller-than-expected percentage (22%) of two-children families with two girls. Including the first two children of any family with at least two children would probably give figures closer to those predicted.

Will Lassek,
Pittsburgh, Pa.

(1) Why does Marilyn refer to the reader's offer of $1,000, if she can show that she is correct, as a bet?

(2) In which way do you think Marilyn's self-selected poll would be biased?

(3) What causes the bias in Marilyn's poll, described by Lassek?

(4) Does the 95% confidence interval for the probability obtained from Marilyn's survey include p = 1/3? Answer the same for Will Lassek's data.
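
For question (4), one possible approach is the usual large-sample interval for a proportion, p-hat plus or minus 1.96 standard errors. The sketch below uses that formula; treating either data set as a simple random sample is an assumption (and a doubtful one for Marilyn's self-selected poll). Marilyn's count of two-boy families is reconstructed by us from the rounded 35.9%, so it is approximate. The function name `wald_interval` is our own.

```python
from math import sqrt


def wald_interval(successes, n, z=1.96):
    """Large-sample 95% confidence interval for a proportion."""
    p_hat = successes / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Marilyn's survey: 17,946 responses, 35.9% reporting two boys.
# The count is reconstructed from the rounded percentage (an assumption).
marilyn_n = 17946
marilyn_two_boys = round(0.359 * marilyn_n)
marilyn_ci = wald_interval(marilyn_two_boys, marilyn_n)

# Lassek's data: 11,334 two-boy families among 33,365 with at least one boy.
lassek_ci = wald_interval(11334, 33365)

for label, (low, high) in [("Marilyn", marilyn_ci), ("Lassek", lassek_ci)]:
    contains = low <= 1 / 3 <= high
    print(f"{label}: ({low:.4f}, {high:.4f})  contains 1/3? {contains}")
```

With samples this large the intervals are very narrow, which is worth keeping in mind when deciding what "close enough" should mean.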

The following article was suggested by Geoff Davis.

Buy a house, lose your job?
Slate Nov. 5, 1997
Steven E. Landsburg

Landsburg writes about research of Andrew Oswald of the University of Warwick who showed that higher rates of homeownership are positively correlated with higher rates of unemployment. We read:

In Switzerland, where about one-fourth of citizens own their homes, unemployment is only 2.9 percent. In Spain, where homeownership is three times as common, unemployment is a staggering 18.1 percent. Portugal's homeownership is midway between Switzerland's and Spain's and unemployment is a low-to-middling 4.1 percent.
Oswald found that this correlation also holds over time. A 10% rise in home ownership adds 1.5 to 2 percentage points of joblessness. Oswald believes that homeownership causes unemployment by tying people down geographically.

Landsburg discusses explanations others have suggested. For example, it may be the other way around: more unemployment causes more homeownership. If you lose your job you naturally want a nice place to hang around in so you will want a house. Landsburg suggests that a more reasonable explanation is that, when jobs disappear, renters tend to move out of houses so that only homeowners would remain. It is not clear to us why that would change the proportion of homeowners.

Landsburg also discusses the possibility of confounders, such as wealth or age, that might cause both unemployment and homeownership to increase. He comments that Alan Stockman blames the dual increase on the regulatory climate. Finally, Landsburg suggests that the numbers may just be wrong: it is easier to overlook a transient than a homeowner.


(1) Can you see why unemployment causing renters to move out of homes would increase the proportion of homeowners?

(2) Which of the various explanations given do you think are reasonable? Why?

Compromise found on census sampling.
Washington Post, 10 Nov. 1997
Eric Pianin and Helen Dewar

The Census Bureau has proposed a "dress rehearsal", to be conducted in 1998, which is a dry run of all the operational aspects of Census 2000. Recall that the plan includes two kinds of sampling. The first is to use direct enumeration until 90% of the households have been counted and then to use 10% sampling to estimate the remaining 10% of the households. The second is to use sampling to adjust for the undercount. The latter is the politically sensitive issue, since Republicans think they would be hurt by it.

The concern of the Republicans, about sampling in general, has led Congress to threaten to hold up the money for this dress rehearsal. A compromise has been reached which allows Congress to go ahead with a $32 billion Commerce-Justice-State spending bill.

The compromise allows the Census Bureau to go ahead with the dress rehearsal, while the Republicans would be granted an expedited hearing before the Supreme Court to challenge the legality of the sampling. The compromise also calls for an 8-member bipartisan Census Monitoring Board to oversee the preparation and implementation of the 2000 census.

According to the article, the White House considered the compromise a victory, but some Democrats thought the Republicans were given too much. They claim that it puts Congress on record as asserting that the use of statistical sampling "poses the risk of an inaccurate, invalid and unconstitutional census."


(1) How do you think the Supreme Court will decide on the constitutionality of sampling? (See Chance News 5.04 for what the Constitution says about the census.)

(2) Doesn't the "risk of an inaccurate census" exist no matter what the Census Bureau does?

The truth is staring us in the back.
Sunday Telegraph, 2 Nov. 1997, p. 6
Robert Matthews

Matthews reports experiments by biologist Rupert Sheldrake, designed to show that it is possible for people to tell when someone is staring at them from behind. Like ESP, this is something that most people believe they have experienced but which skeptics will challenge. Matthews reports that Sheldrake has carried out a simple experiment with a large number of pairs of children. In a single experiment, say with Jim and Mary, Jim is blindfolded with his back to Mary. Mary then decides, by referring to a list of random 0's and 1's, whether to stare or not stare at Jim. Jim then records whether or not he thinks he is being stared at. This is repeated for a sequence of, say, 20 trials. Matthews reports that, in a total of more than 18,000 trials carried out worldwide, the children reported being stared at in 60% of the trials in which they were being stared at and in 50% of the trials in which they were not. To avoid claims of communication, Sheldrake also carried out similar experiments with a window between the two children.

On his homepage (www.sheldrake.net) Sheldrake describes in detail his experiment and suggests that you try it out with children or students. It is simple enough to make an interesting activity for a class and, if it is as successful as Sheldrake predicts it will be, this will be quite an exciting activity.
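
For a class that wants to try this, here is a minimal sketch of the bookkeeping: one helper produces the random list of 0's and 1's for the starer, and another tallies how often the subject said "yes" in stare trials and in no-stare trials, the two percentages quoted by Matthews. The function names are ours and this is not Sheldrake's own protocol or software, only one way to organize a classroom run.

```python
import random


def make_stare_list(n_trials, seed=None):
    """Random list of 0's and 1's: 1 means 'stare', 0 means 'do not stare'."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(n_trials)]


def tally(stare_list, guesses):
    """Return (rate of 'yes' answers in stare trials, rate in no-stare trials).

    guesses uses 1 for 'I think I am being stared at' and 0 otherwise.
    """
    if len(stare_list) != len(guesses):
        raise ValueError("need one guess per trial")
    yes_when_stared = [g for s, g in zip(stare_list, guesses) if s == 1]
    yes_when_not = [g for s, g in zip(stare_list, guesses) if s == 0]

    def rate(answers):
        return sum(answers) / len(answers) if answers else float("nan")

    return rate(yes_when_stared), rate(yes_when_not)


# Example: a 20-trial list for the starer, and a subject who guesses at random.
stares = make_stare_list(20, seed=1997)
random_guesses = make_stare_list(20, seed=6)
print(stares)
print(tally(stares, random_guesses))
```

Running the tally many times with purely random guesses gives students a feel for how much the two rates vary by chance alone in a short run of 20 trials.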


On his homepage, Sheldrake asks experimenters to send in the results of their experiments. How do you think he is going to use these results?

We asked in Part I where we could find a table of frequencies of letters in the English language to compare with those found in Moby Dick. We found that almost any book on cryptography has such a table. This frequency table is needed when discussing the simple substitution code. A lecture on this method can be found at CLASSICAL CRYPTOGRAPHY COURSE - Lecture 01, which is Lecture 1 in a Classical Cryptography Course by Lanaki. Lanaki gives, in his lecture, the frequency table from the book "Cipher Systems, The Protection of Communications", by H. Becker and F. Piper, Wiley 1982. Lanaki also gives the frequency distribution of the letters from "A Tale of Two Cities", so we include that for comparison with Moby Dick.

A: Becker and Piper estimates for the frequency of letters in English text.

B: Frequency of letters in "Moby Dick" by Herman Melville - 925,141 letters.

C: Frequency of letters in "A Tale of Two Cities" by Charles Dickens - 586,747 letters.

D: Frequency of letters in 12 works of Mark Twain obtained for us from Gutenberg Project on the web by Peter Kostelec - 3,901,021 letters.

       A      B      C     D
E    .127   .123   .124  .121
T    .091   .093   .089  .095
A    .082   .082   .080  .083
O    .075   .073   .076  .077
I    .070   .069   .067  .068
N    .067   .069   .070  .072
S    .063   .068   .062  .062
H    .061   .066   .065  .061
R    .060   .055   .061  .054
D    .043   .040   .046  .047
L    .040   .045   .036  .040
C    .028   .024   .022  .023
U    .028   .028   .027  .030
M    .024   .025   .025  .024
W    .023   .023   .023  .026
F    .022   .022   .022  .021
Y    .020   .018   .020  .021
G    .020   .022   .020  .021
P    .019   .018   .016  .016
B    .015   .018   .013  .016
V    .010   .009   .008  .009
K    .008   .008   .007  .009
J    .002   .001   .001  .001
Q    .001   .002   .001  .001
X    .001   .001   .001  .001
Z    .001   .001   .000  .001

The similarity of these distributions suggests an impressive regularity in English text.


(1) If you believe that the true distribution of letters is the one given by Becker and Piper, how would you test the hypothesis that the letters in Moby Dick were determined by this distribution?

(2) Suppose you just wanted to test if the distribution of letters in the two books were determined by a common unknown distribution? How would you do this?

(3) Do you think that there would be less variation in the frequencies of letters within the work of a single author than between the works of different authors? How would you decide this? Do you think that authors could be identified by the frequency of letters in their writings?

(4) As the name suggests, for a simple substitution code, you just substitute for each letter a different letter. How would you go about trying to decode a message coded by such a code?
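
For question (1), one natural candidate is a chi-square goodness-of-fit test, sketched below using columns A and B of the table. Several things here are our assumptions: the table gives only rounded proportions, so the counts are rebuilt approximately from the 925,141 total; both columns are renormalized to sum to 1; and the test treats the letters as independent draws, which letters in real text are not. The critical value 37.65 is the approximate upper 5% point of a chi-square distribution with 25 degrees of freedom.

```python
# Columns A (Becker and Piper) and B (Moby Dick) from the table above.
letters = "ETAOINSHRDLCUMWFYGPBVKJQXZ"
becker_piper = [.127, .091, .082, .075, .070, .067, .063, .061, .060, .043,
                .040, .028, .028, .024, .023, .022, .020, .020, .019, .015,
                .010, .008, .002, .001, .001, .001]
moby_dick = [.123, .093, .082, .073, .069, .069, .068, .066, .055, .040,
             .045, .024, .028, .025, .023, .022, .018, .022, .018, .018,
             .009, .008, .001, .002, .001, .001]
n_letters = 925141  # total letters in Moby Dick, from the text


def normalize(freqs):
    """Rescale rounded proportions so that they sum to exactly 1."""
    total = sum(freqs)
    return [f / total for f in freqs]


expected_p = normalize(becker_piper)  # hypothesized distribution
observed_p = normalize(moby_dick)     # observed proportions (rounded)

# Chi-square statistic: sum over letters of (observed - expected)^2 / expected,
# with counts rebuilt as proportion times total number of letters.
chi_square = sum(
    (n_letters * o - n_letters * e) ** 2 / (n_letters * e)
    for o, e in zip(observed_p, expected_p)
)

df = len(letters) - 1   # 26 categories, so 25 degrees of freedom
critical_5pct = 37.65   # approximate upper 5% point for 25 df

print(f"chi-square = {chi_square:.0f} on {df} df (5% critical value {critical_5pct})")
```

Comparing the size of the statistic with the visual similarity of the columns makes a good starting point for discussing what a significance test does with nearly a million observations.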


