!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

CHANCE News 8.02

(21 January 1999 to 20 February 1999)

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Prepared by J. Laurie Snell, Bill Peterson and Charles Grinstead, with help from Fuxing Hou and Joan Snell.

Please send comments and suggestions for articles to
jlsnell@dartmouth.edu.

Back issues of Chance News and other materials for teaching a Chance course are available from the Chance web site:

Chance News is distributed under the GNU General Public License (so-called 'copyleft'). See the end of the newsletter for details.

Chance News is best read using Courier 12pt font.

===========================================================

Results are so random in sports that you can
   never look at a result and say it must be fixed.

Roxy Roxborough
Chance Magazine Winter 1999

===========================================================

Contents of Chance News 8.02

<<<========<<




>>>>>==============>
The January San Antonio mathematics meetings featured a number of talks on web sites of interest to teachers of statistics. Here are three that we found interesting:

Using web applets to assist statistics instruction
Robin H Lock
St. Lawrence University, Canton N.Y.

Robin showed how to find a variety of applets that are freely available on the web and how to use them in teaching a statistics course. The URLs for the web sites that Robin discussed are available from the web address in the header.
<<<========<<




>>>>>==============>
Life's a Risk! An Interdisciplinary Interactive Statistics Course
Linda C. Thiel
Ursinus College, Collegeville, PA.


Linda discussed a new interdisciplinary course developed jointly by the Departments of Mathematics and Computer Science and the Department of Biology at Ursinus College. The course is team taught and is a study of risk analysis for living in a hazardous world. It includes readings from the scientific and popular literature, such as the New York Times (Science Tuesday), Science Magazine and John Paulos' book "A Mathematician Reads the Newspaper." You can find details about the course and the resources used at the web site in the header.
<<<========<<




>>>>>==============>
A statistics teaching and resource library on the WWW
Deborah J. Rumsey,
Department of Statistics, Kansas State University
STAR Library home

Deborah described a project to provide a web site where teachers can freely access a library of teaching activities. This library will be run as a peer-reviewed journal to ensure quality and to create an additional outlet for research. A prototype and more information about this resource can be found at the web site in the header. In this first phase, you will need to use Internet Explorer.
<<<========<<




>>>>>==============>
We also encourage you to check out a web site we mentioned in Chance News 5.1 but which now has many more articles than it did at that time.

Chance and Data in the News
Rod Boucher and Jane Watson

The newspaper articles appearing here were collected from the Tasmanian metropolitan daily newspaper, The Mercury, to represent the five areas of the Chance and Data curriculum and aspects of numeracy throughout mathematics. Each of these six sections contains a few introductory questions, and each article is followed by a short discussion. Many articles can be associated with more than one part of the curriculum. Some have been purposely linked. Others you may discover for yourself.

This is a wonderful model for presenting full text of news articles to integrate into a statistics course. It would be nice to work out such a co-operative effort with a U.S. newspaper.
<<<========<<




>>>>>==============>
Thaddeus Tarpey suggested the following article.

County offer of loan to buy Whitehall Farm turned down
Yellow Springs News, 18 Feb. 1999
Chad Stiles

The citizens of Yellow Springs, Ohio (population about 4000) have a chance to protect a green area about the size of the whole village. They are trying to raise a million dollars to buy development rights on a 940-acre farm which is to be sold at auction on Monday, February 22. The farm is estimated to have a value of 3-5 million dollars. The article reports that, as of Wednesday, private pledges and donations amounted to $376,316. These were supplemented by $6,531 raised at a benefit/celebration on the Antioch College campus, and $456 that four local girls raised at a yard sale.

In last week's Yellow Springs News it was reported that one citizen proposed that villagers contribute to a fund to buy a number of tickets to the Wednesday or Saturday drawing of Ohio Super Lotto. In this lottery the jackpot starts at 4 million dollars, and at least that amount is added every time the jackpot is not won. The proposer consulted a local statistician (Thaddeus) to ask if it would be better to buy all their tickets for a single drawing or to divide them between the two drawings. Thaddeus recommended that they put all their money into a single lottery to maximize the chance of winning a jackpot. To convince them, he suggested they imagine that they had just enough money to buy all possible tickets. Then they would be sure to be able to buy the farm if they put their money in a single drawing, but not if they divided it between the two drawings.

Suppose they have a thousand dollars to buy lottery tickets. If they invest x dollars in the first lottery and 1000-x in the second, the probability of winning at least one jackpot is P(A)+P(B)-P(A and B), where A is the event that they win in the Wednesday drawing and B that they win in the Saturday drawing. Since P(A) + P(B) is the same no matter how they divide their tickets, it is clearly best to buy all their tickets for one drawing, making P(A and B) as small as possible, namely 0.

We could not help asking what the villagers should do if they had to raise 6 million dollars to buy the farm. Assuming the next jackpot is 4 million dollars, they now must win both lotteries to assure success. Now instead of minimizing P(A and B) we want to maximize it, that is, maximize (x)(1000-x)/N^2 where N is the number of possible choices for a ticket. This is maximized by choosing x = 500 (i.e., by buying the same number of tickets in each drawing). Having claimed this, our colleague Jody Trout showed that we should use common sense rather than admire our beautiful mathematics. Jody remarked that, since they need to win the jackpot in both drawings, they should invest the whole $1000 in the first lottery, doubling their chances of winning, and then, if they won, use half the winnings to buy 2 million tickets in the second lottery.
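The three strategies just discussed can be compared numerically. The sketch below is only illustrative. It assumes that tickets cost $1 each, that all tickets bought for a drawing are distinct, that the two drawings are independent, and that N is the number of ways to choose 6 numbers from 47 (the game described in the discussion questions below). The function names are ours.

```python
from math import comb

N = comb(47, 6)  # assumed number of possible tickets: 6 numbers chosen from 47


def p_at_least_one(x, budget=1000, n=N):
    """P(win at least one jackpot) with x tickets in drawing 1, budget - x in drawing 2."""
    p_a = x / n
    p_b = (budget - x) / n
    return p_a + p_b - p_a * p_b  # independence assumed: P(A and B) = P(A) P(B)


def p_both_split(x, budget=1000, n=N):
    """P(win both jackpots) when the budget is split before either drawing."""
    return (x / n) * ((budget - x) / n)


def p_both_sequential(budget=1000, reinvest=2_000_000, n=N):
    """Jody's plan: all tickets in drawing 1, then reinvest winnings in drawing 2."""
    return (budget / n) * (reinvest / n)


if __name__ == "__main__":
    for x in (0, 250, 500, 1000):
        print(x, p_at_least_one(x), p_both_split(x))
    print("sequential:", p_both_sequential())
```

Under these assumptions the even split is the best one can do when the money must be divided in advance, but the sequential plan multiplies the chance of winning both jackpots by 8000, since 1000 x 2,000,000 replaces 500 x 500 in the numerator.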

To follow the attempt of the citizens of Yellow Springs to buy this land see: Yellow Springs News

LATE NEWS!

Cy Tebbetts, the proposer of the lottery, informed us that he was able to buy 131 tickets for the February 20th lottery which had a jackpot of 8 million dollars. These resulted in one winning ticket which had 4 of the six numbers correct and contributed $69 to the fund.

Cy and reporter Chad Stiles tell us that the village effort was a complete success! The publicity seems to have scared off the developers and a local couple was able to purchase the farm for $3,275,000. The couple plans to sell the development rights to the town for a cost of just under a million dollars.

DISCUSSION QUESTIONS:

(1) In the Ohio Super Lotto you choose 6 distinct numbers from the numbers 1 to 47. What are the chances that Cy and his friends would win the jackpot (that is, that at least one ticket gets all 6 numbers correct)? What are the chances that at least one ticket gets 4 correct?

(2) Alas, when a lottery says the jackpot is 4 million dollars, do you really get 4 million dollars if you win? Do you think this is false advertising?

(3) In Yellow Springs the money had to be raised in two weeks. If you had a year to raise it and had obtained $1000 for tickets you might as well wait until the jackpot is reasonably big. How would you decide how long to wait?
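Readers who want to check their answers to the first question above by computer might start from a sketch like the following. It is one possible setup, not the only one: it treats the number of matches on a single ticket as hypergeometric, assumes the 131 tickets are all different when computing the jackpot chance, and (more roughly) treats the tickets as independent when asking whether any of them matches exactly 4 numbers. The function names are ours.

```python
from math import comb

NUMBERS, PICKS = 47, 6   # game as described in the first discussion question
TICKETS = 131            # number of tickets Cy reported buying


def p_match(k, numbers=NUMBERS, picks=PICKS):
    """P(a single ticket matches exactly k of the winning numbers) (hypergeometric)."""
    return comb(picks, k) * comb(numbers - picks, picks - k) / comb(numbers, picks)


def p_at_least_one_ticket(p_single, tickets=TICKETS):
    """P(at least one of `tickets` independent tickets succeeds)."""
    return 1 - (1 - p_single) ** tickets


if __name__ == "__main__":
    print("jackpot, one ticket:", p_match(6))
    print("exactly 4, one ticket:", p_match(4))
    print("jackpot, 131 distinct tickets:", TICKETS * p_match(6))
    print("some ticket with exactly 4 (independence assumed):",
          p_at_least_one_ticket(p_match(4)))
```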
<<<========<<




>>>>>==============>
Fiber does not help prevent colon cancer, study finds
The New York Times, January 11, 1999, A14
Sheryl Gay Stolberg

They stand by their granola
The New York Times, January 24, 1999, Section 1 p. 33
Ginger Thompson

Dietary fiber and the risk of colorectal cancer and adenoma in women
The New England Journal of Medicine, 21 January 1999, p. 169
Charles S. Fuchs, et al.
See also accompanying editorial.

For several decades, it has been fashionable to think that a high-fiber diet lowers the risk of colon cancer. This idea was a result of a study done in Africa, where certain groups of people were observed to have high fiber intake and low rates of colon cancer. Of course, association of two attributes does not imply that there is a causal relationship between them. In fact, in the intervening years since this study, many other studies on this relationship have been undertaken, with mixed results.

The present cohort study involved more than 88,000 women nurses, who were tracked over a 16-year period. The editorial remarks that cohort studies, which follow a particular group through a period of years, provide the least biased approach in epidemiology but may not be representative of the general population. (As might be expected with an issue as important as this one, there are controlled studies currently being carried out). The subjects filled out questionnaires that included questions on such things as amount of physical exercise, smoking, aspirin use, fat intake, and family history of colon cancer. After these data were collected, the subjects were divided into five groups, depending upon their average daily intake of fiber.

The null hypothesis, which is that there is no relationship between amount of fiber intake and the rate of incidence of colon cancer, could not be rejected for any of the five groups. In fact, the authors noted that if one looks just at the subset of subjects who ate the most vegetables, the risk of colon cancer was actually higher by 35% than the overall average risk. The authors believe that this last result, while statistically significant, was probably due to chance.

The authors conclude that, although their study does not establish a connection between intake of fiber and a reduction of risk of colon cancer, there are numerous other studies that do show an inverse relationship between fiber intake and things such as heart disease.

The first New York Times article discusses the study and the second the reaction of the cereal industry and shoppers to the news. The industry rushed to remind customers that other studies have found that fiber-rich foods are important for reducing the risk of many other health problems, for example heart disease and high blood pressure. Customers generally indicated their mixed feelings about science and studies. For example, one woman remarked:

I feel like most scientific studies are done in a vacuum. I only listen to the ones that make sense to me. There have been many other studies that show high fiber is good for you for many reasons.

The accompanying editorial in the New England Journal of Medicine discusses the relation of this study to other studies that have been carried out and discusses ways to try to resolve the contradictory results of these studies.

DISCUSSION QUESTIONS:

(1) How can an association, such as the one made between eating lots of vegetables and increasing the risk of colon cancer, be both statistically significant and due to chance?

(2) What do you think of the customer's assessment of the study?

(3) What are some of the potential problems with a study of this kind?

<<<========<<




>>>>>==============>
Unexpected praise from the Ivory Tower
The New York Times, 2 January, 1999, Section 3 p. 6
Mark Hulbert

Source: Can investors profit from the prophets?
Consensus analyst recommendations and stock returns
Brad Barber, Reuven Lehavy, Maureen McNichols, and Brett Trueman
www.gsm.ucdavis.edu/bmbarber

The New York Times article reports on the paper of Barber et al. This paper delves into the question of whether the recommendations of security analysts are worth anything. The dedicated Chance News reader will recognize this topic as one that has been touched on before. In the field of asset pricing theory, there are several axioms which say, with varying degrees of emphasis, that investors cannot do better than average by using publicly available information. On the other hand, brokerage houses remain in business, selling their recommendations about stocks to their clients.

The authors' data consists of over 360,000 recommendations from 269 brokerage houses and 4340 analysts, and the corresponding security prices, during the period 1985-1996. Five portfolios are created and tracked during this period. The portfolios consist of sets of stocks, sorted by the number or strength of the buy (or sell) recommendations in the database. The portfolios change over time as new recommendations are made.

The results initially sound quite impressive; the highest-rated portfolio earned an annual return of 18.8%, and the lowest-rated portfolio returned only 5.78% annually. These should be compared with a market average during this period of 14.5%. The authors then controlled for certain factors such as risk, with the final figures coming out as follows: the highest-rated portfolio returned 4.2% above the market return, and the lowest-rated portfolio returned 7.6% below the market return. The strategy that is considered consists of buying the highest-rated of these five portfolios and selling short the lowest-rated portfolio. If one used this strategy during the period in question, one would have had a return that was 11.8% above the market return on an annual basis. We are rather mystified how this works.

Before everyone rushes out to try this strategy, it must be admitted, and the authors do admit, that the above returns do not take into account transaction costs. It does not take a Sherlock Holmes to see what the end of this story will be. Since the contents of the portfolios are determined by analyst recommendations, one might imagine that there is quite a high turnover rate in these portfolios, and in fact this is the case. The annual turnover rates for the five portfolios average between 425% and 476%.

The result of taking into account the transaction costs is that the strategy of buying the highest rated portfolio does not produce a return that is significantly different from the market average. This means that this strategy probably cannot be used by the average investor to beat the market. Nevertheless, the authors are probably correct in stating that, in general, the stocks that are most highly recommended by analysts do better, on average, than those that are least recommended.

The paper touches on other questions that might have occurred to an investor. For example, in the period in question, the markets in general went up. In fact, 68% of the months in the period were ones in which the market index increased. Thus, one might wonder whether the highest-rated portfolio outperformed the market because the stocks in this portfolio were riskier than average (i.e., had a beta greater than one). When the authors tested this hypothesis, they found that there was no significant difference in the amount by which the highest-rated portfolio beat the market in bull months or bear months.

Also investigated was the question of whether one has to act with great speed on any new recommendations that result in changes in the make-up of the portfolios. The authors found that waiting 1 day did not significantly decrease the rate of return of the strategy, but waiting 30 days made the difference between the strategy's return and the market return negligible.

The New York Times article remarks that consensus recommendations on particular stocks are widely available on a number of Internet sites, including cbs.marketwatch.com. Alternatively, you can subscribe to services like Analyst Watch, published by Zacks.
<<<========<<




>>>>>==============>
The Winter 1999 issue of Chance Magazine arrived and marks the beginning of Hal Stern's role as Executive Editor. Hal is off to a good start with more interesting articles than we can possibly review. So, as usual, we will mention a couple that we found particularly interesting.

The man who makes the odds: An interview with Roxy Roxborough
Chance Magazine, Vol. 12, No. 1, Winter 1999, pp 15-21
Hal Stern

Hal Stern interviews Roxy Roxborough, the president of Las Vegas Sports Consultants. The Las Vegas Sports Consultants advise the Nevada Casinos and others throughout the world on the determination of the odds and point spreads for sports betting.

Hal is especially interested in how much probability and statistical theory Roxy uses in his work. He starts by asking him if he had had formal training in probability and statistics. Roxy replies that he took an elementary probability course at the University of Nevada and flunked twice. He attributed this partly to not taking the test. According to Roxy the hard thing is to know the right questions to ask, and if you do there are plenty of people who can answer them. He remarks that his staff has a statistics major from Berkeley.

The interview continues with a discussion of the problem of setting the line. The line means the point spread or the odds assigned for a bet. As in any other field, sports betting has its own jargon. The article provides translations for most of the special terminology used in the article. We also found it useful to look at a daily column that Roxy provides called "America's line." This column gives the current line for all major sports and other bets on major events such as the Oscar awards or the outcome of the next presidential election. The line is posted here in the morning and updated in the evening. On the day this was written the first entry in the morning line for the National Basketball Association (NBA) was

Favorite   Points   Over/Under   Underdog
Wizards    [8]      174          Bulls

The 8 is the point spread. (The bracket around the 8 means that for this game the betting is limited, probably because of an injury or a suspension.) If you bet on the Wizards they must win by more than 8 points for you to win your bet. If you bet on the Bulls you win your bet if the Wizards fail to win by more than 8 points. The casino would like the line to be established so that about an equal number of people bet on each team. If you win you get back an additional $10 for every $11 bet. Thus, if $55,000 is bet on each team, the casino is assured of winning $5,000.
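The casino's arithmetic can be checked with a few lines of code. This is a minimal sketch under the 11-to-10 terms just described; it ignores the case where the favorite wins by exactly the spread, and the function name and the unbalanced example are ours.

```python
def casino_profit(bet_on_favorite, bet_on_underdog, favorite_covers, risk=11, win=10):
    """Casino's net profit when winners are paid `win` dollars for every `risk` dollars bet."""
    winners, losers = ((bet_on_favorite, bet_on_underdog) if favorite_covers
                       else (bet_on_underdog, bet_on_favorite))
    payout = winners * win / risk   # winnings paid on top of the returned stakes
    return losers - payout


if __name__ == "__main__":
    # Balanced book: the result of the game does not matter to the casino.
    for covers in (True, False):
        print(covers, casino_profit(55_000, 55_000, covers))
    # Hypothetical unbalanced book: the casino is now exposed to the result.
    print(casino_profit(77_000, 33_000, True), casino_profit(77_000, 33_000, False))
```

The same terms imply that a bettor who risks $11 to win $10 must win 11 times in 21, a little over 52% of the time, just to break even.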

The Over/Under number 174 allows a bet that the total number of points in the game will be more than 174 or less than 174. Going to the "updated evening line" we find that the spread has stayed the same but the Over/Under bet has increased to 175. The line changes by reacting to the betting in much the same way that the price of a stock changes in reaction to the buying and selling of the stock.

Hal asks Roxy how he comes up with the initial point spread. Roxy replies that they use a blend of computerized power ratings and subjective analysis. Computerized power ratings are up-to-date ratings of the teams that allow you to determine the spread based on these alone. For example, on the day we wrote this, the Dartmouth basketball team was to play Princeton. Jeff Sagarin's USA Today computer rating gave Dartmouth a rating of 66.05 and Princeton 80.75 and instructed us to add 4.40 to the rating of the home team (Dartmouth). Then the point spread is the difference between the two ratings, making Princeton favored to win with a point spread of 10.30. Roxy had a point spread of 10 for the morning line which changed to 8.4 for the evening line. Dartmouth lost this game 65 to 51 and so Princeton easily covered the point spread and those who bet on Princeton won.
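The rating-based spread is just a difference of two numbers after the home-court adjustment. A small sketch, using the ratings quoted above and a function name of our own choosing, makes the rule explicit:

```python
def predicted_margin(home_rating, away_rating, home_advantage):
    """Home team's predicted margin; a negative value means the visitors are favored."""
    return (home_rating + home_advantage) - away_rating


if __name__ == "__main__":
    margin = predicted_margin(home_rating=66.05, away_rating=80.75, home_advantage=4.40)
    favorite = "home team" if margin > 0 else "visitors"
    print(f"{favorite} favored by {abs(margin):.2f} points")
```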

In explaining more about how the line changes, Roxy remarks that "We follow the betting public rather than the teams' performance." He qualifies this by saying that it is the professional bettors whom they follow rather than the general public. "Professional players generally have a reason why they bet their money." Reminiscent of the argument for an efficient stock market, Roxy observes that, if the professionals have developed a way to win, this will be closed off by the spread following their betting. It is hoped that the final line will end up as an accurate reflection of the strength of the teams, making the outcome of a bet like the toss of a coin.

Hal continues to press Roxy about the use of statistics and asks if they do any kind of quality control, for example, to explain what happened when a particular client or type of bet did not do as well as predicted. Roxy replies that they do monitor their clients' fortunes, and, when a client is not doing well in a particular sport, they try to decide if this is a statistical anomaly or something they are doing wrong. He remarks:

If the point spread is supposed to make things equal, then half the time people will win and half the time people will lose. But, just like when tossing a coin 100 times streaks of heads can occur, there are going to be times when people just pick winners, and that's the nature of the game.

It would seem that a more basic question is whether the process is really behaving like a coin tossing process. In other words, do the subjective probabilities of the bettors lead to a point spread that gives a 50-50 chance for the bettor to win? Actually, readers of Chance News will recall that Hal himself has done research on this question (How accurate are the posted odds? Chance Magazine Fall 1998 17-21).

Hal takes advantage of Roxy's mention of streaks to ask him if he considers the question statisticians ask all the time: do streaks occur more often than would be expected by chance?

In answering this question, Roxy provides an interesting example. Suppose they look at a period in which a client lost more than expected, and they see that in this period the favorite team won 60% of the time. Then the loss could have come from the fact that non-professional bettors tend to choose the favorites when they make parlay bets. (Parlay bets are bets where you can bet on more than one team, but you must get all teams correct to win the bet. If you bet on two teams you are paid off at 13-5 odds, on three at 6-1 odds, four at 10-1 odds, etc.) Thus the line may be correct but the psychology of the public causes a problem when there is a streak of favorite teams winning. Roxy goes on to explain that in some situations, having 60% of the favorite teams winning is not even so unusual. He looked at the history of the last 20 years of major league baseball and found that, on average, two teams a season won 60% or more of their games.
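To see how the parlay payoffs quoted above look from the bettor's side, one can compute the expected profit per dollar staked. The sketch below assumes that each individual bet is an independent event with the same win probability; the default of 1/2 corresponds to a line that makes each bet like a coin toss, and 0.6 is used only to mimic a period in which favorites win 60% of the time. The function name is ours.

```python
from fractions import Fraction

# Parlay payoffs quoted above: number of teams -> odds paid (to 1)
PARLAY_ODDS = {2: Fraction(13, 5), 3: Fraction(6), 4: Fraction(10)}


def expected_return(teams, odds, p_win=Fraction(1, 2)):
    """Expected profit per dollar staked on a parlay of `teams` independent bets."""
    p_all = p_win ** teams               # every team must win
    return p_all * odds - (1 - p_all)    # win `odds` dollars or lose the dollar


if __name__ == "__main__":
    for teams, odds in PARLAY_ODDS.items():
        coin = expected_return(teams, odds)
        streak = expected_return(teams, odds, Fraction(3, 5))
        print(teams, "teams:", float(coin), "at 50%,", float(streak), "at 60%")
```

Run with a 60% win probability, the expected return turns positive for the bettor, which is the situation Roxy describes when a run of winning favorites hurts a client.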

When, later in the interview, Roxy discussed future bets we returned to Roxy's America's Line column. As the name suggests, these are bets on events in the future, such as who will win the 1999 World Series in baseball. Here are some samples of future bets from Roxy's current column:

Odds to win the 1999 NBA Championship

Team                  Open   Current

Chicago Bulls         3-1    100-1
Utah Jazz             9-2    7-2
Los Angeles Lakers    9-2    3-1

The Chicago Bulls result is obviously a rather extreme change from the opening odds to the current odds.

Odds to win the 1999 World Series

Team                  Open   Current

New York Yankees      2-1    5-2
Atlanta Braves        7-2    7-2
Cleveland Indians     8-1    8-1

Under "Specials and Proposition Wagers" we find that the odds that Al Gore will be elected President in the year 2000 are 8 to 5 and for George W. Bush they are 4 to 1. For a long shot you can choose Hillary Clinton at odds of 5,000 to 1. The odds that Saving Private Ryan will win the Best Picture Oscar Award are 1 to 3 meaning that you only win 1 dollar for every 3 dollars bet!
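Odds quoted this way can be turned into the probability at which the bet would be exactly fair. The following sketch assumes the usual convention, consistent with the Oscar example above, that odds of "a to b" mean you win a dollars for every b dollars bet. The function names are ours, and the resulting numbers are implied by the posted odds rather than being anyone's true probabilities.

```python
from fractions import Fraction


def implied_probability(against, in_favor=1):
    """Probability at which a bet quoted at `against` to `in_favor` is exactly fair."""
    return Fraction(in_favor, against + in_favor)


def profit(stake, against, in_favor=1):
    """Profit on a winning bet of `stake` dollars at odds `against` to `in_favor`."""
    return Fraction(stake) * against / in_favor


if __name__ == "__main__":
    quotes = {"Gore": (8, 5), "Bush": (4, 1), "Clinton": (5000, 1),
              "Saving Private Ryan": (1, 3), "Bulls (current)": (100, 1)}
    for name, (a, b) in quotes.items():
        p = implied_probability(a, b)
        print(f"{name}: {a} to {b} -> {p} (about {float(p):.4f})")
```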

Well, Hal asks several more interesting questions and Roxy's answers are always interesting and indicate a good intuitive knowledge of probability and statistics, but we think that Hal will have to have another interview with that fellow from Berkeley to see if they really use much statistics in setting the odds.

It is a fascinating article and you will enjoy reading it.

DISCUSSION QUESTIONS:

(1) Can you explain any of the changes from the opening to the current futures odds in our samples?

(2) What are the true odds for the parlay bets when the bets on individual teams are like coin tossing?

(3) Roxy remarks that they only try to understand what is going on when a client does not make as much as expected and not when they are doing something right and the client makes more than expected. He says that statisticians tell them they should look at both. Do you agree they should? Why?

(4) What kind of quality control would you recommend to Roxy?
<<<========<<




>>>>>==============>
Scott Berry takes over as editor of Chance Magazine's column "A statistician reads the sports page". Scott starts off with the topic of streaks in home-run hitting.

Does "the zone" exist for home-run hitters?
Chance Magazine, Vol. 12, No. 1, Winter 1999, 51-56
Scott M. Berry

Tversky and Gilovich, in their famous article (The cold facts about the "hot hand" in basketball, Chance Magazine, Winter 1989, pp. 15-21) challenged the common belief that basketball players have streaks of successful shots, or "hot hands". Their results led to considerations of streaks in other sports. S. C. Albright (A statistical analysis of hitting streaks in baseball, JASA, 88(424), 1175-1183) considered the question of hitting streaks in baseball. He examined a large data set consisting of American and National League players in the seasons 1987, 1988, and 1989. He was not able to statistically validate that baseball players tend to have streaks either of successful hitting or unsuccessful hitting.

Inspired by last season's great home-run race of Mark McGwire and Sammy Sosa, Berry considers the question of streaks in home-run hitting. For data he obtained the outcome for each at-bat for the 1998 season for McGwire, Sosa and 9 other players, chosen from the leading home-run hitters in 1997. The 11 players considered were McGwire, Sosa, Griffey, Walker, Martinez, Bagwell, Gonzalez, Galarraga, Piazza, Bonds, and Castilla. He obtained this data from ESPNET and it is available at Berry's web site.

Berry considers a streak to be "a sequence of at least 20 at-bats, in which the player's probability of hitting a home run changes by a significant amount from the normal state."

A simple model for non-streaky behavior is Bernoulli trials with a probability for success (a home run) equal to the proportion of at-bats during the 1998 season that resulted in home runs. For this Bernoulli model, the waiting times between home runs have a geometric distribution. For each player, Berry gives a graphical presentation of the fit of the waiting time histograms to the theoretical geometric distribution. He also gives the results of a chi-square test for this fit. The fits look pretty good and none of the tests establishes a significant difference.
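For readers who would like to experiment with this model, here is a minimal simulation sketch. The home-run probability of 0.1 and the 500 at-bats are hypothetical values chosen for illustration, not figures from Berry's data, and the function names are ours. It generates a season of Bernoulli at-bats, extracts the waiting times, and sets the observed counts beside those expected from the geometric distribution.

```python
import random
from collections import Counter


def waiting_times(outcomes):
    """Number of at-bats up to and including each home run (truthy = home run)."""
    times, count = [], 0
    for hit in outcomes:
        count += 1
        if hit:
            times.append(count)
            count = 0
    return times


def geometric_pmf(k, p):
    """P(first home run occurs on at-bat k) under Bernoulli trials."""
    return (1 - p) ** (k - 1) * p


if __name__ == "__main__":
    rng = random.Random(1999)   # fixed seed so the run is repeatable
    p = 0.1                     # hypothetical home-run probability per at-bat
    season = [rng.random() < p for _ in range(500)]   # hypothetical 500 at-bats
    times = waiting_times(season)
    observed = Counter(times)
    print("waiting time, observed count, expected count")
    for k in range(1, 11):
        print(k, observed.get(k, 0), round(len(times) * geometric_pmf(k, p), 1))
```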

Streaky behavior would show up in the form of dependence between successive waiting times. During a hot streak a short waiting time should be followed by a short waiting time and, during a cold streak, a long waiting time should be followed by a long waiting time. Berry classifies successive waiting times into four categories: (+,+), (+,-), (-,+), (-,-). Here (+,+) means both waiting times were above the median waiting time, (+,-) means the first was greater than the median and the second less, etc. For the Bernoulli trials model each of these occurs with probability 1/4. Berry gives the results of a chi-squared test to see if the distribution of these four patterns is consistent with the Bernoulli process. He found that this was not the case only for Sosa and Galarraga. Sosa had an excessive number of (+,+) and (-,-) pairs, corresponding to streaky behavior, while Galarraga had an excessive number of (+,-) and (-,+) pairs, corresponding to anti-streaky behavior.
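The classification into four patterns is easy to try on any list of waiting times. The sketch below is our own reading of the procedure, not Berry's code: it assumes overlapping pairs of successive waiting times, counts a waiting time equal to the median as "-", and computes the Pearson chi-square statistic against equal expected counts. Because waiting times are whole numbers, ties with the median make the 1/4 probabilities only approximate.

```python
from collections import Counter
from statistics import median

PATTERNS = [("+", "+"), ("+", "-"), ("-", "+"), ("-", "-")]


def pair_counts(times):
    """Count (+,+), (+,-), (-,+), (-,-) patterns among successive waiting times."""
    m = median(times)
    signs = ["+" if t > m else "-" for t in times]   # ties with the median count as "-"
    counts = Counter(zip(signs, signs[1:]))          # overlapping successive pairs
    return {pair: counts.get(pair, 0) for pair in PATTERNS}


def chi_square_equal(counts):
    """Pearson chi-square statistic against equal expected counts."""
    total = sum(counts.values())
    expected = total / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts.values())


if __name__ == "__main__":
    times = [3, 12, 1, 7, 20, 2, 9, 4, 15, 6]   # hypothetical waiting times
    counts = pair_counts(times)
    print(counts, chi_square_equal(counts))
```

A real analysis would need many more waiting times than this toy list, and the statistic would then be compared with a chi-square distribution.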

Finally, Berry considers a one-parameter Markov chain model with states Cold, Normal, and Hot referring to the player's ability to hit home runs at any particular time. Conditioning on the home-run records of each hitter, Berry finds the probability of being in each of the states throughout the season. For most players these probabilities are highest for the normal state throughout the season. However, Sosa's apparent cold spell at the beginning of the 1998 season and hot spell during the middle of this season show up in this model, as does the apparent cold spell for Griffey during the month of August.

Berry concludes that his analysis provides ammunition for both sides in the argument about whether "the Zone" exists for home-run hitting. It would be interesting to carry out Berry's analysis on the much bigger dataset used by Albright (available from the Chance Web site under teaching aids/data).
<<<========<<




>>>>>==============>
Department of Commerce v. United States House of Representatives
Supreme Court of the United States, October term, 1998
No. 98-404. Argued November 30, 1998--Decided January 25, 1999

This is the text of the Supreme Court decision on the use of sampling in Census 2000. By a vote of 5 to 4 the Court upheld the recent ruling of the lower courts that the use of sampling to determine the population for the purpose of apportioning congressional seats among the states violates the Census Act.

Justices O'Connor and Scalia submitted opinions concurring with the majority opinion, Justice Breyer submitted an opinion concurring in part and dissenting in part and Justices Stevens and Ginsburg submitted dissenting opinions.

Part I of the opinion of Justice O'Connor provides a discussion of the history of the attempts by the Census Bureau to resolve the obvious undercount problem in the enumeration of the population. This includes the plans that the Census Bureau has made for the Census 2000.

Part II argues that the requirement of "standing" is satisfied. This amounts to showing that those bringing the suit, members of the House of Representatives, have demonstrated that actions of those being sued, the Department of Commerce, will cause them personal injury which will be prevented by the case being settled in their favor. The argument presented is based on an analysis by Ronald Weber, a Professor of Government at the University of Wisconsin, who showed, using census data, that the proposed sampling to determine the undercount would lead to a virtual certainty of Indiana losing a seat. This would cause personal injury to resident Gary Hofmeister of Indiana. (Hofmeister was a "faith, freedom, and family" Republican candidate for the House in Indiana's 10th congressional district in the 1998 election who lost badly.) It is perhaps ironic that the reliability of the undercount methods is needed to allow this case to be considered.

Part III provides a history of the evolution of the Census Act since the first Congress in 1790. The first Congress enacted legislation requiring "that the enumerators swear an oath to a just and perfect enumeration" of every person within the division that they were assigned. The first departure from this requirement came in 1954 when the Secretary of Commerce asked Congress to amend the Census Act to allow the Census Bureau to use sampling for some of the data that they collect. In response, Congress enacted section 195, which provided that

Except for the determination of population for apportionment purposes the Secretary may, where he deems it appropriate, authorize the use of the statistical method known as sampling in determining non-apportionment census information.

This led to the "long form" which asks a larger number of questions than the "short form" for a sample of the population.

The next change was in 1964, when Congress repealed a section that required information to be obtained by a personal visit, thereby allowing the Census Bureau to use the mail to obtain census information.

In 1976 Congress revised section 141 of the Census Act now called "Population and other census information". It amended subsection 141(a) to authorize the Secretary to

take a decennial census of population as of the first day of April of such year, which date shall be known as 'the decennial census date', in such form and content as he may determine including the use of sampling procedures and special surveys.

At the same time section 195 was changed to say:

Except for the determination of population for purposes of apportionment of Representatives in Congress among the several States, the Secretary shall, if he considers it feasible, authorize the use of the statistical method known as sampling in carrying out the provisions of this title.

Justice O'Connor argues that section 195 was meant to limit the broad statement of 141(a) to allow sampling only for non-apportionment data. Justice Stevens argues that section 141(a) clearly permits sampling in all aspects of the census and section 195 simply states that sampling shall be used when the Secretary considers it feasible, leaving it completely up to the Secretary whether to use sampling or not when determining apportionment.

DISCUSSION QUESTIONS:

(1) (Suggested by Stan Seltzer). In the next-to-last paragraph of Justice Scalia's opinion we read:

In other words, genuine enumeration may not be the most accurate way of determining population, but it may be the most accurate way of determining population with minimal possibility of partisan manipulation.

Is Justice Scalia suggesting that sampling is inherently subjective?

(2) The Constitution speaks of an "actual enumeration" to determine the apportionment of Representatives among the various states and leaves it up to Congress to determine how this will be carried out. The court writes:

Because the Court concludes that the Census Act prohibits the proposed uses of statistical sampling in calculating the population for purposes of apportionment, the Court need not reach the constitutional question presented.

What is the Constitutional question presented? Do you think that the previous congressional revisions of the Census Act were clear enough to justify not addressing this question?

(3) Statisticians tend to argue this case only in terms of accuracy of the results, politicians in terms of the political implications of sampling, and the courts in terms of interpreting existing laws. This is perhaps natural but is it desirable?
<<<========<<




>>>>>==============>
Emil Friedman sent us some interesting comments related to our discussion in the last Chance News of Clark Chapman's video. He writes:

Chance News 8.01 asks, "Are you convinced that your risk from dying from the impact of an asteroid is about the same as your dying from an airplane accident?"

This may not be a useful question to ask. A question which is more relevant to decisions that society needs to make might be, "What is the risk to society?" However, even this is a bit vague.

To clarify it, consider a sample of 300 million people who are currently alive. This is, roughly speaking, the population of the United States. If the probability of being killed by an airplane is 1/20,000 per person per lifetime, then we can say with fairly good certainty that roughly 15,000 of the people in the sample will someday be killed by an airplane. This is a well defined "cost to society" that "we the people" might want to spend money to reduce.

We do not get the same answer for asteroids. In that case there is a very small probability that a very great many people will be killed by an asteroid, and a very high probability that less than five people will be killed. In other words, the expected value for the number of deaths is the same in both cases, but the distributions are completely different. Defining the risk to society posed by asteroids involves important subtleties that are not captured by simply calculating an expected value. (Misleadingly intuitive names such as "expected value" are endemic and contribute to distrust of mathematical thinking.)

We might even take things a bit further. The context of the original question involved decisions to be made by the US Congress. How can we best invest limited resources? Consider investing money to (optimistically) reduce airplane deaths by, for example 25%. This would save roughly 3500 of the people in our sample from being killed by airplanes. Now suppose that the same investment would reduce highway deaths by, pessimistically, 1%. The table Prof. Chapman presented to Congress suggested that the probability of being killed by a car is 1/100 per person per lifetime. This translates to 3 million people currently alive in the US. Saving a mere 1% would save 30,000 lives. Now what should we spend the money on?
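
The arithmetic behind Emil's comparison can be checked with a short Python sketch. The population and the lifetime probabilities are the figures quoted in the letter; the variable and function names are ours and purely illustrative.

```python
# Figures quoted in the letter; names are illustrative only.
population = 300_000_000      # rough US population used in the letter
p_airplane = 1 / 20_000       # lifetime probability of death by airplane
p_car = 1 / 100               # lifetime probability of death by car


def expected_deaths(pop, p):
    """Expected number of deaths among pop people, each with lifetime risk p."""
    return pop * p


airplane_deaths = expected_deaths(population, p_airplane)
car_deaths = expected_deaths(population, p_car)

# Lives saved by the two hypothetical investments discussed in the letter.
saved_airplane = 0.25 * airplane_deaths   # optimistic 25% reduction
saved_car = 0.01 * car_deaths             # pessimistic 1% reduction

print("expected deaths (airplane, car):", round(airplane_deaths), round(car_deaths))
print("lives saved (airplane, car):", round(saved_airplane), round(saved_car))
```

With these figures a 25% cut in airplane deaths comes to about 3,750 lives (a little above the letter's "roughly 3500"), against about 30,000 for a 1% cut in car deaths.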

Defining, revising, and refining the question is often the most difficult and most crucially important step in decision making.

Airplane/automobile decisions loosely relate to the Pareto principle. It states that there usually are a few causes that are responsible for most of the defects. It is usually best to identify and fix these first.

Asteroid risk reminds one of the following version of the "St. Petersburg Paradox" (eg, Sheldon Ross, "A First Course in Probability", 2nd ed, Macmillan, 1984, p 311, prob 10). You bet x dollars on the outcome of a series of coin tosses. If the first occurrence of heads appears on the n-th toss, you win 2^n dollars. What is the largest amount of money x you are willing to bet? Conversely, what is the smallest amount of money you would accept to let someone else bet?
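
To see why the expected value is no guide in this game, here is a small Python sketch. It assumes a fair coin and, purely to keep the sum finite, that the game is cut off after a fixed number of tosses with no payoff if heads never appears; the cutoff is our assumption, not part of the problem as stated.

```python
from fractions import Fraction


def truncated_expected_winnings(max_tosses):
    """Expected payoff if the game is cut off after max_tosses tosses.

    The first head on toss n has probability (1/2)**n and pays 2**n dollars.
    If no head appears within max_tosses, the payoff is taken to be 0
    (an assumption made here only to keep the sum finite).
    """
    return sum(Fraction(1, 2**n) * 2**n for n in range(1, max_tosses + 1))


for cap in (10, 20, 40):
    print(cap, truncated_expected_winnings(cap))
```

Each possible stopping point contributes exactly one dollar to the expectation, so the truncated expected value equals the cutoff and grows without bound as the cutoff is raised.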

Emil M Friedman
emil.friedman@alum.mit.edu
emfriedman@goodyear.com
Note: Clark Chapman's Congressional testimony that Emil referred to is at: Home Page for Dr. Clark R. Chapman
<<<========<<




>>>>>==============>
David Rutherford sent us remarks on the Economist article on estimating the number of large salt-water species and the probability 1 estimate for life on other planets. He writes:

Firstly, on the estimation of the number of as yet undiscovered salt-water species with length or width of at least 2 meters (6.6 feet), the author does not state that we are sampling only from that portion of the sea which trawlers fish (the bottom of sea trenches is relatively undersampled, for example, while the top portions of the sea are relatively oversampled). This is equivalent to sampling only the top half of the chocolate box. Here the analogy breaks down however, because the conditions at the bottom of the sea are markedly different (in terms of pressure, temperature and light) than conditions at the top of the sea, whereas the conditions in the chocolate box are fairly uniform. Presumably this qualifier appears in the original paper, either by Fisher or Paxton (ie 47 species remaining in that part of the sea which is or has in the past been sampled).

Also, the review of Aczel's "Probability 1" was repeated in no less an organ than the Australian Financial Review, complete with the (unchallenged) reviewer's comment that the probability "can just as easily tend to zero". Clearly, as Aczel says, the probability still tends to 1 if the starting values are different - the probability just tends to 1 more slowly.

David Rutherford
Melbourne, Australia

Editors' comments: Fisher and his colleagues (R. A. Fisher, A.S. Corbet, C.B. Williams, Journal of Animal Ecology, 1943, Vol. 12, p. 42-48) state that their estimates apply only to a region for which the sample is representative. So, for example, estimates for the number of species of butterflies based on a sample at the bottom of a mountain would not apply to the number in an area of higher elevation. Likewise, the estimates for the number in one season need not apply to another. Paxton does not discuss this issue directly but after remarking that his model assumes a constant sampling which may well not be satisfied he writes:

The validity of the analysis presented here would also be in doubt if many new species are split from existing species by information gained by molecular techniques or new methods of biological sampling found large numbers of new species of large marine fauna in as yet unexplored habitats.

Aczel computes the probability of life on another planet as 1-(1-p)^n, where p is the probability of conditions being such as to produce DNA or other mechanisms for life and n is the number of stars with planets capable of maintaining life. If you fix p and let n increase, then you get a limit of 1. If you fix n and let p get small, then you get a limit of 0. The reviewer's opinion was that we have reasonable estimates for n but no idea at all what p is, so the estimate for the probability of life out there could be near 1 or near 0 depending on the choice of p. Aczel's remarks about the reviewer not knowing elementary probability seem not to be relevant to this issue.
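
A short Python sketch shows how strongly the formula 1-(1-p)^n depends on p. The values of n and p below are hypothetical, chosen only to display the two limits; they are not estimates from the book or the review. The calculation uses log1p and expm1 because 1-p rounds to exactly 1 in floating point when p is very small.

```python
import math


def prob_life_elsewhere(p, n):
    """1 - (1 - p)**n, computed stably for very small p and very large n.

    Assumes n independent candidate planets, each with the same probability p.
    """
    return -math.expm1(n * math.log1p(-p))


n = 1e20                      # hypothetical number of suitable planets
for p in (1e-10, 1e-20, 1e-30):   # hypothetical values of p
    print(f"p = {p:.0e}: probability = {prob_life_elsewhere(p, n):.3g}")
```

With n held fixed, moving p across these values takes the answer from essentially 1 down to essentially 0, which is the reviewer's point.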
<<<========<<




>>>>>==============>
Recent deaths spotlight ski risks.
But figures show it's public awareness of crashes,
not injury rate, that's on the rise
USA Today, 29 January 1999, p. 5D
Gene Sloan

After the deaths of Michael Kennedy and Sonny Bono on the ski slopes last year, there was much media discussion about the safety of the sport (see Chance News 7.02). Now this season, two fatal accidents at Breckenridge, Colorado have again raised concerns. Is skiing becoming more risky? Rick Kahl of Skiing Magazine says no. He sees an analogy with airline crashes, which are rare but spectacular, and therefore draw disproportionate attention in news reports.

Data from Jasper Shealy of the Rochester Institute of Technology appear to support Kahl's view. According to the article, Shealy's data show that "the number of deaths--both in absolute terms and per-million skier visits--has remained relatively constant for decades." Shealy adds that traveling to ski areas is actually a greater risk. Skiing has a fatality rate of 0.15 per million hours of exposure; for travel in an airplane or car the rate is more like 0.5 deaths per million hours.

For non-fatal accidents, the safety picture seems to be improving. According to the Consumer Product Safety Commission, a total of 121,800 skiers and snowboarders were treated at emergency rooms in 1997, compared to 127,000 in 1993. However, some doctors warn that focusing on the total number of injuries diverts attention from the types of injuries that are occurring. For example, improved boots and bindings have reduced the number of leg and ankle breaks, but the number of knee injuries is increasing. It can take up to nine months to recover from the types of knee injuries that skiers are now experiencing.

Meanwhile, the data do not support the popular perception that the rise of snowboarding has made the slopes more dangerous. Shealy's paper "Modalities of Death: Snowboarding vs. Alpine Skiing," prepared for an upcoming conference of the International Society for Skiing Safety, finds that snowboarders are 30-40% less likely than skiers to die on the slopes. This is true despite the fact that young males, who tend to be aggressive on the slopes, make up a larger proportion of the snowboarding population. According to Shealy, the abruptness of snowboarding falls makes them less likely to result in collisions with other people or obstacles.

DISCUSSION QUESTIONS:

(1) Does it make sense that the absolute number of deaths and the rate per million skier visits could both have been constant over the last few decades?

(2) Do you think that "deaths per million hours of exposure" is an appropriate way to compare skiing to traveling?
<<<========<<




>>>>>==============>
One variety that doesn't fit the facts
The Boston Globe, 1 February 1999, p.E01
Dolores Kong

A controversial new Heinz advertisement suggests that the chemical lycopene, which occurs naturally in the company's ketchup, may prevent cervical and prostate cancer. The claim is based on preliminary medical evidence that consumption of tomatoes and tomato products is associated with a lower risk of certain cancers. Experts warn against leaping to the conclusion that ketchup prevents cancer. And even if there is some benefit, Dr. Edward Giovannucci of Harvard says we should pay attention to how we eat our ketchup. He notes that "If it's part of a Big Mac, that may not be ideal."

Giovannucci participated in some of the research on lycopene, which was headed by Dr. Steven Clinton of Ohio State University. Clinton objected to having his work footnoted in the Heinz ad without his knowledge. Other doctors echoed his concerns, stating that the ad was inappropriately making a medical claim. US law requires that such claims be approved by the FDA, and that there be "significant scientific agreement" before they can appear on food labels.

A spokesperson for Heinz said the company was neither making a health claim nor putting information on labels; instead, their ad campaign was designed to educate the public about lycopene. Nevertheless, the ad includes the seal of the Cancer Research Foundation of America, a nonprofit organization which has received grants of $60,000 from Heinz.

DISCUSSION QUESTION:

Does the phrase "significant scientific agreement" have anything to do with statistical significance? What do you think it means here?
<<<========<<




>>>>>==============>
Michael Olinick suggested the following article.

The cancer-cluster myth
The New Yorker, 8 February 1999, pp. 34-37
Atul Gawande

A cancer cluster is a geographical area exhibiting an above-average rate of some cancers. In the currently popular movie "A Civil Action," John Travolta plays the real-life lawyer who brought suit against W.R. Grace, claiming that the company's contamination of ground water in Woburn, Massachusetts was responsible for the elevated rate of childhood leukemia there. But how high above normal does the cancer rate have to be for us to sound an alarm? And do neighborhood clusters of cancer always indicate environmental problems are to blame?

Over the last twenty years the number of identified cancer clusters has been increasing dramatically. In the late 1980s, about 1500 clusters a year were being reported to public health officials! A famous example from the 1980s concerned the farming town of McFarland, California, where a woman whose child developed cancer found four other cases within a few blocks of her home. After doctors found six more cases in the town (population 6400), people began to fear that groundwater wells had been contaminated by pesticides. This led to lawsuits against the manufacturers of the pesticides. Nevertheless, despite extensive investigations of hundreds of such clusters in the US there are no cases in which environmental causes have been conclusively established.

Of course, this is profoundly frustrating for the "stricken" communities. Historically, there are many stories of disease clusters being used to identify causes. Think of John Snow's famous identification in 1854 of London's Broad Street pump as the culprit in a cholera outbreak. In recent times, AIDS first came to light through cases of an unusual form of pneumonia. Moreover, certain occupational clusters of cancer have led to the successful identification of carcinogens such as asbestos and vinyl chloride. But neighborhood cancer clusters are different. One reason is that many known carcinogens require exposure over an extended period of time before they trigger cancer. With today's mobile population, it is unlikely that residents of a community have lived together long enough for there to be a local cause for their cancers.

What then can be said about the neighborhood clusters? They may reflect nothing more than people's tendency to seek causes for patterns that are perfectly well-explained by chance variation. The article cites Kahneman and Tversky's psychological research into people's belief in "the law of small numbers;" e.g., the sequence of red-black roulette outcomes RRRRRR is perceived to be less random than RRBRBB. Similar misperceptions lead basketball fans to believe in the phenomenon of "streak shooting," even though statistical analysis finds no more runs of hits and misses than would be expected by chance. The article also cites probabilist William Feller's famous analysis of bomb hits in London during WWII. Because the hits appeared to cluster, residents suspected that German spies were picking the targets. In fact, Feller showed that a simple Poisson model fit the data--there was nothing in the patterns to suggest that the hits were non-random.

The article calls the tendency to focus attention on clusters the "Texas-sharpshooter fallacy," named for the self-proclaimed marksman who shoots at the side of a barn and then draws bulls-eyes around the holes. With cancer clusters, we first observe the cases, and then circle the at-risk population around them. The article quotes California's chief environmental health investigator as saying that "given a typical registry of eighty different cancers, you could expect twenty seven hundred and fifty of California's five thousand census tracts to have statistically significant but perfectly random elevations of cancer. So if you check to see whether your neighborhood has an elevated rate of a specific cancer, chances are better than even that it does--and it almost certainly won't mean a thing."
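
One reading that roughly reproduces the quoted figure is sketched below in Python. The 80 cancers and 5000 census tracts come from the quotation. The per-cancer false-alarm rate of 1% and the independence of the 80 tests are our assumptions, since the article does not say how the number was obtained.

```python
n_cancers = 80     # cancers in the registry, from the quotation
n_tracts = 5000    # California census tracts, from the quotation
alpha = 0.01       # ASSUMED chance of a spurious "significant" elevation per cancer

# Chance that a tract with no real excess shows at least one significant
# elevation among the 80 cancers, assuming the 80 tests are independent.
p_at_least_one = 1 - (1 - alpha) ** n_cancers
expected_tracts = n_tracts * p_at_least_one

print(f"P(at least one elevation) = {p_at_least_one:.3f}")
print(f"expected tracts flagged   = {expected_tracts:.0f}")
```

Under these assumptions a little over half of the tracts would be flagged by chance alone, which is in the neighborhood of the 2750 quoted.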

DISCUSSION QUESTIONS:

(1) Can you reproduce the calculation described in the last paragraph?

(2) How convincing do you find the arguments in the article? If you were looking to buy a house, would you actively avoid any area that you knew had reported a cancer cluster? If you learned that your neighborhood had an elevated cancer rate, would you try to move?
<<<========<<




>>>>>==============>
Compounding evidence from multiple DNA-tests
Mathematics Magazine, Vol. 72, No. 1, February 1999, pp. 39-43
Sam C. Saunders, N. Chris Meyer, Dane W. Wu

Some observers of the O. J. Simpson trial blamed the jury's innumeracy for the acquittal. Indeed, after the trial, some jurors indicated that they considered DNA evidence no more reliable than traditional fingerprinting. But other critics blamed the prosecution for its inability to state the DNA case clearly. The present article focuses on how to combine multiple pieces of evidence (the defendant's blood at the crime scene, the trail of the defendant's blood from the crime scene to the gate, the infamous glove, the victim's blood in the white Bronco) into an overall computation for the probability of guilt.

The article begins with a discussion of the famous "prosecutor's fallacy." Let M be the event that the defendant's DNA matches DNA from the crime scene, I be the event that the defendant is innocent, and I' denote the complement of I. DNA testing can reliably compute P(M|I), which can be on the order of 10^(-8) to 10^(-10). The prosecutor's fallacy is to use P(M|I) for P(I|M), the latter being what the jury needs to consider. But this requires an application of Bayes' theorem:

     P(I|M) = P(I)P(M|I)/P(M)

Rewriting the denominator using the law of total probability

     P(M) = P(I)P(M|I) + P(I')P(M|I')

leads (by omitting the first term) to the bound

     P(I|M) < P(I)P(M|I) / [P(I')P(M|I')].

Since P(M|I') is essentially 1, we have the approximation:

     P(I|M) < [P(I)/P(I')] P(M|I).

In this way, we see that the prosecutor's fallacy amounts to ignoring the prior odds P(I)/P(I').
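
A numerical sketch in Python may make the role of the prior odds concrete. The prior probability of innocence used here is hypothetical, chosen only to show the effect; the match probability is at the upper end of the range mentioned above, and P(M|I') is taken to be 1.

```python
def posterior_innocence(prior_innocent, p_match_given_innocent,
                        p_match_given_guilty=1.0):
    """Exact P(I|M) from Bayes' theorem and the law of total probability."""
    prior_guilty = 1 - prior_innocent
    numerator = prior_innocent * p_match_given_innocent
    return numerator / (numerator + prior_guilty * p_match_given_guilty)


def odds_bound(prior_innocent, p_match_given_innocent):
    """Upper bound [P(I)/P(I')] * P(M|I), taking P(M|I') to be 1."""
    return prior_innocent / (1 - prior_innocent) * p_match_given_innocent


prior = 0.999999   # HYPOTHETICAL prior probability of innocence
p_m_i = 1e-8       # P(M|I), upper end of the range quoted above

exact = posterior_innocence(prior, p_m_i)
bound = odds_bound(prior, p_m_i)
print(f"P(M|I) = {p_m_i:.1e}, exact P(I|M) = {exact:.3g}, bound = {bound:.3g}")
```

With this prior, P(I|M) is many orders of magnitude larger than P(M|I), so quoting P(M|I) as if it were the probability of innocence would badly overstate the case.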

Now consider how to modify the above in light of matches on multiple pieces of evidence, given by the events M1, M2, M3, ... . The ratio P(M2|M1)/P(M2) measures the degree of association between events M1 and M2. The ratio is 1 if the events are independent, and greater or less than 1 depending on whether M2 is more or less likely to occur given that M1 occurs (for another application of this idea, see S. Gudder (1981), "Do good hands attract?" Mathematics Magazine, Vol. 54, 13-16). A key notion in the present situation concerns whether multiple matches are more likely to occur if I is true than if I' is true. Define P_I(.) = P(.|I). The events M1 and M2 are said to be "more strongly associated conditionally given I' than I" if

      P_I'(M2|M1)/P_I'(M2)   >=  P_I(M2|M1)/P_I(M2).

The key theorem proved in this paper states that if M1 and M2 are more strongly associated conditionally given I' than given I, then

      P(I| M1 and M2) <= 
         [P(I)/P(I')]*[P(M1|I)/P(M1|I')]*[P(M2|I)/P(M2|I')].

(The development in the paper is more general--additional events Mi are shown to contribute additional factors of the same form: P(Mi|I)/P(Mi|I').) In the case of the O.J. trial, let M1 be the event that blood found near the victims' bodies matches O.J.'s, and M2 be the event that blood found on the sock in O.J.'s bedroom matches Nicole's. The DNA lab Cellmark Diagnostics estimated that P(M1|I) = 5.88 E-9 and P(M2|I) = 1.47 E-10. Applying the theorem gives:

     P(I|M1 and M2) <= [P(I)/P(I')](8.65 E-19).

While the prior ratio P(I)/P(I') remains unknown, a value on the order of (1 E+10) would seem quite generous to the defense. But this still leaves P(I|M1 and M2) <= (8.65 E-9).
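
The bound can be reproduced with a few lines of Python. The sketch takes the two Cellmark figures as the estimates for M1 and M2 respectively, treats P(M1|I') and P(M2|I') as 1, and uses the prior odds of 1 E+10 mentioned above; the variable names are ours.

```python
p_m1_given_i = 5.88e-9    # Cellmark estimate for the first match
p_m2_given_i = 1.47e-10   # Cellmark estimate for the second match
prior_odds = 1e10         # P(I)/P(I'), the value called generous to the defense

# Product of the likelihood factors, taking P(Mi|I') to be 1.
likelihood_product = p_m1_given_i * p_m2_given_i
bound = prior_odds * likelihood_product

print(f"product of factors = {likelihood_product:.3g}")
print(f"bound on P(I|M1 and M2) = {bound:.3g}")
```

Multiplying the two rounded figures gives about 8.64 E-19, marginally below the 8.65 E-19 quoted; the difference is presumably due to rounding in the reported estimates.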

DISCUSSION QUESTIONS:

(1) The authors argue that the (8.65 E-9) figure would establish guilt beyond a reasonable doubt. Do you agree?

(2) How would you explain the above calculation to a jury?
<<<========<<




>>>>>==============>
Ask Marilyn
Parade Magazine, 7 February 1999, p. 10
Marilyn vos Savant

If a DNA test could tell you whether you were carrying a gene associated with a fatal illness, would you want to know? A reader poses that question to Marilyn as follows:

Say that there is a test that can tell a woman if she has inherited such a gene, but she cannot decide whether to take the test. She tells you, 'Marilyn, if I don't have the gene I would prefer to know the test result, so I could live the rest of my life without fear. However, if I do have the gene, I would prefer not to know, so I would not lose hope.' What would you recommend that she do? I'm not looking for an answer that gives her personal advice like, 'You should go ahead and take the test and accept the result.' Instead, is there a way for her to know the test result if it is negative for the gene, but not know the result if it is positive...?

While at first glance it seems impossible to satisfy this request, Marilyn proposes the following scheme. The woman should take the test with the understanding that the report will be subject to the outcome of a coin toss. After the doctor sees the test result, he privately flips a coin marked "positive" on one side and "negative" on the other. If both the test and the coin come out negative, he tells the woman the result. On the other hand, if either the test or the coin (or both) comes out positive, he does not tell her the test result.

If she so desires, the woman can request additional coin tosses, with the same rules for reporting each time. Marilyn says that "If her test result is negative, she will eventually hear about it. If she wants to stop at three flips to avoid increasing discomfort, she can."
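
The effect of a non-report on the woman's beliefs can be sketched with Bayes' theorem. The Python function below assumes a perfectly accurate test and a fair coin; the prior of 0.25 in the example is hypothetical and chosen only to make the numbers easy to follow.

```python
def prob_gene_after_nonreports(prior, k):
    """P(gene | k coin tosses, none of which led to a report).

    Assumes a perfectly accurate test and a fair coin. A carrier is never
    told anything, while a non-carrier goes untold on each toss with
    probability 1/2.
    """
    silent_if_gene = 1.0
    silent_if_no_gene = 0.5 ** k
    return prior * silent_if_gene / (
        prior * silent_if_gene + (1 - prior) * silent_if_no_gene
    )


prior = 0.25   # HYPOTHETICAL prior chance of carrying the gene
for k in range(4):
    print(k, round(prob_gene_after_nonreports(prior, k), 3))
```

Each silent toss raises the probability that she carries the gene, so silence is itself informative and the scheme cannot fully shield her from a positive result.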

DISCUSSION QUESTIONS:

(1) Suppose that prior to the test, there is a 1 in 100 chance that the woman is carrying the gene. Suppose further that the test is perfectly accurate. After one coin toss leading to a non-report, what is the chance she has the gene? How does this change after three tosses with no report?

(2) How many non-reports could she receive before there is a better than even chance that she has the gene?
<<<========<<




>>>>>==============>
Bob Griffin suggested the following article.

Americans wait for the punch line on impeachment
The Washington Post, 26 January, 1999, A1
Howard Kurtz

The comments below are from Bob Griffin based on his reading of this article in the Milwaukee Journal-Sentinel, where it was headlined "Uncle Slam: Millions get their updates from TV comics."

As I read this article, the gist and clear implication of the article was that a substantial portion of the population relies on TV comics (like Jay Leno) as their source of political information, that this segment of the populace is growing, and that sources like Leno affect public opinion. The article refers to a 1996 Pew Research Center poll ("...a quarter of those surveyed said they had learned about the presidential campaign from the likes of Leno and David Letterman..." etc.), but otherwise the information in the article is anecdotal.

Some questions arose in my mind about how the poll might have established the apparent political communication role and effects of Jay Leno et al., and the idea that a growing portion of the populace was tuning into politics through this kind of content. So, I visited electronically the Pew Research Center web site and called up the report to which the article referred.

Editor's note: The report is called TV NEWS VIEWERSHIP DECLINES
Bob continues:

One problem with the data set is that the questions about the late night TV shows as sources of news about the campaign or candidates (in Q29) are not posed in such a way as to be parallel in structure or interpretation to the questions about sources folks rely on most for presidential election news (Q27 and 28). Q29 asks how often R "ever learns something about the presidential campaign or the candidates" from the sources listed. It is a stretch to go from these data to stating that "a quarter of those surveyed said they had learned about the presidential campaign from the likes of Leno and David Letterman." Folks probably use multiple sources for information. The lead of the article is, therefore, even more problematic. The lead clearly suggests that many people avoided serious reports about the president's State of the Union Address and heard about it instead from Leno (even though Leno's show comes on after the news in most places that I'm aware of). There certainly may be a kernel of truth in the article, but it surely stretches way beyond that.

There are no trend data, at least that I found in the Pew analysis, that indicate that "a growing segment of the population is tuning into politics" through Leno et al., as the article states. Similarly, there is nothing to establish effect, which is something a lot of folks in and out of the media tend to assume via intuition but, as you know, is hard to establish empirically.

DISCUSSION QUESTION:

(1) Here are the results of question 29.

Now I'd like to ask you about some other ways in which you might be getting news about the presidential campaign. For each item that I read, please tell me how often if ever you learn something about the presidential campaign or the candidates from this source.

                                     Regularly  Sometimes  Hardly Ever  Never    

     a.   Religious radio shows, such
          as "Focus on the Family"         6        12        15         67 

     b.   Christian Broadcasting Network   6        12        16         65 

     c.   Talk Radio shows                12        25        24         39 

     d.   MTV                              3        10        12         74 

     e.   Late night TV shows such as
          David Letterman and Jay Leno     6        19        19         56 

If you were writing about the results of this poll, how would you report the results of this question?

(2) The instructions to the questioner were to rotate the items. What does this mean and why is it done?
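
As a starting point for question (1), the Question 29 table can be summarized with a short Python sketch that adds the "Regularly" and "Sometimes" columns for each source and checks the row totals. The percentages are copied from the table above; the dictionary layout and shortened labels are ours.

```python
# Percentages from Question 29 as tabulated above:
# (regularly, sometimes, hardly ever, never)
q29 = {
    "Religious radio shows": (6, 12, 15, 67),
    "Christian Broadcasting Network": (6, 12, 16, 65),
    "Talk radio shows": (12, 25, 24, 39),
    "MTV": (3, 10, 12, 74),
    "Late night TV shows": (6, 19, 19, 56),
}

print(f"{'Source':32s} {'Reg+Some':>8s} {'Total':>6s}")
for source, (regularly, sometimes, hardly_ever, never) in q29.items():
    at_least_sometimes = regularly + sometimes
    total = regularly + sometimes + hardly_ever + never
    print(f"{source:32s} {at_least_sometimes:8d} {total:6d}")
```

For late night TV the first two columns sum to 25, which may be where the article's "a quarter" comes from, though the Pew report would have to be consulted to confirm this. Two rows total 99 rather than 100, possibly because of rounding or an omitted response category.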
<<<========<<




>>>>>==============>
Better loving through chemistry
The New York Times, 14 February 1999, Sect. 4, p. 1
Denise Grady

When else would this story run but on Valentine's Day? It is based on a recent article in the Journal of the American Medical Association (Edward O. Laumann, et al. "Sexual Dysfunction in the United States." JAMA. 1999; 281: 537-544).

The study was based on data from the National Health and Social Life Survey, which followed a 1992 cohort of 1749 women and 1410 men and concluded that Americans have a high risk of experiencing sexual dysfunction. The abstract for the article reports that overall, women have a 43% risk, compared with 31% for men. Variations associated with factors such as age and educational attainment were more pronounced in women, but similar patterns were seen for men. Higher educational attainment was negatively associated with dysfunction. Not surprisingly, people experiencing emotional or stress problems were at higher risk, including those who experienced a drop in economic position. A strong association was found between sexual dysfunction and negative experiences in sexual relationships and general well-being, leading the researchers to label this an "important public health concern."

But how was sexual dysfunction defined and measured here? The survey used seven dichotomous response items: (1) lacking desire for sex, (2) arousal difficulties, (3) inability to achieve climax or ejaculation, (4) anxiety about performance, (5) premature climax or ejaculation, (6) physical pain during intercourse, and (7) not finding sex pleasurable. Respondents were asked if they had experienced any of these in the last 12 months. Only those respondents reporting at least one partner in the prior 12 month period were included in the analysis.

The JAMA authors noted that sexual dysfunction tends to be under-reported and that those affected rarely seek medical treatment. The Times article notes that treatment might include counseling or hormone therapy, or even a new drug treatment now under investigation that would give Viagra to women. In this regard, it is interesting to note that two of the three JAMA authors had served as paid consultants to Pfizer, the maker of Viagra. A JAMA editor said that the authors' financial connection was not mentioned due to "an oversight." Pfizer was not a sponsor of the current survey.

DISCUSSION QUESTIONS:

(1) Under-reporting? Upon reading the list of survey items, one of our colleagues expressed surprise that the study didn't find 100% of the population at risk. Do you agree?

(2) Why do you think the analysis excluded respondents who were not sexually active during the last 12 months? What effect might this have on the results?

(3) Is this a public health problem? Or is it an example of what the Times calls "creeping medicalization," the trend toward seeking medical solutions for any problem in people's lives?
<<<========<<




>>>>>==============>
We are often asked what students do for their projects in a typical Chance course. We have put some of their projects on the Chance web site under Teaching Aids. Here are some comments on the projects of the students in a Chance class taught this Fall at Dartmouth. Students were asked to make some hypotheses about the study before they carried it out and to give their results and suggestions for improvements if they were to do the study again.
<<<========<<




>>>>>==============>
What Influences a Home Run?
John Kline and Jin Park

For their project John and Jin hypothesized situations they thought would affect home-run performance. These situations were: (1) home vs. away, (2) night vs. day, (3) handedness for pitcher and batter the same vs. different, (4) count (0,0) vs. count (0,2). They tested their hypotheses on the records of three players, Walker, Bichette, and Castilla, using data over a three year period. They found that day/night was not significant for any of the three. Home/away and count were significant for Bichette and Castilla but not Walker, and handedness was significant only for Walker.
<<<========<<




>>>>>==============>
Candy Corn and the chance variation among them.
Elizabeth Ann Kavanaugh

This student decided to look at three different brands of candy corn and made three hypotheses: (1) she would find a mutation rate of at least 30% in each brand (mutations considered were: wrong color, wrong shape, no white tip, and a broken tip), (2) she would find a correlation of around .7 between kernel length and length of orange segment, and (3) she would find a normal distribution for the length of the candy corns.

The mutation hypothesis was confirmed with a mutation rate close to 30%. However, the correlation between length of orange section and total length was only about .3. The distribution of the total lengths slightly skewed to the left, but she did not feel that the normal hypothesis could be rejected.

Her poster was a work of art with real examples of "ideal" candy corn and real examples of mutations.
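The two numerical summaries in this project, the correlation between total length and orange-segment length and the skewness of the lengths, can be computed along the following lines. This is only a sketch: the function names are ours and the measurements are made up for illustration, not Elizabeth's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def sample_skewness(xs):
    """Moment-based skewness; negative values indicate a longer left tail."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Hypothetical measurements in millimeters (NOT the student's data).
total_length = [21.0, 22.5, 20.0, 23.0, 21.5, 22.0, 19.5, 22.5]
orange_length = [7.0, 7.5, 7.5, 7.0, 8.0, 7.0, 7.0, 8.0]

r = pearson_r(total_length, orange_length)
skew = sample_skewness(total_length)
print(f"correlation = {r:.2f}, skewness of total length = {skew:.2f}")
```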
<<<========<<




>>>>>==============>
A Study about Surveys
Melissa Nagare

Melissa asked the question: Do you get a more representative sample of Dartmouth students by an e-mail survey or by a campus mail survey? She sent out an e-mail survey and a campus mail survey asking questions for which she could obtain the exact population values, such as class, gender, membership in a Greek house, and ethnicity. She concluded that both methods give a representative sample, but e-mail gave a 40% higher return rate and overall more reliable estimates. She also collected data from the other students' surveys to study the response rates and the value of "gimmicks" and follow-ups to improve response rates. Neither seemed to be very effective.
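One common way to judge whether a sample is representative when the population values are known is a chi-square goodness-of-fit test. The report does not say that Melissa used this test. The sketch below applies it to class year, and both the respondent counts and the population shares are hypothetical.

```python
def chi_square_gof(observed, population_shares):
    """Chi-square goodness-of-fit statistic against known population shares."""
    total = sum(observed)
    expected = [total * share for share in population_shares]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Critical value of the chi-square distribution at the 0.05 level
# with 3 degrees of freedom (four categories minus one).
CRITICAL_05_DF3 = 7.815

# Hypothetical data (NOT from the survey): respondents by class year,
# and the share of each class in the student body.
respondents_by_class = [58, 47, 52, 43]
population_shares = [0.26, 0.25, 0.25, 0.24]

stat = chi_square_gof(respondents_by_class, population_shares)
verdict = "consistent with" if stat < CRITICAL_05_DF3 else "differs from"
print(f"chi-square = {stat:.2f}; the sample {verdict} the population at the 5% level")
```

A statistic below the critical value only means the data give no evidence against representativeness on this one variable. The same check would have to be repeated for gender, Greek membership, and the other questions.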
<<<========<<




>>>>>==============>
How good an indicator is the Dow Jones Industrial Average?
Jeffery D. Isaacs

Jeffery compared the performance of the Dow Jones over a 60-day period with that of a similar number of randomly chosen stocks and a group of stocks that he picked himself. No significant differences were found, but the random group was almost always lower than the Dow, while Jeffery's choices followed the Dow quite closely.
<<<========<<




>>>>>==============>
Predicting success in Eastern Collegiate Athletic Conference hockey.
Shane Ness and David Risk

Shane and David identified six factors relating to the previous year's teams that might influence the teams' performance the next year: (1) percentage of last year's goals made by returning players, (2) percentage of last year's games played by returning defensemen, (3) percentage of goaltender minutes played last year by returning goalies, (4) points scored in the previous season, (5) 5-year winning percentage, and (6) last year's second-half improvement (the difference between last year's winning percentage in the second half and in the first half of the season).

Shane and David considered data over the previous three seasons and found that no single factor was significantly correlated with the next season's winnings but that their Factor Average did have a .6 correlation.

Most of the rest of the projects were surveys. Here are a couple of interesting examples.
<<<========<<




>>>>>==============>
What Dartmouth knows: a survey of cultural literacy
Steven Menashi

A common survey on cultural literacy was carried out at Harvard, Princeton, Stanford and Cornell. The survey asked the following 10 questions: (1) Name three of the four members of the Beatles. (2) Who wrote the Wealth of Nations? (3) Which came first, the Renaissance or the Enlightenment? (4) Name one lawyer from the O.J. Simpson Trial. (5) What was the exact date of the Japanese attack on Pearl Harbor? (6) Which 16th century scholar first proposed that the Earth revolves around the sun? (7) Name three of the 12 apostles. (8) Name the three authors of the Federalist Papers. (9) Who invented the printing press in the 15th century? (10) Who is the current Prime Minister of England?

Steven carried out this survey using Dartmouth students, analyzed the results, and compared them with the results at the other four schools. 89% of Dartmouth students could name three of the four Beatles, but only 20% could name three of the 12 apostles. Surprisingly, only 47% could name the date of Pearl Harbor, and that was higher than the percentage at any of the other four schools. Dartmouth men did better than the women on all questions except questions 1 and 7. The schools differed on individual questions, but it was hard to make overall comparisons. Harvard appeared to do the best, with Dartmouth next.
<<<========<<




>>>>>==============>
Survey on e-mail use
Kenneth Luallen

Ken's project was a survey asking students to estimate the number of e-mail messages they sent and received and to answer questions about how they felt about e-mail.

Freshmen sent an average of 26 messages a day and received 39. These numbers increased with each class year up to seniors, who sent an average of 70 messages a day and received 105. Women sent and received significantly more e-mail than men. Freshmen checked their e-mail more often than seniors, but the differences were not great, and each class checked e-mail an average of about 20 times a day. In all cases the students' estimates were reasonably close to the true values. On the whole they had a quite positive opinion of e-mail, although 28% felt it was greatly overused. Asked if they considered it convenient, a hassle, or both, 40% answered both, 58% convenient, and 1% a hassle.

Other surveys dealt with (1) racial issues at Dartmouth, (2) Dartmouth team members' beliefs about streaks in sports (93% believe that they have streaks of total concentration where they are so focused that they cannot fail), (3) sexual experiences of the students (they have a lot of these), (4) differences between students who are members of a Greek house and those who are Independents, (5) students' use of .mp3's (downloaded CD-quality music, usually obtained illegally; 65% use them), (6) students' experience with "hookups" (random sexual encounters), and (7) whether students read and listen to the news and, if they do, how much they trust what they read and hear.
<<<========<<




>>>>>==============>
Chance News
Copyright © 1998 Laurie Snell

This work is freely redistributable under the terms of the GNU General Public License as published by the Free Software Foundation. This work comes with ABSOLUTELY NO WARRANTY.
