Prepared by J. Laurie Snell, Bill Peterson and Charles Grinstead, with help from Fuxing Hou and Joan Snell.
Please send comments and suggestions for articles to
jlsnell@dartmouth.edu.
Back issues of Chance News and other materials for teaching a Chance course are available from the Chance web site:
Chance News is distributed under the GNU General Public License (so-called 'copyleft'). See the end of the newsletter for details.
Chance News is best read with Courier 12pt font and 6.5" margin.
===========================================================
Professor Marshall suggests that the weaklings among the sober had grandparents who were profligate. It may be so, but as a statistician I reply: -- "Statistics--and unselected statistics--on the table please."
Karl Pearson
Letter to the Times July 12, 1910
===========================================================
Note: This quote is the origin of the title of Stephen M. Stigler's recent book "Statistics on the Table" (Harvard University Press 1999). This is a wonderful book in which Stigler shows that the very first attempts at a new form of statistical analysis are often remarkably good and that we can learn a lot from studying the masters at work. The quote relates to an argument between Pearson and the economist Alfred Marshall about a study Pearson did to show that alcoholism is not a heritable trait. This was a hot political question in those days, with temperance advocates preaching that alcoholism was heritable. Stigler calls this study a model social scientific statistical investigation. Incidentally, it led to a serious statistical debate between academic experts in a major newspaper (The Times), something you will not find today.
Contents of Chance News 9.08
Note: If you would like to have a CD-ROM of the Chance Lectures
that are available on the Chance web site, send a request to
jlsnell@dartmouth.edu with the address where it should be sent.
There is no charge. If you have requested this CD-ROM and it has
not come, please write us again.
<<<========<<
>>>>>==============>
Here are a couple of amusing items from Forsooth! June 2000 RSS
News, V. 27, #10 page 5
Baker Arthur Jones is finally hanging up his apron after producing millions of loaves of bread. Mr. Jones has run the popular Iscoed Bakery near Barmouth for 50 years after taking over from his father. But his retirement this weekend will mark the end of an era for the village. Mr. Jones estimates he has baked more than 23 million loaves. The bakery is unique in delivering the bread daily to each of its 120 customers.
Teletext (Welsh region) 25 March 2000
Plastic Card Fraud Losses 1999

Type of fraud       £m loss   Growth 98-99
Lost/stolen           80.1        41%
Counterfeit           50.6        27%
Mail non receipt      14.7         8%
Application           11.4        27%
Card not present      29.5        16%
Other                  3.0         2%
Total                189.3       121%

(£m means millions of pounds sterling.)
Fraud Watch (C&M Publications)
<<<========<<
A reader of the Columbian noticed that winning numbers 6-8-5-5 for the Pick 4 Oregon Lottery game for Wednesday June 28 were announced in the Columbian before being announced by the Lottery people. Lottery officials were dismayed and checked all their methods of preventing a leak. After they were convinced this could not have happened, they called in the cops and Lloyd W. Beil, a detective with the Oregon State Police's gaming enforcement section, paid a visit to the newspaper. Here is the explanation the paper gave Beil:
The Columbian's computers crashed Wednesday and we had
to scramble to re-create a news page that had been lost.
It happened to be the page that had the lottery results.
A copy editor was assigned to go back and get the Oregon
Lottery numbers off the News wires. We were pushing
deadline, and he had to be quick.
He spotted the Pick 4 numbers. Problem was, he grabbed the Virginia Pick 4 numbers, not Oregon's and, miracle of miracles, Virginia's Pick 4 numbers were the same exact numbers that Oregon was about to draw that day.
The Oregon Pick 4 game can be played in a number of ways. One option is to choose four digits, each from 0 to 9, in a specified order. If the drawn digits match yours in that order, you win $5,000 on a $1 bet.
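A quick check of the odds and payout just described, as a minimal Python sketch. It assumes all 10,000 ordered four-digit outcomes are equally likely and uses the $5,000-per-$1 payout stated above; the variable names are ours.

```python
from fractions import Fraction

# Straight play: four digits, each 0-9, must match the draw in the exact order.
outcomes = 10 ** 4                 # 10,000 equally likely ordered draws (assumed)
p_win = Fraction(1, outcomes)      # chance a single straight ticket wins

payout = 5000                      # dollars paid on a winning $1 ticket
expected_return = payout * p_win   # expected dollars back per $1 bet

print(f"P(win) = {p_win} = {float(p_win):.4f}")
print(f"Expected return per $1 bet: ${float(expected_return):.2f}")
```

The same 1/10,000 is the chance that an independent Pick 4 draw in another state (say, Virginia) matches Oregon's on a given day, which is one candidate for the event in the first discussion question.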
Lottery spokesman David Hooper commented:
The odds of hitting the Pick 4 are about 10,000 to 1. And the odds of a newspaper computer crash, pulling the Virginia lottery numbers by mistake and having those numbers be the same numbers drawn in Oregon the next day? A gazillion to one.
DISCUSSION QUESTIONS:
What event should one consider in estimating the odds?
How does a gazillion to one sound to you?
<<<========<<
>>>>>==============>
Our second remarkable event was suggested by Dan Rockmore.
South Dakota man has lots of luck.
USA TODAY, 17 July, 2000
Melissa Geschwind
David Howard, 39, from Brookings, South Dakota, sank a hole-in-one Monday evening and then bowled a 300 game Tuesday.
The article states that Howard is a better bowler than golfer: he averages 210 on the lanes and shoots about 45 for nine holes of golf. His perfect bowling game was his fifth since taking up the game ten years ago. His hole-in-one was his first in six years of playing golf.
The article reports:
Taking into account Howard's age and skill level at each sport, USA TODAY sports analyst Danny Sheridan figured the odds against Howard's feat. The verdict: Howard is more than twice as likely to win the Powerball jackpot -- a mere 80,089,128-to-1 long shot-- than he was to achieve dual perfections.
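The Powerball figure can be reproduced with a short calculation. This sketch assumes the Powerball format in use in 2000 (5 white balls from 49 plus 1 Powerball from 42), which gives exactly the 80,089,128 quoted; it then turns Sheridan's "more than twice as likely" into a bound on the chance of Howard's double.

```python
from math import comb

# Assumed 2000 Powerball format: 5 white balls from 49, plus 1 Powerball from 42.
powerball_odds = comb(49, 5) * 42
print(f"P(Powerball jackpot) = 1 in {powerball_odds:,}")

# "More than twice as likely to win Powerball" means P(feat) < P(jackpot) / 2,
# so the chance of the hole-in-one / 300-game double is below 1 in twice that number.
min_one_in = 2 * powerball_odds
print(f"P(Howard's feat) < 1 in {min_one_in:,}")
```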
DISCUSSION QUESTION:
How would you compute the odds for Howard's feat? Do you think Sheridan's estimate is reasonable?
<<<========<<
>>>>>==============>
This article relates to the Canadian Lotto 6/49. In this lottery you pick six distinct numbers from 1 to 49; the lottery then draws six winning numbers plus a bonus number. To win the jackpot you need to match all six winning numbers. The bonus number is used only for the second prize, which requires five of your six numbers to match the winning numbers and your sixth number to match the bonus number. You can find the odds for the possible prizes by going to Fred's lottery site.
In particular, the probability of winning the jackpot is 1/13,983,816 = 0.00000007151, which Fred explains is roughly the same probability as obtaining 24 heads in succession when flipping a fair coin!
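A sketch of the 6/49 counting, assuming the prize rules described above (six numbers from 1 to 49, with a drawn bonus number that matters only for the second prize). It also shows where the "24 heads" comparison comes from: 2^23 = 8,388,608 and 2^24 = 16,777,216, so the jackpot odds fall between 23 and 24 fair-coin heads in a row.

```python
from math import comb, log2

tickets = comb(49, 6)          # ways to choose 6 distinct numbers from 1..49
p_jackpot = 1 / tickets

# Second prize: 5 of the 6 winning numbers plus the bonus number.
# Choose which 5 winning numbers you hold (6 ways); your sixth number must be the bonus (1 way).
second_prize_ways = comb(6, 5) * 1

print(f"{tickets:,} possible tickets, P(jackpot) = {p_jackpot:.5e}")
print(f"P(second prize) = {second_prize_ways}/{tickets:,}")
# Length of a run of fair-coin heads with the same probability as the jackpot:
print(f"Equivalent run of heads: {log2(tickets):.2f}")
```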
According to this article:
Rufus Arsenault has been playing the same 6/49 numbers since it began (eighteen years). He normally spends $18 to $30 each week playing combinations of the same numbers, and estimates that over the years he has won $3000 to $4000 over-all.
Arsenault is president of the Valois Village Merchants Association, which held its fourth annual street fair on Saturday, June 17. Arsenault said that on Friday night he put the slip of paper with his usual numbers on the counter with the keys on top, so he wouldn't forget to buy his tickets the next day. He says:
I got up at 3:45 a.m. Saturday and I was in such a panic, whether it would rain, all the things I had to remember. I grabbed my keys and left the lottery paper on the counter.
Sunday morning, as usual, he went for coffee with a friend and to check the lottery numbers in the newspaper.
On the first ticket I won nothing, on the second it was $10, on the third I had four numbers out of six and the fourth ticket was five numbers out of six. The last ticket has six out of six.
Then he noticed that the date on his last ticket was for the draw of Wednesday, June 14, and not for Saturday, June 17.
This remarkable event was also covered by the Saturday Night Magazine, which asked:
But what, we wondered, are the odds of your numbers
coming up, after eighteen years, on the one weekend
that you didn't play? To find out, we consulted
mathematicians at universities across the country. There have been 1,712 Lotto 6/49 draws since the first one on June 12, 1982. Arsenault missed one draw.
Therefore the odds of him missing a draw (up until now) are one in 1,712. The odds of him not playing and his numbers winning in the same week are equal to the probability of his missing a week multiplied by the probability of him winning, or one in 23,940,292,992.
We do not understand where the one in 1,712 comes from, nor why the product rule applies here, and we do not agree that the result answers the question posed by Saturday Night Magazine. To see why, let
A = {Arsenault forgot to buy a ticket for June 17} and
B = {Arsenault's numbers were selected for the June 17 jackpot}.
Clearly P(B) = 1/13,983,816. But how can a probability be assigned
to the event A which presumably will never be repeated by
Arsenault?
There is more. The question was posed to Fred by Saturday Night Magazine, but in the following fashion:
Is there a way of calculating the likelihood of this happening? That is to say, the likelihood of his [Arsenault] not playing the week his numbers were drawn?
Fred interpreted the question, based on its wording, as the conditional probability P(A|B) = 1/1,712 that Arsenault did not play, given that his numbers were drawn. Or should the answer be conditioned on Arsenault forgetting to buy a ticket, P(B|A) = 1/13,983,816, as Laurie Snell suggested? This appears to be one of those semantic problems in which the language is a mischievous culprit, inadequate to convey the intent.
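The two readings can be compared in a small model. This is a sketch under assumptions that are ours, not the magazine's or Fred's: the single missed draw is equally likely to be any of the 1,712 draws, and forgetting is independent of which numbers come up. Under those assumptions the magazine's product is P(A and B), and conditioning simply returns each factor.

```python
from fractions import Fraction
from math import comb

# Hypothetical model (an assumption, not a claim about Arsenault):
#  - the one missed draw is equally likely to be any of the 1,712 draws, and
#  - whether he forgets is independent of which numbers are drawn.
p_A = Fraction(1, 1712)            # he forgot to play June 17
p_B = Fraction(1, comb(49, 6))     # his numbers won June 17
p_A_and_B = p_A * p_B              # product rule, valid only under independence

p_A_given_B = p_A_and_B / p_B      # Fred's reading
p_B_given_A = p_A_and_B / p_A      # Laurie Snell's reading

print("P(A and B) = 1 /", p_A_and_B.denominator)   # the magazine's figure
print("P(A | B) =", p_A_given_B)
print("P(B | A) =", p_B_given_A)
```

Because independence makes P(A|B) = P(A) and P(B|A) = P(B), this model cannot settle which reading was intended; it only shows that the magazine's one in 23,940,292,992 answers a third question, about the joint event.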
DISCUSSION QUESTIONS:
1. Which of the above calculations do you think reflects the intended meaning? Both Fred and Laurie agree with the value 1/13,983,816.
2. Is it really possible to assign a value to P(A) = P{Arsenault forgot to buy a ticket June 17} which represents either an objective or a subjective probability?
<<<========<<
>>>>>==============>
This brief note reports:
The French council of state has cancelled the results of the 1995 third cycle medical exams for interns in northern and southern France because of procedural irregularities. The council upheld the complaint of a candidate who failed the diagnostics and therapeutics exam and alleged that the questions had not been selected at random as laid down by law. Statistical probability lies behind the council's decision. Of the 12 subjects selected from the 428 in the database, six for the exam in northern France and two in the south came from one of the 49 that were added to the programme in 1993. Moreover, four of the questions came up in both sets of exams, the council said in its decision. The Conseil National du Concours d'Internat is now drawing up a series of proposals to compensate the 4500 student doctors.
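The note does not say what calculation the council relied on (question 4 below asks exactly that), but a natural candidate is the hypergeometric distribution. This sketch assumes each exam drew its subjects uniformly at random without replacement from the 428 topics, with the north and south draws independent. The report is ambiguous about how many subjects each exam used, so the code tries both 6 and 12.

```python
from math import comb

def hypergeom_tail(N, K, n, k_min):
    """P(X >= k_min) when n items are drawn without replacement from N items, K of them marked."""
    top = min(n, K)
    hits = sum(comb(K, k) * comb(N - K, n - k) for k in range(k_min, top + 1))
    return hits / comb(N, n)

N_TOPICS, NEW_TOPICS = 428, 49   # topics in the database; topics added in 1993

for n in (6, 12):                # subjects per exam: the report allows either reading
    p_new = hypergeom_tail(N_TOPICS, NEW_TOPICS, n, 6)   # at least 6 drawn from the 49 new topics
    p_shared = hypergeom_tail(N_TOPICS, n, n, 4)         # at least 4 in common with an independent exam
    print(f"n = {n:2d}: P(>= 6 new) = {p_new:.1e}, P(>= 4 shared) = {p_shared:.1e}")
```

Under either reading both events come out very unlikely under random selection, which is presumably the flavor of argument the council found persuasive; whether true randomness also requires independence between north and south is question 3.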
It would be interesting to know more details about this decision. If anyone has more information please pass it on. Based on what we have here, John Haigh suggests the following discussion questions:
DISCUSSION QUESTIONS:
(1) If there really are 428 different topics, can an exam that selects just 12 of them be a fair test?
(2) If there is some overlap among the 428 topics, so that if (say) question 52 is asked, then it would be inappropriate to also ask question 53, on a similar subject, what does the law mean by "random selection"?
(3) The point that four questions came up in both north and south is suggested as evidence of non-randomness; does "randomness" demand INDEPENDENCE of the selections in north and south?
(4) Just what was the calculation in "statistical probability" that led to the decision to cancel?
<<<========<<
>>>>>==============>
The New York Times article, a report on the findings by Ludwig and Cook published in JAMA, states that "there is no evidence that the Brady law requiring background checks of handgun buyers contributed to a reduction in homicide rates after it went into effect in 1994."
The study and its conclusions were immediately challenged by supporters of the Brady law, including the Clinton Administration and the Center to Prevent Handgun Violence.
As described in the Times article, and in more detail in JAMA, Ludwig and Cook compared homicide rates in the 32 states that had not imposed Brady-type restrictions on gun sales (i.e. background checks and waiting periods) before the law went into effect, with the 18 states that already had such restrictions. A faster rate of decline in gun homicide rates in the first group than in the second would constitute evidence that the Brady law has had a positive impact.
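The comparison described above is a difference-in-differences design. A minimal sketch of the core arithmetic follows; the rates are invented purely for illustration, and the study's own estimation procedure is not reproduced here.

```python
def diff_in_diff(treat_before, treat_after, control_before, control_after):
    """Change in the treatment group minus change in the control group."""
    return (treat_after - treat_before) - (control_after - control_before)

# Hypothetical homicide rates per 100,000, for illustration only (not study data).
effect = diff_in_diff(treat_before=8.0, treat_after=6.5,
                      control_before=9.0, control_after=7.5)
print(effect)   # 0.0: both groups fell by the same amount, so no estimated Brady effect
```

A negative value would mean the treatment states declined faster than the control states, which is the pattern the study looked for.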
The main challenge to the study is that it does not take into account the possible effects that the Brady law had on the 18 "control" states. Specifically, as Ludwig and Cook acknowledge, "...it is possible that the Brady Act may have had a negative association with homicide rates in both the treatment and control states by reducing the flow of guns from treatment-state gun dealers into secondary gun markets." (See question (1) below.)
The Brady law does not apply to secondary gun markets, such as gun shows and private sales. Such unregulated transactions are estimated to account for 30 percent to 40 percent of the market.
The JAMA article also includes two graphs that show homicide and suicide rates, for adults and juveniles, over the period from 1985 to 1997.
DISCUSSION QUESTIONS:
(1) Suppose, as described above, that enactment of the Brady law indirectly reduced homicide rates in the control states. Would this affect the conclusions of the study? If so, why?
(2) Figure 2 in the JAMA article shows firearm homicide rates for adults (21 years and older) and juveniles, in both treatment and control states. Both before and after enactment of the Brady law, the rates are higher in the control states than in the treatment states (though both declined after Brady). Does this make sense? What might the explanation be?
(3) The study reported a small reduction in the rate of gun suicides among people 55 and older. What might account for a reduction here but not for gun homicides?
(4) Butterfield writes, "The authors said that, based on their statistical analysis, the decline in handgun homicide in the new states with the Brady law should have been much more rapid if the law had really had an effect." Is this statement correct?
<<<========<<
>>>>>==============>
The data set (TD 9617) on lightning that we used is available from the National Climatic Data Center.
A more detailed description of each particular lightning event (or any other storm event) can be found at Storm Events.
The data set we used has information about lightning injuries and deaths since 1959. It specifies the date, time, state, county, fatalities, injuries, gender, location, and damage. Unfortunately, gender is indicated for only a relatively small part of the data. The only clue to why a person was struck by lightning is the location where the strike occurred. Here are the locations specified:
1. Under trees
2. In or near water, boating
3. Golfing
4. Under trees on golf course
5. Farming, construction or near heavy equipment
6. Out in the open: fields, playgrounds, ballparks, yard, street
7. Telephone related
8. Other electronics related: radio, TV
9. Various other or unknown

Here is what we found:

                 Males                  Females
Location   Fatalities  Percent    Fatalities  Percent
   1           390       14.3          92       21.4
   2           370       13.6          43       10.0
   3           448       16.5           7        1.6
   4            32        1.2           0        0.0
   5           213        7.8           6        1.4
   6           754       27.7         142       33.0
   7            25        0.9          10        2.3
   8             6        0.2           0        0.0
   9           482       17.7         130       30.2
 Total        2720      100.0         430      100.0
From these data we can verify that 86% of those killed by lightning (among cases where gender is recorded) are men. As expected, more men than women are killed by lightning while playing golf and while working outdoors, but this cannot explain the large difference in the number of deaths between men and women. Our results do suggest that standing under trees, being in a boat, playing golf or being out in the open are not the best things to do in a lightning storm. But they do not tell us whether women simply know enough to come in out of the rain or just spend less time than men in these activities.
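The 86% figure and the column percentages can be checked directly from the table. A short sketch (the counts are copied from the table; the code is ours):

```python
# Fatalities by location code 1-9, copied from the table.
male = [390, 370, 448, 32, 213, 754, 25, 6, 482]
female = [92, 43, 7, 0, 6, 142, 10, 0, 130]

total_m, total_f = sum(male), sum(female)
print(total_m, total_f)
print(f"Share of recorded fatalities who are men: {100 * total_m / (total_m + total_f):.1f}%")

# Column percentages, as in the table: each location's share of that sex's deaths.
for loc, (m, f) in enumerate(zip(male, female), start=1):
    print(f"{loc}: men {100 * m / total_m:5.1f}%   women {100 * f / total_f:5.1f}%")
```

Column percentages describe how each sex's deaths are spread across locations; without data on how much time men and women spend in each setting, they say nothing about risk per hour of exposure.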
DISCUSSION QUESTION:
How could we carry out a study to try to settle the question of why so many more men are killed by lightning than women?
<<<========<<
>>>>>==============>
The American Automobile Association's Foundation for Traffic Safety has recently released a study which found that 20% of fatal car crashes in the US involve unlicensed drivers. You can find the full report, entitled "Unlicensed to Kill," at the organization's home page.
Researchers from Texas A&M University examined data from the US Department of Transportation's Fatality Analysis Reporting System (FARS), covering the five years from 1993 to 1997. There were data on 278,078 drivers involved in 183,749 crashes that resulted in fatalities. It was found that 38,374 (13.8%) of these drivers were "unlicensed." This means that their licenses were suspended, revoked, expired, canceled or denied; or that they had no licenses. Looking at the data in terms of accidents, 36,750 (20%) involved at least one unlicensed driver.
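The percentages above, and a bound that bears on the first discussion question, follow from a few lines of arithmetic. The sketch assumes every fatal crash in the FARS data involves at least one driver.

```python
drivers, crashes = 278_078, 183_749
unlicensed_drivers, crashes_with_unlicensed = 38_374, 36_750

print(f"Unlicensed share of drivers: {100 * unlicensed_drivers / drivers:.1f}%")
print(f"Crashes with an unlicensed driver: {100 * crashes_with_unlicensed / crashes:.1f}%")

# Every crash has at least one driver, so the drivers beyond one per crash
# bound the number of crashes with two or more drivers from above.
mean_drivers = drivers / crashes
max_multi_vehicle = drivers - crashes
print(f"Mean drivers per crash: {mean_drivers:.2f}")
print(f"At most {max_multi_vehicle:,} crashes ({100 * max_multi_vehicle / crashes:.0f}%) had 2+ drivers")
```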
The study also produced a ranking of states by percent of unlicensed drivers involved in fatal accidents. Maine was the lowest, with 6.4%; New Mexico was the highest, with 23.8%.
DISCUSSION QUESTIONS:
(1) There are more drivers (278,078) than crashes (183,749) because crashes can involve multiple vehicles. From the data in the article, what can you say about the distribution of the number of vehicles per accident?
(2) The article classifies states like New Mexico, which had large percentages of unlicensed drivers in the FARS data, as "high risk jurisdictions." Does this mean that these are the riskiest states to drive in? If not, what does it mean?
(3) According to the article, "the researchers did not know the total number of unlicensed drivers on US roads, but said they believe those drivers are involved in an inordinate number of fatal crashes." What assumptions are being made here? Do they seem reasonable?
<<<========<<
>>>>>==============>
Univision, the largest Spanish-language television broadcaster in the US, has charged that Spanish-speaking households are underrepresented in the sample used for Nielsen ratings in the New York area. There are 6.8 million households with television in the area, about 1 million of which are estimated to be Hispanic. Nielsen has television-viewing meters in 500 households. While Nielsen believes its sample includes a reasonable fraction of Hispanic households, officials acknowledge that they do not distinguish which of these households speak primarily Spanish.
Nielsen's results are important to marketers, and the rapid growth of the Hispanic population over the last decade makes it important to get these numbers right. To address the language issue, Nielsen conducted a door-to-door survey to estimate the size of the primarily Spanish-speaking population. The results indicated that 43 of the 500 viewing-meters should be placed in "Spanish-dominant" households. Currently there are only 21.
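Some rough arithmetic on the numbers above, as a sketch. The standard error assumes simple random sampling of households, which is our simplification; the article does not describe Nielsen's sampling design.

```python
from math import sqrt

households = 6.8e6   # TV households in the New York area
hispanic = 1.0e6     # estimated Hispanic households
meters = 500

# Meters a proportional sample would place in Hispanic homes.
print(f"Proportional Hispanic meters: {meters * hispanic / households:.1f}")

# The language survey's target of 43 Spanish-dominant meters implies this share and count.
share = 43 / meters
print(f"Spanish-dominant share: {share:.3f} (~{share * households:,.0f} households)")

# Rough binomial standard error of a count of 43 out of 500 (simple random sampling assumed).
se_count = sqrt(meters * share * (1 - share))
print(f"SE of the count: about {se_count:.1f} meters")
```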
Needless to say, English-language broadcasters were not pleased with the prospect of losing meters from English-speaking households. They have criticized the methodology of the language survey, because interviewers initiated their conversations in Spanish and allegedly asked leading questions. Nielsen agrees that more research should have been done into "Spanish-language polling techniques," but still feels that Spanish-speaking households are being undercounted. Concerns about the survey, however, have led Nielsen to delay adjusting its meter placements.
Although Nielsen has yet to change its primary rating system, since 1995 it has maintained a separate rating called the Nielsen Hispanic index (NHI), which it believes is more accurate for Hispanic viewers. The primary service is called the Nielsen station index (NSI). Reproduced below is a comparison from a sidebar to the article. It shows Nielsen NSI figures for the number of viewers aged 18-49 for the major evening news broadcasts. The figures in parentheses are NHI estimates for Univision.
                 February weekdays    May "sweeps"
Univision        137,200 (289,600)    165,100 (217,600)
ABC              228,800              289,600
CBS              108,600              116,700
NBC              289,200              230,200
DISCUSSION QUESTIONS:
(1) What do you think appropriate "Spanish-language polling techniques" are?
(2) Suppose the interviewer flipped a coin to decide whether to start each interview in Spanish or English. Presumably this would produce different proportions reporting Spanish as the dominant language. What could be learned from these estimates? Is there a "right" way to combine them?
(3) The article reported that 1 million out of 6.8 million households were Hispanic. Do the NHI figures in the table mean that Hispanic households are more likely to be tuned in to the news?
<<<========<<
>>>>>==============>
A reader writes: "Some time ago, you polled your readers for their opinions about men and women in the workplace. I'm still waiting to read the results."
Marilyn replies that the numbers are in and they are surprising. Her full discussion is available in a special online report from the Parade Magazine home page.
Marilyn received responses from 7758 readers. The survey listed a number of jobs (including full-time babysitter, computer tech support, airplane pilot, heart surgeon) and asked the respondents whether they would prefer a man or a woman working in that position, or if it made no difference. The only position for which respondents preferred a woman was babysitter. Marilyn adds that there was no case where a "vast majority (say 90%)" said that it makes no difference.
The web site also includes Marilyn's critique of the often-heard statistic that "women, on average, earn 77 cents on the dollar compared to men." The figure often accompanies demands for "equal pay for equal work," but Marilyn says it actually has nothing to do with equal work. Instead, the 0.77 represents the weekly median earnings of all working women divided by the weekly median earnings of all working men. Marilyn asserts that no study has actually compared compensation of men and women working at the same job and producing equivalent results. She argues that most of the perceived "wage gap" is attributable to education, experience and delayed careers.
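The statistic Marilyn describes is a ratio of medians. This sketch computes it alongside the ratio of means on a small set of weekly earnings that we made up for illustration; with real data the two ratios can differ in either direction, depending on the shapes of the two distributions.

```python
from statistics import mean, median

def earnings_ratio(women, men, stat=median):
    """Women's summary weekly earnings divided by men's (median by default)."""
    return stat(women) / stat(men)

# Hypothetical weekly earnings, for illustration only.
women = [300, 400, 500, 600, 1200]
men = [400, 500, 650, 800, 2000]

print(f"Ratio of medians: {earnings_ratio(women, men):.2f}")
print(f"Ratio of means:   {earnings_ratio(women, men, stat=mean):.2f}")
```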
DISCUSSION QUESTIONS:
(1) Are you surprised by the survey results? What would you like to tell Marilyn about her sample?
(2) Do you agree with Marilyn that colloquial descriptions of the "77 cents on the dollar" calculation are misleading? What is the relevance of the figure?
(3) What do you think would happen to the 0.77 figure if it actually used the mean earnings rather than median to represent what happens "on average"?
<<<========<<
>>>>>==============>
Surprisingly, it is rare for an author of a paper in a scientific journal to get a chance to explain to the readers of a major newspaper what the paper is all about. In this article Dr. H. Gilbert Welch of Dartmouth Medical School has this opportunity. He discusses the 14 June JAMA article of which he was a co-author.
The main point of the JAMA article is that the five-year survival rate is not a good way to measure progress in curing various kinds of cancer.
The five-year survival rate is the proportion of people alive five years after being diagnosed with the disease. In the LA Times article, Welch uses simple examples to show two ways that this rate can increase even when there has been no change in the treatment of the cancer.
The first way this can happen is when new tests for a particular cancer permit earlier detection of the disease. Welch asks us to imagine a group of men who died of prostate cancer at age 78, the current median age of death for men with this disease. If these men had been diagnosed with prostate cancer at age 75, their five-year survival rate would be 0. But if they had been diagnosed at age 70, this rate would be 100%. In other words, early detection of a disease can increase the five-year survival rate even with no change in the way the disease is treated.
The second way the five-year survival rate can increase with no change in treatment comes from the fact that some cancers occur in both a progressive and a non-progressive form. In the progressive form, the disease can advance to the point that it causes death. In the non-progressive form, there will typically be no symptoms and the disease will not result in death. Prostate cancer is such a disease. Before the recent aggressive testing, all patients diagnosed with prostate cancer would have had symptoms of the cancer and might die from it. Thus if 1000 men were diagnosed with prostate cancer in 1950 and 400 of these were alive five years later, the five-year survival rate would be 40%. But now, with new tests for prostate cancer, some of the men diagnosed will have the non-progressive form of the disease, and this will increase the survival rate even if there is no change in the treatment for the disease.
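Welch's lead-time example can be expressed as a small computation. This sketch uses a hypothetical group of 100 men, all dying at 78, and counts a patient as a five-year survivor if death comes more than five years after diagnosis.

```python
def five_year_survival(diagnosis_ages, death_ages):
    """Fraction of patients still alive five years after diagnosis."""
    alive = sum(1 for dx, death in zip(diagnosis_ages, death_ages) if death - dx > 5)
    return alive / len(diagnosis_ages)

deaths = [78] * 100                              # hypothetical group: every man dies at 78
print(five_year_survival([75] * 100, deaths))    # diagnosed at 75
print(five_year_survival([70] * 100, deaths))    # diagnosed at 70: same deaths, earlier diagnosis
```

Nothing about the deaths changes between the two calls; only the diagnosis date moves, and the rate goes from 0 to 1.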
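The same kind of sketch shows the effect of detecting non-progressive cases. The 500 extra non-progressive diagnoses are a hypothetical number chosen for illustration; the 1000 progressive cases and 400 survivors come from the example above, and the number of deaths (600) is the same in both scenarios.

```python
def five_year_survival_rate(survivors, diagnosed):
    """Share of diagnosed patients alive five years after diagnosis."""
    return survivors / diagnosed

# 1950 example: 1000 diagnosed, all progressive, 400 alive after five years.
before = five_year_survival_rate(400, 1000)

# Hypothetical: new testing adds 500 non-progressive cases, all alive at five years,
# while the 1000 progressive cases fare exactly as before (still 600 deaths).
extra = 500
after = five_year_survival_rate(400 + extra, 1000 + extra)

print(before, after)
```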
In their JAMA article the authors obtain data for the 5-year survival rate, the mortality rate, and the incidence rate for 20 forms of cancer for 1950-1954 and 1989-1995. The results are presented in the following table:
                 5-year survival (%)   Increase in     % change, 1950-1996
Cancer           1950-54   1989-95     5-yr survival   Mortality  Incidence
Prostate            43        93            50             10        190
Melanoma            49        88            39            161        453
Testis              57        96            39            -73        106
Bladder             53        82            29            -35         51
Kidney              34        61            27             37        126
Breast              60        86            26             -8         55
Colon               41        62            21            -21         12
Rectum              40        60            20            -67        -27
Ovary               30        50            20             -2          3
Thyroid             80        95            15            -48        142
Larynx              52        66            14            -14         38
Uterus              72        86            14            -67          0
Cervix              59        71            12            -76        -79
Oral Cavity         46        56            10            -37        -38
Esophagus            4        13             9             22         -8
Brain               21        30             9             45         68
Lung                 6        14             8            259        249
Stomach             12        19             7            -80        -78
Liver                1         6             5             34        140
Pancreas             1         4             3             16          9
Note that anyone wanting to argue that we are winning the war against cancer should use five-year survival rates. However, when you look at the mortality rates, things do not look so good.
The authors of the JAMA article point out that incidence rates suffer from some of the problems of 5-year survival rates: early detection can increase an incidence rate, as can the detection of non-progressive versions of the cancer. On the other hand, they say:
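Readers can check the correlations quoted in the discussion questions from the table. This sketch computes Pearson correlations on the rounded published values; if the authors worked from unrounded data or used a different correlation measure, the results need not match their figures exactly.

```python
from math import sqrt

# Columns copied from the table (same cancer order).
survival_gain = [50, 39, 39, 29, 27, 26, 21, 20, 20, 15, 14, 14, 12, 10, 9, 9, 8, 7, 5, 3]
mortality = [10, 161, -73, -35, 37, -8, -21, -67, -2, -48, -14, -67, -76, -37, 22, 45, 259, -80, 34, 16]
incidence = [190, 453, 106, 51, 126, 55, 12, -27, 3, 142, 38, 0, -79, -38, -8, 68, 249, -78, 140, 9]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

print(f"survival gain vs mortality change:    r = {pearson(survival_gain, mortality):.2f}")
print(f"mortality change vs incidence change: r = {pearson(mortality, incidence):.2f}")
```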
Mortality rates can be expected to decrease with any improvement in cancer control: be it risk factor reduction, successful early detection efforts, or better treatment of advanced disease.
From this they conclude:
To measure the true progress in the "war against cancer" physicians and policymakers should focus on mortality.
For an interesting article on measuring success in the war against cancer, see: Cancer Undefeated, John C. Bailar III, Heather L. Gornik, New England Journal of Medicine, 29 May 1997, Vol. 336, Number 22, 1569-1573.
DISCUSSION QUESTIONS:
(1) The authors of the JAMA article say "the five year survival rate is not a rate." Why do they say this?
(2) In their JAMA article the authors show a scatter plot for the increase in five-year survival for the period 1950-1995 and the percent change in mortality for the period 1950-1996. We would expect these to be negatively correlated. Does the data from their table suggest such a correlation? (The correlation turns out to be 0).
They also considered the correlation between the percent change in mortality and the percent change in incidence, both over the period 1950-1996. Would you expect these to be correlated? (The correlation was .49.)
(3) The authors say that the five-year survival rate is a reasonable measure to use in randomized trials. Why is this?
<<<========<<
>>>>>==============>
A little more than a decade ago, Allan Bloom's "The Closing of the American Mind" became an international best-seller. It was an intellectually demanding book, and the article questions whether all those buyers were actually readers as well. It introduces the idea of "the emperor's new book" or "the unread best-seller," citing Stephen Hawking's "A Brief History of Time" as another historical candidate. Currently, Saul Bellow's "Ravelstein," a novel about Bloom, also seems to fit the bill.
While it is not easy to cite actual data, many publishers and booksellers seem to believe the unread best-seller theory. Michael Willis was the marketing director at the Free Press when that company published "The Bell Curve" by Herrnstein and Murray. He says "We thought it was very much the case that both professionals and the general public bought it to have it and didn't read it. We got the sense even from reviews that people basically read the first chapter and the last." One New York retailer--who said that booksellers often joke about the phenomenon--cited Harold Bloom's "Shakespeare: The Invention of the Human" as another example, noting that "Everybody would like to think they're going to read that much about Shakespeare, but then they don't."
A more scientific approach was tried in 1985 by Michael Kinsley, then working at The New Republic. He produced 70 coupons, each redeemable for $5 cash, and inserted them in the back of books in Washington, DC bookstores. Among those volumes chosen for the test were "Deadly Gambits: The Reagan Administration and the Stalemate in Nuclear Arms Control" by Strobe Talbott and "The Good News Is the Bad News Is Wrong" by Ben J. Wattenberg. These were judged to be books that Washington insiders would like to say they had read. As it turned out, none of the coupons was ever redeemed.
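Zero redemptions out of 70 still puts a limit on how common redemption could have been. This sketch treats each coupon as an independent trial with the same redemption probability, which is our simplifying assumption; that probability combines the book being bought, being read to the back, and the coupon being cashed, so it is not the fraction of books read.

```python
def zero_event_upper_bound(n, confidence=0.95):
    """Exact upper confidence bound for a probability when 0 successes occur in n trials.

    Solves (1 - p) ** n = 1 - confidence for p.
    """
    return 1 - (1 - confidence) ** (1 / n)

n_coupons = 70
print(f"Exact 95% upper bound: {zero_event_upper_bound(n_coupons):.3f}")
print(f"Rule-of-three approximation (3/n): {3 / n_coupons:.3f}")
```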
DISCUSSION QUESTIONS:
(1) Do you see any problems with Kinsley's approach? Do you believe that none of the books in his sample was read?
(2) Can you propose a better way to estimate the fraction of books that actually get read?
<<<========<<
>>>>>==============>
USA TODAY has been presenting the impact hitter and impact pitcher of the week all season long.
How the Impact Player Index is computed
Total bases, earned run averages, innings pitched and hits given up are all part of tracking players. Now, USA TODAY has crafted an index that uses such statistics to develop one number that measures performance. First, separate indexes are created for pitchers and hitters. The final number, a standardized score for both hitters and pitchers, is comparable because the scores for each category are based on how far away they are from the average. On the index, a zero is average. Factors were decided through a combination of statistical analysis and surveys of USA TODAY's baseball writers. Factors for hitters include total bases, runs, hits, RBI and stolen bases, which are then weighted. RBI are weighted the most. Hitters must have a minimum of 12 plate appearances in a week to be included in the index. Factors for pitchers include hits, walks, earned runs and wins. Earned runs are weighted the most. Relief pitchers must have a minimum of 2 1/3 innings pitched in a week to be included in the index and starting pitchers must have a minimum of 5 2/3 innings pitched in a week. Elias Sports Bureau provides the raw baseball statistics for the index, which includes games from July 31-Aug. 6.
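A standardized index of the kind described can be sketched as a weighted sum of z-scores. The weights, the statistic names, and the three players below are hypothetical: USA TODAY does not publish its weights, and the description does not say whether it standardizes totals or per-at-bat rates. For pitchers, statistics such as hits, walks and earned runs would need negative weights.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize values: distance from the mean in standard deviations."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:                      # everyone tied: no one is above or below average
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]

def impact_index(players, weights):
    """Weighted sum of per-statistic z-scores; 0 corresponds to an average player."""
    names = list(players)
    index = dict.fromkeys(names, 0.0)
    for stat, w in weights.items():
        for name, z in zip(names, z_scores([players[n][stat] for n in names])):
            index[name] += w * z
    return index

# Hypothetical weights (RBI heaviest, as the description says) and made-up weekly lines.
weights = {"TB": 1.0, "R": 0.8, "H": 0.8, "RBI": 1.5, "SB": 0.5}
players = {
    "A": {"TB": 20, "R": 6, "H": 10, "RBI": 9, "SB": 1},
    "B": {"TB": 12, "R": 3, "H": 8, "RBI": 4, "SB": 0},
    "C": {"TB": 8, "R": 2, "H": 6, "RBI": 2, "SB": 3},
}
for name, score in sorted(impact_index(players, weights).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:+.2f}")
```

Because every z-score column sums to zero, the index values across the players in a week also sum to zero, which matches "a zero is average."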
For games from July 31 through Aug. 6, for example, Will Clark of the St. Louis Cardinals was the impact hitter of the week and Garrett Stephenson, also of the St. Louis Cardinals, was the impact pitcher of the week.
Students can try to uncover the two equations used to determine impact by gathering impact values along with other player statistics and fitting them by least squares (regression). Data on hitters and pitchers, including the impact values, may be found at USA TODAY's Web page.
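Because z-scores are linear in the raw statistics, a weekly index built as a weighted sum of z-scores is a linear function of the underlying statistics (totals or rates, whichever were standardized) plus a constant, so least squares should recover it almost exactly when the right variables are included. A poor fit would suggest a missing variable. The solver below is a minimal pure-Python sketch (normal equations with Gauss-Jordan elimination), tested on made-up data; in practice a library routine such as numpy.linalg.lstsq is a better choice.

```python
def least_squares(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    X is a list of rows; put a leading 1 in each row to fit an intercept.
    Gauss-Jordan elimination with partial pivoting is fine for a handful of
    predictors, though a QR-based solver is numerically safer in general.
    """
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * t for row, t in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Toy check with made-up data: y = 1 + 2*x1 + 3*x2 exactly.
X = [[1, 1, 2], [1, 2, 1], [1, 3, 5], [1, 4, 3], [1, 5, 4]]
y = [1 + 2 * x1 + 3 * x2 for _, x1, x2 in X]
print([round(b, 6) for b in least_squares(X, y)])
```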
Some preliminary analysis suggests, however, that not all the variables needed to compute a player's impact value are at this site. Further player data may be found here.
Because impact players are determined weekly for games played Monday through Sunday inclusive, it would be important to access this second Web site late Sunday or early Monday morning, after Sunday's games have been recorded (it is possible to access player records for the previous 7 days at this site).
Presumably, impact values are scaled according to the amount of playing time. So, for example, hitter impact seems to depend on Total Bases/At Bats (the so-called "slugging percentage") rather than just Total Bases.
I'm guessing you have already received plenty of correspondence regarding the recent story on the novel "Uncle Petros and Goldbach's Conjecture" and the one-million-dollar bounty put on the conjecture by the book's publisher ("In the Life of Pure Reason, Prizes Have Their Place" by Bruce Schechter, New York Times, April 25, 2000, D5). (Note: Goldbach's conjecture is that every even number greater than 2 is the sum of two prime numbers.) I brought up the story with one of my classes, and it generated some great discussion. Besides the genuine interest it created in the conjecture itself (a million bucks will get most people's attention), the bounty offer makes a great subplot for a probability class. Some of the questions that we touched on:
(1) What is a reasonable way to estimate the probability that a famous unsolved math problem will be solved within a given time window (in this case 2 years)? (Guesses were all under 0.5, but still a lot higher than mine.) How does one tackle a question like this?
(2) What kind of probability distribution could be useful to estimate the "arrival time" for the solution of a particular conjecture? (Here I brought up the history of recent solutions to other famous problems: the Four Color conjecture, Fermat's Last Theorem and Kepler's conjecture, which was a great opportunity to briefly introduce the students to the problems. This also led to an interesting sidebar: what is a good model to describe the growth of mathematical knowledge over time?)
(3) According to the story, the publisher paid Lloyd's of London a five-figure sum to insure against having to pay the prize. Assuming an estimated probability p for the solution being found within the next two years, how should the cost of the premium to insure for that liability be computed? (It was the unanimous sense of the class that the publisher paid too much for the premium.)
All in all, I spent two one-hour class meetings talking about this story. I think it was well worth it. The students keep coming back to the discussion and continue to talk about the issues among themselves. (Two told me that they spent part of a weekend working on the conjecture itself.)
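One way to make questions (2) and (3) concrete is sketched below in Python, under a hypothetical modeling assumption: the arrival time of a solution is exponential (solutions arrive at a constant rate per year), so the chance of a solution within t years is 1 - exp(-rate * t). The break-even premium is then the expected payout, p times the prize; a real insurer would add a loading for expenses and profit. The rate used here and the function names are made up for illustration.

```python
import math


def prob_solved_within(years, rate_per_year):
    """P(solution arrives within `years`) if the arrival time is exponential,
    i.e. solutions arrive at a constant (hypothetical) rate per year."""
    return 1.0 - math.exp(-rate_per_year * years)


def break_even_premium(prob, prize):
    """Expected payout: the premium at which the insurer breaks even on average
    (ignores expenses, profit and any risk loading)."""
    return prob * prize


if __name__ == "__main__":
    prize = 1_000_000   # the publisher's bounty, in dollars
    rate = 1 / 200      # HYPOTHETICAL rate: one solution expected per 200 years
    p = prob_solved_within(2, rate)
    print(f"P(solved within 2 years) = {p:.4f}")
    print(f"break-even premium = ${break_even_premium(p, prize):,.0f}")
```

Changing the assumed rate shows how sensitive the premium is to the estimate of p, which is exactly what the class was arguing about.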
All this intrigued us, so we started by reading the book:
Uncle Petros and Goldbach's Conjecture, by Apostolos Doxiadis
This book is light reading and provides an interesting account of a mathematician obsessed with proving one of the great unsolved problems of mathematics. We enjoyed it. You can find more detailed reviews on Doxiadis's home page.
Goldbach made his conjecture that every even number is the sum of two prime numbers in a letter to the famous Swiss mathematician Leonhard Euler dated 7 June 1742. Euler replied:
That every even number is a sum of two primes, I consider an entirely certain theorem in spite of that I am not able to demonstrate it.
Alas, this problem has become one of the major unsolved problems in mathematics. It has been proved that every even number is the sum of at most six primes, and in 1966 Chen proved that every sufficiently large even integer is the sum of a prime and a number with at most two prime factors.
Goldbach's conjecture has been verified to be true for all even integers up to 4*10^14.
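Verifications on that scale use heavily optimized programs, but the idea is easy to try on small numbers in class. The Python sketch below (function names are our own, hypothetical) uses simple trial division and looks, for each even n, for a prime p such that n - p is also prime.

```python
def is_prime(n):
    """Trial division: adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def goldbach_pair(n):
    """Return primes (p, q) with p <= q and p + q == n, or None if there are none."""
    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return (p, n - p)
    return None


def check_goldbach(limit):
    """List the even numbers 4 <= n <= limit that have no Goldbach pair."""
    return [n for n in range(4, limit + 1, 2) if goldbach_pair(n) is None]


if __name__ == "__main__":
    print(goldbach_pair(100))        # (3, 97)
    print(check_goldbach(10_000))    # [] -- no counterexamples in this range
```

A single even number for which `goldbach_pair` returned None would be a counterexample, and checking a candidate is a finite computation.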
You can find links to the New York Times article that started Tannenbaum's class discussions and another interesting article in The Times.
Another article on the prize is:
Lucky Numbers
Holt makes the following interesting comment:
From Gödel's incompleteness theorem we know that in any formal system of arithmetic there are infinitely many propositions that are neither provable nor disprovable. Could Goldbach's conjecture be one of them? (That is what Uncle Petros begins to suspect.) If Goldbach's conjecture could be shown to be undecidable--neither provable nor disprovable--then this would be tantamount to proving it true! For if it is false, there must be some counterexample to it. But such a counterexample would constitute a disproof of the conjecture--thereby contradicting its undecidability.
Holt remarks that the publishers were nonplussed when he raised this possibility.
DISCUSSION QUESTION:
Suppose someone finds a counterexample to Goldbach's conjecture. The rules for the million-dollar prize speak about proving Goldbach's conjecture. Do you think they will pay up for a counterexample? How about for a proof along the lines of Holt's argument?
Andrew told us that the philosopher David Hawkins, in a paper titled "Random Sieves" (Mathematics Magazine, 31 (1957-58), pp. 1-3), proposed constructing random primes in much the same way that real primes are constructed by the sieve of Eratosthenes.
Here is how the sieve of Eratosthenes works for real primes. Start with the set of all integers greater than 1:
2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,...
Cross out all multiples of 2 other than 2 itself, leaving
2,3,5,7,9,11,13,15,17,19,21,23,...
Cross out all multiples of 3 other than 3 itself, leaving
2,3,5,7,11,13,17,19,23,...
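Continuing in this way, each time crossing out the larger multiples of the next number that has not been crossed out (5, then 7, and so on), we obtain more and more of the prime numbers.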
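The sieve just described can be sketched in a few lines of Python (the function name is ours). It uses the standard shortcut of starting to cross out the multiples of p at p*p, since smaller multiples were already crossed out by smaller primes, and of stopping once p*p exceeds the limit.

```python
def eratosthenes(limit):
    """Return the primes <= limit by the sieve of Eratosthenes."""
    if limit < 2:
        return []
    not_crossed = [True] * (limit + 1)
    not_crossed[0] = not_crossed[1] = False
    p = 2
    while p * p <= limit:
        if not_crossed[p]:
            # p survived every earlier round, so it is prime: cross out its
            # larger multiples (those below p*p are already crossed out).
            for multiple in range(p * p, limit + 1, p):
                not_crossed[multiple] = False
        p += 1
    return [k for k in range(limit + 1) if not_crossed[k]]


if __name__ == "__main__":
    print(eratosthenes(23))          # [2, 3, 5, 7, 11, 13, 17, 19, 23]
    print(len(eratosthenes(1000)))   # 168
```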
Proceeding in an analogous way Hawkins defines random primes as follows:
Start again with the integers greater than 1:
2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,...
Now keep 2 and throw out each larger number with probability 1/2. This might give us
2,4,5,7,10,11,13,14,17,18,19,23,...
Then take the second number, in this case 4, and throw out each larger number in the sequence with probability 1/4, leaving, for example,
2,4,10,14,17,18,23,...
Then take the third number, 10, and throw out each larger number in the sequence with probability 1/10. Continue in this way, each surviving number k in turn throwing out each larger survivor with probability 1/k, to obtain a sequence of random primes.
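The random sieve is just as easy to program. The Python sketch below is our reading of the procedure described above (the function name and the seed are arbitrary choices): at each stage the next surviving number k removes each larger survivor independently with probability 1/k. Different seeds give different sequences of random primes.

```python
import math
import random


def hawkins_sieve(limit, rng=None):
    """One simulated run of Hawkins's random sieve on 2, 3, ..., limit.

    Stage k: let p be the k-th survivor; delete each *larger* survivor
    independently with probability 1/p. The final survivors are the
    "random primes".
    """
    if rng is None:
        rng = random.Random()
    survivors = list(range(2, limit + 1))
    k = 0  # index of the current sieving number in `survivors`
    while k < len(survivors):
        p = survivors[k]
        # rng.random() is uniform on [0, 1), so it is < 1/p with probability 1/p;
        # a larger survivor is kept when that does NOT happen.
        larger = [m for m in survivors[k + 1:] if rng.random() >= 1 / p]
        survivors = survivors[: k + 1] + larger
        k += 1
    return survivors


if __name__ == "__main__":
    n = 1000
    rng = random.Random(1)                   # arbitrary fixed seed, for repeatability
    random_primes = hawkins_sieve(n - 1, rng)  # random primes below n
    print("random primes below", n, ":", len(random_primes))
    print("n/log(n) =", round(n / math.log(n), 1))  # 144.8
```

Running it with several seeds gives a feel for how much the count of random primes below 1000 varies from one simulation to the next.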
Hawkins and others proved that random primes share many of the properties of real primes. For example, in his Mathematics Magazine paper Hawkins gives a very elementary proof that the probability that the number n is a random prime is approximately 1/log(n) (log here is the natural logarithm), and thus we expect about n/log(n) random primes less than n. This agrees with the estimate n/log(n) for the number of real primes less than n given by the prime number theorem. For example, 1000/log(1000) is about 145, while the number of real primes less than 1000 is 168. Of course, it is easy to simulate random primes. Here are the random primes less than 1000 in one such simulation:
2 4 6 12 16 19 26 28 30 34 37 41 43 44 52 57 60 61 68 70 75 81 83 88 89 91 94 100 103 105 107 111 117 130 131 134 135 137 146 152 157 163 172 173 174 175 176 193 198 205 208 211 215 218 222 223 228 241 256 257 263 269 271 287 290 291 298 302 313 322 324 352 358 377 380 382 384 389 398 412 420 429 431 433 435 444 451 474 485 492 495 497 500 529 546 562 564 568 571 583 599 602 603 610 625 626 629 640 644 661 663 669 681 693 699 706 709 719 733 738 742 752 767 782 792 802 810 814 839 840 861 869 894 905 907 910 911 912 919 923 924 927 938 940 962 966 970 981 982
Thus in this simulation there are 150 random primes less than 1000. This is actually closer to n/log(n) than the count for real primes but, of course, both estimates are meant for large n.
Another celebrated unsolved problem for primes is the "twin prime conjecture": there are infinitely many pairs of primes that differ by 2, such as 11 and 13. In his lecture, Odlyzko remarked that a version of the twin prime conjecture had been proved for random primes. In the question period he was asked if Goldbach's conjecture was true for random primes. He replied that the following version of Goldbach's conjecture for random primes is true:
With probability one, every sufficiently large even number is the sum of two random primes.
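Since the statement concerns sufficiently large even numbers, a finite computation can only illustrate it, and exceptions among small even numbers are to be expected. The hypothetical helper below lists the even numbers up to a limit that are not the sum of two (not necessarily distinct) members of a given list; in the usage example it is applied to the random primes below 200 from the simulation reported above, so it lists the exceptions in that particular run.

```python
def unrepresentable_evens(numbers, limit):
    """Even m with 4 <= m <= limit that are NOT a + b for members a, b of `numbers`.

    The answer is exact as long as `numbers` contains every member <= limit.
    """
    members = set(numbers)
    return [
        m for m in range(4, limit + 1, 2)
        # try each member a <= m/2 and ask whether its partner m - a is a member
        if not any((m - a) in members for a in members if a <= m // 2)
    ]


if __name__ == "__main__":
    # Random primes below 200 from the simulation reported in the text.
    sample = [2, 4, 6, 12, 16, 19, 26, 28, 30, 34, 37, 41, 43, 44, 52, 57, 60,
              61, 68, 70, 75, 81, 83, 88, 89, 91, 94, 100, 103, 105, 107, 111,
              117, 130, 131, 134, 135, 137, 146, 152, 157, 163, 172, 173, 174,
              175, 176, 193, 198]
    print(unrepresentable_evens(sample, 200))
```

Replacing `sample` with the output of a longer simulation shows whether such exceptions thin out as the even numbers grow.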
After saying this, Andrew said something like: "I guess I do not know a reference for this, but it must be easy to prove."
Therefore, we offer a prize of $5 to the first person to prove Goldbach's conjecture for random primes. We will double the prize if this is achieved before the next Chance News.
This work is freely redistributable under the terms of the GNU General Public License as published by the Free Software Foundation. This work comes with ABSOLUTELY NO WARRANTY.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!