Prepared by J. Laurie Snell, Bill Peterson and Charles Grinstead, with help from Fuxing Hou and Joan Snell.
Please send comments and suggestions for articles to
jlsnell@dartmouth.edu.
Back issues of Chance News and other materials for teaching a Chance course are available from the Chance web site:
Chance News is distributed under the GNU General Public License (so-called 'copyleft'). See the end of the newsletter for details.
Chance News is best read using Courier 12pt font
===========================================================
I hope I break even this week. I need the money.
Veteran Las Vegas Gambler
From "What are the Odds?" by Mike Orkin.
===========================================================
Contents of Chance News 9.05
Note: If you would like to have a CD-ROM of the Chance Lectures
that are available on the Chance web site, send a request to
jlsnell@dartmouth.edu with the address where it should be sent.
There is no charge. If you have requested this CD-ROM and it has
not come, please write us again.
<<<========<<
>>>>>==============>
Stan Seltzer called our attention to a Gallup poll, related to the
census, of 1,006 adults with a 3% margin of error, carried out
April 7-9th. The poll found that 75% reported they had already
returned their census form and another 13% said they plan to
return it.
Stan writes:
Interestingly, this is a case where we do know p: currently (April 11) it is 61%. My question at the moment is: Why should we believe the polls?
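For readers who want to check the arithmetic behind Stan's question, here is a minimal sketch (in Python, using the sample size and percentages reported above) of the usual margin-of-error calculation. It shows that the gap between the 75% who said they had already returned the form and the 61% return rate is far too large to be explained by sampling error alone.

    # Margin of error for a sample proportion, using the usual 95%
    # normal approximation: 1.96 * sqrt(p*(1-p)/n).
    from math import sqrt

    n = 1006        # Gallup's sample size
    p = 0.75        # proportion who said they had already returned the form
    true_p = 0.61   # return rate reported by the Census Bureau (April 11)

    moe = 1.96 * sqrt(p * (1 - p) / n)
    print("margin of error: %.1f%%" % (100 * moe))        # about 2.7%, i.e. the quoted 3%
    print("gap between poll and census: %.0f points" % (100 * (p - true_p)))   # 14 points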
DISCUSSION QUESTIONS:
(1) How would you answer Stan's question? You can find the poll and Gallup's explanation for what might have gone wrong.
(2) An earlier poll, carried out February 25-27, with a national sample of 1,004 adults and 3% margin of error found that 96% of those asked said they planned to fill out and return the census form. Would you have believed this poll?
(3) In the April poll, 80% of the Republicans said they returned the form compared to 71% of the Democrats. Can this be taken as evidence for the need to address the undercount problem?
<<<========<<
>>>>>==============>
On average, April in the Rockies has gotten progressively warmer since the researchers began measuring in 1976. They say the month's average is 1.4 degrees centigrade higher now. This temperature change isn't enough to be statistically significant, but it seems to have relevance for the marmots, says botanist Ken Thompson of the University of Sheffield in England. "Marmots know nothing about statistics. They only know what's happening around them," he says.
DISCUSSION QUESTIONS:
(1) Explain how a result that's not significant could have relevance for the marmots.
(2) How could you do a test to see whether the temperature increase is significant? (How do you measure the size of the relevant random variability?)
(3) When it comes to statistics, do you ever feel like a marmot?
<<<========<<
>>>>>==============>
Statistics show that 84 percent of lightning victims are male, 16 percent female. Children and young men ages 10-35 are hit more often than any other age groups.
DISCUSSION QUESTIONS:
(1) Why do you think women are so much better off when it comes to being struck by lightning?
(2) How can you interpret the statement: Children and young men ages 10-35 are hit more often than any other age groups? What other age groups are they comparing this group to?
Here is a discussion piece for Chance News, a good example of the need to be careful when comparing samples that come from different populations.

Every year, along about the end of August, ETS puts out a listing of average SAT scores for each state. Accordingly, I ask my students, "Which state almost always has the highest average?" Rarely, if ever, does anyone come up with the invariable winner: Iowa [very occasionally the usual second-place finisher, North Dakota, edges Iowa out]. I try to solicit reasons, and the typical remark from my Minnesota [which tends to finish third or fourth] students is that "There is nothing else to do in Iowa." Other reasons put forward have to do with flatness of the terrain, homey honesty of Iowans and other such comments.

Then I point out that when it comes to the average on the ACT, Iowa is way down the list, suggesting that maybe the exams are testing different characteristics. At that point, I ask how many took the ACT and all hands go up; for the SAT, far fewer. I try to tease out of them why they all took the ACT and why some took both the SAT and the ACT. They get the picture right away that in the Midwest, where the ACT rules, only the ones eager to go to the coasts [where the so-called "selective" institutions are] and who feel they have a chance of acceptance will take the SAT. On the other hand, elsewhere in the country, almost all would-be students take the SAT, including the not very well prepared who drag down the average.

But why Iowa as opposed to Wisconsin, Minnesota, etc.? Again, content expertise is important. The ACT is created in Iowa and thus even fewer Iowans are likely to go beyond the home-town team, consequently raising the average score. This sort of hidden volunteer sampling probably arises far more often than we imagine.
We asked Howard Wainer at ETS for more information about this problem and he suggested looking at his article: "The SAT as a social indicator: A pretty bad idea" in H. Wainer (ed.) "Drawing inferences from self-selected samples". This book was published by Springer-Verlag in 1986. It has been out of print for a while, but has just been republished by Lawrence Erlbaum Associates.
Wainer's article has numerous examples involving SAT and ACT scores showing the difficulty of drawing inferences from self-selected data, even when adjustments are made for differences in other variables such as the percentage of students taking the exam, demographic variables (race, sex, and income), etc. This continues to be an important issue as the federal government wants to base support for states and schools on the performance of their students.
In discussing Paul's question, Wainer divides the states into two groups that he calls "SAT states" and "ACT states." In 1981-82, in 21 states and the District of Columbia, a majority of the students who took any college entrance exam took the SAT. Wainer calls these states the "SAT states." In 28 of the remaining states the majority of these students took the ACT exam. In the remaining state, Washington, a majority did not take either exam, but many took the ACT. He refers to these 29 states as the "ACT states." Wainer gives the following graph comparing the SAT scores of the students in the SAT states and the ACT states.
(Courier font needed to view these tables.)

                                     SAT V + M
      SAT states                      scores     ACT states
                                      |1100|
                                      |1080|     IA, SD
                                      |1060|     ND
                                      |1040|     KS, ME, MT
                                      |1020|     MN, UT, WI, WY
                                      |1000|     AR, ID, NM, OK, TN
                                      | 980|     AZ, CO, IL, KY, LA, MI, MS, MO, WA
                                      | 960|     AL, OH, WV
                                      | 940|
      NH                              | 920|     AK, NV
      VT, OR, NY, DE, CT, CA          | 900|
      VA, FL, ME, MD, MA, PA, RI      | 880|
      HI, IN, NJ, TX                  | 860|
                                      | 840|
      GA, NC, DC                      | 820|
                                      | 800|
      SC                              | 780|
                                      | 760|
This shows dramatically that in the ACT states, students who also take the SAT tests do better, on average, on these tests than the students in the SAT states. Wainer remarks that from this plot we might conclude that:
(1) students in the ACT states are better at taking the SAT than those in the SAT states, or
(2) those in the ACT states who choose to take the SAT are a more select group than in the SAT states.
To show support for the second alternative he plots the following graph which compares the mean rank in class for students taking the SAT exam in the SAT states and the ACT states.
                                        Rank
      SAT states                    (percentile)   ACT states
                                        |91|       SD
                                        |90|       IA, ND
                                        |89|       AR
                                        |88|       MS, NE, NV, WY
                                        |87|       ID, KS, KY, MT, OK, UT, WV
                                        |86|
                                        |85|
                                        |84|       AL, AZ, NM, WI
                                        |83|       LA, MN, TN, WA
                                        |82|       CO, MI
                                        |81|       MO
                                        |80|       AK, OH
      OR                                |79|       IL
      CA                                |78|
                                        |77|
      VT, TX                            |76|
      NC, ME, FL                        |75|
      SC, NH, IN, GA                    |74|
      PA                                |73|
      VA, MN                            |72|
      RI, DC, DE                        |71|
      NY, NJ, MA, MI, CT                |70|
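The self-selection effect behind Wainer's second alternative is easy to reproduce by simulation. The sketch below (in Python; the score distribution and the 10% participation rate are invented for illustration and are not Wainer's figures) compares the mean score when every student takes an exam with the mean score when only the strongest tenth choose to take it.

    # Self-selection: if only the strongest students in a state take an exam,
    # that state's average looks high even if the underlying populations of
    # students are identical.
    import random

    random.seed(1)
    students = [random.gauss(500, 100) for _ in range(10000)]   # one state's "true" scores

    everyone = sum(students) / len(students)
    top_tenth = sorted(students, reverse=True)[:1000]           # only the top 10% sit the exam
    self_selected = sum(top_tenth) / len(top_tenth)

    print("mean if all students take the exam: %.0f" % everyone)        # about 500
    print("mean if only the top 10%% take it:  %.0f" % self_selected)   # roughly 675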
DISCUSSION QUESTIONS:
(1) In a third such graph Wainer plots a graph comparing mean parental income per state for students who take the SAT exam. What do you think he found from this graph?
(2) Wainer concludes from his study:
The evidence so far available indicates that when decisions of importance need to be made from variables requiring statistical adjustment, one must, at some time, for some subset of observation, do it right--i.e., get a random sample.

Could this be done in comparing performance of students in different states?
This very technical article addresses the question of what factors affect a team's chances of winning a championship series in the three professional leagues of baseball, basketball, and hockey.
Before attempting to summarize their work, we note that this problem is related to the one that is considered to have started the modern theory of probability, called the "problem of points." This problem, which was first successfully solved, independently, by Fermat and Pascal, asks for the probability that a particular team wins a series, given the current score in the series. A more complete description of this problem, together with the solutions given by Fermat and Pascal, can be found in the book "Introduction to Probability," which resides at the Chance website under "teaching aids."
In all three sports, the final series that determines a given year's champion consists of a best-of-seven series. There are many factors that conceivably affect the probability that a given team wins a given game in such a series. The factors that are considered in this article are as follows: 1) The relative abilities of the two teams, 2) home-field (or home-court) advantage, 3) difference in winning percentage during the regular season, 4) whether or not either team was in the championship series the previous year, and 5) the current score (in games) of the series. The authors also try to quantify the possibility that if a team gets behind by two games, say, they may "give up".
The model used in this article contains a complicated function, whose output value represents the probability that the first team wins a particular game, and whose input values are the above items, each with a coefficient. The coefficients are estimated from the data, which consists of the records from 1922 to 1993 in baseball, 1955 to 1994 in basketball, and 1939 to 1994 in hockey. These dates are not arbitrary; they correspond to major rule changes in each of the sports that brought the sports into the "modern era." For example in basketball, the 24-second clock was introduced in the 1954-55 season. In addition to estimating the coefficients using the data, bootstrap methods were employed, because the data sets were relatively small in size.
Using these coefficient estimates, one can compare the relative importance of various factors. For example, if one team played in the preceding year's championship series (and the other team did not), then this factor roughly cancels out the second team's advantage if they are playing a given game at home. As another example, if neither team played in last year's championship series, then the home advantage is roughly equal to having a better regular-season record of 16.5 percentage points in baseball and 35.3 percentage points in basketball.
One can also use the model to compute the probability, before the series starts, that a given team will win the entire series. This is done by computing, for each branch in a tree of possible outcomes, the probability of that branch, using the game probabilities given by the model.
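The tree computation can be written as a short recursion. The sketch below (in Python) is not the authors' fitted model; it simply assumes we are handed the first team's chance of winning each individual game (here an invented 55% at home and 50% on the road, with a 2-3-2 home schedule) and works out the chance of winning the best-of-seven series.

    # Probability that team A wins a best-of-seven series, given a function
    # returning A's chance of winning each individual game.  The recursion
    # follows the tree of possible outcomes game by game.
    def p_series(game_prob, wins_a=0, wins_b=0, game=1):
        if wins_a == 4:
            return 1.0
        if wins_b == 4:
            return 0.0
        p = game_prob(game)   # A's chance of winning this particular game
        return (p * p_series(game_prob, wins_a + 1, wins_b, game + 1)
                + (1 - p) * p_series(game_prob, wins_a, wins_b + 1, game + 1))

    home_games = {1, 2, 6, 7}   # a 2-3-2 schedule: games 1, 2, 6, 7 at home
    print(p_series(lambda g: 0.55 if g in home_games else 0.50))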
The authors conclude by stating that the data are explained very well by the model, using only the record differential, experience, and home advantage factors. In other words, they see little, if any, evidence that the current series score changes the probability that a given team will win the next game in the series. Another way to interpret this conclusion is to say that the individual games in a given series can be assumed to be independent events (although they are not identically distributed, because the probabilities are affected during the series by the home advantage).
One interesting exception seems to have occurred in baseball; there have been significantly more series that have lasted seven games than have lasted six. If the games were truly independent, then, whether or not the teams are equal in ability, one would expect to see at least as many six-game series as seven-game series.
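A quick computation (in Python, under the simplified assumptions of the discussion question below: a constant per-game probability p and independent games) makes the comparison concrete; the chance of a six-game series is never smaller than the chance of a seven-game series.

    # A series reaches game six exactly when the score is 3-2 after five
    # games; it ends in six if the leader wins game six and goes to seven
    # otherwise.
    for p in (0.5, 0.6, 0.7):
        q = 1 - p
        lead_a = 10 * p**3 * q**2    # team A leads 3-2 after five games
        lead_b = 10 * p**2 * q**3    # team B leads 3-2 after five games
        six = lead_a * p + lead_b * q
        seven = lead_a * q + lead_b * p
        print(p, round(six, 4), round(seven, 4))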
Many of the above ideas were also discussed by Hal Stern in his column "A Statistician Reads the Sports Pages", in Chance Magazine, Volume 11, No. 2.
DISCUSSION QUESTION:
Prove that in a best-of-seven series, in which the probability that a given team wins a game does not change from game to game, and in which the games are considered to be independent events, the probability of a six-game series always equals or exceeds the probability of a seven-game series.
<<<========<<
>>>>>==============>
This is John Paulos' monthly column. It starts off on the first of the month at "The Ranking Game" and then, after a week or so, moves to the "Who's Counting?" columns.
In his May column Paulos discusses the variation in the rankings we find in the news. He begins with People magazine's annual 50 most beautiful people, listed this year in the May 6 issue. He wonders why so many people who were on last year's list did not make it this year. In fact there are only three on this year's list who were also on last year's list: actors Ben Affleck and Freddie Prinze Jr. and singer Ricky Martin. All but 11 of this year's most beautiful people were actors (18), actresses (15), or singers (6). As Paulos observes, no scientist or mathematician made this year's list.
Paulos also discusses the college rankings. He suggests that those who produce these rankings have to introduce variability to keep interest in the ratings. For example, he says that for the U.S. News and World Report ranking this year, the per-student spending on instruction and education-related services was weighted more than last year. As a result, in the ranking of "best national universities" Caltech moved from 9th place in 1999 to 1st place in 2000. Curiously, the only other big change (i.e., more than 4 positions) was Johns Hopkins, which dropped from 7th place to 14th place.
Finally, Paulos mentions that, while the variability in the Dow and Nasdaq has been large lately, this is exaggerated by the fact that when the Dow is at 11,000, a 2% drop corresponds to a 220-point drop, which the news reports as a bad day for the market. Also, news commentators have their "most beautiful" internet stocks which, like the most beautiful people, may be mostly makeup. (However, no doubt the hype helps create instant millionaires.)
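As a rough way to think about the overlap question, suppose each year's 50 were drawn at random from the same pool of eligible celebrities. The expected overlap between two such lists is then 50 x 50 divided by the pool size; the pool sizes in this sketch (Python) are invented for illustration.

    # Expected number of people appearing on both of two independent random
    # "top 50" lists drawn from a pool of N eligible celebrities.
    for pool_size in (500, 1000, 5000):
        expected_overlap = 50 * 50 / pool_size
        print("pool of %4d -> expected overlap %.1f" % (pool_size, expected_overlap))

Under this (admittedly crude) model, an overlap of three corresponds to an effective pool of roughly 800 candidates.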
DISCUSSION QUESTIONS:
(1) If a different group of people choose the 50 most beautiful people each year, how much overlap would you expect?
(2) The change the U.S. News and World Report made in taking into account student spending on instruction and education-related services was more complicated than stated above. Read how U.S. News really changed this factor at their web site to get a more accurate statement of the change. Does this change seem reasonable to you?
(3) The weightings given to the various factors that are used to determine the final ranking are given at Undergraduate ranking criteria and weights
Do these weightings seem reasonable to you? Which would you change? How could you see how sensitive the final ranking is to small changes in the weights of the various factors used?
<<<========<<
>>>>>==============>
A book with this title is usually a collection of odds, such as the odds of being killed in an airplane accident or struck by lightning. The subtitle "Chance in everyday life" better describes this book.
Orkin uses the familiar examples of lotteries, roulette and other gambling games, and even war to give a remarkably clear picture of what "the laws and the many faces of chance are". Orkin writes in a clear and easy-to-understand style. He is particularly adept at using folksy analogies to illustrate important ideas about chance experiments. For example, in discussing the strategy of making many different kinds of bets at roulette, Orkin comments:
You can't turn a group of bad bets into a good one. If you try, you'll end up like the retailer who sells everything at a loss, hoping to make a profit from the increased volume.
Of course we have learned from Parrondo's paradox that you can sometimes turn bad bets into good bets (See Chance News 9.01).
Orkin discusses the lottery early in his book. He describes "the lottery principle" to be the observation that even though any one player has a negligible chance of winning, it is very likely that someone will win when a large number of tickets are sold. He uses this principle throughout the book to explain many other apparently unlikely events, such as a nuclear power plant accident, a NATO "smart bomb" killing civilians in the war in Yugoslavia, etc.
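A back-of-the-envelope calculation shows how the lottery principle works; the odds and ticket counts below are invented for illustration and are not Orkin's figures.

    # Each ticket has a tiny chance of winning, but with many tickets sold
    # the chance that SOMEONE wins, 1 - (1-p)^n, is close to 1.
    p = 1.0 / 10000000   # chance that a single ticket wins the jackpot
    for n in (1000000, 10000000, 50000000):
        print("%8d tickets: P(someone wins) = %.3f" % (n, 1 - (1 - p) ** n))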
In addition to the classical gambling games, Orkin discusses less well-known topics including chaos applied to weather prediction and animal population growth, the Kelly system for money management applied to gambling and the stock market, and I Ching applied to help Orkin himself make the decision to take a break from being a professor of statistics at California State University, Hayward, to manage the university television station.
Orkin illustrates the use of chance in mathematical models in terms of zero-sum game theory and the prisoner's dilemma. He then asks if game theory can be used to model the war in Yugoslavia. He shows that many aspects of game theory have their parallels in this war. However, as often happens with simple mathematical models, some key assumptions are not satisfied. For example, a game has a set of rules that must be followed. For a war these rules come from international laws. But both players in this war clearly violated these laws: Milosevic by his policies of ethnic cleansing and NATO by its bombing of the Chinese embassy and large groups of civilians. This book, which began with a charming and light-hearted discussion of chance in everyday life, ends with the sobering remark:
Regardless of whether a particular cause is just, if a ruthless leader, a country or a powerful alliance of countries disregard international laws, the world can become a vicious arena of uncontrollable violence, a place where the cooperative paradigm suggested by the prisoner's dilemma has little chance of taking hold.
We recommend reading this article. Briefly, STATS is a resource for science writers and also publishes a free monthly electronic newsletter "Vital Stats: the numbers behind the news" which critiques newspaper articles that abuse statistics. This newsletter is archived on their site. STATS also provides a printed version of the newsletter available free to science writers, and to others for $25 a year.
In the April issue of Vital Stats we find an interesting discussion of the coverage of a recent poll carried out by Dan Yankelovich's firm, DYG Inc. This poll was prepared for the "People for the American Way" (PFAW) and was titled "Evolution and creationism in public education." STATS says:
But the results must have been written on a banana peel, given the way journalists slipped on the findings.
Poll: Creationism has support: Majority of Americans want biblical accounts taught in Darwin's theory, survey shows.
     Austin American Statesman (March 11)

Survey finds support is strong for teaching two origin theories.
     New York Times (March 11)

Teach evolution as science, most say in national poll.
     USA Today (March 13)

Poll finds preference for teaching evolution rather than creation.
     Kansas City Star (March 11)
So what did the survey actually find? The data show that 83% of Americans support the teaching of evolution, but 79 percent also accept the place of creationism in the curriculum. While nearly half regarded evolution as a theory "far from being proven scientifically," fully 68 percent regarded an evolutionary explanation of human presence to be compatible with a belief in the role of God "creating" and "guiding" human development. Only 20 percent thought that schools should teach only evolution with no mention of creationism.
We noticed that STATS's brief statement, "The data show that 83% of Americans support the teaching of evolution, but 79% also accept the place of creationism in the curriculum," appears to miss an important point.
In the 54 page report on the poll prepared by DYG, Inc. and available at www.pfaw.org/issues/education/creationism-poll under "Main Findings" (page 5) we read:
The overwhelming majority of Americans (83%) want Evolution taught in public schools. While many Americans also support the in-school discussion of religious explanations of human origins, the majority do not want these religious explanations presented as "science". They would like these Creationist ideas to be taught about in separate classes other than science (such as Philosophy) or taught as a "belief." Only a minority of the public (fewer than 3 in 10) wants creationism taught as science in public schools.
This suggests to us that the USA Today headline is the most accurate but even better would be to combine this headline with that of the New York Times to something like:
Survey finds support is strong for teaching two origin theories: evolution theory in the study of science and creationist theory in the study of philosophy or religion.
DISCUSSION QUESTIONS:
(1) Why do you think there was such strong support both for the teaching of Evolution and Creationism? Does this seem contradictory to you?
(2) In the Baltimore Sun article, Shane interviews David Murray, research director of STATS, and Robert Lichter, president of STATS. The article reports that STATS has a staff of three and an annual budget of about $450,000. Shane asks if the fact that most of their support comes from conservative organizations, and that STATS seems to cover a disproportionate number of issues of particular interest to conservative organizations, such as environmental issues, indicates a bias on the part of STATS. Murray replies:
If STATS goes after environmentalists' overblown claims more often than those of corporations, that's because fewer questionable corporate claims make it into print.
According to the article, Lichter acknowledges the rightward tilt of key backers but says:
The conservative foundations fund you first because they hate the media the most.
They both argue that STATS is not biased in its presentations, giving examples where it has criticized conservative articles and obtained funding from liberal sources.
How would you test the hypothesis that STATS's critiques of newspaper articles are biased?
(3) The People For the American Way would appear to be a liberal organization. Do you think that when a polling company is paid by a liberal or a conservative organization to carry out a poll, the final report it produces will be biased?
(4) A survey (n = 2) of newsletters that critique newspaper articles reported that such newsletters have a staff of three people and an average yearly budget of $225,000. What is your estimate for the budget of Chance News?
<<<========<<
>>>>>==============>
"Gonorrhea Rates Decline with Higher Beer Tax." That is the startling title of a new report from the Centers for Disease Control, which investigated the effect of national alcohol policy over the period 1981-1995. In fact, in 24 of 36 states, higher beer prices were accompanied by lower gonorrhea rates. But the title reads like a textbook example of the perils of confusing association with causation. David Murray, research director of STATS, did not mince words in criticizing the report. He compared the conclusion to saying that "the sun goes down because we turn on the street lights." But the fact that the finding drew wide press coverage brings to mind the "Divorce Revolution" case discussed elsewhere in this Chance News. Indeed, as the present article ruefully notes, the themes of "sex, youth and alcohol" apparently proved irresistible to the press.
But just what did the CDC have in mind? An editorial accompanying the report warned against jumping to causal conclusions, citing two limitations of the study. First, reporting practices for gonorrhea differ across states, making direct comparisons difficult. Second, it noted that "the analysis may be subject to confounding effects of unobservable factors. Omitting these variables could cause substantial bias." Nevertheless, Harold Chessen, a health economist at the CDC, states that alcohol use is known to be associated with risky sexual behavior. He is quoted as saying that the report is "consistent with the idea that higher taxes can reduce sexually transmitted disease rates."
Chessen was reportedly amused when asked why the Prohibition hadn't proved helpful in reducing risky behavior back in the 1920s. He responded simply that "our study didn't address Prohibition! I don't know."
DISCUSSION QUESTIONS:
(1) The article suggests some possible confounding variables that might help explain the findings. What examples can you think of?
(2) Do you think that the Prohibition is relevant to the present discussion?
<<<========<<
>>>>>==============>
This book takes its title from a phrase uttered by Alan Greenspan, chairman of the Federal Reserve Board, on December 5, 1996. He was characterizing the behavior of investors in the (U. S.) stock market. The title implies, and Shiller argues, that stock markets do not always behave in a rational manner, and that psychology plays an important role in the pricing of stocks by the market.
The author, a professor of economics at Yale University, has written several other books on markets and is well known as an expert on market volatility. This book is especially timely reading for those among us who wonder how long the current bull market will continue.
As Shiller himself says, for every statement made by an "expert" about the market, one can find another "expert" who will make exactly the opposite statement. Thus, this book represents a point of view about the market. As such, it will resonate more with those who basically share this point of view.
Shiller's main thesis, which is not original with him, is that the present U. S. stock market is significantly over-valued. It is in a "speculative bubble" phase, and sooner or later (Shiller thinks sooner), the air will be let out of the bubble, leading to significant declines in the values of stocks. In fact, in an interview that appears in the current issue of "In The Vanguard", a publication sent to all shareholders of Vanguard mutual funds, Shiller is quoted, in answering the question "How big a drop in the market do you think is plausible?" as saying:
A drop in the range of 50% to 60% is not implausible. If you ask me for a forecast of the Dow in 2020, I'd say as good a guess as any is 10,000.
For comparison purposes, as this review is being written, the Dow stands slightly higher than 10,500.
Shiller begins by reviewing some standard economic theory about the stock markets. One of the central tenets of market theory is that the price of a share of stock of a company, while set by investors, is correlated to the current (and probable future) earnings of the company. Of course, future earnings are very difficult to predict, but one can make some reasonable assumptions about the rate at which a company's earnings can increase (at least in the near future). Since future earnings are discounted, the earnings in the distant future are less important in determining the current value of a share of stock.
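The role of discounting is easy to see with a little arithmetic. The sketch below (in Python; the 8% discount rate and the constant earnings stream are invented for illustration, not taken from Shiller) shows how quickly the contribution of distant earnings to present value falls off.

    # Present value of earnings t years out is E_t / (1 + r)^t, so distant
    # years contribute little to the current value of a share.
    r = 0.08          # discount rate (illustrative)
    earnings = 1.0    # constant earnings per year (illustrative)
    for t in (1, 10, 30, 100):
        print("year %3d contributes %.3f" % (t, earnings / (1 + r) ** t))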
The average price-earnings ratio of the stock market (specifically, the S&P 500) is higher today than at any time in the last 120 years. In fact, until 1996, the price-earnings ratio was higher than 23 only three times during this period (1901, 1929, and 1966), and each time, the real returns of the stock market over the 20-year periods following these peaks were -0.2%, 0.4%, and 1.9%, respectively. The current price-earnings ratio is above 40.
If standard economic theory is to be believed, the above figures are scary. Yet there are many people who believe that the markets have undergone fundamental changes, and that we are entering a new era. James K. Glassman, a journalist, in the same issue of "In The Vanguard," is quoted as saying
It is very hard to deny that something profound has been happening in the stock market. It's gone from 777 on the Dow in August 1982 to over 10,000 today. That's a 13-fold increase. We've had five years in a row in which the S&P 500 Index has returned more than 20% in each year. Never before had it done so for more than two years in a row.
Shiller gives similar quotes from books written in 1929. For example, in the book "New Levels in the Stock Market", Charles Amos Dice wrote of a "new world of industry," a "new world of distribution," and a "new world of finance." One of the most eminent economists at the time, Professor Irving Fisher (coincidentally at Yale), wrote in 1929 that "stock prices have reached what looks like a permanently high plateau."
Of course, for every statement that this has all been seen before, someone can construct an argument as to why this time, it is different. The question as to who, if anyone, is closer to the truth is impossible to answer. As to who is to be believed, it depends upon the perceived strengths of the arguments, and also on the psychological makeup of the listener. This reviewer is certainly impressed by Shiller's work in the psychological basis of stock prices. For example, Shiller went through newspapers that were printed in the months preceding both the October, 1929 and October, 1987 stock market crashes, searching for any news items that might be thought to have precipitated these crashes. In 1987, he was able to survey institutional and individual investors as to what they were thinking about when the crash came.
In neither case could he come up with any item, or set of items, that was a clear cause for the crashes. Shiller claims that such events as the 1929 and 1987 crashes are best explained in terms of changes in the psychological makeup of the bulk of investors. There are contradictory ideas in the minds of most investors, and these ideas compete for attention. Which ones hold sway at any given moment may depend, to a large degree, upon what other people are thinking. When enough people change their feelings about the direction in which the market will go, most of the remaining people will change their feelings quite quickly. This creates a large difference between the numbers of buyers and sellers, and can lead to sudden, substantial movements in stock prices.
What does this have to do with probability and statistics? The leading arguments against the theory that psychology is an important factor in the pricing of stocks are those that involve the ideas of efficient markets and random walks. The efficient markets theory states that stock prices accurately reflect the true value of the stocks, given the current public information. According to this theory, stock prices are unpredictable and describe random walks through time, since new public information is generally unpredictable. Shiller claims that this theory has been statistically rejected many times in various scholarly journals of finance and economics. In addition, he says that "the efficient markets hypothesis does not tell us that the stock market cannot go through periods of significant mispricing lasting years or even decades." Thus, even if the "smart money" knows that a certain stock is mispriced, unless many people can be convinced of this fact it is not possible to make money from this knowledge. While it is true that eventually this knowledge will probably become public, the theory does not say how long this will take.
What does Shiller suggest that people who believe his arguments do with their money? Certainly, if one believes that over the next 20 years the stock market returns will be just a few percent annually, then one should switch much of one's investments to other vehicles, such as bonds or real estate. At several points, Shiller states that if one wants a totally riskless investment one should buy government issued inflation-indexed bonds. His point, and it is a good one, is that such bonds are impervious to changes in the market as well as being immune to the effects of inflation. He also suggests investing in stock markets outside of the country, since such markets do not always move in tandem with the U. S. markets. He certainly believes, unlike some analysts today, that stocks are not low-risk investments, and therefore one's portfolio should contain a significant amount invested in safer places.
DISCUSSION QUESTION:
What do you think this reviewer did with his stocks after writing this review?
<<<========<<
>>>>>==============>
It is well-known that using a seat belt decreases the chances of being killed or seriously injured in an automobile accident. Because of this fact, and because in the past many people did not regularly use seat belts, many states have passed laws requiring the use of seat belts.
These laws are of two types, standard and secondary. A standard seat belt law allows police officers to stop a car in which they have observed a person in the front seat who is not using a seat belt. A secondary law allows the police to issue a citation for the violation of a seat belt law only if they have first pulled the car over for another reason.
There is a difference in the rates of compliance between states having the two types of laws. There is also a marked difference between the rates of compliance of the general population and of African-Americans between the ages of 18 and 29. For example, in those states that have a standard law, 80 percent of the general population comply, but only 58 percent of the young blacks comply.
There are two competing goals when considering whether to aggressively enforce these laws. On the one hand, it is certainly true that many serious injuries and deaths would be prevented by increasing the compliance rates. On the other hand, some people fear that the laws will be used to summarily stop African-Americans, who are already stopped a disproportionate number of times.
The article states that a survey, conducted in some states with the standard seat belt law, concluded that African-Americans strongly favored those laws. The survey also found that black residents in states with standard seat belt laws reported less harassment than those who lived in states with secondary seat belt laws. This last result may not be significant, however, because very few African-Americans reported harassment as a result of seat belt laws.
DISCUSSION QUESTION:
Do you see a problem here? If so, how would you solve it?
<<<========<<
>>>>>==============>
In her 1985 book "The Divorce Revolution," sociologist Lenore Weitzman reported the alarming fallout from California's no-fault divorce law: divorced women's standard of living had dropped 73% on average, while that of their former husbands had increased by 43%. These figures were widely publicized, and influenced national debate over divorce law for the next decade. But in 1996, another sociologist found that Weitzman's data had been flawed, leading to an exaggeration of the effect she found. The women's standard of living had dropped, but only by 27%; the men's standard had gone up, but only by 10%.
While Weitzman's error appears to have been an honest mistake, such cases raise the issue of when and how new research findings should be reported to the public. In the academic world, a research paper goes through a rigorous process of peer review before it is accepted for publication in a scholarly journal. But when the findings are of wider public interest, there is a great temptation to disseminate them earlier, and stories like the Divorce Revolution often appear in the popular press.
A more recent example concerns the debate over school choice. Edward Muir of the American Federation of Teachers charges that schools are rushing to adopt new education programs whose effectiveness has never been verified by peer-reviewed research. His critics counter that Muir is agitating on behalf of the teachers' unions, questioning only the programs that the unions oppose.
Max Frankel, of the American Association for the Advancement of Science, cites peer-review as critical to the acceptance of scientific results. Still, he acknowledges that there may be compelling reasons to shortcut the process, such as "when public health is at stake or Congress is about to enact a sweeping policy change." Henry Levin of Columbia University disagrees, noting that "one study should never make a difference between moving towards policy or not."
Other social scientists pointed out that it is not realistic to expect that exciting new findings will escape news coverage. Excellent advice is given in a letter to the editor in response to the article (14 April 2000, A 30):
Reading [the story] brought back memories of my struggles with a required statistics class in my senior year at the University of Wisconsin in 1946. The one lasting lesson I learned from that class was to read every chart, graph and survey result with skepticism. There may have been problems with the sampling, the questions asked may have been wrong, the information misinterpreted, and researchers may have made human errors, and so on. Your article reconfirms those concerns.

ELINOR POLSTER
Shaker Heights, Ohio, April 8, 2000
We would be delighted if we left our Chance students with such memories!
DISCUSSION QUESTIONS:
(1) Why do you think the mistake in the Divorce Revolution report went undiscovered for 10 years?
(2) Regarding the exchange between Mr. Frankel and Mr. Levin, can you think of any single study that would compel a major policy change? Do you think the Divorce Revolution is an example?
<<<========<<
>>>>>==============>
We return to our ongoing attempts to unravel the dispute between physicists John Gott and Carleton Caves regarding Gott's controversial method for estimating the future life of a phenomenon. We have discussed various aspects of the problem in the last two issues of Chance News (see 9.03 and 9.04).
We promised in this issue to explain Caves' approach and to try to pinpoint the source of the disagreement. Recall that we let L denote the lifetime of the phenomenon in question. If we come to observe the phenomenon in progress, then we have the decomposition L = A + R, where the "age" A is the time since the start of the process and the "remaining lifetime" R is its future life. Gott's conception of the "Copernican principle" says that if there is nothing special about our time of observation, then it is reasonable to assume that A is uniformly distributed on the interval [0,L]. Then for any c > 0 we have
(1) Pr(R > cA) = 1/(1+c).
This comes from observing that R > cA is equivalent to A < [1/(1+c)]L (just write R = L-A). In other words, our observation falls within the first 1/(1+c) fraction of the total lifetime with probability 1/(1+c). All of Gott's recipes for prediction intervals can be seen to follow from this simple statement. The particular form (1) is directly relevant to Caves' "Dogs" paper, so we will use (1) as our basis of discussion here. For example, with c = 2, we see that
Pr(R > 2A) = 1/3.
Given an observed age A = a, Gott would form the interval (2a, infinity) as a 33% prediction interval for future life. The prediction confidence means that over the long run, 33% of the intervals constructed in this way will turn out to contain the eventual future life. Gott asserts that this procedure is completely justified if one accepts the probability statement (1).
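Statement (1) is also easy to check by simulation. Here is a minimal sketch (in Python; the exponential distribution for L is chosen purely for illustration) that draws a lifetime, places the observation point uniformly inside it as the Copernican assumption requires, and counts how often R > 2A.

    # Check of (1): if A is uniform on [0, L], then Pr(R > c*A) = 1/(1+c),
    # whatever the distribution of the total lifetime L.
    import random

    random.seed(2)
    c = 2.0
    hits = 0
    trials = 100000
    for _ in range(trials):
        L = random.expovariate(1.0)   # total lifetime (any distribution will do)
        A = random.uniform(0, L)      # the observation time is "not special"
        R = L - A                     # remaining lifetime
        if R > c * A:
            hits += 1
    print(hits / trials)              # close to 1/3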
As it stands, (1) is a statement about the joint probability distribution of A and R. Caves is willing to accept this statement (indeed he regards it as almost trivial), but claims that it has no predictive value since it is only relevant when we do not know the value of A. In all of Gott's applications, we actually observe an age A = a, so Caves says we should really be interested in conditional probabilities of the form Pr(R > cA | A = a). In his "Critical Assessment" paper, he presents a detailed Bayesian approach to the problem.
Letting f(x) be the prior density for the total lifetime, and F(x) the corresponding distribution function, he first derives
(2) Pr(L > x | A = a) = [1-F(x)]/[1-F(a)], x > a.
(He actually presents the corresponding conditional density for L, namely f(x)/[1-F(a)], which follows from our formula by taking - d/dx.) He calls this a "straightforward" Bayesian analysis, since it is an obvious conditional probability calculation, scaling the upper tail of the distribution by the chance of survival to age A = a.
Notice that (2) makes no appeal to any Copernican principle. In order to investigate the meaning of such a principle, Caves notes that we need a model for how we come to observe the phenomenon. His point is that observing the phenomenon in progress is itself special, in the sense that our observation time could instead have fallen either before the phenomenon started or after it ended. This leads to a more lengthy Bayesian analysis. Caves assumes that the phenomenon is equally likely to begin at any time. He models this by assuming that the starting time is uniformly distributed on an interval whose length D is a constant much greater than other variables in the problem, such as the lifetime of the phenomenon (the subsequent analysis implicitly involves a limiting operation where D tends to infinity). The lifetime X is assumed to be independent of the starting time, and distributed according to the prior density f.
At some time t, we make our "observation." Of course, we may not find the phenomenon in progress. In fact, finding it in progress should bias upwards our estimate of how long it will last, since longer-lasting phenomena have a better chance of being observed. Caves' analysis produces a posterior density for the lifetime, given that it is observed in progress, which is
(3) h(x) = xf(x)/m,
where m is the mean associated with density f. Readers familiar with the "inspection paradox" from renewal theory will immediately recognize this density. In the renewal setup consider X(1), X(2),... to be independent inter-event times having density f, and define the usual partial sums S(n) = X(1) +...+ X(n), taking S(0) = 0. For a fixed time t, let k be the random index for which S(k-1) <= t < S(k). Then X(k) is the length of the renewal interval containing t; it is commonly denoted by L(t). The "inspection paradox" is that L(t) does not follow the common distribution of the X's. Instead, as t goes to infinity (washing out the initial effect of the origin), it has the "length biased" density h shown in (3). With informed intuition, we realize that longer intervals have a better chance of covering the fixed time t. The density h reflects this, being proportional to the length of the interval as well as to the underlying density f. Similarly, in Caves' model, the uniform starting time over the long interval of length D means that the chance of the phenomenon being observed in progress will be proportional to its length, up to endpoint effects on that long interval. These are negligible for large D (much as the origin effect in the renewal setup is negligible for large t). It therefore makes sense that the Bayesian updating of the prior density f would lead to the posterior density h.
Next, conditioning on observing the phenomenon in progress, Caves derives a density for the age of the phenomenon at the time of observation. This turns out to be
(4) g(x) = [1-F(x)]/m.
Again, this corresponds to a standard result from renewal theory. Continuing from our earlier discussion, A(t) = t - S(k-1) gives the time since the last renewal prior to time t. It is the "age" of the interval that we find ourselves in at time t. Renewal theory shows that the limiting density of A(t) is g. Furthermore, R(t) = S(k) - t, the time until the next renewal (the "remaining life"), has this same limiting density g.
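These renewal-theory facts can be checked by simulation, much as the Mathematica program mentioned at the end of this article does. The sketch below (in Python, with an exponential inter-event density chosen purely for illustration) builds a renewal process, inspects the interval covering a fixed time t, and records its length and age; the average length comes out near the length-biased mean 2m, and the average age near the mean of g.

    # Inspection paradox by simulation: the renewal interval covering a
    # fixed time t is longer, on average, than a typical interval.
    import random

    random.seed(3)
    m = 1.0            # mean inter-event time (exponential, for illustration)
    t = 100.0          # fixed inspection time, far from the origin
    lengths, ages = [], []
    for _ in range(10000):
        s = 0.0
        while True:
            x = random.expovariate(1.0 / m)
            if s + x > t:
                lengths.append(x)      # length L(t) of the interval covering t
                ages.append(t - s)     # age A(t) of that interval
                break
            s += x
    print(sum(lengths) / len(lengths))   # about 2*m, the mean of h(x) = x*f(x)/m
    print(sum(ages) / len(ages))         # about m, the mean of g(x) = [1-F(x)]/m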
We return again to Caves. Letting I denote the event that the phenomenon is observed in progress, he derives his own analogs of (1) and (2):
(1') Pr(R > cA | I) = 1/(1+c), and

(2') Pr(L > x | A = a, I) = [1-F(x)]/[1-F(a)], x > a.
He notes their derivation does not require any prior assumptions about the density f (indeed, we note that f does not even appear in the first formula--this is essential to Gott's method). We can complete our correspondence with renewal theory as follows. First observe that
Pr(R > cA | A = a) = Pr(R > ca | A = a) = Pr(L > a+ca | A = a),

so by (2) we have

(5) Pr(R > cA | A = a) = [1-F((1+c)*a)]/[1-F(a)].

We can uncondition (5) by integrating against the limiting density (4) for A. The factor 1-F(a) cancels, and the substitution u = (1+c)a gives

Pr(R > cA) = (1/m) Integral from 0 to infinity of [1-F((1+c)a)] da = (1/m)(m/(1+c)) = 1/(1+c),

since the integral of 1-F over [0, infinity) equals the mean m. This recovers formula (1). Recall that (1) is essentially Gott's Copernican Principle. We see that it emerges naturally from Caves' analysis or from asymptotic renewal analysis. Gott takes it as a modeling assumption without specifying any underlying machinery: if the observation time is not special, he models A as being uniform on [0, L].
We now have agreement on the two key formulas, one unconditional and the other conditional, which we restate here for convenience.
(unc) Pr(R > cA) = 1/(1+c)

(con) Pr(R > cA | A = a) = [1-F((1+c)*a)]/[1-F(a)]
The debate now turns on how to make legitimate use of these. As described earlier, the unconditional statement is equivalent to Gott's Copernican Principle. He proceeds to use it as a recipe for prediction intervals: after observing A = a, he constructs the prediction interval (ca, infinity) for R, with [1/(1+c)]*100% confidence. The confidence depends on a frequentist interpretation of the long run success rate expected when making predictions according to this recipe. Of course, if we knew F, we could make more efficient predictions--i.e., get tighter bounds-- using (con). But without such knowledge, Gott feels justified in using (unc).
Caves objects to this on two levels. First, he seems to feel that Gott is asserting, after the observation A = a, that R will fall in (ca, infinity) with probability 1/(1+c). This is indeed a common misinterpretation of a frequentist confidence statement; however, in our reading of Gott we find that he is careful to avoid this trap. But Caves is a Bayesian, so on a philosophical level he insists that an interval estimate must have probability content. Having observed A = a, he states that any further analysis must necessarily involve conditioning via (2) to update the prior. For example, in the famous Berlin Wall example, it seems quite natural to use our knowledge of the political situation at our time of observation. He is therefore led to ask if there are any prior distributions F for which the conditioning would lead to Gott's results. That is, does any F give
[1-F((1+c)*a)]/[1-F(a)] = 1/(1+c) ?
It turns out that this can only happen for 1-F(x) = 1/x (in which case the left side becomes [1/((1+c)a)]/[1/a] = 1/(1+c)), which corresponds to the unnormalizable prior density 1/(x^2). Caves concludes that Gott's analysis is much more restricted in applicability than advertised.
Needless to say, we cannot hope to settle the frequentist/Bayesian philosophical debate here. On a practical level, we agree that any knowledge of F would lead to more efficient predictions, but also feel that there is no mathematical law that obligates one to use such knowledge. Failing to do so is relevant in terms of the ultimate usefulness of the predictions. There is an interesting comment at the end of the "Critical Assessment" paper which suggests that Caves might at least partly agree. He writes: "The intervals that Gott finds for survival times are so wide that he is likely to be right, at least until he is asked to make bets on the high probabilities he assigns..." The first part of this is all that Gott is claiming. He is not assigning probabilities to individual predictions (the Bayesian paradigm), he is relying on a general principle to get an overall success rate (the frequentist paradigm). Achieving such generality necessitates conspicuously wide intervals, some of which can be obviously silly. But there is no internal contradiction in his not wanting to bet on just the silly ones.
Having skirted the philosophical debate, let us conclude by noting that what can certainly be questioned--as some of Caves's examples show--are the practical implications of Gott's work. His predictions about people's ages ignore the conditional data found in a life table, so they would never be applied by an insurance company to price individual policies. His predictions about historical entities like the Wall ignore current affairs, so they would never be applied by foreign policy experts to analyze the longevity of a particular regime. His predictions about the runs of plays ignore reviews or box office figures, so they might not seem helpful if you are wondering whether that show you've been waiting to see will be around when you visit New York next month. The real moral of the story may be that the popular accounts of "How to Predict Everything" have oversold the method.
NOTE: We have prepared a Mathematica program to illustrate both the renewal results and Caves' results by simulation. You can find this program at Introduction to Probability under "Additional resources for teaching an introductory probability course."
DISCUSSION QUESTION:
Well, what do you think about all this?
<<<========<<
>>>>>==============>
This work is freely redistributable under the terms of the GNU General Public License as published by the Free Software Foundation. This work comes with ABSOLUTELY NO WARRANTY.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!