Prepared by J. Laurie Snell, Bill Peterson and Charles Grinstead, with help from Fuxing Hou and Joan Snell.
Please send comments and suggestions for articles to
jlsnell@dartmouth.edu.
Back issues of Chance News and other materials for teaching a Chance course are available from the Chance web site:
Chance News is distributed under the GNU General Public License (so-called 'copyleft'). See the end of the newsletter for details.
===========================================================
The salt controversy is the number one perfect example of why science is a destabilizing force in public policy.
Sanford Miller
===========================================================
Contents of Chance News 7.08
<<<========<<
We have added two new videos to our Chance Lecture Series.
The first video is of a talk that Freeman Dyson gave in a 1994 Dartmouth Chance course. Dyson discusses several chance issues that he has dealt with in his long scientific career. He begins with his early days as a statistician when, in the Second World War, he was asked to determine if experienced bomber crews had a better chance of returning than did inexperienced crews. They did not, but for a surprising reason. He next discusses problems in determining if electromagnetic fields cause cancer clusters. He then discusses the risk of asteroids hitting the earth and how we should respond to this risk. Finally, on the lighter side, he gives his thoughts about Richard Gott's method of determining confidence limits for the length of life on earth, the maximum population, the length of this Chance News, etc. (See Chance News 2.11, 2.12, 2.13, 2.14.)
The second video is a lecture entitled "Probability and DNA" given by Jonathan Koehler this summer at the Dartmouth Business School. Koehler begins with a discussion of intuition in probability and how it differs from intuition in other fields. He carries out experiments based on simple Bayesian examples to show how our intuition can lead us astray. He discusses the problems people have with conjunction and disjunction of events. Of course, he cannot resist including the infamous Linda problem. Koehler describes experiments with jurors that show that lack of understanding of how to combine probabilistic information can cause jurors serious problems. In particular, he describes experiments that show that about 50% of jurors would vote to convict in a DNA trial when told that the chance of an innocent person matching the DNA was 1 in a million, and this proportion is not significantly changed when jury members are given the additional information that there is a 2% chance of a lab error. Koehler ends with an interesting discussion of the role of probability in the legal concept of "beyond a reasonable doubt". A lively and informal talk backed up with lots of data.
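The way the two figures in the juror experiments combine can be sketched with a short calculation. This is our illustration, not Koehler's analysis: it assumes that a coincidental match and a lab error are independent ways for an innocent person to be reported as a match, that a guilty person is always reported as a match, and it uses hypothetical prior probabilities of guilt.

```python
def false_match_probability(random_match, lab_error):
    """Chance that an innocent person is reported as a match.

    Assumes (for illustration only) that a coincidental DNA match and a
    lab error are independent events.
    """
    return 1 - (1 - random_match) * (1 - lab_error)


def posterior_guilt(prior, false_match):
    """Bayes' rule, assuming a guilty person is always reported as a match."""
    return prior / (prior + (1 - prior) * false_match)


random_match = 1e-6   # 1 in a million, from the experiment described above
lab_error = 0.02      # 2% chance of a lab error, from the same experiment

p_dna_only = false_match_probability(random_match, 0.0)
p_with_lab = false_match_probability(random_match, lab_error)

# The priors below are hypothetical; they are not taken from the talk.
for prior in (0.01, 0.1, 0.5):
    without_lab = posterior_guilt(prior, p_dna_only)
    with_lab = posterior_guilt(prior, p_with_lab)
    print(f"prior {prior}: posterior {without_lab:.6f} ignoring lab error, "
          f"{with_lab:.4f} allowing for it")
```

Under these assumptions the chance of a false match is driven almost entirely by the 2% lab error rate rather than by the 1-in-a-million figure, which is why it is striking that the additional information made so little difference to the jurors.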
<<<========<<
Howell's course is the first serious use of our Chance videos that we have heard about. We would appreciate hearing from others who have used one or more of the videos, successfully or unsuccessfully.
<<<========<<
A study that would be good to use for such an experiment is the recent study carried out at Carnegie Mellon University suggesting that people who use the internet are more subject to depression than those who do not. This study was discussed on NPR's Talk of the Nation on 4 Sept. 1998. You can listen to this.
It was also reported in the New York Times article:
Sad, lonely world discovered in cyberspace
and the Washington Post article:
A search for the net impact on human social life
The Washington Post article will be available from the net until Sept. 20, and both articles are available from Lexis Nexis.
You might ask your students to listen to the NPR program and read one of the newspaper articles, and then ask them which medium they preferred and which they felt explained the study best. It might be interesting to include a comparison of the two newspaper articles. Another possibility would be to divide the class into three groups and have different groups use different sources. Perhaps you or your class can come up with another relevant experiment. If you try any such experiment, please pass the results on to us and we will report what we learned in the next Chance News. To not muddy the waters, we will not give our own review of this study.
<<<========<<
A graph in an article on global warming in the May 1998 issue of the National Geographic gave the projected temperatures in the Southeast U.S. for July 1 during the years from 2000 to 2500. The caption states:
Given an expected doubling of CO2 by 2100, July temperatures in the Southeast U.S. would approach 100 F, a 20 percent increase over today's (about 83 F).
A letter to the editor states:
In the caption for the graph titled "Steamy Summers for Southeast U.S." (page 69), you state that a temperature increase from about 83 F to 100 F is a twenty percent increase. This is incorrect. If a temperature change is to be expressed as a percentage change, a scale such as Kelvin--for which zero is the lowest temperature possible--should be used. In this case the temperature change is from 302K to 311K, an increase of 3 percent.
Anton Sawatzky
Pinawa, Manitoba
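The arithmetic in the caption and in the letter can be checked directly. The sketch below is ours, not part of the letter; it uses the standard temperature conversions and computes the same rise as a percentage change on three scales. The function names are only illustrative.

```python
def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9


def fahrenheit_to_kelvin(f):
    return fahrenheit_to_celsius(f) + 273.15


def percent_change(old, new):
    return 100 * (new - old) / old


today_f, projected_f = 83.0, 100.0  # figures from the caption

changes = {
    "Fahrenheit": percent_change(today_f, projected_f),
    "Celsius": percent_change(fahrenheit_to_celsius(today_f),
                              fahrenheit_to_celsius(projected_f)),
    "Kelvin": percent_change(fahrenheit_to_kelvin(today_f),
                             fahrenheit_to_kelvin(projected_f)),
}

for scale, pct in changes.items():
    print(f"{scale}: {pct:.1f}% increase")
```

The same physical change gives a different percentage on each scale, because the percentage depends on where the scale puts its zero.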
DISCUSSION QUESTIONS:
(1) Isn't it true that the temperature expressed in Fahrenheit will have increased by twenty percent if the projections are correct? Do you agree with the reader that this statement is incorrect? Why?
(2) In response to the letter the editor writes:
We regret the error and also our use of an inaccurate label in the illustration. The graph does not represent temperature alone, as indicated, but rather heat index. This is a measure of how warm it feels, based on a combination of humidity and temperature.
Does this change anything?
<<<========<<
All families with six children were surveyed in a city. In seventy-two families the exact order of births of boys and girls was GBGBBG. What is your estimate of the number of families surveyed in which the exact order of births was BGBBBB?
The authors write:
The two sequences are about equally likely, but most people will surely agree that they are not equally representative. The sequence with five boys and one girl fails to reflect the proportion of boys and girls in the population. Indeed, 75 of 92 subjects judged this sequence to be less likely than the standard one.
Peter asks: What distribution should we expect for birth orders for families with six children? Surely not a uniform distribution! We do not know the answer but are trying to get the data to answer his question. We would appreciate hearing from anyone who knows a source for the relevant data.
The data for gender order in two-child families was provided by Will Lassek in a letter to Marilyn vos Savant (see Chance News 6.12). From his data it is clear that a person asked to compare the frequency of BG and GG in families with two children would be quite justified in estimating fewer GG families. See discussion question (2).
DISCUSSION QUESTIONS:
(1) Here is a favorite discussion question in a Chance class: Assume that the gender of offspring can be considered like coin tossing. If families have children until they have a boy or until they decide not to have more children, would there be more boys or more girls? The answer is that the expected numbers are the same.
Under these assumptions what proportion of the families with six children would you expect to have gender order GGGGGB and what proportion to have gender order BGBBBB?
(2) Thanks to Will Lassek, a reader of Marilyn vos Savant's column (see Chance News 6.12), we have the data to look at the distribution of two-child families. Lassek says that he looked at the gender order for the 42,888 two-child families among the 342,018 families included in the National Interview Survey carried out by the Census Bureau from 1987 to 1993. (We assume this means the National Health Interview Survey.)
It is generally assumed that there are about 105 boys born for each 100 girls. Assuming a Bernoulli process with probability of a boy = 105/205 = .512 and a girl = .488 we found:
Order   observed   expected   observed-expected
BB        11,334     11,251                  83
BG        11,118     10,716                 402
GB        10,913     10,716                 197
GG         9,523     10,205                -682
This gives a chi^2 value of 64.9753 and so the chi^2 test would certainly reject this Bernoulli model.
Can you explain why this model does not fit?
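For readers who want to reproduce the chi^2 value in question (2), here is a minimal sketch. It uses Lassek's observed counts and the Bernoulli model with probability of a boy = 105/205 described above; the variable names are ours.

```python
# Observed counts of two-child families by gender order (Lassek's data).
observed = {"BB": 11334, "BG": 11118, "GB": 10913, "GG": 9523}

p_boy = 105 / 205      # about 105 boys born for each 100 girls
p_girl = 1 - p_boy
total = sum(observed.values())


def order_probability(order):
    """Probability of a birth order if births are independent (Bernoulli)."""
    prob = 1.0
    for child in order:
        prob *= p_boy if child == "B" else p_girl
    return prob


expected = {order: total * order_probability(order) for order in observed}

chi_square = sum((observed[o] - expected[o]) ** 2 / expected[o]
                 for o in observed)

for order in observed:
    print(f"{order}: observed {observed[order]:6d}, "
          f"expected {expected[order]:9.1f}")
print(f"chi^2 = {chi_square:.4f}")
```

Taking the probability of a boy as given rather than estimated from the data, the statistic has 3 degrees of freedom, and a value near 65 is far beyond the 0.1% critical value of about 16.27.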
<<<========<<
The Genetics & IVF Institute in Fairfax, VA, has developed a method to allow a couple to have a good chance of having a child of the sex it chooses. The method uses the fact that the only difference between sperm that carry the Y chromosome, which produces males, and sperm that carry the X chromosome, which produces females, is that the sperm with the Y chromosome have about 2.8 percent less genetic material.
The process lines up the sperm individually in a stream and measures the DNA content of each sperm. Sperm containing the X chromosome are separated from those with a Y chromosome. It takes about a day to sort the 200 million sperm of a single ejaculate. The sperm are then used for artificial insemination.
In a paper being published in the current issue of the journal Human Reproduction, the investigators report results for couples who wanted girls. Ten out of 11 babies born so far were girls.
The researchers have carried out a similar number of pregnancies where boys are desired, but this article states that the researchers are not releasing the results until they are published. However, from the Institute's home page we find that the company estimates that the procedure makes it 5 or 6 times more likely to have a girl for those wishing a girl and about 2 times more likely to have a boy than a girl for those who prefer a boy.
Couples with a family history of sex-linked diseases (illnesses that are genetically tied to either the X chromosome or the Y chromosome, such as hemophilia) could use sperm-sorting to avoid the chromosome that carries the disease.
But the institute has found that most couples who want to select the sex of their children do so for reasons of "family balancing". For example, a couple may already have three children, all of whom are boys, and want the fourth to be a girl. The institute offers the sex determination option to such couples. It does not offer the procedure to couples without children who want a particular sex. However, a spokesman said that this policy could change.
DISCUSSION QUESTIONS:
(1) Do you see any ethical problems with allowing families to choose the sex of their children?
(2) What do you think will happen to the gender distribution of single and two child families if it becomes really easy for families to choose the sex of their children?
<<<========<<
This article on coincidences is excerpted from an article by Martin in the Sept-Oct issue of Skeptical Inquirer. Of course, the article starts with the infamous birthday problem and the well-known examples of "unbelievable" coincidences such as the numerous similarities between Lincoln and Kennedy: both were assassinated on a Friday, and both succeeding vice presidents were southern Democrats and former senators named Johnson, with 13 letters in their names and born 100 years apart, etc.
Martin then remarks: so far as is known, the decimal digits of the irrational number pi are random. He then goes on to say that since the digits are random we can model coin-tossing using the expansion for pi (H = even digit, T = odd digit). He then shows that he can find streaks and other unusual sequences by fishing around in this random sequence. This is supposed to show that the coincidences in our life are really just chance events that are selected out of a huge set of things that happen to us.
The article in the Skeptical Inquirer discusses these examples and also random prices in the stock market. Again Martin uses the digits in the expansion of pi to generate stock prices and then shows that he can find in these random prices behavior that technical market analysts consider peculiar to the stock market.
In the original article an insert (presumably by the Skeptical Inquirer) remarks:
Back in 1992, the Skeptical Inquirer held a Spooky Presidential Coincidences Contest, in response to Ann Landers printing "for the zillionth time" a list of chilling parallels between John F. Kennedy and Abraham Lincoln. The task was for readers to come up with their own list of coincidences. The results were impressive and can be found in Skeptical Inquirer Spring 1992, 16(3), and Winter 1993, 17(2).
DISCUSSION QUESTIONS:
(1) Martin states that the digits of pi being random means that the value of any single digit is not predictable from preceding digits. What does this mean?
(2) Do you think that the pi example would convince your Uncle Joe that there was nothing strange about getting a call from his college roommate Peter, who he had not heard from for 30 years, the day after he dreamed about him?
<<<========<<
From the top of Mt. Washington the Weather Notebook presents the following problem:
What are the chances that you have breathed the same air as Copernicus?
This is an old problem. Warren Weaver, in his book Lady Luck, attributes it to Sir James Jeans, who asked for the probability that a breath you take includes molecules from Julius Caesar's last breath. However, if you want to make your own calculations you can find how to enter the contest at the url given above.
<<<========<<
The two topics discussed in this hour, global warming and recommendations on salt, illustrate the difficulty in getting scientists to come to an agreement on how to interpret scientific studies. We will consider only the salt controversy. The most complete article in the current news on this topic is:
The (political) science of salt
This is a long article that provides a case study of a national health recommendation: eat less salt. For three decades the National Heart, Lung and Blood Institute, the National High Blood Pressure Education Program and numerous other organizations have recommended a daily allowance of 6 grams of salt. This is based on the theory that lowering salt intake will lower blood pressure and prevent strokes.
The original recommendation was based on "ecological" studies that showed that countries whose population had low salt diets had lower rates of hypertension. Unfortunately, studies within a given population did not show that those with low salt diets had lower blood pressure than those with high salt diets. By now there have been many more studies including controlled studies. However, researchers took their position on this issue before there were many studies and seem to be able to interpret modern studies to fit their position. This has kept scientists divided and makes it impossible to reach a consensus on the proper role of salt in our diet.
Taubes writes:
While the government has been denouncing salt as a health hazard for decades, no amount of scientific effort has been able to dispense with the suspicions that it is not. Indeed, the controversy over the benefits, if any, of salt reduction now constitutes one of the longest running, most vitriolic, and surreal disputes in all of medicine.
Taubes also states:
The controversy itself remains potent because even a small benefit--one clinically meaningless to any single patient--might have a major public health impact. This is a principal tenet of public health: Small effects can have important consequences over entire populations.
Richard Peto has stated that, if by eating less salt the world's average blood pressure could be reduced by a single millimeter of mercury, that would prevent several hundred thousand deaths a year. The problem is that small effects are very difficult to establish by statistical studies. Recent studies have suggested that too little salt may cause other problems. If so it might make more sense to limit salt only for those who have high blood pressure.
The use of the Central Limit Theorem in the NPR discussion arose over such a point. One expert argued that too little salt meant less than the 6 grams a day recommended, and so would not occur if people followed the recommendation. The other expert then pointed out that, if the present recommendations were followed, you would have a normal distribution with mean 6 and could expect significantly lower levels for a reasonable proportion of the population.
DISCUSSION QUESTION:
How can an effect be clinically meaningless for any single patient and yet save hundreds of thousands of lives?
<<<========<<
Guests are: Harvey Choldin, author of "Looking for the last percent", Stephen Fienberg, who writes regularly on the census for Chance Magazine, and Stephen Holmes, New York Times correspondent.
Other interesting articles on this issue are:
Excerpts from ruling on planned use of statistical sampling in 2000 census
A three-judge Federal panel, in a lawsuit filed by the House of Representatives against the Commerce Department, ruled that plans to use sampling in the 2000 census violated Federal laws. The first article gives excerpts from the ruling that indicate the court's reasoning.
Since the Constitution specifies enumeration, the court felt that the claim that statistical sampling in the apportionment enumeration does not violate the Census Act must come from the 1976 amendments to sections 141(a) and 195 which addressed the problem of sampling.
The post-1976 version of section 141(a) states:
The Secretary shall, in the year 1980 and every 10 years thereafter, take a decennial census of population...in such form and content as he may determine, including the use of sampling procedures and special surveys.
The 1976 amendment to section 195 more specifically states:
Except for the determination of population for purposes of apportionment of Representatives in Congress among the States, the Secretary shall, if he considers it feasible, authorize the use of the statistical method known as "sampling" in carrying out the provisions of this title...
The court ruled that these amendments should be considered together when deciding on sampling. Thus it was argued that the case rests on whether the exception stated in the amendment to section 195 meant "you cannot use sampling methods for purposes of apportionment" or "you don't have to use sampling methods". To settle this question of English usage the court provided the following two examples of the use of the word except:
Except for Mary, all children at the party shall be served cake.
Except for my grandmother's wedding dress, you shall take the contents of my closet to the cleaners.
The court argues that the interpretation of except must be made in the context of the situation. In the first example, it would be all right if Mary were also served cake. In the second example, it would not be all right if grandmother's delicate wedding dress were sent to the cleaners. It is argued that the context of the census is similar to the second example and so the exception means that you really cannot use sampling for apportionment purposes.
The Clinton administration is appealing this ruling to the Supreme Court, and the Supreme Court has agreed to accept the appeal and put it on the fast track so that a decision could come by March. Commentators on the NPR program point out that, in fact, Congress has the last say on this issue.
There were lots of interesting letters to the editor on this issue. Here is our favorite:
To the Editor: Republicans claim that sampling cannot be used in the census because it violates the constitutional requirement of an "actual enumeration", interpreted to mean a literal head count (editorial, Aug. 25). There are ways to count other than adding 1 many times. For instance, multiplying the two sides of a checkerboard (eight squares each) would yield a total of 64 squares, whereas a hand count might yield 63 or 65 because of human error. Human intelligence plus a little brute force is often far more efficient and accurate than brute force alone. This is why statistical sampling is the superior way to carry out an "actual enumeration" of a large population. Just ask any Republican who relies on a poll or who takes a blood test rather than drain every drop from his body. BRIAN CONRAD Cambridge, Mass., Aug. 25, 1998
The writer is an assistant professor of mathematics at Harvard University.
The article by Weinstein is the only one we saw that pointed out that statisticians themselves are not in agreement about the sampling methods the census bureau plans to use. He writes:
The handful of statisticians who have mastered the ferocious mathematical and practical complexities are split over the usefulness of sampling. William Kruskal, former chairman of the Statistics department at the University of Chicago, speaks for many of his colleagues when he admits that "no one really knows".
Weinstein makes clear that this statement refers to sampling methods relating to the undercount problem, and goes on to state that most experts do agree that the use of sampling to complete the census, after getting responses from 90% of the households, is a sensible way to cut the soaring costs of conducting the census without sacrificing accuracy.
In his article, Weinstein makes a valiant effort to explain the concerns that some of the statisticians have about the sampling methods proposed for the undercount problem. However, these are difficult ideas and if you really want to understand what the issues are, you will be better off reading the following paper by David Freedman and his colleagues.
Planning for the census in the year 2000
A more up-to-date version will also be available soon. We will give the reference in the next Chance News. While you are getting the census article we strongly recommend that you also get David's most recent paper: "From Association to Causation: some remarks on the history of statistics", Technical Report No. 521, August 1998. Here you will find Freedman's thoughts on this classic problem in the context of his discussion of famous experiments such as John Snow's demonstration that cholera is a waterborne infectious disease, and studies identifying the health hazards of smoking.
We remind readers that the discussion of the technical problems involved in the undercount question from the point of view of the Census people can be found in Tommy Wright's article in the May-June 1998 issue of the American Scientist (see Chance News 7.06) and in his presentation in our Chance Lecture Series.
In the second half of the NPR program there is a discussion of the need for scientists to explain the issues involved in using sampling for the undercount problem in such a way that the public and congressmen can understand them. Shortly after this discussion, Fienberg is invited to explain the undercount method in terms of estimating the number of fish in a lake. Have your students listen to this and see if they get it.
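The fish-in-a-lake idea is usually called capture-recapture. The sketch below is ours and is not taken from the program: it assumes the simplest version of the idea (the Lincoln-Petersen estimator), with made-up numbers, and the function name is only illustrative. The methods actually proposed for the census are considerably more elaborate.

```python
def lincoln_petersen(marked_first, caught_second, marked_in_second):
    """Estimate a population size from two catches.

    marked_first:     fish caught, marked and released in the first catch
    caught_second:    fish caught in the second catch
    marked_in_second: fish in the second catch that carry a mark

    Assumes every fish is equally likely to be caught each time, so the
    marked fraction of the second catch estimates the marked fraction
    of the whole lake.
    """
    if marked_in_second == 0:
        raise ValueError("no marked fish recaptured; cannot estimate")
    return marked_first * caught_second / marked_in_second


# Hypothetical numbers: mark 200 fish, later catch 150, of which 30 are marked.
estimate = lincoln_petersen(200, 150, 30)
print(f"estimated number of fish in the lake: {estimate:.0f}")
```

In the census setting the first catch plays the role of the census count and the second catch the role of a later survey; people found in the survey but not in the census indicate how many were missed.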
DISCUSSION QUESTIONS:
(1) One letter to the editor went something like this: The next thing you know they will be proposing that we elect our President by the results of a poll. What do you think of this argument?
(2) What do you think about the court's linguistic argument?
(3) The NPR program opens with the remark: The 1990 count missed almost 2% of the population. This meant that more than 8 million people were not counted. A listener called in and said he did not see how this could be unless the population of the U.S. had suddenly increased to 400 million. Stephen Fienberg offered an explanation -- could you?
<<<========<<
Brady states that the boundaries between investing and gambling are getting blurred. He remarks that the state government blurs it by trying to drag investing down to the level of gambling. He cites the ad:
Saving for a rainy day takes too long; you could win $50,000 instantly if you play the lottery.
As another example, he cites the discussion about changing social security to make it based on the stock market rather than Treasury Bonds. On the other hand, financial service people want to take advantage of the popularity of lotteries to sell their services.
Brady also discusses the innumeracy of the public and gives some examples. For his first example he says:
Innumerate people will prefer a 10% raise in a period of 15% inflation over a 5% raise in a period of 3% inflation.
For his second example he writes:
Most people cannot do the odds. What is a better deal over a year? A 100% safe return with 5% interest or a 90% safe return with a 20% return. For the first deal, your return will be 5%. For the second, your return will be 8%.
He closes with the comment:
Remember that, buried beneath all the talk of betting, risk and odds, are two simple numbers that the state does not want you to think about. The lottery returns 50 cents on the dollar while the stock market returns 110 cents. Any Fool can see which is the best.
DISCUSSION QUESTIONS:
(1) In Brady's inflation argument, what is the real choice you are offered? Dana Williams remarks that he would certainly choose the 5% just to get the 3% inflation.
(2) Brady gets his 8% return for the 90% safe return with a 20% return by asking you to consider investing $1000 10 times and to assume that 9 of the 10 times you win. Then you made $1800 and lost $1000 for a net return of $800 or 8% on your $10,000 investment. But does this answer the original question related to a single choice?
(3) Do you agree that the stock market returns 110 cents on a dollar?
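Brady's two numerical examples can be worked through in a few lines. This is a sketch under our reading of his examples: "90% safe" is taken to mean that the whole stake is lost 10% of the time (as in his $1000 illustration), and the real value of a raise is computed by deflating by the inflation rate. The function names are ours.

```python
def expected_return(p_success, gain_rate, loss_rate=1.0):
    """Expected net return per dollar invested.

    With probability p_success you gain gain_rate per dollar; otherwise
    you lose loss_rate per dollar (the whole stake by default, which is
    our reading of Brady's example).
    """
    return p_success * gain_rate - (1 - p_success) * loss_rate


def real_raise(nominal_raise, inflation):
    """Change in purchasing power after a raise, given inflation."""
    return (1 + nominal_raise) / (1 + inflation) - 1


safe_deal = expected_return(1.0, 0.05)
risky_deal = expected_return(0.9, 0.20)
print(f"safe deal:  expected return {safe_deal:.1%}")
print(f"risky deal: expected return {risky_deal:.1%} "
      "(but a single investment returns either +20% or -100%)")

raise_a = real_raise(0.10, 0.15)
raise_b = real_raise(0.05, 0.03)
print(f"10% raise with 15% inflation: real change {raise_a:.1%}")
print(f"5% raise with 3% inflation:   real change {raise_b:.1%}")
```

The 8% figure is an average over many repetitions; it says nothing about the spread of outcomes for a single investment, which is the point of discussion question (2).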
<<<========<<
Tonsil removal improves children's grades
Baltimore Sun, 9 Sept. 1998
Associated Press
When you read the article you find that the variable lurking in the background is lack of sleep.
The article states that the connection between tonsils and children's abilities has long been known. An example of this is a study reported in 1889 in the British Medical Journal titled "The awkwardness and stupidity of children with large tonsils."
A study by Dr. David Gozal in the September issue of Pediatrics purports to explain this by showing that the real problem is sleep apnea which causes children to wake up many times during the night and leaves them feeling tired the next day. According to Gozal, sleep apnea can often be cured by removing the tonsils and adenoids. This, in turn, improves the student's grades.
For this study Gozal questioned the parents of 300 first-graders whose school performance was in the bottom tenth of their class. Symptoms of sleep apnea were found in 54 children. Parents of these children were advised to consult their doctors to consider having their children's tonsils and adenoids removed. The parents of 24 of these 54 children decided to have this done and 30 decided not to. The article says that, a year later, almost all the children who underwent surgery had improved their school performance an average of half a letter. The grades of the students who remained untreated remained the same. We found this a little confusing and so consulted the original to see what it meant.
In the original article the grade improvement is described as follows:
Grades were on a scale 0 to 4 with 2 a minimal passing grade and 2 to 2.5 representing poor performance.
For the 24 treated, the mean grade in first grade before treatment was 2.42 with sd .17 and in the second grade after treatment it rose to 2.87 with sd .19.
For the 30 not treated, the mean grade in first grade was 2.44 with sd .13 and in the second grade 2.46 with sd .15.
All those treated improved their scores from first to second grade although 2 were still in the lowest 10th percentile of their class.
DISCUSSION QUESTIONS:
(1) Do you find this a definitive study? What other questions would you like to have answered about the study to evaluate it?
(2) Is it true that almost all of those treated raised their score by an average of half a letter? How many of those treated would you estimate did raise their grade by half a letter?
<<<========<<
This article describes three mechanical processes for estimating e, gives the underlying theoretical justifications, and presents the results of computer simulations of the physical experiments.
One method uses derangements, permutations leaving no element in its original place: randomly permuting 10 elements 10^5 times, it uses the reciprocal of the proportion of outcomes yielding derangements as an estimate of e.
A second method tosses 10^5 darts at a board consisting of 10^5 equally likely target regions; by the binomial model and/or the Poisson approximation thereto, the ratio of 10^5 to the number of regions with no hits approximates e.
The last method involves shaking N salt particles at random onto a table with area A having a hole with area a in it. Then p = a/A is the probability that any one particle goes through the hole. After the shaking there will be a number N1 that did not go through the hole. These N1 particles are again shaken onto the table with a resulting N2 not going through the hole. This process is iterated n times. Since each of the N particles has probability (1-p)^n of not having gone through the hole, the expected value of Nn is N*(1-p)^n. If we choose n = 1/p, then Nn/N becomes an estimate for 1/e. For his experiments the author used N = 50,000. He chooses the area of the hole and the table so that p = 1/10,000 and e is estimated as N/Nn. This whole process is repeated 100 times, resulting in 100 estimates for e.
The first two methods were simulated 1,000 times; the third was repeated only 100 times. Results of the simulations suggest that the second (dart board) method uses the fewest computer resources for a given level of accuracy and precision:
Method           sample mean +/- sample standard error   cpu time
Derangements     2.7181 +/- 0.0004                          1,969
Darts            2.7182 +/- 0.0002                            186
Salt particles   2.7183 +/- 0.0010                         32,896
(sample standard error = sample standard deviation divided by sqrt (number of simulations))
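The three methods are easy to try for yourself. The following is a minimal sketch, not the author's code: each function performs a single run of one method, and the sizes are scaled down from those in the article so that it runs quickly. The function names, the sizes and the seed are our choices.

```python
import random


def estimate_e_derangements(n_items, n_shuffles, rng):
    """Reciprocal of the fraction of random permutations with no fixed point."""
    items = list(range(n_items))
    derangements = 0
    for _ in range(n_shuffles):
        rng.shuffle(items)
        # A derangement leaves no element in its original position.
        if all(value != position for position, value in enumerate(items)):
            derangements += 1
    return n_shuffles / derangements


def estimate_e_darts(n_regions, rng):
    """Throw n_regions darts at n_regions equally likely regions."""
    hit = [False] * n_regions
    for _ in range(n_regions):
        hit[rng.randrange(n_regions)] = True
    # Ratio of the number of regions to the number of regions never hit.
    return n_regions / hit.count(False)


def estimate_e_salt(n_particles, n_shakes, rng):
    """Shake n_shakes times; each particle falls through with p = 1/n_shakes."""
    p = 1 / n_shakes
    remaining = n_particles
    for _ in range(n_shakes):
        # Count the particles that did not go through the hole this time.
        remaining = sum(1 for _ in range(remaining) if rng.random() >= p)
    return n_particles / remaining


rng = random.Random(1998)  # fixed seed so that runs are repeatable
estimates = {
    "Derangements": estimate_e_derangements(10, 20_000, rng),
    "Darts": estimate_e_darts(50_000, rng),
    # With only 200 shakes, (1 - p)^n is slightly below 1/e, so this
    # scaled-down version is biased a little upward.
    "Salt particles": estimate_e_salt(5_000, 200, rng),
}

for method, value in estimates.items():
    print(f"{method:15s} {value:.4f}")
```

Each value printed is a single estimate, so it will scatter around e much more than the sample means in the table, which average 1,000 (or 100) such runs.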
The author explains the differences in accuracy, precision, and relative effort among the three methods as follows:
The dart method involves only one parameter, i.e., the number of darts, N, which needs to be chosen very large to guarantee a reasonable estimate of e. The derangement method, on the other hand, involves two parameters, i.e., the number of objects in the array, N, and the number of permutations or shufflings. The former needs to be only moderately large (ten in our simulation), but the latter has to be very large to ensure sufficient convergence. This causes the derangement method to be less efficient than the dart method. Finally, the salt-shaker algorithm involves two parameters, i.e., the number of particles, N, and the number of iterations, n, both of which have to be very large. This causes the computations to be exceedingly slow and, consequently, renders the algorithm very inefficient.
DISCUSSION QUESTIONS:
(1) What is the distinction between slowness of computations and inefficiency of an algorithm? If they are distinct, why does the former imply the latter?
(2) Do the numbers of parameters really explain the differing effectiveness of the methods here, or are the choices of numbers of iterations, regions on a dartboard, elements permuted, etc. more fundamental?
(3) Readers may find it interesting to consider how one could estimate, in advance of the simulations, measures of their effectiveness. There is room for further study here.
<<<========<<
The article opens with the assertion that, since the 1970s, virtually all income gains in the US have gone to households in the top 20% of the income distribution. This is the greatest inequality observed in any of the world's wealthy nations, a fact largely ignored in the current rosy picture of corporate profitability, widespread job creation and negligible inflation (note that the article appeared before our stock market meltdown of recent weeks!).
While previous writers have expressed moral concerns with the growing income inequality, there is now evidence of a medical downside as well. Research indicates that countries with more pronounced differences in incomes experience shorter life expectancies and greater risks of chronic illness. The risks are described as being as large in magnitude as those linked to more widely publicized factors, such as cigarettes or fatty foods.
As a historical perspective, the article cites a 20-year-old study of 17,000 British civil servants which found that the annual heart attack fatality rate was four times as high among clerks and messengers as for administrators, despite the fact that the clerks could afford reasonable housing and had access to national health care. Moreover, the effect persisted even among workers at the high end of the income distribution; for example, a senior statistician had twice the risk as did a chief statistician. This led Michael Marmot, a University of London epidemiologist, to conclude that factors beyond class-related differences in diet and smoking were involved. He suggested that job control and sense of security also played a role.
Richard Wilkinson, an economic historian at Sussex University, took the analysis one step further, looking at health differences among different countries. He found that, among nations with gross domestic product at least $5000 per capita, one nation could have twice the per capita income of another yet still have a lower life expectancy. On the other hand, income equality emerged as a reliable predictor of health. The finding ties together a variety of international comparisons. For example, the greatest gains in British civilian life expectancy came during WWI and WWII, periods characterized by compression of incomes. By contrast, over the last ten years in Eastern Europe and the former Soviet Union, small segments of the population have had tremendous income gains while living conditions for most people have deteriorated. These countries have actually experienced decreases in life expectancy. Among developed nations, the US and Britain today have the largest income disparities and the lowest life expectancies. Japan has a 3.6 year edge over the US in life expectancy (79.8 years vs 76.2 years) even though it has a lower rate of spending on health care. The difference is roughly equal to the gain the US would experience if heart disease were eliminated as a cause of death!
The July 1998 issue of the "American Journal of Public Health" presents analogous data in comparisons of US states, cities and counties. Research directed by John Lynch and George Kaplan of the University of Michigan finds that mortality rates are more closely associated with measures of relative, rather than absolute, income. Thus Biloxi, Mississippi; Las Cruces, New Mexico; and Steubenville, Ohio have both high inequality and high mortality. By contrast, Allentown, Pennsylvania; Pittsfield, Massachusetts; and Milwaukee, Wisconsin share low inequality and low mortality.
DISCUSSION QUESTION:
It is easy to see how to compare mortality rates among communities. How do you think "income inequality" is measured?
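As one possible starting point for this question, here is a short Python sketch of the Gini coefficient, a widely used summary of income inequality. We do not know that it is the measure used in the studies described above; the function name and the toy income lists are hypothetical.

```python
def gini(incomes):
    """Gini coefficient of a list of non-negative incomes.

    0 means everyone has the same income; values near 1 mean that
    almost all income is held by a few people.
    """
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive income")
    # Rank-weighted sum over the sorted incomes (ranks start at 1).
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


equal_town = [30_000, 30_000, 30_000, 30_000]   # made-up incomes
unequal_town = [0, 0, 0, 120_000]               # made-up incomes
print(gini(equal_town), gini(unequal_town))
```

Both toy towns have the same average income, so any difference in the coefficient reflects only how that income is shared out.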
<<<========<<Lamberth is a member of the psychology department of Temple University. In 1993, he was contacted by attorneys whose African-American clients had been arrested on the New Jersey Turnpike for possession of drugs. It turned out that 25 blacks had been arrested over a three-year period on the same portion of the turnpike, but not a single white. The attorneys wanted a statistician's opinion of the trend. Lamberth was a good choice. Over 25 years his research on decision-making had led him to consider issues including jury selection and composition, and application of the death penalty. He was aware that blacks were underrepresented on juries and sentenced to death at greater rates than whites.
In the article, he describes the process of designing a study to investigate this question. He focused on four sites between Exits 1 and 3 of the Turnpike, covering one of the busiest segments of highway in the country.
The first challenge was to define the "population" of the highway, so he could determine how many people traveling the turnpike in a given time period were black. Lamberth notes that Census data don't exist for this question. He devised two surveys, one stationary and one "rolling." For the first, observers were located on the side of the road. Their job was to count the number of cars and the race of their occupants during randomly selected 3-hour blocks of time over a two-week period. In 21 recording sessions from June 11 to June 24, 1993, his team counted some 43,000 cars, 13.5% of which had one or more black occupants. For the "rolling survey", a public defender drove at a constant 60 mph (5 mph over the speed limit), counting cars that passed him as violators and cars that he passed as non-violators, noting the race of the drivers. In all, 2096 cars were counted, 98% of which were speeding and therefore subject to being stopped by police. Black drivers made up 15% of these violators.
Lamberth then obtained data from the New Jersey State Police and learned that 35% of drivers stopped on this part of the turnpike were black. He says "in stark numbers, blacks were 4.85 times as likely to be stopped as were others." He did not obtain data on race of drivers searched after being stopped. However, over a three year period, 73.2% of those arrested along the turnpike by troopers from the area's Moorestown barracks were black, "making them 16.5 times more likely to be arrested than others."
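One natural way to turn these percentages into a "times as likely" figure is an odds ratio. The Python sketch below is our own guess at such a calculation, not Lamberth's stated method; the function name and the choice of baseline (15% of violators, or 13.5% of cars on the road) are assumptions.

```python
def odds_ratio(share_in_group, share_in_baseline):
    """Odds of being black in one group relative to a baseline group.

    Both arguments are proportions between 0 and 1 (exclusive).
    """
    group_odds = share_in_group / (1 - share_in_group)
    baseline_odds = share_in_baseline / (1 - share_in_baseline)
    return group_odds / baseline_odds


# Percentages as quoted in this summary of the article.
stopped_vs_violators = odds_ratio(0.35, 0.15)     # stops vs. speeders
arrested_vs_violators = odds_ratio(0.732, 0.15)   # arrests vs. speeders
arrested_vs_road = odds_ratio(0.732, 0.135)       # arrests vs. all cars

print(f"{stopped_vs_violators:.2f} {arrested_vs_violators:.2f} "
      f"{arrested_vs_road:.2f}")
```

With the percentages quoted here this gives about 3.1 for stops and about 15.5 or 17.5 for arrests, depending on the baseline. None of these matches the quoted 4.85 and 16.5 exactly, so the article's figures presumably rest on inputs or a method not given in this summary.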
Lamberth's finding that blacks were being stopped at rates disproportionate both to their numbers on the road and their tendency to speed led to a March 1996 ruling by New Jersey Superior Court. Judge Robert E. Francis ruled that state police were effectively targeting blacks, violating their constitutional rights. Evidence gathered in the stops was suppressed.
Lamberth speculates that drug policy is the reason for police behavior in these situations. Testimony in the Superior Court case revealed the troopers' performance is considered deficient if they do not make enough arrests. Police training targets minorities as likely drug dealers, and in this sense the officers had an incentive to stop black drivers. But when Lamberth obtained data from Maryland (similar data has not been available from other states) he found that about 28% of drivers searched in that state have contraband, regardless of race. Why then, the perception that blacks are more likely to carry drugs? It turns out that, of 1000 searches in Maryland, 200 blacks were arrested compared to only 80 non-blacks. More blacks being arrested for drugs feeds the perception that they are the principal perpetrators. The problem is that the sample is biased: of those searched, 713 were black and 287 were non-black.
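The Maryland arithmetic can be checked in a few lines of Python. This sketch uses only the counts quoted above and treats an arrest as a search that found contraband, which is our reading of the summary rather than something the article spells out; the variable names are ours.

```python
# Counts per 1000 Maryland searches, as quoted above.
searched = {"black": 713, "non-black": 287}
arrested = {"black": 200, "non-black": 80}

# Fraction of searches in each group that led to an arrest.
hit_rate = {group: arrested[group] / searched[group] for group in searched}

# Share of all searches, and of all arrests, involving black drivers.
black_share_of_searches = searched["black"] / sum(searched.values())
black_share_of_arrests = arrested["black"] / sum(arrested.values())

for group, rate in hit_rate.items():
    print(f"{group}: {rate:.1%} of searches led to an arrest")
print(f"black share of searches: {black_share_of_searches:.1%}")
print(f"black share of arrests:  {black_share_of_arrests:.1%}")
```

Both groups come out at about 28%, so the larger number of black arrests mirrors the larger number of black drivers searched rather than a higher chance that any one search finds drugs.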
DISCUSSION QUESTIONS:
(1) How did Lamberth arrive at the figure that blacks were 4.85 times as likely to be stopped as others? What about the figure that blacks were 16.5 times more likely to be arrested?
(2) How do the data in the last paragraph show that the chance that a search will produce drugs does not depend on race?
<<<========<<In her column on May 31 of this year (discussed in Chance News 7.06), Marilyn gave a curious explanation of the margin of error for an opinion poll. Her conclusion was that "the published margin of error on a poll merely tells us the size of the sample." This provoked the following response from Kathleen Frankovic, writing on behalf of the American Association for Public Opinion Research: "Your answer about the source of a poll's margin of error was incomplete. When conducting a poll, we can calculate the error that comes from selecting a sample, if that sample is representative of all people. 'Margin of error...' refers to the sampling error."
Marilyn agrees with Frankovic's description but maintains that her original comments were correct. The guiding principle, she says, is that the larger the sample, the more accurate the poll. "So, if a poll is conducted properly, the published margin of error-- while it literally refers to all sampling errors--is more an indicator of the size of the sample than anything else."
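To see how closely the published margin of error tracks the sample size, here is a small Python sketch of the usual textbook formula for a simple random sample. It assumes a 95% confidence level (z = 1.96) and the worst-case proportion p = 0.5; these defaults and the function name are our choices, not anything stated in the column.

```python
import math


def margin_of_error(sample_size, p=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple
    random sample: z * sqrt(p * (1 - p) / n)."""
    return z * math.sqrt(p * (1 - p) / sample_size)


for n in (400, 1000, 1500):
    print(f"n = {n}: plus or minus {margin_of_error(n):.1%}")
```

With p fixed at 0.5 the formula depends only on the sample size, which is one way to read Marilyn's remark. It says nothing about errors that do not come from sampling, such as question wording or nonresponse.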
Pollsters also objected to her comment that the margin of error is based on past polls. Here she amplifies that comment, explaining that she meant that the formulae applied to present polls are derived from experience with past polls.
DISCUSSION QUESTIONS:
(1) Does Marilyn understand the difference between accuracy and precision?
(2) What do you think she means by the phrase "while it literally refers to all sampling errors"?
(3) Do you agree that the margin of error is based on experience?
-----------Also in the present column, Marilyn responds to an anonymous reader who asked Marilyn to settle a dispute about whether a child's IQ can never be higher than his parents' IQs. Marilyn says that the child's IQ can indeed exceed his parents' but adds that the reader may be thinking of an effect known to statisticians as regression to the mean. In lay terms, she describes this as meaning that "within a given population, average intelligence appears stable, but people tend to be more average than exceptional." She asserts that, if both parents have IQs of 125, they are more likely to have a child with an IQ in the range 100-125 than an IQ exceeding 125.
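Marilyn's probability claim can be explored with a toy model. The Python sketch below is purely illustrative: it assumes IQ has population mean 100, that the expected IQ of a child lies halfway between the population mean and the parents' common IQ of 125, and that children of such parents vary normally around that value with standard deviation 13. The slope of 0.5 and the standard deviation of 13 are made-up inputs, not estimates from any study.

```python
from statistics import NormalDist

population_mean = 100
parent_iq = 125
regression_slope = 0.5   # hypothetical: how much of the parents' edge is kept
child_sd = 13            # hypothetical spread among children of such parents

# Expected child IQ is pulled part of the way back toward the mean.
child_mean = population_mean + regression_slope * (parent_iq - population_mean)
child = NormalDist(mu=child_mean, sigma=child_sd)

p_below_100 = child.cdf(100)
p_between = child.cdf(125) - child.cdf(100)
p_above_125 = 1 - child.cdf(125)

print(f"P(IQ < 100)       = {p_below_100:.2f}")
print(f"P(100 < IQ < 125) = {p_between:.2f}")
print(f"P(IQ > 125)       = {p_above_125:.2f}")
```

Under these assumptions the 100-125 range is much more likely than a score above 125. Changing the slope or the spread shows how strongly the answer depends on the assumptions.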
DISCUSSION QUESTIONS:
(1) Comment on Marilyn's description of regression to the mean.
(2) Do you agree with her probability assessment? What assumptions are you making?
<<<========<<According to John Manning, an evolutionary biologist at the University of Liverpool, comparing the length of the fingers on a man's right hand with his left gives an indication of his fertility: less match between the fingers indicates fewer and less active sperm. Further information is available from the relative lengths of fingers on the same hand: men whose ring fingers are much longer than their index fingers tend to have higher levels of the male hormone testosterone. Although the claims may seem outrageous, it was biological evidence that led Manning to consider the link. Experiments in mice showed that the same set of genes that control development of fingers and toes also control the ovaries and testes.
Manning reports running three separate tests. The first involved 100 men and women at a fertility clinic; the second involved 10 healthy men; and the third looked at 300 men and women who have children. Sometimes he tested fertility by measuring sperm or hormone levels, while other times he tried to gauge it after the fact by the size of the subjects' families. But he says the link with finger size always showed up. His results were announced in "New Scientist", and he claims they have been accepted for publication in two scientific journals.
Reaction from the medical community is mixed. Doctors quoted in the article urged caution in interpreting the results and called for further review of the study. Dr. Merle Berger of Beth Israel Deaconess Medical says he is "dumbfounded by the whole thing because it's so complicated. If there were some truth to this it would be very subtle differences in length that would have to be accurately measured. It would not be just the way it looked."
The Globe article, meanwhile, closes on an even more skeptical note. It quotes a palm reader who says she too can tell people how many children they will have by looking at their hands!
DISCUSSION QUESTIONS:
(1) What problems do you see with Manning's samples? What about his definition(s) of fertility?
(2) How do you reconcile the headline of the article with the tone of the closing?
<<<========<<In past editions of Chance News, we have reviewed numerous conflicting reports on whether mammograms are beneficial for women in their 40s. Overall this age group has a relatively low risk for breast cancer, but the risk rises quickly with age. On the other hand, it has not been clear how to balance this concern with the high false positive rate reported for screening in this age group.
Now scientists at the National Cancer Institute have developed a formula to help women calculate whether their personal risk is high enough to justify having an annual mammogram. Writing in the "Journal of Clinical Oncology", Dr. Mitchell Gail and biostatistician Barbara Rimer explain that the method is designed to identify women in their 40s who, because of family history or other risk factors, have at least as great a chance of developing cancer as a woman in her 50s with no risk factors.
The paper describes two versions of the method, an "exact age" procedure and a "grouped age" procedure that uses two age groups, 40-44 and 45-49. In either case, women begin with a checklist of what are called "strong" risk factors: previous breast cancer; the BRCA1 or BRCA2 genes; a mother, sister or daughter with a history of breast cancer; abnormal cells found in a previous biopsy; 75% or more dense breast tissue at age 45-49; and two or more previous biopsies. A woman with none of these conditions then checks a table of weaker factors: age at menarche, number (0 or 1) of previous biopsies and age at which she first gave birth. Values from these factors are combined with the woman's age to compare her risk of developing cancer in the next year to the risk for a 50-year-old woman with no risk factors.
DISCUSSION QUESTIONS:
(1) What do you see as the advantages of having such a formula? Do you see any downside?
(2) Dr. Mary Burton, a physician affiliated with the Harvard Vanguard Medical Associates, is quoted as saying: "Any method that helps clarify clinical risk to patients will help patients make better decision for themselves." Do you agree?
<<<========<<This year, 38% of S.A.T. test-takers had 'A' averages, compared with 28% ten years ago. But S.A.T. verbal scores averaged 12 points lower and math scores 3 points lower than they were ten years ago. The disparity has led College Board president Donald Stewart to commission the Rand Corporation to study the trend. Stewart wants to know if it really represents positive changes in education. He worries that "it may also reflect greater focus on personal qualities instead of academic achievement."
Stewart speculates that the apparent grade inflation may be attributable in part to increased emphasis on teacher accountability. Teachers who are supposed to be improving may be covering themselves by giving out higher grades. This would be exposed by standardized tests. This interpretation was disputed by Bob Schaeffer of the Fairtest organization, a longtime critic of the College Board. He argues that high school record is a better predictor of college performance than the S.A.T.
Beyond the overall average, Stewart noted two other disturbing trends. First, suburban students are improving their S.A.T. scores, while urban and rural students are falling behind. There is now a 30-point gap between urban and suburban students. Second, the scores for children with less education are falling further below the national average.
The article is accompanied by a graphic entitled "Keeping Track: Suspicious Growth of A's". Here are the data:
                 PERCENTAGE      AVERAGE S.A.T.    AVERAGE S.A.T.
GRADE            OF STUDENTS     SCORES: VERBAL    SCORES: MATH
AVERAGE          1988   1998     1988   1998       1988   1998
A+                  4      7      625    615        632    629
A                  11     15      582    569        586    582
A-                 13     16      554    542        556    554
B                  53     48      495    483        490    487
C                  19     13      442    430        431    428

(Source: The College Board)
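For readers who would like to check the graphic against the figures quoted in the article, here is a short Python sketch. The numbers are transcribed from the College Board data above; the variable names and the layout of the dictionary are our own.

```python
# grade: (pct_1988, pct_1998, verbal_1988, verbal_1998, math_1988, math_1998)
table = {
    "A+": (4, 7, 625, 615, 632, 629),
    "A":  (11, 15, 582, 569, 586, 582),
    "A-": (13, 16, 554, 542, 556, 554),
    "B":  (53, 48, 495, 483, 490, 487),
    "C":  (19, 13, 442, 430, 431, 428),
}

a_grades = ("A+", "A", "A-")
a_share_1988 = sum(table[g][0] for g in a_grades)   # percent with A averages
a_share_1998 = sum(table[g][1] for g in a_grades)

# Drop in average score within each grade band, 1988 to 1998.
verbal_drop = {g: row[2] - row[3] for g, row in table.items()}
math_drop = {g: row[4] - row[5] for g, row in table.items()}

print(a_share_1988, a_share_1998)
print(verbal_drop)
print(math_drop)
```

The three A rows add up to the 28% and 38% quoted in the article. Within each grade band, verbal scores fall by 10 to 13 points and math scores by 2 to 4 points.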
DISCUSSION QUESTIONS:
(1) From the data in the table, how can you see that the percentage of test-takers with 'A' averages has grown by 10 percentage points? How do you see that the verbal scores have fallen by 12 points?
(2) What do you think of the Fairtest argument presented in the article? If college grades are also being inflated, wouldn't you expect inflated high school grades to be a good predictor of first year college grades?
This work is freely redistributable under the terms of the GNU General Public License as published by the Free Software Foundation. This work comes with ABSOLUTELY NO WARRANTY.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!