MALS 178 - CHANCE, Fall 1996

Instructors:

 
 Claudis Herrion                            Laurie Snell
 Office hours Wed. 11:30-12:30              Office hours Mon. 4:00-5:00
 Chance Lab in Choate House                 Chance Lab in Choate House

Content

Course Description

Class 1 Lotto

Class 2 Teenagers and smoking

Class 3 PSAT bias and JMP

Class 4 JMP Lab and Descriptive Statistics

Class 5 Do Prisons Reduce Crime and Standard Deviation

Class 6 Birthday Problem and Coincidences

Class 7 Binomial Distribution

Class 8 Margin of Error

Class 9 Polling and Binomial coefficient

Class 10 Math Review

Class 11 Surveys and Confidence Intervals

Class 12 Correlation and Regression

Class 13 Regression and Environmental News

Class 14 Conditional Probability and False Positives / Finishing Regression

Class 15 Tversky and Streaks

Class 16 ESP and Hypothesis Testing

Class 17 Chi Squared

Class 18 Prenatal Testing and Card Shuffling

Class 19 Shuffling, JMP and Games of Chance


Course Description



Welcome to Chance!

Chance is an unconventional math course. The standard elementary math course develops a body of mathematics in a systematic way and gives some highly simplified real-world examples in the hope of suggesting the importance of the subject. In Chance, we will choose serious applications of probability and statistics and make these the focus of the course, developing concepts in probability and statistics only to the extent necessary to understand the applications. The goal is to make you better able to come to your own conclusions about news stories involving chance issues.

Topics that might be covered in Chance include:

During the course, we will choose a variety of topics to discuss, with special emphasis on topics currently in the news. We will start by reading an account of the topic in a newspaper like The New York Times or The Boston Globe. We will read other accounts of the subject as appropriate, including articles in journals like Chance magazine, Science, Nature, and Scientific American, and also original journal articles. We will supplement these articles with readings on the basic probability and statistics concepts relating to the topic. We will use computer simulations and statistical packages to better illustrate the relevant theoretical concepts.


Organization

The class will differ from traditional math classes in organization as well as in content: The class meetings will emphasize group discussions, rather than the more traditional lecture format. Students will keep journals to record their thoughts and questions. Additional homework will be assigned regularly. There will be a major final project in place of a final exam.

Scheduled meetings

The class meets Tuesday and Thursday from 5:30 to 7:30 p.m. in 217 Dartmouth Hall. On October 15, however, we will meet in the 1930 Room in Rockefeller Center.


Discussion groups

Discussions are central to the course and usually focus on a current article in the news. They provide a context in which to explore questions in more depth and understand material better by explaining it to others.

Every member of each group is expected to take part in these discussions and to make sure that everyone is involved: that everyone is being heard, everyone is listening, that the discussion is not dominated by one person, that everyone understands what is going on, and that the group sticks to the subject.

Text

The text for the course is Statistics, second edition by Freedman, Pisani, Purves and Adhikari (FPPA), available from the Dartmouth Bookstore and Wheelock Books. Students will also learn to use the JMP statistical package that is available from the public server as a key served application.

Journals

Each participant should keep a journal for the course. This journal will include:

A good journal should answer questions asked, and raise questions of your own, with evidence that some time has been spent thinking about the questions. In addition, there should be evidence of original thought: evidence that you have spent some time thinking about things that you weren't specifically asked about. In writing in your journal, exposition is important. If you are presenting the answer to a question, explain what the question is. If you are giving an argument, explain what the point is before you launch into it. What you should aim for is something that could communicate to a friend or a colleague a coherent idea of what you have been thinking and doing in the course.

We encourage you to cooperate with each other in working on anything in the course, but what you put in your journal should be your own alone. If it is something that has emerged from work with other people, write down who you have worked with. Ideas that come from other people should be given proper attribution. If you have referred to sources other than the texts for the course, cite them.

Journals will be collected and read on these dates:

Thursday 10 October
Thursday 24 October
Thursday 7 November
Thursday 21 November
Tuesday 3 December

Homework

To supplement the discussion in class and the assignments to be written about in your journals, we will assign readings from your text FPPA, together with accompanying homework. When you write the solutions to these homework problems, you should keep them separate from your journals. Homework will be assigned once a week and should be handed in on Thursdays.

Final project

We will not have a final exam for the course, but in its place, you will undertake a major project. This project may be a paper investigating more deeply some topic we touch on lightly in class. Alternatively, you could design and carry out your own study. Or you might choose to do a computer-based project. To give you some ideas, a list of possible projects will be circulated. You can also look at some previous projects on the Chance Database. However, you are also encouraged to come up with your own ideas for projects.

Chance Fair

At the end of the course we will hold a Chance Fair, where you will have a chance to present your project to the class as a whole, and to demonstrate your mastery of applied probability by playing various games of chance. The Fair will be held during the final examination time assigned by the registrar.

Resources

Materials related to the course will be kept on our web site and on the Kiewit PUBLIC server (PUBLIC: Courses & Support: Academic Departments & Courses: Math: Chance). In addition, supplementary readings will be kept on reserve in Baker Library.


Class 1 Lotto

September 26, 1996

Class Discussion

How to play NH Powerball

Homework:

1. Journal assignment: Find two articles in recent newspapers that are relevant to the course. In your journals, describe what each is about in one or two paragraphs. Come up with at least 2 questions that each article raises for you. (Due Tuesday)

2. Read Chapters 1 and 2 from FPPA. Do the review exercises at the end of Chapter 2 on page 22. (Due Thursday)


Class 2 Teenagers and smoking

October 1, 1996


1. Base Group Discussion

2. Group Discussion

Read the article "Study Finds Stunted Lungs in Young Smokers".

Homework:

Journal Assignment: Read the article from the New England Journal of Medicine on teenagers and smoking. Are there any new things you learn from reading it?

Reminder, due Thursday: Read Chapters 1 and 2 from FPPA. Do the review exercises at the end of Chapter 2 on page 22.

*********************************************************

NOTE: David A. Kessler, Commissioner of the Food and Drug Administration (FDA), will examine health issues and the FDA's evolving policy concerning tobacco in a lecture titled "Tobacco Policy and Children" to be given on Thursday, October 10, 7:30 PM, Cook Auditorium, Murdough Center. We encourage you to attend and will finish class a few minutes early so that we can walk over together.


Class 3 PSAT bias and JMP

October 3, 1996


1. Introduction to JMP (Statistical Program)

2. Measurement Bias

3. Read NYT Article "College Board Revises Test to Improve Chances for Girls"

Homework:

From FPPA: (due Thursday)
Journal Assignment:
Learn how to use JMP.
What kinds of conclusions can you draw from the survey conducted in class?

For class next Tuesday, please bring in a graph--somewhat unusual ones preferred. You can look in books, newspapers, magazines, reports, some document from your work, or anyplace else!

****NOTE**** We will be meeting in the computer classroom in the basement of Kiewit at 5:30 on Tuesday 10/8/96. During the second half we will return to our regular classroom.

*********************************************************

Reminder: David A. Kessler, Commissioner of the Food and Drug Administration (FDA), will examine health issues and the FDA's evolving policy concerning tobacco in a lecture titled "Tobacco Policy and Children" to be given on Thursday, October 10, 7:30 PM, Cook Auditorium, Murdough Center. We encourage you to attend and will finish class a few minutes early so that we can walk over together.


Class 4 JMP Lab and Descriptive Statistics

October 8, 1996


1. JMP Laboratory

Back to Dartmouth 217

2. Homework
Review difficult homework problems from last week.

3. Measures of Central Tendency.

4. Histograms.

5. Standard Deviations (if time)

**********************************************************

HOMEWORK CHANGE:
In Chapter 3, you ONLY need to do the following review exercises 1,3,7-10. (Homework for Chapter 4 and 6 is unchanged.)

Journal assignment.
Read Gould's article "The Median Isn't the Message." Comment on it in your journal.

Reminder:
Thursday is the MALS party (before class) and Kessler talk (after class.)

Measurements of Central Tendency.

1. Turn to a neighbor and discuss the following:

2. For the following data, decide which measure of central tendency would be most appropriate to use:

a. Table 1 (from Statistics Without Tears)

Method of Transport    Number of Students
Bicycle                15
Foot                   12
Bus                     9
Motorcycle              6
Car                     5
Train                   3
-----------------------------------------
Total                  50


b. Here are two different groups of 5 people's yearly income. Find the median and the mean for each. Which is a better measure of central tendency in this case, and why?

X $30,000 $38,000 $42,000 $57,000 $73,000
Y $30,000 $38,000 $42,000 $57,000 $244,000

c. You are given the ages of five bus passengers as:
Under 12, 22, 48, 54, over 65 years.
What measure of central tendency would you use?
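
For part (b), the two summaries can be computed directly. The following is a minimal sketch using Python's standard `statistics` module; the variable names `group_x` and `group_y` are our own labels for the two income lists given above.

```python
from statistics import mean, median

# Yearly incomes from part (b), in dollars.
group_x = [30_000, 38_000, 42_000, 57_000, 73_000]
group_y = [30_000, 38_000, 42_000, 57_000, 244_000]

for name, incomes in (("X", group_x), ("Y", group_y)):
    print(name, "mean:", mean(incomes), "median:", median(incomes))
```

The two groups differ only in their largest income, so comparing the printed values shows how each measure responds to a single extreme observation.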


Class 5 Do Prisons Reduce Crime and Standard Deviation

October 10, 1996


1. Class Discussion: Do Prisons Reduce Crime?

2. Standard Deviation

3. Video and Discussion of Stephen Jay Gould's "Why the Death of .400 Hitting Records Improvement of Play" (from his Full House, Harmony Books, 1996).

*********************************************************

Journal assignment:

Comment on the Star Tribune's review of Gould's Full House (handed out in class.)

Homework:

Read chapters 13 and 14 ("What Are the Chances" and "More about Chance") in FPPA. Do the even Review Exercises in chapter 13, and the odd Review Exercises in chapter 14.


Class 6 Birthday Problem and Coincidences

October 15, 1996


1. Introduction to the Birthday Problem.

2. Small Group Discussion of Birthday Problem (groups of four).

3. The Mathematics of the Birthday Problem.

4. Video clip: What is Probability? (Against All Odds)

5. Coincidences in Air Crashes.

6. Coin Tossing Experiment.
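
The mathematics in item 3 can be sketched in a few lines of code. This is an illustration only, and it assumes 365 equally likely birthdays (no leap years, no seasonal effects) and independent people. The function name `prob_shared_birthday` is our own.

```python
def prob_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_all_different = 1.0
    for i in range(n):
        # Person i+1 must avoid the i birthdays already taken.
        p_all_different *= (days - i) / days
    return 1.0 - p_all_different

for n in (10, 23, 30, 50):
    print(n, round(prob_shared_birthday(n), 3))
```

The calculation works with the complementary event, that all n birthdays are different, because that probability is a simple product.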

**********************************************************

Journal assignment:

Think of coincidences in your own life. What is the likelihood of these being random chance occurrences?


Class 7 Binomial Distribution

October 17, 1996


1. Coke vs. Pepsi-- Small Group Activity.

a. Break into groups of four.

b. Identify a member of your group who claims to be able to tell the difference between Pepsi and Coke. (Coke Classic, that is; accept no substitutes!)

c. Design an experiment to test whether this is true. Remember that one swallow doth not a summer make: Don't certify your taste-tester just on the basis of one taste. Write down exactly what data you will collect and what you will do with the data before you start collecting it.

d. What is being tested?

e. Carry out the experiment.

f. Record your results.

2. Binomial Distribution.
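
To connect the taste test to the binomial distribution, here is a minimal sketch of the chance that a taster who is purely guessing gets at least k of n trials right. The numbers 10 trials and 8 correct are hypothetical, chosen only for illustration; your group's design may differ.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_least(k, n, p=0.5):
    """Probability of k or more successes in n independent trials."""
    return sum(binomial_pmf(j, n, p) for j in range(k, n + 1))

# Hypothetical design: 10 tastes, each a 50-50 guess for a taster who cannot tell.
print(prob_at_least(8, 10))
```

A small value here means that a result this good would be hard to explain by guessing alone.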

**********************************************************

Homework assignment: (Due Thursday)

FPPA--


Class 8 Margin of Error

October 22, 1996


Margin of Error

1. The CNN Tracking Poll for October 19-20 interviewed 732 likely voters. They reported that 55% favored Clinton, 34% favored Dole and 6% favored Perot with a sampling error of + or - 4% (sampling error is also called margin of error).

2. The New York Times often puts, at the end of an article about a poll, an explanation of how the poll was carried out. In a recent poll of 1,166 people, the article stated that the margin of error was 3%. The explanation of how the poll was carried out described the margin of error with the statement "In theory, in 19 cases out of 20 the results based on such samples will differ by no more than three percentage points in either direction from what would have been obtained by seeking out all American adults."

3. Read the NYT article "Use of Daily Election Polls Generates Debate in Press" (Oct. 4, 1996). Consider the following questions:
**********************************************************
Journal Questions

(1) Read the NYT article "Misreading the Gender Gap" by Carol Tavris (September 17, 1996). What do you think of her explanation of the gender gap in the current election?

(2) How would you explain "margin of error" to a friend who had not had a statistics course?
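
The margins of error quoted for the two polls above can be compared with a standard approximation: two standard errors of a sample proportion, which for a proportion near 50% is about 1/sqrt(n). The sketch below assumes a simple random sample; real polls use weighting and other adjustments, so this is a rough check and not the pollsters' own method.

```python
from math import sqrt

def margin_of_error(n, p=0.5):
    """Approximate 95% margin of error: two standard errors of a sample proportion."""
    return 2 * sqrt(p * (1 - p) / n)

for label, n in (("CNN tracking poll", 732), ("New York Times poll", 1166)):
    print(label, n, round(100 * margin_of_error(n), 1), "percentage points")
```

Using p = 0.5 gives the largest possible standard error, which is why it is the usual choice when a single margin is reported for a whole poll.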


Class 9 Polling and Binomial coefficient

October 24, 1996


Part I.

Speaker: Tami Buhr from Harvard University will speak on her experiences in polling.

Part II.

Review of mathematical ideas. Binomial coefficients. Homework problems.

**********************************************************

Homework Assignment: (due Thursday, 10/31/96)

Read Chapters 19, 20, and 21. Do exercises:

(We will be discussing a couple of the ideas from Chapter 18 in class. If you miss the class or need additional support, you may want to look through that chapter.)

Journal assignment:

Comments and reflections on speaker's talk.

*************** IN ADDITION: ********************

Please hand in on a separate sheet of paper a description of your plans for your final project. (due next Thursday)


Class 10 Math Review

October 29, 1996

Part I.

Review of Mathematics

Part II.

Stephen Jay Gould talk in Cook Auditorium.

**********************************************************

Journal assignment:

Handouts:

(1) Project suggestions

(2) Comments on journals

(3) Standard Error and Normal Approximation, by John Finn (optional reading)


Class 11 Surveys and Confidence Intervals

October 31, 1996


Part I.

Guest Speaker: Nancy Mathiowetz will speak on Surveys and Data Collection

Part II.

Confidence Intervals and Standard Deviation

**********************************************************

Homework Assignment:

(For those who need a review of how to plot lines, find slopes, etc., read Chapter 7)


Class 12 Correlation and Regression

Tuesday, November 5, 1996


1. An introduction to correlation and regression.

2. Cookie Experiment.

3. Class Discussion: You be the judge: did regression analysis reveal a voting fraud, and was the fraud decisive?

Read "Probability Experts May Decide Pennsylvania Vote" (The New York Times, April 11, 1994).

Discussion Questions:

*************************************************************

Journal Assignment:

Look for a couple of articles in the news that use statistics or probability. Summarize the article and talk about 2 or 3 questions the article raises for you.

REMEMBER Journals are due this Thursday, November 7.


Class 13 Regression and Environmental News

Thursday, November 7, 1996


1. Elizabeth Bankert will talk about guidelines for carrying out projects that involve human subjects.

2. Guest Speaker:

Bob Braille, who is an adjunct professor in Environmental Studies and has written for the Boston Globe, will talk about reporting on environmental issues.

3. More on Regression.

*************************************************************

Homework Assignment:

Journal Assignment:

Read the two articles about Electromagnetic Fields and health risks. Comment on the differences between the two articles.


Class 14 Conditional Probability and False Positives / Finishing Regression

Tuesday, November 12, 1996

Class Discussion: HIV Testing and False Positives

1. In one of Marilyn vos Savant's columns in Parade Magazine the following question was asked.

Suppose we assume that 5% of the people are drug-users. A test is 95% accurate, which we'll say means that if a person is a user, the result is positive 95% of the time; and if she or he isn't, it's negative 95% of the time. A randomly chosen person tests positive. Is the individual highly likely to be a drug-user?

Marilyn's answer was:

Given your conditions, once the person has tested positive, you may as well flip a coin to determine whether she or he is a drug-user. The chances are only 50-50.

How can Marilyn's answer be correct?
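
One way to check Marilyn's answer is to apply Bayes' rule to the numbers in the question. The sketch below does this; the function and parameter names are our own.

```python
def prob_user_given_positive(prevalence, sensitivity, specificity):
    """Chance that a person who tests positive really is a user (Bayes' rule)."""
    true_pos = prevalence * sensitivity               # users who test positive
    false_pos = (1 - prevalence) * (1 - specificity)  # non-users who test positive
    return true_pos / (true_pos + false_pos)

# 5% of people are users; the test is right 95% of the time for both groups.
print(prob_user_given_positive(0.05, 0.95, 0.95))
```

The two terms in the denominator are equal here: the small group of users with a high positive rate produces as many positives as the large group of non-users with a low positive rate.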

2. An article in the New York Times some time ago reported that college students are beginning to routinely ask to be tested for the AIDS virus.

The standard test for the HIV virus is the Elisa test that tests for the presence of HIV antibodies. It is estimated that this test has a 99.8% sensitivity and a 99.8% specificity. 99.8% specificity means that, in a large scale screening test, for every 1000 people tested who do not have the virus we can expect 998 people to have a negative test and 2 to have a false positive test. 99.8% sensitivity means that for every 1000 people tested who have the virus we can expect 998 to test positive and 2 to have a false negative test.

The Times article remarks that it is estimated that about 2 in every 1000 college students have the HIV virus. Assume that a large group of randomly chosen college students, say 100,000, are tested by the Elisa test. If a student tests positive, what is the chance this student has the HIV virus? What would this probability be for a population at high risk where 5% of the population has the HIV virus?

If a person tests positive on an Elisa test, then another Elisa test is carried out. If it is positive then one more confirmatory test, called the Western blot test, is carried out. If this is positive the person is assumed to have the HIV virus. In calculating the probability that a person who tests positive on the set of three tests has the disease, is it reasonable to assume that these three tests are independent chance experiments?
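
For the single-test question above, it can help to count expected numbers of people instead of working with probabilities. This sketch covers one Elisa test only, using the 99.8% sensitivity and specificity quoted above; it does not model the repeated tests or the Western blot, and the function name is our own.

```python
def screening_counts(n_tested, prevalence, sensitivity=0.998, specificity=0.998):
    """Expected numbers of true positives and false positives in a screening."""
    infected = n_tested * prevalence
    not_infected = n_tested - infected
    true_positives = infected * sensitivity
    false_positives = not_infected * (1 - specificity)
    return true_positives, false_positives

# 2 in 1000 (college students) and 5% (high-risk population), 100,000 tested each.
for prevalence in (0.002, 0.05):
    tp, fp = screening_counts(100_000, prevalence)
    print(prevalence, round(tp, 1), round(fp, 1), round(tp / (tp + fp), 3))
```

The last number printed on each line is the share of positive tests that come from people who actually have the virus.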

Journal Assignment:

Read and comment on the Manchester, NH Union Leader story "Exit Poll Wrong Call in Senate Race Leaves Anger, Hurt, Red Faces." There are a couple of discussion questions at the end of the article.


Class 15 Tversky and Streaks

Thursday November 14, 1996

Part 1 -- Guest Speaker.

Jamshed Barucha from the Psychology Department will speak on Judgment under Uncertainty.

Part 2 -- Streaks in Sports

  • Class Activity on streaks. See handout.

  • Class discussion:

    Do you believe in streaks?
    What do you mean by streaks?
    How would you recognize streaky behavior?

  • Read the NYT article "'Hot hands' phenomenon: a myth?"

  • Statistical examples of the three different kinds of behavior. (random, streaks, averse)


    More Discussion:

    What would it take to believe in streaks?
    What would it take to not believe in them?
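
The three kinds of behavior mentioned above can be imitated with a simple model in which each shot repeats the previous outcome with some fixed probability. This is only one possible way to define "streaky" and "averse"; the repeat probabilities 0.7 and 0.3 are hypothetical, and the function names are our own.

```python
import random

def simulate_shots(n, p_repeat, rng):
    """n hit/miss outcomes; each repeats the previous one with probability p_repeat."""
    shots = [rng.random() < 0.5]
    for _ in range(n - 1):
        same = rng.random() < p_repeat
        shots.append(shots[-1] if same else not shots[-1])
    return shots

def longest_run(shots):
    """Length of the longest run of identical consecutive outcomes."""
    best = current = 1
    for prev, cur in zip(shots, shots[1:]):
        current = current + 1 if cur == prev else 1
        best = max(best, current)
    return best

rng = random.Random(1996)
for label, p_repeat in (("streaky", 0.7), ("random", 0.5), ("averse", 0.3)):
    runs = [longest_run(simulate_shots(100, p_repeat, rng)) for _ in range(1000)]
    print(label, "average longest run:", sum(runs) / len(runs))
```

With p_repeat = 0.5 the outcomes are independent fair coin tosses, so that line shows how long a run pure chance produces in 100 shots.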

    ***********************************************************************
    Homework Assignment:

    Chapter 26, Review ex. 2, 5.
    Chapter 28, Review ex. 2, 3.
    Chapter 29, Review ex. 1, 2, 4.

    Journal Assignment:

    Read the Discover article "Decisions, Decisions" and comment on it.


    Class 16 ESP and Hypothesis Testing

    Tuesday November 19, 1996

    1. ESP Experiment

    2. More on Streaks

    What is a streak? How would you recognize streaky behavior?
    Computer simulations.

    *************************************************************

    Journal Assignment:

    Read the article on ESP "They Laughed at Galileo Too", NYT, August 11, 1996 and comment on it in your journal.


    Class 17 Chi Squared

    Thursday November 21, 1996

    Part I-- Guest Speaker.

    Charlie Lewis from ETS will talk about security problems on SAT exams.

    Part II-- Chi Squared Test

    
    	    Left Handed 		Right Handed		Total
    	________________________________________________________
    	Men
    
    	Women
    
    	Total
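
Once the table is filled in, the chi-squared statistic compares each observed count with the count expected if handedness were independent of sex. The sketch below uses hypothetical counts, not data from the class, and the function name is our own.

```python
def chi_squared_statistic(table):
    """Chi-squared statistic for independence in a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical counts, NOT class data: rows are men and women,
# columns are left-handed and right-handed.
observed = [[12, 88],
            [8, 92]]
print(round(chi_squared_statistic(observed), 3))
```

For a 2-by-2 table the statistic has one degree of freedom, and it is compared with 3.84 for a test at the 5% level.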
    
    
    *************************************************************

    No more homework from the text! Time to really focus on your projects. We will collect your journals at the end of the term instead of next week.


    Class 18 Prenatal Testing and Card Shuffling

    Tuesday November 26, 1996
    Part I -- Video: The Burden of Knowledge

    Discussion Questions:

  • 1) Is more knowledge an asset in the case of prenatal testing?
  • 2) Three reasons for having prenatal testing were discussed in Burden:
    prevention, preparation, and reassurance. How valid are those reasons?
  • 3) What information do people have a responsibility to access?
    a) Is it irresponsible for a woman to refuse prenatal testing?
    b) Is she morally responsible to access all information or only that which is inexpensive? Or only that which poses no risk to the fetus?
  • 4) What is the quality of the information given?
    a) How is "increased risk" for fetal anomaly like or different from "increased risk" for having some inherited trait such as breast cancer, depression, or pattern baldness?
    b) What is the relevance of variation determined by diagnostic tests? What are the advantages and disadvantages of gaining information about the fetus?

    Part II -- Card Shuffling

    1) Read the NYT article, 1/9/90, "In Shuffling Cards, 7 Is Winning Number."

    2) Shuffling activity.

    A boring game of solitaire, which I call Yin/Yang, shows that 7 ordinary riffle shuffles, followed by a cut, of a 52-card deck are not enough to make every permutation equally likely.

    Hearts and Clubs are called the Yin suits, and Diamonds and Spades are called the Yang suits. We shuffle the deck of cards 7 times, then cut it, and then start removing and revealing each card from the top of the deck, making a new pile of them face-up (so if this were all we did, we'd just have the deck unchanged after going through it once, except that the deck would be lying face-up on the table).

    We start the pile for each suit when we discover its ace, and add cards of the same suit to each of these 4 piles, according to the rule that we must add the cards of each suit in order.

    Thus a single pass through the deck is not going to accomplish much in the way of completing the 4 piles, so having made this pass, we turn the remaining deck back over, and make another pass.

    We continue this until we complete either the two Yin piles (hearts & clubs), or the two Yang piles (diamonds & spades). If the Yin piles get completed first, we call the game a win; it's a loss if the Yang piles get completed first.

    If the deck has been thoroughly permuted (by having put the cards through a clothes dryer, say), then the Yin and Yangs will be equally likely to be first to get completed. Thus our expected proportion of wins will be 1/2.
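
The game can be tried on a computer. The sketch below rests on two assumptions that the text does not supply: the riffle shuffle follows the Gilbert-Shannon-Reeds model, and the deck starts in the order built by `starting_deck` (Yin suits ace to king, then Yang suits king to ace). All function names are our own.

```python
import random

YIN, YANG = ("H", "C"), ("D", "S")

def starting_deck():
    """Assumed starting order, top card first (hypothetical; not given in the text)."""
    up = list(range(1, 14))
    down = up[::-1]
    return ([("H", r) for r in up] + [("C", r) for r in up]
            + [("D", r) for r in down] + [("S", r) for r in down])

def riffle(deck, rng):
    """One riffle shuffle under the Gilbert-Shannon-Reeds model."""
    cut = sum(rng.random() < 0.5 for _ in deck)   # binomial cut point
    left, right = deck[:cut], deck[cut:]
    out, i, j = [], 0, 0
    while i < len(left) or j < len(right):
        n_left, n_right = len(left) - i, len(right) - j
        # Take the next card from a packet with probability proportional to its size.
        if rng.random() * (n_left + n_right) < n_left:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out

def yin_wins(deck):
    """Play Yin/Yang on a deck listed top card first; True if the Yin piles finish first."""
    next_rank = {suit: 1 for suit in YIN + YANG}
    remaining = list(deck)
    while True:
        leftover = []
        for suit, rank in remaining:
            if rank == next_rank[suit]:
                next_rank[suit] += 1              # card goes onto its suit pile
                if all(next_rank[s] == 14 for s in YIN):
                    return True
                if all(next_rank[s] == 14 for s in YANG):
                    return False
            else:
                leftover.append((suit, rank))
        remaining = leftover                      # turn the unplayed cards over, pass again

def win_rate(n_shuffles, games, rng):
    """Fraction of games won after n_shuffles riffles and a random cut."""
    wins = 0
    for _ in range(games):
        deck = starting_deck()
        for _ in range(n_shuffles):
            deck = riffle(deck, rng)
        cut = rng.randrange(len(deck))
        deck = deck[cut:] + deck[:cut]
        wins += yin_wins(deck)
    return wins / games

rng = random.Random(7)
print("win rate after 7 riffle shuffles:", win_rate(7, 2000, rng))
```

A win rate that stays clearly away from 1/2 would indicate that traces of the starting order survive the shuffles. Raising `n_shuffles` shows how the rate behaves as the deck is mixed more thoroughly.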

    *************************************************************
    Journal Assignment:

    Discuss your reactions to the video, and respond to some of the discussion questions in today's handout.


    Notes on second journal assignments.

    Everyone seemed to appreciate Tami Buhr's talk and learned a lot from it. You had mixed feelings about Stephen Jay Gould's talk. Some felt he was unnecessarily rude and egotistical. Most thought his ideas were interesting and that it was worthwhile hearing his talk.

    You raised some interesting questions about coincidences, and you all had plenty of incidents that you think of as coincidences. Much has been written about whether more "coincidences" have happened than should by chance alone, and we are not going to settle this issue easily. However, it is important to bear in mind our earlier discussion about the difference between the probability of a specific event and the probability of this event or a similar event occurring sometime during a longer period, for example, during your lifetime. For example, one of you mentioned that you and your mother both had dreams the same night that involved animals with weird colors. This, by itself, would seem very unlikely, but perhaps having, at some point in a lifetime, a dream very similar in some weird way to someone else's on the same night is not so strange.

    It is possible to make models to show more concretely how an event on a particular day can have very small probability while "once in a lifetime" it is not so small. The famous psychiatrist Jung was one of those who believed that coincidences occurred more often than they could be expected to by chance. He mentioned once that he was struck by hearing references to fish 6 times during a 24-hour period. We can make a model to see how unlikely this would be in a lifetime. We have to specify how often on average you hear a story involving fish. The time between fish stories is random. It might be 2 hours or 2 weeks. Let's assume the average time between fish stories is one week. Then you can write a program to simulate this process, where the time between stories is random with an average of one week. Then run this program for the equivalent of 40 years and record whether or not there is a 24-hour period that contains 6 or more fish stories. Finally, repeat this many times to estimate the chance that 6 or more fish stories in a single day occur during a 40-year period.
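
Here is a minimal sketch of such a program. It assumes that the gaps between fish stories follow an exponential distribution with a mean of 7 days, which is one natural reading of "random with average one week" but is our assumption, not something fixed by the discussion above. The function names are our own.

```python
import random

def has_cluster(times, count, window):
    """True if some stretch of `window` days holds at least `count` events (times sorted)."""
    return any(times[i + count - 1] - times[i] <= window
               for i in range(len(times) - count + 1))

def simulate_lifetime(rng, mean_gap_days=7.0, years=40):
    """Event times in days, with exponentially distributed gaps between events."""
    horizon = years * 365
    times = []
    t = rng.expovariate(1 / mean_gap_days)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(1 / mean_gap_days)
    return times

def estimate_chance(trials, rng, count=6, window=1.0):
    """Fraction of simulated lifetimes containing a cluster of `count` events."""
    hits = sum(has_cluster(simulate_lifetime(rng), count, window)
               for _ in range(trials))
    return hits / trials

rng = random.Random(0)
print(estimate_chance(500, rng))
```

Because the event is rare under these assumptions, a run of 500 lifetimes gives only a crude estimate; many more repetitions are needed for a stable figure.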

    Amy asked if you could make a map of acquaintances in a small town and then try to see who knew whom. This kind of problem is a favorite of mathematicians called "graph theorists." They draw a line between each two people who know each other. Then there is a "path" between two people a and b if you can go from a to b following these lines. It is natural to ask for the smallest number k such that there is a path between x and y of length less than k for every pair of people x and y in the town. If it is a small town you might guess that there is a path of length at most 3 between any two people. In the movie "Six Degrees of Separation" it was suggested that everyone in the world is connected by a path of length at most 6. Just to show you how berserk mathematicians can get over such problems, here are some remarks from my friend David Griffeath's home page.

    Apparently, sometime within the past few years, MTV talk show host Jon Stewart devised a parlor game in which contestants are to link any movie star to Kevin Bacon by a chain of films that share performers. Thus we imagine actors as vertices in a large network, with edges between any two who have been in a movie together. The goal is to find the path of minimal length connecting x to Kevin Bacon, that length then being the Bacon number, which we denote here as B(x). Quite a cult has grown up around this pastime, as witnessed by more than a dozen web pages now devoted to the game. My favorite link to Bacon fanatics is The Center of the Hollywood Universe.

    Now Brett Tjaden and Glenn Wasson at the University of Virginia have automated the calculation of B(x) at their Web site, The Oracle of Bacon. Their program, which makes use of the marvelous Internet Movie Database (IMDB), will compute the Bacon number of any performer you care to specify within a few seconds. Hard-core cultists seem threatened by the power of the Oracle, but the rest of us welcome this automation of what can be an arduous evaluation. For instance, B(Bara, Theda) = 3, and in fact the Oracle confirmed an outstanding conjecture that for any x from the United States, either B(x) is at most 4 or B(x) is infinite.

    Mathematicians have had their own version of this story for many years, centered around the Hungarian number theorist and combinatorist Paul Erdos, where the links are formed by joint authorship of research papers.
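
The path lengths discussed above, whether between acquaintances in a town or between an actor and Kevin Bacon, can be computed with a breadth-first search. The sketch below is a generic illustration and not the Oracle of Bacon's program; the graph and its names are hypothetical placeholders.

```python
from collections import deque

def path_length(graph, start, goal):
    """Length of the shortest path from start to goal, or None if there is none."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if person == goal:
            return distance[person]
        for neighbor in graph.get(person, ()):
            if neighbor not in distance:      # first visit is the shortest route
                distance[neighbor] = distance[person] + 1
                queue.append(neighbor)
    return None

# Hypothetical acquaintance graph; the names are placeholders, not real data.
graph = {
    "Ann": ["Bob"],
    "Bob": ["Ann", "Cara"],
    "Cara": ["Bob", "Dev"],
    "Dev": ["Cara"],
    "Eve": [],
}
print(path_length(graph, "Ann", "Dev"))   # 3
print(path_length(graph, "Ann", "Eve"))   # None
```

A result of `None` plays the role of an infinite Bacon number: no chain of links connects the two people.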

    Sue wondered about the coincidence of getting an even dollar amount for the total charge of your groceries. This is an interesting problem. You are adding up several chance events that represent the individual costs of your items. You could make some assumption about the probability distribution for each individual item, for example, the probability that it ends in 17 cents, etc. It is reasonable to assume that your total bill will be more than a dollar. I think you could then show that the probabilities that the last two digits are 00, 01, 02, ..., 99 are approximately the same, so you could conclude that the chance that your total bill is exactly x dollars for some x is about 1 in 100. More interesting is the probability that the leading digit of your bill is 1, 2, 3, ..., 9. (If your bill is $14.89 then the leading digit is 1.) It has been found that the leading digits of numbers occurring in nature are not equally likely but rather follow a logarithmic distribution. Here is an account of how this fact has been used to find people who cheat on their income tax.

    Mark Negrini, who teaches accounting at St. Mary's University in Halifax, wrote his PhD thesis on "The detection of income evasion through an analysis of digital distributions". He has persuaded business and government people to use Benford's law to test suspicious financial records such as bookkeeping checks and tax returns. Benford's law states that the distribution of the leading digit in data sets is typically not equidistributed but rather given by the distribution p(k) = log(k+1) - log(k) for k = 1,2,...,9, where log is the base-10 logarithm. (The leading digit of .0034 is 3, of 243 is 2, etc.) This gives the probabilities .301, .176, .125, .097, .079, .067, .058, .051, .046 for the chance that 1,2,3,4,5,6,7,8, or 9 will be the leading digit.
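
The Benford probabilities can be reproduced directly from the formula. The sketch below also includes a small helper for reading off the leading digit of a number; both function names are our own.

```python
from math import log10

def benford(k):
    """Benford probability that the leading digit is k (k = 1, ..., 9)."""
    return log10(k + 1) - log10(k)

def leading_digit(x):
    """First significant digit of a nonzero number, read from scientific notation."""
    return int(f"{abs(x):e}"[0])

for k in range(1, 10):
    print(k, round(benford(k), 3))

print(leading_digit(0.0034), leading_digit(243), leading_digit(14.89))
```

The nine probabilities sum to 1 because the logarithms telescope to log(10) - log(1).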

    Numerous explanations for this have been given, but perhaps the most persuasive is that Benford's distribution is the unique distribution for the leading digits that is not changed by a change of units, i.e. multiplying the data by a constant c. Negrini's idea is that, if we are honest, the numbers in our tax returns and on our checks should satisfy Benford's law, and if they do not there may be some skullduggery.

    The article states that "Mr. Negrini has also lent his expertise to federal and state tax authorities, officials in Denmark and the Netherlands and to several companies. He has even put President Clinton's tax returns to the Benford's Law test. When he analyzed the president's returns for the past 13 years he found that 'the returns by Clinton follow Benford's Law quite closely.'"

    Your explanations to your Uncle George as to what the "margin of error" means indicate that there is still some confusion about what it means. The margin of error includes only the sampling error and not the other kinds of errors that Tami talked about, such as errors caused by non-response. You will find that it is typically about 1/sqrt(n), corresponding to the estimate of two standard errors that we discussed. For example, for a sample of 200, 1/sqrt(200) = .0707, so this would be reported as a 7 percent margin of error. You also have to be careful not to tell Uncle George that the error will be no more than 7 percent, since even George will realize that it is possible for the poll to really screw up. A couple of you described it in terms of what would happen when you toss a coin, say 100 times, and look at the proportion of heads that comes up. We can say with 95% confidence that the number of heads will be between 40 and 60. Thus if our estimate of the probability of heads is the proportion of heads in the sample, we can be quite certain that we are not off by more than 10% from the true probability of heads. I think George would be best served by just giving him the "box" that is included at the end of the New York Times poll reports.

    How the Poll Was Conducted

    The latest New York Times/ CBS News Poll is based on telephone interviews conducted Feb. 22 to 24 with 1,223 adults throughout the United States.

    The sample of telephone exchanges called was randomly selected by a computer from a complete list of active residential exchanges in the country. The list of more than 36,000 residential exchanges is maintained by Marketing Systems Group of Philadelphia.

    Within each exchange, random digits were added to form a complete telephone number, thus permitting access to both listed and unlisted numbers. Within each household, one adult was designated by a random procedure to be the respondent for the survey.

    The results have been weighted to take account of household size and number of telephone lines into the residence and to adjust for variations in the sample relating to geographic region, race, sex, age, and education.

    In theory, in 19 cases out of 20 the results based on such samples will differ by no more than three percentage points in either direction from what would have been obtained by seeking out all American adults.

    For smaller subgroups the potential sampling error is larger. For example, it is plus or minus five percentage points for those who say they are likely to vote in a Republican primary or caucus this year.

    In addition to sampling error, the practical difficulties of conducting any survey of public opinion may introduce other sources of error into the poll. Variations in question wording or the order of questions, for instance, can lead to somewhat different results.

    So you see, as usual, your journals raised as many interesting questions as they solved, which is a sign of good journals.
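
The coin-tossing description of the margin of error in these notes can be checked by simulation. The sketch below tosses a fair coin 100 times, repeats this many times, and reports how often the number of heads lands between 40 and 60. The function name and the number of repetitions are our own choices.

```python
import random

def fraction_within(n_tosses, low, high, trials, rng):
    """Fraction of trials in which the number of heads falls in [low, high]."""
    hits = 0
    for _ in range(trials):
        heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
        hits += low <= heads <= high
    return hits / trials

rng = random.Random(0)
print(fraction_within(100, 40, 60, 10_000, rng))
```

The interval 40 to 60 corresponds to a margin of 1/sqrt(100) = 10 percentage points around the true value of 50%, so the printed fraction should come out close to the "19 cases out of 20" in the New York Times box.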


    Class 19 Shuffling, JMP, and Games of Chance

    Tuesday, December 3, 1996

    Part I -- Card Shuffling

    1) Read the NYT article, 1/9/90, "In Shuffling Cards, 7 Is Winning Number."

    2) Shuffling activity.

    A game of solitaire, which we call Yin/Yang, shows that 7 ordinary riffle shuffles, followed by a cut, of a 52-card deck are not enough to make every permutation equally likely.

    Hearts and Clubs are called the Yin suits, and Diamonds and Spades are called the Yang suits. We shuffle the deck of cards 7 times, then cut it, and then start removing and revealing each card from the top of the deck, making a new pile of them face-up (so if this were all we did, we'd just have the deck unchanged after going through it once, except that the deck would be lying face-up on the table).

    We start the pile for each suit when we discover its ace, and add cards of the same suit to each of these 4 piles, according to the rule that we must add the cards of each suit in order.

    Thus a single pass through the deck is not going to accomplish much in the way of completing the 4 piles, so having made this pass, we turn the remaining deck back over, and make another pass.

    We continue this until we complete either the two Yin piles (hearts & clubs), or the two Yang piles (diamonds & spades). If the Yin piles get completed first, we call the game a win; it's a loss if the Yang piles get completed first.

    If the deck has been thoroughly permuted (by having put the cards through a clothes dryer, say), then the Yin and Yangs will be equally likely to be first to get completed. Thus our expected proportion of wins will be 1/2.

    Part II -- Review of Tests in JMP

    Part III- Games of Chance