
SCI199Y: October 24, 1995

Required for next week

Short project 2 (with extension allowed until November 7)

More on the polls

The following table is taken from the Globe & Mail, Wednesday, October 18 and Saturday, October 21. It seems to be the definitive summary of the polls, to date, although it's not 100% consistent with earlier published stories.

The numbers that make the headlines have the undecideds allocated to either Yes or No. There was an excellent discussion of this in the G&M on October 18 (A5). SOM allocates the undecideds in the same proportions as the decided voters, approximately 50-50. Leger & Leger allocates them based on respondents' answers to other questions in the poll and on demographic factors, which sends about 70% of them to No. This gives a substantial boost to the reported No vote. The poll corresponding to the last line in the above table was reported under the headline "Yes 50.2, No 49.8, poll suggests". (The poll of size 2020 has a margin of error of approximately 2.2%; the poll of size 1820, approximately 2.3%; the other polls, 3.1%.)
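The margins of error quoted above are consistent with the usual textbook 95% margin for a proportion, 1.96 × sqrt(p(1 − p)/n), evaluated at p = 0.5. The short Python sketch below assumes that formula and simple random sampling (the pollsters' actual methods are not given here), and also shows the two ways of allocating undecideds described above. The helper names are hypothetical, the raw poll figures (Yes 43, No 42, undecided 15) are invented for illustration, and n = 1000 is only an assumed size showing what sample would give 3.1%; the other polls' real sizes are not stated.

```python
import math

def margin_of_error(n, z=1.96, p=0.5):
    """Hypothetical helper: approximate 95% margin of error, in percentage
    points, for a proportion from a simple random sample of size n.
    p = 0.5 gives the largest (most conservative) margin."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

def allocate_undecided(yes, no, undecided, share_to_no=None):
    """Hypothetical helper: split the undecided percentage between Yes and No.

    share_to_no=None splits in proportion to the decided voters (as SOM does);
    share_to_no=0.7 sends 70% to No (roughly the Leger & Leger outcome)."""
    if share_to_no is None:
        share_to_no = no / (yes + no)
    return yes + undecided * (1 - share_to_no), no + undecided * share_to_no

for n in (2020, 1820, 1000):  # 1000 is an assumed size, for comparison only
    print(n, round(margin_of_error(n), 1))

# Invented raw poll: Yes 43, No 42, undecided 15 (illustrative numbers only).
print([round(v, 1) for v in allocate_undecided(43, 42, 15)])
print([round(v, 1) for v in allocate_undecided(43, 42, 15, share_to_no=0.7)])
```

Proportional allocation keeps a near 50-50 split near 50-50, while sending 70% of the undecideds to No turns a one-point Yes lead among decided voters into a five-point No lead.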

In other news this week...

"Bouchard slip helps to boost dollar": G&M, Oct. 17, B1.
"Dollar plunges on fears of Yes vote": G&M, Oct. 21, A1.
"Poll pratfall of '80 not applicable now": G&M, Oct. 17, A4. "On May 18, 1980, two days before the first referendum, Dimanche-Matin published the following banner headline: Yes 52%, No 48%. The ultimate result was very different: Yes 40%, No 60%." The article goes on to say that "polling has become far more sophisticated since 1980. In the infamous Dimanche-Matin survey, 24 per cent of people described themselves as undecided. And this 48 hours before The Day? No way. Then the prediction was exacerbated by dividing the undecided evenly between the two camps. In fact, more than 80 per cent of the undecided voted No."
"Trade winds that blow across Canada": G&M, Oct. 16. A nice graphic showing interprovincial and international trade balances, by province. Ontario has a huge overall trade surplus (twice that of the closest contender, Alberta), created entirely by its interprovincial trade surplus. (Ontario has the largest international trade deficit of all the provinces.)
"Council bans smoking in Toronto food courts": Councillor Peter Tabuns, chairman of the city's Board of Health, is quoted as saying: "With more than 3,000 Canadians dying each year of second-hand smoke, it is certainly a public health issue." Where do you think this number came from?
"When racial categories make no sense": G&M, Oct. 21, D8. Statistics Canada will ask a question on race in the next census (see handout, Sept. 12). This article describes biologists' view that race does not account for a meaningful proportion of the genetic variability between individuals. The main research referred to is a 1972 study by Richard Lewontin.
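The arithmetic behind the 1980 Dimanche-Matin miss can be sketched in a few lines. This assumes (the article does not say so explicitly) that the 52/48 headline was obtained by adding half of the 24% undecided to each side. The sketch undoes that even split and then sends 80% of the undecideds to No; the function names are hypothetical.

```python
def undo_even_split(headline_yes, undecided):
    """Hypothetical helper: recover decided Yes/No (as % of all respondents),
    assuming the headline gave each side half of the undecideds."""
    decided_yes = headline_yes - undecided / 2
    decided_no = (100 - headline_yes) - undecided / 2
    return decided_yes, decided_no

def reallocate(decided_yes, decided_no, undecided, share_to_no):
    """Hypothetical helper: give share_to_no of the undecideds to No, the rest to Yes."""
    return (decided_yes + undecided * (1 - share_to_no),
            decided_no + undecided * share_to_no)

dy, dn = undo_even_split(52, 24)       # assumed: 52/48 headline, 24% undecided
yes, no = reallocate(dy, dn, 24, 0.8)  # the article: more than 80% went No
print(f"decided: Yes {dy:.1f}, No {dn:.1f}")
print(f"with 80% of undecideds to No: Yes {yes:.1f}, No {no:.1f}")
```

Under these assumptions the decided voters were 40% Yes and 36% No, and sending 80% of the undecideds to No gives about Yes 44.8, No 55.2. Since more than 80 per cent actually went No, the real shift was larger, which moves the estimate closer to the actual 40/60.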

What is the bell curve, anyway?

One way to describe a measurement that varies in a population is to give the frequency of each possible value of the measurement in that population. (Think of the students lined up on the lawn of Penn State according to their height.) For many types of measurements, as the population gets larger and the frequencies are measured on a finer and finer grid, the plot of the frequencies starts to look like a curve of a very predictable form. (Think of getting 5 times as many students lined up by height class, and taking a photo from an airplane. Then 500 times as many students, seen from the space shuttle...)
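Lining students up by height class amounts to tallying a frequency table. Here is a minimal sketch; the heights (in centimetres), the 5 cm class width, and the helper names are all made up for illustration.

```python
from collections import Counter

def frequency_table(measurements, width):
    """Hypothetical helper: count measurements in each class [k*width, (k+1)*width),
    keyed by the lower end of the class."""
    counts = Counter(int(m // width) for m in measurements)
    return {k * width: counts[k] for k in sorted(counts)}

def text_histogram(table):
    """Print one row of stars per class, like students standing in a line."""
    for lower, count in table.items():
        print(f"{lower:>6}: {'*' * count}")

# Invented heights for 15 students (illustrative data only).
heights_cm = [158, 162, 163, 165, 167, 168, 168, 170, 171, 171, 172, 174, 175, 177, 181]
text_histogram(frequency_table(heights_cm, 5))
```

Shrinking `width` while adding many more measurements corresponds to the "finer grid, larger population" of the airplane and space-shuttle photos.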

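The frequency curve has a particular mathematical expression; in standard form it is

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2},$$

where z is measured in standard units. Its graph is the familiar symmetric bell shape, highest at z = 0 and falling off quickly on either side.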
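In standard form the bell curve is f(z) = e^(-z²/2) / sqrt(2π), with z in standard units. The sketch below evaluates it and checks numerically that the total area under the curve is 1 and that the peak height at z = 0 is about 0.399. The midpoint sum over [-8, 8] is just a convenient choice for this sketch; the area outside that range is negligible.

```python
import math

def standard_normal_density(z):
    """Height of the standard bell curve at z (z in standard units)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# Total area under the curve, approximated by a midpoint sum over [-8, 8].
n = 16000
step = 16 / n
area = sum(standard_normal_density(-8 + (i + 0.5) * step) * step for i in range(n))

print(round(standard_normal_density(0), 4))  # peak height at the centre
print(round(area, 6))                        # total area
```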

It is a surprising but true fact that a wide variety of measurements tend to follow this curve once they are converted to standard units. A theorem known as the central limit theorem says that, under fairly mild conditions (for example, averaging many independent items, none of which dominates the rest), the frequency distribution of measurements that are averages follows this curve more and more closely as the number of things being averaged increases. (With a bit of a stretch, you could imagine a particular person's height being determined by a number of effects that are averaged: genetic makeup, pre-natal nutrition, post-natal nutrition, etc. IQs are typically computed by averaging over a number of test items.)
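The central limit theorem can be watched in action with a small simulation. In this sketch each simulated "measurement" is an average of several independent effects, each drawn uniformly between 0 and 1. The uniform choice, the seed, and the helper names are assumptions made for illustration; uniform effects were picked because a single one looks nothing like a bell. The printed columns are the number of effects averaged and the fractions of measurements within 1 and 2 standard deviations of the mean.

```python
import random
import statistics

def averages(num_items, num_people, rng):
    """Hypothetical helper: each measurement is the average of num_items
    independent uniform(0, 1) effects."""
    return [statistics.fmean([rng.random() for _ in range(num_items)])
            for _ in range(num_people)]

def share_within(values, k):
    """Fraction of values within k standard deviations of their mean."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(abs(v - mu) <= k * sd for v in values) / len(values)

rng = random.Random(1995)  # arbitrary seed, for reproducibility
for num_items in (1, 2, 12):
    vals = averages(num_items, 20000, rng)
    print(num_items, round(share_within(vals, 1), 3), round(share_within(vals, 2), 3))
```

With one effect, about 58% of the values lie within 1 standard deviation and all lie within 2; averaging 12 effects brings these fractions close to the bell-curve values of about 68% and 95%.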

The frequency curve given above has the property that about 2/3 of its mass is contained in the interval from -1 to 1 (in standard units), and about 95% of its mass is contained in the interval from -2 to 2. In other words, 95% of the measurements (heights, IQ scores, whatever) will fall within 2 standard units of the center point. The center point is called the mean, and the standard unit is called the standard deviation; a measurement is converted to standard units by subtracting the mean and dividing by the standard deviation.
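The "2/3" and "95%" figures can be checked directly: for the bell curve, the fraction of mass within k standard units of the mean is erf(k / sqrt(2)). The sketch uses Python's `math.erf`; the helper names are hypothetical, and the IQ scale with mean 100 and standard deviation 15 is an assumption for illustration only.

```python
import math

def mass_within(k):
    """Fraction of the bell curve within k standard units of the mean."""
    return math.erf(k / math.sqrt(2))

def to_standard_units(x, mean, sd):
    """Convert a measurement x to standard units."""
    return (x - mean) / sd

for k in (1, 2, 1.96):
    print(k, round(mass_within(k), 4))

# Assumed IQ scale for illustration: mean 100, standard deviation 15.
print(to_standard_units(130, 100, 15))
```

This gives about 68.3% within 1 standard unit and 95.4% within 2; the exact 95% point is at 1.96 standard units. A score of 130 on the assumed scale is 2 standard units above the mean.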

In fact, my analogy above of looking at the students lined up by height from space doesn't quite work unless we think of lining up either all females or all males. With both sexes lined up together, even from space we'd probably see two humps.

Even when measurements come from a population whose frequency distribution is exactly the bell curve, a small set of them can vary enough to look surprising. On p. 172 of VDQI (The Visual Display of Quantitative Information), Tufte shows 12 sets of 50 measurements from the bell curve; what is plotted is the actual frequency distribution of each set of 50 measurements.
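A simulation in the same spirit as Tufte's display (not his data) draws 12 sets of 50 values from the bell curve and tallies each set into 12 classes of width 0.5 standard units covering -3 to 3. The seed, the class width, the clamping of values beyond ±3 into the end classes, and the helper name are choices made for this sketch.

```python
import math
import random

BINS = 12     # classes of width 0.5 covering [-3, 3)
WIDTH = 0.5

def histogram_of_sample(rng, n=50):
    """Hypothetical helper: tally n standard-normal draws into BINS classes;
    values outside [-3, 3) are clamped into the end classes."""
    counts = [0] * BINS
    for _ in range(n):
        k = math.floor((rng.gauss(0, 1) + 3) / WIDTH)
        counts[min(max(k, 0), BINS - 1)] += 1
    return counts

rng = random.Random(172)  # arbitrary seed
for i in range(12):
    counts = histogram_of_sample(rng)
    print(f"set {i + 1:2}: " + " ".join(f"{c:2}" for c in counts))
```

Each row is one set of 50; the rows come out noticeably ragged and differ from one another, even though every set is drawn from exactly the same bell curve.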

About this document ...


The translation was initiated by Marie K. Snell on Thu Nov 16 15:35:47 CST 1995

laurie.snell@chance.dartmouth.edu