Class 9: Coke vs. Pepsi: Analyzing the Results.
When you design an experiment like this you should ask several questions. First,
what do you want to test? Do you want to test if a person can tell given a single
cup whether it contains Coke or Pepsi? Can a person decide which of two cups is
Coke and which is Pepsi? Can a person given two cups simply decide if they have the same or
different drinks? These are all testing slightly different abilities.
You might think of the experiment as trying to settle an argument, say between Linda
and Laurie. Linda claims she can tell the difference between Pepsi and Coke and
Laurie claims that Linda cannot. Linda does not claim she can do it every time in a
series of tests, only that she can get it right more often than she would by guessing. There are two
kinds of errors we can make in our experiment. The first type, called a type I error,
is that Linda could establish her claim when in fact she is just guessing and was
lucky. The second kind of error, called type II error, occurs if Linda really does have
the ability she claims but just has a bad day and does not get enough correct to
establish her claim. Laurie wants to be sure that the chance of a type I error is
small and Linda wants to make sure the chance of a type II error is small.
Let's consider first the group 2 experiment. The experimenters gave a single taster
6 cups known to contain either Pepsi or Coke. The taster was not told how many had
Coke and how many had Pepsi. In fact 3 cups contained Coke and 3 contained Pepsi.
Now suppose we required that the taster get all 6 correct to establish the claim.
Since the taster was not told how many cups contained Coke and how many contained
Pepsi, it is reasonable to assume that if the taster was just guessing what a single
cup contained, they would have a 50% chance of being correct. It is harder to say
just what having the ability means. One simple solution is to ask the taster what percentage
of the time they would expect to get it right. If the answer is 80% then we might
say that if the taster's claim is correct, they would have an 80% chance of being
correct on a single cup.
Then the probability of a type I error is (1/2)^6 = .016. The probability of a type
II error is 1-.8^6 = .74. This shows that requiring all 6 correct is unfair to the
taster. Thus we should consider relaxing the requirement to getting,
say, 5 or more correct.
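The same calculation can be repeated for any passing threshold. The sketch below assumes, as above, that each cup is an independent trial with a .5 chance of success for a guesser and a .8 chance for a taster who has the claimed ability; the function names are only illustrative.

```python
from math import comb

def binom_tail(n, k, p):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def error_rates(n, k, p_guess=0.5, p_skill=0.8):
    """Return (type I, type II) error chances when k of n correct are required."""
    type_1 = binom_tail(n, k, p_guess)       # a guesser passes by luck
    type_2 = 1 - binom_tail(n, k, p_skill)   # a skilled taster falls short
    return type_1, type_2

for k in (6, 5, 4):
    t1, t2 = error_rates(6, k)
    print(f"require {k} of 6: type I = {t1:.3f}, type II = {t2:.3f}")
```

Under these assumptions, requiring 5 of 6 raises the type I error to about .11 and lowers the type II error to about .34, so the two kinds of error trade off against each other.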
********************************************************************
We should consider this kind of analysis for all the experiments. Here is the information
we have about the experiments that were carried out in class last time.
Group 1: One taster did three sets of three cups. Each set contained one cup of Coke,
one of Pepsi, and one of RC. Assume the order was always P, C, RC in each case.
actual:
P, C, RC
P, C, RC
P, C, RC
taster reported:
RC, C, P
RC, C, P
P, RC, C
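One way to start the analysis for Group 1 is to count the correct identifications and compare them with pure guessing. This sketch assumes the taster knew each set held one cup of each drink and so used each label once; a guesser's answer is then a random ordering of the three labels.

```python
from itertools import permutations
from fractions import Fraction
from collections import Counter

actual = ("P", "C", "RC")
reported = [("RC", "C", "P"), ("RC", "C", "P"), ("P", "RC", "C")]

def matches(truth, answer):
    """Number of cups labeled correctly."""
    return sum(t == a for t, a in zip(truth, answer))

# Correct identifications in each of the three sets.
scores = [matches(actual, r) for r in reported]

# Distribution of the number correct in one set for a pure guesser:
# each of the 6 orderings of the labels is equally likely.
orderings = list(permutations(actual))
counts = Counter(matches(actual, o) for o in orderings)
guess_dist = {k: Fraction(v, len(orderings)) for k, v in counts.items()}
expected = sum(k * p for k, p in guess_dist.items())

print("correct per set:", scores)
print("guessing distribution:", guess_dist)
print("expected correct per set when guessing:", expected)
```

The taster got one cup right in each set, which is exactly what a guesser averages under this assumption.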
*******************************************************************
Group 2: A single taster was given 6 cups total: 3 of Coke, 3 of Pepsi. The taster
was not told how many cups there were of each.
actual:
P C P P C C
taster reported:
P C C P P C
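For Group 2, a quick check is to score the taster and ask how often a guesser would do at least as well. The sketch assumes each cup is an independent 50% guess, as in the discussion of this experiment above.

```python
from math import comb

actual = "P C P P C C".split()
reported = "P C C P P C".split()

# Number of cups identified correctly.
correct = sum(a == r for a, r in zip(actual, reported))

# Chance that a pure guesser gets at least this many right.
n = len(actual)
p_at_least = sum(comb(n, j) for j in range(correct, n + 1)) / 2**n

print(correct, "of", n, "correct")
print("P(guesser does at least as well) =", round(p_at_least, 3))
```

The taster got 4 of 6 right, a score that a guesser reaches or beats about a third of the time.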
********************************************************************
Group 3: There were three tasters. Each taster was given 6 cups and not told how
many cups contained Coke and how many contained Pepsi. The experimenters deliberately
used 4 cups of one drink and 2 of the other. The same cups were used for each taster.
actual:
C C P C C P
taster 1:
P C P C C P
taster 2:
P C C C C P
taster 3:
P C P C C P
********************************************************************
Group 4:
The taster was given 3 sets of 2 cups.
actual:
P,C P,C C,P
taster:
C,P C,P C,C
********************************************************************
Group 5:
This group had two tasters each given 3 sets of 2 cups. The content of each cup was
determined by picking one piece of paper out of two folded pieces, one of which said
Pepsi and the other said Coke (so one pair of cups could consist of two Cokes, two
Pepsis, or one of each). The tasters were told how the contents of cups were picked.
actual:
P,C C,P C,P
taster 1:
P,C C,P C,P
taster 2:
P,C C,P C,P
Note: We have decided to hold the chance fair, where you present
your project, on the last day of the reading period, Tuesday May 14,
instead of during the final exam period.
Monday we will have a guest speaker, John Paulos, author of the
best-selling book "A Mathematician Reads the Newspaper".
Linda's Laborious Solutions to the Discussion Questions from Class 6
I'll try to work the discussion problems and see if that helps you do the first
journal question for Class 7 (which is supposed to be the same idea, with
different numbers).
These written solutions are no substitute for coming to precepts or office
hours or discussing the problems among yourselves, and I encourage you to do
these things, too.
Here is how I work out the discussion problems.
- 1) the drug test
Suppose you have a large group of people who take the drug test, say X people.
(You can plug in 100,000 for X if you prefer.) These people can be divided
into 4 categories:
- i) drug users who test positive on the drug test
- ii) drug users who test negative on the drug test
- iii) non-users who test positive on the drug test
- iv) non-users who test negative on the drug test
I'll figure out how many are in each category.
- i) there are about .05*X drug users, and about 95% of them test positive, so
there are about .95*.05*X drug users who test positive
- ii) there are .05*X drug users and 5% of them test negative, so there are
.05*.05*X drug users who test negative
- iii) there are .95*X non-users, 5% of which test positive, so there are
.05*.95*X non-users who test positive
- iv) there are .95*X non-users, 95% of which test negative, so there are
.95*.95*X non-users who test negative
Now, if you test positive for drug use, you must be in group i) or iii). There
are a total of .95*.05*X + .05*.95*X = 2*.95*.05*X people in groups i) and iii)
(since there is no overlap, I can just add the numbers). Of these people, only
the people in group i) are actually drug users. There are .95*.05*X people in
group i). So out of the 2*.95*.05*X people who test positive for drug use,
.95*.05*X actually use drugs. If you test positive, the chance that you use
drugs is .95*.05*X / (2*.95*.05*X), or 1/2.
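The four-category count can also be written out directly. The sketch below uses the figures from the solution above (5% users, a test that is right 95% of the time for users and non-users alike) with X = 100,000; the function name and the argument names `sensitivity` and `specificity` are my own labels, not part of the problem.

```python
from fractions import Fraction

def four_categories(X, base_rate, sensitivity, specificity):
    """Split X people into the four groups i)-iv) and return the counts."""
    users = base_rate * X
    non_users = (1 - base_rate) * X
    return {
        "i": sensitivity * users,              # users who test positive
        "ii": (1 - sensitivity) * users,       # users who test negative
        "iii": (1 - specificity) * non_users,  # non-users who test positive
        "iv": specificity * non_users,         # non-users who test negative
    }

# Exact fractions avoid rounding noise in the small products.
g = four_categories(100_000, Fraction(5, 100), Fraction(95, 100), Fraction(95, 100))
p_user_given_positive = g["i"] / (g["i"] + g["iii"])

print({k: int(v) for k, v in g.items()})
print("P(user | positive) =", p_user_given_positive)
```

Groups i) and iii) both come out to 4,750 people, which is why the answer is exactly 1/2.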
- 2) Now I'll try to do the second discussion question the same way. If I
remember right, the problem suggests you look at a large sample of college
students, say 100,000.
I could just use X instead of 100,000, as above, but I will try using 100,000.
Again, there are 4 categories:
- i) students in this sample who have HIV and test positive
- ii) students in this sample who have HIV and test negative
- iii) students in the sample who don't have HIV and test positive
- iv) students in the sample who don't have HIV and test negative
I'll figure out how many people are in each category.
- i) about .002*100,000 students in the sample have HIV, and about 99.8% of them
test positive, so there are about .998*.002*100,000 students in this category
- ii) .002*100,000 students in the sample have HIV and .2% test negative, so
there are .002*.002*100,000 students in this category
- iii) .998*100,000 students in the sample don't have HIV, and .2% of them
test positive, so there are .002*.998*100,000 students in this category
- iv) similarly, there are .998*.998*100,000 students in this category
Again, if you test positive for HIV, you must be in groups i) or iii). There
are a total of .998 * .002 * 100,000 + .002 * .998 * 100,000 people in groups
i) and iii) since the two groups don't overlap. Only those in group i)
actually have HIV. There are .998*.002*100,000 people in group i). So if you
test positive, your chances of having HIV are
.998*.002*100,000/(.998*.002*100,000 + .002*.998*100,000)
or 1/2 again.
Someone asked in class about the probability that a person who tests
negative is really HIV-free. Since there are
.002*.002*100,000 + .998*.998*100,000 people who test negative (groups ii
and iv), and .998*.998*100,000 of them are HIV-free (group iv), this
probability comes out as
(.998*.998*100,000)/(.002*.002*100,000 + .998*.998*100,000)
or about .999996
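Both probabilities for the college-student sample can be checked from the four group counts. This is a minimal sketch using the numbers above (.002 of students have HIV, and the test is right 99.8% of the time either way); the variable names are only illustrative.

```python
from fractions import Fraction

N = 100_000
prevalence = Fraction(2, 1000)   # .002 of students have HIV
accuracy = Fraction(998, 1000)   # test is right 99.8% of the time

# Groups i)-iv) as (possibly fractional) counts of students.
true_pos = accuracy * prevalence * N               # group i)
false_neg = (1 - accuracy) * prevalence * N        # group ii)
false_pos = (1 - accuracy) * (1 - prevalence) * N  # group iii)
true_neg = accuracy * (1 - prevalence) * N         # group iv)

p_hiv_given_pos = true_pos / (true_pos + false_pos)
p_free_given_neg = true_neg / (true_neg + false_neg)

print("P(HIV | positive)      =", float(p_hiv_given_pos))
print("P(HIV-free | negative) =", round(float(p_free_given_neg), 6))
```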
Recall that there is a delay between the time a person is infected with the HIV
virus and the time it would show up on a test. Let's say this is 3 months.
Then this last probability should be interpreted as the probability that a
person who tests negative today did not have the HIV virus three months ago.
People who have reason to be concerned about recent contacts with the
virus are encouraged to be retested at a later time.
Now, suppose you are in a different risk group in which 5% of the people have
HIV.
Taking a large random sample of this group (say 100,000) and dividing up into
the same 4 categories, you have again:
- i) people in this sample who have HIV and test positive
- ii) people in this sample who have HIV and test negative
- iii) people in the sample who don't have HIV and test positive
- iv) people in the sample who don't have HIV and test negative
The numbers work out differently in this case.
- i) .998*.05*100,000
- ii) .002*.05*100,000
- iii) .002*.95*100,000
- iv) .998*.95*100,000
So if you are in groups i) or iii), then the chance you are in group i) is
.998*.05*100,000/(.998*.05*100,000 + .002*.95*100,000)
= .998*.05/(.998*.05 + .002*.95) = .0499/.0518 = ~ .96 = 96% (pretty high)
---------------------------
The journal question should be about the same, though you have to figure out
some of the numbers to use yourself. The answer should come out somewhere
in between 50% and 96%.
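To see how the answer moves with the risk group, the same four-group count can be wrapped in a small function. This is a sketch only: the function name is made up, the test is assumed to be right 99.8% of the time as above, and the middle prevalence of 0.01 is a hypothetical value chosen for illustration, not the number for the journal question.

```python
def p_hiv_given_positive(prevalence, accuracy=0.998):
    """Chance of having HIV after one positive test, by the four-group count."""
    true_pos = accuracy * prevalence               # group i), as a share of the sample
    false_pos = (1 - accuracy) * (1 - prevalence)  # group iii)
    return true_pos / (true_pos + false_pos)

for prevalence in (0.002, 0.01, 0.05):
    print(f"prevalence {prevalence:.3f}: "
          f"P(HIV | positive) = {p_hiv_given_positive(prevalence):.3f}")
```

The first and last prevalence values correspond to the two risk groups worked out above.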
Linda
Some comments from Laurie:
Recall that P(A|B) means the probability of A given that B is true.
The AIDS example in terms of conditional probability amounts to the following:
You are given
P(+test | HIV positive)
and
P(- test | HIV negative)
(both .998 in our case) and you want to know
P(HIV positive | + test)
In general, P(A|B) is not equal to P(B|A).
For example, consider two tosses of a coin and let A be the event that both
tosses are heads and B the event that the first toss is a head. Then P(A|B) =
.5 and P(B|A) = 1.
To find one of these conditional probabilities from the other you need also to
know P(A) and P(B) (actually their ratio is sufficient). In the AIDS example
this amounts to knowing the probability the patient is HIV positive before the
test is performed.
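The coin example can be checked by listing the four equally likely outcomes and counting. The helper names `prob` and `cond_prob` below are just for this sketch.

```python
from itertools import product
from fractions import Fraction

outcomes = list(product("HT", repeat=2))   # HH, HT, TH, TT, equally likely

def prob(event):
    """Probability of an event, given as a test on an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond_prob(event, given):
    """P(event | given), computed by counting outcomes."""
    return prob(lambda o: event(o) and given(o)) / prob(given)

A = lambda o: o == ("H", "H")   # both tosses are heads
B = lambda o: o[0] == "H"       # first toss is a head

print("P(A|B) =", cond_prob(A, B))
print("P(B|A) =", cond_prob(B, A))
# The two are linked through the ratio P(A)/P(B):
print(cond_prob(A, B) == cond_prob(B, A) * prob(A) / prob(B))
```

Here P(A)/P(B) = (1/4)/(1/2) = 1/2, which is exactly the factor that turns P(B|A) = 1 into P(A|B) = .5.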
In law, mixing these two probabilities up is called the "prosecutor's paradox".
A prosecutor will often have a reasonable estimate of
P(the evidence | the accused is innocent)
and then incorrectly state this probability as
P(the accused is innocent | the evidence)
because that is what the jury wants to know.
For example, in the Simpson trial the DNA experts gave a very small probability of
a DNA match for a person chosen at random, say in the Los Angeles area. This is
P(match | Simpson is innocent)
but this is not the same as
P(Simpson is innocent | match).
Also, don't forget Linda's three general comments about AIDS testing.
1. These calculations show that if a patient is not from a high-risk group, so
that the probability before testing of being HIV positive is small, then a
single positive ELISA test will not result in a high probability that the
patient is HIV positive. It will if the patient is initially from a high-risk
group.
2. In practice, when a lab has a positive test it carries out another ELISA
test and then a Western blot test. If all three are positive it is reported
that the patient is HIV positive.
3. The two ELISA tests cannot be assumed to be independent since they are
usually carried out on the same blood sample and there may be something other
than HIV in the blood that results in a positive test. The Western
blot test is more specific to HIV and so it is more reasonable to
assume it is independent of the other tests.