Toward a Firm Foundation for Statistical Interpretation

Statistical interpretation of DNA typing evidence has probably caused the greatest confusion and concern for the courts in the application of DNA to forensic science. Some courts have accepted the multiplication rule on the grounds of allelic independence, others have used various ad hoc corrections to account for nonindependence, and still others have rejected probabilities altogether. Some courts have ruled that it is unnecessary even to test allelic independence, and others have ruled that allelic independence cannot be assumed without proof. The confusion is not surprising, inasmuch as the courts have little expertise in population genetics or statistics.

In reaching a recommendation on statistical interpretation of population frequencies, the committee balanced the following considerations:

DNA typing should be able to provide virtually absolute individual identification (except in the case of identical twins), provided that enough loci are studied and that the population-genetics studies are developed with appropriate scientific care. The importance of this long-term goal justifies substantial investment in ensuring that the underlying population-genetics foundation is firm.
Statistical testimony should be based on sound theoretical principles and empirical studies. Specifically, the validity of the multiplication rule in any application depends on the empirical degree of population differentiation for the loci involved. Adequate empirical data must be collected, and appropriate adjustments must be made to reflect the remaining uncertainties.
It is feasible and important to estimate the degree of variability among populations to determine ceiling frequencies for forensic DNA markers and to evaluate the impact of population substructure on genotype frequencies estimated with the multiplication rule.
Careful population genetics is especially important for the development and use of databanks of convicted-offender DNA patterns. Whereas the comparison of an evidence sample to a single suspect involves testing only one hypothesis, the comparison of a sample to an entire databank involves testing many alternative hypotheses. Special attention must thus be paid to the possibility of coincidental matches.
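The concern about databank searches can be illustrated with a minimal sketch. It assumes, purely for illustration, that the profiles in a databank are unrelated and that each matches the evidence sample independently with probability equal to the genotype frequency; the function names and the numbers in the usage note are hypothetical and do not come from the report.

```python
def prob_any_coincidental_match(genotype_freq: float, databank_size: int) -> float:
    """Chance that at least one databank profile matches purely by coincidence.

    Assumption (illustrative only): profiles are independent and each one
    matches with probability genotype_freq.
    """
    if not 0.0 <= genotype_freq <= 1.0:
        raise ValueError("genotype_freq must lie between 0 and 1")
    if databank_size < 0:
        raise ValueError("databank_size must be non-negative")
    return 1.0 - (1.0 - genotype_freq) ** databank_size


def expected_coincidental_matches(genotype_freq: float, databank_size: int) -> float:
    """Expected number of coincidental matches under the same assumption."""
    return genotype_freq * databank_size
```

With a hypothetical genotype frequency of 1 in 10,000, a comparison against a single suspect carries a 0.0001 chance of a coincidental match, whereas a search of 10,000 profiles yields at least one coincidental match roughly 63% of the time under these assumptions.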

On the basis of those considerations, the committee reached the conclusions discussed below.

Population Studies to Set Ceiling Frequencies

In view of the long-term importance of forensic DNA typing, the population-genetics foundation should be made as secure as possible. Accordingly, population studies should be promptly initiated to provide valid estimation of ceiling frequencies, as described above. Specifically, variation in allele frequencies should be examined in appropriately drawn random samples from various populations that are genetically relatively homogeneous. The selection, collection, and analysis of such samples should be overseen by the National Committee on Forensic DNA Typing (NCFDT) recommended in Chapter 2.

Given the effort involved in drawing appropriate population samples and the continuing need to type new markers as the technology evolves, the samples should be maintained as immortalized cell lines in a cell repository; that would make an unlimited supply of DNA available to all interested investigators. We note that preparation of immortalized cell lines through transformation of lymphoblasts with Epstein-Barr virus is routine and cost-effective. Transformation and storage can be handled as contract services offered by existing cell repositories, such as the NIH-supported repository in Camden, N.J.

Such a cell repository would be analogous to that of the international consortium Centre d'Etude du Polymorphisme Humain (CEPH)[28] created in 1983. It holds some 1,000 samples from 60 reference families, which are used for genetic mapping of human chromosomes. The cell lines have played an essential role in the development of the human genetic-linkage map. The existence of a common resource has also promoted standardization and quality control through the ability to recheck samples. (We should note that the CEPH families themselves are not appropriate for studying population frequencies, because they represent closely related people in a small number of families.)

Substantial benefits will accrue to forensic DNA typing through the availability of a reference collection that can be maintained at an existing facility like the ones at the Coriell Institute for Medical Research and the American Type Culture Collection. Although there is an initial investment in collecting, transforming, and storing cells, the cost will be more than repaid in the broad and continued availability of well-chosen samples for population studies of newly developed DNA typing systems and the ability of investigators to confirm independently the DNA typing that was done in another laboratory.

Reporting of Statistical Results

Until ceiling frequencies can be estimated from appropriate population studies, we recommend that estimates of population frequencies be based on existing data by applying conservative adjustments:

1. First, the testing laboratory should check whether the observed multilocus genotype matches any sample in its population database. Assuming that it does not, the laboratory should report that the DNA pattern was compared to a database of N individuals from the population and no match was observed, indicating its rarity in the population. This simple statement, based on the counting principle, is readily understood by jurors and makes clear the size of the database being examined.

2. The testing laboratory should then calculate an estimated population frequency on the basis of a conservative modification of the ceiling principle, provided that population studies have been carried out in at least three major "races" (e.g., Caucasians, blacks, Hispanics, Asians, and Native Americans) and that statistical evaluation of Hardy-Weinberg equilibrium and linkage disequilibrium has been carried out (with methods that accurately incorporate the empirically determined reproducibility of band measurement) and no significant deviations were seen. The conservative calculation represents a reasonable effort to capture the actual power of DNA typing while reflecting the fact that the recommended population studies have not yet been undertaken. The calculation should be carried out as follows.

For each allele, a modified ceiling frequency should be determined by (1) calculating the 95% upper confidence limit for the allele frequency in each of the existing population samples and (2) using the largest of these values or 10%, whichever is larger. The use of the 95% upper confidence limit represents a pragmatic approach to recognize the uncertainties in current population sampling. The use of a lower bound of 10% (until data from ethnic population studies are available) is designed to address a remaining concern that populations might be substructured in unknown ways with unknown effect and the concern that the suspect might belong to a population not represented by existing databanks or a subpopulation within a heterogeneous group. We note that a 10% lower bound is recommended while awaiting the results of the population studies of ethnic groups, whereas a 5% lower bound will likely be appropriate afterwards. In the context of the discussion of the ceiling principle, the higher threshold reflects the greater uncertainty in using allele frequency estimates as predictors for unsampled subpopulations.

Once the ceiling for each allele is determined, the multiplication rule should be applied. The race of the suspect should be ignored in performing these calculations.
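The calculation just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the committee's own procedure: it assumes the usual normal-approximation upper limit p + 1.96 sqrt(p(1 - p)/N) for an observed allele frequency p in N chromosomes, it handles heterozygous loci only (frequency 2pq per locus), and all function names and input layouts are hypothetical.

```python
import math

FLOOR = 0.10  # interim lower bound on any allele's ceiling frequency


def upper_95_limit(observed_freq: float, n_chromosomes: int) -> float:
    """Upper 95% confidence limit, normal approximation (an assumption here)."""
    std_err = math.sqrt(observed_freq * (1.0 - observed_freq) / n_chromosomes)
    return observed_freq + 1.96 * std_err


def modified_ceiling(samples, floor: float = FLOOR) -> float:
    """Largest upper limit across all population samples, but never below floor.

    samples: (observed_freq, n_chromosomes) pairs, one per population sample.
    Every sample is used; the race of the suspect plays no part.
    """
    largest_limit = max(upper_95_limit(p, n) for p, n in samples)
    return max(largest_limit, floor)


def heterozygote_genotype_frequency(loci, floor: float = FLOOR) -> float:
    """Multiplication rule over heterozygous loci.

    loci: one (allele1_samples, allele2_samples) pair per locus, where each
    element is a list of (observed_freq, n_chromosomes) pairs.
    """
    frequency = 1.0
    for allele1_samples, allele2_samples in loci:
        p = modified_ceiling(allele1_samples, floor)
        q = modified_ceiling(allele2_samples, floor)
        frequency *= 2.0 * p * q  # heterozygote frequency at this locus
    return frequency
```

For a hypothetical allele observed at a frequency of 0.10 in 200 chromosomes, the upper limit is about 0.142, which exceeds the 10% floor and therefore becomes the ceiling; an allele whose upper limits all fall below 0.10 is assigned 0.10.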

Regardless of the calculated frequency, an expert should, given the relatively small number of loci used and the available population data, avoid assertions in court that a particular genotype is unique in the population. Finally, we recommend that the testing laboratory point out that the reported population frequency, although it represents a reasonable scientific judgment based on available data, is an estimate derived from assumptions about the U.S. population that are being further investigated.

As an example, suppose that a suspect has genotype A1/A2, B1/B2 at loci A and B and that three U.S. populations have been sampled in the current "convenience sample" manner and typed for these loci. The likelihood of a match for this two-locus genotype would be estimated as follows:

Loci A and B combined: [2(0.10)(0.156)] x [2(0.10)(0.249)] = 0.001554

a. The upper 95% confidence limit is given by the formula $p + 1.96\sqrt{p(1-p)/N}$, where $p$ is the observed frequency and $N$ is the number of chromosomes studied.

A frequency of 0.001554 corresponds to about 1 in 644 persons. Addition of two loci with about the same information content would yield a four-locus genotype frequency of about 1 in 414,000 persons. Of course, if fewer than four loci were interpretable, as is common in forensic typing, the estimated genotype frequency would be much higher.
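The arithmetic in this example can be checked directly. The short sketch below uses the ceiling values from the two-locus calculation above and assumes, as the text does, that the two additional loci contribute about the same as loci A and B, so the four-locus figure is the two-locus frequency squared; the variable names are illustrative.

```python
# Heterozygote frequency 2pq at each locus, using the ceiling values from the text.
locus_a = 2 * 0.10 * 0.156
locus_b = 2 * 0.10 * 0.249

# Multiplication rule across the two loci.
two_locus = locus_a * locus_b

# Two further loci of similar information content: square the two-locus value.
four_locus = two_locus ** 2

print(f"two-locus frequency: {two_locus:.6f}")             # 0.001554
print(f"about 1 in {1 / two_locus:.0f} persons")           # about 1 in 644
print(f"four-locus: about 1 in {1 / four_locus:,.0f} persons")  # roughly 414,000
```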

Significantly more statistical power for the same loci will be available when appropriate population studies have been carried out, because data based on a more rigorous sampling scheme will make it unnecessary to take an upper 95% confidence limit for each allele frequency or to put such a conservative lower bound (0.10) on each allele frequency. Assuming that the population studies do not reveal significant substructure, the 5% lower bound recommended earlier should be used.

Finally, once appropriate population studies have been conducted and ceiling frequencies estimated under the auspices of NCFDT, population frequency estimates can be based on the ceiling principle (rather than the modified ceiling principle discussed above). Such calculations can never be perfect, but we believe that such a foundation will be sufficient for calculating frequencies that are prudently cautious, i.e., for calculating an upper limit on the frequency of a DNA pattern in the general population. In addition, new scientific techniques (e.g., minisatellite repeat codings [29]) are being developed, and more will follow; they might require re-examination by NCFDT of the statistical issues raised here.

Our recommendations represent an attempt to lay a firm foundation for DNA typing that will be able to support the increasing weight that will be placed on such evidence in the coming years. We recognize that a wide variety of methods for population genetics calculations have been used in previous cases, including some that are less conservative than the approach recommended here. We emphasize that our recommendations are not intended to question previous cases, but rather to chart the most prudent course for the future.

Openness of Population Databanks

Any population databank used to support forensic DNA typing should be openly available for reasonable scientific inspection. Presenting scientific conclusions in a criminal court is at least as serious as presenting scientific conclusions in an academic paper. According to long-standing and wise scientific tradition, the data underlying an important scientific conclusion must be freely available, so that others can evaluate the results and publish their own findings, whether in support or in disagreement. There is no excuse for secrecy concerning the raw data. Protective orders are inappropriate, except for those protecting individuals' names and other identifying information, even for data that have not yet been published or for data claimed to be proprietary. If scientific evidence is not yet ready for both scientific scrutiny and public re-evaluation by others, it is not yet ready for court.

Reporting of Laboratory Error Rates

Laboratory error rates should be measured with appropriate proficiency tests and should play a role in the interpretation of results of forensic DNA typing. As discussed above, proficiency tests provide a measure of the false-positive and false-negative rates of a laboratory. Even in the best of laboratories, such rates are not zero.

A laboratory's overall rate of incorrect conclusions due to error should be reported with, but separately from, the probability of coincidental matches in the population. Both should be weighed in evaluating evidence.
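As a rough illustration of reporting the two figures side by side, the sketch below formats each one separately and notes which is larger. The function name, the output layout, and the error rate in the usage note are hypothetical; the report does not prescribe any particular format or value.

```python
def report_match_statistics(genotype_freq: float, lab_error_rate: float) -> dict:
    """Report the two sources of a false match separately, without combining them.

    genotype_freq: estimated probability of a coincidental match.
    lab_error_rate: laboratory rate of incorrect conclusions, e.g. as
    estimated from proficiency tests (hypothetical input).
    """
    if genotype_freq <= 0.0 or lab_error_rate <= 0.0:
        raise ValueError("both rates must be positive")
    larger = "laboratory error" if lab_error_rate > genotype_freq else "coincidental match"
    return {
        "coincidental_match": f"about 1 in {round(1 / genotype_freq):,}",
        "laboratory_error": f"about 1 in {round(1 / lab_error_rate):,}",
        "larger_source": larger,
    }
```

For example, with a genotype frequency of 1 in 414,000 and a hypothetical laboratory error rate of 1 in 1,000, the error rate is by far the larger figure, which is why both need to be weighed.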

