Background

The Scholastic Aptitude Test (SAT), published by the Educational Testing Service (ETS), is frequently used by colleges to aid in the selection of incoming freshmen. It is also used as a criterion for awarding many academic scholarships. The test is divided into three areas: math, verbal, and the Test of Standard Written English (TSWE). Colleges are primarily interested in the math and verbal scores for admission and may use the TSWE score for placement in college English courses. Supposedly, the test measures aptitude, not intelligence or high school performance. When used as an evaluation tool along with other factors such as high school grade point average, the SAT ideally should help predict how students will perform in college. The ETS stresses that the SAT should not be used "as [the] sole basis for important decisions affecting the lives of individuals, when other information of equal or greater relevance... [is] available" (Lubetkin, 77).

Recently, there has been controversy over the validity of the test because studies have revealed that men consistently score higher on the test than women. In the past, women scored higher than men on the verbal portion of the test while men outscored women on the math section. However, the gap between men's and women's math scores has been increasing despite greater encouragement for women to study math and science. In addition, the edge women used to have on the verbal section has disappeared, and men have started to outscore women on that section too. In 1992, on a scale from 200 to 800 for each part, male college-bound seniors averaged 9 points higher than female college-bound seniors on the verbal section and 43 points higher on the math section (Lewis, ETS). The more troubling differential lies in the math section, both because of the size of the gap between men's and women's performance and because that gap is increasing.

The main question researchers struggle with is why women have better high school and freshman college grades and yet perform less well than men on the SAT. A study by Susan Gross, conducted in Montgomery County schools in Maryland, asserted that girls who took the same advanced math courses as boys and who earned higher grades had math SAT scores that were 33 to 52 points lower. The effects of the differential carry over into higher education because using the SAT in the admission process risks underpredicting women's performance in college. It is likely that a woman with the same SAT score as a man will get better grades in her freshman year than he will (Brush, 409).

There are many theories as to why men are consistently and increasingly outscoring women on the SAT. Studies on the topic have been done both by independent researchers and by commissioned researchers who have a stake in the results. The ETS commissioned its own study, published in 1988, on the relationship between shifts in demography and the widening score gap between men and women. The report analyzes sex differences in scores using background variables such as ethnicity, socioeducational status, basic high school curriculum, and proposed college major (Burton, 1). However, the research was limited to those factors because the researchers gathered their data from an ETS student survey that accompanied the SAT. The survey data are tainted by the problems of non-response, dishonest response, old information, and vague categorization. Students who take the SAT as juniors fill out the questionnaire and then may not update the information if they take the test again as seniors. Therefore, the information on courses taken and possible college majors may be inaccurate. The "vague categorization" refers to the fact that the questionnaire asks questions such as "How many years of math taken?" instead of asking about the type of math taken (4).

With the results from the 1975, 1980, and 1985 SATs, the researchers performed a series of multiple regressions to weigh the impact of the "background variables". After adjusting for lower income, minority status, fewer years of math and science, and other related factors, the researchers could not account for the declining trend of women's SAT scores in relation to men's and accordingly threw out their hypothesis that the gap was related to shifts in demography. Having acknowledged a score differential between the sexes and ruled out their original contention as a cause of the score discrepancy, the researchers assigned the blame to "the way young women are being educated" (14). The report's conclusions shy away from further analysis of SAT content and structure and instead suggest future studies at the high school level of curriculum and of gender differences in mathematical skills (1, 14).
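
As a rough illustration of what adjusting for background variables means in a multiple regression, one generic form is written below. The symbols are assumptions chosen here for illustration, not the specification used in the ETS report: $F_i$ is an indicator equal to 1 for women and 0 for men, and the $x_{ik}$ are background variables such as ethnicity, socioeducational status, curriculum, and proposed major.

```latex
\[
\mathrm{SAT}_i = \beta_0 + \beta_F F_i + \sum_{k=1}^{K} \beta_k x_{ik} + \varepsilon_i
\]
```

Under this form, $\beta_F$ is the average score difference between women and men who share the same background values. If demography explained the gap, $\beta_F$ would shrink toward zero once the $x_{ik}$ were included; the report's finding corresponds to a difference that the background variables did not account for.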

In 1992, Howard Wainer and Linda Steinberg of the ETS released the results of their own follow-up study. In this study, the researchers employed a retrospective analysis to measure the difference between men's and women's math SAT scores. Instead of predicting performance from given SAT scores, these researchers took a sample of 47,000 college men and women, grouped them according to the type of math course taken and the grade received, and then predicted what each sex's math SAT score should be. They found that men scored, on average, 33 points higher on the math section retrospectively, while their prospective research suggested that men outscored women by an even larger margin (Wainer, 323). The retrospective method is of little use to the college admission process, since a college wants to predict how a student will do before he or she is accepted (328). The researchers tried the two methods to provide different perspectives on an old problem but did not draw inferences from the difference in the size of the point gaps, because the two methods were in fact distinct (329, 332).
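
The difference between the two directions of analysis can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Wainer and Steinberg's procedure: the records are invented for illustration, the function names are hypothetical, the retrospective step simply compares mean math SAT scores of men and women matched on course and grade, and the prospective step fits one pooled straight line of grade on SAT score and looks at who earns more than the line predicts.

```python
from collections import defaultdict

# Hypothetical records invented purely for illustration (not real data):
# (sex, math course, course grade in grade points, math SAT score).
RECORDS = [
    ("F", "calculus", 4.0, 600),
    ("F", "calculus", 4.0, 620),
    ("M", "calculus", 4.0, 650),
    ("M", "calculus", 4.0, 650),
    ("F", "calculus", 3.0, 540),
    ("M", "calculus", 3.0, 560),
    ("M", "calculus", 3.0, 580),
    ("F", "algebra", 3.0, 480),
    ("F", "algebra", 3.0, 500),
    ("M", "algebra", 3.0, 530),
]


def mean(values):
    return sum(values) / len(values)


def retrospective_gap(records):
    """Male-minus-female SAT gap among students matched on course and grade.

    Each (course, grade) cell contributes its own gap, weighted by cell size.
    """
    cells = defaultdict(lambda: {"M": [], "F": []})
    for sex, course, grade, sat in records:
        cells[(course, grade)][sex].append(sat)

    weighted_sum, total = 0.0, 0
    for scores in cells.values():
        if scores["M"] and scores["F"]:  # skip cells that lack one sex
            n = len(scores["M"]) + len(scores["F"])
            weighted_sum += n * (mean(scores["M"]) - mean(scores["F"]))
            total += n
    return weighted_sum / total


def prospective_residuals(records):
    """Fit grade = a + b * SAT on all students, then average
    (actual grade - predicted grade) separately for each sex."""
    sats = [r[3] for r in records]
    grades = [r[2] for r in records]
    x_bar, y_bar = mean(sats), mean(grades)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(sats, grades)) / sum(
        (x - x_bar) ** 2 for x in sats
    )
    intercept = y_bar - slope * x_bar

    residuals = defaultdict(list)
    for sex, _course, grade, sat in records:
        residuals[sex].append(grade - (intercept + slope * sat))
    return {sex: mean(vals) for sex, vals in residuals.items()}


if __name__ == "__main__":
    print("retrospective SAT gap:", retrospective_gap(RECORDS))
    print("mean grade residual by sex:", prospective_residuals(RECORDS))
```

In this made-up sample, a positive mean residual for women means the pooled line underpredicts their grades, which is the sense of "underprediction" used in this section. The retrospective gap answers a different question (how far apart are the SAT scores of students who performed the same?), which is one way to see why the two methods need not give the same number of points.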

Again, the ETS researchers do not speculate on the cause of the differential, but the report is radical in that it acknowledges that SAT scores are abused and that such abuse can be detrimental to women. The abuse includes overweighting the SAT or using it as the sole criterion in some admission processes. The test is also employed as a proficiency cut-off for some college math courses and scholarships, which is a potential misuse (330). The researchers suggest three factors that could cause the differential but do not pursue them in this study. The factors could be different selection mechanisms by sex (guidance, role models, etc.), the possibility that the math SAT favors men, or the possibility that grading practices in first-year math courses favor women (331). The researchers conclude that despite the intensive studies previously performed on these possible causes, it is impossible to uncover the truth because of biased sampling, i.e., the impossibility of obtaining and tracking a truly random sample. Toward the end of the report, Wainer and Steinberg pose questions to society and suggest social control in correcting the SAT bias through external methods, such as awarding equal numbers of scholarships to men and women by creating selection pools based on sex (333).

Both ETS reports were vague and avoided making any concrete assessment of the potential inequity within the SAT. In 1989, Phyllis Rosser decided to take a closer look at SAT inequity and wrote an investigative report for the Center for Women Policy Studies in which she scrutinized test content. Rosser asserted that women lost the edge they had on the verbal section because the ETS gradually changed the SAT format to include more questions about science, business, and "practical affairs" and fewer questions on the arts, human relations, and the humanities (Rosser, 29). She studied the SAT through item analysis and through her own survey of a sample of students taken from the Princeton Review coaching class. This relatively uniform sample was chosen so that sex differences could be explored while controlling for income and educational background (30).

Rosser found seven verbal items that significantly favored one sex or the other but also found ten math items that significantly favored men. Of the ten biased math items, three were specifically about boys' enterprises: they concerned a basketball win/loss record and activities at a boys' camp (32). She also concluded that test-taking anxiety and time pressure were not factors that added to the score gap, but she did note that the emphasis placed on the SAT, together with women's lower performance on it, may lead to problems in self-esteem and discourage women from considering selective colleges (44, 45). She suggested that the ETS restructure the test and establish a new proofreading method that better detects "loaded" questions (46).

Arguments have been raised that women get better grades than men in college because they take easier courses and concentrate in the humanities or history. However, the Massachusetts Institute of Technology performed a study of women's course loads, compared their grades with their SAT scores, and found that "women hold their own across subject areas, even though their SAT scores are lower" (Rowe, 23). To compensate for the underprediction based on SAT scores, MIT adjusted its admission process to give greater weight to other criteria. Since 1980, the women admitted to MIT have scored an average of 20 to 25 points lower on the math section of the SAT but have had higher cumulative grade-point averages in 11 of 21 majors, including math, science, and computer science (Brush, 409).

Other colleges besides MIT adjust women's scores formally or informally, but for the most part women are unaware that the adjustment takes place or at which colleges it takes place. Since test scores largely influence where women apply, adjustment will not help if women are discouraged from applying to selective schools in the first place (Rowe, 23).

Another problem associated with the SAT's underprediction of women's performance pertains to the awarding of merit scholarships based on SAT scores. The National Merit Scholarship uses the Preliminary Scholastic Aptitude Test (PSAT), which a student takes in his or her junior year, for first-round screening of qualified students. This puts women at risk of missing the cut-off because of lower scores. The awarding of National Merit Scholarships demonstrates the inequity: two-thirds of the annual scholarships consistently go to men (Wainer, 333).

New York State used to award Empire State and Regents Scholarships based solely on SAT scores, but in a 1989 district court decision a judge ruled that the method of awarding had to be changed because the practice discriminated against women and therefore violated Title IX and the equal protection clause of the Constitution (Walker, 2). Judge Walker was satisfied by expert testimony that the SAT was not designed to measure academic performance, which is what the scholarships are supposed to reward. He also stated that misusing the SAT to measure academic performance underpredicted women's academic performance as compared to men's (2).

An important aspect of the case was that the state had used different procedures for awarding the scholarships in prior years. Some interesting results follow:

In 1988, under the procedure using a combination of grades and SATs weighted equally, women received substantially more Regents and Empire Scholarships than in all prior years in which the SAT had been the sole criterion. In both 1987 and 1988, young women comprised approximately 54 percent of the applicant pool for the scholarship, yet the results in 1988 when grades and SATs were used were markedly different. The results are summarized as follows:

When GPAs were used in 1988, the mean GPAs were: 85 for females and 84.4 for males. (Walker, 8)

The statistics in the paragraph above, along with other studies, call into question the validity of using the SAT at all in the awarding of scholarships or in the college admission process. Middlebury College, Union College, Bates College, and a few other colleges have abandoned the SAT as an admission evaluation tool. The admission director of Bates said that the college did not stop using the SAT because of discrimination but had done a study and found "the test didn't tell them anything they couldn't learn equally well from high school grades and achievement tests" (Rowe, 23).

Furthermore, a study done at the University of Pennsylvania, published in 1992, concluded that class rank and achievement tests were significant in predicting college grades while the SAT made a relatively small contribution to prediction. The study recommended a greater reliance on class rank and achievement tests as effective and reasonable grade predictors. The researchers also suggested that the predictive validity of SAT scores might not be great enough to warrant the test's use and the financial costs it incurs (Baron and Norman, 1054).

If the SAT is of little statistical predictive value to begin with, then its overemphasis and frequent misuse in college admissions and scholarship awarding are detrimental to women, because the test has been shown to underpredict women's academic performance. Given the SAT bias problems, is the SAT worth the millions of dollars families spend on it each year? Can the use of the SAT be justified in the college admissions process and scholarship awarding, knowing that the test is frequently abused by evaluators? By placing value on the SAT, are higher educational institutions ignoring inequity between the sexes? Do any of the studies performed on the SAT bias issue reveal the truth?

