                                                                      NOT PRECEDENTIAL

                        UNITED STATES COURT OF APPEALS
                             FOR THE THIRD CIRCUIT


                                       No. 09-3512


                          SHARYN STAGI, individually and on
                           behalf of all others similarly situated;
                                   WINIFRED LADD,
                                                        Appellants

                                             v.

                         NATIONAL RAILROAD PASSENGER
                          CORPORATION, t/d/b/a AMTRAK


                        Appeal from the United States District Court
                          for the Eastern District of Pennsylvania
                              (D.C. Civil No. 2-03-cv-05702)
                         District Judge: Honorable Anita B. Brody


                                   Argued May 28, 2010

                         Before: McKEE, Chief Judge,
                    RENDELL and STAPLETON, Circuit Judges.

                                 (Filed: August 16, 2010)




Ari R. Karpf, Esq.
Karpf, Karpf & Virant
3070 Bristol Pike
Building 2, Suite 231
Bensalem, PA 19020

Timothy M. Kolman, Esq.
Michael F. Mirarchi, Esq.
Timothy M. Kolman & Associates
414 Hulmesville Avenue
Penndel, PA 19047

Scott M. Lempert, Esq.
Alan M. Sandals, Esq. [ARGUED]
Sandals & Associates
One South Broad Street
Suite 1850
Philadelphia, PA 19107
  Counsel for Appellants

Sarah Andrews, Esq.
Morgan, Lewis & Bockius
301 Grant Street
One Oxford Centre, Suite 3200
Pittsburgh, PA 15219

James E. Bayles, Jr., Esq.
Morgan, Lewis & Bockius
77 West Wacker Drive
6th Floor
Chicago, IL 60601

William J. Delany, Esq. [ARGUED]
Morgan, Lewis & Bockius
1701 Market Street
Philadelphia, PA 19103
  Counsel for Defendant


                             OPINION OF THE COURT


RENDELL, Circuit Judge.

      Plaintiffs Sharyn Stagi and Winifred Ladd brought a class action against the



National Railroad Passenger Corporation (“Amtrak”), asserting that a company policy

requiring all union employees to have one year of service in their current position before

they could be considered for promotion has a disparate impact on female union

employees in violation of Title VII of the Civil Rights Act of 1964, 42 U.S.C. § 2000e,

and the Equal Protection component of the Due Process Clause of the Fifth Amendment.

The District Court, presented with motions for class certification and for summary

judgment, granted summary judgment in favor of Amtrak, finding that “the plaintiffs’

evidence of disparate impact lack[ed] both statistical and practical significance,” thus

concluding that “the plaintiffs have failed to make out a prima facie case of

discrimination under Title VII.” Stagi v. Nat’l R.R. Passenger Corp., Civ. No. 03-5702,

2009 WL 2461892, at *1 (E.D. Pa. Aug. 12, 2009) (Stagi II).

       Although it is a close call, we will reverse and remand for further proceedings

consistent with this opinion.

                                             I.

       At issue in this case is Amtrak’s policy referred to as the “one-year blocking rule.”

Under that rule, a union member must be in her current union position for at least one

year in order to be eligible for promotion into a management position. The policy states,

“[a]n agreement covered employee may not apply for a posted non-agreement covered




position unless he or she has been in his or her current union for one year.” App. 299.1

The rule has no exceptions. The rule was first promulgated on May 1, 1994 and was

revised in September 2000, which revision was in force during the time period relevant

for this case.

       Plaintiffs Stagi and Ladd are long-time Amtrak employees who have been

employed in both its union and management ranks during their careers. Stagi began her

career at Amtrak in 1973 as a reservation and information clerk, and eventually worked

her way up to various union positions until the early 1990s, when she was promoted to a

management position. She was in a management position in April 2002 when she was

laid off as a result of a corporate-wide management restructuring effort. Ladd was

promoted to management in 1986 and continued to be promoted through management

until April 2002, when her job was similarly eliminated. Because they had previously

worked in Amtrak’s union ranks, they were both entitled to “bump down” into a union

position based on their retained union seniority. In the year following their layoffs, both

applied for management vacancies, some of which they had previously held or




       1
         Although the policy says “in his or her current union[,]” the parties agree that the
policy has been interpreted and applied by Amtrak as blocking an employee who has not
been in his or her current union position for at least one year. See Stagi II, 2009 WL
2461892, at *1 n.4 (“Although the way the rule is written appears to prevent
consideration of agreement-covered employees based on time-in-current-union, since at
least 1999 or 2000, the Policy has been applied consistently to consider time-in-position,
not time-in-union . . . . The language of the policy [sic] was changed in 2004 (after the
commencement of this litigation).”).

supervised. They were both blocked by the one-year rule from being considered for those

positions. Stagi remains in her union position. Ladd was not able to return to

management before 2004, when she left on long-term disability and retired with benefits

inferior to those she would have enjoyed had she been permitted to access a management

position.

       In October 2003, Stagi filed a class complaint, and later amended it to add Ladd.

Plaintiffs’ complaint alleges that Amtrak violated Title VII, 42 U.S.C. § 2000e et seq.,

and the Equal Protection component of the Due Process Clause of the Fifth Amendment

by adopting and applying the one-year rule to plaintiffs.

       In May 2005, Amtrak moved for judgment on the pleadings under Rule 12(c) of

the Federal Rules of Civil Procedure. The District Court denied Amtrak’s motion, holding

that plaintiffs had “made out a prima facie case” of disparate impact by the blocking rule

at issue here. Stagi v. Amtrak, 407 F. Supp. 2d 671, 676 (E.D. Pa. 2005) (Stagi I).

       The District Court held a discovery conference on January 2, 2006, and plaintiffs

moved to compel production of discovery material related to the qualifications of the

various management positions as well as the work histories and other qualifications of

union employees who might have been qualified for management positions (although they

might be blocked by the one-year rule). The court held additional discovery conferences

on April 4, 2007 and May 4, 2007. One of the issues discussed at each conference was

the use and availability of qualifications data. Amtrak subsequently produced certain



employee data in July 2007. Based in part on this data, plaintiffs submitted an expert

report by Mark R. Killingsworth on October 23, 2007. Amtrak submitted a responsive

expert report by David W. Griffin on January 25, 2008.

       Plaintiffs filed a motion for class certification under Rule 23 on February 29, 2008.

Before that motion was fully briefed, Amtrak moved for summary judgment on April 21,

2008. Briefing was complete for the class certification motion on June 6, 2008 and for

the summary judgment motion on November 17, 2008. A hearing was held on July 21,

2009, at which each party’s expert testified. By memorandum and order dated August 12,

2009, the District Court granted Amtrak’s summary judgment motion.2 Stagi II, 2009 WL

2461892, at *13. Plaintiffs timely appealed.3

                                            II.

                            A. Title VII and Disparate Impact

       Under Title VII of the Civil Rights Act of 1964, it is unlawful for an employer to

“limit, segregate, or classify his employees or applicants for employment in any way

which would deprive or tend to deprive any individual of employment opportunities or

otherwise adversely affect his status as an employee, because of such individual’s race,

       2
         On appeal, plaintiffs argue that the District Court erred in ruling on the summary
judgment motion when it did because the District Court had informed the parties that the
July 21 hearing would be limited to questions relating to class certification. Because we
will reverse the District Court’s order granting summary judgment on other grounds, we
need not decide this issue.
       3
        The District Court had subject matter jurisdiction under 28 U.S.C. §§ 1331 and
1343(a)(4). We have jurisdiction under 28 U.S.C. § 1291.

color, religion, sex, or national origin.” 42 U.S.C. § 2000e-2(a)(2). This prohibition

against disparate impact is distinct from disparate treatment by an employer, which

requires a showing of discriminatory intent. Under Section 2000e-2(a)(2), an otherwise

facially neutral business practice that disproportionately affects or impacts a protected

group may be unlawful. Griggs v. Duke Power Co., 401 U.S. 424, 431 (1971); see also

Lanning v. SEPTA, 181 F.3d 478, 485 (3d Cir. 1999). “Title VII strives to achieve

equality of opportunity by rooting out artificial, arbitrary, and unnecessary

employer-created barriers to professional development that have a discriminatory impact

upon individuals.” Connecticut v. Teal, 457 U.S. 440, 451 (1982) (internal quotation

marks omitted). Accordingly, the Supreme Court has noted that “[i]n considering claims

of disparate impact . . . this Court has consistently focused on employment and promotion

requirements that create a discriminatory bar to opportunities. This Court has never . . .

requir[ed] the focus to be placed . . . on the overall number of minority or female

applicants actually hired or promoted.” Id. at 450.

       A prima facie case of disparate impact discrimination has two components. First,

a plaintiff must identify “the specific employment practice that is challenged.” Watson v.

Ft. Worth Bank & Trust, 487 U.S. 977, 994 (1988). Second, the plaintiff must show that

the employment practice “causes a disparate impact on the basis of race, color, religion,

sex, or national origin.” 42 U.S.C. § 2000e-2(k)(1)(A)(i). To show causation, the

plaintiff must present “statistical evidence of a kind and degree sufficient to show that the



practice in question has caused exclusion of applicants for jobs or promotions because of

their membership in a protected group.” Watson, 487 U.S. at 994; see also EEOC v.

Greyhound Lines, 635 F.2d 188, 193 (3d Cir. 1980).

       If a plaintiff makes out a prima facie case, the burden shifts to the employer to

show that the employment practice at issue is job related for the position in question and

is consistent with business necessity.4 Watson, 487 U.S. at 994; 42 U.S.C. § 2000e-

2(k)(1) (clarifying that to maintain a claim, plaintiff must make out a prima facie case and

the employer must then “fail[] to demonstrate that the challenged practice is job related

for the position in question and consistent with business necessity”).5

                                  B. The Prima Facie Case

       As the District Court noted, there is no “rigid mathematical formula” courts can

mandate or apply to determine whether plaintiffs have established a prima facie case.

Stagi II, 2009 WL 2461892, at *3. If statistical evidence is used, as it typically will be in

disparate impact cases, it must be “sufficiently substantial” to raise “an inference of

causation.” Id. (quoting Watson, 487 U.S. at 994-95). The Supreme Court has not

provided any definitive guidance about when statistical evidence is sufficiently



       4
         The District Court did not reach the issue of business necessity because it held
that plaintiff failed to establish a prima facie case and ended its inquiry.
       5
         The statute also allows plaintiff to show that an alternative employment practice
exists that has a less disparate impact and would also serve the business’s legitimate
interest and the employer refuses to adopt it. 42 U.S.C. § 2000e-2(k)(1)(A)(ii); Lanning,
181 F.3d at 489-90. This alternative is not relevant here.

substantial, but a leading treatise notes that “[t]he most widely used means of showing

that an observed disparity in outcomes is sufficiently substantial to satisfy the plaintiff’s

burden of proving adverse impact is to show that the disparity is sufficiently large that it

is highly unlikely to have occurred at random.” 1 B. Lindemann & P. Grossman,

Employment Discrimination Law 124 (4th ed. 2007) (hereinafter “Lindemann &

Grossman”). This is typically done by the use of tests of statistical significance, which

determine the probability of the observed disparity obtaining by chance.

       There are two related concepts associated with statistical significance: measures of

probability levels and standard deviation. Probability levels (also called “p-values”) are

simply the probability that the observed disparity is random—the result of chance

fluctuation or distribution. For example, a 0.05 probability level means that one would

expect to see the observed disparity occur by chance only one time in twenty cases—there

is only a five percent chance that the disparity is random. A standard deviation is a unit

of measurement that allows statisticians to measure all types of disparities in common

terms.6 In this context, the greater the number of standard deviations from the mean, the

greater the likelihood that the observed result is not due to chance. To offer some sense


       6
         Technically, a standard deviation is defined as “a measure of spread, dispersion,
or variability of a group of numbers equal to the square root of the variance of that group
of numbers.” D. Baldus & J. Cole, Statistical Proof of Discrimination 359 (1980). The
“variance” of the group of numbers is computed by subtracting the “mean,” or average, of
all the numbers, “squaring the resulting difference, and computing the mean of these
squared differences.” Id. at 361.


of the relationship between these two measures, two standard deviations correspond

roughly to a probability level of 0.05; three standard deviations correspond to a

probability level of 0.0027. See Lindemann & Grossman 126 n.85 and accompanying

text.
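
The correspondence between standard deviations and probability levels described above can be checked numerically. What follows is only a minimal sketch, assuming a two-sided test under a normal approximation; the function name is illustrative and does not come from the record or from the cited treatise.

```python
import math


def two_sided_p_value(num_std_devs: float) -> float:
    """Two-sided tail probability of a standard normal variable.

    Assumption: the test statistic is approximately normal, so that
    P(|Z| >= z) = erfc(z / sqrt(2)) for Z ~ N(0, 1).
    """
    return math.erfc(abs(num_std_devs) / math.sqrt(2))


# Print the probability level for a few standard-deviation thresholds.
for z in (1.96, 2.0, 3.0):
    print(f"{z:.2f} standard deviations -> p = {two_sided_p_value(z):.4f}")
```

Under these assumptions, 1.96 standard deviations yields a probability level of about 0.05, two standard deviations about 0.0455, and three standard deviations about 0.0027, which is consistent with the rough equivalences stated in the text.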

        As a legal matter, the Supreme Court has stated that “[a]s a general rule for . . .

large samples, if the difference between the expected value and the observed number is

greater than two or three standard deviations, then the hypothesis that the [result] was

random would be suspect to a social scientist.” Castaneda v. Partida, 430 U.S. 482, 496

n.17 (1977). Additionally, many courts accept a 0.05 probability level (or below) as

sufficient to rule out the possibility that the disparity occurred at random. See, e.g.,

Waisome v. Port Auth., 948 F.2d 1370, 1376 (2d Cir. 1991) (“Social scientists consider a

finding of two standard deviations significant, meaning there is about one chance in 20

that the explanation for a deviation could be random and the deviation must be accounted

for by some factor other than chance.” (citation omitted)); Palmer v. Shultz, 815 F.2d 84,

92-96 (D.C. Cir. 1987) (noting that “statistical evidence meeting the .05 level of

significance . . . [is] certainly sufficient to support an inference of discrimination”

(citation and internal quotation marks omitted, alterations in original)).

        In addition to using formal measures of statistical significance, some courts have

also relied upon the “80 percent rule” from the Equal Employment Opportunity

Commission’s (EEOC) Uniform Guidelines on Employee Selection Procedures to assess



whether a plaintiff has established a prima facie disparate impact case. See, e.g., Stout v.

Potter, 276 F.3d 1118, 1124 (9th Cir. 2002) (applying “four-fifths rule” and calling it

“rule of thumb” courts use when considering adverse impact of selection procedures);

Boston Police Superior Officers Fed’n v. City of Boston, 147 F.3d 13, 21 (1st Cir. 1998)

(affirming district court’s use of four-fifths rule in context of consent decree, holding that,

although “violation of the four-fifths rule, standing alone, is not conclusive evidence of

discrimination,” it nonetheless serves as an “appropriate benchmark”); Smith v. Xerox

Corp., 196 F.3d 358, 365 (2d Cir. 1999) (finding EEOC Guidelines “persuasive”). These

Guidelines are codified at 29 C.F.R. § 1607.4(D), entitled “Adverse impact and the ‘four-

fifths rule,’” and they state, in relevant part,

               A selection rate for any race, sex, or ethnic group which is
               less than four-fifths (4/5) (or eighty percent) of the rate for the
               group with the highest rate will generally be regarded by the
               Federal enforcement agencies as evidence of adverse impact,
               while a greater than four-fifths rate will generally not be
               regarded by Federal enforcement agencies as evidence of
               adverse impact.

29 C.F.R. § 1607.4(D).
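
The arithmetic behind the four-fifths comparison quoted above can be sketched as follows. This is an illustration only: the group labels and counts are hypothetical and are not drawn from the record, and the helper names are assumptions introduced for this example.

```python
def selection_rate(selected: int, total: int) -> float:
    """Fraction of a group that was selected."""
    if total <= 0:
        raise ValueError("total must be positive")
    return selected / total


def impact_ratios(rates: dict) -> dict:
    """Each group's selection rate divided by the highest group's rate."""
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group has a positive selection rate")
    return {group: rate / highest for group, rate in rates.items()}


def below_four_fifths(rates: dict) -> dict:
    """True for each group whose rate is less than 80% of the highest rate."""
    return {group: ratio < 0.8 for group, ratio in impact_ratios(rates).items()}


# Hypothetical counts: 48 of 80 men selected (60%), 27 of 60 women selected (45%).
rates = {
    "men": selection_rate(48, 80),
    "women": selection_rate(27, 60),
}
ratios = impact_ratios(rates)
flags = below_four_fifths(rates)
print(ratios, flags)
```

In this hypothetical, the women's selection rate is roughly 75% of the men's rate, which falls below the four-fifths benchmark. As the Guidelines themselves indicate, the comparison is a rule of thumb and not a test of statistical significance.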

       EEOC Guidelines are entitled only to Skidmore deference, Skidmore v. Swift &

Co., 323 U.S. 134, 140 (1944), under which EEOC Guidelines “get[] deference in

accordance with the thoroughness of [their] research and the persuasiveness of [their]

reasoning.” El v. SEPTA, 479 F.3d 232, 244 (3d Cir. 2007) (citing EEOC v. Arab




American Oil Co., 499 U.S. 244, 257 (1991)).7 The “80 percent rule” or the “four-fifths

rule” has come under substantial criticism, and has not been particularly persuasive, at

least as a prerequisite for making out a prima facie disparate impact case. The Supreme

Court has noted that “[t]his enforcement standard has been criticized on technical grounds

. . . and it has not provided more than a rule of thumb for the courts.” Watson, 487 U.S. at 995

n.3. See also Lindemann & Grossman 130 (noting that the 80 percent rule “is inherently

less probative than standard deviation analysis”); E. Shoben, Differential Pass-Fail Rates

in Employment Testing: Statistical Proof Under Title VII, 91 Harv. L. Rev. 793, 806

(1978) (arguing that the “four-fifths rule should be abandoned altogether” and that “flaws

in the four-fifths rule can be eliminated by replacing it with a test of . . . statistical

significance”).

       Another non-statistical standard that has been discussed in the context of assessing

whether a plaintiff has made out a prima facie case is the requirement that the disparity

have “practical significance.” 8 For example, Lindemann and Grossman write that “[t]o

       7
          It is worth noting that although the Supreme Court initially said that EEOC
Guidelines were entitled to “great deference,” the Supreme Court itself has made it clear
that this is not the case. As we noted in El v. SEPTA: “It does not appear that the
EEOC’s Guidelines are entitled to great deference. While some early cases so held in
interpreting Title VII, Griggs, 401 U.S. at 434 . . . more recent cases have held that the
EEOC is entitled only to Skidmore deference.” 479 F.3d at 244 (citing Arab American
Oil, 499 U.S. at 257).
       8
         A related concern, that the statistical disparity be “substantial,” has been held out
as an additional requirement for a plaintiff’s prima facie case. See, e.g., Thomas v.
Metroflight, Inc., 814 F.2d 1506, 1511 n.4 (10th Cir. 1987) (suggesting that courts may
require, in addition to statistical significance, that the observed disparity be substantial).

guard against the possibility that a finding of adverse impact could result from the

statistical significance of a trivial disparity or a meaningless difference in results, the

Uniform Guidelines on Employee Selection Procedures and some courts have adopted an

additional test for adverse impact: that a statistically significant disparity also has

practical significance.” Lindemann & Grossman 131 (citations omitted).

       We can identify no Court of Appeals that has found “practical significance” to be a

requirement for a plaintiff’s prima facie case of disparate impact, including the Third




This requirement, however, appears to be derived from the Supreme Court’s early
disparate impact cases that were decided prior to the use of formal notions of statistical
significance as the means by which causation was to be demonstrated. In these early
formulations of the causation requirement, rather than requiring a particular level of
statistical significance, the Supreme Court required that the relevant rule had a
“substantially” disproportionate effect. See, e.g., Griggs, 401 U.S. at 426 (examining
“requirements [that] operate[d] to disqualify Negroes at a substantially higher rate than
white applicants”); Albemarle Paper Co. v. Moody, 422 U.S. 405, 425 (1975) (plaintiffs
are required to show “that the tests in question select applicants for hire or promotion in a
racial pattern significantly different from that of the pool of applicants”); Washington v.
Davis, 426 U.S. 229, 246-47 (1976) (“hiring and promotion practices disqualifying
substantially disproportionate number of blacks”); Dothard v. Rawlinson, 433 U.S. 321,
329 (1977) (employment standards that “select applicants for hire in a significantly
discriminatory pattern”). The Supreme Court has made it clear that the “substantial”
language was meant to address the plaintiff’s burden to demonstrate causation. As the
Supreme Court noted in Watson, the Supreme Court’s “formulations . . . have consistently
stressed that statistical disparities must be sufficiently substantial that they raise . . . an
inference of causation,” in other words, that the statistical disparities are adequate to
“show that the practice in question has caused the exclusion of applicants for jobs or
promotions because of their membership in a protected group.” 487 U.S. at 994-95
(O’Connor, J., plurality opinion) (emphasis added). The requirement of “substantiality”
was not meant to introduce an additional burden on the plaintiff above that of offering
evidence of causation.


Circuit. The “practical significance” language stems from the EEOC Uniform Guidelines

on Employee Selection Procedures, which note that “[s]maller differences in selection

rate may nevertheless constitute adverse impact, where they are significant in both

statistical and practical terms.” 29 C.F.R. § 1607.4(D) (emphasis added). However, even

the non-binding EEOC Guidelines only suggest that “practical significance” might be a

requirement when differences in the selection rate were greater than eighty percent. Id.

The one case identified by Lindemann and Grossman, Waisome, noted that the EEOC

Guidelines, including the aforementioned one, “provide no more than a rule of thumb to

aid in determining whether an employment practice has a disparate impact.” 948 F.2d at

1376 (internal quotation marks and citation omitted), cited in Lindemann & Grossman

131 n.98. The Second Circuit Court of Appeals in Waisome did disregard a finding of

statistical significance (2.68 standard deviations), but on the grounds that the African-

American pass rate for a written examination was 87% of the white pass rate, and that the

statistical significance of the disparity would disappear if just two additional African-

American candidates, out of a total of 64 African-American candidates, had passed the

written examination. 948 F.2d at 1376-77. Other courts have also found that, in cases

where the “statistical significance” of the results would disappear if the numbers were

altered very slightly, the plaintiff failed to make out a prima facie case. See, e.g., Apsley

v. Boeing Co., --- F. Supp.2d ---, No. 05-1368, 2010 WL 2670880, at *18 (D. Kan. June

30, 2010) (noting that “[s]tatistical significance does not tell us whether the disparity we



are observing is meaningful in a practical sense nor what may have caused the disparity,”

and finding that because of the fact that if “forty-eight more people over the age of 40

would have been hired, Plaintiffs’ hiring statistics would not have been statistically

significant,” plaintiffs failed to establish a prima facie case). As “practical” significance

has not been adopted by our Court, and no other Court of Appeals requires a showing of

practical significance, we decline to require such a showing as part of a plaintiff’s prima

facie case.

       In sum, to establish a prima facie case of disparate impact in a Title VII case, a

plaintiff must (1) identify a specific employment policy or practice of the employer and

(2) proffer evidence, typically statistical evidence, (3) of a kind and degree sufficient to

show that the practice in question has caused exclusion of applicants for jobs or

promotions (4) because of their membership in a protected group. See Watson, 487 U.S.

at 994. To meet her burden with respect to (3), a plaintiff will typically

have to demonstrate that the disparity in impact is sufficiently large that it is highly

unlikely to have occurred at random, and to do so by using one of several tests of

statistical significance. There is no precise threshold that must be met in every case, but a

finding of statistical significance with a probability level at or below 0.05, or at 2 to 3

standard deviations or greater, will typically be sufficient. See Castaneda, 430 U.S. at

496 n.17.
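
One common way to express a disparity in standard deviations and as a probability level is a pooled two-proportion z-test. The sketch below is offered only as an illustration of that general technique under a normal approximation. It is not the corrected probit analysis used by plaintiffs' expert, and the counts are hypothetical and not taken from the record.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-proportion z-test.

    x1 of n1 members of group 1, and x2 of n2 members of group 2, are
    affected by the practice. Returns (z, two_sided_p).
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("group sizes must be positive")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if std_err == 0:
        raise ValueError("no variation in outcomes; test is undefined")
    z = (p1 - p2) / std_err
    # Two-sided tail probability under the normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical: 90 of 300 women blocked (30%) versus 140 of 700 men blocked (20%).
z, p_value = two_proportion_z_test(90, 300, 140, 700)
print(f"z = {z:.2f} standard deviations, p = {p_value:.4f}")
print("at or below 0.05:", p_value <= 0.05)
```

In this hypothetical the disparity exceeds three standard deviations, so it would fall within the range that the text describes as typically sufficient. Whether a simple test of this kind is appropriate in a given case depends on the structure of the data, for example whether the same individual appears more than once.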

                               III. The District Court Decision



       As noted above, the District Court granted Amtrak’s summary judgment motion on

the grounds that Plaintiffs failed to carry their burden of presenting a prima facie case of

disparate impact. This decision was based on two main considerations: (1) that “the

applicant pool plaintiffs analyzed to demonstrate the disparate impact of Amtrak’s policy

erroneously compares employees who may not have the minimal qualifications for the

particular jobs at issue,” and (2) that “when viewed in context, plaintiffs’ evidence of

discrimination lacks practical significance.” Stagi II, 2009 WL 2461892, at *13. The

District Court’s reasoning behind these conclusions is nuanced and worth considering in

some detail.

       The District Court, in laying out the standard for a prima facie disparate impact

case, correctly noted that the plaintiff does not need to offer proof of the employer’s

subjective intent to discriminate, but that, instead, she must “first identify the specific

employment practice that is challenged” and then she must “show causation” by offering

“statistical evidence of a kind and degree sufficient to show that the practice in question

has caused the exclusion of applicants for jobs or promotions because of their

membership in a protected group.” Stagi II, 2009 WL 2461892, at *3 (internal quotation

marks and citations omitted). The District Court also noted that the “statistical disparities

must be sufficiently substantial such that they raise an inference of causation.” Id.

(internal quotation marks and citation omitted).

       The District Court then stated that there is no “rigid mathematical formula that



satisfies the sufficiently substantial standard in the disparate impact analysis.” Id.

(internal quotation marks and citation omitted). But rather than discuss the importance of

various measures of statistical significance, particularly with respect to demonstrating that

the disparity is unlikely to have been the product of chance, the District Court instead

referenced the EEOC Guidelines “eighty percent” rule. The District Court stated that “the

Supreme Court has indicated that the guidance of this administrative body should be

considered with ‘great deference,’ and no consensus has developed around any alternative

standard.” Id. (quoting Griggs, 401 U.S. at 433-34). The District Court did note that this

rule “is not intended to be an absolute requirement.” Id.

       Applying its statement of the law to the facts of the case before it, the District

Court noted that Plaintiffs satisfied the first part of their prima facie case by identifying

the one-year rule as the specific employment practice being challenged. Id. at *4. The

District Court then conducted an extended discussion of the statistical evidence of

disparate impact offered by Plaintiffs in the form of the expert report of Dr.

Killingsworth, and the criticism of that report by Amtrak’s expert, Dr. Griffin.

       The District Court found that the one-year rule makes this situation equivalent to

an “entrance requirement” case, which means that the pool of actual applicants for the

position will under-represent those who would otherwise qualify, because the requirement

itself would discourage the people who are claiming that the requirement has a disparate

impact from applying. Id. at *5. The District Court noted that “[i]n such cases, it is



proper to establish disparate impact through reference to a reasonable proxy for the pool

of individuals actually affected by the alleged discrimination.” Id. (internal quotation

marks and citation omitted).

       The District Court then discussed Dr. Killingsworth’s method for creating proxy

pools. The key part of Dr. Killingsworth’s method of creating the proxy pools is this

multi-step process:

       (1) Identify each management vacancy occurring during the time at issue (between
              March 8, 2002 and June 30, 2007).

       (2) Of that full set of vacancies, isolate the vacancies that were filled by a union
              employee (which we will refer to as a “job fill”).

       (3) For each successful union employee, identify the job title that the union
              employee had prior to getting the management job (which we will refer to
              as a “feeder job”).

       (4) Define a “Feeder Pool” for a particular management vacancy as the set of
              people who had the same job title as the successful candidate for that
              vacancy on the date just before the vacancy was filled.

Dr. Killingsworth’s model, using the above approach, identified 716 separate “Feeder

Pools,” each tied to a specific management vacancy, at a specific point in time. Each

entry in a pool is called a “candidacy,” rather than a candidate or person because the same

potential applicants (or people) could be in more than one Feeder Pool. After discussing

Dr. Killingsworth’s method of creating the Feeder Pools, the District Court found that

“[b]ased on the information provided to Dr. Killingsworth by Amtrak, plaintiffs’ method

is a reasonable one.” Id. at *6.
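
The four-step construction of Feeder Pools described above can be sketched in code. This is a simplified illustration and not Dr. Killingsworth's actual model: the record layouts (Vacancy, Stint), the job titles, and the employee identifiers are all hypothetical, and the sketch assumes that the date on which a vacancy was filled stands in for the date on which it occurred.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Vacancy:              # hypothetical record layout
    vacancy_id: str
    filled_on: date
    filled_by: str          # employee id of the successful candidate
    filled_by_union: bool   # True if that candidate came from a union job


@dataclass(frozen=True)
class Stint:                # one employee's time in one job title
    employee_id: str
    job_title: str
    start: date
    end: Optional[date]     # inclusive; None means still in the job


def holds_on(stint: Stint, day: date) -> bool:
    return stint.start <= day and (stint.end is None or day <= stint.end)


def build_feeder_pools(vacancies, stints, window_start, window_end):
    """Map each qualifying vacancy id to the sorted ids in its Feeder Pool."""
    pools = {}
    for v in vacancies:
        # Step 1: keep vacancies within the period at issue.
        if not (window_start <= v.filled_on <= window_end):
            continue
        # Step 2: keep only "job fills" by a union employee.
        if not v.filled_by_union:
            continue
        # Step 3: the "feeder job" is the title held just before the fill.
        day_before = v.filled_on - timedelta(days=1)
        feeder = next((s.job_title for s in stints
                       if s.employee_id == v.filled_by and holds_on(s, day_before)),
                      None)
        if feeder is None:
            continue
        # Step 4: everyone holding that title on that date is in the pool.
        pools[v.vacancy_id] = sorted({s.employee_id for s in stints
                                      if s.job_title == feeder and holds_on(s, day_before)})
    return pools


stints = [
    Stint("e1", "Ticket Clerk", date(2001, 1, 1), date(2003, 5, 31)),
    Stint("e1", "Manager", date(2003, 6, 1), None),
    Stint("e2", "Ticket Clerk", date(2003, 1, 15), None),
    Stint("e3", "Conductor", date(2000, 1, 1), None),
]
vacancies = [
    Vacancy("v1", date(2003, 6, 1), "e1", True),
    Vacancy("v2", date(2001, 1, 1), "e3", True),   # outside the period at issue
]
pools = build_feeder_pools(vacancies, stints, date(2002, 3, 8), date(2007, 6, 30))
candidacies = sum(len(members) for members in pools.values())
print(pools, candidacies)
```

Because the pools are keyed by vacancy, the same employee can appear in several pools, which is why each entry is counted as a "candidacy" and not as a distinct person.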



       The District Court objected, however, to Dr. Killingsworth’s decision to

“aggregate” all of the individual Feeder Pools into “one giant pool” (the “Aggregated

Pool”) in order to analyze “the degree to which the Policy disqualified women in the

Aggregated Pool relative to men.” Id. Specifically, Dr. Killingsworth combined all 716

individual Feeder Pools into one large pool in order to conduct his statistical analysis.

The District Court noted that when Dr. Killingsworth analyzed the data using a “corrected

probit analysis” (which corrects for the fact that the same individual might appear in more

than one pool), the results yielded a standard deviation of 3.855, with a p-value of less

than 0.001—results which the District Court acknowledged were “unlikely to have

occurred as a result of chance alone.” 9 Id.
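
As a rough check on how a "standard deviation" figure relates to a p-value, the sketch below converts a z-score into a two-sided p-value under a standard normal distribution. It assumes that the reported 3.855 can be read as a z-score and that a two-sided test was used; the opinion does not say which reference distribution the corrected probit analysis relied on, so this is illustrative only.

```python
import math

def two_sided_p_value(z: float) -> float:
    """Two-sided tail probability of a standard normal at |z|."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Figure reported in the opinion, treated here as a z-score (an assumption).
p_corrected = two_sided_p_value(3.855)

# Common benchmarks of 2 and 3 standard deviations, for comparison.
p_two = two_sided_p_value(2.0)
p_three = two_sided_p_value(3.0)

print(f"z = 3.855 -> p = {p_corrected:.6f}")
print(f"z = 2     -> p = {p_two:.4f}")
print(f"z = 3     -> p = {p_three:.4f}")
```

Under these assumptions a z-score of 3.855 falls below the 0.001 level, consistent with the figure the District Court reported.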

       Despite the statistical significance of this result, however, the District Court found

that Plaintiffs had not done enough to carry their prima facie burden. First, the District

Court was convinced by Amtrak’s argument that Dr. Killingsworth’s analysis was flawed,

and that the statistical significance of his result was thus irrelevant. Amtrak’s expert, Dr.

Griffin, offered a report demonstrating that if one does not combine the 716 Feeder Pools

into one large Aggregated Pool, and if, instead, one just examines whether women in each

individual Feeder Pool were ineligible at a greater than expected level (given the

ineligibility rate of that particular pool), one does not find that women were

disadvantaged relative to men at a statistically significant level.

       9
         The District Court also noted that using an “uncorrected” conventional chi-square
test to analyze the data, Dr. Killingsworth’s results were even more statistically
significant (in terms of being unlikely to have occurred at random), with a standard
deviation measure of 8.42.

       Dr. Griffin determined this by first determining the percentage of ineligible men

and women in a particular Feeder Pool (i.e., if 50 out of 500 people are blocked, the total

ineligibility rate would be 10%). Next, Dr. Griffin multiplied that percentage by the total

number of women in the pool to determine the number of “expected” ineligibles (i.e., if

there were 300 women in the pool, multiplied by 10%, one would expect 30 women in the

pool to be ineligible). Finally, Dr. Griffin compared the “expected” number of ineligible

women with the actual number of ineligible women in the pool, to assess whether there

was a shortfall or a surplus of ineligible women in that particular pool, relative to what

was expected (i.e., if 20 women were actually ineligible, then there would be a shortfall

of 10 women—10 fewer women were ineligible than would be expected given the Feeder

Pool’s particular ineligibility rate as a whole).
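
The arithmetic of this pool-by-pool comparison can be sketched as follows. The first pool uses the figures from the illustration above (500 people, 50 ineligible, 300 women, 20 ineligible women); the second pool and the `Pool` type are hypothetical and are included only to show how per-pool surpluses and shortfalls would net out when summed. This is a sketch of the described calculation, not Dr. Griffin's actual code.

```python
from typing import Iterable, NamedTuple

class Pool(NamedTuple):
    total_people: int        # everyone in the Feeder Pool
    total_ineligible: int    # men and women blocked by the rule
    women: int               # women in the pool
    ineligible_women: int    # women actually blocked

def expected_ineligible_women(pool: Pool) -> float:
    """Pool-wide ineligibility rate applied to the number of women."""
    rate = pool.total_ineligible / pool.total_people
    return rate * pool.women

def surplus(pool: Pool) -> float:
    """Actual minus expected; negative values are shortfalls."""
    return pool.ineligible_women - expected_ineligible_women(pool)

def net_surplus(pools: Iterable[Pool]) -> float:
    """Sum of per-pool surpluses and shortfalls."""
    return sum(surplus(p) for p in pools)

example = Pool(total_people=500, total_ineligible=50, women=300, ineligible_women=20)
hypothetical = Pool(total_people=100, total_ineligible=20, women=40, ineligible_women=12)

print(expected_ineligible_women(example))   # about 30 expected
print(surplus(example))                     # about -10, a shortfall of 10
print(net_surplus([example, hypothetical])) # about -6 across the two pools
```

Note that a net figure near zero can arise either because every pool is close to parity or because large surpluses in some pools offset large shortfalls in others; the sum alone does not distinguish the two.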

       Having conducted this analysis for approximately 600 “job fills,” Dr. Griffin then

summed the surpluses and shortfalls of ineligible women across those approximately 600

“job fills.” This resulted in a net surplus of 6.2 ineligible women, meaning that 6.2 fewer

women were promotion eligible than would have been if there were perfect gender parity

across all 600 job fills. As the District Court noted, “[s]ix fewer promotion eligible

females across 600 plus ‘job fills’ is not statistically significant by any measure, and does

not support an inference of discrimination.” Id. at *8 (emphasis in original).



       At this point, the District Court noted that “the parties have merely presented two

different statistical models that produce opposite results,” and that “[s]imply

demonstrating that an alternative analysis leads to alternative results is not sufficient to

defeat a plaintiff’s prima facie case—the defendant must also show that there is no

genuine issue of material fact that plaintiffs’ model is fundamentally flawed for the

purpose of demonstrating disparate impact in the case at issue.” Id. (citation omitted).

The District Court continued:

              The key difference between the experts can be boiled down to
              this: Dr. Griffin looks at whether women applying to job X
              are disadvantaged relative to men applying to job X, whereas
              Dr. Killingsworth analyzes whether women applying to jobs
              X and Y are disadvantaged relative to men applying for jobs
              X and Y, combined. When seen in those terms, the difference
              between the expert analysis presented in this case is simply a
              question of whether the plaintiffs have analyzed the
              appropriate relevant labor pool for purposes of comparison.
              This question can be decided as a matter of law.

Id. at *9. Essentially, the District Court saw itself as forced to decide whether Dr.

Killingsworth’s decision to aggregate the 716 Feeder Pools into one Aggregated Pool was

appropriate, and considered this to be a question of law.

       The District Court noted that “[a]ggregated statistical data may be properly used to

prove disparate impact where it is more probative than subdivided data,” id. (citing Paige

v. California, 291 F.3d 1141, 1148 (9th Cir. 2002)), but that “‘[w]hen special

qualifications are required to fill particular jobs, comparisons to the general population

(rather than to the smaller group of individuals who possess the necessary qualifications)

may have little probative value.’” Id. (quoting Hazelwood Sch. Dist. v. U.S., 433 U.S.

299, 308 n. 13 (1977)). The District Court then stated that Dr. Killingsworth

acknowledged that not every union employee was fungible for purposes of promotion,

since he created the 716 Feeder Pools, “otherwise he would have simply compared all

union employees across the board.” Id. The District Court contended that because Dr.

Killingsworth takes the “distinctions between job categories [to be] important . . . then the

defendant’s argument that these distinctions should be maintained throughout the analysis

rings true.” Id. Accordingly, the District Court found that “because plaintiffs’ analysis is

focused on an overbroad and incomparable pool of employees, it lacks the statistical

significance necessary to make out a prima facie case of discrimination.” Id. at *11.

       In the alternative, the District Court found that “[e]ven if Dr. Killingsworth’s

methodology was sound and his results recognized as having ‘statistical significance,’ the

results of his analysis are undermined by a lack of practical significance.” Id. at *12. To

reach this conclusion, the District Court credited Dr. Griffin’s calculation that if female

candidates in the Aggregated Pool had the same eligibility rate as male candidates, this

would have translated to a “gender gap” of only 726 additional female promotion-eligible

candidacies (not necessarily equal to the number of affected individual people or

candidates) overall. The District Court also noted that, under the EEOC Guidelines’ “80

percent rule,” the adverse impact ratio’s “practical significance is of limited magnitude,”

since the ratio here was 96.8 percent—well over the 80 percent baseline. Id.
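
The "80 percent rule" comparison can be illustrated with a short calculation. The opinion reports only the resulting ratio (96.8 percent), not the underlying eligibility rates, so the two rates below are hypothetical values chosen to reproduce a ratio of that size. The function names are likewise illustrative, and the "at or below" comparison is an assumption that follows the wording used later in this opinion.

```python
def adverse_impact_ratio(protected_rate: float, comparison_rate: float) -> float:
    """Ratio of the protected group's rate to the comparison group's rate."""
    if comparison_rate <= 0:
        raise ValueError("comparison_rate must be positive")
    return protected_rate / comparison_rate

def at_or_below_four_fifths(ratio: float, threshold: float = 0.8) -> bool:
    """True when the ratio is at or below the 80 percent baseline."""
    return ratio <= threshold

# Hypothetical eligibility rates; only the 96.8 percent ratio is in the record.
female_eligible_rate = 0.871
male_eligible_rate = 0.900

ratio = adverse_impact_ratio(female_eligible_rate, male_eligible_rate)
print(f"adverse impact ratio: {ratio:.1%}")
print("at or below 80 percent baseline:", at_or_below_four_fifths(ratio))
```

A ratio this close to 100 percent clears the 80 percent baseline by a wide margin, which is the point the District Court drew from the Guidelines.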



       In conclusion, the District Court found that “the applicant pool plaintiffs analyzed

to demonstrate the disparate impact of Amtrak’s policy erroneously compares employees

who may not have the minimal qualifications for the particular jobs at issue,” and that

“plaintiffs’ evidence of discrimination lacks practical significance.” Id. at *13. The

Court therefore granted Amtrak’s motion for summary judgment.

                                             IV.

       We review a district court’s grant of summary judgment de novo. See, e.g., Slagle

v. County of Clarion, 435 F.3d 262, 263 (3d Cir. 2006). Under Rule 56(c) of the Federal

Rules of Civil Procedure, summary judgment is appropriate when “there is no genuine

issue as to any material fact.” The moving party “bears the initial responsibility of

informing the district court of the basis for its motion, and identifying those portions of

the pleadings, depositions, answers to interrogatories, and admissions on file, together

with the affidavits, if any, which it believes demonstrate the absence of a genuine issue of

material fact.” El, 479 F.3d at 237 (quoting Celotex Corp. v. Catrett, 477 U.S. 317, 323

(1986)). The court must draw all reasonable inferences against the moving party. Id. at

238. “If the moving party successfully points to evidence of all of the facts needed to

decide the case on the law short of trial, the non-moving party can defeat summary

judgment if it nonetheless produces or points to evidence in the record that creates a

genuine issue of material fact.” Id. “Thus, if there is a chance that a reasonable

factfinder would not accept a moving party’s necessary propositions of fact, pre-trial



judgment cannot be granted.” Id.

       We find that there is a genuine issue of material fact as to whether the one-year

rule caused a disparate impact on female employees. Accordingly, although it is a close

case, we find that the District Court should not have granted Amtrak’s motion for

summary judgment based on this record.

       As noted above, to establish a prima facie case of disparate impact in a Title VII

case, a plaintiff must (1) identify a specific employment policy or practice of the

employer and (2) proffer evidence, typically statistical evidence, (3) of a kind and degree

sufficient to show that the practice in question has caused exclusion of applicants for jobs

or promotions (4) because of their membership in a protected group. To establish (3), a

plaintiff will typically have to demonstrate that the disparity in impact is sufficiently large

that it is highly unlikely to have occurred at random, and to do so by using one of several

tests of statistical significance. A plaintiff need not demonstrate that the disparate impact

ratio satisfies the EEOC’s 80 percent rule (the figure at or below which the EEOC will

presume the existence of disparate impact). As noted above, the EEOC Guidelines are

not entitled to great deference, but to Skidmore deference, under which EEOC Guidelines

“get[] deference in accordance with the thoroughness of [their] research and the

persuasiveness of [their] reasoning.” El, 479 F.3d at 244 (citing EEOC v. Arab American

Oil Co., 499 U.S. at 257). The 80 percent rule has come under significant criticism and

we do not find the reasoning that might support its application here persuasive in light of



the statistical significance of Dr. Killingsworth’s results.

       Similarly, this Court has never established “practical significance” as an

independent requirement for a plaintiff’s prima facie disparate impact case, and we

decline to do so here. The EEOC Guidelines themselves do not set out “practical”

significance as an independent requirement, and we find that in a case in which the

statistical significance of some set of results is clear, there is no need to probe for

additional “practical” significance. Statistical significance is relevant because it allows a

fact-finder to be confident that the relationship between some rule or policy and some set

of disparate impact results was not the product of chance. This goes to the plaintiff’s

burden of introducing statistical evidence that is “sufficiently substantial” to raise “an

inference of causation.” Watson, 487 U.S. at 994-95. There is no additional requirement

that the disparate impact caused be above some threshold level of practical significance.

Accordingly, the District Court erred in ruling “in the alternative” that the absence of

practical significance was fatal to Plaintiffs’ case.

       There is no question that Dr. Killingsworth’s results, if the product of a relevant

and otherwise compelling statistical analysis, are statistically significant above the

threshold that courts have required.10 As noted above, when Dr. Killingsworth analyzed

the data using a corrected probit analysis, the results yielded a standard deviation of

3.855, with a p-value of less than 0.001, meaning the results are incredibly unlikely to

have occurred as a result of chance alone. The Supreme Court has suggested that a

standard deviation between 2 and 3 would be sufficient, and Dr. Killingsworth’s results

are considerably above that. See, e.g., Castaneda, 430 U.S. at 496 n.17.

       10
        Even Amtrak concedes that the results, if they stand, meet the threshold
requirement for statistical significance. Oral Arg. Tr., 47-48.

       Thus, the only issue is whether the District Court was correct in finding that Dr.

Killingsworth’s statistical analysis was, in effect, legally irrelevant to satisfying Plaintiffs’

burden with respect to their prima facie case because his analysis used aggregation, and in

particular the Aggregated Pool, in conducting his statistical analysis. We find that Dr.

Killingsworth’s decision to aggregate the data, although not obviously correct, is also not

obviously incorrect, and so there remains a genuine issue of material fact—whether the

one-year rule caused a disparate impact on Amtrak’s female employees.

        The one-year rule applies to all union employees. However, including all union

employees in the statistical sample would have been inappropriate, since many of them

may not have been even remote candidates for any management position. To identify all

those union employees who might reasonably be thought to be candidates for a

management position, Dr. Killingsworth identified those candidates who obtained a

management position during the relevant five-year span, and then identified the previous

union positions held by those candidates. At that point, Dr. Killingsworth assumed, and

the District Court found this assumption reasonable, that all those individuals who were in

the same union position as the position that the successful candidate had previously



occupied might reasonably be thought to have been a possible candidate for the

management position that the successful candidate actually obtained. Thus, if Smith was

hired into Management Position One, and Smith had previously been in Union Position

One, Dr. Killingsworth assumed that all other individuals—Jones, Williams, Johnson,

etc.—who had been in Union Position One were possible candidates for Management

Position One. This is not a perfect proxy, as all parties concede. For example, Smith

might have had much more experience than Jones and Williams, or he might have

educational degrees that they lack. But, given that the one-year rule operates as an initial

bar from even becoming a candidate for a job, the only way to measure its effect is to

devise some way of identifying those who might reasonably be thought to have been

possible candidates were it not for the existence of the one-year rule. We agree with the

District Court that Dr. Killingsworth’s method here was reasonable.

       It is true that while “the population selected for statistical analysis need not

perfectly match the pool of qualified persons,” without “a close fit between the population

used to measure disparate impact and the population of those qualified for a benefit, the

statistical results cannot be persuasive.” Carpenter v. Boeing Co., 456 F.3d 1183, 1196

(10th Cir. 2006). One must have the proper pool of people in view before performing

statistical analysis, or that analysis will be irrelevant. This, however, goes to the issue of

whether Dr. Killingsworth’s use of the individual Feeder Pools was reasonable or not. In

discussing this issue, the District Court stated:



              In the absence of explicit measures of qualifications and job
              interest, Dr. Killingsworth assumed that information about the
              position held prior to promotion could reasonably serve as an
              indicator of qualifications and job interest. Based on the
              information provided to Dr. Killingsworth by Amtrak,
              plaintiffs’ method is a reasonable one.

Stagi II, 2009 WL 2461892, at *6. We agree.

       The District Court did identify a problem, however, with the combining of the

individual Feeder Pools into one Aggregated Pool. The District Court stated that because

Dr. Killingsworth takes the “distinctions between job categories [to be] important” in

creating the individual Feeder Pools, “then the defendant’s argument that these

distinctions should be maintained throughout the analysis rings true.” Id. at *9. Amtrak’s

counsel made this same point repeatedly at oral argument, stating that “if you’re going to

live in a stratified world, you have to follow that stratified world through to your analysis”

and that “the problem is that we’re aggregating after we stratify, that’s the heart of the

matter.” Oral Arg. Tr., at 41, 45.

       However, neither the District Court nor Amtrak’s counsel has offered a convincing

explanation of why the use of aggregated data in this case is improper. The District Court

reintroduces the “qualifications” issue, asserting that “[t]he single aggregated statistic Dr.

Killingsworth relies on compares individuals who may never actually be in competition

for the same jobs, and does not accurately account for what job the employee in question

is coming from, where they are looking to go, and what the relevant qualifications are.”

Stagi II, 2009 WL 2461892, at *9. But this criticism misses its target. Creating the

Aggregated Pool out of the individual Feeder Pools does not erroneously imply that a

person from Feeder Pool A (created based on Management Position A) is a possible

candidate, along with the members of Feeder Pool B, for Management Position B.

Rather, it just puts together all of those people (or candidacies, more precisely) who are in

union positions currently, and who are reasonably thought of as possible candidates for

some management position or other. All of these people are susceptible to the one-year

rule, and thus all of them are potentially “blocked” by its uniform application if they have

served less than one year in their respective union positions. Aggregating the individual

Feeder Pools in this way appears to be no more problematic, at least with respect to the

issue of qualifications, than doing what Dr. Griffin did when he simply “added up” the

difference between the expected ineligibility rate and the actual ineligibility rate for each

of the 600 plus individual Feeder Pools.

       At various points, Amtrak’s counsel at oral argument appeared to be arguing that,

as a matter of consistency, once one has subdivided the pool into categories, one ought

not to recombine those categories into an aggregate pool. The District Court appeared to

accept a similar line of thought when it noted that because Dr. Killingsworth took the

“distinctions between job categories [to be] important” in creating the individual Feeder

Pools, “then the defendant’s argument that these distinctions should be maintained

throughout the analysis rings true.” Id. at *9. But there has been no argument made that

somehow the statistical analysis is corrupted if one “changes horses” from a stratified to



an aggregated analysis midstream. Indeed, Amtrak’s counsel explicitly stated that “the

actual manner in which [Dr. Killingsworth] performs the numbers is not incorrect, it’s the

underlying numbers that are the problem.” Oral Arg. Tr., at 44. Finally, Plaintiffs’

counsel stresses that they never were doing a “stratification” analysis in the first place, but

that they were simply attempting to “define what is the subset of total union employees

who seemed to be in positions that made them eligible to seek promotion.” Id. at 56.

       A final possible reason to object to the use of aggregated data is presented by the

District Court when it notes that Dr. Griffin’s report suggests that there are some Feeder

Pools in which fewer women than men were made ineligible by the one-year rule, and

some in which the reverse was true, and that the overall result of women doing worse

than men (at least under Dr. Killingsworth’s model) obscures these facts. This would be

a reason against aggregating insofar as aggregating produces a misleading picture of the

overall situation for women. (As one court has noted, “[i]f Microsoft-founder Bill Gates

and nine monks are together in a room, it is accurate to say that on average the people in

the room are extremely well-to-do, but this kind of aggregate analysis obscures the fact

that ninety percent of the people in the room have taken a vow of poverty.” Abram v.

United Parcel Serv. Inc., 200 F.R.D. 424, 431 (E.D. Wis. 2001).) For example, it might

be that in 400 of the 716 Feeder Pools, women are made ineligible at a rate significantly

greater than that of men, and that in 316 of the Feeder Pools, the reverse is true. In such a

situation, the one-year rule appears to have a disparate impact on women only in a subset



of the 716 Feeder Pools.

       Plaintiffs’ second expert, Ramona Paetzold, submitted an affidavit arguing that

stratification is inappropriate in this case precisely because of this possibility. In

particular, stratification is inappropriate because the number of women in each feeder

job at any given point in time is determined, in part, by the existence of the one-year rule

itself, “because the one-year rule at least partially affects how long men and women must

remain in the feeder job before being eligible for promotion.” Paetzold Aff. 3. The

District Court contends that this is a problem for Plaintiffs, because “the gender

composition of feeder jobs may very well be affected by additional factors such as wage

levels, working conditions, movement prospects, layoffs, and the union’s collective

bargaining agreement that allows unrestricted lateral job movements among union

employees, none of which the plaintiffs have made any attempt to identify or control for

in their analysis.” Stagi II, 2009 WL 2461892, at *10. But this seems to be a problem

only if the reasons against aggregation are compelling. There is no legal requirement to

use the smallest possible unit of analysis. If there are additional factors (such as seniority

rules)—apart from just the one-year rule—that are determining the composition of the

individual Feeder Pools in a “gendered” way, these factors may aid Amtrak in mounting a

business justification defense, but it is inappropriate to require Plaintiffs to control for

every possible such factor in order to sustain their burden of proving a prima facie case.

If the aggregated data yields a statistically significant finding, such as the one here, that



the one-year rule is having a disparate impact on women, and there is no compelling

reason to avoid use of aggregated data, that is enough for Plaintiffs to establish their

prima facie case.

       Additionally, there may be good reasons to aggregate data in a case such as

this—reasons that have nothing to do with simply picking and choosing the model which

will generate the most favorable results for plaintiffs’ case. Perhaps most significantly, as

the Fourth Circuit has observed, “by increasing the absolute numbers in the data, chance

will more readily be excluded as a cause of any disparities found.” Lilly v. Harris-Teeter

Supermarket, 720 F.2d 326, 336 n.17 (4th Cir. 1983). This makes intuitive sense. “For

example, if a coin were tossed ten times . . . and came up heads four times, no one would

think the coin was biased (0.632 standard deviations), but if this same ratio occurred for a

total of 10,000 tosses, of which 4,000 were heads, the result could not be attributed to

chance (20 standard deviations).” Id. Here, by combining all of those candidacies in the

716 Feeder Pools into one Aggregated Pool, Dr. Killingsworth was better able to test

whether the difference in the ineligibility rate for men and women was merely the product

of chance. Many courts have found such a reason for aggregating compelling. See, e.g.,

Eldredge v. Carpenters 46 N. California Counties Joint Apprenticeship and Training

Comm., 833 F.2d 1334, 1339 (9th Cir. 1987) (“Aggregated data presents a more complete

and reliable picture.”); Cook v. Boorstin, 763 F.2d 1462, 1468-69 (D.C. Cir. 1985)

(rejecting defendant’s argument to restrict statistical analysis to particular job categories);



Capaci v. Katz & Besthoff, 711 F.2d 647, 654 (5th Cir. 1983) (allowing a plan to

aggregate data over several years because aggregation was necessary in order to

accomplish a meaningful statistical analysis).
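
The coin-toss figures quoted from Lilly can be reproduced with the standard deviation of a binomial count. The sketch below assumes a fair coin (probability 0.5 of heads) as the null hypothesis; the function name is illustrative. It shows how the same 40 percent share of heads moves from well within chance to far outside it as the number of tosses grows.

```python
import math

def standard_deviations_from_expected(successes: int, trials: int, p: float = 0.5) -> float:
    """Distance of an observed count from its expected value, in binomial SDs."""
    expected = trials * p
    sd = math.sqrt(trials * p * (1.0 - p))
    return (successes - expected) / sd

small_sample = standard_deviations_from_expected(4, 10)         # 4 heads in 10 tosses
large_sample = standard_deviations_from_expected(4000, 10000)   # 4,000 heads in 10,000

print(f"10 tosses:     {abs(small_sample):.3f} standard deviations")
print(f"10,000 tosses: {abs(large_sample):.1f} standard deviations")
```

The observed proportion is identical in both cases; only the sample size changes, which is the sense in which larger absolute numbers allow chance to be more readily excluded.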

       At a minimum, we find that there is a genuine issue of material fact as to whether

the one-year rule caused a disparate impact on female employees. It is possible that there

are reasons to prefer Dr. Griffin’s methodology to Dr. Killingsworth’s methodology,

given that they yield conflicting conclusions regarding whether the one-year rule has an

all-things-considered disparate impact on women. But we cannot so conclude on this

record, and the reasons presented by the District Court for finding that Plaintiffs have

failed to make out a prima facie case do not withstand scrutiny. Accordingly, we find that

the District Court should not have granted Amtrak’s motion for summary judgment based

on this record.

                                             V.

       We will reverse the judgment of the District Court granting Amtrak’s motion for

summary judgment and will remand for further proceedings consistent with this opinion.




