Psychological Bulletin
1998, Vol. 124, No. 2, 262-274
Copyright 1998 by the American Psychological Association, Inc.
0033-2909/98/$3.00
The Validity and Utility of Selection Methods in Personnel Psychology:
Practical and Theoretical Implications of 85 Years of Research Findings
Frank L. Schmidt
University of Iowa

John E. Hunter
Michigan State University
This article summarizes the practical and theoretical implications of 85 years of research in personnel selection. On the basis of meta-analytic findings, this article presents the validity of 19 selection procedures for predicting job performance and training performance and the validity of paired combinations of general mental ability (GMA) and the 18 other selection procedures. Overall, the 3 combinations with the highest multivariate validity and utility for job performance were GMA plus a work sample test (mean validity of .63), GMA plus an integrity test (mean validity of .65), and GMA plus a structured interview (mean validity of .63). A further advantage of the latter 2 combinations is that they can be used for both entry level selection and selection of experienced employees. The practical utility implications of these summary findings are substantial. The implications of these research findings for the development of theories of job performance are discussed.
From the point of view of practical value, the most important property of a personnel assessment method is predictive validity: the ability to predict future job performance, job-related learning (such as amount of learning in training and development programs), and other criteria. The predictive validity coefficient is directly proportional to the practical economic value (utility) of the assessment method (Brogden, 1949; Schmidt, Hunter, McKenzie, & Muldrow, 1979). Use of hiring methods with increased predictive validity leads to substantial increases in employee performance as measured in percentage increases in output, increased monetary value of output, and increased learning of job-related skills (Hunter, Schmidt, & Judiesch, 1990).

Today, the validity of different personnel measures can be determined with the aid of 85 years of research. The most well-known conclusion from this research is that for hiring employees without previous experience in the job, the most valid predictor of future performance and learning is general mental ability (GMA; i.e., intelligence or general cognitive ability; Hunter & Hunter, 1984; Ree & Earles, 1992). GMA can be measured using commercially available tests. However, many other measures can also contribute to the overall validity of the selection process. These include, for example, measures of
conscientiousness and personal integrity, structured employment interviews, and (for experienced workers) job knowledge and work sample tests.

Frank L. Schmidt, Department of Management and Organization, University of Iowa; John E. Hunter, Department of Psychology, Michigan State University.

An earlier version of this article was presented to Korean Human Resource Managers in Seoul, South Korea, June 11, 1996. The presentation was sponsored by Tong Yang Company. We would like to thank President Wang-Ha Cho of Tong Yang for his support and efforts in this connection. We would also like to thank Deniz Ones and Kuh Yoon for their assistance in preparing Tables 1 and 2 and Gershon Ben-Shakhar for his comments on research on graphology.

Correspondence concerning this article should be addressed to Frank L. Schmidt, Department of Management and Organization, College of Business, University of Iowa, Iowa City, Iowa 52240. Electronic mail may be sent to frank-schmidt@uiowa.edu.
On the basis of meta-analytic findings, this article examines and summarizes what 85 years of research in personnel psychology has revealed about the validity of measures of 19 different selection methods that can be used in making decisions about hiring, training, and developmental assignments. In this sense, this article is an expansion and updating of Hunter and Hunter (1984). In addition, this article examines how well certain combinations of these methods work. These 19 procedures do not all work equally well; the research evidence indicates that some work very well and some work very poorly. Measures of GMA work very well, for example, and graphology does not work at all. The cumulative findings show that the research knowledge now available makes it possible for employers today to substantially increase the productivity, output, and learning ability of their workforces by using procedures that work well and by avoiding those that do not. Finally, we look at the implications of these research findings for the development of theories of job performance.
Determinants of Practical Value (Utility) of Selection Methods
The validity of a hiring method is a direct determinant of its practical value, but not the only determinant. Another direct determinant is the variability of job performance. At one extreme, if variability were zero, then all applicants would have exactly the same level of later job performance if hired. In this case, the practical value or utility of all selection procedures would be zero. In such a hypothetical case, it does not matter who is hired, because all workers are the same. At the other extreme, if performance variability is very large, it then becomes important to hire the best performing applicants, and the practical utility of valid selection methods is very large. As it happens, this "extreme" case appears to be the reality for most jobs.
Research over the last 15 years has shown that the variability of performance and output among (incumbent) workers is very large and that it would be even larger if all job applicants were hired or if job applicants were selected randomly from among those that apply (cf. Hunter et al., 1990; Schmidt & Hunter, 1983; Schmidt et al., 1979). This latter variability is called the applicant pool variability, and in hiring this is the variability that operates to determine practical value. This is because one is selecting new employees from the applicant pool, not from among those already on the job in question.
The variability of employee job performance can be measured in a number of ways, but two scales have typically been used: dollar value of output and output as a percentage of mean output. The standard deviation across individuals of the dollar value of output (called SDy) has been found to be at minimum 40% of the mean salary of the job (Schmidt & Hunter, 1983; Schmidt et al., 1979; Schmidt, Mack, & Hunter, 1984). The 40% figure is a lower bound value; actual values are typically considerably higher. Thus, if the average salary for a job is $40,000, then SDy is at least $16,000. If performance has a normal distribution, then workers at the 84th percentile produce $16,000 more per year than average workers (i.e., those at the 50th percentile). And the difference between workers at the 16th percentile ("below average" workers) and those at the 84th percentile ("superior" workers) is twice that: $32,000 per year. Such differences are large enough to be important to the economic health of an organization.
Employee output can also be measured as a percentage of mean output; that is, each employee's output is divided by the output of workers at the 50th percentile and then multiplied by 100. Research shows that the standard deviation of output as a percentage of average output (called SDp) varies by job level. For unskilled and semi-skilled jobs, the average SDp figure is 19%. For skilled work, it is 32%, and for managerial and professional jobs, it is 48% (Hunter et al., 1990). These figures are averages based on all available studies that measured or counted the amount of output for different employees. If a superior worker is defined as one whose performance (output) is at the 84th percentile (that is, 1 SD above the mean), then a superior worker in a lower level job produces 19% more output than an average worker, a superior skilled worker produces 32% more output than the average skilled worker, and a superior manager or professional produces output 48% above the average for those jobs. These differences are large, and they indicate that the payoff from using valid hiring methods to predict later job performance is quite large.
Another determinant of the practical value of selection methods is the selection ratio, the proportion of applicants who are hired. At one extreme, if an organization must hire all who apply for the job, no hiring procedure has any practical value. At the other extreme, if the organization has the luxury of hiring only the top scoring 1%, the practical value of gains from selection per person hired will be extremely large. But few organizations can afford to reject 99% of all job applicants. Actual selection ratios are typically in the .30 to .70 range, a range that still produces substantial practical utility.
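The connection between the selection ratio and utility runs through the mean standard score of those hired. Under the standard assumption of strict top-down selection from a normally distributed applicant pool (an assumption of this illustration, not a result stated in the article), that mean is phi(c)/SR, where c is the cut score satisfying P(Z > c) = SR:

```python
# Illustrative sketch (assumes top-down hiring from a normal applicant pool).
# The mean standardized score of those hired is E[Z | Z > c] = phi(c) / SR,
# where c is the cut score with P(Z > c) = SR.
from scipy.stats import norm

def mean_hired_z(selection_ratio: float) -> float:
    """Average z-score (Z-bar_x) of hired applicants for a given selection ratio."""
    cut = norm.ppf(1.0 - selection_ratio)      # cut score c
    return norm.pdf(cut) / selection_ratio     # phi(c) / SR

for sr in (0.01, 0.30, 0.70):
    print(f"SR = {sr:.2f}  ->  Z-bar_x = {mean_hired_z(sr):.2f}")
# SR = 0.01 gives about 2.67; SR = 0.70 gives about 0.50
```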
The actual formula for computing practical gains per person hired per year on the job is a three-way product (Brogden, 1949; Schmidt et al., 1979):
$\Delta U/\text{hire/year} = \Delta r_{xy} \, SD_y \, \bar{Z}_x$  (when performance is measured in dollar value)  (1)

$\Delta U/\text{hire/year} = \Delta r_{xy} \, SD_p \, \bar{Z}_x$  (when performance is measured in percentage of average output)  (2)
In these equations, $\Delta r_{xy}$ is the difference between the validity of the new (more valid) selection method and the old selection method. If the old selection method has no validity (that is, selection is random), then $\Delta r_{xy}$ is the same as the validity of the new procedure; that is, $\Delta r_{xy} = r_{xy}$. Hence, relative to random selection, practical value (utility) is directly proportional to validity. If the old procedure has some validity, then the utility gain is directly proportional to $\Delta r_{xy}$. $\bar{Z}_x$ is the average score on the employment procedure of those hired (in z-score form), as compared to the general applicant pool. The smaller the selection ratio, the higher this value will be. The first equation expresses selection utility in dollars. For example, a typical final figure for a medium complexity job might be $18,000, meaning that increasing the validity of the hiring methods leads to an average increase in output per hire of $18,000 per year. To get the full value, one must of course multiply by the number of workers hired. If 100 are hired, then the increase would be (100)($18,000) = $1,800,000. Finally, one must consider the number of years these workers remain on the job, because the $18,000 per worker is realized each year that worker remains on the job. Of all these factors that affect the practical value, only validity is a characteristic of the personnel measure itself.

The second equation expresses the practical value in percentage of increase in output. For example, a typical figure is 9%, meaning that workers hired with the improved selection method will have on average 9% higher output. A 9% increase in labor productivity would typically be very important economically for the firm, and might make the difference between success and bankruptcy.
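A minimal sketch of Equations 1 and 2, with illustrative inputs drawn from the figures discussed above (the function names and the specific inputs are ours):

```python
# Minimal sketch (not the authors' code) of Equations 1 and 2.
def utility_gain_dollars(delta_r: float, sd_y: float, z_bar: float) -> float:
    """Delta-U per hire per year in dollars (Equation 1)."""
    return delta_r * sd_y * z_bar

def utility_gain_percent(delta_r: float, sd_p: float, z_bar: float) -> float:
    """Delta-U per hire per year as % of average output (Equation 2)."""
    return delta_r * sd_p * z_bar

# Example: moving from random selection to a method with validity .51,
# SD_y = $16,000, and Z-bar_x = 1.0 (illustrative):
print(utility_gain_dollars(0.51, 16_000, 1.0))  # ~$8,160/hire/year; a higher
                                                # SD_y or a lower selection ratio
                                                # pushes this toward the $18,000
                                                # figure cited in the text
print(utility_gain_percent(0.51, 19.0, 1.0))    # ~9.7% more output (SD_p = 19%),
                                                # close to the typical 9% figure
```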
What we have presented here is not, of course, a comprehensive discussion of selection utility. Readers who would like more detail are referred to the research articles cited above and to Boudreau (1983a, 1983b, 1984), Cascio and Silbey (1979), Cronshaw and Alexander (1985), Hunter, Schmidt, and Coggin (1988), Hunter and Schmidt (1982a, 1982b), Schmidt and Hunter (1983), Schmidt, Hunter, Outerbridge, and Trattner (1986), Schmidt, Hunter, and Pearlman (1982), and Schmidt et al. (1984). Our purpose here is to make three important points: (a) the economic value of gains from improved hiring methods is typically quite large, (b) these gains are directly proportional to the size of the increase in validity when moving from the old to the new selection methods, and (c) no other characteristic of a personnel measure is as important as predictive validity. If one looks at the two equations above, one sees that practical value per person hired is a three-way product. One of the three elements in that three-way product is predictive validity. The other two, $SD_y$ (or $SD_p$) and $\bar{Z}_x$, are equally important, but they are characteristics of the job or the situation, not of the personnel measure.
Validity of Personnel Assessment Methods:
85 Years of Research Findings
Research studies assessing the ability of personnel assessment methods to predict future job performance and future learning (e.g., in training programs) have been conducted since the first decade of the 20th century. However, as early as the 1920s it became apparent that different studies conducted on the same assessment procedure did not appear to agree in their results. Validity estimates for the same method and same job were quite different for different studies. During the 1930s and 1940s the belief developed that this state of affairs resulted from subtle differences between jobs that were difficult or impossible for job analysts and job analysis methodology to detect. That is, researchers concluded that the validity of a given procedure really was different in different settings for what appeared to be basically the same job, and that the conflicting findings in validity studies were just reflecting this fact of reality. This belief, called the theory of situational specificity, remained dominant in personnel psychology until the late 1970s, when it was discovered that most of the differences across studies were due to statistical and measurement artifacts and not to real differences in the jobs (Schmidt & Hunter, 1977; Schmidt, Hunter, Pearlman, & Shane, 1979). The largest of these artifacts was simple sampling error variation, caused by the use of small samples in the studies. (The number of employees per study was usually in the 40-70 range.) This realization led to the development of quantitative techniques collectively called meta-analysis that could combine validity estimates across studies and correct for the effects of these statistical and measurement artifacts (Hunter & Schmidt, 1990; Hunter, Schmidt, & Jackson, 1982). Studies based on meta-analysis provided more accurate estimates of the average operational validity and showed that the level of real variability of validities was usually quite small and might in fact be zero (Schmidt, 1992; Schmidt et al., 1993). In fact, the findings indicated that the variability of validity was not only small or zero across settings for the same type of job, but was also small across different kinds of jobs (Hunter, 1980; Schmidt, Hunter, & Pearlman, 1980). These findings made it possible to select the most valid personnel measures for any job. They also made it possible to compare the validity of different personnel measures for jobs in general, as we do in this article.
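The role of simple sampling error is easy to demonstrate by simulation. The sketch below (our illustration, not part of the original article) draws hundreds of small-sample validity studies from a population with a single constant true validity; the observed coefficients scatter widely at the sample sizes (40-70) typical of the early literature, which is exactly the pattern that was once attributed to situational specificity:

```python
# Illustration (assumes bivariate-normal predictor and criterion,
# constant true validity rho = .50, study N drawn uniformly from 40-70).
import numpy as np

rng = np.random.default_rng(0)
rho, n_studies = 0.50, 500
cov = [[1.0, rho], [rho, 1.0]]

observed = []
for _ in range(n_studies):
    n = rng.integers(40, 71)                     # small sample, as in early studies
    x, y = rng.multivariate_normal([0, 0], cov, size=n).T
    observed.append(np.corrcoef(x, y)[0, 1])

observed = np.array(observed)
print(f"mean r = {observed.mean():.2f}")         # close to the true .50
print(f"SD of r = {observed.std():.2f}")         # roughly .10: large scatter
print(f"range = {observed.min():.2f} to {observed.max():.2f}")
```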
Table 1 summarizes research findings for the prediction of performance on the job. The first column of numbers in Table 1 shows the estimated mean validity of 19 selection methods for predicting performance on the job, as revealed by meta-analyses conducted over the last 20 years. Performance on the job was typically measured using supervisory ratings of job performance, but production records, sales records, and other measures were also used. The sources and other information about these validity figures are given in the notes to Table 1.

Many of the selection methods in Table 1 also predict job-related learning; that is, the acquisition of job knowledge with experience on the job, and the amount learned in training and development programs. However, the overall amount of research on the prediction of learning is less. For many of the procedures in Table 1, there is little research evidence on their ability to predict future job-related learning. Table 2 summarizes available research findings for the prediction of performance in training programs. The first column in Table 2 shows the mean validity of 10 selection methods as revealed by available meta-analyses. In the vast majority of the studies included in these meta-analyses, performance in training was assessed using objective measures of amount learned; trainer ratings of amount learned were used in about 5% of the studies.
Unless otherwise noted in Tables 1 and 2, all validity estimates in Tables 1 and 2 are corrected for the downward bias due to measurement error in the measures of job performance and to range restriction on the selection method in incumbent samples relative to applicant populations. Observed validity estimates so corrected estimate the operational validities of selection methods when used to hire from applicant pools. Operational validities are also referred to as true validities.
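For concreteness, the two corrections work roughly as follows. The sketch below uses the textbook disattenuation formula and Thorndike's Case II range restriction correction; the specific input values are illustrative assumptions, and the meta-analyses summarized here apply refinements beyond this simplified version:

```python
# Sketch of the two standard corrections described above (textbook formulas;
# order and refinements in actual meta-analyses are more involved).
import math

def correct_for_criterion_unreliability(r_obs: float, r_yy: float) -> float:
    """Disattenuate for measurement error in the job performance measure."""
    return r_obs / math.sqrt(r_yy)

def correct_for_range_restriction(r: float, u: float) -> float:
    """Thorndike Case II; u = SD(predictor, incumbents) / SD(predictor, applicants)."""
    return (r / u) / math.sqrt(1 + r**2 * (1 / u**2 - 1))

# Illustrative values: observed r = .25, criterion reliability = .52, u = .65
r = correct_for_criterion_unreliability(0.25, 0.52)      # ~.35
print(round(correct_for_range_restriction(r, 0.65), 2))  # ~.49 operational validity
```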
In the pantheon of 19 personnel measures in Table 1, GMA (also called general cognitive ability and general intelligence) occupies a special place, for several reasons. First, of all procedures that can be used for all jobs, whether entry level or advanced, it has the highest validity and lowest application cost. Work sample measures are slightly more valid but are much more costly and can be used only with applicants who already know the job or have been trained for the occupation or job. Structured employment interviews are more costly and, in some forms, contain job knowledge components and therefore are not suitable for inexperienced, entry level applicants. The assessment center and job tryout are both much more expensive and have less validity. Second, the research evidence for the validity of GMA measures for predicting job performance is stronger than that for any other method (Hunter, 1986; Hunter & Schmidt, 1996; Ree & Earles, 1992; Schmidt & Hunter, 1981). Literally thousands of studies have been conducted over the last nine decades. By contrast, only 89 validity studies of the structured interview have been conducted (McDaniel, Whetzel, Schmidt, & Maurer, 1994). Third, GMA has been shown to be the best available predictor of job-related learning. It is the best predictor of acquisition of job knowledge on the job (Schmidt & Hunter, 1992; Schmidt, Hunter, & Outerbridge, 1986) and of performance in job training programs (Hunter, 1986; Hunter & Hunter, 1984; Ree & Earles, 1992). Fourth, the theoretical foundation for GMA is stronger than for any other personnel measure. Theories of intelligence have been developed and tested by psychologists for over 90 years (Brody, 1992; Carroll, 1993; Jensen, 1998). As a result of this massive related research literature, the meaning of the construct of intelligence is much clearer than, for example, the meaning of what is measured by interviews or assessment centers (Brody, 1992; Hunter, 1986; Jensen, 1998).

The value of .51 in Table 1 for the validity of GMA is from a very large meta-analytic study conducted for the U.S. Department of Labor (Hunter, 1980; Hunter & Hunter, 1984). The database for this unique meta-analysis included over 32,000 employees in 515 widely diverse civilian jobs. This meta-analysis examined both performance on the job and performance in job training programs. It found that the validity of GMA for predicting job performance was .58 for professional-managerial jobs, .56 for high level complex technical jobs, .51 for medium complexity jobs, .40 for semi-skilled jobs, and .23 for completely unskilled jobs.
Table 1
Predictive Validity for Overall Job Performance of General Mental Ability (GMA) Scores Combined With a Second Predictor Using (Standardized) Multiple Regression

                                                                    Gain in validity   % increase    Std. regression weights
Personnel measures                      Validity (r)   Multiple R   from supplement    in validity     GMA      Supplement
GMA tests^a                                 .51
Work sample tests^b                         .54           .63            .12              24%          .36         .41
Integrity tests^c                           .41           .65            .14              27%          .51         .41
Conscientiousness tests^d                   .31           .60            .09              18%          .51         .31
Employment interviews (structured)^e        .51           .63            .12              24%          .39         .39
Employment interviews (unstructured)^f      .38           .55            .04               8%          .43         .22
Job knowledge tests^g                       .48           .58            .07              14%          .36         .31
Job tryout procedure^h                      .44           .58            .07              14%          .40         .20
Peer ratings^i                              .49           .58            .07              14%          .35         .31
T & E behavioral consistency method^j       .45           .58            .07              14%          .39         .31
Reference checks^k                          .26           .57            .06              12%          .51         .26
Job experience (years)^l                    .18           .54            .03               6%          .51         .18
Biographical data measures^m                .35           .52            .01               2%          .45         .13
Assessment centers^n                        .37           .53            .02               4%          .43         .15
T & E point method^o                        .11           .52            .01               2%          .51         .11
Years of education^p                        .10           .52            .01               2%          .51         .10
Interests^q                                 .10           .52            .01               2%          .51         .10
Graphology^r                                .02           .51            .00               0%          .51         .02
Age^s                                      -.01           .51            .00               0%          .51        -.01

Note. T & E = training and experience. The percentage of increase in validity is also the percentage of increase in utility (practical value). All of the validities presented are based on the most current meta-analytic results for the various predictors. See Schmidt, Ones, and Hunter (1992) for an overview. All of the validities in this table are for the criterion of overall job performance. Unless otherwise noted, all validity estimates are corrected for the downward bias due to measurement error in the measure of job performance and range restriction on the predictor in incumbent samples relative to applicant populations. The correlations between GMA and other predictors are corrected for range restriction but not for measurement error in either measure (thus they are smaller than fully corrected mean values in the literature). These correlations represent observed score correlations between selection methods in applicant populations.
^a From Hunter (1980). The value used for the validity of GMA is the average validity of GMA for medium complexity jobs (covering more than 60% of all jobs in the United States). Validities are higher for more complex jobs and lower for less complex jobs, as described in the text.
^b From Hunter and Hunter (1984, Table 10). The correction for range restriction was not possible in these data. The correlation between work sample scores and ability scores is .38 (Schmidt, Hunter, & Outerbridge, 1986).
^c,d From Ones, Viswesvaran, and Schmidt (1993, Table 8). The figure of .41 is from predictive validity studies conducted on job applicants. The validity of .31 for conscientiousness measures is from Mount and Barrick (1995, Table 2). The correlation between integrity and ability is zero, as is the correlation between conscientiousness and ability (Ones, 1993; Ones et al., 1993).
^e,f From McDaniel, Whetzel, Schmidt, and Maurer (1994, Table 4). Values used are those from studies in which the job performance ratings were for research purposes only (not administrative ratings). The correlations between interview scores and ability scores are from Huffcutt, Roth, and McDaniel (1996, Table 3). The correlation for structured interviews is .30 and for unstructured interviews, .38.
^g From Hunter and Hunter (1984, Table 11). The correction for range restriction was not possible in these data. The correlation between job knowledge scores and GMA scores is .48 (Schmidt, Hunter, & Outerbridge, 1986).
^h From Hunter and Hunter (1984, Table 9). No correction for range restriction (if any) could be made. (Range restriction is unlikely with this selection method.) The correlation between job tryout ratings and ability scores is estimated at .38 (Schmidt, Hunter, & Outerbridge, 1986); that is, it was taken to be the same as that between job sample tests and ability. Use of the mean correlation between supervisory performance ratings and ability scores yields a similar value (.35, uncorrected for measurement error).
^i From Hunter and Hunter (1984, Table 10). No correction for range restriction (if any) could be made. The average fully corrected correlation between ability and peer ratings of job performance is approximately .55. If peer ratings are based on an average rating from 10 peers, the familiar Spearman-Brown formula indicates that the interrater reliability of peer ratings is approximately .91 (Viswesvaran, Ones, & Schmidt, 1996). Assuming a reliability of .90 for the ability measure, the correlation between ability scores and peer ratings is .55√(.91 × .90) = .50.
^j From McDaniel, Schmidt, and Hunter (1988a). These calculations are based on an estimate of the correlation between T & E behavioral consistency and ability of .40. This estimate reflects the fact that the achievements measured by this procedure depend not only on personality and other noncognitive characteristics, but also on mental ability.
^k From Hunter and Hunter (1984, Table 9). No correction for range restriction (if any) was possible. In the absence of any data, the correlation between reference checks and ability was taken as .00. Assuming a larger correlation would lead to lower estimated incremental validity.
^l From Hunter (1980), McDaniel, Schmidt, and Hunter (1988b), and Hunter and Hunter (1984). In the only relevant meta-analysis, Schmidt, Hunter, and Outerbridge (1986, Table 5) found the correlation between job experience and ability to be .00. This value was used here.
^m The correlation between biodata scores and ability scores is .50 (Schmidt, 1988). Both the validity of .35 used here and the intercorrelation of .50 are based on the Supervisory Profile Record Biodata Scale (Rothstein, Schmidt, Erwin, Owens, & Sparks, 1990). (The validity for the Managerial Profile Record Biodata Scale in predicting managerial promotion and advancement is higher [.52; Carlson, Scullen, Schmidt, Rothstein, & Erwin, 1998]. However, rate of promotion is a measure different from overall performance on one's current job, and managers are less representative of the general working population than are first line supervisors.)
^n From Gaugler, Rosenthal, Thornton, and Bentson (1987, Table 8). The correlation between assessment center ratings and ability is estimated at .50 (Collins, 1998). It should be noted that most assessment centers use ability tests as part of the evaluation process; Gaugler et al. (1987) found that 74% of the 106 assessment centers they examined used a written test of intelligence (see their Table 4).
^o From McDaniel, Schmidt, and Hunter (1988a, Table 3). The calculations here are based on a zero correlation between the T & E point method and ability; the assumption of a positive correlation would at most lower the estimate of incremental validity from .01 to .00.
^p From Hunter and Hunter (1984, Table 9). For purposes of these calculations, we assumed a zero correlation between years of education and ability. The reader should remember that this is the correlation within the applicant pool of individuals who apply to get a particular job. In the general population, the correlation between education and ability is about .55. Even within applicant pools there is probably at least a small positive correlation; thus, our figure of .01 probably overestimates the incremental validity of years of education over general mental ability. Assuming even a small positive value for the correlation between education and ability would drive the validity increment of .01 toward .00.
^q From Hunter and Hunter (1984, Table 9). The general finding is that interests and ability are uncorrelated (Holland, 1986), and that was assumed to be the case here.
^r From Neter and Ben-Shakhar (1989), Ben-Shakhar (1989), Ben-Shakhar, Bar-Hillel, Bilu, Ben-Abba, and Flug (1986), and Bar-Hillel and Ben-Shakhar (1986). Graphology scores were assumed to be uncorrelated with mental ability.
^s From Hunter and Hunter (1984, Table 9). Age was assumed to be unrelated to ability within applicant pools.
Table 2
Predictive Validity for Overall Performance in Job Training Programs of General Mental Ability (GMA) Scores Combined With a Second Predictor Using (Standardized) Multiple Regression

                                                                            Gain in validity   % increase    Std. regression weights
Personnel measures                                Validity (r)  Multiple R  from supplement    in validity     GMA      Supplement
GMA tests^a                                           .56
Integrity tests^b                                     .38          .67           .11              20%          .56         .38
Conscientiousness tests^c                             .30          .65           .09              16%          .56         .30
Employment interviews
  (structured and unstructured)^d                     .35          .59           .03               5%          .50         .19
Peer ratings^e                                        .36          .57           .01             1.4%          .51         .11
Reference checks^f                                    .23          .61           .05               9%          .56         .23
Job experience (years)^g                              .01          .56           .00               0%          .56         .01
Biographical data measures^h                          .30          .56           .00               0%          .55         .03
Years of education^i                                  .20          .60           .04               7%          .56         .20
Interests^j                                           .18          .59           .03               5%          .56         .18

Note. The percentage of increase in validity is also the percentage of increase in utility (practical value). All of the validities presented are based on the most current meta-analytic results reported for the various predictors. All of the validities in this table are for the criterion of overall performance in job training programs. Unless otherwise noted, all validity estimates are corrected for the downward bias due to measurement error in the measure of job performance and range restriction on the predictor in incumbent samples relative to applicant populations. All correlations between GMA and other predictors are corrected for range restriction but not for measurement error. These correlations represent observed score correlations between selection methods in applicant populations.
^a The validity of GMA is from Hunter and Hunter (1984, Table 2). It can also be found in Hunter (1980).
^b,c The validity of .38 for integrity tests is from Schmidt, Ones, and Viswesvaran (1994). Integrity tests and conscientiousness tests have been found to correlate zero with GMA (Ones, 1993; Ones, Viswesvaran, & Schmidt, 1993). The validity of .30 for conscientiousness measures is from the meta-analysis presented by Mount and Barrick (1995, Table 2).
^d The validity of interviews is from McDaniel, Whetzel, Schmidt, and Maurer (1994, Table 5). McDaniel et al. reported values of .34 and .36 for structured and unstructured interviews, respectively. However, this small difference of .02 appears to be a result of second order sampling error (Hunter & Schmidt, 1990, Ch. 9). We therefore used the average value of .35 as the validity estimate for structured and unstructured interviews. The correlation between interviews and ability scores (.32) is the overall figure from Huffcutt, Roth, and McDaniel (1996, Table 3) across all levels of interview structure.
^e The validity for peer ratings is from Hunter and Hunter (1984, Table 8). These calculations are based on an estimate of the correlation between ability and peer ratings of .50. (See note i to Table 1.) No correction for range restriction (if any) was possible in the data.
^f The validity of reference checks is from Hunter and Hunter (1984, Table 8). The correlation between reference checks and ability was taken as .00. Assumption of a larger correlation will reduce the estimate of incremental validity. No correction for range restriction was possible.
^g The validity of job experience is from Hunter and Hunter (1984, Table 6). These calculations are based on an estimate of the correlation between job experience and ability of zero. (See note l to Table 1.)
^h The validity of biographical data measures is from Hunter and Hunter (1984, Table 8). This validity estimate is not adjusted for range restriction (if any). The correlation between biographical data measures and ability is estimated at .50 (Schmidt, 1988).
^i The validity of education is from Hunter and Hunter (1984, Table 6). The correlation between education and ability within applicant pools was taken as zero. (See note p to Table 1.)
^j The validity of interests is from Hunter and Hunter (1984, Table 8). The correlation between interests and ability was taken as zero (Holland, 1986).
The validity for the middle complexity level of jobs (.51), which includes 62% of all the jobs in the U.S. economy, is the value entered in Table 1. This category includes skilled blue collar jobs and mid-level white collar jobs, such as upper level clerical and lower level administrative jobs. Hence, the conclusions in this article apply mainly to the middle 62% of jobs in the U.S. economy in terms of complexity. The validity of .51 is representative of findings for GMA measures in other meta-analyses (e.g., Pearlman et al., 1980), and it is a value that produces high practical utility.

As noted above, GMA is also an excellent predictor of job-related learning. It has been found to have high and essentially equal predictive validity for performance (amount learned) in job training programs for jobs at all job levels studied. In the U.S. Department of Labor research, the average predictive validity for performance in job training programs was .56 (Hunter & Hunter, 1984, Table 2); this is the figure entered in Table 2. Thus, when an employer uses GMA to select employees who will have a high level of performance on the job, that employer is also selecting those who will learn the most from job training programs and will acquire job knowledge faster from experience on the job. (As can be seen from Table 2, this is also true of integrity tests, conscientiousness tests, and employment interviews.)
Because of its special status, GMA can be considered the primary personnel measure for hiring decisions, and one can consider the remaining 18 personnel measures as supplements to GMA measures. That is, in the case of each of the other measures, one can ask the following question: When used in a properly weighted combination with a GMA measure, how much will each of these measures increase predictive validity for job performance over the .51 that can be obtained by using only GMA? This "incremental validity" translates into incremental utility, that is, into increases in practical value. Because validity is directly proportional to utility, the percentage of increase in validity produced by adding the second measure is also the percentage of increase in practical value (utility). The increase in validity (and utility) depends not only on the validity of the measure added to GMA, but also on the correlation between the two measures. The smaller this correlation is, the larger is the increase in overall validity. The figures for incremental validity in Table 1 are affected by these correlations.
The correlations between mental ability measures and the other measures were estimated from the research literature (often from meta-analyses); the sources of these estimates are given in the notes to Tables 1 and 2. To appropriately represent the observed score correlations between predictors in applicant populations, we corrected all correlations between GMA and other predictors for range restriction but not for measurement error in the measure of either predictor.
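The multiple Rs, gains, and standardized weights in Tables 1 and 2 follow from the standard two-predictor regression formulas. As an illustration (our code, not the authors'), the work sample row of Table 1 can be reproduced from its three correlations:

```python
# Two-predictor standardized regression (textbook formulas; our illustration).
# r1 = validity of GMA, r2 = validity of the supplement, r12 = their correlation.
import math

def two_predictor_regression(r1: float, r2: float, r12: float):
    beta1 = (r1 - r2 * r12) / (1 - r12**2)   # standardized weight for GMA
    beta2 = (r2 - r1 * r12) / (1 - r12**2)   # standardized weight for supplement
    R = math.sqrt(beta1 * r1 + beta2 * r2)   # multiple R
    return beta1, beta2, R

# Work sample row of Table 1: r1 = .51, r2 = .54, r12 = .38
b1, b2, R = two_predictor_regression(0.51, 0.54, 0.38)
print(f"weights: GMA = {b1:.2f}, supplement = {b2:.2f}, multiple R = {R:.2f}")
# -> weights .36 and .41, R = .63 (a .12 gain, i.e., a 24% increase over .51)
```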
Consider work sample tests. Work sample tests are hands-on simulations of part or all of the job that must be performed by applicants. For example, as part of a work sample test, an applicant might be required to repair a series of defective electric motors. Work sample tests are often used to hire skilled workers, such as welders, machinists, and carpenters. When combined in a standardized regression equation with GMA, the work sample receives a weight of .41 and GMA receives a weight of .36. (The standardized regression weights are given in the last two columns of Tables 1 and 2.) The validity of this weighted sum of the two measures (the multiple R) is .63, which represents an increment of .12 over the validity of GMA alone. This is a 24% increase in validity over that of GMA alone, and also a 24% increase in the practical value (utility) of the selection procedure. As we saw earlier, this can be expressed as a 24% increase in the gain in dollar value of output. Alternatively, it can be expressed as a 24% increase in the percentage of increase in output produced by using GMA alone. In either case, it is a substantial improvement.
Work sample tests can be used only with applicants who already know the job. Such workers do not need to be trained, and so the ability of work sample tests to predict training performance has not been studied. Hence, there is no entry for work sample tests in Table 2.
Integrity tests are used in industry to hire employees with reduced probability of counterproductive job behaviors, such as drinking or drugs on the job, fighting on the job, stealing from the employer, sabotaging equipment, and other undesirable behaviors. They do predict these behaviors, but they also predict evaluations of overall job performance (Ones, Viswesvaran, & Schmidt, 1993). Even though their validity is lower, integrity tests produce a larger increment in validity (.14) and a larger percentage of increase in validity (and utility) than do work samples. This is because integrity tests correlate zero with GMA (vs. .38 for work samples). In terms of basic personality traits, integrity tests have been found to measure mostly conscientiousness, but also some components of agreeableness and emotional stability (Ones, 1993). The figures for conscientiousness measures per se are given in Table 1. The validity of conscientiousness measures (Mount & Barrick, 1995) is lower than that for integrity tests (.31 vs. .41), its increment to validity is smaller (.09), and its percentage of increase in validity is smaller (18%). However, these values for conscientiousness are still large enough to be practically useful.
A meta-analysis based on 8 studies and 2,364 individuals estimated the mean validity of integrity tests for predicting performance in training programs at .38 (Schmidt, Ones, & Viswesvaran, 1994). As can be seen in Table 2, the incremental validity for integrity tests for predicting training performance is .11, which yields a 20% increase in validity and utility over that produced by GMA alone. In the prediction of training performance, integrity tests appear to produce higher incremental validity than any other measure studied to date. However, the increment in validity produced by measures of conscientiousness (.09, for a 16% increase) is only slightly smaller. The validity estimate for conscientiousness is based on 21 studies and 4,106 individuals (Mount & Barrick, 1995), a somewhat larger database.
Employment interviews can be either structured or unstructured (Huffcutt, Roth, & McDaniel, 1996; McDaniel et al., 1994). Unstructured interviews have no fixed format or set of questions to be answered. In fact, the same interviewer often asks different applicants different questions. Nor is there a fixed procedure for scoring responses; in fact, responses to individual questions are usually not scored, and only an overall evaluation (or rating) is given to each applicant, based on summary impressions and judgments. Structured interviews are exactly the opposite on all counts. In addition, the questions to be asked are usually determined by a careful analysis of the job in question. As a result, structured interviews are more costly to construct and use, but are also more valid. As shown in Table 1, the average validity of the structured interview is .51, versus .38 for the unstructured interview (and undoubtedly lower for carelessly conducted unstructured interviews). An equally weighted combination of the structured interview and a GMA measure yields a validity of .63. As is the case for work sample tests, the increment in validity is .12 and the percentage of increase is 24%. These figures are considerably smaller for the unstructured interview (see Table 1). Clearly, the combination of a structured interview and a GMA test is an attractive hiring procedure. It achieves 63% of the maximum possible practical value (utility), and does so at reasonable cost.

As shown in Table 2, both structured and unstructured interviews predict performance in job training programs with a validity of about .35 (McDaniel et al., 1994; see their Table 5). The incremental validity for the prediction of training performance is .03, a 5% increase.
The next procedure in Table 1 is job knowledge tests. Like work sample measures, job knowledge tests cannot be used to evaluate and hire inexperienced workers. An applicant cannot be expected to have mastered the job knowledge required to perform a particular job unless he or she has previously performed that job or has received schooling, education, or training for that job. But applicants for jobs such as carpenter, welder, accountant, and chemist can be administered job knowledge tests. Job knowledge tests are often constructed by the hiring organization on the basis of an analysis of the tasks that make up the job. Constructing job knowledge tests in this manner is generally somewhat more time consuming and expensive than constructing typical structured interviews. However, such tests can also be purchased commercially; for example, tests are available that measure the job knowledge required of machinists (knowledge of metal cutting tools and procedures). Other examples are tests of knowledge of basic organic chemistry and tests of the knowledge required of roofers. In an extensive meta-analysis, Dye, Reck, and McDaniel (1993) found that commercially purchased job knowledge tests ("off the shelf" tests) had slightly lower validity than job knowledge tests tailored to the job in question. The validity figure of .48 in Table 1 for job knowledge tests is for tests tailored to the job in question.
As shown in Table 1, job knowledge tests increase the validity by .07 over that of GMA measures alone, yielding a 14% increase in validity and utility. Thus job knowledge tests can have substantial practical value to the organization using them.

For the same reasons indicated earlier for job sample tests, job knowledge tests typically have not been used to predict performance in training programs. Hence, little validity information is available for this criterion, and there is no entry in Table 2 for job knowledge tests.
The next three personnel measures in Table 1 increase validity
and utility by the same amount as job knowledge tests (i.e.,
14%). However, two of these methods are considerably less
practical to use in many situations. Consider the job tryout
procedure. Unlike job knowledge tests, the job tryout procedure
can be used with entry level employees with no previous experi-
ence on the job in question. With this procedure, applicants are
hired with minimal screening and their performance on the job
is observed and evaluated for a certain period of time (typically
6-8 months). Those who do not meet a previously established
standard of satisfactory performance by the end of this proba-
tionary period are then terminated. If used in this manner, this
procedure can have substantial validity (and incremental valid-
ity), as shown in Table 1. However, it is very expensive to
implement, and low job performance by minimally screened
probationary workers can lead to serious economic losses. In
addition, it has been our experience that supervisors are reluc-
tant to terminate marginal performers. Doing so is an unpleasant
experience for them, and to avoid this experience many supervisors gradually reduce the standards of minimally acceptable
performance, thus destroying the effectiveness of the procedure.
Another consideration is that some of the benefits of this method
will be captured in the normal course of events even if the
job tryout procedure is not used, because clearly inadequate
performers will be terminated after a period of time anyway.
Peer ratings are evaluations of performance or potential made
by one's co-workers; they typically are averaged across peer
raters to increase the reliability (and hence validity) of the rat-
ings. Like the job tryout procedure, peer ratings have some
limitations. First, they cannot be used for evaluating and hiring
applicants from outside the organization; they can be used only
for internal job assignment, promotion, or training assignment.
They have been used extensively for these internal personnel
decisions in the military (particularly the U.S. and Israeli mili-
taries) and some private firms, such as insurance companies.
One concern associated with peer ratings is that they will be
influenced by friendship, or social popularity, or both. Another
is that pairs or clusters of peers might secretly agree in advance
to give each other high peer ratings. However, the research that
has been done does not support these fears; for example, par-
tialling friendship measures out of the peer ratings does not
appear to affect the validity of the ratings (cf. Hollander, 1956;
Waters & Waters, 1970).
The behavioral consistency method of evaluating previous training and experience (McDaniel, Schmidt, & Hunter, 1988a; Schmidt, Caplan, et al., 1979) is based on the well-established psychological principle that the best predictor of future performance is past performance. In developing this method, the first step is to determine what achievement and accomplishment dimensions best separate top job performers from low performers. This is done on the basis of information obtained from experienced supervisors of the job in question, using a special set of procedures (Schmidt, Caplan, et al., 1979). Applicants are then asked to describe (in writing or sometimes orally) their past achievements that best illustrate their ability to perform these functions at a high level (e.g., organizing people and getting work done through people). These achievements are then scored with the aid of scales that are anchored at various points by specific scaled achievements that serve as illustrative examples or anchors.
Use of the behavioral consistency method is not limited to
applicants with previous experience on the job in question. Pre-
vious experience on jobs that are similar to the current job in
only very general ways typically provides adequate opportunity
for demonstration of achievements. In fact. the relevant achieve-
ments can sometimes be demonstrated through community,
school, and other nonjob activities. However, some young people
just leaving secondary school may not have had adequate oppor-
tunity to demonstrate their capacity for the relevant achieve-
ments and accomplishments; the procedure might work less well
in such groups.
In terms of time and cost, the behavioral consistency procedure is nearly as time consuming and costly to construct as
locally constructed job knowledge tests. Considerable work is
required to construct the procedure and the scoring system;
applying the scoring procedure to applicant responses is also
more time consuming than scoring of most job knowledge tests
and other tests with clear right and wrong answers. However,
especially for higher level jobs, the behavioral consistency
method may be well worth the cost and effort.
No information is available on the validity of the job tryout
or the behavioral consistency procedures for predicting perfor-
mance in training programs. However, as indicated in Table 2,
peer ratings have been found to predict performance in training
programs with a mean validity of .36 (see Hunter & Hunter,
1984, Table 8).
For the next procedure, reference checks, the information presented in Table 1 may not at present be fully accurate. The validity studies on which the validity of .26 in Table 1 is based were conducted prior to the development of the current legal climate in the United States. During the 1970s and 1980s, employers providing negative information about past job performance or behavior on the job to prospective new employers were sometimes subjected to lawsuits by the former employees in question. Today, in the United States at least, many previous employers will provide only information on the dates of employment and the job titles the former employee held. That is, past employers today typically refuse to release information on quality or quantity of job performance, disciplinary record of the past employee, or whether the former employee quit voluntarily or was dismissed. This is especially likely to be the case if the information is requested in writing; occasionally, such information will be revealed by telephone or in face to face conversation, but one cannot be certain that this will occur.
However, in recent years the legal climate in the United States
has been changing. Over the last decade, 19 of the 50 states
have enacted laws that provide immunity from legal liability
for employers providing job references in good faith to other
employers, and such laws are under consideration in 9 other
states (Baker, 1996). Hence, reference checks, formerly a heavily relied on procedure in hiring, may again come to provide an increment to the validity of a GMA measure for predicting job performance. In Table 1, the increment is 12%, only two percentage points less than the increments for the five preceding methods.

Older research indicates that reference checks predict performance in training with a mean validity of .23 (Hunter & Hunter, 1984, Table 8), yielding a 9% increment in validity over GMA tests, as shown in Table 2. But, again, these findings may no longer hold; however, changes in the legal climate may make these validity estimates accurate again.
Job experience as indexed in Tables 1 and 2 refers to the number of years of previous experience on the same or similar job; it conveys no information on past performance on the job. In the data used to derive the validity estimates in these tables, job experience varied widely: from less than 6 months to more than 30 years. Under these circumstances, the validity of job experience for predicting future job performance is only .18, and the increment in validity (and utility) over that from GMA alone is only .03 (a 6% increase). However, Schmidt, Hunter, and Outerbridge (1986) found that when experience on the job does not exceed 5 years, the correlation between amount of job experience and job performance is considerably larger: .33 when job performance is measured by supervisory ratings and .47 when job performance is measured using a work sample test. These researchers found that the relation is nonlinear: Up to about 5 years of job experience, job performance increases linearly with increasing experience on the job. After that, the curve becomes increasingly horizontal, and further increases in job experience produce little increase in job performance. Apparently, during the first 5 years on these (mid-level, medium complexity) jobs, employees were continually acquiring additional job knowledge and skills that improved their job performance. But by the end of 5 years this process was nearly complete, and further increases in job experience led to little increase in job knowledge and skills (Schmidt & Hunter, 1992). These findings suggest that even under ideal circumstances, job experience at the start of a job will predict job performance only for the first 5 years on the job. By contrast, GMA continues to predict job performance indefinitely (Hunter & Schmidt, 1996; Schmidt, Hunter, Outerbridge, & Goff, 1988; Schmidt, Hunter, Outerbridge, & Trattner, 1986).
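The nonlinear pattern can be pictured with a simple asymptotic learning curve. The functional form and rate parameter below are our illustrative assumptions, not estimates from Schmidt, Hunter, and Outerbridge (1986); the point is only that experience-driven growth is nearly complete by year 5:

```python
# Illustrative only: an exponential-approach curve with a rate chosen so that
# growth is nearly complete by 5 years, mimicking the pattern described above.
import math

def relative_performance(years_on_job: float, rate: float = 0.9) -> float:
    """Fraction of asymptotic (experience-driven) performance reached."""
    return 1.0 - math.exp(-rate * years_on_job)

for yrs in (1, 3, 5, 10, 20):
    print(yrs, round(relative_performance(yrs), 2))
# 1 -> 0.59, 3 -> 0.93, 5 -> 0.99, 10 -> 1.0: little gain after year 5
```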
As shown in Table 2, the amount of job experience does not predict performance in training programs teaching new skills. Hunter and Hunter (1984, Table 6) reported a mean validity of .01. However, one can note from this finding that job experience does not retard the acquisition of new job skills in training programs, as might have been predicted from theories of proactive inhibition.
Biographical data measures contain questions about past life experiences, such as early life experiences in one's family, in high school, and in hobbies and other pursuits. For example, there may be questions on offices held in student organizations, on sports one participated in, and on disciplinary practices of one's parents. Each question has been chosen for inclusion in the measure because in the initial developmental sample it correlated with a criterion of job performance, performance in training, or some other criterion. That is, biographical data measures are empirically developed. However, they are usually not completely actuarial, because some hypotheses are invoked in choosing the beginning set of items. However, choice of the final questions to retain for the scale is mostly actuarial. Today antidiscrimination laws prevent certain questions from being used, such as sex, marital status, and age, and such items are not included. Biographical data measures have been used to predict performance on a wide variety of jobs, ranging in level from blue collar unskilled jobs to scientific and managerial jobs. These measures are also used to predict job tenure (turnover) and absenteeism, but we do not consider these usages in this article.
Table 1 shows that biographical data measures have substantial zero-order validity (.35) for predicting job performance but produce an increment in validity over GMA of only .01 on average (a 2% increase). The reason that the increment in validity is so small is that biographical data correlates substantially with GMA (.50; Schmidt, 1988). This suggests that in addition to whatever other traits they measure, biographical data measures are also in part indirect reflections of mental ability.

As shown in Table 2, biographical data measures predict performance in training programs with a mean validity of .30 (Hunter & Hunter, 1984, Table 8). However, because of their relatively high correlation with GMA, they produce no increment in validity for performance in training.
Biographical data measures are technically difficult and time consuming to construct (although they are easy to use once constructed). Considerable statistical sophistication is required to develop them. However, some commercial firms offer validated biographical data measures for particular jobs (e.g., first line supervisors, managers, clerical workers, and law enforcement personnel). These firms maintain control of the proprietary scoring keys and the scoring of applicant responses.
Individuals who are administered assessment centers spend one to several days at a central location where they are observed participating in such exercises as leaderless group discussions and business games. Various ability and personality tests are usually administered, and in-depth structured interviews are also part of most assessment centers. The average assessment center includes seven exercises or assessments and lasts 2 days (Gaugler, Rosenthal, Thornton, & Bentson, 1987). Assessment centers are used for jobs ranging from first line supervisors to high level management positions.
Assessment centers are like biographical data measures: They have substantial validity but only moderate incremental validity over GMA (.02, a 4% increase; see Table 1). The reason is also the same: They correlate moderately highly with GMA, in part because they typically include a measure of GMA (Gaugler et al., 1987). Despite the relatively low incremental validity, many organizations use assessment centers for managerial jobs because they believe assessment centers provide them with a wide range of insights about candidates and their developmental possibilities.
Assessment centers have generally not been used to predict performance in job training programs; hence, their validity for this purpose is unknown. However, assessment center scores do predict rate of promotion and advancement in management. Gaugler et al. (1987, Table 8) reported a mean validity of .36 for this criterion (the same value as for the prediction of job
performance). Measurements of career advancement include number of promotions, increases in salary over given time spans, absolute level of salary attained, and management rank attained. Rapid advancement in organizations requires rapid learning of job related knowledge. Hence, assessment center scores do appear to predict the acquisition of job related knowledge on the job.
The point method of evaluating previous training and experience (T&E) is used mostly in government hiring, at all levels: federal, state, and local. A major reason for its widespread use is that point method procedures are relatively inexpensive to construct and use. The point method appears under a wide variety of different names (McDaniel et al., 1988a), but all such procedures have several important characteristics in common. All point method procedures are credentialistic; typically an applicant receives a fixed number of points for (a) each year or month of experience on the same or similar job, (b) each year of relevant schooling (or each course taken), and (c) each relevant training program completed, and so on. There is usually no attempt to evaluate past achievements, accomplishments, or job performance; in effect, the procedure assumes that achievement and performance are determined solely by the exposures that are measured. As shown in Table 1, the T&E point method has low validity and produces only a 2% increase in validity over that available from GMA alone. The T&E point method has not been used to predict performance in training programs.
Sheer amount of education has even lower validity for predicting job performance than the T&E point method (.10). However, its increment to validity, rounded to two decimal places, is the same .01 as obtained with the T&E point method. It is important to note that this finding does not imply that education is irrelevant to occupational success; education is clearly an important determinant of the level of job the individual can obtain. What this finding shows is that among those who apply to get a particular job, years of education does not predict future performance on that job very well. For example, for a typical semi-skilled blue collar job, years of education among applicants might range from 9 to 12. The validity of .10 then means that the average job performance of those with 12 years of education will be only slightly higher (on average) than that for those with 9 or 10 years.
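As a rough quantitative gloss on this example, a validity coefficient translates directly into an expected standardized performance difference under a linear regression model. The sketch below is ours, not the article's, and the applicant-pool standard deviation of education is an assumption made purely for illustration:

```python
# Hypothetical illustration of what a validity of .10 implies under a
# linear (regression) model; the education SD is an assumed value.
r = 0.10                    # validity of years of education (Table 1)
educ_sd_years = 1.0         # assumed SD of education in the applicant pool
z_gap = (12 - 9) / educ_sd_years   # 12-year vs. 9-year applicants, in SD units
expected_perf_gap = r * z_gap      # expected performance gap, in SD units
print(round(expected_perf_gap, 2))  # 0.3 SD: a quite modest difference
```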
As can be seen in Table 2, amount of education predicts learning in job training programs better than it predicts performance on the job. Hunter and Hunter (1984, Table 6) found a mean validity of .20 for performance in training programs. This is not a high level of validity, but it is twice as large as the validity for predicting job performance.
Many believe that interests are an important determinant of one's level of job performance. People whose interests match the content of their jobs (e.g., people with mechanical interests who have mechanical jobs) are believed to have higher job performance than those with nonmatching interests. The validity of .10 for interests shows that this is true only to a very limited extent. To many people, this seems counterintuitive. Why do interests predict job performance so poorly? Research indicates that interests do substantially influence which jobs people prefer and which jobs they attempt to enter. However, once individuals are in a job, the quality and level of their job performance is determined mostly by their mental ability and by certain personality traits such as conscientiousness, not by their interests. So despite popular belief, measurement of work interests is not a good means of predicting who will show the best future job performance (Holland, 1986).
Interests predict learning in job training programs somewhat better than they predict job performance. As shown in Table 2, Hunter and Hunter (1984, Table 8) found a mean validity of .18 for predicting performance in job training programs.
Graphology is the analysis of handwriting. Graphologists claim that people express their personalities through their handwriting and that one's handwriting therefore reveals personality traits and tendencies that graphologists can use to predict future job performance. Graphology is used infrequently in the United States and Canada but is widely used in hiring in France (Steiner, 1997; Steiner & Gilliland, 1996) and in Israel. Levy (1979) reported that 85% of French firms routinely use graphology in hiring of personnel. Ben-Shakhar, Bar-Hillel, Bilu, Ben-Abba, and Flug (1986) stated that in Israel graphology is used more widely than any other single personality measure.
Several studies have examined the ability of graphologists and nongraphologists to predict job performance from handwriting samples (Jansen, 1973; Rafaeli & Klimoski, 1983; see also Ben-Shakhar, 1989; Ben-Shakhar, Bar-Hillel, Bilu, et al., 1986; Ben-Shakhar, Bar-Hillel, & Flug, 1986). The key findings in this area are as follows. When the assessees who provide handwriting samples are allowed to write on any subject they choose, both graphologists and untrained nongraphologists can infer some (limited) information about their personalities and job performance from the handwriting samples. But untrained nongraphologists do just as well as graphologists; both show validities in the .18-.20 range. When the assessees are required to copy the same material from a book to create their handwriting sample, there is no evidence that graphologists or nongraphologists can infer any valid information about personality traits or job performance from the handwriting samples (Neter & Ben-Shakhar, 1989). What this indicates is that, contrary to graphology theory, whatever limited information about personality or job performance there is in the handwriting samples comes from the content and not the characteristics of the handwriting. For example, writers differ in style of writing, expressions of emotions, verbal fluency, grammatical skills, and so on. Whatever information about personality and ability these differences contain, the training of graphologists does not allow them to extract it better than can people untrained in graphology. In handwriting per se, independent of content, there appears to be no information about personality or job performance (Neter & Ben-Shakhar, 1989).
To many people, this is another counterintuitive finding, like the finding that interests are a poor predictor of job performance. To these people, it seems obvious that the wide and dramatic variations in handwriting that everyone observes must reveal personality differences among individuals. Actually, most of the variation in handwriting is due to differences among individuals in fine motor coordination of the finger muscles. And these differences in finger muscles and their coordination are probably due mostly to random genetic variations among individuals. The genetic variations that cause these finger coordination differences do not appear to be linked to personality; and in fact there is no apparent reason to believe they should be.
The validity of graphology for predicting performance in training programs has not been studied. However, the findings with respect to performance on the job make it highly unlikely that graphology has validity for training performance.
Table 1 shows that age of job applicants has no validity for predicting job performance. Age is rarely used as a basis for hiring, and in fact in the United States, use of age for individuals over age 40 would be a violation of the federal law against age discrimination. We include age here for only two reasons. First, some individuals believe age is related to job performance. We show here that for typical jobs this is not the case. Second, age serves to anchor the bottom end of the validity dimension: Age is about as unrelated to job performance as any measure can be. No meta-analyses relating age to performance in job training programs were found. Although it is possible that future research will find that age is negatively related to performance in job training programs (as is widely believed), we note again that job experience, which is positively correlated with age, is not correlated with performance in training programs (see Table 2).
Finally, we address an issue raised by a reviewer. As discussed in more detail in the next section, some of the personnel measures we have examined (e.g., GMA and conscientiousness measures) are measures of single psychological constructs, whereas others (e.g., biodata and assessment centers) are methods rather than constructs. It is conceivable that a method such as the assessment center, for example, could measure different constructs or combinations of constructs in different applications in different firms. The reviewer therefore questioned whether it was meaningful to compare the incremental validities of different methods (e.g., comparing the incremental validities produced by the structured interview and the assessment center). There are two responses to this. First, this article is concerned with personnel measures as used in the real world of employment. Hence, from that point of view, such comparisons of incremental validities would be meaningful, even if they represented only crude average differences in incremental validities.
However, the situation is not that grim. The empirical evidence indicates that such methods as interviews, assessment centers, and biodata measures do not vary much from application to application in the constructs they measure. This can be seen from the fact that meta-analysis results show that the standard deviations of validity across studies (applications), after the appropriate corrections for sampling error and other statistical and measurement artifacts, are quite small (cf. Gaugler et al., 1987; McDaniel et al., 1994; Schmidt & Rothstein, 1994). In fact, these standard deviations are often even smaller than those for construct-based measures such as GMA and conscientiousness (Schmidt & Rothstein, 1994).
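The logic behind these small standard deviations can be illustrated with the first, "bare-bones" step of the Hunter and Schmidt (1990) procedure: subtract the variance expected from sampling error alone from the observed variance of validities and take the square root of what remains. The sketch below is a simplified illustration only; it omits the corrections for measurement error and range restriction that the full procedure applies, and the input values are invented:

```python
def bare_bones_meta(rs, ns):
    """Bare-bones meta-analysis of correlations (Hunter & Schmidt, 1990):
    N-weighted mean validity, and the SD of validities left after the
    variance expected from sampling error alone is removed."""
    k, total_n = len(rs), sum(ns)
    mean_r = sum(r * n for r, n in zip(rs, ns)) / total_n
    var_obs = sum(n * (r - mean_r) ** 2 for r, n in zip(rs, ns)) / total_n
    var_sampling = (1 - mean_r**2) ** 2 / (total_n / k - 1)
    return mean_r, max(var_obs - var_sampling, 0.0) ** 0.5

# Invented validities and sample sizes for five hypothetical studies.
rs = [0.28, 0.35, 0.22, 0.40, 0.31]
ns = [80, 120, 60, 150, 100]
mean_r, sd_residual = bare_bones_meta(rs, ns)
print(round(mean_r, 2), round(sd_residual, 3))
# Prints "0.33 0.0": here the observed spread across studies is fully
# accounted for by sampling error, mirroring the small residual SDs
# reported in the meta-analyses cited above.
```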
Hence, the situation appears to be this: We do not know exactly what combination of constructs is measured by methods such as the assessment center, the interview, and biodata (see the next section), but whatever those combinations are, they do not appear to vary much from one application (study) to another. Hence, comparisons of their relative incremental validities over GMA are in fact meaningful. These incremental validities can be expected to be stable across different applications of the methods in different organizations and settings.
Toward a Theory of the Determinants
of Job Performance
The previous section summarized what is known from cumulative empirical research about the validity of various personnel measures for predicting future job performance and job-related learning of job applicants. These findings are based on thousands of research studies performed over eight decades and involving millions of employees. They are a tribute to the power of empirical research, integrated using meta-analysis methods, to produce precise estimates of relationships of interest and practical value. However, the goals of personnel psychology include more than a delineation of relationships that are practically useful in selecting employees. In recent years, the focus in personnel psychology has turned to the development of theories of the causes of job performance (Schmidt & Hunter, 1992). The objective is the understanding of the psychological processes underlying and determining job performance. This change of emphasis is possible because application of meta-analysis to research findings has provided the kind of precise and generalizable estimates of the validity of different measured constructs for predicting job performance that are summarized in this article. It has also provided more precise estimates than previously available of the correlations among these predictors.
However, the theories of job performance that have been developed and tested do not include a role for all of the personnel measures discussed above. That is because the actual constructs measured by some of these procedures are unknown, and it seems certain that some of these procedures measure combinations of constructs (Hunter & Hunter, 1984; Schmidt & Rothstein, 1994). For example, employment interviews probably measure a combination of previous experience, mental ability, and a number of personality traits, such as conscientiousness; in addition, they may measure specific job-related skills and behavior patterns. The average correlation between interview scores and scores on GMA tests is .32 (Huffcutt et al., 1996). This indicates that, to some extent, interview scores reflect mental ability. Little empirical evidence is available as to what other traits they measure (Huffcutt et al., 1996). What has been said here of employment interviews also applies to peer ratings, the behavioral consistency method, reference checks, biographical data measures, assessment centers, and the point method of evaluating past training and experience. Procedures such as these can be used as practical selection tools but, because their construct composition is unknown, they are less useful in constructing theories of the determinants of job performance. The measures that have been used in theories of job performance have been GMA, job knowledge, job experience, and personality traits. This is because it is fairly clear what constructs each of these procedures measures.
What has this research revealed about the determinants of job performance? A detailed review of this research can be found in Schmidt and Hunter (1992); here we summarize only the most important findings. One major finding concerns the reason why GMA is such a good predictor of job performance. The major direct causal impact of mental ability has been found to be on the acquisition of job knowledge. That is, the major reason more intelligent people have higher job performance is that they acquire job knowledge more rapidly and acquire more
of it; and it is this knowledge of how to perform the job that causes their job performance to be higher (Hunter, 1986). Thus, mental ability has its most important effect on job performance indirectly, through job knowledge. There is also a direct effect of mental ability on job performance independent of job knowledge, but it is smaller. For nonsupervisory jobs, this direct effect is only about 20% as large as the indirect effect; for supervisory jobs, it is about 50% as large (Borman, White, Pulakos, & Oppler, 1991; Schmidt, Hunter, & Outerbridge, 1986).
It has also been found that job experience operates in this same manner. Job experience is essentially a measure of practice on the job and hence a measure of opportunity to learn. The major direct causal effect of job experience is on job knowledge, just as is the case for mental ability. Up to about 5 years on the job, increasing job experience leads to increasing job knowledge (Schmidt, Hunter, & Outerbridge, 1986), which, in turn, leads to improved job performance. So the major effect of job experience on job performance is indirect, operating through job knowledge. Again, there is also a direct effect of job experience on job performance, but it is smaller than the indirect effect through job knowledge (about 30% as large).
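In path-analytic terms, the statements above decompose a predictor's total effect into a direct path and an indirect path whose size is the product of the component path coefficients. The coefficients in the sketch below are hypothetical, chosen only so that the direct effect comes out at about 20% of the indirect effect, as reported for nonsupervisory jobs; they are not the published estimates:

```python
# Hypothetical path coefficients for GMA -> job knowledge -> performance.
p_gma_knowledge = 0.50   # GMA -> job knowledge (assumed)
p_knowledge_perf = 0.60  # job knowledge -> job performance (assumed)
p_gma_direct = 0.06      # GMA -> job performance, direct path (assumed)

indirect = p_gma_knowledge * p_knowledge_perf   # 0.30
total = p_gma_direct + indirect                 # 0.36
print(round(p_gma_direct / indirect, 2))  # 0.2: direct ~20% of indirect
```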
The major personality trait that has been studied in causal models of job performance is conscientiousness. This research has found that, controlling for mental ability, employees who are higher in conscientiousness develop higher levels of job knowledge, probably because highly conscientious individuals exert greater efforts and spend more time "on task." This job knowledge, in turn, causes higher levels of job performance. From a theoretical point of view, this research suggests that the central determining variables in job performance may be GMA, job experience (i.e., opportunity to learn), and the personality trait of conscientiousness. This is consistent with our conclusion that a combination of a GMA test and an integrity test (which measures mostly conscientiousness) has the highest validity (.65) for predicting job performance. Another combination with high validity (.63) is GMA plus a structured interview, which may in part measure conscientiousness and related personality traits (such as agreeableness and emotional stability, which are also measured in part by integrity tests).
Limitations of This Study
This article examined the multivariate validity of only certain predictor combinations: combinations of two predictors with one of the two being GMA. Organizations sometimes use more than two selection methods, and it would be informative to examine the incremental validity from adding a third predictor. For some purposes, it would also be of interest to examine predictor combinations that do not include GMA. However, the absence of the needed estimates of predictor intercorrelations in the literature makes this impossible at the present time. In the future, as data accumulate, such analyses may become feasible.
In fact, even within the context of the present study, some of the estimated predictor intercorrelations could not be made as precise as would be ideal, at least in comparison to those estimates that are based on the results of major meta-analyses. For example, the job tryout procedure is similar to an extended job sample test. In the absence of data estimating the job tryout-ability test score correlation, this correlation was estimated as being the same as the job sample-ability test correlation. It is to be hoped that future research will provide more precise estimates of this and other correlations between GMA and other personnel measures.
Questions related to gender or minority subgroups are beyond the scope of this study. These issues include questions of differential validity by subgroups, predictive fairness for subgroups, and subgroup differences in mean score on selection procedures. An extensive existing literature addresses these questions (cf. Hunter & Schmidt, 1996; Ones et al., 1993; Schmidt, 1988; Schmidt & Hunter, 1981; Schmidt, Ones, & Hunter, 1992; Wigdor & Garner, 1982). However, the general findings of this research literature are obviously relevant here.
For differential validity, the general finding has been that validities (the focus of this study) do not differ appreciably for different subgroups. For predictive fairness, the usual finding has been a lack of predictive bias for minorities and women. That is, given similar scores on selection procedures, later job performance is similar regardless of group membership. On some selection procedures (in particular, cognitive measures), subgroup differences on means are typically observed. On other selection procedures (in particular, personality and integrity measures), subgroup differences are rare or nonexistent. For many selection methods (e.g., reference checks and evaluations of education and experience), there is little data (Hunter & Hunter, 1984).
For many purposes, the most relevant finding is the lack of predictive bias. That is, even when subgroups differ in mean score, selection procedure scores appear to have the same implications for later performance for individuals in all subgroups (Wigdor & Garner, 1982); the predictive interpretation of scores is the same in different subgroups.
Summary and Implications
Employers must make hiring decisions; they have no choice about that. But they can choose which methods to use in making those decisions. The research evidence summarized in this article shows that different methods and combinations of methods have very different validities for predicting future job performance. Some, such as interests and amount of education, have very low validity. Others, such as graphology, have essentially no validity; they are equivalent to hiring randomly. Still others, such as GMA tests and work sample measures, have high validity. Of the combinations of predictors examined, two stand out as being both practical to use for most hiring and as having high composite validity: the combination of a GMA test and an integrity test (composite validity of .65) and the combination of a GMA test and a structured interview (composite validity of .63). Both of these combinations can be used with applicants with no previous experience on the job (entry level applicants), as well as with experienced applicants. Both combinations predict performance in job training programs quite well (.67 and .59, respectively), as well as performance on the job. And both combinations are less expensive to use than many other combinations. Hence, both are excellent choices. However, in particular cases there might be reasons why an employer might choose to use one of the other combinations with high, but slightly lower, validity. Some examples are combinations that include
conscientiousness tests, work sample tests, job knowledge tests, and the behavioral consistency method.
In recent years, researchers have used cumulative research findings on the validity of predictors of job performance to create and test theories of job performance. These theories are now shedding light on the psychological processes that underlie observed predictive validity and are advancing basic understanding of human competence in the workplace.
The validity of the personnel measure (or combination of measures) used in hiring is directly proportional to the practical value of the method, whether measured in dollar value of increased output or percentage of increase in output. In economic terms, the gains from increasing the validity of hiring methods can amount over time to literally millions of dollars. However, this can be viewed from the opposite point of view: By using selection methods with low validity, an organization can lose millions of dollars in reduced production.
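The scale of these dollar figures follows from the linear utility model of Brogden (1949) cited earlier: the gain per hire per year is the validity increase times the dollar standard deviation of employee output times the mean standardized predictor score of those selected. The numbers in the sketch below are hypothetical, chosen only to show how quickly the total reaches millions:

```python
# A minimal sketch of Brogden's (1949) linear utility logic; every input
# value here is a hypothetical assumption, not a figure from the article.
validity_gain = 0.65 - 0.20   # more valid procedure vs. a weak one
sd_output_dollars = 20_000    # SD of yearly output value per employee (assumed)
mean_z_selected = 1.0         # mean predictor z-score of those hired (assumed)
n_hired, years_tenure = 100, 5

gain = (validity_gain * sd_output_dollars * mean_z_selected
        * n_hired * years_tenure)
print(f"${gain:,.0f}")  # $4,500,000 over 5 years under these assumptions
```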
In fact, many employers, both in the United States and throughout the world, are currently using suboptimal selection methods. For example, many organizations in France, Israel, and other countries hire new employees based on handwriting analyses by graphologists. And many organizations in the United States rely solely on unstructured interviews when they could use more valid methods. In a competitive world, these organizations are unnecessarily creating a competitive disadvantage for themselves (Schmidt, 1993). By adopting more valid hiring procedures, they could turn this competitive disadvantage into a competitive advantage.
References
Baker, T. G. (1996). Practice network. The Industrial-Organizational Psychologist, 34, 44-53.
Bar-Hillel, M., & Ben-Shakhar, G. (1986). The a priori case against graphology: Methodological and conceptual issues. In B. Nevo (Ed.), Scientific aspects of graphology (pp. 263-279). Springfield, IL: Charles C Thomas.
Ben-Shakhar, G. (1989). Nonconventional methods in personnel selection. In P. Herriot (Ed.), Handbook of assessment in organizations: Methods and practice for recruitment and appraisal (pp. 469-485). Chichester, England: Wiley.
Ben-Shakhar, G., Bar-Hillel, M., Bilu, Y., Ben-Abba, E., & Flug, A. (1986). Can graphology predict occupational success? Two empirical studies and some methodological ruminations. Journal of Applied Psychology, 71, 645-653.
Ben-Shakhar, G., Bar-Hillel, M., & Flug, A. (1986). A validation study of graphological evaluations in personnel selection. In B. Nevo (Ed.), Scientific aspects of graphology (pp. 175-191). Springfield, IL: Charles C Thomas.
Borman, W. C., White, L. A., Pulakos, E. D., & Oppler, S. H. (1991). Models evaluating the effects of ratee ability, knowledge, proficiency, temperament, awards, and problem behavior on supervisory ratings. Journal of Applied Psychology, 76, 863-872.
Boudreau, J. W. (1983a). Economic considerations in estimating the utility of human resource productivity improvement programs. Personnel Psychology, 36, 551-576.
Boudreau, J. W. (1983b). Effects of employee flows on utility analysis of human resources productivity improvement programs. Journal of Applied Psychology, 68, 396-407.
Boudreau, J. W. (1984). Decision theory contributions to human resource management research and practice. Industrial Relations, 23, 198-217.
Brody, N. (1992). Intelligence. New York: Academic Press.
Brogden, H. E. (1949). When testing pays off. Personnel Psychology, 2, 171-183.
Carlson, K. D., Scullen, S. E., Schmidt, F. L., Rothstein, H. R., & Erwin, F. W. (1998). Generalizable biographical data: Is multi-organizational development and keying necessary? Manuscript in preparation.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor analytic studies. New York: Cambridge University Press.
Cascio, W. F., & Silbey, V. (1979). Utility of the assessment center as a selection device. Journal of Applied Psychology, 64, 107-118.
Collins, J. (1998). Prediction of overall assessment center evaluations from ability, personality, and motivation measures: A meta-analysis. Unpublished manuscript, Texas A&M University, College Station, TX.
Cronshaw, S. F., & Alexander, R. A. (1985). One answer to the demand for accountability: Selection utility as an investment decision. Organizational Behavior and Human Performance, 35, 102-118.
Dye, D. A., Reck, M., & McDaniel, M. A. (1993). The validity of job knowledge measures. International Journal of Selection and Assessment, 1, 153-157.
Gaugler, B. B., Rosenthal, D. B., Thornton, G. C., & Benson, C. (1987). Meta-analysis of assessment center validity. Journal of Applied Psychology, 72, 493-511.
Holland, J. (1986). New directions for interest testing. In B. S. Plake & J. C. Witt (Eds.), The future of testing (pp. 245-267). Hillsdale, NJ: Erlbaum.
Hollander, E. P. (1956). The friendship factor in peer nominations. Personnel Psychology, 9, 435-447.
Huffcutt, A. I., Roth, P. L., & McDaniel, M. A. (1996). A meta-analytic investigation of cognitive ability in employment interview evaluations: Moderating characteristics and implications for incremental validity. Journal of Applied Psychology, 81, 459-473.
Hunter, J. E. (1980). Validity generalization for 12,000 jobs: An application of synthetic validity and validity generalization to the General Aptitude Test Battery (GATB). Washington, DC: U.S. Department of Labor, Employment Service.
Hunter, J. E. (1986). Cognitive ability, cognitive aptitudes, job knowledge, and job performance. Journal of Vocational Behavior, 29, 340-362.
Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative predictors of job performance. Psychological Bulletin, 96, 72-98.
Hunter, J. E., & Schmidt, F. L. (1982a). Fitting people to jobs: Implications of personnel selection for national productivity. In E. A. Fleishman & M. D. Dunnette (Eds.), Human performance and productivity. Volume 1: Human capability assessment (pp. 233-284). Hillsdale, NJ: Erlbaum.
Hunter, J. E., & Schmidt, F. L. (1982b). Quantifying the effects of psychological interventions on employee job performance and work force productivity. American Psychologist, 38, 473-478.
Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in research findings. Beverly Hills, CA: Sage.
Hunter, J. E., & Schmidt, F. L. (1996). Intelligence and job performance: Economic and social implications. Psychology, Public Policy, and Law, 2, 447-472.
Hunter, J. E., Schmidt, F. L., & Coggin, T. D. (1988). Problems and pitfalls in using capital budgeting and financial accounting techniques in assessing the utility of personnel programs. Journal of Applied Psychology, 73, 522-528.
Hunter, J. E., Schmidt, F. L., & Jackson, G. B. (1982). Meta-analysis: Cumulating research findings across studies. Beverly Hills, CA: Sage.
Hunter, J. E., Schmidt, F. L., & Judiesch, M. K. (1990). Individual differences in output variability as a function of job complexity. Journal of Applied Psychology, 75, 28-42.
Jansen, A. (1973). Validation of graphological judgments: An experimental study. The Hague, the Netherlands: Mouton.
Jensen, A. R. (1998). The g factor: The science of mental ability. Westport, CT: Praeger.
Levy, L. (1979). Handwriting and hiring. Dun's Review, 113, 72-79.
McDaniel, M. A., Schmidt, F. L., & Hunter, J. E. (1988a). A meta-analysis of the validity of methods for rating training and experience in personnel selection. Personnel Psychology, 41, 283-314.
McDaniel, M. A., Schmidt, F. L., & Hunter, J. E. (1988b). Job experience correlates of job performance. Journal of Applied Psychology, 73, 327-330.
McDaniel, M. A., Whetzel, D. L., Schmidt, F. L., & Maurer, S. D. (1994). The validity of employment interviews: A comprehensive review and meta-analysis. Journal of Applied Psychology, 79, 599-616.
Mount, M. K., & Barrick, M. R. (1995). The Big Five personality dimensions: Implications for research and practice in human resources management. In G. R. Ferris (Ed.), Research in personnel and human resources management (Vol. 13, pp. 153-200). Greenwich, CT: JAI Press.
Neter, E., & Ben-Shakhar, G. (1989). The predictive validity of graphological inferences: A meta-analytic approach. Personality and Individual Differences, 10, 737-745.
Ones, D. S. (1993). The construct validity of integrity tests. Unpublished doctoral dissertation, University of Iowa, Iowa City.
Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (1993). Comprehensive meta-analysis of integrity test validities: Findings and implications for personnel selection and theories of job performance. Journal of Applied Psychology Monograph, 78, 679-703.
Pearlman, K., Schmidt, F. L., & Hunter, J. E. (1980). Validity generalization results for tests used to predict job proficiency and training criteria in clerical occupations. Journal of Applied Psychology, 65, 373-407.
Rafaeli, A., & Klimoski, R. J. (1983). Predicting sales success through handwriting analysis: An evaluation of the effects of training and handwriting sample context. Journal of Applied Psychology, 68, 212-217.
Ree, M. J., & Earles, J. A. (1992). Intelligence is the best predictor of job performance. Current Directions in Psychological Science, 1, 86-89.
Rothstein, H. R., Schmidt, F. L., Erwin, F. W., Owens, W. A., & Sparks, C. P. (1990). Biographical data in employment selection: Can validities be made generalizable? Journal of Applied Psychology, 75, 175-184.
Schmidt, F. L. (1988). The problem of group differences in ability scores in employment selection. Journal of Vocational Behavior, 33, 272-292.
Schmidt, F. L. (1992). What do data really mean? Research findings, meta-analysis, and cumulative knowledge in psychology. American Psychologist, 47, 1173-1181.
Schmidt, F. L. (1993). Personnel psychology at the cutting edge. In N. Schmitt & W. Borman (Eds.), Personnel selection (pp. 497-515). San Francisco: Jossey-Bass.
Schmidt, F. L., Caplan, J. R., Bemis, S. E., Decuir, R., Dunn, L., & Antone, L. (1979). Development and evaluation of behavioral consistency method of unassembled examining (Tech. Rep. No. 79-21). Washington, DC: U.S. Civil Service Commission, Personnel Research and Development Center.
Schmidt, F. L., & Hunter, J. E. (1977). Development of a general solution to the problem of validity generalization. Journal of Applied Psychology, 62, 529-540.
Schmidt, F. L., & Hunter, J. E. (1981). Employment testing: Old theories and new research findings. American Psychologist, 36, 1128-1137.
Schmidt, F. L., & Hunter, J. E. (1983). Individual differences in productivity: An empirical test of estimates derived from studies of selection procedure utility. Journal of Applied Psychology, 68, 407-415.
Schmidt, F. L., & Hunter, J. E. (1992). Development of causal models of processes determining job performance. Current Directions in Psychological Science, 1, 89-92.
Schmidt, F. L., Hunter, J. E., McKenzie, R. C., & Muldrow, T. W. (1979). The impact of valid selection procedures on work-force productivity. Journal of Applied Psychology, 64, 609-626.
Schmidt, F. L., Hunter, J. E., & Outerbridge, A. N. (1986). The impact of job experience and ability on job knowledge, work sample performance, and supervisory ratings of job performance. Journal of Applied Psychology, 71, 432-439.
Schmidt, F. L., Hunter, J. E., Outerbridge, A. N., & Goff, S. (1988). The joint relation of experience and ability with job performance: A test of three hypotheses. Journal of Applied Psychology, 73, 46-57.
Schmidt, F. L., Hunter, J. E., Outerbridge, A. N., & Trattner, M. H. (1986). The economic impact of job selection methods on the size, productivity, and payroll costs of the federal work-force: An empirical demonstration. Personnel Psychology, 39, 1-29.
Schmidt, F. L., Hunter, J. E., & Pearlman, K. (1981). Task differences and validity of aptitude tests in selection: A red herring. Journal of Applied Psychology, 66, 166-185.
Schmidt, F. L., Hunter, J. E., & Pearlman, K. (1982). Assessing the economic impact of personnel programs on workforce productivity. Personnel Psychology, 35, 333-347.
Schmidt, F. L., Hunter, J. E., Pearlman, K., & Shane, G. S. (1979). Further tests of the Schmidt-Hunter Bayesian Validity Generalization Model. Personnel Psychology, 32, 257-281.
Schmidt, F. L., Law, K., Hunter, J. E., Rothstein, H. R., Pearlman, K., & McDaniel, M. (1993). Refinements in validity generalization methods: Implications for the situational specificity hypothesis. Journal of Applied Psychology, 78, 3-13.
Schmidt, F. L., Mack, M. J., & Hunter, J. E. (1984). Selection utility in the occupation of U.S. Park Ranger for three modes of test use. Journal of Applied Psychology, 69, 490-497.
Schmidt, F. L., Ones, D. S., & Hunter, J. E. (1992). Personnel selection. Annual Review of Psychology, 43, 627-670.
Schmidt, F. L., Ones, D. S., & Viswesvaran, C. (1994, June 30-July 3). The personality characteristic of integrity predicts job training success. Paper presented at the 6th Annual Convention of the American Psychological Society, Washington, DC.
Schmidt, F. L., & Rothstein, H. R. (1994). Application of validity generalization methods of meta-analysis to biographical data scores in employment selection. In G. S. Stokes, M. D. Mumford, & W. A. Owens (Eds.), The biodata handbook: Theory, research, and applications (pp. 237-260). Palo Alto, CA: Consulting Psychologists Press.
Steiner, D. D. (1997). International forum. The Industrial-Organizational Psychologist, 34, 51-53.
Steiner, D. D., & Gilliland, S. W. (1996). Fairness reactions to personnel selection techniques in France and the United States. Journal of Applied Psychology, 81, 134-141.
Viswesvaran, C., Ones, D. S., & Schmidt, F. L. (1996). Comparative analysis of the reliability of job performance ratings. Journal of Applied Psychology, 81, 557-560.
Waters, L. K., & Waters, C. W. (1970). Peer nominations as predictors of short-term role performance. Journal of Applied Psychology, 54, 42-44.
Wigdor, A. K., & Garner, W. R. (Eds.). (1982). Ability testing: Uses, consequences, and controversies (Report of the National Research Council Committee on Ability Testing). Washington, DC: National Academy of Sciences Press.
Received April 8, 1997
Revision received February 3, 1998
Accepted April 2, 1998