Article

Sequential decision bias – evidence from grading exams

Applied Economics

September 2021
54(32)

DOI:10.1080/00036846.2021.1976390

Authors:

Carina Goldbach

Hochschule Rhein-Waal

Jörn Sickmann

Hochschule Rhein-Waal

Thomas Pitz

Hochschule Rhein-Waal

Human perception is highly comparative. A misperception of random sequential processes, however, can influence the outcome of important decisions in many areas of daily life, including the evaluation of exams in higher education. We investigate this effect using an extensive dataset of more than 20,000 examination results, analysing whether a student’s performance in an exam affects the evaluation of the exam that follows, e.g. whether a poor performance lets the following exam shine in a better light, resulting in a better-than-expected grade. We conclude that there is evidence for sequential decision biases that do not necessarily occur generally, but rather after streaks of extreme events. An even greater effect size is detected when limiting the sample to exams with a low variance in grades and to exams that were evaluated later in the exam correction sequence. Therefore, there is evidence that, under certain conditions, past evaluations may affect current evaluations of student performance. This study should raise awareness of biases in grading, given the importance of grades in the assessment of students’ educational performance, in admission to consecutive study programmes, and as a key metric in the evaluation of job candidates.
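The idea of a bias that appears "after streaks of extreme events" can be illustrated with a minimal sketch. It is not the authors' method: the function name `streak_effect`, the cutoff `threshold`, the streak length, and the toy scores are all assumptions made for illustration, and the raw comparison ignores controls (such as a student's expected grade) that a real analysis would need.

```python
from statistics import mean

def streak_effect(grades, threshold, streak_len=2):
    """Compare grades that follow a streak of low grades with all other grades.

    grades: exam scores in the order they were graded (higher = better).
    threshold: scores strictly below this count as "low" (hypothetical cutoff).
    streak_len: number of consecutive low scores that define a streak.
    Returns (mean after a low streak, mean otherwise); None for an empty group.
    """
    after_streak, otherwise = [], []
    # Start at streak_len so every exam considered has a full window before it.
    for i in range(streak_len, len(grades)):
        previous = grades[i - streak_len:i]
        if all(g < threshold for g in previous):
            after_streak.append(grades[i])
        else:
            otherwise.append(grades[i])
    return (mean(after_streak) if after_streak else None,
            mean(otherwise) if otherwise else None)

# Toy data, invented for illustration only.
scores = [70, 40, 35, 80, 65, 30, 38, 75, 60]
after, rest = streak_effect(scores, threshold=50)
print(after, rest)
```

A higher mean after low streaks than elsewhere would be consistent with a contrast effect, but on its own it cannot separate grader bias from differences in the exams themselves.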

Unveiling Bias in Sequential Decision Making: A Causal Inference Approach for Stochastic Service Systems

Preprint

Jul 2023

In many stochastic service systems, decision-makers find themselves making a sequence of decisions, with the number of decisions being unpredictable. To enhance these decisions, it is crucial to uncover the causal impact these decisions have through careful analysis of observational data from the system. However, these decisions are not made independently, as they are shaped by previous decisions and outcomes. This phenomenon is called sequential bias and violates a key assumption in causal inference that one person's decision does not interfere with the potential outcomes of another. To address this issue, we establish a connection between sequential bias and the subfield of causal inference known as dynamic treatment regimes. We expand these frameworks to account for the random number of decisions by modeling the decision-making process as a marked point process. Consequently, we can define and identify causal effects to quantify sequential bias. Moreover, we propose estimators and explore their properties, including double robustness and semiparametric efficiency. In a case study of 27,831 encounters with a large academic emergency department, we use our approach to demonstrate that the decision to route a patient to an area for low acuity patients has a significant impact on the care of future patients.

Sequential effects in Olympic synchronized diving scores

Article

Jan 2017

Robin Stewart Samuel Kramer

When judging performances in a sequence, the current score is often influenced by the preceding score. Where athletes are perceived to be similar, a judgement is assimilated towards the previous one. However, if judges focus on the differences between the two athletes, this will result in a contrasting influence on their scores. Here, I investigate sequential effects during synchronized diving events at the 2012 and 2016 Olympic Games. Although previous research found assimilation in scores of gymnasts, the current data showed contrast effects: current scores benefited from following a poor performance but were at a disadvantage if they followed a high-scoring performance. One explanation may be that the processes involved in judging synchronized pairs result in a focus on the differences between athletes, producing a contrast effect across dives. That the specific direction of this sequential bias may depend on the particular sport has implications for how judges might approach their roles in a context-dependent manner, as well as how such biases should be addressed.

Decision-Making Under the Gambler's Fallacy: Evidence from Asylum Judges, Loan Officers, and Baseball Umpires

Article

Mar 2016

We find consistent evidence of negative autocorrelation in decision-making that is unrelated to the merits of the cases considered in three separate high-stakes field settings: refugee asylum court decisions, loan application reviews, and major league baseball umpire pitch calls. The evidence is most consistent with the law of small numbers and the gambler’s fallacy – people underestimating the likelihood of sequential streaks occurring by chance – leading to negatively autocorrelated decisions that result in errors. The negative autocorrelation is stronger among more moderate and less experienced decision-makers, following longer streaks of decisions in one direction, when the current and previous cases share similar characteristics or occur close in time, and when decision-makers face weaker incentives for accuracy. Other explanations for negatively autocorrelated decisions such as quotas, learning, or preferences to treat all parties fairly, are less consistent with the evidence, though we cannot completely rule out sequential contrast effects as an alternative explanation.

Contrast Effects in Sequential Decisions: Evidence from Speed Dating

Article

Jul 2014

We provide an empirical test of contrast effects (a bias where a decision maker perceives information in contrast to what preceded it) in the quasi-experimental context of speed dating decisions. We document that prior partner attractiveness reduces the subsequent likelihood of an affirmative dating decision. This relationship is confined to recent interactions, consistent with a perceptual error, but not learning or the presence of a quota in affirmative responses. The contrast effect is driven almost entirely by male evaluators. Additional evidence documents the effect's linearity with respect to prior partner attractiveness, its amplification for partners of moderate attractiveness, and its partial attenuation with accumulated experience.

Contrast effects in judgments of crime severity and the punishment of criminal violators

Article

Apr 1976

Both enhancing and depressing contrast effects were experimentally confirmed in judgments made by 182 male undergraduates of the seriousness of the second of two sequential crimes. Thus, a homicide was judged to be a more severe criminal violation when subjects judged the seriousness of an assault case just preceding it than when the same homicide was preceded by another homicide. Symmetrically, an assault was judged to be less serious when it was preceded by a homicide than when it was preceded by another assault. Contrast effects occurred only when judgments of the first crime in the sequence were "anchored," that is, overtly recorded and thus publicly committed. Two theoretical implications of crime severity contrasts were confirmed: (a) Judgments of the moral character and personal adaptability were higher or lower depending upon the direction of the contrasted seriousness judgments. (b) The magnitude of the punishment recommended for the offender was increased or decreased as a function of the contrasted seriousness of the crimes.

Contrast Effects and Judgments of Physical Attractiveness: When Beauty Becomes a Social Problem

Article

Jan 1980

Three studies tested the hypothesis that judgments of average females' attractiveness or dating desirability will be adversely affected by exposing judges to extremely attractive prior stimuli (i.e., judgments will show a "contrast effect"). Study 1 was a field study in which 81 male dormitory residents watching a popular TV show, whose main characters were 3 strikingly attractive females, were asked to rate a photo of an average female (described as a potential blind date for another dorm resident). These subjects rated the target female as significantly less attractive than did a comparable control group. Two other studies with 146 undergraduates demonstrated analogous effects in a more controlled laboratory setting. In addition, the third study indicated a direct effect of informational social influence on physical attractiveness judgments. Implications are discussed with particular attention to mass media impact.

Perception is Relative: Sequential Contrasts in the Field

Article

Jan 2008

A large psychological literature suggests that individuals rely on comparative perception when making sequential decisions or assessments. Such a perceptual bias could influence behavior in settings from employee hiring and medical diagnosis to investment appraisal and product evaluation. This study presents a theoretical framework which offers predictions to differentiate perceptual errors from rational behavior and provides empirical evidence for such "contrast effects." The empirical focus is an analysis of sequential exam evaluation in a large undergraduate course with supporting evidence for perceptual errors in sentencing decisions by judges in PA courts and in dating decisions by speed daters. There is modest evidence that graders are negatively biased when evaluating exams which follow high and low streaks of extreme exams. Relative to a typical score, this effect is on the order of a 12% increase in leniency following a streak of three low scoring exams, and a 6% grade reduction following three high scoring exams. Ostensibly, learning or quota constraints for high and low grades could rationally account for this finding. However, the fact that these effects: (i) decay fully after a single period, (ii) persist despite grader experience, and (iii) are non-existent for highly transparent (multiple-choice) questions suggests an alternative explanation. Stronger evidence exists for contrast effects in judicial sentencing. Judges are 9% more likely to be lenient in sentencing of summary offenses after exposure to a criminal felony. The effects disappear for days with exposure to multiple felonies. In dating, highly attractive or unattractive prior partners produce a 13 to 17% distortion relative to the baseline decision to date. An original survey of real estate agents suggests that these effects may extend to non-random settings such as home purchases.
This paper was heavily shaped by numerous discussions with Stefano DellaVigna, Botond Koszegi, and Matt Rabin, and I owe them a special thanks. Participants of the Psychology & Economics Seminar at UC Berkeley also provided thoughtful comments and feedback. David Moyer, Deb Weber, Leah Woolsey, and anonymous others generously helped secure data for this project.

Extraneous Factors in Judicial Decisions

Article

Apr 2011
P NATL ACAD SCI USA

Are judicial rulings based solely on laws and facts? Legal formalism holds that judges apply legal reasons to the facts of a case in a rational, mechanical, and deliberative manner. In contrast, legal realists argue that the rational application of legal reasons does not sufficiently explain the decisions of judges and that psychological, political, and social factors influence judicial rulings. We test the common caricature of realism that justice is "what the judge ate for breakfast" in sequential parole decisions made by experienced judges. We record the judges' two daily food breaks, which result in segmenting the deliberations of the day into three distinct "decision sessions." We find that the percentage of favorable rulings drops gradually from ≈ 65% to nearly zero within each decision session and returns abruptly to ≈ 65% after a break. Our findings suggest that judicial rulings can be swayed by extraneous variables that should have no bearing on legal decisions.

The Hot Hand Fallacy and the Gambler’s Fallacy: Two faces of Subjective Randomness?

Article

Jan 2005

The representativeness heuristic has been invoked to explain two opposing expectations: that random sequences will exhibit positive recency (the hot hand fallacy) and that they will exhibit negative recency (the gambler's fallacy). We propose alternative accounts for these two expectations: (1) The hot hand fallacy arises from the experience of characteristic positive recency in serial fluctuations in human performance. (2) The gambler's fallacy results from the experience of characteristic negative recency in sequences of natural events, akin to sampling without replacement. Experiment 1 demonstrates negative recency in subjects' expectations for random binary outcomes from a roulette game, simultaneously with positive recency in expectations for another statistically identical sequence: the successes and failures of their predictions for the random outcomes. These findings fit our proposal but are problematic for the representativeness account. Experiment 2 demonstrates that sequence recency influences attributions that human performance or chance generated the sequence.

A Tough Act to Follow: Contrast Effects in Financial Markets

Article

Jul 2016

A contrast effect occurs when the value of a previously observed signal inversely biases perception of the next signal. We present the first evidence that contrast effects can distort prices in sophisticated and liquid markets. Investors mistakenly perceive earnings news today as more impressive if yesterday's earnings surprise was bad and less impressive if yesterday's surprise was good. A unique advantage of our financial setting is that we can identify contrast effects as an error in perceptions rather than expectations. Finally, we show that our results cannot be explained by an alternative explanation involving information transmission from previous earnings announcements.

Selecting successful students? Undergraduate grades as an admission criterion

Article

Dec 2017

In Europe’s reformed education system, universities may be forced by law to consider undergraduate grade point average (UGPA) as the primary admission criterion in the selection of graduate students. In this article, we investigate whether UGPA predicts graduate student performance in order to discuss its usefulness as an admission criterion. In our theoretical framework, we show that undergraduate students may choose slower study progress in favour of receiving higher grades and conclude that UGPA is a relatively good (weak) predictor for graduate grade point average (study progress). Having data from a cohort of students whose selection was in clear conflict with the legal requirement, we empirically confirm our theoretical predictions by exploiting a unique opportunity for assessing educational policies. Discussion of our findings leads to some important conclusions concerning the Bologna reforms and the lawmakers’ idea of giving some independence to universities, but not too much of it.

Honest grading, grade inflation, and reputation

Article

Sep 2016

When students receive better grades without any corresponding increase in ability, this is called grade inflation. Conventional wisdom says that such grade inflation is unavoidable since it is essentially costless to award good grades. In this article, we point out an effect working in the opposite direction: grade inflation is not actually costless, since it has an impact on future cohorts of graduates. Put differently, by grading honestly, a school can build up a reputation. Introducing a concern for reputation into an established signalling model of grading, we show that this mechanism reduces or even avoids grade inflation.

The Information Value of Central School Exams

Article

Nov 2016
ECON EDUC REV

The central vs. local nature of high-school exit exam systems can have important repercussions on the labor market. By increasing the informational content of grades, central exams may improve the sorting of students by productivity. To test this, we exploit the unique German setting where students from states with and without central exams work in the same labor market. Our difference-in-difference model estimates whether the earnings difference between individuals with high and low grades differs between central and local exams. We find that the earnings premium for a one standard-deviation increase in high-school grades is indeed 6 percent when obtained on central exams but less than 2 percent when obtained on local exams. Choices of higher-education programs and of occupations do not appear to be major channels of this result.

Nice guys finish first when presented second: Responsive daters are evaluated more positively following exposure to unresponsive daters

Article

Feb 2016

Decisions about who to date are increasingly being made while viewing a large pool of dating prospects simultaneously or sequentially (e.g., online dating). The present research explores how the order in which dating prospects are evaluated affects the role in dating decisions of a variable crucial to relationship success – partner responsiveness. In Study 1, participants viewed dating profiles varying in physical attractiveness and responsiveness. Some participants viewed responsive profiles first whereas others viewed unresponsive profiles first. Results revealed that responsive targets were rated more favorably following exposure to unresponsive targets, regardless of level of attractiveness. Study 2 specifically targeted how contrast effects affect romantic evaluations of a physically unattractive, yet responsive, target. Results again revealed that unattractive, responsive targets were viewed more favorably after exposure to unresponsive dating prospects, regardless of these unresponsive prospects' physical attractiveness. These results highlight the importance of the context in which dating decisions are made.

Grades as information

Article

Apr 2007
ECON EDUC REV

Darren Grant

We determine how much observed student performance in microeconomics principles can be attributed, inferentially, to three kinds of student academic “productivity,” the instructor, demographics, and unmeasurables. The empirical approach utilizes an ordered probit model that relates student performance in micro to grades in prior coursework, demographic information, instructor characteristics, and SAT scores. The micro grade is somewhat informative about general productivity but conveys little information about the most refined type of academic productivity or instructor grading standards, although there is great variation in the average grades given by different instructors. Because of a large unpredictable component of grade determination, however, differences in micro performance across individuals are mostly attributable to non-productivity related factors. As a result, it is very difficult to improve the information individual grades provide about student productivity. Averages of several grades, however, can provide useful information about the productivity of students and the effectiveness of instructors.

"Mirror, Mirror, on the Wall...?": Contrast Effects and Self-Evaluations of Physical Attractiveness

Article

Sep 1983
Pers Soc Psychol Bull

Several studies confirm the operation of contextual contrast effects on judgments of the physical attractiveness of others. The present experiment was conducted to determine whether contrast effects also occur on self-evaluations of physical attractiveness. Fifty-one female college students rated their own attractiveness and body-parts satisfaction following exposure to same-sexed stimulus persons who either were not physically attractive, were physically attractive, or were designated as attractive professional models. The predicted contrast effect was supported for self-perceived attractiveness but not for body satisfaction. Consistent with social comparison theory, subjects gave lower self-ratings in the attractive versus the not attractive and the professionally attractive stimulus context. Correlational analyses also indicated that self-rated attractiveness was related to several personality variables.

The Concept of Probability in Psychological Experiments

Article

Jul 1972

This paper explores a heuristic, representativeness, according to which the subjective probability of an event, or a sample, is determined by the degree to which it: (i) is similar in essential characteristics to its parent population; and (ii) reflects the salient features of the process by which it is generated. This heuristic is explicated in a series of empirical examples demonstrating predictable and systematic errors in the evaluation of uncertain events. In particular, since sample size does not represent any property of the population, it is expected to have little or no effect on judgment of likelihood. This prediction is confirmed in studies showing that subjective sampling distributions and posterior probability judgments are determined by the most salient characteristic of the sample (e.g., proportion, mean) without regard to the size of the sample. The present heuristic approach is contrasted with the normative (Bayesian) approach to the analysis of the judgment of uncertainty.

The Contrast Effect of Physical Attractiveness in Japan

Article

Jan 1993

We examined contextual effects on the judgment of others' attractiveness and self-evaluation among Japanese university students who rated their body satisfaction and self-esteem following exposure to various attractiveness stimuli. Our results showed the existence of a contrast effect of attractiveness stimuli on the judgment of target stimuli in men and women. A similar contrast effect on subjects' self-esteem and body satisfaction occurred in female students only. Western-based attractiveness comparison processes also prevailed in Japan. A gender difference was evident in the contextual effect of physical attractiveness stimuli.

Job Market Signaling

Article

Feb 1973

A. Michael Spence

This chapter discusses job market signaling. The term market signaling is not exactly a part of the well-defined, technical vocabulary of the economist. The chapter presents a model in which signaling is implicitly defined and explains its usefulness. In most job markets, the employer is not sure of the productive capabilities of an individual at the time he hires him. The fact that it takes time to learn an individual's productive capabilities means that hiring is an investment decision. On the basis of previous experience in the market, the employer has conditional probability assessments over productive capacity with various combinations of signals and indices. This chapter presents an introduction to Spence's more extensive analysis of market signaling.

Higher Education as a Filter

Article