Figure - uploaded by Kieran O’Connor
Study 1: Permanent Judges' Scores of Dancing With the Stars Contestants Across All Seasons

Source publication
Article
Sequential evaluation is the hallmark of fair review: The same raters assess the merits of applicants, athletes, art, and more using standard criteria. We investigated one important potential contaminant in such ubiquitous decisions: Evaluations become more positive when conducted later in a sequence. In four studies, (a) judges’ ratings of profess...

Contexts in source publication

Context 1
... p 2 = .001 (Table 1). To ensure that these results were not driven by outliers, we conducted a robustness check in which we split the data into two halves: Seasons 1 through 10 and Seasons 11 through 20. ...
Context 2
... led to a slightly smaller number of sections with an order of 1 than of sections with an order of 2. We used order as the primary independent variable in our analysis. Table S1 in the Supplemental Material shows the average class GPA for successive offerings of courses in our data set. ...
Context 3
... the long tail. Table S1 reveals that the number of sections dropped steeply as order increased, with the second half of the series (order > 10) including only 9% (90/991) of all the sections. To ensure that our results were not being skewed by this small subset of the data, we restricted our analysis to sections where order was less than 11, 10, 9, and so on (see Table S3 in the Supplemental Material). ...
Context 4
... we acknowledge this as a potential alternative explanation, we would expect this learning effect to be more pronounced when an instructor first offers a course (say, in the first three offerings) rather than in later sections. Instead, our data suggested that the effect persisted even in later sections of the course offerings (see Table S1), consistent with other findings that teaching effectiveness, if anything, tends to decline with age and years of experience without systematic intervention (for a review, see Marsh, 2007). Furthermore, the conditional analysis in Table S4 indicated that GPA increases are more pronounced for courses that are offered at least four times. ...
Context 5
... pattern of effects remained unchanged if we used unadjusted evaluations as the dependent measure and controlled for the main effect of story. Table S10 in the Supplemental Material shows the mean unadjusted evaluation at each level of order. ...
Context 6
... models revealed similar results with a positive and significant effect of order. See Table S12 in the Supplemental Material for effect-size estimates for all studies. ...

Citations

... For example, researchers using scrambled sentence tasks can present the words in more or less correct grammatical order, with more grammatical primes seeming more fluent even though the words and sentence solutions themselves remain constant (Greifeneder & Bless, 2010). Or, researchers can ask participants to complete a judgment task repeatedly over a period of time; as individuals gain practice performing the task, they may mistakenly use the experience of increasing ease over time as a signal that leads them to give more positive assessments of the stimuli they are judging (O'Connor & Cheema, 2018). Beyond these, there are several ways that one could superficially embed fluency into a task's stimuli in order to influence confidence, specifically. ...
... This work even goes so far as to suggest that intermixed trials of fluent and disfluent stimuli may be necessary for fluency effects to appear (Dechêne et al., 2009; Dechêne, Stahl, Hansen, & Wänke, 2010). Procedural fluency adds to a small but growing body of research that reiterates how stimuli can feel fluent not just in contrast to other stimuli or to one's expectations, but also in concert with each other; that is, the holistic experience of a task matters as much as the individual stimuli themselves (e.g., O'Connor & Cheema, 2018; Susser, Panitz, Buchin, & Mulligan, 2017). ...
Article
Incidental features of a stimulus can increase how easily it is processed, which can then increase confidence in task performance. Here, we examine the impact of fluency stemming from procedural features embedded in a task rather than in the features of a stimulus. We propose that manipulating the consistency of procedural features over a series of stimuli can produce procedural fluency, a metacognitive sense of ease in processing that can inflate confidence without boosting accuracy. That is, even superficial consistency within a task can lead people to inaccurately believe they are performing better. As with fluency derived from features of individual stimuli, drawing attention to procedural consistency leads people to discount it, attenuating its impact on confidence. Further, the influence of procedural fluency on confidence relies on individuals' naïve theories about what fluency signals about their performance. Accordingly, manipulating these naïve theories mitigates the effects of procedural fluency on confidence.
... Our results bear on people's daily lives, where repetition shapes important decisions (Unkelbach, Koch, Silva, & Garcia-Marques, 2019). As examples, professors assign higher grades in subsequent offerings of the same course, and judges rate competitors more favorably in later seasons of Dancing with the Stars (O'Connor & Cheema, 2018). Reliance on a fluency heuristic has dire consequences in a "post-truth world" (see Lewandowsky, Ecker, & Cook, 2017), where falsehoods tend to be repeated. ...
... Fluency is an appealing candidate: easy processing informs many judgments (Alter & Oppenheimer, 2009), including liking (Iyengar & Lepper, 2000), beauty (Reber, Schwarz, & Winkielman, 2004), and confidence (Schwartz & Metcalfe, 1992). University professors even assign higher grades in later offerings of a course, and judges give higher ratings to professional dancers in later seasons of Dancing with the Stars (O'Connor & Cheema, 2018). Fluency serves as a powerful cue, but other subjective feelings provide shortcuts to truth. ...
Article
Deceptive claims surround us, embedded in fake news, advertisements, political propaganda, and rumors. How do people know what to believe? Truth judgments reflect inferences drawn from three types of information: base rates, feelings, and consistency with information retrieved from memory. First, people exhibit a bias to accept incoming information, because most claims in our environments are true. Second, people interpret feelings, like ease of processing, as evidence of truth. And third, people can (but do not always) consider whether assertions match facts and source information stored in memory. This three-part framework predicts specific illusions (e.g., truthiness, illusory truth), offers ways to correct stubborn misconceptions, and suggests the importance of converging cues in a post-truth world in which falsehoods travel further and faster than the truth. Expected final online publication date for the Annual Review of Psychology, Volume 71 is January 4, 2020. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
... This could have led us to give higher scores unintentionally for those skills. In a recent study, researchers found that the more experienced the assessors, the higher the scores they tended to give (O'Connor and Cheema 2018). This finding could show that rather than the supervisors behaving differently in relation to the different skills, the research team was simply more experienced at coding them. ...
Article
Understanding how different forms of supervision support good social work practice and improve outcomes for people who use services is nearly impossible without reliable and valid evaluative measures. Yet the question of how best to evaluate the quality of supervision in different contexts is a complicated and as-yet-unsolved challenge. In this study, we observed 12 social work supervisors in a simulated supervision session offering support and guidance to an actor playing the part of an inexperienced social worker facing a casework-related crisis. A team of researchers analyzed these sessions using a customized skills-based coding framework. In addition, 19 social workers completed a questionnaire about their supervision experiences as provided by the same 12 supervisors. According to the coding framework, the supervisors demonstrated relatively modest skill levels, and we found low correlations among different skills. In contrast, according to the questionnaire data, supervisors had relatively high skill levels, and we found high correlations among different skills. The findings imply that although self-report remains the simplest way to evaluate supervision quality, other approaches are possible and may provide a different perspective. However, developing a reliable independent measure of supervision quality remains a noteworthy challenge.
Article
Psychometricians have argued that measurement invariance (MI) testing is needed to know if the same psychological constructs are measured in different groups. Data from five experiments allowed that position to be tested. In the first, participants answered questionnaires on belief in free will and either the meaning of life or the meaning of a nonsense concept called “gavagai.” Since the meaning of life and the meaning of gavagai conceptually differ, MI should have been violated when groups were treated like their measurements were identical. MI was severely violated, indicating the questionnaires were interpreted differently. In the second and third experiments, participants were randomized to watch treatment videos explaining figural matrices rules or task-irrelevant control videos. Participants then took intelligence and figural matrices tests. The intervention worked and the experimental group had an additional influence on figural matrix performance in the form of knowing matrix rules, so their performance on the matrices tests violated MI and was anomalously high for their intelligence levels. In both experiments, MI was severely violated. In the fourth and fifth experiments, individuals were exposed to growth mindset interventions that a twin study revealed changed the amount of genetic variance in the target mindset measure without affecting other variables. When comparing treatment and control groups, MI was attainable before but not after treatment. Moreover, the control group showed longitudinal invariance, but the same was untrue for the treatment group. MI testing is likely able to show if the same things are measured in different groups.
Article
This study aims to examine whether customer ratings and online reviews affect hotel revenues, and if so, to quantify the effects. To achieve this objective, we articulate the mechanisms, grounded in reputation theories, whereby customer ratings influence hotel performance through reputational and signaling effects. Using customer rating data from TripAdvisor and hotel revenue data from Texas, we estimate fixed effects regressions and adopt a regression discontinuity design to separate the signaling effect of customer ratings from the reputational effect. We found that the signaling effect of a 1-star increase is an increase of 2.2–3.0% in hotel monthly revenues, whereas the reputational effect of a 1-star increase is an increase of around 1.5–2.3% in hotel monthly revenues. Our findings are robust across alternative model specifications and provide insightful implications for hotels to manage their customer ratings.
Article
Reliable and valid message evaluation has a central role in effective health communication and message effects research. The authors have employed a message testing protocol to efficiently acquire valid and reliable message evaluation results: (a) use multiple messages, (b) recruit evaluators from the target population, (c) use valid and reliable effectiveness measures, (d) expose an evaluator to multiple messages, and (e) ensure enough evaluations per message. Two secondary analyses of anti-tobacco message evaluation studies provide evidence for reliability and validity regarding points (d) and (e). Seven studies where adult smokers evaluated the effectiveness of various anti-smoking campaign messages were examined. The first analysis shows that the position in which a message appears has little or no impact on its evaluation, supporting the validity of multiple-exposure design. The second analysis suggests having 25 evaluations per message can achieve a fair balance between accuracy and efficiency.
Article
When seeking out the truth about a certain aspect of the world, people frequently conduct several inquiries successively over a time span. Later inquiries usually improve upon earlier ones; thus, it is typically rational to expect the finding of a later inquiry to be closer to the truth than that of an earlier one. However, when no meaningful differences exist between earlier and later inquiries, later findings should not be considered epistemically superior. Yet even in these cases, people continue to regard findings from later inquiries as closer to the truth than earlier ones. In 10 experiments, when later inquiries conflicted with (but did not epistemically improve upon) earlier ones, participants’ global judgments about the truth aligned more with later findings than with earlier ones, an effect referred to as progression bias. Susceptibility to progression bias may have severe ramifications for the well-being of society and its members.