Article

Life after P-Hacking

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

In this presentation, we discussed how researchers' commitment to avoid p-hacking will affect their research lives. One conclusion is that most experimental research cannot be successful without at least 50 observations per condition.
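The presentation's own derivation of the 50-per-condition figure is not reproduced on this page. As a rough illustration only, the sketch below uses a normal approximation to a two-sided, two-sample test to show how power changes between 20 and 50 observations per condition. The function name `approx_power`, the effect sizes, and the comparison value of 20 are assumptions chosen for illustration, not values taken from the talk.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test.

    Uses a normal approximation (an assumption of this sketch), with
    d as the standardized mean difference (Cohen's d).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = d * sqrt(n_per_group / 2)   # expected value of the test statistic
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Illustrative effect sizes only (small, medium, large by convention).
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: n = 20 -> {approx_power(d, 20):.2f}, "
          f"n = 50 -> {approx_power(d, 50):.2f}")
```

Under these assumptions, power rises noticeably when moving from 20 to 50 observations per condition, which is in the spirit of the abstract's conclusion. An exact t-test calculation would give slightly lower values.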


... 1 Here "significant" refers to statistical significance and "positive" refers to results that reject so-called "null hypotheses" and thereby (purportedly) push human knowledge forward. As pointed out by Simmons (2018), it is very easy for researchers to engage in p-hacking without being conscious that they are doing so. 2 I discuss in an appendix below (Section 6) how the remaining four elephants relate to p-hacking. ...
... 4 An excessive focus on academic misconduct, such as fraudulent manipulation of data, may in fact be a distraction. Simmons (2018) suggests that "fraud is out there … but it is not very common." Instead, p-hacking is "the main culprit" behind the failure of many studies to replicate (suggesting they are not correct). ...
Article
Ohlson (2023, "Empirical accounting seminars: Elephants in the room," Accounting, Economics, and Law: A Convivium) draws on his experience in empirical accounting seminars to identify five “elephants in the room”. I interpret each of these elephants as either a variant or a symptom of p-hacking. I provide evidence of the prevalence of p-hacking in accounting research that complements the observations made by Ohlson (2023). In this paper, I identify a number of steps that could be taken to reduce p-hacking in accounting research. I conjecture that facilitating and encouraging replication alone could have profound effects on the quality and quantity of empirical accounting research.
... Editors and reviewers can also ask authors to analyze the data with different methods (or to add different variables), both as a robustness check and to detect potential p-hacking. From the readers' and reviewers' perspectives, it is almost impossible to be certain whether a quantitative paper is p-hacked or not, unless journals start requiring disclosure of all sample size rules, measures, and manipulations upon submission of articles (Simmons et al., 2013). The MIS Quarterly guidelines on transparency material promote such disclosure to reduce p-hacking. ...
... To signal the quality of papers that are not p-hacked, Simmons et al. (2013) encourage hypothesis-testing researchers to mark their papers as not "p-hacked" by including the following words in their methods sections: "We report how we determined our sample size, all data exclusions, all manipulations, and all measures in the study" (p. 775). ...
Article
While researchers are expected to look for significant results to confirm their hypotheses, some engage in intentional or unintentional HARKing (Hypothesizing After Results are Known) and p-hacking (repeated tinkering with data and retesting). If these practices are widespread, one possible result is field-wide exaggerated (inflated) results reported in Information Systems (IS) publications. In this paper, we summarize the literature on HARKing and p-hacking across different disciplines. We offer an illustrative example of how an IS study could involve HARKing and p-hacking in various stages of the project to generate a more “publishable” result. We also report on a survey targeted at IS researchers to explore their experiences and awareness of this issue. Finally, we provide recommendations and suggestions based on the review of practices in other fields and advocate for more transparency in reporting research projects, so that study results can be interpreted properly, and reproducibility and replicability can be increased.
... People were recruited from public spaces on a university campus in the Southeastern United States. We pre-registered a target sample size of 100 participants, 50 per condition (Simmons et al. 2013). After months of recruitment, reaching the pre-registered sample size with the in-person protocol became ethically and practically untenable because the World Health Organization announced a global pandemic (Ghebreyesus 2020), the university campus closed, and the university IRB announced that all in-person data collection must cease until further notice (Office for Human Subjects Protection 2020). ...
... The current studies were limited by resources for listening to and coding think-aloud verbal reports. This resulted in minimal sample sizes for the research questions addressed in this paper (Simmons et al. 2013). Although the expected effects were detected (some in multiple populations, both in-person and online), there remains an opportunity for researchers with more resources to conduct larger-scale replications and extensions of the existing work. ...
Article
Full-text available
The standard interpretation of cognitive reflection tests assumes that correct responses are reflective and lured responses are unreflective. However, prior process-tracing of mathematical reflection tests has cast doubt on this interpretation. In two studies (N = 201), we deployed a validated think-aloud protocol in-person and online to test how this assumption is satisfied by the new, validated, less familiar, and non-mathematical verbal Cognitive Reflection Test (vCRT). Verbalized thoughts in both studies revealed that most (but not all) correct responses involved reflection and that most (but not all) lured responses lacked reflection. The think-aloud protocols seemed to reflect business-as-usual performance: thinking aloud did not disrupt test performance compared to a control group. These data suggest that the vCRT usually satisfies the standard interpretation of the reflection tests (albeit not without exceptions) and that the vCRT can be a good measure of the construct theorized by the two-factor explication of ‘reflection’ (as deliberate and conscious).
... Based on recommendations to recruit at least 50 participants per experimental condition [78,79], we recruited 233 heterosexual British men via Prolific (M age = 44.99 years, SD age = 15.18). ...
... Based on recommendations to recruit at least 50 participants per experimental condition [78,79], we recruited 182 heterosexual North American men via Amazon's Mechanical Turk. They were compensated with USD 0.75 for their time. ...
Article
Full-text available
Contemporary evidence suggests that masculinity is changing, adopting perceived feminine traits in the process. Implications of this new masculine norm on gender relations remain unclear. Our research aims to better understand the influence of changing masculine norms on men’s endorsement of gender-hierarchy-legitimizing ideologies. Based on Precarious Manhood Theory and Social Role Theory, we conducted two quasi-experimental studies (N = 412) in which we first assessed heterosexual men's motivation to protect traditional masculinity. Then, we informed them that men’s gender norms are becoming more feminine (feminization norm condition) or are remaining masculine in a traditional sense (traditional norm condition). In the third (baseline-control) condition, participants received no information about men’s gender norms. Finally, we assessed the extent to which participants endorsed gender-hierarchy-legitimizing ideologies, namely sexism (Study 1) and masculinist beliefs (Study 2). Results showed that men who were less motivated to protect traditional masculinity were less likely to endorse gender-hierarchy-legitimizing ideologies when exposed to the feminization and control conditions compared to the traditional norm condition. The implications of these findings for gender equality and gender relations are discussed.
... In other words, the statistical test suggests that the two groups come from different distributions, which is not true. Since we know that both groups come from the same distribution, in this case the p-value of 0.01 is a false positive (Head et al., 2015; Simmons et al., 2013). ...
... Note: Sample A (light brown) and Sample B (dark brown) belong to the same distribution. Hacking the "p" value. Adopting a significance threshold of 0.05 means that approximately 5% of the statistical tests we can run on data with the same distribution may result in false positives (Dean et al., 2017; Head et al., 2015; Simmons et al., 2013). This notion is important when interpreting the p-value, which can be hacked. ...
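The 5% figure in the snippet above can be checked with a small simulation. The sketch below is not taken from the cited work: it draws two samples from the same standard normal distribution many times and counts how often a two-sided z-test (with the variance assumed known, to keep the example within the standard library) reports p < 0.05. The function name, sample size, number of simulations, and seed are all illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def false_positive_rate(n_per_group=50, n_sims=2000, alpha=0.05, seed=1):
    """Share of null comparisons that come out 'significant'.

    Both samples come from N(0, 1), so every rejection is a false positive.
    """
    rng = random.Random(seed)
    z = NormalDist()
    hits = 0
    for _ in range(n_sims):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        # z statistic for a difference in means with known unit variance.
        stat = (fmean(a) - fmean(b)) / sqrt(2 / n_per_group)
        p_value = 2 * (1 - z.cdf(abs(stat)))
        hits += p_value < alpha
    return hits / n_sims


print(f"Observed false-positive rate: {false_positive_rate():.3f}")
```

With a single pre-planned test the observed rate should sit close to the nominal 0.05. The concern raised in the snippet is that flexible analysis gives a researcher many such draws on the same data.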
... As discussed in the Participant, Design, and Sensitivity Analysis section of Experiment 3, the test of difference of effect between approach-avoidance training and approach-avoidance instructions was severely underpowered (Simmons et al., 2013). To overcome this limitation, in Experiment 4 we decided to increase the sample size and to switch to a within-participant design for the approach-avoidance procedure condition (Perugini et al., 2018). ...
... underpowered (Simmons et al., 2013). With a sample size of 63 and experimental design, we have .80 ...
Thesis
In the field of implicit social cognition, indirect evaluative responses represent an opportunity to overcome some of the limitations of self-report. Theoretically capturing something progressively encoded over time and guiding our behaviors, these measures would allow us to determine the attitude that people have towards something, even when these people would not or could not reveal their preferences. For most theoretical models accounting for these behavioral responses, it is through repeated experience that we develop indirect evaluative responses. Recent experimental work, however, highlighted the impact that simple instructions can have on these evaluative responses. Throughout this dissertation, we argue that the effects of repeated experience and simple instructions differ. To investigate this question, we first developed an approach-avoidance training paradigm in which our participants were asked to repeatedly approach and avoid stimuli. After showing that new indirect evaluative responses emerged from this type of training (Exp. 1a–2), we compared this experimental paradigm to an instruction-based procedure (Exp. 3–7). Of these five studies comparing the two procedures on several types of indirect evaluative responses, across different populations, and in different situations, two revealed greater effectiveness of approach and avoidance training (the other three were inconclusive). Two additional experiments addressed the issue of naive theories that individuals might have about this issue (Exp. 8 & 9). Taken together, these results are consistent with recent theoretical advances in the field of implicit social cognition and lead us to recommend paradigms such as approach and avoidance training over paradigms based on simple instructions.
... On the basis of our previous research, it was decided that at least 50 participants per condition would allow detection of a large effect (Cohen's d > .90) with .80 power (see Simmons et al., 2013). We chose to detect a large effect to enhance practical relevance of the outcomes. ...
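As a rough cross-check of the statement above, the sketch below computes the smallest standardized effect that a two-group design can detect with a given power. It uses the usual normal-approximation formula, which is an assumption of this sketch and not the citing authors' method. The function name `min_detectable_d` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_d(n_per_group, power=0.80, alpha=0.05):
    """Smallest Cohen's d detectable in a two-sided, two-sample comparison.

    Normal approximation: d = (z_{1 - alpha/2} + z_{power}) * sqrt(2 / n).
    """
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * sqrt(2 / n_per_group)


for n in (20, 50, 100):
    print(f"n = {n} per condition -> minimum detectable d = {min_detectable_d(n):.2f}")
```

Under this approximation, 50 participants per condition gives .80 power for effects somewhat above d = 0.5, so an effect of d > .90 as quoted above would be detected with power well beyond .80.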
... As in the previous experiment, a sample size of at least 50 participants was chosen to provide satisfactory statistical power (Simmons et al., 2013). A total of 558 participants entered the survey system, but only 351 of them completed the measures. ...
Article
Full-text available
Two 3(control versus LTI versus HTI) × 2(self-affirmation versus no self-affirmation)-experiments were conducted. The first study presented a news message on the treatability of bowel cancer (N = 717); the second study was about skin cancer (N = 342). The dependent variables were the intention to engage in preventive behaviors and message acceptance. The results showed that when participants were exposed to LTI, only when response efficacy was low, a self-affirmation procedure increased their intention to prevent cancer (experiment 1), and increased message acceptance (experiment 2). When participants were exposed to HTI, the self-affirmation procedure did not increase the intention, and even reduced message acceptance. The findings suggest that defensive processes were active in reaction to LTI, but not in reaction to HTI. Although publishing LTI and HTI information in the media serves legitimate goals, it may have positive but also negative unintended effects on preventive behaviors in the population.
... Qualtrics software randomized assignment of participants to experimental conditions. In this experiment, as well as in Experiment 2, we aimed for a sample size of 50 participants per cell in line with the general recommendations of statisticians (e.g., Simmons et al., 2013). ...
Article
Full-text available
Does working hard take the sting out of regret following failure or does working hard increase feelings of regret? The present research finds that neither of these views is correct. Rather, the results of both experiments found that regret was an interactive function of instrumental effort and goal value. In support of the consistency-fit model, large versus small amounts of effort produced more regret on a low-valued task, whereas small amounts of effort produced more regret on a high-valued task. Furthermore, supporting the consistency-fit model, receiving an undesirable outcome did not always produce more regret on the high- than low-valued task. We discussed several perspectives including attribution, achievement motivation, and cognitive dissonance.
... We did not use power analysis for sample size estimation when planning the study. Instead, we used a rule of thumb and aimed to recruit at least 50 participants [42]. Ultimately, we recruited 103 participants from a Polish university pooling sample (92 women; M age = 22.13 years, SD age = 7.05) who participated in the study in exchange for course credit. ...
Article
Full-text available
Could judgments about others’ moral character be changed under group pressure produced by human and virtual agents? In Study 1 (N = 103), participants first judged targets’ moral character privately and two weeks later in the presence of real humans. Analysis of how many times participants changed their private moral judgments under group pressure showed that moral conformity occurred, on average, 43% of the time. In Study 2 (N = 138), we extended this using Virtual Reality, where group pressure was produced either by avatars allegedly controlled by humans or AI. While replicating the effect of moral conformity (at 28% of the time), we find that the moral conformity for the human and AI-controlled avatars did not differ. Our results suggest that human and nonhuman groups shape moral character judgments in both the physical and virtual worlds, shedding new light on the potential social consequences of moral conformity in the modern digital world.
... All studies were preregistered (see Supplementary Materials [SM]). Sample sizes were also preregistered, aiming to recruit 100 (or more) participants per condition in every study (Simmons et al., 2013). To see all preregistered analyses not reported in the paper, please see SM. ...
Article
Full-text available
Liberals and conservatives in the United States exhibit intergroup bias toward those on the other side. In three preregistered experiments (N = 1,389), we examined the bias-reducing benefits of individuating members of the political outgroup by providing people with individuating information—information that provides knowledge about them beyond their group membership, such as their social roles, emotions, and personality. Studies 1 and 2 extended work on individuating information into this domain by testing its impact on a novel political outgroup member. Study 3 broke new ground by testing whether the benefits of learning individuating information can extend to additional members of the outgroup. Each methodology revealed that, compared to those who read non-individuating controls, participants who learned individuating information about a political outgroup member were less hostile and more empathic toward that outgroup member. The current studies thus identify a promising avenue for reducing interparty hostility.
... Due to the more complex design involving three CS, and the planned comparison of the two CS + , we expect a smaller effect of the instruction on extinction efficacy than reported by Sevenster et al. (2012), and therefore aimed for a higher sample size. In addition to considering previous experiments, we also followed the recommendations for investigating unknown effect sizes, i.e., to have a sample size of n 50 participants per group in a between-subjects design (Simmons et al., 2013). 7. covered in the introduction; as a result, the inclusion of a reinstatement phase in the experiment feels like it is lacking a clear rationale. ...
... He considers 104 + m (where m is the number of independent variables in the regression) as the minimum. Others propose different thresholds: 20 to 50 by Cohen (1988), 50 by Barrett and Kline (1981), more than 25 by Jenkins and Quintana-Ascencio (2020), and more than 50 by Simmons et al. (2013). ...
Article
Full-text available
In this study, we aim to assess the relevance of credit composition to entrepreneurship empirically in light of the Schumpeterian perspective. The results of such an analysis can imply whether central banks should continue with the so-called neutrality principle or undertake an active credit policy. We employ a panel data model to quantify the effect of credit composition on entrepreneurship in 54 high- and middle-income economies from 2001 to 2016. To capture credit composition, we disaggregate total credit as credit to non-financial and financial businesses as well as credit to households and mortgages, and we hypothesize that the larger share of credit for non-financial businesses and households would be associated with greater entrepreneurship. The results indicate that credit composition is important for both high- and middle-income economies, but the effective composition of credit is different in the two sub-samples, which is why the effectiveness of credit allocation should not be taken for granted and active remedies are required. This paper corroborates the Schumpeterian view on the ties between credit allocation and entrepreneurship in both high- and middle-income economies.
... The experiment was programmed with JATOS (Lange et al., 2015). Sample size was determined by recruiting approximately 50 participants per cell (Simmons et al., 2013). ...
Article
Full-text available
Why are people willing to denounce or, contrarily, to keep silent on others’ misconduct? We hypothesized that people would be more likely to cheat, and consequently less likely to blow the whistle, when among an ingroup (vs. outgroup). In two experiments, participants witnessed a same nationality or a different nationality group member cheating during a group task. Participants either had the opportunity to cheat themselves before witnessing this cheating act (Experiments 1 and 2) or did not have this opportunity (Experiment 2). In the ingroup condition, participants cheated more and denounced others’ cheating less than in the outgroup condition (Experiments 1 and 2). However, when participants were not allowed to cheat themselves, they equally denounced ingroup and outgroup cheaters (Experiment 2). This provides evidence that cheating mediates the group effect on whistleblowing and is reminiscent of omertà, that is, the code of silence among criminals. We provide suggestions for future research.
... Though we did not have previous experiments that would enable us to conduct an a priori power analysis, our goal was to recruit at least 50 participants in each experimental condition. This number was based on the recommendation for 50 observations per group (Simmons et al., 2013). We recruited 228 participants in total, each of whom responded to an online survey and was compensated with $1.00 on completion of the study. ...
Article
Full-text available
Objectives: Prosecutors often use race as a basis for excluding Black jurors in cases with Black defendants. The current research tested whether this practice influences juror attitudes (Study 1). It also tested an intervention to prevent racially biased jury selection (Study 2). Hypotheses: We predicted that participants exposed to the exclusion of Black prospective jurors would have more negative feelings compared with those who were not exposed to such exclusions (Study 1). We also predicted that participants taking on the role of a prosecutor would be more likely to exclude a Black (vs. White) prospective juror in a case with a Black defendant and that warnings against race-based decisions would result in elaborate race-neutral rationales for the exclusions (Study 2). Method: In Study 1 (N = 228), participants witnessed a simulated jury selection process. For half of the participants, Black jurors were differentially excluded. In Study 2 (N = 298), participants selected between a Black and a White prospective juror for a case with a Black defendant. Results: Exposure to race-based exclusions negatively impacted perceptions of fairness and emotional responses, especially for Black participants (Study 1). Participants were more likely to exclude a Black juror (vs. White juror) but gave race-neutral rationales for their decisions. The effect of race on juror selection was eliminated when participants were warned against using race as a basis for excluding jurors (Study 2). Conclusions: Race-motivated exclusions affect not only Black defendants, by depriving them of their right to a jury of their peers, but also the jurors who remain to deliberate. A warning could be a viable intervention for curbing the influence of race on prosecutorial decisions during jury selection.
... The sample size for investigating unknown effect sizes is at least 50 participants per condition for experiments (Simmons et al., 2013). Following our prior studies (e.g., Wang et al., 2021; Wang et al., 2022b), we aimed at a sample of 100 participants per condition for Study 2 and 150 participants per condition for Study 4 (given the planned mediation analysis). ...
Article
Full-text available
Many important family decisions, such as when to have offspring, essentially manifest different life history strategies, ranging from slow to fast ones. The current research examined how one critical societal factor, social mobility (i.e., the shift of socioeconomic status in a society), may contribute to such slow (vs. fast) life history strategies. With four multi-method studies, including archival data at the national level, a large-sample survey (N = 6787), and experimental studies (N = 497), we found that a high level of social mobility predicted and resulted in delayed reproduction. Specifically, a high level of social mobility, indexed by both objective reality and subjective perception, predicted individuals’ positive future expectations. This further leads them to focus on long-term goals and foster a slow life history strategy, i.e., preferring delayed reproduction. Theoretical implications are discussed.
... p-hacking was first described by De Groot [10] as a problem of multiple testing and selective reporting. The term 'p-hacking' appeared shortly after the onset of the replication crisis [9,11], and the practice has since been discussed as one of the driving factors of false-positive results in the social sciences and beyond [12][13][14]. Essentially, p-hacking exploits the problem of multiplicity, that is, α-error accumulation due to multiple testing [15]. ...
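The α-error accumulation mentioned above can be made concrete with the standard familywise error formula for k independent tests, 1 - (1 - α)^k. The sketch below is a minimal illustration under that independence assumption; tests run on the same data are usually correlated, so real accumulation is typically slower than this. The function name is hypothetical.

```python
def familywise_error_rate(k, alpha=0.05):
    """Probability of at least one false positive across k independent tests."""
    return 1 - (1 - alpha) ** k


for k in (1, 5, 10, 20):
    print(f"{k:2d} tests -> familywise error rate = {familywise_error_rate(k):.3f}")
```

Even a handful of unreported extra tests moves the chance of at least one spurious "significant" result well above the nominal 5%, which is the multiplicity problem the snippet describes.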
Article
Full-text available
In many research fields, the widespread use of questionable research practices has jeopardized the credibility of scientific results. One of the most prominent questionable research practices is p-hacking. Typically, p-hacking is defined as a compound of strategies targeted at rendering non-significant hypothesis testing results significant. However, a comprehensive overview of these p-hacking strategies is missing, and current meta-scientific research often ignores the heterogeneity of strategies. Here, we compile a list of 12 p-hacking strategies based on an extensive literature review, identify factors that control their level of severity, and demonstrate their impact on false-positive rates using simulation studies. We also use our simulation results to evaluate several approaches that have been proposed to mitigate the influence of questionable research practices. Our results show that investigating p-hacking at the level of strategies can provide a better understanding of the process of p-hacking, as well as a broader basis for developing effective countermeasures. By making our analyses available through a Shiny app and R package, we facilitate future meta-scientific research aimed at investigating the ramifications of p-hacking across multiple strategies, and we hope to start a broader discussion about different manifestations of p-hacking in practice.
... We did not conduct a priori power analyses because we did not know what effect sizes would be. However, we aimed to maximize statistical power in our experimental studies in line with recommendations on sample size (e.g., Simmons et al., 2013) and thus recruited an average of 190 participants per cell, after exclusions. All studies were approved by the institutional review board, and all materials, data, analysis code, and codebook are available on the Open Science Framework: https://osf.io/mbnkr/?view_only=fcc46c181b7f4c719cf4d8f7408e0c93. ...
Article
Full-text available
We test the hypothesis that the perception of stability in one's self-concept (i.e., future self-continuity) enables the experience of meaning in life because perceiving a stable sense of self confers a sense of certainty to the self-concept. Study 1 provided initial evidence of the influence of future self-continuity on feelings of meaning in life (MIL) in a nationally representative sample. In Studies 2a and 2b, we manipulated future self-continuity by varying the expectedness of one's future self, demonstrating the causal influence of future self-continuity on self-certainty and feelings of MIL. Study 3 again manipulated future self-continuity, finding an indirect effect on feelings of meaning in life via self-certainty. Our findings thus suggest the experience of meaning in life arises from the perception of a stable sense of self. We discuss the implications for the antecedents and conceptualization of MIL as well as the nature of the self-concept.
... All multi-item measures were mean. The present studies were conducted with the aim of achieving a sample size of at least N = 50 per condition, as recommended by Simmons, Nelson and Simonsohn (2013). After we reported the individual effects for each study, we proceeded to a mini meta-analysis of aggregated results (Goh, Hall, & Rosenthal, 2016) to try to give an estimate of the size of the effect of group membership on deviance punishment. ...
Article
Deviance Punishment is an important issue for social-psychological research. Group members tend to punish deviance through rejection, ostracism and, more commonly, negative judgments. Subjective Group Dynamics proposes to account for social judgement patterns of deviant and conformist individuals. Relying on a group identity management perspective, one of the model's core predictions is that the judgment of a deviant target depends on group membership. More specifically, the model predicts that deviant ingroup members should be judged more negatively than outgroup ones. Although this effect has been repeatedly observed over the past decades, there is a current lack of sufficiently powered studies in the literature. For the first time, we conducted tests of Subjective Group Dynamics in France and the US to investigate whether ingroup deviants were judged more harshly than outgroup ones. Across six experiments and an internal mini meta-analysis, we observed no substantial difference in judgment between ingroup and outgroup deviant targets, d = -0.01, 95% CI [-0.07, 0.06]. The findings' implications for deviance management research are discussed.
... Participants who responded in the affirmative were not permitted to complete the study, due to the potential for emotional distress, as in Study 1 (though they were still compensated for their participation). There were two between-subjects conditions in this study, so we aimed for approximately 100 total participants after exclusions, to have 50 participants per cell, as recommended by Simmons, Nelson, and Simonsohn (2013). The final sample consisted of N = 97 participants who completed the full study (64 male, M Age = 35.52, ...
Article
Full-text available
The act of suicide is commonly viewed as wrong in some sense, but it is not clear why this is. Based on past empirical research and philosophical theorizing, we test ten different explanations for why suicide is opposed on normative grounds. Using a within-subjects design, Study 1 showed that seven out of ten manipulations had significant effects on normative judgments of suicide: time left to live, lack of close social relationships, a history of prior immoral behavior, the manner in which the suicide is committed, painful, incurable medical issues, impulsive decision-making, and the actor’s own moral-religious background. However, in all cases, the act of suicide was still considered wrong, overall. Using a between-subjects design, Study 2 tested the combined effect of the seven significant manipulations from Study 1. In combination, the seven manipulations eliminated opposition to suicide, on average. Implications for moral psychology and suicide prevention are discussed.
... To maximize power, minimum sample requirement was predetermined using the conservative rule of thumb of 100 participants per condition, double that suggested by Simmons et al. (2013), with another 10% added to account for exclusions. The only exception to this was Study 8, where we aimed for an even higher number, 200 participants per condition, since it was an exploratory study with more dependent measures compared to the previous studies. ...
Preprint
Full-text available
People strive to be liked by others, and likability has profound effects on various life domains such as relationships and career success. Eight experiments (seven preregistered; total N = 2587) involving Western and Asian samples show that people providing ambiguous (i.e., vague, imprecise) responses to questions are seen as less likable compared to those who provide responses that are specific or precise. This phenomenon was consistently observed across multiple scenarios from family, stranger, and coworker conversations to politician interviews and first dates. This is because response ambiguity is interpreted as a way to conceal the truth, and sometimes as a sign of social disinterest. Consequently, people reported a lower inclination to befriend or date others who responded to their questions ambiguously. We also identified situations in which response ambiguity does not harm likability, such as when the questions are sensitive and the responder may need to “soften the blow”. A final exploratory study (n = 389) showed that beyond likability, response ambiguity also impacts personality trait perceptions such that responders providing ambiguous answers are judged as less warm and extraverted, but also less gullible and more cautious. We discuss theoretical implications for the language psychology and person perception literatures. Given that response ambiguity is a controllable and ever-present feature of conversations, and given the potential reputational and social consequences that come with insufficient response precision, practical implications of the present research are also discussed.
... SD age = 10.52 years). Sample size was determined before data analysis based on budgetary constraints, and we went well beyond recommendations for 50 participants per cell (Simmons, Nelson, & Simonsohn, 2013). A sensitivity power analysis based on this sample size at α = 0.05 and 80% power revealed that we had enough participants to detect an omnibus effect size of f = 0.14, a relatively small effect. ...
Article
Our research centers the underexamined perspective of Black Americans regarding allyship behaviors and investigates their perceptions and experience of a White ally who confronts a White perpetrator of prejudice. In two experimental studies (N = 1176), we found that Black participants reported higher levels of self-esteem after a White ally confronted a White perpetrator of racial prejudice compared to no confrontation. Additionally, we found that White ally confrontations that signaled intrinsic motivations (i.e., driven by personal values) were perceived as less suspicious in motive than those that signaled extrinsic motivations (i.e., driven by image concerns), which related to more target self-esteem. We discuss implications for research on allyship and confrontation as well as practical considerations. Our results strongly suggest that advantaged group members cannot allow overt prejudice to stand unchecked and should consider the motivations they convey in their actions.
... For this and all remaining studies, responder gender (where included) was only a peripheral and exploratory factor aimed at testing generalizability, rather than a main manipulation variable. Minimum sample size was predetermined using the conservative heuristic of 100 participants per condition, double that suggested by Simmons et al. (2013), with an additional 10% to account for exclusions. In some studies we tried to recruit more than this number to maximize power. Regardless, all studies' sample sizes (with the exception of this preliminary study) were preregistered a priori, and data were only analysed after termination of data collection. ...
Article
Full-text available
Personality inferences are fundamental to human social interactions and have far-reaching effects on various social decisions. Fourteen experiments (13 preregistered; total N = 5160; using audio, video, and text stimuli) involving British, U.S. American, Singaporean, and Australian participants show that people responding to a question immediately (vs. after a slight pause) are seen as more extraverted. This is because response delays are believed to signal nervousness and passivity, and hence introversion. This effect was consistently observed across a range of scenarios from everyday small-talk to mock job interviews, and for various types of response formats, including face-to-face, phone, and online conversations. We found that the effect was not influenced by apparent relationship closeness between the responder and questioner, but that it was influenced by whether observers believed that the responder was mentally occupied during the interaction. Importantly, our results also suggest that the effect of response timing on extraversion perceptions influences hiring decisions – job applicants are more likely to be hired by mock employers for job types congruent with their level of extraversion as exuded from their response timing. Finally, we found that observers typically expect that introverted individuals would pause for longer before responding to questions, as compared to extraverted individuals. Theoretical implications for the understanding of personality impression formation and response timing and practical implications for hiring and other interpersonal situations are discussed. Keywords: response timing, perceived extraversion, impression formation, personality inferences, response delay
... Because the scarce literature on the subject made it difficult to estimate the expected effect size (ES), we decided to follow the heuristic of Simmons et al. (2013), using a minimum of 50 participants per condition, i.e., 200 in total. Of these, we excluded participants whose age was below 18 (n = 4), those who refused consent (n = 5), those who refused the badge (n = 1) and those who did not fully complete the study (n = 6). ...
Article
Full-text available
Increase or decrease in subsequent action following a low-cost act of support for a cause can be predicted from both commitment theory and the slacktivism effect. In this paper, we report on three studies that tested type of motivation (prosocial vs. impression management) as a moderator of the effect of an initial act of support [wearing a badge (S1) and writing a slogan (S2 and 3)] has on support for blood donation. Small-scale meta-analysis performed on data from the three studies shows that activating prosocial motivation generally leads to greater support for the cause after an initial act of support compared to the control condition, while the effect from impression-management motivation can either be negative or null.
... Given the unknown effect size, we aimed for a sample size 2.5 times the suggestion (N = 150) for our correlational study. Because the sample size for investigating unknown effect sizes is at least 50 participants per cell for experiments (Simmons et al., 2013), we aimed for at least 50 participants per condition for all experiments. Because of the planned test for a mediated moderation model in the final two studies, we aimed for 100 participants per condition (Studies 4 and 5). ...
Article
Full-text available
Competitions are ubiquitous, and their psychological consequences for women have not received sufficient attention. For this research, we tested whether competition, in either work settings or a broader form of competition for resources, would interact with the sex is power belief to result in self-objectification among women. This prediction was confirmed by a series of studies (N = 1,416), including correlational studies, a quasi-experiment, and fully controlled experiments, with samples including company employees, MBA students with work experience, college students currently competing in a job market, and Mechanical Turkers. Competition (or a sense of competition) as a feature of the working environment (Study 1), a real state in life (Study 2), or a temporarily activated state (Studies 3–5) resulted in self-objectification among women who believe sex is power (Study 1) or who enter such a mindset (Studies 2–5). This effect further impaired the pursuit of personal growth (Studies 4 and 5). We discuss the implications of these findings.
... Simonsohn, Nelson, and Simmons (2011) recommended that "Authors must collect at least 20 observations per cell". A later recommendation by the same authors presented at a conference suggested to use n > 50, unless you study large effects (Simmons et al., 2013). Regrettably, this advice is now often mis-cited as a justification to collect no more than 50 observations per condition without considering the expected effect size. ...
Article
Full-text available
An important step when designing an empirical study is to justify the sample size that will be collected. The key aim of a sample size justification for such studies is to explain how the collected data is expected to provide valuable information given the inferential goals of the researcher. In this overview article six approaches are discussed to justify the sample size in a quantitative empirical study: 1) collecting data from (almost) the entire population, 2) choosing a sample size based on resource constraints, 3) performing an a-priori power analysis, 4) planning for a desired accuracy, 5) using heuristics, or 6) explicitly acknowledging the absence of a justification. An important question to consider when justifying sample sizes is which effect sizes are deemed interesting, and the extent to which the data that is collected informs inferences about these effect sizes. Depending on the sample size justification chosen, researchers could consider 1) what the smallest effect size of interest is, 2) which minimal effect size will be statistically significant, 3) which effect sizes they expect (and what they base these expectations on), 4) which effect sizes would be rejected based on a confidence interval around the effect size, 5) which ranges of effects a study has sufficient power to detect based on a sensitivity power analysis, and 6) which effect sizes are expected in a specific research area. Researchers can use the guidelines presented in this article, for example by using the interactive form in the accompanying online Shiny app, to improve their sample size justification, and hopefully, align the informational value of a study with their inferential goals.
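The caution in the excerpt above, that 50 observations per condition is a floor rather than a target, can be illustrated with a rough power calculation. The sketch below is not taken from any of the cited papers. It assumes a two-group, two-sided comparison of means with equal group sizes and uses the normal approximation to the t-test, so exact t-based results will be slightly larger. The function names `n_per_group` and `detectable_d` are hypothetical and chosen only for this example.

```python
import math
from statistics import NormalDist

# Assumption: two independent groups of equal size, two-sided test,
# normal approximation to the t-test (slightly optimistic for small n).
_STD_NORMAL = NormalDist()


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """A-priori analysis: approximate n per group needed to detect Cohen's d."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = _STD_NORMAL.inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


def detectable_d(n: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Sensitivity analysis: smallest Cohen's d detectable with n per group."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return (z_alpha + z_power) * math.sqrt(2 / n)


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):
        print(f"d = {d}: about {n_per_group(d)} participants per group")
    for n in (50, 100):
        print(f"n = {n} per group: detectable d is about {detectable_d(n):.2f}")
```

Under these assumptions, 50 participants per group gives 80% power only for effects of roughly d = 0.56 or larger, whereas an effect of d = 0.2 would call for roughly 393 per group. This is one way to see why the heuristic should not be applied without considering the expected effect size.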
... Across studies, we took several measures to ensure that our studies have sufficient statistical power. Simmons et al. (2013) recommended studies should include at least 50 participants per cell if an effect size is not known a priori. We decided to go beyond their recommendation and target a sample size of at least 100 participants per condition in our experimental studies (Studies 4a, 4b, and 6). ...
Article
Full-text available
There has been much discussion around when people use "I" versus "we" pronouns, and abstract versus concrete communications, as well as how each of these can shape communication effectiveness. In the current research we bring together these separate research streams. Drawing on research arguing that abstract and concrete language are linked with communicative scope, we argue for an association between linguistic abstractness and personal pronoun usage. Across three archival data sets and two experiments, we find support for this association: Speakers who use more concrete language also use more first person singular (vs. plural) pronouns. In two follow-up studies we further find that this association can impact message effectiveness, such that using more first person singular than plural pronouns is increasingly ineffective when using abstract rather than concrete language, and using more concrete language is increasingly effective when using first-person singular rather than plural pronouns. By illustrating the link between linguistic abstraction and pronoun use, we offer insights into previously documented phenomena and suggest a key way of enhancing communication effectiveness. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
Article
Full-text available
In multiple studies, we found that people who are viewed as possessing a stronger desire for status are, ironically, afforded lower status by others. Coworkers who were viewed as having a higher (versus lower) desire for status (Study 1a and 1b), and individuals who were described as having a higher desire for status (versus a lower desire for status or no information), were afforded lower status (Studies 2, 3a, and 3b). Mediation analyses and an experimental manipulation of the mediator (Study 3a and 3b) suggested that the observed negative effect of desire for status on status was mediated primarily by perceptions of low prosociality. These findings have important implications for status organizing processes in groups.
Article
Full-text available
Infidelity has destructive effects on romantic relationships. Several idiographic characteristics or experiences in an intimate relationship have been linked to unfaithfulness. Yet, relatively little research has been paid to investigate how sexist beliefs might sabotage relationships by incurring infidelity. The present research examined the association between men’s ambivalent sexism – hostile sexism and benevolent sexism – and men’s infidelity as well as women’s perception of the likelihood of men’s infidelity. The results showed that men’s hostile sexism and benevolent sexism predicted their increased infidelity (Studies 1 and 2). In addition, the indirect association between ambivalent sexism (both hostile sexism and benevolent sexism) and infidelity was through the importance placed on power in one’s intimate relationship in general (Study 2). Importantly, women were unaware of benevolently sexist men’s increased infidelity, such that women rated benevolently sexist men as having a lower likelihood of engaging in infidelity than hostilely sexist men and believed benevolently sexist men’s infidelity level was similar to nonsexist men (Study 3). Therefore, these findings contribute to the psychology of infidelity by revealing that ambivalent sexism, both hostile sexism and benevolent sexism, are significant predictors. Implications of the findings are discussed.
Chapter
First published as a special issue of the Policy and Politics journal, this book situates reforms known as 'nudges' or 'behavioural interventions' which have emerged in public policy and administration within a broader tradition of methodological individualism.
Article
Full-text available
Practical wisdom, an essential component of leadership, has been approached mainly from a theoretical perspective. While there are barely any empirical studies on leaders’ practical wisdom, quantitative ones are even rarer, and no valid measure of a leader’s practical wisdom exists. Thus, our understanding of whether and how wise leaders influence their followers is limited. Inspired by Thomas Aquinas’ ideas on practical wisdom, we operationalize it as a tridimensional capacity of inquiring, judging, and acting in an emotionally regulated way, and develop and validate a corresponding measure of leader-expressed practical wisdom. To support our operationalization, we test how leader-expressed practical wisdom predicts employees’ speaking up behaviors via their psychological safety. Our rationale is that to make better decisions, wise leaders are receptive to employees’ views that address matters of concern and challenge the status quo with the intention of improving the situation – such a receptiveness being enabled by fostering employees’ psychological safety. Through a two-wave field study, a three-wave field study, and a vignette-based experiment carried out in three countries we obtain empirical support for that three-dimensional construct and show that leader-expressed practical wisdom predicts employees’ speaking up behaviors via their psychological safety.
Article
This work examines lay beliefs about the societal implications of different forms of ingroup identity. While secure ingroup identity reflects a genuine attachment to one’s ingroup members, defensive forms of identity are aimed at satisfying individual enhancement motives through highlighting belongingness to an exceptional group. The latter can be exemplified by collective narcissism, a belief in ingroup greatness and entitlement to privileged treatment, which has been linked to undesirable intra- and intergroup outcomes. In three experiments (total N = 473), conducted in the context of national identities, we investigated how people perceive the manifestations of collective narcissism, contrasted with secure ingroup identity and low identity. Across all studies, participants expected the highest outgroup hostility and poorest intragroup relations from those high in collective narcissism. However, perceivers who were themselves high in collective narcissism were less likely to expect these undesirable manifestations, thus revealing a biased perception of similar others.
Article
Can happiness be reliably increased? Thousands of studies speak to this question. However, many of them were conducted during a period in which researchers commonly “p-hacked,” creating uncertainty about how many discoveries might be false positives. To prevent p-hacking, happiness researchers increasingly preregister their studies, committing to analysis plans before analyzing data. We conducted a systematic literature search to identify preregistered experiments testing strategies for increasing happiness. We found surprisingly little support for many widely recommended strategies (e.g., performing random acts of kindness). However, our review suggests that other strategies, such as being more sociable, may reliably promote happiness. We also found strong evidence that governments and organizations can improve happiness by providing underprivileged individuals with financial support. We conclude that happiness research stands on the brink of an exciting new era, in which modern best practices will be applied to develop theoretically grounded strategies that can produce lasting gains in life satisfaction. Expected final online publication date for the Annual Review of Psychology, Volume 75 is January 2024. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Preprint
Full-text available
Evidence for the prioritization of moral information in cognitive processes is mixed. We examined this question using a series of eleven experiments where participants first learned associations between moral characters and geometric shapes and then performed simple speed tasks. In the first six experiments, we tested and validated prioritized responses to good characters over bad and neutral characters. To pin down the processes that are critical to the prioritization effects, in the remaining five experiments, we examined two opposing hypotheses: the valence hypothesis suggests that a general positivity bias towards all underpins the effects, while the self-binding account posits that self-referencing, rather than other-referencing is the fundamental driver of the effects. The data support the latter. Together, these results show a robust prioritization effect of good character through self-referencing processes, indicating the innate connection between morality and oneself and how humans use self-reference to explore the world and learn morality.
Article
The human mind is a mosaic composed of multiple selves with conflicting desires. How can coherent actions emerge from such conflicts? Classical desire theory argues that rational action depends on maximizing the expected utilities evaluated by all desires. In contrast, intention theory suggests that humans regulate conflicting desires with an intentional commitment that constrains action planning towards a fixed goal. Here, we designed a series of 2D navigation games in which participants were instructed to navigate to two equally desirable destinations. We focused on the critical moments in navigation to test whether humans spontaneously commit to an intention and take actions that would be qualitatively different from those of a purely desire-driven agent. Across four experiments, we found three distinctive signatures of intentional commitment that only exist in human actions: "goal perseverance" as the persistent pursuit of an original intention despite unexpected drift making the intention suboptimal; "self-binding" as the proactive binding of oneself to a committed future by avoiding a path that could lead to many futures; and "temporal leap" as the commitment to a distant future even before reaching the proximal one. These results suggest that humans spontaneously form an intention with a committed plan to quarantine conflicting desires from actions, supporting intention as a distinctive mental state beyond desire. Additionally, our findings shed light on the possible functions of intention, such as reducing computational load and making one's actions more predictable in the eyes of a third-party observer.
Article
Full-text available
The present article aims to elucidate whether offender unforgiveness predicts organizationally targeted displaced revenge and whether this effect occurs because offender‐directed feelings spill over to shape feelings towards the group. Two studies (Study 1a/1b) showed that unforgiveness predicts organizationally directed displaced revenge in the form of counterproductive workplace behaviours against an organization, mediated by perceived group betrayal. Study 2 investigated whether the relationships between unforgiveness, perceived group betrayal, and displaced revenge are moderated by group embodiment: the extent to which the offender is closely connected to, identified with, and in alignment with the group. With an experimental design that manipulated group embodiment and transgressor status, we found that unforgiveness and perceived group betrayal predict higher levels of displaced revenge under conditions of high rather than low group embodiment. Study 2 further showed that displaced revenge intentions operate in addition to, not in place of, revenge intentions towards the offender. Implications are discussed.
Article
Full-text available
The identification of an empirically adequate theoretical construct requires determining whether a theoretically predicted effect is sufficiently similar to an observed effect. To this end, we propose a simple similarity measure, describe its application in different research designs, and use computer simulations to estimate the necessary sample size for a given observed effect. As our main example, we apply this measure to recent meta-analytical research on precognition. Results suggest that the evidential basis is too weak for a predicted precognition effect of d = 0.20 to be considered empirically adequate. As additional examples, we apply this measure to object-level experimental data from dissonance theory and a recent crowdsourcing hypothesis test, as well as to meta-analytical data on the correlation of personality traits and life outcomes.
Article
SYNOPSIS Amazon Mechanical Turk (MTurk) is an increasingly popular source of experimental participants due to its convenience and low cost (relative to traditional laboratories). However, MTurk presents challenges related to statistical power and reliability. These challenges are not unique to MTurk, but are more prevalent than in research conducted with other participant pools. In this paper I discuss several reasons why research conducted with MTurk may face additional power and reliability challenges. I then present suggestions for dealing with these challenges, taking advantage of the comparative strengths of MTurk. The discussion should be of interest to Ph.D. students and other researchers considering using MTurk or other online platforms as a source of experimental participants as well as to reviewers and editors who are considering quality control standards for research conducted with this participant pool. JEL Classifications: M40; M41; M42; C18; C90; C91.
Preprint
Full-text available
A considerable proportion of psychological research has not been replicable, and estimates range from 9% to 77% for nonreplicable results. The extent to which vast proportions of studies in the field are replicable is still unknown, as researchers lack incentives for publishing individual replication studies. When preregistering replication studies via the Open Science Framework website (OSF, osf.io), researchers can publicly register their results without having to publish them and thus circumvent file-drawer effects. We analyzed data from 139 replication studies for which the results were publicly registered on the OSF and found that out of 62 reports that included the authors’ assessments, 23 were categorized as “informative failures to replicate” by the original authors. 24 studies allowed for comparisons between the original and replication effect sizes, and whereas 75% of the original effects were statistically significant, only 30% of the replication effects were. The replication effects were also significantly smaller than the original effects (approx. 38% the size). Replication closeness did not moderate the difference between the original and the replication effects. Our results provide a glimpse into estimating replicability for studies from a wide range of psychological fields chosen for replication by independent groups of researchers. We invite researchers to browse the Replication Database (ReD) ShinyApp, which we created to check whether seminal studies from their respective fields have been replicated. Our data and code are available online: https://osf.io/9r62x/
Article
Building on perspectives highlighting the social nature of workplace creativity, we argue that being in a creative mindset will highlight the value that co-workers provide to the creative process. This heightened awareness of co-workers as being integral to the creative process increases social closeness with these co-workers, subsequently reducing instigated rudeness towards, as well as perceived rudeness from, those co-workers. In four studies (both in the field as well as in the lab), we find support for these theoretical predictions. Our work also identifies when and for whom these effects are likely to be strongest, indicating that the effect of being in a creative mindset on social closeness is stronger in contexts characterized by high (vs. low) psychological safety, and weaker for employees high (vs. low) in dispositional creativity. We discuss the theoretical and practical implications of our findings.
Article
The reliability of some published research from well-funded disciplines of medicine and psychology has been brought into question. This is because some researchers failed to achieve consistent results after replicating published studies using the same methodology. Researchers have referred to this as the ‘replicability in science crisis’ and have identified several practices contributing to unreliable science. Protected area and other conservation researchers are unlikely to be immune from these poor practices given they use the same scientific approaches as other disciplines. Fortunately, there are solutions to the poor practices contributing to unreliable science. In this paper I identify those poor practices and describe solutions as identified by researchers from a range of disciplines. These solutions are transferable to protected area science and related conservation disciplines. Most solutions are not costly or demanding to implement. Adopting these solutions can improve the reliability of both published and unpublished research.
Article
We present an empirical demonstration that people rely on linguistic valence as a direct cue to a speaker’s group membership. Members of the U.S. voting public judge positive words as more likely to be spoken by members of their political in-group, and negative words as more likely to be spoken by members of their political out-group (three studies with 655 participants). We further find that participants perceive pluralized forms of nouns as more extremely valenced than singular forms (one study with 280 participants). This allowed us to control for the semantic content of words while eliciting systematic differences in the source attributions made by partisans. Our work contributes to both theory and methodology used to understand the linguistic cues people use to make social relational judgments.
Article
Full-text available
Helping acts, however well intended and beneficial, sometimes involve immoral means or immoral helpers. Here, we explore whether help recipients consider moral evaluations in their appraisals of gratitude, a possibility that has been neglected by existing accounts of gratitude. Participants felt less grateful and more uneasy when offered immoral help (Study 1, N = 150), and when offered morally neutral help by an immoral helper (Study 2, N = 172). In response to immoral help or helpers, participants were less likely to accept the help and less willing to strengthen their relationship with the helper even when they accepted it. Study 3 (N = 276) showed that recipients who felt grateful when offered immoral help were perceived by observers as less likable, less moral, and less suitable as close relationship partners than those who felt uneasy. Our results demonstrate that gratitude is morally sensitive and suggest this might be socially adaptive.
Article
Dominant actors are neither liked nor respected, yet they are reliably deferred to. Extant explanations of why dominant actors are deferred to focus on deferrers' first-order judgments (i.e., the deferrers' own private assessment of the dominant actor). The present research extends these accounts by considering the role of second-order judgments (i.e., an individual's perception of what others think about the dominant actor) in decisions to defer to dominant actors. While individuals themselves often have little respect for dominant actors, we hypothesized that (1) they think others respect dominant actors more than they do themselves, and (2) these second-order respect judgments are associated with their decision to defer to dominant actors above and beyond their own first-order respect judgments. The results of four studies provide support for these hypotheses: across a variety of contexts, we found evidence that individuals think others respect dominant actors more than they themselves do (Studies 1–3), and perceptions of others' respect for dominant actors are associated with individuals' own decisions to defer to them, above and beyond first-order respect (Studies 3–4). Results highlight the importance of considering second-order judgments in order to fully understand why dominant actors achieve high social rank in groups and organizations.
Article
We propose that an organizational culture where playing politics is important for advancement, compared with an organizational culture where showing competencies is important, elicits stronger lack of fit experiences for women than for men. In a pre-study, playing politics was perceived as dominant, typically male work behaviors, whereas showing competencies was perceived as competent, typically female work behaviors. We then tested in two experiments (689 individuals, integrated in a small-scale meta-analysis) the joint effect of organizational culture and gender on four lack of fit indicators (self-concept conflict, fear of backlash, intention to seek power positions, concerns about one’s skills). As expected, women indicated more lack of fit experiences than men in politics cultures, but not in competencies cultures. Our findings suggest that perceived organizational culture may play an important role in understanding the dynamics of career advancement of women and men.
Article
Research suggests that White women often experience more gender backlash than women of color in response to expressions of agency. We consider whether this differential in backlash is driven by the match or mismatch of the race of both perceivers and targets. Much of the existing work in this space examines the perspective of White perceivers, which might underestimate racial minority women’s susceptibility to backlash if backlash occurs primarily in same-race interactions. We examine how the racial group memberships of targets and perceivers jointly affect backlash against gender-norm violating women. In analyses of Dr. Christine Blasey-Ford’s accusations of sexual assault against Brett Kavanaugh and Anita Hill’s accusations against Clarence Thomas during their respective U.S. Supreme Court confirmation hearings, an archival analysis of the 2016 U.S. presidential election, and two experiments, we find that perceivers of different races tend to express more backlash toward racial in-group than out-group women.