Original Manuscript
Social Science Computer Review
2023, Vol. 0(0) 1–17
© The Author(s) 2023
Article reuse guidelines:
sagepub.com/journals-permissions
DOI: 10.1177/08944393231164329
journals.sagepub.com/home/ssc
Performative Quantification: Design Choices Impact the Lessons of Empirical Surveys About the Ethics of Autonomous Vehicles
Hubert Etienne 1,2,3 and Florian Cova 4
Abstract
In recent years, researchers have emphasized the relevance of data about commonsense moral judgments for ethical decision-making, notably in the context of debates about autonomous vehicles (AVs). As such, the results of empirical studies such as the Moral Machine Experiment have been influential in debates about the ethics of AVs, and some researchers have even put forward methods to automatize ethical decision-making on the basis of such data. In this paper, we argue that data collection is not a neutral process, and that differences in study design can change participants' answers and the ethical conclusions that can be drawn from them. After showing that participants' individual answers are stable, in the sense that providing them with a second occasion to reflect on their answers does not change them (Study 1), we show that different conclusions regarding participants' moral preferences can be reached when participants are given a third option allowing AVs to behave randomly (Study 2), and that preference for this third option can be increased in the context of a collective discussion (Study 3). We conclude that design choices will influence the lessons that can be drawn from surveys about participants' moral judgments about AVs and that these choices are not morally neutral.
Keywords
AI ethics, Autonomous vehicles, Empirical ethics, Moral judgements, Measurement, Surveys
1 Department of Philosophy, École Normale Supérieure, Paris, France
2 Laboratory of Computer Sciences (LIP6), Sorbonne Université, Paris, France
3 Facebook AI Research, Paris, France
4 Department of Philosophy, Université de Genève, Geneva, Switzerland
Corresponding Author:
Hubert Etienne, Department of Philosophy, École Normale Supérieure, 45 rue d'Ulm, Paris 75230, France.
Email: hae@meta.com
Introduction
Online platforms have become a common way to collect and quantify political and moral opinions to inform decision-makers, conduct research, or even automate decision-making. However, critical sociology has highlighted the non-neutrality of the quantification methods used for statistical analysis, resulting in numerous controversies debated by statisticians (Desrosières, 2008). A famous example of this critique is Bourdieu's claim that "public opinion does not exist" because opinion surveys are associated with political constructions strongly determined by their methodology, the questions' framing, and their goals (Bourdieu, 1972). However, this call for prudence, which characterized the social sciences' adoption of statistical methods, was unheeded by those who developed computational approaches to social choice based on machine-learning models. Indeed, despite relying on copious amounts of data about people's moral and social preferences, these approaches have been particularly blind to the extent to which design choices and methodological limitations in surveys' design may shape researchers' conclusions. As such research typically aims to play a crucial role in shaping public debate and policy-making, our goal in this paper is to draw attention to the way design choices can impact the conclusions of computational approaches to social decision-making.
More precisely, we focus on recent attempts at automatizing social and moral decision-making around autonomous vehicles (AVs). Through three studies, we provide experimental evidence that subtle choices in survey design do impact participants' replies and the conclusions machine-learning algorithms would draw while trying to deduce a general picture of social preferences from their answers. Our results allow us to refute the conclusions of foundational experimental works about people's moral opinions on AV dilemmas, demonstrating that more data does not necessarily lead to more accurate results, as attempts to aggregate moral opinions can fall into critical pitfalls. In doing so, we also suggest better designs for online surveys aiming to collect accurate moral opinions.
The recent development of AVs has led academics to engage in numerous debates about the ethical questions raised by this technology. However, most of these debates have focused on how AVs should behave when they have to choose between courses of action that would all lead to inflicting harm on others. For example, should AVs decide to harm an animal rather than a human being? One person rather than two? A young person rather than an old person?
One reason these questions have drawn so much attention is that such dilemmas have been at the center of ambitious empirical studies. The Moral Machine Experiment (MME; Awad et al., 2018) presented participants with a series of dilemmas involving AVs, in which they had to choose whether an AV should go ahead and harm certain people, or swerve and injure other people. The authors collected c. 40 million decisions from people in 233 countries and territories, which probably makes it the largest available dataset on people's intuitions about moral dilemmas. Based on this unprecedented wealth of data, Awad and colleagues found several patterns in people's preferences, including preferences to save human beings rather than pets, several people rather than one person, young people rather than old people, and people who are healthy rather than people who are not.
As authors of the MME have stressed in several places (e.g., Bonnefon, 2019), these conclusions are supposed to be descriptive, not prescriptive: they describe how people prefer AVs to be programmed, not how we should program them in the end. However, they still argue that these data are relevant to normative debates about the way AVs should be programmed. A modest argument in favor of this relevance relies on pragmatic considerations: if policymakers want people to adopt AVs, then they should make sure that the behavior of AVs does not conflict with people's sense of morality. For example, Awad and colleagues (2018) write that we need "a global conversation to express our preferences to the companies that will design moral algorithms, and to the policymakers that will regulate them," and that we can "embrace the challenges of machine ethics as a unique opportunity to decide, as a community, what we believe to be right or wrong; and to make sure that machines, unlike humans, unerringly follow these moral preferences." Thus, results of the MME are morally relevant because they allow people to express their preferences in the context of a collective ethical decision about the way AVs should be programmed.
A more "ambitious" version of this approach proposes to automatize ethical decision-making by collecting people's judgments about such ethical issues and using aggregation methods to reach "credible" ethical decisions. Thus, Noothigattu et al.'s (2018) voting-based system (VBS) aims to automatize moral decision-making in the context of dilemmas involving AVs: rather than providing AVs with general ethical principles people have agreed upon, we should simply provide them with people's opinions on ethical dilemmas and have the AV learn from these individual choices to make its decisions. This more ambitious proposal has been resisted on several grounds. For example, it has been criticized for being based on morally fallacious "methodological axioms", such as Conitzer's assumption (2017) that aggregating moral agents' judgments "may result in a morally better system than that of any individual human, for example, because idiosyncratic moral mistakes made by individual humans are washed out in the aggregate" (Etienne, 2021; Greene et al., 2016).
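To make concrete the kind of aggregation such proposals rely on, the sketch below simulates majority voting over binary dilemma answers in R (the language used for our own analyses). It is an illustration only: the dilemma labels, sample size, and probabilities are invented, and this is plain majority aggregation, not Noothigattu et al.'s actual VBS, which learns a preference model per voter before aggregating.

```r
# Minimal illustration of vote aggregation over AV dilemmas.
# 1 = "Swerve" (Option 2), 0 = "Continue" (Option 1).
set.seed(42)
n_participants <- 500
dilemmas <- c("two_women_vs_two_men", "doctor_vs_criminal", "three_adults_vs_one_child")

# Simulated individual choices, one column per (hypothetical) dilemma
votes <- matrix(rbinom(n_participants * length(dilemmas), 1, prob = c(.5, .1, .9)),
                nrow = n_participants, byrow = TRUE,
                dimnames = list(NULL, dilemmas))

# Majority aggregation: the "collective" answer is whichever option wins
collective <- ifelse(colMeans(votes) > .5, "Swerve", "Continue")
print(data.frame(share_swerve = colMeans(votes), decision = collective))
# Note the worry raised below: a 51/49 split and a 99/1 split collapse to
# the same single decision, so the aggregate hides how contested it is.
```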
Despite their differences, both approaches rest on the assumption that the data collected in the context of the MME and similar studies accurately reflect participants' attitudes, and, more importantly, the type of attitudes that might be relevant to the public reception of AVs. However, the behavioral economists' literature on nudges reminds us of the great sensitivity of respondents' replies to survey designs (Thaler & Sunstein, 2008), emphasizing the critical importance of collecting responses that accurately reflect people's opinions. As such, there are at least five dimensions on which the kind of data collected by the type of approach illustrated by the MME might fall short of being relevant to ethical decision-making.
A first dimension is perspective (PT). Indeed, past research has suggested that moral intuitions can be shaped by the point of view from which we approach moral issues. For example, research on moral dilemmas suggests that participants' intuitions might depend on whether they approach the problem from a first- or third-person perspective (Nadelhoffer & Feltz, 2008; Tobia et al., 2013; but see Cova et al., 2021 for a failure to replicate). Similarly, research on moral and political reasoning suggests that people's judgments can be modified by asking them to take certain specific perspectives on moral and political issues, such as the perspective of an "impartial spectator" (see Allard & Cova, forthcoming, for a review of this research). In the context of AVs, Bonnefon and colleagues (2016) observed that, though participants were supportive of AVs that might sacrifice passengers to save others, they were reluctant to ride in vehicles designed to follow this moral principle. Moreover, Frank and colleagues (2019) found that participants were less likely to answer that AVs should sacrifice their passengers when asked to adopt the perspective of an AV passenger (compared to the perspective of a pedestrian or observer). This suggests that what seems acceptable might depend on the perspective participants adopt when reflecting about AVs. However, we do not know which perspective participants naturally endorse when participating in such studies, nor whether it is one relevant to public discussion at large.
A second dimension is deliberation time (DT). Past research on moral intuitions has emphasized that our responses to moral dilemmas sometimes pit against each other a quick, intuitive answer and a slower, reflective one (Greene, 2014). In line with this dual-process approach to moral judgment, it has been shown that people's responses to moral dilemmas can change when they are asked to take time to reflect before answering (Capraro et al., 2019; Suter & Hertwig, 2011). Accordingly, recent research suggests that people's responses to moral dilemmas involving AVs can change depending on whether they are asked to answer quickly or not: in a dilemma asking participants whether the AV should sacrifice pedestrians or its passengers, participants who had to answer quickly (within 5s) were more likely to answer that it should sacrifice its passengers (Frank et al., 2019). However, quick answers are not necessarily those relevant to public debates, in which people are supposed to take the time to reflect and ponder different factors. Additionally, most research on people's judgments about AVs probes participants' intuitions when they are first exposed to a certain problem. But people engaged in public deliberation are more likely to be repeatedly exposed to the questions they are asked to answer, giving them more time to deliberate. Thus, it might be that answers collected by the MME only reflect quick, intuitive answers based on a single exposure and not the slower, more reflective answers based on repeated exposure that are more relevant to public debate.
A third dimension is whether people reflect about abstract principles or concrete cases (AvC). While public debates about AVs are likely to be framed in terms of abstract principles (e.g., "should we take age into consideration?"), most studies have focused on participants' responses to particular cases. However, past research in moral psychology and experimental philosophy has emphasized the fact that people can give radically different answers depending on whether questions are about abstract principles or concrete cases (Freiman & Nichols, 2011; Nichols & Knobe, 2007; Sinnott-Armstrong, 2008; Struchiner et al., 2020).
A fourth dimension is the number of options, and more precisely the presence of a third option (3O). Past research has shown that introducing a third option in a moral dilemma can dramatically switch participants' moral preferences (Wiegmann et al., 2020). In the context of moral dilemmas involving AVs, Bigman and Gray (2020) have shown that introducing a third option allowing the AV to make a decision at random drastically reduced the relevance of certain factors such as age, gender, or social status. Awad and colleagues (2020) argued that participants' answers were biased by the formulation of Bigman and Gray's third option, which made it more ethically attractive ("Treat the lives of X and Y equally"). However, Bigman and Gray also ran a study in which the third option was formulated in a more neutral way ("To decide who to kill and who to save without considering whether it is X or Y") and participants still showed a wide preference for this option. Awad and colleagues (2020) also argued that, when their participants were asked to indicate their preference between two options using a slider, very few chose to put the slider at the middle, which would indicate that they were indifferent between the two options. However, this counterargument rests on a confusion: preferring that the choice between A and B be random is not the same as being indifferent between A and B if one is forced to choose between them. This confusion is based on the assumption that a preference for a choice at random must be grounded in indifference, while it is more likely to be grounded in a moral preference for impartiality. Finally, recent studies have shown that participants are much more likely to be outraged by AVs that make decisions based on criteria such as age, gender, or moral status, compared to AVs that make decisions at random (De Freitas & Cikara, 2021). This suggests that allowing participants to have AVs programmed to choose at random might lead to a very different picture of public consensus compared to the two-option method used by studies such as the MME.
A fifth and last dimension is the presence of a series of objections and a collective discussion (DISC). Most people arguing for the relevance of empirical approaches to ethical debates on AVs argue that such methods allow non-experts to take part in the collective discussion about the ethics of AVs. However, most of the time, the data collected do not reflect the outcome of a public discussion, but the aggregation of individual answers, formed in isolation. But psychological studies show that the outcome of collective discussions is not similar to the mere aggregation of individual answers, and that the results of public discussion are generally more efficient (Balliet, 2010). This is why some have argued that discussion might lead people to converge on "better moral judgments" (Mercier, 2011). Thus, if one's goal is to identify the moral principles on which people would converge, it might be more interesting to collect judgments that have been formed as the result of a discussion, rather than in isolation, and challenged by the review of objections.
For all these reasons, it might be that public opinion as measured through studies such as the MME is not the most relevant to ethical decision-making about AVs, and that changing some of the aforementioned parameters might lead people to converge on very different options. If this is the case, then programming AVs so that they learn to make decisions based on the answers people gave to the MME might not lead AVs to make "credible" decisions, but rather to go against the ethical principles that are likely to be endorsed as the result of public discussion. To investigate this possibility, we conducted three exploratory studies. In the first study, we explored the impact of PT and DT, and had people answer dilemmas about AVs quickly and slowly. In the second study, we explored the joint impact of AvC, 3O, and DISC by comparing participants' answers when collected according to the methods of the MME and when collected at the end of a collective discussion on abstract principles including the possibility of the third option. In the third study, we explored how robust participants' preference for this third option is, and whether it is favored by collective discussion.
Study 1: The Effect of Time Constraint on Participants' Judgments
First, we wanted to determine to what extent answers collected following the MME methods were robust, or changed (i) depending on the perspective participants were asked to adopt, and (ii) when participants were given more time to reflect on them.
Materials and Methods
We used an experimental paradigm similar to the one used by the MME. Participants were presented with 16 (+1 control) moral dilemmas in which an AV experiences a brake failure, preventing it from stopping safely in time. Respondents then had to choose whether the AV should keep straight or swerve into the other lane, resulting in various consequences involving at least one individual's death (see Figure 1). Each scenario required respondents to arbitrate between different categories of victims according to nine criteria: gender (male vs. female), age (young vs. adult vs. aged), body size (fat vs. fit), social status (executive vs. homeless, doctor vs. criminal), nature (humans vs. pets), number (1 vs. 2 vs. 3 vs. 5), role in the dilemma (pedestrians vs. AV passengers), and lawfulness (jaywalkers vs. lawful pedestrians) (see Table 1 for the full list of combinations). Participants had to indicate their answer by choosing between two options: "Continue" (Option 1) or "Swerve" (Option 2).
Figure 1. Example of a scenario in Studies 1 and 2.
Participants were presented with the full set of scenarios twice. For the first presentation (Set 1), participants were instructed to "answer questions as quickly as possible." For the second presentation (Set 2), they were instructed to "take time to think about their answer." To force them to take time to think, there was an invisible time counter at the bottom of the vignette, and participants could not submit their answer before the counter reached zero. To investigate whether the effect of a second exposure on participants' answers would depend on how much time they were asked to think, the time they had to wait before answering varied across participants (10s, 20s, 30s, or 40s).
The way questions were framed also varied across participants. One fourth of participants were asked to answer as if they were designers of AVs, another fourth as if they were citizens answering a national public consultation on AVs, and another fourth as if they were policymakers preparing regulation for the self-driving industry. The last fourth did not receive specific instructions.
Results
Participants were US and UK residents recruited through Prolific Academic. After excluding 46 participants who failed the attention check, we were left with 608 participants (278 women, 324 men, 6 others; M_age = 30.69, SD_age = 11.22).
Perspective-Taking. We first assessed the effect of perspective-taking on participants' answers. For each vignette, we compared the proportion of participants' answers across all four conditions using chi-square tests. Results are presented in Table S1 in Supplementary Materials. As can be seen, we found no significant effect of condition.
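For readers who want to see what such a comparison looks like in practice, here is a minimal sketch in R of one per-vignette test; the counts are hypothetical stand-ins, not the study's raw data.

```r
# Chi-square test of independence: does the Continue/Swerve split differ
# across the four framing conditions? Counts below are invented.
answers <- matrix(c(24, 117,   # No instruction: Swerve, Continue
                    25, 113,   # AV designer
                    20, 151,   # Citizen
                    17, 141),  # Policy-maker
                  nrow = 4, byrow = TRUE,
                  dimnames = list(condition = c("none", "designer", "citizen", "policymaker"),
                                  answer = c("swerve", "continue")))
chisq.test(answers)  # df = (4 - 1) * (2 - 1) = 3, as in Table 1
```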
First vs. Second Exposure. Participants' average response time for the first set of vignettes was 10.7 seconds (SD = 5.52); for the second set, it was 30.1 seconds (SD = 16.03).
Table 1. Percentage of Participants Who Chose "Option 2" at the First Presentation of Each Question, for Each Group and Each Case (Study 1). The rightmost column indicates the result of a chi-square test (df = 3) comparing the distribution of answers between groups. Case 5 does not appear in the table, as it was an attention check. Cases 14 and 15 do not appear because, due to a programming error, their consequences varied across presentations.

Scenario | No instruction | AV designer | Citizen | Policy-maker | Chi-square
1 | 17.0% | 18.1% | 11.7% | 10.8% | p = .17
2 | 57.4% | 57.2% | 55.0% | 56.3% | p = .97
3 | 19.9% | 23.2% | 23.4% | 19.0% | p = .70
4 | 7.8% | 5.8% | 7.0% | 7.0% | p = .93
6 | 26.2% | 33.3% | 23.4% | 22.8% | p = .15
7 | 8.5% | 7.2% | 4.1% | 5.7% | p = .41
8 | 92.2% | 93.5% | 86.5% | 87.3% | p = .12
9 | 62.4% | 69.6% | 58.5% | 60.1% | p = .21
10 | 95.0% | 96.4% | 90.1% | 91.1% | p = .10
11 | 27.7% | 24.6% | 28.1% | 25.3% | p = .88
12 | 68.1% | 67.4% | 59.1% | 62.0% | p = .29
13 | 66.0% | 61.6% | 57.9% | 56.3% | p = .33
16 | 26.2% | 31.9% | 26.9% | 27.8% | p = .72
17 | 39.7% | 43.5% | 43.9% | 39.2% | p = .77
N | 141 | 138 | 171 | 158 |
Whether participants had to wait 10, 20, 30, or 40 seconds before answering the second set of vignettes did not significantly impact the percentage of participants who changed their mind between the first and second set (see Table S1). We thus compared participants' answers between the first and second exposure without considering the time participants were given to answer the second set. The results of these comparisons can be found in Table 2. As one can see, participants' answers were quite stable: out of 14 scenarios, only two showed statistically significant differences in participants' answers between the first and second exposure. Moreover, changes in participants' answers were quite small (around 3%). Finally, in both cases, reflection tended to simply reinforce the tendency that was already observable in Set 1 rather than leading participants in another direction.
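As an illustration of the test reported in Table 2, the sketch below runs McNemar's test on one vignette's paired first/second answers; the 2 x 2 counts are invented for the example.

```r
# McNemar's test asks whether switches Continue -> Swerve outnumber
# switches Swerve -> Continue (or vice versa) across the two exposures.
paired <- matrix(c(480, 25,   # Set 1 "Continue": stayed / switched to Swerve
                    12, 91),  # Set 1 "Swerve": switched to Continue / stayed
                 nrow = 2, byrow = TRUE,
                 dimnames = list(set1 = c("continue", "swerve"),
                                 set2 = c("continue", "swerve")))
mcnemar.test(paired)  # df = 1; only the 25 vs. 12 off-diagonal cells matter
```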
Discussion
In this first study, we investigated whether people's answers to AV dilemmas were robust against two potential sources of variation: (i) perspective-taking, and (ii) repeated exposure providing participants with more time to reflect on their answers.
Table 2. Percentage of Participants Who Chose Option 2 (Swerve) at the First and Second Presentation of Each Scenario (Study 1). For each scenario, we indicate the criteria that vary across the two options and the consequence of each choice. The rightmost column indicates the result of a McNemar chi-square test (df = 1) comparing the distribution of answers between sets. Case 5 does not appear in the table, as it was an attention check. Cases 14 and 15 do not appear because, due to a programming error, their consequences varied across presentations.

Scenario | Criterion | Option 1 (Continue) | Option 2 (Swerve) | Set 1 (%) | Set 2 (%) | McNemar test
1 | Gender | 2 men | 2 women | 14.1 | 14.1 | p = 1.00
2 | Body size | 1 athletic man, 1 athletic woman | 1 obese man, 1 obese woman | 56.4 | 59.2 | p = .04*
3 | Status | 1 homeless man, 1 homeless woman | 1 executive man, 1 executive woman | 21.4 | 17.8 | p < .001*
4 | Status | 1 criminal man, 1 criminal woman | 1 male doctor, 1 female doctor | 6.9 | 5.4 | p = .08
6 | Gender, Age | 1 man, 1 old woman | 1 woman, 1 old man | 26.2 | 24.7 | p = .38
7 | Gender, Age | 1 old woman, 1 man | 1 woman, 1 young boy | 6.3 | 7.1 | p = .35
8 | Number | 2 women | 1 woman | 89.6 | 90.8 | p = .27
9 | Role | 1 woman, 1 man | 1 female passenger, 1 male passenger | 62.3 | 65.1 | p = .07
10 | Number | 3 men, 2 women | 1 man, 1 woman | 92.9 | 93.4 | p = .30
11 | Role | 1 female passenger, 1 male passenger | 1 woman, 1 man | 26.5 | 26.6 | p = .90
12 | Role, Number | 1 woman, 1 man | 1 male passenger | 63.8 | 65.8 | p = .15
13 | Species, Role, Norm | 1 jaywalking man | 1 passenger pet | 60.2 | 61.3 | p = .47
16 | Gender, Role | 1 woman | 2 women passengers | 28.1 | 30.1 | p = .23
17 | Age, Role | 1 old woman, 1 old man | 1 female passenger, 1 male passenger | 41.6 | 41.8 | p = .92
Overall, we did not observe a significant effect of perspective-taking. However, our results are not at odds with the previous literature, which focused on the passenger vs. pedestrian perspective. Here, we were more interested in perspectives that were directly relevant to the public debate. The fact that participants in the "no instruction" condition did not significantly differ from the "citizen" or "policymaker" conditions suggests that the perspective participants naturally endorse does not impact their answers in a way that makes them irrelevant to public deliberation.
Presenting participants a second time with the AV dilemmas and forcing them to take time to reflect on their answers did not substantially alter their responses either. This absence of effect might seem at odds with previous results (Capraro et al., 2019; Suter & Hertwig, 2011). However, it should be noted that these studies were not concerned with AV dilemmas (but with dilemmas involving human agents) and that they used a different method: they compared participants who were exposed a single time to dilemmas (and manipulated the time of this first exposure). The only study that focused on AV dilemmas was the one by Frank and colleagues (2019), but this study contrasted very short response times (<5s) with more "usual" ones (<30s). Here, since we were interested in contrasting participants' "normal" answers to MME-style experiments with longer public deliberation that typically involves several exposures, we asked participants to think longer than usual, whereas Frank and colleagues asked them to think shorter than usual. Our results are thus compatible. Overall, our results suggest that giving participants an occasion to reflect more on their answers by presenting them a second time with a given AV dilemma did not make a substantial difference, and thus that their answers to the first presentation are already robust.
Study 2: Participants' Judgments after Collective Discussions on the Moral Relevance of Different Factors
In Study 2, our goal was to study participants' judgments about the ethics of AVs in a context closer to a public deliberation about the principles of AV ethics. That is, participants were asked to (i) have a collective discussion (DISC) about (ii) general abstract principles (AvC). Because our means were limited, we did not systematically manipulate these factors but rather introduced them all at once, to see whether the conclusions our study would yield would differ from the ones yielded by Study 1 and MME-style experiments at large.
Materials and Methods
At the beginning of the study, participants were presented with the same 16 + 1 scenarios as in Study 1 and asked to answer them "as quickly as possible." This was done to acquaint participants with the kinds of dilemmas that are considered relevant for ethical debates about the ethics of AVs.
After that, participants were immediately invited to join a video call to engage in a collective online discussion with seven to fifteen other participants (discussions lasted around 15 minutes). Participants were asked to discuss the relevance of nine criteria: gender (male vs. female), age (young vs. adult vs. aged), body size (fat vs. fit), social status (executive vs. homeless, doctor vs. criminal), nature (humans vs. pets), number (1 vs. 2 vs. 3 vs. 5), role in the dilemma (pedestrians vs. AV passengers), and lawfulness (jaywalkers vs. lawful pedestrians) (see Box 1). For each question, respondents were asked whether they thought the criterion was "morally relevant to make life arbitrations in such situations" and, if so, which category should be sacrificed to spare the other (e.g., sacrifice men to save women). After the collective discussion, each participant was asked to indicate, for each criterion, whether the criterion was morally relevant or whether it was morally irrelevant and AVs should be programmed to choose at random, without taking this criterion into account (see Box S1 in Supplementary Materials for the exact wording). Half of the participants were also asked how confident they were about their reply (0 = Not confident at all, 1 = Not much confident, 2 = Quite confident, 3 = Very confident), and how much they understood that someone might feel differently about this criterion (0 = Not at all, 1 = Not much, 2 = Quite, 3 = Very much).
Box 1. Instructions for Collective Discussion (Study 2)
"Some of you answered that, everything else being equal, [X] should be saved over [Y], and others replied the opposite. Please use the next 90 seconds to discuss whether you think [Z] is a morally relevant criterion to make such arbitrations, or not. And if so, should [X] or [Y] be spared, and why?"

1. X = "men", Y = "women", Z = "gender"
2. X = "younger people", Y = "older people", Z = "age"
3. X = "fit", Y = "larger", Z = "body size"
4. X = "people with higher social status", Y = "people with lower social status", Z = "social status"
5. Some of you answered that humans should always be saved over pets, while others replied that it may depend on the situation. Do you think some circumstances may allow for exceptions, or not?
6. Some of you answered that, everything else being equal, lawful drivers and pedestrians should be saved over jaywalkers, and others replied that it should not make a difference. Do you think abidance by the law is a morally relevant criterion to make such arbitrations? And if so, would you allow for exceptions to this?
7. X = "pedestrians", Y = "AV passengers", Z = "role in the dilemma"
8. Some of you answered that, everything else being equal, the AV's action of swerving versus keeping straight is morally relevant, while others replied that it does not matter. Would you change any of your replies if reaching the same outcome implied the AV swerving instead of keeping straight, and why?
9. Some of you answered that the AV should always be operated in a way that saves the greater number of people, while others disagreed, arguing that it depends on the situation, which should be assessed based on the criteria previously discussed. What do you think about it?
Results
Participants were US and UK residents recruited through Prolific Academic. After excluding 10 participants who failed the attention check, we were left with N = 190 participants (96 women, 94 men; M_age = 31.4, SD_age = 10.8).
Participants' judgments about the relevance of our nine criteria are presented in Table 3.
Discussion
Our results suggest that there was a strong consensus among our participants on the relevance of two criteria: number of persons saved (saving the most people) and species (saving humans rather than pets).
Two thirds of participants also considered age to be a relevant criterion. This is in line with the results of the MME, which suggested that these two criteria had the most weight. However, there was also a strong consensus on certain criteria being morally irrelevant: gender, body size, and social status. For these criteria (which are the ones Etienne, 2021, identified as the least morally relevant), most participants considered it best to leave the AV's decision to chance.
However, training AVs to make decisions on the data collected by the MME would lead AVs to take such criteria into account, leading them to go against the perceived consensus. This is because the methodology of the MME only allows participants to signal indifference between two outcomes, and not to express their commitment to impartiality and their preference for random choices. Thus, an AV trained on the kind of data we collected would behave in a substantially different way from an AV trained on the data collected by the MME. This means that design choices can substantially influence the outcome of data-driven, automated decision-making.
One question raised by our study is: what drives people's judgment that criteria such as gender, body size, and social status are irrelevant? Is it only the fact of offering participants a third option that allows them to express their preference for random choice? Or did the fact that we presented participants with abstract principles (rather than concrete cases), and that we offered them the possibility to discuss with each other, play a role? In Study 3, we investigated the impact of introducing a third option (random choice) in concrete cases, rather than in abstract ones. Moreover, we collected participants' answers before and after collective discussions, to determine to what extent participating in a collective discussion led participants to favor this third option.
Study 3: The Impact of Collective Discussion on Participants' Preference for Random Choice
In Study 3, we still offered participants a third option (random choice) but presented this option in the
context of concrete scenarios rather than abstract principles. We then had participants read arguments
against the relevance of several criteria and engage in a collective discussion with other participants.
Materials and Methods
Participants were US and UK residents recruited through Prolific Academic. We asked 331 participants, of whom 324 passed the attention check, to address the same set of 11 randomized dilemmas (+1 control question) three different times (Sets 1, 2, and 3).
Table 3. Percentage of Participants Who Rated Each Criterion as Morally Relevant, Along With Participants' Average Confidence in Their Answer, and Ratings of How Much They Understand That Someone Can Think Differently (Study 2).

Criterion | Relevant (%) | Confidence | Understanding
Gender | 24.2 | 2.57 (.60) | 2.03 (.85)
Age | 65.3 | 2.28 (.58) | 2.21 (.71)
Body size | 19.5 | 2.48 (.72) | 1.89 (.96)
Social status | 26.3 | 2.39 (.67) | 1.99 (.94)
Species | 86.8 | 2.66 (.58) | 1.73 (1.05)
Conformity to law | 59.5 | 2.27 (.75) | 2.02 (.87)
Role (pedestrian vs. passenger) | 65.2 | 2.20 (.73) | 2.06 (.74)
Going straight vs. swerving | 55.3 | 2.16 (.72) | 1.91 (.75)
Number | 92.6 | 2.54 (.54) | 1.52 (.93)
Between Set 1 and Set 2, participants were presented with seven objections to the main arguments that had been brought up in the group discussions of Study 2, based on Etienne's (2022) counterarguments. For example, the objection against the relevance of gender to AV decisions was:

You may think that gender is a morally relevant criterion here. If so, and to be consistent with your answer, you should be ready to either state that white people should be spared versus black people or the contrary, that Muslims should be spared versus Catholics or the contrary, that homosexuals should be spared versus heterosexuals or the contrary, or to explain what makes gender different from skin colour, religious belief and sexual orientation so that the former one is morally relevant here whereas the others are not.
(All seven objections can be found in the Supplementary Materials.) After each objection, participants were asked to rate the objection's strength on a 5-point scale. Between Set 2 and Set 3, they participated in a group discussion to express and justify their replies (as in Study 2).
Contrary to the abstract principles we used in Study 2, the concrete cases we used in Studies 1 and 2 present one disadvantage: when participants choose the option "Keep straight", we do not know whether this choice reveals a preference for saving the people in the other lane, or a mere preference for inaction. As we saw in Study 2, 44.7% of participants answered that they considered this a relevant criterion. To correct for this shortcoming, we used concrete cases in which participants had to choose between "turning left" and "turning right" (or choosing at random). An example is presented in Figure 2.
Figure 2. Example of a scenario in Study 3.
Finally, in Study 2, we saw that the third option (choosing randomly) was the one most often selected for certain criteria. However, one could object that participants might be drawn towards this answer because they feel it does not need to be justified (contrary to other answers). To test for this, a third of participants were asked to provide a justification for their answers to all three sets (JUST). Another third received no particular instruction (CONTROL), and the last third were asked to communicate a degree of confidence (DOC) for their replies ("how confident do you feel about your reply?") as well as a score of perceived consensus ("how much do you think that others would agree with you?") for each of the three sets.
Results
Frequency of "Random" Choice. The percentage of participants choosing the "random" option for each vignette and each presentation (Set 1, Set 2, or Set 3) can be found in Table 4. As can be seen, we found a pattern of answers similar to the one we observed in Study 2: participants tended to see species, number, role, and conformity to law (norms) as relevant factors, but rated gender, body size, and social status as irrelevant factors. The main difference was that participants tended to rate age as an irrelevant factor after discussion (Set 3), while they tended to rate it as relevant in Study 2. Overall, this suggests that the pattern of answers we observed in Study 2 (and that challenged the conclusions of the MME) cannot be explained only by the fact that we presented choices in an abstract way rather than in a concrete way, though we cannot exclude the possibility that presenting choices in an abstract or concrete way might affect participants' choices (for a direct comparison of the results of Studies 2 and 3, see Table S2 in Supplementary Materials).
Effect of Condition. We used 11 chi-square tests to investigate the impact of condition (CONTROL, DOC, and JUST) on the distribution of participants' answers to Set 1.
Table 4. Percentage of Participants Selecting the "Random" Option in Each Set (1-3) and Vignette. Asterisks indicate the result of a McNemar test comparing the percentage of participants selecting the "Random" option between Set 1 and Set 3.

Vignette | Criterion | Side 1 | Side 2 | Set 1 (%) | Set 2 (%) | Set 3
Question 1 | Gender | Man | Woman | 75.0 | 81.8 | 85.2%***
Question 2 | Age | Young girl | Old woman | 45.4 | 56.8 | 56.5%***
Question 3 | Body size | Obese woman | Athletic woman | 63.3 | 70.4 | 80.2%***
Question 4 | Status | 2 homeless men | 2 executive men | 74.7 | 79.6 | 85.5%***
Question 5 | Role | 1 pedestrian | 1 passenger | 30.6 | 33.6 | 38.3%**
Question 6 | Number | 2 young girls | 1 young girl | 21.3 | 31.2 | 25.0%
Question 7 | Role, Species | 1 passenger | 1 pet | 8.6 | 11.4 | 10.8%
Question 8 | Role, Norm | 1 jaywalking man | 1 passenger | 19.1 | 21.6 | 21.9%
Question 9 | Role, Number | 2 pedestrians | 1 passenger | 13.6 | 14.5 | 14.5%
Question 11 | Norm, Age | 1 jaywalking young girl | 1 woman | 37.3 | 36.4 | 43.8%*
Question 12 | Norm, Species, Role | 1 jaywalking man | 1 passenger pet | 8.6 | 10.2 | 8.3%
Out of 11 tests, only the one for vignette 3 (obese woman vs. athletic woman) came out significant. However, this was not because the percentage of "random" answers significantly varied across conditions (p = .15), but because participants asked to justify their answer were more likely to choose to kill the athletic woman (see Table S3 in Supplementary Materials). Thus, participants' tendency to choose the "random" option was robust and remained even when participants were asked to justify their answer.
Effect of Objection and Discussion. We compared participants' answers across the three sets (Set 1: initial answers; Set 2: after objections; Set 3: after discussion). We found that, for all 11 vignettes, variance in participants' answers was lower in Set 3 than in Set 1: t(10) = 5.31, p = .0003. This means that the procedure increased consensus across participants (see Table S4 in Supplementary Materials).
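We do not detail here how variance was computed for three-option answers; the sketch below shows one plausible reading, using the concentration index 1 - sum(p_k^2) per vignette (lower values mean more consensus) and a paired t-test across the 11 vignettes, with simulated shares standing in for the real ones.

```r
# Consensus comparison across vignettes: Set 3 vs. Set 1 (simulated data).
spread <- function(p) 1 - sum(p^2)   # p = shares of the three options

set.seed(3)
# 11 vignettes x 3 options; Set 3 distributions sharpened relative to Set 1
p1 <- t(apply(matrix(rgamma(33, 1), nrow = 11), 1, function(x) x / sum(x)))
p3 <- t(apply(p1^2, 1, function(x) x / sum(x)))  # squaring concentrates mass

v1 <- apply(p1, 1, spread)
v3 <- apply(p3, 1, spread)
t.test(v1, v3, paired = TRUE)  # 11 vignettes, hence df = 10 as reported
```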
As can be seen in Table 4, the procedure (objections + collective discussion) produced significant changes in participants' judgments. Overall, 30% of answers were modified at least once across the three sets (see Section 3.4 in Supplementary Materials). For the four most controversial criteria (age, gender, body size, and status), the procedure led more participants to endorse the third option, and thus to treat these criteria as irrelevant. However, for role and norm-compliance, it led more participants to consider these criteria as relevant, showing that the procedure did not always favor the "random" option.
Confidence and Consensus Perception. Interestingly, participants were more confident in their answers at the end of the procedure (Set 3) compared to their answers at the beginning (Set 1): t(111) = 11.03, p < .001. Their perception of consensus also significantly increased between Set 1 and Set 3: t(111) = 7.84, p < .001, though this increase was mostly due to the discussion, and not to their being faced with objections (see Table 5).
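A minimal sketch of this comparison in R, with simulated ratings (the reported t(111) implies 112 participants with usable confidence data):

```r
# Paired t-test on each participant's confidence in Set 1 vs. Set 3.
set.seed(1)
n <- 112
conf_set1 <- pmin(pmax(rnorm(n, mean = 3.5, sd = 1.0), 1), 5)  # 1-5 scale
conf_set3 <- pmin(pmax(conf_set1 + rnorm(n, mean = 0.7, sd = 0.7), 1), 5)
t.test(conf_set3, conf_set1, paired = TRUE)  # df = n - 1 = 111
```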
Discussion
In Study 2, we found a pattern of answers suggesting that criteria singled out as relevant in the MME were deemed mostly irrelevant once a third option was introduced. However, it was not possible to determine whether this was due only to the introduction of a third option, or whether it was mostly due to the fact of presenting choices in an abstract way and/or having participants engage in a collective discussion. In Study 3, we found a similar pattern of answers in a concrete setting, suggesting that this was not only due to the abstract presentation of Study 2 (though we cannot exclude that presentation style might have effects; see Table S2 in Supplementary Materials). Moreover, we found that engaging in a collective discussion tended to increase the choice of the "random" option for the criteria judged more irrelevant. Still, the pattern of answers we spotted in Study 2 was already visible before the collective discussion (in Set 1).
Participants' choice of the "random" option was not affected by their having to justify their answer or indicate their degree of confidence. Moreover, participating in the collective discussion raised participants' confidence in their answers. Overall, this suggests, against Awad and colleagues' (2020) suggestion, that participants' choice of the "random" option is not the result of a mere bias that could be overcome by more reflection.
Table 5. Participants' Degree of Confidence and Perceived Consensus for Each Set (Study 3).

 | Set 1 | Set 2 | Set 3
Confidence | 3.52 (.99) | 3.78 (1.01) | 4.22 (.69)
Consensus | 3.29 (.75) | 3.24 (.83) | 3.84 (.77)
Conclusion
In this paper, our goal was to show that attempts at automatizing ethical decision-making by aggregating participants' answers to moral dilemmas face a serious methodological difficulty: data collection is anything but a neutral process. Indeed, participants' replies can vary with experimental designs, so that the way experiments are framed can lead machine-learning algorithms to reach very different conclusions about what is the most "plausible" ethical answer to a dilemma.
More precisely, our goal was to explore whether collecting data using an experimental design that more faithfully mirrors the context of a public discussion about AVs might change the conclusions one could draw from such experiments. Indeed, most empirical studies on people's moral judgments about AVs focus on quick intuitions, generated in isolation by a single exposure to each case, and with a limited range of options. However, these conditions differ widely from the ones in which citizens engaging in a public debate would form their opinion about the ethics of AVs. As the results of these experiments are increasingly used to bear on social decisions, with the claim that they represent public opinion, this is problematic.
Thus, we investigated whether changing the conditions in which participants' judgments are generated, to more closely mirror the conditions of public deliberation, resulted in different conclusions regarding the "social consensus" about which factors should be relevant to AVs' behavior. For example, in Study 1, we had participants take more time to think about their answers by exposing them a second time to each vignette and forcing them to wait some time before answering, and by asking them to endorse different perspectives beyond the mere perspective of driver or passenger (such as policymaker or citizen). In this case, these changes did not make a difference.
However, in Studies 2 and 3, we showed that giving participants the possibility to mark certain criteria as "morally irrelevant" and to express their preference for AVs to make random choices led to a pattern of answers (and to a picture of social consensus) that differed from the one observed by previous studies offering only two options: we observed a strong consensus on the moral irrelevance of certain criteria such as gender, body size, and social status. For gender, our results paint a picture in which a minority of participants express a preference for saving women and a majority (around 75% in Study 2 and 75-85% in Study 3) expresses a preference for random choice. This is very different from a situation in which 60% prefer to save a woman over a man and 40% prefer to save a man over a woman, a situation in which there is no clear consensus. However, both situations will be treated in a similar way by algorithms trained on data that do not allow participants to express a preference for random choices: in both cases, such algorithms will compute a small preference, at the scale of the population, for saving women, while there actually seems to be a strong consensus for not taking gender into account and allowing AVs to make a random decision. They will thus go against the moral consensus by allowing AVs to favor women, even if only in a slight way.
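The arithmetic behind this point can be made explicit with a small worked example (the population shares below are hypothetical):

```r
# Suppose 75% of people want a random choice and split 50/50 when forced,
# 15% genuinely prefer saving women, and 10% genuinely prefer saving men.
share_women <- 0.75 * 0.5 + 0.15   # 0.525
share_men   <- 0.75 * 0.5 + 0.10   # 0.475
share_women - share_men            # 0.05: a small aggregate "preference"
# A two-option aggregator reads this as a mandate to favor women, even
# though only a quarter of the population holds any gendered preference.
```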
The dismissal of a third, "random" option in previous studies about AV dilemmas might be due to a tendency to understand moral deliberation on the model of rational decision theory. From the standpoint of rational decision theory, a choice at random between two options A and B can only express one thing: indifference between A and B. However, in ethical decision-making, choosing at random is not necessarily an expression of indifference; rather, it can express a strong endorsement of moral values such as impartiality, or the commitment to treat all human beings as having the same moral status, independently of their individual differences. Thus, rejecting the need for a third, random option under the pretext that the same information can be obtained from a two-option survey (because it will manifest itself as indifference at the level of the population) already commits oneself to a particular view of ethical decision-making, according to which ethical decision-making is similar to economic decisions.
Additionally, the assumption that a preference for random choices (and thus impartiality) will manifest itself as indifference in two-option surveys is unwarranted. When forced to choose between two options, participants who favor impartiality on moral grounds (and would select the "random" choice in a three-option survey) might fall back on non-moral preferences. After all, participants in the MME are simply asked what the AV "should do", without specifying that this "should" is a moral one (see Cova et al., 2019 on this particular methodological issue). Thus, the two-option design might force participants to rely on personal preferences that they would not themselves consider morally appropriate, thus increasing dissensus.
Moreover, decisions about AVs are not likely to be made in isolation: rather, as most people in this literature emphasize, such decisions need to be the outcome of a public discussion. Thus, in Study 3, we used a design that, in addition to providing participants with a random option, tried to imitate the context of a public discussion: participants were provided with simplified versions of arguments offered by ethicists against the relevance of certain criteria, and were asked to discuss their answers with each other. We found that this procedure led participants to be more confident in their individual answers, while reducing variance in participants' answers. Thus, the deliberative process participants were invited to engage in allowed them to reach more stable and confident replies that they may better relate to and, thus, feel more responsible for.
Crucially, this procedure also led participants to significantly change their minds about the relevance of certain criteria. For example, it led them to reach an even stronger consensus on the moral irrelevance of gender, body size, and social status. It also led to a significant difference in their assessment of the relevance of age: while more than half of participants endorsed age as a morally relevant criterion at the beginning of the study, less than half did so at the end. Together, these results show that the consensus people are likely to reach through a collective discussion cannot be reduced to the aggregation of individual answers made in isolation, no matter how many individual answers have been collected.
Overall, the results of our studies suggest that we should be wary of using the empirical results of studies such as the Moral Machine Experiment as a guide for ethical decision-making. On the one hand, the design of these studies rests on unquestioned assumptions about the nature of ethical decision-making, which might lead them to ignore a clear popular consensus on "impartial" options. On the other hand, the data are collected in a setting that cannot be considered equivalent to the setting of an informed, collective discussion in which people might come to reject as unwarranted and morally irrelevant the various biases identified by these studies. As we observed, introducing a third option severely challenges Awad and colleagues' conclusions, showing that surveys' design can either bring out dissensus that does not accurately capture people's opinions, or reveal consensus that better reflects them.
Finally, because surveys' designs influence participants' replies, our own studies also fall under this limitation. Forcing people to take more time before submitting their responses does not necessarily lead them to question those responses further, which could explain why we do not observe a significant effect of reflection time when other works do. The convergence effect of the collective discussion could also partly result from pressure to conform to the majority opinion, rather than from a genuine revision of one's opinion. Therefore, distributing participants across discussion groups based on their previous replies may either reinforce their opinions, if groups are formed to be like-minded, or encourage them to revise them, if groups are built to represent diversified opinions and participants are explicitly asked to defend their previous replies. Overall, these limitations only support our claim that surveys' designs influence the way participants form an opinion about a topic, and therefore the replies they provide. Further studies should investigate the opportunity to frame survey designs so as to support respondents' critical thinking and help them develop more robust and meaningful opinions. Such an effort is crucial nowadays, as surveys are increasingly used by a wide range of actors to represent a so-called public opinion; such biased representations of people's opinions influence their actual opinions (e.g., through pressure to conform), and are also used by decision-makers to justify societal choices.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Facebook Inc.
Supplemental Material
Supplemental material for this article is available online.
References
Allard, A., & Cova, F. (forthcoming). What experiments can teach us about justice and impartiality: Vindicating experimental political philosophy. In H. Viciana, F. Aguiar, & A. Gaitán (Eds.), Issues in experimental moral philosophy. Routledge.
Awad, E., Dsouza, S., Kim, R., Schulz, J., Henrich, J., Shariff, A., Bonnefon, J. F., & Rahwan, I. (2018). The moral machine experiment. Nature, 563(7729), 59–64. https://doi.org/10.1038/s41586-018-0637-6
Awad, E., Dsouza, S., Kim, R., Schulz, J., Henrich, J., Shariff, A., Bonnefon, J. F., & Rahwan, I. (2020). Reply to: Life and death decisions of autonomous vehicles. Nature, 579(7797), E3–E5. https://doi.org/10.1038/s41586-020-1988-3
Balliet, D. (2010). Communication and cooperation in social dilemmas: A meta-analytic review. Journal of Conflict Resolution, 54(1), 39–57. https://doi.org/10.1177/0022002709352443
Bigman, Y. E., & Gray, K. (2020). Life and death decisions of autonomous vehicles. Nature, 579(7797), E1–E2. https://doi.org/10.1038/s41586-020-1987-4
Bonnefon, J. F. (2019). La voiture qui en savait trop. Humensciences.
Bonnefon, J. F., Shariff, A., & Rahwan, I. (2016). The social dilemma of autonomous vehicles. Science, 352(6293), 1573–1576. https://doi.org/10.1126/science.aaf2654
Bourdieu, P. (1972). L'opinion publique n'existe pas: Quelques remarques critiques sur les sondages d'opinion. Temps Modernes, 318 (1973).
Capraro, V., Everett, J. A., & Earp, B. D. (2019). Priming intuition disfavors instrumental harm but not impartial beneficence. Journal of Experimental Social Psychology, 83(1), 142–149. https://doi.org/10.1016/j.jesp.2019.04.006
Conitzer, V., Sinnott-Armstrong, W., Borg, J. S., Deng, Y., & Kramer, M. (2017). Moral decision making frameworks for artificial intelligence. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (pp. 4831–4835). AAAI.
Cova, F., Boudesseul, J., & Lantian, A. (2019). "Sounds fine, but no thanks!": On distinguishing judgments about action and acceptability in attitudes toward cognitive enhancement. AJOB Neuroscience, 10(1), 57–59. https://doi.org/10.1080/21507740.2019.1595777
Cova, F., Strickland, B., Abatista, A. G. F., Allard, A., Andow, J., Attie, M., Beebe, J. R., Berniūnas, R., Boudesseul, J., Colombo, M., Cushman, F., Díaz, R., van Dongen, N., Dranseika, V., Earp, B. D., Torres, A. G., Hannikainen, I. R., Hernández-Conde, J. V., Hu, W., & Zhou, X. (2021). Estimating the reproducibility of experimental philosophy. Review of Philosophy and Psychology, 12(1), 9–44. https://doi.org/10.1007/s13164-018-0400-9
De Freitas, J., & Cikara, M. (2021). Deliberately prejudiced self-driving vehicles elicit the most outrage. Cognition, 208, 104555. https://doi.org/10.1016/j.cognition.2020.104555
Desrosières, A. (2008). Pour une sociologie historique de la quantification: L'argument statistique I. Paris: Presses de l'École des Mines.
Etienne, H. (2021). The dark side of the "Moral Machine" and the fallacy of computational ethical decision-making for autonomous vehicles. Law, Innovation and Technology, 13(1), 85–107. https://doi.org/10.1080/17579961.2021.1898310
Etienne, H. (2022). A practical role-based approach for autonomous vehicles' moral dilemmas. Big Data & Society, 9(2). https://doi.org/10.1177/20539517221123305
Frank, D. A., Chrysochou, P., Mitkidis, P., & Ariely, D. (2019). Human decision-making biases in the moral dilemmas of autonomous vehicles. Scientific Reports, 9(1), 13080. https://doi.org/10.1038/s41598-019-49411-7
Freiman, C., & Nichols, S. (2011). Is desert in the details? Philosophy and Phenomenological Research, 82(1), 121–133. https://doi.org/10.1111/j.1933-1592.2010.00387.x
Greene, J., Rossi, F., Tasioulas, J., Venable, K. B., & Williams, B. (2016). Embedding ethical principles in collective decision support systems. In Proceedings of the 30th AAAI Conference on Artificial Intelligence (pp. 4147–4151). AAAI.
Greene, J. D. (2014). Beyond point-and-shoot morality: Why cognitive (neuro)science matters for ethics. Ethics, 124(4), 695–726. https://doi.org/10.1086/675875
Mercier, H. (2011). What good is moral reasoning? Mind & Society, 10(2), 131–148. https://doi.org/10.1007/s11299-011-0085-6
Nadelhoffer, T., & Feltz, A. (2008). The actor-observer bias and moral intuitions: Adding fuel to Sinnott-Armstrong's fire. Neuroethics, 1(2), 133–144. https://doi.org/10.1007/s12152-008-9015-7
Nichols, S., & Knobe, J. (2007). Moral responsibility and determinism: The cognitive science of folk intuitions. Noûs, 41(4), 663–685. https://doi.org/10.1111/j.1468-0068.2007.00666.x
Noothigattu, R., Gaikwad, S., Awad, E., Dsouza, S., Rahwan, I., Ravikumar, P., & Procaccia, A. (2018). A voting-based system for ethical decision making. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1), 1587–1594. https://doi.org/10.1609/aaai.v32i1.11512
Sinnott-Armstrong, W. (2008). Abstract + concrete = paradox. In J. Knobe & S. Nichols (Eds.), Experimental philosophy (pp. 209–230). Oxford University Press.
Struchiner, N., Almeida, G. F. C. F., & Hannikainen, I. R. (2020). Legal decision-making and the abstract/concrete paradox. Cognition, 205, 104421. https://doi.org/10.1016/j.cognition.2020.104421
Suter, R. S., & Hertwig, R. (2011). Time and moral judgment. Cognition, 119(3), 454–458. https://doi.org/10.1016/j.cognition.2011.01.018
Thaler, R., & Sunstein, C. (2008). Nudge: Improving decisions about health, wealth and happiness. Yale University Press.
Tobia, K., Buckwalter, W., & Stich, S. (2013). Moral intuitions: Are philosophers experts? Philosophical Psychology, 26(5), 629–638. https://doi.org/10.1080/09515089.2012.696327
Wiegmann, A., Horvath, J., & Meyer, K. (2020). Intuitive expertise and irrelevant options. In T. Lombrozo, J. Knobe, & S. Nichols (Eds.), Oxford studies in experimental philosophy (Vol. 3). Oxford University Press.
Author Biographies
Hubert Etienne is a researcher in AI ethics in Meta's Responsible AI team in New York.
Florian Cova is a postdoctoral researcher at the Centre Interfacultaire en Sciences Affectives,
University of Geneva.
Supplementary Materials
All experiments were conducted between October 2021 and March 2022 and hosted on Qualtrics.
Respondents were recruited on Prolific Academic, and the collected data were analysed in RStudio.
1. Supplementary Materials for Study 1
1.1. Effect of second exposure's duration
We looked at the effect of time constraint (10s vs. 20s vs. 30s vs. 40s) on participants' answers the second time they were asked each question. We computed the number of times each participant changed their mind between Set 1 and Set 2 and compared these counts across conditions using an ANOVA. We found no significant effect of time constraint on the number of times people changed their mind: F(1, 606) = 0.136, p = 0.712. This suggests that being forced to think longer did not make participants more likely to change their minds. Results are presented in Table S1.
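This analysis can be sketched in R as follows (the data frame d, the factor condition and the column names set1_q*/set2_q* are hypothetical names of ours, not the ones used in the actual analysis scripts):

    # Count, for each participant, how many answers changed between Set 1 and Set 2
    # (case 5 is excluded, as it was an attention check).
    qs <- setdiff(1:17, 5)
    d$n_changes <- rowSums(d[paste0("set1_q", qs)] != d[paste0("set2_q", qs)])

    # One-way ANOVA of the number of changes on the time-constraint condition.
    summary(aov(n_changes ~ condition, data = d))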
Case | 10s                    | 20s                    | 30s                     | 40s                    | Chi-square
1    | ->1: 3.1%, ->2: 4.7%   | ->1: 4.6%, ->2: 6.7%   | ->1: 6.4%, ->2: 8.3%    | ->1: 8.3%, ->2: 3.2%   | p = .20
2    | ->1: 2.6%, ->2: 5.8%   | ->1: 6.0%, ->2: 6.6%   | ->1: 6.4%, ->2: 10.1%   | ->1: 3.8%, ->2: 7.6%   | p = .45
3    | ->1: 5.8%, ->2: 1.6%   | ->1: 7.3%, ->2: 2.6%   | ->1: 5.5%, ->2: 2.8%    | ->1: 5.1%, ->2: 2.5%   | p = .96
4    | ->1: 2.6%, ->2: 3.1%   | ->1: 3.3%, ->2: 0.7%   | ->1: 3.7%, ->2: 0.0%    | ->1: 2.5%, ->2: 1.3%   | p = .38
6    | ->1: 11.5%, ->2: 9.4%  | ->1: 7.3%, ->2: 6.6%   | ->1: 7.3%, ->2: 7.3%    | ->1: 10.2%, ->2: 7.6%  | p = .71
7    | ->1: 1.6%, ->2: 3.1%   | ->1: 2.0%, ->2: 2.6%   | ->1: 0.9%, ->2: 1.8%    | ->1: 3.2%, ->2: 3.2%   | p = .86
8    | ->1: 2.6%, ->2: 4.2%   | ->1: 2.6%, ->2: 2.6%   | ->1: 3.7%, ->2: 4.6%    | ->1: 2.5%, ->2: 4.5%   | p = .97
9    | ->1: 7.3%, ->2: 7.3%   | ->1: 6.6%, ->2: 9.9%   | ->1: 3.7%, ->2: 9.2%    | ->1: 3.8%, ->2: 7.6%   | p = .67
10   | ->1: 1.6%, ->2: 2.1%   | ->1: 3.3%, ->2: 2.0%   | ->1: 0.9%, ->2: 2.8%    | ->1: 0.0%, ->2: 2.5%   | p = .39
11   | ->1: 5.8%, ->2: 5.2%   | ->1: 6.6%, ->2: 7.3%   | ->1: 4.6%, ->2: 2.3%    | ->1: 7.6%, ->2: 9.6%   | p = .31
12   | ->1: 4.2%, ->2: 5.8%   | ->1: 5.3%, ->2: 5.3%   | ->1: 3.7%, ->2: 9.2%    | ->1: 5.7%, ->2: 7.7%   | p = .83
13   | ->1: 7.3%, ->2: 9.4%   | ->1: 6.6%, ->2: 7.9%   | ->1: 7.3%, ->2: 10.1%   | ->1: 7.0%, ->2: 5.7%   | p = .89
14   | ->1: 7.9%, ->2: 10.5%  | ->1: 7.3%, ->2: 6.6%   | ->1: 13.8%, ->2: 11.9%  | ->1: 6.4%, ->2: 12.1%  | p = .19
15   | ->1: 6.8%, ->2: 8.4%   | ->1: 11.3%, ->2: 6.0%  | ->1: 9.2%, ->2: 9.2%    | ->1: 8.3%, ->2: 7.0%   | p = .78
16   | ->1: 5.8%, ->2: 9.4%   | ->1: 6.6%, ->2: 10.6%  | ->1: 8.3%, ->2: 7.3%    | ->1: 8.9%, ->2: 8.9%   | p = .89
17   | ->1: 8.4%, ->2: 8.9%   | ->1: 8.6%, ->2: 7.3%   | ->1: 8.3%, ->2: 9.2%    | ->1: 7.6%, ->2: 8.3%   | p = .99
N    | 191                    | 151                    | 109                     | 157                    | -

Table S1. Percentage of participants who changed their answers to Option 1 (->1) and to Option 2 (->2) between the first and second set of questions, for each condition (10s, 20s, 30s, 40s) and each case (Study 1). The rightmost column indicates the results of a chi-square test (df = 6) comparing the distribution of participants' behaviours (changed to 1, changed to 2, did not change) between conditions. Case 5 does not appear in the table, as it was an attention check.
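For a single case, the df = 6 tests reported in the rightmost column can be reproduced along the following lines (a sketch, with the same hypothetical names as above): cross-tabulating the three behaviours against the four conditions yields a 3 x 4 table, hence (3 - 1)(4 - 1) = 6 degrees of freedom.

    # Classify each participant's behaviour on case 1.
    d$behaviour <- ifelse(d$set1_q1 == d$set2_q1, "no change",
                          ifelse(d$set2_q1 == 1, "changed to 1", "changed to 2"))

    # 3 (behaviours) x 4 (conditions) contingency table and chi-square test.
    chisq.test(table(d$behaviour, d$condition))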
2. Supplementary Materials for Study 2
2.1. Wording for questions about general criteria
Box S1. Wording for question about general criteria (Study 2)
Considering the previous scenarios, which of the following claims do you agree most with?
I consider that [X] is a morally relevant criterion to make life arbitrations in such situations and would
(1) rather spare [Y] versus [Z] whenever it is possible;
(2) rather spare [Z] versus [Y] whenever it is possible.
(3) I do not consider that [X] is a morally relevant criterion to make life arbitrations in such situations and would allow the autonomous vehicle to select a random answer whenever possible.
Set 2-1: X = “gender,” Y= “a man,” Z = “a woman”
Set 2-2: X = “age,” Y= “the younger,” Z = “the elderly”
Set 2-3: X = “body size,” Y= “an athletic person,” Z = “a fat person”
Set 2-4: X = “social status,” Y= “an executive person,” Z = “a homeless person”
Set 2-5: X = “human/pets”
(1) would always spare humans versus pets whenever possible
(2) would not always spare humans over pets, considering that pets could be spared versus humans in some
situations.
(3) would allow the autonomous vehicle to select a random answer whenever it is possible
Set 2-6: X = “abidance by the law,” Y = “lawful drivers and pedestrians,” Z = “jaywalkers”
Set 2-7: X = “pedestrians/passengers,” Y = “pedestrians,” Z = “AV passengers”
Set 2-8: X = “going straight/swerving”
(1) would change some of my answers if reaching the desired outcomes implied allowing the car to swerve
instead of going straight.
(2) would not change my answers if reaching the desired outcomes implied allowing the car to swerve instead
of going straight.
Set 2-9: I consider that the amount of harm is the most important criterion to make life arbitrations in such
situations…
(1) and I would always choose the option that minimises the total amount of harm, regardless of other criteria.
(2) but I would not necessarily consider it as the most important one. I may choose the option that minimizes
the total amount of harm whenever possible but not systematically at the expense of other criteria.
(3) I do not consider that the amount of harm is necessarily morally relevant to make life arbitrations in such
situations and would allow the autonomous vehicle to select a random answer whenever possible
2.2. Replies Set 1 vs Set 2
The following tables show how participants answered the second set of questions (abstract principles) as a function of their answer to the first set of questions (concrete cases). In each table, columns correspond to the Set 2 answer and the two rows to the Set 1 answer (Option 1, then Option 2).
Gender (Set 1.1 vs. Set 2.1)
         | Save men | Irrelevant | Save women
Option 1 | 3        | 122        | 39
Option 2 | 0        | 22         | 4

Body size (Set 1.2 vs. Set 2.3)
         | Save fit person | Irrelevant | Save fat person
Option 1 | 16              | 79         | 1
Option 2 | 18              | 75         | 1

Social status (Set 1.3 vs. Set 2.4)
         | Save homeless | Irrelevant | Save executives
Option 1 | 5             | 110        | 35
Option 2 | 3             | 30         | 7

Pedestrians vs. passengers (Set 1.9 vs. Set 2.7)
         | Save pedestrians | Irrelevant | Save passengers
Option 1 | 19               | 34         | 18
Option 2 | 76               | 32         | 11

Pedestrians vs. passengers (Set 1.11 vs. Set 2.7)
         | Save pedestrians | Irrelevant | Save passengers
Option 1 | 84               | 48         | 10
Option 2 | 11               | 18         | 19

Animals vs. humans (Set 1.13 vs. Set 2.5)
         | May save pet | Irrelevant | Always save humans
Option 1 | 12           | 14         | 43
Option 2 | 11           | 11         | 99

Minimising the number of victims (Set 1.8 vs. Set 2.9)
         | Most important | Quite important | Irrelevant
Option 1 | 41             | 21              | 7
Option 2 | 91             | 23              | 7

Minimising the number of victims (Set 1.10 vs. Set 2.9)
         | Most important | Quite important | Irrelevant
Option 1 | 6              | 5               | 6
Option 2 | 126            | 39              | 8
3. Supplementary Materials for Study 3
3.1. Wording for objections
Box S2. Wording for objections (Study 3)
OBJ-1: Men vs women
“You may think that gender is a morally relevant criterion here. If so, and to be consistent with your answer, you
should be ready to either state that white people should be spared versus black people or the contrary, that Muslims
should be spared versus Catholics or the contrary, that homosexuals should be spared versus heterosexuals or the
contrary, or to explain what makes gender different from skin colour, religious belief and sexual orientation so that the
former one is morally relevant here whereas the others are not.”
OBJ-2: Athletic vs fat people
“You may think that body size is a morally relevant criterion here. If so, what else could a society that allows arbitrations potentially involving people’s deaths based on beauty or body image also end up allowing? What if it is decided that beauty is represented by blond-haired, blue-eyed people?”
OBJ-3: Homeless people vs executives
“You may think that social status is a morally relevant criterion here. If so, who should be in charge of defining the social status scale and deciding which activity is socially valuable and which one is not? What else could a society that allows arbitrations potentially involving people’s deaths based on social status do next? What if we had a social credit score ranking citizens from the most useful to the least useful?”
OBJ-4: Younger vs elder people
“You may think that age is a morally relevant criterion here. If so, you may think so because of the following
argument: ‘young people should be spared because they had less time to enjoy life and more to lose in terms of
expected lifetime’. However:
- there is a great uncertainty surrounding expected lifetime, as a young boy can die tomorrow from a disease and
a 70-year-old grandmother can live another 20 years. Furthermore, on average, women tend to live longer than
men in many countries; would you then agree to systematically spare them over men for such a reason?
- it is not possible to measure and compare each individual’s value of life together with their capacity to enjoy it,
as it is far too subjective. Would you systematically sacrifice someone with an expected extra 20 years of pure
bliss to allow someone else to suffer another 30 years of a hard life full of pain and humiliation?
- to be consistent with your claim to prioritize people with the higher remaining expected lifetime, you would also have to accept sacrificing people suffering from severe incurable diseases associated with a very short life expectancy, such as Huntington’s disease or progeria.
Finally, do you think that self-driving cars could recognise pedestrians’ gender, age, body size or social status in
practice? While these criteria might be morally relevant, they could also be impossible to implement in practice.”
OBJ-5: Passengers vs pedestrians
“You may think that passengers should be spared versus pedestrians. If so, why would they have a higher right not to
be endangered than pedestrians crossing legally, while the issue comes from the vehicle’s brakes which are not
working, resulting in the vehicle itself being the origin of the harm here?”
OBJ-6: More vs fewer people
“You may think that the vehicle should be operated in such a way as to hit the lowest number of people.
If so, is your objective to reduce the total number of deaths or the total amount of harm? In other words, would you
accept 10 people ending up in wheelchairs to save one person’s life?
If you focus on reducing the number of deaths rather than the amount of harm, you may actually end up sparing elder people over youngsters, as the young tend to have greater chances of surviving. Is this consistent with your previous reply?
How would you calculate and compare the probabilities of different types of consequences? Put differently, should the vehicle run over 3 people with a 50% chance of breaking the first one’s legs, an 80% chance of killing the second and a 50% chance of plunging the third one into a coma, or 3 people with a 90% probability of making the first one quadriplegic, a 40% probability of killing the second and a 70% probability of making the third one blind?
Finally, would you agree to hit a person legally engaged in the pedestrian pathway to spare two jaywalkers who are aware that they are acting unlawfully and that this may be dangerous?”
OBJ-7: Humans vs pets
“You may think that the vehicle should be operated in such a way to always sacrifice pets in the car to spare humans,
even when they are jaywalking. Let us agree on the idea that a human life’s value is always greater than an animal’s
but look at the question from a different angle.
Legally, in Europe, pets are considered “property”, so that if one kills my pet, they can be charged with damaging my property. Let us now introduce Green Monkey, an American racehorse sold for 16 million dollars in 2006.
Do you think it would be fair for Green Monkey’s owner to sacrifice his 16-million-dollar asset, conveyed in his vehicle, to save the life of a jaywalker who intentionally broke the law, thus putting everyone at risk?”
3.2. Comparison of participants' choices for Studies 2 and 3
Does presenting choices in an abstract rather than a concrete way make any difference to participants' choices? To find out, we used a chi-square test to compare the percentage of participants who chose the 'random' option for each factor between the two studies. For Study 3, we used participants' answers to Questions 1 to 8 in the third set (after discussion). Results are presented in Table S2. As can be seen, we found a significant difference for three factors out of eight (Age, Norm compliance, and Number). In two cases (Age and Number), the concrete presentation raised the proportion of irrelevant/random answers, but in one case (Norm compliance), it lowered this proportion. Thus, there was no overall consistent pattern. Note that, due to methodological differences between Studies 2 and 3, these differences are not necessarily due to the abstract/concrete difference.
Factor        | Study 2 (Abstract) | Study 3 (Concrete) | Chi-square test
Gender        | 75.8%              | 85.2%              | p = .43
Age           | 34.7%              | 56.5%              | p = .005**
Body size     | 80.5%              | 80.2%              | p = .99
Social status | 73.7%              | 85.5%              | p = .31
Species       | 13.2%              | 10.8%              | p = .57
Norm          | 40.5%              | 21.9%              | p = .001**
Role          | 34.7%              | 38.3%              | p = .65
Number        | 7.4%               | 25.0%              | p < .001***

Table S2. Percentage of participants who chose the 'random' option for each factor in Study 2 (abstract presentation) and Study 3 (concrete presentation, Set 3). The rightmost column indicates the result of a chi-square test comparing the distribution of answers between the two studies.
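Each of these tests compares, for one factor, a 2 x 2 table of answers ('random' vs. any other option) by study. A sketch in R, with illustrative counts back-calculated from the Age percentages above (the true cell counts come from the data):

    # Illustrative counts only: 'random' vs. other answers for the Age factor.
    tab <- matrix(c(66, 124,   # Study 2: ~34.7% random
                    70,  54),  # Study 3: ~56.5% random
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("Study 2", "Study 3"), c("random", "other")))
    chisq.test(tab)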
3.3. Effect of condition on participants’ judgments to Set 1
To test the impact of condition (asking for justifications vs. asking for degree of confidence vs. a control condition), we performed a chi-square test comparing the distribution of all three answers (Side 1 / Side 2 / Random) between conditions for Set 1. Results are presented in Table S3.
Case | CONTROL        | DOC            | JUST           | Chi-square
1    | 20.3% v. 5.8%  | 19.6% v. 4.5%  | 18.3% v. 6.4%  | p = .97
2    | 6.8% v. 50.5%  | 8.0% v. 48.2%  | 10.1% v. 40.4% | p = .62
3    | 18.4% v. 13.6% | 13.4% v. 20.5% | 11.0% v. 33.0% | p = .01*
4    | 16.5% v. 7.8%  | 15.2% v. 8.0%  | 19.3% v. 9.2%  | p = .92
5    | 13.6% v. 54.4% | 14.3% v. 56.3% | 14.7% v. 55.0% | p = .99
6    | 5.8% v. 77.7%  | 3.6% v. 70.5%  | 3.7% v. 75.2%  | p = .49
7    | 81.6% v. 10.7% | 86.6% v. 6.3%  | 80.7% v. 8.3%  | p = .62
8    | 32.0% v. 48.5% | 36.6% v. 42.9% | 42.2% v. 40.4% | p = .62
9    | 12.6% v. 70.9% | 8.9% v. 75.0%  | 12.8% v. 78.9% | p = .31
11   | 51.5% v. 3.9%  | 62.5% v. 4.5%  | 58.7% v. 6.4%  | p = .39
12   | 12.6% v. 78.6% | 18.8% v. 74.1% | 20.2% v. 69.7% | p = .54

Table S3. Chi-square test assessing the impact of condition (CONTROL, DOC, JUST) on participants’ choice of answers to Set 1. In each cell, we present the percentage of participants who chose Side 1 vs. the percentage who chose Side 2. The rightmost column indicates the result of a chi-square test comparing the distribution of answers across all three conditions.
3.4. Effect of objections and discussion on participants’ answers
To analyze the effect of argumentation (Set 1 vs. Set 2) and of discussion (Set 2 vs. Set 3), we scored participants’ answers in the following way: -1 for “Side 1”, 0 for “Random” and 1 for “Side 2”. Standard deviations appear in parentheses in Table S4.
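A sketch of this coding and of one paired comparison in R (column names hypothetical):

    # Recode the three possible answers as -1 / 0 / 1.
    code <- function(x) c("Side 1" = -1, "Random" = 0, "Side 2" = 1)[x]

    # Paired t-test for one question, e.g. Question 3, Set 2 vs. Set 3.
    t.test(code(d$set2_q3), code(d$set3_q3), paired = TRUE)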
            | Set 1        | Set 2        | Set 3        | Set 1 vs. Set 2 | Set 2 vs. Set 3 | Set 1 vs. Set 3
Question 1  | -0.14 (0.48) | -0.13 (0.41) | -0.12 (0.37) | +.01            | +.01            | +.02
Question 2  | 0.38 (0.64)  | 0.30 (0.58)  | 0.34 (0.56)  | -.08*           | +.04            | -.04
Question 3  | 0.09 (0.60)  | 0.07 (0.54)  | -0.01 (0.44) | -.02            | -.08**          | -.09**
Question 4  | -0.09 (0.50) | -0.02 (0.45) | -0.03 (0.38) | +.06*           | -.00            | +.06*
Question 5  | 0.41 (0.73)  | 0.44 (0.69)  | 0.47 (0.63)  | +.03            | +.03            | +.06
Question 6  | 0.70 (0.54)  | 0.59 (0.58)  | 0.69 (0.53)  | -.11***         | +.10**          | -.01
Question 7  | -0.75 (0.60) | -0.72 (0.61) | -0.79 (0.51) | +.03            | -.07*           | -.05
Question 8  | 0.06 (0.90)  | -0.03 (0.89) | -0.01 (0.89) | -.09            | +.02            | -.07
Question 9  | 0.64 (0.68)  | 0.67 (0.64)  | 0.70 (0.61)  | +.03            | +.02            | +.06
Question 11 | -0.53 (0.59) | -0.56 (0.57) | -0.48 (0.57) | -.03            | +.08**          | +.05
Question 12 | 0.57 (0.77)  | 0.46 (0.83)  | 0.60 (0.74)  | -.11*           | +.15***         | +.03

Table S4. Means and standard deviations of participants’ answers to the vignettes (coded). The impact of objections and group discussion on respondents’ answers is shown in the three rightmost columns. Asterisks indicate the results of paired t-tests.
3.5. Participants’ change in answers
70% of replies did not change from Set 1 to Set 3. Among the 30% that did, 70% operated a one-way
shift (e.g., Side 2, Side 1, Side 1) and 23% reverted to their initial response (e.g., Side 2, Random,
Side 2). Of the replies that changed definitively, 53% did so after OBJ and 47% after DISC (see
Annex 2.5). Fig. 5 presents how OBJ and DISC impacted respondents’ confidence and perception of
consensus for each category. We aggregated replies according to their evolution through the
experiment: no change (e.g., Side 1, Side 1, Side 1), one-way shift DISC (e.g., Side 1, Side 1, Side 2),
one-way shift OBJ (e.g., Side 1, Side 2, Side 2), comeback (e.g., Side 1, Side 2, Side 1) and lost (e.g.,
Side 1, Side 2, Random).
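The classification of trajectories described above can be expressed as a small R helper (a sketch; column names hypothetical):

    # Classify one reply's trajectory across the three sets.
    classify <- function(s1, s2, s3) {
      if (s1 == s2 && s2 == s3) return("no change")
      if (s1 == s2)             return("one-way shift DISC")  # changed only after discussion
      if (s2 == s3)             return("one-way shift OBJ")   # changed only after objections
      if (s1 == s3)             return("comeback")            # changed, then reverted
      "lost"                                                  # changed twice, to a third answer
    }

    # Apply to one question across all participants.
    d$type_q1 <- mapply(classify, d$set1_q1, d$set2_q1, d$set3_q1)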
Figure S5. Effect of objections and discussion on respondents’ confidence and perceived consensus per response type.
Respondents who did not change their replies are also those who reported the highest degree of confidence at every single step, while those who changed their replies twice reported the lowest degree of confidence at every step. In addition, confidence and perception of consensus are impacted in similar ways by OBJ and DISC for all types of replies except one: the one-way shift OBJ. This type is associated with respondents who changed their minds after having reviewed the objections; they show the highest increase in confidence (+1.0pt) and are not affected by the decrease in the perception of consensus. Finally, whereas the one-way shift can readily be interpreted as coherent with the continuous increase in confidence, the comeback pattern seems more paradoxical. As we understand it, OBJ convinces several respondents to change their minds before DISC brings them back to their initial reply, once reassured of it by social confirmation. Thus, the comeback path illustrates the deliberative process participants engage in as they make increasingly informed and robust moral judgements.
Such changes could be seen as irrelevant at the macro level, either because some people come back to their initial reply or because the switches from one answer to another in a given scenario compensate for the switches in other scenarios. However, these changes express something meaningful for moral judgements which is not captured at the aggregate level: how individuals relate to their decisions and may feel responsible for them when having to justify themselves. More than helping a group converge towards consensus, the deliberative process proposed here, combining OBJ and DISC, is about building meaning. It supports participants in making decisions they can relate to, that is, more robust judgements they will stick to, feel confident in, feel responsible for, and be able to justify to others.
Figure S6. Distribution of types of changes in responses per scenario.
3.6. Participants’ ratings of objections’ strength

     | Grade (/5)  | Criterion
Arg1 | 2.44 (1.73) | Sex
Arg2 | 2.42 (1.67) | Body size
Arg3 | 2.62 (1.72) | Social status (homeless)
Arg4 | 2.99 (1.42) | Age
Arg5 | 2.97 (1.51) | Pedestrians / passengers
Arg6 | 2.71 (1.35) | Number
Arg7 | 2.22 (1.56) | Pets vs. humans

Table S7. Objections’ strength as graded by participants from 0 to 5 (standard deviations in parentheses).
Supplementary materials
All texts in black below were displayed to participants; texts in blue are comments we added to explain the procedure.
Experiment 1
The first experiment was composed of three sets of questions: Introduction set, Main set and Conclusion set.
All participants addressed them in the same order:
- Step 1: Introduction set
- Step 2: Main Set
- Step 3: Main Set
- Step 4: Conclusion set
In the introduction set, [Instruction*] was replaced by:
Imagine that you are the designer of the self-driving vehicle. The vehicle is ready to be commercialised and
the last thing you need to do before this is to set up its ethical settings by selecting options 1 or 2 for each
one of the following scenarios.” in condition A
Imagine that the autonomous driving industry has made giant progress and that self-driving cars are ready
to be commercialised. The government leads a national public consultation to determine which ethical
settings should be selected in case of unavoidable fatalities. As a citizen, you take part of this public survey,
selecting options 1 or 2 for each one of the following scenarios.” in condition B
Imagine that you are a policy-maker preparing the regulation for the self-driving industry. Autonomous
vehicles are ready to be commercialised and you need to set up their ethical settings by selecting options 1
or 2 for each one of the following scenarios.” in condition C
In the Main set, [Instruction**] was replaced by:
Please answer the following questions as quickly as possible, giving your own opinion.” for all participants
in step 2
“We will now ask you to answer another round of similar questions, but this time we want you to take more time to think about your answer. You will only be able to submit it after [10/20/30/40] seconds.” based on participants’ group in step 3
In both steps 2 and 3, and for each scenario, the question “what should the self-driving car do?” was preceded by “as the car designer” in condition A, “as part of the national consultation” in condition B, and “as a policy-maker” in condition C.
Introduction set
Welcome, and thank you for taking part in this survey!
We are two philosophers aiming to better understand how people make moral decisions. This research is important and we count on your seriousness to advance the state of the art!
This survey is composed of 4 parts, including a group discussion session, so please do not take a break before it; otherwise others will have to wait!
Please start by answering the following questions.
Consent for data collection
The data collected (demographic information such as age and gender, answers to questions) will be anonymized,
meaning that all personal data that would allow someone to identify you will be deleted within one week of
your participation. As a consequence, we will not be able to delete your data after this date, if ever you request
us to do so. Anonymized data will be stored on the computers of Hubert Etienne and Prof. Florian Cova,
protected by passwords. Their conservation is not limited in time. The use of these anonymized data might
include inclusion in future research, or sharing with other researchers.
Participants’ Prolific ID will be collected during this study and will appear in our datafile, as collecting them is
necessary to ensure that we do not pay people who did not in fact participate. However, this information will
be deleted as soon as participants are paid (within three days of participation). Data from participants who
leave the study before the end will be neither stored, nor used.
You are free to leave the study at any moment, but you will only be paid if you complete it until the end.
Information about research results: If you want to be informed of the results of our studies, please send a mail
to: hubert.etienne@sciencespo.fr, starting from August 31st 2021. Note that no information will be provided
about individual results, and that only general results will be communicated.
Research supervision: This research is supervised by Prof. Florian Cova, Swiss Center for Affective Sciences,
Geneva.
Contact person: For information about this research, please contact Hubert Etienne, Ecole Normale Supérieure,
Department of Philosophy, 45 rue d’Ulm, 75005, Paris (hubert.etienne@sciencespo.fr).
On the basis of the information you just received, and provided that your anonymity will be respected:
I1: Do you agree to voluntarily participate in the present study, and authorise us to use your answers for
teaching and scientific purposes, including the publication of our results in scientific journals and volumes?
Yes
No
I2: Please provide your Prolific ID:
[insert]
I3: What best describes you?
Male
Female
Other
I4: How old are you?
[insert]
3
Today, many actors of the automotive industry are working to develop fully autonomous vehicles (self-driving
cars), as vehicles capable of driving themselves, without the intervention of a human driver.
I5: If fully autonomous vehicles were deployed and commercialised tomorrow for a reasonable price, how
likely would you be ready to buy one?
Extremely likely
Somewhat likely
Neither likely nor unlikely
Somewhat unlikely
Extremely unlikely
I6-1: [if selected choices 1-2 for I5] Why so?
I would feel safer inside
I could save time and do other things than driving
I dislike driving or cannot drive
I6-2: [if selected choices 3-5 for I5] Why so?
I would not feel safe inside
I do not see the benefits
I enjoy driving
I dislike the idea of giving up my autonomy to a machine
Once commercialised, self-driving cars can reasonably be expected to be much safer than human drivers. They may, however, still end up facing complex situations of unavoidable fatalities, where a decision must be taken to prioritise saving some people at the expense of others.
In this example, the self-driving car suddenly experiences a brake failure, which prevents it from stopping in time. Two options are possible:
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 2 homeless people and 1 woman.
Decision 2: Swerve and hit the pedestrians lawfully crossing on the right lane of the crossing. Consequences: death of 2 old women.
In another scenario, the choice may not oppose different groups of pedestrians, but pedestrians and the passengers of the self-driving car. Here is an example:
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 2 homeless people and 1 woman.
Decision 2: Swerve and crash into a concrete barrier. Consequences: death of 2 women and 1 man.
I7: Is it the first time you see this kind of scenarios with autonomous vehicles?
Yes, I have never seen these before.
No, I have already seen these before, but never taken the test.
No, I have already seen these before and taken the test.
[Instruction*]
In all scenarios, the self-driving car is facing a brake failure preventing it from stopping, and can only do 2 things:
either keep straight, or swerve on the other lane.
It is assumed that all people hit by the car die (either on the left or the right side of the crossing). If the car crashes into a concrete barrier, all its passengers also die.
Main set
[Instruction**]
MS1: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 2 men.
Decision 2: Swerve and hit the pedestrians lawfully crossing on the right lane of the crossing. Consequences: death of 2 women.
Option 1 / Option 2

MS2: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 1 athletic man and 1 athletic woman.
Decision 2: Swerve and hit the pedestrians lawfully crossing on the right lane of the crossing. Consequences: death of 1 fat man and 1 fat woman.
Option 1 / Option 2

MS3: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 1 homeless man and 1 homeless woman.
Decision 2: Swerve and hit the pedestrians lawfully crossing on the right lane of the crossing. Consequences: death of 1 executive man and 1 executive woman.
Option 1 / Option 2

MS4: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 1 criminal man and 1 criminal woman.
Decision 2: Swerve and hit the pedestrians lawfully crossing on the right lane of the crossing. Consequences: death of 1 male doctor and 1 female doctor.
Option 1 / Option 2

MS5: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 2 homeless people and 1 woman.
Decision 2: Swerve and crash into a concrete barrier. Consequences: death of 1 male doctor and 1 female doctor.
Click here / Do not click here

MS6: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 1 man and 1 old woman.
Decision 2: Swerve and hit the pedestrians lawfully crossing on the right lane of the crossing. Consequences: death of 1 woman and 1 old man.
Option 1 / Option 2

MS7: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 1 old woman and 1 man.
Decision 2: Swerve and hit the pedestrians lawfully crossing on the right lane of the crossing. Consequences: death of 1 woman and 1 young boy.
Option 1 / Option 2

MS8: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 2 women.
Decision 2: Swerve and hit the pedestrians lawfully crossing on the right lane of the crossing. Consequences: death of 1 woman.
Option 1 / Option 2

MS9: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 1 woman (pedestrian) and 1 man (pedestrian).
Decision 2: Swerve and crash into a concrete barrier. Consequences: death of 1 woman (passenger) and 1 man (passenger).
Option 1 / Option 2

MS10: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 3 men and 2 women.
Decision 2: Swerve and hit the pedestrians lawfully crossing on the right lane of the crossing. Consequences: death of 1 man and 1 woman.
Option 1 / Option 2

MS11: What should the self-driving car do?
Decision 1: Continue ahead and crash into a concrete barrier. Consequences: death of 1 woman (passenger) and 1 man (passenger).
Decision 2: Swerve and hit the pedestrians lawfully crossing on the right lane of the crossing. Consequences: death of 1 woman (pedestrian) and 1 man (pedestrian).
Option 1 / Option 2

MS12: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 1 woman (pedestrian) and 1 man (pedestrian).
Decision 2: Swerve and crash into a concrete barrier. Consequences: death of 1 man (passenger).
Option 1 / Option 2

MS13: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians crossing ahead. Consequences: death of 1 man (jaywalking).
Decision 2: Swerve and crash into a concrete barrier. Consequences: death of 1 pet (passenger).
Option 1 / Option 2

MS14: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians crossing ahead. Consequences: death of 3 men (jaywalking).
Decision 2: Swerve and crash into a concrete barrier. Consequences: death of 1 man (passenger).
Option 1 / Option 2

MS15: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians crossing ahead. Consequences: death of 3 men (jaywalking).
Decision 2: Swerve and hit the pedestrians lawfully crossing on the right lane of the crossing. Consequences: death of 1 woman.
Option 1 / Option 2

MS16: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 1 woman.
Decision 2: Swerve and crash into a concrete barrier. Consequences: death of 2 women (passengers).
Option 1 / Option 2

MS17: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 1 old woman and 1 old man.
Decision 2: Swerve and crash into a concrete barrier. Consequences: death of 1 woman (passenger) and 1 man (passenger).
Option 1 / Option 2
Conclusion set
C1: If fully autonomous vehicles were deployed and commercialised tomorrow for a reasonable price, how
likely would you be ready to buy one?
Extremely likely
Somewhat likely
Neither likely nor unlikely
Somewhat unlikely
Extremely unlikely
C2: [if selected choices 1-2 for C1] Why so?
I would feel safer inside
I could save time and do other things than driving
I dislike driving or cannot drive
Other [please specify]
C3: [if selected choices 3-5 for C1] Why so?
I would not feel safe inside
I do not see the benefits
I enjoy driving
I dislike the idea of giving up my autonomy to a machine
Other [please specify]
Thanks for taking part in this experiment!
Here is your completion code to provide to Prolific: 2B2A7427.
Experiment 2
The second experiment was composed of five sets of questions, addressed by all participants in the same order:
- Step 1: Introduction set
- Step 2: Set 1
- Step 3: Discussion set
- Step 4: Set 2
- Step 5: Conclusion set
In the introduction set, [Instruction*] was replaced by:
Imagine that the autonomous driving industry has made giant progress and that self-driving cars are ready
to be commercialised. The government leads a national public consultation to determine which ethical
settings should be selected in case of unavoidable fatalities. As a citizen, you take part of this public survey,
selecting options 1 or 2 for each one of the following scenarios.” in condition B
Imagine that you are a policy-maker preparing the regulation for the self-driving industry. Autonomous
vehicles are ready to be commercialised and you need to set up their ethical settings by selecting options 1
or 2 for each one of the following scenarios.” in condition C
In step 2, for each scenario, the question “what should the self-driving car do?” was preceded by “as part of the national consultation” in condition B, and “as a policy-maker” in condition C.
Introduction set
Welcome, and thank you for taking part in this survey!
We are two philosophers aiming to better understand how people make moral decisions. This research is important and we count on your seriousness to advance the state of the art!
This survey is composed of 4 parts, including a group discussion session, so please do not take a break before it; otherwise others will have to wait!
Please start by answering the following questions.
Consent for data collection
The data collected (demographic information such as age and gender, answers to questions) will be anonymized,
meaning that all personal data that would allow someone to identify you will be deleted within one week of
your participation. As a consequence, we will not be able to delete your data after this date, if ever you request
us to do so. Anonymized data will be stored on the computers of Hubert Etienne and Prof. Florian Cova,
protected by passwords. Their conservation is not limited in time. The use of these anonymized data might
include inclusion in future research, or sharing with other researchers.
Participants’ Prolific ID will be collected during this study and will appear in our datafile, as collecting them is
necessary to ensure that we do not pay people who did not in fact participate. However, this information will
be deleted as soon as participants are paid (within three days of participation). Data from participants who
leave the study before the end will be neither stored, nor used.
You are free to leave the study at any moment, but you will only be paid if you complete it until the end.
Information about research results: If you want to be informed of the results of our studies, please send a mail
to: hubert.etienne@sciencespo.fr, starting from August 31st 2021. Note that no information will be provided
about individual results, and that only general results will be communicated.
Research supervision: This research is supervised by Prof. Florian Cova, Swiss Center for Affective Sciences,
Geneva.
Contact person: For information about this research, please contact Hubert Etienne, Ecole Normale Supérieure,
Department of Philosophy, 45 rue d’Ulm, 75005, Paris (hubert.etienne@sciencespo.fr).
On the basis of the information you just received, and provided that your anonymity will be respected:
I1: Do you agree to voluntarily participate in the present study, and authorise us to use your answers for
teaching and scientific purposes, including the publication of our results in scientific journals and volumes?
Yes
No
I2: Please provide your Prolific ID:
[insert]
I3: What best describes you?
Male
Female
Other
I4: How old are you?
[insert]
Today, many actors of the automotive industry are working to develop fully autonomous vehicles (self-driving
cars), as vehicles capable of driving themselves, without the intervention of a human driver.
I5: If fully autonomous vehicles were deployed and commercialised tomorrow for a reasonable price, how
likely would you be ready to buy one?
Extremely likely
Somewhat likely
Neither likely nor unlikely
Somewhat unlikely
Extremely unlikely
I6-1: [if selected choices 1-2 for I5] Why so?
I would feel safer inside
I could save time and do other things than driving
I dislike driving or cannot drive
I6-2: [if selected choices 3-5 for I5] Why so?
I would not feel safe inside
I do not see the benefits
I enjoy driving
I dislike the idea of giving up my autonomy to a machine
Once commercialised, self-driving cars can reasonably be expected to be much safer than human drivers. They may, however, still end up facing complex situations of unavoidable fatalities, where a decision must be taken to prioritise saving some people at the expense of others.
In this example, the self-driving car suddenly experiences a brake failure, which prevents it from stopping in time. Two options are possible:
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 2 homeless people and 1 woman.
Decision 2: Swerve and hit the pedestrians lawfully crossing on the right lane of the crossing. Consequences: death of 2 old women.
In another scenario, the choice may not oppose different groups of pedestrians, but pedestrians and the passengers of the self-driving car. Here is an example:
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 2 homeless people and 1 woman.
Decision 2: Swerve and crash into a concrete barrier. Consequences: death of 2 women and 1 man.
I7: Is it the first time you see this kind of scenarios with autonomous vehicles?
Yes, I have never seen these before.
No, I have already seen these before, but never taken the test.
No, I have already seen these before and taken the test.
[Instruction*]
In all scenarios, the self-driving car is facing a brake failure preventing it from stopping, and can only do 2 things:
either keep straight, or swerve on the other lane.
It is assumed that all people hit by the car die (either on the left or the right side of the crossing). If the car crashes into a concrete barrier, all its passengers also die.
Please answer the following questions as quickly as possible, giving your own opinion.
Set 1
MS1: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 2 men.
Decision 2: Swerve and hit the pedestrians lawfully crossing on the right lane of the crossing. Consequences: death of 2 women.
Option 1 / Option 2

MS2: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 1 athletic man and 1 athletic woman.
Decision 2: Swerve and hit the pedestrians lawfully crossing on the right lane of the crossing. Consequences: death of 1 fat man and 1 fat woman.
Option 1 / Option 2

MS3: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 1 homeless man and 1 homeless woman.
Decision 2: Swerve and hit the pedestrians lawfully crossing on the right lane of the crossing. Consequences: death of 1 executive man and 1 executive woman.
Option 1 / Option 2

MS4: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 1 criminal man and 1 criminal woman.
Decision 2: Swerve and hit the pedestrians lawfully crossing on the right lane of the crossing. Consequences: death of 1 male doctor and 1 female doctor.
Option 1 / Option 2

MS5: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 2 homeless people and 1 woman.
Decision 2: Swerve and crash into a concrete barrier. Consequences: death of 1 male doctor and 1 female doctor.
Click here / Do not click here

MS6: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 1 man and 1 old woman.
Decision 2: Swerve and hit the pedestrians lawfully crossing on the right lane of the crossing. Consequences: death of 1 woman and 1 old man.
Option 1 / Option 2

MS7: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 1 old woman and 1 man.
Decision 2: Swerve and hit the pedestrians lawfully crossing on the right lane of the crossing. Consequences: death of 1 woman and 1 young boy.
Option 1 / Option 2

MS8: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 2 women.
Decision 2: Swerve and hit the pedestrians lawfully crossing on the right lane of the crossing. Consequences: death of 1 woman.
Option 1 / Option 2

MS9: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 1 woman (pedestrian) and 1 man (pedestrian).
Decision 2: Swerve and crash into a concrete barrier. Consequences: death of 1 woman (passenger) and 1 man (passenger).
Option 1 / Option 2

MS10: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 3 men and 2 women.
Decision 2: Swerve and hit the pedestrians lawfully crossing on the right lane of the crossing. Consequences: death of 1 man and 1 woman.
Option 1 / Option 2

MS11: What should the self-driving car do?
Decision 1: Continue ahead and crash into a concrete barrier. Consequences: death of 1 woman (passenger) and 1 man (passenger).
Decision 2: Swerve and hit the pedestrians lawfully crossing on the right lane of the crossing. Consequences: death of 1 woman (pedestrian) and 1 man (pedestrian).
Option 1 / Option 2

MS12: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 1 woman (pedestrian) and 1 man (pedestrian).
Decision 2: Swerve and crash into a concrete barrier. Consequences: death of 1 man (passenger).
Option 1 / Option 2

MS13: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians crossing ahead. Consequences: death of 1 man (jaywalking).
Decision 2: Swerve and crash into a concrete barrier. Consequences: death of 1 pet (passenger).
Option 1 / Option 2

MS14: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians crossing ahead. Consequences: death of 3 men (jaywalking).
Decision 2: Swerve and crash into a concrete barrier. Consequences: death of 1 man (passenger).
Option 1 / Option 2

MS15: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians crossing ahead. Consequences: death of 3 men (jaywalking).
Decision 2: Swerve and hit the pedestrians lawfully crossing on the right lane of the crossing. Consequences: death of 1 woman.
Option 1 / Option 2

MS16: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 1 woman.
Decision 2: Swerve and crash into a concrete barrier. Consequences: death of 2 women (passengers).
Option 1 / Option 2

MS17: What should the self-driving car do?
Decision 1: Continue ahead and drive through the pedestrians lawfully crossing ahead. Consequences: death of 1 old woman and 1 old man.
Decision 2: Swerve and crash into a concrete barrier. Consequences: death of 1 woman (passenger) and 1 man (passenger).
Option 1 / Option 2
Discussion set
We would now like to have a short group discussion with you.
It should take around 15 minutes and you do not need to turn your video on.
IMPORTANT: open the link below in a new tab. You should not close this tab, otherwise you will not be able to
complete the survey.
Open this link in a new tab: [zoom link]
[participants join the zoom discussion]
Thanks for joining the call, let us wait one or two minutes to reach 10 attendees.
Thanks again for being part of this experiment. We are interested in understanding the reasons behind the
replies you gave to the previous questions and will now let you speak freely to explain them. This discussion is
not being recorded. Please answer as honestly as possible and do not hesitate to challenge other people’s
arguments, if you disagree, as there is no wrong answer.
We will let you speak freely for 90 seconds on each of the 9 following questions.
D1: Some of you answered that everything else being equal, men should be saved over women, and others
replied the opposite. Please use the next 90 seconds to discuss whether you think sex is a morally relevant
criterion to make such arbitrations, or not. And if so, should men or women be spared and why?
D2: Some of you answered that everything else being equal, younger people should be saved over older
people, and others replied the opposite. Do you think age is a morally relevant criterion to make such
arbitrations, or not? And if so, should younger or elder people be spared and why?
D3: Some of you answered that everything else being equal, fit people should be saved over larger people,
and others replied the opposite. Do you think body size is a morally relevant criterion to make such
arbitrations, or not? And if so, should fitter or larger people be spared and why?
D4: Some of you answered that everything else being equal, people with higher social status should be saved
over those with lower social status, and others replied the opposite. Do you think social status is a morally
relevant criterion to make such arbitrations, or not? And if so, should people with higher or lower social status
be spared and why?
D5: Some of you answered that humans should always be saved over pets, while others replied that it may depend on the situation. Do you think some circumstances may allow for exceptions or not?
D6: Some of you answered that everything else being equal, lawful drivers and pedestrians should be saved
over jaywalkers, and others replied it should not make a difference. Do you think abidance by the law is a
morally relevant criterion to make such arbitrations, or not? And if so, would this allow for exceptions or not?
D7: Some of you answered that everything else being equal, pedestrians should be saved over AV passengers,
and others replied the opposite. Do you think this distinction is a morally relevant criterion to make such
arbitrations, or not? And if so, should passengers or pedestrians be spared and why?
D8: Some of you answered that everything else being equal, the action of swerving the AV versus keeping
straight is morally relevant, while others replied that it does not matter. Would you change any of your replies
if reaching the same outcome implied swerving the AV instead of keeping straight and why?
D9: Some of you answered that the AV should always be operated in a way to save the greater number of
people, while others disagree, arguing that it depends on the situation, which should be assessed based on the
criteria previously discussed. What do you think about it?
Thanks a lot for your time. You can now go back to the survey and enter the discussion code 55 to be
redirected to the few remaining questions.
Set 2
SS1: Considering the previous scenarios, which of the following claims do you agree most with?
I consider that sex is a morally relevant criterion to make life arbitrations in such situations and would
rather spare a woman versus a man whenever it is possible
rather spare a man versus a woman whenever it is possible
I do not consider that sex is a morally relevant criterion to make life arbitrations in such situations and would
allow the autonomous vehicle to select a random answer whenever it is possible
How confident do you feel about your reply to the previous question?
Very confident
Quite confident
Not much confident
Not confident at all
How much do you understand that people may think differently about this criterion?
Very much
Quite
Not much
Not at all
SS2: Considering the previous scenarios, which of the following claims do you agree most with?
I consider that age is a morally relevant criterion to make life arbitrations in such situations and would
rather spare the younger versus the elderly whenever it is possible
rather spare the elderly versus the younger whenever it is possible
I do not consider that age is a morally relevant criterion to make life arbitrations in such situations and would
allow the autonomous vehicle to select a random answer whenever it is possible
How confident do you feel about your reply to the previous question?
Very confident
Quite confident
Not much confident
Not confident at all
How much do you understand that people may think differently about this criterion?
Very much
Quite
Not much
Not at all
SS3: Considering the previous scenarios, which of the following claims do you agree most with?
I consider that body size is a morally relevant criterion to make life arbitrations in such situations and would
rather spare an athletic person versus a fat one whenever it is possible
rather spare a fat person versus an athletic one whenever it is possible
I do not consider that body size is a morally relevant criterion to make life arbitrations in such situations and
would
allow the autonomous vehicle to select a random answer whenever it is possible
How confident do you feel about your reply to the previous question?
Very confident
Quite confident
Not much confident
Not confident at all
How much do you understand that people may think differently about this criterion?
Very
Quite
Not much
Not at all
SS4: Considering the previous scenarios, which of the following claims do you agree most with?
I consider that social status is a morally relevant criterion to make life arbitrations in such situations and would
rather spare an executive person versus a homeless one whenever it is possible
rather spare a homeless person versus an executive one whenever it is possible
I do not consider that social status is a morally relevant criterion to make life arbitrations in such situations
and would
allow the autonomous vehicle to select a random answer whenever it is possible
How confident do you feel about your reply to the previous question?
Very confident
Quite confident
Not much confident
Not confident at all
How much do you understand that people may think differently about this criterion?
Very
Quite
Not much
Not at all
SS5: Considering the previous scenarios, which of the following claims do you agree most with?
I consider that human/pets is a morally relevant criterion to make life arbitrations in such situations and would
rather spare a human versus a pet whenever it is possible, unless the considered human is jaywalking
rather spare a human versus a pet whenever it is possible, even when the considered human is
jaywalking
rather spare a pet versus a human whenever it is possible
I do not consider that the distinction human/pets is a morally relevant criterion to make life arbitrations in
such situations and would
allow the autonomous vehicle to select a random answer whenever it is possible
How confident do you feel about your reply to the previous question?
Very confident
Quite confident
Not much confident
Not confident at all
How much do you understand that people may think differently about this criterion?
Very
Quite
Not much
Not at all
SS6: Considering the previous scenarios, which of the following claims do you agree most with?
I consider that abidance by the law is a morally relevant criterion to make life arbitrations in such situations
and would
rather spare lawful people over jaywalkers whenever it is possible, unless these latter are more
numerous
rather spare lawful people over jaywalkers, even if these latter are more numerous
rather spare jaywalkers over lawful people whenever it is possible
I do not consider that abidance by the law is a morally relevant criterion to make life arbitrations in such
situations and would
allow the autonomous vehicle to select a random answer whenever it is possible
How confident do you feel about your reply to the previous question?
Very confident
Quite confident
Not much confident
Not confident at all
How much do you understand that people may think differently about this criterion?
Very
Quite
Not much
Not at all
SS7: Considering the previous scenarios, which of the following claims do you agree most with?
I consider that the distinction pedestrians/passengers is a morally relevant criterion to make life arbitrations
in such situations and would
rather spare the pedestrians over the passengers whenever it is possible
rather spare the pedestrians over the passengers, even if these latter are more numerous
rather spare the passengers over the pedestrians whenever it is possible
rather spare the passengers over the pedestrians, even if these latter are more numerous
I do not consider the distinction pedestrians/passengers to be a morally relevant criterion to make life
arbitrations in such situations and would
allow the autonomous vehicle to select a random answer whenever it is possible
How confident do you feel about your reply to the previous question?
Very confident
Quite confident
Not much confident
Not confident at all
How much do you understand that people may think differently about this criterion?
Very
Quite
Not much
Not at all
SS8: Considering the previous scenarios, which of the following claims do you agree most with?
I consider that the distinction going straight/swerving is a morally relevant criterion to make life arbitrations in
such situations and would
rather go straight than swerve whenever it is possible
rather swerve than go straight whenever it is possible
I do not consider the distinction going straight/swerving to be a morally relevant criterion to make life
arbitrations in such situations and would
allow the autonomous vehicle to select a random answer whenever it is possible
How confident do you feel about your reply to the previous question?
Very confident
Quite confident
Not very confident
Not confident at all
How much do you understand that people may think differently about this criterion?
Very
Quite
Not much
Not at all
SS9: Considering the previous scenarios, which of the following claims do you agree most with?
I consider that the amount of harm is a morally relevant criterion to make life arbitrations in such situations
and would
rather spare more people’s lives over fewer people’s lives whenever it is possible
rather spare more people’s lives over fewer people’s lives whenever it is possible, even if it implies sparing jaywalkers over lawful people
rather spare fewer people’s lives over more people’s lives whenever it is possible
I do not consider that the amount of harm is a morally relevant criterion to make life arbitrations in such situations
and would
allow the autonomous vehicle to select a random answer whenever it is possible
How confident do you feel about your reply to the previous question?
Very confident
Quite confident
Not very confident
Not confident at all
How much do you understand that people may think differently about this criterion?
Very
Quite
Not much
Not at all
Conclusion set
C1: If fully autonomous vehicles were deployed and commercialised tomorrow for a reasonable price, how likely would you be to buy one?
Extremely likely
Somewhat likely
Neither likely nor unlikely
Somewhat unlikely
Extremely unlikely
C2: [if selected choices 1-2 for C1] Why so?
I would feel safer inside
I could save time and do other things than driving
I dislike driving or cannot drive
Other [please specify]
C3: [if selected choices 3-5 for C1] Why so?
I would not feel safe inside
I do not see the benefits
I enjoy driving
I dislike the idea of giving up my autonomy to a machine
Other [please specify]
Thanks for taking part in this experiment!
Here is your completion code to provide to Prolific: 2B2A7427.
Experiment 3
The third experiment was composed of five sets of questions, administered to all participants in the same order across the following seven steps (the Main set being repeated three times):
- Step 1: Introduction set
- Step 2: Main set
- Step 3: Objection set
- Step 4: Main set
- Step 5: Discussion set
- Step 6: Main set
- Step 7: Conclusion set
For condition JUST, each scenario of the Main set in steps 2, 4, and 6 was followed by the question:
“Please justify your decision in one sentence maximum.
[insert reply]”
For condition DOC, each scenario of the Main set in steps 2, 4, and 6 was followed by the following two questions:
How confident do you feel about your reply?
0 (Poorly confident) to 5 (Very confident)
How much do you think other people would agree with you?
0 (None of them) to 5 (All of them)
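Taken together, the step sequence and the JUST/DOC follow-up items define a small branching protocol. The sketch below (ours, not part of the original survey materials; all identifiers are hypothetical) shows one way such a flow could be encoded:

```python
# A minimal sketch (ours, not part of the original survey materials; all
# names are hypothetical) of the seven-step flow of Experiment 3 and the
# per-condition follow-up items.

STEPS = [
    "introduction",  # Step 1
    "main",          # Step 2
    "objection",     # Step 3
    "main",          # Step 4
    "discussion",    # Step 5
    "main",          # Step 6
    "conclusion",    # Step 7
]

# Follow-up items appended to every scenario of the Main set, by condition.
FOLLOW_UPS = {
    "JUST": [
        {"type": "free_text",
         "prompt": "Please justify your decision in one sentence maximum."},
    ],
    "DOC": [
        {"type": "scale", "min": 0, "max": 5,
         "prompt": "How confident do you feel about your reply?",
         "anchors": ("Poorly confident", "Very confident")},
        {"type": "scale", "min": 0, "max": 5,
         "prompt": "How much do you think other people would agree with you?",
         "anchors": ("None of them", "All of them")},
    ],
}

def follow_ups_for(step: str, condition: str) -> list:
    """Return the follow-up items shown after each scenario of a given step."""
    return FOLLOW_UPS[condition] if step == "main" else []
```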
Introduction set
Welcome and thank you for taking part in this survey!
We are two philosophers aiming to better understand how people make moral decisions. This research is
important and we count on your seriousness to advance the state of the art!
This survey is composed of 4 parts, including a group discussion session, so please do not take a break before it; otherwise, others will have to wait!
Please start by answering the following questions.
Consent for data collection
The data collected (demographic information such as age and gender, and answers to questions) will be anonymized, meaning that all personal data that would allow someone to identify you will be deleted within one week of your participation. As a consequence, we will not be able to delete your data after this date, even if you request us to do so. Anonymized data will be stored on the password-protected computers of Hubert Etienne and Prof. Florian Cova, and will be conserved indefinitely. These anonymized data might be used in future research or shared with other researchers.
Participants’ Prolific IDs will be collected during this study and will appear in our datafile, as collecting them is necessary to ensure that we do not pay people who did not in fact participate. However, this information will be deleted as soon as participants are paid (within three days of participation). Data from participants who leave the study before the end will be neither stored nor used.
You are free to leave the study at any moment, but you will only be paid if you complete it until the end.
Information about research results: If you want to be informed of the results of our studies, please send an email to hubert.etienne@sciencespo.fr, starting from August 31st 2021. Note that no information will be provided
about individual results, and that only general results will be communicated.
Research supervision: This research is supervised by Prof. Florian Cova, Swiss Center for Affective Sciences,
Geneva.
Contact person: For information about this research, please contact Hubert Etienne, Ecole Normale Supérieure,
Department of Philosophy, 45 rue d’Ulm, 75005, Paris (hubert.etienne@sciencespo.fr).
On the basis of the information you just received, and provided that your anonymity will be respected:
I1: Do you agree to voluntarily participate in the present study, and authorise us to use your answers for
teaching and scientific purposes, including the publication of our results in scientific journals and volumes?
Yes
No
I2: Please provide your Prolific ID:
[insert]
Today, many actors in the automotive industry are working to develop fully autonomous vehicles (self-driving cars), that is, vehicles capable of driving themselves without the intervention of a human driver.
I5: If fully autonomous vehicles were deployed and commercialised tomorrow for a reasonable price, how likely would you be to buy one?
Extremely likely
Somewhat likely
Neither likely nor unlikely
Somewhat unlikely
Extremely unlikely
I6-1: [if selected choices 1-2 for I5] Why so?
I would feel safer inside
I could save time and do other things than driving
I dislike driving or cannot drive
I6-2: [if selected choices 3-5 for I5] Why so?
I would not feel safe inside
I do not see the benefits
I enjoy driving
I dislike the idea of giving up my autonomy to a machine
Once commercialised, self-driving cars can reasonably be expected to be much safer than human drivers. They may, however, still end up facing complex situations of unavoidable fatalities, where a decision must be taken to prioritise saving some people at the expense of others.
In this example, the self-driving car suddenly experiences a brake failure, which prevents it from stopping in time. Three options are available:
- swerve to side 1 and hit the people there
- swerve to side 2 and hit the people there
- randomise the choice between sides 1 and 2 to let luck decide
The car is moving fast, so anyone it hits is very likely to be severely harmed or killed. If the car crashes into a concrete barrier, all of its passengers are also very likely to be severely harmed or killed.
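For concreteness, the "randomise" option above amounts to a simple unweighted draw between the two sides. A minimal sketch (ours, not the authors' implementation; all names are hypothetical) of such an ethical setting:

```python
import random

# When no criterion is judged morally relevant, the vehicle draws uniformly
# between the two sides instead of encoding a systematic preference.

def choose_side(relevant_criterion_applies: bool, preferred_side: str) -> str:
    """Return 'side_1' or 'side_2' in an unavoidable-collision scenario."""
    if relevant_criterion_applies:
        # A criterion deemed morally relevant settles the choice.
        return preferred_side
    # Otherwise, let luck decide: an unweighted coin flip between the sides.
    return random.choice(["side_1", "side_2"])

# Example: no relevant criterion, so the outcome is left to chance.
print(choose_side(relevant_criterion_applies=False, preferred_side="side_1"))
```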
I7: Is this the first time you have seen this kind of scenario involving autonomous vehicles?
Yes, I have never seen these before.
No, I have already seen these before, but never taken the test.
No, I have already seen these before and taken the test.
Imagine that the autonomous driving industry has made enormous progress and that self-driving cars are ready to be commercialised. The government is preparing a regulation and you have been selected to take part in a national public consultation to determine which ethical settings should be adopted in such situations.
Main set
Please answer all of the following questions, giving your own opinion.
MS1: What should the self-driving car do?
Side 1
Side 2
Random
MS2: What should the self-driving car do?
Side 1
Side 2
Random
MS3: What should the self-driving car do?
Side 1
Side 2
Random
MS4: What should the self-driving car do?
Side 1
Side 2
Random
MS5: What should the self-driving car do?
Side 1
Side 2
Random
MS6: What should the self-driving car do?
Side 1
Side 2
Random
MS7: What should the self-driving car do?
Side 1
Side 2
Random
MS8: What should the self-driving car do?
Side 1
Side 2
Random
MS9: What should the self-driving car do?
Side 1
Side 2
Random
MS10: What should the self-driving car do?
Do not click here
Do not click here
Click here
MS11: What should the self-driving car do?
Side 1
Side 2
Random
MS12: What should the self-driving car do?
Side 1
Side 2
Random
Objection set
The following points are here to challenge your opinions. Read them carefully and tell us how convincing you find them.
Men vs women
You may think that gender is a morally relevant criterion here.
If so, to be consistent with your answer, you should then be ready either to state that white people should be spared over black people or the contrary, that Muslims should be spared over Catholics or the contrary, and that homosexuals should be spared over heterosexuals or the contrary, or to explain what makes gender different from skin colour, religious belief, and sexual orientation, such that the former is morally relevant here whereas the others are not.
O1: From 1 star (not convincing at all) to 5 stars (very convincing), how strong do you find this objection?
Athletic vs fat people
You may think that body size is a morally relevant criterion here.
If so, what else could a society that allows arbitrations potentially involving people's deaths based on beauty or body image end up allowing? What if it were decided that beauty is represented by blond-haired, blue-eyed people?
O2: From 1 star (not convincing at all) to 5 stars (very convincing), how strong do you find this objection?
Homeless people vs executives
You may think that social status is a morally relevant criterion here.
If so, who, in your view, should be in charge of defining the social status scale and of deciding which activities are socially valuable and which are not?
What else could a society that allows arbitrations potentially involving people's deaths based on social status do next? What if we had a social credit score ranking citizens from the most useful to the least?
O3: From 1 star (not convincing at all) to 5 stars (very convincing), how strong do you find this objection?
Younger vs elderly people
You may think that age is a morally relevant criterion here. If so, you may think so because of the following argument: "young people should be spared because they have had less time to enjoy life and more to lose in terms of expected lifetime".
However:
- there is great uncertainty in the calculation of expected lifetime, as a young boy can die tomorrow from a disease while a 70-year-old grandmother may still live 20 years. On average, women tend to live longer than men in many countries; would you then agree to systematically spare them over men for such a reason?
- it is not possible to measure and compare each individual's value of life together with their capacity to enjoy it, as this is far too subjective. Would you really systematically sacrifice someone with an expected extra 20 years of pure bliss to allow someone else to go on suffering for 30 years in a hard life full of pain and humiliation?
- to be consistent with your claim to prioritise people with the highest remaining expected lifetime, you would also have to accept sacrificing people suffering from severe incurable diseases associated with a very short life expectancy, such as Huntington's disease or progeria.
Finally, do you think that self-driving cars could in fact recognise pedestrians' gender, age, body size, or social status? Even where these criteria might be morally relevant, they could be impossible to implement in practice.
O4: From 1 star (not convincing at all) to 5 stars (very convincing), how strong do you find this objection?
Passengers vs pedestrians
You may think that passengers should be spared over pedestrians.
If so, why would they have a greater right not to be endangered than pedestrians crossing legally, when the issue comes from the vehicle's failing brakes, making the vehicle itself the origin of the harm?
O5: From 1 star (not convincing at all) to 5 stars (very convincing), how strong do you find this objection?
More vs fewer people
You may think that the vehicle should be operated in such a way as to hit the smaller number of people.
If so, is your objective to reduce the total number of deaths or the total amount of harm? In other words, would you accept having 10 people end up in wheelchairs in order to save one person's life?
If you focus on reducing the number of deaths rather than the amount of harm, you may actually end up sparing elderly people over youngsters, as the latter tend to have greater chances of surviving. Is this consistent with your previous reply?
How would you calculate and compare the probabilities of different types of consequences? Put differently, should the vehicle run over 3 people, with a 50% chance of breaking the first one's legs, an 80% chance of killing the second, and a 50% chance of plunging the third into a coma, or 3 people with a 90% chance of making the first quadriplegic, a 40% chance of killing the second, and a 70% chance of making the third blind? (A worked comparison of these two options is sketched just after this objection.)
Finally, would you agree to hit a person legally walking on the pedestrian pathway in order to spare two jaywalkers who are aware that they are acting unlawfully and that this may be dangerous?
O6: From 1 star (not convincing at all) to 5 stars (very convincing), how strong do you find this objection?
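To make the probability comparison in this objection concrete, the following sketch (ours, for illustration only; it assumes that every listed outcome, fatal or not, counts as one "severe outcome" when tallying total harm) computes the expected outcomes for the two options:

```python
# Each person is encoded as (probability_of_death, probability_of_severe_nonfatal_harm).
option_a = [(0.0, 0.5),  # 50% chance of broken legs
            (0.8, 0.0),  # 80% chance of death
            (0.0, 0.5)]  # 50% chance of coma
option_b = [(0.0, 0.9),  # 90% chance of quadriplegia
            (0.4, 0.0),  # 40% chance of death
            (0.0, 0.7)]  # 70% chance of blindness

def expected_deaths(option):
    """Expected number of deaths among the people hit."""
    return sum(p_death for p_death, _ in option)

def expected_severe_outcomes(option):
    """Expected number of people suffering death or severe non-fatal harm."""
    return sum(p_death + p_harm for p_death, p_harm in option)

for name, option in [("A", option_a), ("B", option_b)]:
    print(f"Option {name}: {expected_deaths(option):.1f} expected deaths, "
          f"{expected_severe_outcomes(option):.1f} expected severe outcomes")
# Option A: 0.8 expected deaths, 1.8 expected severe outcomes
# Option B: 0.4 expected deaths, 2.0 expected severe outcomes
```

Under these numbers, minimising expected deaths favours option B (0.4 vs 0.8), while minimising expected severe outcomes favours option A (1.8 vs 2.0): the two objectives rank the same pair of options in opposite ways, which is precisely the ambiguity the objection points to.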
Humans vs pets
You may think that the vehicle should be operated in such a way as to always sacrifice pets in the car to spare humans, even when the latter are jaywalking. Let us agree that a human life's value is always greater than an animal's, but look at the question from a different angle.
Legally, in Europe, pets are considered "property", so that if someone kills my pet, they can be charged with damaging my property. Let us now introduce Green Monkey, an American racehorse sold for 16 million dollars in 2006.
Do you think it would be fair for Green Monkey's owner to have to sacrifice the 16-million-dollar asset carried in their vehicle to save the life of a jaywalker who intentionally broke the law, thus putting everyone at risk?
O7: From 1 star (not convincing at all) to 5 stars (very convincing), how strong do you find this objection?
Discussion set
We would now like to have a short group discussion with you.
It should take around 15 minutes and you do not need to turn your video on.
IMPORTANT: open the link below in a new tab. You should not close this tab, otherwise you will not be able to
complete the survey.
Open this link in a new tab: [zoom link]
[participants join the zoom discussion]
Thanks for joining the call; let us wait one or two minutes to reach 10 attendees.
Thanks again for being part of this experiment. We are interested in understanding the reasons behind the
replies you gave to the previous questions and will now let you speak freely to explain them. This discussion is
not being recorded. Please answer as honestly as possible and do not hesitate to challenge other people’s arguments if you disagree, as there is no wrong answer.
We will let you speak freely for 90 seconds on each of the following 8 questions.
D1: Some of you answered that everything else being equal, men should be saved over women, and others
replied the opposite. Please use the next 90 seconds to discuss whether you think sex is a morally relevant
criterion to make such arbitrations, or not. And if so, should men or women be spared and why?
D2: Some of you answered that everything else being equal, younger people should be saved over older
people, and others replied the opposite. Do you think age is a morally relevant criterion to make such
arbitrations, or not? And if so, should younger or older people be spared, and why?
D3: Some of you answered that everything else being equal, fit people should be saved over larger people,
and others replied the opposite. Do you think body size is a morally relevant criterion to make such
arbitrations, or not? And if so, should fitter or larger people be spared and why?
D4: Some of you answered that everything else being equal, people with higher social status should be saved
over those with lower social status, and others replied the opposite. Do you think social status is a morally
relevant criterion to make such arbitrations, or not? And if so, should people with higher or lower social status
be spared and why?
D5: Some of you answered that humans should always be saved over pets, while others replied that it may depend on the situation. Do you think some circumstances may allow for exceptions or not?
D6: Some of you answered that everything else being equal, lawful drivers and pedestrians should be saved
over jaywalkers, and others replied it should not make a difference. Do you think abidance by the law is a
morally relevant criterion to make such arbitrations, or not? And if so, would this allow for exceptions or not?
D7: Some of you answered that everything else being equal, pedestrians should be saved over AV passengers,
and others replied the opposite. Do you think this distinction is a morally relevant criterion to make such
arbitrations, or not? And if so, should passengers or pedestrians be spared and why?
D8: Some of you answered that the AV should always be operated in a way to save the greater number of people, while others disagreed, arguing that it depends on the situation, which should be assessed based on the criteria previously discussed. What do you think?
Thanks a lot for your time. You can now go back to the survey and enter the discussion code 55 to be
redirected to the few remaining questions.
Conclusion set
C1: If fully autonomous vehicles were deployed and commercialised tomorrow for a reasonable price, how likely would you be to buy one?
Extremely likely
Somewhat likely
Neither likely nor unlikely
Somewhat unlikely
Extremely unlikely
C2: [if selected choices 1-2 for C1] Why so?
I would feel safer inside
I could save time and do other things than driving
I dislike driving or cannot drive
Other [please specify]
C3: [if selected choices 3-5 for C1] Why so?
I would not feel safe inside
I do not see the benefits
I enjoy driving
I dislike the idea of giving up my autonomy to a machine
Other [please specify]
Thanks for taking part in this experiment!
Here is your completion code to provide to Prolific: 2B2A7427.