Article Name: A Naturalistic Investigation of Trust, AI, and Intelligence Work
Authors: Stephen L. Dorton & Samantha B. Harper
Institutions: Human-Autonomy Interaction Laboratory, Sonalysts, Inc.
Abstract: Artificial Intelligence (AI) is often viewed as the means by which the intelligence community
will cope with increasing amounts of data. There are challenges in adoption, however, as outputs of such
systems may be difficult to trust for a variety of reasons. We conducted a naturalistic study using the Critical
Incident Technique (CIT) to identify which factors were present in incidents where trust in an AI technology
used in intelligence work (i.e., the collection, processing, analysis, and dissemination of intelligence) was
gained or lost. We found that explainability and performance of the AI were the most prominent factors in
responses; however, several other factors affected the development of trust. Further, most incidents
involved two or more trust factors, demonstrating that trust is a multifaceted phenomenon. We also
conducted a broader thematic analysis to identify other trends in the data. We found that trust in AI is often
affected by the interaction of other people with the AI (i.e. people who develop it or use its outputs), and
that involving end users in the development of the AI also affects trust. We provide an overview of key
findings, practical implications for design, and possible future areas for research.
Citation:
Dorton, S.L., & Harper, S.B. (2022). A naturalistic investigation of trust, AI, and intelligence work.
Journal of Cognitive Engineering and Decision Making, 16(4), 222-236.
https://doi.org/10.1177/15553434221103718
Intelligence is a complex and high-stakes work domain, concerned with the planning, collection,
processing, analysis, and dissemination of information to support decision making (Clark, 2014).
Intelligence professionals, especially analysts, are faced with numerous challenges to cognition, including
time pressure (Hoffman, et al., 2011), undefined starting and stopping points (Hoffman et al., 2011; Wong
& Kodagoda, 2015), a surplus of non-diagnostic and intentionally deceptive information (Trent, et al.,
2007), and numerous cognitive biases and pitfalls that adversely affect reasoning (Heuer, 2017).
In addition to these cognitive challenges, new sensors and open sources are providing a continuously
expanding amount of data for analysts to work with, exceeding what analysts can feasibly process and
exploit. Artificial Intelligence (AI) technologies (including machine learning) are commonly viewed as the
means to increase the speed and effectiveness of an analyst’s ability to glean insights from large and
complex datasets (McNeese, et al., 2016; Ackerman, 2021), with applications for intelligence analysis at
the tactical, operational, and strategic levels (Symon & Tarapore, 2015). This vision for AI-driven analysis
has been echoed in policy and other documentation across the Intelligence Community (IC) and the
Department of Defense (DoD) (ODNI, 2019; Lee, et al., 2018).
A concern, however, is that AI technologies are relatively prescriptive in nature (i.e. they are built upon a
relatively linear process of entering data, values, and weights), and are often designed, developed, and
fielded without considering the cognitive work of analysts (Moon & Hoffman, 2005). There is a growing
body of work suggesting that intelligence analysis is not a linear or prescriptive process of critical thinking,
but rather an iterative sensemaking process, where analysts use a variety of abductive, inductive, and
deductive reasoning strategies to make sense of disparate information (Hoffman, et al., 2012; Moon &
Hoffman, 2005; Moore, 2011; Wong, 2014; Wong & Kodagoda, 2015; Wong & Kodagoda, 2016; Gerber,
et al., 2016). This mismatch of rigid tools with more fluid cognitive processes results in tools being misused
or disused by analysts (Moon & Hoffman, 2005).
Recently, efforts have been made to leverage the intuition and robustness of human analysts with the
computational capacity of AI through novel workflows, visualizations, and interfaces (Kamaraj & Lee,
2021; Skarbez, et al., 2019). More specific to intelligence work, recent research has investigated novel
human-AI workflows, where the human analyst works with the AI to maximize the benefits of both agents
in analytic tasks such as authorship attribution (Dorton & Hall, 2021) and aerial collections planning
(Gutzwiller & Reeder, 2021). Vogel, et al. (2021) assessed the impact of AI on intelligence analysis,
although their focus was on the broader analytic culture, rather than analysis itself. Capiola et al. (2020)
examined the factors affecting how teams of intelligence professionals rapidly build trust in each other,
although their work was focused on human-human trust (not human-AI trust). There remains a gap in the
research of understanding how analysts gain and lose trust in AI in the context of challenging intelligence
work.
Explainable AI
While there are numerous factors affecting trust in AI, none have received as much recent attention as
explainability; therefore, we discuss it here as its own phenomenon, before exploring trust more broadly.
Explainable AI (XAI) refers to AI technology that can be easily understood, such that a human can interpret
why and how the technology arrived at a specific decision (Volz, et al., 2018; Michael, 2019). Stated more
casually, XAI aims to overcome the “black box” characterization typical of deep learning technologies
(Angelov, et al., 2021). Explanations may be characterized as global or local, where global explanations
are focused on how the system works in general, and local explanations provide insight as to why a
particular step or decision was made (Hoffman, et al., 2018). Further, there are various explanation methods,
such as contrastive explanations or counterfactual reasoning, which can be employed based on the context
of the application (Hoffman, et al., 2018; Pieters, 2020). Recent research has explored the required
components of an explanation or explainability (Baber, et al., 2021; Yang, Wang, & Deleris, 2021), and
has even developed a self-explanation scorecard (Muller, et al., 2021).
Explainability is an important factor in trusting AI systems, as it not only enables users to justify system
outputs and maintain better control of the system, but also enables discovery, or the general gain of
knowledge (Adadi & Berrada, 2018). Aside from these various benefits, Angelov, et al. (2021) argue that
explainability is critical simply because it allows users to evaluate risk, which drives the adoption (or not)
of AI in different high-stakes applications. This is evidenced by the calls for XAI in numerous high-stakes
fields such as medicine, autonomous driving systems, and air traffic control (Cadario, et al., 2021; Lorente,
et al., 2021; Xie, et al., 2021). Given this argument that explainability is crucial for overcoming opacity and
assessing risk, we assumed that explainability would also be a critical factor in gaining or losing trust in AI
in intelligence work, a high-stakes domain.
The following are factors or components of XAI technologies that have been identified in the literature
(Roth-Berghofer & Cassens, 2005; Sørmo, et al., 2005; Hagras, 2018):
Justification: The AI explains why the answer provided is a good answer.
Transparency: The AI explains how the system reached the answer (where system decisions are
explained in terms, formats, and languages that we can understand).
Conceptualization: The AI clarifies the meaning of concepts.
Learning: The AI teaches you about the domain.
Bias: The AI provides verification that decisions made based on the AI system were made fairly and
were not based on a biased view of the world.
Trust in AI
While much of the recent trust work on AI has been through the lens of explainability, there is a considerable
body of work on the broader area of trust in automation (e.g. alarms, robotics, and unmanned systems) to
consider. Trust has been defined by Lee & See (2004, p. 54), as “the attitude that an agent will help achieve
an individual’s goals in a situation characterized by uncertainty and vulnerability.” Trust is not a binary
phenomenon, but rather a spectrum with a considerable gray area between trust and distrust (Roff & Danks,
2018), where trust is calibrated over time based on interactions with the system (Schaefer, et al., 2016;
Rebensky, et al. 2021; Yang, Schemanske, & Searle, 2021). Trust (or, to be more specific, calibrated trust)
is important for effective human-AI teaming. Miscalibrated trust, in which the user places either too much
or too little trust in a system, can result in the user relying on the system for more than is intended (i.e.
misuse), or not using the system to its full capabilities (i.e. disuse) (Hoff & Bashir, 2015; Parasuraman &
Riley, 1997).
Trust in a given system is influenced by dozens of factors, including human-related factors (e.g. individual
traits and emotive factors), automation- or system-related factors (e.g. capabilities of the system), and
environmental factors (e.g. task characteristics) (Schaefer, et al., 2016; Dorton & Harper, 2021). While
some researchers frame trust as primarily a function of reliability or predictability (Roff & Danks, 2018),
understandability is also a prominent factor, as are factors such as goal congruence. We have synthesized a
set of trust factors from the literature which we expected to have an impact on trust in AI systems (Table
1).
Throughout the literature there are many cases where different terms were used to describe the same concept
or trust factor, and in turn, there were also cases where a single term was used by various sources to describe
different trust factors. For example, Jian et al. (2000) use “reliability” to describe something similar to goal
congruence, while others refer to it in a more statistical manner (e.g. Madsen & Gregor, 2000). Further, the
distinction between some factors can become blurry depending on the type or application of AI. For
example, the factors of performance and reliability are similar in cases where the AI performs a binary
classification problem (i.e. provides a yes or no answer). However, we distinguish performance and
reliability as being analogous to accuracy (the outputs are correct) and precision (the outputs are consistent
based on the inputs), respectively (Watson, 2019). To attempt to address these issues, we combined
synonyms that were deemed to be sufficiently conceptually proximal. For example, we combined
explainability, understandability, transparency, interpretability, and feedback, as these terms can all be used
somewhat interchangeably to describe the properties of an AI system that do not take a “black box”
approach (Angelov, et al., 2021).
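To make this distinction concrete, the following toy sketch (in Python, with hypothetical agents and track labels that are not drawn from the study data) contrasts an agent whose outputs are usually correct but inconsistent across repeated identical inputs with an agent whose outputs are perfectly consistent but incorrect:

```python
# Toy illustration of the distinction drawn above: "performance" as accuracy
# (the outputs are correct) versus "reliability" as precision/consistency (the
# same inputs yield the same outputs). All names and values are hypothetical.
import random

TRUTH = {"track_017": "threat", "track_042": "no_threat"}

def accurate_but_unreliable(track_id: str) -> str:
    # Usually correct, but stochastic: repeated queries on the same input can disagree.
    return TRUTH[track_id] if random.random() < 0.9 else "no_threat"

def reliable_but_inaccurate(track_id: str) -> str:
    # Deterministic (same output for the same input every time), but wrong for threats.
    return "no_threat"

# Consistency check: does the same input always yield the same output?
print(len({accurate_but_unreliable("track_017") for _ in range(50)}) == 1)  # usually False
print(len({reliable_but_inaccurate("track_017") for _ in range(50)}) == 1)  # True
```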
Table 1
Factors Affecting Trust
Factor | Synonyms | Summary Definition
Reputation | Transitive Trust | The agent has received endorsement or reviews from others (Siau & Wang, 2018; Roff & Danks, 2018).
Usability | Personal Attachment | The agent is easy to interact with, and/or enjoyable to work with (Siau & Wang, 2018; Madsen & Gregor, 2000; Balfe, et al., 2018).
Security | Privacy Protection | The importance of operational safety and data security to the agent (Siau & Wang, 2018).
Utility | - | The usefulness of the agent in completing a task (Siau & Wang, 2018).*
Goal Congruence | Shared Mental Model | The extent to which the agent's goals align with your own (Siau & Wang, 2018).
Reliability | Predictability | The agent is reliable and consistent in functioning over time (Madsen & Gregor, 2000; Muir & Moray, 1996; Sheridan, 1999; Balfe, et al., 2018).
Robustness | Error Tolerance | The agent is able to function under a variety of circumstances (Sheridan, 1999; Woods, 1996; Balfe, et al., 2018).
Explainability | Understandability, Transparency, Interpretability, Feedback | The extent to which you are able to understand what the agent is doing, why it is doing it, and how it is doing it (Balfe, et al., 2018; Angelov, et al., 2021).
Performance | Competence, Accuracy, Errors, False Alarms | The perceived ability of the agent to perform its tasks well (Balfe, et al., 2018; Madsen & Gregor, 2000).
Directability | Subordination | The degree to which the agent's actions are able to be modified or changed (Klein et al., 2004; Schaefer, et al., 2016).
* Yang, Schemanske, & Searle (2021) do not ascribe a specific term, but describe a similar concept where
a trust decrement after automation failure is larger when the outcome is undesirable, but smaller when the
human-automation team is still successful (i.e. the automation failure does not impede the ultimate outcome
of work).
Goals and Research Questions
As previously discussed, there are large bodies of work on trust in automation and on explainability of
AI, both supported primarily by theoretical and laboratory work. The naturalistic work on AI in
intelligence analysis has so far not focused on trust in AI specifically. Therefore, our overarching goal is to
fill this gap in the research by developing a naturalistic understanding of how various factors affect the gain
and loss of trust in AI in the high-stakes domain of intelligence work (planning, collections, analysis, etc.).
More specifically, we desire to investigate whether specific factors and frameworks from the literature align
with naturalistic findings regarding the complex sociotechnical system of humans, AI technologies, and
intelligence work. Although this research is fundamentally exploratory in nature, there are two high-level
research questions we aimed to answer:
Which factors (e.g. explainability) from the literature are commonly cited in incidents where trust
is gained or lost in AI, in the context of intelligence work?
What various sociotechnical phenomena exist when humans use AI for intelligence work?
Methods
We employed a naturalistic approach to achieve the aforementioned research goals, meaning that we
focused on how trust is actually gained or lost in the natural, context-rich environment of intelligence work.
The Naturalistic Decision Making (NDM) approach (and various naturalistic methods) was adopted after it
was found that statistical models and decision support systems did not improve decision making and/or
were not adopted for field use (Nemeth & Klein, 2010). Naturalistic approaches were developed to shift focus
from prescriptive accounts of behavior developed in the laboratory (i.e. how individuals should make decisions)
to behavior in real-world settings (i.e. how individuals actually make decisions). Although this effort is focused on trust rather
than decision making, it is still considered a naturalistic approach to understanding the phenomenology.
Data Collection
Data were collected using the Critical Incident Technique (CIT), a research method for collecting
retrospective reports of human behavior in incidents that meet specific criteria of interest (Flanagan, 1954).
This method is not only effective for exploratory and investigative analysis of extreme or atypical events,
but is also tremendously flexible, allowing researchers to adopt it for a wide variety of uses (Butterfield, et
al., 2005).
We developed a CIT template with a standard set of questions that were split into three passes.
The first pass consisted of questions to understand the background and context of the event itself (e.g. did
they have a choice in using the AI, did they receive training, and was the training sufficient?). The second
pass included having participants tell the story about the incident, and any follow-up questions from the
research team (based on the context of their story). The third and final pass consisted of retrospective
questions (e.g. did the incident change the way you worked with the AI, how much did you trust it before
and after the incident, how much do you trust it now?).
Each interview was conducted by two researchers with a single participant and generally took between 60
and 90 minutes to complete. Participants were asked to describe two incidents, with one incident involving
the gain or loss of trust with an AI technology, and the other incident involving the gain or loss of trust with
a human, for a total of two CIT responses. We documented responses in a CIT template form, whereby all
responses were transcribed within 24 hours of the interview. No audio recordings were collected because
of the need to protect the anonymity of participants, which is of utmost importance when working in the
intelligence domain (Vogel, et al., 2021). As we could not collect verbatim recordings, each incident was
transcribed by one researcher and then reviewed and edited by the other researcher, in accordance with best
practices to increase validity (Johnson, 1997).
Thematic Analysis
We used thematic analysis to analyze the data collected during CIT interviews, because the method is well
suited for identifying and describing themes in large amounts of qualitative data (Braun & Clarke, 2006;
Nowell, et al., 2017; King, 2004). We used an iterative thematic analysis approach similar to that of
Sherwood, et al. (2020), where we initially conducted a low-level thematic analysis to identify statements
of interest from the transcript, followed by a high-level thematic analysis to identify and define themes
based on these extracts. This was an iterative process where themes were cut, modified, or added as more
of the data were reviewed.
The authors then individually coded the transcribed responses for the presence or absence of the identified
factors (from the literature) and the themes (from the thematic analysis process). We regularly met to
compare batches of coded cases, and used a simple two-step argumentation process to resolve any
discrepancies in codes (Dorton, et al., 2019). This resulted in consensus being reached on all cases. Following
Klein and Jarosz (2011), we re-coded all cases for a given theme if at any point the definition
for the theme was updated to resolve any ambiguities. We then completed a final iteration of low-level
thematic analysis of each theme based on the cases mapped to each theme.
Although we came to consensus on all codes for all cases through the argumentation process, we used
Cohen's kappa (K) to measure the inter-rater agreement of the two coders in identifying whether a factor,
theme, or other phenomenon was present (1) or absent (0) in a case. Kappa measures the proportion of
agreement (values range from -1.00 to 1.00) while correcting for random chance, and is an appropriate
measure of agreement given the two coders and the nominal level data (Cohen, 1960; Tinsley & Weiss,
1975; Hallgren, 2012). Hallgren (2012) argues that Kappa values can indicate slight agreement (.00-.20),
fair agreement (.21-.40), moderate agreement (.41-.60), substantial agreement (.61-.80), or near perfect
agreement (.81-1.00); however, others have argued simply that Kappa values less than .40 indicate poor
agreement (Banerjee, et al., 1999).
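As a concrete illustration of the statistic, the following is a minimal sketch (in Python, using hypothetical presence/absence codes rather than the study data) of how Cohen's kappa corrects observed agreement for the agreement expected by chance:

```python
# Minimal sketch of Cohen's kappa for two coders' nominal (here, binary
# present/absent) codes over the same cases. The code vectors are hypothetical.
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    assert len(coder_a) == len(coder_b) and len(coder_a) > 0
    n = len(coder_a)
    # Observed agreement: proportion of cases where the coders assigned the same code.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: expected from each coder's marginal code frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in set(coder_a) | set(coder_b))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical presence (1) / absence (0) codes for one factor across 10 cases.
coder_1 = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
coder_2 = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
print(round(cohens_kappa(coder_1, coder_2), 2))  # 0.6 for this toy data
```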
Participants and Dataset
We interviewed 29 current and former intelligence professionals, who had a total of 563 years (n = 28, M
= 20.11, SD = 11.26) of experience in intelligence, including not only analysis, but also planning,
collections, and decision making (military and/or policy) as a direct consumer of intelligence. This sample
was well above the recommended sample size for phenomenological research (Guest et al., 2006). As
shown in Table 2, the participants had experience in a broad variety of intelligence disciplines (INTs), with
the majority of experience in All-source Intelligence, Signals Intelligence (SIGINT), and Open Source
Intelligence (OSINT). These INTs included what Clark (2014) calls non-literal and literal intelligence
disciplines, where the collected intelligence does or does not require processing to have actionable value,
respectively.
Table 2
Participant Experience Across Different Intelligence Disciplines
Intelligence Discipline | n | M (years) | SD | Sum (years)
SIGINT: Signals Intelligence | 21 | 20.11 | 9.62 | 324
*ELINT: Electronic Intelligence | 3 | 17.33 | 14.19 | 52
*COMINT: Communications Intelligence | 1 | 25.00 | - | 25
All-Source | 20 | 13.18 | 11.35 | 346
OSINT: Open Source Intelligence | 13 | 13.38 | 10.08 | 174
MASINT: Measurement & Signature Intelligence | 6 | 15.17 | 12.75 | 91
ACINT: Acoustics Intelligence | 6 | 15.00 | 8.94 | 90
HUMINT: Human Intelligence | 4 | 10.50 | 9.88 | 42
GEOINT: Geospatial Intelligence | 3 | 13.33 | 10.41 | 40
IMINT: Imagery Intelligence | 2 | 11.00 | 1.41 | 22
* ELINT and COMINT are subsets of SIGINT.
Note. SD cannot be calculated where n = 1.
Not only did participants have a diverse set of experiences across the different INTs, but they also hailed from
a diverse set of organizations, each specializing in specific INTs, and/or in specific geopolitical or technical
domains. Participants had years of experience in different IC and DoD organizations, including, but not
limited to, the Central Intelligence Agency (CIA), Defense Intelligence Agency (DIA), National Security
Agency (NSA), Department of State (DOS), Defense Counterintelligence and Security Agency (DCSA),
Office of Naval Intelligence (ONI), National Air and Space Intelligence Center (NASIC), U.S. Navy, U.S.
Air Force, U.S. Marine Corps, and the U.S. Coast Guard.
Each participant provided one story about gaining or losing trust in AI in the context of intelligence (one
participant provided two), generating a rough set of 30 stories. Three stories were excluded from the dataset.
One was excluded because it was not directly relevant to intelligence, another because the technology
described in the incident failed to meet even a fairly broad characterization of AI (Diakopoulos, 2016), and the
third because the participant opted to tell a more general story about the evolution of analytic tradecraft. It
should be noted that two other stories failed to discuss a specific instance where trust was gained or lost in AI,
but still provided valuable insights about AI use within the context of intelligence work and were within scope
(i.e. they concerned AI and intelligence work). Thus, we excluded those two stories from the analysis of trust
factors (N = 25), but included them in the more general thematic analysis (N = 27).
Results
Factors Affecting Trust
We coded incidents for the presence or absence of each of the factors identified in Table 1, to determine
which factors were more prevalent in gaining or losing trust in AI. Table 3 provides an overview of these
results, including the count of how many times each factor was present in incidents (broken out by
directionality), and the reliability of coding (K) for each factor. These factors are hereafter reported
individually for the sake of clarity; however, it should be noted that they do not exist in isolation,
and commonly co-occur with other factors. The mean number of factors present in an AI story was 2.76
(SD = 1.01), where 23 incidents (92%) had two or more factors present. To protect the anonymity of
participants we parenthetically cite the incident number for each participant quote (e.g. CIT 9).
Table 3
Agreement (K) and Count of Trust Factors in CIT Responses (N = 25)
Factor | Agreement (K) | Gained Trust (+) | Lost Trust (-) | All Cases (+/-)
Explainability | .82 | 8 | 9 | 17
Performance | .23 | 7 | 6 | 13
Utility | .43 | 10 | 2 | 12
Robustness | .48 | 2 | 7 | 9
Usability | .60 | 2 | 4 | 6
Reliability | .51 | 2 | 4 | 6
Reputation | .24 | 1 | 2 | 3
Security | -* | 0 | 2 | 2
Directability | 1.00 | 0 | 1 | 1
Goal Congruence | -* | 0 | 0 | 0
* Kappa could not be calculated because inputs from at least one rater were constant.
Explainability was the most frequently present factor in gaining or losing trust in AI (n = 17, 68%). Of
those 17 cases, explainability was a factor in eight cases where trust was gained, and nine cases where trust
was lost. Generally, explainability of the AI increased trust because it allowed users to understand why
outputs were or were not correct, “You could look at the model and see the words and phrases it sorts for…
the model gave you some insights as to why it was flagged. This reinforced that it was capturing the right
things…” (CIT 15). Conversely, when AI lacked explainability, participants found that they could not make
sense of obviously correct or incorrect outcomes, or in some cases, apply correct outcomes to decision
making, “It will give you a GEO Plot with red, yellow, green… ‘and red means… uhhhh…’ We realized we
didn’t know what it meant… Don’t go there ever? Be alert if you do go there?” (CIT 8).
Performance was the second most common factor in AI incidents (n = 13, 52%), where it was present in
seven incidents where trust was gained, and six incidents where trust was lost. In addition to providing
utility in decision making, correct outputs from high-performance AI increased trust by confirming
intelligence from other sources, “The [parameter] matched what we expected… the sensor told us…
confirmed it was the [threat that intelligence had warned about]” (CIT 28). Similarly, incorrect outputs
(i.e. low performance AI) decreased trust by not allowing participants to make informed decisions, knowing
that the outputs were incorrect.
Utility was the third most common factor in AI incidents (n = 12, 48%). Simply put, participants gained
trust when the AI helped them do their job (n = 10), and lost trust when it did not (n = 2). Participants
disentangled utility from performance (i.e. accuracy of outputs), where trust was gained in even a
marginally-performing AI if it helped in at least some aspect of the job at hand, “If it found something for
me at all it was a huge positive, because it was for such a huge quantity of data that there was probably no
way for me to get it in a practical sense - I had nothing to lose by using the tool” (CIT 57). Similarly, trust
was lost in high-performing AI if it did not provide practical utility, “We made the best DIME and PMESII
models ever, they just won’t [ever] have enough data… [Organization] cancelled the program” (CIT 46).
(DIME = Diplomacy, Information, Military, and Economic; PMESII = Political, Military, Economic, Social,
Information, and Infrastructure.)
Robustness was another common factor (n = 9, 36%), which was present in only two stories about gaining
trust, and in seven stories about losing trust in AI. Participants reported robustness in AI manifesting in
terms of the AI being successful outside of its originally-intended use case, and by being able to
continuously learn and adapt based on new intelligence, “The system is able to learn new data… getting
data from intelligence on targets that are always changing” (CIT 36). Conversely, trust was commonly lost
in brittle AI (i.e. agents that lacked robustness) where the technology could only perform under a specific
set of circumstances that did not account for an adequate proportion of real-world events, “The tool is
designed to work at a certain range... They trained what is a LDA [Linear Discriminant Analysis] model on
data, validated it, then gave it data outside the boundaries of what it was trained on” (CIT 4).
Usability was a less common factor (n = 6, 24%), which was present in two incidents where trust was gained,
and four incidents where trust was lost. Usability did not appear to be the primary factor in gaining or losing
trust, but instead compounded the effects of other factors. For example, the ease of interacting with the
agent promoted use of the AI and competency development, which positively affected trust, “The UI [user
interface] was also very easy to use, which helped [learn the system]” (CIT 24). Conversely, if the AI was cumbersome
or otherwise unpleasant to interact with, the lack of usability promoted disuse of the system, preventing the
development or repair of trust, “I was spending hours per day combing through reports… It was hard to
find relevant information so I used it less frequently” (CIT 31).
Reliability was another relatively uncommon factor (n = 6, 24%), which was present in two incidents where
trust was gained, and four incidents where trust was lost. Interestingly, there was an observed duality in the
effects of reliability in misperforming AI, where reliably low performance contributed to both gaining and
losing trust. In some cases reliability helped build trust when model outputs were demonstrably wrong, “I
guess I’ve gained confidence because the algorithm consistently gives results that are imperfect” (CIT 55).
However, reliably poor performance also degraded trust in some cases, driving disuse of systems, “Is it
functioning properly or not? More often than not it wasn’t... It was kind of expected… so the alternative
was just [workaround]” (CIT 59).
Other Factors such as ‘Reputation’ (n = 3, 12%), ‘Safety and Security’ (n = 2, 8%) and ‘Directability’ (n
= 1, 4%) were also present, although infrequently. Reputation impacted trust when it was confirmed or
disconfirmed by firsthand experience with the AI, “Other people told me that all the reporting it spits out
is relevant… I didn’t have a lot of experience in this domain so I relied on it and trusted it more than an
experienced person would” (CIT 31). As one might assume, trust was lost when AI compromised the safety
of the participant and/or their colleagues, even in simulated training exercises, “If it was an actual [threat]
we would have been dead… you want the screening to be good enough to keep [threats] out of the kill
radius” (CIT 33). In one case, an inability to override or otherwise direct the AI based on the context of the
mission decreased trust, where the participant said the outputs of an AI-based GEOINT system were more
limited than features on their car that can be overridden when necessary, “It’s like [automated] brakes on
a car- it goes red for tall grass but I need to be able to bump into it to make the turn…” (CIT 8).
Themes Regarding Trust, AI, and Intelligence
Using the previously described thematic analysis process resulted in the identification of several themes.
The following is a list of these themes identified in the data, accompanied by a brief description.
Trust by Proxy: Trust was affected by how other humans interacted with the AI (i.e. beyond
reputation or endorsements).
Users Involved in Development: Trust was affected by the presence or absence of end users and/or
domain subject matter experts being involved in the development of the AI.
Reputation: Reputation or endorsements from peers affected trust in the agent. This is not limited
to reputation’s role in the incident itself, but also in calibrating trust before or after the incident.
Character: Trust was affected by the absence of character in the AI. This was not an issue of
anthropomorphism (i.e. whether the AI had humanlike features), but rather of the AI lacking the
character flaws of human colleagues (e.g. lying or self-promoting) and instead simply being a function
of its inputs.
Trust by Failure: Trust was gained in the agent despite it failing at some part of its task.
Asymmetric Feedback: The agent typically provided only one type of feedback (i.e. positive or
negative), which affected trust.
As shown in Table 4, the two most common themes were trust by proxy (n = 11) and users in development
(n = 10). Other themes were less prevalent, but are still reported, accompanied by more detailed descriptions
and exemplar quotes from different collected incidents.
Table 4
Agreement (K) and Count of General Themes in CIT Responses (N = 27)
Factor | Agreement (K) | Gained Trust (+) | Lost Trust (-) | All Cases (+/-)
Trust by Proxy* | .49 | 5 | 5 | 11
Users Inv. in Dev.* | .32 | 6 | 3 | 10
Reputation | .92 | 2 | 5 | 7
Character* | .68 | 3 | 1 | 5
Trust by Failure | .65 | 3 | 1 | 4
Asymmetric Feedback | .78 | 2 | 0 | 2
* Included a general case, so gained and lost trust cases do not add up to all cases.
Trust by Proxy was the most common theme found in incidents with AI (n = 11, 41%), which included
incidents where trust in the AI was affected by human behavior with, or input to the AI. In most cases (n =
9) participants lost trust in the AI because its performance was largely subject to the inputs from other
fallible humans, “the reason I put less faith in this stuff is that humans can train it on bad data with bad
features and then say it’s gold when the outcomes are bad… you can see how much human assumptions
can affect the model’s performance” (CIT 55). Similarly, participants lost trust in the AI because of how
other people misused the outputs of the AI, typically by using it in conditions for which it was not validated,
“I understand what it should be used for… I lost trust in people’s use of the tool” (CIT 11). Conversely,
trust was gained in AI in a few cases (n = 2) because of the role of humans affecting the inputs to the AI,
“A human verifying a nomination gives me much higher confidence than the algorithm feeding itself” (CIT 29).
Users Involved in Development was the second most common theme in incidents with AI (n = 10, 37%).
Participants reported gaining trust in the system because end users and/or domain experts were involved in
the development of the system (n = 6), “…we were lucky to have experts/former targeting officers, it was
helpful for the development of models” (CIT 55). Conversely, participants lost trust in the AI when they
knew end users were not involved in the development process (n = 3), “It was the mathematicians that
developed it, and they did not include the experts enough… the design requirement to develop the AI was
flawed” (CIT 10). In one case, the participant specifically highlighted the need for collaboration across the
technical and operational experts, “I think they should have cross-fertilized teams… [the analysts] have no
concept on the performance of the system or the guys developing the neural nets” (CIT 50).
Reputation was a recurring theme in several cases (n = 7, 26%) of gaining or losing trust in AI. While the
factor of reputation was limited to instances where a reputation affected trust during the incident, the theme
of reputation is more inclusive, and includes the role of reputation before, during, and after incidents where
trust was gained or lost. Reputation (as the more inclusive theme) was used to calibrate trust in two ways,
before and after the incident. Most commonly (n = 5), reputation was used to calibrate trust prior to the
incident in question, “I’ve heard anecdotal stories where people tell me it didn’t pick this up and it should
have” (CIT 44). Less commonly (n = 2), participants stated they would need corroboration or peer
endorsements from others in order to regain trust in an AI after a negative incident.
Character was present in some cases (n = 5, 19%), across incidents where trust was gained and lost, and in
a general story (i.e. one that did not fit the CIT template). Participants cited the fact that AI has no character
(unlike humans, who can have character flaws) as a factor in gaining trust in AI, “It’s a computer program
that does what we tell it to do… so we force our own goals on it” (CIT 28). Conversely, some participants
viewed the lack of character with indifference, “I don’t believe technology is good or bad, it’s really the
way I use it” (CIT 50).
Trust by Failure was a relatively uncommon theme (n = 4, 15%), and was exclusive to incidents with AI.
In half of these cases participants reported gaining trust in the AI because said failure allowed them to learn
the boundary conditions or limitations of the AI, “In a way, [it] increased my trust because I have a better
understanding of what challenges can occur” (CIT 50); or, as another participant stated, “I didn’t
necessarily lose trust in the system - I learned its limitations” (CIT 17). The other two cases involved gaining
trust because the system behaved consistently or predictably, despite low performance, “Yeah, abject failure
improved my trust in the [AI]… it demonstrated that [it] is doing what I told it to” (CIT 55).
Asymmetric Feedback was present in only two cases (7%), but merits discussion. In both cases
participants gained trust in an AI technology after it finally provided positive feedback (e.g. that it detected
and identified a certain threat), because it had not previously provided any negative feedback (e.g. status
updates confirming that it was working properly but had not detected anything), “We thought it was broken
because it never worked… When you were underway it was on all the time and didn’t spit anything out until it
received something” (CIT 53). Despite being a rarely occurring theme, this is important to note as
intelligence and national security work often involves using AI to detect rare but exceptionally grave
threats.
Discussion
We used the CIT to understand what factors affected trust in AI technologies in the context of intelligence
work. Additionally, we conducted thematic analysis of collected data to identify other sociotechnical
themes in human-AI interaction. This naturalistic approach allowed us to test findings from theory and
laboratory work, and to identify other factors to be considered when designing, employing, or otherwise integrating such
systems into intelligence work. As a result, we provided specific examples of how trust factors manifest in
incidents where intelligence professionals gain or lose trust in AI. Further, we have found evidence for
several high-level themes regarding trust, AI, and intelligence work, some of which are novel or not
reported elsewhere. These findings provide readers with terms and representations for various phenomena,
as well as real-world cases to point to when justifying research, requirements, and designs to AI developers,
project managers, and other engineers involved in the development of AI-based intelligence systems. After
conducting analysis of nearly 30 incidents, we were able to generate several key findings:
Explainability is, in fact, a critical factor in developing trustable AI. It played a role in nearly 70%
of all reported incidents, spread evenly across incidents of gaining and losing trust. In all 17 cases
where explainability was a factor, participants referred specifically to transparency (how the AI
generated the output, n = 13) and/or justification (why the output is a good output, n = 11). In
contrast, the aforementioned XAI concepts of conceptualization, learning, and bias were never cited
by participants in incidents where they gained or lost trust in AI.
Performance plays a role in gaining or losing trust in AI, but is separate from utility. We saw in
several instances that highly accurate AI lacked utility if it was not robust enough for use with real
world data, and conversely, intelligence professionals found poorly performing AI to be useful.
Knowing the strengths and weaknesses of AI, people will adapt their work accordingly, “As a
detection system it worked great - better than a human. The identification of the signal - not at all. It
was not accurate for the most part, it would require human inputs. It was a good tipper… it had
accurate [detection], but bad ID… so we passed the detections on for manual analysis…” (CIT 17).
Reliability or predictability was not a prominent factor in gaining or losing trust in AI. Despite
assertions from the literature (e.g. Roff & Danks, 2018), reliability was present in only six incidents
(24%).
It is difficult to separate trust in AI from trust in the greater human-AI sociotechnical system. The
trust by proxy theme was present in nearly half of reported incidents (n = 11, 41%). Trust in an AI
system can be affected by humans in two primary ways: The people who curate inputs to the system,
and the people who use the outputs of the system (sometimes erroneously).
Having users involved in the development of the AI had an impact on gaining or losing trust (n =
10, 37%). When users were not involved, trust was lost not only from a decreased shared mental
model between the intelligence professional and the AI; we also saw specific instances where
the developers struggled to operationalize the outputs of their models.
Trust factors and themes rarely occurred in isolation. As previously mentioned, all but two incidents
had two or more trust factors present (M = 2.76, SD = 1.01). Similarly, we conducted a two-tailed
point-biserial correlation on the theme codes (a computational sketch follows this list), which showed
a significant correlation between the Trust by Proxy and Character themes (r = .56, p < .01). Said
differently, people who believe AI to have no agency or character tended to also base their trust in
the AI at least partially on other people who interact with it. Both themes were present in five cases,
or 19% of the sample. These were incidents where participants described the AI lacking any character or agency, “It’s a
computer program that does what we tell it to do…” (CIT 28), and therefore their trust in the AI
was also affected by other humans who were developing the AI or putting data into it, “There’s a
lot of people who have accounts… you don’t know everyone who is putting info into it” (CIT 22).
Some trust factors, depending on their manifestation, were disproportionately reported in incidents
with gained or lost trust. For example, participants rarely gained trust because an AI was robust to
operational context (n = 2); however, they commonly lost trust when the AI was brittle to such
context (n = 7). Conversely, participants frequently noted when AI provided utility to complete
work (n = 10); however, a lack of utility was rarely reported in incidents where trust was lost (n =
2).
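As a methodological footnote to the correlation reported in the list above, when both variables are binary (0/1 theme codes), the point-biserial correlation reduces to a Pearson correlation on the indicators. The following is a minimal sketch (with hypothetical theme codes, not the study data) of the computation using SciPy:

```python
# Sketch of a two-tailed point-biserial correlation between two binary theme
# codes (e.g., Trust by Proxy and Character). The 0/1 vectors are hypothetical.
from scipy import stats

trust_by_proxy = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1]
character      = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0]

r, p = stats.pointbiserialr(trust_by_proxy, character)  # p is two-sided
print(f"r = {r:.2f}, p = {p:.3f}")
```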
This research was hindered somewhat by the inability to collect verbatim recordings of interviews, violating
the best practice of low inference descriptors (Johnson, 1997). Although this is not ideal, it has been
acknowledged that studying intelligence work often requires artificialities in scenarios and extra care to be
taken in protecting the identities of individuals in order to render it publishable in a public forum (Trent, et
al., 2007; Vogel, et al., 2021). We took several precautions to increase validity, including data triangulation
and investigator triangulation, and noted specifically when we were confident we captured specific quotes
verbatim (Johnson, 1997). Although it is far from systematic or exhaustive, we also briefed our results to
approximately 10 participants (one-third of the sample), who concurred with our interpretation of the data,
and registered no issues (Johnson, 1997; Butterfield et al., 2005). Further, we relied on what each participant
stated with minimal inference, rather than making broader inference on factors, likely resulting in
underreporting for some factors (e.g. safety and security was likely a factor in most cases; however,
participants only explicitly mentioned it as a concern in two cases). Finally, we focused on a relatively
narrow set of trust factors. While we believe our set of factors was relatively exhaustive for the domain of
study (i.e. it included enough factors such that some had few or no cases), it is less than half of the factors
identified from Schaefer et al. (2016). It is entirely possible that with longer interviews and/or survey
methods we could uncover the role of several factors that were not studied in this effort.
Another consideration is that we did not focus specifically on intelligence analysis, but rather intelligence
work more broadly (planning, collections, analysis, and decision making as a consumer of intelligence).
Further, we did not focus on AI use by a sample within a specific organization or INT. Intelligence
professionals in each organization and each INT likely use AI in different ways to achieve different goals
under different contexts. Thus, there are limits to the generalizability or predictive value of the different
counts or frequencies reported herein. For example, one cannot validly infer that utility (n = 12) is twice as
important as usability (n = 6) as a factor in developing trust in AI for intelligence work (i.e. because it
appeared twice as frequently). While some may argue that this highly inclusive dataset may dilute the
findings, we argue that it is a strength of this study, providing broader understanding of how trust in AI is
gained or lost across the broader intelligence cycle. As noted by Klein, et al. (2021), such a “wide net”
approach with minimal inclusion criteria is appropriate when data collection is opportunistic (in this case
participants with unclassified stories were difficult to come by) and when the objective of the study is more
exploratory in nature (e.g. not a more formal meta-analysis). This wide net and naturalistic approach has
enabled greater understanding of phenomena including decision making, insight, and explanations (Klein
et al., 2010; Klein & Jarosz, 2011; Klein, et al., 2021); so it stands to reason that it is also sufficient for
investigating trust.
This study also provided practical considerations for designers and integrators of AI systems for intelligence
work. The following are some recommendations based on the findings of this study:
Involve end users in the development of the AI. More specifically, end users and domain experts
should be engaged to determine suitable inputs and outputs for the system, define expectations for
performance, and to identify contextual factors that are likely to change in operational use. Doing
so should allow developers to design and develop AI that has greater robustness to changing
contexts, greater utility in nominal and off-nominal situations, and better explainability when
performance expectations are not met.
Involve developers in the training of end users. During collaborative development and/or through
training products delivered with the system, developers should clearly articulate capabilities and
limitations of the AI that may affect its performance, robustness, and reliability, as well as the
degree to which users were involved in the development of the AI. Understanding these limitations
will likely augment the “explainability” of the AI by allowing end users to interpret cues that
provide insights as to the behavior of the system. Knowing the extent to which users were involved
in the development of the AI (e.g. feature engineering and/or development of algorithms) will allow
users to understand the degree to which their mental model of the system matches its logic, and
also serves as a trust signaling function (Riasnow, et al., 2015).
Provide symmetric explainability. Explainability has been demonstrated to affect the development
of trust in AI. Additionally, we have seen that an ideal system should provide not only positive
feedback, but also negative feedback (e.g. status updates confirming it is functioning) in use cases
where the AI operates persistently.
Looking forward, the degree to which these findings can be generalized to other high- (medical, power
plant, etc.) and low-stakes (media recommender systems) domains is unclear. Similarly, it would be
interesting to analyze cases based on the type of AI they refer to (e.g. prioritizing, classification, association,
or filtering; Diakopoulos, 2016); however, more data would be required for an analysis at that level. Further,
we envision the development of an empirically-driven checklist or scale that can be used to determine the
readiness of an AI system for adoption into the intelligence enterprise. These possibilities serve to illustrate
that this study should be viewed as a stepping stone to numerous other paths of research.
Acknowledgements
This work was supported in part by the US Army Combat Capabilities Development Command
(DEVCOM) under Contract No. W56KGU-18-C-0045. The views, opinions, and/or findings contained in
this report are those of the author and should not be construed as an official Department of the Army
position, policy, or decision unless so designated by other documentation. This document was approved for
public release on 23 September 2021, Item No. A364.
The authors wish to thank Kelly Neville and Mark Pfaff for their helpful insights regarding research and
analytic methods.
References
Ackerman, R. K. (2021). AI offers to change every aspect of intelligence. AFCEA SIGNAL.
Adadi, A. & Berrada, M. (2018). Peeking inside the black box: A survey on explainable artificial
intelligence (XAI). IEEE Access, 6, 52138-52160.
https://doi.org/10.1109/ACCESS.2018.2870052
Angelov, P. P., Soares, E. A., Jiang, R., Arnold, N. I., & Atkinson, P. M. (2021). Explainable artificial
intelligence: an analytical review. WIREs Data Mining and Knowledge Discovery, 1424, 1-13.
https://doi.org/10.1002/widm.1424
Baber, C., McCormick, E., & Apperly, I. (2021). A human-centered process model for explainable AI.
Naturalistic Decision Making and Resilience Engineering Symposium 2021. Toulouse, France.
Balfe, N., Sharples, S., & Wilson, J. R. (2018). Understanding is key: An analysis of factors pertaining to
trust in a real-world automation system. Human Factors, 60(4), 477-495.
https://doi.org/10.1177/0018720818761256
Banerjee, M., Capozzoli, M., McSweeny, L., & Sinha, D. (1999). Beyond kappa: A review of interrater
agreement measures. Canadian Journal of Statistics, 27, 3-23. https://doi.org/10.2307/3315487
Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology,
3, 77-101. https://doi.org/10.1191/1478088706qp063oa
Butterfield, L. D., Borgen, W. A., Amundson, N. E., Maglio, A. T. (2005). Fifty years of the critical
incident technique: 1954-2004 and beyond. Qualitative Research, 5(4), 475-497.
https://doi.org/10.1177/1468794105056924
Cadario, R., Longoni, C., & Morewedge, C. K. (2021). Understanding, explaining, and utilizing medical
artificial intelligence. Nature Human Behavior. https://doi.org/10.1038/s41562-021-01146-0.
Capiola, A., Baxter, H. C., Pfahler, M. D., Calhoun, C. S., & Bobko, P. (2020). Swift trust in ad hoc
teams: A cognitive task analysis of intelligence operators in multi-domain command and control
contexts. Journal of Cognitive Engineering and Decision Making, 14(3), 218-241.
https://doi.org/10.1177/1555343420943460
Clark, R. M. (2014). Intelligence collection. SAGE.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological
Measurement, 20(1), 37-46. https://doi.org/10.1177/001316446002000104
Diakopoulos, N. (2016). Accountability in algorithmic decision making. Communications of the ACM,
59(2), 56-62. https://doi.org/10.1145/2844110
Dorton, S. L., Frommer, I. D., & Garrison, T. M. (2019). A theoretical model for assessing information
validity from multiple observers. 2019 IEEE Conference on Cognitive and Computational
Aspects of Situation Management (CogSIMA), 62-68.
https://doi.org/10.1109/COGSIMA.2019.8724242
Dorton, S. L. & Hall, R. A. (2021). Collaborative human-AI sensemaking for intelligence analysis. In H.
Degen & S. Ntoa (Eds.), Artificial intelligence in HCI, (pp. 185-201). Springer Nature.
https://doi.org/10.1007/978-3-030-77772-2_12
Dorton, S. L. & Harper, S. (2021). Trustable AI: A critical challenge for naval intelligence. Center for
International and Maritime Security (CIMSEC).
Flanagan, J. C. (1954). The Critical Incident Technique. Psychological Bulletin, 5, 327-358.
http://dx.doi.org/10.1037/h0061470
Gerber, M., Wong, B. L. W., & Kodagoda, N. (2016). How analysts think: Intuition, leap of faith, and
insight. Proceedings of the Human Factors and Ergonomics Society 2016 Annual Meeting, 60(1)
173-177. https://doi.org/10.1177/1541931213601039
Guest, G., Bunce, A., & Johnson, L. (2006). How many interviews are enough? An experiment with data
saturation and variability. Field Methods, 18(1), 59-82.
https://doi.org/10.1177/1525822X05279903
Gutzwiller, R. S. & Reeder, J. (2021). Dancing with algorithms: Interaction creates greater preference and
trust in machine-learned behavior. Human Factors, 63(5), 854-867.
https://doi.org/10.1177/0018720820903893
Hallgren, K. A. (2012). Computing inter-rater reliability for observational data: An overview and tutorial.
Tutor Quant Methods Psychol., 8(1), 23-34.
Hagras, H. (2018). Toward human-understandable, explainable AI. In Computer, 51(9), 28-36.
https://doi.org/10.1109/MC.2018.3620965.
Heuer, R. (2017). Psychology of intelligence analysis. Echo Point Books.
Hoff, K. A., & Bashir, M. (2015). Trust in automation: Integrating empirical evidence on factors that
influence trust. Human Factors, 57(3), 407-434. https://doi.org/10.1177/0018720814547570
Hoffman, R., Klein, G., & Muller, S.T. (2018). Explaining Explanation for “explainable AI”. Proceedings
of the Human Factors and Ergonomics Society 2018 Annual Meeting, 62(1), 197-201.
https://doi.org/10.1177/1541931218621047
Hoffman, R. R, Henderson, S., Moon, B., Moore, D. T., & Litman, J. A. (2011). Reasoning difficulty in
analytical activity. Theoretical Issues in Ergonomics Science, 12(3), 225-240.
https://doi.org/10.1080/1464536X.2011.564484
Jian, J., Bisantz, A., Drury, C.G., Llinas, J. (2000). Foundations for an empirically determined scale of
trust in automated systems. International Journal of Cognitive Ergonomics, 4(1), 53-71.
https://doi.org/10.1207/S15327566IJCE0401_04
Johnson, R. B. (1997). Examining the validity structure of qualitative research. Education, 118(2), 282-
292.
Kamaraj, A. V. & Lee, J. D. (2021). Using machine learning to aid in data classification: Classifying
occupation compatibility with highly automated vehicles. Ergonomics in Design, 29(2), 4-12.
https://doi.org/10.1177/1064804620923193
King, N. (2004). Using templates in the thematic analysis of text. In Cassell, C. & Symon, G. (Eds.),
Essential guide to qualitative methods in organizational research (pp. 257-270). SAGE.
Klein, G., Hoffman, R., Mueller, S., & Newsome, E. (2021). Modeling the process by which people try to
explain complex things to others. Journal of Cognitive Engineering and Decision Making, 15(4),
213-232. https://doi.org/10.1177/15553434211045154
Klein, G. & Jarosz, A. (2011). A naturalistic study of insight. Journal of Cognitive Engineering and
Decision Making, 5(4), 335-351. https://doi.org/10.1177/1555343411427013
Klein, G., Calderwood, R., Clinton-Cirocco, A. (2010). Rapid decision making on the fire ground: The
original study, plus a postscript. Journal of Cognitive Engineering and Decision Making, 4(3),
186-209. https://doi.org/10.1518/155534310X12844000801203
Klein, G., Woods, D. D. Bradshaw, J. M., & Hoffman, R. (2004). Ten challenges for making automation
a “team player” in joint human-agent activity. Intelligent Systems, 19(6), 91-95.
https://doi.org/10.1109/MIS.2004.74
Lee, J., & See, K. A. (2004). Trust in automation: Designing for appropriate reliance. Human Factors,
46(1), 50-80. https://doi.org/10.1518/hfes.46.1.50_30392
Lee, M., Valisetty, R., Breuer, A., Kirk, K., Panneton, B., & Brown, S. (2018). Current and future
applications of machine learning for the US Army (Report No. ARL-TR-8345). Aberdeen
Proving Ground, MD: US Army Research Laboratory.
Lorente, M. P., Lopez, E. M., Florez, L. A., Espino, A. L., Martinez, J. A. I, & de Miguel, A. S. (2021).
Explaining deep-learning-based driver models. Applied Science, 11(8).
https://doi.org/10.3390/app11083321
Madsen, M. & Gregor, S. (2000). Measuring human-computer trust. 11th Australasian Conference on
Information Systems, 53, 6-8.
McNeese, N.J., Hoffman, R.R., McNeese, M.D., Patterson, E.S., Cooke, N.J., & Klein, G. (2016). The
human factors of intelligence analysis. Proceedings of the 2016 International Annual Meeting of
the Human Factors and Ergonomics Society, 59(1), 130-134.
https://doi.org/10.1177/1541931215591027
Michael, N. (2019). Trustworthy AI: Why does it matter? National Defense.
Moon, B. M., & Hoffman, R. R. (2005). How might “transformational” technologies and concepts be
barriers to sensemaking in intelligence analysis? In J. M. C. Schraagen (Ed.), Proceedings of the
Seventh International Naturalistic Decision Making Conference. Amsterdam, The Netherlands,
June 2005.
Muir, B.M. & Moray, N. (1996). Trust in automation. Part II. Experimental studies of trust and human
intervention in a process control simulation. Ergonomics, 39(3), 429-460.
https://doi.org/10.1080/00140139608964474
Muller, S. T., Veinott, E. S., Hoffman, R. R., Klein, G., Alam, L., Mamun, T., & Clancey, W. J. (2021).
Principles of explanation in human-AI systems. AAAI 2021 Explainable Agency in Artificial
Intelligence Workshop.
Nemeth, C. & Klein, G. (2010). The naturalistic decision making perspective. Encyclopedia of
Operations Research and Management Science.
https://doi.org/10.1002/9780470400531.eorms0410.
Nowell, L. S., Norris, J. M., White, D. E., & Moules, N. J. (2017). Thematic analysis: Striving to meet
trustworthiness criteria. International Journal of Qualitative Methods, 16(1), 1-13.
https://doi.org/10.1177/1609406917733847
Office of the Director of National Intelligence (ODNI) (2019). National Intelligence Strategy of the
United States of America 2019. Retrieved from:
https://www.dni.gov/files/ODNI/documents/National_Intelligence_Strategy_2019.pdf
Parasuraman, R. & Riley, V. (1997). Humans and automation: Use, misuse, disuse, and abuse. Human
Factors, 39(2), 240-253. https://doi.org/10.1518/001872097778543886
Rebensky, S., Carmody, K., Ficke, C., Nguyen, D., Carroll, M., Wildman, J., & Thayer, A. (2021).
Whoops! Something went wrong: Errors, trust, and trust repair strategies in human-agent teaming.
In H. Degen & S. Ntoa (Eds.), Artificial intelligence in HCI (pp. 95-106). Springer Nature.
https://doi.org/10.1007/978-3-030-77772-2_7
Riasnow, T., Ye, H., & Goswami, S. (2015). Generating trust in online consumer reviews through
signaling: An experimental study. 48th Hawaii International Conference on System Sciences,
3307-3316. https://doi.org/10.1109/HICSS.2015.399
Roff, H. M., & Danks, D. (2018). “Trust but verify”: The difficulty of trusting autonomous weapons
systems. Journal of Military Ethics, 17(1), 2-20. https://doi.org/10.1080/15027570.2018.1481907
Rosala, M. (2020). The Critical Incident Technique in UX. Nielsen Norman Group. Retrieved from:
https://www.nngroup.com/articles/critical-incident-technique/
Roth-Berghofer, T. R., & Cassens, J. (2005). Mapping goals and kinds of explanations to the knowledge
containers of case-based reasoning systems. ICCBR 2005, 3630, 451-464.
Schaefer, K. E., Chen, J. Y. C., Szalma, J. L., & Hancock, P. A. (2016). A meta-analysis of factors
influencing the development of trust in automation: Implications for understanding autonomy in
future systems. Human Factors, 58(3), 377-400. https://doi.org/10.1177/0018720816634228
Sheridan, T. B. (1999). Human supervisory control. In A. P. Sage & W. B. Rouse (Eds.), Handbook of
systems engineering and management (pp. 645-690). Wiley & Sons.
Sherwood, S. M., Neville, K. J., McLean, A. L. M. T., Walwanis, M. M., & Bolton, A. E. (2020).
Integrating new technology into the complex system of air combat training. In H. A. H. Handley
& A. Tolk (Eds.), A framework of human systems engineering: Applications and case studies (pp.
185-204). Wiley. https://doi.org/10.1002/9781119698821.ch10
Siau, K., & Wang, W. (2018). Building trust in artificial intelligence, machine learning, and robotics.
Cutter Business Technology Journal, 31(2), 47-53.
Skarbez, R., Polys, N. F., Ogle, J. T., North, C., & Bowman, D. A. (2019). Immersive analytics: Theory and
research agenda. Frontiers in Robotics and AI, 6(82), 1-15.
https://doi.org/10.3389/frobt.2019.00082
Sørmo, F., Cassens, J., & Aamodt, A. (2005). Explanation in case-based reasoning: Perspectives and
goals. Artificial Intelligence Review, 24(2), 109-143. https://doi.org/10.1007/s10462-005-4607-7
Symon, P. B., & Tarapore, A. (2015). Defense intelligence analysis in the age of big data. Joint Force
Quarterly, 79(4), 4-11.
Tinsley, H. E. A., & Weiss, D. J. (1975). Interrater reliability and agreement of subjective judgments.
Journal of Counseling Psychology, 22(4), 358-376.
Trent, S. A., Patterson, E. S., & Woods, D. D. (2007). Challenges for cognition in intelligence analysis.
Journal of Cognitive Engineering and Decision Making, 1(1), 75-97.
https://doi.org/10.1177/155534340700100104
Vogel, K. M., Reid, G., Kampe, C., & Jones, P. (2021). The impact of AI on intelligence analysis:
Tackling issues of collaboration, algorithmic transparency, accountability, and management.
Intelligence and National Security, 1-11. https://doi.org/10.1080/02684527.2021.1946952
Volz, V., Majchrzak, K., & Preuss, M. (2018). A social science-based approach to explanations for
(game) AI. 2018 IEEE Conference on Computational Intelligence and Games (CIG), 474-481.
https://doi.org/10.1109/CIG.2018.8490361
Watson, D. (2019). The rhetoric and reality of anthropomorphism in artificial intelligence. Minds and
Machines, 29, 417-440. https://doi.org/10.1007/S11023-019-09506-6
Wong, B. L. W. (2014). How analysts think (?): Early observations. 2014 IEEE Joint Intelligence and
Security Informatics Conference, 269-299. https://doi.org/10.1109/JISIC.2014.59
Wong, B. L. W. & Kodagoda, N. (2015). How analysts think: inference making strategies. Proceedings of
the Human Factors and Ergonomics Society Annual Meeting, 59(1), 269-273.
https://doi.org/10.1177/1541931215591055
Wong, B. L. W., & Kodagoda, N. (2016). How analysts think: Anchoring, laddering, and associations.
Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 60(1), 178-182.
https://doi.org/10.1177/1541931213601040
Woods, D. D. (1996). Decomposing automation: Apparent simplicity, real complexity. In R. Parasuraman
& M. Mouloua (Eds.), Automation technology and human performance (pp. 3-17). Lawrence
Erlbaum.
Xie, Y., Ponsakornsathien, N., Gardi, A., & Sabatini, R. (2021). Explanation of machine-learning
solutions in air traffic management. Aerospace, 8(8). https://doi.org/10.3390/aerospace8080224
Yang, L., Wang, H., & Deleris, L. (2021). What does it mean to explain? A user-centered study on AI
explainability. In H. Degen & S. Ntoa (Eds.), Artificial intelligence in HCI (pp. 107-121).
Springer Nature. https://doi.org/10.1007/978-3-030-77772-2_8
Yang, X. J., Schemanske, C., & Searle, C. (2021). Toward quantifying trust dynamics: How people adjust
their trust after moment-to-moment interaction with automation. Human Factors, 1-17.
https://doi.org/10.1177/0018720811034716