Protecting Your Patients' Interests in the
Era of Big Data, Artificial Intelligence, and
Predictive Analytics
Patricia Balthazar, MD, Peter Harri, MD, Adam Prater, MD, MPH, Nabile M. Safdar, MD, MPH
The Hippocratic oath and the Belmont report articulate foundational principles for how physicians interact with patients and research
subjects. The increasing use of big data and artificial intelligence techniques demands a re-examination of these principles in light of the
potential issues surrounding privacy, confidentiality, data ownership, informed consent, epistemology, and inequities. Patients have
strong opinions about these issues. Radiologists have a fiduciary responsibility to protect the interest of their patients. As such, the
community of radiology leaders, ethicists, and informaticists must have a conversation about the appropriate way to deal with these
issues and help lead the way in developing capabilities in the most just, ethical manner possible.
Key Words: Artificial intelligence, machine learning, ethics, big data, informatics
J Am Coll Radiol 2018;15:580-586. Copyright 2017 American College of Radiology
When Google DeepMind needed to test an app to provide
alerts for patients at risk for worsening renal disease,
it gathered the records of 1.6 million patients from the
Royal Free Hospital. The Information Commissioner's
Office, an "independent authority set up to uphold
information rights in the public interest, promoting
openness by public bodies and data privacy for individuals"
in the United Kingdom, disapproved, finding that the
arrangement between the two entities broke the law and
failed to uphold the data privacy rights of individuals
[1,2]. Although disclosures of patient information for
direct patient care are widely accepted, otherwise
identical disclosures for research and development
require informed consent. The distinction between
patient care and research is widely recognized, yet its
proper application in the setting of new techniques can
elude even the most capable organizations.
Imaging is a robust source of phenotypic information
suitable for the application of big data, artificial
intelligence, and personalized medicine methods. Industry has
taken notice of this relatively unexplored frontier and
spent considerable resources surveying options to harness
the power of imaging data [3,4], eagerly seeking partners
in health care. Although some have forged ahead, others
have reconsidered their initial forays into this space with
industry partners [5,6]. Because the conversation often
begins with imaging and the radiology department, it
behooves any health care provider, department, or
system to consider important questions regarding its
big data and artificial intelligence efforts, whether
pursued internally or in partnership with external parties.
We have long subscribed to ethical and regulatory
frameworks to guide our use of patient and research
subject data. In many cases, we seem unsure of how to
apply these conventions in the era of big data and
artificial intelligence, with their seemingly insatiable appetite
for more information. These methods have real
consequences, with the potential to affect the lives of
individuals and populations in ways that could benefit some
while harming others. Here, we touch on the major
principles of existing applicable frameworks in this
setting, explore known issues when dealing with big data
and machine learning in health care, explore perspectives
from key stakeholders, and pose questions for discussion
for imaging health care professionals to consider as they
embark on their own big data and artificial intelligence
efforts.
The expressions big data, artificial intelligence, personalized
medicine, population health, and predictive analytics
represent a family of concepts that are related but not
synonymous. For those being first introduced to the field,
a brief delineation of these terms follows.
Big Data
Still a vaguely defined term [7], "big data" consists of at
least three increasingly accepted characteristics of data,
the 3Vs: volume, variety, and velocity [8]. These are
especially suitable for radiology data, which include
large volumes of images and reports, in a variety of
imaging modalities, body parts, and formats
(unstructured text and structured DICOM), that are
rapidly generated and potentially analyzed in real time
or near real time.
Artificial Intelligence
Artificial intelligence is a branch of computer science that
encompasses the automation of intelligent behavior [9].
Machine learning, a subfield of artificial intelligence, is
composed of data-driven techniques, such as deep
learning, that uncover patterns and predict behavior
with minimal human intervention [10].
The machine "learns" by analyzing training data and
then making predictions on a new data set [10]. This
technology holds promise in radiology for preliminary
lesion detection and differential diagnosis generation,
potentially augmenting the sensitivity and accuracy of
radiologists. Natural language processing, also a subfield
of artificial intelligence, focuses on understanding the
full meaning of written or spoken text by integrating
concepts and methods pulled from various domains [11].
Precision or Personalized Medicine
Precision or personalized medicine involves prevention
and treatment strategies that take individual variability
into account [12], for example, scheduling earlier
mammographic and MRI breast cancer screening for
patients with BRCA gene mutations. Precision and
personalized medicine is increasingly dependent on big
data and artificial intelligence techniques.
Population Health
Population health and public health are terms often used
interchangeably. Both refer to the health of a group
of individuals, rather than the individuals themselves,
organized into many different units of analysis, depending
on the research or policy purpose [13].
Predictive Analytics
Predictive analytics is a broad term used to describe a
variety of statistical techniques, such as modeling, machine
learning, and data mining, that analyze current and
historical data to predict future events or behaviors [14].
When artificial intelligence algorithms have access to big
data, they may facilitate the advancement of predictive
analytics in population health or personalized medicine.
"I will respect the privacy of my patients, for their
problems are not disclosed to me that the world may
know."
Hippocratic oath [15]
Numerous legal precedents, ethical frameworks, and
historical milestones have contributed to our current
understanding of how to appropriately interact with human
subjects, patients, and clients. Of these, two of
the best known in health care are the Hippocratic
oath and the Belmont report, which articulate foundational
principles for how physicians ethically treat patients
and deal with research participants.
The Hippocratic oath is one of the earliest known
calls to respect the privacy of patients and respect the
confidentiality of the information with which they
entrust their physicians. In this regard, the Hippocratic
oath's call to respect privacy and confidentiality predates
the US Constitution, the European Union Charter of
Fundamental Rights, and HIPAA, all of which allude
to privacy of individuals, if not the confidentiality of
their data.
The Belmont report [16] deals with the protection of
human subjects in research. The principles of respect for
persons, beneficence, and justice manifest in our current
practices of informed consent, assessment of risks and
benefits, and just selection of subjects.
These cornerstones of our shared ethical principles
have been designed to apply to a wide variety of settings,
present and future, and should not be deemed irrelevant,
even in the era of big data and artificial intelligence. The
imaging community should revisit the meaning and
practical implications of these principles in light of new
questions. If needed, additional standards can be derived
from these principles to address specific issues that may
arise in the era of big data.
Other reports and codes, such as those of the International
Medical Informatics Association, the Association
for Computing Machinery, the American Health Information
Management Association, the Data Science Association,
and the Institute of Electrical and Electronics
Engineers, also evaluate issues of confidentiality, data
ownership, epistemology, informed consent, and justice
to varying degrees [17-19]. The imaging community and
individual health care entities should survey these and
other codes to identify the most relevant components
in the process of developing our own code of ethics and
conduct with respect to imaging-related health care data
and their potential use in big data and artificial
intelligence.
As the use of big data, artificial intelligence, and related
techniques in health care expands, conversations about
the appropriate, ethical use of these methods become
increasingly relevant. Here, we explore common,
fundamental issues relevant to those considering developing
or using such techniques for imaging, including
privacy, ownership, informed consent, epistemology,
and potential inequities between various data constituents
[20].
Privacy and Confidentiality
- How do we keep data-driven insights about sensitive
health issues confidential?
- How do institutions prevent the reidentification of
individuals from joining of data sets?
- What is your obligation to notify a patient or subject of
a health risk or propensity identified using big data or
machine learning techniques?
Privacy and confidentiality are closely related issues.
Data privacy refers to the rights of individuals to maintain
control over their own health information. Confidentiality
refers to the responsibility of those entrusted with
those data to maintain privacy [21].
Privacy and confidentiality issues affect the scope,
proper storage of, access to, and dissemination of data,
especially data that are highly sensitive or personal. As the
breadth of the collected data and their analyses continue
to increase rapidly [20,22], these data are being housed
electronically in perpetuity [23,24], which increases the
risk for privacy violations. Additionally, anonymization
of the data does not ensure against individuals' being
identified subsequently through the joining of data sets
and reidentification [25], manipulation of the data
causing discrimination [26,27], or other improper uses
[28]. Aggregated data about specific groups might also
be created, which can cause stigmatization [29].
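The reidentification risk described above can be made concrete with a small sketch. It is an illustration only, not a method taken from the cited work: the field names (`zip3`, `birth_year`, `sex`), the records, and the helper functions are all hypothetical. It counts how many records share each combination of indirect identifiers (the idea behind k-anonymity) and shows how a record with a unique combination can be matched to a named outside data set.

```python
from collections import Counter

# Hypothetical quasi-identifiers: fields that are not names or record numbers
# but can single out a person when combined with an outside data set.
QUASI_IDENTIFIERS = ("zip3", "birth_year", "sex")


def smallest_group_size(records, keys=QUASI_IDENTIFIERS):
    """Return k, the size of the smallest group sharing one quasi-identifier combination."""
    groups = Counter(tuple(r[k] for k in keys) for r in records)
    return min(groups.values())


def link(deidentified, public, keys=QUASI_IDENTIFIERS):
    """Join a de-identified set to a named public set; keep only unique matches."""
    index = {}
    for person in public:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = {}
    for row in deidentified:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # exactly one candidate: the row is reidentified
            matches[row["study_id"]] = names[0]
    return matches


# Toy records invented purely for illustration.
imaging = [
    {"study_id": "S1", "zip3": "303", "birth_year": 1950, "sex": "F", "finding": "nodule"},
    {"study_id": "S2", "zip3": "303", "birth_year": 1982, "sex": "M", "finding": "normal"},
    {"study_id": "S3", "zip3": "303", "birth_year": 1982, "sex": "M", "finding": "fracture"},
]
public_list = [
    {"name": "A. Example", "zip3": "303", "birth_year": 1950, "sex": "F"},
    {"name": "B. Example", "zip3": "303", "birth_year": 1982, "sex": "M"},
    {"name": "C. Example", "zip3": "303", "birth_year": 1982, "sex": "M"},
]

k = smallest_group_size(imaging)          # 1: at least one record is unique
reidentified = link(imaging, public_list)  # {"S1": "A. Example"}
```

In this toy case, removing names from the imaging set was not enough: the one record with a unique combination of indirect identifiers is matched to a named individual, whereas the two records that share a combination are not.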
Ownership of Data and Subsequently Developed Intellectual Property
- Can patient data be reused for developing and validating
advanced analytic methods? Can they be shared
or sold for this purpose?
- If an app is developed and validated using patient data,
should the app be sold for profit?
In this context, "ownership" deals with who controls
(possesses or allows access to) the data and who gains from
intellectual property that is subsequently developed [20].
Medical information, the key ingredient for any
health-related big data or artificial intelligence exercise, is not
owned in the same sense that a physical object, or even
intellectual property, is in other settings. Although in
some cases property law may apply, in other cases an
intellectual property framework may be more appropriate.
Patients, health care providers, and hospital systems
are all stakeholders that may have intersecting rights
and responsibilities when it comes to individual medical
records [30,31]. There is no single legal construction that
regulates ownership of health care data but rather a
combination of federal, state, and international laws
and rules through which health information ownership
is governed [32].
Informed Consent
- Does your institution have mechanisms for blanket or
tiered consent for the development, validation, and use
of big data analytics or artificial intelligence?
- What mechanisms are in place to exclude the data of
individuals who do opt out?
Informed consent typically implies general consent to
treat, consent for a specific procedure, or participation in
a research study; however, big data often requires pooling
and analyzing data in the future for purposes not
anticipated at the time of consent [25]. Anticipating possible
big data or artificial intelligence approaches may require
seeking consent at multiple levels and creating
mechanisms to deal with those patients who do not
wish to be included in such exercises. Blanket consents
preauthorize a wide range of secondary analytics [33],
which addresses the impracticability of obtaining consent for
multiple future analyses but may also reduce autonomy
[34-38]. Tiered consent enables the exclusion of specific
uses of the data [39] but usually cannot anticipate all
possible purposes [38]. Your system, hospital, or
practice should review and accordingly revise its initial
intake and consent regime to consider possible big data
and artificial intelligence uses of patient data. These
discussions should include mechanisms for patients to
opt out of such uses.
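One way to picture how blanket consent, tiered consent, and opt-out interact is a filter applied before any data set is assembled. The sketch below is a hypothetical data model, not a description of any existing consent system: the class `ConsentRecord`, its fields, and the purpose labels are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsentRecord:
    """Hypothetical per-patient consent state."""
    patient_id: str
    blanket: bool = False                    # preauthorizes any secondary use
    permitted_uses: frozenset = frozenset()  # tiered consent: named purposes only
    opted_out: bool = False                  # an opt-out overrides everything else


def may_use(consent, purpose):
    """Return True only if this patient's data may be used for the given purpose."""
    if consent.opted_out:
        return False
    return consent.blanket or purpose in consent.permitted_uses


def eligible_ids(consents, purpose):
    """List the patients whose data may enter a data set built for this purpose."""
    return [c.patient_id for c in consents if may_use(c, purpose)]


consents = [
    ConsentRecord("P1", blanket=True),
    ConsentRecord("P2", permitted_uses=frozenset({"quality_improvement"})),
    ConsentRecord("P3", permitted_uses=frozenset({"algorithm_development",
                                                  "quality_improvement"})),
    ConsentRecord("P4", blanket=True, opted_out=True),
]

training_cohort = eligible_ids(consents, "algorithm_development")  # ["P1", "P3"]
sale_cohort = eligible_ids(consents, "commercial_sale")            # ["P1"]
```

The second query illustrates the trade-off noted above: a purpose that nobody anticipated is covered only by the blanket consent, whereas the tiered consents exclude it by default.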
The Black Box
- How do we know that the results of artificial intelligence
algorithms are valid?
- Were the data sets with which they were developed
- How would your institution defend the results of an
algorithm directly affecting a patient's health care, if no
provider could completely comprehend how the
algorithm reached its conclusion?
Epistemology is "the study or a theory of the nature
and grounds of knowledge especially with reference to its
limits and validity" [40]. In many cases, the analysis of
big data may go beyond direct human intelligence,
calling into question how we can ascertain the validity
of the results [20]. With some techniques, the results of
any analysis could represent statistically significant noise
found by an algorithm without any clinical significance
[24,41]. Human input is needed to drive analysis
scientifically and provide context to the results [42,43].
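The notion of "statistically significant noise" can be illustrated with simple arithmetic. The sketch below assumes independent tests on data that contain no real effect and uses the conventional 0.05 significance threshold; neither the threshold nor the function names come from the cited sources. The Bonferroni adjustment is shown as one common correction, not as a recommendation from this article.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of 'significant' results when no real effect exists."""
    return n_tests * alpha


def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test threshold that keeps the familywise error rate at or below alpha."""
    return alpha / n_tests


# An algorithm screening 1,000 candidate image features against an outcome:
noise_hits = expected_false_positives(1000)   # about 50 spurious associations
any_hit_20 = familywise_error_rate(20)        # about 0.64 with only 20 tests
strict_cutoff = bonferroni_threshold(1000)    # about 0.00005 per test
```

Under these assumptions, an unsupervised search over many features is almost guaranteed to surface some association by chance, which is one reason human judgment about clinical plausibility remains necessary.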
- Will predictive analytics lead to inadvertent harm of
an individual or group financially or otherwise?
- Could a group be stigmatized by the results of an
artificial intelligence classification?
Inequities in the era of artificial intelligence may result
in a power gradient between the subjects providing data
and organizations with the ability to use the same data
[23,44], concentrating the ability to benefit from the data
[24,45-48]. In this circumstance, patients may be
divorced from the analysis their own data support [28]
and from the ability to buy and sell those very data [48].
Additionally, aggregated data may be used to make
decisions about larger groups of people or create
groupings that previously did not exist, potentially
leading to discrimination, profiling, or surveillance
[23,41,49]. When data are collected, they may favor
groups that are more willing to provide them [50].
Even in diverse data sets, the data could still be used to
preferentially benefit one group over another [26,27,51].
Health care disparities have often been overlooked by
the emerging technology centered on big data [52].
Certainly, there is a risk that we perpetuate groups of
"haves" and "have nots" on the basis of access to these
technologies. Active engagement with small population
data sets, addressing social determinants of health, and
actively promoting data access to underprivileged
populations are potential avenues for mitigating the
effects of such a digital divide.
We believe that researchers and institutions entering
the marketplace for artificial intelligence discovery and
tool development should involve the local ethics review
committee, institutional review board, or other body and
craft a policy regarding these issues. The issues discussed
above, including privacy and confidentiality measures,
ownership concerns, informed consent, epistemology,
and inequities, can serve as a checklist with which the
completeness of the developed policies can be evaluated.
Many patients may not have considered the differences
between privacy and confidentiality, the epistemology of
predictions from machine learning methods, or the justice
implications of population health analytics. However,
when asked, patients have strong opinions about the
appropriate use of their health information.
Studies have shown that in general, many patients
have positive attitudes toward precision medicine [53,54].
Although not directly related to imaging per se, Halverson
et al [53] interviewed patients who had undergone genomic
sequencing in oncology and rare-disease settings and
found that participants had predominantly positive
attitudes toward sequencing. Some of the positive feelings
were empowerment over their own health, altruistic
contribution to the progress of medicine, the legitimization
of their suffering, and a sense of closure in having done
everything they could [53]. However, in the era of
predictive analytics, decisions can be driven by financial
and administrative incentives [55] not necessarily aligned
with the needs of patients. Patients have significant
concerns about sharing their anonymized personal health
records when they might be divulged or sold to other
organizations to be used for profit [56]. Questions of
data security, privacy, and confidentiality are also
common among patients, especially when it comes to
sensitive information (eg, drug abuse, mental health) that
could have an impact on their personal employment or
health insurance coverage [56].
When it comes to potential inequities, patients'
opinions may vary with their backgrounds and sense of
vulnerability. One study assessing attitudes of patients
with breast cancer toward molecular testing for personalized
therapy and research found that nonwhites were
less willing to undergo testing even if the results would
guide their own therapy [57]. This is problematic on many
levels, including the fact that diverse representation is
needed for validity and reliability of artificial intelligence
and personalized medicine techniques.
There is concern that with the use of predictive
analytics, a racial or religious population associated with an
illness, poverty, or another factor could become further
marginalized or stigmatized, either by being underrepresented
during the development of these methods, through
spurious associations, or through analyses focusing on an
outcome other than health. In a worst-case scenario, a data
set or artificial intelligence algorithm can "inherit" the
systematic biases of the sources from which it originates or the
implicit biases of those who curated it [26,58]. Indeed, the
potential for this was realized in the popular imagination
when Microsoft created an artificial intelligence Twitter
chatbot that started posting racist and genocidal tweets
within 24 hours of "learning" from human-generated text
[59]. This issue is further complicated by the ability of
machines to identify minorities in ways that transcend
human perception, as exemplified by a recent project
demonstrating that a machine-learning algorithm using
Facebook profiles was able to predict sexual orientation in
men with 91% accuracy [60]. When applying machine
learning algorithms to health care, we must be aware that
they were derived from existing data, usually entered by
humans, and therefore might propagate human bias.
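One practical check for the underrepresentation and inherited-bias concerns above is to report an algorithm's performance separately for each patient group rather than only in aggregate. The following is a minimal sketch under stated assumptions: the group labels, the toy cases, and the function `sensitivity_by_group` are hypothetical and are not drawn from the cited studies.

```python
from collections import defaultdict


def sensitivity_by_group(cases):
    """Compute per-group sensitivity.

    cases: iterable of (group, has_disease, flagged_by_model) tuples.
    Returns {group: fraction of truly diseased cases the model flagged}.
    """
    true_positives = defaultdict(int)
    diseased = defaultdict(int)
    for group, has_disease, flagged in cases:
        if has_disease:
            diseased[group] += 1
            if flagged:
                true_positives[group] += 1
    return {g: true_positives[g] / diseased[g] for g in diseased}


# Toy cases invented for illustration: group B is missed more often.
cases = [
    ("A", True, True), ("A", True, True), ("A", True, True), ("A", True, True),
    ("A", False, False),
    ("B", True, True), ("B", True, False), ("B", True, True), ("B", True, False),
    ("B", False, False),
]

per_group = sensitivity_by_group(cases)  # {"A": 1.0, "B": 0.5}
```

Pooled across both groups, this toy model detects 6 of 8 diseased cases (75%), a figure that hides the fact that half of the cases in group B are missed.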
Given that we live in an era of increased information
accumulation and larger data breaches [61,62], and that
patient data can be used for nefarious purposes, we
advocate that the radiology community craft a
"patient bill of rights" that addresses these issues
specifically, in collaboration with other physicians,
ethicists, and patients. If radiology is to maintain its
position as a leading health care discipline in big data
and artificial intelligence, we must also lead in the
ethical use of these methods. Among other items, such
a bill of rights should consider the following:
- Patients' imaging data are valuable and deserve the
highest level of security reasonably available.
- Patients are entitled to know what their imaging data
can and cannot be used for, including secondary uses.
- Patients' imaging data will not be used to harm them
or a group to which they belong.
Radiologists must be deeply involved in the development,
validation, and implementation of big data analytics,
artificial intelligence, and personalized medicine [63] in
imaging. Clinical expertise is essential to asking the
right questions, accurately interpreting results, and
communicating with patients for optimal decision
making. Furthermore, physicians have a fiduciary
responsibility for the well-being of their patients, as
affirmed in the Hippocratic oath, rendering them
professionally responsible for securing the interest of their
patients. Although corporations and health care systems
may have legal obligations and policies, physicians have
an individual, moral obligation to protect their patients'
privacy and data.
As the appetite for data to develop artificial intelligence,
precision medicine, and predictive analytics in
imaging grows, more radiologists will have to consider
how they wish to engage with their patients, their
patients' data, and the third parties looking for clinical
partners. The community of imaging scientists, ethicists,
radiology leaders, and informaticists must have their own
conversation about the appropriate way to deal with these
issues and help lead the way in developing capabilities in
the most just, ethical manner possible.
- Data privacy refers to the rights of individuals to
maintain control over their health data and information.
Confidentiality refers to the responsibility
of those entrusted with those data to maintain
privacy.
- Anticipating possible big data or artificial intelligence
approaches may require seeking consent at
multiple levels and creating mechanisms to deal
with those patients who do not wish to be included
in such exercises.
- In many cases, the analysis of big data using complex
computer algorithms may go beyond direct
human intelligence.
- Physicians must be deeply involved in the development,
validation, and implementation of big data
analytics, artificial intelligence, and personalized
medicine in healthcare.
