Implementing Partnership-driven Clinical Federated Electronic
Health Record Data Sharing Networks
Kari A. Stephens, PhD1,2,3, Nicholas Anderson, PhD4, Ching-Ping Lin, PhD5, Hossein Estiri,
PhD3
1Department of Psychiatry & Behavioral Sciences, University of Washington, Box 356560,
Seattle, WA, 98195
2Department of Biomedical Informatics & Medical Education, University of Washington, Box
358051, Seattle, WA, 98109
3Institute of Translational Health Sciences, University of Washington, Box 358051, Seattle, WA,
98109
4Department of Pathology and Laboratory Medicine, University of California Davis, Davis, CA,
95616
5Global REACH, Medical School, University of Michigan, 5113 Medical Science Building I, 1301
Catherine St, Ann Arbor, MI, 48109-5611
Abstract
Objective—Building federated data sharing architectures requires supporting a range of data
owners, effective and validated semantic alignment between data resources, and consistent focus
on end-users. Establishing these resources requires development methodologies that support
internal validation of data extraction and translation processes, sustaining meaningful partnerships,
and delivering clear and measurable system utility. We describe findings from two federated data
sharing case examples that detail critical factors, shared outcomes, and production environment
results.
Corresponding Author: Kari A. Stephens, Address: Psychiatry & Behavioral Sciences, University of Washington, Box 356560,
Seattle, WA 98195, kstephen@uw.edu, Phone: (206) 221-0349, Fax: (206) 543-9520.
Conflict of Interest
We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant
financial support for this work that could have influenced its outcome.
Published in final edited form as: Int J Med Inform. 2016 September; 93: 26–33. doi:10.1016/j.ijmedinf.2016.05.008.
Methods—Two federated data sharing pilot architectures developed to support network-based
research associated with the University of Washington’s Institute of Translational Health Sciences
provided the basis for the findings. A spiral model for implementation and evaluation was used to
structure iterations of development and support knowledge sharing between the two network
development teams, which cross collaborated to support and manage common stages.
Results—We found that using a spiral model of software development and multiple cycles of
iteration was effective in achieving early network design goals. Both networks required time and
resource intensive efforts to establish a trusted environment to create the data sharing
architectures. Both networks were challenged by the need for adaptive use cases to define and test
utility.
Conclusion—An iterative cyclical model of development provided a process for developing trust
with data partners and refining the design, and supported measurable success in the development
of new federated data sharing architectures.
Keywords
Information systems; Data sharing; Federated Networks; Implementation; Electronic Health
Records
1 INTRODUCTION
The broad adoption of electronic health record systems (EHRs) and efforts to align data
across disparate EHRs have led to advancements in research to improve public health. But barriers to establishing effective data sharing systems span technical, motivational, economic, legal, political, and ethical issues.[1] Data sharing has an integral role in reducing
the lag between research and clinical knowledge, products, and procedures that can improve
human health.[2] Bi-directional data sharing between clinical care and research
environments is crucial to advance improvements in patient care and overall population
health and essential to a Learning Healthcare System.[3] But creating data sharing systems
is complex and difficult.
Technical and methodological frameworks and guidelines for providing and integrating data
sharing infrastructures across multiple distinct and disparate clinical environments can
advance the ability for translational and comparative effectiveness research, and lead to
meaningful use and sharing of medical data.[4] However, there are no systematic efforts to
develop processes for creating data sharing architectures in public health environments.[1]
Published accounts addressing builds of data sharing infrastructures lack any systematic
application of well-established software development models. At present, implementations of data sharing systems are often supported by grant funding and require the development of broad engagement strategies between disparate environments. Sustainability of these
systems often becomes a challenge after initial investments support creation.[5] Software
model applications to architecture builds may lead to better sustainability.
1.1 BACKGROUND AND SIGNIFICANCE
Previous efforts to develop methods and tools that support clinical data sharing for research have lacked access to high-quality data sources.[6–8] Centralized approaches to data sharing are
limited by the scope of the data that network partners typically authorize for sharing and the
difficulty with keeping these data up to date.[4] Historically, limitations have also included
uneven common terminology expertise, challenges of trust and feasibility, and concerns for
privacy and security.[4,9–17] Storing data locally at partner sites and using federated
approaches to support data sharing is attractive because they simplify privacy and security
issues and clarify trust relationships.[18] However, the lack of standard terminologies and other semantic alignment issues remain a challenge, regardless of whether a centralized or federated model is used.[19] To date, large scale federated data sharing networks remain relatively
scarce, though successes have been increasing in domain-specific networks such as Regional
Health Information Organizations and cohort discovery pilots.[19–22] Growing concerns about enhanced HIPAA privacy laws may further limit data sharing efforts.[23]
The expanding use of health information technology, driven through efforts such as the 2009 HITECH Act and the meaningful use requirement of health information exchanges, has
created the need for effective data sharing methods across organizations to target evaluation
and implementation of evidence based, patient-centered clinical practices.[24–26]
Methodological approaches to developing federated data sharing networks need to be
testable and generalizable to multiple domains, users, and stakeholders. The NCATS
Clinical Translational Science Award (CTSA) consortium has provided a fertile environment
for building federated data sharing networks across a range of heterogeneous institutional
and community based clinical environments with a focus on translational science.
1.2 OBJECTIVE
We partnered across two network teams to implement and evaluate a software development
model for building federated electronic health record clinical data sharing architectures. We
describe the use of a common spiral model and the experience of developing two distinct
architectures. Implementation of the spiral model centered on partnership building across different clinical data environments and addressed the crucial role of partnerships in working with disparate electronic medical record platforms and workflows.
2 METHODS
2.1 Network Development Pilot Projects
The common goal of our network pilot projects was to implement architectures for federated
networks that could support research queries through a common set of terminologies and
business processes. The Data QUery, Extraction, Standardization, Translation (Data QUEST) project focused on data sharing across primary care based electronic health record (EHR) data domains (i.e., demographics, visits, problem lists, medications, labs, diagnoses, tests, various medical metrics and findings, etc.) across six primary care organizations in Washington and Idaho.[27] Data QUEST aimed to provide tools for sharing both de-
identified and identified data in aggregate form and at the patient level. The Cross-
Institutional Clinical Translational Research (CICTR) project targeted sharing five broad
data domains (i.e., demographics, medications, labs, diagnoses, and disposition data), with a
common domain of diabetes across acute care settings at three academic institutions
(University of Washington, University of California, San Francisco, and University of
California, Davis) with a focus on sharing de-identified aggregated data.[28] Both projects
used HIPAA guidance to define privacy handling of data prior to allowing research querying.
Both projects supported approaches that describe and document the data provenance.
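The exact privacy-handling routines are not detailed in this section; the following sketch is a minimal illustration, with hypothetical field names, of the kind of HIPAA-guided de-identification applied to patient-level records before research querying (direct identifiers dropped, birth dates converted to ages with a 90+ cap, and event dates generalized to the year).

from datetime import date

# Hypothetical direct-identifier fields; real extracts vary by EHR and project.
DIRECT_IDENTIFIERS = {"name", "mrn", "ssn", "address", "phone", "email"}

def deidentify(record: dict, as_of: date = date(2016, 1, 1)) -> dict:
    """Return a de-identified copy of one patient-level record:
    drop direct identifiers, replace birth_date with an age (capped at 90+),
    and keep only the year of the visit date."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    birth = out.pop("birth_date", None)
    if birth is not None:
        age = (as_of - birth).days // 365
        out["age"] = "90+" if age >= 90 else age  # ages 90 and over are aggregated
    if "visit_date" in out:
        out["visit_year"] = out.pop("visit_date").year
    return out

raw = {"mrn": "12345", "name": "Test Patient", "birth_date": date(1950, 6, 1),
       "visit_date": date(2015, 3, 2), "dx_code": "250.00"}
print(deidentify(raw))  # {'dx_code': '250.00', 'age': 65, 'visit_year': 2015}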
2.2 Procedure
Three primary categories of software models have been identified (free/open source software
(FOSS), plan-driven, and agile) with little progress made at creating comprehensive
reconciliation across these models.[29] However, recommendations for selecting an
appropriate model include achieving a balance between agility and discipline.[30] The
strength of FOSS lies in allowing stakeholders to address and refine a system based on
individual priorities and resources. This model did not provide a feasible approach, given that our partner sites must share resources and technical solutions to remain scalable in a diverse
health data sharing architecture environment. Plan-driven or waterfall models lack iterative
processes for achieving stakeholder engagement across cycles of development that provide
flexibility, buy in, and adaptability. Agile methods are iterative but rely on quick “sprints” through the phases of development to produce working systems for evaluation, which require intensive development and evaluation resources (clinical and technical) that our partners did not have.
To balance agility and discipline, we chose Boehm’s spiral model, used across many
commercial and defense projects, which included a focus on using a cyclic approach to grow
a system’s degree of definition and implementation while laying out anchor point milestones
to ensure stakeholder commitment to the defined solutions.[31–32] The spiral model was used to provide a clear process to guide our architecture development and included cycles for iteration, incremental development, and the right level of risk management and cultural compatibility for our environment. We analyzed project activities, milestones, stakeholder
priorities, and project documents using themes from Boehm’s spiral model of development,
which included four main phases in the software development lifecycle, to define additional
emerging themes. Each team, in partnership with project stakeholders, then reviewed and
iterated on the emerging themes and charted the history of the project across the theme areas
to develop initial project specific content for a draft spiral model. The resulting model was
adopted within the Data QUEST and CICTR project teams, guiding biomedical informatics
work within the projects. The model provided a frame to report and assess both individual
project and cross project successes and challenges.
3 RESULTS
3.1 The Partnership-driven Clinical Federated (PCF) Model
3.1.1 PCF Model Description—A generic spiral model for partnership-driven clinical
federated (PCF) data sharing, based on Boehm’s spiral model for software development,[31–
32] emerged from our iterative and qualitative based methods (Figure 1). This model
identified four themes to anchor the iterative process of development: 1) developing
partnerships, 2) defining system requirements, 3) determining technical architecture, and 4)
conducting effective promotion and evaluation.
The starting point was anchored in developing functional partnerships (Partnerships) that
define boundaries and drive system definition. As the data sharing system was defined
(System Requirements), implementation could occur (Technical Architecture), and finally
impact was assessed (Promotion and Evaluation) to ensure the utility and impact of the data
sharing network.
3.1.2 Cross-Project Model Validity—As each theme progressed through a single iteration of the model, it informed subsequent iterations and matured. Maturity of the model within each network occurred through iterations as well as across themes.
Partnerships began internally with core project teams and progressed to recruitment and
expansion to additional community partners, with final governance being addressed as
maturity was reached. System requirements were initially drafted and subsequently refined
as partnerships and technical architectures were developed. Use cases were initiated with the
development of initial partnerships and evolved over time into meaningful use cases that
addressed overall utility of the architecture. In tandem, pilot users were identified and
training, support, release, and marketing efforts were developed and implemented to ensure
system utility and sustainability. In general, the model provided a useful framework for
teams across projects to collaborate, report progress to each other, and share and iterate
lessons learned.
3.2 Data QUery, Extraction, Standardization, Translation (Data QUEST) Project
3.2.1 Individual PCF Model Cycle Iterations—The Data QUEST project conformed
to the PCF model cycles (see Figure 2 for a project specific model), iterating through four
cycles. Anchor point milestones were developed and adjusted as needed, based on technical
discoveries and partner requirements. The first cycle began with the team initiating
partnerships among our CTSA partners, then moved into drafts of technical requirements,
system testing of initially predefined technical architecture solutions, which subsequently
failed, and initial definitions of use cases. We formed internal partnerships between all
CTSA partners and convened regular meetings to discuss and evaluate adoption of the
HMORN Virtual Data Warehouse (VDW) model,[33] while the community engagement and
biomedical informatics subgroups began developing feasibility study methods to determine
selection of community based partners for the pilot. Development of feasibility methods
resulted in the rejection of the VDW as a technical solution, due to the technical requirement
for programming expertise among the community practice partners needed to work with the
SAS based architecture. Our community practice partners reported that they did not have
resources to support this level of on-site programming.
In the second cycle, we identified and approached initial community partnerships using the
feasibility study methods (i.e., a semi-structured interview involving research readiness and
technical capacity assessment), allowing for community partner input on technical
requirements. A search was conducted to identify a replacement solution for the VDW,
which led to e-PCRN.[34–35] The biomedical informatics team downloaded the e-PCRN
software for pilot testing; the test was unsuccessful and the software was deemed non-functional. Furthermore, the governance requirements dictated by the feasibility studies required on-site validation of all queries, which e-PCRN did not support, and co-development of the software to add this feature was untenable. Use cases across the CTSA
team and the community partners were iterated, cataloging community priorities for research
and achieving buy in from partners for the utility and need for use case definition. This also
led to redefining technical requirements to scale back functionality by excluding a tool requirement to conduct self-service queries against the federated data system.
In the third cycle, we solidified community partnerships and finalized selection of pilot
partners for installation of the technical architecture and updated partners with the changes
to the technical requirements to maintain continued buy in. National partnerships, most
notably with the DARTNet Institute, were engaged and promoted selection of a for-profit
vendor solution for extraction/translation/loading (ETL) tasks. Several vendors were
evaluated and a final vendor was selected, based on their extensive expertise with multiple
primary care based EHR vendor systems, ability to offer clinical decision support tools,
success with working in Practice Based Research Network settings with small rural based
primary care practices, and their existing relationship and good performance history with the
DARTNet Institute.[27] Evaluation groups were formed to refine pilot use cases based on
research priorities from the community partners.
In the fourth cycle, we introduced partners to the vendor and iterated and established
contracts and governance (i.e., Memorandums of Understanding, Data Use Agreements,
Business Associate Agreements, purchasing contracts) to allow initial installation of the
technical architecture. System requirements were adjusted to include vendor and partner
requirements, and dictionary architectures were begun, with dictionary efforts designed to
support marketing and dissemination of data sharing across the new network. After
completion of the fourth cycle, the technical architecture was deemed operational and initial
use case based extractions were performed.
Costs and resources used for the initial pilot build of Data QUEST included: 1) biomedical
informatics personnel (i.e., faculty lead and dedicated research assistant), 2) community
outreach CTSA faculty and staff (i.e., faculty lead and program manager), 3) CTSA
subsidized staff resources (i.e., system architects and analysts for consultation), and 4)
infrastructure contracts paid out to our vendor to conduct contracting and programming to
establish servers and the nightly ETL process. Our partner sites also subsidized: 1) staff time
to participate in working meetings and establish permissions with our vendor to set up a
server within their firewalls and 2) infrastructure support to house the server. The Data
QUEST pilot project was designed and implemented within the CTSA 5 year cycle.
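The vendor's production ETL process is not described here; as a rough sketch of the nightly extract-transform-load pattern into a site-hosted clinical data repository, the example below uses a hypothetical flat-file export, a placeholder code map, and SQLite as a stand-in for the SQL Server environments the pilot sites actually ran.

import csv
import sqlite3

# SQLite stands in for the site-local SQL Server CDR in this sketch.
conn = sqlite3.connect("site_cdr.db")
conn.execute("""CREATE TABLE IF NOT EXISTS medication (
    patient_id TEXT, med_name TEXT, standard_code TEXT, start_date TEXT)""")

# Placeholder local-to-standard code map; production mappings are far larger
# and were maintained by the ETL vendor.
LOCAL_TO_STANDARD = {"LOCAL_METFORMIN_500": "RXNORM:placeholder"}

def nightly_load(export_csv: str) -> None:
    """Extract a nightly EHR export, translate local codes, and reload the CDR."""
    with open(export_csv, newline="") as fh:
        rows = [(r["patient_id"], r["med_name"],
                 LOCAL_TO_STANDARD.get(r["local_code"]),  # None if unmapped
                 r["start_date"])
                for r in csv.DictReader(fh)]
    conn.execute("DELETE FROM medication")  # full refresh each night
    conn.executemany("INSERT INTO medication VALUES (?, ?, ?, ?)", rows)
    conn.commit()

# Example call for one hypothetical export file:
# nightly_load("ehr_export_medications.csv")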
In the current state and subsequent CTSA cycle, the Data QUEST architecture is fully
functional with daily refreshes of a federated set of clinical data repositories (CDR) across
pilot sites, housed physically behind firewalls within the partner practices in SQL Server
environments. In addition, data within the CDRs are stored in formats that semantically
align with national partner networks within the DARTNet Institute, allowing for cross
collaborations. The vendor and the DARTNet Institute have physical access to the CDRs and
provide manual data extractions as needed, governed by agreements designed and executed
within the development cycles. Several research projects to date, including local university
and national collaboration based projects, have been successfully conducted using the Data
QUEST architecture. All data extractions are conducted using Business Associate
Agreements and governance that provides honest brokerage, including compliance with
Health Insurance Portability and Accountability Act (HIPAA) regulations, Data Use
Agreements, and Data Transfer Agreements. Data remain owned by the local partners and
data owners approve each extraction as they are requested, with the ability to opt out at any
time. A tool (FindIT – Federated Information Dictionary Tool)[36] to catalogue data depth
and breadth is under development targeting data visualization of the Data QUEST network
to the research community, with an aim towards bridging researchers and community based
practices and increasing use of the EHR data to facilitate translational research among the
Data QUEST network partners. We have developed the architecture to support a centralized
de-identified warehouse adopting the Observational Medical Outcomes Partnership (OMOP)
ontology, harmonizing with other national data sharing architectures, with plans to continue
partner expansion of Data QUEST.
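The OMOP harmonization itself is only summarized above; as an illustrative sketch (the concept identifier and source-code mapping below are placeholders, not verified OMOP vocabulary content), loading a local diagnosis into an OMOP-style condition_occurrence table typically involves looking up a standard concept for the source code and retaining the source value for provenance.

import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified subset of the OMOP CDM condition_occurrence table.
conn.execute("""CREATE TABLE condition_occurrence (
    person_id INTEGER, condition_concept_id INTEGER,
    condition_start_date TEXT, condition_source_value TEXT)""")

# Placeholder map; in practice standard concepts come from the OMOP
# vocabulary tables (CONCEPT / CONCEPT_RELATIONSHIP).
SOURCE_TO_CONCEPT = {"ICD9CM:250.00": 999999}  # dummy concept_id

def load_diagnosis(person_id, source_code, start_date):
    concept_id = SOURCE_TO_CONCEPT.get(source_code, 0)  # 0 flags an unmapped code
    conn.execute("INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
                 (person_id, concept_id, start_date, source_code))

load_diagnosis(1, "ICD9CM:250.00", "2015-03-02")
print(conn.execute("SELECT * FROM condition_occurrence").fetchall())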
3.2.2 Cross-Cycle Observations—The Data QUEST project initiatives and priorities
were accounted for across the four themes. Two technical architecture failures occurred, but rather than stifling progress, the iterative nature of the model strengthened partnerships and supported refinement of feasible system requirements. The PCF model
provided natural feedback loops between themes to coordinate and problem solve issues
across themes. Initial partnership input was critical to the development of use cases and
development of the final technical architecture. The team engaged in micro iterations when
barriers occurred in testing the technical architecture solution, resulting in higher utilization
of time resources needed to engage and mature partnerships.
The team identified several barriers within themes that required mediation and further micro iterations within cycles; these resulted in time delays but ultimately did not compromise project success. Limited partner expertise or knowledge in informatics, for both institutional and community partners, and limited experience in being part of a multi-disciplinary institutional and community team were barriers to initially developing effective use cases.
Partners struggled with limited resources due to overly burdened clinical environments and
had limited experience with research practices. A primary outcome of the application of the
PCF model was the critical role of early engagement in partnerships to mediate identified
barriers through iteration. Despite real-world challenges to aligning partners and resources
in a collaborative effort, iterations continued across cycles, allowing individual themes to
mature.
3.3 Cross-Institutional Clinical Translational Research (CICTR) Project
3.3.1 Individual PCF Model Cycle Iterations—The CICTR project conformed to the
PCF model cycles (see Figure 3 for a project specific model), iterating through three cycles
with predefined anchor point milestones to move the project towards the first version release
within a 20 month timeline dictated by the pilot funding requirements. The first cycle
focused on defining stakeholder roles at the three partner sites, deploying common
Informatics for Integrating Biology & the Bedside (i2b2)[37] based technical architectures
and gaining appropriate regulatory approval. At the time, each of these sites had existing
familiarity with i2b2, but not with developing environments that could provide appropriate
hosting for large scale extracts from the local EHR systems. This first coordinated effort
resulted in each of the sites establishing the i2b2 environments as common virtualized
environments located within local server resources, and allowed for cross-site network and
server configuration. Each site then implemented their local i2b2 server stack against an
industry grade database system, either IBM DB2, Microsoft SQL Server, or Oracle 11. Each
of these database systems was unique to its local site, and due to licensing restrictions and
costs, requirements were defined that maintained these systems as local resources.
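For readers unfamiliar with the i2b2 data model, the sketch below shows the general shape of a patient-count query over a toy version of the i2b2 star schema (observation_fact joined to concept_dimension); the concept path and codes are hypothetical, and SQLite stands in for the DB2, SQL Server, and Oracle systems the sites actually used.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept_dimension (concept_path TEXT, concept_cd TEXT);
CREATE TABLE observation_fact (patient_num INTEGER, concept_cd TEXT, start_date TEXT);
INSERT INTO concept_dimension VALUES ('\\Diagnoses\\Diabetes\\250.00\\', 'ICD9:250.00');
INSERT INTO observation_fact VALUES (1, 'ICD9:250.00', '2015-01-05');
INSERT INTO observation_fact VALUES (2, 'ICD9:250.00', '2015-02-11');
""")

# Count distinct patients with any concept under a (hypothetical) diabetes folder.
COUNT_SQL = """
SELECT COUNT(DISTINCT f.patient_num)
FROM observation_fact f
JOIN concept_dimension c ON c.concept_cd = f.concept_cd
WHERE c.concept_path LIKE '\\Diagnoses\\Diabetes\\%'
"""
print(conn.execute(COUNT_SQL).fetchone()[0])  # -> 2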
The second cycle built upon the common deployed i2b2 architecture, and partners iterated
the access, common definition, and testing development for the four target data sources of
demographics, diagnoses, medications and laboratory tests. In parallel, the SHRINE[38]
network interfaces were installed and tests were defined to measure pilot “connectivity”
across the network. Use cases were collectively developed to test both within system and
across system validity for the first two data sources. As the systems were being launched,
each site devoted considerable effort to recruiting new domain stakeholders in the testing
and demonstration of the functionality. An early challenge was determining how to define and refine the use cases so as to allow for review and measurement across sites by the stakeholders who were local to an individual site, in a way that would inform better processes for designing and testing the remaining data sources. By the end of the second
phase, the network had completed basic validation of the first two data sources with
common tests at each site while maintaining and growing partner engagement.
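The connectivity and validity tests themselves are not specified in the text; the sketch below, with entirely hypothetical site names and counts, illustrates the general pattern of such a cross-site check: each site's locally validated patient count for a shared query definition is compared against the count returned through the federated network interface.

# Hypothetical per-site results for one shared query definition.
local_gold_counts = {"site_A": 1204, "site_B": 873, "site_C": 2011}  # run locally at each site
network_counts = {"site_A": 1204, "site_B": 870, "site_C": 2011}     # returned via the network

TOLERANCE = 5  # small drifts can arise from count obfuscation or refresh lag

def check_sites(local, network, tolerance=TOLERANCE):
    """Flag sites whose federated result drifts from the local gold standard."""
    report = {}
    for site, gold in local.items():
        got = network.get(site)
        if got is None:
            report[site] = "no response"
        elif abs(got - gold) > tolerance:
            report[site] = f"mismatch (local={gold}, network={got})"
        else:
            report[site] = "ok"
    return report

print(check_sites(local_gold_counts, network_counts))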
The third and final planned cycle sought to complete mapping of medication and laboratory
data across the three sites, and through this requirement identified significant challenges in defining the scope of the much larger mapping efforts involved in these data sources. As there
were limited resources at each site, semi-automated tools were tested, which revealed
challenges in cross-site evaluation of resulting mappings. A key project contribution was a
tool developed at University of California, San Francisco for terminology mapping, which
provided programmatic access to a standard medication coding system (RxNorm API).
Through this, each site sought to map a common reference medication formulary list of
ordered medications to a navigable (RxNorm-based) terminology tree. This was successful
with a very restricted set of medications, but stakeholders had difficulty navigating the
results. Further refinement of the use cases resulting from this led to a very restricted set of
laboratory tests associated specifically with diabetes diagnosis related groups, which in turn
required manual data extraction and mapping at each site. Both these data sources tested the
design / build / test partnerships across all sites. Throughout the experience of using the PCF model with CICTR, stakeholder engagement grew and became crucial to the design process as the project moved towards launch.
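The UCSF mapping tool itself is not included with this article; as an illustration of the kind of lookup it performed, the sketch below resolves formulary medication names to RxNorm concept identifiers through the public RxNav REST service at rxnav.nlm.nih.gov (error handling, batching, and the manual review the project still required are omitted).

import json
import urllib.parse
import urllib.request

RXNAV = "https://rxnav.nlm.nih.gov/REST/rxcui.json"

def to_rxcui(drug_name: str):
    """Return the RxNorm CUI(s) RxNav reports for a drug name, or [] if none."""
    url = f"{RXNAV}?name={urllib.parse.quote(drug_name)}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    return payload.get("idGroup", {}).get("rxnormId", [])

# Toy slice of a site formulary; the pilot's restricted medication set was larger,
# and navigating the resulting RxNorm tree still proved difficult for stakeholders.
for name in ["metformin", "lisinopril"]:
    print(name, to_rxcui(name))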
Costs and resources used for the initial pilot build of CICTR included: 1) biomedical
informatics personnel at each university site (i.e., faculty lead and programming staff), 2)
CTSA subsidized staff resources (i.e., system architects, programmers, and analysts for
consultation), and 3) infrastructure costs related to establishing the computing environment
at each site and the data loads. The CICTR pilot project was designed and implemented
within a 20 month timeframe to comply with funder requirements.
Through partnership in the fourth cycle, the project was adopted as a University of
California (UC) wide network, with an additional three sites added and additional support
for five years to bridge the five UC health campuses under the UC-Research Exchange (UC-
ReX) project. As a direct extension of CICTR, the PCF model of organization development
and management now drives the UC-ReX project through a semantic harmonization
lifecycle, and is capable of providing query access to a population in excess of 14
million patients. CICTR has also been adopted at the University of Washington as a self-
service tool (De-Identified Clinical Data Repository, DCDR) broadly offered to researchers
engaged with the CTSA who want access to cohort counts of patients.
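Details of how DCDR returns counts are not given here; the sketch below shows, with an illustrative threshold and toy numbers, a common pattern for de-identified cohort counts: only aggregates are returned, and small cells are masked before results are released.

MASK_BELOW = 10  # illustrative threshold; actual policies are institution-specific

def masked_count(raw_count: int, mask_below: int = MASK_BELOW) -> str:
    """Return a display-safe cohort count: exact when large enough,
    otherwise a masked range so very small cohorts are not disclosed."""
    if raw_count == 0:
        return "0"
    if raw_count < mask_below:
        return f"<{mask_below}"
    return str(raw_count)

# Toy counts a researcher might see for one cohort definition across sites.
for site, n in {"site_1": 5421, "site_2": 7, "site_3": 0}.items():
    print(site, masked_count(n))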
3.3.2 Cross-Cycle Observations—By the beginning of the fourth “sustainability” cycle of the project, and after completion of project funding, the CICTR project had developed a mature architecture representing four unique data sources on a collective total of over 5 million patients seen across three institutions. Throughout the project, the team used
regular iteration and feedback with stakeholders and identified and met challenges across
themes. Early expectations were set in the project that helped keep partners focused on use
cases that would be needed within the promotion and evaluation phases of the project.
But as the resolution of the required data from each site increased, so did the difficulty in developing broad, measurable test cases. The cycle process revealed that the project lacked engagement with actual users who sought data across institutions. Towards the third
cycle, the project used the PCF model and themes to assess expansion to additional sites,
and as a result engaged with a new set of comparative effectiveness researchers who
explicitly sought multi-site data discovery capabilities. This required developing a strategy
to engage the new stakeholders, using reflection on phases and outcomes to date, as well as
building common expectations for rapid development and evaluation. This process was captured
in documents and expectations for each of the four themes in the spiral, and translated into a
two month iterative development sprint that culminated in successful stakeholder-driven use
cases across the network.
The application of this iterative approach to growing a network was considered a success, evidenced by the development of a functional data sharing architecture, positive feedback, and continued user engagement. The PCF model also helped accommodate a tightly defined,
short timeline driven project scope. The iterative process served to support timely
information feedback for all parts of the process, and in turn kept the project group focused on common and clear goals.
3.4 Cross-Team Pilot Collaboration
Collaborative meetings were held between Data QUEST and CICTR biomedical informatics
team members, sharing progress, technical failures and solutions, ethical considerations,
governance documents, and trust building activities. The PCF spiral model was jointly
developed by the biomedical informatics teams and adopted to organize and help direct work
within the pilot projects, while providing a framework to cross share development activities.
In addition to the adoption of the spiral model itself, team members shared governance and
ethics work, including specifics for approaches to development of governance infrastructure
and supporting documents. Teams also shared technology findings including software,
ontology, and data quality experiences. In addition to the process the PCF model offered
each project, it also provided a structure for cross team collaboration.
Cross collaboration observations led to the discovery of common challenges and lessons
learned across both pilot projects. Both projects were centered in infrastructure based grant
proposals with no specific use case driven directive, which created ambiguity for the scope
of work. Use cases proved crucial for determining overall data requirements, as well as
offering initial test cases to evaluate the system during launch. In hindsight, use cases could
have been initially defined in early cycles given both pilot projects required multiple cycles
of iteration to define functional use case tests towards the end of the project lifespan. Early
definition of use cases may have increased efficiency overall. Both projects also suffered
from a lack of users standing at the ready to use the clinical data sharing architectures,
creating a need for expanding efforts to promote dissemination of use. Finally, both pilot
projects required resource and time intensive effort to create trust between partners, without
which the technical architectures could not have succeeded.
4 DISCUSSION
The spiral model offered a practical and flexible process for creating two new electronic
health record driven data sharing pilot architectures federated across multiple health care
organizations. Creating these complex clinical data sharing networks for research requires a
communications-driven process that prioritizes partner engagement throughout all aspects of
the project. The cycling supported by the PCF model across these projects ensured technical
requirements were iterated closely with partners and facilitated partner engagement in the
process. The two CTSA supported pilot project teams also benefited from cross
collaboration using the PCF model, particularly given the similarities in scope of work and
complexity of the socio-technical environments.
Both pilot projects were a success in creating functioning data sharing architectures across
federated systems, using multiple iterations within a spiral based PCF model of
development. The PCF model accounted for key project missions and provided structure,
context, and concrete direction for addressing barriers that were often associated with
limiting or preventing project success. Success of the PCF model is also evidenced by both
project teams opting to add additional cycle iterations to further mature their technical
architectures and both projects resulting in successful expansions of the initial projects.
4.1 Lessons Learned
Unexpectedly, both projects required additional cycles before sharing actual data across
partner data sites. The PCF model allowed for a heuristic evaluation that helped the teams
identify inefficient processes and a critical missing component, namely definition of key use
cases, which led to the need for additional cycles. Data sharing architectures require clearly
defined research or clinical questions and engaged end-users, which are vital for defining
system requirements. Both projects were funded specifically to pilot informatics
infrastructures with no anchoring specific clinical topic to guide specifications, use cases,
and ultimately a user base. As a lesson learned, we identified the crucial need to focus more
directly on developing research questions in earlier cycles to frame and provide context for
the project. For future architecture builds, creating use cases at the outset of project initiation
may limit the need for extra cycles.
Developing data sharing governance with partners impacted system design significantly and
should be considered as early as possible. Governance development should include
involvement from partners, experienced legal experts, and data extraction stakeholders, and
should be revisited during each incremental cycle. Governance stakeholders played a crucial role in system requirement definition, from the design and content to the technical mechanics of how data would be shared and the methods used to access the resulting data.
Lessons learned highlight the importance of flexibility in implementation management, the
on-going complexity of aligning data across each data site, challenges in engaging users, the
impact of governance issues on design, and the need to focus on system utility in the early
stages of development to sustain development across multi-disciplinary teams. Developing
trusted environments is complex and critical for project success and in the cases of these
pilot projects, achievable with appropriate resources.
4.2 Limitations
Limitations include the inability to compare the number of cycles and the overall resources
and costs needed for these pilot architectures to those of other similar efforts, given that clear baseline data on these metrics for other existing architectures have not been published consistently.
However, we were able to use a reasonable number of cycles for each of these pilot projects
and stay within resource constraints. We report on only a single approach for developing
these architectures, which did not allow for comparisons or testing of different software
development models. Future studies could address testing multiple software development
models, particularly given that the convergence and definition of aspects of these models remain an
active area of discovery.
5 CONCLUSION
We found that our spiral based PCF model was crucial for creating collaboration between
our two pilot projects building functional federated data sharing architectures and provided
great utility for promoting success and evaluating challenges within each pilot project.
Multiple national efforts have invested, and continue to invest, in the development of novel network-based data sharing infrastructure for research; cross collaborations would strengthen this work and likely increase success. The PCF model may help identify and establish necessary relationships and support early detection of barriers within and between teams.
Cohesive methods that focus on building appropriate early use cases, bringing in users, and systematically building trust among partners are needed to increase the implementation success of data sharing architecture development projects.
Future work would benefit from cross collaborations between similar data architecture
building projects, definition of use cases early in the process, and proper resources to
support work in building trust between partners. Use of software development models can
support this future work and help create standard processes for building these complex
architectures.
ACKNOWLEDGEMENTS
We would like to thank our Institute for Translational Health Sciences colleagues with the Data QUEST and the
CICTR projects and the DARTNet Institute. This research was supported by the National Center for Advancing
Translational Sciences, Clinical and Translational Science Award for the Institute for Translational Health Sciences
UL1TR000423 and Contract #HHSN268200700031C.
REFERENCES
1. Van Panhuis WG, Paul P, Emerson C, et al. A systematic review of barriers to data sharing in public
health. BMC Public Health. 2014;14:1144. doi:10.1186/1471-2458-14-1144. [PubMed: 25377061]
2. National Institutes of Health. NIH Data Sharing Policy. 2003. http://grants.nih.gov/grants/policy/data_sharing/.
3. Westfall JM, Mold J, and Fagnan L, Practice-Based Research--”Blue Highways” on the NIH
Roadmap. JAMA, 2007 297(4): p. 403–406. [PubMed: 17244837]
4. Diamond CC, Mostashari F, and Shirky C, Collecting And Sharing Data For Population Health: A
New Paradigm. Health Affairs, 2009 28(2): p. 454–466. [PubMed: 19276005]
5. Wilcox A, Randhawa G, Embi P, Cao H, and Kuperman G Sustainability Considerations for Health
Research and Analytic Data Infrastructures. eGEMs, 2014 2(2): Article 8.
6. Lazarus R, et al., Electronic Support for Public Health: validated case finding and reporting for
notifiable diseases using electronic medical data. J Am Med Inform Assoc, 2009 16(1): p. 18–24.
[PubMed: 18952940]
7. Zerhouni EA, Translational research: moving discovery to practice. Clin Pharmacol Ther, 2007
81(1): p. 126–8. [PubMed: 17186011]
8. Ash JS, Anderson NR, and Tarczy-Hornoch P, People and organizational issues in research systems
implementation. J Am Med Inform Assoc, 2008 15(3): p. 283–9. [PubMed: 18308986]
9. Murphy S, Chueh H, A Security Architecture for Query Tools used to Access Large Biomedical Databases, in American Medical Informatics Association. 2002: San Antonio, Texas.
10. Murphy S, et al., Instrumenting the health care enterprise for discovery research in the genomic
era. Genome Res, 2009.
11. Malin B, A computational model to protect patient data from location-based re-identification. Artif
Intell Med, 2007 40(3): p. 223–39. [PubMed: 17544262]
12. Ohm P, Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization.
University of Colorado Law Legal Studies Research Paper, 2009 09–12.
13. Evans B, Congress’ New Infrastructural Model of Medical Privacy. Notre Dame L. Rev, 2009 585:
p. 619–20.
14. Kass NE, An Ethics Framework for Public Health. Am J Public Health, 2001 91(11): p. 1776–
1782.
15. Nissenbaum H, Protecting privacy in an information age: The problem of privacy in public. Law and Philosophy, 1998 17: p. 559–596.
16. Karp DR, et al., Ethical and practical issues associated with aggregating databases. PLoS Med,
2008 5(9): p. e190.
17. MacKenzie S, Wyatt M, Schuff R, Tenenbaum J, Anderson N, Practices and Perspectives on
Building Integrated Data Repositories: Results from a 2010 CTSA Survey, J Am Med Inform
Assoc, 2012 19(e1): p. e119–e124.
18. Gupta A, et al., Federated access to heterogeneous information resources in the Neuroscience
Information Framework (NIF). Neuroinformatics, 2008 6(3): p. 205–217. [PubMed: 18958629]
19. Rosenbloom ST, et al., A model for evaluating interface terminologies. J Am Med Inform Assoc,
2008 15(1): p. 65–76. [PubMed: 17947616]
20. Hurd A, The federated advantage. Data exchange between healthcare organizations in RHIOs is a
hot topic. Can federated models end the debate? Health Manag Technol, 2008 29(4): p. 14,16.
21. Weber GM, et al., The Shared Health Research Information Network (SHRINE): a prototype
federated query tool for clinical data repositories. J Am Med Inform Assoc, 2009 16(5): p. 624–30.
[PubMed: 19567788]
22. Bradshaw RL, et al., Architecture of a federated query engine for heterogeneous resources. AMIA
Annu Symp Proc, 2009 2009: p. 70–4. [PubMed: 20351825]
23. McKinney M, HIPAA and HITECH: tighter control of patient data. Hosp Health Netw, 2009 83(6):
p. 50,52.
24. DesRoches CM, et al., Electronic health records in ambulatory care-a national survey of
physicians. N Engl J Med, 2008 359(1): p. 50–60. [PubMed: 18565855]
25. Wise PB, The meaning of meaningful use. Several technology applications are needed to qualify.
Healthc Exec. 25(3): p. 20–1.
26. Bates DW and Bitton A, The future of health information technology in the patient-centered
medical home. Health Aff (Millwood). 29(4): p. 614–21. [PubMed: 20368590]
27. Stephens KA, et al., LC Data QUEST: A technical architecture for community federated clinical
data sharing. In 2012 AMIA Summits Transl Sci Proc. AMIA: San Francisco, CA.
28. Anderson N, et al., Implementation of a de-identified federated data network for population-based
cohort discovery. Journal of the American Medical Informatics Association, 2011 26.
29. Magdaleno AM, Werner CML, & Araujo RM, Reconciling software development models: A quasi-
systematic review. Journal of Systems and Software, 2012 85(2): p. 351–369.
30. Boehm B, & Turner R, Using risk to balance agile and plan-driven methods. Computer, 2003
36(6): p. 57–66.
31. Boehm B, & Hansen WJ, The Spiral Model as a Tool for Evolutionary Acquisition. The Journal of Defense Software Engineering, 2001 14(5): p. 4–11.
32. Boehm BW, A spiral model of software development and enhancement. Computer, 1988 21(5): p.
61–72.
33. Hitz P, Johnson B, Meier J, Wasbotten B, & Haller I, PS3-23: VDW data source: Essentia Health. Clinical Medicine and Research, 2013 11(3): p. 178.
34. Nagykaldi S, et al., Improving collaboration between Primary Care Research Networks using
Access Grid technology. Informatics in Primary Care, 2008 16(1): p. 51–8.
35. Peterson KA, et al., A model for the electronic support of practice-based research networks. Annals
of Family Medicine, 2012 10(6): p. 560–7. [PubMed: 23149534]
36. Stephens KA, Lin C, Baldwin L, Echo-Hawk A, & Keppel G, A web-based tool for cataloging
primary care electronic medical record federated data: FInDiT. 2011, AMIA: Bethesda, MD.
37. Murphy SN et al., Serving the enterprise and beyond with informatics for integrating biology and
the bedside (i2b2). J Am Med Inform Assoc, 2010 17(2): p. 124–31. [PubMed: 20190053]
38. Weber GM, et al., The Shared Health Research Information Network (SHRINE): a prototype
federated query tool for clinical data repositories. J Am Med Inform Assoc, 2009 16(5): p. 624–30.
[PubMed: 19567788]
Highlights
We describe two federated data-sharing case examples
Using a spiral model and four themes we iterated the data-sharing
architectures
Cross collaboration between networks resulted in critical knowledge sharing
Summary points
Building federated data-sharing architectures requires supporting a range of
data owners, effective and validated semantic alignment between data
resources, and consistent focus on end-users.
Establishing these resources requires recognition of development
methodologies that support internal validation of data extraction and
translation processes, sustaining meaningful partnerships, and delivering clear
and measurable system utility.
Figure 1.
The Partnership-Driven Clinical Federated (PCF) Data sharing Model illustrates four
quadrants of themes used to define each iteration cycle of development.
Figure 2.
PCF Model Applied to Data QUEST detailing four cycles of iteration to mature the initial
launch of the data sharing architecture. Technical architecture failures are highlighted in red.
Figure 3.
PCF Model Applied to CICTR detailing three initial cycles and a fourth rapid sprint cycle to
mature the initial launch of the data sharing architecture. Project milestones owed to the
funders are highlighted in red.
... 20,[22][23][24][25] The Partnership-Driven Clinically Federated (PCF) Data Sharing Model provides a 4-quadrant framework that can enable data sharing for populations health communities studying disaster response through an iterative spiral model of development across (1) building collaborative partnerships, (2) defining system requirements, (3) developing technical architecture, and (4) conducting promotion and evaluation through measurable system utility. 26 The PCF model highlights the need for clear use case definitions to drive the iterative cycles of development, and the need to identify key facilitators and barriers to ensure that the technical system architecture and evaluation iterations lead to success. ...
... We asked about 4 main constructs informed by prior disaster research and the PCF model. [9][10][11]26 These constructs include (1) researcher role and expertise, (2) readiness for future hurricanes and floods, (3) barriers and facilitators to collaborative research, and (4) barriers and facilitators to data and technology adoption. Specifically, construct 1 is drawn from known challenges to prioritizing research gaps amid a disaster, 11 and constructs 2-4 are drawn from PCF's promotion and evaluation quadrant, collaborative partnerships quadrant, and both the system requirements and technical architecture quadrants, respectively. ...
... Barriers and facilitators to hurricane and flood disaster preparedness We used template analysis 32 in Dedoose to analyze interviews according to predefined themes aligning with the interview guide's 4 constructs. Informed by PCF and disaster research challenges, 11,26 the first author (J.P.) created the following codebook: For researcher roles and expertise (construct 1), we coded for research training, subject matter expertise, analytical methods, current occupational setting, and context in research (ie, place of focus, temporal scale, geospatial scale). For readiness for future hurricanes and floods (construct 2), we coded for indications of planning and preparations, barriers and facilitators they envisioned for future disaster research, and prior disaster management experience. ...
Article
Objective Information gaps that accompany hurricanes and floods limit researchers’ ability to determine the impact of disasters on population health. Defining key use cases for sharing complex disaster data with research communities and facilitators, and barriers to doing so are key to promoting population health research for disaster recovery. Materials and Methods We conducted a mixed-methods needs assessment with 15 population health researchers using interviews and card sorting. Interviews examined researchers’ information needs by soliciting barriers and facilitators in the context of their expertise and research practices. Card sorting ranked priority use cases for disaster preparedness. Results Seven barriers and 6 facilitators emerged from interviews. Barriers to collaborative research included process limitations, collaboration dynamics, and perception of research importance. Barriers to data and technology adoption included data gaps, limitations in information quality, transparency issues, and difficulty to learn. Facilitators to collaborative research included collaborative engagement and human resource processes. Facilitators to data and technology adoption included situation awareness, data quality considerations, adopting community standards, and attractive to learn. Card sorting prioritized 15 use cases and identified 30 additional information needs for population health research in disaster preparedness. Conclusions Population health researchers experience barriers to collaboration and adoption of data and technology that contribute to information gaps and limit disaster preparedness. The priority use cases we identified can help address information gaps by informing the design of supportive research tools and practices for disaster preparedness. Supportive tools should include information on data collection practices, quality assurance, and education resources usable during failures in electric or telecommunications systems.
... Ref [85] To structure iterations of development and facilitate knowledge sharing between two health network development teams coordination to support and manage shared phases, a spiral model for implementation and assessment was utilized. ...
Preprint
Full-text available
With the advent of the Internet of Things (IoT), Artificial Intelligence (AI), and Machine Learning (ML)/DeepLearning (DL) algorithms, the landscape of data-driven medical applications has emerged as a promising avenue for designing robust and scalable diagnostic and prognostic models from medical data. This has gained a lot of attention from both academia and industry, leading to significant improvements in healthcare quality. However, the adoption of AI-driven medical applications still faces tough challenges, including meeting security, privacy, and quality of service (QoS) standards. Recent developments in Federated Learning (FL) have made it possible to train complex machine-learned models in a distributed manner and has become an active research domain, particularly processing the medical data at the edge of the network in a decentralized way to preserve privacy and address security concerns. To this end, in this paper, we explore the present and future of FL technology in medical applications where data sharing is a significant challenge. We delve into the current research trends and their outcomes, unravelling the complexities of designing reliable and scalable FL models. Our paper outlines the fundamental statistical issues in FL, tackles device-related problems, addresses security challenges, and navigates the complexity of privacy concerns, all while highlighting its transformative potential in the medical field. Our study primarily focuses on medical applications of FL, particularly in the context of global cancer diagnosis. We highlight the potential of FL to enable computer-aided diagnosis tools that address this challenge with greater effectiveness than traditional data-driven methods. Recent literature has shown that FL models are robust and generalize well to new data, which is essential for medical applications. We hope that this comprehensive review will serve as a checkpoint for the field, summarizing the current state-of-the-art and identifying open problems and future research directions.
... A significant body of healthcare literature addressed EHR barriers such as the security of the health records (Al-Sharhan et al., 2019), patients' perceptions (Asan et al., 2016), trust (Stephens et al., 2016), and channel richness (Mirzaei and Esmaeilzadeh, 2021), but there are limited studies mitigating EHR barriers using new technologies, e.g. blockchain. ...
Article
Full-text available
A primary objective of blockchain technology is to address information security and efficiency issues related to existing information sharing systems. For the sharing of health records, little is known about the application of blockchain in management information systems. There are strict regulations for sharing health information due to insecure systems and privacy concerns. To significantly and effectively improve medical diagnosis, it is beneficial to have efficient, reliable, and accurate accessibility to a patient's full medical history. Due to concerns with the security of health systems and with privacy, medical histories are not always accessible to healthcare providers. To help increase accessibility options, this research proposes a blockchain-based model that facilitates sharing medical records in a manner that is beneficial to both healthcare providers and patients. Social exchange theory provides the theoretical support for the conceptual model presented. Experimental findings based on 151 participants revealed that blockchain technology can provide a secure information system and increase patient motivation to share medical records. To show the applicability of the proposed use of blockchain from a practical managerial perspective, we demonstrate feasibility by developing an Android software application.
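The hash-chaining idea underlying such blockchain-based sharing can be sketched in a few lines; this is a hypothetical toy (invented event payloads, no consensus protocol, no network), not the model or application the authors built.

```python
import hashlib
import json
import time

def make_block(payload: dict, prev_hash: str) -> dict:
    """Create a block whose hash covers the payload and the previous block's hash."""
    block = {"timestamp": time.time(), "payload": payload, "prev_hash": prev_hash}
    block["hash"] = hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

def verify_chain(chain: list) -> bool:
    """Recompute every hash and check the links; any tampering breaks verification."""
    for i, block in enumerate(chain):
        body = {k: v for k, v in block.items() if k != "hash"}
        if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != block["hash"]:
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

# Hypothetical sharing events: consent decisions and access logs, not the records themselves.
chain = [make_block({"event": "genesis"}, prev_hash="0" * 64)]
chain.append(make_block({"patient": "p-001", "event": "consent_granted", "scope": "cardiology"},
                        chain[-1]["hash"]))
chain.append(make_block({"patient": "p-001", "event": "record_accessed", "by": "provider-17"},
                        chain[-1]["hash"]))

print(verify_chain(chain))                   # True
chain[1]["payload"]["scope"] = "all"
print(verify_chain(chain))                   # False: tampering is detectable
```

Keeping only consent and access events on the chain, with the records themselves stored off-chain, is one common way such designs reconcile immutability with health-data regulations.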
... These systems will combine patient data, clinician-specific data and competencies, and healthcare system data including staffing, patient census, and supply availability to optimize healthcare delivery and improve equity. This can occur only when large-scale data across different institutions and healthcare systems can be securely combined into a single identity-protected mega-dataset for widespread use (16,17). Such large-scale federation of patient, clinician, and healthcare system data will refine practices in different environments and help detect and mitigate disparities in healthcare delivery. ...
Article
Full-text available
While technological innovations are the invariable crux of speculation about the future of critical care, they cannot replace the clinician at the bedside. This article summarizes the work of the Society of Critical Care Medicine–appointed multiprofessional task force on the Future of Critical Care. The Task Force notes that critical care practice will be transformed by novel technologies, integration of artificial intelligence decision support algorithms, and advances in seamless data operationalization across diverse healthcare systems and geographic regions and within federated datasets. Yet, new technologies will be relevant and meaningful only if they improve the very human endeavor of caring for someone who is critically ill.
... The WPRN is partnered with the Pacific Northwest Node of the NIDA National Drug Abuse Treatment Clinical Trials Network (CTN). Four independent small and medium-sized community-based primary care organizations in the WPRN with 21 clinics compose Data QUEST, a technical infrastructure that supports harmonization of EHR data across disparate primary care practices for the purpose of translational research (Cole, Stephens, Keppel, Estiri, & Baldwin, 2016;Stephens et al., 2012;Stephens, Anderson, Lin, & Estiri, 2016). The 21 primary care clinics within these organizations include community health clinics that provide primary care to patients with limited income, insurance, or other resources, as well as rural critical access hospital-associated clinics that mostly serve patients in small towns and rural areas. ...
Article
Full-text available
Background Patients with a substance use disorder (SUD) often present with co-occurring chronic conditions in primary care. Despite the high co-occurrence of chronic medical conditions and SUD, little is known about whether chronic condition outcomes or related service utilization in primary care varies between patients with versus without documented SUDs. This study examined whether having a SUD influenced the use of primary care services and common chronic condition outcomes for patients with diabetes, hypertension, and obesity. Methods A longitudinal cohort observational study examined electronic health record data from 21 primary care clinics in Washington and Idaho to examine differences in service utilization and clinical outcomes for diabetes, hypertension, and obesity in patients with and without a documented SUD diagnosis. Differences between patients with and without documented SUD diagnoses were compared over a three-year window for clinical outcome measures, including hemoglobin A1c, systolic and diastolic blood pressure, and body mass index, as well as service outcome measures, including number of encounters with primary care and co-located behavioral health providers, and orders for prescription opioids. Adult patients (N = 10,175) diagnosed with diabetes, hypertension, or obesity before the end of 2014, and who had ≥2 visits across a three-year window including at least one visit in 2014 (baseline) and at least one visit occurring 12 months or longer after the 2014 visit (follow-up) were examined. Results Patients with SUD diagnoses and co-occurring chronic conditions were seen by providers more frequently than patients without SUD diagnoses (p's < 0.05), and patients with SUD diagnoses were more likely to be prescribed opioid medications. Chronic condition outcomes were no different for patients with versus without SUD diagnoses. Discussion Despite the higher visit rates to providers in primary care, a majority of patients with SUD diagnoses and chronic medical conditions in primary care did not get seen by co-located behavioral health providers, who can potentially provide and support evidence informed care for both SUD and chronic conditions. Patients with chronic medical conditions also were more likely to get prescribed opioids if they had an SUD diagnosis. Care pathway innovations for SUDs that include greater utilization of evidence-informed co-treatment of SUDs and chronic conditions within primary care settings may be necessary for improving care overall for patients with comorbid SUDs and chronic conditions.
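A hedged sketch of the kind of baseline-versus-follow-up comparison this study describes, using an invented flattened extract (all column names and values are assumptions, not the study's data model):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 1000

# Hypothetical per-patient extract with baseline and follow-up measures.
patients = pd.DataFrame({
    "patient_id": range(n),
    "has_sud": rng.random(n) < 0.12,
    "a1c_baseline": rng.normal(7.8, 1.2, n),
    "a1c_followup": rng.normal(7.6, 1.2, n),
    "pcp_visits": rng.poisson(4, n),
    "opioid_rx": rng.random(n) < 0.08,
})
patients["a1c_change"] = patients["a1c_followup"] - patients["a1c_baseline"]

# Compare utilization and outcomes for patients with vs. without a documented SUD.
summary = patients.groupby("has_sud").agg(
    n=("patient_id", "size"),
    mean_pcp_visits=("pcp_visits", "mean"),
    pct_opioid_rx=("opioid_rx", "mean"),
    mean_a1c_change=("a1c_change", "mean"),
)
print(summary.round(2))
```

The published analysis adjusts these comparisons statistically; the sketch only shows the shape of the grouped analysis over the three-year window.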
... Conducting and advancing biomedical research has long been and remains an essential part of the mission of academic health centers (AHCs). Recent and ongoing initiatives focused on accelerating clinical, translational, and foundational biomedical research have led AHCs to invest in infrastructure and capabilities that enable and support research activities which are increasingly information and data management-intensive [1][2][3]. Research informatics and information technology (research IT) encompasses technological, human and organizational resources, systems and methods that manage and analyze data, information and knowledge to improve biomedical and health research [4]. The success and growth of institutions' research enterprise relies upon advanced research IT solutions and capabilities, with investments in such infrastructure steadily rising [5,6]. ...
Article
Full-text available
This paper proposes the creation and application of maturity models to guide institutional strategic investment in research informatics and information technology (research IT) and to provide the ability to measure readiness for clinical and research infrastructure as well as sustainability of expertise. Conducting effective and efficient research in health science increasingly relies upon robust research IT systems and capabilities. Academic health centers are increasing investments in health IT systems to address operational pressures, including rapidly growing data, technological advances, and increasing security and regulatory challenges associated with data access requirements. Current approaches for planning and investment in research IT infrastructure vary across institutions and lack comparable guidance for evaluating investments, resulting in inconsistent approaches to research IT implementation across peer academic health centers as well as uncertainty in linking research IT investments to institutional goals. Maturity models address these issues through coupling the assessment of current organizational state with readiness for deployment of potential research IT investment, which can inform leadership strategy. Pilot work in maturity model development has ranged from using them as a catalyst for engaging medical school IT leaders in planning at a single institution to developing initial maturity indices that have been applied and refined across peer medical schools.
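To make the maturity-model idea concrete, here is a deliberately simple scoring sketch; the dimensions, ratings, and weights are invented for illustration and are not the indices developed in the pilot work.

```python
# Hypothetical research-IT maturity dimensions rated 1 (initial) to 5 (optimized),
# weighted by assumed institutional priorities.
ratings = {"data_governance": 3, "secure_computing": 4, "clinical_data_access": 2,
           "workforce_expertise": 3, "sustainability_planning": 2}
weights = {"data_governance": 0.25, "secure_computing": 0.20, "clinical_data_access": 0.25,
           "workforce_expertise": 0.15, "sustainability_planning": 0.15}

score = sum(ratings[d] * weights[d] for d in ratings)
print(f"weighted maturity score: {score:.2f} / 5")
for dim, rating in sorted(ratings.items(), key=lambda kv: kv[1]):
    flag = "investment gap" if rating <= 2 else "adequate"
    print(f"  {dim}: {rating} ({flag})")
```

Rolling ratings up this way is what lets leadership compare readiness across institutions and tie proposed research IT investments to the lowest-scoring dimensions.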
... For instance, the Data Query, Extraction, Standardization, Translation (Data QUEST) and Cross-Institutional Clinical Translational Research (CICTR) projects utilize a federated model in which local partners store their own data and approve each data extraction. These health data platforms provide a socio-technical approach to providing data sharing infrastructure for demographic and medical visit data [55], subject to the Health Insurance Portability and Accountability Act (HIPAA), institutional requirements, and sector-based ethical concerns. ...
Conference Paper
Data too sensitive to be "open" for analysis and re-purposing typically remains "closed" as proprietary information. This dichotomy undermines efforts to make algorithmic systems more fair, transparent, and accountable. Access to proprietary data in particular is needed by government agencies to enforce policy, researchers to evaluate methods, and the public to hold agencies accountable; all of these needs must be met while preserving individual privacy and firm competitiveness. In this paper, we describe an integrated legal-technical approach provided by a third-party public-private data trust designed to balance these competing interests. Basic membership allows firms and agencies to enable low-risk access to data for compliance reporting and core methods research, while modular data sharing agreements support a wide array of projects and use cases. Unless specifically stated otherwise in an agreement, all data access is initially provided to end users through customized synthetic datasets that offer a) strong privacy guarantees, b) removal of signals that could expose competitive advantage, and c) removal of biases that could reinforce discriminatory policies, all while maintaining fidelity to the original data. We find that using synthetic data in conjunction with strong legal protections over raw data strikes a balance between transparency, proprietorship, privacy, and research objectives. This legal-technical framework can form the basis for data trusts in a variety of contexts.
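A minimal sketch of the synthetic-release idea, assuming a toy sensitive table and fitting only per-column marginals; production data trusts preserve joint structure and add formal guarantees (e.g., differential privacy) that this illustration omits.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Hypothetical sensitive table held by the data trust.
raw = pd.DataFrame({
    "region": rng.choice(["north", "south", "east"], size=500, p=[0.5, 0.3, 0.2]),
    "trips_per_day": rng.poisson(30, size=500),
})

# Fit simple per-column summaries ...
region_probs = raw["region"].value_counts(normalize=True)
mean_trips = raw["trips_per_day"].mean()

# ... and release synthetic rows drawn from those summaries instead of raw records.
synthetic = pd.DataFrame({
    "region": rng.choice(region_probs.index.to_numpy(), size=500,
                         p=region_probs.to_numpy()),
    "trips_per_day": rng.poisson(mean_trips, size=500),
})
print(synthetic.head())
```

End users analyze the synthetic rows by default, while access to the raw data stays behind the legal agreements the paper describes.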
... Data QUEST includes patient-level data stored securely within local practices' firewalls in aligned data repositories and uses a federated data-sharing infrastructure to support regulation-compliant governance of data between primary care partners and researchers. 17,18 Data QUEST's infrastructure and governance maintain compliance with Health Insurance Portability and Accountability Act (HIPAA) and Institutional Review Board (IRB) regulations. Data QUEST targets extraction from main data domains in the EHRs, including demographics, vital signs, diagnosis codes, diagnostic test results, social and family history, prescriptions and physical examination findings. ...
Article
Full-text available
We use prescription of statin medications and prescription of warfarin to explore the capacity of electronic health record data to (1) describe cohorts of patients prescribed these medications and (2) identify cohorts of patients with evidence of adverse events related to prescription of these medications. This study was conducted in the WWAMI region Practice and Research Network (WPRN), a network of primary care practices across Washington, Wyoming, Alaska, Montana and Idaho, using DataQUEST, an electronic data-sharing infrastructure. We used electronic health record data to describe cohorts of patients prescribed statin or warfarin medications and reported the proportions of patients with adverse events. Among the 35,445 active patients, 1745 received at least one statin prescription and 301 received at least one warfarin prescription. Only 3 percent of statin patients had evidence of myopathy; 51 patients (17% of those prescribed warfarin) had a bleeding complication. Primary-care electronic health record data can effectively be used to identify patients prescribed specific medications and patients potentially experiencing medication adverse events.
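In the spirit of the cohort definitions above, here is a hedged pandas sketch (hypothetical tables and simple string matching; real queries would use RxNorm and ICD code sets against the federated extracts):

```python
import pandas as pd

# Hypothetical flattened EHR extracts.
prescriptions = pd.DataFrame({
    "patient_id": [1, 1, 2, 3, 4],
    "medication": ["atorvastatin", "warfarin", "simvastatin", "warfarin", "lisinopril"],
})
diagnoses = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "diagnosis": ["myopathy", "hypertension", "gi_bleed"],
})

statin_pts = set(prescriptions.loc[
    prescriptions["medication"].str.contains("statin"), "patient_id"])
warfarin_pts = set(prescriptions.loc[
    prescriptions["medication"] == "warfarin", "patient_id"])

# Flag possible adverse events by joining each cohort back to diagnosis records.
myopathy = set(diagnoses.loc[diagnoses["diagnosis"] == "myopathy", "patient_id"]) & statin_pts
bleeds = set(diagnoses.loc[diagnoses["diagnosis"] == "gi_bleed", "patient_id"]) & warfarin_pts

print(f"statin cohort: {len(statin_pts)}, with myopathy: {len(myopathy)}")
print(f"warfarin cohort: {len(warfarin_pts)}, with bleeding complication: {len(bleeds)}")
```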
Article
Background: Most people with alcohol or opioid use disorders (AUD or OUD) are not diagnosed or treated for these conditions in primary care. This study takes a critical step toward quantifying service gaps and directing improvement efforts for AUD and OUD by using electronic health record (EHR) data from diverse primary care organizations to quantify the extent to which AUD and OUD are underdiagnosed and undertreated in primary care practices. Methods: We extracted and integrated diagnosis, medication, and behavioral health visit data from the EHRs of 21 primary care clinics within four independent healthcare organizations representing community health centers and rural hospital-associated clinics in the Pacific Northwest United States. Rates of documented AUD and OUD diagnoses, pharmacological treatments, and behavioral health visits were evaluated over a two-year period (2015-2016). Results: Out of 47,502 adult primary care patients, 1476 (3.1%) had documented AUD; of these, 115 (7.8%) had orders for AUD medications and 271 (18.4%) had at least one documented visit with a non-physician behavioral health specialist. Only 402 (0.8%) patients had documented OUD, and of these, 107 (26.6%) received OUD medications and 119 (29.6%) had at least one documented visit with a non-physician behavioral health specialist. Rates of AUD diagnosis and AUD and OUD medications were higher in clinics that had co-located non-physician behavioral health specialists. Conclusions: AUD and OUD are underdiagnosed and undertreated within a sample of independent primary care organizations serving mostly rural patients. Primary care organizations likely need service models, technologies, and workforces, including non-physician behavioral health specialists, to improve capacities to diagnose and treat AUD and OUD.
Article
Full-text available
Objective: To provide an open source, interoperable, and scalable data quality assessment tool for evaluation and visualization of completeness and conformance in electronic health record (EHR) data repositories. Materials and methods: This article describes the tool's design and architecture and gives an overview of its outputs using a sample dataset of 200 000 randomly selected patient records with an encounter since January 1, 2010, extracted from the Research Patient Data Registry (RPDR) at Partners HealthCare. All the code and instructions to run the tool and interpret its results are provided in the Supplementary Appendix. Results: DQe-c produces a web-based report that summarizes data completeness and conformance in a given EHR data repository through descriptive graphics and tables. Results from running the tool on the sample RPDR data are organized into 4 sections: load and test details, completeness test, data model conformance test, and test of missingness in key clinical indicators. Discussion: Open science, interoperability across major clinical informatics platforms, and scalability to large databases are key design considerations for DQe-c. Iterative implementation of the tool across different institutions directed us to improve the scalability and interoperability of the tool and find ways to facilitate local setup. Conclusion: EHR data quality assessment has been hampered by implementation of ad hoc processes. The architecture and implementation of DQe-c offer valuable insights for developing reproducible and scalable data science tools to assess, manage, and process data in clinical data repositories.
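The following is not DQe-c itself, only a small pandas sketch of the two families of checks its report summarizes (completeness and conformance), run against an invented table:

```python
import pandas as pd

# Hypothetical slice of an EHR repository table.
encounters = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, None],
    "encounter_date": ["2015-02-01", "2015-13-09", None, "2016-07-22", "2016-01-15"],
    "systolic_bp": [122, 410, 118, None, 135],
})

# Completeness: share of non-missing values per column.
completeness = encounters.notna().mean()

# Conformance: do values parse and fall in plausible ranges?
valid_dates = pd.to_datetime(encounters["encounter_date"], errors="coerce").notna()
plausible_bp = encounters["systolic_bp"].between(60, 260)

report = pd.DataFrame({
    "completeness": completeness,
    "conformance": [None, valid_dates.mean(), plausible_bp.mean()],
})
print(report.round(2))
```

DQe-c wraps checks of this kind in a scalable pipeline and renders them as a web-based report; the sketch only shows the underlying arithmetic.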
Article
Full-text available
The United States has made recent large investments in creating data infrastructures to support the important goals of patient-centered outcomes research (PCOR) and comparative effectiveness research (CER), with still more investment planned. These initial investments, while critical to the creation of the infrastructures, are not expected to sustain them much beyond the initial development. To provide the maximum benefit, the infrastructures need to be sustained through innovative financing models while providing value to PCOR and CER researchers. Based on our experience with creating flexible sustainability strategies (i.e., strategies that are adaptive to the different characteristics and opportunities of a resource or infrastructure), we define specific factors that are important considerations in developing a sustainability strategy. These factors include assets, expansion, complexity, and stakeholders. Each factor is described, with examples of how it is applied. These factors are dimensions of variation in different resources, to which a sustainability strategy should adapt. We also identify specific important considerations for maintaining an infrastructure, so that the long-term intended benefits can be realized. These observations are presented as lessons learned, to be applied to other sustainability efforts. We define the lessons learned, relating them to the defined sustainability factors as interactions between factors. Using perspectives and experiences from a diverse group of experts, we define broad characteristics of sustainability strategies and important observations, which can vary for different projects. Other descriptions of adaptive, flexible, and successful models of collaboration between stakeholders and data infrastructures can expand this framework by identifying other factors for sustainability, and give more concrete directions on how sustainability can be best achieved.
Article
Full-text available
Background: In the current information age, the use of data has become essential for decision making in public health at the local, national, and global level. Despite a global commitment to the use and sharing of public health data, this can be challenging in reality. No systematic framework or global operational guidelines have been created for data sharing in public health. Barriers at different levels have limited data sharing but have only been anecdotally discussed or in the context of specific case studies. Incomplete systematic evidence on the scope and variety of these barriers has limited opportunities to maximize the value and use of public health data for science and policy. Methods: We conducted a systematic literature review of potential barriers to public health data sharing. Documents that described barriers to sharing of routinely collected public health data were eligible for inclusion and reviewed independently by a team of experts. We grouped identified barriers in a taxonomy for a focused international dialogue on solutions. Results: Twenty potential barriers were identified and classified in six categories: technical, motivational, economic, political, legal and ethical. The first three categories are deeply rooted in well-known challenges of health information systems for which structural solutions have yet to be found; the last three have solutions that lie in an international dialogue aimed at generating consensus on policies and instruments for data sharing. Conclusions: The simultaneous effect of multiple interacting barriers ranging from technical to intangible issues has greatly complicated advances in public health data sharing. A systematic framework of barriers to data sharing in public health will be essential to accelerate the use of valuable information for the global good.
Article
Full-text available
Purpose: The principal goal of the electronic Primary Care Research Network (ePCRN) is to enable the development of an electronic infrastructure to support clinical research activities in primary care practice-based research networks (PBRNs). We describe the model that the ePCRN developed to enhance the growth and to expand the reach of PBRN research. Methods: Use cases and activity diagrams were developed from interviews with key informants from 11 PBRNs from the United States and United Kingdom. Discrete functions were identified and aggregated into logical components. Interaction diagrams were created, and an overall composite diagram was constructed describing the proposed software behavior. Software for each component was written and aggregated, and the resulting prototype application was pilot tested for feasibility. A practical model was then created by separating application activities into distinct software packages based on existing PBRN business rules, hardware requirements, network requirements, and security concerns. Results: We present an information architecture that provides for essential interactions, activities, data flows, and structural elements necessary for providing support for PBRN translational research activities. The model describes research information exchange between investigators and clusters of independent data sites supported by a contracted research director. The model was designed to support recruitment for clinical trials, collection of aggregated anonymous data, and retrieval of identifiable data from previously consented patients across hundreds of practices. Conclusions: The proposed model advances our understanding of the fundamental roles and activities of PBRNs and defines the information exchange commonly used by PBRNs to successfully engage community health care clinicians in translational research activities. By describing the network architecture in a language familiar to that used by software developers, the model provides an important foundation for the development of electronic support for essential PBRN research activities.
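A minimal sketch of the federated query pattern such PBRN architectures support: each practice evaluates an eligibility rule locally and only aggregate counts cross the firewall (the site tables and the rule are invented):

```python
import pandas as pd

# Hypothetical local EHR tables, one per independent practice site.
site_a = pd.DataFrame({"age": [34, 67, 45, 71], "dx": ["htn", "dm2", "dm2", "htn"]})
site_b = pd.DataFrame({"age": [52, 29, 63], "dx": ["dm2", "asthma", "dm2"]})

def local_count(df: pd.DataFrame) -> int:
    """Runs inside the practice firewall; only the count leaves the site."""
    eligible = df[(df["dx"] == "dm2") & (df["age"] >= 40)]
    return len(eligible)

# The coordinating center sees only per-site aggregates, never patient rows.
counts = {"site_a": local_count(site_a), "site_b": local_count(site_b)}
print(counts, "| total potentially eligible:", sum(counts.values()))
```

Recruitment then proceeds by asking the sites that report eligible patients to contact them under their own governance, keeping identifiable data local.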
Article
Full-text available
The University of Washington Institute of Translational Health Sciences is engaged in a project, LC Data QUEST, building data sharing capacity in primary care practices serving rural and tribal populations in the Washington, Wyoming, Alaska, Montana, Idaho region to build research infrastructure. We report on the iterative process of developing the technical architecture for semantically aligning electronic health data in primary care settings across our pilot sites and tools that will facilitate linkages between the research and practice communities. Our architecture emphasizes sustainable technical solutions for addressing data extraction, alignment, quality, and metadata management. The architecture provides immediate benefits to participating partners via a clinical decision support tool and data querying functionality to support local quality improvement efforts. The FInDiT tool catalogues type, quantity, and quality of the data that are available across the LC Data QUEST data sharing architecture. These tools facilitate the bi-directional process of translational research.
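Semantic alignment of this kind amounts to mapping heterogeneous local values onto a shared vocabulary before data are federated; here is a hedged toy version (invented local test names and mapping table, standing in for curation against LOINC or a similar terminology):

```python
import pandas as pd

# Hypothetical site-specific lab results using local test names.
site_labs = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "local_test_name": ["HGBA1C", "Hemoglobin A1c (POC)", "A1C %"],
    "value": [6.8, 7.4, 9.1],
})

# Site-maintained mapping from local names to a shared concept and unit.
concept_map = pd.DataFrame({
    "local_test_name": ["HGBA1C", "Hemoglobin A1c (POC)", "A1C %"],
    "shared_concept": ["hemoglobin_a1c"] * 3,
    "unit": ["%"] * 3,
})

aligned = site_labs.merge(concept_map, on="local_test_name", how="left")
unmapped = int(aligned["shared_concept"].isna().sum())
print(aligned[["patient_id", "shared_concept", "value", "unit"]])
print("unmapped rows needing curation:", unmapped)
```

A metadata catalogue in the spirit of FInDiT would then record, per site, which shared concepts are populated, how completely, and at what quality.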
Article
Full-text available
Since its original publication [1], the spiral development model diagrammed in Figure 1 has been used successfully in many defense and commercial projects. To extend this base of success, the Department of Defense (DoD) has recently rewritten the defense acquisition regulations to incorporate "evolutionary acquisition," an acquisition strategy designed to mesh well with spiral development. In particular, DoD Instruction 5000.2 subdivides acquisition [2]: "There are two ... approaches, evolutionary and single step to full capability. An evolutionary approach is preferred. … [In this] approach, the ultimate capability delivered to the user is divided into two or more blocks, with increasing increments of capability." (p. 20) Here, a block corresponds to a single product release. The text goes on to specify the use of spiral development within blocks: "For both the evolutionary and single-step approaches, software development shall follow an iterative spiral development process in which continually expanding software versions are based on learning from earlier development." (p. 20) Given this reliance on the spiral development model, an in-depth definition is appropriate. Two recent workshops provided one. The Software Engineering Institute held two workshops last year to study spiral development and identify a set of critical success factors and recommended approaches. Their results appear in two reports [3, 4] and are available on the workshop Web site, www.sei.cmu.edu/cbs/spiral2000. The first author's presentations at these workshops defined spiral development and are summarized below. The definition was first converted to a report [5], where details, suggestions, and further references can be found. Additionally, a follow-on article, appearing in a later CrossTalk issue, will address the relationships among spiral development, evolutionary acquisition, and the Integrated Capability Maturity Model.
Article
Full-text available
Both agile and plan-driven approaches have situation-dependent shortcomings that, if not addressed, can lead to project failure. The challenge is to balance the two approaches to take advantage of their strengths in a given situation while compensating for their weaknesses. The authors present a risk-based approach for structuring projects to incorporate both agile and plan-driven approaches in proportion to a project's needs.
Article
This article opens discussion of a starkly new approach for protecting the privacy of Americans' sensitive health information. Last year, Congress empowered the U.S. Food and Drug Administration (FDA) to oversee development of a major new national infrastructure: a large-scale data network, the Sentinel System, that aims to include health data for 100 million Americans by 2012. This marked the first time since the end of the New Deal that a wholly new infrastructure regulatory mandate had been issued at the federal level. This important development, buried in drug-safety provisions of the Food and Drug Administration Amendments Act of 2007 (FDAAA), went largely unnoticed, as did the fact that Congress cast medical privacy, a hot-button issue for many members of the American public, as an infrastructure regulatory problem. Individuals are not empowered to make autonomous decisions about permissible uses and disclosures of their health data. Instead, Congress authorized FDA to decide whether proposed disclosures meet a statutorily defined public-interest standard. If so, then the disclosures are lawful without individual privacy authorization or informed consent. Within limits that this article explores, FDA can approve the release of private health data, including data in identifiable form, to private operators of Sentinel System infrastructure and to outside data users, including academic and commercial entities. This article describes the new privacy model, which was implicit in the statute Congress passed but far from obvious on its face. The goal is not to oppose the new approach. Congress was responding to serious public concern about the safety of FDA-approved products. This article accepts that this new privacy model exists and explores directions for implementing it in a manner that will be least corrosive of public trust. The goal is to elicit ongoing dialogue about appropriate institutional protections for the 100 million Americans whose data soon will be in this vast data network. FDA is, in many respects, an accidental infrastructure regulator, thrust into a new role strikingly different from its longstanding product-safety mandate. Fortunately, the challenges FDA now faces are not new ones. U.S. infrastructure regulators, in a wide variety of industry contexts, have harnessed private capital to build new infrastructures to serve defined public interests while protecting vulnerable classes. Lessons from these other contexts can shed light on appropriate governance structures for the Sentinel System. For example, privacy protection may be enhanced by eschewing vertical integration in favor of segregating certain key infrastructure functions that require access to identifiable data. It may be better to establish core privacy protections via rulemaking rather than through contracts and to centralize certain key discretionary decisions rather than delegating them to private, commercial decision-makers. Public trust will require strong due-process protections, regulatory independence, and a well-funded system of regulatory oversight; approaches employed by other infrastructure regulators may help address these concerns. The single greatest threat to privacy will come as FDA faces pressure to approve wide ancillary sales of Sentinel System data to help defray costs of system development. To make this system financeable while enforcing strong privacy protections, FDA should deploy its limited available funds to support a well-thought-out infrastructure financing facility that backstops clear privacy policies with appropriate political risk guarantees for private infrastructure investors.
Article
Computer scientists have recently undermined our faith in the privacy-protecting power of anonymization, the name for techniques for protecting the privacy of individuals in large databases by deleting information like names and social security numbers. These scientists have demonstrated they can often 'reidentify' or 'deanonymize' individuals hidden in anonymized data with astonishing ease. By understanding this research, we will realize we have made a mistake, labored beneath a fundamental misunderstanding, which has assured us much less privacy than we have assumed. This mistake pervades nearly every information privacy law, regulation, and debate, yet regulators and legal scholars have paid it scant attention. We must respond to the surprising failure of anonymization, and this Article provides the tools to do so.
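The linkage attack at the heart of this argument is easy to sketch; the quasi-identifiers and both tables below are invented, but the mechanics mirror the published reidentification demonstrations:

```python
import pandas as pd

# "Anonymized" release: names removed, quasi-identifiers retained.
released = pd.DataFrame({
    "zip": ["98195", "98195", "98103"],
    "birth_year": [1971, 1984, 1971],
    "sex": ["F", "F", "M"],
    "diagnosis": ["depression", "asthma", "diabetes"],
})

# Auxiliary public data with identities (e.g., a voter roll).
auxiliary = pd.DataFrame({
    "name": ["A. Jones", "B. Smith"],
    "zip": ["98195", "98103"],
    "birth_year": [1971, 1971],
    "sex": ["F", "M"],
})

# Joining on quasi-identifiers reattaches names wherever the combination is unique.
linked = auxiliary.merge(released, on=["zip", "birth_year", "sex"], how="inner")
print(linked[["name", "diagnosis"]])
```

Because a handful of attributes is often unique within a population, deleting direct identifiers alone gives much weaker protection than the word "anonymized" suggests.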
Article
Philosophical and legal theories of privacy have long recognized the relationship between privacy and information about persons. They have, however, focused on personal, intimate, and sensitive information, assuming that with public information, and information drawn from public spheres, either privacy norms do not apply, or applying privacy norms is so burdensome as to be morally and legally unjustifiable. Against this preponderant view, I argue that information and communications technology, by facilitating surveillance, by vastly enhancing the collection, storage, and analysis of information, by enabling profiling, data mining and aggregation, has significantly altered the meaning of public information. As a result, a satisfactory legal and philosophical understanding of a right to privacy, capable of protecting the important values at stake in protecting privacy, must incorporate, in addition to traditional aspects of privacy, a degree of protection for privacy in public.
Article
Clinical integrated data repositories (IDRs) are poised to become a foundational element of biomedical and translational research by providing the coordinated data sources necessary to conduct retrospective analytic research and to identify and recruit prospective research subjects. The Clinical and Translational Science Award (CTSA) consortium's Informatics IDR Group conducted a 2010 survey of consortium members to evaluate recent trends in IDR implementation and use to support research between 2008 and 2010. A web-based survey based in part on a prior 2008 survey was developed and deployed to 46 national CTSA centers. A total of 35 separate organizations completed the survey (74%), representing 28 CTSAs and the National Institutes of Health Clinical Center. Survey results suggest that individual organizations are progressing in their approaches to the development, management, and use of IDRs as a means to support a broad array of research. We describe the major trends and emerging practices below.