Content uploaded by Kari Stephens
Author content
All content in this area was uploaded by Kari Stephens on Oct 31, 2019
Content may be subject to copyright.
Implementing Partnership-driven Clinical Federated Electronic
Health Record Data Sharing Networks
Kari A. Stephens, PhD1,2,3, Nicholas Anderson, PhD4, Ching-Ping Lin, PhD5, Hossein Estiri,
PhD3
1Department of Psychiatry & Behavioral Sciences, University of Washington, Box 356560,
Seattle, WA, 98195
2Department of Biomedical Informatics & Medical Education, University of Washington, Box
358051, Seattle, WA, 98109
3Institute of Translational Health Sciences, University of Washington, Box 358051, Seattle, WA,
98109
4Department of Pathology and Laboratory Medicine, University of California Davis, Davis, CA,
95616
5Global REACH, Medical School, University of Michigan, 5113 Medical Science Building I, 1301
Catherine St, Ann Arbor, MI, 48109-5611
Abstract
Objective—Building federated data sharing architectures requires supporting a range of data
owners, effective and validated semantic alignment between data resources, and consistent focus
on end-users. Establishing these resources requires development methodologies that support
internal validation of data extraction and translation processes, sustaining meaningful partnerships,
and delivering clear and measurable system utility. We describe findings from two federated data
sharing case examples that detail critical factors, shared outcomes, and production environment
results.
Corresponding Author: Kari A. Stephens, Address: Psychiatry & Behavioral Sciences, University of Washington, Box 356560,
Seattle, WA 98195, kstephen@uw.eduPhone: (206) 221-0349, Fax: (206)543-9520.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our
customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of
the resulting proof before it is published in its final citable form. Please note that during the production process errors may be
discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Conflict of Interest
We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant
financial support for this work that could have influenced its outcome.
We confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the
criteria for authorship but are not listed. We further confirm that the order of authors listed in the manuscript has been approved by all
of us.
We confirm that we have given due consideration to the protection of intellectual property associated with this work and that there are
no impediments to publication, including the timing of publication, with respect to intellectual property. In so doing we confirm that
we have followed the regulations of our institutions concerning intellectual property.
We understand that the Corresponding Author is the sole contact for the Editorial process (including Editorial Manager and direct
communications with the office). He/she is responsible for communicating with the other authors about progress, submissions of
revisions and final approval of proofs. We confirm that we have provided a current, correct email address which is accessible by the
Corresponding Author and which has been configured to accept email from kstephen@uw.edu.
HHS Public Access
Author manuscript
Int J Med Inform
. Author manuscript; available in PMC 2019 October 13.
Published in final edited form as:
Int J Med Inform
. 2016 September ; 93: 26–33. doi:10.1016/j.ijmedinf.2016.05.008.
Author Manuscript Author Manuscript Author Manuscript Author Manuscript
Methods—Two federated data sharing pilot architectures developed to support network-based
research associated with the University of Washington’s Institute of Translational Health Sciences
provided the basis for the findings. A spiral model for implementation and evaluation was used to
structure iterations of development and support knowledge share between the two network
development teams, which cross collaborated to support and manage common stages.
Results—We found that using a spiral model of software development and multiple cycles of
iteration was effective in achieving early network design goals. Both networks required time and
resource intensive efforts to establish a trusted environment to create the data sharing
architectures. Both networks were challenged by the need for adaptive use cases to define and test
utility.
Conclusion—An iterative cyclical model of development provided a process for developing trust
with data partners and refining the design, and supported measureable success in the development
of new federated data sharing architectures.
Keywords
Information systems; Data sharing; Federated Networks; Implementation; Electronic Health
Records
1 INTRODUCTION
The broad adoption of electronic health record systems (EHRs) and efforts to align data
across disparate EHRs have led to advancements in research to improve public health. But
barriers to establish effective data sharing systems range across technical, motivational,
economic, legal, political, and ethical issues.[1] Data sharing has an integral role in reducing
the lag between research and clinical knowledge, products, and procedures that can improve
human health.[2] Bi-directional data sharing between clinical care and research
environments is crucial to advance improvements in patient care and overall population
health and essential to a Learning Healthcare System.[3] But creating data sharing systems
is complex and difficult.
Technical and methodological frameworks and guidelines for providing and integrating data
sharing infrastructures across multiple distinct and disparate clinical environments can
advance the ability for translational and comparative effectiveness research, and lead to
meaningful use and sharing of medical data.[4] However, there are no systematic efforts to
develop processes for creating data sharing architectures in public health environments.[1]
Published accounts addressing builds of data sharing infrastructures lack any systematic
application of well-established software development models. At present, implementation of
data sharing systems are often supported by grant funding and require the development of
broad engagement strategies between disparate environments. Sustainability of these
systems often becomes a challenge after initial investments support creation.[5] Software
model applications to architecture builds may lead to better sustainability.
Stephens et al. Page 2
Int J Med Inform
. Author manuscript; available in PMC 2019 October 13.
Author Manuscript Author Manuscript Author Manuscript Author Manuscript
1.1 BACKGROUND AND SIGNIFICANCE
Previous efforts in developing methods and tools to support clinical data sharing for research
lack access to high quality data sources.[6–8] Centralized approaches to data sharing are
limited by the scope of the data that network partners typically authorize for sharing and the
difficulty with keeping these data up to date.[4] Historically, limitations have also included
uneven common terminology expertise, challenges of trust and feasibility, and concerns for
privacy and security.[4,9–17] Storing data locally at partner sites and using federated
approaches to support data sharing is attractive because they simplify privacy and security
issues and clarify trust relationships.[18] However, no standard use of terminologies and
other semantic alignment issues remain a challenge, regardless of a centralized versus
federated model.[19] To date, large scale federated data sharing networks remain relatively
scarce, though successes have been increasing in domain-specific networks such as Regional
Health Information Organizations and cohort discovery pilots.[19–22] Growing concerns of
enhanced HIPAA privacy laws may further limit data sharing efforts.[23]
The expanding use of heath information technology, driven through efforts such as the 2009
HITECH act and the meaningful use requirement of health information exchanges, has
created the need for effective data sharing methods across organizations to target evaluation
and implementation of evidence based, patient-centered clinical practices.[24–26]
Methodological approaches to developing federated data sharing networks need to be
testable and generalizable to multiple domains, users, and stakeholders. The NCATS
Clinical Translational Science Award (CTSA) consortium has provided a fertile environment
for building federated data sharing networks across a range of heterogeneous institutional
and community based clinical environments with a focus on translational science.
1.2 OBJECTIVE
We partnered across two network teams to implement and evaluate a software development
model for building federated electronic health record clinical data sharing architectures. We
describe the use of a common spiral model and the experience of developing two distinct
architectures. Implementation of the spiral model centrally incorporated partnership building
across different clinical data environments and addressed the crucial role of partnerships and
disparate electronic medical record platforms and workflows.
2 METHODS
2.1 Network Development Pilot Projects
The common goal of our network pilot projects was to implement architectures for federated
networks that could support research queries through a common set of terminologies and
business processes. The Data QUery, Extraction, Standardization, Translation (Data
QETEST) project focused on data sharing across primary care based electronic health record
(EHR) data domains (i.e., demographics, visits, problem lists, medications, labs, diagnoses,
tests, various medical metrics and findings, etc.) across six primary care organizations in
Washington and Idaho.[27] Data QUEST is aimed to provide tools for sharing both de-
identified and identified data in aggregate form and at the patient level. The Cross-
Institutional Clinical Translational Research (CICTR) project targeted sharing five broad
Stephens et al. Page 3
Int J Med Inform
. Author manuscript; available in PMC 2019 October 13.
Author Manuscript Author Manuscript Author Manuscript Author Manuscript
data domains (i.e., demographics, medications, labs, diagnoses, and disposition data), with a
common domain of diabetes across acute care settings at three academic institutions
(University of Washington, University of California, San Francisco, and University of
California, Davis) with a focus of sharing de-identified aggregated data.[28] Both projects
used HIPAA guidance to define privacy handling of data prior to allowing research querying.
Both projects supported approaches that describe and document the data provenance.
2.2 Procedure
Three primary categories of software models have been identified (free/open source software
(FOSS), plan-driven, and agile) with little progress made at creating comprehensive
reconciliation across these models.[29] However, recommendations for selecting an
appropriate model include achieving a balance between agility and discipline.[30] The
strength of FOSS lies in allowing stakeholders to address and refine a system based on
individual priorities and resources. This model did not provide a feasible approach, given
our partner sites must share resources and technical solutions to remain scalable in a diverse
health data sharing architecture environment. Plan-driven or waterfall models lack iterative
processes for achieving stakeholder engagement across cycles of development that provide
flexibility, buy in, and adaptability. Agile methods are iterative but rely on quick “sprints’
through the phases of development to produce working systems for evaluation, which
require intensive development resources and evaluation resources (clinical and technical)
from our partners that they did not have.
To balance agility and discipline, we chose Boehm’s spiral model, used across many
commercial and defense projects, which included a focus on using a cyclic approach to grow
a system’s degree of definition and implementation while laying out anchor point milestones
to ensure stakeholder commitment to the defined solutions.[31–32] The spiral model was
used to provide clear process to guide our architecture development and included cycles for
iteration, incremental development, and the right level of risk management and cultural
compatibility for our environment. We analyzed project activities, milestones, stakeholder
priorities, and project documents using themes from Boehm’s spiral model of development,
which included four main phases in the software development lifecycle, to define additional
emerging themes. Each team, in partnership with project stakeholders, then reviewed and
iterated on the emerging themes and charted the history of the project across the theme areas
to develop initial project specific content for a draft spiral model. The resulting model was
adopted within the Data QUEST and CICTR project teams, guiding biomedical informatics
work within the projects. The model provided a frame to report and assess both individual
project and cross project successes and challenges.
3 RESULTS
3.1 The Partnership-driven Clinical Federated (PCF) Model
3.1.1 PCF Model Description—A generic spiral model for partnership-driven clinical
federated (PCF) data sharing, based on Boehm’s spiral model for software development,[31–
32] emerged from our iterative and qualitative based methods (Figure 1). This model
identified four themes to anchor the iterative process of development: 1) developing
Stephens et al. Page 4
Int J Med Inform
. Author manuscript; available in PMC 2019 October 13.
Author Manuscript Author Manuscript Author Manuscript Author Manuscript
partnerships, 2) defining system requirements, 3) determining technical architecture, and 4)
conducting effective promotion and evaluation.
The starting point was anchored in developing functional partnerships (Partnerships) that
define boundaries and drive system definition. As the data sharing system was defined
(System Requirements), implementation could occur (Technical Architecture), and finally
impact was assessed (Promotion and Evaluation) to ensure the utility and impact of the data
sharing network.
3.1.2 Cross-Project Model Validity—As each theme progressed through a single
iteration of the model, they informed the subsequent iterations and matured. Maturity of the
model within each network occurred through iterations as well as across themes.
Partnerships began internally with core project teams and progressed to recruitment and
expansion to additional community partners, with final governance being addressed as
maturity was reached. System requirements were initially drafted and subsequently refined
as partnerships and technical architectures were developed. Use cases were initiated with the
development of initial partnerships and evolved over time into meaningful use cases that
addressed overall utility of the architecture. In tandem, pilot users were identified and
training, support, release, and marketing efforts were developed and implemented to ensure
system utility and sustainability. In general, the model provided a useful framework for
teams across projects to collaborate, report progress to each other, and share and iterate
lessons learned.
3.2 Data QUery, Extraction, Standardization, Translation (Data QUEST) Project
3.2.1 Individual PCF Model Cycle Iterations—The Data QUEST project conformed
to the PCF model cycles (see Figure 2 for a project specific model), iterating through four
cycles. Anchor point milestones were developed and adjusted as needed, based on technical
discoveries and partner requirements. The first cycle began with the team initiating
partnerships within our CTSA partners, then moved into drafts of technical requirements,
system testing of initially predefined technical architecture solutions, which subsequently
failed, and initial definitions of use cases. We formed internal partnerships between all
CTSA partners and convened regular meetings to discuss and evaluate adoption of the
HMORN Virtual Data Warehouse (VDW) model,[33] while the community engagement and
biomedical informatics subgroups began developing feasibility study methods to determine
selection of community based partners for the pilot. Development of feasibility methods
resulted in the rejection of the VDW as a technical solution, due to the technical requirement
for programming expertise among the community practice partners needed to work with the
SAS based architecture. Our community practice partners reported that they did not have
resources to support this level of on-site programming.
In the second cycle, we identified and approached initial community partnerships using the
feasibility study methods (i.e., a semi-structured interview involving research readiness and
technical capacity assessment), allowing for community partner input on technical
requirements. A search was conducted to identify a replacement solution for the VDW,
which led to e-PCRN.[34–35] The biomedical informatics team downloaded the e-PCRN
Stephens et al. Page 5
Int J Med Inform
. Author manuscript; available in PMC 2019 October 13.
Author Manuscript Author Manuscript Author Manuscript Author Manuscript
software for pilot test and the test was unsuccessful, with the software deemed non-
functional. Furthermore, the governance requirements dictated from the feasibility studies
required on-site validation of all queries, which e-PCRN did not support and co-
development of the software to add this feature was untenable. Use cases across the CTSA
team and the community partners were iterated, cataloging community priorities for research
and achieving buy in from partners for the utility and need for use case definition. This also
led to redefining technical requirements to scale back functionality by excluding a tool
requirement to conduct self service queries against the federated data system.
In the third cycle, we solidified community partnerships and finalized selection of pilot
partners for installation of the technical architecture and updated partners with the changes
to the technical requirements to maintain continued buy in. National partnerships, most
notably with the DARTNet Institute, were engaged and promoted selection of a for-profit
vendor solution for extraction/translation/loading (ETL) tasks. Several vendors were
evaluated and a final vendor was selected, based on their extensive expertise with multiple
primary care based EHR vendor systems, ability to offer clinical decision support tools,
success with working in Practice Based Research Network settings with small rural based
primary care practices, and their existing relationship and good performance history with the
DARTNet Institute.[27] Evaluation groups were formed to refine pilot use cases based on
research priorities from the community partners.
In the fourth cycle, we introduced partners to the vendor and iterated and established
contracts and governance (i.e., Memorandums of Understanding, Data Use Agreements,
Business Associate Agreements, purchasing contracts) to allow initial installation of the
technical architecture. System requirements were adjusted to include vendor and partner
requirements, and dictionary architectures were begun, with dictionary efforts designed to
support marketing and dissemination of data sharing across the new network. After
completion of the fourth cycle, the technical architecture was deemed operational and initial
use case based extractions were performed.
Costs and resources used for the initial pilot build of Data QUEST included: 1) biomedical
informatics personnel (i.e., faculty lead and dedicated research assistant), 2) community
outreach CTSA faculty and staff (i.e., faculty lead and program manager), 3) CTSA
subsidized staff resources (i.e., system architects and analysts for consultation), and 4)
infrastructure contracts paid out to our vendor to conduct contracting and programming to
establish servers and the nightly ETL process. Our partner sites also subsidized: 1) staff time
to participate in working meets and establish permissions with our vendor to establish a
server within their firewalls and 2) infrastructure support to house the server. The Data
QUEST pilot project was designed and implemented within the CTSA 5 year cycle.
In the current state and subsequent CTSA cycle, the Data QUEST architecture is fully
functional with daily refreshes of a federated set of clinical data repositories (CDR) across
pilot sites, housed physically behind firewalls within the partner practices in SQL Server
environments. In addition, data within the CDRs are stored in formats that semantically
align with national partner networks within the DARTNet Institute, allowing for cross
collaborations. The vendor and the DARTNet Institute have physical access to the CDRs and
Stephens et al. Page 6
Int J Med Inform
. Author manuscript; available in PMC 2019 October 13.
Author Manuscript Author Manuscript Author Manuscript Author Manuscript
provide manual data extractions as needed, governed by agreements designed and executed
within the development cycles. Several research projects to date, including local university
and national collaboration based projects have been successfully conducted using the Data
QUEST architecture. All data extractions are conducted using Business Associate
Agreements and governance that provides honest brokerage, including compliance with
Heath Insurance Portability and Accountability Act (HIPAA) regulations, Data Use
Agreements, and Data Transfer Agreements. Data remain owned by the local partners and
data owners approve each extraction as they are requested, with the ability to opt out at any
time. A tool (FindIT – Federated Information Dictionary Tool)[36] to catalogue data depth
and breadth is under development targeting data visualization of the Data QUEST network
to the research community, with an aim towards bridging researchers and community based
practices and increasing use of the EHR data to facilitate translational research among the
Data QUEST network partners. We have developed the architecture to support a centralized
de-identified warehouse adopting the Observation Medical Outcomes Partnership (OMOP)
ontology, harmonizing with other national data sharing architectures and plans to continue
partner expansion of Data QUEST.
3.2.2 Cross-Cycle Observations—The Data QUEST project initiatives and priorities
were accounted for across the four themes. Two technical architecture failures occurred, and
rather than having progress stifled, the iterative nature of the model strengthened
partnerships and supported refinement of feasible system requirements. The PCF model
provided natural feedback loops between themes to coordinate and problem solve issues
across themes. Initial partnership input was critical to the development of use cases and
development of the final technical architecture. The team engaged in micro iterations when
barriers occurred in testing the technical architecture solution, resulting in higher utilization
of time resources needed to engage and mature partnerships.
The team identified several barriers requiring mediation within themes that required further
micro iterations within cycles resulting in time delays, but ultimately did not compromise
project success. Limited partner expertise or knowledge in informatics, for both institutional
and community partners and limited experience in being part of a multi-disciplinary
institutional and community team were barriers to initially developing effective use cases.
Partners struggled with limited resources due to overly burdened clinical environments and
had limited experience with research practices. A primary outcome of the application of the
PCF model was the critical role of early engagement in partnerships to mediate identified
barriers through iteration. Despite real-world challenges to aligning partners and resources
in a collaborative effort, iterations continued across cycles, allowing individual themes to
mature.
3.3 Cross-Institutional Clinical Translational Research (CICTR) Project
3.3.1 Individual PCF Model Cycle Iterations—The CICTR project conformed to the
PCF model cycles (see Figure 3 for a project specific model), iterating through three cycles
with predefined anchor point milestones to move the project towards the first version release
within a 20 month timeline dictated by the pilot funding requirements. The first cycle
focused on defining stakeholder roles at the three partner sites, deploying common
Stephens et al. Page 7
Int J Med Inform
. Author manuscript; available in PMC 2019 October 13.
Author Manuscript Author Manuscript Author Manuscript Author Manuscript
Informatics for Integrating Biology & the Bedside (i2b2)[37] based technical architectures
and gaining appropriate regulatory approval. At the time, each of these sites had existing
familiarity with i2b2, but not in developing environments that could provide appropriate
hosting for large scale extracts from the local EHR systems. This first coordinated effort
resulted in each of the sites establishing the i2b2 environments as common virtualized
environments located within local server resources, and allowed for cross-site network and
server configuration. Each site then implemented their local i2b2 server stack against an
industry grade database system, either IBM DB2, Microsoft SQL Server, or Oracle 11. Each
of these database systems were unique to local sites, and due to licensing restrictions and
costs, requirements were defined that maintained these systems as local resources.
The second cycle built upon on the common deployed i2b2 architecture and partners iterated
the access, common definition, and testing development for the four target data sources of
demographics, diagnoses, medications and laboratory tests. In parallel, the SHRINE[38]
network interfaces were installed and tests were defined to measure pilot “connectivity”
across the network. Use cases were collectively developed to test both within system and
across system validity for the first two data sources. As the systems were being launched,
each site devoted considerable effort to recruiting new domain stakeholders in the testing
and demonstration of the functionality. An early challenge was in determining how to define
and refine the use cases as to allow for review and measurement across sites by the
stakeholders who were local to an individual site, in a way that would inform better
processes for designing and testing the remaining data sources. By the end of the second
phase, the network had completed basic validation of the first two data sources with
common tests at each site while maintaining and growing partner engagement.
The third and final planned cycle sought to complete mapping of medication and laboratory
data across the three sites, and through this requirement identified significant challenges for
defining scope of the much larger mapping efforts involved in these data sources. As there
were limited resources at each site, semi-automated tools were tested, which revealed
challenges in cross-site evaluation of resulting mappings. A key project contribution was a
tool developed at University of California, San Francisco for terminology mapping, which
provided programmatic access to a standard medication coding system (RxNorm API).
Through this, each site sought to map a common reference medication formulary list of
ordered medications to a navigable (RxNorm-based) terminology tree. This was successful
with a very restricted set of medications, but stakeholders had difficulty navigating the
results. Further refinement of the use cases resulting from this led to a very restricted set of
laboratory tests associated specifically with diabetes diagnosis related groups, which in turn
required manual data extraction and mapping at each site. Both these data sources tested the
design / build / test partnerships across all sites. Throughout the experience of using the PCF
model with CICTR, stakeholder engagement grew critically and was crucial to the design
process as the project moved towards launch.
Costs and resources used for the initial pilot build of CICTR included: 1) biomedical
informatics personnel at each university site (i.e., faculty lead and programming staff), 2)
CTSA subsidized staff resources (i.e., system architects, programmers, and analysts for
consultation), and 3) infrastructure costs related to establishing the computing environment
Stephens et al. Page 8
Int J Med Inform
. Author manuscript; available in PMC 2019 October 13.
Author Manuscript Author Manuscript Author Manuscript Author Manuscript
at each site and the data loads. The CICTR pilot project was designed and implemented
within a 20 month timeframe to comply with funder requirements.
Through partnership in the fourth cycle, the project was adopted as a University of
California (UC) wide network, with an additional three sites added and additional support
for five years to bridge the five UC health campuses under the UC-Research Exchange (UC-
ReX) project. As a direct extension of CICTR, the PCF model of organization development
and management now drives the UC-ReX project through a semantic harmonization
lifecycle, and is now capable of providing query access to a population in excess of 14
million patients. CICTR has also been adopted at the University of Washington as a self-
service tool (De-Identified Clinical Data Repository, DCDR) broadly offered to researchers
engaged with the CTSA who want access to cohort counts of patients.
3.3.2 Cross-Cycle Observations—By the beginning of the fourth “sustainability”
cycle of the project and after completion of project funding, the CICTR project had
developed a mature architecture representing four unique data sources on a collective total of
over 5 million patient lives seen in three institutions. Throughout the project, the team used
regular iteration and feedback with stakeholders and identified and met challenges across
themes. Early expectations were set in the project that helped keep partners focused on use
cases that would be needed within the promotion and evaluation phases of the project.
But as the resolution of the required data from each site increased, so did the difficulty in
developing broad measureable test cases. The cycle process revealed that the project lacked
engagement with actual users who sought data and across institutions. Towards the third
cycle, the project used the PCF model and themes to assess expansion to additional sites,
and as a result engaged with a new set of comparative effectiveness researchers who
explicitly sought multi-site data discovery capabilities. This required developing a strategy
to engage the new stakeholders, using reflection on phases and outcomes to date, as well as
build common expectations for rapid development and evaluation. This process was captured
in documents and expectations for each of the four themes in the spiral, and translated into a
two month iterative development sprint that culminated in successful stakeholder-driven use
cases across the network.
The application of this iterative approach to growing a network was considered a success
due to successful development of a functional data sharing architecture, positive feedback,
and continued user engagement. The PCF model also helped accommodate a tightly defined,
short timeline driven project scope. The iterative process served to support timely
information feedback for all parts of the process, and in turn maintained the project group on
common and clear goals.
3.4 Cross-Team Pilot Collaboration
Collaborative meetings were held between Data QUEST and CICTR biomedical informatics
team members, sharing progress, technical failures and solutions, ethical considerations,
governance documents, and trust building activities. The PCF spiral model was jointly
developed by the biomedical informatics teams and adopted to organize and help direct work
within the pilot projects, while providing a framework to cross share development activities.
Stephens et al. Page 9
Int J Med Inform
. Author manuscript; available in PMC 2019 October 13.
Author Manuscript Author Manuscript Author Manuscript Author Manuscript
In addition to the adoption of the spiral model itself, team members shared governance and
ethics work, including specifics for approaches to development of governance infrastructure
and supporting documents. Teams also shared technology findings including software,
ontology, and data quality experiences. In addition to the process the PCF model offered
each project, it also provided a structure for cross team collaboration.
Cross collaboration observations led to the discovery of common challenges and lessons
learned across both pilot projects. Both projects were centered in infrastructure based grant
proposals with no specific use case driven directive, which created ambiguity for the scope
of work. Use cases proved crucial for determining overall data requirements, as well as
offering initial test cases to evaluate the system during launch. In hindsight, use cases could
have been initially defined in early cycles given both pilot projects required multiple cycles
of iteration to define functional use case tests towards the end of the project lifespan. Early
definition of use cases may have increased efficiency overall. Both projects also suffered
from a lack of users standing in the ready to use the clinical data sharing architectures,
creating a need for expanding efforts to promote dissemination of use. Finally, both pilot
projects required resource and time intensive effort to create trust between partners, without
which the technical architectures could not have succeeded.
4 DISCUSSION
The spiral model offered a practical and flexible process for creating two new electronic
health record driven data sharing pilot architectures federated across multiple health care
organizations. Creating these complex clinical data sharing networks for research requires a
communications-driven process that prioritizes partner engagement throughout all aspects of
the project. The cycling supported by the PCF model across these projects ensured technical
requirements were iterated closely with partners and facilitated partner engagement in the
process. The two CTSA supported pilot project teams also benefited from cross
collaboration using the PCF model, particularly given the similarities in scope of work and
complexity of the socio-technical environments.
Both pilot projects were a success in creating functioning data sharing architectures across
federated systems, using multiple iterations within a spiral based PCF model of
development. The PCF model accounted for key project missions and provided structure,
context, and concrete direction for addressing barriers that were often associated with
limiting or preventing project success. Success of the PCF model is also evidenced by both
project teams opting to add additional cycle iterations to further mature their technical
architectures and both projects resulting in successful expansions of the initial projects.
4.1 Lessons Learned
Unexpectedly, both projects required additional cycles before sharing actual data across
partner data sites. The PCF model allowed for a heuristic evaluation that helped the teams
identify inefficient processes and a critical missing component, namely definition of key use
cases, which led to the need for additional cycles. Data sharing architectures require clearly
defined research or clinical questions and engaged end-users, which are vital for defining
system requirements. Both projects were funded specifically to pilot informatics
Stephens et al. Page 10
Int J Med Inform
. Author manuscript; available in PMC 2019 October 13.
Author Manuscript Author Manuscript Author Manuscript Author Manuscript
infrastructures with no anchoring specific clinical topic to guide specifications, use cases,
and ultimately a user base. As a lesson learned, we identified the crucial need to focus more
directly on developing research questions in earlier cycles to frame and provide context for
the project. For future architecture builds, creating use cases at the outset of project initiation
may limit the need for extra cycles.
Developing data sharing governance with partners impacted system design significantly and
should be considered as early as possible. Governance development should include
involvement from partners, experienced legal experts, and data extraction stakeholders, and
should be revisited during each incremental cycle. Governance stakeholders played a crucial
role in system requirement definition from the design, content, and the technical mechanics
of how data would be shared, and the methods used to access the resulting data.
Lessons learned highlight the importance of flexibility in implementation management, the
on-going complexity of aligning data across each data site, challenges in engaging users, the
impact of governance issues on design, and the need to focus on system utility in the early
stages of development to sustain development across multi-disciplinary teams. Developing
trusted environments is complex and critical for project success and in the cases of these
pilot projects, achievable with appropriate resources.
4.2 Limitations
Limitations include the inability to compare the number of cycles and the overall resources
and costs needed for these pilot architectures to other similar efforts, given clear baseline
data on these metrics with other existing architectures have not been published consistently.
However, we were able to use a reasonable number of cycles for each of these pilot projects
and stay within resource constraints. We report on only a single approach for developing
these architectures, which did not allow for comparisons or testing of different software
development models. Future studies could address testing multiple software development
models, particularly given convergence and definition of aspects of these models remains an
active area of discovery.
5 CONCLUSION
We found that our spiral based PCF model was crucial for creating collaboration between
our two pilot projects building functional federated data sharing architectures and provided
great utility for promoting success and evaluating challenges within each pilot project.
Multiple national efforts have and continue to invest in the development of novel network-
based data sharing infrastructure development for research and cross collaborations would
strengthen this work and likely increase success. The PCF model may help identify and
establish necessary relationships and early detection of barriers within and between teams.
Finding cohesive methods that focus on building appropriate early use cases, bringing in
users, and systematically building trust among partners are needed to increase
implementation success of data sharing architecture development projects.
Future work would benefit from cross collaborations between similar data architecture
building projects, definition of use cases early in the process, and proper resources to
Stephens et al. Page 11
Int J Med Inform
. Author manuscript; available in PMC 2019 October 13.
Author Manuscript Author Manuscript Author Manuscript Author Manuscript
support work in building trust between partners. Use of software development models can
support this future work and help create standard processes for building these complex
architectures.
ACKNOWLEDGEMENTS
We would like to thank our Institute for Translational Health Sciences colleagues with the Data QUEST and the
CICTR projects and the DARTNet Institute. This research was supported by the National Center for Advancing
Translational Sciences, Clinical and Translational Science Award for the Institute for Translational Health Sciences
UL1TR000423 and Contract #HHSN268200700031C.
REFERENCES
1. Van Panhuis WG, Paul P, Emerson C, et al. A systematic review of barriers to data sharing in public
health. BMC Public Health. 2014;14:1144. doi:10.1186/1471-2458-14-1144. [PubMed: 25377061]
2. National Institute of Health. NIH Data Sharing Policy. 2003 http://grants.nih.gov/grants/policy/
data_sharing/.
3. Westfall JM, Mold J, and Fagnan L, Practice-Based Research--”Blue Highways” on the NIH
Roadmap. JAMA, 2007 297(4): p. 403–406. [PubMed: 17244837]
4. Diamond CC, Mostashari F, and Shirky C, Collecting And Sharing Data For Population Health: A
New Paradigm. Health Affairs, 2009 28(2): p. 454–466. [PubMed: 19276005]
5. Wilcox A, Randhawa G, Embi P, Cao H, and Kuperman G Sustainability Considerations for Health
Research and Analytic Data Infrastructures. eGEMs, 2014 2(2): Article 8.
6. Lazarus R, et al., Electronic Support for Public Health: validated case finding and reporting for
notifiable diseases using electronic medical data. J Am Med Inform Assoc, 200916(1): p. 18–24.
[PubMed: 18952940]
7. Zerhouni EA, Translational research: moving discovery to practice. Clin Pharmacol Ther, 2007
81(1): p. 126–8. [PubMed: 17186011]
8. Ash JS, Anderson NR, and Tarczy-Hornoch P, People and organizational issues in research systems
implementation. J Am Med Inform Assoc, 200815(3): p. 283–9. [PubMed: 18308986]
9. Murphy SC, Henry A Security Architecture for Query Tools used to Access Large Biomedical
Databases, in American Medical Informatics Association. 2002: San Antonio, Texas.
10. Murphy S, et al., Instrumenting the health care enterprise for discovery research in the genomic
era. Genome Res, 2009.
11. Malin B, A computational model to protect patient data from location-based re-identification. Artif
Intell Med, 2007 40(3): p. 223–39. [PubMed: 17544262]
12. Ohm P, Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization.
University of Colorado Law Legal Studies Research Paper, 2009 09–12.
13. Evans B, Congress’ New Infrastructural Model of Medical Privacy. Notre Dame L. Rev, 2009 585:
p. 619–20.
14. Kass NE, An Ethics Framework for Public Health. Am J Public Health, 2000 91(11): p. 1776–
1782.
15. Nissenbaum H,
Protecting privacy in an information age: The problem of privacy in
public. Law
Philosophy, 199817(559-596).
16. Karp DR, et al., Ethical and practical issues associated with aggregating databases. PLoS Med,
2008 5(9): p.el90.
17. MacKenzie S, Wyatt M, Schuff R, Tenenbaum J, Anderson N, Practices and Perspectives on
Building Integrated Data Repositories: Results from a 2010 CTSA Survey, J Am Med Inform
Assoc, 2012 19(el): p. ell9–el24.
18. Gupta A, et al., Federated access to heterogeneous information resources in the Neuroscience
Iformation Framework (NIF). Neuroinformatics, 2008 6(3): p. 205–217. [PubMed: 18958629]
19. Rosenbloom ST, et al., A model for evaluating interface terminologies. J Am Med Inform Assoc,
200815(1): p.65–76. [PubMed: 17947616]
Stephens et al. Page 12
Int J Med Inform
. Author manuscript; available in PMC 2019 October 13.
Author Manuscript Author Manuscript Author Manuscript Author Manuscript
20. Hurd A, The federated advantage. Data exchange between healthcare organizations in RHIOs is a
hot topic. Can federated models end the debate? Health Manag Technol, 2008 29(4): p. 14,16.
21. Weber GM, et al., The Shared Health Research Information Network (SHRINE): a prototype
federated query tool for clinical data repositories. J Am Med Inform Assoc, 200916(5): p. 624–30.
[PubMed: 19567788]
22. Bradshaw RL, et al., Architecture of a federated query engine for heterogeneous resources. AMIA
Annu Symp Proc, 2009 2009: p. 70–4. [PubMed: 20351825]
23. McKinney M, HIPAA and HITECH: tighter control of patient data. Hosp Health Netw, 2009 83(6):
p. 50,52.
24. DesRoches CM, et al., Electronic health records in ambulatory care-a national survey of
physicians. N Engl J Med, 2008 359(1): p. 50–60. [PubMed: 18565855]
25. Wise PB, The meaning of meaningful use. Several technology applications are needed to qualify.
Healthc Exec. 25(3): p. 20–1.
26. Bates DW and Bitton A, The future of health information technology in the patient-centered
medical home. Health Aff (Millwood). 29(4): p. 614–21. [PubMed: 20368590]
27. Stephens KA, et al., LC Data QUEST: A technical architecture for community federated clinical
data sharing In 2012 AMIA Summits TranslSci Proc. 2012, AMIA: San Francisco, CA..
28. Anderson N, et al., Implementation of a de-identified federated data network for population-based
cohort discovery. Journal of the American Medical Informatics Association, 2011 26.
29. Magdaleno AM, Werner CML, & Araujo RM, Reconciling software development models: A quasi-
systematic review. Journal of Systems and Software, 2012 85(2): p. 351–369.
30. Boehm B, & Turner R, Using risk to balance agile and plan-driven methods. Computer, 2003
36(6): p.57–66.
31. Boehm B, & Hansen WJ, The Spiral Model as at tools for evolutionary acquisition. The Journal of
Defense Software Engineering, 200114(5): p. 4–11.
32. Boehm BW, A spiral model of software development and enhancement. Computer, 1988 21(5): p.
61–72.
33. Hitz P, Johnson B, Meier J, Wasbotten B, & Haller I, PS3-23: VDWdata source: Essentia Health.
Clinical Medicine and Research, 201311(3): p. 178.
34. Nagykaldi S, et al., Improving collaboration between Primary Care Research Networks using
Access Grid technology. Informatics in Primary Care, 200816(1): p. 51–8.
35. Peterson KA, et al., A model for the electronic support of practice-based research networks. Annals
of Family Medicine, 201210(6): p. 560–7. [PubMed: 23149534]
36. Stephens KA, Lin C, Baldwin L, Echo-Hawk A, & Keppel G, A web-based tool for cataloging
primary care electronic medical record federated data: FInDiT. 2011, AMIA: Bethesda, MD.
37. Murphy SN et al., Serving the enterprise and beyond with informatics for integrating biology and
the bedside (i2b2’). J Am Med Inform Assoc, 201017(2): p. 124–31. [PubMed: 20190053]
38. Weber GM, et al., The Shared Health Research Information Network (SHRINE): a prototype
federated query tool for clinical data repositories. J Am Med Inform Assoc, 200916(5): p. 624–30.
[PubMed: 19567788]
Stephens et al. Page 13
Int J Med Inform
. Author manuscript; available in PMC 2019 October 13.
Author Manuscript Author Manuscript Author Manuscript Author Manuscript
Highlights
•We describe two federated data-sharing case examples
•Using a spiral model and four themes we iterated the data-sharing
architectures
•Cross collaboration between networks resulted in critical knowledge sharing
Stephens et al. Page 14
Int J Med Inform
. Author manuscript; available in PMC 2019 October 13.
Author Manuscript Author Manuscript Author Manuscript Author Manuscript
Summary points
•Building federated data-sharing architectures requires supporting a range of
data owners, effective and validated semantic alignment between data
resources, and consistent focus on end-users.
•Establishing these resources requires recognition of development
methodologies that support internal validation of data extraction and
translation processes, sustaining meaningful partnerships, and delivering clear
and measurable system utility.
Stephens et al. Page 15
Int J Med Inform
. Author manuscript; available in PMC 2019 October 13.
Author Manuscript Author Manuscript Author Manuscript Author Manuscript
Figure 1.
The Partnership-Driven Clinical Federated (PCF) Data sharing Model illustrates four
quadrants of themes used to define each iteration cycle of development.
Stephens et al. Page 16
Int J Med Inform
. Author manuscript; available in PMC 2019 October 13.
Author Manuscript Author Manuscript Author Manuscript Author Manuscript
Figure 2.
PCF Model Applied to Data QUEST detailing four cycles of iteration to mature the initial
launch of the data sharing architecture. Technical architecture failures are highlighted in red.
Stephens et al. Page 17
Int J Med Inform
. Author manuscript; available in PMC 2019 October 13.
Author Manuscript Author Manuscript Author Manuscript Author Manuscript
Figure 3.
PCF Model Applied to CICTR detailing three initial cycles and a fourth rapid sprint cycle to
mature the initial launch of the data sharing architecture. Project milestones owed to the
funders are highlighted in red.
Stephens et al. Page 18
Int J Med Inform
. Author manuscript; available in PMC 2019 October 13.
Author Manuscript Author Manuscript Author Manuscript Author Manuscript