Implementing Partnership-driven Clinical Federated Electronic Health Record Data Sharing Networks

June 2016
International Journal of Medical Informatics 93

DOI:10.1016/j.ijmedinf.2016.05.008

Authors:

Kari Stephens

University of Washington Seattle

Ching-Ping Lin

National Taiwan Normal University

Hossein Estiri

Harvard Medical School

Figure 1. The Partnership-Driven Clinical Federated (PCF) Data Sharing Model illustrates four quadrants of themes used to define each iteration cycle of development.

Figure 2. PCF Model Applied to Data QUEST detailing four cycles of iteration to mature the initial launch of the data sharing architecture. Technical architecture failures are highlighted in red.

Kari A. Stephens, PhD1,2,3, Nicholas Anderson, PhD4, Ching-Ping Lin, PhD5, Hossein Estiri,

PhD3

1Department of Psychiatry & Behavioral Sciences, University of Washington, Box 356560,

Seattle, WA, 98195

2Department of Biomedical Informatics & Medical Education, University of Washington, Box

358051, Seattle, WA, 98109

3Institute of Translational Health Sciences, University of Washington, Box 358051, Seattle, WA,

98109

4Department of Pathology and Laboratory Medicine, University of California Davis, Davis, CA,

95616

5Global REACH, Medical School, University of Michigan, 5113 Medical Science Building I, 1301

Catherine St, Ann Arbor, MI, 48109-5611

Abstract

Objective—Building federated data sharing architectures requires supporting a range of data

owners, effective and validated semantic alignment between data resources, and consistent focus

on end-users. Establishing these resources requires development methodologies that support

internal validation of data extraction and translation processes, sustaining meaningful partnerships,

and delivering clear and measurable system utility. We describe findings from two federated data

sharing case examples that detail critical factors, shared outcomes, and production environment

results.

Corresponding Author: Kari A. Stephens, Address: Psychiatry & Behavioral Sciences, University of Washington, Box 356560,

Seattle, WA 98195, kstephen@uw.edu, Phone: (206) 221-0349, Fax: (206) 543-9520.

Conflict of Interest

We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant

financial support for this work that could have influenced its outcome.

We confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the

criteria for authorship but are not listed. We further confirm that the order of authors listed in the manuscript has been approved by all

of us.

We confirm that we have given due consideration to the protection of intellectual property associated with this work and that there are

no impediments to publication, including the timing of publication, with respect to intellectual property. In so doing we confirm that

we have followed the regulations of our institutions concerning intellectual property.

We understand that the Corresponding Author is the sole contact for the Editorial process (including Editorial Manager and direct

communications with the office). He/she is responsible for communicating with the other authors about progress, submissions of

revisions and final approval of proofs. We confirm that we have provided a current, correct email address which is accessible by the

Corresponding Author and which has been configured to accept email from kstephen@uw.edu.

HHS Public Access

Author manuscript

Int J Med Inform. Author manuscript; available in PMC 2019 October 13.

Published in final edited form as:

Int J Med Inform. 2016 September; 93: 26–33. doi:10.1016/j.ijmedinf.2016.05.008.

Methods—Two federated data sharing pilot architectures developed to support network-based

research associated with the University of Washington’s Institute of Translational Health Sciences

provided the basis for the findings. A spiral model for implementation and evaluation was used to

structure iterations of development and support knowledge share between the two network

development teams, which cross collaborated to support and manage common stages.

Results—We found that using a spiral model of software development and multiple cycles of

iteration was effective in achieving early network design goals. Both networks required time and

resource intensive efforts to establish a trusted environment to create the data sharing

architectures. Both networks were challenged by the need for adaptive use cases to define and test

utility.

Conclusion—An iterative cyclical model of development provided a process for developing trust

with data partners and refining the design, and supported measurable success in the development

of new federated data sharing architectures.

Keywords

Information systems; Data sharing; Federated Networks; Implementation; Electronic Health

Records

1 INTRODUCTION

The broad adoption of electronic health record systems (EHRs) and efforts to align data

across disparate EHRs have led to advancements in research to improve public health. But

barriers to establish effective data sharing systems range across technical, motivational,

economic, legal, political, and ethical issues.[1] Data sharing has an integral role in reducing

the lag between research and clinical knowledge, products, and procedures that can improve

human health.[2] Bi-directional data sharing between clinical care and research

environments is crucial to advance improvements in patient care and overall population

health and essential to a Learning Healthcare System.[3] But creating data sharing systems

is complex and difficult.

Technical and methodological frameworks and guidelines for providing and integrating data

sharing infrastructures across multiple distinct and disparate clinical environments can

advance the ability for translational and comparative effectiveness research, and lead to

meaningful use and sharing of medical data.[4] However, there are no systematic efforts to

develop processes for creating data sharing architectures in public health environments.[1]

Published accounts addressing builds of data sharing infrastructures lack any systematic

application of well-established software development models. At present, implementation of

data sharing systems is often supported by grant funding and requires the development of

broad engagement strategies between disparate environments. Sustainability of these

systems often becomes a challenge after initial investments support creation.[5] Software

model applications to architecture builds may lead to better sustainability.

1.1 BACKGROUND AND SIGNIFICANCE

Previous efforts in developing methods and tools to support clinical data sharing for research

lack access to high quality data sources.[6–8] Centralized approaches to data sharing are

limited by the scope of the data that network partners typically authorize for sharing and the

difficulty with keeping these data up to date.[4] Historically, limitations have also included

uneven common terminology expertise, challenges of trust and feasibility, and concerns for

privacy and security.[4,9–17] Storing data locally at partner sites and using federated

approaches to support data sharing are attractive because they simplify privacy and security

issues and clarify trust relationships.[18] However, the lack of standard terminology use and

other semantic alignment issues remain a challenge, regardless of a centralized versus

federated model.[19] To date, large scale federated data sharing networks remain relatively

scarce, though successes have been increasing in domain-specific networks such as Regional

Health Information Organizations and cohort discovery pilots.[19–22] Growing concerns about

enhanced HIPAA privacy laws may further limit data sharing efforts.[23]

The expanding use of health information technology, driven through efforts such as the 2009

HITECH Act and the meaningful use requirement of health information exchanges, has

created the need for effective data sharing methods across organizations to target evaluation

and implementation of evidence based, patient-centered clinical practices.[24–26]

Methodological approaches to developing federated data sharing networks need to be

testable and generalizable to multiple domains, users, and stakeholders. The NCATS

Clinical and Translational Science Award (CTSA) consortium has provided a fertile environment

for building federated data sharing networks across a range of heterogeneous institutional

and community based clinical environments with a focus on translational science.

1.2 OBJECTIVE

We partnered across two network teams to implement and evaluate a software development

model for building federated electronic health record clinical data sharing architectures. We

describe the use of a common spiral model and the experience of developing two distinct

architectures. Implementation of the spiral model centrally incorporated partnership building

across different clinical data environments and addressed the crucial role of partnerships and

disparate electronic medical record platforms and workflows.

2 METHODS

2.1 Network Development Pilot Projects

The common goal of our network pilot projects was to implement architectures for federated

networks that could support research queries through a common set of terminologies and

business processes. The Data QUery, Extraction, Standardization, Translation (Data

QUEST) project focused on data sharing across primary care based electronic health record

(EHR) data domains (i.e., demographics, visits, problem lists, medications, labs, diagnoses,

tests, various medical metrics and findings, etc.) across six primary care organizations in

Washington and Idaho.[27] Data QUEST aims to provide tools for sharing both de-identified

and identified data in aggregate form and at the patient level. The Cross-Institutional

Clinical Translational Research (CICTR) project targeted sharing five broad

data domains (i.e., demographics, medications, labs, diagnoses, and disposition data), with a

common domain of diabetes across acute care settings at three academic institutions

(University of Washington, University of California, San Francisco, and University of

California, Davis) with a focus of sharing de-identified aggregated data.[28] Both projects

used HIPAA guidance to define privacy handling of data prior to allowing research querying.

Both projects supported approaches that describe and document the data provenance.
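
As an illustration of the federated pattern described above, in which data stay at partner sites and only de-identified aggregate results are returned, the following minimal Python sketch may help. It is not the Data QUEST or CICTR implementation; the record layout, the function names, and the small-cell suppression threshold of 10 are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical record layout; the projects' real schemas are not given in the text."""
    age: int
    diagnoses: frozenset  # diagnosis codes as plain strings


# A research query is modelled as a predicate over a single record.
Query = Callable[[PatientRecord], bool]

# Assumed cutoff below which a site's count is withheld (not taken from the paper).
SMALL_CELL_THRESHOLD = 10


def local_count(records: List[PatientRecord], query: Query) -> int:
    """Runs inside a partner site; only the resulting count leaves the site."""
    return sum(1 for record in records if query(record))


def federated_count(
    sites: Dict[str, List[PatientRecord]], query: Query
) -> Dict[str, Optional[int]]:
    """Collects one aggregate count per site, suppressing small cells as None."""
    results: Dict[str, Optional[int]] = {}
    for name, records in sites.items():
        count = local_count(records, query)
        results[name] = count if count >= SMALL_CELL_THRESHOLD else None
    return results


def network_total(per_site: Dict[str, Optional[int]]) -> int:
    """Sums the counts that were not suppressed."""
    return sum(count for count in per_site.values() if count is not None)
```

A query such as `lambda r: r.age >= 18 and "E11" in r.diagnoses` (E11 being the ICD-10 category for type 2 diabetes) would yield one count per site, with counts below the assumed threshold reported as `None`. Passing the query to the sites, rather than pulling records to a central store, is the design choice that distinguishes the federated approach from a centralized warehouse.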

2.2 Procedure

Three primary categories of software models have been identified (free/open source software

(FOSS), plan-driven, and agile) with little progress made at creating comprehensive

reconciliation across these models.[29] However, recommendations for selecting an

appropriate model include achieving a balance between agility and discipline.[30] The

strength of FOSS lies in allowing stakeholders to address and refine a system based on

individual priorities and resources. This model did not provide a feasible approach, given

our partner sites must share resources and technical solutions to remain scalable in a diverse

health data sharing architecture environment. Plan-driven or waterfall models lack iterative

processes for achieving stakeholder engagement across cycles of development that provide

flexibility, buy in, and adaptability. Agile methods are iterative but rely on quick “sprints”

through the phases of development to produce working systems for evaluation, which

require intensive development resources and evaluation resources (clinical and technical)

from our partners that they did not have.

To balance agility and discipline, we chose Boehm’s spiral model, used across many

commercial and defense projects, which included a focus on using a cyclic approach to grow

a system’s degree of definition and implementation while laying out anchor point milestones

to ensure stakeholder commitment to the defined solutions.[31–32] The spiral model was

used to provide a clear process to guide our architecture development and included cycles for

iteration, incremental development, and the right level of risk management and cultural

compatibility for our environment. We analyzed project activities, milestones, stakeholder

priorities, and project documents using themes from Boehm’s spiral model of development,

which included four main phases in the software development lifecycle, to define additional

emerging themes. Each team, in partnership with project stakeholders, then reviewed and

iterated on the emerging themes and charted the history of the project across the theme areas

to develop initial project specific content for a draft spiral model. The resulting model was

adopted within the Data QUEST and CICTR project teams, guiding biomedical informatics

work within the projects. The model provided a frame to report and assess both individual

project and cross project successes and challenges.

3 RESULTS

3.1 The Partnership-driven Clinical Federated (PCF) Model

3.1.1 PCF Model Description—A generic spiral model for partnership-driven clinical

federated (PCF) data sharing, based on Boehm’s spiral model for software development,[31–

32] emerged from our iterative and qualitative based methods (Figure 1). This model

identified four themes to anchor the iterative process of development: 1) developing

partnerships, 2) defining system requirements, 3) determining technical architecture, and 4)

conducting effective promotion and evaluation.

The starting point was anchored in developing functional partnerships (Partnerships) that

define boundaries and drive system definition. As the data sharing system was defined

(System Requirements), implementation could occur (Technical Architecture), and finally

impact was assessed (Promotion and Evaluation) to ensure the utility and impact of the data

sharing network.
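
One way to make the four-theme cycle concrete is to record each iteration as a small data structure and check that every theme was visited. The Python sketch below is an assumption-laden illustration, not a tool used by the authors; the class and method names are hypothetical, and the sample activities paraphrase the first Data QUEST cycle reported in this paper.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Theme(Enum):
    """The four PCF themes, in the order a cycle passes through them."""
    PARTNERSHIPS = "Partnerships"
    SYSTEM_REQUIREMENTS = "System Requirements"
    TECHNICAL_ARCHITECTURE = "Technical Architecture"
    PROMOTION_AND_EVALUATION = "Promotion and Evaluation"


@dataclass
class Activity:
    theme: Theme
    description: str
    failed: bool = False  # marks outcomes such as a rejected architecture


@dataclass
class Cycle:
    number: int
    activities: List[Activity] = field(default_factory=list)

    def missing_themes(self) -> List[Theme]:
        """Themes that have no recorded activity in this cycle."""
        covered = {activity.theme for activity in self.activities}
        return [theme for theme in Theme if theme not in covered]

    def failures(self) -> List[Activity]:
        """Activities flagged as failed, to be carried into the next cycle."""
        return [activity for activity in self.activities if activity.failed]


# Illustrative first cycle, paraphrased from the Data QUEST narrative.
first_cycle = Cycle(
    number=1,
    activities=[
        Activity(Theme.PARTNERSHIPS, "Initiate partnerships among CTSA partners"),
        Activity(Theme.SYSTEM_REQUIREMENTS, "Draft technical requirements"),
        Activity(Theme.TECHNICAL_ARCHITECTURE, "Test predefined architecture", failed=True),
        Activity(Theme.PROMOTION_AND_EVALUATION, "Define initial use cases"),
    ],
)
```

Calling `first_cycle.failures()` returns the single technical architecture activity, which is the kind of item a team would revisit when planning the next iteration.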

3.1.2 Cross-Project Model Validity—As each theme progressed through a single

iteration of the model, it informed the subsequent iterations and matured. Maturity of the

model within each network occurred through iterations as well as across themes.

Partnerships began internally with core project teams and progressed to recruitment and

expansion to additional community partners, with final governance being addressed as

maturity was reached. System requirements were initially drafted and subsequently refined

as partnerships and technical architectures were developed. Use cases were initiated with the

development of initial partnerships and evolved over time into meaningful use cases that

addressed overall utility of the architecture. In tandem, pilot users were identified and

training, support, release, and marketing efforts were developed and implemented to ensure

system utility and sustainability. In general, the model provided a useful framework for

teams across projects to collaborate, report progress to each other, and share and iterate

lessons learned.

3.2 Data QUery, Extraction, Standardization, Translation (Data QUEST) Project

3.2.1 Individual PCF Model Cycle Iterations—The Data QUEST project conformed

to the PCF model cycles (see Figure 2 for a project specific model), iterating through four

cycles. Anchor point milestones were developed and adjusted as needed, based on technical

discoveries and partner requirements. The first cycle began with the team initiating

partnerships among our CTSA partners, then moved into drafts of technical requirements,

system testing of initially predefined technical architecture solutions, which subsequently

failed, and initial definitions of use cases. We formed internal partnerships between all

CTSA partners and convened regular meetings to discuss and evaluate adoption of the

HMORN Virtual Data Warehouse (VDW) model,[33] while the community engagement and

biomedical informatics subgroups began developing feasibility study methods to determine

selection of community based partners for the pilot. Development of feasibility methods

resulted in the rejection of the VDW as a technical solution, due to the technical requirement

for programming expertise among the community practice partners needed to work with the

SAS based architecture. Our community practice partners reported that they did not have

resources to support this level of on-site programming.

In the second cycle, we identified and approached initial community partnerships using the

feasibility study methods (i.e., a semi-structured interview involving research readiness and

technical capacity assessment), allowing for community partner input on technical

requirements. A search was conducted to identify a replacement solution for the VDW,

which led to e-PCRN.[34–35] The biomedical informatics team downloaded the e-PCRN

software for a pilot test, which was unsuccessful; the software was deemed non-functional.

Furthermore, the governance requirements arising from the feasibility studies

required on-site validation of all queries, which e-PCRN did not support, and co-development

of the software to add this feature was untenable. Use cases across the CTSA

team and the community partners were iterated, cataloging community priorities for research

and achieving buy in from partners for the utility and need for use case definition. This also

led to redefining technical requirements to scale back functionality by excluding a tool

requirement to conduct self service queries against the federated data system.

In the third cycle, we solidified community partnerships and finalized selection of pilot

partners for installation of the technical architecture and updated partners with the changes

to the technical requirements to maintain continued buy in. National partnerships, most

notably with the DARTNet Institute, were engaged and promoted selection of a for-profit

vendor solution for extraction/translation/loading (ETL) tasks. Several vendors were

evaluated and a final vendor was selected, based on their extensive expertise with multiple

primary care based EHR vendor systems, ability to offer clinical decision support tools,

success with working in Practice Based Research Network settings with small rural based

primary care practices, and their existing relationship and good performance history with the

DARTNet Institute.[27] Evaluation groups were formed to refine pilot use cases based on

research priorities from the community partners.

In the fourth cycle, we introduced partners to the vendor and iterated and established

contracts and governance (i.e., Memorandums of Understanding, Data Use Agreements,

Business Associate Agreements, purchasing contracts) to allow initial installation of the

technical architecture. System requirements were adjusted to include vendor and partner

requirements, and dictionary architectures were begun, with dictionary efforts designed to

support marketing and dissemination of data sharing across the new network. After

completion of the fourth cycle, the technical architecture was deemed operational and initial

use case based extractions were performed.

Costs and resources used for the initial pilot build of Data QUEST included: 1) biomedical

informatics personnel (i.e., faculty lead and dedicated research assistant), 2) community

outreach CTSA faculty and staff (i.e., faculty lead and program manager), 3) CTSA

subsidized staff resources (i.e., system architects and analysts for consultation), and 4)

infrastructure contracts paid out to our vendor to conduct contracting and programming to

establish servers and the nightly ETL process. Our partner sites also subsidized: 1) staff time

to participate in working meetings and establish permissions with our vendor to establish a

server within their firewalls and 2) infrastructure support to house the server. The Data

QUEST pilot project was designed and implemented within the CTSA 5 year cycle.

In the current state and subsequent CTSA cycle, the Data QUEST architecture is fully

functional with daily refreshes of a federated set of clinical data repositories (CDR) across

pilot sites, housed physically behind firewalls within the partner practices in SQL Server

environments. In addition, data within the CDRs are stored in formats that semantically

align with national partner networks within the DARTNet Institute, allowing for cross

collaborations. The vendor and the DARTNet Institute have physical access to the CDRs and

provide manual data extractions as needed, governed by agreements designed and executed

within the development cycles. Several research projects to date, including local university

and national collaboration based projects have been successfully conducted using the Data

QUEST architecture. All data extractions are conducted using Business Associate

Agreements and governance that provides honest brokerage, including compliance with

Health Insurance Portability and Accountability Act (HIPAA) regulations, Data Use

Agreements, and Data Transfer Agreements. Data remain owned by the local partners and

data owners approve each extraction as it is requested, with the ability to opt out at any

time. A tool (FindIT – Federated Information Dictionary Tool)[36] to catalogue data depth

and breadth is under development targeting data visualization of the Data QUEST network

to the research community, with an aim towards bridging researchers and community based

practices and increasing use of the EHR data to facilitate translational research among the

Data QUEST network partners. We have developed the architecture to support a centralized

de-identified warehouse adopting the Observational Medical Outcomes Partnership (OMOP)

ontology, harmonizing with other national data sharing architectures, and plan to continue

partner expansion of Data QUEST.

3.2.2 Cross-Cycle Observations—The Data QUEST project initiatives and priorities

were accounted for across the four themes. Two technical architecture failures occurred, and

rather than having progress stifled, the iterative nature of the model strengthened

partnerships and supported refinement of feasible system requirements. The PCF model

provided natural feedback loops between themes to coordinate and problem solve issues

across themes. Initial partnership input was critical to the development of use cases and

development of the final technical architecture. The team engaged in micro iterations when

barriers occurred in testing the technical architecture solution, resulting in higher utilization

of time resources needed to engage and mature partnerships.

The team identified several barriers requiring mediation within themes that required further

micro iterations within cycles resulting in time delays, but ultimately did not compromise

project success. Limited partner expertise or knowledge in informatics, for both institutional

and community partners and limited experience in being part of a multi-disciplinary

institutional and community team were barriers to initially developing effective use cases.

Partners struggled with limited resources due to overly burdened clinical environments and

had limited experience with research practices. A primary outcome of the application of the

PCF model was the critical role of early engagement in partnerships to mediate identified

barriers through iteration. Despite real-world challenges to aligning partners and resources

in a collaborative effort, iterations continued across cycles, allowing individual themes to

mature.

3.3 Cross-Institutional Clinical Translational Research (CICTR) Project

3.3.1 Individual PCF Model Cycle Iterations—The CICTR project conformed to the

PCF model cycles (see Figure 3 for a project specific model), iterating through three cycles

with predefined anchor point milestones to move the project towards the first version release

within a 20 month timeline dictated by the pilot funding requirements. The first cycle

focused on defining stakeholder roles at the three partner sites, deploying common

Informatics for Integrating Biology & the Bedside (i2b2)[37] based technical architectures

and gaining appropriate regulatory approval. At the time, each of these sites had existing

familiarity with i2b2, but not in developing environments that could provide appropriate

hosting for large scale extracts from the local EHR systems. This first coordinated effort

resulted in each of the sites establishing the i2b2 environments as common virtualized

environments located within local server resources, and allowed for cross-site network and

server configuration. Each site then implemented their local i2b2 server stack against an

industry grade database system, either IBM DB2, Microsoft SQL Server, or Oracle 11. Each

of these database systems was unique to local sites, and due to licensing restrictions and

costs, requirements were defined that maintained these systems as local resources.

The second cycle built upon the common deployed i2b2 architecture, and partners iterated

the access, common definition, and testing development for the four target data sources of

demographics, diagnoses, medications and laboratory tests. In parallel, the SHRINE[38]

network interfaces were installed and tests were defined to measure pilot “connectivity”

across the network. Use cases were collectively developed to test both within system and

across system validity for the first two data sources. As the systems were being launched,

each site devoted considerable effort to recruiting new domain stakeholders in the testing

and demonstration of the functionality. An early challenge was in determining how to define

and refine the use cases so as to allow for review and measurement across sites by the

stakeholders who were local to an individual site, in a way that would inform better

processes for designing and testing the remaining data sources. By the end of the second

phase, the network had completed basic validation of the first two data sources with

common tests at each site while maintaining and growing partner engagement.

The third and final planned cycle sought to complete mapping of medication and laboratory

data across the three sites, and through this requirement identified significant challenges for

defining scope of the much larger mapping efforts involved in these data sources. As there

were limited resources at each site, semi-automated tools were tested, which revealed

challenges in cross-site evaluation of resulting mappings. A key project contribution was a

tool developed at University of California, San Francisco for terminology mapping, which

provided programmatic access to a standard medication coding system (RxNorm API).

Through this, each site sought to map a common reference medication formulary list of

ordered medications to a navigable (RxNorm-based) terminology tree. This was successful

with a very restricted set of medications, but stakeholders had difficulty navigating the

results. Further refinement of the use cases resulting from this led to a very restricted set of

laboratory tests associated specifically with diabetes diagnosis related groups, which in turn

required manual data extraction and mapping at each site. Both these data sources tested the

design / build / test partnerships across all sites. Throughout the experience of using the PCF

model with CICTR, stakeholder engagement grew and was crucial to the design

process as the project moved towards launch.
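
The terminology mapping step described above can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the UCSF tool: the resolver interface and the function names are hypothetical, and an in-memory lookup table with placeholder identifiers stands in for a real terminology service such as the RxNorm API.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# A resolver takes a normalized medication name and returns a concept
# identifier, or None when no match is found. In practice this would wrap
# a terminology service; here it is left abstract.
Resolver = Callable[[str], Optional[str]]


def normalize(name: str) -> str:
    """Lower-cases and collapses whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def map_formulary(
    local_names: Iterable[str], resolve: Resolver
) -> Tuple[Dict[str, str], List[str]]:
    """Maps each local formulary entry to a concept, collecting unmapped entries."""
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for raw in local_names:
        concept = resolve(normalize(raw))
        if concept is None:
            unmapped.append(raw)  # left for manual review at the site
        else:
            mapped[raw] = concept
    return mapped, unmapped


def make_table_resolver(table: Dict[str, str]) -> Resolver:
    """Stand-in for a terminology service: looks names up in an in-memory table."""
    normalized = {normalize(key): value for key, value in table.items()}
    return normalized.get
```

Returning the unmapped entries separately reflects the experience reported here: automated mapping covered only a restricted set of medications, and the remainder required manual work at each site.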

Costs and resources used for the initial pilot build of CICTR included: 1) biomedical

informatics personnel at each university site (i.e., faculty lead and programming staff), 2)

CTSA subsidized staff resources (i.e., system architects, programmers, and analysts for

consultation), and 3) infrastructure costs related to establishing the computing environment

at each site and the data loads. The CICTR pilot project was designed and implemented

within a 20 month timeframe to comply with funder requirements.

Through partnership in the fourth cycle, the project was adopted as a University of

California (UC) wide network, with an additional three sites added and additional support

for five years to bridge the five UC health campuses under the UC-Research Exchange (UC-

ReX) project. As a direct extension of CICTR, the PCF model of organization development

and management now drives the UC-ReX project through a semantic harmonization

lifecycle, and the network is now capable of providing query access to a population in excess of 14

million patients. CICTR has also been adopted at the University of Washington as a self-

service tool (De-Identified Clinical Data Repository, DCDR) broadly offered to researchers

engaged with the CTSA who want access to cohort counts of patients.
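
As a rough illustration of how a federated cohort count query can work, the sketch below sends the same query to each site, lets each site evaluate it against its own locally held records, and combines only the aggregate counts. The `Site` class, the `federated_count` function, the record layout, and the small-count threshold are all assumptions made for this example; the source does not describe the query protocol or any masking rule at this level of detail.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Assumed policy: counts below this threshold are withheld. The source does not
# state a threshold; 10 is an illustrative value only.
MIN_REPORTABLE = 10


@dataclass
class Site:
    name: str
    # Records stay at the site; only aggregate counts leave it.
    patients: List[dict]

    def count(self, predicate: Callable[[dict], bool]) -> Optional[int]:
        """Run the query locally; return a count, or None if it is too small."""
        n = sum(1 for p in self.patients if predicate(p))
        return n if n >= MIN_REPORTABLE else None


def federated_count(sites, predicate):
    """Send the same query to every site and combine the per-site counts."""
    per_site = {s.name: s.count(predicate) for s in sites}
    total = sum(n for n in per_site.values() if n is not None)
    return per_site, total


# Synthetic records for two hypothetical sites.
site_a = Site("site_a", [{"dx": "diabetes"}] * 12 + [{"dx": "asthma"}] * 5)
site_b = Site("site_b", [{"dx": "diabetes"}] * 3)

per_site, total = federated_count([site_a, site_b], lambda p: p["dx"] == "diabetes")
print(per_site, total)
```

In this design the researcher sees only counts, never patient-level rows, which is one common way to offer self-service cohort discovery over de-identified data.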

3.3.2 Cross-Cycle Observations—By the beginning of the fourth “sustainability”

cycle of the project and after completion of project funding, the CICTR project had

developed a mature architecture representing four unique data sources on a collective total of

over 5 million patient lives seen in three institutions. Throughout the project, the team used

regular iteration and feedback with stakeholders and identified and met challenges across

themes. Early expectations were set in the project that helped keep partners focused on use

cases that would be needed within the promotion and evaluation phases of the project.

However, as the resolution of the required data from each site increased, so did the difficulty in

developing broad measurable test cases. The cycle process revealed that the project lacked

engagement with actual users who sought data across institutions. Towards the third

cycle, the project used the PCF model and themes to assess expansion to additional sites,

and as a result engaged with a new set of comparative effectiveness researchers who

explicitly sought multi-site data discovery capabilities. This required developing a strategy

to engage the new stakeholders, using reflection on phases and outcomes to date, as well as

building common expectations for rapid development and evaluation. This process was captured

in documents and expectations for each of the four themes in the spiral, and translated into a

two-month iterative development sprint that culminated in successful stakeholder-driven use

cases across the network.

The application of this iterative approach to growing a network was considered a success

because it produced a functional data sharing architecture, positive feedback,

and continued user engagement. The PCF model also helped accommodate a tightly defined

project scope driven by a short timeline. The iterative process supported timely

feedback for all parts of the process, and in turn kept the project group focused on

common and clear goals.

3.4 Cross-Team Pilot Collaboration

Collaborative meetings were held between Data QUEST and CICTR biomedical informatics

team members, sharing progress, technical failures and solutions, ethical considerations,

governance documents, and trust building activities. The PCF spiral model was jointly

developed by the biomedical informatics teams and adopted to organize and help direct work

within the pilot projects, while providing a framework to cross share development activities.

In addition to the adoption of the spiral model itself, team members shared governance and

ethics work, including specifics for approaches to development of governance infrastructure

and supporting documents. Teams also shared technology findings including software,

ontology, and data quality experiences. In addition to the process the PCF model offered

each project, it also provided a structure for cross team collaboration.

Cross collaboration observations led to the discovery of common challenges and lessons

learned across both pilot projects. Both projects were centered in infrastructure-based grant

proposals with no specific use-case-driven directive, which created ambiguity in the scope

of work. Use cases proved crucial for determining overall data requirements, as well as

offering initial test cases to evaluate the system during launch. In hindsight, use cases could

have been defined in early cycles, given that both pilot projects required multiple cycles

of iteration to define functional use case tests towards the end of the project lifespan. Early

definition of use cases may have increased efficiency overall. Both projects also suffered

from a lack of users standing at the ready to use the clinical data sharing architectures,

creating a need for expanded efforts to promote dissemination and use. Finally, both pilot

projects required resource and time intensive effort to create trust between partners, without

which the technical architectures could not have succeeded.

4 DISCUSSION

The spiral model offered a practical and flexible process for creating two new electronic

health record driven data sharing pilot architectures federated across multiple health care

organizations. Creating these complex clinical data sharing networks for research requires a

communications-driven process that prioritizes partner engagement throughout all aspects of

the project. The cycling supported by the PCF model across these projects ensured technical

requirements were iterated closely with partners and facilitated partner engagement in the

process. The two CTSA supported pilot project teams also benefited from cross

collaboration using the PCF model, particularly given the similarities in scope of work and

complexity of the socio-technical environments.

Both pilot projects were a success in creating functioning data sharing architectures across

federated systems, using multiple iterations within a spiral based PCF model of

development. The PCF model accounted for key project missions and provided structure,

context, and concrete direction for addressing barriers that were often associated with

limiting or preventing project success. Success of the PCF model is also evidenced by both

project teams opting to add additional cycle iterations to further mature their technical

architectures and both projects resulting in successful expansions of the initial projects.

4.1 Lessons Learned

Unexpectedly, both projects required additional cycles before sharing actual data across

partner data sites. The PCF model allowed for a heuristic evaluation that helped the teams

identify inefficient processes and a critical missing component, namely definition of key use

cases, which led to the need for additional cycles. Data sharing architectures require clearly

defined research or clinical questions and engaged end-users, which are vital for defining

system requirements. Both projects were funded specifically to pilot informatics

infrastructures with no anchoring specific clinical topic to guide specifications, use cases,

and ultimately a user base. As a lesson learned, we identified the crucial need to focus more

directly on developing research questions in earlier cycles to frame and provide context for

the project. For future architecture builds, creating use cases at the outset of project initiation

may limit the need for extra cycles.

Developing data sharing governance with partners impacted system design significantly and

should be considered as early as possible. Governance development should include

involvement from partners, experienced legal experts, and data extraction stakeholders, and

should be revisited during each incremental cycle. Governance stakeholders played a crucial

role in defining system requirements, from the design and content to the technical mechanics

of how data would be shared and the methods used to access the resulting data.

Lessons learned highlight the importance of flexibility in implementation management, the

on-going complexity of aligning data across each data site, challenges in engaging users, the

impact of governance issues on design, and the need to focus on system utility in the early

stages of development to sustain development across multi-disciplinary teams. Developing

trusted environments is complex and critical for project success and, in the case of these

pilot projects, was achievable with appropriate resources.

4.2 Limitations

Limitations include the inability to compare the number of cycles and the overall resources

and costs needed for these pilot architectures to other similar efforts, given that clear baseline

data on these metrics for other existing architectures have not been published consistently.

However, we were able to use a reasonable number of cycles for each of these pilot projects

and stay within resource constraints. We report on only a single approach for developing

these architectures, which did not allow for comparisons or testing of different software

development models. Future studies could test multiple software development

models, particularly given that the convergence and definition of aspects of these models remain an

active area of discovery.

5 CONCLUSION

We found that our spiral based PCF model was crucial for creating collaboration between

our two pilot projects building functional federated data sharing architectures and provided

great utility for promoting success and evaluating challenges within each pilot project.

Multiple national efforts have invested and continue to invest in the development of novel

network-based data sharing infrastructure for research, and cross collaborations would

strengthen this work and likely increase success. The PCF model may help identify and

establish necessary relationships and support early detection of barriers within and between teams.

Cohesive methods that focus on building appropriate early use cases, bringing in

users, and systematically building trust among partners are needed to increase

implementation success of data sharing architecture development projects.

Future work would benefit from cross collaborations between similar data architecture

building projects, definition of use cases early in the process, and proper resources to

support work in building trust between partners. Use of software development models can

support this future work and help create standard processes for building these complex

architectures.

ACKNOWLEDGEMENTS

We would like to thank our Institute for Translational Health Sciences colleagues with the Data QUEST and the

CICTR projects and the DARTNet Institute. This research was supported by the National Center for Advancing

Translational Sciences, Clinical and Translational Science Award for the Institute for Translational Health Sciences

UL1TR000423 and Contract #HHSN268200700031C.

Highlights

• We describe two federated data-sharing case examples

• Using a spiral model and four themes, we iterated the data-sharing architectures

• Cross collaboration between networks resulted in critical knowledge sharing

Summary points

• Building federated data-sharing architectures requires supporting a range of data owners, effective and validated semantic alignment between data resources, and consistent focus on end-users.

• Establishing these resources requires recognition of development methodologies that support internal validation of data extraction and translation processes, sustaining meaningful partnerships, and delivering clear and measurable system utility.

Figure 1.

The Partnership-Driven Clinical Federated (PCF) Data sharing Model illustrates four

quadrants of themes used to define each iteration cycle of development.

Figure 2.

PCF Model Applied to Data QUEST detailing four cycles of iteration to mature the initial

launch of the data sharing architecture. Technical architecture failures are highlighted in red.

Figure 3.

PCF Model Applied to CICTR detailing three initial cycles and a fourth rapid sprint cycle to

mature the initial launch of the data sharing architecture. Project milestones owed to the

funders are highlighted in red.

Stephens et al. Page 18

Int J Med Inform

. Author manuscript; available in PMC 2019 October 13.

Author Manuscript Author Manuscript Author Manuscript Author Manuscript

Information needs and priority use cases of population health researchers to improve preparedness for future hurricanes and floods

Article

Nov 2020

Objective Information gaps that accompany hurricanes and floods limit researchers’ ability to determine the impact of disasters on population health. Defining key use cases for sharing complex disaster data with research communities and facilitators, and barriers to doing so are key to promoting population health research for disaster recovery. Materials and Methods We conducted a mixed-methods needs assessment with 15 population health researchers using interviews and card sorting. Interviews examined researchers’ information needs by soliciting barriers and facilitators in the context of their expertise and research practices. Card sorting ranked priority use cases for disaster preparedness. Results Seven barriers and 6 facilitators emerged from interviews. Barriers to collaborative research included process limitations, collaboration dynamics, and perception of research importance. Barriers to data and technology adoption included data gaps, limitations in information quality, transparency issues, and difficulty to learn. Facilitators to collaborative research included collaborative engagement and human resource processes. Facilitators to data and technology adoption included situation awareness, data quality considerations, adopting community standards, and attractive to learn. Card sorting prioritized 15 use cases and identified 30 additional information needs for population health research in disaster preparedness. Conclusions Population health researchers experience barriers to collaboration and adoption of data and technology that contribute to information gaps and limit disaster preparedness. The priority use cases we identified can help address information gaps by informing the design of supportive research tools and practices for disaster preparedness. Supportive tools should include information on data collection practices, quality assurance, and education resources usable during failures in electric or telecommunications systems.

Federated Learning for Medical Applications: A Taxonomy, Current Trends, Challenges, and Future Research Directions

Preprint

Full-text available

Aug 2022

With the advent of the Internet of Things (IoT), Artificial Intelligence (AI), and Machine Learning (ML)/DeepLearning (DL) algorithms, the landscape of data-driven medical applications has emerged as a promising avenue for designing robust and scalable diagnostic and prognostic models from medical data. This has gained a lot of attention from both academia and industry, leading to significant improvements in healthcare quality. However, the adoption of AI-driven medical applications still faces tough challenges, including meeting security, privacy, and quality of service (QoS) standards. Recent developments in Federated Learning (FL) have made it possible to train complex machine-learned models in a distributed manner and has become an active research domain, particularly processing the medical data at the edge of the network in a decentralized way to preserve privacy and address security concerns. To this end, in this paper, we explore the present and future of FL technology in medical applications where data sharing is a significant challenge. We delve into the current research trends and their outcomes, unravelling the complexities of designing reliable and scalable FL models. Our paper outlines the fundamental statistical issues in FL, tackles device-related problems, addresses security challenges, and navigates the complexity of privacy concerns, all while highlighting its transformative potential in the medical field. Our study primarily focuses on medical applications of FL, particularly in the context of global cancer diagnosis. We highlight the potential of FL to enable computer-aided diagnosis tools that address this challenge with greater effectiveness than traditional data-driven methods. Recent literature has shown that FL models are robust and generalize well to new data, which is essential for medical applications. 
We hope that this comprehensive review will serve as a checkpoint for the field, summarizing the current state-of-the-art and identifying open problems and future research directions.

Theoretical and practical applications of blockchain in healthcare information management

Article

Full-text available

Apr 2022
INFORM MANAGE-AMSTER

A primary objective of blockchain technology is to address information security and efficiency issues related to existing information sharing systems. For the sharing of health records, little is known about the application of blockchain in management information systems. There are strict regulations for sharing health information due to insecure systems and privacy concerns. To significantly and effectively improve medical diagnosis, it is beneficial to have efficient, reliable, and accurate accessibility to a patient's full medical history. Due to concerns with the security of health systems and privacy concerns, medical histories are not always accessible to healthcare providers. To help increase accessibility options, this research proposes a blockchain-based model that facilitates sharing medical records in a manner that is beneficial to both healthcare provider and patients. Social exchange theory provides the theoretical support for the conceptual model presented. Experimental findings based on 151 participants revealed that the blockchain technology can provide a secure information system and increase patient motivation to share medical records. To show the applicability of the proposed use of blockchain from a practical managerial perspective, we show feasibility by developing an Android software application.

The Future of Critical Care: Optimizing Technologies and a Learning Healthcare System to Potentiate a More Humanistic Approach to Critical Care

Article

Full-text available

Mar 2022

While technological innovations are the invariable crux of speculation about the future of critical care, they cannot replace the clinician at the bedside. This article summarizes the work of the Society of Critical Care Medicine–appointed multiprofessional task for the Future of Critical Care. The Task Force notes that critical care practice will be transformed by novel technologies, integration of artificial intelligence decision support algorithms, and advances in seamless data operationalization across diverse healthcare systems and geographic regions and within federated datasets. Yet, new technologies will be relevant and meaningful only if they improve the very human endeavor of caring for someone who is critically ill.

Service utilization and chronic condition outcomes among primary care patients with substance use disorders and co-occurring chronic conditions

Article

Full-text available

Mar 2020
J SUBST ABUSE TREAT

Background Patients with a substance use disorder (SUD) often present with co-occurring chronic conditions in primary care. Despite the high co-occurrence of chronic medical conditions and SUD, little is known about whether chronic condition outcomes or related service utilization in primary care varies between patients with versus without documented SUDs. This study examined whether having a SUD influenced the use of primary care services and common chronic condition outcomes for patients with diabetes, hypertension, and obesity. Methods A longitudinal cohort observational study examined electronic health record data from 21 primary care clinics in Washington and Idaho to examine differences in service utilization and clinical outcomes for diabetes, hypertension, and obesity in patients with and without a documented SUD diagnosis. Differences between patients with and without documented SUD diagnoses were compared over a three-year window for clinical outcome measures, including hemoglobin A1c, systolic and diastolic blood pressure, and body mass index, as well as service outcome measures, including number of encounters with primary care and co-located behavioral health providers, and orders for prescription opioids. Adult patients (N = 10,175) diagnosed with diabetes, hypertension, or obesity before the end of 2014, and who had ≥2 visits across a three-year window including at least one visit in 2014 (baseline) and at least one visit occurring 12 months or longer after the 2014 visit (follow-up) were examined. Results Patients with SUD diagnoses and co-occurring chronic conditions were seen by providers more frequently than patients without SUD diagnoses (p's < 0.05), and patients with SUD diagnoses were more likely to be prescribed opioid medications. Chronic condition outcomes were no different for patients with versus without SUD diagnoses. 
Discussion Despite the higher visit rates to providers in primary care, a majority of patients with SUD diagnoses and chronic medical conditions in primary care did not get seen by co-located behavioral health providers, who can potentially provide and support evidence informed care for both SUD and chronic conditions. Patients with chronic medical conditions also were more likely to get prescribed opioids if they had an SUD diagnosis. Care pathway innovations for SUDs that include greater utilization of evidence-informed co-treatment of SUDs and chronic conditions within primary care settings may be necessary for improving care overall for patients with comorbid SUDs and chronic conditions.

Research IT maturity models for academic health centers: Early development and initial evaluation

Article

Full-text available

Oct 2018

This paper proposes the creation and application of maturity models to guide institutional strategic investment in research informatics and information technology (research IT) and to provide the ability to measure readiness for clinical and research infrastructure as well as sustainability of expertise. Conducting effective and efficient research in health science increasingly relies upon robust research IT systems and capabilities. Academic health centers are increasing investments in health IT systems to address operational pressures, including rapidly growing data, technological advances, and increasing security and regulatory challenges associated with data access requirements. Current approaches for planning and investment in research IT infrastructure vary across institutions and lack comparable guidance for evaluating investments, resulting in inconsistent approaches to research IT implementation across peer academic health centers as well as uncertainty in linking research IT investments to institutional goals. Maturity models address these issues through coupling the assessment of current organizational state with readiness for deployment of potential research IT investment, which can inform leadership strategy. Pilot work in maturity model development has ranged from using them as a catalyst for engaging medical school IT leaders in planning at a single institution to developing initial maturity indices that have been applied and refined across peer medical schools.

Beyond Open vs. Closed: Balancing Individual Privacy and Public Accountability in Data Sharing

Conference Paper

Jan 2019

Data too sensitive to be "open" for analysis and re-purposing typically remains "closed" as proprietary information. This dichotomy undermines efforts to make algorithmic systems more fair, transparent, and accountable. Access to proprietary data in particular is needed by government agencies to enforce policy, researchers to evaluate methods, and the public to hold agencies accountable; all of these needs must be met while preserving individual privacy and firm competitiveness. In this paper, we describe an integrated legal-technical approach provided by a third-party public-private data trust designed to balance these competing interests. Basic membership allows firms and agencies to enable low-risk access to data for compliance reporting and core methods research, while modular data sharing agreements support a wide array of projects and use cases. Unless specifically stated otherwise in an agreement, all data access is initially provided to end users through customized synthetic datasets that offer a) strong privacy guarantees, b) removal of signals that could expose competitive advantage, and c) removal of biases that could reinforce discriminatory policies, all while maintaining fidelity to the original data. We find that using synthetic data in conjunction with strong legal protections over raw data strikes a balance between transparency, proprietorship, privacy, and research objectives. This legal-technical framework can form the basis for data trusts in a variety of contexts.

Use of electronic health record data from diverse primary care practices to identify and characterize patients’ prescribed common medications

Article

Mar 2020
Health Informat J

We use prescription of statin medications and prescription of warfarin to explore the capacity of electronic health record data to (1) describe cohorts of patients prescribed these medications and (2) identify cohorts of patients with evidence of adverse events related to prescription of these medications. This study was conducted in the WWAMI region Practice and Research Network (WPRN), a network of primary care practices across Washington, Wyoming, Alaska, Montana, and Idaho, using DataQUEST, an electronic data-sharing infrastructure. We used electronic health record data to describe cohorts of patients prescribed statin or warfarin medications and reported the proportions of patients with adverse events. Among the 35,445 active patients, 1745 received at least one statin prescription and 301 received at least one warfarin prescription. Only 3% of statin patients had evidence of myopathy; 51 patients (17% of those prescribed warfarin) had a bleeding complication. Primary-care electronic health record data can effectively be used to identify patients prescribed specific medications and patients potentially experiencing medication adverse events.

Prevalence of documented alcohol and opioid use disorder diagnoses and treatments in a regional primary care practice-based research network

Article

Nov 2019
J SUBST ABUSE TREAT

Background: Most people with alcohol or opioid use disorders (AUD or OUD) are not diagnosed or treated for these conditions in primary care. This study takes a critical step toward quantifying service gaps and directing improvement efforts for AUD and OUD by using electronic health record (EHR) data from diverse primary care organizations to quantify the extent to which AUD and OUD are underdiagnosed and undertreated in primary care practices. Methods: We extracted and integrated diagnosis, medication, and behavioral health visit data from the EHRs of 21 primary care clinics within four independent healthcare organizations representing community health centers and rural hospital-associated clinics in the Pacific Northwest United States. Rates of documented AUD and OUD diagnoses, pharmacological treatments, and behavioral health visits were evaluated over a two-year period (2015-2016). Results: Out of 47,502 adult primary care patients, 1476 (3.1%) had documented AUD; of these, 115 (7.8%) had orders for AUD medications and 271 (18.4%) had at least one documented visit with a non-physician behavioral health specialist. Only 402 (0.8%) patients had documented OUD, and of these, 107 (26.6%) received OUD medications and 119 (29.6%) had at least one documented visit with a non-physician behavioral health specialist. Rates of AUD diagnosis and AUD and OUD medications were higher in clinics that had co-located non-physician behavioral health specialists. Conclusions: AUD and OUD are underdiagnosed and undertreated within a sample of independent primary care organizations serving mostly rural patients. Primary care organizations likely need service models, technologies, and workforces, including non-physician behavioral health specialists, to improve capacities to diagnose and treat AUD and OUD.
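
The rates reported in this abstract use two different denominators: diagnosis rates are taken over all adult patients, while treatment rates are taken over the patients with a documented diagnosis. As a small worked check, the sketch below recomputes each percentage from the counts quoted above. The constant and function names are illustrative assumptions, not part of the study.

```python
# Counts are copied from the abstract above; the names are illustrative only.
TOTAL_ADULT_PATIENTS = 47_502
AUD_DIAGNOSED = 1_476
AUD_MEDICATION = 115
AUD_BEHAVIORAL_VISIT = 271
OUD_DIAGNOSED = 402
OUD_MEDICATION = 107
OUD_BEHAVIORAL_VISIT = 119


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


rates = {
    # Diagnosis rates: denominator is all adult primary care patients.
    "AUD diagnosed": pct(AUD_DIAGNOSED, TOTAL_ADULT_PATIENTS),
    "OUD diagnosed": pct(OUD_DIAGNOSED, TOTAL_ADULT_PATIENTS),
    # Treatment rates: denominator is patients with that diagnosis.
    "AUD medication ordered": pct(AUD_MEDICATION, AUD_DIAGNOSED),
    "AUD behavioral health visit": pct(AUD_BEHAVIORAL_VISIT, AUD_DIAGNOSED),
    "OUD medication received": pct(OUD_MEDICATION, OUD_DIAGNOSED),
    "OUD behavioral health visit": pct(OUD_BEHAVIORAL_VISIT, OUD_DIAGNOSED),
}

for label, value in rates.items():
    print(f"{label}: {value}%")
```

Under these assumptions the recomputed values agree with the figures quoted in the abstract.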

Exploring completeness in clinical data research networks with DQe-c

Article

Oct 2017
J AM MED INFORM ASSN

Objective: To provide an open source, interoperable, and scalable data quality assessment tool for evaluation and visualization of completeness and conformance in electronic health record (EHR) data repositories. Materials and methods: This article describes the tool's design and architecture and gives an overview of its outputs using a sample dataset of 200 000 randomly selected patient records with an encounter since January 1, 2010, extracted from the Research Patient Data Registry (RPDR) at Partners HealthCare. All the code and instructions to run the tool and interpret its results are provided in the Supplementary Appendix. Results: DQe-c produces a web-based report that summarizes data completeness and conformance in a given EHR data repository through descriptive graphics and tables. Results from running the tool on the sample RPDR data are organized into 4 sections: load and test details, completeness test, data model conformance test, and test of missingness in key clinical indicators. Discussion: Open science, interoperability across major clinical informatics platforms, and scalability to large databases are key design considerations for DQe-c. Iterative implementation of the tool across different institutions directed us to improve the scalability and interoperability of the tool and find ways to facilitate local setup. Conclusion: EHR data quality assessment has been hampered by implementation of ad hoc processes. The architecture and implementation of DQe-c offer valuable insights for developing reproducible and scalable data science tools to assess, manage, and process data in clinical data repositories.
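
To make the idea of a completeness test concrete, the following is a minimal sketch of a per-column completeness calculation. It is not the DQe-c implementation (the abstract notes that the tool's code is provided in the Supplementary Appendix); the function name, the column names, and the set of values treated as missing are all assumptions chosen for illustration.

```python
from typing import Any, Dict, List, Sequence

# Assumption: these values count as "missing". A real tool would make this configurable.
MISSING_MARKERS = (None, "", "NULL", "NA")


def completeness(records: List[Dict[str, Any]], columns: Sequence[str]) -> Dict[str, float]:
    """Return, for each column, the fraction of records with a non-missing value."""
    total = len(records)
    result: Dict[str, float] = {}
    for col in columns:
        # A column absent from a record is treated the same as a missing value.
        present = sum(1 for rec in records if rec.get(col) not in MISSING_MARKERS)
        result[col] = present / total if total else 0.0
    return result


# Hypothetical sample rows, not drawn from any real repository.
sample = [
    {"patient_id": "p1", "birth_date": "1970-01-01", "sex": "F"},
    {"patient_id": "p2", "birth_date": "", "sex": "M"},
    {"patient_id": "p3", "birth_date": None, "sex": "NA"},
    {"patient_id": "p4", "sex": "F"},
]

report = completeness(sample, ["patient_id", "birth_date", "sex"])
for column, fraction in report.items():
    print(f"{column}: {fraction:.0%} complete")
```

A report of this shape could feed the kind of descriptive tables the abstract mentions, although how DQe-c actually computes and visualizes completeness is described in the paper itself.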

Sustainability Considerations for Health Research and Analytic Data Infrastructures

Article

Jun 2014

The United States has made recent large investments in creating data infrastructures to support the important goals of patient-centered outcomes research (PCOR) and comparative effectiveness research (CER), with still more investment planned. These initial investments, while critical to the creation of the infrastructures, are not expected to sustain them much beyond the initial development. To provide the maximum benefit, the infrastructures need to be sustained through innovative financing models while providing value to PCOR and CER researchers. Based on our experience with creating flexible sustainability strategies (i.e., strategies that are adaptive to the different characteristics and opportunities of a resource or infrastructure), we define specific factors that are important considerations in developing a sustainability strategy. These factors include assets, expansion, complexity, and stakeholders. Each factor is described, with examples of how it is applied. These factors are dimensions of variation in different resources, to which a sustainability strategy should adapt. We also identify specific important considerations for maintaining an infrastructure, so that the long-term intended benefits can be realized. These observations are presented as lessons learned, to be applied to other sustainability efforts. We define the lessons learned, relating them to the defined sustainability factors as interactions between factors. Using perspectives and experiences from a diverse group of experts, we define broad characteristics of sustainability strategies and important observations, which can vary for different projects. Other descriptions of adaptive, flexible, and successful models of collaboration between stakeholders and data infrastructures can expand this framework by identifying other factors for sustainability, and give more concrete directions on how sustainability can be best achieved.

A systematic review of barriers to data sharing in public health

Article

Nov 2014
BMC PUBLIC HEALTH

Background: In the current information age, the use of data has become essential for decision making in public health at the local, national, and global level. Despite a global commitment to the use and sharing of public health data, this can be challenging in reality. No systematic framework or global operational guidelines have been created for data sharing in public health. Barriers at different levels have limited data sharing but have only been anecdotally discussed or in the context of specific case studies. Incomplete systematic evidence on the scope and variety of these barriers has limited opportunities to maximize the value and use of public health data for science and policy. Methods: We conducted a systematic literature review of potential barriers to public health data sharing. Documents that described barriers to sharing of routinely collected public health data were eligible for inclusion and reviewed independently by a team of experts. We grouped identified barriers in a taxonomy for a focused international dialogue on solutions. Results: Twenty potential barriers were identified and classified in six categories: technical, motivational, economic, political, legal and ethical. The first three categories are deeply rooted in well-known challenges of health information systems for which structural solutions have yet to be found; the last three have solutions that lie in an international dialogue aimed at generating consensus on policies and instruments for data sharing. Conclusions: The simultaneous effect of multiple interacting barriers ranging from technical to intangible issues has greatly complicated advances in public health data sharing. A systematic framework of barriers to data sharing in public health will be essential to accelerate the use of valuable information for the global good.

A Model for the Electronic Support of Practice-Based Research Networks

Article

Nov 2012
ANN FAM MED

Purpose: The principal goal of the electronic Primary Care Research Network (ePCRN) is to enable the development of an electronic infrastructure to support clinical research activities in primary care practice-based research networks (PBRNs). We describe the model that the ePCRN developed to enhance the growth and to expand the reach of PBRN research. Methods: Use cases and activity diagrams were developed from interviews with key informants from 11 PBRNs from the United States and United Kingdom. Discrete functions were identified and aggregated into logical components. Interaction diagrams were created, and an overall composite diagram was constructed describing the proposed software behavior. Software for each component was written and aggregated, and the resulting prototype application was pilot tested for feasibility. A practical model was then created by separating application activities into distinct software packages based on existing PBRN business rules, hardware requirements, network requirements, and security concerns. Results: We present an information architecture that provides for essential interactions, activities, data flows, and structural elements necessary for providing support for PBRN translational research activities. The model describes research information exchange between investigators and clusters of independent data sites supported by a contracted research director. The model was designed to support recruitment for clinical trials, collection of aggregated anonymous data, and retrieval of identifiable data from previously consented patients across hundreds of practices. Conclusions: The proposed model advances our understanding of the fundamental roles and activities of PBRNs and defines the information exchange commonly used by PBRNs to successfully engage community health care clinicians in translational research activities. 
By describing the network architecture in a language familiar to software developers, the model provides an important foundation for the development of electronic support for essential PBRN research activities.

LC Data QUEST: A Technical Architecture for Community Federated Clinical Data Sharing

Article

Mar 2012

The University of Washington Institute of Translational Health Sciences is engaged in a project, LC Data QUEST, building data sharing capacity in primary care practices serving rural and tribal populations in the Washington, Wyoming, Alaska, Montana, Idaho region to build research infrastructure. We report on the iterative process of developing the technical architecture for semantically aligning electronic health data in primary care settings across our pilot sites and tools that will facilitate linkages between the research and practice communities. Our architecture emphasizes sustainable technical solutions for addressing data extraction, alignment, quality, and metadata management. The architecture provides immediate benefits to participating partners via a clinical decision support tool and data querying functionality to support local quality improvement efforts. The FInDiT tool catalogues type, quantity, and quality of the data that are available across the LC Data QUEST data sharing architecture. These tools facilitate the bi-directional process of translational research.

The Spiral Model as a Tool for Evolutionary Acquisition

Article

Jan 2001

Since its original publication [1], the spiral development model diagrammed in Figure 1 has been used successfully in many defense and commercial projects. To extend this base of success, the Department of Defense (DoD) has recently rewritten the defense acquisition regulations to incorporate "evolutionary acquisition," an acquisition strategy designed to mesh well with spiral development. In particular, DoD Instruction 5000.2 subdivides acquisition [2]: "There are two ... approaches, evolutionary and single step to full capability. An evolutionary approach is preferred. … [In this] approach, the ultimate capability delivered to the user is divided into two or more blocks, with increasing increments of capability." (p. 20) Here, a block corresponds to a single product release. The text goes on to specify the use of spiral development within blocks: "For both the evolutionary and single-step approaches, software development shall follow an iterative spiral development process in which continually expanding software versions are based on learning from earlier development." (p. 20) Given this reliance on the spiral development model, an in-depth definition is appropriate. Two recent workshops provided one. Engineering Institute held two workshops last year to study spiral development and identify a set of critical success factors and recommended approaches. Their results appear in two reports [3, 4] and are available on the workshop Web site www.sei.cmu.edu/cbs/spiral2000. The first author's presentations at these workshops defined spiral development and are followed below. The definition was first converted to a report [5], where details, suggestions, and further references can be found. Additionally, a follow-on article, appearing in a later CrossTalk issue, will address the relationships among spiral development, evolutionary acquisition, and the Integrated Capability Maturity Model.

Using Risk to Balance Agile and Plan-Driven Methods

Article

Jun 2003

Both agile and plan-driven approaches have situation-dependent shortcomings that, if not addressed, can lead to project failure. The challenge is to balance the two approaches to take advantage of their strengths in a given situation while compensating for their weaknesses. The authors present a risk-based approach for structuring projects to incorporate both agile and plan-driven approaches in proportion to a project's needs.

Congress' New Infrastructural Model of Medical Privacy

Article

Jul 2008
NOTRE DAME LAW REV

Barbara J. Evans

This article opens discussion of a starkly new approach for protecting the privacy of Americans' sensitive health information. Last year, Congress empowered the U.S. Food and Drug Administration (FDA) to oversee development of a major new national infrastructure: a large-scale data network, the Sentinel System, that aims to include health data for 100 million Americans by 2012. This marked the first time since the end of the New Deal that a wholly new infrastructure regulatory mandate had been issued at the federal level. This important development, buried in drug-safety provisions of the Food and Drug Administration Amendments Act of 2007 (FDAAA), went largely unnoticed, as did the fact that Congress cast medical privacy, a hot-button issue for many members of the American public, as an infrastructure regulatory problem. Individuals are not empowered to make autonomous decisions about permissible uses and disclosures of their health data. Instead, Congress authorized FDA to decide whether proposed disclosures meet a statutorily defined public-interest standard. If so, then the disclosures are lawful without individual privacy authorization or informed consent. Within limits that this article explores, FDA can approve the release of private health data, including data in identifiable form, to private operators of Sentinel System infrastructure and to outside data users, including academic and commercial entities. This article describes the new privacy model, which was implicit in the statute Congress passed but far from obvious on its face. The goal is not to oppose the new approach. Congress was responding to serious public concern about the safety of FDA-approved products. This article accepts that this new privacy model exists and explores directions for implementing it in a manner that will be least corrosive of public trust. 
The goal is to elicit ongoing dialogue about appropriate institutional protections for the 100 million Americans whose data soon will be in this vast data network. FDA is, in many respects, an accidental infrastructure regulator, thrust into a new role strikingly different from its longstanding product-safety mandate. Fortunately, the challenges FDA now faces are not new ones. U.S. infrastructure regulators, in a wide variety of industry contexts, have harnessed private capital to build new infrastructures to serve defined public interests while protecting vulnerable classes. Lessons from these other contexts can shed light on appropriate governance structures for the Sentinel System. For example, privacy protection may be enhanced by eschewing vertical integration in favor of segregating certain key infrastructure functions that require access to identifiable data. It may be better to establish core privacy protections via rulemaking rather than through contracts and to centralize certain key discretionary decisions rather than delegating them to private, commercial decision-makers. Public trust will require strong due-process protections, regulatory independence, and a well-funded system of regulatory oversight; approaches employed by other infrastructure regulators may help address these concerns. The single greatest threat to privacy will come as FDA faces pressure to approve wide ancillary sales of Sentinel System data to help defray costs of system development. To make this system financeable while enforcing strong privacy protections, FDA should deploy its limited available funds to support a well-thought-out infrastructure financing facility that backstops clear privacy policies with appropriate political risk guarantees for private infrastructure investors.

Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization

Article

Aug 2009

Paul Ohm

Computer scientists have recently undermined our faith in the privacy-protecting power of anonymization, the name for techniques for protecting the privacy of individuals in large databases by deleting information like names and social security numbers. These scientists have demonstrated they can often 'reidentify' or 'deanonymize' individuals hidden in anonymized data with astonishing ease. By understanding this research, we will realize we have made a mistake, labored beneath a fundamental misunderstanding, which has assured us much less privacy than we have assumed. This mistake pervades nearly every information privacy law, regulation, and debate, yet regulators and legal scholars have paid it scant attention. We must respond to the surprising failure of anonymization, and this Article provides the tools to do so.

Protecting Privacy in an Information Age: The Problem of Privacy in Public

Article

Nov 1998

Helen Nissenbaum

Philosophical and legal theories of privacy have long recognized the relationship between privacy and information about persons. They have, however, focused on personal, intimate, and sensitive information, assuming that with public information, and information drawn from public spheres, either privacy norms do not apply, or applying privacy norms is so burdensome as to be morally and legally unjustifiable. Against this preponderant view, I argue that information and communications technology, by facilitating surveillance, by vastly enhancing the collection, storage, and analysis of information, by enabling profiling, data mining and aggregation, has significantly altered the meaning of public information. As a result, a satisfactory legal and philosophical understanding of a right to privacy, capable of protecting the important values at stake in protecting privacy, must incorporate, in addition to traditional aspects of privacy, a degree of protection for privacy in public.

Practices and perspectives on building integrated data repositories: Results from a 2010 CTSA survey

Article

Mar 2012
J AM MED INFORM ASSN

Clinical integrated data repositories (IDRs) are poised to become a foundational element of biomedical and translational research by providing the coordinated data sources necessary to conduct retrospective analytic research and to identify and recruit prospective research subjects. The Clinical and Translational Science Award (CTSA) consortium's Informatics IDR Group conducted a survey of 2010 consortium members to evaluate recent trends in IDR implementation and use to support research between 2008 and 2010. A web-based survey based in part on a prior 2008 survey was developed and deployed to 46 national CTSA centers. A total of 35 separate organizations completed the survey (74%), representing 28 CTSAs and the National Institutes of Health Clinical Center. Survey results suggest that individual organizations are progressing in their approaches to the development, management, and use of IDRs as a means to support a broad array of research. We describe the major trends and emerging practices below.
