Article

Introducing the BlendedICU dataset, the first harmonized, international intensive care dataset


Abstract

This study introduces the BlendedICU dataset, a massive dataset of international intensive care data. This dataset aims to facilitate generalizability studies of machine learning models, as well as statistical studies of clinical practices in intensive care units. Methods: Four publicly available and patient-level intensive care databases were used as source databases. A unique and customizable preprocessing pipeline extracted clinically relevant patient-related variables from each source database. The variables were then harmonized and standardized to the Observational Medical Outcomes Partnership (OMOP) Common Data Format. Finally, a brief comparison was carried out to explore differences in the source databases. Results: The BlendedICU dataset features 41 timeseries variables as well as the exposure times to 113 active ingredients extracted from the AmsterdamUMCdb, eICU, HiRID, and MIMIC-IV databases. This resulted in a database of more than 309,000 intensive care admissions, spanning over 13 years and three countries. We found that data collection, drug exposure, and patient outcomes varied strongly between source databases. Conclusion: The variability in data collection, drug exposure, and patient outcomes between the source databases indicated some dissimilarity in patient phenotypes and clinical practices between different intensive care units. This demonstrated the need for generalizability studies of machine learning models. This study provides the clinical data research community with essential data to build efficient and generalizable machine learning models, as well as to explore clinical practices in intensive care units around the world.
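The harmonization step described in the abstract can be pictured with a small sketch. This is not the BlendedICU pipeline: the source names (`source_a`, `source_b`), the variable labels, the harmonized names, and the unit conversion are all hypothetical and serve only to show the general idea of mapping source-specific variables onto one shared vocabulary with consistent units.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class Record(NamedTuple):
    source: str   # name of the source database (hypothetical)
    label: str    # variable label as it appears in that source
    value: float  # measured value in the source's own unit


def identity(x: float) -> float:
    return x


def fahrenheit_to_celsius(x: float) -> float:
    return (x - 32.0) * 5.0 / 9.0


# Hypothetical mapping: (source, source label) -> (harmonized name, unit converter).
# None of these labels are taken from the real source databases.
MAPPING: Dict[Tuple[str, str], Tuple[str, Callable[[float], float]]] = {
    ("source_a", "Heart Rate"): ("heart_rate", identity),
    ("source_b", "heartrate"): ("heart_rate", identity),
    ("source_a", "Temperature Fahrenheit"): ("temperature_c", fahrenheit_to_celsius),
    ("source_b", "temperature"): ("temperature_c", identity),
}


def harmonize(records: List[Record]) -> List[Record]:
    """Rename each record to its harmonized variable and convert its unit."""
    harmonized = []
    for record in records:
        entry = MAPPING.get((record.source, record.label))
        if entry is None:
            continue  # unmapped variables are simply dropped in this sketch
        name, convert = entry
        harmonized.append(Record(record.source, name, round(convert(record.value), 2)))
    return harmonized


if __name__ == "__main__":
    raw = [
        Record("source_a", "Temperature Fahrenheit", 98.6),
        Record("source_b", "heartrate", 80.0),
        Record("source_b", "unknown variable", 1.0),
    ]
    for row in harmonize(raw):
        print(row)
```

A lookup table keyed on the source and its local label keeps the source-specific knowledge in one place, so adding a database means adding rows rather than changing code. A real pipeline would also need to handle timestamps, duplicates, and outliers, which this sketch ignores.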

Article
Digital data collection during routine clinical practice is now ubiquitous within hospitals. The data contains valuable information on the care of patients and their response to treatments, offering exciting opportunities for research. Typically, data are stored within archival systems that are not intended to support research. These systems are often inaccessible to researchers and structured for optimal storage, rather than interpretability and analysis. Here we present MIMIC-IV, a publicly available database sourced from the electronic health record of the Beth Israel Deaconess Medical Center. Information available includes patient measurements, orders, diagnoses, procedures, treatments, and deidentified free-text clinical notes. MIMIC-IV is intended to support a wide array of research studies and educational material, helping to reduce barriers to conducting clinical research.
Article
Hospital length of stay (LoS) of patients is a crucial factor for the effective planning and management of hospital resources. There is considerable interest in predicting the LoS of patients in order to improve patient care, control hospital costs and increase service efficiency. This paper presents an extensive review of the literature, examining the approaches employed for the prediction of LoS in terms of their merits and shortcomings. In order to address some of these problems, a unified framework is proposed to better generalise the approaches that are being used to predict length of stay. This includes the investigation of the types of routinely collected data used in the problem as well as recommendations to ensure robust and meaningful knowledge modelling. This unified common framework enables the direct comparison of results between length of stay prediction approaches and will ensure that such approaches can be used across several hospital environments. A literature search was conducted in PubMed, Google Scholar and Web of Science from 1970 until 2019 to identify LoS surveys which review the literature. Thirty-two surveys were identified; from these, 220 papers were manually identified as relevant to LoS prediction. After removing duplicates, and exploring the reference list of studies included for review, 93 studies remained. Despite the continuing efforts to predict and reduce the LoS of patients, current research in this domain remains ad-hoc; as such, the model tuning and data preprocessing steps are too specific and result in a large proportion of the current prediction mechanisms being restricted to the hospital that they were employed in. Adopting a unified framework for the prediction of LoS could yield a more reliable estimate of the LoS as a unified framework enables the direct comparison of length of stay methods.
Additional research is also required to explore novel methods such as fuzzy systems which could build upon the success of current models as well as further exploration of black-box approaches and model interpretability.
Article
The evolution of intracranial pressure (ICP) of critically ill patients admitted to a neurointensive care unit (ICU) is difficult to predict. Besides the underlying disease and compromised intracranial space, ICP is affected by a multitude of factors, many of which are monitored on the ICU, but the complexity of the resulting patterns limits their clinical use. This paves the way for new machine learning techniques to assist clinical management of patients undergoing invasive ICP monitoring independent of the underlying disease. An institutional cohort (ICP-ICU) of patients with invasive ICP monitoring (n = 1346) was used to train recurrent machine learning models to predict the occurrence of ICP increases of ≥22 mmHg over a long (>2 h) time period in the upcoming hours. External validation was performed on patients undergoing invasive ICP measurement in two publicly available datasets [Medical Information Mart for Intensive Care (MIMIC, n = 998) and eICU Collaborative Research Database (n = 1634)]. Different distances (1–24 h) between prediction time point and upcoming critical phase were evaluated, demonstrating a decrease in performance but still robust AUC-ROC with larger distances (24 h AUC-ROC: ICP-ICU 0.826 ± 0.0071, MIMIC 0.836 ± 0.0063, eICU 0.779 ± 0.0046, 1 h AUC-ROC: ICP-ICU 0.982 ± 0.0008, MIMIC 0.965 ± 0.0010, eICU 0.941 ± 0.0025). The model operates on sparse hourly data and is stable in handling variable input lengths and missingness through its nature of recurrence and internal memory. Calculation of gradient-based feature importance revealed individual underlying decisions for our long short-term memory-based model and thereby provided improved clinical interpretability. Recurrent machine learning models have the potential to be an effective tool for the prediction of ICP increases with high translational potential.
Article
Background: In the era of big data, the intensive care unit (ICU) is likely to benefit from real-time computer analysis and modeling based on close patient monitoring and electronic health record data. The Medical Information Mart for Intensive Care (MIMIC) is the first open access database in the ICU domain. Many studies have shown that common data models (CDMs) improve database searching by allowing code, tools, and experience to be shared. The Observational Medical Outcomes Partnership (OMOP) CDM is spreading all over the world. Objective: The objective was to transform MIMIC into an OMOP database and to evaluate the benefits of this transformation for analysts. Methods: We transformed MIMIC (version 1.4.21) into OMOP format (version 5.3.3.1) through semantic and structural mapping. The structural mapping aimed at moving the MIMIC data into the right place in OMOP, with some data transformations. The mapping was divided into 3 phases: conception, implementation, and evaluation. The conceptual mapping aimed at aligning the MIMIC local terminologies to OMOP's standard ones. It consisted of 3 phases: integration, alignment, and evaluation. A documented, tested, versioned, exemplified, and open repository was set up to support the transformation and improvement of the MIMIC community's source code. The resulting data set was evaluated over a 48-hour datathon. Results: With an investment of 2 people for 500 hours, 64% of the data items of the 26 MIMIC tables were standardized into the OMOP CDM and 78% of the source concepts mapped to reference terminologies. The model proved its ability to support community contributions and was well received during the datathon, with 160 participants and 15,000 requests executed with a maximum duration of 1 minute. Conclusions: The resulting MIMIC-OMOP data set is the first MIMIC-OMOP data set available free of charge with real deidentified data ready for replicable intensive care research.
This approach can be generalized to any medical field.
Conference Paper
The pressure of ever-increasing patient demand and budget restrictions make hospital bed management a daily challenge for clinical staff. Most critical is the efficient allocation of resource-heavy Intensive Care Unit (ICU) beds to the patients who need life support. Central to solving this problem is knowing for how long the current set of ICU patients are likely to stay in the unit. In this work, we propose a new deep learning model based on the combination of temporal convolution and pointwise (1x1) convolution, to solve the length of stay prediction task on the eICU and MIMIC-IV critical care datasets. The model – which we refer to as Temporal Pointwise Convolution (TPC) – is specifically designed to mitigate common challenges with Electronic Health Records, such as skewness, irregular sampling and missing data. In doing so, we have achieved significant performance benefits of 18-68% (metric and dataset dependent) over the commonly used Long Short-Term Memory (LSTM) network, and the multi-head self-attention network known as the Transformer. By adding mortality prediction as a side-task, we can improve performance further still, resulting in a mean absolute deviation of 1.55 days (eICU) and 2.28 days (MIMIC-IV) on predicting remaining length of stay. Temporal pointwise convolutional networks for length of stay prediction in the intensive care unit.
Article
Objectives: Critical care medicine is a natural environment for machine learning approaches to improve outcomes for critically ill patients as admissions to ICUs generate vast amounts of data. However, technical, legal, ethical, and privacy concerns have so far limited the critical care medicine community from making these data readily available. The Society of Critical Care Medicine and the European Society of Intensive Care Medicine have identified ICU patient data sharing as one of the priorities under their Joint Data Science Collaboration. To encourage ICUs worldwide to share their patient data responsibly, we now describe the development and release of Amsterdam University Medical Centers Database (AmsterdamUMCdb), the first freely available critical care database in full compliance with privacy laws from both the United States and Europe, as an example of the feasibility of sharing complex critical care data. Setting: University hospital ICU. Subjects: Data from ICU patients admitted between 2003 and 2016. Interventions: We used a risk-based deidentification strategy to maintain data utility while preserving privacy. In addition, we implemented contractual and governance processes, and a communication strategy. Patient organizations, supporting hospitals, and experts on ethics and privacy audited these processes and the database. Measurements and main results: AmsterdamUMCdb contains approximately 1 billion clinical data points from 23,106 admissions of 20,109 patients. The privacy audit concluded that reidentification is not reasonably likely, and AmsterdamUMCdb can therefore be considered as anonymous information, both in the context of the U.S. Health Insurance Portability and Accountability Act and the European General Data Protection Regulation. The ethics audit concluded that responsible data sharing imposes minimal burden, whereas the potential benefit is tremendous. 
Conclusions: Technical, legal, ethical, and privacy challenges related to responsible data sharing can be addressed using a multidisciplinary approach. A risk-based deidentification strategy, that complies with both U.S. and European privacy regulations, should be the preferred approach to releasing ICU patient data. This supports the shared Society of Critical Care Medicine and European Society of Intensive Care Medicine vision to improve critical care outcomes through scientific inquiry of vast and combined ICU datasets.
Article
Prognostic models that aim to improve the prediction of clinical events, individualized treatment and decision-making are increasingly being developed and published. However, relatively few models are externally validated and validation by independent researchers is rare. External validation is necessary to determine a prediction model’s reproducibility and generalizability to new and different patients. Various methodological considerations are important when assessing or designing an external validation study. In this article, an overview is provided of these considerations, starting with what external validation is, what types of external validation can be distinguished and why such studies are a crucial step towards the clinical implementation of accurate prediction models. Statistical analyses and interpretation of external validation results are reviewed in an intuitive manner and considerations for selecting an appropriate existing prediction model and external validation population are discussed. This study enables clinicians and researchers to gain a deeper understanding of how to interpret model validation results and how to translate these results to their own patient population.
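As a rough illustration of the kind of comparison an external validation study reports, the sketch below computes the area under the ROC curve (AUROC) for the same model on a development cohort and on an external cohort. The function name `auroc` and the toy labels and scores are assumptions made for this example; they do not come from the article or from any real dataset.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required to compute AUROC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy predictions (illustrative only) from one model on two cohorts.
    development = ([0, 0, 1, 1], [0.10, 0.20, 0.70, 0.90])
    external = ([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80])
    print("development AUROC:", auroc(*development))
    print("external AUROC:", auroc(*external))
```

The pairwise definition is quadratic in the cohort size, which is acceptable for a sketch but not for large cohorts. Discrimination is also only one aspect of validation; calibration in the external population would need a separate check.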
Article
Objective: In applying machine learning (ML) to electronic health record (EHR) data, many decisions must be made before any ML is applied; such preprocessing requires substantial effort and can be labor-intensive. As the role of ML in health care grows, there is an increasing need for systematic and reproducible preprocessing techniques for EHR data. Thus, we developed FIDDLE (Flexible Data-Driven Pipeline), an open-source framework that streamlines the preprocessing of data extracted from the EHR. Materials and methods: Largely data-driven, FIDDLE systematically transforms structured EHR data into feature vectors, limiting the number of decisions a user must make while incorporating good practices from the literature. To demonstrate its utility and flexibility, we conducted a proof-of-concept experiment in which we applied FIDDLE to 2 publicly available EHR data sets collected from intensive care units: MIMIC-III and the eICU Collaborative Research Database. We trained different ML models to predict 3 clinically important outcomes: in-hospital mortality, acute respiratory failure, and shock. We evaluated models using the area under the receiver operating characteristics curve (AUROC), and compared it to several baselines. Results: Across tasks, FIDDLE extracted 2,528 to 7,403 features from MIMIC-III and eICU, respectively. On all tasks, FIDDLE-based models achieved good discriminative performance, with AUROCs of 0.757-0.886, comparable to the performance of MIMIC-Extract, a preprocessing pipeline designed specifically for MIMIC-III. Furthermore, our results showed that FIDDLE is generalizable across different prediction times, ML algorithms, and data sets, while being relatively robust to different settings of user-defined arguments. Conclusions: FIDDLE, an open-source preprocessing pipeline, facilitates applying ML to structured EHR data. 
By accelerating and standardizing labor-intensive preprocessing, FIDDLE can help stimulate progress in building clinically useful ML tools for EHR data.
Article
Progress of machine learning in critical care has been difficult to track, in part due to the absence of public benchmarks. Other fields of research (such as computer vision and natural language processing) have established various competitions and public benchmarks. Recent availability of large clinical datasets has enabled the possibility of establishing public benchmarks. Taking advantage of this opportunity, we propose a public benchmark suite to address four areas of critical care, namely mortality prediction, estimation of length of stay, patient phenotyping and risk of decompensation. We define each task and compare the performance of both clinical models as well as baseline and deep learning models using the eICU critical care dataset of around 73,000 patients. This is the first public benchmark on a multi-centre critical care dataset, comparing the performance of the clinical gold standard with our predictive model. We also investigate the impact of numerical variables as well as handling of categorical variables on each of the defined tasks. The source code, detailing our methods and experiments, is publicly available such that anyone can replicate our results and build upon our work.
Article
Intensive-care clinicians are presented with large quantities of measurements from multiple monitoring systems. The limited ability of humans to process complex information hinders early recognition of patient deterioration, and high numbers of monitoring alarms lead to alarm fatigue. We used machine learning to develop an early-warning system that integrates measurements from multiple organ systems using a high-resolution database with 240 patient-years of data. It predicts 90% of circulatory-failure events in the test set, with 82% identified more than 2 h in advance, resulting in an area under the receiver operating characteristic curve of 0.94 and an area under the precision-recall curve of 0.63. On average, the system raises 0.05 alarms per patient and hour. The model was externally validated in an independent patient cohort. Our model provides early identification of patients at risk for circulatory failure with a much lower false-alarm rate than conventional threshold-based systems. A machine-learning algorithm based on an array of demographic, physiological and clinical information is able to predict, hours in advance, circulatory failure of patients in the intensive-care unit.
Article
Critical care patients are monitored closely through the course of their illness. As a result of this monitoring, large amounts of data are routinely collected for these patients. Philips Healthcare has developed a telehealth system, the eICU Program, which leverages these data to support management of critically ill patients. Here we describe the eICU Collaborative Research Database, a multi-center intensive care unit (ICU) database with high granularity data for over 200,000 admissions to ICUs monitored by eICU Programs across the United States. The database is deidentified, and includes vital sign measurements, care plan documentation, severity of illness measures, diagnosis information, treatment information, and more. Data are publicly available after registration, including completion of a training course in research with human subjects and signing of a data use agreement mandating responsible handling of the data and adhering to the principle of collaborative research. The freely available nature of the data will support a number of applications including the development of machine learning algorithms, decision support tools, and clinical research.
Article
The vision of creating accessible, reliable clinical evidence by accessing the clinical experience of hundreds of millions of patients across the globe is a reality. Observational Health Data Sciences and Informatics (OHDSI) has built on learnings from the Observational Medical Outcomes Partnership to turn methods research and insights into a suite of applications and exploration tools that move the field closer to the ultimate goal of generating evidence about all aspects of healthcare to serve the needs of patients, clinicians and all other decision-makers around the world.
Article
An increasing amount of research is being devoted to applying machine learning methods to electronic health record (EHR) data for various clinical purposes. This growing area of research has exposed the challenges of the accessibility of EHRs. MIMIC is a popular, public, and free EHR dataset in a raw format that has been used in numerous studies. The absence of standardized preprocessing steps can be, however, a significant barrier to the wider adoption of this rare resource. Additionally, this absence can reduce the reproducibility of the developed tools and limit the ability to compare the results among similar studies. In this work, we provide a greatly customizable pipeline to extract, clean, and preprocess the data available in the fourth version of the MIMIC dataset (MIMIC-IV). The pipeline also presents an end-to-end wizard-like package supporting predictive model creations and evaluations. The pipeline covers a range of clinical prediction tasks which can be broadly classified into four categories - readmission, length of stay, mortality, and phenotype prediction. The tool is publicly available at https://github.com/healthylaife/MIMIC-IV-Data-Pipeline.
Article
Background Acute respiratory distress syndrome (ARDS) is common in intensive care units and carries a high mortality rate; mechanical ventilation (MV) is the most important related treatment. Early prediction of MV duration benefits patient risk stratification and supports care strategies. Objective To develop an explainable model for predicting MV duration in patients with ARDS using a machine learning (ML) approach. Method A total of 1,148, 1,697, and 29 ARDS patients admitted to intensive care units (ICUs) in the MIMIC-IV, eICU-CRD, and AmsterdamUMCdb databases, respectively, were included in the study. Features at MV initiation from the MIMIC-IV dataset were used to train prediction models based on seven supervised machine learning algorithms. After 5-fold cross-validation for hyperparameter tuning, the hyperparameter-optimized model of each algorithm was tested on external datasets extracted from eICU-CRD and AmsterdamUMCdb. Finally, three descriptive machine learning explanation methods were applied for model explanation. Result The XGBoost model showed the most stable and accurate performance across the two testing datasets (RMSE = 5.57 and 5.46 days in eICU-CRD and AmsterdamUMCdb) and was selected as the optimal model. The model explanations based on SHAP, LIME, and DALEX were consistent: vasopressor, pH, and SOFA score had the highest effect on MV duration prediction. Conclusion ML models with features at MV initiation can accurately predict MV duration in patients with ARDS in ICUs. Among seven algorithms, XGB models showed the best performance (RMSE = 5.57 and 5.46 in two external datasets). LIME, SHAP, and Breakdown methods showed good performance as XAI methods.
Article
Objective: As data science and artificial intelligence continue to rapidly gain traction, the publication of freely available ICU datasets has become invaluable to propel data-driven clinical research. In this guide for clinicians and researchers, we aim to: 1) systematically search and identify all publicly available adult clinical ICU datasets, 2) compare their characteristics, data quality, and richness and critically appraise their strengths and weaknesses, and 3) provide researchers with suggestions on which datasets are appropriate for answering their clinical question. Data sources: A systematic search was performed in PubMed, ArXiv, MedRxiv, and BioRxiv. Study selection: We selected all studies that reported on publicly available adult patient-level intensive care datasets. Data extraction: A total of four publicly available, adult, critical care, patient-level databases were included (Amsterdam University Medical Center database [AmsterdamUMCdb], eICU Collaborative Research Database [eICU-CRD], High time-resolution intensive care unit dataset [HiRID], and Medical Information Mart for Intensive Care-IV). Databases were compared using a priori defined categories, including demographics, patient characteristics, and data richness. The study protocol and search strategy were prospectively registered. Data synthesis: Four ICU databases fulfilled all criteria for inclusion and were queried using SQL (PostgreSQL version 12; PostgreSQL Global Development Group) and analyzed using R (R Foundation for Statistical Computing, Vienna, Austria). The number of unique patient admissions varied between 23,106 (AmsterdamUMCdb) and 200,859 (eICU-CRD). Frequency of laboratory values and vital signs was highest in HiRID, for example, 5.2 (±3.4) lactate values per day and 29.7 (±10.2) systolic blood pressure values per hour.
Treatment intensity varied with vasopressor and ventilatory support in 69.0% and 83.0% of patients in AmsterdamUMCdb versus 12.0% and 21.0% in eICU-CRD, respectively. ICU mortality ranged from 5.5% in eICU-CRD to 9.9% in AmsterdamUMCdb. Conclusions: We identified four publicly available adult clinical ICU datasets. Sample size, severity of illness, treatment intensity, and frequency of reported parameters differ markedly between the databases. This should guide clinicians and researchers which databases to best answer their clinical questions.
Article
Deep learning models are increasingly studied in the field of critical care. However, due to the lack of external validation and interpretability, it is difficult to generalize deep learning models in critical care scenarios. Few works have validated the performance of the deep learning models with external datasets. To address this, we propose a clinically practical and interpretable deep model for intensive care unit (ICU) mortality prediction with external validation. We use the newly published dataset Philips eICU to train a recurrent neural network model with a two-level attention mechanism, and use the MIMIC III dataset as the external validation set to verify the model performance. This model achieves a high accuracy (AUC = 0.855 on the external validation set) and has good interpretability. Based on this model, we develop a system to support clinical decision-making in ICUs.
Article
Introduction Rhabdomyolysis (RM) is a complex set of clinical syndromes involving the rapid dissolution of skeletal muscles. The early detection of patients who need renal replacement therapy (RRT) is very important and may aid in delivering proper care and optimizing the use of limited resources. Methods Retrospective analyses of the following three databases were performed: the eICU Collaborative Research Database (eICU-CRD), the Medical Information Mart for Intensive Care III (MIMIC-III) database and electronic medical records from the First Medical Centre of the Chinese People's Liberation Army General Hospital (PLAGH). The data from the eICU-CRD and MIMIC-III datasets were merged to form the derivation cohort. The data collected from the Chinese PLAGH were used for external validation. The factors predictive of the need for RRT were selected using a LASSO regression analysis. A logistic regression was selected as the algorithm. The model was built in Python using the ML library scikit-learn. The accuracy of the model was measured by the area under the receiver operating characteristic curve (AUC). R software was used for the LASSO regression analysis, nomogram, concordance index, calibration, and decision and clinical impact curves. Results In total, 1259 patients with RM (614 patients from eICU-CRD, 324 patients from the MIMIC-III database and 321 patients from the Chinese PLAGH) were eligible for this analysis. The rate of RRT was 15.0% (92/614) in the eICU-CRD database, 17.6% (57/324) in the MIMIC-III database and 5.6% in the Chinese PLAGH (18/321). After the LASSO regression selection, eight variables were included in the RRT prediction model. The AUC of the model in the training dataset was 0.818 (95% CI 0.78–0.87), the AUC in the test dataset was 0.794 (95% CI 0.72–0.86), and the AUC in the Chinese PLAGH dataset (external validation dataset) was 0.820 (95% CI 0.70–0.86). 
Conclusions We developed and validated a model for the early prediction of the RRT requirement among patients with RM based on 8 variables commonly measured during the first 24 h after admission. Predicting the need for RRT could help ensure appropriate treatment and facilitate the optimization of the use of medical resources.
Article
The newly inaugurated Research Resource for Complex Physiologic Signals, which was created under the auspices of the National Center for Research Resources of the National Institutes of Health, is intended to stimulate current research and new investigations in the study of cardiovascular and other complex biomedical signals. The resource has 3 interdependent components. PhysioBank is a large and growing archive of well-characterized digital recordings of physiological signals and related data for use by the biomedical research community. It currently includes databases of multiparameter cardiopulmonary, neural, and other biomedical signals from healthy subjects and from patients with a variety of conditions with major public health implications, including life-threatening arrhythmias, congestive heart failure, sleep apnea, neurological disorders, and aging. PhysioToolkit is a library of open-source software for physiological signal processing and analysis, the detection of physiologically significant events using both classic techniques and novel methods based on statistical physics and nonlinear dynamics, the interactive display and characterization of signals, the creation of new databases, the simulation of physiological and other signals, the quantitative evaluation and comparison of analysis methods, and the analysis of nonstationary processes. PhysioNet is an on-line forum for the dissemination and exchange of recorded biomedical signals and open-source software for analyzing them. It provides facilities for the cooperative analysis of data and the evaluation of proposed new algorithms. In addition to providing free electronic access to PhysioBank data and PhysioToolkit software via the World Wide Web (http://www.physionet.org), PhysioNet offers services and training via on-line tutorials to assist users with varying levels of expertise.
HiRID-ICU-benchmark — A comprehensive machine learning benchmark on high-resolution ICU data
  • Yèche