Fig 3 - uploaded by Hugo Leroux
The RDF Data Cube data model (container and descriptor classes) 

Source publication
Conference Paper
Full-text available
The development of ontologies, linked data standards and tools for semantic enrichment opens new opportunities to analyse and reuse the clinical data collected as part of clinical trials and longitudinal studies. This paper presents our approach to the semantic enrichment of the data collected as part of the Australian Imaging, Biomarker and Lifes...

Context in source publication

Context 1
... the definitions of these data structures are also included in the CDISC ODM export format. In OpenClinica, this data dictionary can be edited by super users with the help of an Excel template that plays the role of a configuration file. The users of the tools are encouraged to share their CRFs and have access to a library of peer-reviewed ones derived from authoritative standards sources such as the CDISC Clinical Data Acquisition Standards Harmonization (CDASH) initiative. The RDF Data Cube vocabulary [2], published by the W3C Government Linked Data working group, is a vocabulary for the publication of statistical data in RDF [11]. This specification is available as a working draft, but it has been evaluated by a number of government agencies (Eurostat, the European and UK Environment agencies) that have published large-scale datasets. It has also triggered new work on the Online Analytical Processing (OLAP) of Linked Data sources ([12-13]). The basic principles behind the design of the RDF Data Cube vocabulary are illustrated in Fig. 2. A cube is a dataset that is divided into slices according to several dimensions. Each slice contains a number of observations. The arrows in Fig. 2 represent the links between the cube and the slices and between the slices and the observations. These extra links at multiple levels of data aggregation allow data consumers to navigate and query linked data. The RDF Data Cube vocabulary defines three types of data items: dimensions for the identification keys, and measures and attributes for the recorded data and metadata. The slices group subsets of observations within a dataset where all the dimensions except one (or a small number) are fixed. The RDF Data Cube vocabulary specifies container classes and descriptor classes, and the set of properties to link them (Fig. 3). The main descriptor class has links to property classes (qb:componentProperty) to specify the data items which are used. 
The other classes (qb:DataStructureDefinition, qb:SliceKey, qb:ComponentSpecification) and properties (qb:structure, qb:sliceStructure, qb:componentAttachment) are used to indicate ...
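The container and descriptor structure described in this excerpt can be sketched as plain subject/predicate/object triples. This is a minimal illustration, not taken from the paper: all `ex:` resource names are hypothetical, and only a few qb terms are shown.

```python
# Hypothetical sketch of the RDF Data Cube structure described above, as plain
# (subject, predicate, object) triples. All ex: names are invented.
triples = [
    # Container classes: a dataset divided into slices, each holding observations.
    ("ex:aibl",        "rdf:type",       "qb:DataSet"),
    ("ex:aibl",        "qb:slice",       "ex:sliceVisit1"),
    ("ex:sliceVisit1", "rdf:type",       "qb:Slice"),
    ("ex:sliceVisit1", "qb:observation", "ex:obs42"),
    ("ex:obs42",       "rdf:type",       "qb:Observation"),
    # Descriptor classes: the structure definition lists the component
    # properties (dimensions, measures) via qb:component.
    ("ex:aibl",        "qb:structure",   "ex:dsd"),
    ("ex:dsd",         "rdf:type",       "qb:DataStructureDefinition"),
    ("ex:dsd",         "qb:component",   "ex:compVisit"),
    ("ex:compVisit",   "qb:dimension",   "ex:visitDim"),
    ("ex:dsd",         "qb:component",   "ex:compScore"),
    ("ex:compScore",   "qb:measure",     "ex:scoreMeasure"),
]

def observations_in(dataset, triples):
    """Follow the cube -> slice -> observation links for one dataset."""
    slices = {o for s, p, o in triples if s == dataset and p == "qb:slice"}
    return {o for s, p, o in triples
            if s in slices and p == "qb:observation"}

print(observations_in("ex:aibl", triples))  # {'ex:obs42'}
```

The multi-level links (cube to slice, slice to observation) are what make the navigation in `observations_in` possible without scanning every observation directly.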

Citations

... Although a network of ontologies seems to be the most appropriate approach to comply with the LOD vision, this specific work does not use the QB ontology, despite it being a standard recommendation suitable for modeling and integrating heterogeneous spatio-temporal data. QB has already been used to organize data of various natures in the LOD Cloud, such as medical data (Leroux and Lefort, 2012; Rodriguez and Hogan, 2021; Casey et al., 2022), historical data (Bayerl and Granitzer, 2015) and especially socioeconomic data 14 due to its compatibility with the SDMX model. However, despite the benefits of the QB vocabulary, it is rarely used in the field of EO data. ...
Article
Full-text available
The Earth’s ecosystem is facing serious threats due to the depletion of natural resources and environmental pollution. To promote sustainable practices and formulate effective policies that address these issues, both experts and non-expert stakeholders require access to meaningful Open Data. Current Earth monitoring programs provide a large volume of open Earth Observation (EO) data typically organized and managed in EO Data Cubes (EODCs). From these datasets, satellite-derived indices can be calculated for assessing various environmental aspects in areas of interest over time. However, current EO data lack semantics and are isolated from significant Web resources, greatly hindering their comprehension and limiting their use to specialized users. To enhance EO data with semantic richness and ensure their understanding by a wider audience, it is pertinent to adopt a Linked Open Data (LOD) approach. In this paper, we present the Linked Earth Observation Data Series (LEODS) framework designed to publish aggregated EO data in the LOD Cloud. LEODS provides a processing chain that converts EO data into EO-RDF data cubes based on a spatio-temporal modeling approach that ensures integration and future semantic enrichment of EO data while preserving the advantages of traditional EODCs and following the FAIR principles (i.e., findable, accessible, interoperable, and reusable). To highlight the advantages of our proposal, we explore, through SPARQL queries and visualizations, the results of implementing LEODS with study areas located in Switzerland and France.
... Continuing, during our research we identified several vocabularies/ontologies designed to facilitate the publication of EO data on the LOD Cloud; among the most relevant ones is O&M, 22 an OGC standard designed to describe observations and measurements in the geospatial and environmental domains, e.g., the relationships between the target spatial objects, the measured properties, the measurement procedure, and the captured data resulting from those observational events. Although O&M is versatile within its intended scope, there are certain areas where it may not provide complete coverage, e.g., Social Sciences and Economics. ...
... Although a network of ontologies seems to be the most appropriate approach to comply with the LOD vision, this specific work does not use the QB ontology, despite it being a standard recommendation suitable for modeling and integrating heterogeneous spatio-temporal data. QB has already been used to organize data of various natures in the LOD Cloud, such as medical data [8,22,30], historical data [5] and especially socio-economic data 27 due to its compatibility with the SDMX model. However, despite the benefits of the QB vocabulary, it is rarely used in the field of EO data. ...
... Each informal term is assigned a single medical concept. AskaPatient 9 [36] maps informal terms from the AskaPatient web forum to medical concepts in SNOMED-CT and the Australian Medicines Terminology [35]. Since this lexicon was created from a web forum, it is more informative than TwADR-L. ...
Conference Paper
Full-text available
Mental health illness such as depression is a significant risk factor for suicide ideation, behaviors, and attempts. A report by Substance Abuse and Mental Health Services Administration (SAMHSA) shows that 80% of the patients suffering from Borderline Personality Disorder (BPD) have suicidal behavior, 5-10% of whom commit suicide. While multiple initiatives have been developed and implemented for suicide prevention, a key challenge has been the social stigma associated with mental disorders, which deters patients from seeking help or sharing their experiences directly with others including clinicians. This is particularly true for teenagers and younger adults where suicide is the second highest cause of death in the US. Prior research involving surveys and questionnaires (e.g. PHQ-9) for suicide risk prediction failed to provide a quantitative assessment of risk that informed timely clinical decision-making for intervention. Our interdisciplinary study concerns the use of Reddit as an unobtrusive data source for gleaning information about suicidal tendencies and other related mental health conditions afflicting depressed users. We provide details of our learning framework that incorporates domain-specific knowledge to predict the severity of suicide risk for an individual. Our approach involves developing a suicide risk severity lexicon using medical knowledge bases and suicide ontology to detect cues relevant to suicidal thoughts and actions. We also use language modeling, medical entity recognition and normalization and negation detection to create a dataset of 2181 redditors that have discussed or implied suicidal ideation, behavior, or attempt. Given the importance of clinical knowledge, our gold standard dataset of 500 redditors (out of 2181) was developed by four practicing psychiatrists following the guidelines outlined in Columbia Suicide Severity Rating Scale (C-SSRS), with the pairwise annotator agreement of 0.79 and group-wise agreement of 0.73. 
Compared to the existing four-label classification scheme (no risk, low risk, moderate risk, and high risk), our proposed C-SSRS-based 5-label classification scheme distinguishes people who are supportive from those who show different severities of suicidal tendency. Our 5-label classification scheme outperforms the state-of-the-art schemes by improving the graded recall by 4.2% and reducing the perceived risk measure by 12.5%. Convolutional neural network (CNN) provided the best performance in our scheme due to the discriminative features and use of domain-specific knowledge resources, in comparison to SVM-L, which has been used in the state-of-the-art tools over a similar dataset.
... RDF Data Cube has been used to transform statistical data from other domains such as census data [5], Open Government Data [6], clinical trial data [7], and meteorological sensor data [8]. ...
Chapter
Full-text available
Tourism is a crucial component of Sri Lanka’s economy. Intelligent business decisions by means of thorough analysis of relevant data can help the Sri Lankan tourism industry to be competitive. To this end, Sri Lanka Tourism Development Authority makes tourism statistics publicly available. However, they are published as PDF files limiting their reuse. In this paper, we present how to transform such data into 5-star Linked Open Data by extracting the statistics as structured data; modelling them using the W3C RDF Data Cube vocabulary and transforming them to RDF using W3C R2RML mappings. Furthermore, we demonstrate the benefits of such transformation using two real-world use cases.
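The pipeline this abstract describes (extract statistics as structured rows, model them with the QB vocabulary, emit RDF) can be sketched in miniature. This is a hypothetical illustration of what such a transformation produces, not the paper's R2RML mapping; all `ex:` URIs and the sample row are invented.

```python
# Hypothetical sketch of the transformation step described above: one row of
# extracted tourism statistics becomes RDF Data Cube observation triples,
# roughly what applying an R2RML mapping would yield. All ex: URIs are invented.
def row_to_observation(row, n):
    """Map one tabular row to (subject, predicate, object) triples."""
    obs = f"ex:obs{n}"
    return [
        (obs, "rdf:type",     "qb:Observation"),
        (obs, "qb:dataSet",   "ex:arrivals"),
        (obs, "ex:refPeriod", row["year"]),      # dimension
        (obs, "ex:country",   row["country"]),   # dimension
        (obs, "ex:arrivals",  row["arrivals"]),  # measure
    ]

rows = [{"year": "2018", "country": "Germany", "arrivals": 156000}]
triples = [t for n, r in enumerate(rows) for t in row_to_observation(r, n)]
print(len(triples))  # 5
```

Each row yields one qb:Observation carrying its dimension values (here, year and country) and its measure (arrival count), which is the essence of modelling tabular statistics with the Data Cube vocabulary.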
... However, the monolithic nature of the ODM data model favours a one-dimensional traversal of the clinical data along its hierarchy of Study-Subject-StudyEvent-Form-ItemGroup-Item. More effective exploration and querying of the clinical data, especially when dealing with longitudinal studies, requires more direct access to the data, particularly at the Study Event, Subject and Item levels [6,7,19]. ...
... The Linked Clinical Data Cube (LCDC) [6,7,19] describes a semantic web approach to investigate the association of the semantic statistics vocabularies with clinical data exchange standards and demonstrate their fit in achieving the semantic enrichment of clinical study data with a view to fulfilling semantic interoperability. The LCDC defines a set of modularised data cubes that helps manage the multi-dimensional and multidisciplinary nature of clinical data. ...
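The one-dimensional traversal these excerpts criticise, and the more direct access they call for, can be contrasted in a small sketch. The nested structure below is a hypothetical miniature of the ODM Study-Subject-StudyEvent-Form-ItemGroup-Item hierarchy; the flat index is one simple way (not the LCDC's method) to get item-level access.

```python
# Hypothetical miniature of the ODM hierarchy described above, plus a flat
# index giving direct (subject, event, item) access. Data values are invented.
study = {
    "subjects": {
        "S001": {
            "events": {
                "baseline": {
                    "forms": {
                        "cognition": {
                            "item_groups": {
                                "mmse": {"items": {"MMSE_TOTAL": 28}}
                            }
                        }
                    }
                }
            }
        }
    }
}

def index_items(study):
    """Flatten the Study->Subject->Event->Form->ItemGroup->Item hierarchy."""
    idx = {}
    for subj, s in study["subjects"].items():
        for event, e in s["events"].items():
            for form in e["forms"].values():
                for group in form["item_groups"].values():
                    for item, value in group["items"].items():
                        idx[(subj, event, item)] = value
    return idx

idx = index_items(study)
print(idx[("S001", "baseline", "MMSE_TOTAL")])  # 28
```

Without such an index, every query must walk the full hierarchy from the study root, which is the monolithic, one-dimensional traversal the excerpts describe.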
Article
Full-text available
Background: Observational clinical studies play a pivotal role in advancing medical knowledge and patient healthcare. To lessen the prohibitive costs of conducting these studies and support evidence-based medicine, results emanating from these studies need to be shared and compared to one another. Current approaches for clinical study management have limitations that prohibit the effective sharing of clinical research data. Methods: The objective of this paper is to present a proposal for a clinical study architecture to not only facilitate the communication of clinical study data but also its context, so that the data being communicated can be unambiguously understood at the receiving end. Our approach is two-fold. First we outline our methodology to map clinical data from the Clinical Data Interchange Standards Consortium Operational Data Model (ODM) to the Fast Healthcare Interoperable Resource (FHIR) and outline the strengths and weaknesses of this approach. Next, we propose two FHIR-based models, to capture the metadata and data from the clinical study, that facilitate not only the syntactic but also the semantic interoperability of clinical study data. Conclusions: This work shows that our proposed FHIR resources provide a good fit to semantically enrich the ODM data. By exploiting the rich information model in FHIR, we can organise clinical data in a manner that preserves its organisation but captures its context. Our implementations demonstrate that FHIR can natively manage clinical data. Furthermore, by providing links at several levels, it improves the traversal and querying of the data. The intended benefits of this approach are more efficient and effective data exchange that ultimately will allow clinicians to switch their focus back to decision-making and evidence-based medicine.
... By providing independent, unique codes, the AMT is intended to allow long-term, reliable communication of medication information between different systems. Uptake of the AMT has been slow, particularly in commercial software, but it is being increasingly used in research and analysis [31][32][33]. ...
Article
Full-text available
ABSTRACT Background: Inappropriate use of sedating medication has been reported in nursing homes for several decades. The Reducing Use of Sedatives (RedUSe) project was designed to address this issue through a combination of audit, feedback, staff education, and medication review. The project significantly reduced sedative use in a controlled trial of 25 Tasmanian nursing homes. To expand the project to 150 nursing homes across Australia, an improved and scalable method of data collection was required. This paper describes and evaluates a method for remotely extracting, transforming, and validating electronic resident and medication data from community pharmacies supplying medications to nursing homes. Objective: The aim of this study was to develop and evaluate an electronic method for extracting and enriching data on psychotropic medication use in nursing homes, on a national scale. Methods: An application uploaded resident details and medication data from computerized medication packing systems in the pharmacies supplying participating nursing homes. The server converted medication codes used by the packing systems to Australian Medicines Terminology coding and subsequently to Anatomical Therapeutic Chemical (ATC) codes for grouping. Medications of interest, in this case antipsychotics and benzodiazepines, were automatically identified and quantified during the upload. This data was then validated on the Web by project staff and a “champion nurse” at the participating home. Results: Of participating nursing homes, 94.6% (142/150) had resident and medication records uploaded. Facilitating an upload for one pharmacy took an average of 15 min. A total of 17,722 resident profiles were extracted, representing 95.6% (17,722/18,537) of the homes’ residents. For these, 546,535 medication records were extracted, of which, 28,053 were identified as antipsychotics or benzodiazepines. 
Of these, 8.17% (2291/28,053) were modified during validation and verification stages, and 4.75% (1398/29,451) were added. The champion nurse required a mean of 33 min of website interaction to verify data, compared with 60 min for manual data entry. Conclusions: The results show that the electronic data collection process is accurate: 95.25% (28,053/29,451) of sedative medications being taken by residents were identified and, of those, 91.83% (25,762/28,053) were correct without any manual intervention. The process worked effectively for nearly all homes. Although the pharmacy packing systems contain some invalid patient records, and data is sometimes incorrectly recorded, validation steps can overcome these problems and provide sufficiently accurate data for the purposes of reporting medication use in individual nursing homes. J Med Internet Res 2017;19(8):e283. doi:10.2196/jmir.6938. Keywords: electronic health records; information storage and retrieval; inappropriate prescribing; antipsychotic agents; benzodiazepines; nursing homes; systematized nomenclature of medicine; health information systems
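The code-conversion step this abstract describes (local packing-system codes mapped to standard codes, then grouped to identify antipsychotics and benzodiazepines) can be sketched as follows. The local codes and the mapping table are invented for illustration; the ATC prefixes N05A (antipsychotics) and N05BA/N05CD (benzodiazepine anxiolytics/hypnotics) are real ATC groups, though the study's exact grouping rules are not given here.

```python
# Hedged sketch of the medication-grouping step described above. The PKG-* local
# codes and this mapping table are invented; real systems map via AMT first.
LOCAL_TO_ATC = {
    "PKG-0001": "N05AH03",  # olanzapine
    "PKG-0002": "N05BA12",  # alprazolam
    "PKG-0003": "C09AA02",  # enalapril
}

def classify(local_code):
    """Group a local medication code by ATC prefix."""
    atc = LOCAL_TO_ATC.get(local_code)
    if atc is None:
        return "unmapped"
    # N05A covers antipsychotics; N05AN (lithium) would typically need
    # separate handling in a real study.
    if atc.startswith("N05A") and not atc.startswith("N05AN"):
        return "antipsychotic"
    if atc.startswith(("N05BA", "N05CD")):
        return "benzodiazepine"
    return "other"

print(classify("PKG-0001"))  # antipsychotic
```

Prefix matching works here because ATC codes are hierarchical: the first characters encode progressively finer therapeutic groups, so a whole drug class can be selected by its shared prefix.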
... However, the monolithic nature of the ODM data model favours a one-dimensional traversal of the clinical data along its hierarchy of Study-Subject-StudyEvent-Form-ItemGroup-Item. More effective exploration and querying of the clinical data, especially when dealing with longitudinal studies, requires more direct access to the data, particularly at the Study Event, Subject and Item levels [6,7,16]. ...
... The Linked Clinical Data Cube [6,16,7] investigates the association of the semantic statistics vocabularies with clinical data exchange standards and demonstrates their fit in achieving the semantic enrichment of clinical study data with a view to fulfilling semantic interoperability. ...
... Automating ontology construction [13,16,23], population [8,30], and reuse [7] have long been topics of research. There have been many different applications but none seem to have the overall aim of the creation of a system to quickly understand the scope of a longitudinal study and identify relevant areas of interest to a clinical specialist. ...
... In work most similar to that presented here, researchers [13] use existing ontologies (AMT - Australian Medicines Terminology, SNOMED CT) and link them to the Australian Imaging, Biomarker and Lifestyle (AIBL) study of ageing. Our goal was to enrich the ontology from an existing dementia study rather than enriching the clinical trial data itself. ...
Article
Full-text available
A common activity carried out by healthcare professionals is to test various hypotheses on longitudinal study data in an effort to develop new and more reliable algorithms that might determine the possibility of developing certain illnesses. The INnovative, Midlife INtervention for Dementia Deterrence project provides input from a number of European dementia experts to identify the most accurate model of inter-related risk factors which can yield a personalized dementia-risk quotient and profile. This model is then validated against the large population-based prospective Maastricht Aging Study dataset. As part of this overall goal, the research presented in this article demonstrates how we can automate the process of mapping modifiable risk factors against large sections of the aging study and thus use information technology to provide more powerful query interfaces. © The Author(s) 2015.
... Data interchange, a key process in the data collection phase, is an original ODM use case; it was the focus of ODM v1.0 and has been covered broadly in the literature [2,13,26,35,43,45,46,52,53,57,58,60,[62][63][64][73][74][75][76][77][78][79][80][81][82][83][84][85]. ODM's basic hierarchical structure is particularly well suited for data capture [84]. ...
... ODM has been used to integrate clinical research data into the i2b2 (Informatics for Integrating Biology and the Bedside) data model including both ontology and fact data [26,46]. Leroux and Lefort [82] used ODM to drive the creation of RDF data cubes from longitudinal study data. ...
... ODM's relative simplicity has at times also been a limiting factor impacting all aspects of interoperability including data mapping, representing semantics, data types, and terminology support. The ODM hierarchical structure, based on the elements shown in Fig. 3, most clearly expresses CRF-oriented data [82,84], and in the cases of the Define-XML and Dataset-XML, extensions have been used to represent tabular datasets [14,15]. ...
Article
Introduction: In order to further advance research and development on the Clinical Data Interchange Standards Consortium (CDISC) Operational Data Model (ODM) standard, the existing research must be well understood. This paper presents a methodological review of the ODM literature. Specifically, it develops a classification schema to categorize the ODM literature according to how the standard has been applied within the clinical research data lifecycle. This paper suggests areas for future research and development that address ODM's limitations and capitalize on its strengths to support new trends in clinical research informatics. Methods: A systematic scan of the following databases was performed: (1) ABI/Inform, (2) ACM Digital, (3) AIS eLibrary, (4) Europe PubMed Central, (5) Google Scholar, (6) IEEE Xplore, (7) PubMed, and (8) ScienceDirect. A Web of Science citation analysis was also performed. The search term used on all databases was "CDISC ODM." The two primary inclusion criteria were: (1) the research must examine the use of ODM as an information system solution component, or (2) the research must critically evaluate ODM against a stated solution usage scenario. Out of 2686 articles identified, 266 were included in a title level review, resulting in 183 articles. An abstract review followed, resulting in 121 remaining articles; and after a full text scan 69 articles met the inclusion criteria. Results: As the demand for interoperability has increased, ODM has shown remarkable flexibility and has been extended to cover a broad range of data and metadata requirements that reach well beyond ODM's original use cases. This flexibility has yielded research literature that covers a diverse array of topic areas. A classification schema reflecting the use of ODM within the clinical research data lifecycle was created to provide a categorized and consolidated view of the ODM literature. 
The elements of the framework include: (1) EDC (Electronic Data Capture) and EHR (Electronic Health Record) infrastructure; (2) planning; (3) data collection; (4) data tabulations and analysis; and (5) study archival. The analysis reviews the strengths and limitations of ODM as a solution component within each section of the classification schema. This paper also identifies opportunities for future ODM research and development, including improved mechanisms for semantic alignment with external terminologies, better representation of the CDISC standards used end-to-end across the clinical research data lifecycle, improved support for real-time data exchange, the use of EHRs for research, and the inclusion of a complete study design. Conclusions: ODM is being used in ways not originally anticipated, and covers a diverse array of use cases across the clinical research data lifecycle. ODM has been used as much as a study metadata standard as it has for data exchange. A significant portion of the literature addresses integrating EHR and clinical research data. The simplicity and readability of ODM has likely contributed to its success and broad implementation as a data and metadata standard. Keeping the core ODM model focused on the most fundamental use cases, while using extensions to handle edge cases, has kept the standard easy for developers to learn and use.
... However, as outlined in section 3.4, forms are often ill-conceived in ODM and as discussed in sections 3.5 and 3.6, the tendency is not to organise questions in a contextual manner but in one that befits the data capture process. The implication is that tremendous effort, which grows exponentially with the size of the study, must be expended to semantically enrich the clinical data by regrouping it contextually and integrating it with the relevant domain ontologies [10,9]. ...
... Several researchers have initiated approaches to address the semantic enrichment of clinical data with a view to achieving interoperability. One such approach, the Linked Clinical Data Cube (LCDC) [10,9,11] is a set of modularised data cubes that helps manage the multi-dimensional and multi-disciplinary nature of clinical data. A comprehensive comparison between the LCDC and this work is outside the scope of this paper. ...
Conference Paper
Full-text available
Observational clinical studies play a pivotal role in advancing medical knowledge and patient healthcare. However, to lessen the prohibitive costs of conducting these studies and support evidence-based medicine, results emanating from these studies need to be shared and compared with one another. This paper explores how semantic interoperability of clinical data can be achieved by integrating two prominent standards for clinical data: ODM and FHIR. ODM lacks a rich-enough information model to adequately capture the contextual information of clinical study data. This is overcome by using FHIR's information model to achieve semantic interoperability of clinical data. This work outlines our ongoing effort to integrate the ODM standard with the FHIR standard. In particular, it demonstrates how the hierarchical ODM model lends itself to being mapped to the ubiquitous FHIR resources. We describe the approach and provide insights into the assumptions made to fit the clinical data extracted from the ODM standard into the FHIR resources. Our focus is not only on mapping the data from ODM to the FHIR models but on capturing the contextual information, present in other sources such as the study protocol, which should have been made available with the extracted data. Finally, we discuss the exceptions under which the extracted ODM data does not adequately fit the targeted FHIR resources and offer some insight into a suitable solution.