One Step Away from Technology but One Step Towards Domain Experts—MDRBridge: A Template-Based ISO 11179-Compliant Metadata Processing Pipeline

2019 ◽  
Vol 58 (S 02) ◽  
pp. e72-e79 ◽  
Author(s):  
Ann-Kristin Kock-Schoppenhauer ◽  
B. Kroll ◽  
M. Lambarki ◽  
H. Ulrich ◽  
S. Stahl-Toyota ◽  
...  

Summary Background Secondary use of routine medical data relies on a shared understanding of given information. This understanding is achieved through metadata and their interconnections, which can be stored in metadata repositories (MDRs). The necessity of an MDR is well understood, but the local work on metadata is a time-consuming and challenging process for domain experts. Objective To support the identification, collection, and provision of metadata in a predefined, structured manner to foster consolidation. A particular focus is placed on user acceptance. Methods We propose the software pipeline MDRBridge as a practical intermediary for metadata capture and processing. It is based on MDRSheet, an ISO 11179-3 compliant template built on popular spreadsheet software. Due to the different origins of the metadata, both manual entry and automatic extraction from application systems are supported. To enable the export of collected metadata into external MDRs, a mapping of ISO 11179 to the Clinical Data Interchange Standards Consortium (CDISC) Operational Data Model (ODM) was developed. Results MDRSheet is embedded in the processing pipeline MDRBridge and delivers metadata in the CDISC ODM format for further use in MDRs. This approach is used to interactively unify core datasets, import existing standard datasets, and automatically extract all defined data elements from source systems. The involvement of clinical domain experts improved significantly because only minimal changes to their usual work routine were required. Conclusion A high degree of acceptance was achieved by adapting to the working methods of clinical domain experts. The designed process is capable of transforming all relevant data elements according to the ISO 11179-3 format. MDRSheet is used as an intermediate format to present the information at a glance and to allow editing or supplementing by domain experts.
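
As a rough illustration of the kind of mapping such a pipeline performs, the sketch below turns one spreadsheet-style metadata row into a CDISC ODM ItemDef. This is not the MDRBridge code; the column names and the OID scheme are assumptions made for the example.

```python
# Minimal sketch (not MDRBridge itself): map one metadata row describing an
# ISO 11179-style data element to a CDISC ODM ItemDef. The column names
# ("name", "datatype", "description") and the OID scheme are assumptions.
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"  # for the xml:lang attribute

def row_to_itemdef(row: dict) -> ET.Element:
    """Build an ODM ItemDef element from a single metadata row."""
    item = ET.Element("ItemDef", {
        "OID": f"IT.{row['name']}",
        "Name": row["name"],
        "DataType": row["datatype"],  # e.g. "text", "integer", "date"
    })
    desc = ET.SubElement(item, "Description")
    ET.SubElement(desc, "TranslatedText", {XML_NS + "lang": "en"}).text = row["description"]
    return item

rows = [{"name": "BirthDate", "datatype": "date", "description": "Date of birth"}]
mdv = ET.Element("MetaDataVersion", {"OID": "MDV.1", "Name": "Core data set"})
for r in rows:
    mdv.append(row_to_itemdef(r))
print(ET.tostring(mdv, encoding="unicode"))
```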

Author(s):  
Thomas J. Reese ◽  
Noa Segall ◽  
Guilherme Del Fiol ◽  
Joseph E. Tonna ◽  
Kensaku Kawamoto ◽  
...  

2021 ◽  
Author(s):  
Sylvia Cho ◽  
Chunhua Weng ◽  
Michael G Kahn ◽  
Karthik Natarajan

BACKGROUND There is growing interest in using person-generated wearable device data for biomedical research, but concerns exist about the quality of such data, for example missing or incorrect values. This underscores the importance of assessing data quality before conducting research. To perform data quality assessments, it is essential to define what data quality means for person-generated wearable device data by identifying data quality dimensions. OBJECTIVE The goal of this study was to identify data quality dimensions for person-generated wearable device data used for research purposes. METHODS The study was conducted in three phases: (1) literature review, (2) survey, and (3) focus group discussion. The literature review followed the PRISMA guideline to identify factors affecting data quality and the associated data quality challenges. A survey was then conducted to confirm and complement results from the literature review and to understand researchers’ perceptions of data quality dimensions previously identified for the secondary use of electronic health record (EHR) data. The survey was sent to researchers with experience in analyzing wearable device data. Focus group sessions were held with domain experts to derive data quality dimensions for person-generated wearable device data. Based on the results of the literature review and survey, a facilitator proposed potential data quality dimensions relevant to person-generated wearable device data, and the domain experts accepted or rejected the suggested dimensions. RESULTS Nineteen studies were included in the literature review. Three major themes emerged: device- and technical-related, user-related, and data governance-related factors. The associated data quality problems were incomplete, incorrect, and heterogeneous data. Twenty respondents answered the survey. The major data quality challenges faced by researchers were completeness, accuracy, and plausibility. The importance ratings on data quality dimensions in an existing framework showed that dimensions defined for the secondary use of EHR data are applicable to person-generated wearable device data. Three focus group sessions were held with domain experts in data quality and wearable device research. The experts concluded that intrinsic data quality features such as conformance, completeness, and plausibility, as well as contextual/fitness-for-use features such as completeness (breadth and density) and temporal data granularity, are important dimensions for assessing person-generated wearable device data for research purposes. CONCLUSIONS In this study, intrinsic and contextual/fitness-for-use data quality dimensions for person-generated wearable device data were identified. The dimensions were adapted from data quality terminologies and frameworks for the secondary use of EHR data with a few modifications. Further research is needed on how data quality can be assessed with regard to each dimension.
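
For concreteness, a minimal sketch of checks along two of these dimensions for a hypothetical daily step-count table is shown below; the column names, date range, and plausibility threshold are assumptions and are not part of the study.

```python
# Illustrative sketch only: completeness as day-level density and plausibility
# as a value-range test, applied to a hypothetical daily step-count table.
import pandas as pd

def completeness_density(df: pd.DataFrame, start: str, end: str) -> float:
    """Fraction of calendar days in [start, end] with at least one record."""
    expected = set(pd.date_range(start, end, freq="D"))
    observed = set(pd.to_datetime(df["date"]).dt.normalize())
    return len(observed & expected) / len(expected)

def plausibility_flags(df: pd.DataFrame, max_steps: int = 100_000) -> pd.Series:
    """Flag records whose daily step count is negative or implausibly large."""
    return (df["steps"] < 0) | (df["steps"] > max_steps)

df = pd.DataFrame({"date": ["2021-01-01", "2021-01-03"], "steps": [8500, 250000]})
print(completeness_density(df, "2021-01-01", "2021-01-07"))  # ~0.29 (2 of 7 days)
print(plausibility_flags(df).tolist())                       # [False, True]
```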


2015 ◽  
Vol 54 (01) ◽  
pp. 65-74 ◽  
Author(s):  
J. Evans ◽  
T. A. Oniki ◽  
J. F. Coyle ◽  
L. Bain ◽  
S. M. Huff ◽  
...  

Summary Introduction: This article is part of the Focus Theme of Methods of Information in Medicine on “Managing Interoperability and Complexity in Health Systems”. Background: Data sharing and integration between clinical research data management systems and electronic health record systems remains a challenging issue. To approach the issue, there is emerging interest in utilizing the Detailed Clinical Model (DCM) approach across a variety of contexts. The Intermountain Healthcare Clinical Element Models (CEMs) have been adopted by the Office of the National Coordinator-awarded Strategic Health IT Advanced Research Projects for normalization (SHARPn) project for normalizing patient data from electronic health records (EHRs). Objective: The objective of the present study is to describe our preliminary efforts toward harmonization of the SHARPn CEMs with CDISC (Clinical Data Interchange Standards Consortium) clinical study data standards. Methods: We focused on three generic domains: demographics, lab tests, and medications. We performed a panel review of each data element extracted from the CDISC templates and SHARPn CEMs. Results: We identified a set of data elements that are common to the contexts of both clinical studies and broad secondary use of EHR data and discussed outstanding harmonization issues. Conclusions: We consider that the outcomes will be useful for defining new requirements for the DCM modeling community and, ultimately, for facilitating semantic interoperability between systems for both the clinical study and broad secondary use domains.
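
A hypothetical sketch of the kind of element-level crosswalk such a panel review can produce is shown below; the CEM-style element names and the CDISC pairings are illustrative and are not the study's published results.

```python
# Hypothetical crosswalk between CEM-style elements and CDISC SDTM-style
# variables for the three domains discussed; entries are illustrative only.
from typing import Optional

HARMONIZATION = {
    # (domain, CEM-style element)          : CDISC SDTM-style variable
    ("demographics", "DateOfBirth")        : "DM.BRTHDTC",
    ("demographics", "AdministrativeSex")  : "DM.SEX",
    ("lab", "LabObservationCode")          : "LB.LBTESTCD",
    ("lab", "LabObservationValue")         : "LB.LBORRES",
    ("medication", "MedicationCode")       : "CM.CMTRT",
}

def to_cdisc(domain: str, cem_element: str) -> Optional[str]:
    """Look up the CDISC variable aligned with a CEM element, if any."""
    return HARMONIZATION.get((domain, cem_element))

print(to_cdisc("lab", "LabObservationCode"))  # LB.LBTESTCD
```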


2020 ◽  
Vol 27 (9) ◽  
pp. 1437-1442 ◽  
Author(s):  
Xiao Dong ◽  
Jianfu Li ◽  
Ekin Soysal ◽  
Jiang Bian ◽  
Scott L DuVall ◽  
...  

Abstract Large observational data networks that leverage routine clinical practice data in electronic health records (EHRs) are critical resources for research on coronavirus disease 2019 (COVID-19). Data normalization is a key challenge for the secondary use of EHRs for COVID-19 research across institutions. In this study, we addressed the challenge of automating the normalization of COVID-19 diagnostic tests, which are critical data elements, but for which controlled terminology terms were published after clinical implementation. We developed a simple but effective rule-based tool called COVID-19 TestNorm to automatically normalize local COVID-19 testing names to standard LOINC (Logical Observation Identifiers Names and Codes) codes. COVID-19 TestNorm was developed and evaluated using 568 test names collected from 8 healthcare systems. Our results show that it could achieve an accuracy of 97.4% on an independent test set. COVID-19 TestNorm is available as an open-source package for developers and as an online Web application for end users (https://clamp.uth.edu/covid/loinc.php). We believe that it will be a useful tool to support secondary use of EHRs for research on COVID-19.
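
The sketch below illustrates the general idea of rule-based test-name normalization; it is not the COVID-19 TestNorm code, and the keyword rules and example LOINC codes are assumptions that should be verified against the LOINC database before any use.

```python
# Toy rule-based normalizer (not COVID-19 TestNorm): map local COVID-19 test
# names to candidate LOINC codes via keyword rules. Rules and codes are
# illustrative assumptions only.
import re

RULES = [
    # (pattern over the local test name, candidate LOINC code)
    (re.compile(r"\b(pcr|naa|rna)\b", re.I),         "94500-6"),  # SARS-CoV-2 RNA (NAA)
    (re.compile(r"\b(antigen|ag)\b", re.I),          "94558-4"),  # SARS-CoV-2 antigen
    (re.compile(r"\b(antibody|ab|igg|igm)\b", re.I), "94762-2"),  # SARS-CoV-2 antibody
]

def normalize(test_name: str) -> str:
    """Return the first matching LOINC code, or 'UNMAPPED' if no rule fires."""
    for pattern, loinc in RULES:
        if pattern.search(test_name):
            return loinc
    return "UNMAPPED"

print(normalize("SARS-CoV-2 (COVID-19) RNA, qualitative PCR"))  # 94500-6
print(normalize("COVID-19 rapid antigen screen"))               # 94558-4
```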


2021 ◽  
Vol 653 ◽  
pp. A68 ◽  
Author(s):  
Mats G. Löfdahl ◽  
Tomas Hillberg ◽  
Jaime de la Cruz Rodríguez ◽  
Gregal Vissers ◽  
Oleksii Andriienko ◽  
...  

Context. Data from ground-based, high-resolution solar telescopes can only be used for science with calibrations and processing, which requires detailed knowledge about the instrumentation. Space-based solar telescopes provide science-ready data, which are easier to work with for researchers whose expertise is in the interpretation of data. Recently, data-processing pipelines for ground-based instruments have been constructed. Aims. We aim to provide observers with a user-friendly data pipeline for data from the Swedish 1-meter Solar Telescope (SST) that delivers science-ready data together with the metadata needed for proper interpretation and archiving. Methods. We briefly describe the CHROMospheric Imaging Spectrometer (CHROMIS) instrument, including its (pre)filters, as well as recent upgrades to the CRisp Imaging SpectroPolarimeter (CRISP) prefilters and polarization optics. We summarize the processing steps from raw data to science-ready data cubes in FITS files. We report calibrations and compensations for data imperfections in detail. Misalignment of Ca II data due to wavelength-dependent dispersion is identified, characterized, and compensated for. We describe intensity calibrations that remove or reduce the effects of filter transmission profiles as well as solar elevation changes. We present REDUX, a new version of the MOMFBD image restoration code, with multiple enhancements and new features. It uses projective transforms for the registration of multiple detectors. We describe how image restoration is used with CRISP and CHROMIS data. The science-ready output is delivered in FITS files, with metadata compliant with the SOLARNET recommendations. Data cube coordinates are specified within the World Coordinate System (WCS). Cavity errors are specified as distortions of the WCS wavelength coordinate with an extension of existing WCS notation. We establish notation for specifying the reference system for Stokes vectors with reference to WCS coordinate directions. The CRIsp SPectral EXplorer (CRISPEX) data-cube browser has been extended to accept SSTRED output and to take advantage of the SOLARNET metadata. Results. SSTRED is a mature data-processing pipeline for imaging instruments, developed and used for the SST/CHROMIS imaging spectrometer and the SST/CRISP spectropolarimeter. SSTRED delivers well-characterized, science-ready, archival-quality FITS files with well-defined metadata. The SSTRED code, as well as REDUX and CRISPEX, is freely available through git repositories.
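
Assuming astropy is installed and a SOLARNET-style cube is available at a hypothetical local path, a user might inspect the science-ready output and its WCS coordinates roughly as follows; this is a usage sketch, not part of the SSTRED pipeline itself.

```python
# Usage sketch (assumptions: astropy installed, "crisp_cube.fits" is a
# hypothetical science-ready cube): inspect the data and its WCS metadata.
from astropy.io import fits
from astropy.wcs import WCS

with fits.open("crisp_cube.fits") as hdul:
    hdul.info()                          # list HDUs: data cube plus metadata extensions
    header = hdul[0].header
    data = hdul[0].data
    print(data.shape)                    # e.g. (time, Stokes, wavelength, y, x)
    print(header.get("INSTRUME"), header.get("DATE-AVG"))

wcs = WCS(header)                        # spatial/spectral/temporal coordinates
print(wcs)                               # axis types, reference values, units
```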


2018 ◽  
Author(s):  
Zhou Yuan ◽  
Sean Finan ◽  
Jeremy Warner ◽  
Guergana Savova ◽  
Harry Hochheiser

Abstract Retrospective cancer research requires identification of patients matching both categorical and temporal inclusion criteria, often based on factors available exclusively in clinical notes. Although natural language processing approaches for inferring higher-level concepts have shown promise for bringing structure to clinical texts, interpreting the results is often challenging and involves moving between abstracted representations and the constituent text elements. We discuss a qualitative inquiry into user tasks and goals, data elements, and models that resulted in an innovative natural language processing pipeline and a visual analytics tool designed to facilitate interpretation of patient summaries and identification of cohorts for retrospective research.
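
As a toy illustration of the downstream use of such a pipeline, the sketch below applies a combined categorical and temporal inclusion criterion to hypothetical per-patient concept mentions; the concepts, dates, and criterion are invented for the example.

```python
# Illustrative sketch only: once an NLP pipeline has produced per-patient
# concept mentions with dates, cohort selection reduces to categorical plus
# temporal filters. Concepts, dates, and the criterion are hypothetical.
from datetime import date

mentions = [
    {"patient": "P1", "concept": "metastasis",   "date": date(2017, 3, 2)},
    {"patient": "P1", "concept": "chemotherapy", "date": date(2017, 5, 10)},
    {"patient": "P2", "concept": "chemotherapy", "date": date(2016, 1, 4)},
]

def in_cohort(patient: str) -> bool:
    """Qualify if chemotherapy is documented after a metastasis mention."""
    pm = [m for m in mentions if m["patient"] == patient]
    metastasis = [m["date"] for m in pm if m["concept"] == "metastasis"]
    chemo = [m["date"] for m in pm if m["concept"] == "chemotherapy"]
    return any(c > m for m in metastasis for c in chemo)

print([p for p in ("P1", "P2") if in_cohort(p)])  # ['P1']
```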


2021 ◽  
Author(s):  
Rajaram Kaliyaperumal ◽  
Mark D Wilkinson ◽  
Pablo Alarcon Moreno ◽  
Nirupama Benis ◽  
Ronald Cornet ◽  
...  

Background: The European Platform on Rare Disease Registration (EU RD Platform) aims to address the fragmentation of European rare disease (RD) patient data, scattered among hundreds of independent and non-coordinating registries, by establishing standards for integration and interoperability. The first practical output of this effort was a set of 16 Common Data Elements (CDEs) that should be implemented by all RD registries. Interoperability, however, requires decisions beyond data elements, including data models, formats, and semantics. Within the European Joint Programme on Rare Disease (EJP RD), we aim to further the goals of the EU RD Platform by generating reusable RD semantic model templates that follow the FAIR Data Principles. Results: Through a team-based, iterative approach, we created semantically grounded models to represent each of the CDEs, using the SemanticScience Integrated Ontology as the core framework for representing the entities and their relationships. Within that framework, we mapped the concepts represented in the CDEs, and their possible values, to domain ontologies such as the Orphanet Rare Disease Ontology, the Human Phenotype Ontology, and the National Cancer Institute Thesaurus. Finally, we created an exemplar, reusable ETL pipeline that we will deploy over these non-coordinating data repositories to assist them in creating model-compliant FAIR data without requiring site-specific coding or expertise in Linked Data or FAIR. Conclusions: Within the EJP RD project, we determined that creating reusable, expert-designed templates reduced or eliminated the need for our participating biomedical domain experts and rare disease data hosts to understand OWL semantics. This enabled them to publish highly expressive FAIR data using tools and approaches that were already familiar to them.
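
A minimal rdflib sketch of this modelling pattern is given below; it is not one of the EJP RD templates, and the specific SIO and ORDO identifiers are placeholders that would need to be replaced with the classes chosen in the published models.

```python
# Sketch of the modelling pattern only; SIO/ORDO identifiers below are
# placeholders, not the published EJP RD template classes.
from rdflib import Graph, Namespace, Literal, RDF, XSD

SIO  = Namespace("http://semanticscience.org/resource/")
ORDO = Namespace("http://www.orpha.net/ORDO/")
EX   = Namespace("https://example.org/registry/")

g = Graph()
g.bind("sio", SIO)
g.bind("ordo", ORDO)

patient   = EX["patient/123"]
diagnosis = EX["patient/123/diagnosis/1"]

g.add((patient, RDF.type, SIO["SIO_000498"]))         # placeholder SIO class (e.g. person)
g.add((patient, SIO["SIO_000228"], diagnosis))        # placeholder SIO relation
g.add((diagnosis, RDF.type, ORDO["Orphanet_98896"]))  # placeholder rare-disease class
g.add((diagnosis, SIO["SIO_000300"],                  # placeholder 'has value' relation
       Literal("2020-05-01", datatype=XSD.date)))

print(g.serialize(format="turtle"))
```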


2016 ◽  
Vol 39 (1) ◽  
pp. 42-62 ◽  
Author(s):  
Chih-Lin Chi ◽  
Jin Wang ◽  
Thomas R. Clancy ◽  
Jennifer G. Robinson ◽  
Peter J. Tonellato ◽  
...  

Health care Big Data studies hold substantial promise for improving clinical practice. Among analytic tools, machine learning (ML) is an important approach that has been widely used by many industries for data-driven decision support. In Big Data, thousands of variables and millions of patient records are commonly encountered, but most data elements cannot be directly used to support decision making. Although many feature-selection tools can help identify relevant data, these tools are typically insufficient to determine a patient data cohort to support learning. Therefore, domain experts with nursing or clinic knowledge play critical roles in determining value criteria or the type of variables that should be included in the patient cohort to maximize project success. We demonstrate this process by extracting a patient cohort (37,506 individuals) to support our ML work (i.e., the production of a proactive strategy to prevent statin adverse events) from 130 million de-identified lives in the OptumLabs™ Data Warehouse.
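
A hypothetical pandas sketch of how expert-defined value criteria become cohort filters is shown below; the column names, thresholds, and enrollment rule are invented for illustration and are not the criteria used in the study.

```python
# Hypothetical sketch: expert-defined value criteria expressed as cohort filters.
import pandas as pd

patients = pd.DataFrame({
    "patient_id":    [1, 2, 3],
    "age":           [54, 38, 67],
    "statin_rx":     [True, False, True],
    "ldl_mg_dl":     [161, 120, 190],
    "enrolled_days": [900, 200, 1400],
})

# Example criteria set with domain experts: adults on a statin, elevated LDL,
# and at least two years of continuous enrollment for outcome follow-up.
cohort = patients[
    (patients["age"] >= 40)
    & patients["statin_rx"]
    & (patients["ldl_mg_dl"] >= 130)
    & (patients["enrolled_days"] >= 730)
]
print(cohort["patient_id"].tolist())  # [1, 3]
```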


2001 ◽  
Vol 10 (04) ◽  
pp. 483-507 ◽  
Author(s):  
GOCE TRAJCEVSKI ◽  
CHITTA BARAL ◽  
JORGE LOBO

This work addresses the problem of workflow requirements specification under the realistic assumptions that it involves experts from different domains (i.e., representatives of different business policies) and that not all possible execution scenarios are known beforehand, during the early stage of specification. In particular, since the main purpose of a workflow is to achieve a certain (business) goal, we propose a formalism which enables the users to specify their requirements (and expectations) and to test whether the information they have provided is, in a sense, sufficient for the workflow to behave "as desired" in terms of the goal. Our methodology allows domain experts to express not only their knowledge but also their "ignorance" (the semantics allows for unknown values, reflecting the realistic situation of agents dealing with incomplete information) and the possibility of exceptional situations occurring. As a basis for formalizing the process of requirements specification, we use recent results on reasoning about actions. We propose a high-level language AW which enables specifying the effects that activities have on the environment and how they should be coordinated. We also describe our prototype tool for process specification. Strictly speaking, in this work we go "one step" before actual analysis and design, and offer a formalism which enables the involved partners to see whether the extent to which they have expressed their domain knowledge (which may sometimes be subject to proprietary restrictions) can satisfy the intended needs and behaviour of their product-to-be. We define an entailment relation which enables reasoning about the correctness of the specification in terms of achieving a desired goal, and also testing the consequences of modifications to the workflow descriptions.
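
As a toy sketch of the underlying idea (not the AW language itself), the snippet below declares actions with add/remove effects on a propositional state and checks whether a candidate execution scenario achieves a stated business goal; the actions, effects, and goal are invented for the example.

```python
# Toy sketch of goal-directed specification checking (not AW): actions declare
# effects on a propositional state; we test whether an execution scenario
# achieves the business goal. All names and effects are hypothetical.
ACTIONS = {
    # action         : (propositions added, propositions removed)
    "receive_order"  : ({"order_received"}, set()),
    "check_credit"   : ({"credit_ok"}, set()),
    "ship_goods"     : ({"shipped"}, {"order_received"}),
}

def execute(plan, state=frozenset()):
    """Apply each action's effects in sequence and return the final state."""
    state = set(state)
    for action in plan:
        adds, removes = ACTIONS[action]
        state = (state - removes) | adds
    return state

GOAL = {"credit_ok", "shipped"}
plan = ["receive_order", "check_credit", "ship_goods"]
print(GOAL <= execute(plan))  # True: this execution scenario achieves the goal
```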

