Data Lifecycle
Recently Published Documents


TOTAL DOCUMENTS: 114 (FIVE YEARS: 62)

H-INDEX: 7 (FIVE YEARS: 3)

2021 ◽  
Vol 13 (24) ◽  
pp. 13827
Author(s):  
Seungjin Baek ◽  
Young-Gab Kim

Although defense is one of the key areas that use big data, and for security-critical purposes, there is a lack of studies that design system frameworks and present security requirements for implementing big data in defense. We address these security concerns by examining the battlefield environment and the system through the flow of data on the battlefield. This research was therefore conducted to apply big data in the defense domain, a unique field. In particular, a three-layered system framework was designed to apply big data in the C4I system, which collects, manages, and analyzes data generated from the battlefield, and the security measures required for each layer were developed. First, to enhance the general understanding of big data and the military environment, we give an overview of the C4I system, the characteristics of the 6Vs of big data, and the five-phase big data lifecycle. We then present a framework that divides the C4I system into three layers and describe the roles and components of each layer in detail, considering the big data lifecycle and the system framework. Finally, a security architecture is proposed by specifying security requirements for each layer of the three-layered C4I system. The proposed system framework and security architecture capture the unique nature of the military domain more accurately than those studied in healthcare, smart grids, and smart cities; development directions requiring further research are also described.
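As a rough, hypothetical sketch of how such a layered mapping could be expressed, the example below assigns lifecycle phases and security requirements to three layers. The layer names, phase names, and requirements are illustrative inventions, since the abstract does not enumerate them.

```python
# Hypothetical sketch of a three-layer framework mapping big data
# lifecycle phases to per-layer security requirements. Layer names,
# phase names, and requirements are illustrative inventions; the
# abstract does not enumerate them.
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    phases: list[str]                 # lifecycle phases this layer covers
    security_requirements: list[str]  # measures required at this layer

framework = [
    Layer("collection layer", ["collection"],
          ["device authentication", "link encryption"]),
    Layer("management layer", ["storage", "processing"],
          ["access control", "encryption at rest", "integrity checks"]),
    Layer("analysis layer", ["analysis", "utilization"],
          ["audit logging", "role-based query control"]),
]

for layer in framework:
    print(f"{layer.name}: phases={layer.phases}, "
          f"requirements={layer.security_requirements}")
```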


Author(s):  
Stephan Graf ◽  
Olaf Mextorf

JUST is a versatile storage infrastructure operated by the Jülich Supercomputing Centre at Forschungszentrum Jülich. The system provides high-performance and high-capacity storage resources for the supercomputer facility. Recently, storage and management services addressing demands beyond the high-performance computing area have been added. In support of its mission, JUST consists of multiple storage tiers with different performance and functional characteristics to cover the entire data lifecycle.
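A tiered layout of this kind implies a placement policy. The sketch below is a hypothetical illustration only; the tier names and age thresholds are invented, not JUST's actual configuration.

```python
# Hypothetical illustration of tiered data placement across a lifecycle;
# the tier names and the age thresholds are invented, not JUST's actual
# configuration.
def select_tier(days_since_last_access: int) -> str:
    """Place data on a tier according to how recently it was used."""
    if days_since_last_access < 7:
        return "high-performance tier"   # active computation
    if days_since_last_access < 90:
        return "capacity tier"           # working data sets
    return "archive tier"                # long-term retention

print(select_tier(3), select_tier(30), select_tier(400))
```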


2021 ◽  
Vol 109 (3) ◽  
Author(s):  
Soojung Kim ◽  
Sue Yeon Syn

Objective: This study investigates research data management (RDM) services using a crosstab framework, with the National Institutes of Health (NIH) Library as a case study, to provide practical considerations for libraries seeking to improve their RDM services.

Methods: We conducted semistructured interviews with four librarians who provide data services at the NIH Library, covering library user characteristics, the RDM services provided, RDM infrastructure, and collaboration experiences. Through analysis of the interview transcripts, we identified and analyzed the NIH Library's RDM services according to the three categories of RDM services defined by OCLC (Online Computer Library Center) and the six stages of the data lifecycle.

Results: The findings show that the two models' crosstab framework can provide an overview of an institution's current RDM services and identify service gaps. The NIH Library tends to take more responsibility for education and expertise services while relying more on information technology departments for curation services. The library provides significant support for the data creation, analysis, and sharing stages to meet biomedical researchers' needs, suggesting potential for expanding RDM services in the less-supported stages of data description, storage, and preservation. Based on these findings, we recommend three key considerations for libraries: identify gaps in current services, identify services that can be supported via partnerships, and get regular feedback from users.

Conclusion: These findings provide a deeper understanding of RDM support on the basis of RDM service categories and the data lifecycle and promote discussion of issues to consider for future improvements in RDM services.
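The crosstab itself is straightforward to operationalize. The sketch below builds the 3 × 6 matrix from the category and stage names given in the abstract and reports empty cells as candidate service gaps; the sample services filled in are invented placeholders.

```python
# Sketch of the 3 x 6 crosstab: OCLC's three RDM service categories
# crossed with six data lifecycle stages (names taken from the abstract).
# The sample services filled in are invented placeholders.
CATEGORIES = ["education", "expertise", "curation"]
STAGES = ["creation", "description", "analysis",
          "storage", "sharing", "preservation"]

# services[(category, stage)] -> services offered in that cell
services = {
    ("education", "creation"): ["data management planning workshops"],
    ("expertise", "analysis"): ["statistical consulting"],
    ("expertise", "sharing"): ["repository selection advice"],
}

# Any empty cell in the crosstab is a candidate service gap.
gaps = [(c, s) for c in CATEGORIES for s in STAGES if (c, s) not in services]
print(f"{len(gaps)} unfilled cells, e.g. {gaps[:3]}")
```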


Author(s):  
Ryan McGranaghan ◽  
Enrico Camporeale ◽  
Manolis Georgoulis ◽  
Anastasios Anastasiadis

The onset and rapid advance of the Digital Age have brought challenges and opportunities for scientific research, characterized by a continuously evolving data landscape reflected in the four V's of big data: volume, variety, veracity, and velocity. The big data landscape exceeds the capabilities of traditional means of storage, processing, management, and exploration, and requires adaptation and innovation across the full data lifecycle (i.e., collection, storage and processing, analytics, and representation). The Topical Issue "Space Weather research in the Digital Age and across the full data lifecycle" collects research from across the full data lifecycle (collection, management, analysis, and communication; collectively "Data Science") and offers a tractable compendium that illustrates the latest computational and data science trends, tools, and advances for Space Weather research. We introduce the paradigm shift in Space Weather and the articles in the Topical Issue, and we create a network view of the research that highlights each contribution to the paradigm shift and reveals the trends that will guide it hereafter.
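The editorial does not specify how its network view is constructed; one plausible sketch, with invented article names and stage tags, links articles that address the same lifecycle stage.

```python
# One plausible construction of a "network view": articles become nodes,
# and two articles are linked when they address the same lifecycle stage.
# Article names and stage tags are invented for illustration.
from itertools import combinations
import networkx as nx

articles = {
    "article_A": {"collection", "management"},
    "article_B": {"management", "analysis"},
    "article_C": {"analysis", "communication"},
}

G = nx.Graph()
G.add_nodes_from(articles)
for a, b in combinations(articles, 2):
    shared = articles[a] & articles[b]
    if shared:
        G.add_edge(a, b, stages=sorted(shared))

# Central articles are those bridging several lifecycle stages.
print(nx.degree_centrality(G))
```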


2021 ◽  
Vol 10 (3) ◽  
Author(s):  
Inna Kouper ◽  
Karen L. Tucker ◽  
Kevin Tharp ◽  
Mary Ellen van Booven ◽  
Ashley Clark

In this paper we take an in-depth look at the curation of a large longitudinal survey and activities and procedures involved in moving the data from its generation to the state that is needed to conduct scientific analysis. Using a case study approach, we describe how large surveys generate a range of data assets that require many decisions well before the data is considered for analysis and publication. We use the notion of active curation to describe activities and decisions about the data objects that are “live,” i.e., when they are still being collected and processed for the later stages of the data lifecycle. Our efforts illustrate a gap in the existing discussions on curation. On one hand, there is an acknowledged need for active or upstream curation as an engagement of curators close to the point of data creation. On the other hand, the recommendations on how to do that are scattered across multiple domain-oriented data efforts. In describing the complexities of active curation of survey data and providing general recommendations we aim to draw attention to the practices of active curation, stimulate the development of interoperable tools, standards, and techniques needed at the initial stages of research projects, and encourage collaborations between libraries and other academic units.


2021 ◽  
Vol 16 (1) ◽  
pp. 36
Author(s):  
Jukka Rantasaari

Sound research data management (RDM) competencies are essential tools for researchers to ensure integrated, reliable, and re-usable data and to produce high-quality research results. In this study, 35 doctoral students and faculty members were asked to rate (or, in the students' case, self-rate) doctoral students' current RDM competencies and to rate the importance of these competencies. Structured interviews were conducted, using close-ended and open-ended questions, covering research data lifecycle phases such as collection, storing, organization, documentation, processing, analysis, preservation, and data sharing. The quantitative analysis of the respondents' answers indicated a wide gap between doctoral students' rated or self-rated current competencies and the rated importance of these competencies. Two major educational needs were identified in the qualitative analysis of the interviews: to improve and standardize data management planning, including awareness of the intellectual property and agreement issues affecting data processing and sharing; and to improve and standardize data documentation and description, not only for the researchers themselves but especially for data preservation, sharing, and re-use. The study thereby informs the development of RDM education for doctoral students.
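The competence-importance gap the study measures can be illustrated with a minimal computation. The phase names below follow the abstract, while the rating values are invented placeholders, not the study's data.

```python
# Minimal gap computation: mean rated importance minus mean rated current
# competence, per lifecycle phase. Phase names follow the abstract; the
# rating values are invented placeholders, not the study's data.
PHASES = ["collection", "storing", "organization", "documentation",
          "processing", "analysis", "preservation", "sharing"]

current = {"collection": 3.4, "storing": 3.1, "organization": 2.8,
           "documentation": 2.2, "processing": 3.0, "analysis": 3.3,
           "preservation": 2.0, "sharing": 2.1}
importance = {p: 4.5 for p in PHASES}  # placeholder: all phases rated high

gaps = {p: importance[p] - current[p] for p in PHASES}
for phase, gap in sorted(gaps.items(), key=lambda kv: -kv[1]):
    print(f"{phase:13s} gap = {gap:.1f}")
```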


2021 ◽  
Author(s):  
Ashmita Kumar

The Neuroimaging Data Model (NIDM) was started by an international team of cognitive scientists, computer scientists, and statisticians to develop a data format capable of describing all aspects of the data lifecycle, from raw data through analyses and provenance. NIDM was built on top of the PROV standard and consists of three main interconnected specifications: Experiment, Results, and Workflow. These specifications were envisioned to capture information on all aspects of the neuroimaging data lifecycle, using semantic web techniques. They provide a critical capability to aid in reproducibility and replication of studies, as well as data discovery in shared resources. The NIDM-Experiment component has been used to describe publicly available human neuroimaging datasets (e.g., ABIDE, ADHD200, CoRR, and OpenNeuro) along with unambiguous descriptions of the clinical, neuropsychological, and imaging data collected as part of those studies, resulting in approximately 4.5 million statements about aspects of these datasets.

PyNIDM, a toolbox written in Python, supports the creation, manipulation, and querying of NIDM documents. It is an open-source project hosted on GitHub and distributed under the Apache License, Version 2.0. PyNIDM is under active development and testing. Tools have been created to support RESTful SPARQL queries of NIDM documents for users who want to identify interesting cohorts across datasets when evaluating scientific hypotheses or replicating results found in the literature. This query functionality, together with the NIDM document semantics, provides a path for investigators to interrogate datasets, understand what data was collected in those studies, and obtain sufficiently annotated data dictionaries of the collected variables to facilitate transforming and combining data across studies.

Beyond querying across NIDM documents, high-level statistical analysis tools are needed to give investigators more insight into data they may want to combine for a complete scientific investigation. Here we report on one such tool providing linear modeling support for NIDM documents: nidm_linreg.
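Because NIDM documents are RDF built on the PROV standard, they can be queried with generic RDF tooling as well as with PyNIDM. The sketch below uses rdflib rather than PyNIDM's own interfaces; the file name and the query are illustrative only.

```python
# Hedged sketch: because NIDM documents are RDF built on the PROV
# standard, they can be queried with generic RDF tooling. This uses
# rdflib rather than PyNIDM's own interfaces; the file name and query
# are illustrative only.
from rdflib import Graph

g = Graph()
g.parse("study_nidm.ttl", format="turtle")  # hypothetical NIDM-Experiment file

# List activities recorded in the document (PROV vocabulary).
results = g.query("""
    PREFIX prov: <http://www.w3.org/ns/prov#>
    SELECT ?activity WHERE { ?activity a prov:Activity . }
""")
for (activity,) in results:
    print(activity)
```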


2021 ◽  
Vol 9 ◽  
Author(s):  
Yin Jin ◽  
Wang Junren ◽  
Jiang Jingwen ◽  
Sun Yajing ◽  
Chen Xi ◽  
...  

Drawing on the Biomedical Big Data Center of West China Hospital, this paper presents an in-depth study of the construction and application of a breast cancer-specific database system based on the full data lifecycle, covering the establishment of data standards, data fusion and governance, a multi-modal knowledge graph, and the secure sharing and value-driven application of the database. The research proceeded by establishing breast cancer master data and metadata standards; collecting, mapping, and governing structured and unstructured clinical data; parsing and processing electronic medical records with natural language processing (NLP) or other applicable methods; and constructing the breast cancer-specific database system to support the use of data in clinical practice, scientific research, and teaching, thereby realizing the full value of the center's medical big data.
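As a toy illustration of the unstructured-to-structured step described above, a rule-based extraction might look like the following; the regex, field names, and receptor example are stand-ins for the hospital's actual NLP pipeline and data standards, which the abstract does not detail.

```python
# Toy illustration of turning free-text clinical notes into structured
# fields. The regex, field names, and receptor example are stand-ins for
# the hospital's actual NLP pipeline and data standards, which the
# abstract does not detail.
import re

RECEPTOR = re.compile(r"\b(ER|PR|HER2)\s*[:\-]?\s*(positive|negative)\b", re.I)

def parse_note(note: str) -> dict:
    """Map a free-text pathology note onto structured receptor fields."""
    return {r.upper(): s.lower() for r, s in RECEPTOR.findall(note)}

print(parse_note("Pathology: ER positive, PR negative, HER2: negative."))
# -> {'ER': 'positive', 'PR': 'negative', 'HER2': 'negative'}
```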

