Applying a Metadata Framework to Improve Data Quality

Author(s):  
Victoria Youngohc Yoon, Peter Aiken, Tor Guimaraes

The importance of a company-wide framework for managing data resources has been recognized (Gunter, 2001; Lee, 2003, 2004; Madnick, Wang & Xian, 2003, 2004; Sawhney, 2001; Shankaranarayan, Ziad & Wang, 2003). It is considered a major component of information resources management (Guimaraes, 1988). Many organizations are discovering that imperfect data in information systems negatively affect their business operations and can be extremely costly (Brown, 2001; Keizer, 2004). The expanded data life cycle model proposed here identifies links between cycle phases and data quality engineering dimensions. Expanding the data life cycle model and the dimensions of data quality enables organizations to implement both inter- and intra-system use of their data resources more effectively, and to better coordinate the development and application of their data quality engineering methods.
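The abstract does not enumerate the cycle phases or quality dimensions it links, so the following Python sketch uses commonly cited placeholders (acquisition, storage, use, archival; accuracy, completeness, timeliness, and so on) purely to illustrate how phase-to-dimension links might be represented; none of these names come from the paper.

```python
# A minimal sketch of linking data life cycle phases to data quality
# engineering dimensions. The phase and dimension names are illustrative
# placeholders, not those of Yoon, Aiken & Guimaraes.

LIFE_CYCLE_QUALITY_LINKS = {
    "acquisition": ["accuracy", "completeness"],     # capture errors enter here
    "storage":     ["consistency"],                  # schema and constraint checks
    "use":         ["timeliness", "relevance"],      # fitness for the task at hand
    "archival":    ["completeness", "traceability"], # long-term recoverability
}

def dimensions_for(phase: str) -> list[str]:
    """Return the quality dimensions to engineer for a given phase."""
    return LIFE_CYCLE_QUALITY_LINKS.get(phase, [])

if __name__ == "__main__":
    for phase, dims in LIFE_CYCLE_QUALITY_LINKS.items():
        print(f"{phase}: {', '.join(dims)}")
```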

2018, Vol 10 (1)
Author(s):  
Alexslis Maindze

Data forms the foundation on which knowledge is created, captured, used and shared. The lack of an approach consistent with technological changes and needs can lead to loss of knowledge and increased costs. Integrated Vehicle Health Management (IVHM) is characterized by prognostics and diagnostics, which depend heavily on high-quality data to perform data-driven, model-based and hybrid computational analysis of asset health. As a result, managing data and knowledge for IVHM requires a data life cycle model that adopts the OSA-CBM (Open System Architecture for Condition-Based Maintenance) data model and integrates with other approaches. This project will propose such a model and use it to support the development of an IVHM knowledge management system.
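OSA-CBM, a MIMOSA standard, organizes condition-based maintenance processing into six functional blocks: data acquisition, data manipulation, state detection, health assessment, prognostics assessment, and advisory generation. A minimal Python sketch of that layered flow follows; the block names follow the standard, but the pass-through processing inside each stage is a placeholder, not part of the proposed model.

```python
# A minimal sketch of the six OSA-CBM functional blocks as a linear
# pipeline. Block names follow the MIMOSA OSA-CBM standard; each stage
# body is a stand-in, not a real implementation.

from typing import Any, Callable

def data_acquisition(raw: Any) -> Any:
    """DA: collect sensor output (here, passed through unchanged)."""
    return raw

def data_manipulation(signal: Any) -> Any:
    """DM: filter and condition the raw signal (placeholder)."""
    return signal

def state_detection(features: Any) -> Any:
    """SD: compare features against expected operating states."""
    return features

def health_assessment(state: Any) -> Any:
    """HA: diagnose the current health of the asset."""
    return state

def prognostics_assessment(health: Any) -> Any:
    """PA: project future health / remaining useful life."""
    return health

def advisory_generation(prognosis: Any) -> Any:
    """AG: recommend maintenance actions."""
    return prognosis

OSA_CBM_PIPELINE: list[Callable[[Any], Any]] = [
    data_acquisition, data_manipulation, state_detection,
    health_assessment, prognostics_assessment, advisory_generation,
]

def run(raw_sensor_data: Any) -> Any:
    """Push one reading through all six blocks in order."""
    result = raw_sensor_data
    for stage in OSA_CBM_PIPELINE:
        result = stage(result)
    return result
```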


2019, Vol 11 (1)
Author(s):  
Alexslis Maindze, Zakwan Skaf, Ian Jennions

The creation, capture, use and sharing of knowledge is based on data. The rate of data creation, collection, and elicitation through a wide range of experiments, simulations and measurements is rapidly increasing within Integrated Vehicle Health Management (IVHM). In addition, Knowledge Management (KM), data abstraction, analysis, storage and accessibility challenges persist, resulting in loss of knowledge and increased costs. This growth in the creation of research data, algorithms, technical papers, reports and logs requires both a strategy and a tool to address these challenges. A Data Life Cycle Model (DLCM) ensures the efficient and effective abstraction and management of both data and knowledge outputs. IVHM depends heavily on high-quality data to perform data-driven, model-based and hybrid computational analysis of asset health. The IVHM Centre does not yet have a systematic and coherent approach to its data management. The absence of a DLCM means that valuable knowledge may be lost or be difficult to find. Data visualization is fragmented and done on a project-by-project basis, leading to increased costs. Algorithm documentation and communication are insufficient for an easy transition between subsequent researchers and personnel. A systematic review of DLCMs, frameworks, standards and process models pertaining to data and knowledge management in the context of IVHM found that no existing DLCM is consistent with IVHM data and knowledge management requirements. Specifically, there is a need to develop a DLCM based on the Open System Architecture for Condition-Based Maintenance (OSA-CBM) framework.
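As an illustration of the kind of gap analysis such a systematic review implies, the following sketch checks candidate life cycle models against a set of IVHM requirements. Both the requirement names and the coverage claims are assumptions made for the example, not findings reported by the authors.

```python
# A minimal sketch of a requirements-coverage gap analysis over candidate
# data life cycle models. Requirement names and coverage sets are
# illustrative assumptions, not the review's actual criteria or results.

IVHM_REQUIREMENTS = {
    "osa_cbm_alignment",    # maps onto OSA-CBM processing blocks
    "knowledge_capture",    # retains algorithms, reports and logs
    "cross_project_reuse",  # avoids project-by-project fragmentation
    "provenance",           # documents who produced what, and how
}

CANDIDATE_MODELS = {
    "DCC":     {"knowledge_capture", "provenance"},
    "DataONE": {"cross_project_reuse", "provenance"},
}

def gaps(model: str) -> set[str]:
    """Requirements a candidate model leaves uncovered."""
    return IVHM_REQUIREMENTS - CANDIDATE_MODELS.get(model, set())

if __name__ == "__main__":
    # Under these assumed coverage sets, no candidate satisfies
    # osa_cbm_alignment, mirroring the review's stated conclusion.
    for name in CANDIDATE_MODELS:
        print(name, "misses:", sorted(gaps(name)))
```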


2019, Vol 15 (2)
Author(s):  
Débora Gomes de Araújo, Marco Antonio Almeida Llarena, Sandra de Albuquerque Siebra, Guilherme Ataíde Dias

The objective was to analyze the intersections among the data life cycle model elements of the DCC, DataONE and CVD-CI initiatives. This is a descriptive, qualitative and bibliographical study. The analysis verified correspondences between stages (not always one-to-one) across the examined data life cycles. It also showed that the CVD-CI condenses several activities into a single stage, which can hinder its applicability. In general, the proposed models still need further detail before they can be applied directly by researchers and curators. Keywords: Data Life Cycle; Digital Curation; Scientific Data; Information Technology.
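A rough sketch of the cross-model stage correspondences the paper analyzes appears below. The stage names are paraphrased from public descriptions of the DCC, DataONE, and CVD-CI models and may not match the paper's terminology; the specific alignments shown are illustrative assumptions, not the authors' results.

```python
# A minimal sketch of cross-model stage correspondences. Rows pair
# roughly equivalent stages; None marks a missing counterpart, which is
# how non-one-to-one mappings show up. All alignments are assumptions.

CORRESPONDENCES = [
    # (DataONE stage, DCC stage, CVD-CI stage)
    ("plan",     "conceptualise",         None),
    ("collect",  "create or receive",     "collection"),
    ("preserve", "store",                 "storage"),
    ("discover", "access, use and reuse", "retrieval"),
    (None,       "dispose",               "disposal"),
]

def coverage(column: int) -> float:
    """Fraction of listed correspondence rows in which a model participates."""
    mapped = sum(1 for row in CORRESPONDENCES if row[column] is not None)
    return mapped / len(CORRESPONDENCES)

if __name__ == "__main__":
    for idx, name in enumerate(["DataONE", "DCC", "CVD-CI"]):
        print(f"{name} participates in {coverage(idx):.0%} of rows")
```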


2016, Vol 10 (2), pp. 176-192
Author(s):  
Line Pouchard

As science becomes more data-intensive and collaborative, researchers increasingly use larger and more complex data to answer research questions. The capacity of storage infrastructure, the increased sophistication and deployment of sensors, the ubiquitous availability of computer clusters, the development of new analysis techniques, and larger collaborations allow researchers to address grand societal challenges in a way that is unprecedented. In parallel, research data repositories have been built to host research data in response to requirements from sponsors that research data be publicly available. Libraries are re-inventing themselves to respond to a growing demand to manage, store, curate and preserve the data produced in the course of publicly funded research. As librarians and data managers develop the tools and knowledge they need to meet these new expectations, they inevitably encounter conversations around Big Data. This paper explores definitions of Big Data that have coalesced in the last decade around four commonly mentioned characteristics: volume, variety, velocity, and veracity. We highlight the issues associated with each characteristic, particularly their impact on data management and curation. Using the methodological framework of the data life cycle model, we assess two models developed in the context of Big Data projects and find them lacking. We propose a Big Data life cycle model that includes activities focused on Big Data and more closely integrates curation with the research life cycle. These activities include planning, acquiring, preparing, analyzing, preserving, and discovering, with describing the data and assuring quality being an integral part of each activity. We discuss the relationship between institutional data curation repositories and new long-term data resources associated with high performance computing centers, and reproducibility in computational science. We apply this model by mapping the four characteristics of Big Data outlined above to each of the activities in the model. This mapping produces a set of questions that practitioners should be asking in a Big Data project.
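The six activities and the cross-cutting roles of description and quality assurance are stated in the abstract; the sketch below combines them with the four Vs to generate practitioner questions, echoing the mapping the paper describes. The question templates themselves are invented for illustration.

```python
# A minimal sketch of the proposed Big Data life cycle model: six
# activities, two cross-cutting concerns, crossed with the four Vs to
# yield practitioner questions. Activity names come from the abstract;
# the sample question wording is an assumption.

ACTIVITIES = ["plan", "acquire", "prepare", "analyze", "preserve", "discover"]
CROSS_CUTTING = ["describe", "assure quality"]  # integral to every activity
BIG_DATA_VS = ["volume", "variety", "velocity", "veracity"]

# Hypothetical question templates, one per characteristic.
QUESTION_TEMPLATES = {
    "volume":   "Can the {activity} step scale to the expected data size?",
    "variety":  "Does the {activity} step handle all expected data formats?",
    "velocity": "Can the {activity} step keep up with the data arrival rate?",
    "veracity": "How does the {activity} step detect untrustworthy data?",
}

def questions_for(activity: str) -> list[str]:
    """Cross one activity with all four Vs to produce its question set."""
    return [QUESTION_TEMPLATES[v].format(activity=activity) for v in BIG_DATA_VS]

if __name__ == "__main__":
    for activity in ACTIVITIES:
        print(activity.upper())
        for question in questions_for(activity):
            print("  -", question)
```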


2020 ◽  
Author(s):  
Oleg Malafeyev, Irina Zaitseva, Sergey Sychev, Gennady Badin, Ilya Pavlov, ...
