Some Factors Impact on Deghosting by Over/Under Towed-streamer Acquisition - Synthetic Data Analysis

Author(s): J. Geng, P. Wang, M.Q. Zhang, F. Xu, S.L. Gao
eLife, 2021, Vol. 10
Author(s): Prathitha Kar, Sriram Tiruvadi-Krishnan, Jaana Männik, Jaan Männik, Ariel Amir

Collection of high-throughput data has become prevalent in biology. Large datasets allow the use of statistical constructs such as binning and linear regression to quantify relationships between variables and to hypothesize underlying biological mechanisms based on them. We discuss several such examples in relation to single-cell data and cellular growth. In particular, we show instances where seemingly ordinary use of these statistical methods leads to incorrect conclusions, such as inferring non-exponential growth when growth is exponential, and vice versa. We propose that data analysis and its interpretation should be done in the context of a generative model whenever possible. In this way, the statistical methods can be validated either analytically or against synthetic data generated from the model, leading to a consistent procedure for inferring biological mechanisms from data. Applying the validated methods to our experimental data, we find the growth of length in E. coli to be non-exponential. Our analysis shows that in the later stages of the cell cycle the growth rate is faster than exponential.
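As a hedged illustration of the proposed workflow (a minimal sketch, not the authors' analysis; the growth rate, noise levels, and sample sizes are assumed values), one can generate synthetic trajectories from a known exponential-growth generative model and check that a binning-plus-regression estimator recovers the true growth rate before trusting it on experiments:

```python
# A minimal sketch (assumed parameters, not the authors' pipeline): validate a
# binning-plus-regression estimator against synthetic data drawn from a known
# exponential-growth generative model.
import numpy as np

rng = np.random.default_rng(0)

# Generative model: L(t) = L0 * exp(g * t), observed with multiplicative noise.
n_cells, n_samples = 500, 20
g_true = 0.03                                   # assumed growth rate (1/min)
t = np.linspace(0.0, 30.0, n_samples)           # minutes within the cell cycle
L0 = rng.lognormal(np.log(2.0), 0.15, n_cells)  # birth lengths
L = L0[:, None] * np.exp(g_true * t[None, :])
L_obs = L * rng.lognormal(0.0, 0.02, L.shape)   # measurement noise

# Estimator under test: bin instantaneous growth dL/dt by length, then regress.
dLdt = np.gradient(L_obs, t, axis=1).ravel()
Lmid = L_obs.ravel()
edges = np.quantile(Lmid, np.linspace(0, 1, 11))
idx = np.digitize(Lmid, edges[1:-1])            # 10 equal-count bins
L_bin = np.array([Lmid[idx == k].mean() for k in range(10)])
r_bin = np.array([dLdt[idx == k].mean() for k in range(10)])

# Exponential growth implies dL/dt = g * L: the slope should recover g_true.
slope, intercept = np.polyfit(L_bin, r_bin, 1)
print(f"recovered g = {slope:.4f} (true {g_true}), intercept = {intercept:.5f}")
```

Under exponential growth dL/dt = gL, so the binned regression should return a slope near g_true with an intercept near zero; systematic deviations on real data then point to non-exponential growth rather than estimator bias.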


1971, Vol. 8 (1), pp. 71-77
Author(s): Paul E. Green, Vithala R. Rao

This article compares, via synthetic data analysis, the performance of five different methods for scaling averaged dissimilarities data under conditions involving individual differences in “perception.” All methods perform well when no “degradation” of the (simulated) ratings is entailed. When the data are transformed to zero-one values—a procedure sometimes followed in applied studies—all procedures perform poorly compared to the no-degradation case. Implications of these results for scaling applications involving group solutions are discussed.
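The five scaling methods compared in 1971 are not reproduced here; as a hedged modern stand-in (the stimulus count, judge noise, and median threshold are assumptions), sklearn's metric MDS shows the same qualitative effect of zero-one degradation on averaged dissimilarities:

```python
# A modern stand-in (not the paper's five algorithms): scale averaged
# dissimilarities with metric MDS, before and after zero-one "degradation".
import numpy as np
from sklearn.manifold import MDS
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(1)

# True 2-D configuration of 12 stimuli; 30 simulated judges perceive it with noise.
X_true = rng.uniform(-1, 1, size=(12, 2))
D = np.zeros((12, 12))
for _ in range(30):
    Xj = X_true + rng.normal(0, 0.1, size=X_true.shape)  # individual "perception"
    D += squareform(pdist(Xj))
D /= 30                                                  # averaged dissimilarities

def recovery(Dmat):
    # Embed, then correlate recovered inter-point distances with the true ones.
    X = MDS(n_components=2, dissimilarity="precomputed",
            random_state=0).fit_transform(Dmat)
    return np.corrcoef(pdist(X), pdist(X_true))[0, 1]

# Zero-one "degradation": threshold the averaged data at the median.
D01 = (D > np.median(D)).astype(float)
np.fill_diagonal(D01, 0.0)

print("recovery, averaged data:", round(recovery(D), 3))
print("recovery, zero-one data:", round(recovery(D01), 3))
```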


2013, Vol. 380-384, pp. 2876-2879
Author(s): Ming Li Song, Shu Juan Wang

Spatiotemporal data are widely visible in everyday life. This paper proposes an algorithm to represent them in a granular way: as information granules. Information granules can be regarded as a collection of conceptual landmarks with which people can view the data and describe them in a semantic way. The key objective of this paper is to introduce a new granular way of data analysis through granulation of the data. Several experiments on synthetic data show clearly how the algorithm performs.
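The paper's algorithm is not spelled out in this abstract, so the following is only a hedged sketch of a common granulation strategy (the cluster count and data layout are assumptions): cluster spatiotemporal records and describe each cluster by interval bounds, which then serve as the conceptual landmarks:

```python
# A hedged sketch of clustering-based granulation (cluster count and data are
# assumptions): each granule is summarized by interval bounds in space and time.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Synthetic spatiotemporal records: (x, y, t) triples around three hotspots.
centers = np.array([[0, 0, 10], [5, 5, 40], [9, 1, 80]], dtype=float)
pts = np.vstack([c + rng.normal(0, 0.8, size=(100, 3)) for c in centers])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pts)
for k in range(3):
    g = pts[km.labels_ == k]
    lo, hi = g.min(axis=0), g.max(axis=0)
    print(f"granule {k}: x in [{lo[0]:.1f}, {hi[0]:.1f}], "
          f"y in [{lo[1]:.1f}, {hi[1]:.1f}], t in [{lo[2]:.1f}, {hi[2]:.1f}]")
```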


2020
Author(s): Peiyuan Zhou, Andrew K.C. Wong

Background: Statistical data analysis, and especially advanced machine learning (ML) methods, has attracted considerable interest and application in clinical practice. First, interpretable diagnostic/prognostic results give doctors, patients, and their relatives confidence in therapeutics and clinical decisions. Furthermore, when datasets are imbalanced across diagnostic categories, ordinary ML methods may produce results dominated by the majority classes, diminishing prediction accuracy. Hence, it is desirable to have a method that produces explicit, transparent, and interpretable results for decision-making, even for data with imbalanced groups.

Methods: To interpret clinical patterns and predict patient diagnoses, we present a new method, Pattern Discovery and Disentanglement for Clinical Data Analysis (cPDD), which discovers patterns (correlated traits/indicants) and uses them to classify clinical data even when the class distribution is imbalanced. In the most general setting, a relational dataset is a large table in which each column represents an attribute (trait/indicant) and each row contains the attribute values (AVs) of one entity (patient). Compared with existing pattern discovery approaches, cPDD discovers a small, succinct set of statistically significant high-order patterns from clinical data for interpreting and predicting the disease class of patients, even for small and rare groups.

Results: Experiments on synthetic and thoracic clinical datasets showed that cPDD can (1) discover fewer patterns than existing pattern discovery methods; (2) allow users to interpret succinct sets of patterns coming from uncorrelated sources, even when the groups are rare/small; and (3) achieve better predictive performance than other interpretable classification approaches.

Conclusions: cPDD discovers fewer patterns with greater comprehensive coverage, improving the interpretability of the patterns discovered. Experimental results on synthetic data validated that cPDD discovers all patterns implanted in the data and displays them precisely and succinctly, with statistical support for interpretation and prediction, a capability that traditional ML methods lack. The success of cPDD as a novel explainable method for the imbalanced-class problem shows its great potential for clinical data analysis in the years to come.
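As a hedged illustration of one core ingredient (the full cPDD algorithm, including its disentanglement step, is not reproduced here; the toy table, attribute names, and threshold are assumptions), the sketch below tests whether an attribute value co-occurs with a rare class significantly more often than chance, using the adjusted residual of a contingency table:

```python
# A simplified sketch of one ingredient (not the full cPDD algorithm): test
# whether an attribute value (AV) co-occurs with a class significantly more
# often than chance, via the adjusted residual of a contingency table.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)

# Toy clinical table with one implanted pattern and a rare (5%) disease class.
n = 1000
disease = rng.random(n) < 0.05
marker = np.where(disease, 'high',
                  rng.choice(['high', 'low'], size=n, p=[0.2, 0.8]))
df = pd.DataFrame({'marker': marker, 'disease': disease})

def adjusted_residual(df, attr, value, cls):
    p_av, p_cls = (df[attr] == value).mean(), df[cls].mean()
    observed = ((df[attr] == value) & df[cls]).sum()
    expected = p_av * p_cls * len(df)
    variance = expected * (1 - p_av) * (1 - p_cls)
    return (observed - expected) / np.sqrt(variance)

# |residual| > 1.96 flags a statistically significant pattern at the 5% level.
print("adjusted residual:",
      round(adjusted_residual(df, 'marker', 'high', 'disease'), 2))
```

High-order patterns extend this test to conjunctions of several AVs; the 1.96 cutoff corresponds to a two-sided 5% significance level.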


Organizacija, 2013, Vol. 46 (3), pp. 87-97
Author(s): Mahdi Salehi, Ali Mohammadi, Parisa Taherzadeh Esfahani

The main objective of the current study is to examine the effect of the audit report on the cash flow-investment sensitivity of 123 companies listed on the Tehran Stock Exchange (TSE) during 2006-2010. Regression analysis and synthetic data were used for the analysis. The results show that receiving a modified report has a significant negative effect on cash flow-investment sensitivity. The findings also suggest a significant effect of receiving a qualified report, or an unqualified report with explanatory paragraphs, on cash flow-investment sensitivity.
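A hedged sketch of the kind of specification such a study typically estimates (the regression form, variable names, and coefficients are assumptions, not taken from the paper): investment is regressed on cash flow, an audit-report dummy, and their interaction, where a negative interaction coefficient indicates that a modified report reduces cash flow-investment sensitivity:

```python
# A hedged sketch (specification and coefficients are assumptions): OLS with an
# interaction term between cash flow and a modified-audit-report dummy.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)

n = 615                                    # roughly 123 firms x 5 years
cf = rng.normal(0.10, 0.05, n)             # cash flow (scaled by assets)
mod = rng.integers(0, 2, n)                # 1 = firm received a modified report
inv = (0.02 + 0.8 * cf - 0.4 * mod * cf    # synthetic data with a true
       + rng.normal(0, 0.02, n))           # negative interaction effect

df = pd.DataFrame({'inv': inv, 'cf': cf, 'mod': mod})
fit = smf.ols('inv ~ cf + mod + cf:mod', data=df).fit()
print(fit.params)                          # cf:mod should come out near -0.4
```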


2021, Vol. 12 (1)
Author(s): Aditya M. Limaye, Joy S. Zeng, Adam P. Willard, Karthish Manthiram

The Tafel slope is a key parameter often quoted to characterize the efficacy of an electrochemical catalyst. In this paper, we develop a Bayesian data analysis approach to estimate the Tafel slope from experimentally measured current-voltage data. Our approach obviates the human intervention required by current literature practice for Tafel estimation and provides robust, distributional uncertainty estimates. Using synthetic data, we illustrate how data insufficiency can unknowingly influence current fitting approaches, and how our approach allays these concerns. We apply our approach to conduct a comprehensive re-analysis of data from the CO2 reduction literature. This analysis reveals no systematic preference for Tafel slopes to cluster around certain "cardinal values" (e.g., 60 or 120 mV/decade). We hypothesize several plausible physical explanations for this observation and discuss the implications of our finding for mechanistic analysis in electrochemical kinetic investigations.
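A minimal sketch of the idea (not the authors' full model; the data values, noise level, and flat priors are assumptions): the Tafel law eta = a + b*log10(i) is fit by sampling the posterior over (a, b) with a simple Metropolis sampler, yielding a distributional estimate of the slope b rather than a point value:

```python
# A minimal sketch (flat priors, known noise; not the authors' full model):
# Bayesian estimation of the Tafel slope b in eta = a + b * log10(i).
import numpy as np

rng = np.random.default_rng(5)

# Synthetic Tafel data: slope 118 mV/decade plus Gaussian voltage noise.
a_true, b_true, sigma = 0.05, 0.118, 0.004       # V, V/decade, V
log_i = np.linspace(-2, 1, 15)                   # log10 of current density
eta = a_true + b_true * log_i + rng.normal(0, sigma, log_i.size)

def log_post(theta):
    a, b = theta
    resid = eta - (a + b * log_i)
    return -0.5 * np.sum(resid ** 2) / sigma ** 2

# Metropolis sampling of the posterior over (a, b).
theta = np.array([0.0, 0.1])
lp = log_post(theta)
slopes = []
for _ in range(20000):
    prop = theta + rng.normal(0, 0.002, 2)
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    slopes.append(theta[1])

b_post = np.array(slopes[5000:]) * 1000          # convert to mV/decade
print(f"Tafel slope: {b_post.mean():.1f} +/- {b_post.std():.1f} mV/decade")
```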


2017, Vol. 15 (1), pp. 1108-1122
Author(s): Juan Ramón Moya, Noemí DeCastro-García, Ramón-Ángel Fernández-Díaz, Jorge Lorenzana Tamargo

Critical infrastructures in public administration can be compromised by Advanced Persistent Threats (APTs), which today constitute one of the most sophisticated ways of stealing information. This paper presents an effective, learning-based tool that uses inductive techniques to analyze the information provided by firewall log files in an IT infrastructure and to detect suspicious activity, marking it as a potential APT. The experiments were conducted by mixing real and synthetic traffic data to represent different proportions of normal and anomalous activity.
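A hedged sketch of the general approach (the features, simulator, and classifier are assumptions, not the paper's exact pipeline): per-host features are derived from firewall log records, and an inductive classifier is trained on a mixture of normal and APT-like traffic:

```python
# A hedged sketch (assumed features and model, not the paper's pipeline):
# per-host firewall-log features feed an inductive classifier for APT detection.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(6)

def make_traffic(n, apt):
    # Features: denied-connection rate, distinct ports, bytes out, off-hours rate.
    base = rng.normal([0.05, 8, 1e5, 0.1], [0.02, 3, 3e4, 0.05], size=(n, 4))
    if apt:  # APT-like hosts: more denials, port scans, exfiltration, night work
        base += rng.normal([0.10, 20, 4e5, 0.3], [0.05, 8, 1e5, 0.1], size=(n, 4))
    return np.abs(base)

X = np.vstack([make_traffic(900, False), make_traffic(100, True)])
y = np.array([0] * 900 + [1] * 100)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(Xtr, ytr)
print(classification_report(yte, clf.predict(Xte), target_names=['normal', 'APT']))
```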


2005, Vol. 33 (6), pp. 1427-1429
Author(s): P. Mendes, D. Camacho, A. de la Fuente

The advent of large data sets, such as those produced in metabolomics, presents a considerable challenge in terms of their interpretation. Several mathematical and statistical methods have been proposed to analyse these data, and new ones continue to appear. However, these methods often disagree in their analyses, and their results are hard to interpret. A major contributing factor for the difficulties in interpreting these data lies in the data analysis methods themselves, which have not been thoroughly studied under controlled conditions. We have been producing synthetic data sets by simulation of realistic biochemical network models with the purpose of comparing data analysis methods. Because we have full knowledge of the underlying ‘biochemistry’ of these models, we are better able to judge how well the analyses reflect true knowledge about the system. Another advantage is that the level of noise in these data is under our control and this allows for studying how the inferences are degraded by noise. Using such a framework, we have studied the extent to which correlation analysis of metabolomics data sets is capable of recovering features of the biochemical system. We were able to identify four major metabolic regulatory configurations that result in strong metabolite correlations. This example demonstrates the utility of biochemical simulation in the analysis of metabolomics data.
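In the same spirit, a toy version of the framework (a three-step linear pathway, not one of the authors' realistic models; rates and noise levels are assumptions) simulates a biochemical network whose "biochemistry" is fully known and then inspects the metabolite correlations it produces:

```python
# A toy version of the framework (assumed pathway and rates, not the authors'
# models): simulate a small mass-action pathway with noisy enzyme levels and
# examine the metabolite correlations the known "biochemistry" produces.
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(7)

def pathway(t, s, k):  # S0 -> S1 -> S2 -> (sink), with constant influx k[0]
    v = k * np.array([1.0, s[0], s[1], s[2]])
    return [v[0] - v[1], v[1] - v[2], v[2] - v[3]]

samples = []
for _ in range(200):   # each "replicate" perturbs the enzyme levels slightly
    k = np.array([1.0, 0.5, 0.7, 0.9]) * rng.lognormal(0, 0.1, 4)
    sol = solve_ivp(pathway, [0, 50], [1.0, 1.0, 1.0], args=(k,))
    samples.append(sol.y[:, -1])      # near-steady-state metabolite levels

corr = np.corrcoef(np.array(samples).T)
print(np.round(corr, 2))  # strong correlations trace back to shared enzymes
```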


2013, Vol. 380-384, pp. 1382-1385
Author(s): Ming Li Song, Yong Bin Wang

Spatiotemporal data are widely visible in everyday life. This paper proposes an algorithm to represent them in a granular way: as information granules. Information granules can be regarded as a collection of conceptual landmarks with which people can view the data and describe them in a semantic way. The key objective of this paper is to introduce a new granular way of data analysis through granulation of the data. Several experiments on synthetic data show clearly how the algorithm performs.


Author(s): Miroslawa Alunowska Figueroa, Daniel Turner-Szymkiewicz, Edgar Alonso Lopez-Rojas, Juan Sebastián Cárdenas-Rodriguez, Ulf Norinder

To address the challenges in the fight against financial crime, particularly in the context of the COVID-19 pandemic, this paper focuses on financial synthetic data and the use of a reliable benchmark tool to test fraud detection algorithms. Compliance departments at financial institutions face the challenge of reducing the number of innocent people erroneously accused of fraud. To cope with this problem, financial institutions are exploring machine learning fraud detection algorithms and data analysis technologies to develop a more accurate and precise fraud detection system. However, streamlining and automating banks' monitoring and testing processes is challenging, as there is no consensus on a benchmark. We explore the relevance of measuring the applicability of a financial crime benchmark in the presence of a growing digital financial sector, as in the case of Mexico. This study is particularly important given the serious threats faced by a rapidly developing financial system (2019 Mexican Central Bank Report), risks that have been further exacerbated by the COVID-19 pandemic accelerating the shift towards digital payments.
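A hedged sketch of the benchmark idea (the simulator, features, and detector are assumptions, not the paper's tool): generate a synthetic transaction log with labeled fraud, train a detector, and score it on the false-positive rate, i.e. the share of innocent customers erroneously flagged:

```python
# A hedged sketch (assumed simulator and detector, not the paper's benchmark):
# synthetic labeled transactions let one measure how many innocent customers a
# fraud detector would erroneously flag.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(8)

n = 20000
amount = rng.lognormal(3.0, 1.0, n)            # transaction amount
hour = rng.integers(0, 24, n)                  # hour of day
new_payee = rng.random(n) < 0.1                # payment to a new payee
fraud = (rng.random(n) < 0.002) | (new_payee & (amount > 400)
                                   & (rng.random(n) < 0.3))

X = np.column_stack([np.log(amount), hour, new_payee])
Xtr, Xte, ytr, yte = train_test_split(X, fraud, stratify=fraud, random_state=0)

clf = LogisticRegression(class_weight='balanced', max_iter=1000).fit(Xtr, ytr)
tn, fp, fn, tp = confusion_matrix(yte, clf.predict(Xte)).ravel()
print(f"false-positive rate: {fp / (fp + tn):.4f} (innocent customers flagged)")
```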

