Data Matrices: Latest Research Papers

anndata: Annotated data

10.1101/2021.12.16.473007 ◽ 2021

Author(s): Isaac Virshup ◽ Sergei Rybakov ◽ Fabian J Theis ◽ Philipp Angerer ◽ F. Alexander Wolf

Keyword(s): Sparse Data ◽ Computationally Efficient ◽ Data Support ◽ Python Package ◽ Data Matrices

anndata is a Python package for handling annotated data matrices in memory and on disk, positioned between pandas and xarray. anndata offers a broad range of computationally efficient features including, among others, sparse data support, lazy operations, and a PyTorch interface.
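The notion of an annotated data matrix can be illustrated with a minimal sketch. This is not the anndata API: the class `AnnotatedMatrix`, its fields `X`, `obs`, and `var`, and the `subset` method are hypothetical names chosen for illustration, and the example data are made up. The sketch only shows the core idea that row and column annotations stay aligned with the matrix when it is sliced.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedMatrix:
    """Hypothetical toy container for illustration; not the anndata API."""

    X: list    # n_obs rows, each a list of n_var values
    obs: dict  # annotation name -> list with one entry per row
    var: dict  # annotation name -> list with one entry per column

    def __post_init__(self):
        n_obs = len(self.X)
        if any(len(col) != n_obs for col in self.obs.values()):
            raise ValueError("obs annotations must have one entry per row")
        if self.X:
            n_var = len(self.X[0])
            if any(len(row) != n_var for row in self.X):
                raise ValueError("X must be rectangular")
            if any(len(col) != n_var for col in self.var.values()):
                raise ValueError("var annotations must have one entry per column")

    def subset(self, rows, cols):
        """Select rows and columns; annotations are sliced along with X."""
        return AnnotatedMatrix(
            X=[[self.X[i][j] for j in cols] for i in rows],
            obs={k: [v[i] for i in rows] for k, v in self.obs.items()},
            var={k: [v[j] for j in cols] for k, v in self.var.items()},
        )


# Made-up example: 2 observations, 3 variables.
adata = AnnotatedMatrix(
    X=[[1.0, 0.0, 3.0], [0.0, 2.0, 0.0]],
    obs={"cell_type": ["B", "T"]},
    var={"gene": ["g1", "g2", "g3"]},
)
sub = adata.subset(rows=[1], cols=[0, 1])
```

Keeping the annotations inside the same object means a subset can never drift out of sync with its labels, which is the main convenience an annotated matrix offers over a bare array.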

Canonical correlation analysis in high dimensions with structured regularization

Statistical Modelling ◽ 10.1177/1471082x211041033 ◽ 2021 ◽ pp. 1471082X2110410

Author(s): Elena Tuzhilina ◽ Leonardo Tozzi ◽ Trevor Hastie

Keyword(s): Data Structure ◽ Correlation Analysis ◽ Canonical Correlation Analysis ◽ Canonical Correlation ◽ Multivariate Data ◽ High Dimensional Data ◽ High Dimensional ◽ High Dimensions ◽ Structured Regularization ◽ Data Matrices

Canonical correlation analysis (CCA) is a technique for measuring the association between two multivariate data matrices. A regularized modification of canonical correlation analysis (RCCA) which imposes an [Formula: see text] penalty on the CCA coefficients is widely used in applications with high-dimensional data. One limitation of such regularization is that it ignores any data structure, treating all the features equally, which can be ill-suited for some applications. In this article we introduce several approaches to regularizing CCA that take the underlying data structure into account. In particular, the proposed group regularized canonical correlation analysis (GRCCA) is useful when the variables are correlated in groups. We illustrate some computational strategies to avoid excessive computations with regularized CCA in high dimensions. We demonstrate the application of these methods in our motivating application from neuroscience, as well as in a small simulation example.
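For orientation, the contrast between CCA and a regularized variant can be written out as below. This is a sketch under stated assumptions, not a transcription of the article: the penalty formula is elided in the text above, so the ridge-type form shown here (a multiple of the identity added to each within-set covariance) is an assumed, commonly used formulation, and the symbols $\alpha$, $\beta$, $\hat\Sigma$, $\lambda_1$, and $\lambda_2$ are notation introduced only for this illustration.

```latex
% X and Y are centered data matrices; \hat\Sigma_{XX}, \hat\Sigma_{YY}, \hat\Sigma_{XY}
% are their sample covariance and cross-covariance matrices.
\begin{align}
\text{CCA:}\quad
  & \max_{\alpha,\beta}\; \alpha^\top \hat\Sigma_{XY}\,\beta
  \quad \text{s.t.}\quad
  \alpha^\top \hat\Sigma_{XX}\,\alpha = 1,\;
  \beta^\top \hat\Sigma_{YY}\,\beta = 1, \\
\text{ridge-type RCCA (assumed form):}\quad
  & \max_{\alpha,\beta}\; \alpha^\top \hat\Sigma_{XY}\,\beta
  \quad \text{s.t.}\quad
  \alpha^\top (\hat\Sigma_{XX} + \lambda_1 I)\,\alpha = 1,\;
  \beta^\top (\hat\Sigma_{YY} + \lambda_2 I)\,\beta = 1 .
\end{align}
```

In this form every coefficient is shrunk in the same way regardless of which feature it belongs to, which is the limitation the abstract describes and the motivation for structure-aware alternatives.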

Estimating the boundaries of the feasible profiles in the bilinear decomposition of multi-component data matrices.

Chemometrics and Intelligent Laboratory Systems ◽ 10.1016/j.chemolab.2021.104387 ◽ 2021 ◽ pp. 104387

Author(s): Alejandro C. Olivieri

Keyword(s): Data Matrices

Elaboration of Digital Watermarking Method for a Protection of Cloning Attack on Paper Certificates

Proceedings of Telecommunication Universities ◽ 10.31854/1813-324x-2021-7-2-79-84 ◽ 2021 ◽ Vol 7 (2) ◽ pp. 79-84

Author(s): V. Korzhik ◽ V. Starostin ◽ D Flaksman

Keyword(s): False Alarm ◽ Product Quality ◽ Digital Watermarking ◽ Current Paper ◽ Data Matrices

Paper (or plastic) certificates are considered a means of protection against various forms of product quality forgery and brand falsification. Barcodes or data matrices are commonly used to solve this problem. However, such an approach usually does not work against sophisticated attacks such as the cloning of certificate copies. In the current paper we propose to use digital watermarking and estimation of the inner noise of scanners and printers in order to detect cloning attacks effectively. An algorithm for detecting cloning attacks is presented. The probabilities of a missed attack and of a false alarm are established.

Use of Multivariate Mathematical Methods for the Evaluation of Retention Data Matrices

Advances in Chromatography ◽ 10.1201/9781003210290-1 ◽ 2021 ◽ pp. 1-63

Author(s): Tibor Cserháti ◽ Esther Forgács

Keyword(s): Retention Data ◽ Mathematical Methods ◽ Data Matrices

Multiblock variable influence on orthogonal projections (MB-VIOP) for enhanced interpretation of total, global, local and unique variations in OnPLS models

BMC Bioinformatics ◽ 10.1186/s12859-021-04015-9 ◽ 2021 ◽ Vol 22 (1)

Author(s): Beatriz Galindo-Prieto ◽ Paul Geladi ◽ Johan Trygg

Keyword(s): Big Data ◽ Variable Selection ◽ Latent Variables ◽ Latent Variable ◽ Orthogonal Projections ◽ Multiple Data ◽ Input Variables ◽ Model Components ◽ Variable Influence ◽ Data Matrices

Background: For multivariate data analysis involving only two input matrices (e.g., X and Y), the previously published methods for variable influence on projection (e.g., VIP_OPLS or VIP_O2PLS) are widely used for variable selection purposes, including (i) variable importance assessment, (ii) dimensionality reduction of big data and (iii) interpretation enhancement of PLS, OPLS and O2PLS models. For multiblock analysis, the OnPLS models find relationships among multiple data matrices (more than two blocks) by calculating latent variables; however, a method for improving the interpretation of these latent variables (model components) by assessing the importance of the input variables has not been available until now.

Results: A method for variable selection in multiblock analysis, called multiblock variable influence on orthogonal projections (MB-VIOP), is explained in this paper. MB-VIOP is a model-based variable selection method that uses the data matrices, the scores and the normalized loadings of an OnPLS model in order to sort the input variables of more than two data matrices according to their importance for both simplification and interpretation of the total multiblock model, and also of the unique, local and global model components separately. MB-VIOP has been tested using three datasets: a synthetic four-block dataset, a real three-block omics dataset related to plant sciences, and a real six-block dataset related to the food industry.

Conclusions: We provide evidence for the usefulness and reliability of MB-VIOP by means of three examples (one synthetic and two real-world cases). MB-VIOP assesses in a trustworthy and efficient way the importance of both isolated variables and ranges of variables in any type of data. MB-VIOP connects the input variables of different data matrices according to their relevance for the interpretation of each latent variable, yielding enhanced interpretability for each OnPLS model component. In addition, MB-VIOP can deal with strong overlapping of types of variation, as well as with many data blocks of very different dimensionality. The ability of MB-VIOP to generate dimensionality-reduced models with high interpretability makes this method ideal for big data mining, multi-omics data integration and any study that requires exploration and interpretation of large streams of data.

Subspace estimation from unbalanced and incomplete data matrices: ℓ2,∞ statistical guarantees

The Annals of Statistics ◽ 10.1214/20-aos1986 ◽ 2021 ◽ Vol 49 (2)

Author(s): Changxiao Cai ◽ Gen Li ◽ Yuejie Chi ◽ H. Vincent Poor ◽ Yuxin Chen

Keyword(s): Incomplete Data ◽ Subspace Estimation ◽ Data Matrices

Large-Scale Ideal Point Estimation

Political Analysis ◽ 10.1017/pan.2021.5 ◽ 2021 ◽ pp. 1-18

Author(s): Michael Peress

Keyword(s): Voting Behavior ◽ Large Scale ◽ Ideal Point ◽ Computation Time ◽ Large Data ◽ R Package ◽ Point Estimation ◽ Political Actors ◽ Ideal Point Estimation ◽ Data Matrices

Recent advances in the study of voting behavior and the study of legislatures have relied on ideal point estimation for measuring the preferences of political actors, and increasingly, these applications have involved very large data matrices. This has proved challenging for the widely available approaches. Limitations of existing methods include excessive computation time and excessive memory requirements on large datasets, the inability to efficiently deal with sparse data matrices, inefficient computation of standard errors, and ineffective methods for generating starting values. I develop an approach for estimating multidimensional ideal points in large-scale applications, which overcomes these limitations. I demonstrate my approach by applying it to a number of challenging problems. The methods I develop are implemented in an R package (ipe).
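To make the point about sparse data matrices concrete, here is a rough illustration of how a vote matrix with many missing entries might be stored and summarized. It is not the method of the article and has nothing to do with the ipe package: the data, the triple layout, and the function name `yea_share` are all hypothetical, and the summary computed is a simple yea share rather than an ideal point estimate.

```python
from collections import defaultdict

# Hypothetical roll-call data as (legislator, bill, vote) triples, vote 1 = yea, 0 = nay.
# Legislator-bill pairs that do not appear are missing votes and take no storage.
votes = [
    ("A", "bill1", 1),
    ("A", "bill3", 0),
    ("B", "bill2", 1),
    ("C", "bill1", 0),
    ("C", "bill2", 1),
    ("C", "bill3", 1),
]


def yea_share(triples):
    """Share of yea votes per legislator, touching only the observed entries."""
    yeas = defaultdict(int)
    cast = defaultdict(int)
    for legislator, _bill, vote in triples:
        yeas[legislator] += vote
        cast[legislator] += 1
    return {name: yeas[name] / cast[name] for name in cast}


shares = yea_share(votes)

# Fraction of the full legislator-by-bill matrix that is actually observed.
n_legislators = len({leg for leg, _, _ in votes})
n_bills = len({bill for _, bill, _ in votes})
density = len(votes) / (n_legislators * n_bills)
```

Storage and computation here scale with the number of votes actually cast rather than with the number of legislators times the number of bills, which is the property that matters when most entries are missing.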

Characters and Character States

10.7591/cornell/9781501752773.003.0003 ◽ 2021 ◽ pp. 69-112

Author(s): Andrew V. Z. Brower ◽ Randall T. Schuh

Keyword(s): Phylogenetic Relationships ◽ Semantic Treatment ◽ Alternate States ◽ Multiple Attributes ◽ Data Matrices

This chapter examines the theory and methods that allow systematists to recognize characters, character states, and the taxa they delimit. In systematics, similarity is a relative relation that exists among at least three things. For a given attribute, two things are more similar to one another than either of them is to a third thing, and when multiple attributes are assessed together, the nested degrees of similarity across the range of attributes provide evidence for hypothesizing phylogenetic relationships. Yet things can be similar in one aspect but not similar in other aspects. Once recognized and characterized in words, a theory of similarity of a feature shared among taxa may be tested in three (often interconnected) ways: (1) conjunction, (2) similarity of structure, and (3) similarity of position. Although the distinction between characters and states may be semantic, treatment of features as alternate states of the same character versus different characters is necessary for the construction of data matrices. How this is done can have important implications for character weights, and potentially the outcome of analyses.

Rank Detection Thresholds for Hankel or Toeplitz Data Matrices

2020 28th European Signal Processing Conference (EUSIPCO) ◽ 10.23919/eusipco47968.2020.9287856 ◽ 2021

Author(s): Alle-Jan van der Veen ◽ Jac Romme ◽ Ye Cui

Keyword(s): Detection Thresholds ◽ Data Matrices
