data matrices
Recently Published Documents


TOTAL DOCUMENTS: 164 (FIVE YEARS 36)

H-INDEX: 24 (FIVE YEARS 3)

2021 ◽  
Author(s):  
Isaac Virshup ◽  
Sergei Rybakov ◽  
Fabian J Theis ◽  
Philipp Angerer ◽  
F. Alexander Wolf

anndata is a Python package for handling annotated data matrices in memory and on disk, positioned between pandas and xarray. anndata offers a broad range of computationally efficient features including, among others, sparse data support, lazy operations, and a PyTorch interface.


2021 ◽  
pp. 1471082X2110410
Author(s):  
Elena Tuzhilina ◽  
Leonardo Tozzi ◽  
Trevor Hastie

Canonical correlation analysis (CCA) is a technique for measuring the association between two multivariate data matrices. A regularized modification of canonical correlation analysis (RCCA) that imposes an ℓ2 penalty on the CCA coefficients is widely used in applications with high-dimensional data. One limitation of such regularization is that it ignores any data structure, treating all the features equally, which can be ill-suited for some applications. In this article we introduce several approaches to regularizing CCA that take the underlying data structure into account. In particular, the proposed group regularized canonical correlation analysis (GRCCA) is useful when the variables are correlated in groups. We illustrate some computational strategies to avoid excessive computations with regularized CCA in high dimensions. We demonstrate the application of these methods in our motivating application from neuroscience, as well as in a small simulation example.
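A minimal sketch of plain regularized CCA, assuming a ridge-type (ℓ2) penalty that adds a multiple of the identity to each within-set covariance matrix. This is a generic textbook formulation, not the authors' implementation, and it does not include the group structure of GRCCA or the computational shortcuts the article describes. The function name `rcca` and the parameters `lam_x` and `lam_y` are hypothetical.

```python
import numpy as np

def rcca(X, Y, lam_x=0.1, lam_y=0.1):
    """Ridge-regularized CCA via whitening and an SVD (illustrative sketch).

    Returns the regularized canonical correlations and the coefficient
    matrices A (for X) and B (for Y), one column per canonical pair.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # The penalty enters as a multiple of the identity on each covariance.
    Sxx = X.T @ X / n + lam_x * np.eye(X.shape[1])
    Syy = Y.T @ Y / n + lam_y * np.eye(Y.shape[1])
    Sxy = X.T @ Y / n

    def inv_sqrt(S):
        # Inverse square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)

    # Singular values of the whitened cross-covariance are the
    # (regularized) canonical correlations, in descending order.
    U, s, Vt = np.linalg.svd(Wx @ Sxy @ Wy, full_matrices=False)
    return s, Wx @ U, Wy @ Vt.T

# Synthetic example: two data matrices sharing one latent signal.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
X = z + 0.5 * rng.normal(size=(200, 5))
Y = z + 0.5 * rng.normal(size=(200, 4))
s, A, B = rcca(X, Y, lam_x=0.1, lam_y=0.1)
```

Because the same penalty is applied to every coefficient, this version treats all features equally, which is the limitation the structured variants in the article are meant to address.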


2021 ◽  
Vol 7 (2) ◽  
pp. 79-84
Author(s):  
V. Korzhik ◽  
V. Starostin ◽  
D Flaksman

Paper (or plastic) certificates are considered a means of protection against forgery of product quality claims and brand falsification. Barcodes or data matrices are commonly used to solve this problem. However, such an approach usually does not work against sophisticated attacks such as the cloning of certificate copies. In the current paper we propose to use digital watermarking and estimation of the inner noises of scanners and printers in order to detect cloning attacks effectively. An algorithm for detecting cloning attacks is presented. The probabilities of a missed attack and of a false alarm are derived.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Beatriz Galindo-Prieto ◽  
Paul Geladi ◽  
Johan Trygg

Background: For multivariate data analysis involving only two input matrices (e.g., X and Y), the previously published methods for variable influence on projection (e.g., VIP_OPLS or VIP_O2PLS) are widely used for variable selection purposes, including (i) variable importance assessment, (ii) dimensionality reduction of big data and (iii) interpretation enhancement of PLS, OPLS and O2PLS models. For multiblock analysis, the OnPLS models find relationships among multiple data matrices (more than two blocks) by calculating latent variables; however, a method for improving the interpretation of these latent variables (model components) by assessing the importance of the input variables has not been available until now.

Results: A method for variable selection in multiblock analysis, called multiblock variable influence on orthogonal projections (MB-VIOP), is explained in this paper. MB-VIOP is a model-based variable selection method that uses the data matrices, the scores and the normalized loadings of an OnPLS model in order to sort the input variables of more than two data matrices according to their importance for both simplification and interpretation of the total multiblock model, and also of the unique, local and global model components separately. MB-VIOP has been tested using three datasets: a synthetic four-block dataset, a real three-block omics dataset related to plant sciences, and a real six-block dataset related to the food industry.

Conclusions: We provide evidence for the usefulness and reliability of MB-VIOP by means of three examples (one synthetic and two real-world cases). MB-VIOP assesses the importance of both isolated variables and ranges of variables in any type of data in a reliable and efficient way. MB-VIOP connects the input variables of different data matrices according to their relevance for the interpretation of each latent variable, yielding enhanced interpretability for each OnPLS model component. In addition, MB-VIOP can deal with strong overlapping of types of variation, as well as with many data blocks of very different dimensionality. The ability of MB-VIOP to generate dimensionality-reduced models with high interpretability makes this method ideal for big data mining, multi-omics data integration and any study that requires exploration and interpretation of large streams of data.


2021 ◽  
Vol 49 (2) ◽  
Author(s):  
Changxiao Cai ◽  
Gen Li ◽  
Yuejie Chi ◽  
H. Vincent Poor ◽  
Yuxin Chen

2021 ◽  
pp. 1-18
Author(s):  
Michael Peress

Recent advances in the study of voting behavior and the study of legislatures have relied on ideal point estimation for measuring the preferences of political actors, and increasingly, these applications have involved very large data matrices. This has proved challenging for the widely available approaches. Limitations of existing methods include excessive computation time and excessive memory requirements on large datasets, the inability to efficiently deal with sparse data matrices, inefficient computation of standard errors, and ineffective methods for generating starting values. I develop an approach for estimating multidimensional ideal points in large-scale applications, which overcomes these limitations. I demonstrate my approach by applying it to a number of challenging problems. The methods I develop are implemented in an R package (ipe).


2021 ◽  
pp. 69-112
Author(s):  
Andrew V. Z. Brower ◽  
Randall T. Schuh

This chapter examines the theory and methods that allow systematists to recognize characters, character states, and the taxa they delimit. In systematics, similarity is a relative relation that exists among at least three things. For a given attribute, two things are more similar to one another than either of them is to a third thing, and when multiple attributes are assessed together, the nested degrees of similarity across the range of attributes provide evidence for hypothesizing phylogenetic relationships. Yet things can be similar in one aspect but not similar in other aspects. Once recognized and characterized in words, a theory of similarity of a feature shared among taxa may be tested in three (often interconnected) ways: (1) conjunction, (2) similarity of structure, and (3) similarity of position. Although the distinction between characters and states may be semantic, treatment of features as alternate states of the same character versus different characters is necessary for the construction of data matrices. How this is done can have important implications for character weights, and potentially the outcome of analyses.

