pairwise models
Recently Published Documents


TOTAL DOCUMENTS

21
(FIVE YEARS 8)

H-INDEX

5
(FIVE YEARS 1)

2021 ◽  
Vol 2021 (12) ◽  
pp. 124007
Author(s):  
Christoph Feinauer ◽  
Carlo Lucibello

Pairwise models like the Ising model or the generalized Potts model have found many successful applications in fields like physics, biology, and economics. Closely connected is the problem of inverse statistical mechanics, where the goal is to infer the parameters of such models given observed data. An open problem in this field is how to train these models when the data contain additional higher-order interactions that are not present in the pairwise model. In this work, we propose an approach based on energy-based models and pseudolikelihood maximization to address these complications: we show that hybrid models, which combine a pairwise model and a neural network, can lead to significant improvements in the reconstruction of pairwise interactions. We show that these improvements hold consistently when compared to a standard approach using only the pairwise model and to an approach using only a neural network. This is in line with the general idea that simple interpretable models and complex black-box models are not necessarily a dichotomy: interpolating between these two classes of models can retain some advantages of both.
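As a rough illustration of pseudolikelihood maximization for the purely pairwise case, the sketch below fits Ising couplings by gradient descent on the negative log-pseudolikelihood. It is a minimal example under stated assumptions, not the authors' hybrid method: the function names, the learning rate, and the step count are all hypothetical, and no neural-network term is included.

```python
import numpy as np

def neg_log_pseudolikelihood(J, h, S):
    """Average negative log-pseudolikelihood of an Ising model.

    S: (n_samples, n_spins) array with entries in {-1, +1}
    J: (n_spins, n_spins) symmetric couplings with zero diagonal
    h: (n_spins,) local fields
    """
    # Field acting on each spin given all other spins (diagonal of J is zero).
    field = S @ J + h
    # P(s_i | rest) = 1 / (1 + exp(-2 s_i field_i)); logaddexp is numerically stable.
    return float(np.mean(np.sum(np.logaddexp(0.0, -2.0 * S * field), axis=1)))

def fit_ising_pseudolikelihood(S, lr=0.1, steps=500):
    """Plain gradient descent; lr and steps are illustrative defaults."""
    n, d = S.shape
    J = np.zeros((d, d))
    h = np.zeros(d)
    for _ in range(steps):
        # Derivative of the per-spin loss with respect to its field is tanh(field) - s.
        R = np.tanh(S @ J + h) - S
        G = S.T @ R / n
        # J is kept symmetric with zero diagonal, so both directions contribute.
        gJ = G + G.T
        np.fill_diagonal(gJ, 0.0)
        J -= lr * gJ
        h -= lr * R.mean(axis=0)
    return J, h
```

On samples where two spins always agree and a third is independent, the fitted coupling between the first two grows positive while the couplings to the third stay near zero.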


2021 ◽  
Author(s):  
Christoph Feinauer ◽  
Barthelemy Meynard-Piganeau ◽  
Carlo Lucibello

Many different types of generative models for protein sequences have been proposed in the literature. Their uses include the prediction of mutational effects, protein design, and the prediction of structural properties. Neural network (NN) architectures have shown strong performance, commonly attributed to their capacity to extract non-trivial higher-order interactions from the data. In this work, we analyze three different NN models and assess how close they are to simple pairwise distributions, which have been used in the past for similar problems. We present an approach for extracting pairwise models from more complex ones using an energy-based modeling framework. We show that for the tested models the extracted pairwise models can replicate the energies of the original models and are also close in performance in tasks like mutational effect prediction.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Janet C. Siebert ◽  
Martine Saint-Cyr ◽  
Sarah J. Borengasser ◽  
Brandie D. Wagner ◽  
Catherine A. Lozupone ◽  
...  

Background: One goal of multi-omic studies is to identify interpretable predictive models for outcomes of interest, with analytes drawn from multiple omes. Such findings could support refined biological insight and hypothesis generation. However, standard analytical approaches are not designed to be “ome aware.” Thus, some researchers analyze data from one ome at a time, and then combine predictions across omes. Others resort to correlation studies, cataloging pairwise relationships, but lacking an obvious approach for cohesive and interpretable summaries of these catalogs.
Methods: We present a novel workflow for building predictive regression models from network neighborhoods in multi-omic networks. First, we generate pairwise regression models across all pairs of analytes from all omes, encoding the resulting “top table” of relationships in a network. Then, we build predictive logistic regression models using the analytes in network neighborhoods of interest. We call this method CANTARE (Consolidated Analysis of Network Topology And Regression Elements).
Results: We applied CANTARE to previously published data from healthy controls and patients with inflammatory bowel disease (IBD) consisting of three omes: gut microbiome, metabolomics, and microbial-derived enzymes. We identified 8 unique predictive models with AUC > 0.90. The number of predictors in these models ranged from 3 to 13. We compare the results of CANTARE to random forests and elastic-net penalized regressions, analyzing AUC, predictions, and predictors. CANTARE AUC values were competitive with those generated by random forests and penalized regressions. The top 3 CANTARE models had a greater dynamic range of predicted probabilities than did random forests and penalized regressions (p-value = 1.35 × 10⁻⁵). CANTARE models were significantly more likely to prioritize predictors from multiple omes than were the alternatives (p-value = 0.005). We also showed that predictive models from a network based on pairwise models with an interaction term for IBD have higher AUC than predictive models built from a correlation network (p-value = 0.016). R scripts and a CANTARE User’s Guide are available at https://sourceforge.net/projects/cytomelodics/files/CANTARE/.
Conclusion: CANTARE offers a flexible approach for building parsimonious, interpretable multi-omic models. These models yield quantitative and directional effect sizes for predictors and support the generation of hypotheses for follow-up investigation.
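The first two steps of the workflow (pairwise regressions across all analyte pairs, then a network built from the resulting table) can be sketched roughly as follows. This is a simplified Python illustration, not the published R implementation: it uses plain one-predictor linear fits without the IBD interaction term, filters on R² rather than p-values, and the names `pairwise_top_table`, `neighborhood`, and `r2_min` are hypothetical.

```python
import itertools
import numpy as np

def pairwise_top_table(X, names, r2_min=0.5):
    """Fit y ~ a + b*x for every pair of columns and keep the strong fits.

    Returns (name_i, name_j, slope, r2) tuples sorted by r2, descending.
    The r2_min cutoff is an illustrative stand-in for a significance filter.
    """
    rows = []
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        x, y = X[:, i], X[:, j]
        slope, intercept = np.polyfit(x, y, 1)
        resid = y - (slope * x + intercept)
        r2 = 1.0 - resid.var() / y.var()
        if r2 >= r2_min:
            rows.append((names[i], names[j], float(slope), float(r2)))
    return sorted(rows, key=lambda row: -row[3])

def neighborhood(top_table, node):
    """Analytes directly linked to `node` in the network encoded by the table."""
    neighbors = set()
    for a, b, _, _ in top_table:
        if a == node:
            neighbors.add(b)
        elif b == node:
            neighbors.add(a)
    return neighbors
```

A predictive logistic regression would then be fit using only the analytes in a neighborhood of interest; that step is omitted here.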


2020 ◽  
Author(s):  
Almut Heinken ◽  
Geeta Acharya ◽  
Dmitry A. Ravcheev ◽  
Johannes Hertel ◽  
Malgorzata Nyga ◽  
...  

The human microbiome influences the efficacy and safety of a wide variety of commonly prescribed drugs, yet comprehensive systems-level approaches to interrogate drug-microbiome interactions are lacking. Here, we present a computational resource of human microbial genome-scale reconstructions, named AGORA2, which accounts for 7,206 strains, includes microbial drug degradation and biotransformation, and was extensively curated based on comparative genomics and literature searches. AGORA2 serves as a knowledge base for the human microbiome and as a metabolic modelling resource. We demonstrate the latter by mechanistically modelling microbial drug metabolism capabilities in single strains and pairwise models. Moreover, we predict the individual-specific drug conversion potential in a cohort of 616 colorectal cancer patients and controls. This analysis reveals that some drug activation capabilities are present in only a subset of individuals; moreover, drug conversion potential correlates with clinical parameters. Thus, AGORA2 paves the way towards personalised, predictive analysis of host-drug-microbiome interactions.


Entropy ◽  
2020 ◽  
Vol 22 (7) ◽  
pp. 744
Author(s):  
Giulio Burgio ◽  
Joan T. Matamalas ◽  
Sergio Gómez ◽  
Alex Arenas

Many real systems are strongly characterized by collective cooperative phenomena whose existence and properties still need a satisfactory explanation. Consistent with their collective nature, they call for new and more accurate descriptions going beyond pairwise models, such as graphs, in which all the interactions are considered as involving only two individuals at a time. Hypergraphs respond to this need, providing a mathematical representation of a system that allows interactions ranging from pairs to larger groups. In this work, through the use of different hypergraphs, we study how group interactions influence the evolution of cooperation in a structured population, by analyzing the evolutionary dynamics of the public goods game. Here we show that, as with network reciprocity, group interactions also promote cooperation. More importantly, by means of an invasion analysis in which the conditions for a strategy to survive are studied, we show how, in heterogeneously structured populations, reciprocity among players is expected to grow with increasing order of the interactions. This is due to the heterogeneity of connections and, particularly, to the presence of individuals standing out as hubs in the population. Our analysis represents a first step towards the study of evolutionary dynamics through higher-order interactions, and gives insights into why cooperation in heterogeneous higher-order structures is enhanced. Lastly, it also gives clues about the co-existence of cooperative and non-cooperative behaviors related to the structural properties of the interaction patterns.
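To make the setting concrete, here is a minimal sketch of one round of a public goods game played on a hypergraph, where each hyperedge is one group. It assumes the textbook payoff rule (each cooperator pays a fixed cost into every group it belongs to, the pot is multiplied by a synergy factor `r` and split equally); the paper's exact normalization may differ, and the function name and default values are hypothetical.

```python
def public_goods_payoffs(hyperedges, cooperates, r=3.0, cost=1.0):
    """Payoff of every node after one round of the public goods game.

    hyperedges: iterable of tuples of node ids, one tuple per group
    cooperates: dict mapping node id -> True (cooperator) or False (defector)
    r:          synergy factor applied to each group's pot (assumed value)
    cost:       contribution a cooperator pays into each of its groups
    """
    payoffs = {node: 0.0 for node in cooperates}
    for group in hyperedges:
        n_cooperators = sum(1 for v in group if cooperates[v])
        # Every member receives an equal share of the multiplied pot.
        share = r * cost * n_cooperators / len(group)
        for v in group:
            payoffs[v] += share - (cost if cooperates[v] else 0.0)
    return payoffs
```

A graph is the special case in which every hyperedge has exactly two members, which is what makes the comparison between pairwise and higher-order interactions possible within one function.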


2020 ◽  
Vol 34 (04) ◽  
pp. 4166-4173
Author(s):  
Mengdi Huai ◽  
Di Wang ◽  
Chenglin Miao ◽  
Aidong Zhang

Recently, increasing attention has been paid to an important family of learning problems called pairwise learning, in which the associated loss functions depend on pairs of instances. Despite the tremendous success of pairwise learning in many real-world applications, the lack of transparency behind the learned pairwise models makes it difficult for users to understand how particular decisions are made by these models, which in turn keeps users from trusting the predicted results. To tackle this problem, in this paper, we study feature importance scoring as a specific approach to the problem of interpreting the predictions of black-box pairwise models. Specifically, we first propose a novel adaptive Shapley-value-based interpretation method, with which a vector of importance scores associated with the underlying features of a testing instance pair can be adaptively calculated while taking feature correlations into account; these scores can be used to indicate which features make key contributions to the final prediction. Considering that Shapley-value-based methods are usually computationally challenging, we further propose a novel robust approximation interpretation method for pairwise models. This method is not only much more efficient but also robust to data noise. To the best of our knowledge, we are the first to investigate how to enable interpretation in pairwise learning. Theoretical analysis and extensive experiments demonstrate the effectiveness of the proposed methods.
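For orientation, the sketch below computes exact Shapley values by enumerating feature subsets. It is the plain textbook definition, not the adaptive or approximate methods proposed in the paper, and its cost grows exponentially with the number of features. The name `shapley_values` and the convention that `value_fn` takes a frozenset of "present" feature indices are assumptions made for this example.

```python
import itertools
import math

def shapley_values(value_fn, n_features):
    """Exact Shapley value of each feature.

    value_fn: maps a frozenset of present feature indices to the model
              output obtained when all other features are masked out.
    """
    phi = [0.0] * n_features
    n_fact = math.factorial(n_features)
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for k in range(len(others) + 1):
            # Weight of a coalition of size k that does not contain feature i.
            weight = math.factorial(k) * math.factorial(n_features - k - 1) / n_fact
            for subset in itertools.combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi
```

For a pairwise model, `value_fn` would score an instance pair with only the chosen features of both instances kept. With a toy bilinear score `sum(w[i] * x[i] * x2[i])` and zero masking, each Shapley value equals that feature's own term, and the values sum to the full score.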


2019 ◽  
Author(s):  
Jason Cory Brunson ◽  
Thomas P. Agresta ◽  
Reinhard C. Laubenbacher

Background: Comorbidity network analysis (CNA) is an increasingly popular approach in systems medicine, in which mathematical graphs encode epidemiological correlations (links) between diseases (nodes) inferred from their occurrence in an underlying patient population. A variety of methods have been used to infer properties of the constituent diseases or underlying populations from the network structure, but few have been validated or reproduced.
Objectives: To test the robustness and sensitivity of several common CNA techniques to the source of population health data and the method of link determination.
Methods: We obtained six sources of aggregated disease co-occurrence data, coded using varied ontologies, most of which were provided by the authors of CNAs. We constructed families of comorbidity networks from these data sets, in which links were determined using a range of statistical thresholds and measures of association. We calculated degree distributions, single-value statistics, and centrality rankings for these networks and evaluated their sensitivity to the source of data and link determination parameters. From two open-access sources of patient-level data, we constructed comorbidity networks using several multivariate models in addition to comparable pairwise models and evaluated differences between correlation estimates and network structure.
Results: Global network statistics vary widely depending on the underlying population. Much of this variation is due to network density, which for our six data sets ranged over three orders of magnitude. The statistical threshold for link determination also had strong effects on global statistics, though at any fixed threshold the same patterns distinguished our six populations. The association measure used to quantify comorbid relations had smaller but discernible effects on global structure. Co-occurrence rates estimated using multivariate models were increasingly negative-shifted as models accounted for more effects. However, only associations between the most prevalent disorders were consistent from model to model. Centrality rankings were likewise similar when based on the same dataset using different constructions; but they were difficult to compare, and very different when comparable, between data sets, especially those using different ontologies. The most central disease codes were particular to the underlying populations and were often broad categories, injuries, or non-specific symptoms.
Conclusions: CNAs can improve robustness and comparability by accounting for known limitations. In particular, we urge comorbidity network analysts (a) to include, where permissible, disaggregated disease occurrence data to allow more targeted reproduction and comparison of results; (b) to report differences in results obtained using different association measures, including both one of relative risk and one of correlation; (c) when identifying centrally located disorders, to carefully decide the most suitable ontology for this purpose; and, (d) when relevant to the interpretation of results, to compare them to those obtained using a multivariate model.
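The two kinds of pairwise association measure named in recommendation (b) can be computed from aggregated counts alone. The sketch below uses the standard definitions of relative risk and the phi correlation for two binary traits; whether the study used exactly these variants is an assumption, and the function and parameter names are illustrative.

```python
import math

def relative_risk(c_ij, p_i, p_j, n):
    """Observed co-occurrence divided by the count expected under independence.

    c_ij: patients with both diseases; p_i, p_j: patients with each disease;
    n: total number of patients in the population.
    """
    return c_ij * n / (p_i * p_j)

def phi_correlation(c_ij, p_i, p_j, n):
    """Pearson correlation of the two binary disease indicators."""
    return (c_ij * n - p_i * p_j) / math.sqrt(p_i * p_j * (n - p_i) * (n - p_j))
```

A link between two diseases would then be drawn when the chosen measure exceeds a threshold. The two measures respond differently to prevalence, which is one reason to report both.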


2019 ◽  
Vol 79 (3) ◽  
pp. 823-860 ◽  
Author(s):  
Rosanna C. Barnard ◽  
Luc Berthouze ◽  
Péter L. Simon ◽  
István Z. Kiss

Entropy ◽  
2018 ◽  
Vol 20 (10) ◽  
pp. 739 ◽  
Author(s):  
Alberto Beretta ◽  
Claudia Battistin ◽  
Clélia de Mulatier ◽  
Iacopo Mastromatteo ◽  
Matteo Marsili

Models can be simple for different reasons: because they yield a simple and computationally efficient interpretation of a generic dataset (e.g., in terms of pairwise dependencies), as in statistical learning, or because they capture the laws of a specific phenomenon, as in physics, leading to non-trivial falsifiable predictions. In information theory, the simplicity of a model is quantified by the stochastic complexity, which measures the number of bits needed to encode its parameters. In order to understand what simple models look like, we study the stochastic complexity of spin models with interactions of arbitrary order. We show that bijections within the space of possible interactions preserve the stochastic complexity, which allows us to partition the space of all models into equivalence classes. We thus find that the simplicity of a model is not determined by the order of the interactions, but rather by their mutual arrangements. Models where statistical dependencies are localized on non-overlapping groups of few variables are simple, affording predictions on independencies that are easy to falsify. On the contrary, fully connected pairwise models, which are often used in statistical learning, appear to be highly complex, because of their extended set of interactions, and they are hard to falsify.


2018 ◽  
Vol 28 (09) ◽  
pp. 1750062 ◽  
Author(s):  
Pawel Trajdos ◽  
Marek Kurzynski

In this paper, the problem of multi-label (ML) classification using the label-pairwise (LPW) scheme is addressed. For this approach, a method for correcting the binary classifiers that constitute the LPW ensemble is proposed. The correction is based on a probabilistic (randomized) model of a classifier that assesses the local class-specific probabilities of correct classification and misclassification. These probabilities are determined using the original concepts of a randomized reference classifier (RRC) and a local soft confusion matrix. Additionally, two special cases that deal with imbalanced labels and double-labeled instances are considered. The proposed methods were evaluated using 29 benchmark datasets. In order to assess the efficiency of the introduced models and the proposed correction scheme, they were compared against original binary classifiers working in the LPW ensemble. The comparison was performed using four different ML evaluation measures: macro and micro-averaged [Formula: see text] loss, zero-one loss and Hamming loss. Moreover, relations between classification quality and the characteristics of ML datasets such as average imbalance ratio or label density were investigated. The experimental study reveals that the correction approaches significantly outperform the reference method in terms of zero-one loss and Hamming loss.
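Two of the evaluation measures named above have short standard definitions, sketched here for binary label matrices. The function names are illustrative and the code follows the usual textbook definitions rather than any implementation from the paper.

```python
import numpy as np

def hamming_loss(Y_true, Y_pred):
    """Fraction of individual (instance, label) entries predicted wrongly."""
    return float(np.mean(Y_true != Y_pred))

def zero_one_loss(Y_true, Y_pred):
    """Fraction of instances whose predicted label set is not exactly right."""
    return float(np.mean(np.any(Y_true != Y_pred, axis=1)))
```

Zero-one loss is the stricter of the two: a single wrong label makes the whole instance count as an error, whereas Hamming loss charges only for that one entry.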

