similarity matrices
Recently Published Documents


TOTAL DOCUMENTS: 122 (FIVE YEARS 37)
H-INDEX: 18 (FIVE YEARS 3)

2021 ◽  
Vol 12 ◽  
Author(s):  
Xiaoyu Yang ◽  
Linai Kuang ◽  
Zhiping Chen ◽  
Lei Wang

Accumulating studies have shown that microbes are closely related to human diseases. In this paper, a novel method called MSBMFHMDA was designed to predict potential microbe–disease associations by adopting multi-similarities bilinear matrix factorization. In MSBMFHMDA, a microbe multi-similarity matrix is first constructed from the Gaussian interaction profile kernel similarity and the cosine similarity of microbes. Then, we use the Gaussian interaction profile kernel similarity, cosine similarity, and symptom similarity of diseases to compose the disease multi-similarity matrix. Finally, we integrate these two similarity matrices and the microbe–disease association matrix into our model to predict potential associations. Our method achieves reliable AUCs of 0.9186 in leave-one-out cross validation (LOOCV) and 0.9043 ± 0.0048 in fivefold cross validation. Moreover, case studies showed that 10, 10, and 8 of the top 10 predicted microbes for asthma, inflammatory bowel disease, and type 2 diabetes mellitus, respectively, were confirmed by experiments or published literature. Therefore, our model performs well in predicting potential microbe–disease associations.
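The two association-derived similarities named above can be sketched in plain Python. This is an illustrative sketch, not the MSBMFHMDA code: it assumes the common GIP bandwidth normalization (gamma = gamma_prime divided by the mean squared norm of the interaction profiles) and assumes that the multiple similarities are fused by a simple element-wise mean, which the abstract does not specify. The helper names (`gip_similarity`, `cosine_similarity`, `fuse`) are hypothetical. Disease symptom similarity needs external symptom data, so it is not computed here; it would simply be one more matrix passed to `fuse`.

```python
import math


def gip_similarity(A, gamma_prime=1.0):
    """Gaussian interaction profile (GIP) kernel similarity between the rows of A."""
    n = len(A)
    # Bandwidth normalized by the mean squared norm of the interaction profiles.
    mean_sq_norm = sum(v * v for row in A for v in row) / n
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm else 0.0
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(A[i], A[j]))) for j in range(n)]
        for i in range(n)
    ]


def cosine_similarity(A):
    """Cosine similarity between the rows of A (0.0 where a row has no associations)."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in A]
    n = len(A)
    return [
        [
            sum(a * b for a, b in zip(A[i], A[j])) / (norms[i] * norms[j])
            if norms[i] and norms[j] else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def fuse(*mats):
    """Element-wise mean of several similarity matrices (an assumed fusion rule)."""
    n = len(mats[0])
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(n)] for i in range(n)]


def transpose(A):
    return [list(col) for col in zip(*A)]


# Hypothetical microbe x disease association matrix (rows: microbes, columns: diseases).
A = [[1, 0, 1],
     [1, 0, 0],
     [0, 1, 0]]
microbe_sim = fuse(gip_similarity(A), cosine_similarity(A))
disease_sim = fuse(gip_similarity(transpose(A)), cosine_similarity(transpose(A)))
```

Microbes 0 and 1 share a disease, so they end up more similar than microbes 0 and 2, which share none. The fused matrices would then serve as side information for the matrix factorization step.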


2021 ◽  
Vol 150 (4) ◽  
pp. A114-A114
Author(s):  
Grazina Korvel ◽  
Krzysztof Kakol ◽  
Bozena Kostek

2021 ◽  
pp. 2142002
Author(s):  
Giuseppe Agapito ◽  
Marianna Milano ◽  
Mario Cannataro

A new coronavirus causing severe acute respiratory syndrome (COVID-19) emerged in Wuhan, China, in December 2019. The epidemic rapidly spread across the world, becoming a pandemic that, as of today, has affected more than 70 million people and caused over 2 million deaths. To better understand how the COVID-19 pandemic spreads, we developed PANC (Parallel Network Analysis and Communities Detection), a new parallel preprocessing methodology for network-based analysis and community detection on Italian COVID-19 data. The goal of the methodology is to analyze a set of homogeneous datasets (i.e., COVID-19 data for several regions) using a statistical test to find similar or dissimilar behaviours, map this similarity information onto a graph, and then use a community detection algorithm to visualize and analyze the initial dataset. The methodology includes the following steps: (i) a parallel method for building similarity matrices that represent similar or dissimilar regions with respect to the data; (ii) an effective workload-balancing function to improve performance; (iii) the mapping of similarity matrices into networks where nodes represent Italian regions and edges represent similarity relationships; (iv) the discovery and visualization of communities of regions that show similar behaviour. The methodology is general and can be applied to worldwide COVID-19 data, as well as to any data set in tabular or matrix format. To estimate scalability with increasing workloads, we analyzed three synthetic COVID-19 datasets of 90.0 MB, 180.0 MB, and 360.0 MB. The experiments showed that the amount of data that can be analyzed in a given amount of time increases almost linearly with the number of available computing resources. To perform community detection, by contrast, we used the real data set.
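Steps (i) to (iv) can be sketched end to end in plain Python. The abstract does not name PANC's statistical test, workload-balancing function, or community detection algorithm, so this sketch makes three assumptions, all hypothetical: the two-sample Kolmogorov-Smirnov statistic D gives a similarity of 1 - D, region pairs are balanced across workers by a round-robin split, and connected components of the thresholded graph stand in for a real community detection algorithm such as Louvain. `ThreadPoolExecutor` only illustrates the partitioning; because of the GIL, true CPU parallelism would need a process pool.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare the two CDFs at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def balanced_chunks(items, n_workers):
    """Round-robin split so each worker receives almost the same number of pairs."""
    return [items[k::n_workers] for k in range(n_workers)]


def similarity_matrix(series, n_workers=4):
    """Pairwise similarity 1 - D between regional series, computed in parallel chunks."""
    names = list(series)
    pairs = list(combinations(range(len(names)), 2))
    sim = [[1.0] * len(names) for _ in names]

    def work(chunk):
        return [(i, j, 1.0 - ks_statistic(series[names[i]], series[names[j]])) for i, j in chunk]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for results in pool.map(work, balanced_chunks(pairs, n_workers)):
            for i, j, s in results:
                sim[i][j] = sim[j][i] = s
    return names, sim


def communities(names, sim, threshold):
    """Link regions with similarity >= threshold; report connected components as communities."""
    parent = list(range(len(names)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(names)), 2):
        if sim[i][j] >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for k, name in enumerate(names):
        groups.setdefault(find(k), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical daily case counts for three regions.
series = {"A": [1, 2, 3, 4], "B": [1, 2, 3, 5], "C": [10, 11, 12, 13]}
names, sim = similarity_matrix(series, n_workers=2)
print(communities(names, sim, threshold=0.5))  # [['A', 'B'], ['C']]
```

The round-robin split gives every worker nearly the same number of pairs. This matters because the number of pairs grows quadratically with the number of regions.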


2021 ◽  
Vol 52 (4) ◽  
pp. 859-867
Author(s):  
Hussein & Jubrael

In this study, the genetic relatedness of 12 fig cultivars from different populations in the Kurdistan Region of Iraq was analyzed by molecular DNA analysis using eleven AFLP primer-pair combinations. Genetic similarity matrices were produced from the AFLP data to calculate genetic distances among the cultivars. Genetic similarity coefficients ranged from 0.1261 to 0.3905. The lowest genetic similarity was observed between Kola and Gala Zard (0.1261), while Hejeera Rash and Shela were the most similar cultivars, with a coefficient of 0.3905. Clusters of the 12 fig cultivars based on the AFLP data were identified at the 0.32 similarity level. The resulting dendrogram contained two main groups: the first combined Ketek and Shela, while the second contained two subgroups, one joining Shingaly and Benatty and another in which three further sub-groups were identified. These results may help in formulating appropriate strategies for conservation and cultivar improvement in figs, a crop for which knowledge of genetic diversity is limited.
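A genetic similarity matrix from AFLP data is typically built from binary band scores, where 1 means a band is present and 0 means it is absent. The abstract does not say which coefficient was used. The sketch below assumes the Jaccard coefficient (Dice/Nei-Li is another common choice) and uses hypothetical cultivar labels and band scores. Genetic distance would then be 1 minus the similarity.

```python
from itertools import combinations


def jaccard(x, y):
    """Jaccard coefficient for two 0/1 band profiles: shared bands / bands present in either."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return both / either if either else 0.0


def similarity_table(profiles):
    """Jaccard similarity for every unordered pair of cultivars."""
    return {(p, q): jaccard(profiles[p], profiles[q]) for p, q in combinations(profiles, 2)}


def extremes(table):
    """Return the (least similar, most similar) cultivar pairs."""
    return min(table, key=table.get), max(table, key=table.get)


# Hypothetical presence (1) / absence (0) scores for five AFLP bands.
profiles = {"cv1": [1, 1, 0, 0, 1], "cv2": [1, 1, 0, 1, 1], "cv3": [0, 0, 1, 1, 0]}
table = similarity_table(profiles)
print(extremes(table))  # (('cv1', 'cv3'), ('cv1', 'cv2'))
```

A dendrogram like the one described would then be obtained by hierarchical clustering (for example UPGMA) on these pairwise values.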


2021 ◽  
Vol 16 ◽  
Author(s):  
Jiaxin Zhang ◽  
Quanmeng Sun ◽  
Cheng Liang

Background: Long non-coding RNAs (lncRNAs) are non-protein-coding transcripts more than 200 nucleotides in length. In recent years, studies have shown that lncRNAs play a vital role in various biological processes and in complex disease diagnosis, prognosis, and treatment. Objective: Analyzing known lncRNA-disease associations and predicting potential ones is necessary to provide the most probable candidates for subsequent experimental validation. Methods: In this paper, we present a novel robust computational framework for lncRNA-disease association prediction that combines the ℓ1-norm graph with multi-label learning. Specifically, we first construct a set of similarity matrices for lncRNAs and diseases from known associations. Then, both the lncRNA and disease similarity matrices are adaptively re-weighted via the ℓ1-norm graph to enhance robustness. Lastly, the association matrix is updated within a graph-based multi-label learning framework to uncover the underlying consistency between the lncRNA space and the disease space. Results: We compared the proposed method with four recent methods on five widely used data sets. The experimental results show that our method achieves comparable performance in both five-fold cross-validation and leave-one-disease-out cross-validation prediction tasks. A case study of prostate cancer further confirms the practicability of our approach in identifying lncRNAs as potential prognostic biomarkers. Conclusion: Our method can serve as a useful tool for predicting novel lncRNA-disease associations.
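To illustrate how scores can be propagated through both an lncRNA similarity graph and a disease similarity graph, here is a generic graph-based label propagation. It is not the ℓ1-norm graph method described above, and it omits the adaptive re-weighting. It iterates F ← αPF + (1 − α)Y, where P is the row-normalized similarity matrix and Y is the known association matrix, and then averages the row-side and column-side results. The function names, α = 0.5, and the toy matrices are all hypothetical.

```python
def matmul(X, Y):
    """Plain-Python matrix product X @ Y."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def row_normalize(S):
    """Scale each row of S to sum to 1 (rows summing to 0 are left unchanged)."""
    out = []
    for row in S:
        total = sum(row)
        out.append([v / total for v in row] if total else list(row))
    return out


def propagate(S, Y, alpha=0.5, iters=50):
    """Iterate F <- alpha * P F + (1 - alpha) * Y, where P is the row-normalized similarity."""
    P = row_normalize(S)
    F = [list(map(float, r)) for r in Y]
    for _ in range(iters):
        PF = matmul(P, F)
        F = [[alpha * pf + (1 - alpha) * y for pf, y in zip(pr, yr)] for pr, yr in zip(PF, Y)]
    return F


def predict(SL, SD, Y, alpha=0.5):
    """Average the scores propagated over lncRNA space (rows) and disease space (columns)."""
    FL = propagate(SL, Y, alpha)
    YT = [list(c) for c in zip(*Y)]
    FD = [list(c) for c in zip(*propagate(SD, YT, alpha))]
    return [[(a + b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(FL, FD)]


# Hypothetical data: 3 lncRNAs x 2 diseases; lncRNA 1 has no known association
# but is very similar to lncRNA 0, which is associated with disease 0.
SL = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]]
SD = [[1.0, 0.0], [0.0, 1.0]]
Y = [[1, 0], [0, 0], [0, 1]]
scores = predict(SL, SD, Y)
```

Because P is row-stochastic, αP is a contraction for 0 < α < 1, so the iteration converges. The unlabelled lncRNA 1 receives a positive score only for disease 0, which is the association inherited from its similar neighbour.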


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Matthew J. Leming ◽  
Simon Baron-Cohen ◽  
John Suckling

Background: Autism has previously been characterized by both structural and functional differences in brain connectivity. However, while the literature on single-subject derivations of functional connectivity is extensively developed, comparable methods for deriving structural connectivity or similarity from T1 MRI have received less study. Methods: We introduce a technique for deriving symmetric similarity matrices from regional histograms of grey matter volumes estimated from T1-weighted MRIs. We then validated the technique by inputting the similarity matrices into a convolutional neural network (CNN) to classify participants with autism versus age-, motion-, and intracranial-volume-matched controls from six different databases (29,288 total connectomes, mean age = 30.72, range 0.42–78.00, including 1555 subjects with autism). We compared this method with classifications of the same participants using fMRI connectivity matrices and univariate estimates of grey matter volumes. We further applied graph-theoretical metrics to the output class activation maps to identify the areas of the matrices that the CNN preferentially used to make the classification, focusing particularly on hubs. Limitations: While this study used a large sample, most of the data came from a young age group; furthermore, to make the machine learning study viable, we treated autism, a highly heterogeneous condition, as a binary label. Thus, these results are not necessarily generalizable to all subtypes and age groups in autism. Results: Our models gave AUROCs of 0.7298 (69.71% accuracy) when classifying by structural similarity alone, 0.6964 (67.72% accuracy) by functional connectivity alone, and 0.7037 (66.43% accuracy) by univariate grey matter volumes. Combining structural similarity and functional connectivity gave an AUROC of 0.7354 (69.40% accuracy). Analysis of classification performance across age revealed the greatest accuracy in adolescents, the age group with the most data. Graph analysis of class activation maps revealed no distinguishable network patterns for functional inputs, but did reveal localized differences between groups in bilateral Heschl’s gyrus and upper vermis for structural similarity. Conclusion: This study provides a simple means of feature extraction for inputting large numbers of structural MRIs into machine learning models. Our methods revealed a unique emphasis of the deep learning model on the structure of the bilateral Heschl’s gyrus when characterizing autism.
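The abstract does not give the formula used to turn regional grey-matter histograms into a symmetric similarity matrix. The sketch below assumes one simple, symmetric choice: histogram intersection of normalized per-region histograms. The bin count, value range, region names, and voxel values are all hypothetical. The result is a region × region matrix per subject, the kind of input the CNN described above consumes.

```python
def region_histogram(values, bins=10, lo=0.0, hi=1.0):
    """Normalized histogram of voxel grey-matter values inside one region."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v <= hi:
            counts[min(int((v - lo) / width), bins - 1)] += 1  # v == hi goes in the last bin
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


def histogram_intersection(p, q):
    """Symmetric overlap of two normalized histograms, 1.0 for identical shapes."""
    return sum(min(a, b) for a, b in zip(p, q))


def structural_similarity(regions, bins=10):
    """Region x region similarity matrix for one subject."""
    hists = [region_histogram(v, bins) for v in regions.values()]
    return [[histogram_intersection(p, q) for q in hists] for p in hists]


# Hypothetical grey-matter values for three regions of one scan.
regions = {"r1": [0.12, 0.14, 0.83], "r2": [0.11, 0.17, 0.86], "r3": [0.52, 0.55, 0.57]}
S = structural_similarity(regions)
```

Because min(a, b) = min(b, a), the matrix is symmetric by construction. Regions r1 and r2 have the same distribution shape and score about 1.0, while r3 does not overlap with either and scores 0.0.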


Author(s):  
Shijia Wang ◽  
Shufei Ge ◽  
Caroline Colijn ◽
Priscila Biller ◽  
Liangliang Wang ◽  
...  

PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e10906
Author(s):  
Long Tian ◽  
Reza Mazloom ◽  
Lenwood S. Heath ◽  
Boris A. Vinatzer

Background: Computing genomic similarity between strains is a prerequisite for genome-based prokaryotic classification and identification. Genomic similarity was first computed as Average Nucleotide Identity (ANI) values based on the alignment of genomic fragments. Because this is computationally expensive, faster and cheaper alignment-free methods have been developed to estimate ANI. However, these methods do not reach the accuracy of alignment-based methods. Methods: Here we introduce LINflow, a computational pipeline that infers pairwise genomic similarity within a set of genomes. LINflow combines the speed of the alignment-free sourmash tool, used to identify the genome in a dataset that is most similar to a query genome, with the precision of the alignment-based pyani software, used to compute ANI between the query genome and that most similar genome. This is repeated for each new genome added to a dataset. The sequentially computed ANI values are stored as Life Identification Numbers (LINs), which are then used to infer all other pairwise ANI values in the set. We tested LINflow on four sets comprising 484 genomes in total and compared its runtime and the resulting similarity matrices with those of other tools. Results: LINflow is up to 150 times faster than pyani, and the pairwise ANI values it generates are highly correlated with those computed by pyani. However, because LINflow infers most pairwise ANI values instead of computing them directly, its values occasionally depart from those computed by pyani. In conclusion, LINflow is a fast and memory-efficient pipeline for inferring similarity among a large set of prokaryotic genomes. Its ability to quickly add new genome sequences to an already computed similarity matrix makes it particularly useful for projects in which new genome sequences need to be added regularly to an existing dataset.
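The key idea behind storing ANI as LINs is that one precise comparison per new genome (against its nearest neighbour) is enough to place it, and other pairwise similarities are then inferred from shared LIN prefixes. The following simplified sketch assumes that each LIN position corresponds to an increasing ANI threshold. The threshold values, the helper names, and the ANI inputs are hypothetical: in LINflow the nearest genome would come from sourmash and the ANI value from pyani, whereas here both are simply passed in by the caller.

```python
# Hypothetical ANI thresholds (percent), one per LIN position, in increasing order.
THRESHOLDS = [70.0, 80.0, 90.0, 95.0, 99.0]


def shared_positions(ani):
    """How many leading LIN positions a pair with this ANI value has in common."""
    return sum(1 for t in THRESHOLDS if ani >= t)


def assign_lin(lins, nearest=None, ani=0.0):
    """LIN for a new genome, given its nearest genome and the precise ANI to that genome."""
    n = len(THRESHOLDS)
    if nearest is None:  # first genome in the dataset
        return (0,) * n
    k = shared_positions(ani)
    if k == n:  # above every threshold: reuse the neighbour's LIN
        return lins[nearest]
    prefix = lins[nearest][:k]
    # Next free value at position k among genomes that share this prefix.
    taken = [lin[k] for lin in lins.values() if lin[:k] == prefix]
    return prefix + (max(taken) + 1,) + (0,) * (n - k - 1)


def inferred_ani_floor(lin_a, lin_b):
    """ANI lower bound implied by the longest shared LIN prefix (None if none is shared)."""
    p = 0
    for a, b in zip(lin_a, lin_b):
        if a != b:
            break
        p += 1
    return THRESHOLDS[p - 1] if p else None


# Usage: ANI values to the nearest genome are made-up stand-ins for pyani output.
lins = {}
lins["g1"] = assign_lin(lins)
lins["g2"] = assign_lin(lins, nearest="g1", ani=92.0)
lins["g3"] = assign_lin(lins, nearest="g2", ani=85.0)
print(inferred_ani_floor(lins["g1"], lins["g3"]))  # 80.0
```

The g1/g3 pair is never compared directly; its similarity is inferred from the shared prefix. This illustrates why inferred values can occasionally depart from directly computed ANI.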

