markov clustering
Recently Published Documents


TOTAL DOCUMENTS: 96 (FIVE YEARS 31)

H-INDEX: 12 (FIVE YEARS 1)

2021 ◽  
Author(s):  
John Lagergren ◽  
Mikaela Cashman ◽  
Verónica G. Melesse Vergara ◽  
Paul R. Eller ◽  
Joao Gabriel Felipe Machado Gazolla ◽  
...  

Abstract: Predicted growth in world population will put unparalleled stress on the need for sustainable energy and global food production, as well as increase the likelihood of future pandemics. In this work, we identify high-resolution environmental zones in the context of a changing climate and predict longitudinal processes relevant to these challenges. We do this using exhaustive vector comparison methods that measure the climatic similarity between all locations on earth at high geospatial resolution. The results are captured as networks, in which edges between geolocations are defined if their historical climates exceed a similarity threshold. We then apply Markov clustering and our novel Correlation of Correlations method to the resulting climatic networks, which provides unprecedented agglomerative and longitudinal views of climatic relationships across the globe. The methods performed here resulted in the fastest (9.37 × 10^18 operations/sec) and one of the largest (168.7 × 10^21 operations) scientific computations ever performed, with more than 100 quadrillion edges considered for a single climatic network. Correlation and network analysis methods of this kind are widely applicable across computational and predictive biology domains, including systems biology, ecology, carbon cycles, biogeochemistry, and zoonosis research.
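The network construction described in this abstract (an edge between two locations when their climate similarity exceeds a threshold) can be illustrated with a toy sketch. This is not the authors' method or code: the abstract does not name the similarity measure, so Pearson correlation is assumed here, and the function names `pearson` and `similarity_edges` and the three sample vectors are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors (assumed measure)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector has no defined correlation; treat as dissimilar
    return cov / sqrt(var_x * var_y)


def similarity_edges(vectors, threshold):
    """Compare every pair of vectors and keep pairs whose similarity exceeds the threshold."""
    edges = []
    for (i, u), (j, v) in combinations(enumerate(vectors), 2):
        if pearson(u, v) > threshold:
            edges.append((i, j))
    return edges


# Hypothetical per-location climate vectors (e.g. four seasonal readings each).
climate = [
    [1.0, 2.0, 3.0, 4.0],   # location 0
    [2.0, 4.1, 6.0, 8.2],   # location 1: rises like location 0
    [4.0, 3.0, 2.0, 1.0],   # location 2: falls, so it is anti-correlated with 0 and 1
]
edges = similarity_edges(climate, threshold=0.9)
print(edges)
```

The all-pairs loop is quadratic in the number of locations, which hints at why the computation reported in the abstract is so large at global, high-resolution scale.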


2021 ◽  
Author(s):  
Ruhollah Shemirani ◽  
Gillian M Belbin ◽  
Keith Burghardt ◽  
Kristina Lerman ◽  
Christy L Avery ◽  
...  

Background: Groups of distantly related individuals who share a short segment of their genome identical-by-descent (IBD) can provide insights about rare traits and diseases in massive biobanks via a process called IBD mapping. Clustering algorithms play an important role in finding these groups. We set out to analyze the fitness of commonly used, fast and scalable clustering algorithms for IBD mapping applications. We designed a realistic benchmark for local IBD graphs and utilized it to compare clustering algorithms in terms of statistical power. We also investigated the effectiveness of common clustering metrics as replacements for statistical power. Results: We simulated 3.4 million clusters across 850 experiments with varying cluster counts, false-positive, and false-negative rates. Infomap and Markov Clustering (MCL) community detection methods have high statistical power in most of the graphs, compared to greedy methods such as Louvain and Leiden. We demonstrate that standard clustering metrics, such as modularity, cannot predict statistical power of algorithms in IBD mapping applications, though they can help with simulating realistic benchmarks. We extend our findings to real datasets by analyzing 3 populations in the Population Architecture using Genomics and Epidemiology (PAGE) Study with ~51,000 members and 2 million shared segments on Chromosome 1, resulting in the extraction of ~39 million local IBD clusters across three different populations in PAGE. We used cluster properties derived in PAGE to increase the accuracy of our simulations and comparison. Conclusions: Markov Clustering produces a 30% increase in statistical power compared to the current state-of-the-art approach, while reducing runtime by 3 orders of magnitude, making it computationally tractable in modern large-scale genetic datasets. We provide an efficient implementation to enable clustering at scale for IBD mapping and population-based linkage for various populations and scenarios.


2021 ◽  
Author(s):  
Xavier Grau-Bové ◽  
Arnau Sebé-Pedrós

Possvm (Phylogenetic Ortholog Sorting with Species oVerlap and MCL) is a tool that automates the process of classifying clusters of orthologous genes from precomputed phylogenetic trees. It identifies orthology relationships between genes using the species overlap algorithm to infer taxonomic information from the gene tree topology, and then uses the Markov Clustering Algorithm (MCL) to identify orthology clusters and provide annotated gene family classifications. Our benchmarking shows that this approach, when provided with accurate phylogenies, is able to identify manually curated orthogroups with high precision and recall. Overall, Possvm automates the routine process of gene tree inspection and annotation in a highly interpretable manner, and provides reusable outputs that can be used to obtain phylogeny-informed gene annotations and inform comparative genomics and gene family evolution analyses.


Author(s):  
Xiaonan Jing ◽  
Qingyuan Hu ◽  
Yi Zhang ◽  
Julia Taylor Rayz

Twitter serves as a data source for many Natural Language Processing (NLP) tasks. Identifying topics on Twitter can be challenging because the data stream is continuously updated. In this paper, we present an unsupervised graph-based framework to identify the evolution of sub-topics within two weeks of real-world Twitter data. We first employ a Markov Clustering Algorithm (MCL) with a node removal method to identify optimal graph clusters from temporal Graph-of-Words (GoW). Subsequently, we model the clustering transitions between the temporal graphs to identify the topic evolution. Finally, the transition flows generated from both the computational approach and human annotations are compared to ensure the validity of our framework.


2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Claudio Durán ◽  
Alessandro Muscoloni ◽  
Carlo Vittorio Cannistraci

Abstract: Markov clustering is an effective unsupervised pattern recognition algorithm for data clustering in high-dimensional feature space. However, its community detection performance in complex networks has fallen well short of state-of-the-art methods such as Infomap and Louvain. The crucial issue is to convert the unweighted network topology into a ‘smart-enough’ pre-weighted connectivity that adequately steers the stochastic flow procedure behind Markov clustering. Here we introduce a conceptual innovation and discuss how to leverage network latent geometry notions in order to design similarity measures for pre-weighting the adjacency matrix used in Markov clustering community detection. Our results demonstrate that the proposed strategy improves Markov clustering significantly, to the extent that it is often close to the performance of current state-of-the-art methods for community detection. These findings emerge considering both synthetic ‘realistic’ networks (with known ground-truth communities) and real networks (with community metadata), and even when the real network connectivity is corrupted by noise artificially induced by missing or spurious links. Our study enhances the generalized understanding of how network geometry plays a fundamental role in the design of algorithms based on network navigability.
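To make the idea of pre-weighting an adjacency matrix before Markov clustering concrete, here is a minimal pure-Python sketch of the standard expansion and inflation loop. It is an illustration under stated assumptions, not the authors' implementation: the `preweight` function uses a simple common-neighbour count as a stand-in for the latent-geometry similarity measures the abstract refers to, and all function names, the inflation value of 2.0, and the toy graph are hypothetical.

```python
def normalize_columns(m):
    """Rescale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def preweight(adjacency):
    """Hypothetical pre-weighting: edge weight = 1 + number of common neighbours.
    A stand-in only; the paper's similarity measures are not reproduced here."""
    n = len(adjacency)
    weighted = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and adjacency[i][j]:
                common = sum(1 for k in range(n) if adjacency[i][k] and adjacency[j][k])
                weighted[i][j] = 1.0 + common
    return weighted


def mcl(weights, inflation=2.0, iterations=100, tol=1e-9):
    """Markov clustering sketch: alternate expansion (matrix squaring) and
    inflation (element-wise power, then column normalization) until stable."""
    n = len(weights)
    # Self-loops are a common convention that helps the flow process settle.
    m = [[weights[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        expanded = matmul(m, m)
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row with non-negligible mass lists the nodes attracted to that row's node.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)


def adjacency_from_edges(n, edges):
    """Build a symmetric 0/1 adjacency matrix from an undirected edge list."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1
    return a


# Toy graph: two triangles joined by a single bridge edge (2, 3).
graph = adjacency_from_edges(6, [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)])
clusters = mcl(preweight(graph))
print(clusters)
```

In this toy graph the bridge edge has no common neighbours and so receives a lower weight than the edges inside each triangle, which is the kind of steering of the stochastic flow that the abstract describes. Dense list-of-lists matrices keep the sketch dependency-free but would not scale to real networks, where sparse matrices and pruning are typically used.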

