Spread of SARS-CoV-2 Genomes on Genomic Index Maps of Hierarchy - Compared with B.1.1.7 Lineage on BLAST

2021 ◽  
Author(s):  
Jeffrey Zheng ◽  
Yang Zhou ◽  
Minghan Zhu ◽  
Mu Qiao ◽  
Zhigang Zhang

Abstract COVID-19 patients worldwide are conveniently described by the positions at which samples are collected, and modern GIS maps are useful for showing the flows and numbers of patients across the regions affected by a pandemic. From an analysis viewpoint, it is more interesting to organize genomic information into a phylogenetic tree with multiple branches and leaves. Clusters of genomes are organized as phylogenetic trees to represent intrinsic information of the genomes. However, there are structural difficulties in projecting phylogenetic information naturally onto 2D distributions such as GIS maps. Among advanced schemes for generating phylogenetic trees, information entropy provides favorable properties: minimal computational complexity, superior flexibility, better stability, improved reliability and higher quality in global constructions. In this paper, a novel projection is proposed that arranges SARS-CoV-2 genomes by genomic indexes to form a structural organization similar to 2D GIS maps. For any genome, there is a unique invariant under certain conditions that provides an absolute position in a specific region. In this hierarchical framework, a visual tool can represent any selected region in order to cluster genomes at finer resolution. When a diversity measure is applied to a given set of genomes, genomic index maps and phylogenetic trees provide equivalent clusters and complementary visual effects. Sample genomes of three new UK lineages are aligned by BLAST as a baseline, on both RNA-dependent RNA polymerase (RdRp) segments and whole genomes. Selected regions and various projections show the spread of five thousand SARS-CoV-2 genomes from 72 countries on both RdRp segments and whole genomes, and six special countries/regions are selected on genomic index maps. Based on genomic index maps, one SNV between two genomes of the B.1.1.7 lineage can be identified from a unit of 10^4 probability measure to a unit of 10^6 difference for genomic indexes on a special ‘G’ projection to extract the finest variation. Further exploration of optimal classification and phylogenetic analysis of SARS-CoV-2 genomes worldwide, using genomic index maps and phylogenetic trees, is discussed.
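
The abstract above does not define its genomic index, so the following is only a minimal sketch of the general idea it builds on: an entropy-style measure computed from base probabilities, and the size of the shift that a single SNV causes in one base's probability (here 'G'). The function names and the toy sequences are assumptions made for illustration and are not the authors' method.

```python
import math
from collections import Counter


def base_probabilities(seq: str) -> dict:
    """Relative frequency of A, C, G and T in seq (other symbols are ignored)."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {b: counts.get(b, 0) / total for b in "ACGT"}


def shannon_entropy(seq: str) -> float:
    """Shannon entropy, in bits, of the base composition of seq."""
    probs = base_probabilities(seq)
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


# Hypothetical toy sequences that differ by one substitution (A -> G).
ref = "ACGTACGTACGTACGT"
var = "ACGTACGTGCGTACGT"

p_ref = base_probabilities(ref)["G"]
p_var = base_probabilities(var)["G"]

print("P(G) reference:", p_ref)
print("P(G) variant:  ", p_var)
print("shift in P(G): ", p_var - p_ref)  # equals 1 / len(ref)
print("entropy reference:", shannon_entropy(ref))
print("entropy variant:  ", shannon_entropy(var))
```

A single substitution changes the affected base probability by 1/N for a sequence of N counted bases, so for a SARS-CoV-2 genome of roughly 30,000 nucleotides the shift is small, and any index meant to separate such genomes has to resolve differences at that scale.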

2020 ◽  
Author(s):  
Jeffrey Zheng ◽  
Yang Zhou ◽  
Minghan Zhu ◽  
Mu Qiao ◽  
Zhigang Zhang

Abstract Using visual technologies, COVID-19 patients worldwide are conveniently described by the positions at which samples are collected, and modern GIS maps are useful for showing the flows and numbers of patients across the regions affected by a pandemic. From an analysis viewpoint, it is more interesting to organize genomic information into a phylogenetic tree with multiple branches and leaves. Clusters of genomes are organized as phylogenetic trees to represent intrinsic information of the genomes. However, there are structural difficulties in projecting phylogenetic information naturally onto 2D distributions such as GIS maps. In this paper, a novel projection is proposed that arranges SARS-CoV-2 genomes by genomic indexes to form a structural organization similar to 2D GIS maps. For any genome, there is a unique invariant under certain conditions that provides an absolute position in a specific region. In this hierarchical framework, a visual tool can represent any selected region in order to cluster genomes at finer resolution. The resulting visual effects complement phylogenetic tree technology. Sample regions and various projections show the spread of five thousand SARS-CoV-2 genomes from 72 countries, and four special countries are selected on genomic index maps.


2020 ◽  
Author(s):  
Jeffrey Zheng ◽  
Yang Zhou ◽  
Minghan Zhu ◽  
Mu Qiao ◽  
Zhigang Zhang

Abstract Using visual technologies, COVID-19 patients worldwide are conveniently described by the positions at which samples are collected, and modern GIS maps are useful for showing the flows and numbers of patients across the regions affected by a pandemic. From an analysis viewpoint, it is more interesting to organize genomic information into a phylogenetic tree with multiple branches and leaves. Clusters of genomes are organized as phylogenetic trees to represent intrinsic information of the genomes. However, there are structural difficulties in projecting phylogenetic information naturally onto 2D distributions such as GIS maps. In this paper, a novel projection is proposed that arranges SARS-CoV-2 genomes by genomic indexes to form a structural organization similar to 2D GIS maps. For any genome, there is a unique invariant under certain conditions that provides an absolute position in a specific region. In this hierarchical framework, a visual tool can represent any selected region in order to cluster genomes at finer resolution. The visual effects are provided as a complement to phylogenetic tree technology. Sample regions and various projections show the spread of five thousand SARS-CoV-2 genomes from 72 countries, and four special countries are selected on genomic index maps.


2017 ◽  
Author(s):  
Konstantin Gunbin ◽  
Konstantin Popadin ◽  
Leonid Peshkin ◽  
Sofia Annis ◽  
Rebecca Ackermann ◽  
...  

The question: was human evolution a gradual process or a rapid discontinuous change? Whether human origin was a gradual process or the result of rapid change has been a focus of intense debate. Of particular interest is the climate change ~2.9-2.5 Ma, thought to have precipitated the separation of the genus Homo (~2.8 Ma). The debate has mostly concerned the continuity/discontinuity of the fossil record, but the rate of the underlying genetic change is of ultimate interest and importance. Did the hominid lineage experience an increased mutation rate when a large number of hominins emerged and eventually gave rise to the split between Australopithecus/Paranthropus and Homo? The obstacle: vague timing of conventional mutations. The difficulty in answering the above question lies in the way past mutations are timed. Conventional point mutations are assigned to specific branches of DNA-derived phylogenetic trees. The essence of the problem is that a mutation can be located within a branch segment, from branching point to branching point, but its exact position within the segment is unknown in principle. Because the hominid DNA-derived phylogenetic tree is rather sparsely populated with branches, the precision of mutation timing is low; e.g., human-specific mutations can only be positioned within the ~6 My since the separation from chimpanzee. The solution: NUMTs, mutations with an internal clock. NUMTs are insertions of mtDNA sequences into the nuclear genome. Unlike a point mutation, each NUMT actually represents a branch on the mtDNA phylogenetic tree, and thus its time of insertion can be determined as precisely as its branching point can be positioned on the tree. In a sense, NUMTs are “mutations with an internal clock”, which is synchronized with the well-established mtDNA mutation clock. By determining the NUMTs’ insertion time points, one can ask whether NUMTs were inserted at a constant rate over time or at an increased rate during critical periods of evolution, in accordance with the “punctuated evolution” model. Results: Hundreds of pseudogenes have been inserted into the human genome over the last ~60 My, of which we considered the last 6 My. Various quality filters resulted in the selection of the 18 NUMTs most suitable for phylogenetic analysis. Insertion times of these 18 NUMTs cluster around 2.8 Ma. While the timing of NUMT insertion is imprecise, the observation of such a cluster is highly statistically significant. Discussion: It is tempting to hypothesize that accelerated insertion of NUMTs is somehow linked to the speciation process. NUMTs could be either "riders", i.e., their rate of insertion could be increased by the overall higher genome flexibility during the speciation period, or "drivers", i.e., they are fixed in the population at an increased rate during speciation due to increased selective pressures. If correct, the hypothesis of accelerated pseudogenization would support the idea that the evolution of our genus might have been discontinuous.


2018 ◽  
Author(s):  
N. Moshiri

Abstract Phylogenetic trees are essential to evolutionary biology, and numerous methods exist that attempt to extract phylogenetic information applicable to a wide range of disciplines, such as epidemiology and metagenomics. Currently, the three main Python packages for trees are Bio.Phylo, DendroPy, and the ETE Toolkit, but as dataset sizes grow, parsing and manipulating ultra-large trees becomes impractical for these tools. To address this issue, we present TreeSwift, a user-friendly and massively scalable Python package for traversing and manipulating trees that is ideal for algorithms performed on ultra-large trees.


2017 ◽  
Author(s):  
Diogo Provete

Phylogenetic information has increasingly been included in studies of local communities and also at broad spatial scales. Despite criticisms raised in the last four years, phylogenetic relationships may still provide insights into the organization and assembly of ecological communities. The objectives of this study were 1) to review the history of the use of phylogenetic information, as well as criticisms and perspectives of its use in community ecology; 2) to understand how the size and shape of phylogenetic trees and the phylogenetic structure of metacommunities affect the amount of variation accounted for by an eigenvector-based method used to describe the phylogenetic composition of metacommunities (PCPS); 3) to test the effect of diversity of evolutionary history (MNTD and MPD) and species richness as predictors of three variables of freshwater ecosystem functioning (productivity, respiration, and decomposition); and finally 4) to test how environmental gradients, especially pond canopy cover, influence the phylogenetic structure of an anuran metacommunity from southeastern Brazil. I found that the structure of metacommunities had a greater impact on the eigenvalues of PCPS than tree shape metrics such as symmetry and stemminess. In addition, decomposition and respiration were best predicted by MNTD as a linear function, while productivity was affected by the quadratic term of MNTD. Finally, pond canopy cover and floating vegetation strongly affected the phylogenetic structure of the anuran metacommunity, influencing lineage sorting. These findings 1) can help users interpret the results of PCPS; 2) provide a better understanding of the effect of species loss in multitrophic, freshwater ecosystems; and 3) improve our knowledge about the effect of canopy cover on the lineage composition of anuran metacommunities.
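
The two diversity metrics named above have simple standard definitions: MPD is the mean phylogenetic distance over all pairs of species in a community, and MNTD is the mean, over species, of the distance to the nearest other species in that community. The sketch below computes the unweighted (presence/absence) forms from a pairwise distance table. The species labels and distances are hypothetical toy values, not data from the study, and abundance-weighted variants are not covered.

```python
from itertools import combinations


def mpd(dist, community):
    """Mean pairwise distance over all distinct species pairs in community."""
    pairs = list(combinations(community, 2))
    if not pairs:
        raise ValueError("need at least two species")
    return sum(dist[a][b] for a, b in pairs) / len(pairs)


def mntd(dist, community):
    """Mean distance from each species to its nearest neighbour in community."""
    if len(community) < 2:
        raise ValueError("need at least two species")
    nearest = [min(dist[a][b] for b in community if b != a) for a in community]
    return sum(nearest) / len(nearest)


# Hypothetical patristic distances for the toy tree (((A,B),C),D).
pair_dist = {
    ("A", "B"): 2.0, ("A", "C"): 6.0, ("A", "D"): 10.0,
    ("B", "C"): 6.0, ("B", "D"): 10.0, ("C", "D"): 10.0,
}

# Expand the pair table into a symmetric nested lookup dist[a][b].
dist = {}
for (a, b), d in pair_dist.items():
    dist.setdefault(a, {})[b] = d
    dist.setdefault(b, {})[a] = d

community = ["A", "B", "C"]
print("MPD: ", mpd(dist, community))   # (2 + 6 + 6) / 3
print("MNTD:", mntd(dist, community))  # (2 + 2 + 6) / 3
```

MNTD responds mainly to relationships near the tips of the tree, whereas MPD also reflects deeper branching, which is one reason the two can behave differently as predictors.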


2021 ◽  
Author(s):  
Caizhi Huang ◽  
Benjamin John Callahan ◽  
Michael C Wu ◽  
Shannon T. Holloway ◽  
Hayden Brochu ◽  
...  

Abstract Background: The relationship between host conditions and microbiome profiles, typically characterized by operational taxonomic units (OTUs), contains important information about the microbial role in human health. Traditional association testing frameworks are challenged by the high dimensionality and sparsity of typical microbiome profiles. Phylogenetic information is often incorporated to address these challenges, under the assumption that evolutionarily similar taxa tend to behave similarly. However, this assumption may not always be valid because of the complex effects of microbes, so phylogenetic information should be incorporated in a data-supervised fashion. Results: In this work, we propose a local collapsing test called the Phylogeny-guided microbiome OTU-Specific association Test (POST). In POST, whether to borrow information from the neighboring OTUs in the phylogenetic tree, and how much to borrow, is supervised by phylogenetic distance and the outcome-OTU association. POST is constructed under the kernel machine framework to accommodate complex OTU effects, and it extends kernel machine microbiome tests from the community level to the OTU level. Using simulation studies, we showed that when the phylogenetic tree is informative, POST performs better than existing OTU-level association tests. When the phylogenetic tree is not informative, POST achieves performance similar to that of existing methods. Finally, we show that POST can identify more outcome-associated OTUs of biological relevance in real data applications on bacterial vaginosis and on preterm birth. Conclusions: Using POST, we show that the power to detect associated microbiome features can be enhanced by adaptively leveraging phylogenetic information when testing a target OTU. We developed a user-friendly R package, POSTm, which is now available on CRAN (https://CRAN.R-project.org/package=POSTm) for public access.


2020 ◽  
Vol 8 (2) ◽  
pp. 312
Author(s):  
Ehdieh Khaledian ◽  
Kelly A. Brayton ◽  
Shira L. Broschat

Reconstructing and visualizing phylogenetic relationships among living organisms is a fundamental challenge because not all organisms share the same genes. As a result, the first phylogenetic visualizations employed a single gene, e.g., rRNA genes, sufficiently conserved to be present in all organisms but divergent enough to provide discrimination between groups. As more genome data became available, researchers began concatenating different combinations of genes or proteins to construct phylogenetic trees believed to be more robust because they incorporated more information. However, the genes or proteins chosen were based on ad hoc approaches. The large number of complete genome sequences available today allows the use of whole genomes to analyze relationships among organisms rather than using an ad hoc set of genes. We present a systematic approach for constructing a phylogenetic tree based on simultaneously clustering the complete proteomes of 360 bacterial species. From the homologous clusters, we identify 49 protein sequences shared by 99% of the organisms to build a tree. Of the 49 sequences, 47 have homologous sequences in both archaea and eukarya. The clusters are also used to create a network from which bacterial species with horizontally-transferred genes from other phyla are identified.
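
The selection step described above, keeping only protein clusters present in nearly all organisms, can be sketched as a presence-fraction filter. This is an illustration under stated assumptions rather than the authors' pipeline: the function name `core_clusters`, the cluster identifiers, and the organism labels are hypothetical, and the clustering itself is taken as given.

```python
def core_clusters(cluster_members, n_organisms, min_fraction=0.99):
    """Return ids of clusters found in at least min_fraction of the organisms.

    cluster_members maps a cluster id to the set of organisms that have
    at least one protein in that cluster.
    """
    if n_organisms <= 0:
        raise ValueError("n_organisms must be positive")
    return sorted(
        cid
        for cid, organisms in cluster_members.items()
        if len(organisms) / n_organisms >= min_fraction
    )


# Hypothetical toy input: five organisms and three homologous clusters.
clusters = {
    "c1": {"org1", "org2", "org3", "org4", "org5"},  # present in 5 of 5
    "c2": {"org1", "org2", "org3", "org4"},          # present in 4 of 5
    "c3": {"org1", "org5"},                          # present in 2 of 5
}

print(core_clusters(clusters, n_organisms=5))                     # default 99% threshold
print(core_clusters(clusters, n_organisms=5, min_fraction=0.75))  # looser threshold
```

With the 360 species mentioned in the abstract, a 99% threshold corresponds to presence in at least 357 organisms, since 0.99 × 360 = 356.4.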


2020 ◽  
Vol 13 ◽  
pp. 117863372093071
Author(s):  
Mohamed M Hassan ◽  
Mohamed A Hussain ◽  
Sumaya Kambal ◽  
Ahmed A Elshikh ◽  
Osama R Gendeel ◽  
...  

Recently, coronaviruses have received considerable attention from the biomedical community following the emergence and isolation of a deadly coronavirus infecting humans. Understanding the behavior of the newly emerging MERS-CoV requires knowledge at different levels (epidemiologic, antigenic, and pathogenic), and this knowledge can be generated from the most closely related viruses. In this study, we aimed to compare 3 coronaviruses, namely Middle East Respiratory Syndrome coronavirus (MERS-CoV), Severe Acute Respiratory Syndrome coronavirus (SARS-CoV), and NeoCoV, with respect to whole genomes and 6 similar proteins (E, M, N, S, ORF1a, and ORF1ab), using different bioinformatics tools to provide a better understanding of the relationship between the 3 viruses at the nucleotide and amino acid levels. All sequences were retrieved from the National Center for Biotechnology Information (NCBI). Phylogenetic analysis of the target genomes showed that MERS-CoV and SARS-CoV were closer to each other than to NeoCoV, and that the latter has the longest relative time. We found that all phylogenetic methods, in addition to all parameters (physical and chemical properties of amino acids such as the number of amino acids, molecular weight, atomic composition, theoretical pI, and structural formula), indicated that NeoCoV proteins were the most closely related to those of MERS-CoV. All phylogenetic trees (by both maximum-likelihood and neighbor-joining methods) indicated that NeoCoV proteins have fewer evolutionary changes, except for ORF1a by the maximum-likelihood method only. Our results indicated high similarity between the viral structural proteins that are responsible for viral infectivity; therefore, we expect that NeoCoV may soon appear in human infections.

