Computational determination of gene age and characterization of evolutionary dynamics in human

2018 ◽  
Vol 20 (6) ◽  
pp. 2141-2149 ◽  
Author(s):  
Hongyan Yin ◽  
Mengwei Li ◽  
Lin Xia ◽  
Chaozu He ◽  
Zhang Zhang

Abstract Genes originate at different evolutionary time scales and possess different ages, accordingly presenting diverse functional characteristics and reflecting distinct adaptive evolutionary innovations. In the past decades, progress has been made in gene age identification by a variety of methods that are principally based on comparative genomics. Here we summarize methods for computational determination of gene age and evaluate the effectiveness of different computational methods for age identification. Our results show that improved age determination can be achieved by combining homolog clustering with phylogeny inference, which enables more accurate age identification in human genes. Accordingly, we characterize evolutionary dynamics of human genes based on an extremely long evolutionary time scale spanning ~4,000 million years from archaea/bacteria to human, revealing that young genes are clustered on certain chromosomes and that Mendelian disease genes (including monogenic disease and polygenic disease genes) and cancer genes exhibit divergent evolutionary origins. Taken together, deciphering genes’ ages as well as their evolutionary dynamics is of fundamental significance in unveiling the underlying mechanisms during evolution and better understanding how young or new genes become indispensable integrants coupled with novel phenotypes and biological diversity.

2020 ◽  
Author(s):  
Yi-Bo Tong ◽  
Meng-Wei Shi ◽  
Sheng Hu Qian ◽  
Yu-Jie Chen ◽  
Zhi-Hui Luo ◽  
...  

ABSTRACT The origination of new genes contributes to the biological diversity of life. New genes may quickly build their own network in the genomes, exert important functions, and generate novel phenotypes. Dating gene age and inferring the origination mechanisms of new genes, such as primate-specific genes, is the basis for the functional study of these genes. However, no comprehensive resource of gene age estimates across species is available. Here, we systematically dated the age of 9,102,113 protein-coding genes from 565 species in the Ensembl and Ensembl Genomes databases, including 82 bacteria, 57 protists, 134 fungi, 58 plants, 56 metazoa, and 178 vertebrates, using a protein-family-based pipeline with the Wagner parsimony algorithm. We also collected gene age estimates from other studies and standardized them to time ranges in millions of years for comparison across studies. All the data were cataloged into GenOrigin (http://genorigin.chenzxlab.cn/), a user-friendly new database of gene age estimates, where users can browse gene age estimates by species, age and gene ontology. GenOrigin provides information such as gene age estimates, annotation, gene ontology, orthologs and paralogs, as well as detailed gene presence/absence views for gene age inference based on the species tree with an evolutionary timescale, so that researchers can explore gene functions.
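
The idea of dating a gene family from its presence/absence pattern on a species tree can be illustrated with a minimal sketch. This is not the GenOrigin pipeline and not Wagner parsimony: it is a simpler rule that assigns a family to the youngest clade on the focal species' lineage containing every species in which the family is present. The function name `date_family`, the toy `LINEAGE`, and the species labels are all hypothetical and chosen only for illustration.

```python
# Toy lineage for a focal species (human), ordered from youngest to oldest clade.
# Each entry pairs a clade name with the set of sampled species it contains.
# The species sampling here is an illustrative assumption, not real pipeline input.
LINEAGE = [
    ("Homo sapiens", {"human"}),
    ("Primates", {"human", "macaque"}),
    ("Mammalia", {"human", "macaque", "mouse"}),
    ("Vertebrata", {"human", "macaque", "mouse", "zebrafish"}),
]


def date_family(present_in, lineage):
    """Return the youngest clade that contains every species with the family.

    present_in: set of species names in which the gene family is present.
    lineage: list of (clade_name, member_species) ordered youngest to oldest.
    """
    for clade, members in lineage:
        # The family is at least as old as the first clade covering all presences.
        if present_in <= members:
            return clade
    raise ValueError("family is present in a species outside the sampled lineage")


if __name__ == "__main__":
    print(date_family({"human", "mouse"}, LINEAGE))
    print(date_family({"human"}, LINEAGE))
```

Under this rule a single distant hit pushes the estimated age back to the clade that contains it, which is one reason parsimony-based methods that weigh gains against losses are used in practice.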


2020 ◽  
Author(s):  
Boyoung Yoo ◽  
Johannes Birgmeier ◽  
Jon A Bernstein ◽  
Gill Bejerano

Close to 70% of patients suspected to have a Mendelian disease remain undiagnosed after genome sequencing, partly because our current knowledge about disease-causing genes is incomplete. Although hundreds of new disease-causing genes are discovered every year, the discovery rate has been constant for over a decade. Generating an attractive novel disease gene hypothesis from patient data can be time-consuming as each patient's genome can contain dozens to hundreds of rare, possibly pathogenic variants. To generate the most plausible hypothesis, many sources of indirect evidence about each candidate variant may be considered. We introduce InpherNet, a network-based machine learning approach to accelerate this process. InpherNet ranks candidate genes based on gene neighbors from 4 graphs: orthologs, paralogs, functional pathway members, and co-localized interaction partners. As such, InpherNet can be used both to prioritize potentially novel disease genes and to help reveal known disease genes whose direct annotation is missing or partial. InpherNet is applied to over 100 patient cases for whom the causative gene is incorrectly given low priority by two clinical gene ranking methods that rely exclusively on human patient-derived evidence. It correctly ranks the causative gene among its top 5 candidates in 68% of the cases, compared to 9-44% using comparable tools including Phevor, Phive and hiPhive.


2021 ◽  
Author(s):  
William J Young ◽  
Najim Lahrouchi ◽  
Aaron Isaacs ◽  
ThuyVy Duong ◽  
Luisa Foco ◽  
...  

The QT interval is an electrocardiographic measure representing the sum of ventricular depolarization (QRS duration) and repolarization (JT interval). Abnormalities of the QT interval are associated with potentially fatal ventricular arrhythmia. We conducted genome-wide multi-ancestry analyses in >250,000 individuals and identified 177, 156 and 121 independent loci for QT, JT and QRS, respectively, including a male-specific X-chromosome locus. Using gene-based rare-variant methods, we identified associations with Mendelian disease genes. Enrichments were observed in established pathways for QT and JT, with new genes indicated in insulin-receptor signalling and cardiac energy metabolism. In contrast, connective tissue components and processes for cell growth and extracellular matrix interactions were significantly enriched for QRS. We demonstrate polygenic risk score associations with atrial fibrillation, conduction disease and sudden cardiac death. Prioritization of druggable genes highlighted potential therapeutic targets for arrhythmia. Together, these results substantially advance our understanding of the genetic architecture of ventricular depolarization and repolarization.


2013 ◽  
Vol 9 (5) ◽  
pp. e1003073 ◽  
Author(s):  
Wei-Hua Chen ◽  
Xing-Ming Zhao ◽  
Vera van Noort ◽  
Peer Bork

2003 ◽  
Vol 15 (3) ◽  
pp. 223-227 ◽  
Author(s):  
S. Bortoluzzi ◽  
C. Romualdi ◽  
A. Bisognin ◽  
G. A. Danieli

Using a computational approach, we reconstructed genomic transcriptional profiles of 19 different adult human tissues, based on information on the activity of 27,924 genes obtained from unbiased UniGene cDNA libraries. In each tissue considered, a small number of genes were found to be highly expressed or “tissue specific.” The distribution of gene expression levels in a tissue appears to follow a power law, thus suggesting a correspondence between the transcriptional profile and the “scale-free” topology of protein networks. The expression of 737 genes involved in Mendelian diseases was analyzed and compared with a large reference set of known human genes. Disease genes were significantly more highly expressed than expected. A possible correspondence of their products to important nodes of the intracellular protein network is suggested. Auto-organization of the protein network, its stability over time in the differentiated state, and its relationship with the degree of genetic variability at the genome level are discussed.
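
A distribution that follows a power law, count ∝ level^(−α), appears as a straight line with slope −α on log-log axes. The sketch below shows one quick way to check this with a least-squares fit in log-log space. The abstract does not say how the authors assessed the power law, so this is only an illustration; the function name `loglog_slope` and the synthetic data are assumptions, and a log-log regression is a crude diagnostic rather than a rigorous test.

```python
import math


def loglog_slope(levels, counts):
    """Least-squares slope of log(count) against log(level).

    For data following count = C * level**(-alpha), the slope is -alpha.
    """
    xs = [math.log(v) for v in levels]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic example (not data from the study): number of genes observed
    # at each expression level, generated exactly from count = 1000 * level**-2.
    levels = [1, 2, 4, 8, 16]
    counts = [1000 * v ** -2 for v in levels]
    print(round(loglog_slope(levels, counts), 6))
```

On real expression data the points would scatter around the line, and the fitted slope depends on how levels are binned.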


2005 ◽  
Vol 6 (4) ◽  
pp. 194-203 ◽  
Author(s):  
Cord Drögemüller ◽  
Anne Wöhlke ◽  
Tosso Leeb ◽  
Ottmar Distl

The bovine RPCI-42 BAC library was screened to construct a sequence-ready ~4 Mb single contig of 92 BAC clones on BTA 1q12. The contig covers the region between the genes KRTAP8P1 and CLIC6. This genomic segment in cattle is of special interest as it contains the dominant gene responsible for the hornless or polled phenotype in cattle. The construction of the BAC contig was initiated by screening the bovine BAC library with heterologous cDNA probes derived from 12 human genes of the syntenic region on HSA 21q22. Contig building was facilitated by BAC end sequencing and chromosome walking. During the construction of the contig, 165 BAC end sequences and 109 single-copy STS markers were generated. For comparative mapping of 25 HSA 21q22 genes, genomic PCR primers were designed from bovine EST sequences and the gene-associated STSs mapped on the contig. Furthermore, bovine BAC end sequence comparisons against the human genome sequence revealed significant matches to HSA 21q22 and allowed the in silico mapping of two new genes in cattle. In total, 31 orthologues of human genes located on HSA 21q22 were directly mapped within the bovine BAC contig, of which 16 genes have been cloned and mapped for the first time in cattle. In contrast to the existing comparative bovine–human RH maps of this region, these results provide a better alignment and reveal a completely conserved gene order in this 4 Mb segment between cattle, human and mouse. The mapping of known polled linked BTA 1q12 microsatellite markers allowed the integration of the physical contig map with existing linkage maps of this region and also determined the exact order of these markers for the first time. Our physical map and transcript map may be useful for positional cloning of the putative polled gene in cattle. The nucleotide sequence data reported in this paper have been submitted to EMBL and have been assigned Accession Numbers AJ698510–AJ698674.


2022 ◽  
pp. 150-170
Author(s):  
Moumit Roy Goswami ◽  
Aniruddha Mukhopadhyay

Wetland ecosystems support rich and unique biodiversity. The biodiversity of an ecosystem in general, and of wetlands in particular, provides important insights into the ecological health of an area. The Ramsar Convention (1971) identified nine criteria for identifying wetlands of international importance. Of the nine criteria, eight are linked to biodiversity: three are based on sites of international importance for conserving biological diversity, two are specific to water birds, two are specific to fish, and one applies to other taxa. Hence, determination of the biodiversity of wetlands is of utmost importance. To that end, birds, fishes, amphibians, odonates, mammals, and aquatic plants were selected as indicators of wetland biodiversity, and the chapter discusses the methodologies for assessing each of these taxa under the criteria mentioned above. These methodologies will help various stakeholders determine the biodiversity of wetlands in a particular area appropriately.


2020 ◽  
Vol 48 (16) ◽  
pp. e91-e91
Author(s):  
Yatish Turakhia ◽  
Heidi I Chen ◽  
Amir Marcovitz ◽  
Gill Bejerano

Abstract Gene losses provide an insightful route for studying the morphological and physiological adaptations of species, but their discovery is challenging. Existing genome annotation tools focus on annotating intact genes and do not attempt to distinguish nonfunctional genes from genes missing annotation due to sequencing and assembly artifacts. Previous attempts to annotate gene losses have required significant manual curation, which hampers their scalability for the ever-increasing deluge of newly sequenced genomes. Using extreme sequence erosion (amino acid deletions and substitutions) and sister species support as an unambiguous signature of loss, we developed an automated approach for detecting high-confidence gene loss events across a species tree. Our approach relies solely on gene annotation in a single reference genome, raw assemblies for the remaining species to analyze, and the associated phylogenetic tree for all organisms involved. Using human as reference, we discovered over 400 unique human ortholog erosion events across 58 mammals. This includes dozens of clade-specific losses of genes that result in early mouse lethality or are associated with severe human congenital diseases. Our discoveries yield intriguing potential for translational medical genetics and evolutionary biology, and our approach is readily applicable to large-scale genome sequencing efforts across the tree of life.

