genome variations
Recently Published Documents


Total documents: 73 (five years: 25)
H-index: 15 (five years: 3)

2021
Author(s): Tianqing Zheng, Yinghui Li, Yanfei Li, Shengrui Zhang, Chunchao Wang, ...

In the Chinese National Soybean GeneBank (CNSGB), we have collected more than 30,000 soybean accessions. However, data sharing for soybean remains a particularly sensitive issue, and how to share genome variation data within the applicable rules has long troubled soybean germplasm workers. Here we release a big-data resource, the Soybean Functional Genomics & Breeding database (SoyFGB v2.0) (https://sfgb.rmbreeding.cn/), which embeds the resequenced genomes of a core collection of 2,214 soybean accessions (2K-SG) from the CNSGB germplasm. This resource is a unique example that may help illustrate three major functions of multi-genome data mining that are of general interest to plant researchers. 1) The Analysis module provides online tools for haplotype mining in high-throughput genotyped germplasm using different methods. 2) The Browse module provides the variations of 2K-SG through two functions, SNP and InDel. Together with the Gene (SNP & InDel) function in the Search module, this makes the genotypic information of 2K-SG accessible for a target gene or region. 3) SoyFGB v2.0 provides scaled phenotype data for 42 traits, including 9 quality and 33 quantitative traits. With the scaled-phenotype search and seed request tools under a control list, germplasm information can be shared without directly downloading unpublished phenotypic data or information on sensitive germplasm. In summary, the data mining and sharing model underlying SoyFGB v2.0 may inspire further work on genome resources, not only for soybean but also for other plants.
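
The abstract does not describe how SoyFGB v2.0's Analysis module works internally. The following is only a minimal, hypothetical Python sketch of the basic idea behind haplotype mining. Accessions are grouped by the alleles they carry at the SNPs of a target gene or region, and a trait is then compared between the groups. The function name, the input layout and the rule of skipping accessions with a missing call ("N") are all assumptions, not SoyFGB's method.

```python
from statistics import mean


def mine_haplotypes(genotypes, phenotypes):
    """Group accessions by their allele string across the SNPs of one region.

    genotypes:  dict accession -> list of allele calls, one per SNP, same SNP order
    phenotypes: dict accession -> numeric trait value (accessions may be missing)
    Returns a list of (haplotype, accessions, mean_trait), most common first.
    Hypothetical helper for illustration only.
    """
    groups = {}
    for accession, calls in genotypes.items():
        if "N" in calls:
            continue  # assumption: drop accessions with any missing call
        groups.setdefault("".join(calls), []).append(accession)

    summary = []
    for haplotype, accessions in groups.items():
        values = [phenotypes[a] for a in accessions if a in phenotypes]
        summary.append((haplotype, sorted(accessions),
                        mean(values) if values else None))
    # Larger haplotype groups first; ties broken alphabetically by haplotype
    summary.sort(key=lambda row: (-len(row[1]), row[0]))
    return summary


# Hypothetical toy data: four accessions genotyped at three SNPs
genotypes = {
    "acc1": ["A", "G", "T"],
    "acc2": ["A", "G", "T"],
    "acc3": ["C", "G", "A"],
    "acc4": ["A", "N", "T"],
}
phenotypes = {"acc1": 20.0, "acc2": 22.0, "acc3": 15.0}
for row in mine_haplotypes(genotypes, phenotypes):
    print(row)
```

Grouping on exact allele strings keeps the idea visible. This sketch does not handle heterozygous or ambiguous calls.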


2021, Vol 14 (1)
Author(s): Jing Guo, Zhen-Tian Yan, Wen-Bo Fu, Huan Yuan, Xu-Dong Li, ...

Abstract Background: Despite the medical importance of mosquitoes of the genus Anopheles in the transmission of malaria and other human diseases, the phylogenetic relationships within the genus are not settled, and the characteristics of its mitochondrial genome (mitogenome) are not thoroughly understood. Methods: The present study sequenced and analyzed the complete mitogenomes of An. peditaeniatus and An. nitidus, investigated their genome characteristics, and inferred the phylogenetic relationships of 76 Anopheles spp. Results: The complete mitogenomes of An. peditaeniatus and An. nitidus are 15,416 and 15,418 bp long, respectively, and both include 13 PCGs, 22 tRNAs, two rRNAs and one control region (CR). Mitogenomes of Anopheles spp. are similar to those of other insects in their general characteristics; however, trnR and trnA have been reversed to “trnR-trnA,” as has been reported in other mosquito genera. Variation among the genomes occurs mainly in CR length (493–886 bp). Six repeat unit types were identified for the first time, and they carry an evolutionary signal. The subgenera Lophopodomyia, Stethomyia, Kerteszia, Nyssorhynchus, Anopheles and Cellia are inferred to be monophyletic. The phylogenetic analyses also support a new relationship among these six subgenera: subgenus Lophopodomyia is sister to the other five, which fall into two clades. One clade comprises the sister subgenera Stethomyia + Kerteszia; the other comprises subgenus Nyssorhynchus as sister to the sister-group subgenera Anopheles + Cellia. Four series (Neomyzomyia, Pyretophorus, Neocellia and Myzomyia) of the subgenus Cellia and two series (Arribalzagia and Myzorhynchus) of the subgenus Anopheles were found to be monophyletic. In contrast, three sections (Myzorhynchella, Argyritarsis and Albimanus) of the subgenus Nyssorhynchus, and their subdivisions, were polyphyletic or paraphyletic.
Conclusions: The study comprehensively characterized the mitogenomes of the genus Anopheles and the phylogeny of the genus inferred from them. It provided information for further study of the mitogenomes, phylogenetics and taxonomic revision of the genus.
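
The abstract reports that CR length varies between 493 and 886 bp and that six repeat unit types were found, but it does not say how the repeats were detected. The following minimal Python sketch finds perfect tandem repeats, meaning one unit copied back to back, in a sequence such as a control region. It is a simplified stand-in: dedicated repeat-finding tools also tolerate mismatches and indels. The function name and its parameters are hypothetical.

```python
def find_tandem_repeats(seq, min_unit=2, max_unit=10, min_copies=2):
    """Find perfect tandem repeats: one unit copied back to back.

    Returns (start, unit, copies) tuples with 0-based start positions.
    Units that are themselves repeats of a shorter unit (e.g. "ATAT")
    are skipped so each run is reported once, at its shortest period.
    Hypothetical helper; exact matches only.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_unit, max_unit + 1):
        i = 0
        while i + k <= len(seq):
            unit = seq[i:i + k]
            copies = 1
            # Count how many further exact copies follow immediately
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            primitive = unit not in (unit + unit)[1:-1]
            if copies >= min_copies and primitive:
                hits.append((i, unit, copies))
                i += copies * k  # continue after the end of this run
            else:
                i += 1
    return hits


# Hypothetical fragment: flanking bases around four copies of "ACT"
print(find_tandem_repeats("GGACTACTACTACTTT", max_unit=6))
```

Skipping non-primitive units means that a run of "AT" repeats is not reported a second time as "ATAT" repeats.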


2021, Vol 0 (0)
Author(s): Koray Ergünay, Mücahit Kaya, Muhittin Serdar, Yakut Akyön, Engin Yılmaz

Abstract Objectives: We assessed SARS-CoV-2 genome diversity and its probable impact on epidemiology, immune response and clinical disease in Turkey. Materials and methods: Complete genomes and partial Spike (S) sequences were accessed from the Global Initiative on Sharing Avian Influenza Data (GISAID) database. The genomes were analysed for variations and recombinations using appropriate software. Results: Four hundred ten complete genomes and 206 S region sequences were included. Overall, 1,200 distinct nucleotide variations were noted. The mean variation count was 14.2 per genome and increased significantly during the course of the pandemic. The most frequent variations were A23403G (D614G, 92.9%), C14408T (P323L, 92.2%), C3037T (89.8%), C241T (83.4%) and GGG28881AAC (RG203KR, 62.6%). The A23403G mutation was the most frequent variation in the S region sequences (99%). Most genomes (98.3%) belonged to SARS-CoV-2 haplogroup A. No evidence of recombination was identified in genomes representing sub-haplogroup branches. The variants B.1.1.7, B.1.351 and P.1 were detected, with a statistically significant increase in B.1.1.7 prevalence over time. Conclusions: We described prominent SARS-CoV-2 variations and compared them with global virus diversity. Continued molecular surveillance aligned with local disease epidemiology appears crucial while vaccination and mitigation efforts are ongoing.
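
The abstract reports each variation as reference base(s), genome position and alternative base(s) (e.g. A23403G), together with the percentage of genomes that carry it and a mean count per genome. The abstract does not name the software used. Assuming variants have already been called for each genome, these summary figures reduce to simple counting, as in the hypothetical Python sketch below. The data layout and function names are assumptions.

```python
import re
from collections import Counter

# Labels such as "A23403G" or "GGG28881AAC": ref base(s), 1-based position, alt base(s)
VARIANT_LABEL = re.compile(r"^([ACGT]+)(\d+)([ACGT]+)$")


def parse_variant(label):
    """Split a variant label into (ref, position, alt)."""
    match = VARIANT_LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognised variant label: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def summarise(genome_variants):
    """genome_variants: dict genome_id -> set of variant labels (hypothetical layout).

    Returns (mean variants per genome, [(label, percent of genomes), ...])
    with the most frequent variants first.
    """
    n_genomes = len(genome_variants)
    counts = Counter()
    for labels in genome_variants.values():
        for label in labels:
            parse_variant(label)  # reject malformed labels early
            counts[label] += 1
    mean_count = sum(len(v) for v in genome_variants.values()) / n_genomes
    frequencies = sorted(((label, 100 * c / n_genomes) for label, c in counts.items()),
                         key=lambda item: (-item[1], item[0]))
    return mean_count, frequencies


# Hypothetical toy data for four genomes
genomes = {
    "g1": {"A23403G", "C14408T"},
    "g2": {"A23403G"},
    "g3": {"A23403G", "C14408T", "C241T"},
    "g4": set(),
}
print(summarise(genomes))
```

Protein-level names such as D614G need a translation step, so this sketch rejects them as nucleotide labels.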


2021
Author(s): Maria Zelenova, Anna Ivanova, Semyon Semyonov, Yuriy Gankin

Background: On 31 December 2019, the World Health Organization received a report of an outbreak of pneumonia of unknown etiology in the Chinese city of Wuhan. The outbreak was caused by a novel virus, designated SARS-CoV-2. By 12 May 2021, the virus had spread to about 220 countries, infected more than 159,319,384 people and caused approximately 3,311,780 deaths. It caused a worldwide pandemic, leading to panic, quarantines and lockdowns; none of its predecessors in the coronavirus family had ever reached such a scale. The key to understanding the global success of SARS-CoV-2 is hidden in its genome. Materials and Methods: We retrieved 329,942 SARS-CoV-2 records uploaded to the GISAID database from the beginning of the pandemic until 8 January 2021. To process the data, we developed a Python variant detection script using pairwise2 from the BioPython library, and used Pandas, Matplotlib and Seaborn to visualize the data. Genomic coordinates were obtained from the UCSC Genome Browser (https://genome.ucsc.edu/). Sequence alignments were performed separately for every gene. Genomes shorter than 26,000 nucleotides were excluded. Clustering was performed using HDBSCAN. Results: Here, we addressed the genetic variability of SARS-CoV-2 using 329,942 samples from around the world. The analysis yielded 155 genome variations (SNPs and deletions) present in more than 0.3% of the sequences. Nine common SNPs were present in more than 20% of the samples. Clustering results suggested that a proportion of people (2.46%) were infected with a distinct subtype of the B.1.1.7 variant. This subtype may be characterized by four to six additional mutations, four being the more frequent case (G28881A, G28882A and G28883C in the N gene, A23403G in S, A28095T in ORF8, G25437T in ORF3a).
Two clusters were formed by mutations in samples uploaded predominantly from Denmark and Australia, which may indicate the emergence of "Danish" and "Australian" variants. Five clusters were linked to increased or decreased age, a shifted gender ratio, or both. According to a correlation coefficient matrix, 69 mutations correlate with at least one other mutation (correlation coefficient greater than 0.7). We also assessed the completeness of the GISAID database and found that between 77% and 93% of the fields were either left blank or filled in incorrectly. Metadata mining led to a hypothesis about gender inequality in medical care in certain countries. Finally, we found ORF6 and E to be the most conserved genes (96.15% and 94.66% of the sequences, respectively, match the reference exactly), making them potential targets for vaccines and treatment. Our results indicate areas of the SARS-CoV-2 genome on which researchers can focus for further structural and functional analysis.
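
The abstract says that variants were detected with a Python script built on BioPython's pairwise2, with each gene aligned separately, but the script itself is not given. The sketch below covers only the step after alignment. It walks two already-aligned (gapped) sequences and reports SNPs and deletions in reference coordinates. It also applies the 26,000-nucleotide length cut-off stated in the abstract. The alignment call is left out so the code has no dependencies. The function names, the "delSTART-END" label format and the handling of insertions and "N" calls are assumptions.

```python
MIN_GENOME_LENGTH = 26_000  # genomes shorter than this were excluded in the study


def passes_length_filter(sequence):
    """True if the ungapped sequence is at least MIN_GENOME_LENGTH bases long."""
    return len(sequence.replace("-", "")) >= MIN_GENOME_LENGTH


def call_variants(ref_aln, qry_aln, ref_start=1):
    """Report SNPs and deletions from two gapped, equal-length aligned sequences.

    Positions are 1-based reference coordinates; ref_start is the reference
    position of the first aligned base (useful when aligning one gene at a time).
    Insertions relative to the reference and 'N' calls in the query are skipped.
    Hypothetical helper for illustration only.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    variants = []
    ref_pos = ref_start - 1
    del_start = None  # reference position where an open deletion began
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r == "-":
            continue  # insertion in the query: no reference coordinate
        ref_pos += 1
        if q == "-":
            if del_start is None:
                del_start = ref_pos  # open a new deletion
            continue
        if del_start is not None:  # a deletion just ended at the previous base
            variants.append(f"del{del_start}-{ref_pos - 1}")
            del_start = None
        if q != r and q != "N":
            variants.append(f"{r}{ref_pos}{q}")
    if del_start is not None:  # deletion running to the end of the alignment
        variants.append(f"del{del_start}-{ref_pos}")
    return variants


print(call_variants("ACGTACGT", "ACTTA--T"))
```

Only "N" is treated as a missing call here. Other IUPAC ambiguity codes would be reported as SNPs, so a real pipeline would need an explicit rule for them.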


2021
Author(s): Poonam Mehta, Saumya Sarkar, Ujjala Ghoshal, Ankita Pandey, Ratender Singh, ...

The outcome of infection with Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) may depend on host-related, virus-related or host-virus interaction factors. Complete SARS-CoV-2 genomes were sequenced using Illumina and Nanopore platforms from naso-/oro-pharyngeal ribonucleic acid (RNA) specimens of COVID-19 patients with varying severity and outcomes. These included patients with mild upper respiratory symptoms (n=35), severe disease requiring intensive care with respiratory and gastrointestinal symptoms (n=21), fatal COVID-19 (n=17) and asymptomatic infection (n=42). Among the genome variants observed, p.16L>L (Nsp1), p.39C>C (Nsp3), p.57Q>H (ORF3a), p.71Y>Y (Membrane glycoprotein) and p.194S>L (Nucleocapsid protein) occurred at similar frequencies in the different patient subgroups. However, seventeen other variants were observed only in symptomatic patients with severe and fatal COVID-19. Of these, one was in the 5′ UTR (g.241C>T) and eight were synonymous: p.14V>V and p.92L>L in Nsp1 protein, p.226D>D, p.253V>V and p.305N>N in Nsp3, p.34G>G and p.79C>C in Nsp10 protein, and p.789Y>Y in Spike protein. The remaining eight were non-synonymous: p.106P>S, p.157V>F and p.159A>V in Nsp2, p.1197S>R and p.1198T>K in Nsp3, p.97A>V in RdRp, p.614D>G in Spike protein, and p.13P>L in nucleocapsid. These variants were completely absent in the asymptomatic group. SARS-CoV-2 genome variations have a significant impact on COVID-19 presentation, severity and outcome.
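
The abstract reports each variant at the protein level (e.g. p.614D>G) and labels it synonymous or non-synonymous. As a hedged illustration of how such labels follow from a single nucleotide change, the minimal Python sketch below translates the affected codon with the standard genetic code. The function name and arguments are hypothetical. The coding sequence is assumed to start at its first codon, and only single-base substitutions are handled. Mapping genome coordinates onto individual proteins such as Nsp3 is not covered.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(cds, pos, alt):
    """Classify a single-base substitution at 1-based position `pos` of a CDS.

    Returns (protein label in the p.<codon><ref>><alt> style, category).
    Assumes `cds` starts at the first base of its first codon.
    Hypothetical helper for illustration only.
    """
    cds = cds.upper()
    codon_number, offset = divmod(pos - 1, 3)
    ref_codon = cds[3 * codon_number:3 * codon_number + 3]
    if len(ref_codon) != 3:
        raise ValueError("position falls outside a complete codon")
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    category = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return f"p.{codon_number + 1}{ref_aa}>{alt_aa}", category


# Hypothetical toy CDS: ATG GAT TTA encodes M D L
print(classify_substitution("ATGGATTTA", 5, "G"))  # GAT -> GGT
print(classify_substitution("ATGGATTTA", 6, "C"))  # GAT -> GAC
```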


2021
Author(s): Koray Ergünay, Mücahit Kaya, Muhittin Serdar, Yakut Akyön, Engin Yılmaz

Abstract Introduction: Nearly a year after the emergence of COVID-19 in Turkey, we analysed SARS-CoV-2 sequences to identify virus genome variations and their probable impact on epidemiology, immune response and clinical disease. Materials and Methods: Complete genomes and partial Spike (S) region sequences originating from Turkey were accessed from the Global Initiative on Sharing Avian Influenza Data (GISAID) database. The genomes were aligned and analysed for variations and recombinations using appropriate software. Results: 410 complete genomes and 206 S region sequences were included. Overall, 1,200 distinct nucleotide variations were noted. The mean variation count was 14.2 per genome and increased significantly during the course of the pandemic. The most frequent variations were A23403G (D614G, 92.9%), C14408T (P323L, 92.2%), C3037T (89.8%), C241T (83.4%) and GGG28881AAC (RG203KR, 62.6%). The A23403G mutation was the most frequent variation in the S region sequences (99%). The majority of the genomes (98.3%) belonged to SARS-CoV-2 haplogroup A. No evidence of recombination was identified in genomes representing sub-haplogroup branches. The variants of concern B.1.1.7, B.1.351 and P.1 were detected, with a statistically significant increase in B.1.1.7 prevalence over time. Discussion: We described prominent SARS-CoV-2 variations and compared them with global virus diversity. Continued molecular surveillance aligned with local disease epidemiology appears crucial while vaccination and mitigation efforts are ongoing.


2021
Author(s): Hao Lu, Luyu Ma, Lei Li, Cheng Quan, Yiming Lu, ...

Noncoding genomic variants constitute the majority of trait-associated genome variations. However, identifying functional noncoding variants remains a challenge in human genetics, and there is still no method that systematically assesses the impact of regulatory variants on gene expression and links them to potential target genes. Here we introduce a deep neural network (DNN)-based computational framework, RegVar, that can accurately predict the tissue-specific impact of noncoding regulatory variants on target genes. RegVar robustly learns the genomic characteristics of massive variant-gene expression associations in a variety of human tissues. We show that, as a result, it vastly surpasses all current noncoding variant prioritization methods in predicting regulatory variants under different circumstances. The unique features of RegVar make it an excellent framework for assessing the regulatory impact of any variant on its putative target genes in a variety of tissues. RegVar is available as a webserver at http://regvar.cbportal.org/.
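
The abstract claims that RegVar outperforms existing prioritization methods but does not say how the comparison was scored. Variant-prioritization benchmarks are often summarized with the area under the ROC curve (AUROC). Assuming that metric, the following dependency-free Python sketch computes it from predicted scores and known labels. It illustrates the evaluation step only, not RegVar's network. The scores and labels are hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve, computed as the Mann-Whitney probability that a
    randomly chosen positive outscores a randomly chosen negative.

    scores: predicted impact scores (higher = more likely to be regulatory)
    labels: 1 for known functional variants, 0 for background variants
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative variant")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores from two methods on the same four labelled variants
labels = [1, 1, 0, 0]
print(auroc([0.9, 0.8, 0.3, 0.1], labels))  # 1.0
print(auroc([0.9, 0.2, 0.5, 0.1], labels))  # 0.75
```

The pairwise loop is quadratic. It is fine for a sketch, but a rank-based formula would be preferable for large variant sets.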


2021
Author(s): Qingyao Huang, Paula Carrio-Cordo, Bo Gao, Rahel Paloots, Michael Baudis

Abstract In cancer, copy number aberrations (CNA) are a nearly ubiquitous and frequently extensive type of structural genome variation. Comparative and meta-analyses of large genomic variant collections can be of immense importance for disentangling the molecular mechanisms underlying tumorigenesis and for identifying and characterizing molecular subtypes. Over the last decades, cancer genomic profiling projects have produced a large number of somatic genome variation profiles, which are, however, segregated across a multitude of individual studies and datasets. The Progenetix project, initiated in 2001, curates individual cancer CNA profiles and associated metadata from published oncogenomic studies and data repositories. Its aim is to enable integrative analyses that span all cancer biologies. During the last few years, genomics and cancer research have advanced significantly, in an increasingly structured and systematic manner, in molecular genetics technology, disease concepts, data standard harmonization and data availability. For the Progenetix resource, continuous data integration, curation and maintenance have resulted in the most comprehensive representation of cancer genome CNA profiling data, with 138’663 (including 115’357 tumor) CNV profiles. In this article, we report a 4.5-fold increase in sample number since 2013, improvements in data quality, and ontology representation with a CNV landscape summary over 51 distinct NCIt cancer terms. We also report updates to database schemas and data access, including a new web front-end and programmatic data access. Database URL: progenetix.org
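
The abstract mentions a CNV landscape summary for each of 51 NCIt cancer terms but does not describe how it is computed. A common way to draw such a landscape is to report, for each genomic bin, the percentage of samples that carry a gain or a loss there. The following Python sketch does this for one chromosome under stated assumptions: segments are 0-based half-open intervals, and bins have a fixed size (1 Mb by default). The data layout is hypothetical and does not reflect Progenetix's actual schema or API.

```python
def cnv_frequencies(samples, chrom, chrom_length, bin_size=1_000_000):
    """Percentage of samples with a gain or a loss overlapping each bin of one chromosome.

    samples: dict sample_id -> list of (chrom, start, end, kind) segments, using
             0-based half-open coordinates and kind in {"gain", "loss"}
    Returns a list of (bin_start, gain_percent, loss_percent).
    Hypothetical helper for illustration only.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size
    hits = {"gain": [set() for _ in range(n_bins)],
            "loss": [set() for _ in range(n_bins)]}
    for sample_id, segments in samples.items():
        for seg_chrom, start, end, kind in segments:
            if seg_chrom != chrom or end <= start:
                continue  # other chromosome or empty segment
            first_bin = max(start, 0) // bin_size
            last_bin = (min(end, chrom_length) - 1) // bin_size
            for b in range(first_bin, last_bin + 1):
                hits[kind][b].add(sample_id)  # each sample counted once per bin
    n = len(samples)
    return [(b * bin_size,
             100 * len(hits["gain"][b]) / n,
             100 * len(hits["loss"][b]) / n)
            for b in range(n_bins)]


# Hypothetical toy cohort of four samples on a 3 Mb chromosome "1"
samples = {
    "s1": [("1", 0, 1_500_000, "gain")],
    "s2": [("1", 2_100_000, 2_500_000, "loss"), ("2", 0, 10, "gain")],
    "s3": [],
    "s4": [("1", 500_000, 700_000, "gain"), ("1", 600_000, 900_000, "gain")],
}
for row in cnv_frequencies(samples, "1", 3_000_000):
    print(row)
```

Collecting sample IDs in sets prevents a sample with two overlapping gain segments in the same bin from being counted twice.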

