refseq database
Recently Published Documents


Total documents: 7 (five years: 4)

H-index: 3 (five years: 0)

2021 ◽  
Author(s):  
David A Yarmosh ◽  
Juan G Lopera ◽  
Nikhita P Puthuveetil ◽  
Patrick Ford Combs ◽  
Amy L Reese ◽  
...  

The quality and traceability of microbial genomics data in public databases is deteriorating as they rapidly expand and struggle to cope with data curation challenges. While the availability of public genomic data has become essential for modern life sciences research, the curation of the data is a growing area of concern that has significant real-world impacts on public health epidemiology, drug discovery, and environmental biosurveillance research. While public microbial genome databases such as NCBI's RefSeq database leverage the scalability of crowd sourcing for growth, they do not require data provenance to the original biological source materials or accurate descriptions of how the data was produced. Here, we describe the de novo assembly of 1,113 bacterial genome references produced from authenticated materials sourced from the American Type Culture Collection (ATCC), each with full data provenance. Over 98% of these ATCC Standard Reference Genomes (ASRGs) are superior to assemblies for comparable strains found in NCBI's RefSeq database. Comparative genomics analysis revealed significant issues in RefSeq bacterial genome assemblies related to genome completeness, mutations, structural differences, metadata errors, and gaps in traceability to the original biological source materials. For example, nearly half of RefSeq assemblies lack details on sample source information, sequencing technology, or bioinformatics methods. We suggest there is an intrinsic connection between the quality of genomic metadata, the traceability of the data, and the methods used to produce them with the quality of the resulting genome assemblies themselves. Our results highlight common problems with "reference genomes" and underscore the importance of data provenance for precision science and reproducibility. 
These gaps in metadata accuracy and data provenance represent an "elephant in the room" for microbial genomics research, but addressing these issues would require raising the level of accountability for data depositors and our own expectations of data quality.


2021 ◽  
Author(s):  
Peihong Zhu ◽  
Peter Bowden ◽  
Voitek Pendrak ◽  
Herbert Thiele ◽  
Du Zhang ◽  
...  

The proteins in blood were all first expressed as mRNAs from genes within cells. There are databases of human proteins that are known to be expressed as mRNA in human cells and tissues. Proteins identified from human blood by the correlation of mass spectra that fail to match human mRNA expression products may not be correct. We compared the proteins identified in human blood by mass spectrometry by 10 different groups, using correlation to human and nonhuman nucleic acid sequences. We determined whether the peptides or proteins identified by the different groups mapped to the known human proteins of the Reference Sequence (RefSeq) database. We used Structured Query Language (SQL) database searches of the peptide sequences correlated to tandem mass spectrometry spectra, together with basic local alignment search tool (BLAST) analysis of the identified full-length proteins, to control for correlation to the wrong peptide sequence or for the same or a very similar peptide sequence being shared by more than one protein. Mass spectra were correlated against large protein databases that contain many sequences that may not be expressed in human beings, yet the search returned a very high percentage of peptides or proteins that are known to be found in humans. Only about 5% of proteins mapped to hypothetical sequences, which is in agreement with the reported false-positive rate of the search algorithms. The results were highly enriched in secreted and soluble proteins and diminished in insoluble or membrane proteins. Most of the proteins identified were relatively short and showed a size distribution similar to that of the RefSeq database. At least three groups agreed on a nonredundant set of 1671 types of proteins, and a nonredundant set of 3151 proteins was identified by at least three peptides.
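The control for peptides shared by more than one protein, mentioned in the abstract above, can be illustrated with a minimal sketch. It assumes exact substring matching against a small in-memory set of protein sequences, which is a simplification and not the SQL and BLAST workflow the authors describe. The accessions, sequences, and the function names `map_peptides` and `classify` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy protein set: accessions and sequences are invented
# for illustration and are not real RefSeq records.
PROTEINS = {
    "PROT_A": "MKTAYIAKQRQISFVK",
    "PROT_B": "GSHMAYIAKQRLLE",
    "PROT_C": "MSTNPKPQRK",
}


def map_peptides(peptides, proteins):
    """Return a dict mapping each matched peptide to the accessions containing it."""
    hits = defaultdict(list)
    for peptide in peptides:
        for accession, sequence in proteins.items():
            # Exact substring match; a real pipeline would use a database index
            # or an alignment tool rather than a nested loop.
            if peptide in sequence:
                hits[peptide].append(accession)
    return dict(hits)


def classify(peptides, hits):
    """Split peptides into unique, shared, and unmatched groups."""
    groups = {"unique": [], "shared": [], "unmatched": []}
    for peptide in peptides:
        n_proteins = len(hits.get(peptide, []))
        if n_proteins == 0:
            groups["unmatched"].append(peptide)
        elif n_proteins == 1:
            groups["unique"].append(peptide)
        else:
            groups["shared"].append(peptide)
    return groups


peptides = ["AYIAKQR", "NPKPQR", "QISFVK", "WWWW"]
hits = map_peptides(peptides, PROTEINS)
groups = classify(peptides, hits)
print(groups)
```

In this sketch, only peptides in the "unique" group support a single protein identification; "shared" peptides are ambiguous, and "unmatched" peptides correspond to spectra that correlate to no sequence in the reference set.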


2021 ◽  
Author(s):  
Mohd Abdullah ◽  
Mohammad Kadivella ◽  
Rolee Sharma ◽  
Mirza. S. Baig ◽  
Syed M. Faisal ◽  
...  

Leptospirosis is an emerging, neglected zoonotic disease that causes substantial loss of life and economic damage worldwide. The disease is caused by Leptospira, for which 605 sequenced genomes representing 72 species are available in the RefSeq database. A comparative genomics approach based on Average Amino acid Identity (AAI), Average Nucleotide Identity (ANI), and in silico DNA-DNA hybridization indicates that the taxonomic and evolutionary position of a few genomes needs to be revised and the genomes reclassified. Clustering on the basis of AAI of the core and pan-genome contradicts the clustering pattern based on ANI, which yields 4 clusters. Hierarchical clustering based on amino acid identity clearly established 3 clusters of Leptospira that correlate with level of virulence. A whole-genome tree supported the three-cluster classification and grouped Leptospira into three clades, termed pathogenic, intermediate, and saprophytic. The genus Leptospira consists of diverse species and exists in heterogeneous environments, yet it contains a relatively large and closed core genome of 1038 genes. The analysis showed that the pan-genome remains open, with 20822 genes. COG analysis revealed that mobilome-related genes were found mainly in the pan-genome of the pathogenic clade. Clade-specific genes mined in the study can be used as markers for determining the clade, and the associated level of virulence, of any new Leptospira species. Many known Leptospira virulence genes were absent from the set of 78 virulence factors mined using the Virulence Factor database. A deep search approach provided a repertoire of 496 virulence genes in the pan-genome. Further validation of these virulence genes will help in accurately targeting pathogenic Leptospira and controlling leptospirosis.


2015 ◽  
Vol 44 (D1) ◽  
pp. D733-D745 ◽  
Author(s):  
Nuala A. O'Leary ◽  
Mathew W. Wright ◽  
J. Rodney Brister ◽  
Stacy Ciufo ◽  
Diana Haddad ◽  
...  

2006 ◽  
Vol 24 (1) ◽  
pp. 33-41 ◽  
Author(s):  
Suhas Tikole ◽  
Ramasubbu Sankararamakrishnan