reference genomes
Recently Published Documents


Total documents: 389 (five years: 252)

H-index: 22 (five years: 11)

Genes ◽  
2022 ◽  
Vol 13 (1) ◽  
pp. 148
Author(s):  
Clifton P. Bueno de Mesquita ◽  
Jinglie Zhou ◽  
Susanna Theroux ◽  
Susannah G. Tringe

Aerobic bacteria that degrade methylphosphonates and produce methane as a byproduct have emerged as key players in marine carbon and phosphorus cycles. Here, we present two new draft genome sequences of the genus Marivita that were assembled from metagenomes from hypersaline former industrial salterns and compare them to five other Marivita reference genomes. Phylogenetic analyses suggest that both of these metagenome-assembled genomes (MAGs) represent new species in the genus. Average nucleotide identities to the closest taxon were <85%. The MAGs were assembled with SPAdes, binned with MetaBAT, and curated with scaffold extension and reassembly. Both genomes contained the phnCDEGHIJLMP suite of genes encoding the full C-P lyase pathway of methylphosphonate degradation and were significantly more abundant in two former industrial salterns than in nearby reference and restored wetlands, which have lower salinity levels and lower methane emissions than the salterns. These organisms contain a variety of compatible solute biosynthesis and transporter genes to cope with high salinity levels but harbor only slightly acidic proteomes (mean isoelectric point of 6.48).


2022 ◽  
Author(s):  
Elise Parey ◽  
Alexandra Louis ◽  
Jerome Monfort ◽  
Yann Guiguen ◽  
Hugues Roest Crollius ◽  
...  

Teleost fish are one of the most species-rich and diverse clades amongst vertebrates, which makes them an outstanding model group for evolutionary, ecological and functional genomics. Yet, despite a growing number of sequenced reference genomes, large-scale comparative analysis remains challenging in teleosts due to the specifics of their genomic organization. As a legacy of a whole genome duplication dated 320 million years ago, a large fraction of teleost genes remain in duplicate paralogous copies. This ancestral polyploidy confounds the detailed identification of orthologous genomic regions across teleost species. Here, we combine tailored gene phylogeny methodology with state-of-the-art ancestral karyotype reconstruction to establish the first high-resolution comparative atlas of paleopolyploid regions across 74 teleost fish genomes. We show that this atlas represents a unique, robust and reliable resource for fish genomics. We then use the comparative atlas to study the tetraploidization and rediploidization mechanisms that affected the ancestor of teleosts. Although the polyploid history of teleost genomes appears complex, we uncover that meiotic recombination persisted between duplicated chromosomes for over 60 million years after polyploidization, suggesting that the teleost ancestor was an autotetraploid.


2022 ◽  
Author(s):  
Lars Wienbrandt ◽  
David Ellinghaus

Background: Reference-based phasing and genotype imputation algorithms have been developed with sublinear theoretical runtime behaviour, but runtimes are still high in practice when large genome-wide reference datasets are used. Methods: We developed EagleImp, a software tool with algorithmic and technical improvements and new features for accurate and accelerated phasing and imputation in a single tool. Results: We compared the accuracy and runtime of EagleImp with Eagle2, PBWT and prominent imputation servers using whole-genome sequencing data from the 1000 Genomes Project, the Haplotype Reference Consortium and simulated data with more than 1 million reference genomes. EagleImp is 2 to 10 times faster (depending on the single or multiprocessor configuration selected) than Eagle2/PBWT, with the same or better phasing and imputation quality in all tested scenarios. For common variants investigated in typical GWAS, EagleImp provides the same or higher imputation accuracy than the Sanger Imputation Service, Michigan Imputation Server and the newly developed TOPMed Imputation Server, despite those servers using larger (not publicly available) reference panels. It has many new features, including automated chromosome splitting and memory management at runtime to avoid job aborts, fast reading and writing of large files, and various user-configurable algorithm and output options. Conclusions: Due to the technical optimisations, EagleImp can perform fast and accurate reference-based phasing and imputation for future very large reference panels with more than 1 million genomes. EagleImp is freely available for download from https://github.com/ikmb/eagleimp.


2022 ◽  
Author(s):  
Luiz Carlos Irber ◽  
Phillip T Brooks ◽  
Taylor E Reiter ◽  
N Tessa Pierce-Ward ◽  
Mahmudur Rahman Hera ◽  
...  

The identification of reference genomes and taxonomic labels from metagenome data underlies many microbiome studies. Here we describe two algorithms for compositional analysis of metagenome sequencing data. We first investigate the FracMinHash sketching technique, a derivative of modulo hash that supports Jaccard containment estimation between sets of different sizes. We implement FracMinHash in the sourmash software, evaluate its accuracy, and demonstrate large-scale containment searches of metagenomes using 700,000 microbial reference genomes. We next frame shotgun metagenome compositional analysis as the problem of finding a minimum collection of reference genomes that "cover" the known k-mers in a metagenome, a minimum set cover problem. We implement a greedy approximate solution using FracMinHash sketches, and evaluate its accuracy for taxonomic assignment using a CAMI community benchmark. Finally, we show that the minimum metagenome cover can be used to guide the selection of reference genomes for read mapping. sourmash is available as open source software under the BSD 3-Clause license at github.com/dib-lab/sourmash/.
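The two ideas in the abstract above can be illustrated with a minimal Python sketch. This is not the sourmash implementation or its API: the function names, the choice of BLAKE2b as the hash, the `scaled` parameter name, and the tiny toy sequences are all assumptions made for illustration, and reverse-complement (canonical) k-mers are ignored for brevity. The sketch keeps only k-mer hashes below a fixed fraction of the hash space, estimates containment from those subsets, and then greedily picks references that cover the most still-uncovered hashes.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-mers (length-k substrings) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def frac_minhash(seq, k=5, scaled=2):
    """Toy FracMinHash-style sketch (assumed parameters, not sourmash's).

    Each k-mer is hashed to a 64-bit integer and kept only if the hash
    falls below 2**64 // scaled, so roughly 1/scaled of k-mers survive.
    """
    threshold = 2 ** 64 // scaled
    sketch = set()
    for kmer in kmers(seq, k):
        digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
        value = int.from_bytes(digest, "big")
        if value < threshold:
            sketch.add(value)
    return sketch


def containment(a, b):
    """Fraction of the hashes in sketch a that also occur in sketch b."""
    return len(a & b) / len(a) if a else 0.0


def greedy_min_cover(query, references):
    """Greedy approximation to minimum set cover over sketch hashes.

    query: set of hashes; references: dict mapping name -> set of hashes.
    Repeatedly picks the reference covering the most uncovered query
    hashes (ties broken by name). Returns the chosen (name, overlap)
    pairs and the hashes that no reference covers.
    """
    remaining = set(query)
    cover = []
    while remaining:
        best_name, best_overlap = None, 0
        for name in sorted(references):
            overlap = len(remaining & references[name])
            if overlap > best_overlap:
                best_name, best_overlap = name, overlap
        if best_name is None:  # nothing left that any reference explains
            break
        cover.append((best_name, best_overlap))
        remaining -= references[best_name]
    return cover, remaining


if __name__ == "__main__":
    # Hypothetical toy "genomes" and "metagenome", for demonstration only.
    refs = {
        "ref_A": frac_minhash("ACGTACGTTTGACCA", scaled=1),
        "ref_B": frac_minhash("TTGACCAGGATCCAT", scaled=1),
    }
    metagenome = frac_minhash("ACGTACGTTTGACCAGGATCCAT", scaled=1)
    print(greedy_min_cover(metagenome, refs))
```

Because every sketch uses the same threshold, sketches of differently sized sets stay directly comparable, which is what makes containment estimation between a large metagenome and a small genome workable. The greedy loop is a standard approximation for set cover and is only meant to convey the shape of the approach.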


2022 ◽  
Author(s):  
Huang Zhen ◽  
Luohao Xu ◽  
Cheng Cai ◽  
Yitao Zhou ◽  
Jing Liu ◽  
...  

The slow-evolving invertebrate amphioxus has an irreplaceable role in advancing our understanding of vertebrate origins and innovations. Here we resolve the nearly complete chromosomal genomes of three amphioxus species, one of which best recapitulates the 17 chordate ancestor linkage groups. We reconstruct the fusions, retention or rearrangements between descendants of whole genome duplications (WGDs), which gave rise to the extant microchromosomes that likely existed in the vertebrate ancestor. Similar to vertebrates, the amphioxus genome gradually establishes its 3D chromatin architecture at the onset of zygotic activation, and forms two topologically associated domains at the Hox gene cluster. We find that all three amphioxus species have ZW sex chromosomes with little sequence differentiation, and their putative sex-determining regions are nonhomologous to each other. Our results illuminate the unappreciated interspecific diversity and developmental dynamics of amphioxus genomes, and provide high-quality references for understanding the mechanisms of chordate functional genome evolution.


2022 ◽  
Vol 12 ◽  
Author(s):  
Isabel García-García ◽  
Belén Méndez-Cea ◽  
David Martín-Gálvez ◽  
José Ignacio Seco ◽  
Francisco Javier Gallego ◽  
...  

Forest tree species are highly vulnerable to the effects of climate change. As sessile organisms with long generation times, their adaptation to a changing local environment may rely on epigenetic modifications when allele frequencies are not able to shift fast enough. However, the current lack of knowledge in this field is remarkable, owing to the many challenges that researchers face when studying this issue. Huge genome sizes, the absence of reference genomes and annotation, and the need to analyze huge amounts of data are among these difficulties, which limit the current ability to understand how climate change drives epigenetic modifications in tree species. In spite of this challenging framework, some insights into the relationship between climate change-induced stress and epigenomics are emerging. Advances in DNA sequencing technologies and an increasing number of studies dealing with this topic should boost our knowledge of tree adaptive capacity to changing environmental conditions. Here, we discuss challenges and perspectives in the epigenetics of climate change-induced forest decline, aiming to provide a general overview of the state of the art.


GigaScience ◽  
2022 ◽  
Vol 11 (1) ◽  
Author(s):  
Christophe Djemiel ◽  
Pierre-Alain Maron ◽  
Sébastien Terrat ◽  
Samuel Dequiedt ◽  
Aurélien Cottin ◽  
...  

Deciphering microbiota functions is crucial to predict ecosystem sustainability in response to global change. High-throughput sequencing at the individual or community level has revolutionized our understanding of microbial ecology, leading to the big data era and improving our ability to link microbial diversity with microbial functions. Recent advances in bioinformatics have been key for developing functional prediction tools based on DNA metabarcoding data and using taxonomic gene information. This approach, cheaper in every respect, serves as an alternative to shotgun sequencing. Although these tools are increasingly used by ecologists, an objective evaluation of their modularity, portability, and robustness is lacking. Here, we reviewed 100 scientific papers on functional inference and ecological trait assignment to rank the advantages, specificities, and drawbacks of these tools, using scientific benchmarking. To date, inference tools have been mainly devoted to bacterial functions, and ecological trait assignment tools, to fungal functions. A major limitation is the lack of reference genomes (compared with the human microbiota), especially for complex ecosystems such as soils. Finally, we explore applied research prospects. These tools are promising and already provide relevant information on ecosystem functioning, but standardized indicators and corresponding repositories are still lacking that would enable them to be used for operational diagnosis.


2021 ◽  
Author(s):  
David A Yarmosh ◽  
Juan G Lopera ◽  
Nikhita P Puthuveetil ◽  
Patrick Ford Combs ◽  
Amy L Reese ◽  
...  

The quality and traceability of microbial genomics data in public databases is deteriorating as they rapidly expand and struggle to cope with data curation challenges. While the availability of public genomic data has become essential for modern life sciences research, the curation of the data is a growing area of concern that has significant real-world impacts on public health epidemiology, drug discovery, and environmental biosurveillance research. While public microbial genome databases such as NCBI's RefSeq database leverage the scalability of crowd sourcing for growth, they do not require data provenance to the original biological source materials or accurate descriptions of how the data was produced. Here, we describe the de novo assembly of 1,113 bacterial genome references produced from authenticated materials sourced from the American Type Culture Collection (ATCC), each with full data provenance. Over 98% of these ATCC Standard Reference Genomes (ASRGs) are superior to assemblies for comparable strains found in NCBI's RefSeq database. Comparative genomics analysis revealed significant issues in RefSeq bacterial genome assemblies related to genome completeness, mutations, structural differences, metadata errors, and gaps in traceability to the original biological source materials. For example, nearly half of RefSeq assemblies lack details on sample source information, sequencing technology, or bioinformatics methods. We suggest there is an intrinsic connection between the quality of genomic metadata, the traceability of the data, and the methods used to produce them with the quality of the resulting genome assemblies themselves. Our results highlight common problems with "reference genomes" and underscore the importance of data provenance for precision science and reproducibility. These gaps in metadata accuracy and data provenance represent an "elephant in the room" for microbial genomics research, but addressing these issues would require raising the level of accountability for data depositors and our own expectations of data quality.


2021 ◽  
Author(s):  
Jamshed Khan ◽  
Marek Kokot ◽  
Sebastian Deorowicz ◽  
Rob Patro

The de Bruijn graph has become a key data structure in modern computational genomics, and of keen interest is its compacted variant. The compacted de Bruijn graph provides a lossless representation of the graph, and it is often considerably more efficient to store and process than its non-compacted counterpart. Construction of the compacted de Bruijn graph resides upstream of many genomic analyses. As the quantity of sequencing data and the number of reference genomes on which to perform these analyses grow rapidly, efficient construction of the compacted graph becomes a computational bottleneck for these tasks. We present Cuttlefish 2, significantly advancing the existing state-of-the-art methods for construction of this graph. On a typical shared-memory machine, it reduces the construction of the compacted de Bruijn graph for 661K bacterial genomes (2.58 Tbp of input reference genomes) from about 4.5 days to 17 to 23 hours. Similarly, on sequencing data, it constructs the graph for a 1.52 Tbp white spruce read set in about 10 hours, while the closest competitor, which also uses considerably more memory, requires 54 to 58 hours. Cuttlefish 2 is implemented in C++14, and is available as open-source software under a BSD-3-Clause license at https://github.com/COMBINE-lab/cuttlefish.
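To make the notion of compaction in the abstract above concrete, here is a naive Python sketch that merges non-branching paths of a de Bruijn graph into maximal strings (unitigs). It is not the Cuttlefish 2 algorithm, which is a far more memory- and time-efficient C++ method. It assumes a node-centric graph in which k-mers are nodes, considers the forward strand only (no reverse complements), and holds everything in memory. The function name `compact_de_bruijn` is hypothetical.

```python
def compact_de_bruijn(seqs, k):
    """Return the sorted unitigs of a forward-strand, node-centric
    de Bruijn graph built from the k-mers of seqs (naive illustration)."""
    nodes = {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}

    def successors(u):
        # k-mers present in the graph that overlap u's (k-1)-suffix
        return [u[1:] + c for c in "ACGT" if u[1:] + c in nodes]

    def predecessors(u):
        # k-mers present in the graph that overlap u's (k-1)-prefix
        return [c + u[:-1] for c in "ACGT" if c + u[:-1] in nodes]

    def continues_a_path(v):
        # True when v has exactly one predecessor and that predecessor has
        # exactly one successor, so v lies in the interior of a unitig.
        preds = predecessors(v)
        return len(preds) == 1 and len(successors(preds[0])) == 1

    visited = set()
    unitigs = []

    def walk(start):
        # Extend from start while the path stays non-branching.
        path, cur = start, start
        visited.add(start)
        while True:
            succ = successors(cur)
            if len(succ) != 1:
                break
            nxt = succ[0]
            if nxt in visited or len(predecessors(nxt)) != 1:
                break
            path += nxt[-1]  # each extra k-mer adds one character
            visited.add(nxt)
            cur = nxt
        unitigs.append(path)

    # First pass: start only at k-mers that begin a unitig.
    for node in sorted(nodes):
        if node not in visited and not continues_a_path(node):
            walk(node)
    # Second pass: any k-mers left belong to isolated non-branching cycles.
    for node in sorted(nodes):
        if node not in visited:
            walk(node)
    return sorted(unitigs)


if __name__ == "__main__":
    # Two toy sequences that share a prefix and then diverge.
    print(compact_de_bruijn(["ACGTA", "ACGTC"], k=3))
```

Every k-mer of the input ends up in exactly one unitig, which is the sense in which the compacted graph is a lossless representation: the shared prefix is stored once, and the graph branches only where the sequences diverge.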

