Whole genome sequencing and assembly of a Caenorhabditis elegans genome with complex genomic rearrangements using the MinION sequencing device

Mapping Intimacies ◽

10.1101/099143 ◽

2017 ◽

Cited By ~ 12

Author(s):

JR Tyson ◽

NJ O’Neil ◽

M Jain ◽

HE Olsen ◽

P Hieter ◽

...

Keyword(s):

Caenorhabditis Elegans ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genome Assembly ◽

De Novo ◽

Sequence Data ◽

Genomic Rearrangements ◽

Whole Genome ◽

C Elegans ◽

Long Reads

ABSTRACT Advances in 3rd generation sequencing have opened new possibilities for ‘benchtop’ whole genome sequencing. The MinION is a portable device that uses nanopore technology and can sequence long DNA molecules. MinION long reads are well suited for sequencing and de novo assembly of complex genomes with large repetitive elements. Long reads also facilitate the identification of complex genomic rearrangements such as those observed in tumor genomes. To assess the feasibility of the de novo assembly of large complex genomes using both MinION and Illumina platforms, we sequenced the genome of a Caenorhabditis elegans strain that contains a complex acetaldehyde-induced rearrangement and a biolistic bombardment-mediated insertion of a GFP containing plasmid. Using ∼5.8 gigabases of MinION sequence data, we were able to assemble a C. elegans genome containing 145 contigs (N50 contig length = 1.22 Mb) that covered >99% of the 100,286,401 bp reference genome. In contrast, using ∼8.04 gigabases of Illumina sequence data, we were able to assemble a C. elegans genome in 38,645 contigs (N50 contig length = ∼26 kb) containing 117 Mb. From the MinION genome assembly we identified the complex structures of both the acetaldehyde-induced mutation and the biolistic-mediated insertion. To date, this is the largest genome to be assembled exclusively from MinION data and is the first demonstration that the long reads of MinION sequencing can be used for whole genome assembly of large (100 Mb) genomes and the elucidation of complex genomic rearrangements.
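The abstract above compares assemblies by their N50 contig length. As a reminder of what that statistic measures, here is a minimal sketch of the usual N50 definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size. The function name `n50` and the toy contig lengths are illustrative assumptions, not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    # Walk the contigs from longest to shortest.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half of the
        # assembly size defines the N50.
        if running >= half_total:
            return length


# Toy assembly (hypothetical lengths, total 300): 80 + 70 = 150 reaches half.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70
```

A higher N50 at a similar total length indicates a more contiguous assembly, which is the sense in which the 1.22 Mb MinION figure is contrasted with the ∼26 kb Illumina figure.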


Whole-genome sequencing of 182 Bursaphelenchus xylophilus strains generates first long read based de novo genome assembly and reveals temperature associated population structure

10.22541/au.159352211.19983305 ◽

2020 ◽

Author(s):

Xiaolei Ding ◽

Yunfei Guo ◽

Jianren Ye ◽

Xiaoqin Wu ◽

Sixi Lin ◽

...

Keyword(s):

Population Structure ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genome Assembly ◽

De Novo ◽

Bursaphelenchus Xylophilus ◽

Whole Genome ◽

De Novo Genome Assembly ◽

Long Read


PlasmidSeeker: identification of known plasmids from bacterial whole genome sequencing reads

PeerJ ◽

10.7717/peerj.4588 ◽

2018 ◽

Vol 6 ◽

pp. e4588 ◽

Cited By ~ 26

Author(s):

Märt Roosaare ◽

Mikk Puustusmaa ◽

Märt Möls ◽

Mihkel Vaher ◽

Maido Remm

Keyword(s):

Antibiotic Resistance ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

De Novo ◽

Sequence Data ◽

Source Code ◽

Complex Problem ◽

Whole Genome ◽

Short Read ◽

Plasmid Sequence

Background Plasmids play an important role in the dissemination of antibiotic resistance, making their detection an important task. Using whole genome sequencing (WGS), it is possible to capture both bacterial and plasmid sequence data, but short read lengths make plasmid detection a complex problem. Results We developed a tool named PlasmidSeeker that enables the detection of plasmids from bacterial WGS data without read assembly. The PlasmidSeeker algorithm is based on k-mers and uses k-mer abundance to distinguish between plasmid and bacterial sequences. We tested the performance of PlasmidSeeker on a set of simulated and real bacterial WGS samples, resulting in 100% sensitivity and 99.98% specificity. Conclusion PlasmidSeeker enables quick detection of known plasmids and complements existing tools that assemble plasmids de novo. The PlasmidSeeker source code is stored on GitHub: https://github.com/bioinfo-ut/PlasmidSeeker.
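The general idea of using k-mer abundance to separate plasmid from chromosomal sequence can be illustrated with a toy sketch. This is not PlasmidSeeker's implementation: the function names, the choice of k = 5, the median-ratio heuristic and the toy sequences are all assumptions made for illustration, and reverse complements and sequencing errors are ignored.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def count_read_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def median_abundance(reference, read_counts, k):
    """Median read-derived abundance of the k-mers of a reference sequence."""
    return median(read_counts.get(km, 0) for km in kmers(reference, k))


def relative_copy_number(candidate, chromosome, reads, k=5):
    """Ratio of candidate k-mer abundance to chromosomal k-mer abundance."""
    counts = count_read_kmers(reads, k)
    baseline = median_abundance(chromosome, counts, k)
    if baseline == 0:
        raise ValueError("chromosome k-mers not observed in the reads")
    return median_abundance(candidate, counts, k) / baseline


# Hypothetical toy data: the "plasmid" is sequenced three times as deeply.
chromosome = "ACGTTGCAAGGCTA"
plasmid = "TTTCCCGGGAAATC"
reads = [chromosome] * 2 + [plasmid] * 6
print(relative_copy_number(plasmid, chromosome, reads))  # 3.0
```

In this sketch a ratio well above 1 suggests a multi-copy element present in the sample, while a ratio near 0 suggests the candidate is absent. Real reads would need a much larger k and canonical (strand-independent) k-mers.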


Deciphering complex genome rearrangements in C. elegans using short-read whole genome sequencing

Scientific Reports ◽

10.1038/s41598-021-97764-9 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Tatiana Maroilley ◽

Xiao Li ◽

Matthew Oldach ◽

Francesca Jean ◽

Susan J. Stasiuk ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Chromosomal Rearrangements ◽

Large Deletion ◽

Genomic Rearrangements ◽

Model Organisms ◽

Whole Genome ◽

Short Read ◽

C Elegans ◽

Long Read

Abstract Genomic rearrangements cause congenital disorders, cancer, and complex diseases in humans. Yet, they are still understudied in rare diseases because their detection is challenging, despite the advent of whole genome sequencing (WGS) technologies. Short-read (srWGS) and long-read WGS approaches are regularly compared, and the latter is commonly recommended in studies focusing on genomic rearrangements. However, srWGS is currently the most economical, accurate, and widely supported technology. In Caenorhabditis elegans (C. elegans), such variants, induced by various mutagenesis processes, have been used for decades to balance large genomic regions by preventing chromosomal crossover events and allowing the maintenance of lethal mutations. Interestingly, those chromosomal rearrangements have rarely been characterized on a molecular level. To evaluate the ability of srWGS to detect various types of complex genomic rearrangements, we sequenced three balancer strains using short-read Illumina technology. As we experimentally validated the breakpoints uncovered by srWGS, we showed that, by combining several types of analyses, srWGS enables the detection of a reciprocal translocation (eT1), a free duplication (sDp3), a large deletion (sC4), and chromoanagenesis events. Thus, applying srWGS to decipher real complex genomic rearrangements in model organisms may help design efficient bioinformatics pipelines for the systematic detection of complex rearrangements in human genomes.


First de novo whole genome sequencing and assembly of the bar-headed goose

PeerJ ◽

10.7717/peerj.8914 ◽

2020 ◽

Vol 8 ◽

pp. e8914 ◽

Cited By ~ 1

Author(s):

Wen Wang ◽

Fang Wang ◽

Rongkai Hao ◽

Aizhen Wang ◽

Kirill Sharshov ◽

...

Keyword(s):

High Altitude ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genome Assembly ◽

De Novo ◽

Gene Prediction ◽

Repetitive Sequences ◽

Gene Families ◽

Whole Genome ◽

Sequencing Data

Background The bar-headed goose (Anser indicus) mainly inhabits the plateau wetlands of Asia. As a specialized high-altitude species, bar-headed geese can migrate between South and Central Asia and annually fly twice over the Himalayan mountains along the central Asian flyway. The physiological, biochemical and behavioral adaptations of bar-headed geese to high-altitude living and flying have raised much interest. However, to date, there is still no genome assembly information publicly available for bar-headed geese. Methods In this study, we present the first de novo whole genome sequencing and assembly of the bar-headed goose, along with gene prediction and annotation. Results 10X Genomics sequencing produced a total of 124 Gb of sequencing data, corresponding to an average coverage of 103× of the estimated bar-headed goose genome size. The genome assembly comprised 10,528 scaffolds, with a total length of 1.143 Gb and a scaffold N50 of 10.09 Mb. Annotation of the bar-headed goose genome assembly identified a total of 102 Mb (8.9%) of repetitive sequences, 16,428 protein-coding genes, and 282 tRNAs. In total, we determined that there were 63 expanded and 20 contracted gene families in the bar-headed goose compared with the other 15 vertebrates. We also performed a positive selection analysis between the bar-headed goose and the closely related low-altitude goose, swan goose (Anser cygnoides), to uncover its genetic adaptations to the Qinghai-Tibetan Plateau. Conclusion We report the most complete genome sequence of the bar-headed goose to date. Our assembly will provide a valuable resource to enhance further studies of the gene functions of the bar-headed goose. The data will also be valuable for facilitating studies of the evolution, population genetics and high-altitude adaptations of bar-headed geese at the genomic level.
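The coverage and repeat-content figures in this abstract follow from simple ratios. The sketch below reproduces that arithmetic from the reported numbers. The helper names are hypothetical, and the implied genome size of about 1.20 Gb is an inference from 124 Gb at 103× coverage rather than a value stated in the abstract.

```python
GB = 1e9  # bases per gigabase
MB = 1e6  # bases per megabase


def mean_coverage(total_bases, genome_size):
    """Average sequencing depth: sequenced bases divided by genome size."""
    return total_bases / genome_size


def implied_genome_size(total_bases, coverage):
    """Genome size implied by a data volume and its reported coverage."""
    return total_bases / coverage


# Reported: 124 Gb of data at 103x average coverage.
estimated_size = implied_genome_size(124 * GB, 103)
print(round(estimated_size / GB, 2))  # 1.2 (i.e. about 1.20 Gb)

# Reported: 102 Mb of repeats in a 1.143 Gb assembly.
repeat_fraction = (102 * MB) / (1.143 * GB)
print(round(repeat_fraction * 100, 1))  # 8.9 (percent)
```

Note that the repeat percentage is relative to the assembled length (1.143 Gb), not to the larger genome size implied by the coverage figure.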


Whole-Genome Sequencing for Comparative Genomics and De Novo Genome Assembly

Methods in Molecular Biology - Mycobacteria Protocols ◽

10.1007/978-1-4939-2450-9_1 ◽

2015 ◽

pp. 1-16 ◽

Cited By ~ 5

Author(s):

Andrej Benjak ◽

Claudia Sala ◽

Ruben C. Hartkoorn

Keyword(s):

Comparative Genomics ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genome Assembly ◽

De Novo ◽

Whole Genome ◽

De Novo Genome Assembly


Effective variant filtering and expected candidate variant yield in studies of rare human disease

npj Genomic Medicine ◽

10.1038/s41525-021-00227-3 ◽

2021 ◽

Vol 6 (1) ◽

Author(s):

Brent S. Pedersen ◽

Joe M. Brown ◽

Harriet Dashnow ◽

Amelia D. Wallace ◽

Matt Velinder ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Rare Disease ◽

Genome Sequencing ◽

Autosomal Dominant ◽

De Novo ◽

Autosomal Dominant Inheritance ◽

Compound Heterozygous ◽

Whole Genome ◽

Dominant Inheritance ◽

Family Based

Abstract In studies of families with rare disease, it is common to screen for de novo mutations, as well as recessive or dominant variants that explain the phenotype. However, the filtering strategies and software used to prioritize high-confidence variants vary from study to study. In an effort to establish recommendations for rare disease research, we explore effective guidelines for variant (SNP and INDEL) filtering and report the expected number of candidates for de novo dominant, recessive, and autosomal dominant modes of inheritance. We derived these guidelines using two large family-based cohorts that underwent whole-genome sequencing, as well as two family cohorts with whole-exome sequencing. The filters are applied to common attributes, including genotype-quality, sequencing depth, allele balance, and population allele frequency. The resulting guidelines yield ~10 candidate SNP and INDEL variants per exome, and 18 per genome for recessive and de novo dominant modes of inheritance, with substantially more candidates for autosomal dominant inheritance. For family-based, whole-genome sequencing studies, this number includes an average of three de novo, ten compound heterozygous, one autosomal recessive, four X-linked variants, and roughly 100 candidate variants following autosomal dominant inheritance. The slivar software we developed to establish and rapidly apply these filters to VCF files is available at https://github.com/brentp/slivar under an MIT license, and includes documentation and recommendations for best practices for rare disease analysis.
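The four attributes named in this abstract (genotype quality, sequencing depth, allele balance and population allele frequency) can be combined into a simple per-call filter. The sketch below shows the shape of such a filter for a heterozygous call in plain Python. It does not use slivar or its expression syntax, and every threshold is a placeholder assumption, not a value recommended by the paper.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """One sample's genotype call at a variant site (hypothetical record)."""
    genotype_quality: int
    depth: int            # total reads covering the site
    alt_reads: int        # reads supporting the alternate allele
    population_af: float  # allele frequency in a reference population

    @property
    def allele_balance(self):
        """Fraction of reads supporting the alternate allele."""
        return self.alt_reads / self.depth if self.depth else 0.0


def passes_het_filters(call, min_gq=20, min_depth=10,
                       ab_range=(0.2, 0.8), max_af=0.001):
    """Apply the four filters; all thresholds here are illustrative only."""
    low, high = ab_range
    return (call.genotype_quality >= min_gq
            and call.depth >= min_depth
            and low <= call.allele_balance <= high
            and call.population_af <= max_af)


print(passes_het_filters(Call(60, 30, 14, 0.0)))   # True
print(passes_het_filters(Call(60, 30, 2, 0.0)))    # False: skewed allele balance
print(passes_het_filters(Call(60, 30, 15, 0.05)))  # False: too common
```

In a family-based analysis the same checks would be applied to each family member, with the expected genotype pattern depending on the inheritance mode being tested.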


A study of transposable element-associated structural variations (TASVs) using a de novo-assembled Korean genome

Experimental & Molecular Medicine ◽

10.1038/s12276-021-00586-y ◽

2021 ◽

Author(s):

Seyoung Mun ◽

Songmi Kim ◽

Wooseok Lee ◽

Keunsoo Kang ◽

Thomas J. Meyer ◽

...

Keyword(s):

Genome Sequencing ◽

Genome Assembly ◽

De Novo ◽

Personal Genome ◽

Human Populations ◽

Whole Genome ◽

Structural Variations ◽

Insert Size ◽

Human Genomes ◽

Next Generation Sequencing Ngs

Abstract Advances in next-generation sequencing (NGS) technology have made personal genome sequencing possible, and indeed, many individual human genomes have now been sequenced. Comparisons of these individual genomes have revealed substantial genomic differences between human populations as well as between individuals from closely related ethnic groups. Transposable elements (TEs) are known to be one of the major sources of these variations and act through various mechanisms, including de novo insertion, insertion-mediated deletion, and TE–TE recombination-mediated deletion. In this study, we carried out de novo whole-genome sequencing of one Korean individual (KPGP9) via multiple insert-size libraries. The de novo whole-genome assembly resulted in 31,305 scaffolds with a scaffold N50 size of 13.23 Mb. Furthermore, through computational data analysis and experimental verification, we revealed that 182 TE-associated structural variation (TASV) insertions and 89 TASV deletions contributed 64,232 bp in sequence gain and 82,772 bp in sequence loss, respectively, in the KPGP9 genome relative to the hg19 reference genome. We also verified structural differences associated with TASVs by comparative analysis with TASVs in recent genomes (AK1 and TCGA genomes) and reported their details. Here, we constructed a new Korean de novo whole-genome assembly and provide the first study, to our knowledge, focused on the identification of TASVs in an individual Korean genome. Our findings again highlight the role of TEs as a major driver of structural variations in human individual genomes.


Fast genetic mapping using insertion-deletion polymorphisms in Caenorhabditis elegans

Scientific Reports ◽

10.1038/s41598-021-90190-x ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Ho-Yon Hwang ◽

Jiou Wang

Keyword(s):

Caenorhabditis Elegans ◽

Genetic Mapping ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genetic Material ◽

Mapping Method ◽

Forward Genetics ◽

Whole Genome ◽

Nucleotide Polymorphisms ◽

Large Populations

Abstract Genetic mapping is used in forward genetics to narrow the list of candidate mutations and genes corresponding to the mutant phenotype of interest. Even with modern advances in biology such as efficient identification of candidate mutations by whole-genome sequencing, mapping remains critical in pinpointing the responsible mutation. Here we describe a simple, fast, and affordable mapping toolkit that is particularly suitable for mapping in Caenorhabditis elegans. This mapping method uses insertion-deletion polymorphisms, or indels, that can be easily detected, instead of single nucleotide polymorphisms, in the commonly used Hawaiian CB4856 mapping strain. The materials and methods were optimized so that mapping could be performed using a tiny amount of genetic material without growing many large populations of mutants for DNA purification. We performed mapping of previously known and unknown mutations to show the strengths and weaknesses of this method and to present examples of completed mapping. For situations where Hawaiian CB4856 is unsuitable, we provide an annotated list of indels as a basis for fast and easy mapping using other wild isolates. Finally, we provide a rationale for using this mapping method over other alternatives as a part of a comprehensive strategy also involving whole-genome sequencing and other methods.


A long reads-based de-novo assembly of the genome of the Arlee homozygous line reveals chromosomal rearrangements in rainbow trout

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab052 ◽

2021 ◽

Author(s):

Guangtu Gao ◽

Susana Magadan ◽

Geoffrey C Waldbieser ◽

Ramey C Youngblood ◽

Paul A Wheeler ◽

...

Keyword(s):

Rainbow Trout ◽

Chromosome Number ◽

Genome Assembly ◽

De Novo Assembly ◽

De Novo ◽

Sequence Data ◽

Structural Variations ◽

High Coverage ◽

Haploid Chromosome Number ◽

Long Reads

Abstract Currently, there is still a need to improve the contiguity of the rainbow trout reference genome and to use multiple genetic backgrounds that will represent the genetic diversity of this species. The Arlee doubled haploid line originated from a domesticated hatchery strain that was originally collected from the northern California coast. The Canu pipeline was used to generate the Arlee line genome de-novo assembly from high coverage PacBio long-read sequence data. The assembly was further improved with Bionano optical maps and Hi-C proximity ligation sequence data to generate 32 major scaffolds corresponding to the karyotype of the Arlee line (2N = 64). It is composed of 938 scaffolds with N50 of 39.16 Mb and a total length of 2.33 Gb, of which ∼95% was in 32 chromosome sequences with only 438 gaps between contigs and scaffolds. In rainbow trout the haploid chromosome number can vary from 29 to 32. In the Arlee karyotype the haploid chromosome number is 32 because chromosomes Omy04, 14 and 25 are divided into six acrocentric chromosomes. Additional structural variations that were identified in the Arlee genome included major inversions on chromosomes Omy05 and Omy20 and 15 additional smaller inversions that will require further validation. This is also the first rainbow trout genome assembly that includes a scaffold with the sex-determination gene (sdY) in the chromosome Y sequence. The utility of this genome assembly is demonstrated through the improved annotation of the duplicated genome loci that harbor the IGH genes on chromosomes Omy12 and Omy13.


Norgal: extraction and de novo assembly of mitochondrial DNA from whole-genome sequencing data

BMC Bioinformatics ◽

10.1186/s12859-017-1927-y ◽

2017 ◽

Vol 18 (1) ◽

Cited By ~ 21

Author(s):

Kosai Al-Nakeeb ◽

Thomas Nordahl Petersen ◽

Thomas Sicheritz-Pontén

Keyword(s):

Mitochondrial Dna ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

De Novo Assembly ◽

De Novo ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data
