Using Short Read Sequencing to Characterise Balanced Reciprocal Translocations in Pigs

Mapping Intimacies

10.21203/rs.3.rs-28830/v1

2020

Author(s): Aniek Cornelia Bouwman, Martijn F.L. Derks, Marleen L.W.J. Broekhuijse, Barbara Harlizius, Roel F. Veerkamp

Keyword(s): Sequence Data, Variant Calling, Reciprocal Translocations, Short Read, Short Read Sequencing, Long Read, Short Read Sequence, Staining Techniques, Chromosome Staining, Paired End Sequencing

Abstract

Background: A balanced constitutional reciprocal translocation (RT) is a mutual exchange of terminal segments between two non-homologous chromosomes in germline cells, without any loss or gain of DNA. Carriers of balanced RTs are viable and show no apparent phenotypic effects. However, they produce unbalanced gametes and therefore have reduced fertility and offspring with congenital abnormalities. This cytogenetic abnormality is usually detected with chromosome staining techniques. The aim of this study was to test whether paired-end short read sequencing can be used to detect balanced RTs in boars, and to investigate their breakpoints and junctions.

Results: In a blinded analysis using the structural variant caller DELLY, balanced RTs were recovered in 6 of the 7 carriers from 30-fold paired-end short read sequencing. No RTs were detected in the 15 non-carriers. Reducing the coverage to 20-, 15- and 10-fold showed that at least 20-fold coverage is required to obtain good results. One RT was not detected in the blinded screening; however, a highly likely RT was discovered after unblinding. This RT was located in a repetitive region, illustrating the limitations of short read sequence data. Detailed analysis of the breakpoints and junctions suggested three junctions with microhomology, three with blunt-end ligation, and three with micro-insertions. The detected RTs were also found to disrupt genes.

Conclusions: We conclude that paired-end short read sequence data can be used to detect and characterise balanced reciprocal translocations, provided the sequencing depth is at least 20-fold. However, translocations in repetitive regions may require large fragments or even long read sequence data.
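The three junction types named in the abstract (microhomology, blunt-end ligation, micro-insertion) can be told apart by comparing a junction-spanning sequence with the two reference sequences it joins. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes both partner chromosomes are on the same strand, that `ref_a` is aligned to the start of the junction sequence and continues past the breakpoint, and that `ref_b` is aligned to its end and starts before the breakpoint. The function names are hypothetical.

```python
from typing import Tuple


def common_prefix_len(x: str, y: str) -> int:
    """Number of leading bases shared by x and y."""
    n = 0
    for a, b in zip(x, y):
        if a != b:
            break
        n += 1
    return n


def classify_junction(contig: str, ref_a: str, ref_b: str) -> Tuple[str, int]:
    """Classify a translocation junction (hypothetical helper).

    contig: sequence spanning the junction (chromosome A part, then B part).
    ref_a:  chromosome A reference aligned to contig[0], continuing past the breakpoint.
    ref_b:  chromosome B reference aligned to contig[-1], starting before the breakpoint.
    Assumption: both partners are on the same strand (no reverse complement).
    """
    # Bases explained by chromosome A, reading left to right.
    from_a = common_prefix_len(contig, ref_a)
    # Bases explained by chromosome B, reading right to left.
    from_b = common_prefix_len(contig[::-1], ref_b[::-1])
    overlap = from_a + from_b - len(contig)
    if overlap > 0:
        # Some bases are explained by both partners: microhomology.
        return "microhomology", overlap
    if overlap == 0:
        # The two partners abut exactly: blunt-end ligation.
        return "blunt-end", 0
    # Some bases are explained by neither partner: micro-insertion.
    return "micro-insertion", -overlap


print(classify_junction("AAAACGGGG", "AAAACTTTT", "TTTTCGGGG"))  # ('microhomology', 1)
```

Real junctions would first need a local assembly or split-read alignment to obtain `contig`, and inverted partners would need reverse-complementing; both steps are outside this sketch.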

Using short read sequencing to characterise balanced reciprocal translocations in pigs

10.21203/rs.3.rs-28830/v3

2020

Author(s): Aniek C. Bouwman, Martijn F.L. Derks, Marleen L.W.J. Broekhuijse, Barbara Harlizius, Roel F. Veerkamp

Keyword(s): Sequence Data, Variant Calling, Reciprocal Translocations, Short Read, Short Read Sequencing, Long Read, Short Read Sequence, Staining Techniques, Chromosome Staining, Paired End Sequencing

Using short read sequencing to characterise balanced reciprocal translocations in pigs

10.21203/rs.3.rs-28830/v2

2020

Author(s): Aniek C. Bouwman, Martijn F.L. Derks, Marleen L.W.J. Broekhuijse, Barbara Harlizius, Roel F. Veerkamp

Keyword(s): Sequence Data, Variant Calling, Reciprocal Translocations, Short Read, Short Read Sequencing, Long Read, Short Read Sequence, Staining Techniques, Chromosome Staining, Paired End Sequencing

REscan: inferring repeat expansions and structural variation in paired-end short read sequencing data

Bioinformatics

10.1093/bioinformatics/btaa753

2020

Author(s): Russell Lewis McLaughlin

Keyword(s): Structural Variation, Sequence Data, Neurological Diseases, Repeat Expansion, Sequencing Data, Short Read, Short Read Sequencing, Repeat Expansions, Paired End Sequencing

Abstract

Motivation: Repeat expansions are an important class of genetic variation in neurological diseases. However, identifying novel repeat expansions with conventional sequencing methods is challenging, because expansions are typically long relative to short sequence reads and because repetitive sequence is difficult to align accurately and uniquely. This latter property can nevertheless be harnessed in paired-end sequencing data to infer the possible locations of repeat expansions and other structural variation.

Results: This article presents REscan, a command-line utility that infers repeat expansion loci from paired-end short read sequencing data by reporting the proportion of reads oriented towards a locus that do not have an adequately mapped mate. A high REscan statistic relative to a population of samples suggests a repeat expansion locus for experimental follow-up. The approach is validated using genome sequence data for 259 cases of amyotrophic lateral sclerosis, 24 of which carry a large repeat expansion in C9orf72, showing that REscan statistics readily discriminate repeat expansion carriers from non-carriers.

Availability and implementation: C source code at https://github.com/rlmcl/rescan (GNU General Public Licence v3).
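The REscan statistic described above is a simple proportion, which the following sketch computes for one locus. It is a minimal Python illustration, not REscan's C implementation: the `Read` record, the 500 bp window and the MAPQ cutoff of 20 are assumptions chosen for the example, and "oriented towards a locus" is taken to mean forward-strand reads upstream and reverse-strand reads downstream.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Read:
    pos: int                  # leftmost aligned position of this read
    is_reverse: bool          # True if aligned to the reverse strand
    mate_mapq: Optional[int]  # mapping quality of the mate; None if unmapped


def rescan_like_statistic(reads: Sequence[Read], locus: int,
                          window: int = 500, min_mate_mapq: int = 20) -> float:
    """Fraction of locus-facing reads whose mate is unmapped or poorly mapped.

    window and min_mate_mapq are hypothetical defaults, not REscan's.
    """
    facing = 0
    poorly_mated = 0
    for r in reads:
        # Forward reads upstream and reverse reads downstream point at the locus.
        upstream = locus - window <= r.pos < locus and not r.is_reverse
        downstream = locus < r.pos <= locus + window and r.is_reverse
        if not (upstream or downstream):
            continue
        facing += 1
        if r.mate_mapq is None or r.mate_mapq < min_mate_mapq:
            poorly_mated += 1
    # No locus-facing reads: the statistic is undefined.
    return poorly_mated / facing if facing else math.nan
```

As in the abstract, a single value is only informative relative to the same statistic computed at that locus across a population of samples; a repeat expansion carrier would stand out as an outlier.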

Rapid Mycobacterium tuberculosis spoligotyping from uncorrected long reads using Galru

10.1101/2020.05.31.126490

2020

Author(s): Andrew J. Page, Nabil-Fareed Alikhan, Michael Strinden, Thanh Le Viet, Timofey Skvortsov

Keyword(s): Mycobacterium Tuberculosis, State Of The Art, Sequence Data, Human Pathogen, Sequencing Data, Short Read, Short Read Sequencing, Long Reads, Long Read

Abstract

Spoligotyping of Mycobacterium tuberculosis provides a subspecies classification of this major human pathogen. Spoligotypes can be predicted from short read genome sequencing data; however, no methods exist for long read sequence data such as that from Nanopore or PacBio. We present Galru, a novel software package that can rapidly detect the spoligotype of a Mycobacterium tuberculosis sample from as little as a single uncorrected long read. It allows near real-time spoligotyping from long read data as they are being sequenced, enabling rapid sample typing. We compare it with existing state-of-the-art software and find that its results are identical to those obtained from short read sequencing data. Galru is freely available from https://github.com/quadram-institute-bioscience/galru under the GPLv3 open source licence.
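A spoligotype is a presence/absence pattern over the 43 spacers of the M. tuberculosis direct repeat locus, and it is conventionally condensed into a 15-digit octal code. The sketch below shows only that encoding step, assuming the spacer pattern has already been determined; it is not Galru's code, and `spoligotype_octal` is a hypothetical name.

```python
def spoligotype_octal(pattern: str) -> str:
    """Convert a 43-spacer presence/absence string ('1'/'0') to a 15-digit octal code."""
    if len(pattern) != 43 or set(pattern) - {"0", "1"}:
        raise ValueError("expected 43 characters, each '0' or '1'")
    # Spacers 1-42 are read in groups of three; each group gives one octal digit.
    digits = [str(int(pattern[i:i + 3], 2)) for i in range(0, 42, 3)]
    # Spacer 43 is appended on its own as the final digit (0 or 1).
    digits.append(pattern[42])
    return "".join(digits)


print(spoligotype_octal("1" * 43))  # 777777777777771
```

The hard part that Galru addresses, deciding which spacers are present from a noisy uncorrected long read, happens before this step and is not shown.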

Haplotype-aware variant calling enables high accuracy in nanopore long-reads using deep neural networks

10.1101/2021.03.04.433952

2021

Author(s): Kishwar Shafin, Trevor Pesout, Pi-Chuan Chang, Maria Nattestad, Alexey Kolesnikov, ...

Keyword(s): De Novo, Sequence Data, Variant Calling, High Accuracy, Superior Performance, Read Length, Single Nucleotide Variants, Single Nucleotide, Short Read, Long Read

Long-read sequencing has the potential to transform variant detection by reaching currently difficult-to-map regions and by routinely linking adjacent variants to enable read-based phasing. Third-generation nanopore sequencing produces long reads, but current methods for interpreting its novel pore-based signal have unique error profiles, making accurate analysis challenging. Here, we introduce PEPPER-Margin-DeepVariant, a haplotype-aware variant calling pipeline that produces state-of-the-art variant calling results with nanopore data. We show that our nanopore-based method outperforms the short-read-based single nucleotide variant identification method at whole-genome scale and produces high-quality single nucleotide variants in segmental duplications and low-mappability regions where short-read-based genotyping fails. We show that our pipeline can provide highly contiguous phase blocks across the genome with nanopore reads, contiguously spanning between 85% and 92% of annotated genes across six samples. We also extend PEPPER-Margin-DeepVariant to PacBio HiFi data, providing an efficient solution with performance superior to the current WhatsHap-DeepVariant standard. Finally, we demonstrate de novo assembly polishing methods that use nanopore and PacBio HiFi reads to produce diploid assemblies with high accuracy (Q35+ nanopore-polished and Q40+ PacBio-HiFi-polished).
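The "85% to 92% of annotated genes" figure above measures how many genes lie entirely within one phase block, so that all variants in the gene are phased relative to each other. A minimal sketch of that metric for a single chromosome, assuming sorted, non-overlapping half-open intervals, is shown below; it is an illustration, not the authors' evaluation code, and the function name is hypothetical.

```python
from bisect import bisect_right
from typing import Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def fraction_genes_in_one_block(blocks: Sequence[Interval],
                                genes: Sequence[Interval]) -> float:
    """Fraction of genes lying entirely inside a single phase block.

    Assumes `blocks` is sorted by start and non-overlapping.
    """
    if not genes:
        return 0.0
    starts = [s for s, _ in blocks]
    spanned = 0
    for g_start, g_end in genes:
        # Index of the last block that starts at or before the gene start.
        i = bisect_right(starts, g_start) - 1
        if i >= 0 and blocks[i][1] >= g_end:
            spanned += 1
    return spanned / len(genes)
```

Binary search keeps the lookup at O(log n) per gene, which matters when comparing genome-wide phase block sets against tens of thousands of annotated genes.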

Genomic analysis of carbapenemase-encoding plasmids from Klebsiella pneumoniae across Europe highlights three major patterns of dissemination

10.1101/2019.12.19.873935

2019

Cited By ~ 3

Author(s): Sophia David, Victoria Cohen, Sandra Reuter, Anna E. Sheppard, Tommaso Giani, ...

Keyword(s): Klebsiella Pneumoniae, Sequence Data, Genomic Analysis, Carbapenem Resistance, Short Read, Primary Mechanism, Carbapenemase Gene, Long Read, Short Read Sequence, Stable Association

Abstract

The incidence of Klebsiella pneumoniae infections resistant to carbapenems, a last-line class of antibiotics, has been increasing rapidly. The primary mechanism of carbapenem resistance is the production of carbapenemase enzymes, which are most frequently encoded on plasmids by blaOXA-48-like, blaVIM, blaNDM and blaKPC genes. Using short-read sequence data, we previously analysed the genomes of 1717 isolates from the K. pneumoniae species complex submitted during the European survey of carbapenemase-producing Enterobacteriaceae (EuSCAPE). Here, we investigated the diversity, prevalence and transmission dynamics of carbapenemase-encoding plasmids using long-read sequencing of representative isolates (n=79) from this collection, combined with short-read data from all isolates. We highlight three major patterns by which carbapenemase genes have disseminated via plasmids. First, blaOXA-48-like genes have spread across diverse lineages primarily via a highly conserved, epidemic pOXA-48-like plasmid. Second, blaVIM and blaNDM genes have spread via transient associations of diverse plasmids with numerous lineages. Third, blaKPC genes have been transmitted predominantly through stable association with one clonal lineage (ST258/512), despite frequent mobilisation between pre-existing yet diverse plasmids within that lineage. Although these three modes of carbapenemase gene spread contrast with one another, and can be summarised as one plasmid/multiple lineages, multiple plasmids/multiple lineages, and multiple plasmids/one lineage, all are underpinned by significant propagation along high-risk clonal lineages.
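The three dissemination modes summarised above can be read as a two-way classification: is a gene concentrated on one plasmid, and is it concentrated in one lineage? The sketch below makes that reading concrete for (plasmid, lineage) pairs observed for one gene. It is a toy illustration, not the study's method: the 0.8 dominance threshold, the function name and the input labels are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def dissemination_pattern(observations: Iterable[Tuple[str, str]],
                          dominance: float = 0.8) -> str:
    """Label how one carbapenemase gene spreads, from (plasmid, lineage) pairs.

    A single plasmid or lineage "dominates" when it accounts for at least
    `dominance` of the observations (a hypothetical threshold).
    """
    obs = list(observations)
    if not obs:
        raise ValueError("no observations")
    n = len(obs)
    top_plasmid = Counter(p for p, _ in obs).most_common(1)[0][1] / n
    top_lineage = Counter(lin for _, lin in obs).most_common(1)[0][1] / n
    one_plasmid = top_plasmid >= dominance
    one_lineage = top_lineage >= dominance
    if one_plasmid and not one_lineage:
        return "one plasmid/multiple lineages"
    if one_lineage and not one_plasmid:
        return "multiple plasmids/one lineage"
    if not one_plasmid and not one_lineage:
        return "multiple plasmids/multiple lineages"
    return "one plasmid/one lineage"


# Hypothetical labels shaped like the blaKPC pattern: many plasmids, one lineage.
print(dissemination_pattern([(f"p{i}", "ST258/512") for i in range(10)]))
```

Real plasmid "identity" would come from clustering long-read plasmid assemblies, which this sketch takes as given.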

Paragraph: a graph-based structural variant genotyper for short-read sequence data

Genome Biology

10.1186/s13059-019-1909-7

2019

Vol 20 (1)

Cited By ~ 19

Author(s): Sai Chen, Peter Krusche, Egor Dolzhenko, Rachel M. Sherman, Roman Petrovski, ...

Keyword(s): Sequence Data, Whole Genome Sequence, Whole Genome, Structural Variations, Short Read, Three Samples, Genomics Research, Long Read, Short Read Sequence, Population Scale

Abstract

Accurate detection and genotyping of structural variations (SVs) from short-read data is a long-standing area of development in genomics research and clinical sequencing pipelines. We introduce Paragraph, an accurate genotyper that models SVs using sequence graphs and SV annotations. We demonstrate the accuracy of Paragraph on whole-genome sequence data from three samples, using long-read SV calls as the truth set, and then apply Paragraph at scale to a cohort of 100 short-read sequenced samples of diverse ancestry. Our analysis shows that Paragraph is more accurate than other existing genotypers and can be applied to population-scale studies.
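To make "models SVs using sequence graphs" concrete, the toy below represents a deletion as a two-path graph (left flank, deleted sequence, right flank, plus a left-to-right edge that skips the deletion), assigns reads to a path, and picks a diploid genotype by maximum likelihood. This is a deliberately simplified sketch and not Paragraph's algorithm: exact substring matching stands in for graph alignment, the per-read error model is an assumption, and all names are hypothetical.

```python
import math
from typing import Iterable, Tuple


def genotype_deletion(left: str, deleted: str, right: str,
                      reads: Iterable[str], error: float = 0.01) -> Tuple[str, int, int]:
    """Toy genotyper for a deletion modelled as a two-path sequence graph.

    Returns (genotype, reads supporting reference, reads supporting deletion).
    `error` must lie strictly between 0 and 1.
    """
    ref_path = left + deleted + right  # path through the deleted node
    alt_path = left + right            # path over the deletion edge
    ref_n = alt_n = 0
    for read in reads:
        in_ref = read in ref_path
        in_alt = read in alt_path
        if in_ref and not in_alt:
            ref_n += 1
        elif in_alt and not in_ref:
            alt_n += 1
        # Reads matching both paths (or neither) carry no allele information.
    best_gt, best_ll = "0/0", -math.inf
    for gt, alt_fraction in (("0/0", 0.0), ("0/1", 0.5), ("1/1", 1.0)):
        # Probability that an informative read supports the deletion path.
        p_alt = alt_fraction * (1 - error) + (1 - alt_fraction) * error
        ll = alt_n * math.log(p_alt) + ref_n * math.log(1 - p_alt)
        if ll > best_ll:
            best_gt, best_ll = gt, ll
    return best_gt, ref_n, alt_n
```

With five reads spanning the deletion junction and five reads inside the deleted sequence, the heterozygous genotype "0/1" has the highest likelihood; reads lying wholly in a flank are ignored because both paths contain them.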

Paragraph: A graph-based structural variant genotyper for short-read sequence data

10.1101/635011

2019

Cited By ~ 5

Author(s): Sai Chen, Peter Krusche, Egor Dolzhenko, Rachel M. Sherman, Roman Petrovski, ...

Keyword(s): Sequence Data, Whole Genome Sequence, Whole Genome, Structural Variations, Short Read, Three Samples, Genomics Research, Long Read, Short Read Sequence, Population Scale

Biosynthetic potential of uncultured Antarctic soil bacteria revealed through long-read metagenomic sequencing

The ISME Journal

10.1038/s41396-021-01052-3

2021

Author(s): Valentin Waschulin, Chiara Borsetto, Robert James, Kevin K. Newsham, Stefano Donadio, ...

Keyword(s): Genome Mining, Gene Clusters, Biosynthetic Gene Cluster, Full Length, Metagenomic Sequencing, Short Read, Short Read Sequencing, Rich Diversity, Long Read, The Rich

Abstract

The growing problem of antibiotic resistance has led to the exploration of uncultured bacteria as potential sources of new antimicrobials. PCR amplicon analyses and short-read sequencing studies of samples from different environments have reported evidence of high biosynthetic gene cluster (BGC) diversity in metagenomes, indicating their potential for producing novel and useful compounds. However, recovering full-length BGC sequences from uncultivated bacteria remains a challenge due to the technological constraints of short-read sequencing, which makes BGC diversity difficult to assess. Here, long-read sequencing and genome mining were used to recover >1400 mostly full-length BGCs that demonstrate the rich diversity of BGCs from uncultivated lineages present in soil from Mars Oasis, Antarctica. Many highly divergent BGCs were found not only in the phyla Acidobacteriota, Verrucomicrobiota and Gemmatimonadota but also in the actinobacterial classes Acidimicrobiia and Thermoleophilia and the gammaproteobacterial order UBA7966. The latter also contained a potential novel family of RiPPs. Our findings underline the biosynthetic potential of underexplored phyla, as well as of unexplored lineages within seemingly well-studied producer phyla. They also showcase long-read metagenomic sequencing as a promising way to access the untapped genetic reservoir of specialised metabolite gene clusters in the uncultured majority of microbes.
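Short-read assemblies tend to break inside long, repetitive BGCs, so a cluster that runs into a contig boundary may be truncated. One rough proxy for "full-length" is therefore the share of clusters that do not touch a contig end. The sketch below computes that proxy; it is an assumption-laden illustration rather than the authors' method, and the tuple layout, `margin` parameter and function names are hypothetical.

```python
from typing import Sequence, Tuple

# (bgc_start, bgc_end, contig_length) in 0-based half-open coordinates
BGC = Tuple[int, int, int]


def touches_contig_edge(bgc: BGC, margin: int = 0) -> bool:
    """True if the cluster reaches within `margin` bp of either contig end."""
    start, end, contig_len = bgc
    return start <= margin or end >= contig_len - margin


def fraction_full_length(bgcs: Sequence[BGC], margin: int = 0) -> float:
    """Share of clusters that are not cut off by a contig boundary."""
    if not bgcs:
        return 0.0
    complete = sum(not touches_contig_edge(b, margin) for b in bgcs)
    return complete / len(bgcs)
```

A cluster that does not touch an edge can still be incomplete (for example, if genes are missing from the assembly), so this proxy only rules out the most obvious form of truncation.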

Closed Genome Sequence of Salmonella enterica Serovar Richmond Strain CFSAN000191, Obtained with Nanopore Sequencing

Microbiology Resource Announcements

10.1128/mra.01472-18

2018

Vol 7 (23)

Cited By ~ 3

Author(s): Narjol González-Escalona, Kuan Yao, Maria Hoffmann

Keyword(s): Genome Sequence, Salmonella Enterica, Nanopore Sequencing, Short Read, Content Type, Short Read Sequencing, Long Read

Here we report the genome sequence of Salmonella enterica serovar Richmond strain CFSAN000191, isolated from tilapia from Thailand in 2005. The genome was determined by a combination of long-read and short-read sequencing.
