long read
Recently Published Documents


TOTAL DOCUMENTS

2351
(FIVE YEARS 1814)

H-INDEX

57
(FIVE YEARS 24)

2022 ◽  
Author(s):  
Claire Mérot ◽  
Kristina S R Stenløkk ◽  
Clare Venney ◽  
Martin Laporte ◽  
Michel Moser ◽  
...  

The parallel evolution of nascent pairs of ecologically differentiated species offers an opportunity to get a better glimpse at the genetic architecture of speciation. Of particular interest is our recent ability to consider a wider range of genomic variants, not only single-nucleotide polymorphisms (SNPs), thanks to long-read sequencing technology. We can now identify structural variants (SVs) like insertions, deletions, and other structural rearrangements, allowing further insights into the genetic architecture of speciation and how different variants are involved in species differentiation. Here, we investigated genomic patterns of differentiation between sympatric species pairs (Dwarf and Normal) belonging to the Lake Whitefish (Coregonus clupeaformis) species complex. We assembled the first reference genomes for both Dwarf and Normal Lake Whitefish, annotated the transposable elements, and analysed the genome in the light of related coregonid species. Next, we used a combination of long-read and short-read sequencing to characterize SVs and genotype them at population-scale using genome-graph approaches, showing that SVs cover five times more of the genome than SNPs. We then integrated both SNPs and SVs to investigate the genetic architecture of species differentiation in two different lakes and highlighted an excess of shared outliers of differentiation. In particular, a large fraction of SVs differentiating the two species was driven by transposable elements (TEs), suggesting that TE accumulation during a period of allopatry predating secondary contact may have been a key process in the speciation of the Dwarf and Normal Whitefish. Altogether, our results suggest that SVs play an important role in speciation and that by combining second and third generation sequencing we now have the ability to integrate SVs into speciation genomics.


2022 ◽  
Author(s):  
Garima Singh ◽  
Anjuli Calchera ◽  
Dominik Merges ◽  
Henrique Valim ◽  
Juergen Otte ◽  
...  

Natural products of lichen-forming fungi are structurally diverse and have a variety of medicinal properties. Yet they have seen limited implementation in industry because, for most of these natural products, the corresponding genes remain unknown. Here we implement a long-read sequencing and bioinformatic approach to identify the biosynthetic gene cluster of the bioactive natural product gyrophoric acid (GA). Using 15 high-quality genomes representing nine GA-producing species of the lichen-forming fungal genus Umbilicaria, we identify the most likely GA cluster and investigate cluster gene organization and composition across the nine species. Our results show that GA clusters are promiscuous within Umbilicaria, with only three genes conserved across species, including the PKS gene. In addition, our results suggest that the same cluster codes for different but structurally similar natural products (NPs), i.e., GA, umbilicaric acid and hiascic acid, bringing new evidence that lichen metabolite diversity is also generated through regulatory mechanisms at the molecular level. Ours is the first study to identify the most likely GA cluster. This information is essential for opening up avenues for biotechnological approaches to producing and modifying GA, and possibly other lichen compounds. We show that bioinformatics approaches are useful in linking genes and potentially associated natural products. Genome analyses help unlock the pharmaceutical potential of organisms such as lichens, which are biosynthetically diverse but slow growing and usually uncultivable due to their symbiotic nature.


2022 ◽  
Author(s):  
Linyi Zhang ◽  
Samridhi Chaturvedi ◽  
Chris Nice ◽  
Lauren Lucas ◽  
Zachariah Gompert

Structural variants (SVs) can promote speciation by directly causing reproductive isolation or by suppressing recombination across large genomic regions. Whereas examples of each mechanism have been documented, systematic tests of the role of SVs in speciation are lacking. Here, we take advantage of long-read (Oxford Nanopore) whole-genome sequencing and a hybrid zone between two Lycaeides butterfly taxa (L. melissa and Jackson Hole Lycaeides) to comprehensively evaluate genome-wide patterns of introgression for SVs and relate these patterns to hypotheses about speciation. We found >100,000 SVs segregating within or between the two hybridizing species. SVs and SNPs exhibited similar levels of genetic differentiation between species, with the exception of inversions, which were more differentiated. We detected credible variation in patterns of introgression among SV loci in the hybrid zone, with 562 of 1419 ancestry-informative SVs exhibiting genomic clines that deviated from null expectations based on genome-average ancestry. Overall, hybrids exhibited a directional shift towards Jackson Hole Lycaeides ancestry at SV loci, consistent with the hypothesis that these loci experienced more selection on average than SNP loci. Surprisingly, we found that deletions, rather than inversions, showed the highest skew towards excess introgression from Jackson Hole Lycaeides. Excess Jackson Hole Lycaeides ancestry in hybrids was also especially pronounced for Z-linked SVs and inversions containing many genes. In conclusion, our results show that SVs are ubiquitous and suggest that SVs in general, but especially deletions, might contribute disproportionately to hybrid fitness and thus (partial) reproductive isolation.


2022 ◽  
Author(s):  
Ming Wen ◽  
Qiaowei Pan ◽  
Elodie Jouanno ◽  
Jerome Montfort ◽  
Margot Zahm ◽  
...  

The evolution of sex determination (SD) mechanisms in teleost fishes is amazingly dynamic, as reflected by the variety of different master sex-determining genes identified, even sometimes among closely related species. Pangasiids are a group of economically important catfishes in many South-Asian countries, but little is known about their sex determination system. Here, we generated novel genomic resources for 12 Pangasiid species and provided a first characterization of their SD system. Based on an Oxford Nanopore long-read chromosome-scale high quality genome assembly of the striped catfish Pangasianodon hypophthalmus, we identified a duplication of the anti-Mullerian hormone receptor type II gene (amhr2), which was further characterized as being sex-linked in males and expressed only in testicular samples. These first results point to a male-specific duplication on the Y chromosome (amhr2by) of the autosomal amhr2a. Sequence annotation revealed that the P. hypophthalmus Amhr2by is truncated in its N-terminal domain, lacking the cysteine-rich extracellular part of the receptor that is crucial for ligand binding, suggesting a potential route for its neofunctionalization. Short-read genome sequencing and reference-guided assembly of 11 additional Pangasiid species, along with sex-linkage studies, revealed that this truncated amhr2by duplication is also conserved as a male-specific gene in many Pangasiids. Reconstructions of the amhr2 phylogeny suggested that amhr2by arose from an ancient duplication / insertion event at the root of the Siluroidei radiation that is dated around 100 million years ago. 
Altogether these results bring multiple lines of evidence supporting that amhr2by is an ancient and conserved master sex-determining gene in Pangasiid catfishes, a finding that highlights the recurrent usage of the transforming growth factor β pathway in teleost sex determination and brings another empirical case towards the understanding of the dynamics or stability of sex determination systems.


2022 ◽  
Author(s):  
Michael A Schon ◽  
Stefan Lutzmayer ◽  
Falko Hofmann ◽  
Michael D Nodine

Accurate annotation of transcript isoforms is crucial for functional genomics research, but automated methods for reconstructing full-length transcripts from RNA sequencing (RNA-seq) data are imprecise. We developed a generalized transcript assembly framework called Bookend that incorporates data from multiple modes of RNA-seq, with a focus on identifying, labeling, and deconvoluting RNA 5′ and 3′ ends. Through end-guided assembly with Bookend we demonstrate that correctly modeling transcript start and end sites is essential for precise transcript assembly. Furthermore, we discover that reads from full-length single-cell RNA-seq (scRNA-seq) methods are sparsely end-labeled, and that these ends are sufficient to dramatically improve precision of assembly in single cells. Finally, we show that hybrid assembly across short-read, long-read, and end-capture RNA-seq in the model plant Arabidopsis and meta-assembly of single mouse embryonic stem cells (mESCs) are both capable of producing tissue-specific end-to-end transcript annotations of comparable or superior quality to existing reference isoforms.


2022 ◽  
Author(s):  
Kar-Tong Tan ◽  
Michael Slevin ◽  
Matthew Meyerson ◽  
Heng Li

Nanopore long-read genome sequencing is emerging as a potential approach for the study of genomes including long repetitive elements like telomeres. Here, we report extensive basecalling-induced errors at telomere repeats across nanopore datasets, sequencing platforms, basecallers, and basecalling models. We found that telomeres, which are represented by (TTAGGG)n and (CCCTAA)n repeats in many organisms, were frequently miscalled (~40-50% of reads) as (TTAAAA)n, or as (CTTCTT)n and (CCCTGG)n repeats respectively, in a strand-specific manner during nanopore sequencing. We showed that this miscalling is likely caused by the high similarity of current profiles between telomeric repeats and these repeat artefacts, leading to mis-assignment of electrical current profiles during basecalling. We further demonstrated that tuning of nanopore basecalling models, and selective application of the tuned models to telomeric reads, led to improved recovery and analysis of telomeric regions, with little detected negative impact on basecalling of other genomic regions. Our study thus highlights the importance of verifying nanopore basecalls in long, repetitive, and poorly defined regions of the genome, and showcases how such artefacts in regions like telomeres can potentially be resolved by improvements in nanopore basecalling models.
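As a rough illustration of how a read could be screened for the repeat artefacts named in this abstract, the sketch below counts tandem copies of the canonical telomeric motifs and of the reported miscalled motifs, then flags reads where the artefacts dominate. The function names, the three-copy threshold, and the classification labels are assumptions made for this example; this is not the authors' pipeline, which tunes the basecalling models themselves.

```python
import re

# Canonical telomeric repeats and the miscalled artefacts listed in the abstract.
CANONICAL = ("TTAGGG", "CCCTAA")
ARTEFACTS = ("TTAAAA", "CTTCTT", "CCCTGG")


def count_tandem_units(read: str, motif: str, min_copies: int = 3) -> int:
    """Count motif copies that sit in runs of at least `min_copies` tandem repeats.

    `min_copies` is a hypothetical threshold chosen to ignore isolated hits.
    """
    pattern = re.compile(f"(?:{motif}){{{min_copies},}}")
    return sum(len(m.group(0)) // len(motif) for m in pattern.finditer(read))


def classify_read(read: str, min_copies: int = 3) -> str:
    """Label a read by comparing canonical and artefact repeat counts (illustrative labels)."""
    read = read.upper()
    canonical = sum(count_tandem_units(read, m, min_copies) for m in CANONICAL)
    artefact = sum(count_tandem_units(read, m, min_copies) for m in ARTEFACTS)
    if canonical == 0 and artefact == 0:
        return "non-telomeric"
    return "suspect-miscall" if artefact > canonical else "telomeric"
```

A call such as `classify_read(sequence)` on each basecalled read would give a first-pass list of reads worth re-examining; a real analysis would also need to account for sequencing errors inside the repeats, which exact-match motifs do not tolerate.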


2022 ◽  
Author(s):  
David Pellow ◽  
Abhinav Dutta ◽  
Ron Shamir

As sequencing datasets keep growing larger, time and memory efficiency of read mapping are becoming more critical. Many clever algorithms and data structures were used to develop mapping tools for next generation sequencing, and in the last few years also for third generation long reads. A key idea in mapping algorithms is to sketch sequences with their minimizers. Recently, syncmers were introduced as an alternative sketching method that is more robust to mutations and sequencing errors. Here we introduce parameterized syncmer schemes, and provide a theoretical analysis for multi-parameter schemes. By combining these schemes with downsampling or minimizers we can achieve any desired compression and window guarantee. We introduced syncmer schemes into the popular minimap2 and Winnowmap2 mappers. In tests on simulated and real long-read data from a variety of genomes, the syncmer-based algorithms reduced unmapped reads by 20-60% at high compression while using less memory. The advantage of syncmer-based mapping was even more pronounced at lower sequence identity. At sequence identity of 65-75% and medium compression, syncmer mappers had 50-60% fewer unmapped reads, and ~10% fewer of the reads that did map were incorrectly mapped. We conclude that syncmer schemes improve mapping under higher error and mutation rates. This situation happens, for example, when the high error rate of long reads is compounded by a high mutation rate in a cancer tumor, or due to differences between strains of viruses or bacteria.
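The difference between the two sketching ideas mentioned in this abstract can be shown with a toy example. A minimizer is the smallest k-mer within a window of consecutive k-mers, so its selection depends on neighbouring sequence; a syncmer is selected by looking only inside the k-mer, at where its smallest s-mer falls. The sketch below is a minimal illustration under stated assumptions: it orders k-mers lexicographically rather than by a hash, the function names are hypothetical, and the `offsets` argument is only a simple reading of a "parameterized" scheme (a set of allowed positions for the smallest s-mer). It is not the implementation added to minimap2 or Winnowmap2.

```python
from typing import Iterable, List, Set


def kmers(seq: str, k: int) -> List[str]:
    """All substrings of length k, in order of their start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def minimizer_positions(seq: str, k: int, w: int) -> Set[int]:
    """Start positions of the smallest k-mer in each window of w consecutive k-mers.

    Ties are broken by taking the leftmost k-mer (an assumption of this sketch).
    """
    ks = kmers(seq, k)
    selected: Set[int] = set()
    for start in range(len(ks) - w + 1):
        window = range(start, start + w)
        selected.add(min(window, key=lambda i: (ks[i], i)))
    return selected


def syncmer_positions(seq: str, k: int, s: int, offsets: Iterable[int]) -> Set[int]:
    """Start positions of k-mers whose smallest s-mer starts at one of `offsets`.

    offsets={0} gives an open syncmer with offset 0; offsets={0, k - s} gives a
    closed syncmer (smallest s-mer at the first or last position of the k-mer).
    """
    allowed = set(offsets)
    selected: Set[int] = set()
    for i, kmer in enumerate(kmers(seq, k)):
        smers = kmers(kmer, s)
        min_pos = min(range(len(smers)), key=lambda j: (smers[j], j))
        if min_pos in allowed:
            selected.add(i)
    return selected
```

Because a syncmer is chosen from the k-mer's own content, a mutation outside that k-mer cannot change whether it is selected, which is the intuition behind the robustness claim; a minimizer can be displaced by any change within its window.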


BMC Genomics ◽  
2022 ◽  
Vol 23 (1) ◽  
Author(s):  
David J. Wright ◽  
Nicola A. L. Hall ◽  
Naomi Irish ◽  
Angela L. Man ◽  
Will Glynn ◽  
...  

Background: Alternative splicing is a key mechanism underlying cellular differentiation and a driver of complexity in mammalian neuronal tissues. However, which isoforms are differentially used or expressed, and how this affects cellular differentiation, remains unclear. Long read sequencing allows full-length transcript recovery and quantification, enabling transcript-level analysis of alternative splicing processes and how these change with cell state. Here, we utilise Oxford Nanopore Technologies sequencing to produce a custom annotation of a well-studied human neuroblastoma cell line, SH-SY5Y, and to characterise isoform expression and usage across differentiation. Results: We identify many previously unannotated features, including a novel transcript of the voltage-gated calcium channel subunit gene, CACNA2D2. We show differential expression and usage of transcripts during differentiation, identifying candidates for future research into state change regulation. Conclusions: Our work highlights the potential of long read sequencing to uncover previously unknown transcript diversity and mechanisms influencing alternative splicing.

