Expansion of 5’ UTR CGG repeat in RILPL1 is associated with oculopharyngodistal myopathy

10.1101/2021.09.18.21263669 ◽

2021 ◽

Author(s):

Xinzhuang Yang ◽

Dingding Zhang ◽

Pidong Li ◽

Jingwen Niu ◽

Dan Xu ◽

...

Keyword(s):

Trinucleotide Repeat ◽

Whole Genome ◽

Adult Onset ◽

Methylation Analysis ◽

Cgg Repeat ◽

Coding Regions ◽

Generation Family ◽

Long Read ◽

Repeat Expansions ◽

Muscle Disorder

Abstract Oculopharyngodistal myopathy (OPDM) is an adult-onset degenerative muscle disorder characterized by ptosis, ophthalmoplegia and weakness of the facial, pharyngeal and limb muscles. Trinucleotide repeat expansions in non-coding regions of LRP12, GIPC1 and NOTCH2NLC were recently reported as causes of OPDM. However, a significant proportion of OPDM patients still have no known genetic cause. In this study, we performed long-read whole-genome sequencing in a large five-generation family of 156 individuals, including 22 patients diagnosed with typical OPDM, and identified CGG repeat expansions in the RILPL1 gene in all patients we tested but not in unaffected family members. Methylation analysis indicated that methylation levels of the RILPL1 gene were unaltered in OPDM patients, which is consistent with previous reports. Our findings provide the first evidence that RILPL1 is associated with OPDM, which we suggest designating OPDM type 4.

Long-read sequencing across the C9orf72 ‘GGGGCC’ repeat expansion: implications for clinical use and genetic discovery efforts in human disease

10.1101/176651 ◽

2018 ◽

Cited By ~ 1

Author(s):

Mark T. W. Ebbert ◽

Stefan Farrugia ◽

Jonathon Sens ◽

Karen Jansen-West ◽

Tania F. Gendron ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Repeat Expansion ◽

Whole Genome ◽

Short Read ◽

Short Read Sequencing ◽

Sequencing Technologies ◽

Long Read ◽

Repeat Expansions ◽

Targeted Approach

Abstract Background: Many neurodegenerative diseases are caused by nucleotide repeat expansions, but most expansions, like the C9orf72 ‘GGGGCC’ (G4C2) repeat that causes approximately 5-7% of all amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) cases, are too long to sequence using short-read sequencing technologies. It is unclear whether long-read sequencing technologies can traverse these long, challenging repeat expansions. Here, we demonstrate that two long-read sequencing technologies, Pacific Biosciences’ (PacBio) and Oxford Nanopore Technologies’ (ONT), can sequence through disease-causing repeats cloned into plasmids, including the FTD/ALS-causing G4C2 repeat expansion. We also report the first long-read sequencing data characterizing the C9orf72 G4C2 repeat expansion at the nucleotide level in two symptomatic expansion carriers using PacBio whole-genome sequencing and a no-amplification (No-Amp) targeted approach based on CRISPR/Cas9. Results: Both the PacBio and ONT platforms successfully sequenced through the repeat expansions in plasmids. Throughput on the MinION was a challenge for whole-genome sequencing; we were unable to attain reads covering the human C9orf72 repeat expansion using 15 flow cells. We obtained 8x coverage across the C9orf72 locus using the PacBio Sequel, accurately reporting the unexpanded allele at eight repeats, and reading through the entire expansion with 1324 repeats (7941 nucleotides). Using the No-Amp targeted approach, we attained >800x coverage and were able to identify the unexpanded allele, closely estimate expansion size, and assess nucleotide content in a single experiment. We estimate the individual’s repeat region was >99% G4C2 content, though we cannot rule out small interruptions. Conclusions: Our findings indicate that long-read sequencing is well suited to characterizing known repeat expansions, and for discovering new disease-causing, disease-modifying, or risk-modifying repeat expansions that have gone undetected with conventional short-read sequencing. The PacBio No-Amp targeted approach may have future potential in clinical and genetic counseling environments. Larger and deeper long-read sequencing studies in C9orf72 expansion carriers will be important to determine heterogeneity and whether the repeats are interrupted by non-G4C2 content, potentially mitigating or modifying disease course or age of onset, as interruptions are known to do in other repeat-expansion disorders. These results have broad implications across all diseases where the genetic etiology remains unclear.
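
As a rough illustration of what estimating repeat count and "G4C2 content" in a read could look like computationally, the sketch below counts non-overlapping copies of a motif and reports the fraction of bases they cover. The function name `motif_purity` and both toy sequences are hypothetical and are not taken from the study; a real analysis would also need to handle sequencing errors, flanking sequence and the reverse strand.

```python
import re


def motif_purity(read: str, motif: str = "GGGGCC") -> tuple[int, float]:
    """Count non-overlapping motif copies and the fraction of bases they cover."""
    read = read.upper()
    if not read:
        return 0, 0.0
    # re.findall scans left to right and returns non-overlapping matches.
    copies = len(re.findall(re.escape(motif), read))
    covered_bases = copies * len(motif)
    return copies, covered_bases / len(read)


# Toy reads (hypothetical): a pure 8-copy allele, and an allele in which one
# unit is interrupted (GGGCC, missing a G) between 5 and 4 perfect copies.
pure = "GGGGCC" * 8
interrupted = "GGGGCC" * 5 + "GGGCC" + "GGGGCC" * 4

print(motif_purity(pure))         # (8, 1.0)
print(motif_purity(interrupted))  # 9 copies, purity below 1.0
```

A purity below 1.0 only flags that some bases fall outside perfect motif copies; it cannot by itself distinguish a true interruption from a sequencing error.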

Straglr: discovering and genotyping tandem repeat expansions using whole genome long-read sequences

Genome Biology ◽

10.1186/s13059-021-02447-3 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Readman Chiu ◽

Indhu-Shree Rajan-Babu ◽

Jan M. Friedman ◽

Inanc Birol

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Tandem Repeat ◽

Neurological Disorders ◽

Software Tool ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Long Read ◽

Repeat Expansions

Abstract Tandem repeat (TR) expansion is the underlying cause of over 40 neurological disorders. Long-read sequencing offers an exciting avenue over conventional technologies for detecting TR expansions. Here, we present Straglr, a robust software tool for both targeted genotyping and novel expansion detection from long-read alignments. We benchmark Straglr using various simulations, targeted genotyping data of cell lines carrying expansions of known diseases, and whole genome sequencing data with chromosome-scale assembly. Our results suggest that Straglr may be useful for investigating disease-associated TR expansions using long-read sequencing.

An update on the neurological short tandem repeat expansion disorders and the emergence of long-read sequencing diagnostics

Acta Neuropathologica Communications ◽

10.1186/s40478-021-01201-x ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

Sanjog R. Chintalaphani ◽

Sandy S. Pineda ◽

Ira W. Deveson ◽

Kishore R. Kumar

Keyword(s):

Tandem Repeat ◽

Short Tandem Repeat ◽

Fragile X ◽

Cost Effective ◽

Repeat Expansion ◽

Main Body ◽

Cgg Repeat ◽

Long Read ◽

Repeat Expansions ◽

Short Tandem

Abstract Background: Short tandem repeat (STR) expansion disorders are an important cause of human neurological disease. They have an established role in more than 40 different phenotypes including the myotonic dystrophies, Fragile X syndrome, Huntington’s disease, the hereditary cerebellar ataxias, amyotrophic lateral sclerosis and frontotemporal dementia. Main body: STR expansions are difficult to detect and may explain unsolved diseases, as highlighted by recent findings including the discovery of a biallelic intronic ‘AAGGG’ repeat in RFC1 as the cause of cerebellar ataxia, neuropathy, and vestibular areflexia syndrome (CANVAS), and the finding of ‘CGG’ repeat expansions in NOTCH2NLC as the cause of neuronal intranuclear inclusion disease and a range of clinical phenotypes. However, established laboratory techniques for diagnosis of repeat expansions (repeat-primed PCR and Southern blot) are cumbersome, low-throughput and poorly suited to parallel analysis of multiple gene regions. While next generation sequencing (NGS) has been increasingly used, established short-read NGS platforms (e.g., Illumina) are unable to genotype large and/or complex repeat expansions. Long-read sequencing platforms recently developed by Oxford Nanopore Technologies and Pacific Biosciences promise to overcome these limitations to deliver enhanced diagnosis of repeat expansion disorders in a rapid and cost-effective fashion. Conclusion: We anticipate that long-read sequencing will rapidly transform the detection of short tandem repeat expansion disorders for both clinical diagnosis and gene discovery.

Accurate characterization of expanded tandem repeat length and sequence through whole genome long-read sequencing on PromethION

10.1101/439026 ◽

2018 ◽

Cited By ~ 7

Author(s):

Arne De Roeck ◽

Wouter De Coster ◽

Liene Bossaerts ◽

Rita Cacace ◽

Tim De Pooter ◽

...

Keyword(s):

Southern Blotting ◽

Flow Cell ◽

Current Data ◽

Whole Genome ◽

Genome Coverage ◽

Base Calling ◽

Length Estimation ◽

Long Read ◽

Repeat Expansions

Abstract Tandem repeats (TRs) can cause disease through their length, sequence motif interruptions, and nucleotide modifications. For many TRs, however, these features are very difficult - if not impossible - to assess, requiring low-throughput and labor-intensive assays. One example is a VNTR in ABCA7 for which we recently discovered that expanded alleles strongly increase risk of Alzheimer’s disease. Here, we investigated the potential of long-read whole genome sequencing to surmount these challenges, using the high-throughput PromethION platform from Oxford Nanopore Technologies. To overcome the limitations of conventional base calling and alignment, we developed an algorithm to study the TR size and sequence directly on raw PromethION current data. We report the long-read sequencing of multiple human genomes (n = 11) using only a single sequencing run and flow cell per individual. With the use of fresh DNA extractions, DNA shearing to approximately 20kb and size selection, we obtained an average output of 70 gigabases (Gb) per flow cell, corresponding to a 21x genome coverage, and a maximum yield of 98 Gb (30x genome coverage). All ABCA7 VNTR alleles, including expansions up to 10,000 bases, were spanned by long sequencing reads, validated by Southern blotting. Classical approaches of TR length estimation suffered from low accuracy, low precision, DNA strand effects and/or inability to call pathogenic repeat expansions. In contrast, our novel NanoSatellite algorithm, which circumvents base calling by using dynamic time warping on raw PromethION current data, achieved more than 90% accuracy and high precision (5.6% relative standard deviation) of TR length estimation, and detected all clinically relevant repeat expansions. In addition, we identified alternative TR sequence motifs with high consistency, allowing determination of TR sequence and distinction of VNTR alleles with homozygous length. In conclusion, we validated the robustness of single-experiment whole genome long-read sequencing on PromethION, a prerequisite for application of long-read sequencing in the clinic. In addition, we outperformed Southern blotting, enabling improved characterization of the role of expanded ABCA7 VNTR alleles in Alzheimer’s disease, and opening new opportunities for TR research.
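
For readers unfamiliar with dynamic time warping (DTW), the technique named in the abstract above, the following is a minimal textbook sketch on toy numeric signals. It is not the NanoSatellite implementation, whose details are not given here; the function name `dtw_distance` and the example signals are assumptions made purely for illustration.

```python
def dtw_distance(a: list[float], b: list[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: advance a only, advance b only, advance both.
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# Toy signals (hypothetical): the second dwells longer on the middle level,
# as a slow-moving molecule might, yet DTW still aligns them perfectly.
reference = [1.0, 2.0, 3.0]
stretched = [1.0, 2.0, 2.0, 3.0]

print(dtw_distance(reference, stretched))  # 0.0
print(dtw_distance([0.0, 0.0], [1.0, 1.0]))  # 2.0
```

Because DTW tolerates local stretching and compression along the time axis, it suits signals whose event durations vary, which is one plausible reason for applying it to raw current traces rather than base-called sequence.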

Mind the gaps – ignoring errors in long read assemblies critically affects protein prediction

10.1101/285049 ◽

2018 ◽

Cited By ~ 9

Author(s):

Mick Watson

Keyword(s):

Genome Sequencing ◽

Single Molecule ◽

Whole Genome ◽

Protein Coding ◽

Single Molecule Sequencing ◽

Truncated Protein ◽

Coding Regions ◽

Sequencing Technologies ◽

Protein Prediction ◽

Long Read

Long read, single molecule sequencing technologies are now routinely used for whole-genome sequencing and assembly. However, even after multiple rounds of correction, many errors remain which can critically affect protein coding regions, resulting in significantly altered and often truncated protein predictions.
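
To make the effect described above concrete, the sketch below translates a short toy coding sequence before and after a single-base deletion, the kind of indel error that can remain in a long-read assembly. The sequences and the helper name `translate` are hypothetical; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy coding sequence (hypothetical): ATG GCC AAG CTG ACC GGA TAA
reference = "ATGGCCAAGCTGACCGGATAA"
# Simulate a one-base deletion at index 4, which shifts the reading frame.
with_deletion = reference[:4] + reference[5:]

print(translate(reference))      # MAKLTG
print(translate(with_deletion))  # MAS (premature stop at TGA)
```

A single missing base shifts every downstream codon, so the predicted protein here is cut from six residues to three.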

Patients With Extreme Early Onset Juvenile Huntington Disease Can Have Delays in Diagnosis: A Case Report and Literature Review

Child Neurology Open ◽

10.1177/2329048x211036137 ◽

2021 ◽

Vol 8 ◽

pp. 2329048X2110361

Author(s):

Ashley A. Moeller ◽

Marcia V. Felker ◽

Jennifer A. Brault ◽

Laura C. Duncan ◽

Rizwan Hamid ◽

...

Keyword(s):

Symptom Onset ◽

Huntington Disease ◽

Early Onset ◽

Trinucleotide Repeat ◽

Cerebellar Atrophy ◽

Cag Repeats ◽

Ataxic Gait ◽

Repeat Expansions ◽

Delays In Diagnosis ◽

Juvenile Huntington Disease

Huntington disease (HD) is caused by a pathologic cytosine-adenine-guanine (CAG) trinucleotide repeat expansion in the HTT gene. Typical adult-onset disease occurs with a minimum of 40 repeats. With more than 60 CAG repeats, patients can have juvenile-onset disease (jHD), with symptom onset by the age of 20 years. We report a case of a boy with extreme early onset, paternally inherited jHD, with symptom onset between 18 and 24 months. He was found to have 250 to 350 CAG repeats, one of the largest repeat expansions published to date. At initial presentation, he had an ataxic gait, truncal titubation, and speech delay. Magnetic resonance imaging showed cerebellar atrophy. Over time, he continued to regress and became nonverbal, wheelchair-bound, gastrostomy-tube dependent, and increasingly rigid. His young age at presentation and the ethical concerns regarding HD testing in minors delayed his diagnosis.
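
The repeat-count ranges quoted in the abstract above can be expressed as a small sketch that finds the longest uninterrupted CAG run in a sequence and reports which range it falls in. The function names and the toy sequence are hypothetical, and the cut-offs are simply those stated in the abstract (a minimum of 40 repeats for typical adult onset, more than 60 for possible juvenile onset); this is an illustration of the arithmetic, not a genotyping method.

```python
import re


def longest_cag_run(seq: str) -> int:
    """Return the number of CAG units in the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def repeat_range(cag_count: int) -> str:
    """Map a CAG count onto the ranges quoted in the abstract."""
    if cag_count > 60:
        return "more than 60 repeats (juvenile onset possible)"
    if cag_count >= 40:
        return "40 or more repeats (typical adult-onset range)"
    return "fewer than 40 repeats"


# Toy sequence (hypothetical): 42 CAG units, one CAA interruption, 3 more CAGs.
toy = "TT" + "CAG" * 42 + "CAA" + "CAG" * 3 + "GG"

count = longest_cag_run(toy)
print(count, "->", repeat_range(count))
```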

Secondary structural choice of DNA and RNA associated with CGG/CCG trinucleotide repeat expansion rationalizes the RNA misprocessing in FXTAS

Scientific Reports ◽

10.1038/s41598-021-87097-y ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Yogeeshwar Ajjugal ◽

Narendar Kolimi ◽

Thenmalarchelvi Rathinavelan

Keyword(s):

Rna Binding ◽

Trinucleotide Repeat ◽

Rna Binding Proteins ◽

Fragile X ◽

Repeat Expansion ◽

Mobility Shift ◽

Cgg Repeat ◽

Dna And Rna ◽

Structural Choice ◽

Sense Strand

Abstract CGG tandem repeat expansion in the 5′-untranslated region of the fragile X mental retardation-1 (FMR1) gene leads to unusual nucleic acid conformations, hence causing genetic instabilities. We show that the number of G…G (in CGG repeat) or C…C (in CCG repeat) mismatches (other than A…T, T…A, C…G and G…C canonical base pairs) dictates the secondary structural choice of the sense and antisense strands of the FMR1 gene and their corresponding transcripts in fragile X-associated tremor/ataxia syndrome (FXTAS). The circular dichroism (CD) spectra and electrophoretic mobility shift assay (EMSA) reveal that CGG DNA (sense strand of the FMR1 gene) and its transcript favor a quadruplex structure. CD, EMSA and molecular dynamics (MD) simulations also show that more than four C…C mismatches cannot be accommodated in the RNA duplex consisting of the CCG repeat (antisense transcript); instead, it favors an i-motif conformational intermediate. Such a preference for unusual secondary structures provides a convincing justification for the RNA foci formation due to the sequestration of RNA-binding proteins to the bidirectional transcripts and the repeat-associated non-AUG translation that are observed in FXTAS. The results presented here also suggest that small molecule modulators that can destabilize FMR1 CGG DNA and RNA quadruplex structures could be promising candidates for treating FXTAS.

Long-read whole-genome sequencing identified a partial MBD5 deletion in an exome-negative patient with neurodevelopmental disorder

Journal of Human Genetics ◽

10.1038/s10038-020-00893-8 ◽

2021 ◽

Author(s):

Sachiko Ohori ◽

Rie S. Tsuburaya ◽

Masako Kinoshita ◽

Etsuko Miyagi ◽

Takeshi Mizuguchi ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Neurodevelopmental Disorder ◽

Whole Genome ◽

Negative Patient ◽

Long Read

Comprehensive identification of transposable element insertions using multiple sequencing technologies

Nature Communications ◽

10.1038/s41467-021-24041-8 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Chong Chu ◽

Rebeca Borges-Monroy ◽

Vinayak V. Viswanadham ◽

Soohyun Lee ◽

Heng Li ◽

...

Keyword(s):

Transposable Element ◽

Structure And Function ◽

Endogenous Retroviruses ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Short Read ◽

Sequencing Technologies ◽

Long Read ◽

And Function

Abstract Transposable elements (TEs) help shape the structure and function of the human genome. When inserted into some locations, TEs may disrupt gene regulation and cause diseases. Here, we present xTea (x-Transposable element analyzer), a tool for identifying TE insertions in whole-genome sequencing data. Whereas existing methods are mostly designed for short-read data, xTea can be applied to both short-read and long-read data. Our analysis shows that xTea outperforms other short read-based methods for both germline and somatic TE insertion discovery. With long-read data, we created a catalogue of polymorphic insertions with full assembly and annotation of insertional sequences for various types of retroelements, including pseudogenes and endogenous retroviruses. Notably, we find that individual genomes have an average of nine groups of full-length L1s in centromeres, suggesting that centromeres and other highly repetitive regions such as telomeres are a significant yet unexplored source of active L1s. xTea is available at https://github.com/parklab/xTea.

Hapo-G, haplotype-aware polishing of genome assemblies with accurate reads

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab034 ◽

2021 ◽

Vol 3 (2) ◽

Author(s):

Jean-Marc Aury ◽

Benjamin Istace

Keyword(s):

Single Molecule ◽

Direct Consequence ◽

High Quality ◽

Sequencing Errors ◽

Coding Regions ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Long Read ◽

Genome Assemblies

Abstract Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture of the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; generally, they inspect nucleotides one by one and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes and typically switch from one haplotype to another. Here we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.
