Pan-cancer repository of validated natural and cryptic mRNA splicing mutations

We present a major public resource of mRNA splicing mutations validated according to multiple lines of evidence of abnormal gene expression. Likely mutations present in all tumor types reported in the Cancer Genome Atlas (TCGA) were identified based on the comparative strengths of splice sites in tumor versus normal genomes, and then validated by respectively comparing counts of splice junction-spanning reads and abundance of transcript reads in RNA-Seq data from matched tissues and tumors lacking these mutations. The comprehensive resource features 351,423 of these validated mutations, the majority of which (69.1%) are not present in the Single Nucleotide Polymorphism Database (dbSNP 150). There are 117,951 unique mutations that weaken or abolish natural splice sites, and 244,415 mutations that strengthen cryptic splice sites (10,943 affect both simultaneously). A total of 27,803 novel or rare flagged variants (with <1% population frequency in dbSNP) were observed in multiple tumor tissue types. Single variants or chromosome ranges can be queried using a Global Alliance for Genomics and Health (GA4GH)-compliant, web-based Beacon, “Validated Splicing Mutations”, either separately or in aggregate alongside other Beacons through the public Beacon Network (http://www.beacon-network.org/#/search?beacon=cytognomix), as well as through our website (https://validsplicemut.cytognomix.com/).
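The counts quoted in this abstract can be cross-checked with a short inclusion-exclusion calculation. The sketch below assumes that the 10,943 variants affecting both site types are included in each of the natural-site and cryptic-site counts; the function name is illustrative and not part of the resource.

```python
def unique_mutation_total(natural: int, cryptic: int, both: int) -> int:
    """Inclusion-exclusion: variants counted in both categories are subtracted once."""
    return natural + cryptic - both


# Figures quoted in the abstract above.
total = unique_mutation_total(natural=117_951, cryptic=244_415, both=10_943)
print(total)  # 351423
```

Under that assumption, the result agrees with the 351,423 validated mutations reported.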


Pan-Cancer Repository of Validated Natural and Cryptic mRNA Splicing Mutations

10.1101/474452 ◽

2018 ◽

Cited By ~ 3

Author(s):

Ben C. Shirley ◽

Eliseos J. Mucaki ◽

Peter K. Rogan

Keyword(s):

Splice Junction ◽

mRNA Splicing ◽

The Cancer Genome Atlas ◽

Splice Sites ◽

Global Alliance ◽

Multiple Tumor ◽

Link Type ◽

Splicing Mutations ◽

Public Resource ◽

Single Nucleotide Polymorphism Database

We present a major public resource of mRNA splicing mutations validated according to multiple lines of evidence of abnormal gene expression. Likely mutations present in all tumor types reported in the Cancer Genome Atlas (TCGA) were identified based on the comparative strengths of splice sites in tumor versus normal genomes and then validated by respectively comparing counts of splice junction-spanning reads and abundance of transcript reads in RNA-Seq data from matched tissues and tumors lacking these mutations. The comprehensive resource features 351,423 of these validated mutations, the majority of which (69.1%) are not featured in the Single Nucleotide Polymorphism Database (dbSNP 150). There are 117,951 unique mutations that weaken or abolish natural splice sites, and 244,415 mutations that strengthen cryptic splice sites (10,943 affect both simultaneously). A total of 27,803 novel or rare flagged variants (with <1% population frequency in dbSNP) were observed in multiple tumor tissue types. Single variants or chromosome ranges can be queried using a Global Alliance for Genomics and Health (GA4GH)-compliant web Beacon, Validated Splicing Mutations, either separately or in aggregate alongside other beacons through the public Beacon Network (http://www.beacon-network.org/#/search?beacon=cytognomix), as well as through our website (https://validsplicemut.cytognomix.com/).
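As a rough illustration of what a single-variant Beacon query looks like, the sketch below assembles a GET query string. It makes several assumptions not stated in the abstract: the endpoint URL is a placeholder, the parameter names follow the GA4GH Beacon v1 query convention as best understood here, the assembly identifier is a guess, and the variant itself is a made-up example. No network request is made.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the actual Beacon API URL is not given in the abstract.
HYPOTHETICAL_BEACON_URL = "https://beacon.example.org/query"


def build_beacon_query(chrom, pos_1based, ref, alt,
                       assembly="GRCh37",  # assumed assembly, not confirmed by the source
                       base_url=HYPOTHETICAL_BEACON_URL):
    """Build a Beacon v1-style query URL for one single-nucleotide variant."""
    params = {
        "referenceName": chrom,
        "start": pos_1based - 1,   # Beacon v1 start coordinates are 0-based
        "referenceBases": ref,
        "alternateBases": alt,
        "assemblyId": assembly,
    }
    return f"{base_url}?{urlencode(params)}"


# Made-up variant used purely to show the shape of the query.
url = build_beacon_query("1", 1000, "G", "A")
print(url)
```

A Beacon answers such a query with a yes/no response on whether the allele has been observed, which is what allows this resource to be searched alongside other Beacons in the network.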


Pan-cancer repository of validated natural and cryptic mRNA splicing mutations

F1000Research ◽

10.12688/f1000research.17204.3 ◽

2019 ◽

Vol 7 ◽

pp. 1908 ◽

Cited By ~ 3

Author(s):

Ben C. Shirley ◽

Eliseos J. Mucaki ◽

Peter K. Rogan

Keyword(s):

Gene Expression ◽

Cancer Genome ◽

mRNA Splicing ◽

The Cancer Genome Atlas ◽

Splice Sites ◽

Global Alliance ◽

Multiple Tumor ◽

Link Type ◽

Splicing Mutations ◽

Molecular Phenotypes

We present a major public resource of mRNA splicing mutations validated according to multiple lines of evidence of abnormal gene expression. Likely mutations present in all tumor types reported in the Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) were identified based on the comparative strengths of splice sites in tumor versus normal genomes, and then validated by respectively comparing counts of splice junction-spanning reads and abundance of transcript reads in RNA-Seq data from matched tissues and tumors lacking these mutations. The comprehensive resource features 341,486 of these validated mutations, the majority of which (69.9%) are not present in the Single Nucleotide Polymorphism Database (dbSNP 150). There are 131,347 unique mutations that weaken or abolish natural splice sites, and 222,071 mutations that strengthen cryptic splice sites (11,932 affect both simultaneously). A total of 28,812 novel or rare flagged variants (with <1% population frequency in dbSNP) were observed in multiple tumor tissue types. An algorithm was developed to classify variants into splicing molecular phenotypes; it integrates germline heterozygosity, degree of information change, and impact on expression. The classification thresholds were calibrated against the phenotypic assignments in the ClinVar clinical database. Variants are partitioned into allele-specific alternative splicing, likely aberrant, and aberrant splicing phenotypes. Single variants or chromosome ranges can be queried using a Global Alliance for Genomics and Health (GA4GH)-compliant, web-based Beacon, “Validated Splicing Mutations”, either separately or in aggregate alongside other Beacons through the public Beacon Network, as well as through our website. The website provides additional information, such as a visual representation of supporting RNA-Seq results, gene expression in the corresponding normal tissues, and splicing molecular phenotypes.


Interpretation of mRNA splicing mutations in genetic disease: review of the literature and guidelines for information-theoretical analysis

F1000Research ◽

10.12688/f1000research.5654.1 ◽

2014 ◽

Vol 3 ◽

pp. 282 ◽

Cited By ~ 47

Author(s):

Natasha G. Caminsky ◽

Eliseos J. Mucaki ◽

Peter K. Rogan

Keyword(s):

Splice Site ◽

Genetic Disease ◽

mRNA Splicing ◽

Regulatory Sequences ◽

Splice Sites ◽

Review Of The Literature ◽

Cryptic Splice Site ◽

Broad Perspective ◽

Genomic Variants ◽

Splicing Mutations

The interpretation of genomic variants has become one of the paramount challenges in the post-genome sequencing era. In this review we summarize nearly 20 years of research on the applications of information theory (IT) to interpret coding and non-coding mutations that alter mRNA splicing in rare and common diseases. We compile and summarize the spectrum of published variants analyzed by IT, to provide a broad perspective of the distribution of deleterious natural and cryptic splice site variants detected, as well as those affecting splicing regulatory sequences. Results for natural splice site mutations can be interrogated dynamically with Splicing Mutation Calculator, a companion software program that computes changes in information content for any splice site substitution, linked to corresponding publications containing these mutations. The accuracy of IT-based analysis was assessed in the context of experimentally validated mutations. Because splice site information quantifies binding affinity, IT-based analyses can discern the differences between variants that account for the observed reduced (leaky) versus abolished mRNA splicing. We extend this principle by comparing predicted mutations in natural, cryptic, and regulatory splice sites with observed deleterious phenotypic and benign effects. Our analysis of 1727 variants revealed a number of general principles useful for ensuring portability of these analyses and accurate input and interpretation of mutations. We offer guidelines for optimal use of IT software for interpretation of mRNA splicing mutations.
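The idea of computing a change in information content for a splice site substitution can be sketched as follows. This is a minimal illustration and not the Splicing Mutation Calculator itself: the three-position frequency table is invented toy data, the per-position weight 2 + log2 f(b, l) omits the small-sample correction used in practice, and all function names are hypothetical. The fold-change helper reflects the usual information-theoretic reading that a change of ΔRi bits corresponds to at least a 2^|ΔRi|-fold change in binding affinity.

```python
import math

# Invented toy base frequencies for a 3-position site (NOT real splice-site data).
TOY_FREQS = [
    {"A": 0.70, "C": 0.10, "G": 0.10, "T": 0.10},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]


def individual_information(seq, freqs=TOY_FREQS):
    """Ri in bits: sum over positions of 2 + log2 f(base, position)."""
    return sum(2.0 + math.log2(freqs[i][base]) for i, base in enumerate(seq))


def delta_ri(wild_type, mutant, freqs=TOY_FREQS):
    """Change in information content caused by the substitution (mutant minus wild type)."""
    return individual_information(mutant, freqs) - individual_information(wild_type, freqs)


def min_fold_change(delta_bits):
    """Lower bound on the fold change in binding affinity implied by delta_bits."""
    return 2.0 ** abs(delta_bits)


change = delta_ri("AGT", "ACT")  # G-to-C substitution at the second position
print(f"delta Ri = {change:.2f} bits, at least {min_fold_change(change):.1f}-fold weaker")
```

A negative ΔRi indicates a weakened site. How large the change is relative to the site's original strength is what distinguishes reduced (leaky) from abolished splicing in this framework.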


TRGAted: A web tool for survival analysis using protein data in the Cancer Genome Atlas.

F1000Research ◽

10.12688/f1000research.15789.2 ◽

2018 ◽

Vol 7 ◽

pp. 1235 ◽

Cited By ~ 2

Author(s):

Nicholas Borcherding ◽

Nicholas L. Bormann ◽

Andrew P. Voigt ◽

Weizhou Zhang

Keyword(s):

Tumor Stage ◽

Cancer Genome ◽

Protein Quantification ◽

The Cancer Genome Atlas ◽

Dot Blot ◽

Link Type ◽

Free Interval ◽

Level Data ◽

Cancer Genome Atlas ◽

Genome Atlas

Reverse-phase protein arrays (RPPAs) are a high-throughput approach to protein quantification utilizing an antibody-based micro-to-nano scale dot blot. Within the Cancer Genome Atlas (TCGA), RPPAs were used to quantify over 200 proteins in 8,167 tumor and metastatic samples. Protein-level data have particular advantages in assessing putative prognostic or therapeutic targets in tumors. However, many of the available pipelines do not allow for the partitioning of clinical and RPPA information to make meaningful conclusions. We developed a cloud-based application, TRGAted, to enable researchers to better examine patient survival based on single or multiple proteins across 31 cancer types in the TCGA. TRGAted contains up-to-date overall survival, disease-specific survival, disease-free interval and progression-free interval information. Furthermore, survival information for primary tumor samples can be stratified based on gender, age, tumor stage, histological type, and subtype, allowing for a highly adaptive and intuitive user experience. The code and processed data are open source and available on GitHub, and the application includes a built-in tutorial to assist users.


The Tangent copy-number inference pipeline for cancer genome analyses

10.1101/566505 ◽

2019 ◽

Cited By ~ 3

Author(s):

Barbara Tabak ◽

Gordon Saksena ◽

Coyin Oh ◽

Galen F. Gao ◽

Barbara Hill Meyers ◽

...

Keyword(s):

DNA Sequencing ◽

Copy Number ◽

Signal To Noise Ratio ◽

SNP Array ◽

Cancer Genome ◽

The Cancer Genome Atlas ◽

Sequencing Data ◽

Array Data ◽

Link Type ◽

Genome Analyses

Motivation: Somatic copy-number alterations (SCNAs) play an important role in cancer development. Systematic noise in sequencing and array data presents a significant challenge to the inference of SCNAs for cancer genome analyses. As part of The Cancer Genome Atlas (TCGA), the Broad Institute Genome Characterization Center developed the Tangent copy-number inference pipeline to generate copy-number profiles using single-nucleotide polymorphism (SNP) array and whole-exome sequencing (WES) data from over 10,000 pairs of tumors and matched normal samples. Here, we describe the Tangent pipeline, which begins with DNA sequencing data in the form of .bam files or raw SNP array probe-level intensity data, and ends with segmented copy-number calls to facilitate the identification of novel genes potentially targeted by SCNAs. We also describe a modification of Tangent, Pseudo-Tangent, which enables denoising through comparisons between tumor profiles when few normal samples are available. Results: Tangent Normalization offers substantial signal-to-noise ratio (SNR) improvements compared to conventional normalization methods in both SNP array and WES analyses. The improvement in SNRs is achieved primarily through noise reduction with minimal effect on signal. Pseudo-Tangent also reduces noise when few normal samples are available. Tangent and Pseudo-Tangent are broadly applicable and enable more accurate inference of SCNAs from DNA sequencing and array data. Availability and Implementation: Tangent is available at https://github.com/coyin/tangent and as a Docker image (https://hub.docker.com/r/coyin/tangent). Tangent is also the normalization method for the Copy Number pipeline in Genome Analysis Toolkit 4 (GATK4).


TRGAted: A web tool for survival analysis using protein data in the Cancer Genome Atlas.

F1000Research ◽

10.12688/f1000research.15789.1 ◽

2018 ◽

Vol 7 ◽

pp. 1235 ◽

Cited By ~ 4

Author(s):

Nicholas Borcherding ◽

Nicholas L. Bormann ◽

Andrew P. Voigt ◽

Weizhou Zhang

Keyword(s):

Tumor Stage ◽

Cancer Genome ◽

Protein Quantification ◽

The Cancer Genome Atlas ◽

Dot Blot ◽

Link Type ◽

Free Interval ◽

Level Data ◽

Cancer Genome Atlas ◽

Genome Atlas

Reverse-phase protein arrays (RPPAs) are a high-throughput approach to protein quantification utilizing an antibody-based micro-to-nano scale dot blot. Within the Cancer Genome Atlas (TCGA), RPPAs were used to quantify over 200 proteins in 8,167 tumor or metastatic samples. These protein-level data have particular advantages in assessing putative prognostic or therapeutic targets in tumors. However, many of the available pipelines do not allow for the partitioning of clinical and RPPA information to make meaningful conclusions. We developed a cloud-based application, TRGAted, to enable researchers to better examine survival based on single or multiple proteins across 31 cancer types in the TCGA. TRGAted contains up-to-date overall survival, disease-specific survival, disease-free interval and progression-free interval information. Furthermore, survival information for primary tumor samples can be stratified based on gender, age, tumor stage, histological type, and subtype, allowing for a highly adaptive and intuitive user experience. The code and processed data are open source and available on GitHub, with a tutorial built into the application to assist users.


Interpretation of mRNA splicing mutations in genetic disease: review of the literature and guidelines for information-theoretical analysis

F1000Research ◽

10.12688/f1000research.5654.2 ◽

2015 ◽

Vol 3 ◽

pp. 282 ◽

Cited By ~ 5

Author(s):

Natasha G. Caminsky ◽

Eliseos J. Mucaki ◽

Peter K. Rogan

Keyword(s):

Splice Site ◽

Genetic Disease ◽

mRNA Splicing ◽

Regulatory Sequences ◽

Splice Sites ◽

Review Of The Literature ◽

Cryptic Splice Site ◽

Broad Perspective ◽

Genomic Variants ◽

Splicing Mutations

The interpretation of genomic variants has become one of the paramount challenges in the post-genome sequencing era. In this review we summarize nearly 20 years of research on the applications of information theory (IT) to interpret coding and non-coding mutations that alter mRNA splicing in rare and common diseases. We compile and summarize the spectrum of published variants analyzed by IT, to provide a broad perspective of the distribution of deleterious natural and cryptic splice site variants detected, as well as those affecting splicing regulatory sequences. Results for natural splice site mutations can be interrogated dynamically with Splicing Mutation Calculator, a companion software program that computes changes in information content for any splice site substitution, linked to corresponding publications containing these mutations. The accuracy of IT-based analysis was assessed in the context of experimentally validated mutations. Because splice site information quantifies binding affinity, IT-based analyses can discern the differences between variants that account for the observed reduced (leaky) versus abolished mRNA splicing. We extend this principle by comparing predicted mutations in natural, cryptic, and regulatory splice sites with observed deleterious phenotypic and benign effects. Our analysis of 1727 variants revealed a number of general principles useful for ensuring portability of these analyses and accurate input and interpretation of mutations. We offer guidelines for optimal use of IT software for interpretation of mRNA splicing mutations.
