MetaFusion: A high-confidence metacaller for filtering and prioritizing RNA-seq gene fusion candidates

2020 ◽  
Author(s):  
Michael Apostolides ◽  
Yue Jiang ◽  
Mia Husić ◽  
Robert Siddaway ◽  
Cynthia Hawkins ◽  
...  

Abstract Motivation: Gene fusions are often associated with cancer, yet current fusion detection tools vary in their calling approaches, making it challenging to select the right tool. Ensemble fusion calling techniques appear promising; however, current options have limited accessibility and function. Results: MetaFusion is a flexible meta-calling tool that amalgamates the outputs from any number of fusion callers. Results from individual callers are converted into Common Fusion Format, a new file type that standardizes outputs from callers. Calls are then annotated, merged using graph clustering, filtered and ranked to provide a final output of high-confidence candidates. MetaFusion consistently outperformed individual callers with respect to recall and precision on real and simulated datasets, achieving up to 100% precision. Thus, an ensemble calling approach is imperative for high-confidence results. MetaFusion also labels fusions found in databases using the FusionAnnotator package, and is provided with a benchmarking toolkit to calibrate new callers. Availability: MetaFusion is freely available at https://github.com/ccmbioinfo/[email protected]
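The "merged using graph clustering" step can be pictured with a small sketch. This is not MetaFusion's implementation: the `Call` record, the edge rule in `same_event`, the 100 bp window, and the ranking by number of distinct callers are all assumptions made for illustration. Each call is a node, an edge joins two calls that plausibly describe the same event, and each connected component becomes one merged candidate.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Call:
    """Hypothetical, simplified fusion call (not the Common Fusion Format)."""
    caller: str
    gene1: str
    gene2: str
    chr1: str
    pos1: int
    chr2: str
    pos2: int


def same_event(a, b, window=100):
    """Assumed edge rule: same unordered gene pair, or both breakpoints
    on the same chromosomes and within `window` bp of each other."""
    same_genes = {a.gene1, a.gene2} == {b.gene1, b.gene2}
    close = (
        a.chr1 == b.chr1 and a.chr2 == b.chr2
        and abs(a.pos1 - b.pos1) <= window
        and abs(a.pos2 - b.pos2) <= window
    )
    return same_genes or close


def cluster_calls(calls, window=100):
    """Group calls into connected components using union-find."""
    parent = list(range(len(calls)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(calls)), 2):
        if same_event(calls[i], calls[j], window):
            parent[find(i)] = find(j)

    clusters = {}
    for i, call in enumerate(calls):
        clusters.setdefault(find(i), []).append(call)
    return list(clusters.values())


def rank_clusters(clusters):
    """Assumed ranking: clusters supported by more distinct callers first."""
    return sorted(clusters, key=lambda c: len({x.caller for x in c}), reverse=True)
```

The pairwise comparison is quadratic in the number of calls, which is acceptable for a sketch; a real tool would likely index calls by gene or genomic position first.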

BMC Genomics ◽  
2020 ◽  
Vol 21 (S11) ◽  
Author(s):  
Qian Liu ◽  
Yu Hu ◽  
Andres Stucky ◽  
Li Fang ◽  
Jiang F. Zhong ◽  
...  

Abstract Background: Long-read RNA-Seq techniques can generate reads that encompass a large proportion of, or the entire, mRNA/cDNA molecule, so they are expected to address inherent limitations of short-read RNA-Seq techniques that typically generate < 150 bp reads. However, there is a general lack of software tools for gene fusion detection from long-read RNA-seq data that take into account the high basecalling error rates and the presence of alignment errors. Results: In this study, we developed a fast computational tool, LongGF, to efficiently detect candidate gene fusions from long-read RNA-seq data, including cDNA sequencing data and direct mRNA sequencing data. We evaluated LongGF on tens of simulated long-read RNA-seq datasets, and demonstrated its superior performance in gene fusion detection. We also tested LongGF on a Nanopore direct mRNA sequencing dataset and a PacBio sequencing dataset generated on a mixture of 10 cancer cell lines, and found that LongGF detected known gene fusions better than existing computational tools. Furthermore, we tested LongGF on a Nanopore cDNA sequencing dataset on acute myeloid leukemia, and pinpointed the exact location of a translocation (previously known at cytogenetic resolution) at base resolution, which was further validated by Sanger sequencing. Conclusions: In summary, LongGF will greatly facilitate the discovery of candidate gene fusion events from long-read RNA-Seq data, especially in cancer samples. LongGF is implemented in C++ and is available at https://github.com/WGLab/LongGF.


Blood ◽  
2018 ◽  
Vol 132 (Supplement 1) ◽  
pp. 1898-1898
Author(s):  
Steven M. Foltz ◽  
Qingsong Gao ◽  
Christopher J. Yoon ◽  
Amila Weerasinghe ◽  
Hua Sun ◽  
...  

Abstract Introduction: Gene fusions are the result of genomic rearrangements that create hybrid protein products or bring the regulatory elements of one gene into close proximity of another. Fusions often dysregulate gene function or expression through oncogene overexpression or tumor suppressor underexpression (Gao, Liang, Foltz, et al. Cell Rep 2018). Some fusions, such as EML4-ALK in lung adenocarcinoma, are known druggable targets. Fusion detection algorithms utilize discordantly mapped RNA-seq reads. Careful consideration of detection and filtering procedures is vital for large-scale fusion detection because current methods are prone to reporting false positives and show poor concordance. Multiple myeloma (MM) is a blood cancer in which rapidly expanding clones of plasma cells spread in the bone marrow. Translocations that juxtapose the highly-expressed IGH enhancer with potential oncogenes are associated with overexpression of partner genes, although they may not lead to a detectable gene fusion in RNA-seq data. Previous studies have explored the fusion landscape of multiple myeloma cohorts (Cleynen, et al. Nat Comm 2017; Nasser, et al. Blood 2017). In this study, we developed a novel gene fusion detection pipeline and post-processing strategy to analyze 742 patient samples at the primary time point and 64 samples at follow-up time points (806 total samples) from the Multiple Myeloma Research Foundation (MMRF) CoMMpass Study using RNA-seq, WGS, and clinical data. Methods and Results: We overlapped five fusion detection algorithms (EricScript, FusionCatcher, INTEGRATE, PRADA, and STAR-Fusion) to report fusion events. Our filtered call set consisted of 2,817 fusions with a median of 3 fusions per sample (mean 3.8), similar to glioblastoma, breast, ovarian, and prostate cancers in TCGA. 
Major recurrent fusions involving immunoglobulin genes included IGH-WHSC1 (88 primary samples), IGL-BMI1 (29), and the upstream neighbor of MYC, PVT1, paired with IGH (6), IGK (3), and IGL (11). For each event, we used WGS data when available to determine if there was genomic support of the gene fusion (based on discordant WGS reads, SV event detection, and MMRF CoMMpass Seq-FISH WGS results) (Miller, et al. Blood 2016). WGS validation rates varied by the level of RNA-seq evidence supporting each fusion, with an overall rate of 24.1%, which is comparable to previously observed pan-cancer validation rates using low-pass WGS. We calculated the association between fusion status and gene expression and identified genes such as BCL2L11, CCND1/2, LTBR, and TXNDC5 that showed significant overexpression (t-test). We explored the clinical connections of fusion events through survival analysis and clinical data correlations, and by mining potentially druggable targets from our Database of Evidence for Precision Oncology (dinglab.wustl.edu/depo) (Sun, Mashl, Sengupta, et al. Bioinformatics 2018). Major examples of upregulated fusion kinases that could potentially be targeted with off-label drug use include FGFR3 and NTRK1. We examined the evolution of fusion events over multiple time points. In one MMRF patient with a t(8;14) translocation joining the IGH locus and transcription factor MAFA, we observed IGH fusions with TOP1MT (neighbor of MAFA) at all four time points with corresponding high expression of TOP1MT and MAFA. Using non-MMRF single-cell RNA data from different patients, we were able to track cell-type composition over time as well as detect subpopulations of cells harboring fusions at different time points with potential treatment implications. Discussion: Gene fusions offer potential targets for alternative MM therapies. 
Careful implementation of gene fusion detection algorithms and post-processing are essential in large cohort studies to reduce false positives and enrich results for clinically relevant information. Clinical fusion detection from untargeted RNA-seq remains a challenge due to poor sensitivity, specificity, and usability. By combining MMRF CoMMpass data from multiple platforms, we have produced a comprehensive fusion profile of 742 MM patients. We have shown novel gene fusion associations with gene expression and clinical data, and we identified candidates for druggability studies. Disclosures Vij: Bristol-Myers Squibb: Honoraria, Membership on an entity's Board of Directors or advisory committees, Research Funding; Celgene: Honoraria, Membership on an entity's Board of Directors or advisory committees, Research Funding; Jazz Pharmaceuticals: Honoraria, Membership on an entity's Board of Directors or advisory committees; Janssen: Honoraria, Membership on an entity's Board of Directors or advisory committees; Amgen: Honoraria, Membership on an entity's Board of Directors or advisory committees; Karyopharm: Honoraria, Membership on an entity's Board of Directors or advisory committees; Takeda: Honoraria, Membership on an entity's Board of Directors or advisory committees, Research Funding.
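Overlapping the output of several fusion callers, as described in the Methods and Results above, can be sketched as a simple vote. The abstract does not state how many of the five callers had to agree or how calls were matched, so the `min_callers` threshold, the matching on an ordered (5' gene, 3' gene) pair, and the function name are assumptions for illustration only.

```python
from collections import defaultdict


def consensus_fusions(calls_by_caller, min_callers=2):
    """Keep fusions reported by at least `min_callers` distinct callers.

    `calls_by_caller` maps a caller name to an iterable of
    (five_prime_gene, three_prime_gene) tuples. The threshold is an
    assumed parameter, not a value taken from the study.
    """
    support = defaultdict(set)
    for caller, fusions in calls_by_caller.items():
        for gene5, gene3 in fusions:
            support[(gene5, gene3)].add(caller)
    return {
        fusion: sorted(callers)
        for fusion, callers in support.items()
        if len(callers) >= min_callers
    }


# Toy input with placeholder gene names (not data from the study).
toy_calls = {
    "EricScript": [("G1", "G2"), ("G3", "G4")],
    "STAR-Fusion": [("G1", "G2")],
    "PRADA": [("G1", "G2"), ("G5", "G6")],
}
kept = consensus_fusions(toy_calls, min_callers=2)
```

Raising `min_callers` trades sensitivity for fewer false positives, which is the tension the abstract highlights for large cohort studies.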


2013 ◽  
Vol 31 (15_suppl) ◽  
pp. e21523-e21523
Author(s):  
Milena Urbini ◽  
Annalisa Astolfi ◽  
Valentina Indio ◽  
Maristella Saponara ◽  
Margherita Nannini ◽  
...  

e21523 Background: A subset of KIT/PDGFRA wild-type (WT) GIST harbour mutations in SDH units. In the majority of the remaining cases of WT GIST, no other molecular events are identified. We performed RNA-seq on a WT GIST without mutations in SDH genes, using a next-generation approach to discover molecular events in this GIST population. Methods: In 2003, a 63-year-old woman underwent surgery for an ileal GIST (size 6 cm, MI 6/50 HPF). After 6 years, she developed a recurrence with a single hepatic lesion. The KIT and PDGFRA analysis of the lesion did not show mutations. Therefore, she did not receive imatinib but underwent surgical removal. The analysis of all SDH units did not show mutations, so paired-end RNA-seq (75×2) was performed on the Illumina HiScanSQ platform. After mapping the short reads to the human genome (HG19), SNVs and InDels were called by SNVMix2 with accurate filtering procedures, including predictors of mutation effects at the protein level. Gene fusion discovery was based on the agreement between the DeFuse, ChimeraScan and FusionMap tools, and fusions were validated by Sanger sequencing using primers spanning the mRNA breakpoints. Results: Four different gene fusions and 206 non-synonymous SNVs were discovered, of which 62 were called deleterious by at least one predictor, and they are undergoing further validation. The SPRED2-NELFCD gene fusion originated from an interchromosomal translocation-inversion between chr 20 and chr 2. The event involved exon 1 of SPRED2 and exon 11 of NELFCD, probably leading to inactivation of both genes. NELFCD encodes a component of the NELF complex that negatively regulates transcription elongation by RNA pol II, while SPRED2 is a member of the Sprouty/SPRED family that represses growth factor-induced activation of the MAPK/ERK pathway. The other three events were intrachromosomal aberrations: MARK2-PPFIA1 and PLA2G16-ATL3 on chr 11 and ASCC1-C10orf11 on chr 10. 
Only the first event led to an in-frame fusion (MARK2 ex1-PPFIA1 ex2), probably dysregulating the expression of the downstream gene. Conclusions: This is the first evidence of gene fusions in GIST. The oncogenic role and the tumor frequency of these events deserve to be studied.


2019 ◽  
Author(s):  
Krutika S. Gaonkar ◽  
Federico Marini ◽  
Komal S. Rathi ◽  
Payal Jain ◽  
Yuankun Zhu ◽  
...  

Abstract Background: Gene fusion events are a significant source of somatic variation across adult and pediatric cancers and are some of the most clinically-effective therapeutic targets, yet low consensus of RNA-Seq fusion prediction algorithms makes therapeutic prioritization difficult. In addition, events such as polymerase read-throughs, mis-mapping due to gene homology, and fusions occurring in healthy normal tissue require informed filtering, making it difficult for researchers and clinicians to rapidly discern gene fusions that might be true underlying oncogenic drivers of a tumor and in some cases, appropriate targets for therapy. Results: We developed annoFuse, an R package, and shinyFuse, a companion web application, to annotate, prioritize, and explore biologically-relevant expressed gene fusions, downstream of fusion calling. We validated annoFuse using a random cohort of TCGA RNA-Seq samples (N = 160) and achieved a 96% sensitivity for retention of high-confidence fusions (N = 603). annoFuse uses FusionAnnotator annotations to filter non-oncogenic and/or artifactual fusions. Then, fusions are prioritized if previously reported in TCGA and/or fusions containing gene partners that are known oncogenes, tumor suppressor genes, COSMIC genes, and/or transcription factors. We applied annoFuse to fusion calls from pediatric brain tumor RNA-Seq samples (N = 1,028) provided as part of the Open Pediatric Brain Tumor Atlas (OpenPBTA) Project to determine recurrent fusions and recurrently-fused genes within different brain tumor histologies. annoFuse annotates protein domains using the PFAM database, assesses reciprocality, and annotates gene partners for kinase domain retention. As a standard function, reportFuse enables generation of a reproducible R Markdown report to summarize filtered fusions, visualize breakpoints and protein domains by transcript, and plot recurrent fusions within cohorts. 
Finally, we created shinyFuse for algorithm-agnostic interactive exploration and plotting of gene fusions. Conclusions: annoFuse provides standardized filtering and annotation for gene fusion calls from STAR-Fusion and Arriba by merging, filtering, and prioritizing putative oncogenic fusions across large cancer datasets, as demonstrated here with data from the OpenPBTA project. We are expanding the package to be widely-applicable to other fusion algorithms and expect annoFuse to provide researchers a method for rapidly evaluating, prioritizing, and translating fusion findings in patient tumors.


2016 ◽  
Author(s):  
Chengpei Zhu ◽  
Yanling Lv ◽  
Liangcai Wu ◽  
Jinxia Guan ◽  
Xue Bai ◽  
...  

Abstract Most hepatocellular carcinoma (HCC) patients are diagnosed at advanced stages and have limited treatment options. Challenges in early stage diagnosis may be due to the genetic complexity of HCC. Gene fusion plays a critical role in tumorigenesis and cancer progression in multiple cancers, yet the identities of fusion genes as potential diagnostic markers in HCC have not been investigated. Paired-end RNA sequencing was performed on noncancerous and cancerous lesions in two representative HBV-HCC patients. Potential fusion genes were identified by STAR-Fusion in STAR software and validated by four publicly available RNA-seq datasets. Fourteen pairs of frozen HBV-related HCC samples and adjacent non-tumor liver tissues were examined by RT-PCR analysis for gene fusion expression. We identified 2,354 different gene fusions in the two HBV-HCC patients. Validation analysis against the four RNA-seq datasets revealed only 1.8% (43/2,354) as recurrent fusions that were supported by public datasets. Comparison with four fusion databases demonstrated that three (HLA-DPB2-HLA-DRB1, CDH23-HLA-DPB1, and C15orf57-CBX3) out of 43 recurrent gene fusions were annotated as disease-related fusion events. Nineteen were novel recurrent fusions not previously annotated to diseases, including DCUN1D3-GSG1L and SERPINA5-SERPINA9. RT-PCR and Sanger sequencing of 14 pairs of HBV-related HCC samples confirmed expression of six of the new fusions, including RP11-476K15.1-CTD-2015H3.2. Our study provides new insights into gene fusions in HCC and could contribute to the development of anti-HCC therapy. RP11-476K15.1-CTD-2015H3.2 may serve as a new therapeutic biomarker in HCC.


2021 ◽  
Author(s):  
Hamid Reza Mohebbi ◽  
Nurit Haspel

Gene fusion events, in which two genes are fused together to create a hybrid gene, were first described in cancer cells in the early 1980s. These events are relatively common in many cancers, including prostate, lymphoid, soft tissue, and breast cancers. Recent advances in next-generation sequencing (NGS) provide a high volume of genomic data, including cancer genomes. The detection of possible gene fusions requires fast and accurate methods. However, current methods suffer from inefficiency, insufficient accuracy, and a high false-positive rate. We present an RNA-Seq fusion detection method that uses dimensionality reduction and parallel computing to speed up the computation. We convert the RNA categorical space into compact binary arrays called binary fingerprints, which enables us to reduce memory usage and increase efficiency. The search for and detection of fusion candidates are done using the Jaccard distance. The detection of candidates is followed by refinement. We benchmarked our fusion prediction accuracy using both simulated and genuine RNA-Seq datasets. Genuine paired-end Illumina RNA-Seq data were obtained from 60 publicly available cancer cell line data sets. The results are compared against state-of-the-art methods such as STAR-Fusion, InFusion, and TopHat-Fusion. Our results show that our method, FDJD, exhibits superior accuracy compared to popular alternative fusion detection methods. We achieved 90% accuracy on simulated fusion transcript inputs, which is the highest among the compared methods, while maintaining comparable run time.
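The idea of comparing sequences through binary fingerprints and the Jaccard distance can be illustrated with a minimal sketch. The abstract does not specify how fingerprints are built, so the k-mer hashing scheme, the values of `k` and `n_bits`, and both function names are assumptions rather than the authors' method. A fingerprint is stored as a Python integer used as a bit set, and the Jaccard distance is one minus the ratio of shared set bits to total set bits.

```python
import zlib


def fingerprint(seq, k=4, n_bits=256):
    """Hash every k-mer of `seq` into one bit of an n_bits-wide bit set.

    Assumed scheme for illustration: CRC32 is used only because it is
    deterministic across runs, unlike Python's built-in str hash.
    """
    bits = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k].encode("ascii")
        bits |= 1 << (zlib.crc32(kmer) % n_bits)
    return bits


def jaccard_distance(a, b):
    """Jaccard distance between two bit sets stored as integers."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    intersection = bin(a & b).count("1")
    return 1.0 - intersection / union


read = fingerprint("ACGTACGTTGCA")
reference = fingerprint("ACGTACGTTGCC")
distance = jaccard_distance(read, reference)  # between 0.0 and 1.0
```

Because the comparison reduces to bitwise AND, OR and a population count, many candidate pairs can be screened cheaply before a slower refinement step.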


2017 ◽  
Author(s):  
Páll Melsted ◽  
Shannon Hateley ◽  
Isaac Charles Joseph ◽  
Harold Pimentel ◽  
Nicolas Bray ◽  
...  

RNA sequencing in cancer cells is a powerful technique to detect chromosomal rearrangements, allowing for de novo discovery of actively expressed fusion genes. Here we focus on the problem of detecting gene fusions from raw sequencing data, assembling the reads to define fusion transcripts and their associated breakpoints, and quantifying their abundances. Building on the pseudoalignment idea that simplifies and accelerates transcript quantification, we introduce a novel approach to fusion detection based on inspecting paired reads that cannot be pseudoaligned due to conflicting matches. The method and software, called pizzly, filters false positives, assembles new transcripts from the fusion reads, and reports candidate fusions. With pizzly, fusion detection from raw RNA-Seq reads can be performed in a matter of minutes, making the program suitable for the analysis of large cancer gene expression databases and for clinical use. pizzly is available at https://github.com/pmelsted/pizzly


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. 2042-2042
Author(s):  
Ainhoa Hernandez Gonzalez ◽  
Anna Esteve-Codina ◽  
Cristina Carrato ◽  
Ana Munoz ◽  
Estela Pineda ◽  
...  

2042 Background: Malignant gliomas are genetically heterogeneous diseases. The development of sequencing techniques, such as RNA sequencing, has identified many gene rearrangements encoding novel oncogenic fusions. Gene fusion discovery can potentially lead to the development of novel treatments; however, studies of gene fusions in glioma remain limited. Methods: The GLIOCAT project studied 139 samples from patients with newly diagnosed glioblastoma who had received the standard first-line treatment from 2004 to 2015, to identify gene fusion events from glioblastoma transcriptome data (RNA-Seq). The molecular subtype could be studied in 124 cases. RNA-Seq reads were mapped against the reference human genome with STAR-Fusion version 0.7.0, specifically with FusionInspector validate ( http://star-fusion.github.io ). Two other platforms, FusionHub ( https://fusionhub.persistent.co.in ) and Oncofuse ( www.unav.es/genetica/oncofuse.html ), were applied to eliminate false positives and fusions previously described in healthy tissue, and to predict the oncogenic potential of each fusion. Results: A total of 61 patients showed 103 different fusions, a median of two fusions per sample. The majority of gene fusions were intrachromosomal, and the most frequently involved chromosome was 12, followed by 7. In addition, fusions were more common in patients with MGMT promoter methylation, TCGA classical subtype and 18 IGS subtype. There were no differences in age, sex, type of surgery or long-term survivors ( > 30 months). Ten fusions had already been described in cancer, including three in gliomas (FRS2-KIF5A, EGFR-SEPT14 and FGFR3-TACC3). Of the detected fusions, 22 included an oncogene or protooncogene. Conclusions: In our study, we report the landscape of gene fusions from a large data set of glioblastomas analyzed by RNA-seq. The majority of the fusions were private fusions. A minority of these recurred at low frequency, but as many as a quarter of them included an oncogene or protooncogene. 
RNA-seq of GBM patient samples is an important tool for the identification of patient-specific fusions that could drive personalized therapy. Nevertheless, we plan to validate these gene fusions.


2013 ◽  
Vol 14 (1) ◽  
pp. 193 ◽  
Author(s):  
Chenglin Liu ◽  
Jinwen Ma ◽  
ChungChe Chang ◽  
Xiaobo Zhou

Blood ◽  
2019 ◽  
Vol 134 (Supplement_1) ◽  
pp. 4655-4655
Author(s):  
Paul Kerbs ◽  
Aarif Mohamed Nazeer Batcha ◽  
Sebastian Vosberg ◽  
Dirk Metzler ◽  
Tobias Herold ◽  
...  

Accurate and complete genetic classification of AML is crucial for the prediction of clinical outcome and treatment stratification. Deciphering the spectrum of genetic abnormalities by polymerase chain reaction (PCR), karyotyping and fluorescence in situ hybridization (FISH) in routine diagnostics is the current gold standard; however, fusion genes might be missed by these assays. Recently, several methods have been developed to improve the detection of gene fusion transcripts based on RNA sequencing data, providing robust results. To test the detection power and assess the applicability of RNA-Seq based methods in clinical diagnostics, we applied two different algorithms, namely FusionCatcher (Nicorici D et al., bioRxiv, 2014) and Arriba (Uhrig S et al., DKFZ, https://github.com/suhrig/arriba), to the transcriptomes of 895 well-characterized AML samples from three independently sequenced cohorts: AMLCG (Herold T et al., Haematologica, 2018, n=261), DKTK (Greif PA et al., Clin Cancer Res, 2018 and unpublished data, n=166), BeatAML (Tyner JW et al., Nature 2018, n=468) and publicly available healthy control samples (SRA studies: SRP018028, SRP047126, SRP050146, SRP105369, SRP115911, SRP133442, n=38). According to karyotyping, 31% (277/895) of samples harbored chromosomal aberrations putatively causing gene fusions (i.e. translocations, interstitial deletions, duplications, inversions, insertions). Analyses by FISH and/or PCR confirmed these rearrangements in 51.3% (142/277) of samples, whereas fusion detection by means of RNA-Seq showed evidence for fusion genes corresponding to these rearrangements in 60.3% (167/277) of samples. Chromosomal aberrations, identified by karyotyping, which are known to result in clinically relevant fusions (e.g. RUNX1-RUNX1T1, KMT2A fusions) were confirmed by FISH/PCR (AMLCG: n=27/27, DKTK: n=21/21, BeatAML: n=54/57) and RNA-Seq based methods (AMLCG: n=17/27, DKTK: n=21/21, BeatAML: n=56/57) in most of the cases. 
Of note, the AMLCG cohort was sequenced using the SENSE mRNA Library Prep Kit from Lexogen, which seems not to be optimal for fusion detection. Furthermore, 19 samples (AMLCG: n=12, DKTK: n=4, BeatAML: n=3) were found to harbor known pathogenic fusions, described in previous studies, which were not reported by routine diagnostics: NUP98-NSD1 (n=11); CBFB-MYH11, RUNX1-RUNX1T1 and DEK-NUP214 (n=2 each); RUNX1-CBFA2T2 and RUNX1-CBFA2T3 (n=1 each). Reanalysis of six of these samples by PCR confirmed three fusions which were initially missed by routine diagnostics. In general, the number of fusion events reported by RNA-Seq is high (on average 69 and 39 per sample as detected by FusionCatcher and Arriba, respectively), even after applying the built-in filters, indicating a high false positive rate. To robustly identify putative novel fusions, we developed a filtering pipeline and incorporated two new filtering steps. The promiscuity score (PS) of a fusion measures the number of further distinct fusion partners which were detected in the respective cohort for the 5' and 3' gene. The fusion transcript score (FTS) measures the abundance of a fusion transcript relative to its 5' and 3' partner genes. PS and FTS of known, clinically relevant fusions confirmed by FISH/PCR were used to define cut-offs. To further maximize specificity while maintaining sensitivity, we excluded fusion events which we detected in publicly available healthy samples and subsequently filtered for overlapping calls from FusionCatcher and Arriba (Fig. 1A). Additionally, we obtained further evidence for a fusion event from elevated transcription of the 3' fusion partner. In case of a fusion event, the transcription of the 3' partner gene likely comes under the control of the promoter of the 5' partner gene. This results in elevated transcription of genes which are otherwise transcribed at low levels (Fig. 1B-C). 
Thus, we identified five putatively novel recurrent fusion genes which were detected in two cohorts independently: NRIP1-MIR99AHG, LATS2-ZMYM2, ATP11A-ING1, MBP-SLC66A2, PRDM16-SKI (Fig. 1D-F). Although these events were called with high evidence, we aim at independent validation by complementary methods. In our study, we have demonstrated not only that the application of RNA-Seq to the detection of fusion genes is a valuable complement to routine diagnostics but also that it has the potential to discover novel putatively pathogenic fusions. Disclosures No relevant conflicts of interest to declare.
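The two filtering scores named in this abstract, the promiscuity score (PS) and the fusion transcript score (FTS), can be sketched as follows. The abstract gives only verbal definitions, so the exact formulas here are assumptions: partners are counted regardless of orientation, PS is returned as one count per partner gene, and FTS is a plain ratio of fusion abundance to each partner gene's abundance in whatever common unit the caller supplies. The function names are hypothetical.

```python
from collections import defaultdict


def promiscuity_scores(cohort_fusions):
    """For each (5' gene, 3' gene) fusion, count the further distinct
    partners seen in the cohort for the 5' gene and for the 3' gene.

    Assumption: a partner counts in either orientation.
    """
    partners = defaultdict(set)
    for gene5, gene3 in cohort_fusions:
        partners[gene5].add(gene3)
        partners[gene3].add(gene5)
    return {
        (gene5, gene3): (
            len(partners[gene5] - {gene3}),
            len(partners[gene3] - {gene5}),
        )
        for gene5, gene3 in set(cohort_fusions)
    }


def fusion_transcript_score(fusion_abundance, gene5_abundance, gene3_abundance):
    """Abundance of the fusion transcript relative to each partner gene.

    Assumption: simple ratios; a partner with zero abundance yields inf.
    """
    def ratio(denominator):
        return fusion_abundance / denominator if denominator > 0 else float("inf")

    return ratio(gene5_abundance), ratio(gene3_abundance)


# Toy cohort with placeholder gene names (not data from the study).
toy_cohort = [("A", "B"), ("A", "C"), ("A", "D"), ("E", "B")]
ps = promiscuity_scores(toy_cohort)
fts = fusion_transcript_score(10, 100, 20)
```

In a filter along the lines the abstract describes, fusions with a high PS (genes that pair with many partners, a common artifact signature) or a low FTS would be removed using cut-offs calibrated on known, confirmed fusions.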

