miPIE: NGS-based Prediction of miRNA Using Integrated Evidence

2018 ◽  
Author(s):  
R.J. Peace ◽  
M. Sheikh Hassani ◽  
J.R. Green

Abstract Methods for the de novo identification of microRNA (miRNA) have been developed using a range of sequence-based features. With the increasing availability of next-generation sequencing (NGS) transcriptome data, there is a need for miRNA identification that integrates both NGS transcript expression-based patterns and advanced genomic sequence-based methods. While miRDeep2 does examine the predicted secondary structure of putative miRNA sequences, it does not leverage many of the sequence-based features used in state-of-the-art de novo methods. Meanwhile, other NGS-based methods, such as miRanalyzer, emphasize sequence-based features without leveraging advanced expression-based features that reflect miRNA biosynthesis. This represents an opportunity to combine the strengths of NGS-based analysis with recent advances in de novo sequence-based miRNA prediction. Here we develop a method, microRNA Prediction using Integrated Evidence (miPIE), that integrates expression-based and sequence-based features to achieve significantly improved miRNA prediction performance. Feature selection identifies the 20 most discriminative features, 3 of which reflect strictly expression-based information. Evaluation using precision-recall curves on six NGS data sets representing six diverse species demonstrates substantial improvements in prediction performance compared to miRDeep2 and miRanalyzer. The individual contributions of expression-based and sequence-based features are also examined, and we demonstrate that their combination is more effective than either alone.
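The miPIE abstract compares predictors using precision-recall curves. As a minimal sketch of how such a curve is built from classifier output, assuming each candidate hairpin has a real-valued score and a binary label (1 = known miRNA), the hypothetical function below sweeps the decision threshold over every distinct score. It is not miPIE's code, and the toy scores are invented for illustration.

```python
def precision_recall_curve(scores, labels):
    """Return (threshold, precision, recall) tuples, one per distinct score.

    A candidate is predicted positive when its score >= threshold;
    labels are 1 for a known miRNA and 0 otherwise.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_pos = sum(1 for y in labels if y)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    points = []
    # Sweep the decision threshold from the strictest to the most permissive.
    for t in sorted(set(scores), reverse=True):
        predicted = [y for s, y in zip(scores, labels) if s >= t]
        tp = sum(1 for y in predicted if y)
        points.append((t, tp / len(predicted), tp / total_pos))
    return points


# Toy scores for four hypothetical candidate hairpins.
curve = precision_recall_curve([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
for threshold, precision, recall in curve:
    print(f"t={threshold:.1f}  precision={precision:.2f}  recall={recall:.2f}")
```

Each threshold yields one point; a method whose curve stays higher across recall values is the better ranker, which is the kind of comparison the abstract reports.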

Author(s):  
Matthew L Bendall ◽  
Keylie M Gibson ◽  
Margaret C Steiner ◽  
Uzma Rentia ◽  
Marcos Pérez-Losada ◽  
...  

Abstract Deep sequencing of viral populations using next-generation sequencing (NGS) offers opportunities to investigate evolution, transmission dynamics, and population genetics. Currently, the standard practice for processing NGS data to study viral populations is to summarize all the observed sequences from a sample as a single consensus sequence, thus discarding valuable information about intra-host viral molecular epidemiology. Furthermore, existing analytical pipelines may analyze only genomic regions involved in drug resistance and thus are not suited for full viral genome analysis. Here we present HAPHPIPE, a HAplotype and PHylodynamics PIPEline for genome-wide assembly of viral consensus sequences and haplotypes. The HAPHPIPE protocol includes modules for quality trimming, error correction, de novo assembly, alignment, and haplotype reconstruction. The resulting consensus sequences, haplotypes, and alignments can be further analyzed using a variety of phylogenetic and population genetic software. HAPHPIPE is designed to provide users with a single pipeline to rapidly analyze sequences from viral populations generated on NGS platforms and to provide high-quality output properly formatted for downstream evolutionary analyses.
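To make concrete what collapsing a sample into a single consensus discards, here is a minimal majority-rule sketch. It assumes the reads are already aligned to equal length (with `N` for ambiguous calls); the function name and `min_freq` cutoff are hypothetical, and this is not HAPHPIPE's implementation, which relies on dedicated assembly and haplotype-reconstruction modules.

```python
from collections import Counter


def consensus_and_minor_variants(aligned_reads, min_freq=0.05):
    """Majority-rule consensus plus the minority bases it would discard."""
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(r) != length for r in aligned_reads):
        raise ValueError("reads must be pre-aligned to the same length")
    consensus = []
    minor = {}  # position -> {base: frequency} for non-consensus bases
    for pos in range(length):
        # Skip ambiguous 'N' calls when tallying bases at this position.
        counts = Counter(r[pos] for r in aligned_reads if r[pos] != "N")
        if not counts:
            consensus.append("N")
            continue
        total = sum(counts.values())
        top_base, _ = counts.most_common(1)[0]
        consensus.append(top_base)
        # Anything else above min_freq is intra-host diversity lost in a consensus.
        others = {b: c / total for b, c in counts.items()
                  if b != top_base and c / total >= min_freq}
        if others:
            minor[pos] = others
    return "".join(consensus), minor


reads = ["ACGT", "ACGT", "ACTT", "ACGT"]
cons, minor = consensus_and_minor_variants(reads)
print(cons)   # ACGT
print(minor)  # {2: {'T': 0.25}}
```

The consensus `ACGT` hides the 25% `T` variant at position 2; haplotype reconstruction aims to keep such variants linked within whole viral genomes rather than reporting them position by position.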


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Xingyu Liao ◽  
Xin Gao ◽  
Xiankai Zhang ◽  
Fang-Xiang Wu ◽  
Jianxin Wang

Abstract Background Repetitive sequences account for a large proportion of eukaryotic genomes. Identification of repetitive sequences plays a significant role in many applications, such as structural variation detection and genome assembly. Many existing de novo repeat identification pipelines or tools assemble high-frequency k-mers to obtain repeats. However, assemblers require a certain degree of sequence coverage to produce the desired assemblies. Moreover, assemblers cut the reads into shorter k-mers for assembly, which may destroy the structure of repetitive regions. For these reasons, it is difficult to obtain complete and accurate repetitive regions of the genome using existing tools. Results In this study, we present a new method called RepAHR for de novo repeat identification by assembly of high-frequency reads. First, RepAHR scans next-generation sequencing (NGS) reads to find high-frequency k-mers. Second, RepAHR filters the high-frequency reads from the whole set of NGS reads according to certain rules based on the high-frequency k-mers. Finally, the high-frequency reads are assembled into repeats using SPAdes, which is considered an outstanding genome assembler for NGS sequences. Conclusions We test RepAHR on five data sets, and the experimental results show that RepAHR outperforms RepARK and REPdenovo for detecting repeats in terms of N50, reference alignment ratio, coverage ratio of reference, mask ratio of Repbase, and other metrics.
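The first two RepAHR steps (find high-frequency k-mers, then select high-frequency reads) can be sketched as below. The abstract does not state the exact filtering rules, so the rule used here (keep a read if at least `min_fraction` of its k-mers are high-frequency), the parameter names, and the forward-strand-only k-mer counting are all assumptions; the SPAdes assembly step is not shown.

```python
from collections import Counter


def kmers(seq, k):
    """All overlapping k-mers of seq (forward strand only in this sketch)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def high_frequency_kmers(reads, k, min_count):
    """Step 1: k-mers occurring at least min_count times across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return {km for km, c in counts.items() if c >= min_count}


def high_frequency_reads(reads, k, min_count, min_fraction):
    """Step 2 (assumed rule): keep reads whose k-mers are mostly high-frequency."""
    hf = high_frequency_kmers(reads, k, min_count)
    kept = []
    for read in reads:
        ks = kmers(read, k)
        if ks and sum(km in hf for km in ks) / len(ks) >= min_fraction:
            kept.append(read)
    return kept


reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "TTGCAAGC"]
print(high_frequency_reads(reads, k=4, min_count=3, min_fraction=0.8))
# ['ACGTACGT', 'ACGTACGT', 'ACGTACGT']
```

Keeping whole reads, rather than only the k-mers, is what lets the downstream assembler see longer stretches of repeat sequence, which is the motivation the abstract gives. A practical version would also count canonical (reverse-complement-merged) k-mers.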


Author(s):  
Dmitry A. Kovalev ◽  
Sergey V. Pisarenko ◽  
Anna Yu. Evchenko ◽  
Dmitry G. Ponomarenko ◽  
Olga V. Bobrysheva ◽  
...  

Brucellosis is one of the most pressing zoonotic diseases globally and is endemic in many regions of the world. Brucella melitensis is believed to be the species of the genus Brucella most pathogenic to humans. However, the processes underlying its pathogenicity are not yet fully understood. In our study, we report the first complete genome of a clinical B. melitensis strain isolated in Russia, perform structural and functional analysis of the genomic sequence, and evaluate the expression levels of virulence-associated genes based on next-generation sequencing (NGS) data. The information obtained on the genetic similarities and differences between B. melitensis strains can be used to study the mechanisms responsible for the pathogenicity of Brucella spp., as well as to develop new therapeutic and preventive strategies for controlling brucellosis.


Cells ◽  
2021 ◽  
Vol 10 (2) ◽  
pp. 416
Author(s):  
Lorena Landuzzi ◽  
Maria Cristina Manara ◽  
Pier-Luigi Lollini ◽  
Katia Scotlandi

Osteosarcoma (OS) is a rare malignant primary tumor of mesenchymal origin affecting bone. It is characterized by a complex genotype, mainly due to the high frequency of chromothripsis, which leads to multiple somatic copy number alterations and structural rearrangements. Any effort to design genome-driven therapies must therefore consider such high inter- and intra-tumor heterogeneity. Accordingly, many laboratories and international networks are developing and sharing OS patient-derived xenografts (OS PDXs) to broaden the availability of models that reproduce the complex clinical heterogeneity of OS. OS PDXs, and new cell lines derived from them, faithfully preserve tumor heterogeneity and genetic and epigenetic features, and are thus valuable tools for predicting drug responses. Here, we review recent achievements concerning OS PDXs, summarizing the methods used to obtain ectopic and orthotopic xenografts and to fully characterize these models. The availability of OS PDXs across the many international PDX platforms and their possible use in PDX clinical trials are also described. We recommend coupling next-generation sequencing (NGS) data analysis with functional studies in OS PDXs, as well as setting up OS PDX clinical trials and co-clinical trials, to enhance the predictive power of experimental evidence and to accelerate the clinical translation of effective genome-guided therapies for this aggressive disease.


Molecules ◽  
2018 ◽  
Vol 23 (2) ◽  
pp. 399 ◽  
Author(s):  
Sima Taheri ◽  
Thohirah Lee Abdullah ◽  
Mohd Yusop ◽  
Mohamed Hanafi ◽  
Mahbod Sahebi ◽  
...  

2017 ◽  
Vol 2 ◽  
pp. 35 ◽  
Author(s):  
Shazia Mahamdallie ◽  
Elise Ruark ◽  
Shawn Yost ◽  
Emma Ramsay ◽  
Imran Uddin ◽  
...  

Detection of deletions and duplications of whole exons (exon CNVs) is a key requirement of genetic testing. Accurate detection of this variant type has proved very challenging in targeted next-generation sequencing (NGS) data, particularly if only a single exon is involved. Many different NGS exon CNV calling methods have been developed over the last five years. Such methods are usually evaluated using simulated and/or in-house data due to a lack of publicly available datasets with orthogonally generated results. This hinders tool comparison, transparency, and reproducibility. To provide a community resource for assessing exon CNV calling methods in targeted NGS data, we here present the ICR96 exon CNV validation series. The dataset includes high-quality sequencing data from a targeted NGS assay (the TruSight Cancer Panel) together with Multiplex Ligation-dependent Probe Amplification (MLPA) results for 96 independent samples. Sixty-six samples contain at least one validated exon CNV, and 30 samples have validated negative results for exon CNVs in 26 genes. The dataset includes 46 exon CNVs in BRCA1, BRCA2, TP53, MLH1, MSH2, MSH6, PMS2, EPCAM, or PTEN, giving excellent representation of the cancer predisposition genes most frequently tested in clinical practice. Moreover, the validated exon CNVs include 25 single-exon CNVs, the most difficult type of exon CNV to detect. The FASTQ files for the ICR96 exon CNV validation series can be accessed through the European Genome-phenome Archive (EGA) under accession number EGAS00001002428.
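A validation series like ICR96 is typically used by scoring a caller's output against the orthogonal (MLPA) results. The sketch below assumes calls and truth are both expressed as hypothetical `(sample, exon)` identifiers and computes sensitivity and specificity over the targets that have an MLPA result; the identifiers and toy data are invented, and this is not a scoring protocol prescribed by the dataset authors.

```python
def evaluate_exon_cnv_calls(tested, truth, calls):
    """Score CNV calls against orthogonally validated results.

    tested: every (sample, exon) target with an MLPA result
    truth:  the subset of tested targets carrying a validated exon CNV
    calls:  targets flagged by the CNV caller under evaluation
    """
    tested, truth, calls = set(tested), set(truth), set(calls)
    if not truth <= tested:
        raise ValueError("every validated CNV must be among the tested targets")
    calls &= tested  # only score targets that have an orthogonal result
    tp = len(calls & truth)
    fp = len(calls - truth)
    fn = len(truth - calls)
    tn = len(tested - truth - calls)
    sensitivity = tp / (tp + fn) if truth else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "sensitivity": sensitivity, "specificity": specificity}


# Toy example: two samples, two BRCA1 exons each (identifiers are hypothetical).
tested = {("S1", "BRCA1_ex2"), ("S1", "BRCA1_ex3"),
          ("S2", "BRCA1_ex2"), ("S2", "BRCA1_ex3")}
truth = {("S1", "BRCA1_ex2")}
calls = {("S1", "BRCA1_ex2"), ("S2", "BRCA1_ex3")}
print(evaluate_exon_cnv_calls(tested, truth, calls))
```

Here the caller finds the one true CNV (sensitivity 1.0) but raises one false positive among three negatives (specificity about 0.67). The validated negative samples in the series are what make the specificity side of such a comparison possible.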


2021 ◽  
Author(s):  
Jakub Mlokosiewicz ◽  
Piotr Deszynski ◽  
Wiktoria Wilman ◽  
Igor Jaszczyszyn ◽  
Rajkumar Ganesan ◽  
...  

Motivation: Rational design of therapeutic antibodies can be improved by harnessing the natural sequence diversity of these molecules. Our understanding of antibody diversity has recently been greatly facilitated by the deposition of hundreds of millions of human antibody sequences in next-generation sequencing (NGS) repositories. Contrasting a query therapeutic antibody sequence with the naturally observed diversity in similar antibody sequences from NGS can provide a mutational roadmap for antibody engineers designing biotherapeutics. Because of the sheer scale of antibody NGS datasets, performing queries across them is computationally challenging. Results: To facilitate harnessing antibody NGS data, we developed AbDiver (http://naturalantibody.com/abdiver), a free portal allowing users to compare their query sequences to those observed in natural repertoires. AbDiver offers three antibody-specific use cases: 1) compare a query antibody to positional variability statistics precomputed from multiple independent studies; 2) retrieve close full variable sequence matches to a query antibody; and 3) retrieve CDR3 or clonotype matches to a query antibody. We applied our system to a set of 742 therapeutic antibodies, demonstrating that for each use case our system can retrieve relevant results for most sequences. AbDiver facilitates navigation of the vast antibody mutation space for the purpose of rational therapeutic antibody design and engineering. Availability: AbDiver is freely accessible at http://naturalantibody.com/abdiver
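Use case 1 (comparing a query to positional variability statistics) can be illustrated with a small profile computation. This is a minimal sketch, assuming the repertoire and query sequences have already been numbered onto a common scheme so that equal indices correspond to equal positions; the function names, the toy four-residue sequences, and the 0.3 rarity cutoff are hypothetical and do not reflect AbDiver's internals.

```python
from collections import Counter


def positional_frequencies(aligned_seqs):
    """Per-position residue frequencies for sequences on a shared numbering."""
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for pos in range(length):
        counts = Counter(s[pos] for s in aligned_seqs)
        total = sum(counts.values())
        profile.append({aa: c / total for aa, c in counts.items()})
    return profile


def query_support(query, profile):
    """Frequency of the query's own residue at each position (0.0 = never seen)."""
    if len(query) != len(profile):
        raise ValueError("query must use the same numbering as the profile")
    return [profile[i].get(aa, 0.0) for i, aa in enumerate(query)]


repertoire = ["EVQL", "EVQL", "QVQL", "EVKL"]  # toy "natural" sequences
profile = positional_frequencies(repertoire)
support = query_support("QVQL", profile)
print(support)  # [0.25, 1.0, 0.75, 1.0]
rare = [i for i, f in enumerate(support) if f < 0.3]  # hypothetical cutoff
print(rare)  # [0]
```

Positions where the query's residue is rare in natural repertoires are the ones an engineer might inspect first; precomputing the profile once is what makes repeated queries cheap at repertoire scale.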


mSphere ◽  
2021 ◽  
Vol 6 (2) ◽  
Author(s):  
Madolyn L. MacDonald ◽  
Shawn W. Polson ◽  
Kelvin H. Lee

ABSTRACT Adventitious agent detection during the production of vaccines and biotechnology-based medicines is critically important to ensure that the final product is free from any possible viral contamination. Increasing the speed and accuracy of viral detection is beneficial as a means to accelerate development timelines and to ensure patient safety. Here, several rapid viral metagenomics approaches were tested on simulated next-generation sequencing (NGS) data sets and on existing data sets from virus spike-in studies performed in CHO-K1 and HeLa cell lines. These rapid methods had sensitivity comparable to that of the full-read alignment methods used for NGS viral detection for these data sets, but their specificity could be improved. A method that first filters host reads using KrakenUniq and then selects the virus classification tool based on the number of remaining reads is suggested as the preferred approach among those tested for detecting nonlatent and nonendogenous viruses. This approach shows reasonable sensitivity and specificity for the data sets examined and requires less time and memory than full-read alignment methods. IMPORTANCE Next-generation sequencing (NGS) has been proposed as a method to complement current in vivo and in vitro methods for detecting adventitious viruses in the production of biotherapeutics and vaccines. Before NGS can be established in industry as a main viral detection technology, further investigation is needed into the various aspects of the bioinformatics analyses required to identify and classify viral NGS reads. In this study, the ability of rapid metagenomics tools to detect viruses in biopharmaceutically relevant samples is tested and compared in order to recommend an efficient approach. The results showed that KrakenUniq can quickly and accurately filter host sequences and classify viral reads, with sensitivity and specificity comparable to slower full-read alignment approaches, such as BLASTn, for the data sets examined.
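The recommended two-stage workflow (filter host reads, then choose a classifier according to how many reads remain) reduces to the control flow sketched below. It assumes per-read taxon assignments have already been parsed into a dictionary; the read-count threshold and the two strategy labels are hypothetical placeholders, because the abstract names neither the cutoff nor which tool is used on each branch. Unclassified reads (taxon 0) are deliberately kept, since a novel contaminant may not match any reference.

```python
def filter_host_then_route(assignments, host_taxids, max_reads_for_small_tool):
    """Drop reads assigned to a host taxon, then pick a downstream strategy.

    assignments maps read ID -> assigned taxon ID (0 = unclassified).
    The threshold and the returned strategy labels are hypothetical.
    """
    # Keep everything not assigned to the host, including unclassified reads.
    remaining = [rid for rid, taxid in assignments.items() if taxid not in host_taxids]
    if len(remaining) <= max_reads_for_small_tool:
        strategy = "classifier_for_small_read_sets"
    else:
        strategy = "classifier_for_large_read_sets"
    return remaining, strategy


# Toy example for a HeLa-derived sample (9606 = Homo sapiens in NCBI Taxonomy).
assignments = {"r1": 9606, "r2": 9606, "r3": 0, "r4": 10000}  # 10000: toy non-host ID
remaining, strategy = filter_host_then_route(assignments, {9606}, max_reads_for_small_tool=1000)
print(remaining, strategy)  # ['r3', 'r4'] classifier_for_small_read_sets
```

A real host filter would also treat reads assigned to descendant taxa of the host as host-derived; this sketch matches taxon IDs exactly.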

