genomic databases
Recently Published Documents


TOTAL DOCUMENTS: 263 (FIVE YEARS: 110)

H-INDEX: 25 (FIVE YEARS: 4)

2022 ◽  
pp. cebp.0876.2021
Author(s):  
Daniel Backenroth ◽  
Jeremy Snider ◽  
Ronglai Shen ◽  
Venkatraman Seshan ◽  
Emily Castellanos ◽  
...  

2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Jörg Winkler ◽  
Gianvito Urgese ◽  
Elisa Ficarra ◽  
Knut Reinert

Background: The function of non-coding RNA sequences is largely determined by their spatial conformation, namely the secondary structure of the molecule, formed by Watson–Crick interactions between nucleotides. Hence, modern RNA alignment algorithms routinely take structural information into account. Structural alignment of RNAs is an essential task for discovering as-yet-unknown RNA families and inferring their possible functions. This task demands substantial computational resources, especially when aligning many long sequences, and therefore requires efficient algorithms that utilize modern hardware when available. A subset of secondary structures contains overlapping interactions (called pseudoknots), which add complexity to the problem and are often ignored in available software. Results: We present the SeqAn-based software LaRA 2, which is significantly faster than comparable software for accurate pairwise and multiple alignments of structured RNA sequences. In contrast to other programs, our approach can handle arbitrary pseudoknots. As an improved re-implementation of the LaRA tool for structural alignments, LaRA 2 uses multi-threading and vectorization for parallel execution and a new heuristic for computing a lower boundary of the solution. Our algorithmic improvements yield a program that is up to 130 times faster than the previous version. Conclusions: With LaRA 2 we provide a tool to analyse large sets of RNA secondary structures in relatively short time, based on structural alignment. The produced alignments can be used to derive structural motifs for the search in genomic databases.


2022 ◽  
Vol 12 ◽  
Author(s):  
Alejandro Rodríguez-Gijón ◽  
Julia K. Nuy ◽  
Maliheh Mehrshad ◽  
Moritz Buck ◽  
Frederik Schulz ◽  
...  

Our view of genome size in Archaea and Bacteria has remained skewed because the data have been dominated by genomes of microorganisms cultivated under laboratory settings. However, the continuous effort to catalog Earth’s microbiomes, specifically propelled by recent extensive work on uncultivated microorganisms, provides an opportunity to revise our perspective on genome size distribution. We present a meta-analysis that includes 26,101 representative genomes from 3 published genomic databases: metagenome-assembled genomes (MAGs) from GEMs and stratfreshDB, and isolates from GTDB. Aquatic and host-associated microbial genomes present on average the smallest estimated genome sizes (3.1 and 3.0 Mbp, respectively). These are followed by terrestrial microbial genomes (average 3.7 Mbp) and genomes from isolated microorganisms (average 4.3 Mbp). On the one hand, aquatic and host-associated ecosystems present smaller genome sizes in genera of phyla with genome sizes above 3 Mbp. On the other hand, estimated genome size in phyla with genomes under 3 Mbp showed no difference between ecosystems. Moreover, we observed that when using 95% average nucleotide identity (ANI) as an estimator for genetic units, only 3% of MAGs cluster together with genomes from isolated microorganisms. Although there are potential methodological limitations when assembling and binning MAGs, we found that in genome clusters containing both environmental MAGs and isolate genomes, MAGs were estimated to be on average only 3.7% smaller than isolate genomes. Even when assembly and binning methods introduce biases, the estimated genome sizes of MAGs and isolates are very similar. Finally, to better understand the ecological drivers of genome size, we discuss the known and the overlooked factors that influence genome size in different ecosystems, phylogenetic groups, and trophic strategies.


2021 ◽  
Vol 1 ◽  
Author(s):  
Karl Gemayel ◽  
Alexandre Lomsadze ◽  
Mark Borodovsky

State-of-the-art algorithms of ab initio gene prediction for prokaryotic genomes have been shown to be sufficiently accurate. Any pair of such algorithms tends to agree on predictions of gene 3′ ends. Nonetheless, their predictions of gene starts fail to match for 15–25% of genes in a genome. This discrepancy is a serious issue that is difficult to resolve because sufficiently large sets of genes with experimentally verified starts are lacking. We have introduced StartLink, which infers gene starts from conservation patterns revealed by multiple alignments of homologous nucleotide sequences. We have also introduced StartLink+, which combines ab initio and alignment-based methods. The ability of StartLink to predict the start of a given gene is restricted by the availability of homologs in a database. We observed that StartLink made predictions for 85% of genes per genome on average. The StartLink+ accuracy was shown to be 98–99% on the sets of genes with experimentally verified starts. In comparison with database annotations, we observed that the annotated gene starts deviated from the StartLink+ predictions for ∼5% of genes in AT-rich genomes and for 10–15% of genes in GC-rich genomes on average. The use of StartLink+ has the potential to significantly improve gene start annotation in genomic databases.


2021 ◽  
Vol 7 (12) ◽  
pp. 1027
Author(s):  
Rebeca Vázquez-Avendaño ◽  
José Benjamín Rodríguez-Haas ◽  
Hugo Velázquez-Delgado ◽  
Greta Hanako Rosas-Saito ◽  
Eric Edmundo Hernández-Domínguez ◽  
...  

Neofusicoccum parvum belongs to the Botryosphaeriaceae family, which contains endophytes and pathogens of woody plants. In this study, we isolated 11 strains from diseased tissue of Liquidambar styraciflua. Testing with Koch’s postulates, followed by a molecular approach, revealed that N. parvum was the most pathogenic strain. We established an in vitro pathosystem (L. styraciflua foliar tissue–N. parvum) to characterize the infection process during the first 16 days. New CysRPs were identified for both organisms using public transcriptomic and genomic databases, while mRNA expression of CysRPs was analyzed by RT-qPCR. The results showed that N. parvum caused disease symptoms after 24 h that intensified over time. Through in silico analysis, 5 CysRPs were identified for each organism, revealing that all of the proteins are potentially secreted and novel, including two N. parvum proteins containing the CFEM domain. Interestingly, the CysRP mRNA levels changed during the interaction. This study reports N. parvum as a pathogen of L. styraciflua for the first time and highlights the potential involvement of CysRPs in both organisms during this interaction.


2021 ◽  
Vol 118 (49) ◽  
pp. e2112279118
Author(s):  
James R. Rybarski ◽  
Kuang Hu ◽  
Alexis M. Hill ◽  
Claus O. Wilke ◽  
Ilya J. Finkelstein

CRISPR-associated Tn7 transposons (CASTs) co-opt cas genes for RNA-guided transposition. CASTs are exceedingly rare in genomic databases; recent surveys have reported Tn7-like transposons that co-opt Type I-F, I-B, and V-K CRISPR effectors. Here, we expand the diversity of reported CAST systems via a bioinformatic search of metagenomic databases. We discover architectures for all known CASTs, including arrangements of the Cascade effectors, target homing modalities, and minimal V-K systems. We also describe families of CASTs that have co-opted the Type I-C and Type IV CRISPR-Cas systems. Our search for non-Tn7 CASTs identifies putative candidates that include a nuclease dead Cas12. These systems shed light on how CRISPR systems have coevolved with transposases and expand the programmable gene-editing toolkit.


Cancers ◽  
2021 ◽  
Vol 13 (22) ◽  
pp. 5661
Author(s):  
Sharavan Ramachandran ◽  
Itishree S. Kaushik ◽  
Sanjay K. Srivastava

Pancreatic tumors exhibit high basal autophagy compared to that of other cancers. Several studies, including those from our laboratory, have reported that enhanced autophagy leads to apoptosis in cancer cells. In this study, we evaluated the autophagy- and apoptosis-inducing effects of Pimavanserin tartrate (PVT). Autophagic effects of PVT were determined by Acridine Orange assay and Transmission Electron Microscopy analysis. The clinical significance of ULK1 in normal individuals and pancreatic cancer patients was evaluated using the R2 and GEPIA cancer genomic databases. Modulation of proteins in autophagy signaling was assessed by Western blotting and Immunofluorescence. Apoptotic effects of PVT were evaluated by Annexin-V/APC assay. A subcutaneous xenograft pancreatic tumor model was used to evaluate the autophagy-mediated apoptotic effects of PVT in vivo. Autophagy was induced upon PVT treatment in pancreatic ductal adenocarcinoma (PDAC) cells. Pancreatic cancer patients exhibit reduced levels of the autophagy initiator gene ULK1, which correlated with reduced patient survival. Interestingly, PVT induced the expression of autophagy markers ULK1, FIP200, Atg101, Beclin-1, Atg5, LC3A/B, and cleavage of caspase-3, an indicator of apoptosis, in several PDAC cells. The ULK1 agonist LYN-1604 enhanced the autophagic and apoptotic effects of PVT. On the other hand, the autophagy inhibitors chloroquine and bafilomycin blocked the autophagic and apoptotic effects of PVT in PDAC cells. Notably, chloroquine abrogated the growth suppressive effects of PVT by 25% in BxPC3 tumor xenografts in nude mice. Collectively, our results indicate that PVT-mediated pancreatic tumor growth suppression was associated with induction of autophagy-mediated apoptosis.


2021 ◽  
Author(s):  
Michal Vasina ◽  
Pavel Vanacek ◽  
Jiri Hon ◽  
David Kovar ◽  
Hanka Faldynova ◽  
...  

Next-generation sequencing doubles genomic databases every 2.5 years. The accumulation of sequence data raises the need to speed up functional analysis. Herein, we present a pipeline integrating bioinformatics and microfluidics and its application for high-throughput mining of novel haloalkane dehalogenases. We employed bioinformatics to identify 2,905 putative dehalogenases and selected 45 representative enzymes, of which 24 were produced in soluble form. Droplet-based microfluidics accelerates subsequent experimental testing up to 20,000 reactions per day while achieving 1,000-fold lower protein consumption. This resulted in doubling the dehalogenation “toolbox” characterized over three decades, yielding biocatalysts surpassing the efficiency of currently available enzymes. Combining microfluidics with modern global data analysis provided precious mechanistic information related to the high catalytic efficiency of new variants. This pipeline applied to other enzyme families can accelerate the identification of biocatalysts for industrial applications as well as the collection of high-quality data for machine learning.
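As a rough worked illustration of the stated doubling time, the following assumes steady exponential growth, which is an assumption of this sketch (the abstract gives only the 2.5-year figure). The symbols N(t) and N_0 are introduced here for illustration.

```latex
% Assumption: database size N(t) grows exponentially with a doubling time of 2.5 years,
% where N_0 is the size at t = 0 and t is measured in years.
\begin{align*}
  N(t) &= N_0 \cdot 2^{t/2.5} \\
  \frac{N(t+1)}{N(t)} &= 2^{1/2.5} = 2^{0.4} \approx 1.32
    && \text{(about 32\% growth per year)} \\
  \frac{N(t+10)}{N(t)} &= 2^{10/2.5} = 2^{4} = 16
    && \text{(16-fold growth per decade)}
\end{align*}
```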


2021 ◽  
Author(s):  
Clement Agret ◽  
Bastien Cazaux ◽  
Antoine Limasset

Motivation: To keep up with the scale of genomic databases, several methods rely on locality-sensitive hashing to efficiently find potential matches within large genome collections. Existing solutions rely on MinHash or HyperLogLog fingerprints and require reading the whole index to perform a query. Such solutions cannot be considered scalable given the growing number of documents to index. Results: We present NIQKI, a novel structure using well-designed fingerprints that lead to theoretical and practical query time improvements, outperforming the state of the art by orders of magnitude. Our contribution is threefold. First, we generalize the concept of HyperMinHash fingerprints into (h,m)-HMH fingerprints that can be tuned to present the lowest false positive rate given the expected sub-sampling applied. Second, we provide a structure able to index any kind of fingerprint based on inverted indexes that provide optimal queries, namely linear in the size of the output. Third, we implemented these approaches in a tool dubbed NIQKI that can index and calculate pairwise distances for over one million bacterial genomes from GenBank in a matter of days on a small cluster. We show that our approach can be orders of magnitude faster than the state of the art with comparable precision. We believe that this approach can lead to tremendous improvement, allowing fast queries that scale to extensive genomic databases. Availability and implementation: We wrote the NIQKI index as an open-source C++ library under the AGPL3 license available at https://github.com/Malfoy/NIQKI. It is designed as a user-friendly tool and comes along with usage sample
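The general idea of querying fingerprints through an inverted index, rather than scanning every stored sketch, can be illustrated with a minimal Python sketch. It is not NIQKI's implementation and does not use (h,m)-HMH fingerprints: it assumes a plain bucketed (one-permutation) MinHash, and every name in it (`sketch`, `InvertedSketchIndex`, the parameters `k` and `buckets`) is hypothetical.

```python
import hashlib
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash64(kmer):
    """Deterministic 64-bit hash of a k-mer."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k=5, buckets=16):
    """Bucketed MinHash: keep the smallest fingerprint seen in each bucket."""
    mins = [None] * buckets
    for km in kmers(seq, k):
        h = hash64(km)
        b = h % buckets      # bucket is chosen from the hash itself
        v = h // buckets     # remaining bits serve as the fingerprint
        if mins[b] is None or v < mins[b]:
            mins[b] = v
    return mins


class InvertedSketchIndex:
    """Maps (bucket, fingerprint) to the genomes whose sketch holds that pair."""

    def __init__(self, k=5, buckets=16):
        self.k, self.buckets = k, buckets
        self.postings = defaultdict(list)

    def add(self, name, seq):
        for b, v in enumerate(sketch(seq, self.k, self.buckets)):
            if v is not None:
                self.postings[(b, v)].append(name)

    def query(self, seq):
        """Return {genome: number of buckets shared with the query sketch}."""
        hits = defaultdict(int)
        for b, v in enumerate(sketch(seq, self.k, self.buckets)):
            if v is not None:
                # Only posting lists that match the query are ever read.
                for name in self.postings.get((b, v), ()):
                    hits[name] += 1
        return dict(hits)
```

In this toy version a query touches one posting list per non-empty bucket, so its cost depends on the number of buckets and the number of matching entries, not on the total number of indexed genomes. The shared-bucket count divided by the number of buckets gives a rough similarity estimate.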


2021 ◽  
Vol 16 (11) ◽  
pp. 1934578X2110399
Author(s):  
Bing Liu ◽  
Hao Lian

Objectives: Caesalpinia sappan L. is a traditional Chinese medicine with a long history. Recent studies have confirmed that Sappan has an antitumor effect, but its specific mechanism is still unclear. Methods: In this study, we used network pharmacology to predict the targets and signaling pathways of Sappan. In addition, The Cancer Genome Atlas and Cancer Cell Line Encyclopedia large-scale genomic databases were used to analyze the relationship between different subtypes of Akt. Based on molecular docking technology, the interaction mode between small molecule compounds and protein targets was explored. Finally, we studied the effect of Sappan on Akt protein expression by Western blot in vitro. Results: AKT1 and AKT2 were significantly expressed in breast cancer cells, but they were significantly different from AKT3. Molecular docking analysis showed that (3R,5R)-1,3,4,5-tetrakis(((E)-3-(3,4-dihydroxyphenyl)acryloyl)oxy)cyclohexane-1-carboxylic acid had an ideal binding mode with Akt. Subsequent experiments showed that Sappan extract could induce apoptosis of HepG2 cells in a dose-dependent manner and downregulate the phosphorylation level of Akt protein at Thr308 in a dose-dependent manner. Conclusions: This study provides new ideas for Sappan's anticancer research through the strategy of system pharmacology.

