Comparison of Methods for Meta-dimensional Data Analysis Using in Silico and Biological Data Sets

Author(s):  
Emily R. Holzinger ◽  
Scott M. Dudek ◽  
Alex T. Frase ◽  
Brooke Fridley ◽  
Prabhakar Chalise ◽  
...  
2020 ◽  
Vol 27 (38) ◽  
pp. 6523-6535 ◽  
Author(s):  
Antreas Afantitis ◽  
Andreas Tsoumanis ◽  
Georgia Melagraki

Drug discovery and (nano)material design projects demand the in silico analysis of large datasets of compounds with their corresponding properties/activities, as well as the retrieval and virtual screening of further structures in an effort to identify new potent hits. This is a demanding procedure that requires combining various tools with different input and output formats. To automate the required data analysis, we have developed tools that facilitate a variety of important tasks and allow the construction of workflows that simplify the handling, processing and modeling of cheminformatics data, providing time- and cost-efficient solutions that are reproducible and easier to maintain. We present a toolbox of >25 processing modules, Enalos+ nodes, that provide useful operations within the KNIME platform for users interested in the nanoinformatics and cheminformatics analysis of chemical and biological data. With a user-friendly interface, Enalos+ nodes provide a broad range of functionalities, including data mining and retrieval from large available databases and tools for robust and predictive model development and validation. Enalos+ nodes are available through KNIME as add-ins and offer tools for extracting useful information and analyzing experimental and virtual screening results in a chem- or nanoinformatics framework. In addition, in an effort to (i) allow big data analysis through Enalos+ KNIME nodes, (ii) accelerate time-demanding computations performed within Enalos+ KNIME nodes and (iii) propose new time- and cost-efficient nodes integrated within the Enalos+ toolbox, we have investigated and verified the advantage of GPU calculations within the Enalos+ nodes. Demonstration data sets, tutorials and educational videos help the user understand the functions of the nodes that can be applied to the in silico analysis of data.


2018 ◽  
Vol 20 (4) ◽  
pp. 1513-1523 ◽  
Author(s):  
António Cruz ◽  
Joel P Arrais ◽  
Penousal Machado

Abstract. The field of computational biology has become largely dependent on data visualization tools to analyze the increasing quantities of data gathered through the use of new and growing technologies. Aside from the volume, which often results in large amounts of noise and complex relationships with no clear structure, the visualization of biological data sets is hindered by their heterogeneity, as data are obtained from different sources and contain a wide variety of attributes, including spatial and temporal information. This requires visualization approaches that are able to not only represent various data structures simultaneously but also provide exploratory methods that allow the identification of meaningful relationships that would not be perceptible through data analysis algorithms alone. In this article, we present a survey of visualization approaches applied to the analysis of biological data. We focus on graph-based visualizations and tools that use coordinated multiple views to represent high-dimensional multivariate data, in particular time series gene expression, protein–protein interaction networks and biological pathways. We then discuss how these methods can be used to help solve the current challenges surrounding the visualization of complex biological data sets.


2005 ◽  
Vol 01 (02) ◽  
pp. 305-327
Author(s):  
GIORGIO TERRACINA

In recent years, the completion of the human genome sequencing has raised a wide range of challenging new issues involving raw data analysis. In particular, the discovery of information implicitly encoded in biological sequences is assuming a prominent role in identifying genetic diseases and in deciphering biological mechanisms. This information is usually represented by patterns that occur frequently in the sequences. Because of biological observations, a specific class of patterns is becoming particularly interesting: frequent structured patterns. In this respect, it is biologically meaningful to look at both "exact" and "approximate" repetitions of patterns within the available sequences. This paper contributes to this setting by providing algorithms that allow the discovery of frequent structured patterns, in both "exact" and "approximate" form, present in a collection of input biological sequences.


2010 ◽  
Vol 41 (01) ◽  
Author(s):  
HP Müller ◽  
A Unrath ◽  
A Riecker ◽  
AC Ludolph ◽  
J Kassubek

2019 ◽  
Author(s):  
Filip Fratev ◽  
Denisse A. Gutierrez ◽  
Renato J. Aguilera ◽  
Suman Sirimulla

AKT1 is emerging as a useful target for treating cancer. Herein, we discovered a new set of ligands that inhibit AKT1, as shown by in vitro binding and cell line studies, using a newly designed virtual screening protocol that combines structure-based pharmacophore and docking screens. Taken together with the biological data, the combination of structure-based pharmacophore and docking methods demonstrated a reasonable success rate (60-70%) in identifying new inhibitors, supporting the aforementioned approach. A detailed analysis of the ligand-protein interactions was performed to explain the observed activities.


2020 ◽  
Vol 21 (S18) ◽  
Author(s):  
Sudipta Acharya ◽  
Laizhong Cui ◽  
Yi Pan

Abstract. Background: In recent years, the use of multiple genomic and proteomic sources to investigate challenging bioinformatics problems has become immensely popular among researchers. One such problem is feature or gene selection: identifying relevant and non-redundant marker genes from high-dimensional gene expression data sets. In that context, designing an efficient feature selection algorithm that exploits knowledge from multiple potential biological resources may be an effective way to understand the spectrum of cancer or other diseases, with applications in specific epidemiology for a particular population. Results: In the current article, we formulate feature selection and marker gene detection as a multi-view multi-objective clustering problem. To that end, we propose an Unsupervised Multi-View Multi-Objective clustering-based gene selection approach called UMVMO-select. Three important resources of biological data (gene ontology, protein interaction data, protein sequence) along with gene expression values are collectively utilized to design two different views. UMVMO-select aims to reduce the gene space with no or minimal compromise in sample classification efficiency, and determines relevant and non-redundant gene markers from three cancer gene expression benchmark data sets. Conclusion: A thorough comparative analysis has been performed against five clustering and nine existing feature selection methods with respect to several internal and external validity metrics. The results obtained show the superiority of the proposed method. Reported results are also validated through a biological significance test and heatmap plotting.


2014 ◽  
Vol 11 (2) ◽  
pp. 68-79
Author(s):  
Matthias Klapperstück ◽  
Falk Schreiber

Summary. The visualization of biological data has gained increasing importance in recent years. A large number of methods and software tools are available that visualize biological data, including the combination of measured experimental data and biological networks. As networks grow in size, their handling and exploration become a challenging task for the user. In addition, scientists are interested not just in investigating a single kind of network, but in the combination of different types of networks, such as metabolic, gene regulatory and protein interaction networks. Therefore, fast access, abstract and dynamic views, and intuitive exploratory methods should be provided to search and extract information from the networks. This paper introduces a conceptual framework for handling and combining multiple network sources that enables abstract viewing and exploration of large data sets including additional experimental data. It introduces a three-tier structure that links network data to multiple network views, discusses a proof-of-concept implementation, and shows a specific visualization method for combining metabolic and gene regulatory networks in an example.


Author(s):  
Mária Ždímalová ◽  
Tomáš Bohumel ◽  
Katarína Plachá-Gregorovská ◽  
Peter Weismann ◽  
Hisham El Falougy
