Comparison of Methods for Meta-dimensional Data Analysis Using in Silico and Biological Data Sets

Drug discovery and (nano)material design projects demand the in silico analysis of large datasets of compounds with their corresponding properties/activities, as well as the retrieval and virtual screening of further structures in an effort to identify new potent hits. This is a demanding procedure in which various tools with different input and output formats must be combined. To automate the required data analysis, we have developed tools that facilitate a variety of important tasks and allow the construction of workflows that simplify the handling, processing and modeling of cheminformatics data and provide time- and cost-efficient solutions that are reproducible and easier to maintain. We therefore present a toolbox of >25 processing modules, Enalos+ nodes, that provide useful operations within the KNIME platform for users interested in the nanoinformatics and cheminformatics analysis of chemical and biological data. With a user-friendly interface, Enalos+ Nodes provide a broad range of functionalities, including data mining and retrieval from large available databases and tools for robust and predictive model development and validation. Enalos+ Nodes are available through KNIME as add-ins and offer tools for extracting useful information and analyzing experimental and virtual screening results in a chem- or nanoinformatics framework. In addition, in an effort to (i) allow big data analysis through Enalos+ KNIME nodes, (ii) accelerate time-demanding computations performed within Enalos+ KNIME nodes and (iii) propose new time- and cost-efficient nodes integrated within the Enalos+ toolbox, we have investigated and verified the advantage of GPU calculations within the Enalos+ nodes. Demonstration data sets, tutorials and educational videos allow the user to easily understand the functions of the nodes, which can be applied to the in silico analysis of data.

Interactive and coordinated visualization approaches for biological data analysis

Briefings in Bioinformatics ◽

10.1093/bib/bby019 ◽

2018 ◽

Vol 20 (4) ◽

pp. 1513-1523 ◽

Cited By ~ 4

Author(s):

António Cruz ◽

Joel P Arrais ◽

Penousal Machado

Keyword(s):

Data Analysis ◽

Biological Data ◽

Data Sets ◽

Protein Protein Interaction ◽

Biological Data Analysis ◽

Time Series Gene Expression ◽

Protein Protein Interaction Networks ◽

Complex Relationships ◽

Meaningful Relationships ◽

Different Sources

Abstract The field of computational biology has become largely dependent on data visualization tools to analyze the increasing quantities of data gathered through the use of new and growing technologies. Aside from the volume, which often results in large amounts of noise and complex relationships with no clear structure, the visualization of biological data sets is hindered by their heterogeneity, as data are obtained from different sources and contain a wide variety of attributes, including spatial and temporal information. This requires visualization approaches that are able to not only represent various data structures simultaneously but also provide exploratory methods that allow the identification of meaningful relationships that would not be perceptible through data analysis algorithms alone. In this article, we present a survey of visualization approaches applied to the analysis of biological data. We focus on graph-based visualizations and tools that use coordinated multiple views to represent high-dimensional multivariate data, in particular time series gene expression, protein–protein interaction networks and biological pathways. We then discuss how these methods can be used to help solve the current challenges surrounding the visualization of complex biological data sets.

A FAST TECHNIQUE FOR DERIVING FREQUENT STRUCTURED PATTERNS FROM BIOLOGICAL DATA SETS

New Mathematics and Natural Computation ◽

10.1142/s1793005705000111 ◽

2005 ◽

Vol 01 (02) ◽

pp. 305-327

Author(s):

GIORGIO TERRACINA

Keyword(s):

Data Analysis ◽

Genetic Diseases ◽

Biological Data ◽

Data Sets ◽

Specific Class ◽

Biological Sequences ◽

Biological Mechanisms ◽

Approximate Form ◽

Wide Range ◽

Human Genome Sequencing

In recent years, the completion of the human genome sequencing has raised a wide range of challenging new issues involving raw data analysis. In particular, the discovery of information implicitly encoded in biological sequences is assuming a prominent role in identifying genetic diseases and in deciphering biological mechanisms. This information is usually represented by patterns that occur frequently in the sequences. Because of biological observations, a specific class of patterns is becoming particularly interesting: frequent structured patterns. In this respect, it is biologically meaningful to look at both "exact" and "approximate" repetitions of patterns within the available sequences. This paper contributes to this setting by providing algorithms for discovering frequent structured patterns, in both "exact" and "approximate" form, present in a collection of input biological sequences.

DTI data analysis: application of fiber tracking to group averaged data sets

Klinische Neurophysiologie ◽

10.1055/s-0030-1250957 ◽

2010 ◽

Vol 41 (01) ◽

Author(s):

HP Müller ◽

A Unrath ◽

A Riecker ◽

AC Ludolph ◽

J Kassubek

Keyword(s):

Data Analysis ◽

Fiber Tracking ◽

Data Sets ◽

Analysis Application

Discovery of New AKT1 Inhibitors by Combination of In silico Structure Based Virtual Screening Approaches and Biological Evaluations

10.26434/chemrxiv.7591202 ◽

2019 ◽

Author(s):

Filip Fratev ◽

Denisse A. Gutierrez ◽

Renato J. Aguilera ◽

Suman Sirimulla

Keyword(s):

Virtual Screening ◽

Success Rate ◽

Protein Interactions ◽

In Silico ◽

Biological Data ◽

In Vitro Binding ◽

Vitro Binding ◽

Screening Protocol ◽

Screening Approaches

AKT1 is emerging as a useful target for treating cancer. Herein, we discovered a new set of ligands that inhibit AKT1, as shown by in vitro binding and cell line studies, using a newly designed virtual screening protocol that combines structure-based pharmacophore and docking screens. Taken together with the biological data, the combination of structure-based pharmacophore and docking methods demonstrated a reasonable success rate in identifying new inhibitors (60-70%), demonstrating the success of the aforementioned approach. A detailed analysis of the ligand-protein interactions was performed to explain the observed activities.

In Silico ADME Prediction: Data Sets and Models

Current Computer-Aided Drug Design ◽

10.2174/157340905774330318 ◽

2005 ◽

Vol 1 (4) ◽

pp. 365-376 ◽

Cited By ~ 6

Author(s):

Gonzalo Colmenarejo

Keyword(s):

In Silico ◽

Data Sets ◽

Adme Prediction ◽

In Silico Adme Prediction

Multi-view feature selection for identifying gene markers: a diversified biological data driven approach

BMC Bioinformatics ◽

10.1186/s12859-020-03810-0 ◽

2020 ◽

Vol 21 (S18) ◽

Author(s):

Sudipta Acharya ◽

Laizhong Cui ◽

Yi Pan

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Selection ◽

Marker Gene ◽

Biological Data ◽

Protein Interaction Data ◽

Marker Genes ◽

Data Sets ◽

Gene Markers ◽

Multi Objective

Abstract Background: In recent years, the use of multiple genomic and proteomic sources to investigate challenging bioinformatics problems has become immensely popular among researchers. One such problem is feature or gene selection: identifying relevant and non-redundant marker genes from high-dimensional gene expression data sets. In that context, designing an efficient feature selection algorithm that exploits knowledge from multiple potential biological resources may be an effective way to understand the spectrum of cancer or other diseases, with applications in specific epidemiology for a particular population. Results: In the current article, we formulate feature selection and marker gene detection as a multi-view multi-objective clustering problem. To that end, we propose an Unsupervised Multi-View Multi-Objective clustering-based gene selection approach called UMVMO-select. Three important resources of biological data (gene ontology, protein interaction data, protein sequence) along with gene expression values are collectively utilized to design two different views. UMVMO-select aims to reduce the gene space with no or minimal compromise of the sample classification efficiency and determines relevant and non-redundant gene markers from three cancer gene expression benchmark data sets. Conclusion: A thorough comparative analysis has been performed with five clustering and nine existing feature selection methods with respect to several internal and external validity metrics. The obtained results show the superiority of the proposed method. The reported results are also validated through a proper biological significance test and heatmap plotting.

Integrating Biological Data Sources and Data Analysis Tools through Mediators (available online only)

Proceedings of the 2004 ACM symposium on Applied computing - SAC '04 ◽

10.1145/967900.980091 ◽

2004 ◽

Cited By ~ 4

Author(s):

J. F. Aldana ◽

M. Roldán ◽

I. Navas ◽

A. J. Pérez ◽

O. Trelles

Keyword(s):

Data Analysis ◽

Biological Data ◽

Data Sources ◽

Analysis Tools

BioNetLink - An Architecture for Working with Network Data

Journal of Integrative Bioinformatics ◽

10.1515/jib-2014-241 ◽

2014 ◽

Vol 11 (2) ◽

pp. 68-79

Author(s):

Matthias Klapperstück ◽

Falk Schreiber

Keyword(s):

Experimental Data ◽

Biological Networks ◽

Regulatory Networks ◽

Large Data ◽

Biological Data ◽

Network Data ◽

Data Sets ◽

Dynamic Views ◽

Gene Regulatory ◽

Multiple Network

Summary The visualization of biological data has gained increasing importance in recent years. A large number of methods and software tools are available that visualize biological data, including the combination of measured experimental data and biological networks. With the growing size of networks, their handling and exploration becomes a challenging task for the user. In addition, scientists are interested not just in investigating a single kind of network, but in the combination of different types of networks, such as metabolic, gene regulatory and protein interaction networks. Therefore, fast access, abstract and dynamic views, and intuitive exploratory methods should be provided to search and extract information from the networks. This paper will introduce a conceptual framework for handling and combining multiple network sources that enables abstract viewing and exploration of large data sets including additional experimental data. It will introduce a three-tier structure that links network data to multiple network views, discuss a proof of concept implementation, and show a specific visualization method for combining metabolic and gene regulatory networks in an example.

Graph Cutting in Image Processing Handling with Biological Data Analysis

Advances in Intelligent Systems and Computing - Information Technology, Systems Research, and Computational Physics ◽

10.1007/978-3-030-18058-4_16 ◽

2019 ◽

pp. 203-216

Author(s):

Mária Ždímalová ◽

Tomáš Bohumel ◽

Katarína Plachá-Gregorovská ◽

Peter Weismann ◽

Hisham El Falougy

Keyword(s):

Image Processing ◽

Data Analysis ◽

Biological Data ◽

Biological Data Analysis
