real gene
Recently Published Documents


TOTAL DOCUMENTS

17
(FIVE YEARS 3)

H-INDEX

6
(FIVE YEARS 0)

Entropy ◽  
2020 ◽  
Vol 22 (10) ◽  
pp. 1101
Author(s):  
Eran Agmon ◽  
Ryan K. Spangler

The degree to which we can understand the multi-scale organization of cellular life is tied to how well our models can represent this organization and the processes that drive its evolution. This paper uses Vivarium—an engine for composing heterogeneous computational biology models into integrated, multi-scale simulations. Vivarium’s approach is demonstrated by combining several sub-models of biophysical processes into a model of chemotactic E. coli that exchange molecules with their environment, express the genes required for chemotaxis, swim, grow, and divide. This model is developed incrementally, highlighting cross-compartment mechanisms that link E. coli to its environment, with models for: (1) metabolism and transport, with transport moving nutrients across the membrane boundary and metabolism converting them to useful metabolites, (2) transcription, translation, complexation, and degradation, with stochastic mechanisms that read real gene sequence data and consume base pairs and ATP to make proteins and complexes, and (3) the activity of flagella and chemoreceptors, which together support navigation in the environment.
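To make the idea of composing heterogeneous sub-models concrete, here is a minimal sketch in plain Python. It is not Vivarium's API. The `Process` protocol, the shared state dictionary, the two toy processes (`Transport`, `Metabolism`) and their rate constants are all hypothetical. Each process reads the same snapshot of the shared state and returns changes, and the composite applies every change once per time step.

```python
from typing import Dict, List, Protocol

State = Dict[str, float]


class Process(Protocol):
    def next_update(self, dt: float, state: State) -> State: ...


class Transport:
    """Hypothetical process: moves external glucose across the membrane."""

    def __init__(self, rate: float = 0.5) -> None:
        self.rate = rate

    def next_update(self, dt: float, state: State) -> State:
        # Uptake proportional to the external pool, never more than is available.
        flux = min(self.rate * state["glc_ext"] * dt, state["glc_ext"])
        return {"glc_ext": -flux, "glc_int": flux}


class Metabolism:
    """Hypothetical process: converts internal glucose into ATP."""

    def __init__(self, yield_atp: float = 2.0, rate: float = 1.0) -> None:
        self.yield_atp = yield_atp
        self.rate = rate

    def next_update(self, dt: float, state: State) -> State:
        used = min(self.rate * state["glc_int"] * dt, state["glc_int"])
        return {"glc_int": -used, "atp": self.yield_atp * used}


def run_composite(processes: List[Process], state: State, dt: float, steps: int) -> State:
    """Advance all processes in lockstep; each one sees the same state snapshot."""
    state = dict(state)
    for _ in range(steps):
        updates = [p.next_update(dt, state) for p in processes]
        for update in updates:
            for key, delta in update.items():
                state[key] += delta
    return state


final = run_composite(
    [Transport(), Metabolism()],
    {"glc_ext": 10.0, "glc_int": 0.0, "atp": 0.0},
    dt=0.1,
    steps=100,
)
```

Because every process is evaluated against the same snapshot, the result does not depend on the order of the process list, and in this toy example glucose plus ATP/2 is conserved across the two compartments.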


2020 ◽  
Vol 36 (20) ◽  
pp. 5054-5060
Author(s):  
Xiangyu Liu ◽  
Di Li ◽  
Juntao Liu ◽  
Zhengchang Su ◽  
Guojun Li

Motivation: Biclustering has emerged as a powerful approach to identifying functional patterns in complex biological data. However, existing tools are limited in the accuracy and efficiency with which they recognize the various kinds of complex biclusters submerged in ever-larger datasets. We introduce RecBic, a novel, fast and highly accurate algorithm for identifying various forms of complex biclusters in gene expression datasets.

Results: We designed RecBic to identify various trend-preserving biclusters, particularly those with narrow shapes, i.e. clusters in which the number of genes is larger than the number of conditions/samples. Given a gene expression matrix, RecBic starts with a column seed and grows it into a full-sized bicluster simply by repeatedly comparing real numbers. When tested on simulated datasets in which the elements of the implanted trend-preserving biclusters and those of the background matrix have the same distribution, RecBic identified the implanted biclusters almost perfectly, outperforming all the compared leading tools in accuracy and in robustness to noise and to overlaps between clusters. RecBic also showed superiority in identifying functionally related genes in real gene expression datasets.

Availability and implementation: Code, sample input data and usage instructions are available at the following websites. Code: https://github.com/holyzews/RecBic/tree/master/RecBic/. Data: http://doi.org/10.5281/zenodo.3842717.

Supplementary information: Supplementary data are available at Bioinformatics online.
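A trend-preserving bicluster is a set of genes whose values rise and fall together across a set of conditions. The sketch below illustrates the seed-and-grow idea using only comparisons of real numbers. It keeps the rows whose values increase along an ordered column seed, then greedily inserts the column (at the position) that preserves the increasing trend in the most rows. It is a simplified illustration, not the RecBic algorithm; the function names and the `min_rows` threshold are hypothetical.

```python
from typing import List, Tuple

import numpy as np


def rows_preserving_order(X: np.ndarray, cols: List[int]) -> np.ndarray:
    """Indices of rows whose values strictly increase along the column order `cols`."""
    sub = X[:, cols]
    return np.flatnonzero(np.all(np.diff(sub, axis=1) > 0, axis=1))


def grow_bicluster(X: np.ndarray, seed: List[int], min_rows: int = 3) -> Tuple[np.ndarray, List[int]]:
    """Greedily extend an ordered column seed while at least `min_rows` rows keep the trend."""
    cols = list(seed)
    rows = rows_preserving_order(X, cols)
    while True:
        best = None
        for j in range(X.shape[1]):
            if j in cols:
                continue
            # Try inserting column j at every position of the current ordering.
            for pos in range(len(cols) + 1):
                cand = cols[:pos] + [j] + cols[pos:]
                cand_rows = rows_preserving_order(X, cand)
                if len(cand_rows) >= min_rows and (best is None or len(cand_rows) > len(best[1])):
                    best = (cand, cand_rows)
        if best is None:
            return rows, cols
        cols, rows = best


# Rows 0-2 increase along the column order 2 < 0 < 3 < 1; rows 3-5 do not.
X = np.array([
    [2.0, 4.0, 1.0, 3.0],
    [5.0, 9.0, 0.5, 7.0],
    [1.2, 1.9, 1.0, 1.5],
    [4.0, 1.0, 3.0, 2.0],
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 3.0, 3.0, 3.0],
])
rows, cols = grow_bicluster(X, seed=[2, 0])
```

Trying every insertion position is quadratic in the number of columns per step; the sketch favours clarity over speed.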


10.29007/d87q ◽  
2020 ◽  
Author(s):  
San Ha Seo ◽  
Saeed Salem

A large amount of gene expression data has been collected under various environmental and biological conditions. Extracting co-expression subnetworks that recur across multiple co-expression networks has shown promise in functional gene annotation and biomarker discovery. However, frequent subgraph mining reports a large number of subnetworks. In this work, we propose to mine approximate dense frequent subgraphs. Our approach reports representative frequent subgraphs that are also dense. Our experiments on real gene co-expression networks show that frequent subgraphs are biologically interesting, as evidenced by the large percentage of frequent dense subgraphs that are biologically enriched.
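The two criteria combined here, frequency and density, can be made concrete for a single candidate gene set. In the sketch below, which illustrates the criteria rather than the authors' mining algorithm, each co-expression network is a set of undirected edges. The density of a gene set in one network is the fraction of its possible gene pairs that are edges, and its frequency is the fraction of networks in which that density reaches a threshold `min_density`. The function names and the threshold are assumptions.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

Edge = FrozenSet[str]


def make_network(pairs: Iterable[Tuple[str, str]]) -> Set[Edge]:
    """Undirected network as a set of unordered gene pairs."""
    return {frozenset(p) for p in pairs}


def density(genes: Set[str], edges: Set[Edge]) -> float:
    """Fraction of the possible gene pairs within `genes` that are edges."""
    pairs = [frozenset(p) for p in combinations(sorted(genes), 2)]
    if not pairs:
        return 0.0
    return sum(p in edges for p in pairs) / len(pairs)


def frequency(genes: Set[str], networks: List[Set[Edge]], min_density: float) -> float:
    """Fraction of networks in which `genes` induce a subgraph at least `min_density` dense."""
    hits = sum(density(genes, net) >= min_density for net in networks)
    return hits / len(networks)


net1 = make_network([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
net2 = make_network([("a", "b"), ("b", "c"), ("c", "d")])
net3 = make_network([("d", "e")])
candidate = {"a", "b", "c"}
freq = frequency(candidate, [net1, net2, net3], min_density=0.6)
```

Requiring only a minimum density, rather than a complete clique, is what makes the reported subgraphs "approximate": the gene set {a, b, c} counts as dense in `net2` even though one of its three edges is missing.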


2017 ◽  
Author(s):  
Craig Disselkoen ◽  
Nathan Hekman ◽  
Brian Gilbert ◽  
Sydney Benson ◽  
Matthew Anderson ◽  
...  

An important question in many biological applications is how to estimate or classify gene activity states (active or inactive) from genome-wide transcriptomics data. Recently, we proposed a Bayesian method, MultiMM, which outperformed existing methods on both simulated and real gene expression data, confirming well-known biological results and yielding better agreement with fluxomics data. Despite these promising results, MultiMM has several limitations. First, MultiMM leverages co-regulatory models to improve activity state estimates, but it incorporates co-regulation information in a way that assumes the networks are known with certainty. Second, MultiMM assumes that genes that change state in the dataset can be distinguished with certainty from those that remain in one state. Third, the model can be sensitive to extreme measurements (outliers) of gene expression. In this manuscript, we propose a modified Bayesian approach that addresses these three limitations by improving outlier handling and by explicitly modeling network and other uncertainty, yielding improved gene activity state estimates compared with MultiMM.
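MultiMM itself is a Bayesian model that also uses co-regulatory networks. The sketch below is a much simpler, non-Bayesian stand-in that shows only the core idea of estimating a gene's activity state from its expression values. It fits a two-component Gaussian mixture by expectation-maximization to one gene's (log-scale) measurements and reports each sample's posterior probability of belonging to the higher-mean, "active" component. The function names, the initialization at the extremes and the iteration count are assumptions.

```python
import math
from typing import List, Tuple


def normal_pdf(x: float, mu: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def active_probabilities(x: List[float], iters: int = 200) -> Tuple[List[float], float, float]:
    """EM for a 2-component Gaussian mixture.

    Returns P(active) per sample (from the final E-step) and the inactive/active means.
    """
    lo, hi = min(x), max(x)
    mu0, mu1 = lo, hi                        # inactive / active means start at the extremes
    sd0 = sd1 = max((hi - lo) / 4, 1e-3)
    pi1 = 0.5                                # mixing weight of the active component
    r: List[float] = []
    for _ in range(iters):
        # E-step: responsibility of the active component for each sample.
        r = []
        for xi in x:
            a = pi1 * normal_pdf(xi, mu1, sd1)
            b = (1 - pi1) * normal_pdf(xi, mu0, sd0)
            r.append(a / (a + b) if a + b > 0 else 0.5)
        # M-step: weighted means, standard deviations (floored) and mixing weight.
        n1 = max(sum(r), 1e-9)
        n0 = max(len(x) - sum(r), 1e-9)
        mu1 = sum(ri * xi for ri, xi in zip(r, x)) / n1
        mu0 = sum((1 - ri) * xi for ri, xi in zip(r, x)) / n0
        sd1 = max(math.sqrt(sum(ri * (xi - mu1) ** 2 for ri, xi in zip(r, x)) / n1), 1e-3)
        sd0 = max(math.sqrt(sum((1 - ri) * (xi - mu0) ** 2 for ri, xi in zip(r, x)) / n0), 1e-3)
        pi1 = n1 / len(x)
    return r, mu0, mu1


probs, mu_inactive, mu_active = active_probabilities([1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.8, 5.1])
```

A single-gene mixture like this treats every gene independently and trusts every measurement; the limitations listed above (network uncertainty, genes that never change state, outliers) are exactly what it ignores.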


2017 ◽  
Vol 18 (1) ◽  
Author(s):  
Shailesh Tripathi ◽  
Jason Lloyd-Price ◽  
Andre Ribeiro ◽  
Olli Yli-Harja ◽  
Matthias Dehmer ◽  
...  

2017 ◽  
Vol 2017 ◽  
pp. 1-18 ◽  
Author(s):  
Md. Shahjaman ◽  
Nishith Kumar ◽  
Md. Manir Hossain Mollah ◽  
Md. Shakil Ahmed ◽  
Anjuman Ara Begum ◽  
...  

Identifying differentially expressed (DE) genes across two or more conditions is an important step in discovering a small number of biomarker genes. Significance Analysis of Microarrays (SAM) is a popular statistical approach for identifying DE genes in both small- and large-sample cases. However, it is sensitive to outlying expression values and has low power in the presence of outliers. In this paper, we therefore robustify the SAM approach by using minimum β-divergence estimators instead of maximum likelihood estimators of the parameters. We compare the performance of the proposed method with other popular statistical methods, including ANOVA, SAM, LIMMA, KW, EBarrays, GaGa, and BRIDGE, using both simulated and real gene expression datasets. In the absence of outliers, all methods perform well and almost equally in large-sample cases, while in small-sample cases only three methods (SAM, LIMMA, and the proposed method) perform almost equally well and better than the others with two or more conditions. In the presence of outliers, however, only the proposed method performs better than the others on average, for both small- and large-sample cases under each condition.
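The robustness comes from replacing sample means and variances with minimum β-divergence estimates. For a normal model, these can be computed by a fixed-point iteration in which each observation is weighted by w_i = exp(-(β/2)·((x_i − μ)/σ)²), so outlying values receive weights near zero and β = 0 recovers the ordinary maximum-likelihood mean and variance. The sketch below computes such estimates and plugs them into a SAM-style statistic d = (μ₁ − μ₂)/(s + s₀) for one gene. It is an illustration under assumptions, not the paper's implementation: the exact update form, the value of β, the median/MAD starting point, the fudge constant s₀ and the function names are all hypothetical choices.

```python
import math
import statistics
from typing import List, Tuple


def beta_estimates(x: List[float], beta: float = 0.2, iters: int = 100) -> Tuple[float, float]:
    """Minimum beta-divergence style estimates of a normal mean and SD (fixed-point iteration)."""
    mu = statistics.median(x)
    mad = statistics.median([abs(v - mu) for v in x])
    sigma = max(1.4826 * mad, 1e-8)          # robust starting scale
    for _ in range(iters):
        # Gaussian-shaped weights: observations far from mu (in SD units) get weight near 0.
        w = [math.exp(-0.5 * beta * ((v - mu) / sigma) ** 2) for v in x]
        sw = sum(w)
        mu = sum(wi * v for wi, v in zip(w, x)) / sw
        # The (1 + beta) factor keeps the variance estimate consistent at the normal model.
        var = (1 + beta) * sum(wi * (v - mu) ** 2 for wi, v in zip(w, x)) / sw
        sigma = max(math.sqrt(var), 1e-8)
    return mu, sigma


def robust_sam_d(group1: List[float], group2: List[float], beta: float = 0.2, s0: float = 0.1) -> float:
    """SAM-style d statistic built from the robust location and scale estimates."""
    m1, sd1 = beta_estimates(group1, beta)
    m2, sd2 = beta_estimates(group2, beta)
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    s = math.sqrt((1 / n1 + 1 / n2) * pooled)
    return (m1 - m2) / (s + s0)


mu_robust, sd_robust = beta_estimates([1.0, 1.1, 0.9, 1.0, 50.0])  # 50.0 is an outlier
d = robust_sam_d([5.0, 5.1, 4.9, 5.0], [1.0, 1.1, 0.9, 60.0])
```

With the outlier present, the plain mean of the first sample is about 10.8, whereas the weighted estimate stays close to 1.0 because the value 50.0 receives a weight of essentially zero.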


Scientifica ◽  
2016 ◽  
Vol 2016 ◽  
pp. 1-8 ◽  
Author(s):  
Hamid Alavi Majd ◽  
Soodeh Shahsavari ◽  
Ahmad Reza Baghestani ◽  
Seyyed Mohammad Tabatabaei ◽  
Naghme Khadem Bashi ◽  
...  

Background. Several biclustering algorithms have been proposed for the analysis of high-dimensional gene expression data. Among them, the plaid model is arguably one of the most flexible biclustering models to date.

Objective. The main goal of this study is to evaluate plaid models. To that end, we investigate the model on both simulated data and real gene expression datasets.

Methods. Two simulated matrices with different degrees of overlap and noise are generated, and the intrinsic structure of these data is compared with the biclustering results. We also searched the discovered biclusters for biologically significant ones using GO analysis.

Results. When there is no noise, the algorithm discovers almost all of the biclusters. With moderate noise in the dataset, it does not perform well at finding overlapping biclusters, and with large noise the biclustering results are not reliable.

Conclusion. The plaid model needs to be modified, because with moderate or large noise in the data it cannot find good biclusters. It is a statistical model and a quite flexible one. In summary, to reduce the errors, the model can be adjusted and the error distribution changed.
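In the plaid model, each entry of the expression matrix is a background level plus the sum of the bicluster "layers" that cover it: y_ij = μ_0 + Σ_k (μ_k + α_ik + β_jk)·ρ_ik·κ_jk + ε_ij, where ρ_ik and κ_jk are 0/1 indicators of row i and column j belonging to layer k. The sketch below simulates such a matrix with overlapping layers and optional Gaussian noise, in the spirit of the simulated matrices described above. The matrix size, layer means and noise level are arbitrary assumptions, and the row and column effects α_ik and β_jk are set to zero for brevity.

```python
from typing import List, Sequence, Tuple

import numpy as np

Layer = Tuple[Sequence[int], Sequence[int], float]   # (rows, cols, layer mean mu_k)


def simulate_plaid(n_rows: int, n_cols: int, layers: List[Layer],
                   mu0: float = 0.0, noise_sd: float = 0.0, seed: int = 0) -> np.ndarray:
    """Simulate a plaid-model matrix; alpha_ik and beta_jk are taken as zero."""
    rng = np.random.default_rng(seed)
    y = np.full((n_rows, n_cols), mu0, dtype=float)
    for rows, cols, mu_k in layers:
        # Entries covered by several layers receive the sum of their effects.
        y[np.ix_(rows, cols)] += mu_k
    if noise_sd > 0:
        y += rng.normal(0.0, noise_sd, size=y.shape)
    return y


layers = [([0, 1, 2], [0, 1, 2], 2.0), ([2, 3], [2, 3, 4], 3.0)]   # overlap at entry (2, 2)
clean = simulate_plaid(6, 5, layers, mu0=0.5)
noisy = simulate_plaid(6, 5, layers, mu0=0.5, noise_sd=1.0)
```

Increasing `noise_sd` relative to the layer means reproduces the kind of setting in which, according to the results above, overlapping biclusters become hard to recover.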


2013 ◽  
Vol 2013 ◽  
pp. 1-7 ◽  
Author(s):  
Alexey Anatolievich Morozov ◽  
Yuri Pavlovich Galachyants ◽  
Yelena Valentinovna Likhoshway

Existing algorithms allow us to infer phylogenetic networks from sequences (DNA, protein or binary), sets of trees, and distance matrices, but there are no methods for building them from gene order data. Here we describe several methods for building split networks from gene order data, perform simulation studies, and use our methods to analyze and interpret several real gene order datasets. All proposed methods are based on intermediate data, which can be generated from the genome structures under study and used as input for network construction algorithms. Three intermediates are used: a set of jackknife trees, a distance matrix, and a binary encoding. According to simulations and case studies, the best intermediates are jackknife trees and the distance matrix (when used with the Neighbor-Net algorithm). The binary encoding can also be useful, but only when the methods mentioned above cannot be used.
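Two of the intermediates mentioned, a binary encoding and a distance matrix, can both be derived from gene orders through gene adjacencies. The sketch below uses that route purely for illustration; the paper's exact encodings may differ. Each genome is encoded as the presence or absence of every unordered neighbouring gene pair observed in any genome (gene orientation is ignored), and the distance between two genomes is the Hamming distance between their encodings, i.e. the number of adjacencies found in one genome but not the other. The function names are hypothetical; a distance matrix like this could then be passed to a network method such as Neighbor-Net.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def adjacencies(order: List[str]) -> Set[FrozenSet[str]]:
    """Unordered pairs of neighbouring genes in a linear gene order."""
    return {frozenset(p) for p in zip(order, order[1:])}


def binary_encoding(genomes: Dict[str, List[str]]) -> Tuple[List[FrozenSet[str]], Dict[str, List[int]]]:
    """Presence/absence vectors over all adjacencies observed in any genome."""
    adj = {name: adjacencies(order) for name, order in genomes.items()}
    features = sorted({a for s in adj.values() for a in s}, key=lambda a: sorted(a))
    matrix = {name: [int(f in s) for f in features] for name, s in adj.items()}
    return features, matrix


def distance_matrix(genomes: Dict[str, List[str]]) -> Dict[Tuple[str, str], int]:
    """Hamming distance between binary encodings (size of the adjacency symmetric difference)."""
    _, matrix = binary_encoding(genomes)
    names = sorted(matrix)
    return {(a, b): sum(x != y for x, y in zip(matrix[a], matrix[b])) for a in names for b in names}


genomes = {
    "A": ["g1", "g2", "g3", "g4"],
    "B": ["g1", "g2", "g3", "g4"],
    "C": ["g1", "g3", "g2", "g4"],   # g2 and g3 swapped relative to A
}
features, matrix = binary_encoding(genomes)
dist = distance_matrix(genomes)
```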


2013 ◽  
Vol 2013 ◽  
pp. 1-5 ◽  
Author(s):  
Xiao-Ying Liu ◽  
Yong Liang ◽  
Zong-Ben Xu ◽  
Hai Zhang ◽  
Kwong-Sak Leung

A new adaptive L1/2 shooting regularization method for variable selection based on Cox's proportional hazards model is proposed. This adaptive L1/2 shooting algorithm can be easily obtained by optimizing a reweighted iterative series of L1 penalties combined with a shooting strategy for the L1/2 penalty. Simulation results on high-dimensional artificial data show that the adaptive L1/2 shooting regularization method can be more accurate for variable selection than the Lasso and adaptive Lasso methods. Results on a real gene expression dataset (DLBCL) also indicate that the L1/2 regularization method performs competitively.
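The phrase "a reweighted iterative series of L1 penalties" can be illustrated as follows. Because the L1/2 penalty |b_j|^(1/2) is concave, each outer step linearizes it at the current estimate, which gives a weighted L1 (lasso) problem with weights w_j = 1/(2·sqrt(|b_j|)). Each weighted lasso is then solved by the shooting algorithm, i.e. cyclic coordinate descent with soft-thresholding. To keep the sketch short it uses a squared-error loss instead of the Cox partial likelihood, so it is a stand-in for the idea rather than the paper's method. The penalty level `lam`, the constant `eps` and the iteration counts are assumptions.

```python
import numpy as np


def soft_threshold(z: float, t: float) -> float:
    return float(np.sign(z) * max(abs(z) - t, 0.0))


def shooting_lasso(X, y, lam, weights, b=None, sweeps=500, tol=1e-10):
    """Weighted lasso by cyclic coordinate descent ("shooting").

    Minimizes (1/2n)||y - Xb||^2 + lam * sum_j weights[j] * |b_j|.
    """
    n, p = X.shape
    b = np.zeros(p) if b is None else b.copy()
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ b                                   # current residual
    for _ in range(sweeps):
        max_change = 0.0
        for j in range(p):
            old = b[j]
            # Correlation of column j with the partial residual that excludes b_j.
            z = X[:, j] @ r / n + col_sq[j] * old
            b[j] = soft_threshold(z, lam * weights[j]) / col_sq[j]
            if b[j] != old:
                r -= X[:, j] * (b[j] - old)
                max_change = max(max_change, abs(b[j] - old))
        if max_change < tol:
            break
    return b


def reweighted_l_half(X, y, lam=0.1, outer=10, eps=1e-6):
    """Approximate L1/2-penalized least squares by a sequence of weighted lassos."""
    p = X.shape[1]
    weights = np.ones(p)                            # the first pass is an ordinary lasso
    b = np.zeros(p)
    for _ in range(outer):
        b = shooting_lasso(X, y, lam, weights, b)
        # Slope of |b_j|^(1/2) at the current estimate; eps avoids division by zero,
        # so coefficients at zero get a huge weight and stay at zero.
        weights = 1.0 / (2.0 * np.sqrt(np.abs(b)) + eps)
    return b
```

Usage: with a design matrix `X` (cases by predictors) and response `y`, `reweighted_l_half(X, y)` returns a coefficient vector whose zero entries mark the predictors that were not selected. Compared with a single lasso, the reweighting shrinks large coefficients less and small ones more, which is the behaviour an L1/2 penalty is meant to approximate.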


2010 ◽  
Vol 9 ◽  
pp. CIN.S3805 ◽  
Author(s):  
Yingdong Zhao ◽  
Richard Simon

There have been relatively few publications using linear regression models to predict a continuous response from microarray expression profiles. Standard linear regression methods are problematic when the number of predictor variables exceeds the number of cases. We evaluated three linear regression algorithms that can be used to predict a continuous response from high-dimensional gene expression data: least angle regression (LAR), the least absolute shrinkage and selection operator (LASSO), and the averaged linear regression method (ALM). All methods are tested using simulations based on a real gene expression dataset and analyses of two real gene expression datasets, with an unbiased complete cross-validation approach. Our results show that the LASSO algorithm often provides a model with somewhat lower prediction error than the LAR method, but both perform more efficiently than the ALM predictor. We have developed a plug-in for BRB-ArrayTools that implements the LAR and LASSO algorithms with complete cross-validation.
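Why standard linear regression is problematic when predictors outnumber cases can be seen directly. With p > n, least squares can fit any training response exactly, so training error says nothing about predictive accuracy. The sketch below uses random data (an assumption made for illustration, with a response unrelated to the predictors) and NumPy's minimum-norm least-squares solver; penalized methods such as LASSO or LAR, assessed by cross-validation, are the usual remedy.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 200                       # far more predictors (genes) than cases
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)           # response unrelated to X by construction

# Minimum-norm least-squares solution; with p > n it interpolates the training data.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
train_rmse = float(np.sqrt(np.mean((X @ coef - y) ** 2)))

# New cases drawn the same way expose the absence of any real predictive signal.
X_new = rng.standard_normal((n, p))
y_new = rng.standard_normal(n)
test_rmse = float(np.sqrt(np.mean((X_new @ coef - y_new) ** 2)))
```

The training error is essentially zero while the error on new cases is no better than predicting with no model at all, which is why an honest (for example, complete) cross-validation estimate is needed in this setting.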

