expression matrix

PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e12570
Author(s):  
Yunqing Liu ◽  
Na Lu ◽  
Changwei Bi ◽  
Tingyu Han ◽  
Guo Zhuojun ◽  
...  

Background. One goal of expression data analysis is to discover the biological significance or function of genes that are differentially expressed. Gene Set Enrichment (GSE) analysis is one of the main tools for function mining and has been widely used. However, for single-cell RNA sequencing (scRNA-seq) data, every gene expressed in a cell is valuable information for GSE and should not be discarded. Methods. We developed the functional expression matrix (FEM) algorithm to utilize the information from all expressed genes. The algorithm converts the gene expression matrix (GEM) into a FEM. The FEM algorithm can provide insight into the biological significance of a single cell. It can also be integrated with the GEM for downstream analysis. Results. We found that FEM performed well in cell clustering and cell-type-specific function annotation on three datasets (peripheral blood mononuclear cells, human liver, and human pancreas).
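The abstract does not spell out how the GEM is converted into a FEM. As a rough illustration of the general idea only, the sketch below scores each gene set per cell by averaging the expression of its member genes. The function name, the mean-based scoring rule, and the toy gene sets are all assumptions for illustration, not the published FEM algorithm.

```python
import numpy as np

def functional_expression_matrix(gem, gene_names, gene_sets):
    """Turn a cells x genes matrix into a cells x gene-sets matrix.

    Assumption: each gene set is scored by the mean expression of its
    member genes in that cell. This is a stand-in scoring rule, not the
    published FEM method.
    """
    index = {gene: i for i, gene in enumerate(gene_names)}
    set_names = sorted(gene_sets)
    fem = np.zeros((gem.shape[0], len(set_names)))
    for j, name in enumerate(set_names):
        # Keep only the genes that are actually present in the matrix.
        cols = [index[g] for g in gene_sets[name] if g in index]
        if cols:
            fem[:, j] = gem[:, cols].mean(axis=1)
    return fem, set_names

# Hypothetical toy data: 2 cells, 3 genes, 2 gene sets.
gem = np.array([[1.0, 3.0, 0.0],
                [0.0, 2.0, 4.0]])
genes = ["A", "B", "C"]
sets = {"set1": ["A", "B"], "set2": ["C", "Z"]}  # "Z" is not measured
fem, names = functional_expression_matrix(gem, genes, sets)
print(names)
print(fem)
```

Because the output keeps one row per cell, it can be concatenated with the original matrix or passed to the same clustering code that would otherwise consume the GEM.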


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e12233
Author(s):  
Diem-Trang Tran ◽  
Matthew Might

Normalization of RNA-seq data has been an active area of research since the problem was first recognized a decade ago. Despite the active development of new normalizers, their performance measures have been given little attention. To evaluate normalizers, researchers have been relying on ad hoc measures, most of which are either qualitative, potentially biased, or easily confounded by parametric choices of downstream analysis. We propose a metric called condition-number based deviation, or cdev, to quantify normalization success. cdev measures how much an expression matrix differs from another. If a ground truth normalization is given, cdev can then be used to evaluate the performance of normalizers. To establish experimental ground truth, we compiled an extensive set of public RNA-seq assays with external spike-ins. This data collection, together with cdev, provides a valuable toolset for benchmarking new and existing normalization methods.
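The abstract names cdev but does not give its formula. One plausible reading, sketched below, is to find the linear map between the per-sample columns of two expression matrices and report that map's condition number: a uniform rescaling gives 1, while sample-specific distortions give larger values. The function name, the least-squares formulation, and the direction of the map are assumptions, so the paper should be consulted for the actual definition.

```python
import numpy as np

def cdev_sketch(x, y):
    """Condition number of the least-squares map A with x @ A ~= y.

    x, y: genes x samples matrices over the same genes and samples.
    Assumption: this is an illustrative reading of "condition-number
    based deviation", not the definition from the paper.
    """
    a, *_ = np.linalg.lstsq(x, y, rcond=None)
    return float(np.linalg.cond(a))

# Hypothetical data: 200 genes, 4 samples.
rng = np.random.default_rng(0)
truth = rng.gamma(2.0, 50.0, size=(200, 4))

uniform = 3.0 * truth                              # same factor for every sample
skewed = truth * np.array([1.0, 2.0, 4.0, 8.0])    # per-sample distortion

print(cdev_sketch(truth, uniform))  # close to 1
print(cdev_sketch(truth, skewed))   # close to 8
```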


2021 ◽  
Vol 12 ◽  
Author(s):  
Jing Li ◽  
Urminder Singh ◽  
Zebulun Arendsee ◽  
Eve Syrkin Wurtele

The “dark transcriptome” can be considered the multitude of sequences that are transcribed but not annotated as genes. We evaluated expression of 6,692 annotated genes and 29,354 unannotated open reading frames (ORFs) in the Saccharomyces cerevisiae genome across diverse environmental, genetic and developmental conditions (3,457 RNA-Seq samples). Over 30% of the highly transcribed ORFs have translation evidence. Phylostratigraphic analysis infers most of these transcribed ORFs would encode species-specific proteins (“orphan-ORFs”); hundreds have mean expression comparable to annotated genes. These data reveal unannotated ORFs most likely to be protein-coding genes. We partitioned a co-expression matrix by Markov Chain Clustering; the resultant clusters contain 2,468 orphan-ORFs. We provide the aggregated RNA-Seq yeast data with extensive metadata as a project in MetaOmGraph (MOG), a tool designed for interactive analysis and visualization. This approach enables reuse of public RNA-Seq data for exploratory discovery, providing a rich context for experimentalists to make novel, experimentally testable hypotheses about candidate genes.
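To make the co-expression step concrete, the sketch below builds a Pearson co-expression matrix and partitions it with a bare-bones Markov clustering loop (expansion followed by inflation). The correlation threshold, the inflation value, and the toy expression values are assumptions for illustration; the study's own parameters and software are not given in the abstract.

```python
import numpy as np

def coexpression_matrix(expr):
    """Pearson correlation between genes; expr is genes x samples."""
    return np.corrcoef(expr)

def markov_cluster(adj, inflation=2.0, iters=50):
    """Minimal Markov clustering on a non-negative adjacency matrix."""
    m = adj + np.eye(adj.shape[0])            # add self-loops
    m = m / m.sum(axis=0, keepdims=True)      # column-stochastic
    for _ in range(iters):
        m = m @ m                             # expansion: two-step random walk
        m = m ** inflation                    # inflation: favor strong flows
        m = m / m.sum(axis=0, keepdims=True)
    # Each column (gene) is assigned to the row that attracts most of its flow.
    return m.argmax(axis=0)

# Hypothetical toy data: 4 genes x 6 samples, two co-expressed pairs.
expr = np.array([
    [1, 2, 3, 4, 5, 6],
    [2, 4, 6, 8, 10, 13],
    [1, -1, 1, -1, 1, -1],
    [2, -2, 2, -2, 2, -1],
], dtype=float)

corr = coexpression_matrix(expr)
adj = (np.abs(corr) > 0.8).astype(float)      # assumed threshold
np.fill_diagonal(adj, 0.0)
labels = markov_cluster(adj)
print(labels)
```

Genes that share a label belong to the same cluster; the label values themselves are just the indices of the attractor genes.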


2021 ◽  
Vol 12 ◽  
Author(s):  
Jieping Yang ◽  
Jiaxing Lin ◽  
Jun An ◽  
Yongkang Zhao ◽  
Siyang Jing ◽  
...  

Background. Bladder cancer is a common malignant tumor characterized by high mortality and high management costs; however, it lacks useful molecular prognostic markers. Tribbles pseudokinase 3 (TRIB3) is a pseudokinase that participates in tumor cell progression and metabolism, and its function in bladder cancer is not precisely known. Main Methods. We downloaded transcriptome data and clinical data of bladder cancer from associated databases and extracted the expression matrix of TRIB3 for multiple bioinformatics analyses. RT-PCR was used to detect the expression of TRIB3 in bladder cancer cells. After knockdown of TRIB3 with siRNA, we investigated TRIB3 function using CCK8, cell cycle, and Transwell assays. Key Findings. Kaplan–Meier analysis of TRIB3 in the four cohorts showed that high expression of TRIB3 correlated with poor outcome. Expression of TRIB3 positively correlated with stage and grade, and down-regulation of TRIB3 expression significantly inhibited proliferation, migration, and cell cycle progression of bladder cancer cells. Significance. TRIB3 is a potential prognostic marker and therapeutic target. It can be used to individualize the treatment of bladder cancer.
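For readers unfamiliar with the Kaplan–Meier analysis mentioned above, the following is a minimal sketch of the product-limit estimator. The follow-up times and event flags are invented for illustration and have nothing to do with the cohorts in the study; in practice the estimator would be run separately on the high-expression and low-expression groups and the two curves compared.

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up time per patient.
    events: True if the event was observed, False if censored.
    Returns a list of (time, survival probability) at each event time.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    survival = 1.0
    curve = []
    for t in np.unique(times[events]):
        at_risk = int((times >= t).sum())             # still under observation
        deaths = int(((times == t) & events).sum())   # events exactly at t
        survival *= 1.0 - deaths / at_risk
        curve.append((float(t), survival))
    return curve

# Hypothetical follow-up data for five patients.
curve = kaplan_meier(times=[2, 3, 3, 5, 8], events=[1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```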


2021 ◽  
Author(s):  
Ron Zeira ◽  
Max Land ◽  
Benjamin J. Raphael

Abstract. Spatial transcriptomics (ST) is a new technology that measures mRNA expression across thousands of spots on a tissue slice, while preserving information about the spatial location of spots. ST is typically applied to several replicates from adjacent slices of a tissue. However, existing methods to analyze ST data do not take full advantage of the similarity in both gene expression and spatial organization across these replicates. We introduce a new method PASTE (Probabilistic Alignment of ST Experiments) to align and integrate ST data across adjacent tissue slices leveraging both transcriptional similarity and spatial distances between spots. First, we formalize and solve the problem of pairwise alignment of ST data from adjacent tissue slices, or layers, using Fused Gromov-Wasserstein Optimal Transport (FGW-OT), which accounts for variability in the composition and spatial location of the spots on each layer. From these pairwise alignments, we construct a 3D representation of the tissue. Next, we introduce the problem of simultaneous alignment and integration of multiple ST layers into a single layer with a low rank gene expression matrix. We derive an algorithm to solve the problem by alternating between solving FGW-OT instances and solving a Non-negative Matrix Factorization (NMF) of a weighted expression matrix. We show on both simulated and real ST datasets that PASTE accurately aligns spots across adjacent layers and accurately estimates a consensus expression matrix from multiple ST layers. PASTE outperforms integration methods that rely solely on either transcriptional similarity or spatial similarity, demonstrating the advantages of combining both types of information. Code availability. Software is available at https://github.com/raphael-group/paste
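The NMF step can be illustrated in isolation. The sketch below factorizes a small non-negative matrix with the standard Lee-Seung multiplicative updates for the Frobenius objective. It is a generic NMF, not PASTE's solver: the optimal-transport alignment and the weighting of the expression matrix are omitted, and the toy matrix, rank, and iteration count are assumptions.

```python
import numpy as np

def nmf(v, rank, iters=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix v ~= w @ h with multiplicative updates."""
    rng = np.random.default_rng(seed)
    w = rng.random((v.shape[0], rank)) + eps
    h = rng.random((rank, v.shape[1])) + eps
    for _ in range(iters):
        # The updates multiply by non-negative ratios, so w and h stay >= 0.
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    return w, h

# Hypothetical spots x genes matrix with exact rank-2 structure.
w_true = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
h_true = np.array([[5, 4, 0, 0, 1], [0, 1, 6, 3, 0]], dtype=float)
v = w_true @ h_true

w, h = nmf(v, rank=2)
rel_err = np.linalg.norm(v - w @ h) / np.linalg.norm(v)
print(rel_err)
```

A low-rank factorization of this kind is what lets several noisy layers be summarized by one consensus matrix `w @ h`.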


2021 ◽  
Author(s):  
Zi-Hang Wen ◽  
Jeremy L. Langsam ◽  
Lu Zhang ◽  
Wenjun Shen ◽  
Xin Zhou

Abstract. Single-cell RNA-seq (scRNA-seq) offers opportunities to study gene expression of tens of thousands of single cells simultaneously, to investigate cell-to-cell variation, and to reconstruct cell-type-specific gene regulatory networks. Recovering dropout events in a sparse gene expression matrix for scRNA-seq data is a long-standing matrix completion problem. We introduce Bfimpute, a Bayesian factorization imputation algorithm that reconstructs two latent gene and cell matrices to impute the final gene expression matrix within each cell group, with or without the aid of cell type labels or bulk data. Bfimpute achieves better accuracy than six other notable scRNA-seq imputation methods on simulated and real scRNA-seq data, as measured by several different evaluation metrics. Bfimpute can also flexibly integrate any gene- or cell-related information that users provide to increase performance. Availability. Bfimpute is implemented in R and is freely available at https://github.com/maiziezhoulab/Bfimpute.
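The matrix completion view of dropout recovery can be sketched without the Bayesian machinery. The example below fits latent gene and cell factors to the observed entries only, using ridge-regularized alternating least squares, and fills the masked entries from the product of the factors. This swaps a point-estimate method in for Bfimpute's Bayesian inference; the function name, rank, penalty, and toy matrix are assumptions for illustration.

```python
import numpy as np

def factorize_impute(x, mask, rank=1, lam=0.01, iters=50, seed=0):
    """Fill entries where mask is False using low-rank factors.

    x: genes x cells matrix; mask: True where the value is trusted.
    Assumption: ridge-regularized ALS, not Bfimpute's Bayesian sampler.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = x.shape
    g = rng.normal(scale=0.1, size=(n_genes, rank))   # latent gene matrix
    c = rng.normal(scale=0.1, size=(n_cells, rank))   # latent cell matrix
    reg = lam * np.eye(rank)
    for _ in range(iters):
        for i in range(n_genes):                      # update one gene at a time
            obs = mask[i]
            ci = c[obs]
            g[i] = np.linalg.solve(ci.T @ ci + reg, ci.T @ x[i, obs])
        for j in range(n_cells):                      # update one cell at a time
            obs = mask[:, j]
            gj = g[obs]
            c[j] = np.linalg.solve(gj.T @ gj + reg, gj.T @ x[obs, j])
    # Keep trusted values; use the reconstruction only for masked entries.
    return np.where(mask, x, g @ c.T)

# Hypothetical rank-1 expression matrix with one dropout at (1, 1).
truth = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
observed = truth.copy()
observed[1, 1] = 0.0
mask = np.ones_like(truth, dtype=bool)
mask[1, 1] = False

imputed = factorize_impute(observed, mask)
print(imputed[1, 1], "vs true value", truth[1, 1])
```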


Author(s):  
Qiong Wu ◽  
Tianzhou Ma ◽  
Qingzhi Liu ◽  
Donald K Milton ◽  
Yuan Zhang ◽  
...  

Abstract. Motivation. The analysis of gene co-expression networks (GCNs) is critical in examining gene-gene interactions and learning the underlying complex yet highly organized gene regulatory mechanisms. Numerous clustering methods have been developed to detect communities of co-expressed genes in a large network. The assumed independent community structure, however, can be an oversimplification and may not adequately characterize complex biological processes. Results. We develop a new computational package to extract interconnected communities from a gene co-expression network. We consider a pair of communities to be interconnected if a subset of genes from one community is correlated with a subset of genes from another community. The interconnected community structure is more flexible and provides a better fit to the empirical co-expression matrix. To overcome the computational challenges, we develop efficient algorithms by leveraging an advanced graph norm shrinkage approach. We validate our method and show its advantages through extensive simulation studies. We then apply our interconnected community detection method to RNA-seq data from The Cancer Genome Atlas (TCGA) Acute Myeloid Leukemia (AML) study and identify essential interacting biological pathways related to the immune evasion mechanism of tumor cells. Availability. The software is available at GitHub: https://github.com/qwu1221/ICN and Figshare: https://figshare.com/articles/software/ICN-package/13229093. Supplementary information. Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Mingshuang Li ◽  
Conglin Ren ◽  
Chenxia Wu ◽  
Xinyao Li ◽  
Xinyi Li ◽  
...  

Background. Acute coronary syndrome (ACS) has a high incidence and mortality rate. Early detection and intervention would provide clinical benefits. This study aimed to reveal hub genes, transcription factors (TFs), and microRNAs (miRNAs) that affect plaque stability and provide the possibility for the early diagnosis and treatment of ACS. Methods. We obtained the gene expression matrix GSE19339 for ACS patients and healthy subjects from a public database. The differentially expressed genes (DEGs) were screened using the limma package in R. The biological functions of DEGs were shown by Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Gene Set Enrichment Analysis (GSEA). A protein-protein interaction (PPI) network was mapped in Cytoscape, followed by screening of hub genes based on the Molecular Complex Detection (MCODE) plug-in. The Functional Enrichment analysis tool (FunRich) and the Database for Annotation, Visualization and Integrated Discovery (DAVID) were used to predict miRNAs and TFs, respectively. Finally, the GSE60993 expression matrix was chosen to plot receiver operating characteristic (ROC) curves with the aim of further assessing the reliability of our findings. Results. We obtained 176 DEGs and further identified 16 hub genes by MCODE. The results of functional enrichment analysis showed that DEGs mediated inflammatory response and immune-related pathways. Among the predicted miRNAs, hsa-miR-4770, hsa-miR-5195, and hsa-miR-6088 all possessed two target genes, which might be closely related to the development of ACS. Moreover, we identified 11 TFs regulating hub gene transcriptional processes. Finally, ROC curves confirmed three genes with high confidence (area under the curve > 0.9), including VEGFA, SPP1, and VCAM1. Conclusion. This study suggests that three genes (VEGFA, SPP1, and VCAM1) were involved in the molecular mechanisms of ACS pathogenesis and could serve as biomarkers of disease progression.
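The ROC validation step reduces to a simple calculation per gene: the area under the curve equals the probability that a randomly chosen patient sample has higher expression than a randomly chosen control, with ties counted as half. The sketch below computes that quantity directly. The expression values and labels are invented for illustration and are not taken from GSE60993.

```python
import numpy as np

def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]        # expression in cases
    neg = scores[~labels]       # expression in controls
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))

# Hypothetical expression of one gene in 3 cases and 3 controls.
expression = [5.1, 6.3, 7.0, 4.2, 4.8, 5.5]
is_case = [1, 1, 1, 0, 0, 0]
auc = roc_auc(expression, is_case)
print(round(auc, 3))
```

A gene whose expression separates the groups perfectly scores 1.0, and a gene with no discriminating power scores about 0.5, which is why a threshold such as 0.9 is used to flag strong candidates.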


2020 ◽  
Vol 36 (20) ◽  
pp. 5054-5060
Author(s):  
Xiangyu Liu ◽  
Di Li ◽  
Juntao Liu ◽  
Zhengchang Su ◽  
Guojun Li

Abstract. Motivation. Biclustering has emerged as a powerful approach to identifying functional patterns in complex biological data. However, existing tools are limited in the accuracy and efficiency with which they recognize various kinds of complex biclusters submerged in ever larger datasets. We introduce RecBic, a novel, fast, and highly accurate algorithm to identify various forms of complex biclusters in gene expression datasets. Results. We designed RecBic to identify various trend-preserving biclusters, particularly those with narrow shapes, i.e. clusters where the number of genes is larger than the number of conditions/samples. Given a gene expression matrix, RecBic starts with a column seed and grows it into a full-sized bicluster by repeatedly comparing real numbers. When tested on simulated datasets in which the elements of implanted trend-preserving biclusters and those of the background matrix have the same distribution, RecBic identified the implanted biclusters almost perfectly, outperforming all the compared salient tools in terms of accuracy and robustness to noise and overlaps between the clusters. Moreover, RecBic also showed superiority in identifying functionally related genes in real gene expression datasets. Availability and implementation. Code, sample input data and usage instructions are available at the following websites. Code: https://github.com/holyzews/RecBic/tree/master/RecBic/. Data: http://doi.org/10.5281/zenodo.3842717. Supplementary information. Supplementary data are available at Bioinformatics online.
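A trend-preserving bicluster can be read as a set of genes whose expression rises and falls in the same order across a set of conditions, regardless of magnitude. The sketch below only checks that property for a candidate submatrix by comparing the column ordering of every row. It does not reproduce RecBic's seed-and-grow search, and the function name, the handling of ties, and the toy matrix are assumptions.

```python
import numpy as np

def is_trend_preserving(sub):
    """True if every row ranks the columns in the same order.

    Assumption: ties are broken by column position (stable sort), and only
    same-direction trends are accepted.
    """
    order = np.argsort(sub, axis=1, kind="stable")
    return bool((order == order[0]).all())

# Hypothetical genes x conditions matrix.
expr = np.array([
    [1.0, 5.0, 3.0, 9.0],     # low, high, mid, highest
    [10.0, 40.0, 20.0, 80.0], # same ordering at a different scale
    [7.0, 2.0, 6.0, 1.0],     # different ordering
])

print(is_trend_preserving(expr[:2]))   # rows 0 and 1 share a trend
print(is_trend_preserving(expr))       # row 2 breaks it
```

Because the check relies only on comparisons between real numbers, it is insensitive to per-gene scaling and shifting, which is the property that makes such patterns hard to find with distance-based clustering.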


2020 ◽  
Vol 36 (11) ◽  
pp. 3588-3589 ◽  
Author(s):  
Kaiyi Zhu ◽  
Dimitris Anastassiou

Abstract. Summary. We developed 2DImpute, an imputation method for correcting false zeros (known as dropouts) in single-cell RNA-sequencing (scRNA-seq) data. It prevents excessive correction by predicting the false zeros and imputing their values using the interrelationships between both genes and cells in the expression matrix. We showed that 2DImpute outperforms several leading imputation methods by applying it to datasets from various scRNA-seq protocols. Availability and implementation. The R package of 2DImpute is freely available at GitHub (https://github.com/zky0708/2DImpute). Supplementary information. Supplementary data are available at Bioinformatics online.

