A Non-negative Matrix Factorization Based Method for Identifying Essential Proteins

Author(s):  
Zhihong Zhang ◽  
Sai Hu ◽  
Wei Yan ◽  
Bihai Zhao ◽  
Lei Wang

Abstract

Background: Identification of essential proteins is very important for understanding the basic requirements to sustain a living organism. In recent years, various computational methods have been proposed to identify essential proteins based on protein-protein interaction (PPI) networks. However, there is reliable evidence that PPI data contain a large number of false negatives and false positives. It is therefore necessary to reduce the influence of false data on the accuracy of essential protein prediction by integrating multi-source biological information with PPI networks.

Results: In this paper, we proposed a model based on non-negative matrix factorization and multiple biological information (NDM) for identifying essential proteins. The first stage in this process was to construct a weighted PPI network by combining information on protein domains, protein complexes, and the topological characteristics of the original PPI network. Then, the non-negative matrix factorization technique was used to reconstruct an optimized PPI network with more complete edge weights. In the final stage, the ranking score of each protein was computed by the PageRank algorithm, in which the initial scores were calculated from homology and subcellular localization information. To verify the effectiveness of the NDM method, we compared it with other state-of-the-art essential protein prediction methods. The comparison indicated that the NDM model performs better in predicting essential proteins.

Conclusion: Employing non-negative matrix factorization and integrating multi-source biological data can effectively improve the quality of the PPI network, which in turn improves the performance of essential protein identification. This also provides a new perspective for other prediction tasks based on protein-protein interaction networks.
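The reconstruction step can be illustrated with a minimal sketch of non-negative matrix factorization. The abstract does not give the objective or update rules used by NDM, so this assumes the standard Lee-Seung multiplicative updates for squared error; the function names, the toy weight matrix `V`, and the rank are all hypothetical choices made for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def nmf(v, rank, iters=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (n x m) into W (n x rank) and H (rank x m).

    Assumption: Lee-Seung multiplicative updates for the squared-error
    objective; the paper's exact formulation is not stated in the abstract.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

def frobenius_error(v, w, h):
    """Squared reconstruction error between V and W @ H."""
    approx = matmul(w, h)
    return sum((v[i][j] - approx[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

# Hypothetical weighted adjacency matrix for four proteins.
V = [
    [0.0, 0.9, 0.8, 0.0],
    [0.9, 0.0, 0.7, 0.0],
    [0.8, 0.7, 0.0, 0.1],
    [0.0, 0.0, 0.1, 0.0],
]
w, h = nmf(V, rank=2)
reconstructed = matmul(w, h)  # dense non-negative estimate of edge weights
```

In this sketch the product `W @ H` assigns a non-negative weight to every protein pair, including pairs with no observed interaction, which is the sense in which a factorization can propose additional edges.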

2021 ◽  
Vol 12 ◽  
Author(s):  
Zhihong Zhang ◽  
Meiping Jiang ◽  
Dongjie Wu ◽  
Wang Zhang ◽  
Wei Yan ◽  
...  

Identification of essential proteins is very important for understanding the basic requirements to sustain a living organism. In recent years, there has been increasing interest in using computational methods to predict essential proteins based on protein–protein interaction (PPI) networks or on the fusion of multiple sources of biological information. However, existing PPI data are known to contain false negatives and false positives. Fusing multiple sources of biological information can reduce the influence of false data in PPI networks, but it inevitably introduces additional noise. In this article, we proposed a novel non-negative matrix tri-factorization (NMTF)-based model (NTMEP) to predict essential proteins. First, a weighted PPI network is established using only the topological features of the network, so as to avoid introducing more noise. To reduce the influence of false data in the PPI network on the performance of essential protein identification, the NMTF technique, a widely used recommendation algorithm, is applied to reconstruct an optimized PPI network with more potential protein–protein interactions. Then, we use the PageRank algorithm to compute the final ranking score of each protein, with subcellular localization and homology information used to calculate the initial scores. Extensive experiments on publicly available datasets indicate that the NTMEP model performs better in predicting essential proteins than state-of-the-art methods. In this investigation, we demonstrated that introducing non-negative matrix tri-factorization can effectively improve the quality of the protein–protein interaction network and thereby reduce the negative impact of noise on prediction. This finding also provides a new perspective for other applications based on protein–protein interaction networks.
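The ranking step can be sketched as PageRank on a weighted, undirected network in which prior scores bias the random jump. The abstract says only that the initial scores come from subcellular localization and homology information, so treating them as the restart distribution is an assumption, and the network, prior values, and function name below are hypothetical.

```python
def weighted_pagerank(weights, initial, damping=0.85, iters=100, tol=1e-10):
    """Rank nodes of a weighted undirected graph.

    weights: dict node -> dict neighbor -> positive weight (symmetric).
    initial: dict node -> non-negative prior score, used here as the
             restart distribution (an assumption, not the paper's stated rule).
    """
    nodes = list(weights)
    total = sum(initial[n] for n in nodes)
    prior = {n: initial[n] / total for n in nodes}
    strength = {n: sum(weights[n].values()) for n in nodes}
    score = dict(prior)
    for _ in range(iters):
        # Nodes without edges hand their score back through the prior.
        dangling = sum(score[n] for n in nodes if strength[n] == 0)
        new = {}
        for n in nodes:
            incoming = sum(score[u] * weights[u][n] / strength[u]
                           for u in weights[n])
            new[n] = ((1 - damping) * prior[n]
                      + damping * (incoming + dangling * prior[n]))
        delta = sum(abs(new[n] - score[n]) for n in nodes)
        score = new
        if delta < tol:
            break
    return score

# Hypothetical reconstructed network and prior scores.
ppi = {
    "A": {"B": 0.9, "C": 0.8, "D": 0.5},
    "B": {"A": 0.9, "C": 0.6},
    "C": {"A": 0.8, "B": 0.6},
    "D": {"A": 0.5},
}
prior_scores = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
ranking = weighted_pagerank(ppi, prior_scores)
top_protein = max(ranking, key=ranking.get)
```

The scores sum to one at every iteration, so proteins can be compared directly and the top-ranked candidates read off by sorting `ranking`.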


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 1969
Author(s):  
Dongmin Jung ◽  
Xijin Ge

Interactions between proteins occur in many, if not most, biological processes. This fact has motivated the development of a variety of experimental methods for the identification of protein-protein interaction (PPI) networks. Leveraging PPI data available in the STRING database, we use network-based statistical learning methods to infer the putative functions of proteins from the known functions of neighboring proteins on a PPI network. This package identifies such proteins, which are often involved in the same or similar biological functions. The package is freely available at the Bioconductor web site (http://bioconductor.org/packages/PPInfer/).


F1000Research ◽  
2018 ◽  
Vol 6 ◽  
pp. 1969 ◽  
Author(s):  
Dongmin Jung ◽  
Xijin Ge

Interactions between proteins occur in many, if not most, biological processes. This fact has motivated the development of a variety of experimental methods for the identification of protein-protein interaction (PPI) networks. Leveraging PPI data available in the STRING database, we use network-based statistical learning methods to infer the putative functions of proteins from the known functions of neighboring proteins on a PPI network. This package identifies such proteins, which are often involved in the same or similar biological functions. The package is freely available at the Bioconductor web site (http://bioconductor.org/packages/PPInfer/).


2014 ◽  
Vol 934 ◽  
pp. 159-164
Author(s):  
Yun Yuan Dong ◽  
Xian Chun Zhang

Protein-protein interaction (PPI) networks provide a simplified overview of the web of interactions that take place inside a cell. According to the centrality-lethality rule, hub proteins (proteins with high degree) tend to be essential in the PPI network. However, the PPI network also contains many low-degree proteins, and these differ in lethality: some are essential (essential-nonhub proteins) and the others are not (nonessential-nonhub proteins). To explain why nonessential-nonhub proteins are not essential, we propose a new measure, n-iep (the number of essential neighbors), and compare nonessential-nonhub proteins with essential-nonhub proteins from topological, evolutionary, and functional perspectives. The comparison shows statistically significant differences between nonessential-nonhub and essential-nonhub proteins in centrality measures, clustering coefficient, evolutionary rate, and the number of essential neighbors. These differences help explain why nonessential-nonhub proteins are not lethal.
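The n-iep measure is defined in the abstract as the number of essential neighbors of a protein. A minimal sketch of that count follows, assuming an undirected edge list and a known set of essential proteins; the protein names and the function name are hypothetical.

```python
def essential_neighbor_counts(edges, essential):
    """Return n-iep for every protein: how many of its neighbors are essential.

    edges: iterable of (protein, protein) pairs, treated as undirected.
    essential: set of proteins known to be essential.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return {p: len(nbrs & essential) for p, nbrs in neighbors.items()}

# Hypothetical toy network.
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
essential = {"P1", "P3"}
n_iep = essential_neighbor_counts(edges, essential)
# n_iep == {"P1": 1, "P2": 2, "P3": 1, "P4": 1}
```

A protein's own essentiality does not enter its count, so the measure can be compared between essential-nonhub and nonessential-nonhub proteins without circularity.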


2014 ◽  
Vol 22 (03) ◽  
pp. 339-351 ◽  
Author(s):  
JIAWEI LUO ◽  
NAN ZHANG

Essential proteins are important for the survival and development of organisms. Many centrality algorithms based on network topology have been proposed to detect essential proteins and have achieved good results. However, most of them focus only on network topology and ignore the false positive (FP) interactions in the protein–protein interaction (PPI) network. In this paper, gene ontology (GO) information is used to measure the reliability of the edges in the PPI network, and we propose a novel algorithm for identifying essential proteins, named EGC. The EGC algorithm integrates the topological characteristics of the PPI network with GO information. To validate its performance, we use EGC and nine other methods (DC, BC, CC, SC, EC, LAC, NC, PEC and CoEWC) to identify essential proteins in two different yeast PPI networks: DIP and MIPS. The results show that EGC outperforms the other nine methods, which indicates that adding GO information can help in predicting essential proteins.


2020 ◽  
Author(s):  
Brennan Klein ◽  
Ludvig Holmér ◽  
Keith M. Smith ◽  
Mackenzie M. Johnson ◽  
Anshuman Swain ◽  
...  

Abstract

Protein-protein interaction (PPI) networks represent complex intra-cellular protein interactions, and the presence or absence of such interactions can lead to biological changes in an organism. Recent network-based approaches have shown that a phenotype’s PPI network’s resilience to environmental perturbations is related to its placement in the tree of life, though we still do not know how or why certain intra-cellular factors can bring about this resilience. One such factor is gene expression, which controls the simultaneous presence of proteins for allowed extant interactions and the possibility of novel associations. Here, we explore the influence of gene expression and network properties on a PPI network’s resilience, focusing especially on ribosomal proteins, vital molecular complexes involved in protein synthesis that have been extensively and reliably mapped in many species. Using publicly available data on ribosomal PPIs for E. coli, S. cerevisiae, and H. sapiens, we compute changes in network resilience as new nodes (proteins) are added to the networks under three node addition mechanisms: random, degree-based, and gene-expression-based attachment. By calculating the resilience of the resulting networks, we estimate the effectiveness of these node addition mechanisms. We demonstrate that adding nodes with gene-expression-based preferential attachment (as opposed to random or degree-based attachment) preserves and can increase the original resilience of the PPI network. This holds in all three species regardless of their distributions of gene expression or their network community structure. These findings introduce a general notion of prospective resilience, which highlights the key role of network structures in understanding the evolvability of phenotypic traits.

Author Summary

Proteins in organismal cells are present at different levels of concentration and interact with other proteins to provide specific functional roles. Accumulating lists of all of these interactions, complex networks of protein interactions become apparent. This allows us to begin asking whether there are network-level mechanisms at play guiding the evolution of biological systems. Here, using this network perspective, we address two important themes in evolutionary biology: (i) How are biological systems able to successfully incorporate novelty? (ii) What is the evolutionary role of biological noise in evolutionary novelty? We consider novelty to be the introduction of a new protein, represented as a new “node”, into a network. We simulate the incorporation of novel proteins into protein-protein interaction (PPI) networks in different ways and analyse how the resilience of the PPI network changes. We find that novel interactions guided by gene expression (indicative of the concentration levels of proteins) create a more resilient network than either uniformly random interactions or interactions guided solely by the network structure (preferential attachment). Moreover, simulated biological noise in the gene expression increases network resilience. We suggest that biological noise induces novel structure in the PPI network, which has the effect of making it more resilient.
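The three node addition mechanisms can be sketched as weighted sampling of attachment targets. The abstract does not specify how attachment probabilities are computed or how resilience is measured, so the weighting rules below (uniform, degree plus one, and the expression level of each existing protein) are assumptions, as are the function name, the toy network, and the expression values. Expression weights are assumed to be strictly positive.

```python
import random

def add_node(adjacency, new_node, m, mode, expression=None, rng=random):
    """Attach new_node to up to m distinct existing nodes and return the targets.

    adjacency: dict node -> set of neighbors (undirected), modified in place.
    mode: "random", "degree", or "expression" (hypothetical labels).
    expression: dict node -> positive expression level, needed for "expression".
    """
    existing = list(adjacency)
    if mode == "random":
        weights = [1.0] * len(existing)
    elif mode == "degree":
        # +1 so that isolated nodes can still be selected (an assumption).
        weights = [len(adjacency[n]) + 1 for n in existing]
    elif mode == "expression":
        weights = [expression[n] for n in existing]
    else:
        raise ValueError(f"unknown mode: {mode}")

    targets = []
    for _ in range(min(m, len(existing))):
        # Weighted draw without replacement: pick one, then remove it.
        pick = rng.choices(range(len(existing)), weights=weights)[0]
        targets.append(existing.pop(pick))
        weights.pop(pick)

    adjacency[new_node] = set(targets)
    for t in targets:
        adjacency[t].add(new_node)
    return targets

# Hypothetical expression levels and a three-protein starting network.
expression = {"a": 5.0, "b": 1.0, "c": 0.5}
rng = random.Random(0)
for mode in ("random", "degree", "expression"):
    net = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
    chosen = add_node(net, "new", 2, mode, expression=expression, rng=rng)
    print(mode, sorted(chosen))
```

Repeating the addition many times per mode and scoring each resulting network with a chosen resilience measure would allow the mechanisms to be compared; the resilience measure itself is left out because the abstract does not define it.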


2021 ◽  
Vol 29 (2) ◽  
Author(s):  
Soheir Noori ◽  
Nabeel Al-A’araji ◽  
Eman Al-Shamery

Defining protein complexes by analysing protein–protein interaction (PPI) networks is a crucial task in understanding the principles of a biological cell. In the last few decades, researchers have proposed numerous methods that explore the topological structure of a PPI network to detect dense protein complexes. In this paper, overlapping protein complexes with different densities are predicted within an acceptable execution time using a seed-expanding model and the topological structure of the PPI network (SETS). SETS depends on the relation between the seed and its neighbours. The algorithm was compared with six algorithms on six datasets: five for yeast and one for human. The results showed that SETS outperformed the other algorithms in terms of F-measure, coverage rate, and the number of complexes that have high similarity with real complexes.


Genes ◽  
2019 ◽  
Vol 10 (2) ◽  
pp. 177 ◽  
Author(s):  
Xiujuan Lei ◽  
Siguo Wang ◽  
Fang-Xiang Wu

Essential proteins are critical to the development and survival of cells. Identifying and analyzing essential proteins is vital for understanding the molecular mechanisms of living cells and for designing new drugs. With the development of high-throughput technologies, a large amount of protein–protein interaction (PPI) data has become available, which facilitates the study of essential proteins at the network level. Although various computational methods have been proposed, prediction precision still needs to be improved. In this paper, we propose a novel method, named HSEP, that applies Hyperlink-Induced Topic Search (HITS) to weighted PPI networks to detect essential proteins. First, the original undirected PPI network is transformed into a bidirectional PPI network. Then, both biological information and network topological characteristics are used to weight the PPI network. The biological information includes gene expression data, Gene Ontology (GO) annotation, and subcellular localization. The edge clustering coefficient serves as the network topological characteristic, measuring the closeness of two connected nodes. We conducted experiments on two species, Saccharomyces cerevisiae and Drosophila melanogaster, and the experimental results show that HSEP outperformed several state-of-the-art essential protein detection techniques.
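The edge clustering coefficient mentioned above can be sketched as follows. The abstract does not state the formula, so this assumes one commonly used form: the number of common neighbors of the two endpoints divided by the smaller of their degrees minus one. The exact definition used in HSEP may differ, and the toy network and function name are hypothetical.

```python
def edge_clustering_coefficient(neighbors, u, v):
    """Closeness of the connected nodes u and v.

    Assumed form: |N(u) & N(v)| / min(deg(u) - 1, deg(v) - 1),
    with 0.0 returned when the denominator is zero.
    neighbors: dict node -> set of adjacent nodes (undirected).
    """
    common = len(neighbors[u] & neighbors[v])
    limit = min(len(neighbors[u]) - 1, len(neighbors[v]) - 1)
    return common / limit if limit > 0 else 0.0

# Hypothetical network: a triangle A-B-C with a pendant node D on B.
neighbors = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B"},
}
ecc_ab = edge_clustering_coefficient(neighbors, "A", "B")  # 1.0
ecc_bd = edge_clustering_coefficient(neighbors, "B", "D")  # 0.0
```

Edges inside the triangle score high because their endpoints share a neighbor, while the pendant edge scores zero, which matches the intended reading of the coefficient as a measure of how tightly two connected proteins are embedded together.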


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Qiguo Dai ◽  
Maozu Guo ◽  
Yingjie Guo ◽  
Xiaoyan Liu ◽  
Yang Liu ◽  
...  

Protein complexes, formed by groups of physically interacting proteins, play a crucial role in cell activities. Great effort has been made to computationally identify protein complexes from protein-protein interaction (PPI) networks. However, the accuracy of the prediction is still far from satisfactory, because the topological structures of protein complexes in the PPI network are too complicated. This paper proposes a novel optimization framework, named PLSMC, to detect complexes from PPI networks. The method is based on the observation that if two proteins are in a common complex, they are likely to interact. PLSMC employs this relation to determine complexes by a penalized least squares method. PLSMC is applied to several public yeast PPI networks and compared with several state-of-the-art methods. The results indicate that PLSMC outperforms the other methods. In particular, complexes predicted by PLSMC match known complexes with higher accuracy than those predicted by other methods. Furthermore, the predicted complexes have high functional homogeneity.

