LE-MDCAP: A Computational Model to Prioritize Causal miRNA–Disease Associations

2021 ◽  
Vol 22 (24) ◽  
pp. 13607
Author(s):  
Zhou Huang ◽  
Yu Han ◽  
Leibo Liu ◽  
Qinghua Cui ◽  
Yuan Zhou

MicroRNAs (miRNAs) are associated with various complex human diseases and some miRNAs can be directly involved in the mechanisms of disease. Identifying disease-causative miRNAs can provide novel insight into disease pathogenesis from a miRNA perspective and facilitate disease treatment. To date, various computational models have been developed to predict general miRNA–disease associations, but few models are available to further prioritize causal miRNA–disease associations from non-causal associations. Therefore, in this study, we constructed a Levenshtein-Distance-Enhanced miRNA–Disease Causal Association Predictor (LE-MDCAP) to predict potential causal miRNA–disease associations. Specifically, Levenshtein distance matrices covering the sequence, expression and functional miRNA similarities were introduced to enhance the previous Gaussian interaction profile kernel-based similarity matrix. LE-MDCAP integrated miRNA similarity matrices, a disease semantic similarity matrix and known causal miRNA–disease associations to make predictions. For the regular causal vs. non-disease association discrimination task, LE-MDCAP achieved an area under the receiver operating characteristic curve (AUROC) of 0.911 and 0.906 in 10-fold cross-validation and independent test, respectively. More importantly, LE-MDCAP prominently outperformed the previous MDCAP model in distinguishing causal versus non-causal miRNA–disease associations (AUROC 0.820 vs. 0.695). Case studies performed on diabetic retinopathy and hsa-mir-361 also validated the accuracy of our model. In summary, LE-MDCAP could be useful for screening causal miRNA–disease associations from general miRNA–disease associations.
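As a rough illustration of the Levenshtein distance named in the abstract above, the sketch below computes a pairwise edit-distance matrix over a few sequences. It covers only plain string edit distance; the abstract does not say how LE-MDCAP applies the distance to expression or functional profiles, and the function names and toy sequences here are invented for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def distance_matrix(sequences):
    """Symmetric matrix of pairwise Levenshtein distances."""
    n = len(sequences)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = levenshtein(sequences[i], sequences[j])
    return matrix


# Toy sequences, invented for illustration (not real miRNA records).
toy_sequences = ["UGAGGUAG", "UGAGGUAGU", "AGAGGUAC"]
toy_matrix = distance_matrix(toy_sequences)
for row in toy_matrix:
    print(row)
```

A distance matrix like this would still need to be converted into a similarity before being combined with a kernel matrix; that conversion is left out because the abstract does not specify it.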

2019 ◽  
Vol 35 (23) ◽  
pp. 4922-4929 ◽  
Author(s):  
Zhao-Chun Xu ◽  
Peng-Mian Feng ◽  
Hui Yang ◽  
Wang-Ren Qiu ◽  
Wei Chen ◽  
...  

Abstract Motivation: Dihydrouridine (D) is a common RNA post-transcriptional modification found in eukaryotes, bacteria and a few archaea. The modification can promote the conformational flexibility of individual nucleotide bases, and its levels are increased in cancerous tissues. Therefore, it is necessary to detect D in RNA to further understand its functional roles. Since wet-experimental techniques for this purpose are time-consuming and laborious, computational models are urgently needed to identify D modification sites in RNA. Results: We constructed a predictor, called iRNAD, for identifying D modification sites in RNA sequences. In this predictor, the RNA samples derived from five species were encoded by nucleotide chemical property and nucleotide density. A support vector machine was used to perform the classification. The final model achieved an overall accuracy of 96.18% with an area under the receiver operating characteristic curve of 0.9839 in the jackknife cross-validation test. Furthermore, we performed a series of validations from several aspects and demonstrated the robustness and reliability of the proposed model. Availability and implementation: A user-friendly web-server called iRNAD is freely accessible at http://lin-group.cn/server/iRNAD, which will provide convenience and guidance to users for further studying D modification.
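The encoding by nucleotide chemical property and nucleotide density can be sketched as follows. The abstract does not give the exact mapping, so the three-bit code used here (ring structure, functional group, hydrogen bonding) is a commonly used convention assumed for illustration, and the function name is hypothetical.

```python
# Assumed convention: (purine ring, amino group, weak hydrogen bonding).
CHEMICAL_PROPERTY = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode_sequence(seq: str) -> list:
    """Encode each position as 3 chemical-property bits plus a density value.

    The density at position i (1-based) is the fraction of the first i
    nucleotides that equal the nucleotide at position i.
    """
    counts = {}
    features = []
    for i, nucleotide in enumerate(seq, start=1):
        counts[nucleotide] = counts.get(nucleotide, 0) + 1
        features.extend(CHEMICAL_PROPERTY[nucleotide])
        features.append(counts[nucleotide] / i)
    return features


print(encode_sequence("AUA"))
```

Each nucleotide contributes four numbers, so a window of length n yields a 4n-dimensional vector that could then be passed to a classifier such as a support vector machine.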


2019 ◽  
Vol 2019 ◽  
pp. 1-11 ◽  
Author(s):  
Han-Jing Jiang ◽  
Yu-An Huang ◽  
Zhu-Hong You

Computational drug repositioning, designed to identify new indications for existing drugs, significantly reduces the cost and time involved in drug development. Prediction of drug-disease associations is promising for drug repositioning. Recent years have witnessed an increasing number of machine learning-based methods for drug repositioning. In this paper, a novel feature learning method based on Gaussian interaction profile kernel and autoencoder (GIPAE) is proposed for drug-disease association prediction. In order to further reduce the computation cost, both a batch normalization layer and a fully connected layer are introduced to reduce training complexity. The experimental results of 10-fold cross validation indicate that the proposed method achieves superior performance on Fdataset and Cdataset with AUCs of 93.30% and 96.03%, respectively, which were higher than many previous computational models. To further assess the accuracy of GIPAE, we conducted case studies on two complex human diseases. Among the top 20 predicted drugs, 14 obesity-related drugs and 11 drugs related to Alzheimer's disease were validated in the CTD database. The results of cross validation and case studies indicated that GIPAE is a reliable model for predicting drug-disease associations.


Author(s):  
Yu Zhang ◽  
Cangzhi Jia ◽  
Chee Keong Kwoh

Abstract Long noncoding RNAs (lncRNAs) play significant roles in various physiological and pathological processes via their interactions with biomolecules like DNA, RNA and protein. The existing in silico methods used for predicting the functions of lncRNA mainly rely on calculating the similarity of lncRNA or investigating whether an lncRNA can interact with a specific biomolecule or disease. In this work, we explored the functions of lncRNA from a different perspective: we presented a tool for predicting the interaction biomolecule type for a given lncRNA. For this purpose, we first investigated the main molecular mechanisms of the interactions of lncRNA–RNA, lncRNA–protein and lncRNA–DNA. Then, we developed an ensemble deep learning model: lncIBTP (lncRNA Interaction Biomolecule Type Prediction). This model predicted the interactions between lncRNA and different types of biomolecules. On the 5-fold cross-validation, the lncIBTP achieves average values of 0.7042 in accuracy, 0.7903 and 0.6421 in macro-average area under receiver operating characteristic curve and precision–recall curve, respectively, which illustrates the model effectiveness. Besides, based on the analysis of the collected published data and prediction results, we hypothesized that the characteristics of lncRNAs that interacted with DNA may be different from those that interacted with only RNA.
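For readers unfamiliar with the macro-average AUROC reported above, a minimal sketch follows. It assumes a one-vs-rest scheme with an unweighted mean over classes, which is a common definition but is not confirmed by the abstract; the scores, labels and function names are invented.

```python
def auroc(scores, is_positive):
    """AUROC as the probability that a positive outscores a negative (ties count 0.5)."""
    positives = [s for s, flag in zip(scores, is_positive) if flag]
    negatives = [s for s, flag in zip(scores, is_positive) if not flag]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def macro_auroc(score_rows, true_classes, n_classes):
    """Unweighted mean of one-vs-rest AUROC values over all classes."""
    per_class = [
        auroc([row[c] for row in score_rows], [t == c for t in true_classes])
        for c in range(n_classes)
    ]
    return sum(per_class) / n_classes


# Invented scores for three hypothetical classes (0 = RNA, 1 = protein, 2 = DNA).
toy_scores = [
    [0.70, 0.20, 0.10],
    [0.20, 0.60, 0.20],
    [0.30, 0.30, 0.40],
    [0.25, 0.65, 0.10],
]
toy_labels = [0, 1, 2, 0]
print(macro_auroc(toy_scores, toy_labels, 3))
```

The pairwise comparison is quadratic in the number of samples, which is fine for a sketch; a rank-based computation would be preferable on large test sets.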


2020 ◽  
Author(s):  
Tian-Ru Wu ◽  
Meng-Meng Yin ◽  
Cui-Na Jiao ◽  
Ying-Lian Gao ◽  
Xiang-Zhen Kong ◽  
...  

Abstract Background: MicroRNAs (miRNAs) are non-coding RNAs with regulatory functions. Many studies have shown that miRNAs are closely associated with human diseases. Among the methods to explore the relationship between the miRNA and the disease, traditional methods are time-consuming and the accuracy needs to be improved. In view of the shortcoming of previous models, a collaborative matrix factorization based on matrix completion (MCCMF) is proposed to predict the unknown miRNA-disease associations. Results: The complete matrix of the miRNA and the disease is obtained by matrix completion. Moreover, Gaussian Interaction Profile (GIP) kernel is added to the miRNA functional similarity matrix and the disease semantic similarity matrix to form the GIP kernel similarity matrix. Then the Weight K Nearest Known Neighbors (WKNKN) method is used to pretreat the association matrix, so the model is close to the reality. Finally, collaborative matrix factorization (CMF) method is applied to obtain the prediction results. Therefore, the MCCMF obtains a satisfactory result in the five-fold cross-validation, with an AUC of 0.9569 (0.0005). Conclusions: The AUC value of MCCMF is higher than other advanced methods in the 5-fold cross validation experiment. In order to comprehensively evaluate the performance of MCCMF, accuracy, precision, recall and f-measure are also added. The final experimental results demonstrate that MCCMF outperforms other methods in predicting miRNA-disease associations. In the end, the effectiveness and practicability of MCCMF are further verified by researching three specific diseases.
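The Gaussian Interaction Profile kernel mentioned above can be sketched in a few lines. This follows the usual formulation, in which each row of the association matrix is an interaction profile and the bandwidth is normalized by the mean squared profile norm; the parameter name `gamma_prime`, its default of 1, and the toy matrix are assumptions, since the abstract gives no such details.

```python
import math


def gip_kernel(assoc, gamma_prime=1.0):
    """GIP similarity between the rows of a binary association matrix.

    K[i][j] = exp(-gamma * ||row_i - row_j||^2), where
    gamma = gamma_prime / mean_i(||row_i||^2).
    """
    n = len(assoc)
    mean_sq_norm = sum(sum(v * v for v in row) for row in assoc) / n
    if mean_sq_norm == 0:
        raise ValueError("association matrix has no known associations")
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Invented toy matrix: 3 miRNAs (rows) by 3 diseases (columns).
toy_assoc = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
toy_kernel = gip_kernel(toy_assoc)
for row in toy_kernel:
    print([round(v, 4) for v in row])
```

Applying the same function to the transposed matrix would give the disease-side kernel. How MCCMF combines the kernel with the functional and semantic similarity matrices is not stated in the abstract and is not shown here.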


2020 ◽  
Author(s):  
Tian-Ru Wu ◽  
Meng-Meng Yin ◽  
Cui-Na Jiao ◽  
Ying-Lian Gao ◽  
Xiang-Zhen Kong ◽  
...  

Abstract Background: MicroRNAs (miRNAs) are non-coding RNAs with regulatory functions. Many studies have shown that miRNAs are closely associated with human diseases. Among the methods to explore the relationship between the miRNA and the disease, traditional methods are time-consuming and the accuracy needs to be improved. In view of the shortcoming of previous models, a collaborative matrix factorization based on matrix completion (MCCMF) is proposed to predict the unknown miRNA-disease associations. Results: The complete matrix of the miRNA and the disease is obtained by matrix completion. Moreover, Gaussian Interaction Profile (GIP) kernel is added to the miRNA functional similarity matrix and the disease semantic similarity matrix to form the GIP kernel similarity matrix. Then the Weight K Nearest Known Neighbors (WKNKN) method is used to pretreat the association matrix, so the model is close to the reality. Finally, collaborative matrix factorization (CMF) method is applied to obtain the prediction results. Therefore, the MCCMF obtains a satisfactory result in the five-fold cross-validation, with an AUC of 0.9569 (0.0005). Conclusions: The AUC value of MCCMF is higher than other advanced methods in the 5-fold cross validation experiment. In order to comprehensively evaluate the performance of MCCMF, f-measure and other evaluation indexes are also added. The final experimental results demonstrate that MCCMF outperforms other methods in predicting miRNA-disease associations. In the end, the effectiveness and practicability of MCCMF are further verified by researching three specific diseases.


2020 ◽  
Author(s):  
Tian-Ru Wu ◽  
Meng-Meng Yin ◽  
Cui-Na Jiao ◽  
Jin-Xing Liu ◽  
Ying-Lian Gao ◽  
...  

Abstract Background: MicroRNAs (miRNAs) are non-coding RNAs with regulatory functions. Many studies have shown that miRNAs are closely associated with human diseases. Among the methods to explore the relationship between the miRNA and the disease, traditional methods are time-consuming and the accuracy needs to be improved. In view of the shortcoming of previous models, a collaborative matrix factorization based on matrix completion (MCCMF) is proposed to predict the unknown miRNA-disease associations. Results: The complete matrix of the miRNA and the disease is obtained by matrix completion. Moreover, Gaussian Interaction Profile (GIP) kernel is added to the miRNA functional similarity matrix and the disease semantic similarity matrix to form the GIP kernel similarity matrix. Then the Weight K Nearest Known Neighbors (WKNKN) method is used to pretreat the association matrix, so the model is close to the reality. Finally, collaborative matrix factorization (CMF) method is applied to obtain the prediction results. Therefore, the MCCMF obtains a satisfactory result in the five-fold cross-validation, with an AUC of 0.9569 (0.0005). Conclusions: The AUC value of MCCMF is higher than other advanced methods in the 5-fold cross validation experiment. In order to comprehensively evaluate the performance of MCCMF, f-measure and other evaluation indexes are also added. The final experimental results demonstrate that MCCMF outperforms other methods in predicting miRNA-disease associations. In the end, the effectiveness and practicability of MCCMF are further verified by researching three specific diseases.


2020 ◽  
Author(s):  
Tian-Ru Wu ◽  
Meng-Meng Yin ◽  
Cui-Na Jiao ◽  
Ying-Lian Gao ◽  
Xiang-Zhen Kong ◽  
...  

Abstract Background: MicroRNAs (miRNAs) are non-coding RNAs with regulatory functions. Many studies have shown that miRNAs are closely associated with human diseases. Among the methods to explore the relationship between the miRNA and the disease, traditional methods are time-consuming and the accuracy needs to be improved. In view of the shortcoming of previous models, a method, collaborative matrix factorization based on matrix completion (MCCMF), is proposed to predict the unknown miRNA-disease associations. Results: The complete matrix of the miRNA and the disease is obtained by matrix completion. Moreover, Gaussian Interaction Profile (GIP) kernel is added to the miRNA functional similarity matrix and the disease semantic similarity matrix. Then the Weight K Nearest Known Neighbors (WKNKN) method is used to pretreat the association matrix, so the model is close to the reality. Finally, collaborative matrix factorization (CMF) method is applied to obtain the prediction results. Therefore, the MCCMF obtains a satisfactory result in the five-fold cross-validation, with an AUC of 0.9569 (0.0005). Conclusions: The AUC value of MCCMF is higher than other advanced methods in the 5-fold cross validation experiment. In order to comprehensively evaluate the performance of MCCMF, accuracy, precision, recall and f-measure are also added. The final experimental results demonstrate that MCCMF outperforms other methods in predicting miRNA-disease associations. In the end, the effectiveness and practicability of MCCMF are further verified by researching three specific diseases.


2017 ◽  
Author(s):  
Ashley I. Naimi ◽  
Laura B. Balzer

Abstract Stacked generalization is an ensemble method that allows researchers to combine several different prediction algorithms into one. Since its introduction in the early 1990s, the method has evolved several times into what is now known as “Super Learner”. Super Learner uses V-fold cross-validation to build the optimal weighted combination of predictions from a library of candidate algorithms. Optimality is defined by a user-specified objective function, such as minimizing mean squared error or maximizing the area under the receiver operating characteristic curve. Although relatively simple in nature, use of the Super Learner by epidemiologists has been hampered by limitations in understanding conceptual and technical details. We work step-by-step through two examples to illustrate concepts and address common concerns.
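The idea of a cross-validated weighted combination can be shown with a deliberately small sketch. It is not the authors' worked example: it assumes a library of only two hypothetical candidates (a mean predictor and a one-feature least-squares line), deterministic interleaved folds, and mean squared error as the objective, for which the best convex weight has a closed form.

```python
def fit_mean(xs, ys):
    """Candidate 1: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Candidate 2: simple least-squares line on a single feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: mean_y + slope * (x - mean_x)


def super_learner(xs, ys, v=5):
    """Return (weight on the mean candidate, combined predictor)."""
    n = len(xs)
    cv_mean, cv_line = [0.0] * n, [0.0] * n
    for fold in range(v):
        train = [i for i in range(n) if i % v != fold]
        held_out = [i for i in range(n) if i % v == fold]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        m_mean, m_line = fit_mean(tx, ty), fit_line(tx, ty)
        for i in held_out:  # out-of-fold predictions only
            cv_mean[i], cv_line[i] = m_mean(xs[i]), m_line(xs[i])
    # Minimize sum (y - w*a - (1-w)*b)^2 over w, then clip w to [0, 1].
    denom = sum((a - b) ** 2 for a, b in zip(cv_mean, cv_line))
    numer = sum((y - b) * (a - b) for y, a, b in zip(ys, cv_mean, cv_line))
    weight = min(1.0, max(0.0, numer / denom)) if denom else 0.5
    full_mean, full_line = fit_mean(xs, ys), fit_line(xs, ys)
    return weight, lambda x: weight * full_mean(x) + (1 - weight) * full_line(x)


# Invented noise-free linear data: the line candidate should get all the weight.
toy_x = list(range(10))
toy_y = [2 * x + 1 for x in toy_x]
toy_weight, toy_predict = super_learner(toy_x, toy_y)
print(toy_weight, toy_predict(20))
```

With more than two candidates the weights are typically found by a constrained optimizer rather than a closed form, and folds are usually assigned at random.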


Author(s):  
Felipe Guimarães Teixeira ◽  
Paulo Tadeu Cardozo Ribeiro Rosa ◽  
Roger Gomes Tavares Mello ◽  
Jurandir Nadal

Purpose: The study aimed to identify the variables that differentiate judo athletes at national and regional levels. Multivariable analysis was applied to biomechanical, anthropometric, and Special Judo Fitness Test (SJFT) data. Method: Forty-two male judo athletes from 2 competitive groups (14 national and 28 state levels) performed the following measurements and tests: (1) skinfold thickness, (2) circumference, (3) bone width, (4) longitudinal length, (5) stabilometric tests, (6) dynamometric tests, and (7) SJFT. The variables with significant differences in the Wilcoxon rank-sum test were used in stepwise logistic regression to select those that better separate the groups. The authors considered models with a maximum of 3 variables to avoid overfitting. They used 7-fold cross validation to calculate optimism-corrected measures of model performance. Results: The 3 variables that best differentiated the groups were the epicondylar humerus width, the total number of throws on the SJFT, and the stabilometric mean velocity of the center of pressure in the mediolateral direction. The area under the receiver-operating-characteristic curve for the model (based on 7-fold cross validation) was 0.95. Conclusion: This study suggests that a reduced set of anthropometric, biomechanical, and SJFT variables can differentiate judo athletes' competitive levels.


Author(s):  
M. Adnan Nur

In sentiment analysis of Twitter users, a preprocessing stage is required before sentiment is classified. Preprocessing is used to filter the words considered necessary for classification. Spelling errors in tweets are a problem at the preprocessing stage that inevitably affects classification accuracy. Accordingly, an additional preprocessing step is needed to correct misspelled words. In this study, the author compares the performance of the Levenshtein distance and Jaro-Winkler distance methods in correcting misspelled words. The study began with a literature survey to identify the problem, followed by a literature review to determine the objects and parameters needed to design and model the data and the software. The software was developed in the Python programming language with several libraries: sastrawi, levenshtein, pyjarowinkler and sklearn. The software was built to make it easy to observe the performance of the methods used. Testing was carried out using a confusion matrix with 10-fold cross-validation. Testing involved measuring the performance of Levenshtein distance when placed before and after the stemming process. Likewise, the Jaro-Winkler distance method was placed both before and after the stemming process in preprocessing. The test results show that the accuracy, recall and f1score of the Levenshtein distance method are better than those of Jaro-Winkler distance. Applying word correction with the Levenshtein distance method also improves accuracy, recall and f1score compared with preprocessing without word correction. Regarding the placement of word correction in the preprocessing stage, the test results show that placing it after the stemming process is better than placing it before stemming.

