Inference of an Integrative, Executable Network for Rheumatoid Arthritis Combining Data-Driven Machine Learning Approaches and a State-of-the-Art Mechanistic Disease Map

Quentin Miagoux; Vidisha Singh; Dereck de Mézquita; Valerie Chaudru; Mohamed Elati; Elisabeth Petit-Teixeira; Anna Niarakis

doi:10.3390/jpm11080785


Journal of Personalized Medicine ◽

10.3390/jpm11080785 ◽

2021 ◽

Vol 11 (8) ◽

pp. 785

Author(s):

Quentin Miagoux ◽

Vidisha Singh ◽

Dereck de Mézquita ◽

Valerie Chaudru ◽

Mohamed Elati ◽

...

Keyword(s):

Rheumatoid Arthritis ◽

Machine Learning ◽

State Of The Art ◽

Biological Information ◽

Response To Treatment ◽

Patient Specific ◽

Learning Approaches ◽

Data Types ◽

Disease Heterogeneity ◽

Combining Data

Rheumatoid arthritis (RA) is a multifactorial, complex autoimmune disease that involves various genetic, environmental, and epigenetic factors. Systems biology approaches provide the means to study complex diseases by integrating different layers of biological information. Combining multiple data types can help compensate for missing or conflicting information and limit the possibility of false positives. In this work, we aim to unravel mechanisms governing the regulation of key transcription factors in RA and derive patient-specific models to gain more insights into the disease heterogeneity and the response to treatment. We first use publicly available transcriptomic datasets (peripheral blood) relative to RA and machine learning to create an RA-specific transcription factor (TF) co-regulatory network. The TF cooperativity network is subsequently enriched in signalling cascades and upstream regulators using a state-of-the-art, RA-specific molecular map. Then, the integrative network is used as a template to analyse patients’ data regarding their response to anti-TNF treatment and identify master regulators and upstream cascades affected by the treatment. Finally, we use the Boolean formalism to simulate in silico subparts of the integrated network and identify combinations and conditions that can switch on or off the identified TFs, mimicking the effects of single and combined perturbations.
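The Boolean simulation step can be illustrated with a minimal sketch. The network here is a hypothetical toy: the node names (LIGAND_A, LIGAND_B, KINASE, INHIBITOR, TF), their rules, and the synchronous update scheme are assumptions made for illustration and are not taken from the paper or its RA map. It shows how single and combined perturbations (nodes clamped on or off) can be scanned to see which ones switch a transcription factor off.

```python
from itertools import combinations, product

# Hypothetical toy rules (illustrative only, not the RA network).
# Each rule maps the current state to the node's next value.
RULES = {
    "LIGAND_A": lambda s: s["LIGAND_A"],            # input: keeps its value
    "LIGAND_B": lambda s: s["LIGAND_B"],            # input: keeps its value
    "INHIBITOR": lambda s: s["INHIBITOR"],          # input: keeps its value
    "KINASE": lambda s: s["LIGAND_A"] or s["LIGAND_B"],
    "TF": lambda s: s["KINASE"] and not s["INHIBITOR"],
}

INITIAL = {"LIGAND_A": True, "LIGAND_B": True, "INHIBITOR": False,
           "KINASE": False, "TF": False}


def step(state, rules, clamped):
    """One synchronous update; clamped nodes are forced to fixed values."""
    nxt = {node: bool(rule(state)) for node, rule in rules.items()}
    nxt.update(clamped)
    return nxt


def simulate(initial, rules, clamped=None, max_steps=50):
    """Iterate until a fixed point is reached; return None if none is found."""
    clamped = dict(clamped or {})
    state = {**initial, **clamped}
    for _ in range(max_steps):
        nxt = step(state, rules, clamped)
        if nxt == state:
            return state
        state = nxt
    return None  # e.g. the trajectory cycles


def perturbation_scan(rules, initial, targets, readout):
    """Try every single and pairwise clamp of `targets`; record the readout."""
    results = {}
    for k in (1, 2):
        for nodes in combinations(targets, k):
            for values in product([False, True], repeat=k):
                clamp = dict(zip(nodes, values))
                final = simulate(initial, rules, clamp)
                key = tuple(sorted(clamp.items()))
                results[key] = None if final is None else final[readout]
    return results


scan = perturbation_scan(RULES, INITIAL,
                         ["LIGAND_A", "LIGAND_B", "INHIBITOR"], "TF")
for clamp, tf_state in scan.items():
    print(dict(clamp), "-> TF", tf_state)
```

In this toy, knocking out a single ligand leaves TF on because the other ligand still activates KINASE, whereas the combined knock-out of both ligands, or switching INHIBITOR on, turns TF off.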


Inference of an integrative network for Rheumatoid Arthritis combining data-driven machine learning approaches and a state-of-the-art mechanistic disease map

10.1101/2021.01.28.428679 ◽

2021 ◽

Author(s):

Quentin Miagoux ◽

Dereck de Mezquita ◽

Vidisha Singh ◽

Valerie Chaudru ◽

Mohamed Elati ◽

...

Keyword(s):

Rheumatoid Arthritis ◽

Machine Learning ◽

State Of The Art ◽

Biological Data ◽

Patient Specific ◽

Combining Data ◽

Shiny App ◽

Signalling Cascades ◽

Key Genes ◽

Integrative Network

Motivation: Rheumatoid arthritis (RA) is a multifactorial autoimmune disease that causes chronic inflammation of the joints. RA is considered a complex disease as it involves various genetic, environmental, and epigenetic factors. Systems biology approaches provide the means to study complex diseases by integrating different layers of biological information. Combining multiple data types can help compensate for missing or conflicting information and limit the possibility of false positives. In this approach, we integrate three different biological layers (gene expression, signalling cascades, mutations), obtained by bottom-up and top-down methods, to build an integrative, disease-specific network. The goal behind this endeavour is to see if we can unravel mechanisms governing the regulation of key genes identified as mutation carriers in RA and derive patient-specific models to gain more insights into the disease heterogeneity. Results: In this work, we combine biological data relevant to rheumatoid arthritis in the form of a global, integrative network. We first make use of publicly available transcriptomic datasets (peripheral blood) relative to RA and machine learning to create an RA-specific transcription factor (TF) co-regulatory network. The TF cooperativity network is subsequently enriched in signalling cascades and upstream regulators using prior knowledge encoded in a state-of-the-art, RA-specific molecular map. Lastly, a list of RA-specific variants highlights key genes associated with known disease mutations. Availability: Datasets used for the analysis are publicly available. All scripts used to generate results and the Shiny app will be freely accessible after peer-reviewed publication.


Supervised learning for the detection of negation and of its scope in French and Brazilian Portuguese biomedical corpora

Natural Language Engineering ◽

10.1017/s1351324920000352 ◽

2020 ◽

pp. 1-21 ◽

Cited By ~ 2

Author(s):

Clément Dalloux ◽

Vincent Claveau ◽

Natalia Grabar ◽

Lucas Emanuel Silva Oliveira ◽

Claudia Maria Cabral Moro ◽

...

Keyword(s):

Machine Learning ◽

Information Extraction ◽

State Of The Art ◽

Automatic Detection ◽

Brazilian Portuguese ◽

Supervised Machine Learning ◽

Biomedical Domain ◽

Learning Approaches ◽

Cross Domain ◽

Automatic Methods

Automatic detection of negated content is often a prerequisite in information extraction systems in various domains. In the biomedical domain especially, this task is important because negation plays an important role. In this work, two main contributions are proposed. First, we work with languages which have been poorly addressed up to now: Brazilian Portuguese and French. Thus, we developed new corpora for these two languages, which have been manually annotated to mark up the negation cues and their scope. Second, we propose automatic methods based on supervised machine learning approaches for the automatic detection of negation cues and of their scopes. The methods prove to be robust in both languages (Brazilian Portuguese and French) and in cross-domain (general and biomedical languages) contexts. The approach is also validated on English data from the state of the art: it yields very good results and outperforms other existing approaches. In addition, the application is accessible and usable online. We expect that these contributions (new annotated corpora, an application accessible online, and cross-domain robustness) will improve the reproducibility of the results and the robustness of NLP applications.


Learning fair models and representations

Intelligenza Artificiale ◽

10.3233/ia-190034 ◽

2020 ◽

Vol 14 (1) ◽

pp. 151-178

Author(s):

Luca Oneto

Keyword(s):

Machine Learning ◽

Social Services ◽

Ethical Issues ◽

State Of The Art ◽

Online Advertising ◽

Radical Change ◽

Data Representation ◽

Learning Approaches ◽

Central Question ◽

Disparate Treatment

Machine learning-based systems and products are reaching society at large in many aspects of everyday life, including financial lending, online advertising, pretrial and immigration detention, child maltreatment screening, health care, social services, and education. This phenomenon has been accompanied by an increase in concern about the ethical issues that may arise from the adoption of these technologies. In response to this concern, a new area of machine learning has recently emerged that studies how to address disparate treatment caused by algorithmic errors and bias in the data. The central question is how to ensure that the learned model does not treat subgroups in the population unfairly. While the design of solutions to this issue requires an interdisciplinary effort, fundamental progress can only be achieved through a radical change in the machine learning paradigm. In this work, we describe the state of the art on algorithmic fairness using statistical learning theory, machine learning, and deep learning approaches that are able to learn fair models and data representations.


From Characters to Time Intervals: New Paradigms for Evaluation and Neural Parsing of Time Normalizations

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00025 ◽

2018 ◽

Vol 6 ◽

pp. 343-356 ◽

Cited By ~ 2

Author(s):

Egoitz Laparra ◽

Dongfang Xu ◽

Steven Bethard

Keyword(s):

Neural Network ◽

Machine Learning ◽

Comparative Analysis ◽

State Of The Art ◽

Learning Approaches ◽

Semantic Parsing ◽

Time Intervals ◽

Semantic Composition ◽

Previous State ◽

New Scoring

This paper presents the first model for time normalization trained on the SCATE corpus. In the SCATE schema, time expressions are annotated as a semantic composition of time entities. This novel schema favors machine learning approaches, as it can be viewed as a semantic parsing task. In this work, we propose a character-level multi-output neural network that outperforms the previous state of the art built on the TimeML schema. To compare predictions of systems that follow both SCATE and TimeML, we present a new scoring metric for time intervals. We also apply this new metric to carry out a comparative analysis of the annotations of both schemas in the same corpus.


Comparing Deep-Learning Architectures and Traditional Machine-Learning Approaches for Satire Identification in Spanish Tweets

Mathematics ◽

10.3390/math8112075 ◽

2020 ◽

Vol 8 (11) ◽

pp. 2075

Author(s):

Óscar Apolinario-Arzube ◽

José Antonio García-Díaz ◽

José Medina-Moreira ◽

Harry Luna-Aveiga ◽

Rafael Valencia-García

Keyword(s):

Machine Learning ◽

Deep Learning ◽

User Interfaces ◽

State Of The Art ◽

Learning Approaches ◽

Word Embeddings ◽

Linguistic Features ◽

Intended Meaning ◽

Language User ◽

Learning Architectures

Automatic satire identification can help to identify texts in which the intended meaning differs from the literal meaning, improving tasks such as sentiment analysis, fake news detection or natural-language user interfaces. Typically, satire identification is performed by training a supervised classifier to find linguistic clues that can determine whether a text is satirical or not. For this, the state of the art relies on neural networks fed with word embeddings that are capable of learning interesting characteristics regarding the way humans communicate. However, to the best of our knowledge, there are no comprehensive studies that evaluate these techniques in Spanish in the satire identification domain. Consequently, in this work we evaluate several deep-learning architectures with Spanish pre-trained word embeddings and compare the results with strong baselines based on term-counting features. This evaluation is performed with two datasets that contain satirical and non-satirical tweets written in two Spanish variants: European Spanish and Mexican Spanish. Our experimentation revealed that term-counting features achieved similar results to deep-learning approaches based on word embeddings, both outperforming previous results based on linguistic features. Our results suggest that term-counting features and traditional machine learning models provide competitive results regarding automatic satire identification, slightly outperforming state-of-the-art models.
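As a rough illustration of what term-counting features look like, the sketch below builds bag-of-words count vectors in plain Python. The tokenisation rule, the function names, and the two example texts are assumptions made for illustration; the abstract does not specify the paper's exact feature extraction.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on word characters (assumed tokenisation)."""
    return re.findall(r"\w+", text.lower())


def build_vocabulary(texts):
    """Sorted list of every distinct token seen in the corpus."""
    return sorted({token for text in texts for token in tokenize(text)})


def count_vector(text, vocabulary):
    """Number of occurrences of each vocabulary term in `text`."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


# Hypothetical mini-corpus standing in for tweets.
corpus = ["gran día gran", "día normal"]
vocabulary = build_vocabulary(corpus)
features = [count_vector(text, vocabulary) for text in corpus]
print(vocabulary)
print(features)
```

Vectors of this kind can then be passed to a traditional classifier, which is the type of baseline the abstract compares against embedding-based neural models.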


Machine learning-based analysis of multi-omics data on the cloud for investigating gene regulations

Briefings in Bioinformatics ◽

10.1093/bib/bbaa032 ◽

2020 ◽

Cited By ~ 2

Author(s):

Minsik Oh ◽

Sungjoon Park ◽

Sun Kim ◽

Heejoon Chae

Keyword(s):

Machine Learning ◽

Gene Regulation ◽

State Of The Art ◽

Patient Specific ◽

Specific Gene ◽

Omics Data ◽

Gene Expressions ◽

Learning Methods ◽

Machine Learning Methods ◽

Disease Subtype

Gene expressions are subtly regulated by quantifiable measures of genetic molecules such as interactions with other genes, methylation, mutations, transcription factors and histone modifications. Integrative analysis of multi-omics data can help scientists understand condition-specific or patient-specific gene regulation mechanisms. However, analysis of multi-omics data is challenging since it requires not only the analysis of multiple omics data sets but also mining complex relations among different genetic molecules by using state-of-the-art machine learning methods. In addition, analysis of multi-omics data requires a large computing infrastructure. Moreover, interpretation of the analysis results requires collaboration among many scientists, often requiring the analysis to be repeated from different perspectives. Many of the aforementioned technical issues can be handled well when machine learning tools are deployed on the cloud. In this survey article, we first survey machine learning methods that can be used for gene regulation studies, and we categorize them according to five different goals: gene regulatory subnetwork discovery, disease subtype analysis, survival analysis, clinical prediction and visualization. We also summarize the methods in terms of multi-omics input types. Then, we explain why the cloud is potentially a good solution for the analysis of multi-omics data, followed by a survey of two state-of-the-art cloud systems, Galaxy and BioVLAB. Finally, we discuss important issues that arise when the cloud is used for the analysis of multi-omics data in gene regulation studies.


Predicting Heritability of Oil Palm Breeding Using Phenotypic Traits and Machine Learning

Sustainability ◽

10.3390/su132212613 ◽

2021 ◽

Vol 13 (22) ◽

pp. 12613

Author(s):

Najihah Ahmad Latif ◽

Fatini Nadhirah Mohd Nain ◽

Nurul Hashimah Ahamed Hassain Malim ◽

Rosni Abdullah ◽

Muhammad Farid Abdul Rahim ◽

...

Keyword(s):

Machine Learning ◽

Conceptual Framework ◽

Oil Palm ◽

Crop Yields ◽

Phenotypic Traits ◽

Learning Approaches ◽

Data Types ◽

Phenotypic Data ◽

Fresh Fruit Bunch ◽

The Sustainable Development

Oil palm is one of the main crops grown to help achieve sustainability in Malaysia. The selection of the best breeds will produce quality crops and increase crop yields. This study aimed to examine machine learning (ML) in oil palm breeding (OPB) using factors other than genetic data. A new conceptual framework for adopting ML in OPB is presented at the end of this paper. First, data types, phenotype traits, current ML models, and evaluation techniques are identified through a literature survey. This study found that phenotype and genotype data are widely used in oil palm breeding programs. The average bunch weight, bunch number, and fresh fruit bunch are the most important characteristics that can influence the genetic improvement of progenies. Although machine learning approaches have been applied to increase the productivity of the crop, most studies focus on molecular markers or genotypes for plant breeding, rather than on phenotype. Theoretically, the use of phenotypic data related to offspring should predict high breeding values by using ML. Therefore, a new ML conceptual framework to study the phenotype and progeny data of oil palm breeds is discussed in relation to achieving the Sustainable Development Goals (SDGs).


Deep Machine Learning provides state-of-the-art performance in image-based plant phenotyping

10.1101/053033 ◽

2016 ◽

Cited By ~ 12

Author(s):

Michael P. Pound ◽

Alexandra J. Burgess ◽

Michael H. Wilson ◽

Jonathan A. Atkinson ◽

Marcus Griffiths ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Data Analysis ◽

Paradigm Shift ◽

State Of The Art ◽

Plant Phenotyping ◽

Learning Approaches ◽

Challenging Problem ◽

Feature Identification ◽

Art Performance

Deep learning is an emerging field that promises unparalleled results on many data analysis problems. We show the success offered by such techniques when applied to the challenging problem of image-based plant phenotyping, and demonstrate state-of-the-art results for root and shoot feature identification and localisation. We predict a paradigm shift in image-based phenotyping thanks to deep learning approaches.


EEG Based Eye State Classification using Deep Belief Network and Stacked AutoEncoder

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v6i6.12967 ◽

2016 ◽

Vol 6 (6) ◽

pp. 3131 ◽

Cited By ~ 4

Author(s):

Sanam Narejo ◽

Eros Pasero ◽

Farzana Kulsoom

Keyword(s):

Machine Learning ◽

State Of The Art ◽

Deep Belief Network ◽

Machine Learning Algorithms ◽

Learning Approaches ◽

Alternative Communication ◽

Eeg Signals ◽

Communication Interface ◽

Belief Network ◽

State Classification

A Brain-Computer Interface (BCI) provides an alternative communication interface between the human brain and a computer. Electroencephalogram (EEG) signals are acquired and processed, and machine learning algorithms are then applied to extract useful information. During EEG acquisition, artifacts are induced by involuntary eye movements or eye blinks, with adverse effects on system performance. The aim of this research is to predict eye states from EEG signals using deep learning architectures and to present improved classifier models. Recent studies indicate that deep neural networks are the current state of the art among machine learning approaches. Therefore, the current work presents the implementation of a Deep Belief Network (DBN) and Stacked AutoEncoders (SAE) as classifiers with encouraging accuracy. One of the designed SAE models outperforms the DBN and the models presented in existing research, achieving an error rate of 1.1% on the test set (an accuracy of 98.9%). The findings in this study may contribute towards state-of-the-art performance on the problem of EEG-based eye state classification.


P17 Prediction of response of methotrexate in patients with rheumatoid arthritis using serum lipidomics

Rheumatology ◽

10.1093/rheumatology/keaa111.016 ◽

2020 ◽

Vol 59 (Supplement_2) ◽

Author(s):

Mateusz Maciejewski ◽

Caroline Sands ◽

Nisha Nair ◽

Stephanie Ling ◽

Suzanne Verstappen ◽

...

Keyword(s):

Rheumatoid Arthritis ◽

Machine Learning ◽

Serum Lipid ◽

Stock Options ◽

State Of The Art ◽

Lipid Levels ◽

Eular Response ◽

Time Points ◽

Clinical Covariates ◽

Pre Treatment

Background: For patients with rheumatoid arthritis (RA), introduction of early, effective therapy has consistently been shown to improve long-term outcomes. Low-dose methotrexate (MTX) is commonly prescribed as first-line treatment for RA. However, MTX is not effective for a large minority of patients and there is currently no way to determine ahead of therapy which patients are most likely to benefit. Metabolomics and lipidomics are emerging approaches for studying patient stratification in RA and have the potential to identify disease processes that underpin treatment outcomes. Here we apply state-of-the-art machine learning algorithms to predict MTX treatment response, by testing serum lipid levels measured at two time-points (pre-treatment and following 4 weeks on drug) to predict MTX response by 6 months. Methods: This study included patients from the Rheumatoid Arthritis Medication Study (RAMS), a UK multi-centre one-year prospective observational study investigating predictors of response to MTX in patients with RA. Since 2008, patients who are about to start MTX for the first time are asked to provide demographic and clinical data, as well as blood samples to permit DNA, RNA and serum-based biomarker studies. Patients about to commence MTX treatment were followed longitudinally and those categorised as good or non-responders following 6 months on-drug using EULAR response criteria were analysed. Serum lipid levels were measured at pre-treatment and following 4 weeks on drug using ultra-performance liquid chromatography tailored for complex lipid analysis, coupled to mass spectrometry. State-of-the-art supervised machine learning methods were then applied to predict EULAR response at 6 months. Models including lipid levels were compared to models including clinical covariates (including: MTX start dose, steroid use at inclusion, BMI, number of swollen joints, number of tender joints, CRP levels, patients’ assessment of their overall wellbeing, gender, age-at-inclusion, age-at-onset, disease duration, HAQ score and pre-treatment smoking habits). Results: Following quality control, 3,366 features (1,060 in negatively-charged mode and 2,306 in positive mode) were available for analysis at pre-treatment and 4 weeks from 100 RA patients categorised as good (GR, n = 50) or poor (NR, n = 50) responders to MTX following 6 months on drug. The best model performance for the classifier including clinical covariates was observed using L1/L2-regularised logistic regression (ROC AUC 0.68 ± 0.02). However, the clinical covariate model outperformed the classifier including lipid levels when either pre- or on-treatment time-points were investigated (ROC AUC 0.61 ± 0.02). Conclusion: These data do not support the utility of early treatment lipidomic monitoring in routine clinical practice in patients started on MTX for their RA. Disclosures: M. Maciejewski: Shareholder/stock ownership; owns stock or stock options in Pfizer. C. Sands: None. N. Nair: None. S. Ling: None. S. Verstappen: None. K. Hyrich: None. A. Barton: None. D. Ziemek: Shareholder/stock ownership; owns stock or stock options in Pfizer. M. Lewis: None. D. Plant: None.
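For readers unfamiliar with the ROC AUC figures quoted in the results, the sketch below computes the metric directly from its pairwise definition: the probability that a randomly chosen responder receives a higher predicted score than a randomly chosen non-responder, with ties counted as one half. The function name and the four example labels and scores are hypothetical and are not data from the study.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of predicted scores, higher means more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute ROC AUC")
    # A pair counts 1 if the positive outranks the negative, 0.5 on a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 2 responders (1) and 2 non-responders (0).
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(example_labels, example_scores))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context for the reported 0.68 and 0.61.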
