Classification and photometric redshift estimation of quasars in photometric surveys

Climate change has turned out to be a determining factor in the development of forests in Spain. Production systems have emitted polluting gases and other particles into the atmosphere, for which some plants have not yet developed adaptation systems. Among the pollutants most harmful to the environment are nitrogen oxides, ozone, and particulate matter. However, this condition is not the same across Peninsular Spain and the Balearic Islands, since plant composition and the bioclimatic, topographic, and anthropic characteristics differ across the territory. Monitoring vegetation with sufficient spatial and temporal resolution, and studying the variables that condition plant health, is a challenge because of the nature of the variables and the amount of data to be handled. The Mediterranean forest is one of the ecosystems most affected by climate change because it usually experiences long periods of drought that, in combination with increased temperatures, can drastically reduce the photosynthetic activity of trees and therefore the biomass of forests. That is why environmental technologies are applied here: Remote Sensing (which provides plant health indices from passive sensors on satellite platforms, along with other variables of interest), Geographic Information Systems (to integrate, process, and analyze spatial and temporal data), and machine learning models (which facilitate the extraction of relationships between variables and conditioning factors and the prediction of patterns). In this regard, the objective of this work is to evaluate the possible effect that different pollutants have on the health of the vegetation, measured from the annual values of the Normalized Difference Vegetation Index (NDVI), in the Mediterranean forests of Peninsular Spain. To achieve this, we used machine learning techniques based on the Random Forest algorithm.
The study also included various climatic, topographic, and anthropic variables that characterize the forest. The results showed that certain variables, such as the aridity index, largely determined the NDVI values and therefore plant development, while others, such as the concentration of certain pollutants, act as limiting factors, with a direct relationship observed between particulates and NOx. This study verifies that the Random Forest algorithm offers reliable results, even when working with heterogeneous variables.
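
The abstract uses annual NDVI values as its measure of vegetation health. As a minimal sketch of how such a value is conventionally computed from near-infrared and red reflectance, the following may help; the function names and reflectance figures are hypothetical and are not taken from the study, and the zero-signal fallback is an assumption made only for this illustration.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    if total == 0:
        # Assumption for this sketch: a pixel with no signal gets NDVI 0.
        return 0.0
    return (nir - red) / total


def annual_mean_ndvi(observations):
    """Average NDVI over the (nir, red) reflectance pairs of one year."""
    values = [ndvi(nir, red) for nir, red in observations]
    return sum(values) / len(values)


# Hypothetical reflectance values, for illustration only.
healthy = ndvi(nir=0.50, red=0.08)   # dense, photosynthetically active canopy
stressed = ndvi(nir=0.30, red=0.20)  # sparse or drought-stressed canopy
```

Higher values indicate more photosynthetically active vegetation, which is why a drop in the annual value can serve as a proxy for declining plant health.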

Prediction of Fatal and Major Injury of Drivers, Cyclists, and Pedestrians in Collisions

PROMET - Traffic&Transportation ◽

10.7307/ptt.v32i1.3134 ◽

2020 ◽

Vol 32 (1) ◽

pp. 39-53

Author(s):

Dalia Shanshal ◽

Ceni Babaoglu ◽

Ayşe Başar

Keyword(s):

Machine Learning ◽

Random Forest ◽

Injury Severity ◽

Predictive Analytics ◽

Machine Learning Techniques ◽

Lasso Regression ◽

Severe Injuries ◽

Factors Affecting ◽

Spatio Temporal ◽

Using Data

Traffic-related deaths and severe injuries may affect every person on the roads, whether driving, cycling or walking. Toronto, the largest city in Canada and the fourth largest in North America, aims to eliminate traffic-related fatalities and serious injuries on city streets. The aim of this study is to build a prediction model using data analytics and machine learning techniques that learn from past patterns, providing additional data-driven decision support for strategic planning. A detailed exploratory analysis is presented, investigating the relationship between the variables and factors affecting collisions in Toronto. A learning-based model is proposed to predict the fatalities and severe injuries in traffic collisions through a comparison of two predictive models: Lasso Regression and Random Forest. Exploratory data analysis results reveal both spatio-temporal and behavioural patterns such as the prevalence of collisions in intersections, in the spring and summer and aggressive driving and inattentive behaviours in drivers. The prediction results show that the best predictor of injury severity for drivers, cyclists and pedestrians is Random Forest with an accuracy of 0.80, 0.89, and 0.80, respectively. The proposed methods demonstrate the effectiveness of machine learning application to traffic and collision data, both for exploratory and predictive analytics.

Prediction of novel mouse TLR9 agonists using a random forest approach

BMC Molecular and Cell Biology ◽

10.1186/s12860-019-0241-0 ◽

2019 ◽

Vol 20 (S2) ◽

Author(s):

Varun Khanna ◽

Lei Li ◽

Johnson Fung ◽

Shoba Ranganathan ◽

Nikolai Petrovsky

Keyword(s):

Machine Learning ◽

Random Forest ◽

Correlation Coefficient ◽

Matthews Correlation Coefficient ◽

Learning Algorithms ◽

Ensemble Classifier ◽

Innate Immune ◽

Machine Learning Algorithms ◽

Support Vector ◽

Random Forest Algorithm

Background: Toll-like receptor 9 (TLR9) is a key innate immune receptor involved in detecting infectious diseases and cancer. TLR9 activates the innate immune system following the recognition of single-stranded DNA oligonucleotides (ODN) containing unmethylated cytosine-guanine (CpG) motifs. Due to the considerable number of rotatable bonds in ODNs, high-throughput in silico screening of CpG ODNs for potential TLR9 activity via traditional structure-based virtual screening approaches is challenging. In the current study, we present a machine learning based method for predicting novel mouse TLR9 (mTLR9) agonists based on features including the count and position of motifs, the distance between the motifs, and graphically derived features such as the radius of gyration and moment of inertia. We employed an in-house experimentally validated dataset of 396 single-stranded synthetic ODNs to compare the results of five machine learning algorithms. Since the dataset was highly imbalanced, we used an ensemble learning approach based on repeated random down-sampling. Results: Using in-house experimental TLR9 activity data, we found that the random forest algorithm outperformed the other algorithms on our dataset for TLR9 activity prediction. Therefore, we developed a cross-validated ensemble classifier of 20 random forest models. The average Matthews correlation coefficient and balanced accuracy of our ensemble classifier on test samples were 0.61 and 80.0%, respectively, with a maximum balanced accuracy and Matthews correlation coefficient of 87.0% and 0.75, respectively. We confirmed that common sequence motifs including ‘CC’, ‘GG’, ‘AG’, ‘CCCG’ and ‘CGGC’ were overrepresented in mTLR9 agonists. Predictions on 6000 randomly generated ODNs were ranked, and the top 100 ODNs were synthesized and experimentally tested for activity in an mTLR9 reporter cell assay, with 91 of the 100 selected ODNs showing high activity, confirming the accuracy of the model in predicting mTLR9 activity.
Conclusion: We combined repeated random down-sampling with random forest to overcome the class imbalance problem and achieved promising results. Overall, we showed that the random forest algorithm outperformed other machine learning algorithms including support vector machines, shrinkage discriminant analysis, gradient boosting machine and neural networks. Due to its predictive performance and simplicity, the random forest technique is a useful method for prediction of mTLR9 ODN agonists.
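
The abstract combines repeated random down-sampling with two imbalance-aware metrics. The sketch below illustrates only those two pieces in plain Python: drawing balanced subsets from an imbalanced label vector, and computing the Matthews correlation coefficient and balanced accuracy from predictions. The classifier itself is left out, all function names are hypothetical, and pairing one down-sampled subset with each of the 20 ensemble members is an assumption rather than something the abstract states.

```python
import math
import random


def downsample_once(labels, rng):
    """Indices of a balanced subset: every minority item plus an equally
    sized random draw (without replacement) from the majority class."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return sorted(minority + rng.sample(majority, len(minority)))


def repeated_downsampling(labels, n_rounds=20, seed=0):
    """One balanced index set per round; each could train one ensemble member."""
    rng = random.Random(seed)
    return [downsample_once(labels, rng) for _ in range(n_rounds)]


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def matthews_corrcoef(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention used here: report 0 when the coefficient is undefined.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; assumes both classes occur in y_true."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


# Hypothetical imbalanced labels: 10 actives among 100 items.
labels = [1] * 10 + [0] * 90
subsets = repeated_downsampling(labels, n_rounds=20)

# Hypothetical predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
mcc = matthews_corrcoef(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
```

Because every subset keeps all minority examples and a different random slice of the majority class, averaging models trained on them uses most of the majority data without letting it dominate any single model.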

Research on machine learning framework based on random forest algorithm

10.1063/1.4977376 ◽

2017 ◽

Cited By ~ 5

Author(s):

Qiong Ren ◽

Hui Cheng ◽

Hai Han

Keyword(s):

Machine Learning ◽

Random Forest ◽

Random Forest Algorithm ◽

Learning Framework

Forest Fire Prediction using Machine Learning Models based on DC, Wind and RH

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f1026.0386s20 ◽

2020 ◽

Vol 8 (6S) ◽

pp. 142-143

Keyword(s):

Machine Learning ◽

Random Forest ◽

Forest Fire ◽

Random Forest Algorithm ◽

Learning Models ◽

Learning Classifier ◽

Machine Learning Models ◽

Classifier Algorithms

The paper presents forest fire prediction using machine learning models based on three inputs, namely DC, wind, and RH. Of the several machine learning classifier algorithms considered, the random forest algorithm yields the best accuracy (99.61%).

A Daily Covid-19 Cases Prediction System using Data Mining and Machine Learning Algorithm

10.5121/csit.2021.112320 ◽

2021 ◽

Author(s):

Yiqi Jack Gao ◽

Yu Sun

Keyword(s):

Machine Learning ◽

Random Forest ◽

Hospital Admissions ◽

Polynomial Regression ◽

Learning Algorithm ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Policy Makers ◽

Diverse Range ◽

Using Data

The start of 2020 marked the beginning of the deadly COVID-19 pandemic caused by the novel SARS-CoV-2 virus from Wuhan, China. As of the time of writing, the virus had infected over 150 million people worldwide and resulted in more than 3.5 million global deaths. Accurate predictions made through machine learning algorithms can be a very useful guide for hospitals and policy makers to make adequate preparations and enact effective policies to combat the pandemic. This paper takes a two-pronged approach to analyzing COVID-19. First, the model uses the feature significance of a random forest regressor to select eight of the most significant predictors (date, new tests, weekly hospital admissions, population density, total tests, total deaths, location, and total cases) for predicting daily increases in COVID-19 cases, highlighting potential target areas for efficient pandemic responses. Then it uses machine learning algorithms such as linear regression, polynomial regression, and random forest regression to predict daily COVID-19 cases from this diverse range of predictors; these models proved competent at generating predictions with reasonable accuracy.

A Machine Learning-Based Prediction Model for Cardiovascular Risk in Women With Preeclampsia

Frontiers in Cardiovascular Medicine ◽

10.3389/fcvm.2021.736491 ◽

2021 ◽

Vol 8 ◽

Author(s):

Guan Wang ◽

Yanbo Zhang ◽

Sijin Li ◽

Jun Zhang ◽

Dongkui Jiang ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Cardiovascular Risk ◽

Random Forest ◽

Prediction Model ◽

Disease Risk ◽

Cardiovascular Disease Risk ◽

Machine Learning Algorithms ◽

Brier Score ◽

Random Forest Algorithm

Objective: Preeclampsia affects 2–8% of women and doubles the risk of cardiovascular disease in women after preeclampsia. This study aimed to develop a model based on machine learning to predict postpartum cardiovascular risk in preeclamptic women. Methods: Collecting demographic characteristics and clinical serum markers associated with preeclampsia during pregnancy of 907 preeclamptic women retrospectively, we predicted the cardiovascular risk (ischemic heart disease, ischemic cerebrovascular disease, peripheral vascular disease, chronic kidney disease, metabolic system disease or arterial hypertension). The study samples were divided into training sets and test sets randomly in the ratio of 8:2. The prediction model was developed by 5 different machine learning algorithms, including Random Forest. 10-fold cross-validation was performed on the training set, and the performance of the model was evaluated on the test set. Results: Cardiovascular disease risk occurred in 186 (20.5%) of these women. By weighing area under the curve (AUC), the Random Forest algorithm presented the best performance (AUC = 0.711 [95% CI: 0.697–0.726]) and was adopted in the feature selection and the establishment of the prediction model. The most important variables in the Random Forest algorithm included the systolic blood pressure, urea nitrogen, neutrophil count, glucose, and D-dimer. The Random Forest algorithm was well calibrated (Brier score = 0.133) in the test group, and obtained the highest net benefit in the decision curve analysis. Conclusion: Based on the general situation of patients and clinical variables, a new machine learning algorithm was developed and verified for the individualized prediction of cardiovascular risk in post-preeclamptic women.
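
Two steps in this abstract are easy to make concrete: the random 8:2 split and the Brier score used to judge calibration. The following is a minimal sketch in plain Python under stated assumptions: the function names, the seed, and the example probabilities are hypothetical, and the study's own tooling is not described in the abstract.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into (train, test) lists."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1
    outcome; 0 is a perfect forecast and lower is better."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# 907 is the cohort size reported in the abstract; the seed is arbitrary.
train_idx, test_idx = train_test_split_indices(907, test_fraction=0.2)

# Hypothetical outcomes and predicted risks, for illustration only.
example_score = brier_score([1, 0, 0, 1], [0.9, 0.2, 0.1, 0.6])

# Score of a constant forecast equal to a prevalence of 20.5%, assuming
# the evaluated group has that same prevalence.
prevalence = 0.205
baseline_score = prevalence * (1 - prevalence)
```

Under that prevalence assumption the constant forecast scores about 0.163, which gives some context for the reported Brier score of 0.133.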

Prediction of Lung Cancer Risk using Random Forest Algorithm Based on Kaggle Data Set

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f7879.038620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 1623-1630

Keyword(s):

Machine Learning ◽

Lung Cancer ◽

Random Forest ◽

Naive Bayes ◽

Early Stage ◽

Naïve Bayes ◽

Training Data ◽

Random Forest Algorithm ◽

Data Set ◽

Wide Range

As huge amounts of data accumulate, extracting the required data from the available information is a challenge. Machine learning contributes to various fields. The fast-growing population has led to the emergence of a wide range of diseases. This in turn has created the need for machine learning models that use patients' datasets. Analysis of datasets from different sources indicates that cancer is the most hazardous disease and may cause the death of the sufferer. The surveys conducted indicate that cancer can nearly be cured in its initial stages, whereas it may cause the death of the affected person in later stages. One of the major types of cancer is lung cancer. Its prediction depends heavily on past data, and it requires detection in the early stages. The proposed work uses a machine learning algorithm to group individual details into categories in order to predict, at an early stage, whether a person is likely to develop cancer. The random forest algorithm is implemented and achieves better performance (97%) than KNN and Naive Bayes. Further, the KNN algorithm doesn't learn anything from the training data but uses it directly for classification, and Naive Bayes yields inaccurate predictions. The proposed system predicts the chance of lung cancer at three levels, namely low, medium, and high. Thus, mortality rates can be reduced significantly.

A machine learning framework for the evaluation of myocardial rotation in patients with noncompaction cardiomyopathy

PLoS ONE ◽

10.1371/journal.pone.0260195 ◽

2021 ◽

Vol 16 (11) ◽

pp. e0260195

Author(s):

Marcelo Dantas Tavares de Melo ◽

Jose de Arimatéia Batista Araujo-Filho ◽

José Raimundo Barbosa ◽

Camila Rocon ◽

Carlos Danilo Miranda Regis ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Ejection Fraction ◽

Area Under The Curve ◽

Random Forest Algorithm ◽

Noncompaction Cardiomyopathy ◽

2D Echocardiography ◽

Specific Strain ◽

Lv Ejection Fraction ◽

Sensitivity Specificity

Aims: Noncompaction cardiomyopathy (NCC) is considered a genetic cardiomyopathy with unknown pathophysiological mechanisms. We propose to evaluate echocardiographic predictors for rigid body rotation (RBR) in NCC using a machine learning (ML) based model. Methods and results: Forty-nine outpatients with NCC diagnosis by echocardiography and magnetic resonance imaging (21 men, 42.8±14.8 years) were included. A comprehensive echocardiogram was performed. The layer-specific strain was analyzed from the apical two-, three-, and four-chamber views, the short-axis view, and focused right ventricle views using 2D echocardiography (2DE) software. RBR was present in 44.9% of patients, and this group presented increased indexed LV mass (118±43.4 vs. 94.1±27.1 g/m2, P = 0.034), LV end-diastolic and end-systolic volumes (P < 0.001), E/e’ (12.2±8.68 vs. 7.69±3.13, P = 0.034), and decreased LV ejection fraction (40.7±8.71 vs. 58.9±8.76%, P < 0.001) when compared to patients without RBR. Also, patients with RBR presented a significant decrease of global longitudinal, radial, and circumferential strain. When an ML model based on a random forest algorithm and a neural network model was applied, it found that twist, NC/C, torsion, LV ejection fraction, and diastolic dysfunction are the strongest predictors of RBR, with accuracy, sensitivity, specificity, and area under the curve of 0.93, 0.99, 0.80, and 0.88, respectively. Conclusion: In this study, a random forest algorithm was capable of selecting the best echocardiographic predictors of the RBR pattern in NCC patients, which was consistent with worse systolic, diastolic, and myocardium deformation indices. Prospective studies are warranted to evaluate the role of this tool for NCC risk stratification.

Analysing the Capability of the Catchment's Spectral Signature for the Regionalization of Hydrological Parameters

10.22541/au.162100995.56312514/v1 ◽

2021 ◽

Author(s):

Laura Fragoso-Campón ◽

Pablo Durán-Barroso ◽

Elia Rosado

Keyword(s):

Machine Learning ◽

Random Forest ◽

Physical Properties ◽

Spectral Response ◽

Spectral Signature ◽

Spectral Approach ◽

Hydrological Response ◽

Random Forest Algorithm ◽

Hydrological Parameters ◽

Climatic Environment

Water resource management in ungauged catchments is complex due to the uncertainties around the hydrological parameters that dominate the streamflow behaviour. These parameters are usually defined by regionalization approaches in which hydrological response patterns are transferred from gauged to ungauged basins. Regression-based methods using physical properties derived from cartographic data sources are widely used. The current remote sensing techniques offer us new standpoints in regionalization processing since the hydrological response depends on the physical attributes related to the spectral responses of the territory. Moreover, machine learning approaches have not been specifically applied to the regionalization of hydrologic parameters. This work studies the capability of a catchment’s spectral response based on Sentinel-1 and Sentinel-2 data to address a regression-based regionalization of hydrological parameters using a machine learning approach. Hydrological modelling was conducted by the HBV-light model. We tested the random forest algorithm in several regionalization scenarios: the new approach using the catchments’ spectral signature, the traditional method using physical properties and a fusion of them. The calibration results were excellent (median KGE = 0.83), and the regionalized parameters obtained with the random forest algorithm achieved good performance in which the three scenarios showed almost the same goodness of fit (median KGE = 0.45 to 0.50). We found that the effectiveness depends on the climatic environment and that predictions in humid catchments exhibited better performance than those in the driest catchments. The physical approach (median KGE = 0.71) exhibited better performance than the spectral approach (median KGE = 0.64) in humid catchments, whereas spectral regionalization (median KGE = 0.33) outperformed the physical scenario in the driest catchments (median KGE = 0.25).
Herein, our results confirm that regionalization is still challenging in Mediterranean climate variants where the new spectral approach showed promising results and time series of satellite data could improve seasonal regionalization methodologies.
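
The goodness-of-fit values above are reported as KGE, the Kling-Gupta efficiency. The abstract does not say which variant was used, so the sketch below assumes the original three-component form, built from the linear correlation r, the variability ratio (simulated over observed standard deviation), and the bias ratio (simulated over observed mean). The function name and the flow values are hypothetical.

```python
import math


def kge(simulated, observed):
    """Kling-Gupta efficiency; 1.0 means a perfect match.

    Assumes equal-length series, a non-constant simulated and observed
    series, and a non-zero observed mean.
    """
    n = len(observed)
    mean_s = sum(simulated) / n
    mean_o = sum(observed) / n
    std_s = math.sqrt(sum((s - mean_s) ** 2 for s in simulated) / n)
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / n)
    cov = sum((s - mean_s) * (o - mean_o)
              for s, o in zip(simulated, observed)) / n

    r = cov / (std_s * std_o)   # linear correlation
    alpha = std_s / std_o       # variability ratio
    beta = mean_s / mean_o      # bias ratio
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical streamflow series, for illustration only.
observed_flow = [1.0, 2.0, 4.0, 3.0]
perfect = kge(observed_flow, observed_flow)
doubled = kge([2 * q for q in observed_flow], observed_flow)
```

A simulation that doubles every observed flow keeps perfect correlation but fails on variability and bias, so its score falls to 1 − √2, roughly −0.41. This shows how the metric penalizes errors that correlation alone would miss.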
