Agricultural Irrigation Area Prediction Based on Improved Random Forest Model

Abstract Food security is a major problem of common concern worldwide, and predicting irrigation area can help address food and agricultural problems. This paper analyzes world data on grain production and irrigation area, and proposes an improved Random Forest Regression model for predicting irrigation area in China, built on the ordinary Random Forest and Limit Tree Regression algorithms. First, the arithmetic mean (AMM) of the mean square error (MSE) and mean absolute error (MAE) is used as the evaluation index for the improved impurity function and for the irrigation area prediction performance. Then, grid search is used to determine the optimal number of decision trees in the ordinary random forest and in limit tree regression (70 and 30 trees, respectively), and the new improved random forest model is established. Next, the model is compared with other prediction models, and 10-fold cross-validation supports the soundness of the model. Finally, an error analysis of the improved Random Forest model shows that the prediction error is small. The model is expected to be applicable to the annual analysis of irrigation area in China.
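
The AMM index named in the abstract can be illustrated with a minimal sketch. It assumes AMM is the plain unweighted average of MSE and MAE; the function names and sample values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean square error over paired observations."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error over paired observations."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def amm(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Arithmetic mean of MSE and MAE (assumed unweighted)."""
    return 0.5 * (mse(y_true, y_pred) + mae(y_true, y_pred))


# Made-up values for illustration only, not data from the paper.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
score = amm(observed, predicted)
```

Because MSE penalizes large errors quadratically and MAE linearly, averaging the two gives an index that reacts to outliers less sharply than MSE alone.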

Reconstruction of Multidecadal Country-Aggregated Hydro Power Generation in Europe Based on a Random Forest Model

Energies ◽

10.3390/en13071786 ◽

2020 ◽

Vol 13 (7) ◽

pp. 1786

Author(s):

Linh T. T. Ho ◽

Laurent Dubus ◽

Matteo De Felice ◽

Alberto Troccoli

Keyword(s):

Power Generation ◽

Random Forest ◽

Model Performance ◽

Absolute Error ◽

Random Forest Model ◽

Low Carbon ◽

Climate Data ◽

Hydro Power ◽

Forest Model ◽

Continental Scale

Hydro power can provide a source of dispatchable low-carbon electricity and a storage solution in a climate-dependent energy mix with high shares of wind and solar production. Therefore, understanding the effect climate has on hydro power generation is critical to ensure a stable energy supply, particularly at a continental scale. Here, we introduce a framework using climate data to model hydro power generation at the country level based on a machine learning method, the random forest model, to produce a publicly accessible hydro power dataset from 1979 to present for twelve European countries. In addition to producing a consistent European hydro power generation dataset covering the past 40 years, the specific novelty of this approach is its focus on the lagged effect of climate variability on hydro power. Specifically, multiple lagged values of temperature and precipitation are used. Overall, the model shows promising results, with correlation values ranging between 0.85 and 0.98 for run-of-river and between 0.73 and 0.90 for reservoir-based generation. Compared to the more standard optimal lag approach, the normalised mean absolute error reduces by an average of 10.23% and 5.99%, respectively. The model was also implemented over six Italian bidding zones to test its skill at the sub-country scale. The model performance is only slightly degraded at the bidding zone level, but this also depends on the actual installed capacity, with higher capacities displaying higher performance. The framework and results presented could provide a useful reference for applications such as pan-European (continental) hydro power planning and for system adequacy and extreme events assessments.
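
The idea of feeding multiple lagged values of temperature and precipitation to a model can be sketched as follows. This is only an illustration of building lagged predictors: the function name, the lag choices and the sample series are hypothetical, and the paper's actual preprocessing may differ.

```python
from typing import Dict, List, Sequence


def build_lagged_features(
    series: Dict[str, Sequence[float]], lags: Sequence[int]
) -> List[Dict[str, float]]:
    """Return one feature row per time step for which every requested lag exists."""
    max_lag = max(lags)
    n_steps = min(len(values) for values in series.values())
    rows = []
    for t in range(max_lag, n_steps):
        row = {}
        for name, values in series.items():
            for lag in lags:
                # Value of the variable `lag` steps before time step t.
                row[f"{name}_lag{lag}"] = values[t - lag]
        rows.append(row)
    return rows


# Made-up series for illustration only.
climate = {
    "temperature": [5.0, 7.0, 9.0, 12.0, 15.0],
    "precipitation": [80.0, 60.0, 40.0, 30.0, 20.0],
}
features = build_lagged_features(climate, lags=[1, 2])
```

Row i of `features` lines up with the target value at time step `max(lags) + i`, so the first `max(lags)` observations of the target would be dropped before fitting a model.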

The Random Forest Model Has the Best Accuracy Among the Four Pressure Ulcer Prediction Models Using Machine Learning Algorithms

Risk Management and Healthcare Policy ◽

10.2147/rmhp.s297838 ◽

2021 ◽

Vol 14 ◽

pp. 1175-1187

Author(s):

Jie Song ◽

Yuan Gao ◽

Pengbin Yin ◽

Yi Li ◽

Yang Li ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Pressure Ulcer ◽

Prediction Models ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Random Forest Model ◽

Forest Model

Machine Learning Approach Using Routine Immediate Postoperative Laboratory Values for Predicting Postoperative Mortality

Journal of Personalized Medicine ◽

10.3390/jpm11121271 ◽

2021 ◽

Vol 11 (12) ◽

pp. 1271

Author(s):

Jaehyeong Cho ◽

Jimyung Park ◽

Eugene Jeong ◽

Jihye Shin ◽

Sangjeong Ahn ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Prediction Models ◽

External Validation ◽

Model Development ◽

Postoperative Mortality ◽

Random Forest Model ◽

Forest Model ◽

Laboratory Values ◽

Increased Risk

Background: Several prediction models have been proposed for preoperative risk stratification for mortality. However, few studies have investigated postoperative risk factors, which have a significant influence on survival after surgery. This study aimed to develop prediction models using routine immediate postoperative laboratory values for predicting postoperative mortality. Methods: Two tertiary hospital databases were used in this research: one for model development and another for external validation of the resulting models. The following algorithms were utilized for model development: LASSO logistic regression, random forest, deep neural network, and XGBoost. We built the models on the lab values from immediate postoperative blood tests and compared them with the SASA scoring system to demonstrate their efficacy. Results: There were 3817 patients who had immediate postoperative blood test values. All models trained on immediate postoperative lab values outperformed the SASA model. Furthermore, the developed random forest model had the best AUROC of 0.82 and AUPRC of 0.13, and the phosphorus level contributed the most to the random forest model. Conclusions: Machine learning models trained on routine immediate postoperative laboratory values outperformed previously published approaches in predicting 30-day postoperative mortality, indicating that they may be beneficial in identifying patients at increased risk of postoperative death.
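
For readers unfamiliar with AUROC, the metric can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison; the function name and sample values are hypothetical, and this is not the evaluation code used in the study.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels and scores for illustration only.
example_auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form is quadratic in the number of cases, which is fine for illustration; rank-based formulations scale better on large cohorts.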

Application of Data Mining Technology in Risk Prediction of Metabolic Syndrome in Oil Workers

10.21203/rs.3.rs-31038/v1 ◽

2020 ◽

Author(s):

Jie Wang ◽

Chao Li ◽

Jing Li ◽

Sheng Qin ◽

Chunlei Liu ◽

...

Keyword(s):

Metabolic Syndrome ◽

Random Forest ◽

Risk Prediction ◽

Roc Curve ◽

Prediction Models ◽

Prediction Performance ◽

Random Forest Model ◽

Forest Model ◽

The Metabolic Syndrome ◽

Oil Workers

Abstract Background. The prevalence of metabolic syndrome continues to rise sharply worldwide, seriously threatening people's health. In this paper, three risk prediction models applicable to metabolic syndrome in oil workers were established, and the optimal model was identified through comparison. The optimal model can be used to identify people at high risk of metabolic syndrome as early as possible, to predict their risk, and to persuade them to change adverse lifestyle habits so as to slow down and reduce the incidence of metabolic syndrome. Methods. A total of 1,468 workers from an oil company who participated in occupational health physical examinations from April 2017 to October 2018 were included in this study. We established a logistic regression model, a random forest model and a convolutional neural network model, and compared their prediction performance according to the F1 score, sensitivity, accuracy and other indicators. Results. The results showed that the accuracy of the three models in the training set was 83.45%, 94.21% and 86.34%, the sensitivity was 78.47%, 94.62% and 81.30%, the F1 score was 0.79, 0.93 and 0.83, and the area under the ROC curve was 0.894, 0.987 and 0.935, respectively. In the test set, the accuracy was 76.72%, 80.66% and 78.69%, the sensitivity was 70.00%, 77.50% and 68.33%, the F1 score was 0.70, 0.76 and 0.71, and the area under the ROC curve was 0.797, 0.861 and 0.855, respectively. Conclusions. The study showed that the prediction performance of the random forest model is better than that of the other models and that the model has higher application value: it can better predict the risk of metabolic syndrome in oil workers and provide a corresponding theoretical basis for the health management of oil workers.
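
The accuracy, sensitivity and F1 score reported above all derive from the counts of a binary confusion matrix. A minimal sketch, assuming labels coded as 1 (metabolic syndrome) and 0 (no metabolic syndrome); the function name and sample labels are hypothetical.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Dict[str, float]:
    """Accuracy, sensitivity (recall), precision and F1 for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


# Made-up labels for illustration only.
metrics = classification_metrics(
    y_true=[1, 1, 1, 1, 0, 0, 0, 0],
    y_pred=[1, 1, 1, 0, 0, 0, 0, 1],
)
```

Computing the same metrics on both the training and test sets, as the abstract does, makes the gap between the two visible, which is one way to gauge overfitting.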

Variant pathogenic prediction models VSRFM and VSRFM-s, the importance of splicing and allele frequency

10.1101/430975 ◽

2018 ◽

Author(s):

JL Cabrera-Alarcon ◽

J Garcia-Martinez

Keyword(s):

Random Forest ◽

Allele Frequency ◽

Prediction Models ◽

Specific Model ◽

Random Forest Model ◽

Independent Data ◽

Data Set ◽

New Model ◽

Forest Model

Abstract Currently, several tools are available to predict the effect of variants, with the aim of classifying variants as neutral or pathogenic. In this study, we propose a new model trained over ensemble scores with two particularities: first, we consider minor allele frequency from gnomAD, and second, we split variants based on their involvement in splicing to train each specific model. The Variants Stacked Random Forest Model (VSRFM) was constructed for variants not involved in splicing, and the Variants Stacked Random Forest Model for splicing (VSRFM-s) was trained for variants affected by splicing. Compared with the constituent scores used as features, our models showed the best outcomes. These results were confirmed using an independent data set from the ClinVar database, with similar results.

Predicting the required pre-surgery blood volume in surgical patients based on machine learning

10.1101/19008045 ◽

2019 ◽

Author(s):

Ruilin Li ◽

Xinyin Han ◽

Liping Sun ◽

Yannan Feng ◽

Xiaolin Sun ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Blood Transfusion ◽

Blood Volume ◽

Prediction Models ◽

Machine Learning Algorithms ◽

Random Forest Model ◽

Surgical Patients ◽

Forest Model ◽

Formidable Challenge

Abstract Precisely predicting the required pre-surgery blood volume (PBV) in surgical patients is a formidable challenge in China. Inaccurate estimation is associated with excessive costs, postponed surgeries and adverse outcomes after surgery due to insufficient supply or inventory. This study aimed to predict the required PBV based on machine learning techniques. A total of 181,027 medical documents spanning 6 years were cleaned, yielding 92,057 blood transfusion records. The blood transfusion and surgery-related factors of perioperative patients, the volumes estimated from surgeons' experience and the actual volumes of transfused RBCs were extracted. Six machine learning algorithms were used to build prediction models. Surgical patients who received allogenic RBCs or no transfusion, had a total volume of less than 10 units, or had their latest pre-surgery laboratory examinations within 7 days were included, providing 118,823 data points. Thirty-nine predictive factors related to RBC transfusion were identified. The random forest model was selected to predict the required PBV of RBCs with 72.9% accuracy, and it strikingly improved the accuracy by 30.4% compared with surgeons' experience, where 90% of the data was used for training. We tested and demonstrated that both the data-driven models and the random forest model achieved higher accuracy than surgeons' experience. Furthermore, we developed a computational tool, PTRBC, to precisely estimate the required PBV in surgical patients, and we believe this tool will find more applications in assisting clinicians' decisions, not only in predicting pre-surgery blood requirements accurately.

Comparison of Models Used to Predict Flight Delays at Jomo Kenyatta International Airport

Asian Journal of Probability and Statistics ◽

10.9734/ajpas/2019/v3i330097 ◽

2019 ◽

pp. 1-8

Author(s):

P. K. Gachoki ◽

M. M. Muraya

Keyword(s):

Logistic Regression ◽

Random Forest ◽

Support Vector Machine Model ◽

Prediction Models ◽

Random Forest Model ◽

Aviation Industry ◽

Support Vector ◽

Flight Delays ◽

Machine Model ◽

Forest Model

Flight delays have negative socio-economic effects on passengers, airlines and airports, resulting in huge economic losses. Their prediction is therefore crucial to decision-making and proper management by all players in the aviation industry. The development of accurate prediction models for flight delays depends on the complexity of the air transport system and airport infrastructure, and hence may be country-specific. However, no prediction models are tailored to the Kenyan aviation industry, so there is a need to develop prediction models amenable to Kenyan aviation conditions. The objective of this study was to compare the prediction power of the developed models. Secondary data from Jomo Kenyatta International Airport (JKIA) were used in this study. The data collected included the day of the flight (Monday to Sunday), the month (January to December), the airline, the flight class (domestic or international), season (summer or winter), capacity of the aircraft, flight ID (tail number) and whether the flight had flown at night or during the day. The data were analysed using R software. Three models, a Logistic Regression model, a Support Vector Machine (SVM) model and a Random Forest model, were fitted. The strength and utility of the models were determined using bias-variance learning curves. The study revealed that the models predicted delays with different accuracies. The Random Forest model had a prediction accuracy of 68.99%, while the SVM model had an accuracy of 68.62% and the Logistic Regression model had an accuracy of 66.18%. The Random Forest model outperformed the SVM and Logistic Regression models by margins of 0.37% and 2.71%, respectively. The SVM and Random Forest models do not assume a probability distribution for the response under investigation, which may explain why they performed better than logistic regression. The study recommends applying the Random Forest model to predict flight delays at JKIA.

Heatwave Damage Prediction Using Random Forest Model in Korea

Applied Sciences ◽

10.3390/app10228237 ◽

2020 ◽

Vol 10 (22) ◽

pp. 8237

Author(s):

Minsoo Park ◽

Daekyo Jung ◽

Seungsoo Lee ◽

Seunghee Park

Keyword(s):

Random Forest ◽

Prediction Models ◽

Mean Squared Error ◽

Random Forest Model ◽

Floating Population ◽

Coefficient Of Determination ◽

Positioning Systems ◽

Forest Model ◽

Population Variable ◽

Proposed Model

Climate change increases the frequency and intensity of heatwaves, causing significant human and material losses every year. Big data, whose volumes are rapidly increasing, are expected to be used for preemptive responses. However, human cognitive abilities are limited, which can lead to ineffective decision making during disaster responses when artificial intelligence-based analysis models are not employed. Existing prediction models have limitations with regard to their validation, and most models focus only on heat-associated deaths. In this study, a random forest model was developed for the weekly prediction of heat-related damages on the basis of four years (2015–2018) of statistical, meteorological, and floating population data from South Korea. The model was evaluated through comparisons with other traditional regression models in terms of mean absolute error, root mean squared error, root mean squared logarithmic error, and coefficient of determination (R²). In a comparative analysis with observed values, the proposed model showed an R² value of 0.804. The results show that the proposed model outperforms existing models. They also show that the floating population variable collected from mobile global positioning systems contributes more to predictions than the aggregate population variable.
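
The four evaluation measures listed above can be computed together. The following is a minimal sketch using their standard textbook definitions; the function name and sample values are hypothetical, and it assumes non-negative values (needed for the logarithmic error) and a non-constant set of observations (needed for the coefficient of determination).

```python
import math
from typing import Dict, Sequence


def regression_report(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Dict[str, float]:
    """MAE, RMSE, RMSLE and coefficient of determination for paired values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # RMSLE compares log(1 + value), so inputs must be non-negative.
    log_errors = [math.log1p(t) - math.log1p(p) for t, p in zip(y_true, y_pred)]
    rmsle = math.sqrt(sum(e * e for e in log_errors) / n)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot  # undefined if all observations are equal
    return {"mae": mae, "rmse": rmse, "rmsle": rmsle, "r2": r2}


# Made-up weekly counts for illustration only.
report = regression_report(y_true=[1.0, 2.0, 3.0, 4.0], y_pred=[2.0, 2.0, 3.0, 3.0])
```

RMSLE measures relative rather than absolute deviation, which can be useful when weekly counts range from near zero to large peaks.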

Predictive Classification of IBS-subtype: Performance of a 250-gene RNA expression panel vs. Complete Blood Count (CBC) profiles under a Random Forest model.

10.1101/2021.08.31.21262766 ◽

2021 ◽

Author(s):

Jeffrey Robinson

Keyword(s):

Random Forest ◽

Blood Count ◽

Predictive Accuracy ◽

Complete Blood Count ◽

Mean Value ◽

Buffy Coat ◽

Random Forest Model ◽

Rna Expression ◽

Forest Model ◽

Data Column

In this experiment, an R script was developed to select the best-performing machine learning (ML) predictive classification algorithm for IBS subtype, and to compare the performance of two datasets from the same clinical cohort: 1) the Complete Blood Count (CBC) results, and 2) a 250-gene Nanostring expression panel run on RNA from the Buffy Coat fraction. This publicly available data was compiled from open-source repositories and previously published supplementary data. Column labels were reformatted according to tidy-data standards. NA values in the data were imputed based on the mean value of the data column. Subject groups included Control (i.e., healthy), IBS-D (diarrhea predominant), and IBS-C (constipation predominant) subtypes. These groups had unequal numbers in the original study, so random re-sampling was used to make the group numbers equal for downstream linear regression-based analyses. The data was randomly split into training and validation subsets, and 5 classification algorithms were tested. Random Forest was clearly the best-performing algorithm for both CBC and gene expression panel data, generally with >95% predictive accuracy, without additional tuning. The 250-gene RNA expression panel performed somewhat better than the CBC profile under a Random Forest model; however, the CBC profiles had only 13 predictor variables vs. the 250 of the RNA expression panel. Some artifacts may result from the duplication of IBS-D and IBS-C rows due to the group-size balancing method, so larger and more comprehensive datasets will be obtained for a follow-up analysis. The R script and reformatted data are published as supplementary material here, and as a component of the AnalyzeBloodworkv1.2 GitHub repository.
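
The two preprocessing steps described above, column-mean imputation and group-size balancing, can be sketched as follows. This is written in Python rather than the study's R and is not the published script: the function names and sample data are hypothetical, and it assumes balancing is done by oversampling smaller groups with replacement up to the largest group size, which would be consistent with the row duplication the abstract mentions.

```python
import random
from typing import Dict, List, Optional, Sequence


def impute_mean(column: Sequence[Optional[float]]) -> List[float]:
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    # Assumes at least one observed value in the column.
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def balance_groups(groups: Dict[str, List[list]], seed: int = 0) -> Dict[str, List[list]]:
    """Oversample each group with replacement up to the largest group size."""
    rng = random.Random(seed)
    target = max(len(rows) for rows in groups.values())
    balanced = {}
    for name, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        balanced[name] = list(rows) + extra
    return balanced


# Made-up values for illustration only.
filled = impute_mean([4.0, None, 6.0])
cohort = {
    "Control": [[1.0], [2.0], [3.0], [4.0]],
    "IBS-D": [[5.0], [6.0]],
    "IBS-C": [[7.0]],
}
balanced = balance_groups(cohort)
```

If oversampling is applied before the training and validation split, duplicated rows can land in both subsets and inflate accuracy, which is one reason to balance only the training portion.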
