multivariate adaptive regression splines
Recently Published Documents

Total documents: 508 (five years: 214)
H-index: 40 (five years: 10)

Author(s):  
Ahmad Shaker Abdalrada ◽  
Jemal Abawajy ◽  
Tahsien Al-Quraishi ◽  
Sheikh Mohammed Shariful Islam

Abstract. Background: Diabetes mellitus (DM) and cardiovascular diseases (CVD) cause a significant healthcare burden globally and often co-exist. Current approaches often fail to identify many people with co-occurring DM and CVD, leading to delays in seeking healthcare, increased complications and morbidity. In this paper, we aimed to develop and evaluate a two-stage machine learning (ML) model to predict the co-occurrence of DM and CVD. Methods: We used the diabetes complications screening research initiative (DiScRi) dataset containing >200 variables from >2000 participants. In the first stage, we used two ML models (logistic regression and Evimp functions) implemented in a multivariate adaptive regression splines model to infer the significant common risk factors for DM and CVD, and applied the correlation matrix to reduce redundancy. In the second stage, we used a classification and regression algorithm to develop our model. We evaluated the prediction models using prediction accuracy, sensitivity and specificity as performance metrics. Results: Common risk factors for DM and CVD co-occurrence were family history of the diseases, gender, deep breathing heart rate change, lying to standing blood pressure change, HbA1c, HDL and TC/HDL ratio. The predictive model showed that participants with HbA1c >6.45 and TC/HDL ratio >5.5 were at risk of developing both diseases (97.9% probability). In contrast, participants with HbA1c >6.45 and TC/HDL ratio ≤5.5 were more likely to have only DM (84.5% probability), and those with HbA1c ≤5.45 and HDL >1.45 were likely to be healthy (82.4% probability). Further, participants with HbA1c ≤5.45 and HDL <1.45 were at risk of only CVD (100% probability). The predictive accuracy of the ML model in detecting co-occurrence of DM and CVD is 94.09%, with sensitivity 93.5% and specificity 95.8%. Conclusions: Our ML model can predict the co-occurrence of DM and CVD with high accuracy in people attending a screening program. This might help in early detection of patients with DM and CVD who could benefit from preventive treatment, and reduce future healthcare burden.
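
The thresholds quoted in the Results can be written down as a small rule function. This is only a sketch of the rules as the abstract states them, not the authors' fitted model: the function name and the group labels are hypothetical, measurement units are not given in the abstract, and inputs that fall outside the quoted ranges (for example HbA1c between 5.45 and 6.45) return `None` because the abstract says nothing about them.

```python
from typing import Optional


def screening_rule(hba1c: float, tc_hdl_ratio: float, hdl: float) -> Optional[str]:
    """Map measurements to the group suggested by the thresholds in the abstract.

    Hypothetical helper: names and labels are illustrative assumptions.
    """
    if hba1c > 6.45:
        # High HbA1c: the TC/HDL ratio separates the two outcomes.
        return "DM and CVD" if tc_hdl_ratio > 5.5 else "DM only"
    if hba1c <= 5.45:
        # Low HbA1c: HDL separates the two outcomes.
        if hdl > 1.45:
            return "healthy"
        if hdl < 1.45:
            return "CVD only"
    # Not covered by the abstract: 5.45 < HbA1c <= 6.45, or HDL exactly 1.45.
    return None
```

Returning `None` for uncovered inputs keeps the gaps in the published rules visible instead of guessing a label.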


2022 ◽  
Vol 14 (2) ◽  
pp. 798
Author(s):  
Snezhana Gocheva-Ilieva ◽  
Atanas Ivanov ◽  
Maya Stoimenova-Minova

A novel framework for stacked regression based on machine learning was developed to predict the daily average concentrations of particulate matter (PM10), one of Bulgaria’s primary health concerns. The measurements of nine meteorological parameters were introduced as independent variables. The goal was to carefully study a limited number of initial predictors and extract stochastic information from them to build an extended set of data that allowed the creation of highly efficient predictive models. Four base models using random forest, CART ensemble and bagging, and their rotation variants, were built and evaluated. The heterogeneity of these base models was achieved by introducing five types of diversities, including a new simplified selective ensemble algorithm. The predictions from the four base models were then used as predictors in multivariate adaptive regression splines (MARS) models. All models were statistically tested using out-of-bag or with 5-fold and 10-fold cross-validation. In addition, a variable importance analysis was conducted. The proposed framework was used for short-term forecasting of out-of-sample data for seven days. It was shown that the stacked models outperformed all single base models. An index of agreement IA = 0.986 and a coefficient of determination of about 95% were achieved.
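
The abstract reports an index of agreement (IA) without giving its formula. Assuming it refers to Willmott's index of agreement, a minimal sketch of the computation is shown below; the function name is hypothetical and the code is not taken from the paper.

```python
from statistics import mean


def index_of_agreement(observed, predicted):
    """Willmott's index of agreement (assumed definition), between 0 and 1."""
    o_bar = mean(observed)
    # Sum of squared prediction errors.
    squared_error = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    # "Potential error": spread of predictions and observations around the observed mean.
    potential_error = sum(
        (abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in zip(observed, predicted)
    )
    # Note: potential_error is zero (division by zero) if every value equals o_bar.
    return 1.0 - squared_error / potential_error
```

A value of 1 means the predictions match the observations exactly; predicting the observed mean everywhere gives 0 for the toy series `[1, 2, 3]`.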


2022 ◽  
Vol 2022 ◽  
pp. 1-9
Author(s):  
Eman H. Alkhammash ◽  
Abdelmonaim Fakhry Kamel ◽  
Saud M. Al-Fattah ◽  
Ahmed M. Elshewey

This paper presents optimized linear regression with multivariate adaptive regression splines (LR-MARS) for predicting crude oil demand in Saudi Arabia based on the social spider optimization (SSO) algorithm. The SSO algorithm is applied to optimize LR-MARS performance by fine-tuning its hyperparameters. The proposed prediction model was trained and tested using historical oil data gathered from different sources. The results suggest that the demand for crude oil in Saudi Arabia will continue to increase during the forecast period (1980–2015). Several prediction accuracy metrics, including Mean Absolute Error (MAE), Median Absolute Error (MedAE), Mean Square Error (MSE), Root Mean Square Error (RMSE), and the coefficient of determination (R²), were used to examine and verify the predictive performance of the various models. Analysis of variance (ANOVA) was also applied to assess the predicted crude oil demand in Saudi Arabia and to compare the actual test data with the predicted results of the different models. The experimental results show that the optimized LR-MARS model performs better than the other models in predicting crude oil demand.
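
The accuracy metrics named above have standard definitions, sketched here with the Python standard library only. The function name and the toy numbers in the usage note are illustrative assumptions, not data or code from the paper.

```python
from math import sqrt
from statistics import mean, median


def regression_metrics(actual, predicted):
    """Return MAE, MedAE, MSE, RMSE and R^2 for paired actual/predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    abs_errors = [abs(e) for e in errors]
    squared_errors = [e * e for e in errors]

    mse = mean(squared_errors)
    a_bar = mean(actual)
    # Total sum of squares around the mean of the actual values.
    ss_tot = sum((a - a_bar) ** 2 for a in actual)

    return {
        "MAE": mean(abs_errors),
        "MedAE": median(abs_errors),
        "MSE": mse,
        "RMSE": sqrt(mse),
        # R^2 = 1 - SS_res / SS_tot; undefined when all actual values are equal.
        "R2": 1.0 - sum(squared_errors) / ss_tot,
    }
```

For example, `regression_metrics([1, 2, 3, 4], [1, 2, 3, 6])` has a single error of size 2, so MAE is 0.5 while MedAE is 0, which shows how the median-based metric ignores an isolated large miss.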


Author(s):  
Ayse Ozmen

Residential customers are the main users of distribution systems and generally need a large quantity of natural gas, especially in the winter season, since it is consumed mainly for cooking and space heating. Hence, supply ought to be non-interruptible. Since distribution systems have a restricted supply capacity, reasonable planning and prediction throughout the year, especially in winter, have become vital. Ridge Regression (RR) is formulated mainly to reduce the effects of collinearity by shrinking the regression coefficients and reducing the impact of variables in the model. The conic multivariate adaptive regression splines ((C)MARS) model is constructed as an effective alternative to MARS by using inverse problems, statistical learning, and multi-objective optimization theories. In this approach, model complexity is penalized in the manner of RR, and a relaxation is constructed using continuous optimization, called Conic Quadratic Programming (CQP). In this study, CMARS and RR are applied to forecast residential natural gas demand for local distribution companies (LDCs) that require short-term forecasts, and the model performances are compared using several criteria. Our analysis shows that CMARS models outperform RR models. For one-day-ahead forecasts, CMARS yields a MAPE of about 4.8%, while the same value under RR reaches 8.5%. As the forecast horizon increases, the performance of both methods worsens: for a forecast one week ahead, the MAPE values for CMARS and RR are 9.9% and 18.3%, respectively.
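
The forecasts above are compared by MAPE (mean absolute percentage error). A minimal sketch of the standard definition follows; the function name and the sample values in the usage note are hypothetical, not data from the study.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero, since each error is divided by it.
    """
    relative_errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(relative_errors) / len(relative_errors)
```

With actual demand `[100, 200]` and forecasts `[90, 220]`, both forecasts are off by 10% of the actual value, so the MAPE is about 10%.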


2021 ◽  
Vol 13 (24) ◽  
pp. 5170
Author(s):  
Cecilia Alonso-Rego ◽  
Stéfano Arellano-Pérez ◽  
Juan Guerra-Hernández ◽  
Juan Alberto Molina-Valero ◽  
Adela Martínez-Calvo ◽  
...  

In this study, we used data from a thinning trial conducted on 34 different sites and 102 sample plots established in pure and even-aged Pinus radiata and Pinus pinaster stands, to test the potential use of low-density airborne laser scanning (ALS) metrics and terrestrial laser scanning (TLS) metrics to provide accurate estimates of variables related to surface and canopy fires. An exhaustive field inventory was carried out in each plot to estimate the main stand variables and the main variables related to fire hazard: surface fuel loads by layers, fuel strata gap, surface fuel height, stand mean height, canopy base height, canopy fuel load and canopy bulk density. In addition, the point clouds from low-density ALS and single-scan TLS of each sample plot were used to calculate metrics related to the vertical and horizontal distribution of forest fuels. The comparative performance of the following three non-parametric machine learning techniques used to estimate the main stand- and fire-related variables from those metrics was evaluated: (i) multivariate adaptive regression splines (MARS), (ii) support vector machine (SVM), and (iii) random forest (RF). The selection of the best modeling approach was based on a comparison of the root mean square error (RMSE), obtained by optimizing the parameters of each technique and performing cross-validation. Overall, the best results were obtained with the MARS techniques for data from both sensors. The TLS data provided the best results for variables associated with the internal characteristics of canopy structure and understory fuel but were less reliable for estimating variables associated with the upper canopy, due to occlusion by mid-canopy foliage. The combination of ALS and TLS metrics improved the accuracy of estimates for all variables analyzed, except the height and the biomass of the understory shrubs. 
The variability demonstrated by the combined use of both types of metrics ranged from 43.11% for the biomass of duff litter layers to 94.25% for dominant height. The results suggest that the combination of machine learning techniques and metrics derived from low-density ALS data, drawn from a single-scan TLS or a combination of both metrics, may represent a promising alternative to traditional field inventories for obtaining valuable information about surface and canopy fuel variables at large scales.


2021 ◽  
Author(s):  
Abul Abrar Masrur Ahmed ◽  
M A I Chowdhury ◽  
Oli Ahmed ◽  
Ambica Sutradhar

Abstract The ability to predict dissolved oxygen (DO), a critical water quality (WQ) parameter, is essential for aquatic managers responsible for maintaining ecosystem health and managing reservoirs affected by WQ. This paper reports forecasting of the DO concentration of running river water with multivariate adaptive regression splines (MARS), using a set of water quality and hydro-meteorological variables. The study’s key objectives were to assess input selection methods and five multi-resolution analyses as a data extraction approach. In addition, a hybrid model is built by combining the maximum overlap discrete wavelet transformation (MODWT) with the MARS model (i.e., MODWT-MARS). The proposed model is then compared with several machine learning methods. The results show that the hybrid algorithm (i.e., MODWT-MARS) outperformed the other models (r = 0.981, WI = 0.990, RMAE = 2.47% and MAE = 0.089). This hybrid method may serve as the foundation for forecasting water quality variables with fewer predictor variables.


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Bowen Lei ◽  
Tanner Quinn Kirk ◽  
Anirban Bhattacharya ◽  
Debdeep Pati ◽  
Xiaoning Qian ◽  
...  

Abstract Bayesian optimization (BO) is an indispensable tool to optimize objective functions that either do not have known functional forms or are expensive to evaluate. Currently, optimal experimental design is always conducted within the workflow of BO leading to more efficient exploration of the design space compared to traditional strategies. This can have a significant impact on modern scientific discovery, in particular autonomous materials discovery, which can be viewed as an optimization problem aimed at looking for the maximum (or minimum) point for the desired materials properties. The performance of BO-based experimental design depends not only on the adopted acquisition function but also on the surrogate models that help to approximate underlying objective functions. In this paper, we propose a fully autonomous experimental design framework that uses more adaptive and flexible Bayesian surrogate models in a BO procedure, namely Bayesian multivariate adaptive regression splines and Bayesian additive regression trees. They can overcome the weaknesses of widely used Gaussian process-based methods when faced with relatively high-dimensional design space or non-smooth patterns of objective functions. Both simulation studies and real-world materials science case studies demonstrate their enhanced search efficiency and robustness.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Marc Sarossy ◽  
Jonathan Crowston ◽  
Dinesh Kumar ◽  
Anne Weymouth ◽  
Zhichao Wu

Abstract Glaucoma is an optic neuropathy that results in the progressive loss of retinal ganglion cells (RGCs), which are known to exhibit functional changes prior to cell loss. The electroretinogram (ERG) is a method that enables an objective assessment of retinal function, and the photopic negative response (PhNR) has conventionally been used to provide a measure of RGC function. This study sought to examine whether additional parameters from the ERG (amplitudes of the a-, b-, and i-wave, as well as the trough between the b- and i-wave), a non-linear multivariate adaptive regression splines (MARS) model, and achromatic stimuli could better predict glaucoma severity in 103 eyes of 55 individuals with glaucoma. Glaucoma severity was determined using standard automated perimetry and optical coherence tomography imaging. ERGs targeting the PhNR were recorded with a chromatic (red-on-blue) and an achromatic (white-on-white) stimulus of the same luminance. Linear and MARS models were fitted to predict glaucoma severity using the PhNR only or all ERG markers, derived from chromatic and achromatic stimuli. Use of all ERG markers predicted glaucoma severity significantly better than the PhNR alone (P ≤ 0.02), and the MARS models performed better than the linear models when using all markers (P = 0.01), but there was no significant difference between the achromatic and chromatic stimulus models. This study shows that the photopic ERG contains more information for characterizing RGC function than the conventional PhNR measure.
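
What makes a MARS model non-linear, in contrast to the linear models it is compared with here, is that it sums hinge functions of the form max(0, x − t) or max(0, t − x), each of which bends at a knot t. The sketch below only evaluates such a piecewise-linear model for one predictor. The function names, coefficients, and knot are hypothetical and are not taken from the study, and the code does not perform the forward/backward term selection that an actual MARS fit involves.

```python
def hinge(x, knot, direction=1):
    """Hinge basis function: max(0, x - knot) for direction=+1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))


def mars_like_predict(x, intercept, terms):
    """Evaluate intercept + sum(coef * hinge) for a list of (coef, knot, direction) terms."""
    return intercept + sum(
        coef * hinge(x, knot, direction) for coef, knot, direction in terms
    )


# Hypothetical model with one knot at x = 1: slope 2 to the right, slope 0.5 to the left.
example_terms = [(2.0, 1.0, 1), (-0.5, 1.0, -1)]
```

With these illustrative terms and an intercept of 1.0, the model returns 1.0 at the knot, 5.0 at x = 3, and 0.5 at x = 0, so the slope changes on either side of the knot, which a single linear term cannot express.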


Author(s):  
Shen Xing-xing ◽  
Cao Wei-wei ◽  
Li Kai

Abstract In this study, multivariate adaptive regression splines (MARS) models of order two and three were developed to predict the California bearing ratio (CBR) of pond ash stabilized with lime and lime sludge. The models took five input variables (maximum dry density, optimum moisture content, lime percentage, lime sludge percentage, and curing period) and CBR as the output variable. MARS-O3 gave the best results, with R2 of 0.9565 and 0.9312 and PI of 0.0709 and 0.1061 for the training and testing phases, respectively. For both models, the estimated CBR values in the training and testing stages agree acceptably with the experimental results, indicating that the proposed equations can predict CBR values with high accuracy. A comparison of the two equations showed that MARS-O3 performs better than MARS-O2. Based on the error curves, the MARS-O3 model yields the lowest error percentage in CBR prediction and gives more accurate predictions than the other developed model. Therefore, MARS-O3 is recommended as the proposed model.

