Can machine learning bring cardiovascular risk assessment to the next level?

Author(s):  
Adrien Rousset ◽  
David Dellamonica ◽  
Romuald Menuet ◽  
Armando Lira Pineda ◽  
Lea Ricci ◽  
...  

Abstract
Objective: Through this proof of concept, we studied the potential added value of machine learning methods in building cardiovascular risk scores from structured data and the conditions under which they outperform linear statistical models.
Methods: Relying on extensive cardiovascular clinical data from FOURIER, a randomized clinical trial testing evolocumab efficacy, we compared linear models, neural networks, random forests, and gradient boosting machines for predicting the risk of major adverse cardiovascular events. To study the relative strengths of each method, we extended the comparison to restricted subsets of the full FOURIER dataset, limiting either the number of available patients or the number of their characteristics.
Results: When using all 428 covariates available in the dataset, machine learning methods significantly (c-index 0.67, p-value 2e-5) outperformed linear models built from the same variables (c-index 0.62), as well as a reference cardiovascular risk score based on only 10 variables (c-index 0.60). We showed that gradient boosting, the best performing model in our setting, requires fewer patients and significantly outperforms linear models when using large numbers of variables. On the other hand, we illustrate how linear models suffer from being trained on too many variables, thus requiring a more careful prior selection. These machine learning methods proved to consistently improve risk assessment, to be interpretable despite their complexity, and to help identify the minimal set of covariates necessary to achieve top performance.
Conclusion: In the field of secondary cardiovascular event prevention, given the increased availability of extensive electronic health records, machine learning methods could open the door to more powerful tools for patient risk stratification and treatment allocation strategies.
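The c-index the abstract reports is Harrell's concordance index: the fraction of comparable patient pairs in which the model assigns the higher risk score to the patient who experiences the event earlier. A minimal sketch on invented toy data (not the FOURIER pipeline or its models):

```python
def concordance_index(times, events, scores):
    """Harrell's c-index: among comparable pairs (subject i had an event
    before subject j's time), count how often the model ranked i riskier."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:  # pair is comparable
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1
                elif scores[i] == scores[j]:
                    concordant += 0.5  # ties get half credit
    return concordant / comparable

# toy data: follow-up times (months), event indicators, model risk scores
times  = [5, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.2]  # risk perfectly anti-ordered with time
print(concordance_index(times, events, scores))  # 1.0 for a perfect ranking
```

A c-index of 0.5 corresponds to random ranking, which is why the gap between 0.62 and 0.67 is a meaningful improvement.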

2021 ◽  
Author(s):  
Polash Banerjee

Abstract
Wildfires of limited extent and intensity can be a boon for the forest ecosystem. However, the 2019 wildfire episodes in Australia and Brazil are sad reminders of their heavy ecological and economic costs. Understanding the role of environmental factors in the likelihood of wildfires in a spatial context would be instrumental in mitigating them. In this study, 14 environmental features encompassing meteorological, topographical, ecological, in situ and anthropogenic factors have been considered for preparing the wildfire likelihood map of Sikkim Himalaya. A comparative study on the efficiency of machine learning methods such as the Generalized Linear Model (GLM), Support Vector Machine (SVM), Random Forest (RF) and Gradient Boosting Model (GBM) has been performed to identify the best performing algorithm for wildfire prediction. The study indicates that all the machine learning methods are good at predicting wildfires; however, RF performed best, followed by GBM. Also, environmental features such as average temperature, average wind speed, proximity to roadways and tree cover percentage are the most important determinants of wildfires in Sikkim Himalaya. This study can serve as a decision support tool for preparedness, efficient resource allocation and sensitization of people towards mitigation of wildfires in Sikkim.
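Rankings of environmental determinants like those above are often obtained via permutation importance: shuffle one feature column and measure how much the model's score drops. The abstract does not state which importance measure was used, so this is a generic sketch with an invented toy "model" in which only temperature matters:

```python
import random

def permutation_importance(model, X, y, score, n_repeats=10, seed=0):
    """Importance of a feature = average drop in score after shuffling
    that feature's column while leaving the others intact."""
    rng = random.Random(seed)
    base = score(model, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[:] for row in X]
            perm = [row[col] for row in shuffled]
            rng.shuffle(perm)
            for row, v in zip(shuffled, perm):
                row[col] = v
            drops.append(base - score(model, shuffled, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# toy "fire risk model": driven entirely by temperature (column 0);
# column 1 is an irrelevant feature the model ignores
model = lambda row: 1 if row[0] > 25 else 0
accuracy = lambda m, X, y: sum(m(r) == t for r, t in zip(X, y)) / len(y)

X = [[30, 5], [20, 9], [28, 1], [15, 7], [35, 3], [10, 8]]
y = [1, 0, 1, 0, 1, 0]
imp = permutation_importance(model, X, y, accuracy)
# temperature's importance exceeds the irrelevant feature's (which is 0)
```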


2021 ◽  
Vol 3 ◽  
pp. 47-57
Author(s):  
I. N. Myagkova ◽  
V. R. Shirokii ◽  
Yu. S. Shugai ◽  
O. G. Barinov ◽  
...  

We study ways to improve the quality of prediction of the time series of hourly mean fluxes and daily total fluxes (fluences) of relativistic electrons in the outer radiation belt of the Earth, 1 to 24 hours ahead and 1 to 4 days ahead, respectively. The prediction uses an approximation approach based on various machine learning methods, namely, artificial neural networks (ANNs), decision trees (random forest), and gradient boosting. A comparison of the skill scores of short-range forecasts with lead times of 1 to 24 hours showed that the best results were demonstrated by ANNs. For medium-range forecasting, the accuracy of prediction of the fluences of relativistic electrons in the Earth’s outer radiation belt three to four days ahead increases significantly when the predicted values of the solar wind velocity near the Earth, obtained from UV images of the Sun taken by the AIA (Atmospheric Imaging Assembly) instrument aboard the SDO (Solar Dynamics Observatory), are included in the list of input parameters.
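Forecasting a flux series k hours ahead with ANNs, forests, or boosting typically means recasting it as a supervised problem: each sample holds the last few observed values, and the target is the value k steps ahead. A minimal sketch of that windowing step (toy numbers; the paper's actual input set also includes solar wind parameters):

```python
def make_supervised(series, n_lags, horizon):
    """Each sample: the previous n_lags values of the series;
    target: the value `horizon` steps after the last lag."""
    X, y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])
        y.append(series[t + horizon - 1])
    return X, y

flux = [10, 12, 15, 14, 13, 17, 21, 19]  # toy hourly electron fluxes
X, y = make_supervised(flux, n_lags=3, horizon=2)
print(X[0], y[0])  # [10, 12, 15] 13
```

The same windows can then be fed to any of the compared regressors, so the model comparison is decoupled from the feature construction.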


Author(s):  
Artem Salamatov ◽  
Elena Gafarova ◽  
Vladimir Belevitin ◽  
Maxim Gafarov ◽  
Darya Gordeeva

The relevance of environmental and economic activity requires professional training of specialists and, accordingly, new organizational and pedagogical conditions for effective education. It is also necessary to develop control and measuring materials that have all the requisite qualities (validity, reliability, consistency, significance and objectivity) to obtain the most reliable results when justifying the necessity and sufficiency of the identified conditions. The intensification of information processes in vocational education leads researchers to seek optimal conditions and tools for achieving pedagogical goals. Among these tools are machine learning methods and the mathematical models built on them for quantitative assessment of the quality of vocational training in the field of environmental and economic activities. The use of the qualimetric approach in pedagogy is possible in the presence of a certain array of observational data for one or another criterion related to learning conditions, personal qualities of students, etc. The construction of an algorithmic model allows one to operate with conditions in mental experiments and test hypotheses; and since pedagogical research takes considerable time, choosing conditions based on the most favorable forecast built with the model allows one to optimize pedagogical resources to achieve the planned results. Rational selection of effective control and measuring materials (CMMs) allows one to determine the necessity and sufficiency of organizational and pedagogical conditions, while mathematical modeling allows one to quickly adjust the organizational and pedagogical conditions as a set of opportunities for the content, forms, teaching methods, information and communication technologies (ICTs) and CMMs used to achieve the planned educational results in the sphere of environmental and economic activity.
Interpretation of the derived features in the context of the pedagogical research, performed with a cross-validation accuracy of 72%, made it possible to reveal the dominant significance of intersubject connections between the disciplines studied by the sample of students in the bachelor's and master's programs 44.03.04 and 44.04.04 "Professional training (by industry)", which are the most significant in terms of the formation of competence in the field of environmental and economic activities. The designed mathematical model based on the Gradient Boosting Classifier allows making predictive expectations of the studied competency types and testing hypotheses for the inclusion or exclusion of certain significant organizational and pedagogical conditions for the effective implementation of the educational process. A necessary and sufficient organizational and pedagogical condition for the effective formation of competence in the field of environmental and economic activity is to ensure continuity between significant disciplines and the actualization of interdisciplinary relationships through the development of interdisciplinary courses.
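The 72% figure above is a cross-validated accuracy: the data are split into k folds, the classifier is trained on k-1 folds and scored on the held-out one, and the per-fold accuracies are averaged. A minimal sketch of the fold construction (the paper's fold count and data are not stated; this is purely illustrative):

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds; yield (train, test) index lists."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start, folds = 0, []
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    for i, test in enumerate(folds):
        train = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        yield train, test

# 10 samples, 5 folds: each fold of size 2 is held out exactly once;
# the cross-validated accuracy is the mean of the 5 held-out accuracies
splits = list(kfold_indices(10, 5))
print(splits[0][1])  # [0, 1]
```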


Author(s):  
Pavel Kikin ◽  
Alexey Kolesnikov ◽  
Alexey Portnov ◽  
Denis Grischenko

The state of ecological systems, along with their general characteristics, is almost always described by indicators that vary in space and time, which significantly complicates the construction of mathematical models for predicting the state of such systems. One way to simplify and automate the construction of such models is the use of machine learning methods. The article compares traditional and neural-network-based algorithms and machine learning methods for predicting spatio-temporal series representing ecosystem data. The analysis and comparison covered the following algorithms and methods: logistic regression, random forest, gradient boosting on decision trees, SARIMAX, and long short-term memory (LSTM) and gated recurrent unit (GRU) neural networks. For the study, data sets were selected that have both spatial and temporal components: mosquito population counts, the number of dengue infections, the physical condition of tropical grove trees, and the water level in a river. The article discusses the necessary steps for preliminary data processing, depending on the algorithm used. Kolmogorov complexity was also calculated as one of the parameters that can help formalize the choice of the optimal algorithm when constructing mathematical models of spatio-temporal data for the sets used. Based on the results of the analysis, recommendations are given on the application of particular methods and specific technical solutions, depending on the characteristics of the data set that describes a particular ecosystem.
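Kolmogorov complexity itself is uncomputable, so studies like this one necessarily use a proxy; a common choice is the compressed size of the serialized series. The abstract does not specify the estimator used, so the following is only a plausible sketch using stdlib zlib:

```python
import zlib

def complexity_proxy(values, precision=2):
    """Approximate the Kolmogorov complexity of a numeric series by the
    length of its zlib-compressed textual encoding (smaller = more regular)."""
    text = ",".join(f"{v:.{precision}f}" for v in values).encode()
    return len(zlib.compress(text, level=9))

constant = [1.0] * 200                                        # highly regular
noisy = [((i * 2654435761) % 1000) / 10 for i in range(200)]  # pseudo-random
print(complexity_proxy(constant) < complexity_proxy(noisy))   # True
```

A series with a low proxy value is "simple" and may be well served by a classical model such as SARIMAX, while a high value suggests structure that recurrent networks or tree ensembles may capture better; that is the kind of formalized choice the abstract alludes to.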


2020 ◽  
Author(s):  
Juan David Gutiérrez

Abstract
Background: Previous studies have demonstrated the relationship between air pollution-aerosols and meteorological variables and the occurrence of pneumonia. Forecasting the number of attentions of pneumonia cases may be useful to optimize the allocation of healthcare resources and support public health authorities in implementing emergency plans to face an increase in patients. The purpose of this study is to implement four machine-learning methods to forecast the number of attentions of pneumonia cases in the five largest cities of Colombia by using air pollution-aerosols, meteorological, and admission data. Methods: The number of attentions of pneumonia cases in the five most populated Colombian cities was provided by public health authorities for January 2009 to December 2019. Air pollution-aerosols and meteorological data were obtained from remote sensors. Four machine-learning methods were implemented for each city. We selected the machine-learning methods with the best performance in each city and implemented two techniques to identify the most relevant variables in the forecasts produced by the best-performing models. Results: According to the R2 metric, random forest was the machine-learning method with the best performance for Bogotá, Medellín and Cali, whereas for Barranquilla the best performance was obtained from Bayesian adaptive regression trees, and for Cartagena extreme gradient boosting had the best performance. The most important variables for the forecasting were related to the admission data. Conclusions: The results obtained from this study suggest that machine learning can be used to efficiently forecast the number of attentions of pneumonia cases, and therefore it can be a useful decision-making tool for public health authorities.
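The R2 metric used to rank the models above is the coefficient of determination: the fraction of variance in the observed case counts explained by the forecast. A minimal sketch with invented toy numbers (not the study's data):

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_res / ss_tot

weekly_cases = [120, 135, 150, 160, 140]   # toy observed pneumonia attentions
forecast     = [118, 138, 149, 158, 145]   # toy model forecast
print(round(r_squared(weekly_cases, forecast), 3))  # 0.953
```

R2 = 1 means a perfect forecast; 0 means no better than predicting the mean, which is why it is a natural yardstick for comparing forecasters across cities.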


Proceedings ◽  
2020 ◽  
Vol 59 (1) ◽  
pp. 5
Author(s):  
Gong Chen ◽  
Judith Rosenow ◽  
Michael Schultz ◽  
Ostap Okhrin

Increasing demands on a highly efficient air traffic management system go hand in hand with increasing requirements for predicting an aircraft’s future position. In this context, the airport collaborative decision-making framework provides a standardized approach to improve airport performance by defining operationally important milestones along the aircraft trajectory. In particular, the aircraft landing time is an important milestone, significantly impacting the utilization of limited runway capacities. We compare different machine learning methods to predict the landing time based on broadcast surveillance data of arrival flights at Zurich Airport. Thus, we consider different time horizons (look-ahead times) for arrival flights to predict additional sub-milestones for n-hours-out timestamps. The features are extracted from both surveillance data and weather information. Flights are clustered and analyzed using feedforward neural networks and decision tree methods, such as random forests and gradient boosting machines, and compared via cross-validation error. The prediction of landing time from entry points with radii of 45, 100, 150, 200, and 250 nautical miles attains an MAE and RMSE within 5 min on the test set; as the radius increases, the prediction error also increases. Our predicted landing times will contribute to appropriate airport performance management.
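The two error metrics reported above differ in how they treat large misses: MAE averages absolute errors, while RMSE squares them first and so penalizes outlier predictions more. A minimal sketch with invented landing-time values (not the Zurich data):

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; penalizes large misses more than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# toy landing times in minutes after the hour for five arrivals
actual    = [62.0, 75.0, 58.0, 90.0, 81.0]
predicted = [60.0, 78.0, 57.0, 94.0, 80.0]
print(mae(actual, predicted), rmse(actual, predicted))  # 2.2 and about 2.49
```

RMSE >= MAE always holds, so a small gap between the two (as here) indicates that errors are fairly uniform rather than dominated by a few badly predicted flights.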


2020 ◽  
Vol 12 (6) ◽  
pp. 914 ◽  
Author(s):  
Mahdieh Danesh Yazdi ◽  
Zheng Kuang ◽  
Konstantina Dimakopoulou ◽  
Benjamin Barratt ◽  
Esra Suel ◽  
...  

Estimating air pollution exposure has long been a challenge for environmental health researchers. Technological advances and novel machine learning methods have allowed us to increase the geographic range and accuracy of exposure models, making them a valuable tool in conducting health studies and identifying hotspots of pollution. Here, we have created a prediction model for daily PM2.5 levels in the Greater London area from 1st January 2005 to 31st December 2013 using an ensemble machine learning approach incorporating satellite aerosol optical depth (AOD), land use, and meteorological data. The predictions were made on a 1 km × 1 km scale over 3960 grid cells. The ensemble included predictions from three different machine learners: a random forest (RF), a gradient boosting machine (GBM), and a k-nearest neighbor (KNN) approach. Our ensemble model performed very well, with a ten-fold cross-validated R2 of 0.828. Of the three machine learners, the random forest outperformed the GBM and KNN. Our model was particularly adept at predicting day-to-day changes in PM2.5 levels, with an out-of-sample temporal R2 of 0.882. However, its ability to predict spatial variability was weaker, with an R2 of 0.396. We believe this to be due to the smaller spatial variation in pollutant levels in this area.
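The simplest form of such an ensemble is a weighted average of the per-learner predictions for each grid cell. The abstract does not describe how the three learners were combined, so the weighting scheme below (weights proportional to each learner's validation R2) and all numbers are purely illustrative:

```python
def ensemble_predict(preds_by_model, weights):
    """Weighted average of per-model PM2.5 predictions for each grid cell."""
    total = sum(weights)
    return [
        sum(w * p[i] for w, p in zip(weights, preds_by_model)) / total
        for i in range(len(preds_by_model[0]))
    ]

# toy predictions (ug/m3) for three grid cells from the three learners
rf  = [12.0, 18.0, 25.0]
gbm = [11.0, 19.0, 24.0]
knn = [14.0, 16.0, 27.0]
# weights proportional to hypothetical validation R2 values per learner
pm25 = ensemble_predict([rf, gbm, knn], weights=[0.83, 0.80, 0.75])
print(pm25)
```

An ensemble built this way can only interpolate between its members for each cell, which is why each combined value lies between the smallest and largest individual prediction.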


Entropy ◽  
2020 ◽  
Vol 23 (1) ◽  
pp. 21
Author(s):  
Yury Rodimkov ◽  
Evgeny Efimenko ◽  
Valentin Volokitin ◽  
Elena Panova ◽  
Alexey Polovinkin ◽  
...  

When entering the phase of big data processing and statistical inferences in experimental physics, the efficient use of machine learning methods may require optimal data preprocessing methods and, in particular, an optimal balance between details and noise. In experimental studies of strong-field quantum electrodynamics with intense lasers, this balance concerns data binning for the observed distributions of particles and photons. Here we analyze the aspect of binning with respect to different machine learning methods (Support Vector Machine (SVM), Gradient Boosting Trees (GBT), Fully-Connected Neural Network (FCNN), Convolutional Neural Network (CNN)) using numerical simulations that mimic expected properties of upcoming experiments. We see that binning can crucially affect the performance of SVM and GBT, and, to a lesser extent, FCNN and CNN. This can be interpreted as the latter methods being able to effectively learn the optimal binning, discarding unnecessary information. Nevertheless, given limited training sets, the results indicate that the efficiency can be increased by optimizing the binning scale along with other hyperparameters. We present specific measurements of accuracy that can be useful for planning experiments in the specified research area.
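The binning choice discussed above determines the feature vector the classifiers see: the same particle distribution becomes a coarse or fine histogram depending on the bin count. A minimal sketch with toy energies (the simulated spectra in the paper are far larger):

```python
def bin_counts(values, n_bins, lo, hi):
    """Histogram a sample into n_bins equal-width bins over [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # clamp the right edge
        counts[idx] += 1
    return counts

energies = [0.1, 0.2, 0.25, 0.5, 0.55, 0.9, 0.95, 1.0]  # toy particle energies
print(bin_counts(energies, 2, 0.0, 1.0))  # [3, 5]      coarse features
print(bin_counts(energies, 4, 0.0, 1.0))  # [2, 1, 2, 3] finer features
```

Coarser bins smooth away noise but also detail; finer bins keep detail but yield sparser, noisier counts, which is exactly the trade-off the abstract says matters most for SVM and GBT.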


Genes ◽  
2019 ◽  
Vol 11 (1) ◽  
pp. 41 ◽  
Author(s):  
Mengli Xiao ◽  
Zhong Zhuang ◽  
Wei Pan

Enhancer-promoter interactions (EPIs) are crucial for transcriptional regulation. Mapping such interactions proves useful for understanding disease regulations and discovering risk genes in genome-wide association studies. Some previous studies showed that machine learning methods, as computational alternatives to costly experimental approaches, performed well in predicting EPIs from local sequence and/or local epigenomic data. In particular, deep learning methods were demonstrated to outperform traditional machine learning methods, and using DNA sequence data alone could perform either better than or almost as well as only utilizing epigenomic data. However, most, if not all, of these previous studies were based on randomly splitting enhancer-promoter pairs as training, tuning, and test data, which has recently been pointed out to be problematic; due to multiple and duplicating/overlapping enhancers (and promoters) in enhancer-promoter pairs in EPI data, such random splitting does not lead to independent training, tuning, and test data, thus resulting in model over-fitting and over-estimating predictive performance. Here, after correcting this design issue, we extensively studied the performance of various deep learning models with local sequence and epigenomic data around enhancer-promoter pairs. Our results confirmed much lower performance using either sequence or epigenomic data alone, or both, than reported previously. We also demonstrated that local epigenomic features were more informative than local sequence data. Our results were based on an extensive exploration of many convolutional neural network (CNN) and feed-forward neural network (FNN) structures, and of gradient boosting as a representative of traditional machine learning.
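The design flaw the abstract corrects, duplicated enhancers leaking across a random train/test split, is avoided by splitting at the group level so each enhancer lands entirely on one side. A minimal sketch of that idea with invented pair identifiers (the actual EPI data and split protocol are more involved):

```python
def group_split(pairs, test_groups):
    """Assign each (enhancer_id, promoter_id) pair to train or test by its
    enhancer, so the same enhancer never appears on both sides."""
    train = [p for p in pairs if p[0] not in test_groups]
    test = [p for p in pairs if p[0] in test_groups]
    return train, test

# toy pairs: enh2 participates in two pairs and must not straddle the split
pairs = [("enh1", "prA"), ("enh1", "prB"), ("enh2", "prA"),
         ("enh3", "prC"), ("enh2", "prD")]
train, test = group_split(pairs, test_groups={"enh2"})
shared = {e for e, _ in train} & {e for e, _ in test}
print(shared)  # set() -- no enhancer leaks across the split
```

With a naive random split over pairs, both ("enh2", "prA") and ("enh2", "prD") could end up on opposite sides, letting the model memorize enh2 rather than generalize, which is the over-estimation of performance the abstract describes.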

