Prediction of Water Saturation from Well Log Data by Machine Learning Algorithms: Boosting and Super Learner

Fahimeh Hadavimoghaddam; Mehdi Ostadhassan; Mohammad Ali Sadri; Tatiana Bondarenko; Igor Chebyshev; Amir Semnani

doi:10.3390/jmse9060666

Journal of Marine Science and Engineering ◽

10.3390/jmse9060666 ◽

2021 ◽

Vol 9 (6) ◽

pp. 666

Author(s):

Fahimeh Hadavimoghaddam ◽

Mehdi Ostadhassan ◽

Mohammad Ali Sadri ◽

Tatiana Bondarenko ◽

Igor Chebyshev ◽

...

Keyword(s):

Machine Learning ◽

Water Saturation ◽

Machine Learning Algorithms ◽

Rock Properties ◽

Gradient Boosting ◽

Data Set ◽

Log Data ◽

Gamma Density ◽

Super Learner ◽

Resistivity Log

Intelligent predictive methods can estimate water saturation (Sw) reliably compared with the conventional experimental methods commonly performed by petrophysicists. However, nonlinearity and uncertainty in the data set can make the prediction inaccurate. New machine learning (ML) algorithms such as gradient boosting techniques have shown significant success in other disciplines yet have not been examined for predicting Sw or other reservoir or rock properties in the petroleum industry. To bridge this gap in the literature, this study develops, for the first time, a total of five ML models, namely Super Learner along with four boosting algorithms (XGBoost, LightGBM, CatBoost, and AdaBoost), to predict water saturation without relying on resistivity log data. This is important because conventional methods of water saturation prediction that rely on the resistivity log can become problematic in particular formations such as shale or tight carbonates. To this end, two datasets were constructed by collecting several types of well logs (gamma, density, neutron, and sonic, with and without PEF) to evaluate the robustness and accuracy of the models by comparing the results with laboratory-measured data. Super Learner and XGBoost produced the most accurate output (R²: 0.999 and 0.993, respectively), and, at a considerable distance, CatBoost and LightGBM were ranked third and fourth, respectively. Ultimately, both XGBoost and Super Learner produced negligible errors, but the latter is considered the best of all.
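The Super Learner idea mentioned above (combining base learners with weights chosen by cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation: the two toy base learners, the single input feature, the interleaved fold assignment, and the grid search over one convex weight are all assumptions made to keep the example short and dependency-free.

```python
def fit_mean(xs, ys):
    """Toy base learner 1: always predicts the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Toy base learner 2: ordinary least squares on one feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def out_of_fold(fit, xs, ys, k):
    """Predict each sample with a model that did not see it in training."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):
            preds[i] = model(xs[i])
    return preds

def super_learner(xs, ys, k=5, steps=20):
    """Return (predict_fn, w), where w is the weight given to fit_mean."""
    oof_mean = out_of_fold(fit_mean, xs, ys, k)
    oof_line = out_of_fold(fit_line, xs, ys, k)

    def cv_mse(w):
        # Cross-validated error of the convex combination of both learners.
        return sum((w * a + (1 - w) * b - y) ** 2
                   for a, b, y in zip(oof_mean, oof_line, ys)) / len(ys)

    w = min((s / steps for s in range(steps + 1)), key=cv_mse)
    # Refit both base learners on all data before combining them.
    mean_model, line_model = fit_mean(xs, ys), fit_line(xs, ys)
    return (lambda x: w * mean_model(x) + (1 - w) * line_model(x)), w
```

On data that follow a straight line exactly, the search puts all weight on the linear learner (`w == 0.0`). In practice the base learners would be full models such as the boosting algorithms named in the abstract, and the meta-step is often a constrained regression instead of a grid search.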

Petrofacies classification using machine learning algorithms

Geophysics ◽

10.1190/geo2019-0439.1 ◽

2020 ◽

Vol 85 (4) ◽

pp. WA101-WA113 ◽

Cited By ~ 3

Author(s):

Adrielle A. Silva ◽

Mônica W. Tavares ◽

Abel Carrasquilla ◽

Roseane Misságia ◽

Marco Ceia

Keyword(s):

Machine Learning ◽

Oil And Gas ◽

Water Saturation ◽

Carbonate Reservoir ◽

Machine Learning Algorithms ◽

Carbonate Reservoirs ◽

Training Data ◽

Southeastern Brazil ◽

Gradient Boosting ◽

Data Set

Carbonate reservoirs represent a large portion of the world’s oil and gas reserves, exhibiting specific characteristics that pose complex challenges to the reservoirs’ characterization, production, and management. Therefore, the evaluation of the relationships between the key parameters, such as porosity, permeability, water saturation, and pore size distribution, is a complex task when considering only well-log data, due to the geologic heterogeneity. Hence, the petrophysical parameters are the key to assessing the original composition and postsedimentological aspects of the carbonate reservoirs. The concept of reservoir petrofacies was proposed as a tool for the characterization and prediction of the reservoir quality as it combines primary textural analysis with laboratory measurements of porosity, permeability, capillary pressure, photomicrograph descriptions, and other techniques, which contributes to understanding the postdiagenetic events. We have adopted a workflow for petrofacies classification of a carbonate reservoir from the Campos Basin in southeastern Brazil, using the following machine learning methods: decision tree, random forest, gradient boosting, K-nearest neighbors, and naïve Bayes. The data set comprised 1477 wireline data from two wells (A3 and A10) that had petrofacies classes already assigned based on core descriptions. It was divided into two subsets, one for training and one for testing the capability of the trained models to assign petrofacies. The supervised-learning models have used labeled training data to learn the relationships between the input measurements and the petrofacies to be assigned. Additionally, we have compared the models’ performance on the testing set according to accuracy, precision, recall, and F1-score evaluation metrics. Our approach has proved to be a valuable ally in petrofacies classification, especially for analyzing a well-logging database with no prior petrophysical information.
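The four evaluation metrics named above have standard definitions, sketched below for a multi-class setting where precision, recall, and F1 are computed for one class at a time. The function names and the class labels "A", "B", and "C" are hypothetical and are not taken from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall_f1(y_true, y_pred, positive):
    """One-vs-rest precision, recall, and F1 for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical labels for six test samples.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]
print(accuracy(y_true, y_pred))
print(precision_recall_f1(y_true, y_pred, "B"))
```

For class "B" in this toy example, one sample of class "A" is wrongly assigned to "B", so precision is 2/3 while recall is 1.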

A Critical Literature Review on Rock Petrophysical Properties Estimation from Images Based on Direct Simulation and Machine Learning Techniques

10.2118/208125-ms ◽

2021 ◽

Author(s):

Ahmed Samir Rizk ◽

Moussa Tembely ◽

Waleed AlAmeri ◽

Emad W. Al-Shalabi

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Literature Review ◽

Training Data ◽

Rock Properties ◽

Gradient Boosting ◽

Petrophysical Properties ◽

Direct Simulation ◽

Data Set ◽

Extreme Gradient Boosting

Estimation of petrophysical properties is essential for accurate reservoir predictions. In recent years, extensive work has been dedicated to training different machine-learning (ML) models to predict petrophysical properties of digital rock using dry rock images along with data from single-phase direct simulations, such as the lattice Boltzmann method (LBM) and the finite volume method (FVM). The objective of this paper is to present a comprehensive literature review on petrophysical properties estimation from dry rock images using different ML workflows and direct simulation methods. The review provides a detailed comparison between different ML algorithms that have been used in the literature to estimate porosity, permeability, tortuosity, and effective diffusivity. In this paper, various ML workflows from the literature are screened and compared in terms of the training data set, the testing data set, the extracted features, the algorithms employed, and their accuracy. A thorough description of the most commonly used algorithms is also provided to better understand how these algorithms encode the relationship between the rock images and their respective petrophysical properties. The review of various ML workflows for estimating rock petrophysical properties from dry images shows that models trained using features extracted from the image (physics-informed models) outperformed models trained on the dry images directly. In addition, certain tree-based ML algorithms, such as random forest, gradient boosting, and extreme gradient boosting, can produce accurate predictions that are comparable to deep learning algorithms such as deep neural networks (DNNs) and convolutional neural networks (CNNs). To the best of our knowledge, this is the first work dedicated to exploring and comparing different ML frameworks that have recently been used to accurately and efficiently estimate rock petrophysical properties from images. This work will give other researchers a broad understanding of the topic and help in developing new ML workflows or further modifying existing ones in order to improve the characterization of rock properties. This comparison also serves as a guide to the performance and applicability of different ML algorithms. Moreover, the review helps researchers in this area keep pace with digital innovations in porous media characterization in this fourth industrial age (oil and gas 4.0).

Classification of hazelnut cultivars: comparison of DL4J and ensemble learning algorithms

Notulae Botanicae Horti Agrobotanici Cluj-Napoca ◽

10.15835/nbha48412041 ◽

2020 ◽

Vol 48 (4) ◽

pp. 2316-2327

Author(s):

Caner KOC ◽

Dilara GERDAN ◽

Maksut B. EMİNOĞLU ◽

Uğur YEGÜL ◽

Bulent KOC ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Random Forest ◽

Ensemble Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Performance Criteria ◽

Gradient Boosting ◽

Data Set

Classification of hazelnuts is one of the value-adding processes that increase the marketability and profitability of hazelnut production. While traditional classification methods are commonly used, machine learning and deep learning can be implemented to enhance the hazelnut classification process. This paper presents the results of a comparative study of machine learning frameworks to classify hazelnut (Corylus avellana L.) cultivars (‘Sivri’, ‘Kara’, ‘Tombul’) using DL4J and ensemble learning algorithms. For each cultivar, 50 samples were used for evaluations. Maximum length, width, compression strength, and weight of hazelnuts were measured using a caliper and a force transducer. Gradient boosting machine (boosting), random forest (bagging), and DL4J feedforward (deep learning) were the algorithms applied. The models were evaluated using 10-fold cross-validation. The classifier performance criteria of accuracy (%), error percentage (%), F-measure, Cohen’s kappa, recall, precision, true positive (TP), false positive (FP), true negative (TN), and false negative (FN) values are provided in the results section. The results showed classification accuracies of 94% for Gradient Boosting, 100% for Random Forest, and 94% for DL4J Feedforward algorithms.
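Two of the ingredients mentioned above, 10-fold cross-validation and Cohen's kappa, can be sketched as follows. The interleaved fold assignment is an assumption (the paper does not state how folds were formed), and the helper names are hypothetical.

```python
from collections import Counter

def kfold_indices(n, k=10):
    """Yield (train, test) index lists; fold f takes every k-th sample from f."""
    for f in range(k):
        test = list(range(f, n, k))
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        yield train, test

def cohen_kappa(y_true, y_pred):
    """Agreement between predictions and labels, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: product of the marginal class frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1:
        return 1.0  # degenerate case: a single class everywhere
    return (observed - expected) / (1 - expected)

# 150 samples (3 cultivars x 50) give ten test folds of 15 samples each.
folds = list(kfold_indices(150, k=10))
print(len(folds), len(folds[0][0]), len(folds[0][1]))
```

Kappa is reported alongside accuracy because it discounts the agreement a classifier would reach by guessing according to the class frequencies alone.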

Automatic Lithofacies Classification with t-SNE and K-Nearest Neighbors Algorithm

Anuário do Instituto de Geociências - UFRJ ◽

10.11137/1982-3908_2021_44_35024 ◽

2021 ◽

Vol 44 ◽

Author(s):

Guilherme Loriato Potratz ◽

Smith Washington Arauco Canchumuni ◽

Jose David Bermudez Castro ◽

Júlia Potratz ◽

Marco Aurélio C. Pacheco

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Artificial Neural Networks ◽

Unsupervised Classification ◽

Machine Learning Algorithms ◽

Hydrocarbon Exploration ◽

Data Set ◽

Log Data ◽

Hidden Patterns ◽

Artificial Neural

One of the critical processes in the exploration of hydrocarbons is the identification and prediction of the lithofacies that constitute the reservoir. One of the cheapest and most efficient ways to carry out that process is the interpretation of well log data, which are often obtained continuously and in the majority of drilled wells. The main methodologies used to correlate log data with data obtained from well cores are based on statistical analyses, machine learning models, and artificial neural networks. This study aims to test a dimensionality reduction algorithm together with an unsupervised classification method for predicting lithofacies automatically. The performance of the methodology presented was compared with predictions made with artificial neural networks. We used t-Distributed Stochastic Neighbor Embedding (t-SNE) as an algorithm for mapping the well-logging data into a smaller feature space. Then, the facies predictions are performed using a K-nearest neighbors (KNN) algorithm. The method is assessed on the public dataset of the Hugoton and Panoma fields. Prediction of facies through traditional artificial neural networks obtained an accuracy of 69%, whereas facies predicted through the t-SNE + KNN algorithm obtained an accuracy of 79%. Considering the nature of the data, which have high dimensionality and are not linearly correlated, the efficiency of t-SNE + KNN can be explained by the ability of the algorithm to identify hidden patterns across fuzzy boundaries in the data set. It is important to stress that the application of machine learning algorithms offers relevant benefits to the hydrocarbon exploration sector, such as identifying hidden patterns in high-dimensional datasets, searching for complex and non-linear relationships, and avoiding the need for a preliminary definition of mathematical relations among the model’s input data.
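The second stage of the pipeline described above, KNN voting in the reduced feature space, can be sketched in a few lines. The t-SNE embedding step is omitted here; the points are assumed to be already embedded in two dimensions, and the coordinates and facies labels are made up for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D embedded samples with hypothetical facies labels.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["sandstone"] * 3 + ["shale"] * 3

print(knn_predict(points, labels, (0.2, 0.3)))
print(knn_predict(points, labels, (5.5, 5.2)))
```

Because KNN relies only on distances, it benefits from an embedding in which samples of the same facies sit close together, which is the role the abstract assigns to t-SNE.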

Extreme Gradient Boosting for Parkinson’s Disease Diagnosis from Voice Recordings

10.21203/rs.2.20727/v1 ◽

2020 ◽

Author(s):

Ibrahim Karabayir ◽

Suguna Pappu ◽

Samuel Goldman ◽

Oguz Akbilgic

Keyword(s):

Machine Learning ◽

Parkinson's Disease ◽

Disease Diagnosis ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Acoustic Features ◽

Data Set ◽

Non Invasive ◽

Extreme Gradient Boosting

Background: Parkinson’s Disease (PD) is a clinically diagnosed neurodegenerative disorder that affects both motor and non-motor neural circuits. Speech deterioration (hypokinetic dysarthria) is a common symptom, which often presents early in the disease course. Machine learning can help movement disorders specialists improve their diagnostic accuracy using non-invasive and inexpensive voice recordings. Method: We used the “Parkinson Dataset with Replicated Acoustic Features Data Set” from the UCI Machine Learning Repository. The dataset included 45 features, comprising sex and 44 speech-test-based acoustic features, from 40 patients with Parkinson’s disease and 40 controls. We analyzed the data using various machine learning algorithms, including tree-based ensemble approaches such as random forest and extreme gradient boosting. We also implemented a variable importance analysis to identify important variables for classifying patients with PD. Results: The cohort included a total of 80 subjects: 40 patients with PD (55% men) and 40 controls (67.5% men). PD patients showed at least two of the three symptoms: resting tremor, bradykinesia, or rigidity. All patients were over 50 years old, and the mean ages of PD subjects and controls were 69.6 (SD 7.8) and 66.4 (SD 8.4), respectively. Our final model provided an AUC of 0.940 with a 95% confidence interval of 0.935-0.945 in 4-fold cross-validation using only six acoustic features: Delta3 (Run 2), Delta0 (Run 3), MFCC4 (Run 2), Delta10 (Run 2/Run 3), MFCC10 (Run 2), and Jitter_Rap (Run 1/Run 2). Conclusions: Machine learning can accurately detect Parkinson’s disease using an inexpensive and non-invasive voice recording. Such technologies can be deployed on smartphones for screening large patient populations for Parkinson’s disease.
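The AUC reported above has a simple rank-based interpretation: it is the probability that a randomly chosen patient receives a higher score than a randomly chosen control, with ties counted as one half. A minimal sketch with made-up scores follows; the function name and data are illustrative only.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores: 1 = patient, 0 = control.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(auc(labels, scores))  # 0.75: three of the four pairs are ordered correctly
```

The pairwise form is quadratic in the number of samples, which is acceptable for a cohort of 80 subjects; larger data sets usually use a sort-based computation.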

Predictability of Mortality in Patients With Myocardial Injury After Noncardiac Surgery Based on Perioperative Factors via Machine Learning: Retrospective Study

JMIR Medical Informatics ◽

10.2196/32771 ◽

2021 ◽

Vol 9 (10) ◽

pp. e32771

Author(s):

Seo Jeong Shin ◽

Jungchan Park ◽

Seung-Hwa Lee ◽

Kwangmo Yang ◽

Rae Woong Park

Keyword(s):

Machine Learning ◽

Clinical Data ◽

Myocardial Injury ◽

Learning Algorithms ◽

Noncardiac Surgery ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Data Set ◽

Extreme Gradient Boosting ◽

The Impact

Background: Myocardial injury after noncardiac surgery (MINS) is associated with increased postoperative mortality, but the relevant perioperative factors that contribute to the mortality of patients with MINS have not been fully evaluated. Objective: To establish a comprehensive body of knowledge relating to patients with MINS, we researched the best performing predictive model based on machine learning algorithms. Methods: Using clinical data from 7629 patients with MINS from the clinical data warehouse, we evaluated 8 machine learning algorithms for accuracy, precision, recall, F1 score, area under the receiver operating characteristic (AUROC) curve, and area under the precision-recall curve to investigate the best model for predicting mortality. Feature importance and Shapley Additive Explanations values were analyzed to explain the role of each clinical factor in patients with MINS. Results: Extreme gradient boosting outperformed the other models. The model showed an AUROC of 0.923 (95% CI 0.916-0.930). The AUROC of the model did not decrease in the test data set (0.894, 95% CI 0.86-0.922; P=.06). Antiplatelet drugs prescription, elevated C-reactive protein level, and beta blocker prescription were associated with reduced 30-day mortality. Conclusions: Predicting the mortality of patients with MINS was shown to be feasible using machine learning. By analyzing the impact of predictors, markers that should be cautiously monitored by clinicians may be identified.

Prediction of addiction to drugs and alcohol using machine learning: A case study on Bangladeshi population

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v11i5.pp4471-4480 ◽

2021 ◽

Vol 11 (5) ◽

pp. 4471

Author(s):

Md. Ariful Islam Arif ◽

Saiful Islam Sany ◽

Farah Sharmin ◽

Md. Sadekur Rahman ◽

Md. Tarek Habib

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Performance Metrics ◽

Learning Algorithms ◽

Principal Component ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Data Set ◽

Adaptive Boosting ◽

Drugs And Alcohol

Nowadays, addiction to drugs and alcohol has become a significant threat to young people in Bangladesh. As conscientious members of society, we must act to protect these young minds from life-threatening addiction. In this paper, we propose a machine-learning-based approach to forecast the risk of becoming addicted to drugs. First, we identify significant factors for addiction by talking to doctors and drug-addicted people and by reading relevant articles and write-ups. Then we collect data from both addicted and nonaddicted people. After preprocessing the data set, we apply nine well-known machine learning algorithms, namely k-nearest neighbors, logistic regression, SVM, naïve Bayes, classification and regression trees (CART), random forest, multilayer perceptron, adaptive boosting, and gradient boosting machine, to our processed data set and measure the performance of each of these classifiers in terms of several prominent performance metrics. Logistic regression outperforms all other classifiers on all metrics used, attaining an accuracy approaching 97.91%. In contrast, CART performs poorly, with an accuracy approaching 59.37% after applying principal component analysis.

Using Machine Learning Algorithms for Accurate Received Optical Power Prediction of an FSO Link over a Maritime Environment

Photonics ◽

10.3390/photonics8060212 ◽

2021 ◽

Vol 8 (6) ◽

pp. 212

Author(s):

Antonios Lionis ◽

Konstantinos Peppas ◽

Hector E. Nistazakis ◽

Andreas Tsigopoulos ◽

Keith Cohn ◽

...

Keyword(s):

Machine Learning ◽

Optical Power ◽

Machine Learning Algorithms ◽

Dew Point ◽

Coefficient Of Determination ◽

Gradient Boosting ◽

Atmospheric Conditions ◽

Free Space Optical ◽

Power Prediction ◽

Data Set

The performance prediction of an optical communications link over maritime environments has been extensively researched over the last two decades. The various atmospheric phenomena and turbulence effects have been thoroughly explored, and long-term measurements have allowed for the construction of simple empirical models. The aim of this work is to demonstrate the prediction accuracy of various machine learning (ML) algorithms for free-space optical communication (FSO) link performance with respect to real-time, non-linear atmospheric conditions. A large data set of received signal strength indicators (RSSI) for a laser communications link has been collected and analyzed against seven local atmospheric parameters (i.e., wind speed, pressure, temperature, humidity, dew point, solar flux, and air-sea temperature difference). The k-nearest-neighbors (KNN) algorithm, tree-based methods (decision trees, random forest, and gradient boosting), and artificial neural networks (ANN) have been employed and compared with each other using the root mean square error (RMSE) and the coefficient of determination (R²) of each model as the primary performance indices. The regression analysis revealed an excellent fit for all ML models, indicative of their ability to offer a significant improvement in FSO performance modeling as compared to traditional regression models. The ANN approach was found to be the best-performing model in terms of R² (0.94867), while random forests achieved the best RMSE (7.37).
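The two performance indices used above have standard definitions, sketched here with made-up numbers. The function names and sample values are illustrative and do not come from the paper.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error: typical size of a prediction error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical observations and model predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(rmse(y_true, y_pred))       # 0.5
print(r_squared(y_true, y_pred))  # 0.8
```

Under these definitions the two indices are linked by R² = 1 - n·RMSE²/SS_tot, so on one shared set of observations a lower RMSE always means a higher R². If two models rank differently on the two indices, the indices were presumably computed on different subsets or with different conventions.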

Forecasting US movies box office performances in Turkey using machine learning algorithms

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189120 ◽

2020 ◽

Vol 39 (5) ◽

pp. 6579-6590

Author(s):

Sandy Çağlıyor ◽

Başar Öztayşi ◽

Selime Sezgin

Keyword(s):

Machine Learning ◽

Global Economy ◽

Learning Algorithms ◽

Forecast Model ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

High Stakes ◽

Box Office ◽

Industry Forecast ◽

The Impact

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. Nevertheless, these models are mostly used for predicting domestic performance, and the industry still struggles to predict box office performance in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. A dataset of 1559 movies is constructed from various sources. First, independent variables are grouped as pre-release, distributor type, and international distribution based on their characteristics. The number of attendances is discretized into three classes. Four popular machine learning algorithms (artificial neural networks, decision tree regression, gradient boosting tree, and random forest) are employed, and the impact of each group is observed by comparing the performance of the models. Then the number of target classes is increased to five and eight, and the results are compared with the previously developed models in the literature.

In silico Prediction of Inhibitory Constant of Thrombin Inhibitors Using Machine Learning

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207322666181220130232 ◽

2019 ◽

Vol 21 (9) ◽

pp. 662-669 ◽

Cited By ~ 1

Author(s):

Junnan Zhao ◽

Lu Zhu ◽

Weineng Zhou ◽

Lingfeng Yin ◽

Yuchen Wang ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Regression Tree ◽

Large Data ◽

Thrombin Inhibitors ◽

Coagulation Cascade ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Descriptor Selection

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors based on a large data set by using machine learning methods. Because it can find non-intuitive regularities in high-dimensional datasets, machine learning can be used to build effective predictive models. A total of 6554 descriptors for each compound were collected, and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods, including multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT), and Support Vector Machine (SVM), were implemented to build prediction models with these selected descriptors. Results: The SVM model was the best among these methods, with R²=0.84, MSE=0.55 for the training set and R²=0.83, MSE=0.56 for the test set. Several validation methods, such as the y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
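The y-randomization test mentioned above checks that a model's fit is not a chance artifact: the response values are shuffled, the model is refit, and the resulting scores should collapse. The sketch below uses a one-descriptor linear fit as a stand-in for the paper's models; the data, the number of rounds, and the function names are assumptions for illustration.

```python
import random

def fit_r2(xs, ys):
    """R^2 of a one-descriptor least-squares fit (squared Pearson r)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy) if sxx and syy else 0.0

def y_randomization(xs, ys, rounds=100, seed=0):
    """Return (R^2 on real data, best R^2 over shuffled responses)."""
    rng = random.Random(seed)
    real = fit_r2(xs, ys)
    shuffled_scores = []
    for _ in range(rounds):
        permuted = list(ys)
        rng.shuffle(permuted)  # break the descriptor/response link
        shuffled_scores.append(fit_r2(xs, permuted))
    return real, max(shuffled_scores)

# Hypothetical descriptor values with a truly linear response.
xs = [float(i) for i in range(30)]
ys = [0.5 * x + 1.0 for x in xs]
real, best_shuffled = y_randomization(xs, ys)
print(real, best_shuffled)
```

A model passes the test when the score on the real data clearly exceeds every score obtained on shuffled responses.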
