Ensemble Machine Learning Assisted Reservoir Characterization Using Field Production Data–An Offshore Field Case Study

Baozhong Wang; Jyotsna Sharma; Jianhua Chen; Patricia Persaud

doi:10.3390/en14041052

Energies ◽

10.3390/en14041052 ◽

2021 ◽

Vol 14 (4) ◽

pp. 1052

Author(s):

Baozhong Wang ◽

Jyotsna Sharma ◽

Jianhua Chen ◽

Patricia Persaud

Keyword(s):

Machine Learning ◽

Random Forest ◽

Reservoir Characterization ◽

Time Lapse ◽

Production Data ◽

Oil Saturation ◽

Ensemble Machine Learning ◽

Input Parameters ◽

Saturation Profiles ◽

Field Production

Estimation of fluid saturation is an important step in dynamic reservoir characterization. Machine learning techniques have been increasingly used in recent years for reservoir saturation prediction workflows. However, most of these studies require input parameters derived from cores, petrophysical logs, or seismic data, which may not always be readily available. Additionally, very few studies incorporate production data, which are an important reflection of the dynamic reservoir properties and typically the most frequently and reliably measured quantity throughout the life of a field. In this research, a random forest ensemble machine learning algorithm is implemented that uses field-wide production and injection data (both measured at the surface) as the only input parameters to predict time-lapse oil saturation profiles at well locations. The algorithm is optimized using feature selection based on feature importance scores and the Pearson correlation coefficient, in combination with geophysical domain knowledge. The workflow is demonstrated using actual field data from a structurally complex, heterogeneous, and heavily faulted offshore reservoir. The random forest model captures the trends from three and a half years of historical field production, injection, and simulated saturation data to predict future time-lapse oil saturation profiles at four deviated well locations with over 90% R-squared, less than 6% Root Mean Square Error, and less than 7% Mean Absolute Percentage Error in each case.
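As a rough illustration of the three error measures named in this abstract, the following sketch computes R-squared, Root Mean Square Error, and Mean Absolute Percentage Error for a pair of sequences. The function names and the sample values are hypothetical, and the abstract does not say how its percentage RMSE is normalized, so the sketch returns plain RMSE in the units of the data.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the data."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error; assumes no true value is zero."""
    ratios = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * ratios / len(y_true)


# Hypothetical oil saturation fractions, not taken from the paper.
simulated = [0.62, 0.58, 0.55, 0.51]
predicted = [0.60, 0.59, 0.54, 0.50]
print(r_squared(simulated, predicted), rmse(simulated, predicted), mape(simulated, predicted))
```

MAPE divides by the true value, so it is only meaningful when the target never reaches zero; for saturation profiles that touch zero a different relative measure would be needed.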

Learning from Imbalanced Educational Data Using Ensemble Machine Learning Algorithms

Webology ◽

10.14704/web/v18si01/web18053 ◽

2021 ◽

Vol 18 (Special Issue 01) ◽

pp. 183-195

Author(s):

Thingbaijam Lenin ◽

N. Chandrasekaran

Keyword(s):

Machine Learning ◽

Random Forest ◽

Missing Values ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Adaptive Boosting ◽

Stochastic Gradient Boosting ◽

Ensemble Machine Learning ◽

Learning Techniques ◽

Student’s Performance

Student’s academic performance is one of the most important parameters for evaluating the standard of any institute. It has become of paramount importance for any institute to identify students at risk of underperforming, failing, or even dropping out of a course. Machine learning techniques may be used to develop a model for predicting a student’s performance as early as the time of admission. The task is challenging, however, because the educational data available for modelling are usually imbalanced. We explore ensemble machine learning techniques, namely a bagging algorithm, random forest (rf), and the boosting algorithms adaptive boosting (adaboost), stochastic gradient boosting (gbm), and extreme gradient boosting (xgbTree), in an attempt to develop a model for predicting the performance of students at a private university in Meghalaya using three categories of data: demographic, prior academic record, and personality. The collected data are found to be highly imbalanced and also contain missing values. We employ the k-nearest neighbor (knn) data imputation technique to handle the missing values. The models are developed on the imputed data with 10-fold cross-validation and are evaluated using precision, specificity, recall, and kappa metrics. As the data are imbalanced, we avoid using accuracy as the evaluation metric and instead use balanced accuracy and F-score. We compare the ensemble techniques with the single classifier C4.5. The best result is provided by random forest and adaboost, with an F-score of 66.67%, balanced accuracy of 75%, and accuracy of 96.94%.
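The gap between accuracy and the other two measures can be made concrete with a small sketch. The confusion counts below are hypothetical and are not taken from the paper; they were chosen only because they give values close to the reported figures, which shows how a classifier can be highly accurate on an imbalanced set while finding only half of the minority class. The function name `imbalance_metrics` is likewise an assumption.

```python
def imbalance_metrics(tp, fp, tn, fn):
    """Accuracy, balanced accuracy, and F-score from binary confusion counts."""
    recall = tp / (tp + fn)            # sensitivity on the minority class
    specificity = tn / (tn + fp)       # recall on the majority class
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "balanced_accuracy": (recall + specificity) / 2,
        "f_score": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts: 6 at-risk students out of 98, of whom 3 are detected.
metrics = imbalance_metrics(tp=3, fp=0, tn=92, fn=3)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts accuracy is about 0.969 even though recall on the minority class is only 0.5, which is why balanced accuracy (0.75) and F-score (about 0.667) are the more informative numbers here.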

Subsurface Characterization Using Ensemble Machine Learning

10.4043/31061-ms ◽

2021 ◽

Author(s):

Gorka G Leiceaga ◽

Robert Balch ◽

George El-kaseeh

Keyword(s):

Machine Learning ◽

Reservoir Characterization ◽

Gamma Ray ◽

Complex Model ◽

Scientific Model ◽

Adequate Model ◽

Subsurface Characterization ◽

Ensemble Machine Learning ◽

Data Driven Approach ◽

High Level

Reservoir characterization is an ambitious challenge that aims to predict variations within the subsurface using fit-for-purpose information that makes physical and geological sense. To properly achieve subsurface characterization, artificial intelligence (AI) algorithms may be used. Machine learning, a subset of AI, is a data-driven approach that has exploded in popularity during the past decades in industries such as healthcare, banking and finance, cryptocurrency, data security, and e-commerce. An advantage of machine learning methods is that they can produce results without first establishing a complete theoretical scientific model for a problem, with a set of complex model equations to be solved analytically or numerically. The principal challenge of machine learning lies in obtaining enough training information, which is essential for an adequate model that allows prediction with a high level of accuracy. Ensemble machine learning in reservoir characterization studies is a candidate for reducing subsurface uncertainty by integrating seismic and well data. In this article, a bootstrap aggregating algorithm is evaluated to determine its potential as a subsurface discriminator. The algorithm fits decision trees on various sub-samples of a dataset and uses averaging to improve the accuracy of the prediction without over-fitting. The gamma ray results from our test dataset show a high correlation with the measured logs, giving confidence in our workflow applied to subsurface characterization.
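The bootstrap aggregating idea described in this abstract (fit trees on resampled data, then average) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' workflow: the trees are reduced to one-split regression stumps on a single input, and the names `fit_stump`, `fit_bagging`, and `predict_bagging` as well as the toy data are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression stump by minimising squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not right:  # splitting at the largest x leaves nothing on the right
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = sum((y - left_mean) ** 2 for y in left)
        err += sum((y - right_mean) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, threshold, left_mean, right_mean)
    if best is None:  # every x in this sample is identical: predict the mean
        mean = sum(ys) / len(ys)
        return (xs[0], mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_bagging(xs, ys, n_estimators=25, seed=0):
    """Fit each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_bagging(stumps, x):
    """Average the individual stump predictions."""
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)


# Toy step-shaped data, purely illustrative.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
model = fit_bagging(xs, ys)
print(predict_bagging(model, 2.0), predict_bagging(model, 7.0))
```

Averaging over resampled fits smooths out the dependence of any single stump on which samples it happened to see, which is the variance-reduction effect the abstract refers to.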

Coal Pit Mapping with Random Forest-Based Ensemble Machine Learning at Lower Benue Trough

International Journal of Scientific and Research Publications (IJSRP) ◽

10.29322/ijsrp.10.12.2020.p10851 ◽

2020 ◽

Vol 10 (12) ◽

pp. 470-473

Author(s):

Okeke Francis Ifeanyi ◽

Ibrahim Adesina Adekunle ◽

Echeonwu Emmanuel Chinyere

Keyword(s):

Machine Learning ◽

Random Forest ◽

Benue Trough ◽

Ensemble Machine Learning ◽

Lower Benue Trough

Lithofacies Classification of Carbonate Reservoirs Using Advanced Machine Learning: A Case Study from a Southern Iraqi Oil Field

10.4043/31114-ms ◽

2021 ◽

Author(s):

Mohammed A. Abbas ◽

Watheq J. Al-Mudhafar

Keyword(s):

Machine Learning ◽

Random Forest ◽

Reservoir Characterization ◽

Carbonate Reservoir ◽

Oil Field ◽

Machine Learning Techniques ◽

Classification Algorithms ◽

Classification Rate ◽

Porosity And Permeability ◽

Lithofacies Classification

Estimating rock facies from petrophysical logs in non-cored wells in complex carbonates is a crucial task for improving reservoir characterization and field development. Thus, it is essential to identify the lithofacies that discriminate the reservoir intervals based on their flow and storage capacity. In this paper, an innovative procedure is adopted for lithofacies classification using data-driven machine learning in a well from the Mishrif carbonate reservoir in the giant Majnoon oil field, Southern Iraq. The Random Forest method was adopted for lithofacies classification using well logging data in a cored well to predict their distribution in other non-cored wells. Furthermore, three advanced statistical algorithms, Logistic Boosting Regression, Bagging Multivariate Adaptive Regression Spline, and Generalized Boosting Modeling, were implemented and compared to the Random Forest approach to attain the most realistic lithofacies prediction. The dataset includes the measured discrete lithofacies distribution and the original log curves of caliper, gamma ray, neutron porosity, bulk density, sonic, and deep and shallow resistivity, all available over the entire reservoir interval. Prior to applying the four classification algorithms, a random subsampling cross-validation was conducted on the dataset to produce training and testing subsets for modeling and prediction, respectively. After predicting the discrete lithofacies distribution, the Confusion Table and the Correct Classification Rate Index (CCI) were employed as further criteria to analyze and compare the effectiveness of the four classification algorithms. The results of this study revealed that Random Forest was more accurate in lithofacies classification than the other techniques. It led to excellent matching between the observed and predicted discrete lithofacies, attaining a CCI of 100% on the training subset and 96.67% on the validation subset.
Further validation of the resulting facies model was conducted by comparing each of the predicted discrete lithofacies with the available ranges of porosity and permeability obtained from the NMR log. We observed that rudist-dominated lithofacies correlate with rock of higher porosity and permeability. In contrast, the argillaceous lithofacies correlate with rock of lower porosity and permeability. Additionally, these high and low ranges of permeability were later compared with the oil rate obtained from the PLT log data. The high and low ranges of permeability were found to correlate well with the high and low oil rate logs, respectively. In conclusion, high-quality estimation of lithofacies in non-cored intervals and wells is a crucial reservoir characterization task for obtaining meaningful permeability-porosity relationships and capturing realistic reservoir heterogeneity. The application of machine learning techniques drives down costs, saves time, and allows for uncertainty mitigation in lithofacies classification and prediction. The entire workflow was done in R, an open-source statistical computing language. It can easily be applied to other reservoirs to achieve a similarly improved overall reservoir characterization.
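To make the two evaluation criteria in this abstract concrete, the sketch below builds a confusion table and a correct classification rate from observed and predicted labels. It assumes the rate is simply the share of samples whose predicted lithofacies equals the observed one, which the abstract does not define explicitly. The paper's workflow was done in R; Python is used here only for illustration, and the function names and label sequences are hypothetical.

```python
from collections import Counter


def confusion_table(observed, predicted):
    """Count how often each (observed, predicted) label pair occurs."""
    return Counter(zip(observed, predicted))


def correct_classification_rate(observed, predicted):
    """Fraction of samples on the diagonal of the confusion table."""
    table = confusion_table(observed, predicted)
    correct = sum(count for (obs, pred), count in table.items() if obs == pred)
    return correct / len(observed)


# Hypothetical labels for five depth samples, not taken from the paper.
observed = ["rudist", "rudist", "argillaceous", "argillaceous", "rudist"]
predicted = ["rudist", "argillaceous", "argillaceous", "argillaceous", "rudist"]

for pair, count in sorted(confusion_table(observed, predicted).items()):
    print(pair, count)
print(correct_classification_rate(observed, predicted))
```

The off-diagonal entries of the table show which facies are being confused with which, information that the single rate hides.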

ANALYSIS OF SINGLE AND ENSEMBLE MACHINE LEARNING CLASSIFIERS FOR PHISHING ATTACKS DETECTION

International Journal of Computer Systems & Software Engineering ◽

10.15282/ijsecs.7.2.2021.5.0088 ◽

2021 ◽

Vol 7 (2) ◽

pp. 44-49

Author(s):

Oyelakin A. M ◽

Alimi O. M ◽

Mustapha I. O ◽

Ajiboye I. K

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Decision Trees ◽

Random Forest Algorithm ◽

Ensemble Techniques ◽

Learning Classifiers ◽

Phishing Attacks ◽

Ensemble Machine Learning

Phishing attacks have been used in different ways to harvest the confidential information of unsuspecting internet users. To stem the tide of phishing-based attacks, several machine learning techniques have been proposed in the past. However, fewer studies have investigated single and ensemble machine learning-based models for the classification of phishing attacks. This study carried out a performance analysis of selected single and ensemble machine learning (ML) classifiers in phishing classification. The focus is to investigate how these algorithms behave in the classification of phishing attacks in the chosen dataset. Logistic Regression and Decision Trees were chosen as single learning classifiers, while simple voting techniques and Random Forest were used as the ensemble machine learning algorithms. Accuracy, Precision, Recall, and F1-score were used as performance metrics. The Logistic Regression algorithm recorded 0.86 for accuracy, 0.89 for precision, 0.87 for recall, and 0.81 for F1-score. Similarly, the Decision Trees classifier achieved 0.87 for accuracy, 0.83 for precision, 0.88 for recall, and 0.81 for F1-score. The voting ensemble achieved 0.92 for accuracy, 0.90 for precision, 0.92 for recall, and 0.92 for F1-score. The Random Forest algorithm recorded 0.98, 0.97, 0.98, and 0.97 for accuracy, precision, recall, and F1-score, respectively. In the experimental analyses, the Random Forest algorithm outperformed the simple averaging classifier and the two single algorithms used for phishing URL detection. The study established that the ensemble techniques used in the experiments are more effective for phishing URL identification than the single classifiers.
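A simple voting ensemble of the kind mentioned in this abstract can be sketched as a hard majority vote over the label predictions of several classifiers. The abstract does not specify the voting rule, so hard majority voting is an assumption here, and the function name and the three prediction lists are hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine label lists (one per model, equal length) by hard majority vote."""
    voted = []
    for labels in zip(*predictions_per_model):
        # most_common(1) returns the label with the highest count; on a tie it
        # returns the label that appeared first among the models.
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Hypothetical predictions for four URLs (1 = phishing, 0 = legitimate).
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]
print(majority_vote([model_a, model_b, model_c]))
```

Using an odd number of binary classifiers avoids ties, so the tie-breaking rule noted in the comment never comes into play.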

Improved Reservoir Characterization Incorporating Time-Lapse Seismic And Production Data

10.3997/2214-4609-pdb.223.007 ◽

1997 ◽

Author(s):

Xuri Huang ◽

Geoffrey A. King

Keyword(s):

Reservoir Characterization ◽

Time Lapse ◽

Production Data

Ensemble Machine Learning: The Latest Development in Computational Intelligence for Petroleum Reservoir Characterization

10.2118/168111-ms ◽

2013 ◽

Cited By ~ 2

Author(s):

Fatai A. Anifowose

Keyword(s):

Machine Learning ◽

Computational Intelligence ◽

Reservoir Characterization ◽

Petroleum Reservoir ◽

Ensemble Machine Learning

The Comparison of Tree-Based Ensemble Machine Learning for Classifying Public Datasets

RSF Conference Series: Engineering and Technology ◽

10.31098/cset.v1i1.412 ◽

2021 ◽

Vol 1 (1) ◽

pp. 407-413

Author(s):

Nur Heri Cahyana ◽

Yuli Fauziah ◽

Agus Sasmito Aribowo

Keyword(s):

Machine Learning ◽

Random Forest ◽

Class Size ◽

Good Method ◽

Test Dataset ◽

Ensemble Machine Learning ◽

Tree Classifier ◽

Public Datasets ◽

Number Of Classes ◽

The Relationship

This study aims to determine the best tree-based ensemble machine learning methods for classifying a total of 34 datasets. It also examines the relationship between the number of records and columns of the test dataset and the number of estimators (trees) for each ensemble model, namely Random Forest, Extra Tree Classifier, AdaBoost, and Gradient Boosting. The four methods are compared on maximum accuracy and on the number of estimators when tested on the test dataset. From the results of the experiments, the best tree-based ensemble method and the best number of estimators were obtained for each dataset used in the study. The Extra Tree method is the best classifier for both binary-class and multi-class problems. Random Forest is good for multi-class problems, and AdaBoost is a fairly good method for binary-class problems. The number of rows, columns, and classes is positively correlated with the number of estimators. This means that processing a dataset with a large row, column, or class size requires more estimators than processing a dataset with a small row, column, or class size. However, the number of classes and accuracy are negatively correlated, meaning that accuracy decreases as the number of classes increases.

Price Prediction for Pre-Owned Cars Using Ensemble Machine Learning Techniques

10.3233/apc210194 ◽

2021 ◽

Author(s):

Chetna Longani ◽

Sai Prasad Potharaju ◽

Sandhya Deore

Keyword(s):

Machine Learning ◽

Random Forest ◽

Mean Squared Error ◽

Machine Learning Techniques ◽

Random Forest Algorithm ◽

Fair Price ◽

Ensemble Machine Learning ◽

Comparable Performance ◽

Used Car ◽

Used Cars

Pre-owned cars, or so-called used cars, have large markets across the globe. Before acquiring a used car, the buyer should be able to decide whether the price asked for the car is fair. Several factors, including mileage, year, model, make, run, and many more, need to be considered before buying any pre-owned car. Both the seller and the buyer should have a fair deal. This paper presents a system that has been implemented to predict a fair price for any pre-owned car. The system works well for predicting the price of used cars in the Mumbai region. Two ensemble techniques in machine learning, the Random Forest algorithm and eXtreme Gradient Boost, are deployed to develop models that can predict an appropriate price for used cars. The techniques are compared to determine the better one. Both methods provided comparable performance, with eXtreme Gradient Boost outperforming the Random Forest algorithm: Random Forest recorded a Root Mean Squared Error of 3.44, whereas eXtreme Gradient Boost recorded 0.53.

Disease Classification and Prediction using Ensemble Machine Learning Classification Algorithm

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f5507.039621 ◽

2021 ◽

Vol 9 (6) ◽

pp. 202-214

Author(s):

Preeth B.Meena ◽

Radha, P.

Keyword(s):

Machine Learning ◽

Random Forest ◽

Medical Records ◽

Machine Learning Algorithms ◽

Ensemble Prediction ◽

Support Vector ◽

Disease Rate ◽

Medical Field ◽

Machine Learning Classification ◽

Ensemble Machine Learning

In today’s scenario, disease prediction plays an important role in the medical field. Early detection of diseases is essential because of fast food habits and lifestyle. In my previous study on predicting diseases from radiology test reports, three classifiers, Naïve Bayes (NB), Support Vector Machine (SVM), and Modified Extreme Learning Machine (MELM), were used to classify the disease as positive or negative and to increase the accuracy of the results. To increase the efficiency of disease prediction and to find which disease affects society the most, an ensemble machine learning algorithm is used. The large volume of data from the healthcare industry was preprocessed, categorized, and analyzed to predict which patients should be treated and given priority and which disease hits society the most. Ensemble machine learning is popular in the medical industry for a variety of reasons. The classifiers used are K Nearest Neighbors, Nearest Mean Classifier, Mean Feature Voting Classifier, KDtree KNN, and Random Forest. Automating manual processes in the medical field has become important. Electronic medical records and significant advances in health care have provided an opportunity to find out which patients need to be given more importance. Several methodologies and techniques were used to preprocess the data in order to meet the study’s requirements. To improve the performance of the machine learning algorithms, feature selection was performed using Tabu search. When ensemble prediction is combined with the Random Forest algorithm as the combiner, the results are more reliable. The aim of this study is to create a system that classifies medical records as diseased or not and finds out which disease rate has increased. This research will help individuals in society to get treated easily and take preventive measures to avoid diseases.
