Abstract
The problem of missing data is a frequent occurrence in well production history records. Due to network outage, facility maintenance or equipment failure, the time series production data measured from surface and downhole gauges can be intermittent. The fragmentary data are an obstacle for reservoir management. The incomplete dataset is commonly simplified by omitting all observations with missing values, which will lead to significant information loss. Thus, to fill the missing data gaps, in this study, we developed and tested several missing data imputation approaches using machine learning and deep learning methods.
Traditional data imputation methods such as interpolation and counting most frequent values can introduce bias to the data as the correlations between features are not considered. Thus, in this study, we investigated several multivariate imputation algorithms that use the entire set of available data streams to estimate the missing values. The methods use a full suite of well measurements, including wellhead and downhole pressures, oil, water and gas flow rates, surface and downhole temperatures, choke settings, etc. Any parameter that has gaps in its recorded history can be imputed from the other available data streams.
The models were tested on both synthetic and real datasets from operating Norwegian and Abu Dhabi reservoirs. Based on the characteristics of the field data, we introduced different types of continuous missing distributions, which are the combinations of single-multiple missing sections in a long-short time span, to the complete dataset. We observed that as the missing time span expands, the stability of the more successful methods can be kept to a threshold of 30% of the entire dataset. In addition, for a single missing section over a shorter period, which could represent a weather perturbation, most methods we tried were able to achieve high imputation accuracy. In the case of multiple missing sections over a longer time span, which is typical of gauge failures, other methods were better candidates to capture the overall correlation in the multivariate dataset.
Most missing data problems addressed in our industry focus on single feature imputation. In this study, we developed an efficient procedure that enables fast reconstruction of the entire production dataset with multiple missing sections in different variables. Ultimately, the complete information can support the reservoir history matching process, production allocation, and develop models for reservoir performance prediction.