Face-Detection Algorithm Handles Big Data To Help Identify Candidates for Restimulation
The recent proliferation of subsurface data from instrumented wells has made it difficult for traditional production-data-analysis methods to extract useful information for reservoir management. This paper demonstrates the viability of a production-data-classification approach adapted from real-time face detection for identifying restimulation candidates. The approach has the potential to serve as a big-data analytic tool for long-duration production-data analysis and for screening restimulation candidates.
Restimulation treatments in producing shale wells have the potential to improve economic performance by increasing the conductivity of existing fractures or enhancing their contact with the formation. The influence of matrix and fracture characteristics on the success of restimulation, however, is not completely understood, which has led to uncertainty in determining favorable candidate wells. Several methods to select restimulation candidates have been proposed. These methods, however, are time-consuming and tend to require detailed input data or exhibit a lack of generality for other reservoir settings.
This paper aims to address these challenges with a new methodology for fast and robust analysis of production data from hydraulically fractured wells. A dual-permeability forward-flow modeling approach is used to generate multiple realizations of production-rate profiles by modifying fracture and other parameters. Using these data, pattern-recognition tools are applied to help uncover trends associated with favorable and unfavorable restimulation candidates. This is achieved using a binary classification framework adapted from real-time face detection, which uses simple numerical criteria computed directly from raw flow-rate data, thus eliminating the need for detailed information and promoting computational efficiency. The algorithm also provides probabilistic predictions, which serve as a means to rank candidate wells. While training the classifier has the potential to be computationally intensive, applying the trained classifier to the observed data is extremely fast, making the method useful for real-time classification of well performance.
The Viola-Jones face-detection algorithm is a binary classification scheme that takes labeled examples of images containing faces and nonfaces and develops a set of numerical criteria for distinguishing between the two categories. Using these rules, an arbitrary test image can be analyzed and classified as a face or a nonface.
First, training data are made up of images sorted into face and nonface categories, with each individual image represented as a 2D array of pixel intensities. Second, each training image is characterized using a set of simple spatial templates called Haar-like features. Fig. 1 displays the geometries of Haar-like features used in the Viola-Jones algorithm. In each case, a feature score is computed by subtracting the sum of pixel values in the white region from the sum of pixel values in the gray region. By shifting and scaling these features across the image window, the essential patterns characterizing the observed face or signal can be identified in the pattern-recognition scheme.
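As a rough illustration of the feature score described above, the following Python sketch evaluates a two-rectangle horizontal feature on a small image. It uses an integral image (summed-area table), the device the Viola-Jones algorithm uses to obtain rectangle sums in constant time. The function names, and the choice of the left half as the white region and the right half as the gray region, are assumptions made here for illustration; the actual geometries are those shown in Fig. 1.

```python
from typing import List

Image = List[List[float]]


def integral_image(img: Image) -> Image:
    """Summed-area table padded with a leading row and column of zeros."""
    rows, cols = len(img), len(img[0])
    ii = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0.0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii: Image, top: int, left: int, height: int, width: int) -> float:
    """Sum of pixel values in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_horizontal(ii: Image, top: int, left: int,
                        height: int, half_width: int) -> float:
    """Feature score: gray-region sum minus white-region sum.

    Assumption: white is the left half, gray is the right half.
    """
    white = rect_sum(ii, top, left, height, half_width)
    gray = rect_sum(ii, top, left + half_width, height, half_width)
    return gray - white


image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
table = integral_image(image)
print(two_rect_horizontal(table, top=0, left=0, height=2, half_width=2))
```

Shifting `top` and `left` and varying `height` and `half_width` produces the family of scores that the later feature-selection step chooses from.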
Next, given a large collection of scores derived from multiple permutations of feature geometries and dimensions across all training images, a statistical learning technique known as AdaBoost (adaptive boosting) is used to select the subset of features that best captures the difference between face and nonface examples. The output of the AdaBoost routine is a linear combination of the most discriminative features and a threshold for predicting class membership.
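The boosting step can be sketched in a few lines of Python. This is a minimal, textbook discrete AdaBoost with single-feature threshold rules (decision stumps) as weak learners, not the authors' implementation; the exhaustive threshold search, the function names, and the toy data are assumptions for illustration. Each row of `X` holds the feature scores for one training sample, and labels are +1 or -1.

```python
import math
from typing import List, Sequence, Tuple

# (feature index, threshold, polarity, weight alpha)
Stump = Tuple[int, float, int, float]


def stump_predict(x: Sequence[float], feature: int,
                  threshold: float, polarity: int) -> int:
    """Weak rule: +1 if polarity * (x[feature] - threshold) >= 0, else -1."""
    return 1 if polarity * (x[feature] - threshold) >= 0 else -1


def train_adaboost(X: List[Sequence[float]], y: List[int],
                   n_rounds: int) -> List[Stump]:
    n = len(X)
    weights = [1.0 / n] * n
    model: List[Stump] = []
    for _ in range(n_rounds):
        best = None
        # Exhaustive search for the stump with the lowest weighted error.
        for j in range(len(X[0])):
            for t in sorted({row[j] for row in X}):
                for pol in (1, -1):
                    err = sum(w for w, xi, yi in zip(weights, X, y)
                              if stump_predict(xi, j, t, pol) != yi)
                    if best is None or err < best[0]:
                        best = (err, j, t, pol)
        err, j, t, pol = best
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1.0 - err) / err)
        model.append((j, t, pol, alpha))
        # Increase the weight of misclassified samples, then renormalize.
        weights = [w * math.exp(-alpha * yi * stump_predict(xi, j, t, pol))
                   for w, xi, yi in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def boosted_score(model: List[Stump], x: Sequence[float]) -> float:
    """Linear combination of the selected weak rules."""
    return sum(a * stump_predict(x, j, t, pol) for j, t, pol, a in model)


def predict(model: List[Stump], x: Sequence[float],
            threshold: float = 0.0) -> int:
    return 1 if boosted_score(model, x) >= threshold else -1


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]]
y = [-1, -1, 1, 1]
model = train_adaboost(X, y, n_rounds=2)
print([predict(model, x) for x in X])
```

The features that end up in `model` are the selected subset, and the `threshold` argument of `predict` plays the role of the class-membership threshold mentioned above.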
Finally, a cascade training procedure is used to build a series of increasingly complex AdaBoost classifiers. This allows straightforward cases to be eliminated early and saves expensive computation for more-difficult images.
Production-Data Classifier Training
To apply the face-detection methodology to production-data analysis, each well's rate history is represented as a 1D vectorized image, with pixel values indicating flow-rate magnitudes.
By representing well data in this form, the face-detection framework can be extended to classify production data using horizontal 1D features such as Features 1 and 2 in Fig. 1. In other words, if two categories of production data are specified (corresponding to good and poor restimulation candidates) on the basis of subsurface-fracture characteristics, a binary classifier can be trained to distinguish between the two sets of data.
Given the means to characterize production data, the next step is to apply pattern-recognition techniques to distinguish between different categories of data. In a manner similar to the face-detection framework, this is achieved by use of a cascaded binary classification scheme, which is presented in detail in the complete paper.
The next objective is to generate a list of feature scores for each data sample. This is accomplished by translating each feature geometry in time and computing the difference between cumulative normalized production rates in the white and gray regions for different scales. Intuitively, this implies that the algorithm uses gradients and jumps in rate to characterize the influence of fractures and other pertinent reservoir parameters.
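The scoring step just described might look like the following Python sketch, which slides a two-segment 1D feature along a rate history at several scales. Normalizing by the peak rate, the white-then-gray ordering of the segments, and all function names are assumptions made for illustration; the complete paper defines the actual normalization and feature set. A prefix sum plays the role of cumulative production, so each segment sum costs one subtraction.

```python
from itertools import accumulate
from typing import List, Sequence


def normalize(rates: Sequence[float]) -> List[float]:
    """Scale rates by the peak rate (an assumed normalization)."""
    peak = max(rates)
    return [q / peak for q in rates]


def prefix_sums(values: Sequence[float]) -> List[float]:
    """Cumulative normalized production, with a leading zero."""
    return [0.0] + list(accumulate(values))


def two_segment_score(prefix: Sequence[float], start: int,
                      half_width: int) -> float:
    """Gray-segment sum minus white-segment sum.

    Assumption: the white segment comes first in time, the gray second.
    """
    mid = start + half_width
    end = mid + half_width
    white = prefix[mid] - prefix[start]
    gray = prefix[end] - prefix[mid]
    return gray - white


def feature_scores(rates: Sequence[float],
                   half_widths: Sequence[int]) -> List[float]:
    """Scores for every time shift of the feature at every scale."""
    prefix = prefix_sums(normalize(rates))
    scores = []
    for h in half_widths:
        for start in range(len(rates) - 2 * h + 1):
            scores.append(two_segment_score(prefix, start, h))
    return scores


monthly_rates = [100.0, 80.0, 60.0, 40.0]  # hypothetical declining well
print(feature_scores(monthly_rates, half_widths=[1, 2]))
```

For a steadily declining profile every score is negative, which matches the intuition that the features respond to gradients and jumps in rate.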
Using this information, an ensemble of production profiles can be scanned and the probability that each well belongs to either category can be estimated. This quantifies the uncertainty associated with predicting whether a well is a favorable or unfavorable restimulation candidate. If the probabilities are not very discriminatory, the procedure would suggest that the observed production information is inadequate for making reservoir-development decisions and that other sources of information need to be investigated to support more-robust decisions.
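The summary does not state how the probabilities are obtained from the classifier. One common choice, assumed here purely for illustration, is the logistic link between a boosted score F and the class probability, p = 1 / (1 + exp(-2F)). The Python sketch below applies that link to hypothetical per-well scores and ranks the wells; the well names, scores, and the 0.1 band used to flag uninformative predictions are all made up.

```python
import math
from typing import Dict, List, Tuple


def favorable_probability(score: float) -> float:
    """Assumed logistic link from a boosted score to P(class = +1)."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-2.0 * score))
    e = math.exp(2.0 * score)  # stable form for large negative scores
    return e / (1.0 + e)


def rank_wells(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Wells sorted from most to least likely favorable."""
    pairs = [(well, favorable_probability(s)) for well, s in scores.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def is_uninformative(probability: float, band: float = 0.1) -> bool:
    """Flag predictions too close to 0.5 to discriminate (band is arbitrary)."""
    return abs(probability - 0.5) < band


well_scores = {"Well A": -1.0, "Well B": 0.5, "Well C": 0.05}  # hypothetical
for well, p in rank_wells(well_scores):
    note = " (consider other data sources)" if is_uninformative(p) else ""
    print(f"{well}: {p:.2f}{note}")
```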
The final step in the overall production-data classifier training framework is the cascade training procedure. In the binary classification literature, the true-positive ratio is the proportion of data in the favorable category (+1) that are predicted correctly. The false-positive ratio, on the other hand, is the proportion of data in the unfavorable class (-1) that are incorrectly predicted as members of the favorable class.
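These two ratios can be computed directly from true and predicted labels, as in the short Python sketch below. The function names and the example labels are illustrative only.

```python
from typing import Sequence


def true_positive_ratio(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of favorable (+1) samples that are predicted as +1."""
    preds_for_favorable = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(1 for p in preds_for_favorable if p == 1) / len(preds_for_favorable)


def false_positive_ratio(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of unfavorable (-1) samples that are predicted as +1."""
    preds_for_unfavorable = [p for t, p in zip(y_true, y_pred) if t == -1]
    return sum(1 for p in preds_for_unfavorable if p == 1) / len(preds_for_unfavorable)


labels = [1, 1, 1, -1, -1]
predictions = [1, 1, -1, 1, -1]
print(true_positive_ratio(labels, predictions))   # 2 of 3 favorable wells found
print(false_positive_ratio(labels, predictions))  # 1 of 2 unfavorable wells let through
```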
The cascade training process builds a series of increasingly discriminative AdaBoost classifiers to promote computational efficiency. The final outcome of the training process is a series of increasingly complex AdaBoost models that allows only the favorable class through to later stages. In other words, when an arbitrary well production profile is assessed with the cascade, the trained AdaBoost model at each successive stage is run on the profile, and the profile proceeds to the next stage only if a value of +1 is predicted. In this way, data that display obviously unfavorable characteristics are immediately classified as such at early stages, thus reserving expensive computation for more-difficult cases.
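The early-exit behavior of a trained cascade can be sketched as follows in Python. Each stage is modeled here as a scoring function paired with a threshold, which is an assumed simplification; in practice each scoring function would be a trained AdaBoost model, and the toy stages below are placeholders.

```python
from typing import Callable, List, Sequence, Tuple

# (score function, threshold); a stage predicts +1 when score >= threshold
Stage = Tuple[Callable[[Sequence[float]], float], float]


def cascade_predict(stages: List[Stage],
                    features: Sequence[float]) -> Tuple[int, int]:
    """Return (predicted class, number of stages evaluated).

    A sample is rejected as -1 at the first stage that does not predict +1,
    so later and more expensive stages never run for it.
    """
    for stage_number, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(features) < threshold:
            return -1, stage_number
    return 1, len(stages)


# Placeholder stages: a cheap one-feature check, then a two-feature check.
toy_stages: List[Stage] = [
    (lambda f: f[0], 0.5),
    (lambda f: f[0] + f[1], 1.5),
]

print(cascade_predict(toy_stages, [0.2, 0.9]))  # rejected at stage 1
print(cascade_predict(toy_stages, [0.6, 0.5]))  # rejected at stage 2
print(cascade_predict(toy_stages, [0.9, 0.9]))  # passes every stage
```

Because most clearly unfavorable profiles exit at the first stage, the average cost per well stays low even when the later stages use many features.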
In addition to aiding in the selection of restimulation candidates, this methodology has potential applications in real-time screening of massive volumes of production data to rapidly identify wells that meet other criteria for reservoir development.
Results and Discussion
Using the generated training production data along with class labels denoting favorable and unfavorable restimulation candidates, the proposed Viola-Jones framework has been applied to train a production-data classification model.
Three separate candidate selection criteria also have been considered. In each case, in order to capture the effect of the timing of refracturing, either 2 or 3 years of production data before the restimulation were used. After the training process, the predictive performance of each of these classifiers was tested by assessing a synthetic test set of well-production profiles that were not available to the training algorithm.
Results show that, regardless of the selection criteria for the restimulation candidate, favorable and unfavorable candidates were distinguished with an accuracy of 76–83%. In all cases, while training may be computationally intensive, the final trained classifier was able to provide predictions on the test set in near real time. For example, the classifier based on the criterion of change in cumulative production can make predictions on 100 samples of 2-year well production in 0.35 seconds.
Finally, the proposed approach was validated by use of publicly available production data.
An interesting observation from the results is that, when the classification model was run on field data, all wells restimulated after 3 years were correctly classified, while 50% of the wells restimulated after 2 years were misclassified. This contrasts with the results from an assessment of synthetic data, where slightly better performance is seen in predicting 2-year cases than 3-year cases. This disparity can be attributed to the limited sample size of the field data. In other words, if additional field data were considered, the overall classification accuracy would likely be similar to that reported for the synthetic cases.
This article, written by Special Publications Editor Adam Wilson, contains highlights of paper SPE 187328, “From Face Detection to Fractured-Reservoir Characterization: Big Data Analytics for Restimulation-Candidate Selection,” by Egbadon Udegbe, SPE, Eugene Morgan, SPE, and Sanjay Srinivasan, SPE, The Pennsylvania State University, prepared for the 2017 SPE Annual Technical Conference and Exhibition, San Antonio, Texas, USA, 9–11 October. The paper has not been peer reviewed.
09 May 2019