Machine Learning Optimizes Duvernay Shale-Well Performance

This paper discusses how machine learning, in the form of multiple linear regression and a neural network, was used to optimize completions and well designs in the Duvernay shale. The methodology revealed solutions that could save more than $1 million per well and potentially improve well performance by more than 50%. The work flow described rigorously analyzes the relationships among a large number of well-completion variables, predicts results, and performs optimizations for ideal outcomes. The work flow is not Duvernay-specific and can be applied to other basins and formations.


A fundamental problem for machine learning in many industries is that a responding variable is controlled not by one but by a number of predictor variables. Inferring the relationship between the responding variable and the predictor variables is of key importance. Interactions between predictor variables and noise in the data complicate matters further. This problem can be solved with multiple linear regression or a neural network, both of which use all of the predictor variables together. However, care must be taken to obtain a model that is truly predictive and not merely a result of overfitting the data.
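
For reference, multiple linear regression in its standard textbook form models the responding variable as a weighted sum of all predictor variables at once. The notation below is generic and is not taken from the paper.

```latex
y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i, \qquad i = 1, \dots, n
```

Here y_i is the responding variable for well i, x_ij is the value of the j-th predictor variable for that well, the beta terms are the fitted coefficients, and epsilon_i is the noise. The risk of overfitting grows as the number of predictors p becomes large relative to the number of wells n.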

In unconventional oil and gas reservoirs, well performance generally is characterized either at the well level by detailed technical work such as rate-transient analysis, microseismic, and other techniques or at the field level by statistical methods with ranges for production performance. Refinement of this statistical interpretation generally involves normalizing for only one or two key parameters, such as lateral length or tonnage. Additionally, wells usually are grouped or excluded entirely from the population for various reasons, such as substandard completion design. This introduces bias in the selected wells and reduces the sample size. As a result, this approach is limited to the few key variables identified and is subject to the bias introduced by the well population selected.

Neural networks have been used successfully in the past to optimize completions, but the data sets were limited. More recently, the use of machine learning has grown substantially as more variables have been integrated into the analysis, which reduces reservoir uncertainty.

The goal of the work flow described in the complete paper is to improve on previous methodology by rigorously and statistically refining estimates for well performance without excluding wells and to identify which variables do and do not influence well performance. The goal was accomplished through machine learning in the form of a multiple linear regression and a neural network, and the results from both were compared.

This work flow has been applied to the Duvernay formation around Fox Creek, Alberta, Canada. The Duvernay is an upper Devonian source rock in the Western Canadian Sedimentary Basin. It ranges in thermal maturity from dry gas in the southwest to black oil in the northeast. The formation is a nanopermeability shale that was not developed with vertical or horizontal wells before 2010. All development has consisted of horizontal multifractured wells, with completion reports and production data available in the public domain.

The well population selected included all Duvernay wells in the 9300-km² Fox Creek area. To ensure sufficient production data to gauge well performance accurately, only horizontal Duvernay wells with more than 12 months of public production data were used, and only wells with a mapped initial condensate yield of less than 500 bbl/MMcf were included. The sample captured the reservoir phase windows of dry gas, wet gas, retrograde condensate, and most of the volatile oil window. In these windows, gas rate was used as an indication of relative performance. Because gas rate would not be a good performance metric for black oil, these wells were excluded. In total, 262 wells met the criteria.

The paper discusses and illustrates regression methods for this machine-learning application, including single linear regression, multiple linear regression, and use of neural networks.  

Regression Goals

The work flow has two main goals. The first is to establish the potential effect of each predictor variable on well performance. This allows for optimization by maximizing factors positively affecting well performance and saves on cost if other predictor variables have little to no effect. The second goal involves estimating the range of well performance for an area, given that certain geological or intrinsic properties such as condensate yield cannot be changed. The ability to estimate adds economic value in the potential for high grading, correct facility sizing, and reducing uncertainty.

To achieve both goals, the data are randomly split into three groups (Fig. 1). The split gives the models enough data to train on while still allowing their predictive ability to be tested. The first group, comprising 60% of the wells, provides the training data, which are used to train the multiple linear regression and neural-network models. The second group, comprising 20% of the wells, provides the cross-validation data, which are not used to train the models. Thus, the regressions can be judged on their ability to generalize and fit data on which they have not been trained. The cross-validation group is also used to tune any hyperparameters. The third group, the last 20%, provides the final test data. These data are used only at the end of the work flow and ensure that any hyperparameters tuned to obtain a good match to the cross-validation data are also applicable generally.
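
The three-way split described above can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's code: the function name, the random seed, and the rule of giving the odd leftover well to the cross-validation group are assumptions, chosen so that 262 wells reproduce the group sizes reported later in the article.

```python
import numpy as np

def split_wells(n_wells, train_frac=0.6, seed=0):
    """Randomly partition well indices into training, cross-validation, and test groups.

    Assumption: the wells left after the training group are divided evenly,
    with any odd well going to the cross-validation group.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_wells)          # random ordering of well indices
    n_train = round(train_frac * n_wells)
    n_cv = (n_wells - n_train + 1) // 2
    train = order[:n_train]
    cv = order[n_train:n_train + n_cv]
    test = order[n_train + n_cv:]
    return train, cv, test

train_idx, cv_idx, test_idx = split_wells(262)
print(len(train_idx), len(cv_idx), len(test_idx))  # 157 53 52
```

Fixing the seed keeps the split reproducible, so that the final test wells stay untouched across repeated tuning runs.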

Fig. 1—Statistical testing structure (neural network).


The paper discusses optimization criteria, predictor variables used in the study, and correlations between all variables considered before the regression models were run.

Work Flows

Generalized Work Flow. The generalized work flow consisted of the following steps.

  • Establish a performance metric as the response variable.
  • Gather potential predictor variables in a numerical form—both quantitative variables and categorical/binary variables for discrete outcomes.
  • Randomly split the wells into three groups, as described previously.
  • Run a multiple linear regression with all variables on the training data. Eliminate variables that have no statistical significance.
  • Eliminate the remaining variables one at a time, starting with the least statistically significant. The order of elimination roughly indicates the importance and effect of each variable.
  • Run the neural network with all variables. A range of hyperparameters should be tried with the goal of maximizing the fit on the cross-validation data.
  • Apply both methods to the final test data. The fit on these final test data should represent both models’ ability to generalize and predict.
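
The one-at-a-time elimination step in the list above can be sketched as follows. This is a hedged illustration, not the paper's procedure: it uses the absolute t-statistic of each coefficient as the measure of statistical significance, and the function names, variable names, and synthetic data are all assumptions.

```python
import numpy as np

def t_statistics(X, y):
    """Ordinary least-squares fit with an intercept; returns the t-statistic of each coefficient."""
    A = np.column_stack([np.ones(len(y)), X])
    xtx_inv = np.linalg.inv(A.T @ A)
    beta = xtx_inv @ A.T @ y
    resid = y - A @ beta
    dof = A.shape[0] - A.shape[1]              # observations minus fitted parameters
    sigma2 = resid @ resid / dof               # residual variance estimate
    std_err = np.sqrt(sigma2 * np.diag(xtx_inv))
    return beta / std_err

def elimination_order(X, y, names):
    """Refit repeatedly, each time dropping the predictor with the smallest |t|.

    Returns (name, |t| at removal) pairs, least important predictor first.
    """
    keep = list(range(X.shape[1]))
    order = []
    while keep:
        abs_t = np.abs(t_statistics(X[:, keep], y)[1:])   # skip the intercept
        weakest = int(np.argmin(abs_t))
        order.append((names[keep.pop(weakest)], float(abs_t[weakest])))
    return order

# Synthetic stand-in data (not from the paper): two real drivers, two noise columns.
rng = np.random.default_rng(1)
X = rng.normal(size=(157, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=157)
names = ["signal_1", "signal_2", "noise_1", "noise_2"]

for name, abs_t in elimination_order(X, y, names):
    print(f"{name}: |t| = {abs_t:.1f}")
```

Reading the result from the bottom up gives a rough ranking of importance. A significance cutoff (for example, an |t| of about 2) would mark where elimination stops in practice.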

Fox Creek Work Flow. This work flow was devised and applied to the 262 wells in the Fox Creek Duvernay area.

  • A percentage of type curve was used as the response variable.
  • Twenty-one predictor variables were identified with publicly available data for completions, depths, well spacing, orientations, and liquid yield.
  • The wells were randomly split, with 157 wells in the training group, 53 in the cross-validation group, and 52 in the final test group.
  • A multiple linear regression was run on the training data with all variables. Eight variables with no statistical significance were eliminated.
  • The remaining variables were eliminated one at a time, starting with the least statistically significant.
  • The neural network was run 300 times with all 21 variables over a range of hidden-node counts and regularization levels. The cross-validation data were used to optimize the level of regularization and number of hidden nodes.
  • Multiple linear regression and neural-network models were applied to predict values for the final test data, and the results were compared.
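
The hyperparameter search in the list above can be sketched as follows. This is a minimal illustration and not the paper's implementation: the network (one tanh hidden layer trained by full-batch gradient descent with an L2 penalty), the small grid, the synthetic data, and all function names are assumptions. The score used to select the hyperparameters is the fraction of variance explained on the cross-validation wells.

```python
import numpy as np

def r_squared(y, pred):
    """Fraction of the variance in y explained by pred."""
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

def train_net(X, y, hidden, lam, epochs=2000, lr=0.05, seed=0):
    """Fit a one-hidden-layer tanh network by gradient descent with L2 penalty lam."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                    # hidden activations, shape (n, hidden)
        err = H @ W2 + b2 - y                       # prediction error, shape (n,)
        dH = np.outer(err, W2) * (1.0 - H ** 2)     # error backpropagated through tanh
        grad_W2 = H.T @ err / n + lam * W2
        grad_W1 = X.T @ dH / n + lam * W1
        W2 = W2 - lr * grad_W2
        b2 = b2 - lr * err.mean()
        W1 = W1 - lr * grad_W1
        b1 = b1 - lr * dH.mean(axis=0)
    return lambda Z: np.tanh(Z @ W1 + b1) @ W2 + b2

def grid_search(X_tr, y_tr, X_cv, y_cv, hidden_grid, lam_grid):
    """Train on the training group; keep the setting with the best cross-validation fit."""
    best = None
    for hidden in hidden_grid:
        for lam in lam_grid:
            model = train_net(X_tr, y_tr, hidden, lam)
            score = r_squared(y_cv, model(X_cv))
            if best is None or score > best[0]:
                best = (score, hidden, lam, model)
    return best

# Synthetic stand-in data (not from the paper): 262 rows, 3 predictors.
rng = np.random.default_rng(2)
X = rng.normal(size=(262, 3))
y = np.tanh(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=262)
X_tr, y_tr = X[:157], y[:157]
X_cv, y_cv = X[157:210], y[157:210]
X_te, y_te = X[210:], y[210:]

cv_score, hidden, lam, model = grid_search(
    X_tr, y_tr, X_cv, y_cv, hidden_grid=(2, 4, 8), lam_grid=(0.0, 0.01, 0.1)
)
test_score = r_squared(y_te, model(X_te))           # computed once, at the very end
print(f"hidden={hidden} lam={lam} cv R2={cv_score:.2f} test R2={test_score:.2f}")
```

The test group is touched only after the grid search has finished, which mirrors the role of the final test data in the work flow.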


Compared with the recent well performance of six operators, the neural network suggested that well performance could be improved substantially by varying parameters under operator control. Potential improvement ranged from 19 to 97%, indicating significant upside for all operators without expensive strategic testing on numerous wells.

  • The neural network explained 78% of the variance in the 52 test wells. The final multiple linear regression explained 67% of the variance in the same wells.
  • Virtually no benefit was indicated from expensive hydraulic-fracturing procedures such as using ceramic or resin-coated proppant or having hybrid fluid systems. Eliminating these offered savings of more than $1 million per well in the Duvernay.
  • No benefit was indicated from placing wells parallel to the minimum horizontal stress rather than on a north/south orientation. Significant savings could result in a land-ownership system, such as in Alberta, that is not aligned with this direction.
  • Total fracture tonnage was confirmed as a key driver of well performance.
  • Fracture pump rate is associated with better well performance and should be investigated further.
  • The work flow can be applied to other fields with different input variables and used to reduce uncertainty and maximize well performance.

This study can be extended in many ways. Geological parameters such as landing zone, porosity, height, water saturation, and mineralogy could be added. Quantification of the effect on production for each of these could be established and used to high-grade specific reservoir areas more accurately. Another aspect that could be added would be to create more than one dependent variable to incorporate more detail about the production-decline behavior. For example, certain variables might be tied more to a higher initial production but not to an increase in ultimate recovery. Therefore, decline parameters such as initial rate, decline rate, end of linear flow, and terminal decline behavior may be predicted individually rather than averaged.

This article, written by JPT Technology Editor Judy Feder, contains highlights of paper SPE 189823, “Machine Learning Applied To Optimize Duvernay Well Performance,” by Braden Bowie, SPE, Apache, prepared for the 2018 SPE Canada Unconventional Resources Conference, Calgary, 13–14 March. The paper has not been peer reviewed.

01 May 2019

Volume: 71 | Issue: 5


