
2019 AAPG Annual Convention and Exhibition:

Datapages, Inc.

Integrated Production and Subsurface Machine Learning Model for Predicting Hydrocarbon Recovery in the Bakken


We combine production and completions data from 9,000 unconventional wells in the Williston Basin with 42,000 geophysical log files representing 10,000 unique wells to build a predictive model for hydrocarbon production. We predict hydrocarbon recovery at 30-day intervals up to 2 years after the start of production. The model uses dozens of input features and captures their nonlinear, multivariate effects on hydrocarbon production. The subsurface modeling and recovery predictions are implemented with the open-source machine learning libraries scikit-learn and TensorFlow.
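Predicting recovery at every 30-day step out to 2 years can be framed as a multi-output regression with one target per horizon (24 horizons from 30 to 720 days). The sketch below is illustrative only and is not the authors' model: the completion features (lateral length, proppant intensity, stage count), the synthetic decline curves, and the helper `make_synthetic_wells` are all hypothetical. It uses scikit-learn's `RandomForestRegressor`, which accepts a 2-D target array natively.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Forecast horizons: every 30 days up to 2 years (30, 60, ..., 720 days)
HORIZONS_DAYS = np.arange(30, 721, 30)

def make_synthetic_wells(n_wells, seed=0):
    """Hypothetical stand-in for merged production and completions records."""
    rng = np.random.default_rng(seed)
    # Hypothetical completion features
    lateral_ft = rng.uniform(5_000, 10_000, n_wells)
    proppant_lb_per_ft = rng.uniform(500, 2_000, n_wells)
    stages = rng.integers(20, 60, n_wells)
    X = np.column_stack([lateral_ft, proppant_lb_per_ft, stages])
    # Synthetic initial rate, clipped to stay positive
    qi = np.clip(0.02 * lateral_ft + 0.05 * proppant_lb_per_ft
                 + rng.normal(0, 20, n_wells), 1.0, None)
    # Synthetic cumulative curve that rises and flattens with time
    t = HORIZONS_DAYS[None, :]
    cumulative = qi[:, None] * t / (1.0 + t / 180.0)
    return X, cumulative

X, Y = make_synthetic_wells(500)      # Y has one column per 30-day horizon
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, Y)

pred = model.predict(X[:3])           # one 24-step recovery curve per well
print(pred.shape)
```

Because each forest leaf averages whole training curves, the predicted curves inherit the non-decreasing shape of cumulative recovery, which is one reason a joint multi-output model can be preferable to 24 independent regressors.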

In developing the subsurface model for this machine learning approach, we also predict important subsurface properties that are rarely logged because of cost and complexity, such as hydrocarbon chain length in the productive formations of the basin. These measurements show nonlinear relationships both spatially and with commonly logged properties, which increases the value of a relatively sparse dataset. Additionally, we use this public subsurface dataset to identify the subdivisions of the Bakken (lower, middle, and upper members) and train the model with subsurface features informed by previous geologic studies that identified these sections from core and well log interpretation.
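The idea of filling in a rarely measured property from location and common logs can be sketched as a supervised regression trained on the few wells that have the measurement and applied to the rest. Everything here is a hypothetical assumption for illustration: the input columns (surface coordinates, gamma ray, resistivity), the synthetic `chain_length` target and its nonlinear form, and the 6% labeling rate. `GradientBoostingRegressor` is swapped in as one reasonable nonlinear learner; the abstract does not say which estimator was used.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)
n_wells = 1_000

# Hypothetical per-well inputs: surface location (km) and commonly logged properties
x_km = rng.uniform(0, 300, n_wells)
y_km = rng.uniform(0, 300, n_wells)
gamma_ray = rng.uniform(50, 250, n_wells)        # API units
resistivity = rng.lognormal(2.0, 0.8, n_wells)   # ohm-m
X = np.column_stack([x_km, y_km, gamma_ray, resistivity])

# Synthetic "rarely measured" property with a nonlinear spatial trend
chain_length = (20 + 4 * np.sin(x_km / 40) + 0.02 * gamma_ray
                + 1.5 * np.log(resistivity) + rng.normal(0, 0.5, n_wells))

# Only a small fraction of wells carry the expensive measurement
labeled = rng.random(n_wells) < 0.06

model = GradientBoostingRegressor(random_state=0)
model.fit(X[labeled], chain_length[labeled])

# Estimate the property for every well that lacks the measurement
predicted = model.predict(X[~labeled])
print(labeled.sum(), predicted.shape)
```

The predicted values can then be used as additional input features for the production model, which is how a sparse measurement can inform predictions at many more wells.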

With only production and completions data included, aggregate error is below 15% to 25%, depending on the area of the basin. We show successive improvements in this machine learning model by training it on a sequence of datasets: production and completions data with a limited well count and with the full dataset, followed by the same two cases with subsurface features added. For all models, we use an 80%-20% training-test split so that performance is measured on wells held out from training, guarding against overfitting.
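The evaluation scheme above (an 80%-20% split, comparing models with and without subsurface features) can be sketched as follows. This is a hedged illustration on synthetic data, not the authors' pipeline. The abstract does not define "aggregate error"; the `aggregate_error` helper below assumes it means the relative error of total predicted recovery over the test wells, and `mean_abs_pct_error` is added as a per-well comparison. Feature arrays and the synthetic 2-year recovery target are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def aggregate_error(y_true, y_pred):
    """Assumed definition: relative error of summed recovery across wells."""
    return float(abs(y_pred.sum() - y_true.sum()) / y_true.sum())

def mean_abs_pct_error(y_true, y_pred):
    """Per-well error, which can be much larger than the aggregate error."""
    return float(np.mean(np.abs(y_pred - y_true) / y_true))

rng = np.random.default_rng(1)
n = 2_000
completions = rng.uniform(0, 1, (n, 3))  # hypothetical scaled completion features
subsurface = rng.uniform(0, 1, (n, 2))   # hypothetical scaled subsurface features
cum_2yr = (100 + 80 * completions[:, 0]
           + 60 * subsurface[:, 0] * subsurface[:, 1]
           + rng.normal(0, 5, n))

feature_sets = {
    "completions only": completions,
    "completions + subsurface": np.hstack([completions, subsurface]),
}

# Same 80%-20% split for every feature set so the comparison is fair
idx_train, idx_test = train_test_split(np.arange(n), test_size=0.2, random_state=0)

results = {}
for name, X in feature_sets.items():
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[idx_train], cum_2yr[idx_train])
    pred = model.predict(X[idx_test])
    results[name] = (aggregate_error(cum_2yr[idx_test], pred),
                     mean_abs_pct_error(cum_2yr[idx_test], pred))

for name, (agg, per_well) in results.items():
    print(f"{name}: aggregate {agg:.1%}, per-well {per_well:.1%}")
```

Reporting both metrics is a deliberate choice: errors on individual wells partly cancel when recovery is summed over an area, so an aggregate figure alone can understate per-well uncertainty.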

These results show the power of combining open source machine learning tools, state agency subsurface datasets, engineering data, and traditional stratigraphic descriptions to develop models with a basin-scale understanding of the factors affecting production from unconventional wells. Given the wealth of subsurface and completions information available in state records, mining of these datasets could be a significant source of information for training models to aid in future well planning.