Spatial Sampling Bias in Decision Tree Machine Learning Method for Unconventional Resources

2019 AAPG Annual Convention and Exhibition:



Abstract

Machine learning methods, such as decision trees and random forests, are powerful tools for modeling complicated multivariate relationships and may be applied to productivity prediction and uncertainty characterization to support decision-making for unconventional reservoir development. These methods have been developed in a wide variety of applications where dense or exhaustive sampling is available, such as satellite imagery and process automation. In general, subsurface modeling and forecasting rely on sparse, non-representative sampling; therefore, it is necessary to account for spatial sampling bias in the construction of the prediction model when employing machine learning methods. In this study, polygonal declustering is integrated into a machine learning prediction workflow to mitigate spatial sampling bias with a decision tree. Polygonal declustering provides data weights based on the local data density. These weights may be applied to calculate representative statistics for predictive models. For a decision tree, each segmentation is determined by greedy reduction of the residual sum of squares (RSS) of the model compared to the data. Declustering the biased data set before partitioning removes the influence of spatially biased data from the model construction. The declustering weights are applied in estimating the prediction for each terminal node and, during tree growth, in determining the next hierarchical binary segmentation to minimize the prediction error. Declustering could also be applied to other tree-based methods such as bagging and random forests. Tree-based estimation is demonstrated because its results are easy to interpret. The spatially weighted decision tree model is demonstrated with two predictor features and one response feature based on a synthetic, but realistic, 2D geological truth model.
By evaluating the error reduction with respect to different degrees of bias, the improvement of the spatially weighted decision tree over a naïve tree model is quantified and a trend can be observed. It is shown that spatial sampling bias has a significant effect on the accuracy of the prediction model and that declustering is effective for correcting non-representative data sets. It is recommended that data representativity be addressed for spatial machine learning prediction.
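
The weighting idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: polygon areas are approximated by assigning the cells of a regular grid to their nearest datum (rather than by an exact Voronoi construction), only a single split on one predictor is searched, and all names and values (`polygonal_weights`, `best_split`, `thickness`, `response`, the sample coordinates) are hypothetical.

```python
def polygonal_weights(xy, extent, n=50):
    """Approximate polygonal declustering weights on an n x n grid.

    Each grid-cell center is assigned to its nearest datum, so a datum's
    cell count approximates the area of its polygon inside `extent`.
    Weights are scaled to sum to the number of data.
    """
    xmin, xmax, ymin, ymax = extent
    counts = [0] * len(xy)
    for i in range(n):
        gx = xmin + (i + 0.5) * (xmax - xmin) / n
        for j in range(n):
            gy = ymin + (j + 0.5) * (ymax - ymin) / n
            nearest = min(
                range(len(xy)),
                key=lambda k: (xy[k][0] - gx) ** 2 + (xy[k][1] - gy) ** 2,
            )
            counts[nearest] += 1
    return [len(xy) * c / (n * n) for c in counts]


def weighted_mean(y, w):
    """Declustered (weighted) mean, used as the terminal-node prediction."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def weighted_rss(y, w):
    """Weighted residual sum of squares about the weighted mean."""
    if sum(w) == 0:
        return 0.0
    m = weighted_mean(y, w)
    return sum(wi * (yi - m) ** 2 for wi, yi in zip(w, y))


def best_split(x, y, w):
    """Greedy search for the threshold on x that minimizes weighted RSS."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_t, best_rss = None, float("inf")
    for a, b in zip(order, order[1:]):
        if x[a] == x[b]:
            continue  # no threshold separates identical predictor values
        t = 0.5 * (x[a] + x[b])
        left = [i for i in order if x[i] <= t]
        right = [i for i in order if x[i] > t]
        rss = (weighted_rss([y[i] for i in left], [w[i] for i in left])
               + weighted_rss([y[i] for i in right], [w[i] for i in right]))
        if rss < best_rss:
            best_t, best_rss = t, rss
    return best_t, best_rss


# Hypothetical samples: three sparse locations plus a cluster of four
# closely spaced locations in a high-valued corner of a unit square.
xy = [(0.2, 0.2), (0.2, 0.8), (0.8, 0.2),
      (0.80, 0.80), (0.85, 0.85), (0.80, 0.85), (0.85, 0.80)]
thickness = [10.0, 12.0, 14.0, 30.0, 32.0, 31.0, 33.0]  # predictor
response = [5.0, 6.0, 7.0, 15.0, 16.0, 15.0, 16.0]      # response

weights = polygonal_weights(xy, (0.0, 1.0, 0.0, 1.0))
naive_mean = sum(response) / len(response)
declustered_mean = weighted_mean(response, weights)
threshold, rss = best_split(thickness, response, weights)

print("weights:", [round(w, 2) for w in weights])
print("naive mean:", naive_mean, "declustered mean:", declustered_mean)
print("best threshold:", threshold, "weighted RSS:", rss)
```

In this toy configuration the clustered samples receive smaller weights than the sparse ones, so the declustered mean falls below the naive mean. A full tree would apply `best_split` recursively over every predictor. Where a library is preferred, the same weights could presumably be supplied through a per-sample weight argument, such as the `sample_weight` parameter of `fit` in scikit-learn's `DecisionTreeRegressor`.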