Data-driven Approach to Handling High-dimensional Data Input Space using Feature Selection Based Hybrid Machine Learning Methodology

AAPG Middle East Region GTW, Digital Subsurface Transformation

Datapages, Inc.

Abstract

One of the drivers of the Fourth Industrial Revolution is big data. Big data is characterized not only by its volume but also by its dimensionality. With the increasing scale of data integration in the geosciences for improved prediction of reservoir properties, the industry now faces the challenge of handling a high-dimensional input space. High-dimensional data increases model complexity, memory usage, and computational cost, thereby reducing model performance and lengthening the turnaround time needed to deliver modeling results. The major challenge is determining the subset of input variables with the best predictive power. The physics-based approach requires browsing through the large input-variable space and manually selecting the optimal subset based on domain expertise. This approach is subjective because it is biased toward the expert's level of experience and domain of application. The traditional regression approach assumes a linear relationship between the input variables and the target property, whereas the relationships between field measurements and most reservoir properties are nonlinear. This presentation proposes a pattern recognition approach that uses a hybrid machine learning paradigm to automatically extract the best subset of input variables without human bias. The approach is objective and consistent, and it adapts to dynamic data and frequent data updates. A case study of reservoir cementation factor prediction will be discussed, in which the least-squares fitting capability of Functional Networks (FN) is combined with Artificial Neural Networks (ANN) to build an FN-ANN hybrid model. When validated against core data, the proposed model gave the best match compared with multivariate regression and ANN models that use all input variables. The result confirms the efficiency of the nonlinear variable selection process based on the FN algorithm.
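The abstract does not give the functional form, the selection criterion, or the network settings used in the study, so the following is only a minimal sketch of FN-based nonlinear variable selection under stated assumptions: an additive (separable) functional network whose per-input functions are cubic polynomials fitted by least squares, and a greedy forward search that keeps an input only if it lowers the validation RMSE by at least 5%. The function names, the cubic basis, the 5% threshold, and the synthetic data are all hypothetical and are not taken from the study.

```python
import numpy as np

# Assumption (hypothetical): additive FN, y ~ c0 + sum_i sum_p a_ip * x_i**p,
# with polynomial basis functions; this is not the study's exact formulation.


def fn_design_matrix(X, cols, degree=3):
    """Additive functional-network design matrix: an intercept column plus
    polynomial basis functions x, x**2, ..., x**degree for each selected input."""
    n = X.shape[0]
    blocks = [np.ones((n, 1))]
    for c in cols:
        x = X[:, c:c + 1]
        blocks.extend(x ** p for p in range(1, degree + 1))
    return np.hstack(blocks)


def fn_validation_rmse(X_tr, y_tr, X_va, y_va, cols, degree=3):
    """Fit the FN coefficients by least squares on the training split and
    return the RMSE on the validation split."""
    coef, *_ = np.linalg.lstsq(fn_design_matrix(X_tr, cols, degree), y_tr, rcond=None)
    pred = fn_design_matrix(X_va, cols, degree) @ coef
    return float(np.sqrt(np.mean((y_va - pred) ** 2)))


def fn_forward_selection(X, y, degree=3, min_gain=0.05, val_frac=0.3, seed=0):
    """Greedy forward selection: repeatedly add the input whose inclusion gives
    the lowest validation RMSE, and stop once the relative improvement falls
    below min_gain."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_va = int(len(y) * val_frac)
    va, tr = idx[:n_va], idx[n_va:]
    selected, remaining = [], list(range(X.shape[1]))
    # Baseline: intercept-only model (no inputs selected yet).
    best = fn_validation_rmse(X[tr], y[tr], X[va], y[va], selected, degree)
    while remaining:
        scores = {c: fn_validation_rmse(X[tr], y[tr], X[va], y[va], selected + [c], degree)
                  for c in remaining}
        c_best = min(scores, key=scores.get)
        if best - scores[c_best] < min_gain * best:
            break  # no remaining candidate improves the fit enough
        selected.append(c_best)
        remaining.remove(c_best)
        best = scores[c_best]
    return selected, best


# Synthetic example: 8 candidate inputs, but only columns 0, 2 and 5 drive the target.
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(1000, 8))
y = np.sin(2 * X[:, 0]) + X[:, 2] ** 2 - 0.5 * X[:, 5] ** 3 + 0.05 * rng.normal(size=1000)

sel, rmse = fn_forward_selection(X, y)
print("selected inputs:", sel, "validation RMSE:", round(rmse, 3))
```

In a hybrid of this kind, the selected columns (for example `X[:, sel]`) would then serve as the reduced input set for the ANN stage, such as scikit-learn's `MLPRegressor`; that stage is not shown. The stopping rule uses a held-out split rather than the training residual, because adding basis functions can only lower the least-squares training error and would otherwise never reject an irrelevant input.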