A Data-Driven Method for Processing and Analysis of Gas Chromatography-Mass Spectrometry (GC-MS) Signals in Differentiation of Oil Samples

2019 AAPG Annual Convention and Exhibition:

Datapages, Inc.

Abstract

Because crude oil samples contain a large number of organic compounds and vary in composition, differentiating oil and extract data by gas chromatography-mass spectrometry (GC-MS) analysis is time-consuming and laborious. The process relies heavily on the skills of a limited pool of experienced analysts and, as a consequence, delivers subjective outcomes. Comparison typically requires alignment in the time domain as a data pre-processing step, which can be challenging because elution orders can vary with test procedures, conditions and data vintages. Machine learning methods are known for their ability to handle large volumes of data efficiently, extract diagnostic features, and establish complicated non-linear relationships between data and interpretations. This study proposes a data-driven method to assist in the differentiation of geochemical samples, based on a database of oil and source rock extract GC-MS measurements and machine learning techniques. Chromatogram peaks can be located consistently by a recurrent neural network classifier in combination with a continuous wavelet transform applied to the total signal. Compound assignment is performed via supervised classification of the mass spectra. The machine learning models are trained on a database of interpreted oil and known source rock extract samples that spans various data vintages and instrument types. Comparing the automatically generated, comprehensive compound assignments between individual samples reveals major differences in the abundance of common compounds and identifies missing species, which enables quantitative discrimination between critical samples of interest. Diagnostic compounds identified by this process can be used as a basis for robust production allocation schemes and higher-confidence oil-source correlations.
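
The final comparison step described in the abstract can be illustrated with a minimal sketch. It assumes that each sample has already been reduced to a table of assigned compounds and peak areas. The abstract does not specify the comparison metric, so the normalization to relative abundance, the log2 ratio, and the threshold `min_log2_ratio` are assumptions made here for illustration. The function names and the peak areas are hypothetical, not taken from the study.

```python
import math


def normalize(abundances):
    """Scale raw peak areas so that they sum to 1 within one sample."""
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("sample has no positive abundance")
    return {name: area / total for name, area in abundances.items()}


def compare_samples(sample_a, sample_b, min_log2_ratio=1.0):
    """Compare two compound-assignment tables (hypothetical helper).

    Returns three items:
      - compounds assigned only in sample A,
      - compounds assigned only in sample B,
      - common compounds whose relative abundance differs by at least
        min_log2_ratio, mapped to log2(A / B).
    """
    a = normalize(sample_a)
    b = normalize(sample_b)

    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))

    changed = {}
    for name in sorted(set(a) & set(b)):
        # Skip zero areas, for which a log ratio is undefined.
        if a[name] > 0 and b[name] > 0:
            log_ratio = math.log2(a[name] / b[name])
            if abs(log_ratio) >= min_log2_ratio:
                changed[name] = log_ratio
    return only_a, only_b, changed


# Illustrative peak areas only; these are not measurements from the study.
oil_a = {"n-C17": 40.0, "pristane": 10.0, "phytane": 40.0, "C30 hopane": 10.0}
oil_b = {"n-C17": 40.0, "pristane": 40.0, "phytane": 10.0, "oleanane": 10.0}

only_a, only_b, changed = compare_samples(oil_a, oil_b)
print("only in A:", only_a)
print("only in B:", only_b)
print("abundance differences (log2 A/B):", changed)
```

In this sketch, compounds present in only one sample play the role of the missing species, and the log-ratio table lists candidate diagnostic compounds. A real workflow would presumably also account for assignment confidence and measurement noise, which are outside the scope of this example.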