Efficient Access to Relevant Knowledge Extracted From Geoscience Literature Dedicated to Petroleum Basin Exploration by Using IBM Watson

2019 AAPG Annual Convention and Exhibition:

Datapages, Inc.

Abstract

The aim of this study is to make it more efficient to collect relevant geoscience data from large volumes of unstructured scientific documents by using machine learning algorithms. Valuable knowledge can be found in scientific document collections; however, scientists lack the time to consult mountains of unstructured documents effectively and are daunted by the task. The main motivation of this work was to create a system able to identify, within large repositories, which documents are relevant to answering specific questions related to petroleum exploration and, more precisely, to source rock characterization. The work applied machine learning systems, namely IBM Watson, to support geoscientists in a regional geological study. Scientific publications provide information in the form of text, curves, or figures. Therefore, two types of machine learning algorithms were tested: one dedicated to image recognition (Watson Visual Recognition, WVR) and one to text analysis (Watson Knowledge Studio, WKS). First, WVR was trained to identify specific images and charts in scientific publications (event charts, stratigraphic columns, burial curves, and well logs). WVR is able to discriminate the images of interest from the others efficiently, even though it was trained with only a few dozen seeds for each image class. Second, WKS was trained to understand the semantic framework of textual knowledge related to source rocks. The first step was to list a set of questions we would like to answer, e.g.: What are the formations bearing source rock in Basin X? What are the Miocene source rock formations in Country X? What are the depositional environments of the source rocks in Basin X? Based on this set of questions and on the recurrence of terms, an ontology (a definition of the entities and of the relations between entities) was defined. The ontology was deliberately limited to ten entities and their relations to allow a quick test.
WKS was trained on a set of annotated documents (~150 extracts of ~1000 words). The trained WKS model is able to identify the entities and the associated relations quite efficiently. The two trained models were then applied to a new set of documents, and the extracted information was stored in a database. The last step was to translate our natural language questions into queries. The final result is a short list of documents selected by our system and ordered by a relevance index. The proposed workflow is promising, given the good performance obtained.
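
The last two steps (storing extracted relations in a database and translating a question into a query that returns ranked documents) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not describe the database schema, the relation names, or how the relevance index is computed. Here the table `relations`, the relation names `sourceRockIn` and `hasAge`, the helper functions, and the sample rows are all hypothetical, and relevance is approximated by the number of matching relation mentions per document.

```python
import sqlite3


def build_database(triples):
    """Store (doc_id, subject, relation, object) rows in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE relations (doc_id TEXT, subject TEXT, relation TEXT, object TEXT)"
    )
    conn.executemany("INSERT INTO relations VALUES (?, ?, ?, ?)", triples)
    conn.commit()
    return conn


def rank_documents(conn, relation, obj):
    """Return (doc_id, score) pairs, highest score first.

    The score is the count of matching relation mentions in a document.
    It is an illustrative stand-in, not the relevance index used in the study.
    """
    rows = conn.execute(
        """
        SELECT doc_id, COUNT(*) AS score
        FROM relations
        WHERE relation = ? AND object = ?
        GROUP BY doc_id
        ORDER BY score DESC, doc_id ASC
        """,
        (relation, obj),
    )
    return rows.fetchall()


# Made-up rows standing in for the output of a trained extraction model;
# document, formation, and basin names are placeholders.
TRIPLES = [
    ("doc_A", "Formation 1", "sourceRockIn", "Basin X"),
    ("doc_A", "Formation 2", "sourceRockIn", "Basin X"),
    ("doc_B", "Formation 1", "sourceRockIn", "Basin X"),
    ("doc_C", "Formation 3", "sourceRockIn", "Basin Y"),
    ("doc_B", "Formation 1", "hasAge", "Miocene"),
]

conn = build_database(TRIPLES)

# "What are the formations bearing source rock in Basin X?" expressed as a
# (relation, object) constraint on the stored relations.
ranking = rank_documents(conn, "sourceRockIn", "Basin X")
print(ranking)
```

In this sketch a question is reduced by hand to one relation and one object value; a question with several constraints (for example, an age and a country) would need a join or intersection over several such queries.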