2019 AAPG Annual Convention and Exhibition:

Datapages, Inc.

Using PyCHNO to Generate Training-Image Datasets for Machine Learning Ichnology


Deep-learning models trained on labeled image datasets have been used to successfully classify unlabeled images (e.g. identifying cats in images). However, for this method to recognize images correctly, thousands of labeled training images are typically required.

Considering that a labeled ichnofossil image library does not exist, this approach is generally deemed unsuitable for the automated detection of trace fossils. Ichnofossil detection is further complicated by the variability of trace-fossil cuts, size, and deformation, as well as lithological and regional changes in core. Therefore, an accurate deep-learning model likely requires tens of thousands of training images.

Sedimentary core datasets sometimes include thousands of preserved trace fossils. Using the open-source ichnology data-collection software PyCHNO, a skilled ichnologist can rapidly click on, label, and reference thousands of trace fossils from core images. In this presentation we introduce a software add-on that can be used to extract thousands of labeled trace-fossil images from a PyCHNO-collected core dataset in order to generate a labeled database for machine-learning approaches. In addition, images from zones containing no trace fossils can be extracted from core datasets for solving bioturbated vs. unbioturbated zone problems. We use an example from the Cretaceous McMurray Formation of NE Alberta to demonstrate this approach.
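The extraction step can be illustrated with a minimal sketch. It is not the add-on itself, and it does not use PyCHNO's actual data format, which is not described here. It assumes, hypothetically, that each labeled pick is available as an (x, y, label) tuple in pixel coordinates and that the core image is a NumPy array; the function name `extract_patches`, the patch size, and the example labels are illustrative only.

```python
import numpy as np

def extract_patches(image, picks, half_size=32):
    """Crop a square patch centered on each labeled pick.

    image: H x W (x C) array of a core photograph.
    picks: iterable of (x, y, label) tuples in pixel coordinates
           (a hypothetical layout, not PyCHNO's real export format).
    Returns a list of (label, patch) pairs. Picks whose patch would
    extend past the image border are skipped.
    """
    height, width = image.shape[:2]
    patches = []
    for x, y, label in picks:
        top, bottom = y - half_size, y + half_size
        left, right = x - half_size, x + half_size
        if top < 0 or left < 0 or bottom > height or right > width:
            continue  # patch would fall outside the image
        patches.append((label, image[top:bottom, left:right].copy()))
    return patches

# Synthetic stand-in for a core image: 400 px tall, 120 px wide, RGB.
core_image = np.zeros((400, 120, 3), dtype=np.uint8)

# Illustrative picks; the last one is too close to the corner to crop.
picks = [
    (60, 100, "Planolites"),
    (60, 300, "unbioturbated"),
    (5, 5, "Skolithos"),
]

demo_patches = extract_patches(core_image, picks)
```

Here picks labeled "unbioturbated" stand in for images taken from zones with no trace fossils, so the same routine could serve both the trace-fossil and the bioturbated vs. unbioturbated datasets.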

Using this approach, a few core-image datasets from wells in close proximity can be used to generate a labeled training dataset comprising thousands of images. These images can subsequently be input into a deep-learning framework (e.g. Keras) to train a trace-fossil identification model, which can potentially be used to label, at the very least, bioturbated vs. unbioturbated zones in unlabeled core images.
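As a rough illustration of the modeling step, the sketch below builds a small binary classifier in Keras for the bioturbated vs. unbioturbated problem. The architecture, the 64-pixel RGB patch size, and the function name `build_patch_classifier` are assumptions made for this example, not the model used in the presentation.

```python
from tensorflow import keras

def build_patch_classifier(patch_size=64, channels=3):
    """Small convolutional network that outputs the probability
    that a core-image patch is bioturbated (illustrative only)."""
    model = keras.Sequential([
        keras.Input(shape=(patch_size, patch_size, channels)),
        keras.layers.Rescaling(1.0 / 255),  # map 0-255 pixels to 0-1
        keras.layers.Conv2D(16, 3, activation="relu"),
        keras.layers.MaxPooling2D(),
        keras.layers.Conv2D(32, 3, activation="relu"),
        keras.layers.MaxPooling2D(),
        keras.layers.Flatten(),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),  # P(bioturbated)
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model

model = build_patch_classifier()
```

Training would then call `model.fit` on an array of patches and a matching array of 0/1 labels. Holding out entire wells for validation, rather than random patches, would give a more honest estimate of how the model performs on unseen core.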

It is important to note that the proposed approach requires an expert to collect PyCHNO data. Indeed, the quality of the model depends on the skills of the ichnologist identifying trace fossils. In addition, generated models may not initially be well suited to regional or cross-formational labeling of trace fossils from core images. Also note that PyCHNO can be used to collect sedimentary structures and can similarly be used to generate sedimentary-structure image training datasets. The proposed method has the potential to significantly increase the power of core-image datasets by allowing ichnologists to focus on interpreting trace-fossil distributions rather than spending considerable time collecting data.