Lineage Metadata as a Critical Component of Data Trustworthiness for Subsurface and Analytics Applications


Knowing the origin of data used for modeling or analysis is essential to establishing trust in the results. Origin encompasses not only the device(s) used to create the data but also the history of all subsequent operations performed on it. Trust is not a quantity or a parameter: for data originating outside the controlled data environment of a company or institution, trust in the usability of the data is a decision based on varying criteria. The completeness of the lineage metadata plays a significant role in allowing the recipient to establish whether the data can be trusted “as is” or whether a verification process is required prior to use.
To address this issue, companies providing data to customers, partners or other entities would either seek guidance on what lineage information was required or, in the absence of a formal request, provide the data with whatever accompanying metadata they saw fit. Such case-by-case procedures are onerous and in most cases do not fully satisfy the requirements of the recipient.
A number of factors over the past 20 years have exacerbated this issue. Foremost is the increasing complexity of upstream datasets and their exponential growth in volume. The trend towards shorter oilfield development project cycles puts further pressure on staff. Finally, the attrition of subject matter experts (SMEs) able to assess data validity limits the resources that can be deployed.
Modern technologies such as machine learning can deliver valuable efficiencies that help overcome the lack of lineage metadata. However, they still require some level of supervision and verification of their output, and the detail they can recover about, e.g., the exact processing history (which software package(s) and versions, which parameters, the identity of users, dates, etc.) is limited. A more rational and less ambiguous approach is to ensure that all the necessary information is attached to the data.
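To make the processing-history details listed above more concrete, here is a minimal Python sketch of a lineage record and a completeness check a recipient might run before deciding whether data can be used “as is”. It is an illustration under stated assumptions only: the class names, field names and the all-or-nothing verification rule are hypothetical, chosen for this example, and are not taken from the EIP or ISO 19115-1 schemas.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class ProcessingStep:
    """One operation applied to the data after acquisition (hypothetical fields)."""
    description: str
    software: Optional[str] = None
    software_version: Optional[str] = None
    parameters: dict = field(default_factory=dict)
    performed_by: Optional[str] = None
    performed_at: Optional[datetime] = None


@dataclass
class LineageRecord:
    """Origin of a dataset: the acquiring device plus its processing history."""
    source_device: Optional[str]
    steps: List[ProcessingStep] = field(default_factory=list)


# Hypothetical minimum a recipient might require for every processing step.
REQUIRED_STEP_FIELDS = ("software", "software_version", "performed_by", "performed_at")


def missing_lineage(record: LineageRecord) -> List[str]:
    """Return the names of lineage items that are absent from the record."""
    gaps = []
    if not record.source_device:
        gaps.append("source_device")
    for i, step in enumerate(record.steps):
        for name in REQUIRED_STEP_FIELDS:
            if getattr(step, name) is None:
                gaps.append(f"steps[{i}].{name}")
    return gaps


def needs_verification(record: LineageRecord) -> bool:
    """Hypothetical rule: any gap means the data cannot be used 'as is'."""
    return len(missing_lineage(record)) > 0


# Usage with invented values: the second step is known only by its description.
record = LineageRecord(
    source_device="logging-tool-A",
    steps=[
        ProcessingStep("depth shift", software="suite-X", software_version="2.1",
                       parameters={"shift_m": 0.5}, performed_by="analyst-01",
                       performed_at=datetime(2015, 3, 2)),
        ProcessingStep("despike"),
    ],
)
print(missing_lineage(record))
# → ['steps[1].software', 'steps[1].software_version', 'steps[1].performed_by', 'steps[1].performed_at']
print(needs_verification(record))  # → True
```

The all-or-nothing rule is deliberately simple. Since trust is a decision rather than a quantity, a real recipient would more likely weigh which gaps matter for the intended use.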
Starting in 2010, the industry came together to define a metadata standard including lineage, data assurance and integrity components; it was published in 2016 as the Energy Industry Profile (EIP) of ISO 19115-1:2014, an ISO Conformance Level 1 profile of the published international standard. The biggest challenge lies ahead: convincing all industry players that investing in implementing the metadata standard will deliver a step-change in data trustworthiness while freeing up valuable SME resources.