Bring the Geophysical Data Onto a High Performance Data Node

International Conference & Exhibition

Datapages, Inc.

Abstract

The National Computational Infrastructure (NCI) manages national environmental research data collections (10+ PB) as part of its specialized high-performance data node under the Research Data Storage Infrastructure (RDSI) program. The Australian National Geophysical Collection is one of the RDSI-funded collections. It includes the most comprehensive publicly available Australian airborne magnetic, gamma-ray, seismic, electromagnetic and gravity data sets. The airborne geophysics data set contains approximately 32.8 million line kilometres of data, which, at current prices, would cost approximately $197 million to acquire. The gravity data set contains more than 1.57 million reliable onshore stations gathered during more than 1,800 surveys. The collection also includes a large number of seismic surveys from Australia's onshore explosive, wide-angle reflection and refraction surveys, as well as seismic surveys across offshore basins. The total size of this geophysical data collection is 300 TB. The data is made available in an HPC and data-intensive environment comprising a ~56,000-core supercomputer, virtual labs on a 3,000-core cloud system, and data services. Our data management practices include maintaining a Data Management Plan (DMP), creating catalogues of all the data records, providing data services, and minting persistent identifiers. The DMP records the workflows, procedures, key contacts and responsibilities. Its fields can be exported to the ISO 19115 schema and to the collection-level catalogue in GeoNetwork. The subset- or file-level metadata catalogues are linked to the collection level through parent-child relationships defined using UUIDs. A number of tools have been developed that support interactive metadata management, bulk loading of data, and computational workflows or data pipelines. A Digital Object Identifier (DOI) will be minted for each dataset so that the data collection can be tracked over its lifetime. Data citation also supports recognition of the data providers, data owners, data generators and data aggregators. With our data management, users can easily find the data, process it through our Virtual Geophysical Laboratory available on the cloud, or work with it using the on-site high-performance computing resources.
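
The parent-child linking of catalogue records by UUID described in the abstract can be illustrated with a minimal Python sketch. This is not NCI's implementation or the GeoNetwork API: the `MetadataRecord` class, its field names, the helper functions and the example record titles are all hypothetical, chosen only to show how file-level records might point back to a collection-level record.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetadataRecord:
    """A catalogue entry. Field names are illustrative, not a real schema."""
    title: str
    parent_uuid: Optional[str] = None  # None marks a collection-level record
    record_uuid: str = field(default_factory=lambda: str(uuid.uuid4()))


def children_of(parent: MetadataRecord,
                records: List[MetadataRecord]) -> List[MetadataRecord]:
    """Return the records whose parent_uuid points at the given record."""
    return [r for r in records if r.parent_uuid == parent.record_uuid]


def path_to_collection(record: MetadataRecord,
                       records: List[MetadataRecord]) -> List[str]:
    """Follow parent links upward and return the titles along the way."""
    index = {r.record_uuid: r for r in records}
    path = [record.title]
    while record.parent_uuid is not None:
        record = index[record.parent_uuid]
        path.append(record.title)
    return path


# Hypothetical records: one collection, two subsets, one file-level entry.
collection = MetadataRecord(title="Australian National Geophysical Collection")
gravity = MetadataRecord(title="Gravity data set",
                         parent_uuid=collection.record_uuid)
airborne = MetadataRecord(title="Airborne geophysics data set",
                          parent_uuid=collection.record_uuid)
survey_file = MetadataRecord(title="Hypothetical gravity survey file",
                             parent_uuid=gravity.record_uuid)
catalogue = [collection, gravity, airborne, survey_file]

if __name__ == "__main__":
    for child in children_of(collection, catalogue):
        print(child.title)
    print(" -> ".join(path_to_collection(survey_file, catalogue)))
```

Because each child stores only its parent's UUID, new subset or file-level records can be added to such a catalogue without changing the collection-level record.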