The Convergence of Big Data and Extreme Simulation

David Keyes

King Abdullah University of Science and Technology

May 7-8, 2018 – AAPG Middle East Region GTW, Digital Subsurface Transformation, Dubai, UAE

Posted: August 3, 2018

Abstract

Motivations abound for the convergence of large-scale simulation and big data: (1) scientific and engineering advances, (2) computational and data storage efficiency, (3) economy of data center operations, and (4) the development of a competitive workforce. To take advantage of advances in analytics and learning, large-scale simulations should evolve to incorporate these technologies in-situ, rather than as forms of post-processing. This potentially reduces the burdens of file transfer and of the runtime IO that produces the files. In some applications, IO consumes more resources than the computation itself. Smart steering, guided by the in-situ analytics, may obviate significant computation in unfruitful regions of physical parameter space, along with the IO that would accompany it. In-situ machine learning offers smart data compression, which complements analytics in reducing IO and storage. Machine learning also has the potential to improve the simulation itself, since many simulations incorporate empirical relationships, such as constitutive parameters or functions that are not derived from first principles but tuned from dimensional analysis, intuition, observation, or other simulations. Machine learning in-the-loop may ultimately be more effective than tuning by human experts.

Flipping the perspective, simulation potentially provides significant benefits in return to analytics and learning workflows. Theory-guided data science is an emerging paradigm that aims to improve the effectiveness of data science models by requiring consistency with known scientific principles (e.g., conservation laws). It is analogous to “regularization” in optimization, wherein non-unique candidates are penalized by some physically plausible constraint (such as minimizing energy) to narrow the field. In analytics, among statistically equally plausible outcomes, the field could be narrowed to those that satisfy physical constraints, as checked by simulations. Simulation can also provide training data for machine learning, complementing data that is available from experimentation and observation. There are also beneficial interactions between the two types of workflows within big data. Analytics can provide feature vectors to machine learning for training. Machine learning, in turn, can impute missing data and provide detection and classification. The scientific opportunities are potentially enormous enough to overcome the inertia of the specialized communities that have gathered around each of the paradigms and to spur convergence.
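
The regularization analogy in the abstract can be illustrated with a small sketch. It is not taken from the talk. It assumes a hypothetical toy problem in which three unknown fluxes are observed through only two measurements, so the data alone do not determine a unique answer, and a conservation law (the fluxes sum to zero) is added as a quadratic penalty. The function name, the matrices, and the weight `lam` are all illustrative assumptions.

```python
import numpy as np

def fit_with_physics_penalty(A, b, C, d, lam):
    """Minimize ||A x - b||^2 + lam * ||C x - d||^2.

    A, b: data-fit equations (possibly underdetermined).
    C, d: linear physical constraint C x = d, imposed as a penalty.
    The penalized problem is solved as one stacked least-squares system.
    """
    s = np.sqrt(lam)
    A_aug = np.vstack([A, s * C])
    b_aug = np.concatenate([b, s * d])
    x, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x

# Hypothetical data: only the first two of three fluxes are measured.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
b = np.array([2.0, 3.0])

# Assumed conservation law: the three fluxes sum to zero.
C = np.array([[1.0, 1.0, 1.0]])
d = np.array([0.0])

# Data alone: lstsq returns the minimum-norm candidate, which ignores the physics.
x_data_only, *_ = np.linalg.lstsq(A, b, rcond=None)

# Data plus physics penalty: the conservation law selects among the candidates.
x_physics = fit_with_physics_penalty(A, b, C, d, lam=1e6)

print("data only:   ", x_data_only, "sum =", x_data_only.sum())
print("with physics:", x_physics, "sum =", x_physics.sum())
```

In this toy case every vector of the form (2, 3, t) fits the measurements equally well. The penalty narrows the field to the one candidate that also conserves flux, which is the role the abstract assigns to physical constraints in theory-guided data science.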
