Reproducible geoscientific modelling with hypergraphs

Georg Semmler, TU Bergakademie Freiberg

Reproducibility of research is one of the cornerstones of modern science. As digital geoscientific models become a standard tool for answering research questions, the reproducibility of the model construction process becomes relevant. With GeoHub I introduce a framework that makes it possible to:

  • Describe how data are combined to build a complex geoscientific model
  • Record the steps performed to construct the model so that they can be repeated later
  • Perform automated checks to ensure that the same result can be reproduced at a later time
  • Build a reproduction of a geoscientific model

GeoHub's internal representation of geoscientific construction processes is based on hypergraphs, where each node corresponds to a dataset used as part of the construction process and each hyperedge corresponds to a construction step. This representation makes it possible to reason about internal dependencies of the construction process. Computer-executable descriptions of the construction steps allow construction processes to be repeated. By comparing the outputs of different realizations of the same construction process, the reproducibility of that process can be assessed in an automated way.
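
The hypergraph representation described above can be illustrated with a minimal sketch. It is not GeoHub's actual implementation: the names `Step`, `ConstructionGraph`, `upstream` and `same_result`, as well as the example dataset names, are hypothetical, and the byte-exact hash comparison is only one assumed way to compare two realizations.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hyperedge: one construction step linking input datasets to output datasets."""
    name: str
    inputs: frozenset
    outputs: frozenset


class ConstructionGraph:
    """Hypergraph whose nodes are dataset names and whose hyperedges are steps."""

    def __init__(self):
        self.steps = []

    def add_step(self, name, inputs, outputs):
        self.steps.append(Step(name, frozenset(inputs), frozenset(outputs)))

    def upstream(self, dataset):
        """Return every dataset that `dataset` depends on, directly or indirectly."""
        seen = set()
        stack = [dataset]
        while stack:
            current = stack.pop()
            for step in self.steps:
                if current in step.outputs:
                    for source in step.inputs:
                        if source not in seen:
                            seen.add(source)
                            stack.append(source)
        return seen


def digest(data: bytes) -> str:
    """SHA-256 fingerprint of a dataset's bytes."""
    return hashlib.sha256(data).hexdigest()


def same_result(run_a: dict, run_b: dict) -> bool:
    """Assumed check: two realizations agree if every dataset is byte-identical."""
    return run_a.keys() == run_b.keys() and all(
        digest(run_a[key]) == digest(run_b[key]) for key in run_a
    )


# Hypothetical two-step workflow for a subsurface model.
graph = ConstructionGraph()
graph.add_step("interpolate", {"boreholes", "seismic"}, {"horizons"})
graph.add_step("assemble", {"horizons", "faults"}, {"subsurface_model"})
dependencies = graph.upstream("subsurface_model")
```

Here `dependencies` contains the four datasets the final model depends on, and `same_result` would flag any dataset that differs between two recorded runs. A real comparison would likely need tolerances for floating-point output, which this sketch does not attempt.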

The presented framework is implemented as a software prototype. It makes it possible to record existing construction workflows and to reason about their reproducibility. This capability is demonstrated by several case studies: a geophysical data processing workflow, the construction of a three-dimensional subsurface model, and the calculation of a hydrological balance model.