The latest issue of Nature group’s Scientific Data journal features a paper on the Materials Cloud, an Open Science Platform designed to enable the seamless sharing of resources in computational materials science, and another on AiiDA, an open-source Python infrastructure that helps researchers automate and share computational workflows. The publications are a testament to the ever-increasing adoption of the two tools, which emerged from EPFL Professor Nicola Marzari’s Theory and Simulation of Materials (THEOS) group and are now the cornerstones of NCCR MARVEL’s Open Science strategy.
The principle that open access to data, software and, ultimately, infrastructure leads to scientific results that can be assessed, verified and reproduced is core to the mission of open computational science. Today’s information technology makes it possible to design open-science platforms that let scientists use existing data, submit new content and launch new simulations with minimal technical expertise.
Materials Cloud aims to be such a platform. “Materials Cloud, a platform for open computational science” is the first paper describing the non-profit service, which is developed and supported by NCCR MARVEL, the European H2020 MaX Centre of Excellence and a number of other partners.
Materials Cloud is a platform designed to enable open and seamless sharing of resources for computational science, driven by applications in materials modelling. It hosts (1) archival and dissemination services for raw and curated data, together with their provenance graph, (2) modelling services and virtual machines, (3) tools for data analytics, and pre-/post-processing, and (4) educational materials. Data is citable and archived persistently, with comprehensive coverage of entire simulation pipelines (calculations performed, codes used, data generated) in the form of graphs that allow retracing and reproducing any computed result. When an AiiDA database is shared on Materials Cloud, peers can browse the interconnected record of simulations, download individual files or the full database, and start their research from the results of the original authors. The infrastructure is agnostic to the specific simulation codes used and can support diverse applications in computational science that transcend its initial materials domain.
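To make the idea of "graphs that allow retracing and reproducing any computed result" concrete, here is a minimal sketch in plain Python. It does not use the Materials Cloud or AiiDA APIs. The node names and the `PARENTS` mapping are hypothetical and serve only to illustrate how a result can be followed back to the calculations and inputs that produced it.

```python
from collections import deque

# Hypothetical toy provenance graph (not an AiiDA data structure):
# each node maps to the nodes it was directly produced from.
PARENTS = {
    "band_structure": ["bands_calc"],
    "bands_calc": ["relaxed_structure", "pseudopotential"],
    "relaxed_structure": ["relax_calc"],
    "relax_calc": ["input_structure", "pseudopotential"],
    "input_structure": [],
    "pseudopotential": [],
}


def retrace(node, parents=PARENTS):
    """Return every ancestor of `node` in breadth-first order, without duplicates."""
    seen = []
    queue = deque(parents.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited through another path
        seen.append(current)
        queue.extend(parents.get(current, []))
    return seen


# Every calculation and input that contributed to the final band structure.
history = retrace("band_structure")
print(history)
```

In this toy graph the pseudopotential feeds two calculations, yet it appears only once in the retraced history, which is what one would expect from a record where shared inputs are stored as single nodes.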
The paper “AiiDA 1.0, a scalable computational infrastructure for automated reproducible workflows and data provenance” describes how the infrastructure has evolved in recent years. Scientific Data is a cross-disciplinary open access journal focused on publishing peer-reviewed research data in an accessible way that facilitates interpretation and reuse. Publishing the paper there underscores how AiiDA has grown from a useful tool for materials science calculations into a viable option for managing scientific data in general.
The increasing availability of computing power and the sustained development of advanced computational methods have made a significant contribution to recent scientific progress. These developments present new challenges driven by the sheer amount of calculations and data to manage. Next-generation exascale supercomputers will intensify these challenges, making automated and scalable solutions crucial. In recent years, AiiDA (aiida.net) has evolved into a robust open-source high-throughput infrastructure addressing the challenges arising from the needs of automated workflow management and data provenance recording. Here, we introduce developments and capabilities required to reach sustained performance, with AiiDA supporting throughputs of tens of thousands of processes per hour, while automatically preserving and storing the full data provenance in a relational database, making it queryable and traversable and thus enabling high-performance data analytics. AiiDA’s workflow language provides advanced automation, error handling features and a flexible plugin model to allow interfacing with external simulation software. The associated plugin registry enables seamless sharing of extensions, empowering a vibrant user community dedicated to making simulations more robust, user-friendly and reproducible.
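As a rough illustration of what storing provenance "in a relational database making it queryable and traversable" can mean in practice, the sketch below uses Python's built-in `sqlite3` module. It is not AiiDA's actual schema or query interface. The `node` and `link` tables, their columns and all labels are assumptions made up for this example.

```python
import sqlite3

# Hypothetical two-table layout: nodes, plus directed links between them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (id INTEGER PRIMARY KEY, kind TEXT, label TEXT);
CREATE TABLE link (source INTEGER, target INTEGER);
""")
conn.executemany("INSERT INTO node VALUES (?, ?, ?)", [
    (1, "data", "input_structure"),
    (2, "calculation", "relax_calc"),
    (3, "data", "relaxed_structure"),
    (4, "calculation", "bands_calc"),
    (5, "data", "band_structure"),
    (6, "data", "unrelated_structure"),
])
conn.executemany("INSERT INTO link VALUES (?, ?)", [(1, 2), (2, 3), (3, 4), (4, 5)])


def data_descendants(conn, start_id):
    """Return labels of all data nodes reachable downstream of `start_id`."""
    rows = conn.execute("""
        WITH RECURSIVE downstream(id) AS (
            SELECT target FROM link WHERE source = ?
            UNION
            SELECT link.target FROM link JOIN downstream ON link.source = downstream.id
        )
        SELECT node.label FROM node JOIN downstream ON node.id = downstream.id
        WHERE node.kind = 'data'
        ORDER BY node.id
    """, (start_id,)).fetchall()
    return [label for (label,) in rows]


# All data derived, directly or indirectly, from the input structure.
results = data_descendants(conn, 1)
print(results)
```

The recursive query walks the links forward from a starting node and then filters by node type, so a question such as "which results ultimately depend on this input?" becomes a single database query rather than a manual search through output files.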