From Script to STAMPED Research Object
Four scenarios showing how to incrementally add STAMPED properties to a shell-based analysis using git, make, and singularity.
Reproducibility is the cornerstone aspiration of scientific data management. A result is reproducible when re-running the same analysis on the same data yields identical outcomes. This requires:
The STAMPED principles support reproducibility through version control (pinning exact states of data and code), provenance tracking (recording what was run and how), containerization (freezing computational environments), and modular dataset organization (making it possible to reassemble all components of an analysis). When every element of an analysis is tracked and versioned, reproduction becomes a matter of checking out the right state and re-executing.
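As a minimal sketch of that last point, the script below covers only the version-control part and leaves make and singularity aside. It assumes bash, git, and GNU coreutils (`sha256sum`) are available. It pins a toy dataset and analysis script in one commit, records a checksum of the result, lets the data change in a later commit, and then reproduces the original result by checking out the pinned commit and re-running. The file names (`data.txt`, `analysis.sh`, `result.txt`), the `commit` helper, and the commit identity are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch: pin data + code in git, then reproduce a result by checking out
# the pinned commit and re-executing. File names are hypothetical.
set -euo pipefail

# Work in a throwaway directory so the demo leaves nothing behind.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q

# Hypothetical helper: commit with a placeholder identity.
commit() { git -c user.name=demo -c user.email=demo@example.org commit -q "$@"; }

# Toy input data and analysis script.
printf '3\n1\n2\n' > data.txt
cat > analysis.sh <<'EOF'
#!/bin/sh
sort -n data.txt > result.txt
EOF
chmod +x analysis.sh

# Version control: pin the exact state of data and code.
git add data.txt analysis.sh
commit -m "Pin data and analysis script"
pinned=$(git rev-parse HEAD)

# Original run: record a checksum of the outcome.
./analysis.sh
original=$(sha256sum result.txt | cut -d' ' -f1)

# Later, the data changes and the change is committed.
printf '5\n' >> data.txt
commit -am "Update data"

# Reproduction: restore the pinned state and re-execute.
git checkout -q "$pinned"
./analysis.sh
reproduced=$(sha256sum result.txt | cut -d' ' -f1)

if [ "$original" = "$reproduced" ]; then
  echo "reproduced"
else
  echo "mismatch"
fi
```

Running the script prints `reproduced`. Re-running `./analysis.sh` on the updated commit instead would yield a different checksum, which is the drift that pinning the exact state guards against.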