Data Curation Service for DiRAC and IRIS
The Data Curation Service for DiRAC and IRIS aims to make high-value outputs from UK STFC communities (such as cosmology, astronomy, and particle/nuclear physics) easier to find and share, and to preserve them over the long term. It addresses the end-to-end path from HPC job outputs on DiRAC systems to curated, citable datasets discoverable across the IRIS ecosystem, aligning with funder policies and FAIR principles while reducing the curation burden on research groups.
Core deliverables typically include a common metadata model and lightweight templates tailored to DiRAC/IRIS science domains; workflows that move, validate, and package data from HPC storage into archival and dissemination services; assignment of persistent identifiers for datasets, projects, and provenance; and access controls that enable appropriate sharing from embargoed collaboration access through to open release. The service is coupled with guidance and support for data management planning, clear retention policies across storage tiers, and integration points with community repositories and facility services.
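As a rough illustration of the "validate and package" step described above, the following Python sketch builds a checksum manifest for a directory of outputs and later re-verifies it. It is a minimal sketch under stated assumptions, not the service's actual implementation: the DatasetRecord fields, the access-level names, and the package/verify helpers are all hypothetical, and persistent identifier assignment is represented only by an empty placeholder field.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path

# Hypothetical access tiers, loosely following "embargoed collaboration
# access through to open release"; the real service may define others.
ALLOWED_ACCESS = {"embargoed", "collaboration", "open"}


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata record; field names are illustrative."""
    title: str
    project: str
    domain: str
    access_level: str = "embargoed"
    identifier: str | None = None  # placeholder for a persistent identifier
    files: dict[str, str] = field(default_factory=dict)  # relative path -> SHA-256


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package(record: DatasetRecord, data_dir: Path) -> str:
    """Checksum every file under data_dir and return a JSON manifest."""
    if record.access_level not in ALLOWED_ACCESS:
        raise ValueError(f"unknown access level: {record.access_level!r}")
    for path in sorted(data_dir.rglob("*")):
        if path.is_file():
            record.files[path.relative_to(data_dir).as_posix()] = sha256_of(path)
    return json.dumps(asdict(record), indent=2, sort_keys=True)


def verify(manifest_json: str, data_dir: Path) -> list[str]:
    """Return the relative paths that are missing or whose checksum differs."""
    manifest = json.loads(manifest_json)
    return [
        name
        for name, expected in manifest["files"].items()
        if not (data_dir / name).is_file() or sha256_of(data_dir / name) != expected
    ]
```

In this sketch, package would run once when data leaves HPC storage, and verify would run again after transfer to an archival tier; an empty list from verify means every listed file arrived intact.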
Anticipated benefits are improved reproducibility and reuse of flagship simulations and observational data products, easier citation and credit for teams, and more efficient use of storage through lifecycle management. The project’s next steps typically focus on scaling pilots into production across sites, strengthening discovery services and APIs, integrating with facility authentication/authorisation, and delivering training for researchers and RSEs.