HAI‑End
Supercomputers have become a core accelerant of progress in computational research disciplines and in fields relying on artificial intelligence and simulations of physical phenomena. Tools such as compilers, profilers, debuggers, parallelisation libraries and machine learning libraries make this hardware infrastructure available to software developers. They unlock HPC. While there is plenty of entry-level training and material on development tools for supercomputing, there is no clear pathway for infrastructure professionals and researchers to progress from being “good with HPC tools” to technical mastery. Appropriate training and knowledge acquisition routes are lacking for the most advanced users already working with HPC and AI high-end computing (HAI-End computing). Tools are what allow us to engineer code systematically and to reason about why implementations behave the way they do. This gap therefore leads to a situation where flagship codes run on and benefit from HPC, mainly thanks to self-taught developers, but realisation variants and realisation workflows are seldom documented, assessed and compared with each other systematically. Knowledge of “how to implement things” is not captured and shared. Consequently, developers struggle to benefit from HPC insight obtained through other codes, and HPC cannot realise its full potential.
In HAI-End, we deliver bespoke training for the most advanced HPC/AI user group. We recruit a small cohort of these users every year from various career tracks and research fields. These future HPC champions are given an insight into how tools work, what they are capable of, what alternatives are on the table, and where tools have shortcomings. Such upskilling is only possible in close collaboration with the tool developers themselves. We therefore team up with them directly. The tool knowledge will allow the HPC champions, who are already research infrastructure experts, to argue in detail about different implementations, to compare them, to derive best practices and recommendations, and to identify what codes could in theory achieve. Their findings will be shared through papers, talks, social media and an annual conference in Durham.
HAI-End equips DRI professional staff with new skills. It provides seedcorn support that allows professionals to start arguing about best-case implementations and methods, and gives them access to a resource they typically do not have available: the tool experts themselves. While targeted directly at expert professionals, the acquired “how to” knowledge indirectly makes HPC more useful to new users and communities from all research councils, and it helps all disciplines that use computational infrastructure, as supercomputing insight propagates into them.
Outcomes
Since early 2025, HAI-End has been integrated into the SHAREing project. One work package of SHAREing builds up performance assessment services for the UK, and therefore benefits directly from HAI-End’s outcomes. SHAREing also hosts a training work package, which uses HAI-End events to curate and create advanced training material. This material is ultimately fed into the DiRAC Training Academy, where it becomes available to the whole UK community.
Links
- SHAREing (https://shareing-dri.github.io/)
- DiRAC Training Academy (https://dirac.ac.uk/dirac-training-academy/)