Paper Title

Reproducible Cross-border High Performance Computing for Scientific Portals

Authors

Kessy Abarenkov, Anne Fouilloux, Helmut Neukirchen, Abdulrahman Azab

Abstract

To reproduce eScience, several challenges need to be solved: scientific workflows need to be automated; the involved software versions need to be specified unambiguously; input data needs to be easily accessible; and High-Performance Computing (HPC) clusters are often involved, so achieving bit-to-bit reproducibility may even require executing the code on a particular cluster to avoid differences caused by different HPC platforms (and unless this is a scientist's local cluster, it needs to be accessed across (administrative) borders). Preferably, all of this should be user-friendly, so that even inexperienced users can (re-)produce results. While some easy-to-use web-based scientific portals already support access to HPC resources, this typically covers only local computing and data resources. Using two community-specific portals in the fields of biodiversity and climate research as examples, we present a solution for accessing remote HPC (and cloud) compute and data resources from scientific portals across borders, involving rigorous container-based packaging of the software version and setup automation, thus enhancing reproducibility.
