Paper title
Caching and Reproducibility: Making Data Science experiments faster and FAIRer
Paper authors
Paper abstract
Small to medium-scale data science experiments often rely on research software developed ad hoc by individual scientists or small teams. Often there is no time to make the research software fast, reusable, and open access. The consequence is twofold. First, subsequent researchers must spend significant work hours before they can build upon the proposed hypotheses or experimental framework. In the worst case, others cannot reproduce the experiment or reuse the findings for subsequent research. Second, if the ad hoc research software fails during long-running, computationally expensive experiments, the overall effort to iteratively improve the software and rerun the experiments puts significant time pressure on the researchers. We suggest making caching an integral part of the research software development process, even before the first line of code is written. This article outlines caching recommendations for developing research software in data science projects. Our recommendations provide a perspective on how to circumvent common problems such as proprietary dependence and slow execution. At the same time, caching contributes to the reproducibility of experiments in the open science workflow. Concerning the four guiding principles, i.e., Findability, Accessibility, Interoperability, and Reusability (FAIR), we foresee that including the proposed recommendations in research software development will make the data related to that software FAIRer for both machines and humans. We demonstrate the usefulness of some of the proposed recommendations on our recently completed research software project in mathematical information retrieval.
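To illustrate the general idea of caching an expensive experiment step, here is a minimal sketch. It is not taken from the paper: the `cached` decorator, the `expensive_step` function, and the choice of a pickle-on-disk store keyed by a SHA-256 hash of the arguments are all assumptions made for illustration, and the arguments are assumed to be JSON-serializable.

```python
import functools
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def cached(cache_dir):
    """Hypothetical decorator: store each result on disk, keyed by function name and arguments."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Build a stable key from the call; assumes JSON-serializable arguments.
            payload = json.dumps([func.__name__, args, kwargs], sort_keys=True)
            key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
            path = cache_dir / f"{key}.pkl"
            if path.exists():
                # Cache hit: load the stored result instead of recomputing.
                with path.open("rb") as fh:
                    return pickle.load(fh)
            result = func(*args, **kwargs)
            with path.open("wb") as fh:
                pickle.dump(result, fh)
            return result

        return wrapper

    return decorator


calls = []  # records real executions so cache hits are visible


@cached(tempfile.mkdtemp())
def expensive_step(n):
    """Stand-in for a long-running computation (illustrative only)."""
    calls.append(n)
    return sum(i * i for i in range(n))


first = expensive_step(1000)
second = expensive_step(1000)  # served from disk; the function body is not re-run
```

Because the cache lives on disk, a rerun after a crash or a code fix elsewhere in the pipeline can skip steps whose inputs have not changed. A real project would also need to invalidate entries when the function's code or its external data changes, which this sketch does not attempt.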