Paper Title

The Open Catalyst 2022 (OC22) Dataset and Challenges for Oxide Electrocatalysts

Authors

Richard Tran, Janice Lan, Muhammed Shuaibi, Brandon M. Wood, Siddharth Goyal, Abhishek Das, Javier Heras-Domingo, Adeesh Kolluru, Ammar Rizvi, Nima Shoghi, Anuroop Sriram, Felix Therrien, Jehad Abed, Oleksandr Voznyy, Edward H. Sargent, Zachary Ulissi, C. Lawrence Zitnick

Abstract

The development of machine learning models for electrocatalysts requires a broad set of training data to enable their use across a wide variety of materials. One class of materials that currently lacks sufficient training data is oxides, which are critical for the development of oxygen evolution reaction (OER) catalysts. To address this, we developed the OC22 dataset, consisting of 62,331 DFT relaxations (~9,854,504 single-point calculations) across a range of oxide materials, coverages, and adsorbates. We define generalized total energy tasks that enable property prediction beyond adsorption energies; we test the baseline performance of several graph neural networks; and we provide pre-defined dataset splits to establish clear benchmarks for future efforts. In the most general task, GemNet-OC sees a ~36% improvement in energy predictions when combining the chemically dissimilar OC20 and OC22 datasets via fine-tuning. Similarly, we achieved a ~19% improvement in total energy predictions on OC20 and a ~9% improvement in force predictions on OC22 when using joint training. We demonstrate the practical utility of a top-performing model by capturing literature adsorption energies and important OER scaling relationships. We expect OC22 to provide an important benchmark for models seeking to incorporate intricate long-range electrostatic and magnetic interactions in oxide surfaces. The dataset and baseline models are open-sourced, and a public leaderboard is available to encourage continued community developments on the total energy tasks and data.
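
To make the relationship between the total energy tasks and adsorption energies concrete, the following is a minimal Python sketch, not the authors' code. It assumes the conventional definition of adsorption energy as the total energy of the slab with adsorbate minus the total energy of the clean slab and a gas-phase reference energy. The function name and the energy values are hypothetical and chosen only for illustration. The sketch also computes the average number of single-point calculations per relaxation from the two dataset counts quoted in the abstract.

```python
def adsorption_energy(e_system: float, e_slab: float, e_gas_reference: float) -> float:
    """Derive an adsorption energy from total energies (all in eV).

    Assumed conventional definition (not taken from the abstract):
        E_ads = E_total(slab + adsorbate) - E_total(clean slab) - E_gas_reference
    A model that predicts total energies can therefore supply each term.
    """
    return e_system - e_slab - e_gas_reference


# Hypothetical total energies, for illustration only.
e_ads = adsorption_energy(e_system=-350.0, e_slab=-340.0, e_gas_reference=-9.0)
print(f"Adsorption energy: {e_ads:.2f} eV")

# Dataset counts quoted in the abstract.
n_relaxations = 62_331
n_single_points = 9_854_504
frames_per_relaxation = n_single_points / n_relaxations
print(f"Average single points per relaxation: {frames_per_relaxation:.1f}")
```

Because the adsorption energy is a difference of total energies, a model trained on total energies can also be used for other quantities built from the same terms, which is the sense in which the task is more general.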
