The "Collections as ML Data" Checklist for Machine Learning & Cultural Heritage

Paper Title

The "Collections as ML Data" Checklist for Machine Learning & Cultural Heritage

Author

Lee, Benjamin Charles Germain

Abstract

Within the cultural heritage sector, there has been a growing and concerted effort to consider a critical sociotechnical lens when applying machine learning techniques to digital collections. Though the cultural heritage community has collectively developed an emerging body of work detailing responsible operations for machine learning in libraries and other cultural heritage institutions at the organizational level, there remains a paucity of guidelines created specifically for practitioners embarking on machine learning projects. The manifold stakes and sensitivities involved in applying machine learning to cultural heritage underscore the importance of developing such guidelines. This paper contributes to this need by formulating a detailed checklist with guiding questions and practices that can be employed while developing a machine learning project that utilizes cultural heritage data. I call the resulting checklist the "Collections as ML Data" checklist, which, when completed, can be published with the deliverables of the project. By surveying existing projects, including my own project, Newspaper Navigator, I justify the "Collections as ML Data" checklist and demonstrate how the formulated guiding questions can be employed and operationalized.
