Paper Title

HotelRec: a Novel Very Large-Scale Hotel Recommendation Dataset

Authors

Diego Antognini, Boi Faltings

Abstract

Today, recommender systems are an inevitable part of everyone's daily digital routine and are present on most internet platforms. State-of-the-art deep learning-based models require a large amount of data to achieve their best performance. Many datasets fulfilling this criterion have been proposed for multiple domains, such as Amazon products, restaurants, or beers. However, work and datasets in the hotel domain are limited: the largest hotel review dataset contains fewer than one million samples. Additionally, the hotel domain suffers from higher data sparsity than traditional recommendation datasets, and therefore traditional collaborative-filtering approaches cannot be applied to such data. In this paper, we propose HotelRec, a very large-scale hotel recommendation dataset based on TripAdvisor and containing 50 million reviews. To the best of our knowledge, HotelRec is the largest publicly available dataset in the hotel domain (50M versus 0.9M) and, additionally, the largest single-domain recommendation dataset with textual reviews (50M versus 22M). We release HotelRec for further research: https://github.com/Diego999/HotelRec.
