Paper Title

Wikidata-lite for Knowledge Extraction and Exploration

Authors

Phuc Nguyen, Hideaki Takeda

Abstract

Wikidata is the largest collaborative general knowledge graph supported by a worldwide community. It covers many topics that are useful for knowledge exploration and data science applications. However, because Wikidata is so large, it is challenging to retrieve large amounts of data with millions of results, to run complex queries that require heavy aggregation, or to access many statement references. This paper introduces our preliminary work on Wikidata-lite, a toolkit that builds an offline database for knowledge extraction and exploration, e.g., retrieving item information, statements, and provenance, or searching entities by their keywords and attributes. Wikidata-lite is fast and memory-efficient, and is much faster than the official Wikidata SPARQL endpoint for big queries. The Wikidata-lite repository is available at https://github.com/phucty/wikidb.
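
To make the idea of an offline database searchable by keyword and attribute concrete, here is a minimal sketch. It is not the Wikidata-lite (wikidb) API and says nothing about how the toolkit is implemented. SQLite is used here only as a convenient stand-in for the storage layer, the function names `build_index`, `search_by_keyword`, and `search_by_attribute` are hypothetical, and the sample records are heavily trimmed illustrations in the shape of the Wikidata JSON dump.

```python
import sqlite3

# Illustrative records, trimmed to the fields this sketch needs.
# They follow the general shape of Wikidata's JSON entity format.
def _item_claim(target_id):
    return {"mainsnak": {"snaktype": "value",
                         "datavalue": {"type": "wikibase-entityid",
                                       "value": {"id": target_id}}}}

SAMPLE_ENTITIES = [
    {"id": "Q5",
     "labels": {"en": {"language": "en", "value": "human"}},
     "claims": {}},
    {"id": "Q42",
     "labels": {"en": {"language": "en", "value": "Douglas Adams"}},
     "claims": {"P31": [_item_claim("Q5")]}},
    {"id": "Q937",
     "labels": {"en": {"language": "en", "value": "Albert Einstein"}},
     "claims": {"P31": [_item_claim("Q5")]}},
]


def build_index(entities):
    """Load entities into an in-memory SQLite database (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE labels (id TEXT PRIMARY KEY, label TEXT)")
    conn.execute(
        "CREATE TABLE statements (subject TEXT, property TEXT, value TEXT)")
    conn.execute(
        "CREATE INDEX idx_prop_value ON statements (property, value)")
    for entity in entities:
        label = entity.get("labels", {}).get("en", {}).get("value", "")
        conn.execute("INSERT INTO labels VALUES (?, ?)", (entity["id"], label))
        for prop, claims in entity.get("claims", {}).items():
            for claim in claims:
                value = claim["mainsnak"].get("datavalue", {}).get("value")
                # Only item-valued statements are indexed in this sketch.
                if isinstance(value, dict) and "id" in value:
                    conn.execute(
                        "INSERT INTO statements VALUES (?, ?, ?)",
                        (entity["id"], prop, value["id"]))
    conn.commit()
    return conn


def search_by_keyword(conn, keyword):
    """Return ids whose English label contains the keyword."""
    rows = conn.execute(
        "SELECT id FROM labels WHERE label LIKE ? ORDER BY id",
        (f"%{keyword}%",))
    return [row[0] for row in rows]


def search_by_attribute(conn, prop, value):
    """Return ids of entities that have the statement (prop, value)."""
    rows = conn.execute(
        "SELECT subject FROM statements "
        "WHERE property = ? AND value = ? ORDER BY subject",
        (prop, value))
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_index(SAMPLE_ENTITIES)
    print(search_by_keyword(db, "adams"))
    print(search_by_attribute(db, "P31", "Q5"))
```

Because the index is built once and then queried locally, lookups like these avoid the timeouts and result limits that large queries can hit on a shared public endpoint, which is the motivation the abstract gives for an offline toolkit.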
