Paper Title

Supertagging the Long Tail with Tree-Structured Decoding of Complex Categories

Authors

Jakob Prange, Nathan Schneider, Vivek Srikumar

Abstract

Although current CCG supertaggers achieve high accuracy on the standard WSJ test set, few systems make use of the categories' internal structure that will drive the syntactic derivation during parsing. The tagset is traditionally truncated, discarding the many rare and complex category types in the long tail. However, supertags are themselves trees. Rather than give up on rare tags, we investigate constructive models that account for their internal structure, including novel methods for tree-structured prediction. Our best tagger is capable of recovering a sizeable fraction of the long-tail supertags and even generates CCG categories that have never been seen in training, while approximating the prior state of the art in overall tag accuracy with fewer parameters. We further investigate how well different approaches generalize to out-of-domain evaluation sets.
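To make the observation that supertags are themselves trees concrete, here is a minimal sketch that parses a CCG category string into a binary tree. It only illustrates the data structure and is not the paper's tagger or decoding method. The names `Atom`, `Functor`, `parse_category`, and `depth` are hypothetical, and the sketch assumes the usual convention that slashes associate to the left, so `S\NP/NP` reads as `(S\NP)/NP`.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:
    """Leaf of a category tree, e.g. NP or S[dcl]."""
    name: str


@dataclass(frozen=True)
class Functor:
    """Internal node: result category, slash direction, argument category."""
    slash: str
    result: "Category"
    arg: "Category"


Category = Union[Atom, Functor]


def parse_category(text: str) -> Category:
    """Parse a CCG category string into a tree (hypothetical helper)."""
    pos = 0

    def parse_operand() -> Category:
        nonlocal pos
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            node = parse_expr()
            if pos >= len(text) or text[pos] != ")":
                raise ValueError(f"missing ')' in {text!r}")
            pos += 1  # consume ")"
            return node
        start = pos
        while pos < len(text) and text[pos] not in "/\\()":
            pos += 1
        if start == pos:
            raise ValueError(f"expected a category at position {pos} in {text!r}")
        return Atom(text[start:pos])

    def parse_expr() -> Category:
        nonlocal pos
        node = parse_operand()
        # Assumption: slashes are left-associative.
        while pos < len(text) and text[pos] in "/\\":
            slash = text[pos]
            pos += 1
            node = Functor(slash, node, parse_operand())
        return node

    tree = parse_expr()
    if pos != len(text):
        raise ValueError(f"unexpected character at position {pos} in {text!r}")
    return tree


def depth(cat: Category) -> int:
    """Number of nodes on the longest root-to-leaf path of the tree."""
    if isinstance(cat, Atom):
        return 1
    return 1 + max(depth(cat.result), depth(cat.arg))


transitive_verb = parse_category("(S[dcl]\\NP)/NP")
```

Under these assumptions, a frequent category such as the transitive-verb type above is a shallow tree, while the rare long-tail categories correspond to deeper or wider trees over the same small inventory of atoms and slashes.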
