Paper Title
Hidden Schema Networks
Paper Authors
Paper Abstract
Large, pretrained language models infer powerful representations that encode rich semantic and syntactic content, albeit implicitly. In this work we introduce a novel neural language model that enforces, via inductive biases, explicit relational structures which allow for compositionality onto the output representations of pretrained language models. Specifically, the model encodes sentences into sequences of symbols (composed representations), which correspond to the nodes visited by biased random walkers on a global latent graph, and infers the posterior distribution of the latter. We first demonstrate that the model is able to uncover ground-truth graphs from artificially generated datasets of random token sequences. Next, we leverage pretrained BERT and GPT-2 language models as encoder and decoder, respectively, to infer networks of symbols (schemata) from natural language datasets. Our experiments show that (i) the inferred symbols can be interpreted as encoding different aspects of language, as e.g. topics or sentiments, and that (ii) GPT-like models can effectively be conditioned on symbolic representations. Finally, we explore training autoregressive, random walk "reasoning" models on schema networks inferred from commonsense knowledge databases, and using the sampled paths to enhance the performance of pretrained language models on commonsense If-Then reasoning tasks.
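The core sampling step implied by the abstract can be illustrated with a minimal sketch (not the authors' implementation): a biased random walker over a latent graph of symbols emits a node sequence that serves as a sentence's discrete "schema" code, which a GPT-like decoder could then be conditioned on. The function name, the softmax-parametrized bias, and the toy ring graph below are illustrative assumptions; in the paper the graph itself is inferred as a posterior distribution rather than fixed.

```python
import numpy as np

def sample_schema_walk(adjacency, bias_logits, start, length, rng=None):
    """Sample a biased random walk on a latent symbol graph (illustrative sketch).

    adjacency:   (K, K) binary matrix of the latent schema graph (assumed given here;
                 in the paper its posterior distribution is inferred).
    bias_logits: (K,) per-symbol scores, e.g. produced by a sentence encoder such as
                 BERT, biasing the walker toward symbols relevant to the sentence.
    start:       index of the starting symbol/node.
    length:      number of symbols in the sequence (the sentence's discrete code).
    """
    rng = np.random.default_rng() if rng is None else rng
    walk = [start]
    for _ in range(length - 1):
        neighbors = np.flatnonzero(adjacency[walk[-1]])
        if neighbors.size == 0:          # dead end: stay on the current node
            walk.append(walk[-1])
            continue
        # Transition probabilities: softmax of the encoder bias restricted to neighbors.
        logits = bias_logits[neighbors]
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        walk.append(int(rng.choice(neighbors, p=probs)))
    return walk

# Toy usage: 6 latent symbols on a ring graph, with the encoder biasing symbols 2 and 3.
K = 6
adjacency = (np.roll(np.eye(K, dtype=int), 1, axis=1)
             + np.roll(np.eye(K, dtype=int), -1, axis=1))
bias_logits = np.array([0.0, 0.0, 2.0, 2.0, 0.0, 0.0])   # hypothetical encoder output
symbols = sample_schema_walk(adjacency, bias_logits, start=0, length=5)
print(symbols)  # e.g. [0, 1, 2, 3, 2] -- one sentence's symbolic representation
```

In the model described above, such symbol sequences would be produced by the inference network during training and fed to the decoder as conditioning tokens; the sketch only shows the random walk mechanics, not the variational inference over the graph.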