Paper title

Individual corpora predict fast memory retrieval during reading

Authors

Hofmann, Markus J., Müller, Lara, Rölke, Andre, Radach, Ralph, Biemann, Chris

Abstract

The corpus from which a predictive language model is trained can be considered the experience of a semantic system. We recorded the everyday reading of two participants for two months on a tablet, generating individual corpus samples of 300/500K tokens. We then trained word2vec models on the individual corpora and on a 70-million-sentence newspaper corpus to obtain individual and norm-based long-term memory structures. To test whether individual corpora can make better predictions for a cognitive task of long-term memory retrieval, we generated stimulus materials consisting of 134 sentences with uncorrelated individual and norm-based word probabilities. In the subsequent eye-tracking study 1-2 months later, our regression analyses revealed that individual, but not norm-corpus-based, word probabilities can account for first-fixation duration and first-pass gaze duration. Word length additionally affected gaze duration and total viewing duration. The results suggest that corpora representative of an individual's long-term memory structure can better explain reading performance than a norm corpus, and that recently acquired information is lexically accessed rapidly.
