Paper Title


Underspecification in Language Modeling Tasks: A Causality-Informed Study of Gendered Pronoun Resolution

Paper Authors

McMilin, Emily

Paper Abstract


Modern language modeling tasks are often underspecified: for a given token prediction, many words may satisfy the user's intent of producing natural language at inference time; however, only one word will minimize the task's loss function at training time. We introduce a simple causal mechanism to describe the role underspecification plays in the generation of spurious correlations. Despite its simplicity, our causal model directly informs the development of two lightweight black-box evaluation methods that we apply to gendered pronoun resolution tasks on a wide range of LLMs to 1) aid in the detection of inference-time task underspecification by exploiting 2) previously unreported gender vs. time and gender vs. location spurious correlations on LLMs with a range of A) sizes: from BERT-base to GPT-4 Turbo Preview, B) pre-training objectives: from masked & autoregressive language modeling to a mixture of these objectives, and C) training stages: from pre-training only to reinforcement learning from human feedback (RLHF). Code and open-source demos are available at https://github.com/2dot71mily/uspec.
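The abstract describes black-box probes that surface a gender vs. time spurious correlation: sentences are templated so the pronoun is underspecified, only the date is varied, and the model's relative preference for gendered pronouns is compared across dates. The sketch below illustrates that probing pattern only; it is not the paper's code, and `score_pronoun` is a hypothetical stand-in for a real black-box LLM call (e.g. a masked-LM fill probability), replaced here by a toy function so the example runs end to end.

```python
# Hedged sketch of a black-box gender-vs-time probe.
# Assumptions (not from the paper): the template text, the two-pronoun
# candidate set, and the toy scoring function are all illustrative.

TEMPLATE = "In {year}, the doctor told the patient that {pronoun} would call later."

def score_pronoun(sentence: str, pronoun: str) -> float:
    """Hypothetical stand-in for P(pronoun | context) from a black-box LLM.

    Toy heuristic purely for illustration: pretend the model associates
    later years with a higher probability of "she".
    """
    year = int(sentence.split("In ")[1][:4])
    base = {"she": 0.3, "he": 0.7}[pronoun]
    drift = (year - 1900) / 1000.0  # fake time-correlated drift
    return base + drift if pronoun == "she" else base - drift

def she_share(year: int) -> float:
    """Probability mass on 'she' relative to {'she', 'he'} for a given year."""
    s = score_pronoun(TEMPLATE.format(year=year, pronoun="she"), "she")
    h = score_pronoun(TEMPLATE.format(year=year, pronoun="he"), "he")
    return s / (s + h)

if __name__ == "__main__":
    # A real probe would sweep many years (and locations) and flag the
    # template as underspecified if the share shifts with the date alone.
    for year in (1920, 1970, 2020):
        print(year, round(she_share(year), 3))
```

In the real setting, `score_pronoun` would be swapped for an actual model query (a fill-mask probability or a next-token log-probability), and a systematic shift of `she_share` with the year, despite the pronoun being semantically unconstrained, is the kind of spurious correlation the paper's evaluations are designed to detect.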
