Paper Title

Mitigating Gender Stereotypes in Hindi and Marathi

Authors

Neeraja Kirtane, Tanvi Anand

Abstract

As natural language processing becomes more common in day-to-day life, the need to address the gender bias inherent in these systems grows as well. This inherent bias interferes with the semantic structure of system outputs in tasks such as machine translation. While research on quantifying and mitigating bias is under way for English, debiasing methods for Indic languages are relatively nascent, and for some Indic languages they are absent altogether. Most Indic languages are gendered, i.e., each noun is assigned a gender according to the language's grammar rules. As a consequence, evaluation differs from what is done in English. This paper evaluates gender stereotypes in Hindi and Marathi. The methodologies differ from those used for English because some words have masculine and feminine counterparts. We create a dataset of neutral and gendered occupation words and emotion words, and measure bias with the Embedding Coherence Test (ECT) and Relative Norm Distance (RND). We also attempt to mitigate this bias in the embeddings. Experiments show that our proposed debiasing techniques reduce gender bias in these languages.
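
The abstract names the two bias metrics without spelling them out. The sketch below illustrates how ECT and RND are commonly defined in the embedding-bias literature; it is not the authors' code. It assumes RND is the sum, over neutral words, of the distance to the mean male vector minus the distance to the mean female vector, and that ECT is the Spearman rank correlation between the neutral words' cosine similarities to the two group means. The function names and the toy vectors are hypothetical, and the paper's exact normalization and word lists may differ.

```python
import numpy as np


def _unit(v):
    """Scale a vector to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


def _cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(_unit(a), _unit(b)))


def _ranks(x):
    """Ordinal ranks (0 = smallest). Ties are not averaged; fine for a sketch."""
    order = np.argsort(x)
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(len(x))
    return ranks


def relative_norm_distance(neutral, male, female):
    """RND (assumed form): sum over neutral words w of ||w - m|| - ||w - f||.

    m and f are the means of the unit-normalized male and female word vectors.
    Under this sign convention, a negative total means the neutral words sit
    closer to the male group; values near 0 indicate less bias.
    """
    m = np.mean([_unit(v) for v in male], axis=0)
    f = np.mean([_unit(v) for v in female], axis=0)
    return float(sum(
        np.linalg.norm(_unit(w) - m) - np.linalg.norm(_unit(w) - f)
        for w in neutral
    ))


def embedding_coherence_test(neutral, male, female):
    """ECT (assumed form): Spearman correlation of similarities to each group mean.

    Spearman correlation is computed as the Pearson correlation of the ranks.
    Needs at least two neutral words.
    """
    m = np.mean(np.asarray(male, dtype=float), axis=0)
    f = np.mean(np.asarray(female, dtype=float), axis=0)
    sim_m = np.array([_cosine(w, m) for w in neutral])
    sim_f = np.array([_cosine(w, f) for w in neutral])
    return float(np.corrcoef(_ranks(sim_m), _ranks(sim_f))[0, 1])


if __name__ == "__main__":
    # Hypothetical 2-D toy embeddings, for illustration only.
    male_words = [[1.0, 0.0]]
    female_words = [[0.0, 1.0]]
    occupation_words = [[1.0, 0.2], [0.2, 1.0], [1.0, 1.0]]
    print("RND:", relative_norm_distance(occupation_words, male_words, female_words))
    print("ECT:", embedding_coherence_test(occupation_words, male_words, female_words))
```

Under these definitions, an ECT closer to 1 and an RND closer to 0 would both indicate less measured gender bias, so a debiasing step can be checked by computing both scores before and after it.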
