Paper Title

Urdu Morphology, Orthography and Lexicon Extraction

Authors

Muhammad Humayoun, Harald Hammarström, Aarne Ranta

Abstract

Urdu is a challenging language for two reasons: first, its Perso-Arabic script, and second, its morphological system, which has inherent grammatical forms and vocabulary drawn from Arabic, Persian and the native languages of South Asia. This paper describes an implementation of the Urdu language as a software API, covering orthography, morphology and the extraction of the lexicon. The morphology is implemented in a toolkit called Functional Morphology (Forsberg & Ranta, 2004), which is based on the idea of treating grammars as software libraries. This implementation can therefore be reused in applications such as intelligent keyword search, language training and infrastructure for syntax. We also present an implementation of a small part of Urdu syntax to demonstrate this reusability.
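To make the "grammars as software libraries" idea concrete, the following is a minimal sketch and not the paper's implementation. The Functional Morphology toolkit itself is written in Haskell; Python is used here only for illustration. The sketch assumes a simplified ASCII romanization rather than the Perso-Arabic script, and all names (`masc_a_noun`, `LEXICON`, `analyse`) are hypothetical. It models a paradigm as a function from a lemma to an inflection table, using the common pattern of masculine nouns ending in -a such as "larka" (boy).

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Number(Enum):
    SG = "singular"
    PL = "plural"


class Case(Enum):
    DIRECT = "direct"
    OBLIQUE = "oblique"
    VOCATIVE = "vocative"


# An inflection table maps each (number, case) slot to a word form.
Table = Dict[Tuple[Number, Case], str]
# A paradigm is a function from a lemma to its full inflection table.
Paradigm = Callable[[str], Table]


def masc_a_noun(lemma: str) -> Table:
    """Hypothetical paradigm: masculine nouns ending in -a (ASCII romanization)."""
    if not lemma.endswith("a"):
        raise ValueError(f"lemma {lemma!r} does not end in -a")
    stem = lemma[:-1]
    return {
        (Number.SG, Case.DIRECT): stem + "a",
        (Number.SG, Case.OBLIQUE): stem + "e",
        (Number.SG, Case.VOCATIVE): stem + "e",
        (Number.PL, Case.DIRECT): stem + "e",
        (Number.PL, Case.OBLIQUE): stem + "on",
        (Number.PL, Case.VOCATIVE): stem + "o",
    }


# A lexicon pairs each lemma with the paradigm that inflects it.
LEXICON: List[Tuple[str, Paradigm]] = [
    ("larka", masc_a_noun),   # boy
    ("kamra", masc_a_noun),   # room
]


def analyse(lexicon: List[Tuple[str, Paradigm]], word: str) -> List[Tuple[str, Number, Case]]:
    """Return every (lemma, number, case) reading whose generated form equals word."""
    readings = []
    for lemma, paradigm in lexicon:
        for (number, case), form in paradigm(lemma).items():
            if form == word:
                readings.append((lemma, number, case))
    return readings
```

Because the paradigm is an ordinary function, the same definition serves both generation (calling `masc_a_noun("larka")`) and analysis (calling `analyse(LEXICON, "larke")`, which returns the three readings that share that surface form). This is the kind of reuse the abstract attributes to the library approach.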
