A Morphological Guesser as a Tool for Analyzing Field Data: Experiences with the Naukan Yupik Language

Abstract

The paper presents the development and evaluation of two automated morphological analysis tools for Naukan Yupik (Yupik < Eskimo < Eskimo-Aleut): a dictionary-based morphological analyzer and a dictionary-free morphological guesser. Both tools are implemented with a two-level approach to morphological modeling based on finite-state automata. The study examines in detail the morphological features of Naukan Yupik that affect the development of automated analysis tools, including rich inflection and derivation, homonymy of morphological markers, and complex morphophonological processes. The effectiveness of both tools is evaluated on a corpus of oral texts from 2022–2023. Particular attention is paid to overgeneration in the output of the morphological guesser and to ways of addressing it by separating analyses by part of speech. The results show that, for field data, a guesser can be more effective than a dictionary-based analyzer despite its known limitations.

About the authors

Elena Mikhailovna Budyanskaya

Institute of Linguistics of the RAS

Corresponding author.
Email: budyanskaya.lena@gmail.com
Moscow, Russia

Anton Olegovich Buzanov

Institute of Linguistics of the RAS; Higher School of Economics

Email: anton.buzanov.00@gmail.com
Moscow, Russia

Daria Olegovna Zhornik

Institute of Linguistics of the RAS

Email: daria.zhornik@yandex.ru
Moscow, Russia

Andrey Andreevich Pikhtin

Institute of Linguistics of the RAS; Higher School of Economics

Email: p_nafanyka@gmail.com
Moscow, Russia

