Comparison of automatic summarization of texts in Russian
- Authors: Dagaev A.E., Popov D.I.
- Affiliations:
- Issue: No 4 (2024)
- Pages: 13-22
- Section: Articles
- URL: https://ogarev-online.ru/2454-0714/article/view/359391
- DOI: https://doi.org/10.7256/2454-0714.2024.4.69474
- EDN: https://elibrary.ru/CSFMFC
- ID: 359391
Abstract
The subject of this article is the summarization of Russian-language texts using artificial intelligence models. The authors compare the popular models GigaChat, YaGPT2, ChatGPT-3.5, ChatGPT-4, Bard, Bing AI, and YouChat in a comparative study of their performance on Russian texts. Russian-language datasets, namely Gazeta, XL-Sum, and WikiLingua, serve as the source material for summarization, and the English-language datasets CNN/DailyMail and XSum are added to compare summarization effectiveness across languages. The quality of the generated summaries is assessed with the ROUGE, BLEU, BERTScore, METEOR, and BLEURT metrics. The research method is a comparative analysis of the outputs produced by the artificial intelligence models during automatic summarization. The scientific novelty of the study lies in a comparative analysis of the quality of automatic summarization of Russian and English texts by various neural network models for natural language processing. The authors focus on the recent models GigaChat, YaGPT2, ChatGPT-3.5, ChatGPT-4, Bard, Bing AI, and YouChat, examining and analyzing their effectiveness on the summarization task. The results for Russian show that YouChat achieves the highest scores across the set of metrics, underscoring its effectiveness in processing and generating text that most accurately reproduces the key elements of the content. In contrast, Bard showed the worst results, proving the least capable of generating coherent and relevant text. The data obtained from this comparison will contribute to a deeper understanding of the models under consideration, helping practitioners choose an artificial intelligence model for text summarization tasks and serving as a basis for future development.
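For orientation, the following is a minimal, hypothetical sketch (not the authors' code) of how the kind of scoring described in the abstract can be reproduced with the Hugging Face `evaluate` library: ROUGE and BERTScore are computed for a candidate summary against a reference summary. The example strings are placeholders rather than items from the article's datasets; BLEU, METEOR, and BLEURT can be loaded through the same interface.

```python
# Sketch only; assumes the `evaluate`, `rouge_score`, and `bert_score`
# packages are installed. The texts below are illustrative placeholders.
import evaluate

rouge = evaluate.load("rouge")          # lexical n-gram overlap
bertscore = evaluate.load("bertscore")  # embedding-based similarity

references = ["Правительство утвердило новую программу поддержки науки."]
predictions = ["Власти приняли программу поддержки научных исследований."]

# ROUGE compares surface tokens; it runs on Russian text, although its
# optional stemming is English-oriented.
print(rouge.compute(predictions=predictions, references=references))

# With lang="ru", BERTScore selects a multilingual encoder for scoring.
scores = bertscore.compute(predictions=predictions,
                           references=references, lang="ru")
print(sum(scores["f1"]) / len(scores["f1"]))  # mean F1 over the batch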
About the authors
Alexander Evgenevich Dagaev
Email: alejaandro@bk.ru
Dmitry Ivanovich Popov
Email: damitry.popov@gmail.com
References
1. Goyal T., Li J. J., Durrett G. News summarization and evaluation in the era of GPT-3 // arXiv preprint arXiv:2209.12356. 2022.
2. Zhang T. et al. Benchmarking large language models for news summarization // arXiv preprint arXiv:2301.13848. 2023.
3. Gusev I. Dataset for automatic summarization of Russian news // Artificial Intelligence and Natural Language: 9th Conference, AINL 2020, Helsinki, Finland, October 7-9, 2020, Proceedings 9. Springer International Publishing, 2020. pp. 122-134.
4. Lin C. Y. ROUGE: A package for automatic evaluation of summaries // Text Summarization Branches Out. 2004. pp. 74-81.
5. Post M. A call for clarity in reporting BLEU scores // arXiv preprint arXiv:1804.08771. 2018.
6. Bhaskar A., Fabbri A., Durrett G. Prompted opinion summarization with GPT-3.5 // Findings of the Association for Computational Linguistics: ACL 2023. 2023. pp. 9282-9300.
7. Tang L. et al. Evaluating large language models on medical evidence summarization // npj Digital Medicine. 2023. Vol. 6. No. 1. p. 158.
8. Hendy A. et al. How good are GPT models at machine translation? A comprehensive evaluation // arXiv preprint arXiv:2302.09210. 2023.
9. Jiao W. et al. Is ChatGPT a good translator? Yes with GPT-4 as the engine // arXiv preprint arXiv:2301.08745. 2023.
10. Narayan S., Cohen S. B., Lapata M. Don't give me the details, just the summary! Topic-aware convolutional neural networks for extreme summarization // arXiv preprint arXiv:1808.08745. 2018.
11. Nallapati R. et al. Abstractive text summarization using sequence-to-sequence RNNs and beyond // arXiv preprint arXiv:1602.06023. 2016.
12. Hasan T. et al. XL-Sum: Large-scale multilingual abstractive summarization for 44 languages // arXiv preprint arXiv:2106.13822. 2021.
13. Zhang T. et al. BERTScore: Evaluating text generation with BERT // arXiv preprint arXiv:1904.09675. 2019.
14. Banerjee S., Lavie A. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments // Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. 2005. pp. 65-72.
15. Sellam T., Das D., Parikh A. P. BLEURT: Learning robust metrics for text generation // arXiv preprint arXiv:2004.04696. 2020.
16. Ladhak F. et al. WikiLingua: A new benchmark dataset for cross-lingual abstractive summarization // arXiv preprint arXiv:2010.03093. 2020.