Wrong Answers Only: Distractor Generation for Russian Reading Comprehension Questions Using a Translated Dataset
- Authors: Login N.V.
- Affiliations:
- HSE University
- Issue: Volume 10, No. 4 (2024)
- Pages: 56-70
- Section: Research Papers
- URL: https://ogarev-online.ru/2411-7390/article/view/356609
- DOI: https://doi.org/10.17323/jle.2024.22244
- ID: 356609
Abstract
Background: Reading comprehension questions play an important role in language learning. Multiple-choice questions are a convenient form of reading comprehension assessment, as they can easily be graded automatically. The availability of large reading comprehension datasets also makes it possible to produce these items automatically by fine-tuning language models on them, reducing the cost of developing test question banks. While English reading comprehension datasets are common, this is not true for other languages, including Russian. The subtask of distractor generation poses a particular difficulty, as it requires producing multiple incorrect options.
Purpose: The purpose of this work is to develop an efficient distractor generation solution for Russian exam-style reading comprehension questions and to discover whether a translated English-language distractor dataset can enable such a solution.
Method: In this paper we fine-tuned two pre-trained Russian large language models, RuT5 and RuGPT3 (Zmitrovich et al., 2024), on the distractor generation task for two classes of summarizing questions retrieved from a large multiple-choice question dataset that was automatically translated from English into Russian. The first class consisted of questions on selecting the best title for a given passage, while the second class included questions on true/false statement selection. The models were assessed automatically on the test and development subsets, and the true statement distractor models were additionally evaluated on an independent set of questions from the Russian state exam USE (Unified State Exam).
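As a rough illustration of the fine-tuning setup described above, the sketch below converts one multiple-choice item into seq2seq (input, target) training pairs, one pair per distractor. The field names, separator token, and sample strings are hypothetical assumptions for illustration, not the paper's actual data format:

```python
# Hypothetical sketch: turning a translated multiple-choice item into
# seq2seq training pairs for distractor generation. Field names and the
# "<sep>" token are illustrative assumptions, not the paper's format.

def make_training_pairs(item, sep=" <sep> "):
    """Build (input, target) pairs: the input concatenates passage,
    question, and correct answer; each target is one distractor
    (an incorrect option)."""
    source = item["passage"] + sep + item["question"] + sep + item["answer"]
    return [(source, option)
            for option in item["options"]
            if option != item["answer"]]

item = {
    "passage": "Текст для чтения ...",
    "question": "Какое утверждение верно?",
    "answer": "Верное утверждение",
    "options": ["Верное утверждение", "Дистрактор 1", "Дистрактор 2"],
}

# Two pairs, one per distractor, sharing the same source sequence
pairs = make_training_pairs(item)
```

Pairs of this shape can then be fed to a text-to-text model such as RuT5 for fine-tuning; a decoder-only model such as RuGPT3 would instead see the source and target concatenated into a single sequence.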
Results: The models surpassed the non-fine-tuned baseline, the RuT5 model performed better than RuGPT3, and both models handled true statement selection questions much better than title questions. On USE data, the models fine-tuned on the translated dataset showed better quality than the one trained on an existing Russian distractor dataset, with the T5-based model also beating the baseline established by the output of an existing English distractor generation model translated into Russian.
Conclusion: The obtained results show that a translated dataset can be used for distractor generation and highlight the importance of matching the domain (language examination) and question type in the input data.
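The automatic assessment mentioned above relies on overlap metrics such as BLEU (Papineni et al., 2002). The sketch below shows the core idea, clipped n-gram precision, in a simplified unigram form; it is an illustration of the metric family, not the paper's exact evaluation code:

```python
# Simplified unigram precision, the idea behind BLEU-style automatic
# metrics used to compare generated distractors against references.
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Fraction of candidate tokens that also occur in the reference,
    with counts clipped as in BLEU's modified precision."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(n, ref[tok]) for tok, n in cand.items())
    return overlap / max(sum(cand.values()), 1)

# 5 of the 6 candidate tokens are covered by the reference
# (the second "the" is clipped), giving 5/6
score = unigram_precision(
    "the best title for the passage",
    "the best possible title for this passage",
)
```

Full BLEU additionally combines precisions over higher-order n-grams and applies a brevity penalty; ROUGE and METEOR (also cited below) measure recall- and alignment-based overlap in a similar spirit.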
About the authors
Nikita Login
HSE University
Email: nlogin@hse.ru
ORCID ID: 0009-0007-2480-8708
Moscow, Russia
References
- Alsubait, T. M. (2015). Ontology-based multiple-choice question generation [Unpublished PhD thesis]. University of Manchester.
- Banerjee, S., & Lavie, A. (2005). METEOR: An automatic metric for MT evaluation with improved correlation with human judgements. In J. Goldstein, A. Lavie, C.-Y. Lin, & C. Voss (Eds.), Proceedings of the ACL Workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization (pp. 65-72). Association for Computational Linguistics.
- Belyanova, M. A., Andreev, A. M., & Gapanyuk, Y. E. (2022). Neural text question generation for Russian language using hybrid intelligent information systems approach. In B. Kryzhanovsky, W. Dunin-Barkowski, V. Redko, Y. Tiumentsev, & V. V. Klimov (Eds.), Advances in neural computation, machine learning, and cognitive research V (vol. 1008, pp. 217-223). Springer International Publishing. DOI:https://doi.org/10.1007/978-3-030-91581-0_29
- Bitew, S. K., Hadifar, A., Sterckx, L., Deleu, J., Develder, C., & Demeester, T. (2022). Learning to reuse distractors to support multiple choice question generation in education. IEEE Transactions on Learning Technologies, 17, 375-390. IEEE Computer Society Press. DOI:https://doi.org/10.1109/TLT.2022.3226523
- Bitew, S. K., Deleu, J., Develder, C., & Demeester, T. (2023). Distractor generation for multiple-choice questions with predictive prompting and large language models (Version 1). arXiv. DOI:https://doi.org/10.48550/arXiv.2307.16338
- Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., & Amodei, D. (2020). Language models are few-shot learners. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, & H. Lin (Eds.), Advances in Neural Information Processing Systems (vol. 33, pp. 1877-1901). Curran Associates, Inc. DOI:https://doi.org/10.48550/arXiv.2005.14165
- Chung, H.-L., Chan, Y.-H., & Fan, Y.-C. (2020). A BERT-based distractor generation scheme with multi-tasking and negative answer training strategies. In T. Cohn, Y. He, & Y. Liu (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2020 (pp. 4390-4400). Association for Computational Linguistics. DOI:https://doi.org/10.18653/v1/2020.findings-emnlp.393
- De-Fitero-Dominguez, D., Garcia-Lopez, E., Garcia-Cabot, A., Del-Hoyo-Gabaldon, J.-A., & Moreno-Cediel, A. (2024). Distractor generation through text-to-text transformer models. IEEE Access, 12, 25580-25589. DOI:https://doi.org/10.1109/ACCESS.2024.3361673
- Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In J. Burstein, C. Doran, & T. Solorio (Eds.), Proceedings of the 2019 conference of the North American Chapter of the Association for Computational Linguistics: Human language technologies (vol. 1: Long and Short Paper, pp. 4171-4186). Association for Computational Linguistics. DOI:https://doi.org/10.18653/v1/N19-1423
- Efimov, P., Chertok, A., Boytsov, L., & Braslavski, P. (2020). SberQuAD - Russian reading comprehension dataset: Description and analysis. In A. Arampatzis, E. Kanoulas, T. Tsikrika, S. Vrochidis, H. Joho, C. Lioma, C. Eickhoff, A. Névéol, L. Cappellato, & N. Ferro (Eds.), Experimental IR meets multilinguality, multimodality, and interaction (vol. 12260, pp. 3-15). Springer International Publishing. DOI:https://doi.org/10.1007/978-3-030-58219-7_1
- Elkins, S., Kochmar, E., Serban, I., & Cheung, J. C. K. (2023). How useful are educational questions generated by large language models? In N. Wang, G. Rebolledo-Mendez, V. Dimitrova, N. Matsuda, & O. C. Santos (Eds.), Artificial intelligence in education. Posters and late breaking results, workshops and tutorials, industry and innovation tracks, practitioners, doctoral consortium and blue sky (vol. 1831, pp. 536-542). Springer Nature Switzerland. DOI:https://doi.org/10.1007/978-3-031-36336-8_83
- Fenogenova, A., Mikhailov, V., & Shevelev, D. (2020). Read and reason with MuSeRC and RuCoS: Datasets for machine reading comprehension for Russian. In D. Scott, N. Bel, & C. Zong (Eds.), Proceedings of the 28th International Conference on Computational Linguistics (pp. 6481-6497). International Committee on Computational Linguistics. DOI:https://doi.org/10.18653/v1/2020.coling-main.570
- Gao, Y., Bing, L., Li, P., King, I., & Lyu, M. R. (2019). Generating distractors for reading comprehension questions from real examinations. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 6423-6430. DOI:https://doi.org/10.1609/aaai.v33i01.33016423
- Ghanem, B., & Fyshe, A. (2024). DISTO: Textual distractors for multiple choice reading comprehension questions using negative sampling. In M. Marras & M. Ueno (Eds.), Proceedings of the 17th International Conference on Educational Data Mining (pp. 23-34). International Educational Data Mining Society. DOI:https://doi.org/10.5281/ZENODO.12729766
- Glushkova, T., Machnev, A., Fenogenova, A., Shavrina, T., Artemova, E., & Ignatov, D. I. (2021). DaNetQA: A yes/no question answering dataset for the Russian language. In W. M. P. Van Der Aalst, V. Batagelj, D. I. Ignatov, M. Khachay, O. Koltsova, A. Kutuzov, S. O. Kuznetsov, I. A. Lomazova, N. Loukachevitch, A. Napoli, A. Panchenko, P. M. Pardalos, M. Pelillo, A. V. Savchenko, & E. Tutubalina (Eds.), Analysis of Images, Social Networks and Texts (vol. 12602, pp. 57-68). Springer International Publishing. DOI:https://doi.org/10.1007/978-3-030-72610-2_4
- Hadifar, A., Bitew, S. K., Deleu, J., Develder, C., & Demeester, T. (2023). EduQG: A multi-format multiple-choice dataset for the educational domain. IEEE Access, 11, 20885-20896. DOI:https://doi.org/10.1109/ACCESS.2023.3248790
- Huang, L., Le Bras, R., Bhagavatula, C., & Choi, Y. (2019). CosmosQA: Machine reading comprehension with contextual commonsense reasoning. In K. Inui, J. Jiang, V. Ng, & X. Wan (Eds.), Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 2391-2401). Association for Computational Linguistics. DOI:https://doi.org/10.18653/v1/D19-1243
- Joshi, M., Choi, E., Weld, D., & Zettlemoyer, L. (2017). TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. In R. Barzilay & M.-Y. Kan (Eds.), Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (vol. 1: Long Papers, pp. 1601-1611). Association for Computational Linguistics. DOI:https://doi.org/10.18653/v1/P17-1147
- Kurdi, G., Leo, J., Parsia, B., Sattler, U., & Al-Emari, S. (2020). A systematic review of automatic question generation for educational purposes. International Journal of Artificial Intelligence in Education, 30(1), 121-204. DOI:https://doi.org/10.1007/s40593-019-00186-y
- Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K., Toutanova, K., Jones, L., Kelcey, M., Chang, M.-W., Dai, A. M., Uszkoreit, J., Le, Q., & Petrov, S. (2019). Natural questions: A benchmark for question answering research. Transactions of the Association for Computational Linguistics, 7, 453-466. DOI:https://doi.org/10.1162/tacl_a_00276
- Lai, G., Xie, Q., Liu, H., Yang, Y., & Hovy, E. (2017). RACE: Large-scale reading comprehension dataset from examinations. In M. Palmer, R. Hwa, & S. Riedel (Eds.), Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (pp. 785-794). Association for Computational Linguistics. DOI:https://doi.org/10.18653/v1/D17-1082
- Lee, D. B., Lee, S., Jeong, W. T., Kim, D., & Hwang, S. J. (2020). Generating diverse and consistent QA pairs from contexts with information-maximizing hierarchical conditional VAEs. In D. Jurafsky, J. Chai, N. Schluter, & J. Tetreault (Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 208-224). Association for Computational Linguistics. DOI:https://doi.org/10.18653/v1/2020.acl-main.20
- Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., & Zettlemoyer, L. (2020). BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In D. Jurafsky, J. Chai, N. Schluter, & J. Tetreault (Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 7871-7880). Association for Computational Linguistics. DOI:https://doi.org/10.18653/v1/2020.acl-main.703
- Lin, C.-Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Text summarization branches out (pp. 74-81). Association for Computational Linguistics. https://aclanthology.org/W04-1013.
- Lu, X., West, P., Zellers, R., Bras, R. L., Bhagavatula, C., & Choi, Y. (2021). NeuroLogic decoding: (Un)supervised neural text generation with predicate logic constraints. In K. Toutanova, A. Rumshisky, L. Zettlemoyer, D. Hakkani-Tur, I. Beltagy, S. Bethard, R. Cotterell, T. Chakraborty, & Y. Zhou (Eds.), Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human language technologies (pp. 4288-4299). Association for Computational Linguistics. DOI:https://doi.org/10.18653/v1/2021.naacl-main.339
- Maity, S., Deroy, A., & Sarkar, S. (2024). A novel multi-stage prompting approach for language agnostic MCQ generation using GPT. In N. Goharian, N. Tonellotto, Y. He, A. Lipani, G. McDonald, C. Macdonald, & I. Ounis (Eds.), Advances in information retrieval (vol. 14610, pp. 268-277). Springer Nature Switzerland. DOI:https://doi.org/10.1007/978-3-031-56063-7_18
- Makhnytkina, O., Matveev, A., Svischev, A., Korobova, P., Zubok, D., Mamaev, N., & Tchirkovskii, A. (2020). Conversational question generation in Russian. In S. Balandin, L. Turchet, & T. Tyutina (Eds.), 2020 27th Conference of Open Innovations Association (FRUCT) (pp. 1-8). IEEE. DOI:https://doi.org/10.23919/FRUCT49677.2020.9211056
- Manakul, P., Liusie, A., & Gales, M. (2023). MQAG: Multiple-choice question answering and generation for assessing information consistency in summarization. In J. C. Park, Y. Arase, B. Hu, W. Lu, D. Wijaya, A. Purwarianti, & A. A. Krisnadhi (Eds.), Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific chapter of the Association for Computational Linguistics (vol. 1: Long Papers, pp. 39-53). Association for Computational Linguistics. DOI:https://doi.org/10.18653/v1/2023.ijcnlp-main.4
- Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL '02) (pp. 311-318). Association for Computational Linguistics. DOI:https://doi.org/10.3115/1073083.1073135
- Paris, A. H., & Paris, S. G. (2003). Assessing narrative comprehension in young children. Reading Research Quarterly, 38(1), 36-76. DOI:https://doi.org/10.1598/RRQ.38.1.3
- Qiu, Z., Wu, X., & Fan, W. (2020). Automatic distractor generation for multiple choice questions in standard tests. In D. Scott, N. Bel, & C. Zong (Eds.), Proceedings of the 28th International Conference on Computational Linguistics (pp. 2096-2106). International Committee on Computational Linguistics. DOI:https://doi.org/10.18653/v1/2020.coling-main.189
- Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1), 5485-5551. DOI:https://doi.org/10.5555/3455716.3455856
- Rajpurkar, P., Zhang, J., Lopyrev, K., & Liang, P. (2016). SQuAD: 100,000+ questions for machine comprehension of text. In J. Su, K. Duh, & X. Carreras (Eds.), Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (pp. 2383-2392). Association for Computational Linguistics. DOI:https://doi.org/10.18653/v1/D16-1264
- Reddy, S., Chen, D., & Manning, C. D. (2019). CoQA: A conversational question answering challenge. Transactions of the Association for Computational Linguistics, 7, 249-266. DOI:https://doi.org/10.1162/tacl_a_00266
- Rybin, I., Korablinov, V., Efimov, P., & Braslavski, P. (2021). RuBQ 2.0: An innovated Russian question answering dataset. In R. Verborgh, K. Hose, H. Paulheim, P.-A. Champin, M. Maleshkova, O. Corcho, P. Ristoski, & M. Alam (Eds.), The Semantic Web (vol. 12731, pp. 532-547). Springer International Publishing. DOI:https://doi.org/10.1007/978-3-030-77385-4_32
- Sekulić, I., Aliannejadi, M., & Crestani, F. (2021). Towards facet-driven generation of clarifying questions for conversational search. In Proceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval (pp. 167-175). Association for Computing Machinery. DOI:https://doi.org/10.1145/3471158.3472257
- Shavrina, T., Emelyanov, A., Fenogenova, A., Fomin, V., Mikhailov, V., Evlampiev, A., Malykh, V., Larin, V., Natekin, A., Vatulin, A., Romov, P., Anastasiev, D., Zinov, N., & Chertok, A. (2020, May). Humans keep it one hundred: An overview of AI Journey. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of the Twelfth Language Resources and Evaluation Conference (pp. 2276-2284). European Language Resources Association. https://aclanthology.org/2020.lrec-1.277.
- Tiedemann, J., & Thottingal, S. (2020). OPUS-MT - Building open translation services for the world. In A. Martins, H. Moniz, S. Fumega, B. Martins, F. Batista, L. Coheur, C. Parra, I. Trancoso, M. Turchi, A. Bisazza, J. Moorkens, A. Guerberof, M. Nurminen, L. Marg, & M. L. Forcada (Eds.), Proceedings of the 22nd Annual Conference of the European Association for Machine Translation (pp. 479-480). European Association for Machine Translation. https://aclanthology.org/2020.eamt-1.61.
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Advances in Neural Information Processing Systems (vol. 30, pp. 6000-6010). Curran Associates, Inc. DOI:https://doi.org/10.5555/3295222.3295349
- Welbl, J., Liu, N. F., & Gardner, M. (2017). Crowdsourcing multiple choice science questions. In L. Derczynski, W. Xu, A. Ritter, & T. Baldwin (Eds.), Proceedings of the 3rd Workshop on Noisy User-generated Text (pp. 94-106). Association for Computational Linguistics. DOI:https://doi.org/10.18653/v1/W17-4413
- Xiao, D., Zhang, H., Li, Y., Sun, Y., Tian, H., Wu, H., & Wang, H. (2020). ERNIE-GEN: An enhanced multi-flow pre-training and fine-tuning framework for natural language generation. In C. Bessiere (Ed.), Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (pp. 3997-4003). International Joint Conferences on Artificial Intelligence Organization. DOI:https://doi.org/10.24963/ijcai.2020/553
- Xu, Y., Wang, D., Yu, M., Ritchie, D., Yao, B., Wu, T., Zhang, Z., Li, T., Bradford, N., Sun, B., Hoang, T., Sang, Y., Hou, Y., Ma, X., Yang, D., Peng, N., Yu, Z., & Warschauer, M. (2022). Fantastic questions and where to find them: FairytaleQA - An authentic dataset for narrative comprehension. In S. Muresan, P. Nakov, & A. Villavicencio (Eds.), Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (vol. 1: Long Papers, pp. 447-460). Association for Computational Linguistics. DOI:https://doi.org/10.18653/v1/2022.acl-long.34
- Xue, L., Constant, N., Roberts, A., Kale, M., Al-Rfou, R., Siddhant, A., Barua, A., & Raffel, C. (2020). mT5: A massively multilingual pre-trained text-to-text transformer (Version 3). arXiv. DOI:https://doi.org/10.48550/arXiv.2010.11934
- Zhang, C. (2023). Automatic generation of multiple-choice questions (Version 1). arXiv. DOI:https://doi.org/10.48550/ARXIV.2303.14576
- Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., & Artzi, Y. (2019). BERTScore: Evaluating text generation with BERT (Version 3). arXiv. DOI:https://doi.org/10.48550/ARXIV.1904.09675
- Zmitrovich, D., Abramov, A., Kalmykov, A., Tikhonova, M., Taktasheva, E., Astafurov, D., Baushenko, M., Snegirev, A., Kadulin, V., Markov, S., Shavrina, T., Mikhailov, V., & Fenogenova, A. (2024). A family of pretrained transformer language models for Russian. In N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, & N. Xue (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 507-524). ELRA Language Resource Association. DOI:https://doi.org/10.48550/arXiv.2309.10931