Aspects of creating a corporate question-and-answer system using generative pre-trained language models

Abstract

The article describes various ways to use generative pre-trained language models to build a corporate question-and-answer system. A significant limitation of the current generative pre-trained language models is the limit on the number of input tokens, which does not allow them to work "out of the box" with a large number of documents or with a large document. To overcome this limitation, the paper considers the indexing of documents with subsequent search query and response generation based on two of the most popular open source solutions at the moment – the Haystack and LlamaIndex frameworks. It has been shown that using the open source Haystack framework with the best settings allows you to get more accurate answers when building a corporate question-and-answer system compared to the open source LlamaIndex framework, however, requires the use of an average of several more tokens.    The article used a comparative analysis to evaluate the effectiveness of using generative pre-trained language models in corporate question-and-answer systems using the Haystack and Llamaindex frameworks. The evaluation of the obtained results was carried out using the EM (exact match) metric. The main conclusions of the conducted research on the creation of question-answer systems using generative pre-trained language models are: 1. Using hierarchical indexing is currently extremely expensive in terms of the number of tokens used (about 160,000 tokens for hierarchical indexing versus 30,000 tokens on average for sequential indexing), since the response is generated by sequentially processing parent and child nodes. 2. Processing information using the Haystack framework with the best settings allows you to get somewhat more accurate answers than using the LlamaIndex framework (0.7 vs. 0.67 with the best settings). 3. Using the Haystack framework is more invariant with respect to the accuracy of responses in terms of the number of tokens in the chunk. 4. On average, using the Haystack framework is more expensive in terms of the number of tokens (about 4 times) than the LlamaIndex framework. 5. The "create and refine" and "tree summarize" response generation modes for the LlamaIndex framework are approximately the same in terms of the accuracy of the responses received, however, more tokens are required for the "tree summarize" mode.

References

  1. Simmons R. F., Klein S., McConlogue K. Indexing and dependency logic for answering English questions // American Documentation. – 1964. – Т. 15. – №. 3. – С. 196-204.
  2. Luo M. et al. Choose your qa model wisely: A systematic study of generative and extractive readers for question answering // arXiv preprint arXiv:2203.07522. – 2022.
  3. Zhou C. et al. A comprehensive survey on pretrained foundation models: A history from bert to chatgpt // arXiv preprint arXiv:2302.09419. – 2023.
  4. Lewis P. et al. Retrieval-augmented generation for knowledge-intensive nlp tasks //Advances in Neural Information Processing Systems. – 2020. – Т. 33. – С. 9459-9474.
  5. Маслюхин С. М. Диалоговая система на основе устных разговоров с доступом к неструктурированной базе знаний // Научно-технический вестник информационных технологий, механики и оптики. – 2023. – Т. 23. – №. 1. – С. 88-95.
  6. Евсеев Д. А., Бурцев М. С. Использование графовых и текстовых баз знаний в диалоговом ассистенте DREAM // Труды Московского физико-технического института. – 2022. – Т. 14. – №. 3 (55). – С. 21-33.
  7. Su D. Generative Long-form Question Answering: Relevance, Faithfulness and Succinctness //arXiv preprint arXiv:2211.08386. – 2022.
  8. Kim M. Y. et al. Legal information retrieval and entailment based on bm25, transformer and semantic thesaurus methods // The Review of Socionetwork Strategies. – 2022. – Т. 16. – №. 1. – С. 157-174.
  9. Ke W. Alternatives to Classic BM25-IDF based on a New Information Theoretical Framework //2022 IEEE International Conference on Big Data (Big Data). – IEEE, 2022. – С. 36-44.
  10. Rodriguez P. L., Spirling A. Word embeddings: What works, what doesn’t, and how to tell the difference for applied research // The Journal of Politics. – 2022. – Т. 84. – №. 1. – С. 101-115.
  11. Жеребцова Ю. А., Чижик А. В. Сравнение моделей векторного представления текстов в задаче создания чат-бота // Вестник Новосибирского государственного университета. Серия: Лингвистика и межкультурная коммуникация. – 2020. – Т. 18. – №. 3. – С. 16-34.
  12. Digutsch J., Kosinski M. Overlap in meaning is a stronger predictor of semantic activation in GPT-3 than in humans //Scientific Reports. – 2023. – Т. 13. – №. 1. – С. 5035.
  13. Kamnis S. Generative pre-trained transformers (GPT) for surface engineering // Surface and Coatings Technology. – 2023. – С. 129680.
  14. Khadija M. A., Aziz A., Nurharjadmo W. Automating Information Retrieval from Faculty Guidelines: Designing a PDF-Driven Chatbot powered by OpenAI ChatGPT // 2023 International Conference on Computer, Control, Informatics and its Applications (IC3INA). – IEEE, 2023. – С. 394-399.
  15. Johnson J., Douze M., Jégou H. Billion-scale similarity search with gpus // IEEE Transactions on Big Data. – 2019. – Т. 7. – №. 3. – С. 535-547.
  16. Rajpurkar P. et al. Squad: 100,000+ questions for machine comprehension of text // arXiv preprint arXiv:1606.05250. – 2016.
  17. Bai Y., Wang D. Z. More than reading comprehension: A survey on datasets and metrics of textual question answering // arXiv preprint arXiv:2109.12264. – 2021.

Supplementary files

Supplementary Files
Action
1. JATS XML

Согласие на обработку персональных данных

 

Используя сайт https://journals.rcsi.science, я (далее – «Пользователь» или «Субъект персональных данных») даю согласие на обработку персональных данных на этом сайте (текст Согласия) и на обработку персональных данных с помощью сервиса «Яндекс.Метрика» (текст Согласия).