Triplet-based knowledge mining using pretrained large language models

Abstract

Extracting structured information from text is a key task in natural language processing. Large language models achieve high accuracy on information extraction tasks thanks to pre-training on huge volumes of data. However, such models require significant computational resources and, because of their dependence on cloud infrastructure, are often unavailable for local use. Compact, open-source large language models that can be fine-tuned locally are therefore increasingly used to address this problem. This paper evaluates the effectiveness of fine-tuning compact large language models for automated extraction of information triplets from unstructured text. The study uses the Mistral model with seven billion parameters, fine-tuned on a custom dataset of 650 examples, each containing an instruction, an input text, and an expected output. The results confirm the effectiveness of fine-tuning: the F1-score increased several-fold compared to the baseline model, and the fine-tuned model is competitive with the large-scale DeepSeek language model with 685 billion parameters. These results highlight the potential of compact open large language models for resource-constrained knowledge extraction tasks such as knowledge graph construction.
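
To make the setup concrete, the Python sketch below shows one possible record layout for the instruction/input/output examples described in the abstract, together with a triplet-level F1 computation of the kind typically used to score extraction output. The field names, the example text, and the exact-match scoring rule are illustrative assumptions; the abstract does not specify the dataset schema or the evaluation script.

    # Illustrative only: record layout and triplet-level F1 computation.
    # Field names ("instruction", "input", "output") and the triplet format
    # are assumptions, not the authors' released dataset schema.

    example = {
        "instruction": "Extract all (subject, relation, object) triplets from the text.",
        "input": "Kazan National Research Technical University is located in Kazan.",
        "output": [("Kazan National Research Technical University", "located in", "Kazan")],
    }

    def triplet_f1(predicted, gold):
        """Micro-averaged precision, recall and F1 over exact-match triplets."""
        pred_set, gold_set = set(predicted), set(gold)
        tp = len(pred_set & gold_set)  # correctly extracted triplets
        precision = tp / len(pred_set) if pred_set else 0.0
        recall = tp / len(gold_set) if gold_set else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        return precision, recall, f1

    # A model's raw output would first be parsed into triplets; here a
    # hypothetical prediction is compared against the gold answer.
    predicted = [
        ("Kazan National Research Technical University", "located in", "Kazan"),
        ("Kazan", "is", "a city"),
    ]
    print(triplet_f1(predicted, example["output"]))  # (0.5, 1.0, 0.666...)

Scoring by exact string match is the simplest convention; relaxed matching (e.g., normalized or partial overlap of subject and object spans) is also common and would change the reported numbers.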

About the authors

Bulat R. Zinnurov

Kazan National Research Technical University named after A.N. Tupolev – KAI

Author for correspondence.
Email: b.zinnurov@yandex.ru
ORCID iD: 0009-0000-7633-7302
SPIN-code: 8905-0826

Postgraduate student, Department of Automated Systems for Information Processing and Control

Russian Federation, Kazan

Zinnur M. Gizatullin

Kazan National Research Technical University named after A.N. Tupolev – KAI

Email: zmgizatullin@kai.ru
ORCID iD: 0000-0003-0571-5593
SPIN-code: 6882-0089
Scopus Author ID: 56165279600
ResearcherId: E-8566-2017

Dr. Sci. (Eng.), Professor, Department of Automated Systems for Information Processing and Control

Russian Federation, Kazan


