Multimodal Stock Price Prediction: A Case Study of the Russian Securities Market


Abstract

Classical asset price forecasting methods rely primarily on numerical data, such as price time series, trading volumes, limit order book data, and technical analysis indicators. However, the news flow plays a significant role in price formation, which makes multimodal approaches that combine textual and numerical data to improve prediction accuracy highly relevant. This paper addresses the problem of forecasting financial asset prices using a multimodal approach that combines candlestick time series and textual news flow data. A unique dataset was collected for the study; it includes time series for 176 Russian stocks traded on the Moscow Exchange and 79,555 financial news articles in Russian. Textual data were processed with the pre-trained models RuBERT and Vikhr-Qwen2.5-0.5b-Instruct (a large language model), while the time series and vectorized text data were processed with an LSTM recurrent neural network. The experiments compared models based on a single modality (time series only) with models based on two modalities, as well as various methods for aggregating text vector representations. Prediction quality was evaluated using two key metrics: Accuracy (whether the predicted direction of price movement, up or down, is correct) and Mean Absolute Percentage Error (MAPE), which measures the deviation of the predicted price from the true price. The experiments showed that incorporating the textual modality reduced the MAPE value by 55%. The resulting multimodal dataset is valuable for the further adaptation of language models to the financial sector. Future research directions include optimizing the parameters of the textual modality, such as the time window, sentiment, and chronological order of news messages.
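
The two metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the definitions here are assumptions: MAPE is taken as the mean of |true - predicted| / |true|, expressed in percent, and a direction prediction counts as correct when the predicted price and the true price lie on the same side of the previous closing price. The function names `mape` and `direction_accuracy` and the sample prices are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent (assumed definition)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def direction_accuracy(
    prev_close: Sequence[float],
    y_true: Sequence[float],
    y_pred: Sequence[float],
) -> float:
    """Share of steps where predicted and true moves agree (up vs. not up).

    Assumption: "up" means the price is strictly above the previous close.
    """
    if not (len(prev_close) == len(y_true) == len(y_pred)) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        (t > c) == (p > c) for c, t, p in zip(prev_close, y_true, y_pred)
    )
    return hits / len(y_true)


# Hypothetical prices, for illustration only.
prev_close = [100.0, 102.0, 101.0]
true_price = [102.0, 101.0, 104.0]
pred_price = [101.0, 103.0, 103.0]

print(f"MAPE: {mape(true_price, pred_price):.2f}%")
print(f"Accuracy: {direction_accuracy(prev_close, true_price, pred_price):.2f}")
```

Under these definitions the two metrics capture different things: a model can have a low MAPE while still missing the direction of small moves, which is one reason to report both.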

About the authors

Kasymkhan Usufovich Khubiyev

Sirius University of Science and Technology

Email: kasymkhankhubievnis@gmail.com
Researcher, Center of Social and Economic Forecasting; Master's Student in "Financial Mathematics and Financial Technologies", Sirius University of Science and Technology, Sirius, Russia. Research interests: artificial intelligence and its application in science, finance, industry, and business.

Mikhail Evgenyevich Semenov

Sirius University of Science and Technology

Email: semenov.me@talantiuspeh.ru
PhD in Physics and Mathematics, Scientific Supervisor of the "Financial Mathematics and Financial Technologies" program, Sirius University of Science and Technology, Sirius, Russia. Research interests: information technology, intelligent data processing and analysis technologies.


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
