Reducing Errors and Computational Load in Road Scene Text Recognition

Taisiya R. Maximova; Максимова Таисия Романовна; Konstantin B. Bulatov; Булатов Константин Булатович

doi:10.14357/20718632240301

Authors: Maximova T.R.¹, Bulatov K.B.²
Affiliations:
1. Smart Engines
2. Federal Research Center "Computer Science and Control", Russian Academy of Sciences
Issue: No. 3 (2024)
Pages: 3-15
Section: Intelligent systems and technologies
URL: https://ogarev-online.ru/2071-8632/article/view/286110
DOI: https://doi.org/10.14357/20718632240301
EDN: https://elibrary.ru/MMVTBM
ID: 286110

Abstract

This paper addresses the problem of reducing the computational load of road scene text recognition by making a stopping decision that cuts off further recognition. Its contribution is the construction of stopping rules for real-time text recognition systems that combine recognition results, together with an experimental evaluation on the open dataset RoadText-1k. We found that for fast systems the ROVER (Recognizer Output Voting Error Reduction) combination method is best under the Levenshtein metric and majority voting is best under the direct match metric; as per-frame processing time increases, however, ROVER becomes consistently better. Furthermore, while selecting the single most focused frame is the worst strategy for fast systems, its comparative rank improves as processing time increases. Moreover, when the load on the device must be reduced, choosing the single most focused frame and combining the three most focused frames are the preferable strategies for fast systems.
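
To make the terms in the abstract concrete, the following is a minimal Python sketch of majority-voting combination over per-frame recognition results, a plain Levenshtein distance, and a stopping decision. It is an illustration only, not the stopping rules constructed in the paper: the function names, the `patience` parameter, and the "stop once the combined result stays unchanged for several frames" heuristic are all assumptions introduced here, and ROVER itself is not implemented.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def majority_vote(results):
    """Return the most frequent string; ties go to the one seen first."""
    counts = Counter(results)
    return max(counts, key=lambda s: (counts[s], -results.index(s)))


def recognize_with_stopping(frame_results, patience=3):
    """Combine per-frame results and stop early (hypothetical heuristic).

    Stops once the combined result has stayed the same for `patience`
    consecutive frames. Returns (combined_result, frames_processed).
    """
    seen = []
    combined = None
    stable = 0
    for text in frame_results:
        seen.append(text)
        new_combined = majority_vote(seen)
        # Count how many consecutive frames produced this combined result.
        stable = stable + 1 if new_combined == combined else 1
        combined = new_combined
        if stable >= patience:
            break  # stopping decision: skip the remaining frames
    return combined, len(seen)


if __name__ == "__main__":
    frames = ["ST0P", "STOP", "STOP", "STOP", "SHOP", "STOP"]
    result, used = recognize_with_stopping(frames, patience=3)
    print(result, used, levenshtein(result, "STOP"))
```

In this sketch the saved computation is the number of frames left unprocessed after the stopping decision; the trade-off between that saving and the error of the combined result is the kind of balance the abstract refers to.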

Keywords

combination method, reducing computational load, real-time recognition, road scene analysis, text recognition, video stream recognition

About the authors

Taisiya Maximova

Smart Engines

Corresponding author
Email: t.maksimova@smartengines.com

Programmer

Russia, Moscow

Konstantin Bulatov

Federal Research Center "Computer Science and Control", Russian Academy of Sciences

Email: kbulatov@smartengines.com

Senior Researcher, Candidate of Technical Sciences

Russia, Moscow

