Object Recognition by Components and Relations between Them

Abstract

The goal of this paper is to develop a methodology and an algorithm for recognizing objects in the environment that maintain recognition quality as the number of objects grows. To this end, three problems were solved: recognition of shape features, estimation of the relations between features, and matching of the detected features and relations against predefined templates (descriptions of complex and simple real-world objects). A convolutional neural network recognizes the shape features. It was trained on artificially generated images in which shape features (3D primitive objects) with varying surface properties were placed randomly in the scene. A set of relations was formed that is needed to recognize objects that can be represented as combinations of shape features. Testing on photos of real-world objects showed that such objects can be recognized regardless of their type (in cases where different models and modifications are possible). The paper considers outdoor luminaire recognition as an example. The example shows that the algorithm can not only detect an object in an image but also estimate the position of its components, which makes it usable in object manipulation tasks performed by robotic systems.
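The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Component` fields, the three relation names (`above`, `left_of`, `touches`), the tolerance value, and the two-part luminaire template are all assumptions made for illustration. The sketch assumes that a detector has already returned labelled bounding boxes for the shape features.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Component:
    """A detected shape feature (hypothetical structure)."""
    shape: str  # primitive label, e.g. "cylinder" or "box"
    x: float    # box centre, image coordinates
    y: float    # box centre, y grows downward
    w: float    # box width
    h: float    # box height


def relations(a, b, tol=2.0):
    """Return the set of (assumed) relation names that hold from a to b."""
    rels = set()
    if a.y < b.y:
        rels.add("above")
    if a.x < b.x:
        rels.add("left_of")
    # Boxes touch if the gap between them is within tol on both axes.
    gap_x = abs(a.x - b.x) - (a.w + b.w) / 2
    gap_y = abs(a.y - b.y) - (a.h + b.h) / 2
    if gap_x <= tol and gap_y <= tol:
        rels.add("touches")
    return rels


def match_template(components, template):
    """Brute-force search for an assignment of components to template parts.

    Returns a dict part_name -> Component, or None if nothing matches.
    """
    parts = list(template["parts"].items())
    for combo in permutations(components, len(parts)):
        # Every part must be filled by a component of the required shape.
        if any(c.shape != shape for c, (_, shape) in zip(combo, parts)):
            continue
        assign = {name: c for c, (name, _) in zip(combo, parts)}
        # Every required relation must hold between the assigned components.
        if all(rel in relations(assign[a], assign[b])
               for a, rel, b in template["relations"]):
            return assign
    return None


# Hypothetical template: a luminaire as a box-shaped head on top of a pole.
LUMINAIRE = {
    "parts": {"head": "box", "pole": "cylinder"},
    "relations": [("head", "above", "pole"), ("head", "touches", "pole")],
}

detections = [
    Component("cylinder", x=50, y=120, w=10, h=200),
    Component("box", x=60, y=15, w=60, h=20),
]
match = match_template(detections, LUMINAIRE)
```

Because a successful match returns the component assigned to each template part, the result carries the position of every part and not only the presence of the object. The exhaustive search over permutations is adequate only for a handful of detections; a larger scene would need pruning.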

About the authors

P. A. Slivnitsin

Perm National Research Polytechnic University

Email: slivnitsin.pavel@gmail.com
Professora Pozdeyeva St. 7

L. A. Mylnikov

Perm National Research Polytechnic University

Email: lamylnikov@hse.ru
Student St. 38

