Using convolutional neural networks for acoustic-based emergency vehicle detection

Abstract

Background: A siren is a special signal given by emergency vehicles such as fire trucks, police cars and ambulances to warn drivers or pedestrians on the road. However, drivers sometimes may not hear the siren due to the sound insulation of a modern car, the noise of city traffic, or their own inattention. This problem can lead to a delay in the provision of emergency services or even to traffic accidents.

Aim: To develop an acoustic method for detecting emergency vehicles on the road using convolutional neural networks.

Materials and Methods: The algorithm converts sound from the external environment into a spectrogram, which is then analyzed by a convolutional neural network. Siren and city traffic sounds were taken from the open dataset “Emergency Vehicle Siren Sounds”, which was collected from Internet sources such as Google and YouTube and stored in the “.wav” audio format. The code was developed on the Google Colab platform using cloud storage.
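
As an illustration of the sound-to-spectrogram step, the following is a minimal sketch and not the authors' code. It computes a magnitude spectrogram with a short-time Fourier transform using only the Python standard library. The synthetic 1 kHz tone, the 8 kHz sample rate, the frame size of 64 samples, and the helper names `make_tone` and `spectrogram` are all assumptions made for the example; a real pipeline would read the “.wav” files and would most likely rely on an FFT library.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz; assumed for this example, not taken from the paper
FRAME_SIZE = 64     # samples per analysis frame (assumed)
HOP = 32            # samples between frame starts (50 % overlap, assumed)


def make_tone(freq_hz, sample_rate, n_samples):
    """Synthetic stand-in for a recorded clip: a pure sine tone."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]


def spectrogram(samples, frame_size=FRAME_SIZE, hop=HOP):
    """Magnitude spectrogram via a naive short-time Fourier transform.

    Returns a list of frames; each frame holds frame_size // 2 + 1
    magnitudes, covering 0 Hz up to the Nyquist frequency.
    """
    # A Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_size - 1))
              for i in range(frame_size)]
    # Precomputed complex exponentials used by the DFT sum below.
    twiddle = [cmath.exp(-2j * math.pi * k / frame_size)
               for k in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_size)]
        bins = []
        for k in range(frame_size // 2 + 1):
            acc = sum(frame[n] * twiddle[(k * n) % frame_size]
                      for n in range(frame_size))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


tone = make_tone(1000.0, SAMPLE_RATE, 800)  # 0.1 s of a 1 kHz tone
spec = spectrogram(tone)

# Locate the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_SIZE
```

Each row of `spec` is one time slice and each column one frequency bin, so the result can be rendered as an image and passed to an image classifier such as a CNN. The naive DFT is used here only to keep the sketch dependency-free.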

Results: The experiments showed that the proposed method and neural network model identify the type of sound with an average accuracy of 93.3 % and a recognition time of 0.0004 s ± 5 %.
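
For readers who want to see how figures of this kind are typically computed, here is a small sketch under stated assumptions. It treats accuracy as the share of correctly classified clips and reads the “± 5 %” as the largest relative deviation of a single measurement from the mean time; the abstract does not say how the authors derived either value. The function names and all sample values are made up for illustration and are not the paper's data.

```python
def accuracy_percent(predicted, actual):
    """Share of predictions that match the true labels, in percent."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


def timing_summary(times_s):
    """Mean time in seconds and largest relative deviation in percent."""
    mean = sum(times_s) / len(times_s)
    spread = max(abs(t - mean) for t in times_s) / mean * 100.0
    return mean, spread


# Illustrative labels only (class names follow the figure captions).
actual = ["ambulance", "firetruck", "traffic", "traffic", "ambulance",
          "firetruck", "traffic", "ambulance", "firetruck", "traffic"]
predicted = list(actual)
predicted[3] = "ambulance"  # one deliberate misclassification

acc = accuracy_percent(predicted, actual)

# Illustrative per-clip recognition times in seconds.
mean_s, spread_pct = timing_summary([0.00038, 0.00040, 0.00042])
```

In practice the times would come from repeated calls to the trained model, measured with a high-resolution timer such as `time.perf_counter`.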

Conclusion: The developed technology for recognizing siren sounds in city noise will improve traffic safety and increase the chances of preventing dangerous situations. The system can also assist hearing-impaired people, both while driving and in everyday life, by notifying them in a timely manner that emergency services are nearby.

About the authors

Andrey A. Lisov

South Ural State University

Author for correspondence.
Email: lisov.andrey2013@yandex.ru
ORCID iD: 0000-0001-7282-8470
SPIN-code: 1956-3662

postgraduate student

Russian Federation, Chelyabinsk

Askar Z. Kulganatov

South Ural State University

Email: kulganatov97@gmail.com
ORCID iD: 0000-0002-7576-7949
SPIN-code: 7607-9723

postgraduate student

Russian Federation, Chelyabinsk

Sergei A. Panishev

South Ural State University

Email: panishef.serega@mail.ru
ORCID iD: 0000-0003-2753-2341
SPIN-code: 2676-5207

postgraduate student

Russian Federation, Chelyabinsk

References

  1. Kanzaria HK, Probst MA, Hsia RY. Emergency department death rates dropped by nearly 50 percent, 1997–2011. Health Affairs. 2016 Jul 1;35(7):1303-8. doi: 10.1377/hlthaff.2015.1394
  2. Lee J, Park J, Kim KL, Nam J. Sample-level deep convolutional neural networks for music auto-tagging using raw waveforms. arXiv preprint arXiv:1703.01789. 2017 Mar 6. doi: 10.48550/arXiv.1703.01789
  3. Zhu Z, Engel JH, Hannun A. Learning multiscale features directly from waveforms. arXiv preprint arXiv:1603.09509. 2016 Mar 31. doi: 10.48550/arXiv.1603.09509
  4. Choi K, Fazekas G, Sandler M. Automatic tagging using deep convolutional neural networks. arXiv preprint arXiv:1606.00298. 2016 Jun 1. doi: 10.48550/arXiv.1606.00298
  5. Nasrullah Z, Zhao Y. Music artist classification with convolutional recurrent neural networks. In: 2019 International Joint Conference on Neural Networks (IJCNN) 2019 Jul 14 (pp. 1-8). IEEE. doi: 10.1109/IJCNN.2019.8851988
  6. Wang Z, Muknahallipatna S, Fan M, et al. Music classification using an improved CRNN with multi-directional spatial dependencies in both time and frequency dimensions. In: 2019 International Joint Conference on Neural Networks (IJCNN) 2019 Jul 14 (pp. 1-8). IEEE. doi: 10.1109/IJCNN.2019.8852128
  7. Dieleman S, Brakel P, Schrauwen B. Audio-based music classification with a pretrained convolutional network. In: 12th International Society for Music Information Retrieval Conference (ISMIR-2011) 2011 (pp. 669-674). University of Miami.
  8. Chen MT, Li BJ, Chi TS. CNN based two-stage multi-resolution end-to-end model for singing melody extraction. In: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019 May 12 (pp. 1005-1009). IEEE. doi: 10.1109/ICASSP.2019.8683630
  9. Phan H, Koch P, Katzberg F, et al. Audio scene classification with deep recurrent neural networks. arXiv preprint arXiv:1703.04770. 2017 Mar 14. doi: 10.48550/arXiv.1703.04770
  10. Gimeno P, Viñals I, Ortega A, et al. Multiclass audio segmentation based on recurrent neural networks for broadcast domain data. EURASIP Journal on Audio, Speech, and Music Processing. 2020 Dec;2020:1-9.
  11. Dai J, Liang S, Xue W, et al. Long short-term memory recurrent neural network based segment features for music genre classification. In: 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP) 2016 Oct 17 (pp. 1-5). IEEE. doi: 10.1109/ISCSLP.2016.7918369
  12. Zhang Z, Xu S, Zhang S, et al. Attention based convolutional recurrent neural network for environmental sound classification. Neurocomputing. 2021 Sep 17;453:896-903. doi: 10.1016/j.neucom.2020.08.069
  13. Wang H, Zou Y, Chong D, Wang W. Environmental sound classification with parallel temporal-spectral attention. arXiv preprint arXiv:1912.06808. 2019 Dec 14. doi: 10.48550/arXiv.1912.06808
  14. Sang J, Park S, Lee J. Convolutional recurrent neural networks for urban sound classification using raw waveforms. In: 2018 26th European Signal Processing Conference (EUSIPCO) 2018 Sep 3 (pp. 2444-2448). IEEE. doi: 10.23919/EUSIPCO.2018.8553247
  15. Choi K, Fazekas G, Sandler M, Cho K. Convolutional recurrent neural networks for music classification. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2017 Mar 5 (pp. 2392-2396). IEEE. doi: 10.1109/ICASSP.2017.7952585
  16. Gwardys G, Grzywczak D. Deep image features in music information retrieval. International Journal of Electronics and Telecommunications. 2014;60:321-6. doi: 10.2478/eletel-2014-0042
  17. Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Communications of the ACM. 2017 May 24;60(6):84-90. doi: 10.1145/3065386
  18. Deng J, Dong W, Socher R, et al. Imagenet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition 2009 Jun 20 (pp. 248-255). IEEE. doi: 10.1109/CVPR.2009.5206848
  19. Emergency Vehicle Siren Sounds [Internet]. Kaggle [cited 2023 February 23]. Available from: https://www.kaggle.com/vishnu0399/emergency-vehicle-siren-sounds
  20. CNN for audio recognition. GitHub [cited 2023 February 23]. Available from: https://github.com/AnLiMan/CNN-for-audio-recognition

Supplementary files

Fig. 1. CNN architecture for emergency siren recognition
Fig. 2. Ambulance spectrogram
Fig. 3. Spectrogram of a fire truck (firetruck)
Fig. 4. Spectrogram of city noise (traffic)
Fig. 5. "Purified" spectrogram of the audio track
Fig. 6. Algorithm for training a convolutional neural network
Fig. 7. Graph of the learning process
Fig. 8. Checking recognition accuracy on 16 random spectrograms
Fig. 9. Checking a single image from the test set

Copyright (c) 2023 Lisov A.A., Kulganatov A.Z., Panishev S.A.

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
