Automatic classification of emotions in speech: methods and data
- Authors: Lemaev V.I.1, Lukashevich N.V.1
Institutions:
- Lomonosov Moscow State University
- Issue: No. 4 (2024)
- Pages: 159–173
- Section: Articles
- URL: https://ogarev-online.ru/2409-8698/article/view/379468
- DOI: https://doi.org/10.25136/2409-8698.2024.4.70472
- EDN: https://elibrary.ru/WOBSMN
- ID: 379468
About the authors
Vladislav Igorevich Lemaev
Lomonosov Moscow State University
Email: vladzhkv98@mail.ru
Postgraduate student, Department of Theoretical and Applied Linguistics
Natalia Valentinovna Lukashevich
Lomonosov Moscow State University
Email: louk_nat@mai.ru
Professor, Department of Theoretical and Applied Linguistics

