Hope Speech Detection Using Social Media Discourse (Posi-Vox-2024): A Transfer Learning Approach
- Authors: Ahmad M.1, Usman S.2, Farid H.3, Ameer I.4, Muzzamil M.5, Ameer H.5, Sidorov G.1, Batyrshin I.1
- Affiliations:
- Instituto Politecnico Nacional (CIC-IPN)
- Institute of Arts and Culture
- Independent Researcher
- Pennsylvania State University at Abington
- Islamia University of Bahawalpur
- Issue: Vol 10, No 4 (2024)
- Pages: 31-43
- Section: Research Papers
- URL: https://ogarev-online.ru/2411-7390/article/view/356607
- DOI: https://doi.org/10.17323/jle.2024.22443
- ID: 356607
Abstract
Purpose: This study addresses joint multilingual hope speech detection in Urdu, English, and Arabic using a transfer learning paradigm. We developed a new multilingual dataset named Posi-Vox-2024 and employed a joint multilingual technique to design a universal classifier for the multilingual dataset. We explored a fine-tuned BERT model, which demonstrated strong performance in capturing semantic and contextual information.
Method: The framework includes (1) preprocessing, (2) data representation using BERT, (3) fine-tuning, and (4) classification of hope speech into binary (‘hope’ and ‘not hope’) and multi-class (realistic, unrealistic, and generalized hope) categories.
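As a rough illustration of steps (1) and (4), the sketch below shows one possible cleaning function and one way the multi-class labels could collapse into the binary task. The cleaning rules, label strings, and function names are assumptions for illustration; the abstract does not specify them. Steps (2) and (3), BERT representation and fine-tuning, are omitted because they depend on a model library and training data.

```python
import re

# Hypothetical label strings: the abstract names the classes, not their encoding.
BINARY_LABELS = ("not hope", "hope")
MULTICLASS_LABELS = ("realistic hope", "unrealistic hope", "generalized hope")

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
SPACE_RE = re.compile(r"\s+")


def preprocess(text: str) -> str:
    """Assumed cleaning step: drop URLs and @mentions, collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def to_binary(label: str) -> str:
    """Collapse a fine-grained hope label to the binary task (assumed hierarchy)."""
    return "hope" if label in MULTICLASS_LABELS else "not hope"


if __name__ == "__main__":
    print(preprocess("@user I hope things improve https://t.co/x soon"))
    print(to_binary("realistic hope"), "|", to_binary("not hope"))
```

Case and script are left untouched here because the dataset mixes Urdu, English, and Arabic, and lowercasing only affects Latin-script text; whether the authors normalize case is not stated.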
Results: Our proposed model (BERT) set the benchmark on our dataset, achieving 0.78 accuracy in binary classification and 0.66 in multi-class classification. These scores correspond to relative improvements of 0.04 and 0.08, respectively, over the Logistic Regression baseline (0.75 in binary and 0.61 in multi-class classification).
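The reported improvements can be checked against the accuracies quoted above. The short sketch below computes both the absolute and the relative gain of BERT over the Logistic Regression baseline; the function name `gains` is hypothetical. Under this reading, the 0.04 and 0.08 figures match the relative gain rather than the absolute difference in accuracy.

```python
from typing import Tuple


def gains(model: float, baseline: float) -> Tuple[float, float]:
    """Return (absolute gain, relative gain) of model over baseline."""
    absolute = model - baseline
    return absolute, absolute / baseline


# Accuracies as reported in the abstract.
binary_abs, binary_rel = gains(0.78, 0.75)
multi_abs, multi_rel = gains(0.66, 0.61)

print(f"binary: +{binary_abs:.2f} absolute, {binary_rel:.2f} relative")
print(f"multi-class: +{multi_abs:.2f} absolute, {multi_rel:.2f} relative")
```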
Conclusion: Our findings can be applied to improve automated systems for detecting and promoting supportive content in English, Arabic, and Urdu on social media platforms, fostering positive online discourse. This work sets new benchmarks for multilingual hope speech detection, advancing existing knowledge and enabling future research in underrepresented languages.
About the authors
Muhammad Ahmad
Instituto Politecnico Nacional (CIC-IPN)
Email: mahmad.riaz102@gmail.com
ORCID iD: 0009-0003-8799-8212
Mexico City, Mexico
Sardar Usman
Institute of Arts and Culture
Email: sardar.usman@guas.edu.pk
Lahore, Pakistan
Humaira Farid
Independent Researcher
Email: sa@sfa.ty
California, USA
Iqra Ameer
Pennsylvania State University at Abington
Email: dfgdf@dsg.tu
PA, USA
Muhammad Muzzamil
Islamia University of Bahawalpur
Email: Muzamil.abdulsalam786@gmail.com
Pakistan
Hmaza Ameer
Islamia University of Bahawalpur
Email: asa@sdfsd.tyt
Pakistan
Grigori Sidorov
Instituto Politecnico Nacional (CIC-IPN)
Email: sidorov@cic.ipn.mx
Mexico City, Mexico
Ildar Batyrshin
Instituto Politecnico Nacional (CIC-IPN)
Email: batyr1@cic.ipn.mx
ORCID iD: 0000-0003-0241-7902
Mexico City, Mexico