The Language of Russian Fake Stories: A Corpus-Based Study of the Topical Change in the Viral Disinformation
- Authors: Monogarova A.1, Shiryaeva T.1, Arupova N.2
- Affiliations:
  1. Pyatigorsk State University
  2. Moscow State Institute of International Relations (MGIMO University)
- Issue: Vol 7, No 4 (2021)
- Pages: 83-106
- Section: Research Papers
- URL: https://ogarev-online.ru/2411-7390/article/view/356561
- DOI: https://doi.org/10.17323/jle.2021.13371
- ID: 356561
Abstract
The spread of disinformation during the COVID-19 pandemic is largely associated with social media and online messengers. Viral disinformation disseminated in 2020–2021 covered a wide range of topics that caused panic among people. Many false narratives emerged and attracted public interest over time, which mainly reflected how strongly the general public believed in them. Text mining can be used to analyze the frequencies of keywords and topic-related vocabulary in order to track the changing focus of public attention to online disinformation. In this paper, we present the results of a corpus-based study of Russian viral fake stories that circulated during the first year of the COVID-19 pandemic. We propose a method for analyzing the central topics and the dynamics of topical change in Russian COVID-19 fake stories. To this end, we use a set of tools to extract keywords, count their frequencies, and analyze the corresponding contexts. We apply these tools to a specially compiled diachronic corpus of Russian viral false COVID-19-related stories. The data obtained are evaluated to determine the dynamics of topical shifts by tracking changes in keyword frequencies as well as in the use of other high-frequency corpus words. The findings on topical fluctuations in the Russian viral COVID-19 disinformation agenda, together with the explanations offered for the identified drifts in public interest during the first year of the pandemic, can contribute to developing effective strategies for combating the spread of fakes in the future.
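The frequency-tracking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' toolchain: the toy corpus, the period labels, and the helper names `tokenize`, `relative_frequencies`, and `keyword_trend` are all hypothetical, and the normalization per 1,000 tokens is an assumption chosen so that time slices of different sizes can be compared.

```python
import re
from collections import Counter

# Hypothetical toy corpus: period label -> texts. Not data from the study.
corpus = {
    "2020-Q1": ["the virus spreads through 5g towers", "garlic cures the virus"],
    "2020-Q4": ["the vaccine contains a chip", "the vaccine alters dna"],
}

def tokenize(text):
    # \w+ is Unicode-aware in Python 3, so Cyrillic text is tokenized as well.
    return re.findall(r"\w+", text.lower())

def relative_frequencies(texts, per=1000):
    # Frequency of each word per `per` tokens within one time slice.
    tokens = [tok for text in texts for tok in tokenize(text)]
    total = len(tokens)
    if total == 0:
        return {}
    return {word: count * per / total for word, count in Counter(tokens).items()}

def keyword_trend(corpus, keyword, per=1000):
    # Normalized frequency of one keyword in each period, in sorted period order.
    return {
        period: relative_frequencies(texts, per).get(keyword, 0.0)
        for period, texts in sorted(corpus.items())
    }

virus_trend = keyword_trend(corpus, "virus")
vaccine_trend = keyword_trend(corpus, "vaccine")
```

Comparing `virus_trend` with `vaccine_trend` across periods shows which keyword gains or loses prominence. A real analysis of Russian text would probably also need lemmatization and stop-word filtering, which this sketch omits.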
About the authors
Alina Monogarova
Pyatigorsk State University
Email: alinach12@yandex.ru
ORCID iD: 0000-0003-4098-0341
Tatyana Shiryaeva
Pyatigorsk State University
Email: shiryaeva@list.ru
ORCID iD: 0000-0002-2604-1703
Nadezhda Arupova
Moscow State Institute of International Relations (MGIMO University)
Email: arupova.n.r@inno.mgimo.ru
ORCID iD: 0000-0002-7094-0626