The relationship between the Fermi–Dirac distribution and statistical distributions in languages
- Authors: Maslov V.P.
- Affiliations: National Research University Higher School of Economics
- Issue: Vol 101, No 3-4 (2017)
- Pages: 645-659
- Section: Article
- URL: https://ogarev-online.ru/0001-4346/article/view/150022
- DOI: https://doi.org/10.1134/S0001434617030221
- ID: 150022
Abstract
In this article, we study, from the mathematical point of view, the analogies between language and multi-particle systems in thermodynamics. We attempt to introduce an appropriate mathematical apparatus and the technical tools of statistical physics to the description of language. In particular, we apply the notions of number of degrees of freedom, Bose condensate, phase transition, and others to linguistic objects. On the basis of a statistical analysis of dictionaries and statistical distributions in languages, we conjecture that the transition from the semiotic communication system of the higher primates to human language can be described as a phase transition of the first kind. We show that the number of words appearing with frequency 1 in a corpus of texts is equal to the number of ones in the corresponding Fermi–Dirac distribution, while the high frequency of stop-words corresponds to the large number of particles in the Bose condensate when the number of degrees of freedom is less than two, provided there is a gap in the spectrum. The presented considerations are illustrated by examples from the Russian language. Some of the illustrative examples are untranslatable into English, and so they were replaced in translation by similar examples from the English language.
About the authors
V. P. Maslov
National Research University Higher School of Economics
Author for correspondence.
Email: v.p.maslov@mail.ru
Russian Federation, Moscow