Method for measuring voice source parameters for linear predictive speech coding systems

Abstract

This paper addresses a current direction in acoustic measurement research, the non-invasive analysis of the voice source, and considers the problem of measuring the excitation parameters of a linear prediction vocoder. Known solutions based on the analysis-by-synthesis technique suffer from high computational complexity. To overcome this problem, a high-speed acoustic measurement method has been developed based on the criterion of the minimum mean sample value of the linear prediction error. It is shown that this criterion implements the principle of minimizing the energy the speaker expends on speech production. An example of a technical implementation of the method is considered, and estimates of its computational complexity are given. It is shown that, compared with the well-known multi-pulse excitation method for a linear prediction vocoder that uses two codebooks, adaptive and stochastic, the implementation costs of the proposed method are lower by several orders of magnitude. To confirm this conclusion, a full-scale experiment was conducted with the authors' own software on a set of vowel phonemes from a control speaker. It is shown that optimizing the shape of the excitation signal significantly reduces the mean sample value of the linear prediction error. The results can be useful in developing new and upgrading existing systems and technologies for speech coding and synthesis, mobile speech communication, and other applications of digital speech signal processing with data compression based on the linear prediction model.
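For readers unfamiliar with the quantity the criterion is built on, the following is a minimal sketch of how a linear prediction error can be computed for one frame and summarized as a single number. It is not the authors' method: the function names, the test signal, the prediction order, and the choice of the mean of squared error samples as the summary statistic are all assumptions made for illustration, and the excitation-shape optimization described in the abstract is not reproduced here.

```python
# Illustrative sketch only. All names and parameter values are hypothetical.

def autocorrelation(x, max_lag):
    """Biased (unnormalized) autocorrelation r[0..max_lag] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations.

    Returns predictor coefficients a[1..order] such that the prediction is
    x_hat[n] = sum_k a[k] * x[n - k].
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate frame (e.g. all zeros): stop early
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def prediction_error(x, coeffs):
    """Linear prediction error e[n] = x[n] - x_hat[n] for n >= order."""
    p = len(coeffs)
    return [
        x[n] - sum(coeffs[k] * x[n - 1 - k] for k in range(p))
        for n in range(p, len(x))
    ]


def mean_square(v):
    """Mean of squared samples (assumed summary statistic for this sketch)."""
    return sum(s * s for s in v) / len(v)


def synth_frame(length=400, period=80):
    """Toy voiced-like frame: a pulse train driving a stable two-pole resonator."""
    x = []
    for n in range(length):
        pulse = 1.0 if n % period == 0 else 0.0
        x1 = x[n - 1] if n >= 1 else 0.0
        x2 = x[n - 2] if n >= 2 else 0.0
        x.append(1.6 * x1 - 0.81 * x2 + pulse)
    return x


frame = synth_frame()
lpc = levinson_durbin(autocorrelation(frame, 2), 2)
residual = prediction_error(frame, lpc)
signal_power = mean_square(frame)
error_power = mean_square(residual)
```

In this toy case the estimated coefficients come out close to the resonator's own (1.6 and -0.81), and the residual is dominated by the excitation pulses, so `error_power` is far below `signal_power`. A criterion of the kind named in the abstract would compare such an error statistic across candidate excitation signals; how the candidates are generated and searched is specific to the paper and is not assumed here.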

About the authors

V. V. Savchenko

National Research University Higher School of Economics

Email: vvsavchenko@yandex.ru
ORCID iD: 0000-0003-3045-3337

L. V. Savchenko

National Research University Higher School of Economics

Email: vvsavchenko@yandex.ru
ORCID iD: 0000-0002-2776-5471
