Random Survival Forests Incorporated by the Nadaraya-Watson Regression

L. V. Utkin; A. V. Konstantinov

doi:10.15622/ia.21.5.1

Random Survival Forest and Nadaraya-Watson Regression

Authors: L. V. Utkin¹, A. V. Konstantinov¹
Affiliations:
1. Peter the Great St. Petersburg Polytechnic University
Issue: Vol. 21, No. 5 (2022)
Pages: 851-880
Section: Artificial Intelligence, Data and Knowledge Engineering
URL: https://ogarev-online.ru/2713-3192/article/view/267176
DOI: https://doi.org/10.15622/ia.21.5.1
ID: 267176

Abstract

The paper presents an attention-based random survival forest (Att-RSF). The first idea behind the forest is to adapt the Nadaraya-Watson kernel regression to the random survival forest so that the regression weights, or kernels, can be regarded as trainable attention weights. This holds under the important condition that the predictions of the random survival forest are represented as functions of time, for example the survival function or the cumulative hazard function. Each trainable weight, assigned to a tree and to an example from the training or testing set, is determined by two factors: the predictive ability of the corresponding tree and the peculiarity of the example that falls into a leaf of the tree. The second idea of Att-RSF is to apply Huber's contamination model in order to represent the attention weights as a linear function of the trainable attention parameters. Harrell's C-index (concordance index), which measures the prediction quality of the random survival forest, is used to form the loss function for training the attention weights. Using the C-index together with the contamination model leads to a standard quadratic optimization problem for computing the weights, for which a number of simple solution algorithms exist. Numerical experiments with real datasets containing survival data illustrate the proposed Att-RSF model.
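The weighting scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The softmax of negative squared distances as the Nadaraya-Watson kernel, the mean feature vector of each leaf (`leaf_means`) as the quantity compared with the example, and the parameters `eps` and `tau` are all assumptions made for the example. The abstract itself only states that the weights are linear in the trainable parameters under Huber's contamination model. The quadratic optimization used to train `w` is not shown.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - np.max(z))
    return e / e.sum()


def attention_weights(x, leaf_means, w, eps=0.5, tau=1.0):
    """Attention weights over T trees (hypothetical form, see the lead-in).

    x          : (d,) feature vector of the example
    leaf_means : (T, d) mean training vector of the leaf that x falls into,
                 one row per tree (assumed choice)
    w          : (T,) trainable parameters on the unit simplex
    eps        : contamination parameter in [0, 1] (assumed name)
    tau        : kernel temperature (assumed name)
    """
    dist2 = np.sum((np.asarray(leaf_means) - np.asarray(x)) ** 2, axis=1)
    kernel = softmax(-dist2 / (2.0 * tau))
    # Contamination-style mixture: linear in the trainable vector w.
    return (1.0 - eps) * kernel + eps * np.asarray(w, dtype=float)


def aggregate_survival(tree_sf, alpha):
    """Weighted average of per-tree survival functions.

    tree_sf : (T, n_times) survival function values of each tree
    alpha   : (T,) attention weights summing to one
    """
    return np.asarray(alpha) @ np.asarray(tree_sf)


def harrell_c_index(times, events, risk):
    """Harrell's C-index: share of comparable pairs ordered concordantly.

    A pair (i, j) is comparable if i had an observed event and
    times[i] < times[j]; it is concordant if risk[i] > risk[j].
    Ties in risk count as one half.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

Because the weights are a convex combination of a softmax vector and a simplex vector `w`, they remain non-negative and sum to one, so the aggregated curve is still a valid non-increasing survival function whenever each tree's curve is.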

Keywords

machine learning, random survival forest, survival function, C-index, cumulative hazard function, attention model, Huber's contamination model

About the authors

L. V. Utkin

Peter the Great St. Petersburg Polytechnic University

Corresponding author.
Email: lev.utkin@gmail.com
29 Politekhnicheskaya Street

A. V. Konstantinov

Peter the Great St. Petersburg Polytechnic University

Email: andrue.konst@gmail.com
29 Politekhnicheskaya Street
