Identifying the characteristics of late HIV diagnosis using optimized machine learning algorithm
- Authors: Farhadian M. (1), Moslehi S. (1), Mirzaei M. (2)
- Affiliations:
  1. Hamadan University of Medical Sciences
  2. Center for Disease Control and Prevention
- Issue: Vol 15, No 5 (2025)
- Pages: 906-914
- Section: ORIGINAL ARTICLES
- URL: https://ogarev-online.ru/2220-7619/article/view/380210
- DOI: https://doi.org/10.15789/2220-7619-ITC-17896
- ID: 380210
Abstract
Background. Early detection of HIV infection is essential for clinical diagnosis, preventing transmission, and ensuring the safety of blood products. Individuals diagnosed late with HIV may unknowingly transmit the virus and, once diagnosed, may experience worse health outcomes. This study therefore aims to identify the characteristics associated with late diagnosis in HIV patients.

Materials and methods. In this retrospective cohort study, data on 236 patients with HIV infection in Hamadan, western Iran, were collected from 2011 to 2022, including recorded CD4 counts. Late HIV diagnosis was defined as a CD4 count ≤ 350 cells/mm³. First, the Extreme Gradient Boosting (XGBoost) and Random Forest (RF) algorithms were used to identify important variables. Next, Logistic Model Tree (LMT), Classification and Regression Tree (CART), Deep Neural Network (DNN), and Support Vector Machine (SVM) models were developed on the clinical and demographic variables using a 70/30 training/test split. Finally, the optimal model was selected on the basis of accuracy and F1-score using Python version 3.10.

Results. Age, logarithm of viral load (LVL), white blood cell (WBC) count, red blood cell (RBC) count, lymphocytes (Lym), hematocrit (Hct), platelets (PLT), hemoglobin (Hb), and clinical stage each had a relative importance above 6%. Among the models developed on these important variables, CART was the optimal model, with F1-score and accuracy values of 0.887 and 0.801, and of 0.897 and 0.822 for the training data, respectively. The AUC obtained for CART was 0.918.

Conclusions. Late diagnosis of HIV infection is a substantial problem. Developing an algorithm that can detect disease characteristics accurately and interpretably, such as CART, could be essential for identifying the characteristics that influence late HIV diagnosis and for supporting clinical and therapeutic decisions.
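The late-diagnosis rule, the 70/30 split, and the two model-selection metrics named in the Materials and methods can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the random seed, the rounding of the split, and the CD4 counts and predictions are all hypothetical, and only the 350 cells/mm³ cut-off and the 70/30 proportion come from the abstract.

```python
import random

# Cut-off stated in the abstract: late diagnosis if CD4 <= 350 cells/mm^3.
LATE_CD4_THRESHOLD = 350


def is_late_diagnosis(cd4_count):
    """Return 1 for a late diagnosis (CD4 <= threshold), otherwise 0."""
    return 1 if cd4_count <= LATE_CD4_THRESHOLD else 0


def split_indices(n, train_fraction=0.7, seed=0):
    """Shuffle row indices and cut them into training and test parts.

    The seed and the rounding (floor) are assumptions for illustration.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = int(n * train_fraction)
    return indices[:cut], indices[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up CD4 counts and made-up classifier output, for illustration only.
cd4_counts = [120, 340, 350, 351, 600, 200, 480, 90]
y_true = [is_late_diagnosis(c) for c in cd4_counts]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]

train_idx, test_idx = split_indices(236)  # 236 patients, as in the abstract
print(len(train_idx), len(test_idx))
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred))
```

Reporting F1 alongside accuracy is a reasonable choice when one class dominates, because accuracy alone can look high for a model that rarely predicts the minority class.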
About the authors
M. Farhadian
Hamadan University of Medical Sciences
Email: maryam_farhadian80@yahoo.com
ORCID iD: 0000-0002-6054-9850
PhD, Associate Professor of Biostatistics Department, School of Public Health and Research Center for Health Sciences
Iran, Islamic Republic of, Hamadan
Samad Moslehi
Hamadan University of Medical Sciences
Author for correspondence.
Email: samadmoslehi999@gmail.com
ORCID iD: 0000-0003-1597-7327
PhD, Associate Professor of Biostatistics Department, School of Public Health, Modeling of Noncommunicable Diseases Research Center
Iran, Islamic Republic of, Hamadan
M. Mirzaei
Center for Disease Control and Prevention
Email: mirzaei3589@gmail.com
ORCID iD: 0000-0001-9428-059X
MSc, Disease Control Expert
Iran, Islamic Republic of, Hamadan