Vienna, Austria

ESTRO 2023

Poster (Digital)
Comparison of machine learning methods to predict urinary incontinence in localized prostate cancer
Hajar Hasannejadasl, The Netherlands


Hajar Hasannejadasl1, Henk van de Poel2, Ben Vanneste3, Joep van Roermund4, Katja Aben5,6, Zhen Zheng1, Biche Osong1, Lambertus Kiemeney5, Inge Van Oort7, Renee Verwey8, Laura Hochstenbach8, Esther Bloemen9, Andre Dekker1, Rianne Fijten1

1Department of Radiation Oncology (Maastro), Maastricht University Medical Centre+, Maastricht, The Netherlands; 2Amsterdam University Medical Centers, Department of Urology, Amsterdam, The Netherlands; 3Department of Radiation Oncology (Maastro), Maastricht University Medical Centre+, Maastricht, The Netherlands; 4Maastricht University Medical Center+, Department of Urology, Maastricht, The Netherlands; 5Netherlands Comprehensive Cancer Organization, Department of Research & Development, Utrecht, The Netherlands; 6Radboud university medical centre, Radboud Institute for Health Sciences, Nijmegen, The Netherlands; 7Department of Urology, Radboud university Medical Center, Nijmegen, The Netherlands; 8Zuyd University, Department of Applied Sciences, Heerlen, The Netherlands; 9Zuyd University, Department of Applied Sciences, Heerlen, The Netherlands

Purpose or Objective

Urinary incontinence (UI) is one of the most common side effects of prostate cancer treatment, but it is currently difficult to predict in clinical practice without artificial intelligence models. A clinical prediction model must balance explainability against predictive performance before it can be adopted, yet some black box models are considered to predict better despite being less explainable. We trained models with three machine learning (ML) algorithms, logistic regression (LR), random forests (RF), and support vector machines (SVM), and compared their performance to identify the algorithm that predicts UI after localized prostate cancer treatment most accurately while remaining easy to explain.

Material and Methods

We used the ProZIB dataset, which included demographics, clinical data, and patient-reported outcome measures (PROMs) from 69 Dutch hospitals, collected by the Netherlands Comprehensive Cancer Organization. The dataset contained information on 964 men with localized prostate cancer and was used for both training and external validation. To perform an external validation in accordance with the TRIPOD Type 3 guidelines, data were split by location, so that each hospital's data were used for either training or validation but not both. Six models were generated for two time points: three models for UI 1 year after treatment and three for UI 2 years after treatment.
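The hospital-level split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the field name `hospital_id`, the 30% validation fraction, the random seed, and the toy records are all assumptions, since the abstract does not report how hospitals were allocated.

```python
import random


def split_by_hospital(records, validation_fraction=0.3, seed=0):
    """Assign whole hospitals to either the training or the validation set.

    `validation_fraction` is an assumed value; the abstract does not state
    the proportion of hospitals held out for external validation.
    """
    hospitals = sorted({r["hospital_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(hospitals)

    # Hold out at least one hospital for validation.
    n_val = max(1, round(len(hospitals) * validation_fraction))
    val_hospitals = set(hospitals[:n_val])

    train = [r for r in records if r["hospital_id"] not in val_hospitals]
    validation = [r for r in records if r["hospital_id"] in val_hospitals]
    return train, validation


# Toy records (hypothetical): 5 hospitals with 4 patients each.
records = [
    {"hospital_id": h, "patient_id": f"{h}-{i}"}
    for h in ("A", "B", "C", "D", "E")
    for i in range(4)
]

train, validation = split_by_hospital(records)
train_hospitals = {r["hospital_id"] for r in train}
validation_hospitals = {r["hospital_id"] for r in validation}
```

Because allocation happens per hospital rather than per patient, no hospital contributes to both sets, which is the property that makes the validation external by location.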


Results

Analyses included 847 and 670 men with localized prostate cancer for the 1- and 2-year models, respectively. For the 1-year outcome, LR outperformed the other models in external validation, with an accuracy of 0.76, a sensitivity of 0.82, and an AUC of 0.79. For all 2-year models, however, performance differed markedly between the training and validation sets. The accuracy of the 2-year models ranged from 0.60 (LR and SVM) to 0.65 (RF), and sensitivity and specificity differed considerably for all models. Figure 1 shows the performance of the generated models. Figure 2 shows the importance of features in each ML model for predicting UI. Feature importance varied among the ML models: four variables were selected by all models for the 1-year outcome and two for the 2-year outcome. Substantial overlap was observed between the variables selected by the RF and SVM algorithms.
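For readers less familiar with the reported metrics, the sketch below shows how accuracy, sensitivity, specificity, and AUC can be computed from predicted probabilities. The labels, scores, and the 0.5 decision threshold are invented toy values chosen for illustration; they are not ProZIB data and do not reproduce the reported results.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def auc(y_true, scores):
    """AUC as the probability that a positive case outranks a negative one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 1 = incontinent, 0 = continent.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC is computed from the ranking of the scores alone, which is one reason the measures can diverge between models.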


Figure 1. Performance results of 1-year and 2-year models

Figure 2. A comparison of selected variables in different models
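A comparison of selected variables like the one in Figure 2 can be expressed with set operations. The sketch below is illustrative only: the variable names are placeholders, not the predictors actually selected in the study, and the Jaccard index is just one assumed way to quantify overlap between two models.

```python
# Placeholder variable names (hypothetical); the abstract does not list
# the predictors selected by each algorithm.
selected = {
    "LR": {"var_a", "var_b", "var_c"},
    "RF": {"var_a", "var_b", "var_d", "var_e"},
    "SVM": {"var_a", "var_b", "var_d", "var_e", "var_f"},
}

# Variables selected by every model.
shared_by_all = set.intersection(*selected.values())


def jaccard(a, b):
    """Overlap between two variable sets: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)


overlap_rf_svm = jaccard(selected["RF"], selected["SVM"])
overlap_lr_rf = jaccard(selected["LR"], selected["RF"])
```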


Conclusion

The 2-year models failed to achieve satisfactory results, indicating that they are not reproducible regardless of the algorithm used. For the 1-year outcome, the LR-based model, which is considered explainable, outperformed RF and SVM in external validation. Our findings demonstrate that a non-black-box prediction model can still offer high performance to both patients and care providers.