Session Item

Physics track: Radiation protection, secondary tumour induction and low dose
9320
Poster
Physics
08:45 - 08:53
Machine learning methods to predict rectal bleeding after prostate cancer radiotherapy
PH-0283

Abstract

Authors: Ibrahim, Md (1)* [md.ibrahim210@gmail.com]; Mylona, Eugenia (2); Boussion, Nicolas (1); Acosta, Oscar (2); De Crevoisier, Renaud (2); Hatt, Mathieu (1)
(1) University of Western Brittany, LaTIM, INSERM UMR 1101, Brest, France; (2) University of Rennes, LTSI, INSERM UMR 1099, Rennes, France
Purpose or Objective

The goal of this work was to predict rectal bleeding (RB) following prostate cancer (PC) radiotherapy (RT) from dose-volume histograms (DVH) and clinical variables in a multicentric setting, using 4 machine learning (ML) algorithms and 3 deep learning (DL) techniques, as well as majority voting. A specific issue with multicentric data is covariate shift: variables from each center can have strongly different distributions, which hampers efficient training and validation of multiparametric ML models. An additional challenge was the high class imbalance in the data (i.e., few events).

Material and Methods

The records of 591 patients with more than 3 years of follow-up (including DVH, clinical data and RB events) who underwent RT for localized PC were collected prospectively. The target volume was defined as the prostate and seminal vesicles. The mean dose delivered to the prostate was 79.3 Gy (range: 76–80 Gy) at 2 Gy per fraction, with 46 Gy delivered to the seminal vesicles. The cohort was split into a training set from 2 centers (n=337, 27 events) and a validation set from a 3rd center (n=254, 22 events). The classification task was the prediction of RB at 3 years after RT.
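To make the class imbalance concrete, the short Python sketch below computes the event rate in each cohort from the counts reported above. The variable and function names are illustrative only and are not taken from the study.

```python
# Cohort sizes and event counts as reported in the abstract.
cohorts = {
    "training (2 centers)": {"n": 337, "events": 27},
    "validation (3rd center)": {"n": 254, "events": 22},
}


def event_rate(n: int, events: int) -> float:
    """Fraction of patients with rectal bleeding at 3 years."""
    return events / n


total_n = sum(c["n"] for c in cohorts.values())
total_events = sum(c["events"] for c in cohorts.values())

for name, c in cohorts.items():
    print(f"{name}: {c['events']}/{c['n']} = {event_rate(c['n'], c['events']):.1%}")
print(f"all patients: {total_events}/{total_n} = {total_events / total_n:.1%}")
```

In both cohorts fewer than 1 patient in 10 has an event, so a classifier that always predicts "no bleeding" would be right more than 90% of the time while detecting nothing.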

An ML framework consisting of 3 modules was developed to process multicentric data efficiently: 1. a covariate shift and imbalance adaptation module relying on SMOTE-ENN (Synthetic Minority Over-sampling Technique with Edited Nearest Neighbours), density ratio estimation and normalization; 2. a classification module implementing 4 ML algorithms (Random Forest, Extreme Gradient Boosting, LightGBM, CatBoost) and 3 DL classifiers (Deep Neural Network, Deep Autoencoder+RF, Deep Variational Autoencoder+RF), as well as majority voting; and 3. a pseudo-labeling module (Figure 1). The predictive performance of the proposed method was compared with that of "standard" logistic regression in terms of the area under the ROC curve (AUC).
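The idea behind density-ratio-based covariate shift adaptation can be sketched in a few lines: each training sample is weighted by how likely its feature value is in the target center relative to the training centers. The abstract does not specify which estimator was used, so the sketch below is an assumption: it models each distribution as a single Gaussian using Python's standard library, and the feature values are hypothetical.

```python
from statistics import NormalDist


def importance_weights(train_values, target_values):
    """Estimate w(x) = p_target(x) / p_train(x) for each training value.

    Assumption: each distribution is modelled as one Gaussian. This is a
    deliberately simple stand-in, not the estimator used in the study.
    """
    p_train = NormalDist.from_samples(train_values)
    p_target = NormalDist.from_samples(target_values)
    return [p_target.pdf(x) / p_train.pdf(x) for x in train_values]


# Hypothetical values of a single DVH feature in the training centers
# and in the target (validation) center.
train_feature = [20.0, 22.0, 25.0, 27.0, 30.0, 31.0]
target_feature = [28.0, 30.0, 32.0, 33.0, 35.0, 37.0]

weights = importance_weights(train_feature, target_feature)
for x, w in zip(train_feature, weights):
    print(f"x={x:5.1f}  weight={w:.3f}")
```

Training samples that resemble the target center receive larger weights, and these weights can then be passed to any classifier that accepts per-sample weights during fitting.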

Results

In the validation set (3rd center), the best AUC among all tested methods was 0.68 (sensitivity 0.77, specificity 0.60). It was obtained with random forest relying on the proposed modules and combining DVH and clinical variables, which correctly predicted RB at 3 years in 17 of the 22 patients with RB, at the cost of 95 false positives. The AUC of the other classifiers ranged from 0.54 to 0.66 (Table 1 and Table 2). Majority voting did not improve the results. Without the proposed modules (Table 1 and Table 2), all classifiers reached lower AUCs (0.42–0.60). Standard logistic regression performed poorly (AUC 0.52, sensitivity 0.54, specificity 0.53).
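The reported sensitivity and specificity can be checked against the counts given above. The Python sketch below rebuilds the confusion matrix for the best model; the false negative and true negative counts and the positive predictive value are derived here from the reported figures rather than stated in the abstract.

```python
# Counts for the best model (random forest) in the 3rd-center cohort.
n_patients = 254       # reported cohort size
n_events = 22          # reported patients with rectal bleeding
true_positives = 17    # reported correctly predicted events
false_positives = 95   # reported false positives

# Derived counts (not stated in the abstract).
false_negatives = n_events - true_positives
true_negatives = (n_patients - n_events) - false_positives

sensitivity = true_positives / (true_positives + false_negatives)
specificity = true_negatives / (true_negatives + false_positives)
# Positive predictive value: share of predicted events that were real.
ppv = true_positives / (true_positives + false_positives)

print(f"sensitivity = {sensitivity:.2f}")
print(f"specificity = {specificity:.2f}")
print(f"PPV         = {ppv:.2f}")
```

The derived sensitivity matches the reported 0.77, and the derived specificity (about 0.59) is close to the reported 0.60. The derived positive predictive value of about 0.15 shows the practical weight of the 95 false positives.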

Conclusion

ML/DL techniques that incorporate covariate shift and class imbalance adaptation can predict RB after RT for PC in a multicentric context better than standard modeling approaches (logistic regression).