December 01
08:30 - 09:10
Interdisciplinary Stream 1
Spatially fractionated GRID radiotherapy - rationale and promise
Teaching Lecture
11:02 - 11:10
AutoConfidence: Per-patient validation for clinical confidence in deep learning for radiotherapy

Authors: Nix, Mike (1)*; Bird, David (1); Tyyger, Marcus (1); Appelt, Ane (1); Murray, Louise (1); McCallum, Hazel (2); Al-Qaisieh, Bashar (1); Gooya, Ali (3)
(1) St James Institute of Oncology, Radiotherapy Physics, Leeds, United Kingdom; (2) Newcastle University Teaching Hospitals, Medical Physics, Newcastle, United Kingdom; (3) University of Leeds, Computer Science, Leeds, United Kingdom
Purpose or Objective

Deep learning (DL) has shown considerable potential for auto-contouring (AC) and for synthetic CT (sCT) generation from MRI. However, commissioning and validating DL methods is extremely challenging because of their ‘black-box’ nature and their training-dependent robustness to variations in input data. Errors and uncertainties are difficult to quantify and vary both between and within cases. We demonstrate a conditional generative adversarial network (cGAN) as the core of a robust AI strategy, dubbed AutoConfidence, which can identify outliers in unseen input data and locally assess the uncertainties and errors in DL predictions (contours or sCT). Crucially, error and uncertainty analysis can be decoupled from the generation network (fig. 1), allowing independent per-patient validation of DL-based sCT or auto-contours from any source, including CE/FDA-approved systems.

Material and Methods

Synthetic CT: 32 T2-weighted SPACE pelvic MR scans (2240 slices) from anorectal cancer patients, acquired in the radiotherapy treatment position, were deformably registered to the planning CT and used to adversarially train a cGAN (fig. 1). The cGAN, an extension of the popular style-transfer network pix2pix, consists of a generative U-Net and a separate, shallow discriminative U-Net. The discriminative network was trained to predict the Hounsfield unit (HU) error in the generator output relative to the reference CT.

Auto-contouring: planning CTs from 16 prostate RT patients (1520 slices) with expert clinical contours were used to train the cGAN to generate contours for 7 organs at risk (OARs) and to assess their quality. The discriminative network was trained to predict local misclassification relative to the reference contours.

In both cases, one-class support vector machine (1-SVM) and auto-encoder outlier scoring were used to detect unseen input images lying beyond the confidence limits of the trained networks. Per-slice confidence scores were produced alongside local confidence maps, providing metrics for acceptance or rejection and maps to guide human intervention and editing.
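
The abstract does not specify how outlier scores and confidence maps are turned into accept/reject decisions. The Python sketch below shows one plausible rule, assuming an auto-encoder reconstruction error has already been computed for each slice (higher meaning more unusual): the confidence bound is a percentile of the training-slice scores, a predicted local error map is collapsed into a per-slice confidence, and a slice is accepted only when both checks pass. All function names, the 95th-percentile bound and the 0.8 confidence threshold are hypothetical illustrations, not values from the study. The 1-SVM path is not shown (scikit-learn's `OneClassSVM` is a common off-the-shelf implementation).

```python
import math
from typing import List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Linearly interpolated percentile (q in [0, 100]), like NumPy's default method."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def fit_outlier_bound(train_scores: Sequence[float], q: float = 95.0) -> float:
    """Confidence bound = q-th percentile of outlier scores on training slices.

    Assumption: scores are auto-encoder reconstruction errors; q=95 is illustrative.
    """
    return percentile(train_scores, q)


def flag_outliers(test_scores: Sequence[float], bound: float) -> List[bool]:
    """True for each slice whose score lies beyond the training confidence bound."""
    return [s > bound for s in test_scores]


def slice_confidence(error_map: Sequence[Sequence[float]], error_scale: float) -> float:
    """Collapse a predicted local error map into one per-slice confidence.

    Confidence = 1 - mean(predicted error) / error_scale, floored at 0.
    `error_scale` (e.g. in HU) is a hypothetical normalisation constant.
    """
    values = [v for row in error_map for v in row]
    mean_err = sum(values) / len(values)
    return max(0.0, 1.0 - mean_err / error_scale)


def accept_slice(outlier: bool, confidence: float, min_confidence: float = 0.8) -> bool:
    """Accept automatically only in-distribution, high-confidence slices.

    Everything else would be routed to human review; 0.8 is a hypothetical threshold.
    """
    return (not outlier) and confidence >= min_confidence
```

With a rule of this kind, a slice flagged as an outlier or falling below the confidence threshold is sent for review and editing rather than silently accepted, which is the role the per-slice scores and local maps play in the workflow described above.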

Results

Outlier scoring identified 11% of sCT test slices and 6% of AC test slices as lying outside the training-data confidence bound, indicating a risk of DL prediction failure. These failures arose from variations in scan extent, MRI signal, or artefacts. For inlier slices, local confidence maps (fig. 2) correlated well with local HU differences for sCT (r = 0.83) and with confusion entropy in multiclass segmentation for AC (r = 0.97). Cohort mean MAE for sCT was 65 HU (s.d. 9.3 HU), and mean PTV ΔD95 was 0.8% (s.d. 0.5%). Outlier slices detected by the 1-SVM exhibited significantly larger HU differences (p < 0.001) and were often visually unacceptable.
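
The metrics reported above can be sketched in plain Python. This sketch assumes slices are nested lists of HU values, that MAE is computed inside a body mask, and that "confusion entropy" means the per-voxel Shannon entropy of the predicted class probabilities; the abstract defines none of these details, so the helper names and definitions are illustrative assumptions. The absolute HU-error map is the kind of local target the discriminative network is described as learning to predict, and Pearson's r compares a predicted map with the true one pixel by pixel.

```python
import math
from typing import List, Sequence

# A 2D slice as nested lists (illustrative stand-in for an image array).
Slice = List[List[float]]


def abs_hu_error(sct: Slice, ct: Slice) -> Slice:
    """Per-pixel |sCT - CT| in HU: a local error map of the kind the
    discriminative network is trained to predict."""
    return [[abs(s - c) for s, c in zip(srow, crow)] for srow, crow in zip(sct, ct)]


def masked_mae(error_map: Slice, mask: Slice) -> float:
    """Mean absolute error over pixels where `mask` is truthy (e.g. a body outline)."""
    vals = [e for erow, mrow in zip(error_map, mask) for e, m in zip(erow, mrow) if m]
    if not vals:
        raise ValueError("mask selects no pixels")
    return sum(vals) / len(vals)


def _flatten(img: Slice) -> List[float]:
    """Row-major list of all pixel values in a slice."""
    return [v for row in img for v in row]


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def map_correlation(predicted: Slice, actual: Slice) -> float:
    """Pearson r between a predicted error (confidence) map and the true error map."""
    return pearson_r(_flatten(predicted), _flatten(actual))


def voxel_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of one voxel's class-probability vector:
    0 for a certain prediction, log(K) when K classes are equally likely."""
    return -sum(p * math.log(p) for p in probs if p > 0)
```

For instance, `masked_mae(abs_hu_error(sct, ct), body_mask)` would give one slice's MAE, where `sct`, `ct` and `body_mask` are hypothetical inputs; averaging such values per patient and then across patients yields a cohort mean of the kind reported.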

Conclusion

AutoConfidence can identify input-data outliers and low-confidence regions in DL predictions, independently of the production network, enabling automated per-patient validation of ‘black-box’ methods. Regions requiring human intervention can be highlighted for review, increasing clinical confidence and facilitating highly efficient automated workflows, for example for online adaptive re-planning.