Jonas Willmann 1, Joost J C Verhoeff 2, Orit Kaidar-Person 3, André Abrunhosa-Branquinho 4, Enrico Clementel 5, Coreen Corning 5, Daniel Portik 6, Luiza Souza 5, Jaap C Reijneveld 7, Frederic Dhermain 8, Coen Hurkmans 9, Angelo Filippo Monti 10, Jordi Saez 11, Warren P Mason 12, Michael Weller 13, Patrick Roth 13, Nicolaus Andratschke 14

Radiother Oncol. 2025 Aug 10:111088.

DOI: 10.1016/j.radonc.2025.111088

Abstract

Purpose: The multicentre randomised phase III trial EORTC-1709-BTG/CCTG CE.8 (MIRAGE) (NCT03345095) investigated the addition of the proteasome inhibitor marizomib to temozolomide-based chemoradiotherapy with 60 Gy in 30 fractions in patients with newly diagnosed glioblastoma. Here, we analysed the benchmark case procedure for delineation and planning radiotherapy quality assurance (RTQA) that was performed before patient inclusion.

Materials and methods: Prior to trial activation, all participating centres were required to submit a benchmark case for radiotherapy volume delineation and planning. Submissions were prospectively reviewed by the RTQA team, and in cases of unacceptable variations, centres were required to revise and resubmit the same case until protocol compliance was achieved. Structure sets and dose distributions of the same benchmark patient submitted by participating centres were analysed. We determined the rate and causes of variations of glioblastoma target volumes (TV) and organs at risk (OAR) from the protocol-specified delineation guidelines. Delineation interobserver variability before and after RTQA review was quantified using the Dice similarity coefficient (DSC) with respect to ground truth contours at first and final submission of the benchmark case. The influence of reducing delineation interobserver variability on dose parameters of ground truth structures was determined.
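For illustration, the DSC between two contours A and B is 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical contours). The following is a minimal sketch, assuming both contours have already been rasterised to binary voxel masks on the same grid; the function name `dice_similarity`, the toy masks, and the convention of returning 1.0 for two empty masks are assumptions made for this example and are not taken from the study.

```python
import numpy as np


def dice_similarity(mask_a, mask_b):
    """Dice similarity coefficient of two binary masks on the same grid."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must be defined on the same voxel grid")
    total = int(a.sum()) + int(b.sum())
    if total == 0:
        # Assumed convention: two empty contours count as identical.
        return 1.0
    overlap = int(np.logical_and(a, b).sum())
    return 2.0 * overlap / total


# Toy example (hypothetical data): a 2x2 "ground truth" contour and a
# submitted contour of the same size shifted by one voxel column.
ground_truth = np.zeros((4, 4), dtype=bool)
ground_truth[1:3, 1:3] = True   # 4 voxels
submitted = np.zeros((4, 4), dtype=bool)
submitted[1:3, 2:4] = True      # 4 voxels, 2 of them overlapping

dsc = dice_similarity(ground_truth, submitted)  # 2 * 2 / (4 + 4) = 0.5
```

Because the DSC is normalised by the sum of both volumes, a fixed absolute disagreement lowers it more for small structures than for large ones.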

Results: The delineations by 88 institutes were judged by RTQA reviewers to contain "unacceptable" variations in 80 % (n = 70) of the cases. TV contours were more frequently deemed unacceptable than OAR contours (72 % vs 55 %). After RTQA review, the mean DSC significantly improved for TV (GTV: 0.77 vs 0.82, p = 0.002; CTV: 0.85 vs 0.88, p < 0.0001; PTV: 0.85 vs 0.88, p < 0.0001), brainstem (0.87 vs 0.88, p = 0.007), cochlea (0.58 vs 0.62, p = 0.004) and optic nerve (0.65 vs 0.67, p = 0.0005), indicating reduced interobserver variability. The delineation adjustments after RTQA review resulted in a significant increase of the mean CTV D98% (+2.2 Gy, +4 %, p = 0.005), indicating improved target coverage. Doses to OAR did not change significantly and continued to meet the predefined constraints.
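For readers less familiar with the metric, D98% is the near-minimum dose: the dose that at least 98 % of a structure's volume receives. The following is a minimal sketch, assuming a dose grid and a binary structure mask with voxels of equal volume, so that D98% reduces to the 2nd percentile of the voxel doses inside the mask. The function name `dose_at_volume` and the toy dose values are hypothetical, and clinical treatment planning systems compute dose-volume histograms with their own interpolation and partial-volume handling, so values may differ slightly.

```python
import numpy as np


def dose_at_volume(dose, mask, volume_percent=98.0):
    """Dose (same unit as `dose`) received by at least `volume_percent`
    of the masked structure, assuming all voxels have equal volume."""
    dose = np.asarray(dose, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if dose.shape != mask.shape:
        raise ValueError("dose grid and mask must have the same shape")
    voxel_doses = dose[mask]
    if voxel_doses.size == 0:
        raise ValueError("structure mask is empty")
    # D98% is the dose exceeded by 98 % of voxels, i.e. the 2nd percentile.
    return float(np.percentile(voxel_doses, 100.0 - volume_percent))


# Toy example (hypothetical data): 101 voxels with doses 0, 1, ..., 100 Gy.
toy_dose = np.arange(101, dtype=float).reshape(1, 101)
toy_mask = np.ones_like(toy_dose, dtype=bool)

d98 = dose_at_volume(toy_dose, toy_mask)  # 2.0 Gy for this toy grid
```

Because D98% depends on the coldest 2 % of the structure, enlarging a contour into a low-dose region lowers it even when the dose distribution itself is unchanged.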

Conclusions: Variations in the delineation of target volumes and organs at risk were frequently judged as "unacceptable" during the RTQA review process. Apart from a significant increase in CTV coverage, the impact of variations on organ at risk dosimetry was minor, suggesting a potentially negligible effect on toxicity outcomes. Quantitative metrics to assess delineation variations should be explored to improve the RTQA process in clinical trials and routine practice, aiming to flag delineation variations that affect tumour control or toxicity.