Vienna, Austria

ESTRO 2023

Session Item

Poster (Digital)
Multi-center auto-segmentation model for internal mammary nodes using clinical data: A DBCG study
Emma Skarsø Buhl, Denmark


Emma Riis Skarsø1,2, Lasse Hindhede Refsgaard3,2, Abhilasha Saini4, Ebbe Laugaard Lorenzen5, Else Maae6, Esben Yates7, Ingelise Jensen8, Karen Andersen9, Kristian Boye10, Louise Wichmann Matthiessen9, Maja Maraldo10, Martin Berg6, Mette Holck Nielsen11, Mette Møller8, Sami Aziz-Jowad Al-Rawi4, Birgitte Offersen3,7,1,2, Stine Sofia Korreman7,1,2

1Aarhus University Hospital, Danish Center for Particle Therapy, Aarhus, Denmark; 2Aarhus University, Department of Clinical medicine, Aarhus, Denmark; 3Aarhus University Hospital, Department of Experimental Clinical Oncology, Aarhus, Denmark; 4Zealand University Hospital, Department of Clinical Oncology and Palliative Care, Næstved, Denmark; 5Odense University Hospital, Laboratory of Radiation Physics, Odense, Denmark; 6Vejle Hospital, University Hospital of Southern Denmark, Department of Oncology, Vejle, Denmark; 7Aarhus University Hospital, Department of Oncology, Aarhus, Denmark; 8Aalborg University Hospital, Department of Oncology, Aalborg, Denmark; 9Herlev and Gentofte Hospital, Department of Oncology, Herlev, Denmark; 10Copenhagen University Hospital – Rigshospitalet, Department of Oncology, Copenhagen, Denmark; 11Odense University Hospital, Department of Oncology, Odense, Denmark

Purpose or Objective

National standardization of breast cancer (BC) radiotherapy (RT) is desirable, and a generalizable auto-segmentation model for delineating target structures can help achieve this. We developed a deep learning (DL) based segmentation model for the internal mammary lymph nodes (CTVn_IMN) in left-sided BC patients. The model was trained on national real-world clinical delineations, all adhering to the Danish Breast Cancer Group (DBCG) guidelines.

Material and Methods

We included clinical CTVn_IMN delineations (mean volume 9.2 cm³) and CT scans (slice thickness 2-3 mm) from a total of 778 high-risk left-sided BC patients treated with adjuvant RT at all seven centres in the nation during 2015-16. Delineations were crudely sorted to eliminate obvious deviations from the guidelines: delineations extending caudally beyond costa 3 and delineations with a width outside the interval [6.6 mm; 24 mm] were removed.
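The crude sorting step can be illustrated with a minimal sketch. The abstract does not describe how width or caudal extent were measured, so the `Delineation` record, its field names, and the slice-index convention below are all hypothetical. The sketch also assumes that failing either criterion is enough for removal.

```python
from dataclasses import dataclass

# Width limits quoted in the abstract (mm).
MIN_WIDTH_MM = 6.6
MAX_WIDTH_MM = 24.0


@dataclass
class Delineation:
    """Hypothetical per-patient summary; field names are illustrative only."""
    patient_id: str
    width_mm: float            # width of the CTVn_IMN contour (measurement method assumed)
    caudal_slice: int          # index of the most caudal delineated slice
    costa3_caudal_limit: int   # most caudal slice allowed by the costa 3 landmark


def passes_sorting(d: Delineation) -> bool:
    """Return True if the delineation survives the crude sorting.

    Assumes slice indices increase in the cranial direction, so a contour
    extends too far caudally when its lowest slice lies below the limit.
    """
    width_ok = MIN_WIDTH_MM <= d.width_mm <= MAX_WIDTH_MM
    extent_ok = d.caudal_slice >= d.costa3_caudal_limit
    return width_ok and extent_ok


def sort_delineations(delineations):
    """Split delineations into (kept, removed) lists."""
    kept = [d for d in delineations if passes_sorting(d)]
    removed = [d for d in delineations if not passes_sorting(d)]
    return kept, removed
```

Calling `sort_delineations` on a list of such records returns the delineations that would go on to training and testing, and those that would be set aside.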

Patients were randomly split into a training set (90%) and a test set (10%).
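A random 90%/10% split of this kind might look like the sketch below. The seed, the rounding rule, and the function name `split_patients` are assumptions; the abstract states only the proportions. Applied to the 354 patients remaining after sorting (778 included minus 424 excluded), this rounding reproduces the reported 319/35 division.

```python
import random


def split_patients(patient_ids, train_fraction=0.9, seed=0):
    """Randomly split patient IDs into (train, test) lists.

    The seed and the rounding rule are assumptions made for illustration.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)  # local generator keeps the split reproducible
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# 354 patients remained after sorting (778 included - 424 excluded).
train, test = split_patients(range(354))
print(len(train), len(test))  # 319 35
```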

The CT scans were cropped to a region bounded by the posterior and caudal extent of the heart and the cranial extent of the lungs.

The cropped CT scans and CTVn_IMN delineations were used as input to a 3D full-resolution nnUNet trained with five-fold cross-validation (1000 epochs) and default parameters. Clinical delineations were used as ground truth.

We report the Dice similarity coefficient (DSC), the 95th percentile Hausdorff distance (HD95) and the mean surface distance (MSD) between predictions and clinical ground truth on the test set, computed with the evaluation functions in nnUNet. In addition, the difference in cranial and caudal extension was measured as a number of slices.
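To make the three metrics concrete, here is a small pure-Python sketch that uses common textbook definitions. It is not the nnUNet evaluation code. Masks are represented as sets of (z, y, x) voxel indices, the voxel spacing is an assumed value, and HD95 is taken as the 95th percentile of the pooled surface distances in both directions. Other implementations may define the surface or the percentile slightly differently.

```python
import math
from itertools import product

# Offsets to the six face-connected neighbours of a voxel.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def dice(a, b):
    """Dice similarity coefficient between two voxel sets."""
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def surface(mask):
    """Voxels with at least one face-connected neighbour outside the mask."""
    return {v for v in mask
            if any((v[0] + dz, v[1] + dy, v[2] + dx) not in mask
                   for dz, dy, dx in NEIGHBOURS)}


def directed_distances(src, dst, spacing):
    """Distance (mm) from each surface voxel in src to the nearest one in dst."""
    def dist(p, q):
        return math.sqrt(sum(((pi - qi) * s) ** 2
                             for pi, qi, s in zip(p, q, spacing)))
    return [min(dist(p, q) for q in dst) for p in src]


def percentile(values, pct):
    """Percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def surface_metrics(pred, truth, spacing=(3.0, 1.0, 1.0)):
    """Return (HD95, MSD) in mm for two non-empty voxel sets.

    The default spacing (slice thickness, row, column) is an assumption.
    """
    sp, st = surface(pred), surface(truth)
    d_pt = directed_distances(sp, st, spacing)
    d_tp = directed_distances(st, sp, spacing)
    hd95 = percentile(d_pt + d_tp, 95)
    msd = (sum(d_pt) / len(d_pt) + sum(d_tp) / len(d_tp)) / 2.0
    return hd95, msd


# Toy example: a 4x4x4 cube and the same cube shifted by one voxel in x.
truth = set(product(range(4), range(4), range(4)))
pred = {(z, y, x + 1) for (z, y, x) in truth}
print(dice(pred, truth))  # 0.75
hd95, msd = surface_metrics(pred, truth, spacing=(1.0, 1.0, 1.0))
```

The brute-force nearest-neighbour search is quadratic in the number of surface voxels, which is acceptable for a toy example but would normally be replaced by a distance transform for full-size masks.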


Results

A total of 424 patients were excluded during the sorting procedure, leaving 319/35 patients to train/test the model. The model achieved a median DSC of 0.70, HD95 of 4.83 mm and MSD of 1.45 mm (Figure 1). The largest variation between ground truth and predictions was in the caudal extension, which varied by up to 18 slices (Figure 2).
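The slice-wise extension difference could be computed along the following lines. This is a sketch under assumptions: the abstract does not state a sign convention or slice ordering, so the function name `extension_difference`, the "prediction minus ground truth" convention, and the cranially increasing slice indices are illustrative choices.

```python
def extension_difference(pred_slices, truth_slices):
    """Return (cranial_diff, caudal_diff) in slices.

    Assumes slice indices increase in the cranial direction. A positive
    cranial_diff means the prediction extends further cranially than the
    ground truth; a positive caudal_diff means it extends further caudally.
    """
    if not pred_slices or not truth_slices:
        raise ValueError("both structures must contain at least one slice")
    cranial_diff = max(pred_slices) - max(truth_slices)
    caudal_diff = min(truth_slices) - min(pred_slices)
    return cranial_diff, caudal_diff


# Prediction covers slices 10-30, ground truth covers slices 12-29.
print(extension_difference(range(10, 31), range(12, 30)))  # (1, 2)
```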

The patients with the lowest DSC showed large disagreements in both the cranial and caudal parts of the CTVn_IMN. In addition, the ground truth was wider than the prediction (see Figure 1, patient 1). However, from a clinical perspective, these two DL-based delineations adhere better to the DBCG guidelines than the clinical ground truths.

Patients with median scores showed minor disagreements in the cranio-caudal extension (1-2 slices) and an overall acceptable agreement in width (range of difference 0.64-1.35 mm).


Conclusion

We demonstrated the feasibility of developing a clinically relevant DL model for CTVn_IMN based on real-world clinical delineations. The model exhibited minor deviations from clinical ground truth for most patients. In patients with major deviations, model predictions were closer to the DBCG guidelines than the ground truth was. The largest differences were in the caudal extension, indicating that in a clinical setting, attention should be focused on this region. To further mitigate variations, a dataset created specifically for the purpose of training a DL model would be needed.