Session Item

May 07
16:55 - 17:55
Mini-Oral Theatre 1
07: Brachytherapy
Elena Manea, Romania;
Maximilian Schmid, Austria
Deep learning-based tumor segmentation of endoscopy images for rectal cancer patients
Luca Weishaupt, USA


Deep learning-based tumor segmentation of endoscopy images for rectal cancer patients

Luca Weishaupt1, Alana Thibodeau Antonacci1, Aurelie Garant2, Kelita Singh3, Corey Miller4, Té Vuong5, Shirin A. Enger1

1McGill University, Medical Physics Unit, Department of Oncology, Faculty of Medicine, Montréal, Canada; 2UT Southwestern Medical Center, Radiation Oncology, Dallas, USA; 3McGill University Health Centre, Division of Gastroenterology, Montréal, Canada; 4Jewish General Hospital, Division of Gastroenterology, Montréal, Canada; 5Jewish General Hospital, Department of Oncology, Montréal, Canada

Purpose or Objective

The objective of this study was to develop an automated rectal tumor segmentation algorithm for endoscopy images. The algorithm will be used in a future multimodal treatment-outcome prediction model. Current treatment-outcome prediction models rely on manual segmentations of regions of interest, which are prone to inter-observer variability. To quantify this variability and demonstrate the feasibility of automated endoscopy image segmentation, we compared three deep learning architectures.

Material and Methods

A gastrointestinal physician (G1) segmented 550 endoscopy images of rectal tumors into tumor and non-tumor regions. To quantify the inter-observer variability, a second gastrointestinal physician (G2) contoured 319 of the images independently.

The 550 images and annotations from G1 were divided into training (408 images), validation (82 images), and test (60 images) sets. Three deep learning architectures were trained: a fully convolutional network (FCN32), a U-Net, and a SegNet. These architectures have been used for robust medical image segmentation in previous studies.
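A data split along these lines can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the shuffling seed and the use of index lists are assumptions.

```python
import random

def split_dataset(items, n_train=408, n_val=82, n_test=60, seed=0):
    """Shuffle and split a dataset into train/validation/test subsets.
    The split sizes mirror those reported in the abstract (408/82/60)."""
    assert len(items) == n_train + n_val + n_test
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = items[:]
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Split 550 image indices into the three disjoint subsets.
train, val, test = split_dataset(list(range(550)))
```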

All models were trained on a CPU supercomputing cluster. Data augmentation in the form of random image transformations, including scaling, rotation, shearing, Gaussian blurring, and noise addition, was used to improve the models' robustness.
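An augmentation step of this kind can be sketched with standard NumPy/SciPy operations. This is a minimal illustration, not the authors' pipeline; the rotation range, blur sigma, and noise level are assumed values, and shearing/scaling are omitted for brevity.

```python
import numpy as np
from scipy import ndimage

def augment(image, mask, rng):
    """Apply one random geometric + photometric transform to an
    image/mask pair. Geometric transforms are applied identically to
    the mask so the labels stay aligned; blur and noise affect the
    image only. Parameter ranges are illustrative assumptions."""
    angle = rng.uniform(-15, 15)                          # random rotation
    image = ndimage.rotate(image, angle, reshape=False, mode="nearest")
    mask = ndimage.rotate(mask, angle, reshape=False, order=0, mode="nearest")
    if rng.random() < 0.5:                                # Gaussian blurring
        image = ndimage.gaussian_filter(image, sigma=rng.uniform(0.5, 1.5))
    image = image + rng.normal(0.0, 0.02, image.shape)    # noise addition
    return image, mask
```

Nearest-neighbor interpolation (`order=0`) for the mask keeps the labels binary after rotation.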

The neural networks' outputs were passed through a final post-processing step of noise removal and hole filling before evaluation. Finally, the segmentations from G2 and the neural networks' predictions were compared against the ground truth labels from G1.
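The noise-removal and hole-filling step can be sketched with `scipy.ndimage` morphology. The threshold and minimum component size below are hypothetical values, not those used in the study.

```python
import numpy as np
from scipy import ndimage

def postprocess(prob_map, threshold=0.5, min_size=50):
    """Binarize a network probability map, drop small noisy components
    (noise removal), and fill interior holes. The threshold and
    min_size parameters are illustrative assumptions."""
    binary = prob_map > threshold
    labels, n = ndimage.label(binary)                 # connected components
    sizes = ndimage.sum(binary, labels, range(1, n + 1))
    for i, size in enumerate(sizes, start=1):
        if size < min_size:
            binary[labels == i] = False               # remove small specks
    return ndimage.binary_fill_holes(binary)          # fill interior holes
```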

Results
The FCN32, U-Net, and SegNet had average segmentation times of 0.77, 0.48, and 0.43 seconds per image, respectively. The average segmentation time per image for G1 and G2 were 10 and 8 seconds, respectively.

A representative example of a U-Net segmentation is compared to the ground truth from G1 in Figure 1, and the segmentation scores are presented in Figure 2. All ground truth labels contained tumors, but G2 and the deep learning models did not always find tumors in the images. The scores are based on the agreement of tumor contours with G1's ground truth and were therefore computed only for images in which tumor was found. The automated segmentation algorithms consistently achieved scores equal to or better than G2's manual segmentations. G2's low F1/Dice and precision scores indicate poor agreement between the two physicians' manual contours.
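The scoring described above, where the Dice (F1) coefficient is averaged only over images in which the observer or model found tumor, can be sketched as:

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice coefficient (equivalent to F1) between two binary masks."""
    intersection = np.logical_and(pred, truth).sum()
    return 2.0 * intersection / (pred.sum() + truth.sum())

def mean_dice(preds, truths):
    """Average Dice over only those images where tumor was found in the
    prediction, mirroring the scoring convention described above."""
    scores = [dice_coefficient(p, t) for p, t in zip(preds, truths) if p.any()]
    return float(np.mean(scores)) if scores else float("nan")
```

Skipping empty predictions avoids undefined scores on images where no tumor contour exists to compare.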

Conclusion
There is a need for robust and accurate algorithms for rectal tumor segmentation, since manual segmentation of these tumors is subject to significant inter-observer variability. The deep learning-based segmentation algorithms proposed in this study were more efficient and achieved higher agreement with our manual ground truth segmentations than a second expert annotator. Future studies will investigate how to train deep learning models on multiple ground truth annotations to prevent learning observer biases.