Abstract

Title

End-to-end head & neck tumor auto-segmentation using CT/PET and MRI without deformable registration

Authors

Jintao Ren1, Jasper Albertus Nijkamp2, Jesper Grau Eriksen3, Stine Sofia Korreman1

Authors Affiliations

1Aarhus University, Department of Clinical Medicine - Department of Oncology, DCPT - Danish Center for Particle Therapy, Aarhus, Denmark; 2Aarhus University, Department of Clinical Medicine - DCPT - Danish Center for Particle Therapy, Aarhus, Denmark; 3Aarhus University, Department of Clinical Medicine - Department of Experimental Clinical Oncology, Aarhus, Denmark

Purpose or Objective

Deep learning-based tumor segmentation is expected to reduce the time consumption and inter-observer variability (IOV) of manual delineation by learning complementary information from multimodal images. Currently, multimodal images must be co-registered before segmentation by convolutional neural networks (CNNs). Deformable registration strategies can be used to minimize registration errors; however, they are resource-intensive. We aim to develop an efficient end-to-end solution that does not compromise segmentation quality.

Materials and Methods

We included planning CT, PET, and MRI (T1-weighted and T2-weighted) from 154 HNSCC patients treated with primary curative (chemo-)radiotherapy. Clinical delineations of the gross tumor volume (GTVt) and involved lymph nodes (GTVn) on CT were considered ground truth. All modalities were resampled to an isotropic 1 mm voxel grid, and the MR images were registered to PET/CT using either rigid registration (RR) or deformable registration (DR) with Elastix.
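For illustration, a minimal sketch of the isotropic resampling step using SimpleITK; the function name and the choice of linear interpolation are assumptions, not the exact pipeline used (delineation masks would typically use nearest-neighbor interpolation instead).

```python
import SimpleITK as sitk

def resample_isotropic(image, new_spacing=(1.0, 1.0, 1.0)):
    """Resample an image to an isotropic 1 mm voxel grid (sketch)."""
    orig_spacing = image.GetSpacing()
    orig_size = image.GetSize()
    # Preserve physical extent: new size = old size * old spacing / new spacing
    new_size = [int(round(sz * sp / nsp))
                for sz, sp, nsp in zip(orig_size, orig_spacing, new_spacing)]
    return sitk.Resample(image, new_size, sitk.Transform(),
                         sitk.sitkLinear, image.GetOrigin(), new_spacing,
                         image.GetDirection(), 0.0, image.GetPixelID())
```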


We used a 3D UNet with deep supervision as the baseline CNN segmentation model. We designed two approaches, one architectural and one data-driven: (1) deformable convolution networks (DCN) and (2) channel translation augmentation (CHTL). DCN learns offsets that free the convolution from its fixed 3D sampling grid, enabling free-form deformation. We modified the MRI path of the UNet's first block with DCN to enhance its capacity to model geometric transformations. For CHTL augmentation, we randomly shift the MRI channels by up to 3 mm along the x, y, and z axes, as sketched below.
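A minimal sketch of the CHTL augmentation, assuming a channels-first NumPy volume on the 1 mm isotropic grid (so 3 mm corresponds to 3 voxels); the function name, the MRI channel indices, and the choice to apply one shared random offset to both MRI channels are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import shift

def channel_translation(volume, mri_channels=(2, 3),
                        max_shift_mm=3.0, voxel_mm=1.0):
    """Randomly translate only the MRI channels of a (C, X, Y, Z) volume.

    CT/PET channels stay fixed; the MRI channels are shifted by a random
    offset of up to max_shift_mm along each of the x, y, and z axes,
    mimicking a residual rigid-registration error.
    """
    offsets = np.random.uniform(-max_shift_mm, max_shift_mm, size=3) / voxel_mm
    out = volume.copy()
    for c in mri_channels:
        out[c] = shift(volume[c], offsets, order=1, mode="nearest")
    return out
```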


Six segmentation groups were compared: RR images with UNet, DCN, and CHTL; and DR images with UNet, DCN, and CHTL. The data were split into training (93 patients), validation (31 patients), and test (30 patients) sets, and each model was trained independently for 200 epochs. Test-set results were evaluated on the union of GTVt and GTVn using the Dice similarity coefficient (Dice), the 95th-percentile Hausdorff distance (HD95), the mean surface distance (MSD), and training time.
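For clarity, a minimal sketch of how Dice can be evaluated on the union of GTVt and GTVn, assuming boolean masks on the same voxel grid; HD95 and MSD would be computed on the same union masks with surface-distance metrics, which this sketch omits.

```python
import numpy as np

def dice_on_union(pred_gtvt, pred_gtvn, gt_gtvt, gt_gtvn):
    """Dice similarity coefficient on the union of GTVt and GTVn.

    All inputs are boolean arrays defined on the same voxel grid.
    """
    pred = pred_gtvt | pred_gtvn
    gt = gt_gtvt | gt_gtvn
    intersection = np.logical_and(pred, gt).sum()
    return 2.0 * intersection / (pred.sum() + gt.sum())
```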


The networks were trained with the sum of Dice and Top-k losses, using a stochastic gradient descent (SGD) optimizer with batch size 2, an initial learning rate of 0.01 with decay, and patch sampling with a patch size of 128×128×128. Universal augmentation operations included scaling, rotation, and mirroring.
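A minimal PyTorch sketch of the combined Dice plus Top-k training loss, assuming softmax outputs and a Top-k fraction of the 10% hardest voxels; the fraction, epsilon, and function name are assumptions, as the abstract does not specify them.

```python
import torch
import torch.nn.functional as F

def dice_topk_loss(logits, target, k_percent=10, eps=1e-5):
    """Sum of soft Dice loss and Top-k cross-entropy loss (sketch).

    logits: (B, C, X, Y, Z) raw network outputs; target: (B, X, Y, Z) labels.
    """
    num_classes = logits.shape[1]
    probs = torch.softmax(logits, dim=1)
    onehot = F.one_hot(target, num_classes).permute(0, 4, 1, 2, 3).float()
    dims = (0, 2, 3, 4)
    intersection = (probs * onehot).sum(dims)
    denominator = probs.sum(dims) + onehot.sum(dims)
    dice_loss = 1.0 - ((2.0 * intersection + eps) / (denominator + eps)).mean()
    # Top-k loss: average cross-entropy over the k% hardest voxels
    ce = F.cross_entropy(logits, target, reduction="none").flatten()
    n_keep = max(1, int(ce.numel() * k_percent / 100))
    topk_loss = torch.topk(ce, n_keep).values.mean()
    return dice_loss + topk_loss
```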

Results

Table 1 shows the median segmentation scores for all six groups. The MSD was significantly reduced for DR UNet compared with RR UNet. A comparable MSD could also be achieved using either DCN or CHTL on RR data. A further improvement was achieved by using DCN or CHTL on DR data, as shown in Figure 1. There were no significant differences in either Dice or HD95. DCN required threefold longer training time and more GPU memory than the other methods.


Conclusion

For multimodality deep learning-based HNSCC GTV segmentation, DR reduces MSD but has no significant effect on Dice or HD95 compared with RR. Both DCN and CHTL with prior RR achieved end-to-end solutions with performance comparable to DR UNet. CHTL requires fewer computing resources and a shorter training time.