We included planning CT, PET, and MRI (T1-weighted and T2-weighted) scans from 154 HNSCC patients treated with primary curative (chemo-)radiotherapy. Clinical delineations of the gross tumor volume (GTVt) and involved lymph nodes (GTVn) on CT served as ground truth. All modalities were resampled to an isotropic 1 mm voxel grid, and the MRI scans were registered to PET/CT using either rigid registration (RR) or deformable registration (DR) with Elastix.
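To make the resampling step concrete, the sketch below maps a volume with anisotropic spacing onto a 1 mm isotropic grid by nearest-neighbour lookup. It is a minimal NumPy sketch under stated assumptions: the function name `resample_nearest` is hypothetical, the volume origin is assumed to sit at the corner of the first voxel, and nearest-neighbour lookup suits label masks such as GTVt/GTVn, whereas CT, PET, and MRI intensities would normally use linear or B-spline interpolation. It does not reproduce the Elastix registration.

```python
import numpy as np


def resample_nearest(volume, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Nearest-neighbour resampling of a 3D array onto a grid with `new_spacing` (mm).

    Assumes the origin is the corner of voxel (0, 0, 0) and that the physical
    extent (shape * spacing) is preserved.
    """
    spacing = np.asarray(spacing, dtype=float)
    new_spacing = np.asarray(new_spacing, dtype=float)
    in_shape = np.asarray(volume.shape)
    # Number of output voxels needed to cover the same physical extent
    out_shape = np.maximum(np.round(in_shape * spacing / new_spacing).astype(int), 1)
    index_per_axis = []
    for ax in range(3):
        # Physical position (mm) of each output voxel centre along this axis
        centres = (np.arange(out_shape[ax]) + 0.5) * new_spacing[ax]
        # Input voxel containing that position, clipped to the valid index range
        src = np.floor(centres / spacing[ax]).astype(int)
        index_per_axis.append(np.clip(src, 0, in_shape[ax] - 1))
    return volume[np.ix_(*index_per_axis)]


# Example: a 10 x 10 x 5 volume with 0.5 x 0.5 x 2.0 mm voxels becomes 5 x 5 x 10 at 1 mm
vol = np.arange(10 * 10 * 5).reshape(10, 10, 5)
iso = resample_nearest(vol, spacing=(0.5, 0.5, 2.0))
print(iso.shape)  # (5, 5, 10)
```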
We used a 3D UNet with deep supervision as the baseline CNN segmentation model. We designed two approaches, one architectural and one data-driven: (1) deformable convolutional networks (DCN) and (2) channel translation augmentation (CHTL). DCN learns offsets that let the convolution sampling locations deform freely away from the fixed 3D sampling grid. We applied DCN to the MRI path of the UNet's first block to enhance its capacity to model geometric transformations. For CHTL augmentation, we randomly shifted the MRI channels by up to 3 mm along each of the x, y, and z axes.
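Deformable convolution replaces each fixed kernel sampling location p_k with p_k + Δp_k, where the offsets Δp_k are predicted by an additional convolution and the input is read by trilinear interpolation at the resulting fractional positions. The minimal NumPy sketch below evaluates that definition for a single channel at a single output voxel. The function names are hypothetical, the offsets are supplied by hand rather than learned, and this is not the authors' implementation, which would operate on batched multi-channel tensors inside a deep-learning framework.

```python
import itertools

import numpy as np


def trilinear_sample(x, p):
    """Trilinearly interpolate 3D array `x` at continuous index `p`; zero outside."""
    p = np.asarray(p, dtype=float)
    base = np.floor(p).astype(int)
    frac = p - base
    shape = np.asarray(x.shape)
    value = 0.0
    for corner in itertools.product((0, 1), repeat=3):
        c = np.asarray(corner)
        q = base + c
        if np.all(q >= 0) and np.all(q < shape):
            # Weight uses frac on axes stepping to the upper neighbour, 1 - frac otherwise
            w = np.prod(np.where(c == 1, frac, 1.0 - frac))
            value += w * x[tuple(q)]
    return value


def deformable_conv3d_at(x, weight, p0, offsets):
    """Single-channel deformable 3D convolution evaluated at voxel `p0`.

    y(p0) = sum_k w_k * x(p0 + p_k + dp_k), where p_k runs over the regular
    k x k x k kernel grid and dp_k are the offsets, shape (k**3, 3).
    """
    k = weight.shape[0]
    r = k // 2
    # Regular sampling grid p_k, in the same C order as weight.reshape(-1)
    grid = np.array(list(itertools.product(range(-r, r + 1), repeat=3)))
    out = 0.0
    for w_k, p_k, dp_k in zip(weight.reshape(-1), grid, offsets):
        out += w_k * trilinear_sample(x, np.asarray(p0) + p_k + dp_k)
    return out


rng = np.random.default_rng(0)
x = rng.random((8, 8, 8))
weight = rng.random((3, 3, 3))
zero = np.zeros((27, 3))
# With zero offsets the result equals an ordinary 3D convolution (correlation) at p0
regular = np.sum(weight * x[3:6, 3:6, 3:6])
print(np.isclose(deformable_conv3d_at(x, weight, (4, 4, 4), zero), regular))  # True
```

With all offsets set to zero the operation reduces to a standard convolution, so the learned offsets only add flexibility; this is why DCN can be dropped into an existing UNet block.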
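Channel translation augmentation can be sketched as an integer shift of the MRI channels only, which simulates residual MRI-to-CT misregistration while the CT, PET, and ground-truth masks stay fixed. This NumPy sketch rests on assumptions not stated in the text: the channel order (CT, PET, T1, T2), an independent shift per MRI channel, whole-voxel shifts on the 1 mm grid (so 3 mm corresponds to 3 voxels), and zero filling of vacated voxels. The names `shift_zero_fill` and `channel_translation` are hypothetical.

```python
import numpy as np


def shift_zero_fill(vol, shift):
    """Translate a 3D array by integer voxel shifts, filling vacated voxels with 0."""
    out = np.zeros_like(vol)
    src, dst = [], []
    for s, n in zip(shift, vol.shape):
        s = int(np.clip(s, -n, n))
        if s >= 0:
            src.append(slice(0, n - s))
            dst.append(slice(s, n))
        else:
            src.append(slice(-s, n))
            dst.append(slice(0, n + s))
    out[tuple(dst)] = vol[tuple(src)]
    return out


def channel_translation(sample, mri_channels, max_shift_mm=3.0, spacing_mm=1.0, rng=None):
    """Randomly translate the MRI channels of a (C, Z, Y, X) sample.

    Each MRI channel gets its own integer shift, drawn uniformly from
    [-max_shift, max_shift] voxels on every axis. Other channels are copied
    unchanged; ground-truth masks are not passed in and are therefore not shifted.
    """
    rng = np.random.default_rng() if rng is None else rng
    max_vox = int(round(max_shift_mm / spacing_mm))  # 3 mm on a 1 mm grid -> 3 voxels
    out = sample.copy()
    shifts = {}
    for c in mri_channels:
        shift = rng.integers(-max_vox, max_vox + 1, size=3)
        out[c] = shift_zero_fill(sample[c], shift)
        shifts[c] = tuple(int(s) for s in shift)
    return out, shifts


# Hypothetical channel order: 0 = CT, 1 = PET, 2 = T1-weighted MRI, 3 = T2-weighted MRI
sample = np.random.default_rng(1).random((4, 16, 16, 16))
augmented, shifts = channel_translation(sample, mri_channels=[2, 3], rng=np.random.default_rng(0))
print(shifts)
```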
Six configurations were compared: UNet, DCN, and CHTL, each trained once on RR images and once on DR images. The same data split was used for all configurations: training (93 patients), validation (31 patients), and test (30 patients). Each configuration was trained independently for 200 epochs. Test-set results were evaluated on the union of GTVt and GTVn using the Dice similarity coefficient (Dice), the 95th percentile Hausdorff distance (HD95), and the mean surface distance (MSD); training time was also compared.
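The evaluation on the GTVt/GTVn union can be illustrated with a small brute-force sketch that computes Dice from voxel overlap and HD95 and MSD from surface distances. The function names are hypothetical; surfaces are taken as voxels with at least one 6-connected background neighbor, and HD95 and MSD are computed over the pooled distances in both directions, which is one common convention among several. The pairwise-distance approach is only practical for small arrays; full volumes are usually handled with distance transforms.

```python
import numpy as np


def surface_points(mask, spacing):
    """Physical coordinates (mm) of boundary voxels: in the mask but with at least
    one 6-connected neighbour outside it."""
    m = np.pad(mask.astype(bool), 1)
    core = m[1:-1, 1:-1, 1:-1]
    interior = core.copy()
    for ax in range(3):
        for step in (-1, 1):
            interior &= np.roll(m, step, axis=ax)[1:-1, 1:-1, 1:-1]
    return np.argwhere(core & ~interior) * np.asarray(spacing, dtype=float)


def directed_distances(a_pts, b_pts):
    """For every point in a_pts, the distance to the nearest point in b_pts (brute force)."""
    d = np.linalg.norm(a_pts[:, None, :] - b_pts[None, :, :], axis=-1)
    return d.min(axis=1)


def evaluate_union(pred_t, pred_n, gt_t, gt_n, spacing=(1.0, 1.0, 1.0)):
    """Dice, HD95 and MSD on the union of the tumor (t) and node (n) masks.

    Assumes both unions are non-empty. HD95 and MSD use the pooled symmetric
    surface distances.
    """
    pred = np.logical_or(pred_t, pred_n)
    gt = np.logical_or(gt_t, gt_n)
    dice = 2.0 * np.logical_and(pred, gt).sum() / (pred.sum() + gt.sum())
    sp, sg = surface_points(pred, spacing), surface_points(gt, spacing)
    dists = np.concatenate([directed_distances(sp, sg), directed_distances(sg, sp)])
    return {"Dice": float(dice), "HD95": float(np.percentile(dists, 95)), "MSD": float(dists.mean())}


# Toy example: a 4 x 4 x 4 cube split into "tumor" and "node" halves, and a
# prediction of the same cube shifted by 2 voxels along the first axis
gt_t = np.zeros((10, 10, 10), dtype=bool)
gt_n = np.zeros_like(gt_t)
gt_t[2:4, 2:6, 2:6] = True
gt_n[4:6, 2:6, 2:6] = True
pred_t = np.zeros_like(gt_t)
pred_t[4:8, 2:6, 2:6] = True
pred_n = np.zeros_like(gt_t)
print(evaluate_union(pred_t, pred_n, gt_t, gt_n))
```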
The networks were trained with the sum of the Dice and Top-k losses, using a stochastic gradient descent (SGD) optimizer with a batch size of 2 and an initial learning rate of 0.01 with decay, on sampled patches of 128 × 128 × 128 voxels. Augmentations applied to all configurations included scaling, rotation, and mirroring.
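The training objective and learning-rate schedule can be sketched in NumPy for a single sample. The sketch assumes that the Top-k term is the voxel-wise cross-entropy averaged over the hardest k% of voxels (k = 10 here), that the soft Dice term covers foreground classes only, and that "decay" means polynomial decay with exponent 0.9 over the 200 epochs; none of these settings are stated in the text, and the function names are hypothetical. Deep supervision, which would add weighted copies of the loss at lower decoder resolutions, is omitted.

```python
import numpy as np


def softmax(logits, axis=0):
    z = logits - logits.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def soft_dice_loss(probs, onehot, eps=1e-5):
    """1 - mean soft Dice over foreground classes; arrays are (C, Z, Y, X), class 0 = background."""
    axes = tuple(range(1, probs.ndim))
    inter = (probs * onehot).sum(axis=axes)
    denom = probs.sum(axis=axes) + onehot.sum(axis=axes)
    dice = (2.0 * inter + eps) / (denom + eps)
    return 1.0 - dice[1:].mean()


def topk_ce_loss(probs, onehot, k_percent=10.0, eps=1e-8):
    """Cross-entropy averaged over only the k% of voxels with the highest loss."""
    ce = -(onehot * np.log(probs + eps)).sum(axis=0).ravel()
    n = max(1, int(round(ce.size * k_percent / 100.0)))
    return np.sort(ce)[-n:].mean()


def dice_topk_loss(logits, labels, num_classes, k_percent=10.0):
    """Sum of soft Dice and Top-k cross-entropy; `labels` is an integer class map (Z, Y, X)."""
    probs = softmax(logits, axis=0)
    onehot = np.moveaxis(np.eye(num_classes)[labels], -1, 0)  # (Z, Y, X) -> (C, Z, Y, X)
    return soft_dice_loss(probs, onehot) + topk_ce_loss(probs, onehot, k_percent)


def poly_lr(epoch, max_epochs=200, initial_lr=0.01, exponent=0.9):
    """Polynomial learning-rate decay from initial_lr towards 0 over max_epochs."""
    return initial_lr * (1.0 - epoch / max_epochs) ** exponent


labels = np.zeros((4, 4, 4), dtype=int)
labels[1:3, 1:3, 1:3] = 1
confident = 20.0 * np.moveaxis(np.eye(2)[labels], -1, 0)  # logits strongly favouring the true class
print(dice_topk_loss(confident, labels, num_classes=2))  # close to 0
print(poly_lr(0), poly_lr(100))
```

Restricting the cross-entropy to the hardest voxels keeps the large, easy background region from dominating the gradient, while the Dice term directly targets overlap.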