Interdisciplinary Best Paper: Prospective assessment of AI screening for interstitial lung disease (ILD) in radiotherapy
Purpose or Objective
ILD is associated with a high risk of pulmonary complications, or even death, following high-dose radiation therapy. Unfortunately, in some patients ILD has not been diagnosed before radiation treatment begins. Automated ILD detection during the radiation treatment planning process may help identify patients at risk of radiation complications.
Material and Methods
A convolutional neural network (CNN) was trained using a diagnostic patient dataset (n=4393) of thoracic computed tomography (CT) images from patients labelled as either normal or ILD+ (n=1366) based on radiographic findings. The CNN was tuned using radiation treatment planning 4D CT scans (n=503), of which 55 were known to be ILD+. The resulting model was tested on an internal validation dataset, and a sensitivity threshold was selected for clinical deployment. The model was then clinically deployed in an automated framework that evaluated CT simulation datasets, at the time of treatment planning, for all patients treated for a lung malignancy. The model outputs were prospectively assessed under an REB-approved quality assurance framework for 12 months and compared to the radiologist-assessed gold-standard radiographic ILD classification. The initial phase of clinical deployment was ‘silent’, without real-time clinical output. For the last six months, during ‘live’ deployment, prototype emails were sent to clinicians informing them of the AI screening result if positive. Test characteristics of the deployed model were calculated based on the gold-standard radiologist-determined ILD status.
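The threshold selection step described above can be illustrated with a minimal sketch. It assumes the model emits one score per scan and that a scan is flagged when its score is at or above the threshold; the function names, the example scores, and the selection rule itself are hypothetical and are not taken from the deployed system.

```python
import math


def threshold_for_sensitivity(scores, labels, target_sensitivity):
    """Return the highest threshold whose sensitivity meets the target.

    A case is flagged when its score is >= the returned threshold.
    labels: 1 for ILD+, 0 for normal (hypothetical encoding).
    """
    positive_scores = sorted(
        (s for s, y in zip(scores, labels) if y == 1), reverse=True
    )
    if not positive_scores:
        raise ValueError("need at least one positive case")
    # Number of positives that must be flagged to reach the target.
    needed = max(1, math.ceil(target_sensitivity * len(positive_scores)))
    return positive_scores[needed - 1]


def sensitivity_at(scores, labels, threshold):
    """Fraction of positive cases with a score at or above the threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in positives) / len(positives)


# Made-up validation scores, for illustration only.
val_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4]
val_labels = [1, 1, 1, 1, 0, 0, 0, 0]

threshold = threshold_for_sensitivity(val_scores, val_labels, 0.75)
print(threshold, sensitivity_at(val_scores, val_labels, threshold))
```

Choosing the highest threshold that still meets the sensitivity target keeps the number of flagged scans, and therefore false positives, as low as the target allows on the validation set.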
Results
After training on diagnostic data and tuning with CT simulation data, the CNN demonstrated an AUC of 0.77. The model threshold selected for deployment had an estimated sensitivity of 75%. During prospective deployment in 2021-2022, 383 consecutive patients were assessed. During the ‘silent’ phase, the model flagged 28/182 patients (15.3% test positivity), of which 5 were true positives; there were 2 false negatives (71% sensitivity). Of the true positives, 3 were unknown to the treating physician based on clinical review. During the ‘live’ phase, an additional 32/220 patients (14.5% test positivity) were flagged, with at least 2 true positives, of which 1 was unknown to the treating physician. The gold-standard assessments of the ‘live’ cohort are not yet finalized. Overall, the model achieved an AUC of 0.82 based on available gold-standard radiology review, consistent with the development cohorts. Physician impressions of the AI system were collected prospectively, and the main concern identified was the false positive rate.
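The test characteristics reported for the ‘silent’ phase can be reproduced from the stated counts with a short sketch. It assumes that all 182 patients in that phase had a gold-standard review, so that every flagged patient who was not a true positive counts as a false positive; the abstract does not state this explicitly, and the function name is hypothetical.

```python
def screening_metrics(flagged, total, true_pos, false_neg):
    """Derive basic screening test characteristics from summary counts."""
    # Assumption: every flagged case without ILD is a false positive,
    # and every unflagged case that is not a false negative is a true negative.
    false_pos = flagged - true_pos
    true_neg = total - flagged - false_neg
    return {
        "test_positivity": flagged / total,
        "sensitivity": true_pos / (true_pos + false_neg),
        "ppv": true_pos / flagged,
        "specificity": true_neg / (true_neg + false_pos),
    }


# Counts from the 'silent' phase as reported above.
silent = screening_metrics(flagged=28, total=182, true_pos=5, false_neg=2)
for name, value in silent.items():
    print(f"{name}: {value:.3f}")
```

Under that assumption, sensitivity is 5/7, matching the reported 71%, and the positive predictive value is 5/28, or roughly 18%, which may help explain why the false positive rate was the main concern raised by physicians.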
Conclusion
A CNN has been developed and clinically deployed to screen patients in the radiation treatment planning pipeline for ILD. The model has reasonable screening characteristics and prospectively identified five patients with previously unknown radiographic ILD. After external validation, this model could be deployed in radiation treatment planning systems to alert clinicians to patients with ILD who are at high risk of developing complications.