Abstract

Title

Bias and reporting quality of artificial intelligence models in radiotherapy treatment planning

Authors

Marjan Sharabiani1, Enrico Clementel1, Nicolaus Andratschke2, Nick Reynaert3, Wouter van Elmpt4, Coen Hurkmans5

Authors Affiliations

1European Organisation for Research and Treatment of Cancer (EORTC), Radiotherapy quality assurance, Brussels, Belgium; 2University Hospital Zürich, Department of Radiation Oncology, Zurich, Switzerland; 3Jules Bordet Institute, Medical Physics Department, Brussels, Belgium; 4Maastricht University Medical Centre, Department of Radiation Oncology, Maastricht, The Netherlands; 5Catharina Hospital, Department of Radiation Oncology, Eindhoven, The Netherlands

Purpose or Objective

The number of studies using artificial intelligence (AI)-based models has increased in recent years, but their clinical application remains a point of contention. The goal of this study was to systematically evaluate the risk of bias (ROB) and reporting quality of AI-based models in radiotherapy treatment planning studies using the PROBAST (Prediction model Risk Of Bias ASsessment Tool) and TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) guidelines, and to report which items in each guideline required tailoring to suit AI-based models.

Materials and Methods

A PubMed search was conducted in January 2021 and updated through March 2021. A combination of keywords was used: (“artificial intelligence” OR “machine learning” OR “deep learning” OR “knowledge-based”) AND “radiotherapy” AND “treatment planning”. A total of 659 articles were reviewed for title and abstract, of which 126 were selected for full-text review; from these, 20 were selected to assess ROB and reporting quality for non-randomised studies. The inclusion criteria for the 20 articles were: recent publication (2018 to 2021), publication in a high-impact-factor journal, and a high citation count. TRIPOD contains a 22-item checklist (37 individual items), from which we identified the items pertaining to predictors as not relevant for evaluating AI models. PROBAST contains 20 signalling questions across four domains (participants, predictors, outcomes, analysis), from which we similarly excluded the predictor domain, as it is not relevant for AI-based treatment planning models.
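For reproducibility, the search described above can also be run programmatically. The following is a minimal sketch using Biopython's Entrez interface; the query terms are taken from this abstract, while the e-mail address and result limit are illustrative assumptions and not part of the original study protocol.

    # Minimal sketch of the PubMed query described above, using Biopython's
    # Entrez E-utilities wrapper. Query terms come from the abstract; the
    # e-mail address and retmax limit are illustrative assumptions.
    from Bio import Entrez

    Entrez.email = "reviewer@example.org"  # NCBI requires a contact address

    query = (
        '("artificial intelligence" OR "machine learning" OR '
        '"deep learning" OR "knowledge-based") '
        'AND "radiotherapy" AND "treatment planning"'
    )

    handle = Entrez.esearch(db="pubmed", term=query, retmax=1000)
    record = Entrez.read(handle)
    handle.close()

    print(f"{record['Count']} records matched the query")
    print(record["IdList"][:10])  # first ten PubMed IDs

Note that re-running the query today would not reproduce the 659 records screened here, as PubMed is continuously updated.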

Results

The two checklists were first examined for applicability to AI-based algorithms. Some items in each guideline required adaptation to suit the evaluation of AI models (e.g. the “missing data” item was replaced with “suboptimal plans”).

Figure 1. PROBAST risk of bias assessment

Figure 1 reports the ROB assessed using the tailored PROBAST items for the 20 analysed articles.

The analysis domain is the major source of ROB in AI-based treatment planning studies (50% of the studies). In these studies, the steps taken to deal with suboptimal plans are either under-reported or unclear.

Adherence to the tailored TRIPOD items was poor for blinding (15%), sample size (10%), and suboptimal plan reporting (15%). Furthermore, despite the statistical complexity of model development and validation, reporting of the statistical methods used is often neglected, resulting in a high ROB. Only 20% of the studies provided supplementary material such as a full model description or code, the statistical analysis, or the study data sets (Figure 2).

Figure 2. Adherence to individual TRIPOD items

Conclusion

Most PROBAST and TRIPOD criteria can be used to score articles on AI in radiotherapy treatment planning. Using these items, the analysed articles show a high risk of bias and, as judged by the TRIPOD criteria, under-report several important study aspects.

Similar research is expected to increase the consistency and transparency of the published evidence base while also reducing study waste.