ESTRO 2023 Physics Track Report - ‘Big data, big headache’ joint ESTRO-American Association of Physicists in Medicine session

Working with large, multicentre, DICOM radiotherapy datasets

Big data and data science methods hold promise for advancing radiotherapy. To harness this potential fully, clinical studies must progress beyond registering radiotherapy as a binary variable or only by prescribed dose and fractionation. Instead, the comprehensive exposure data (images, structure sets, treatment plans, and 3D dose distributions) that are available in the Digital Imaging and Communications in Medicine (DICOM) format should be collected. Beyond clinical studies, DICOM datasets also play a pivotal role in the advancement of machine learning technology, which is swiftly making its way into both research and clinical practice.

When working with large, multicentre DICOM radiotherapy datasets, challenges that would be only minor inconveniences in a small cohort can cause significant headaches. However, these challenges can be overcome.

Multicentre data collection

Collecting millions of files from thousands of patients requires an automated process, so bulk DICOM export solutions must be implemented. Most older treatment planning systems do not support this as a standard feature, but if the system allows scripting, it is most likely possible to implement bulk export (and maybe someone, somewhere, has already done it). The participating centres are unlikely all to use the same treatment planning system, but for those that do, collaboration is key. In the end, the output should all be in the DICOM format.

Data curation and standardisation

How uniform a multicentre DICOM radiotherapy dataset is depends on the extent of cross-centre collaboration and guideline implementation. This holds particularly for non-trial treatments, which constitute most of the data available. To tackle the challenges of curating, standardising, and analysing large DICOM datasets effectively, a vendor-agnostic tool is required.

This tool should have the capabilities to:
  • provide an overview of all DICOM files and how they are related to patients, studies and treatments;
  • enable selection of patients, studies and treatments, and store the results;
  • make it possible to map treatment-specific structure names to a common name set;
  • automate summation and scaling of dose files;
  • automate the extraction of dose-volume histogram parameters based on the above; and
  • enable easy implementation of study-specific tools.
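
Three of the capabilities listed above (structure-name mapping, dose summation with scaling, and extraction of dose-volume histogram parameters) can be illustrated with a minimal Python sketch. It is not taken from CORDIAL-RT. The synonym table, the function names and the flat-list representation of dose grids and structure masks are all assumptions made for illustration. The sketch also assumes that all dose grids have already been converted to Gy, resampled to a common voxel grid, and that every voxel has the same volume.

```python
# Hypothetical synonym table: local structure names (lower case) -> common name.
# The entries are illustrative only, not taken from any real study.
NAME_MAP = {
    "heart": "Heart",
    "hjerte": "Heart",
    "cor": "Heart",
    "lung_l": "Lung_L",
    "lunge sin": "Lung_L",
}


def standardise_name(local_name, name_map=NAME_MAP):
    """Return the common name for a local structure name, or None if unmapped."""
    return name_map.get(local_name.strip().lower())


def sum_doses(dose_grids, weights=None):
    """Voxel-wise weighted sum of dose grids given as equal-length flat lists (Gy).

    The weights can be used to scale each plan, for example by the ratio of
    delivered to planned fractions.
    """
    if weights is None:
        weights = [1.0] * len(dose_grids)
    if len(weights) != len(dose_grids):
        raise ValueError("one weight is required per dose grid")
    n_voxels = len(dose_grids[0])
    if any(len(grid) != n_voxels for grid in dose_grids):
        raise ValueError("dose grids must share the same voxel grid")
    return [
        sum(w * grid[i] for w, grid in zip(weights, dose_grids))
        for i in range(n_voxels)
    ]


def dvh_parameters(dose, mask, threshold_gy):
    """Return (mean dose, fraction of the structure receiving >= threshold_gy).

    `mask` marks the voxels that belong to the structure; equal voxel
    volumes are assumed.
    """
    inside = [d for d, in_structure in zip(dose, mask) if in_structure]
    if not inside:
        raise ValueError("structure mask is empty")
    mean_dose = sum(inside) / len(inside)
    v_threshold = sum(1 for d in inside if d >= threshold_gy) / len(inside)
    return mean_dose, v_threshold


if __name__ == "__main__":
    total = sum_doses([[1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]], weights=[1.0, 2.0])
    heart_mask = [True, True, False, True]
    print(standardise_name(" Hjerte "), dvh_parameters(total, heart_mask, 4.0))
```

In a real pipeline the dose arrays and masks would come from the DICOM RT Dose and RT Structure Set objects, and names that `standardise_name` cannot map would be flagged for manual review rather than dropped silently.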

Case study – Danish Breast Cancer Group (DBCG) RT-Nation

In the DBCG RT-Nation study, we facilitated the implementation of bulk DICOM data collection for the seven radiotherapy centres in Denmark (covering the Eclipse, Oncentra and Pinnacle treatment planning systems). For curation and standardisation, we developed the collaborative DICOM analysis for radiotherapy (CORDIAL-RT) framework (https://github.com/Aarhus-RadOnc-AI/cordial-rt). We managed to include 7448 patients with their corresponding CTs, structures, treatment plans and quality-assured dose distributions. This corresponded to 86% of all loco-regional breast cancer radiotherapy treatments carried out in Denmark between 2008 and 2016.

If we can do it, so can you. Please feel free to reach out!

 

Lasse Refsgaard
Medical physicist / PhD student
Aarhus University Hospital / Århus University
Aarhus, Denmark