ESTRO 2025 Congress report | Physics track

Study presented during the session on MR-linac-based online adaptive radiotherapy on Sunday 4 May 2025

Authors:
Tom Julius Blöcker, Elia Lombardo, Sebastian N. Marschner, Claus Belka, Stefanie Corradini, Miguel A. Palacios, Marco Riboldi, Christopher Kurz, Guillaume Landry.

Motivation

In MRI-guided radiation therapy (MRgRT), the acquired image data must be interpreted, in particular to localise the target in real time so that intra-fraction motion can be managed. The largely visual nature of this task makes artificial intelligence (AI) models strong candidates for it, which has spurred their development and training.

Recently, promptable foundation models have shown promise in addressing relevant downstream tasks directly, thereby bypassing the need for time- and resource-intensive application-specific training. These models take two inputs – a prompt/instruction and data – and are trained on tuples consisting of a prompt, data, and the expected output. Such training enables them to handle arbitrary prompts and data across tasks.

At ESTRO 2025, we presented results of applying promptable foundation models to MRgRT target tracking, interpreting a target segmentation on an initial frame as the prompt and the subsequent cine MRI sequence as the data input.
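Conceptually, this setup is a loop that propagates a prompt mask through successive frames. The sketch below is illustrative only: `segment_frame` is a hypothetical stand-in for a promptable video-segmentation model (the actual SAM2 interface differs), and the thresholding rule it applies is a toy substitute for learned segmentation.

```python
# Minimal sketch of prompt-based tracking: the initial segmentation acts as
# the prompt, and the cine MRI frames are the data stream. `segment_frame`
# is a hypothetical stand-in for a promptable video-segmentation model.

def segment_frame(prev_mask, frame):
    # Toy propagation rule: keep the previous mask wherever the current
    # frame intensity exceeds 0.5 (a real model would predict the mask).
    return [[m and (p > 0.5) for m, p in zip(mrow, frow)]
            for mrow, frow in zip(prev_mask, frame)]

def track(prompt_mask, frames):
    """Propagate the prompt mask through all frames, yielding one mask each."""
    mask = prompt_mask
    for frame in frames:
        mask = segment_frame(mask, frame)
        yield mask

prompt = [[True, True], [False, False]]   # frame-0 target contour (the prompt)
frames = [[[0.9, 0.2], [0.9, 0.9]],       # synthetic 2x2 cine frames (the data)
          [[0.9, 0.9], [0.1, 0.1]]]
masks = list(track(prompt, frames))
```

The point of the sketch is the division of labour: the prompt is supplied once, and the model alone carries the target through the remaining frames.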

Most important findings

The Segment Anything Model 2 (SAM2) for video segmentation, a foundation model that has not been trained on medical images, was successfully applied to track moving targets in 0.35 T MR-linac 2D sagittal cine MRI data acquired at up to 8 Hz at two institutions in two countries [1].

Comparison of the SAM2-based approach with inter-observer variability (among two to five observers), as well as with static propagation from both breath-hold and non-breath-hold frames, showed that the foundation model tracked targets very well, on a par with inter-observer variability.
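Tracking accuracy in such comparisons is commonly scored with the Dice similarity coefficient between a predicted mask and an expert contour. The minimal, self-contained illustration below (not the study's actual evaluation code) shows why the static-propagation baseline degrades as soon as the target moves:

```python
# Dice similarity coefficient between two binary masks, a common metric
# for scoring tracking accuracy against expert contours (illustrative only).

def dice(mask_a, mask_b):
    """Dice coefficient of two equally sized binary masks (lists of 0/1 rows)."""
    a = [v for row in mask_a for v in row]
    b = [v for row in mask_b for v in row]
    overlap = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    return 1.0 if total == 0 else 2.0 * overlap / total

# Static propagation keeps the frame-0 contour fixed; any target motion
# then directly lowers the Dice score on later frames.
frame0 = [[0, 1, 1],
          [0, 1, 1],
          [0, 0, 0]]
moved  = [[1, 1, 0],
          [1, 1, 0],
          [0, 0, 0]]
print(dice(frame0, frame0))  # 1.0 (no motion)
print(dice(frame0, moved))   # 0.5 (target shifted one pixel left)
```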

Further comparison of SAM2 with a transformer-based image registration model, TransMorph [2], both with patient-specific fine-tuning (TM-PS, using eight expert-delineated frames) and without (TM), showed SAM2 to be only slightly worse than the fine-tuned TM-PS and better than the plain TM. This indicates that the prompting mechanism can serve as a time-efficient alternative to patient-specific fine-tuning.

Research implications

This study demonstrates that promptable foundation models can achieve high-quality real-time target localisation in MRgRT. Even without time- and resource-intensive application-specific training, prompting mechanisms can provide enough guidance to address downstream tasks such as the tracking of patient-specific targets.

We hope that such models will support the further development of MRgRT and render the training of new models unnecessary for tasks that can be solved by prompting foundation models.

Also at ESTRO 2025, the TrackRAD2025 challenge was presented, in which participant-submitted solutions to the same task were evaluated on a large, multi-institutional dataset in a standardised fashion [3]. The results are expected in September 2025.

 


Tom Julius Blöcker

PhD candidate
Department of Radiation Oncology

Ludwig Maximilian University Hospital

Munich, Germany


https://www.linkedin.com/in/tom-julius/
tom.bloecker@campus.lmu.de

tom.bloecker@outlook.de

https://lmu-art-lab.userweb.mwn.de/author/tom-blocker/

 

References

[1] Blöcker, Tom Julius, et al. MRgRT real-time target localization using foundation models for contour point tracking and promptable mask refinement. Phys. Med. Biol. 70, 015004 (2024). DOI: 10.1088/1361-6560/ad9dad

[2] Lombardo, Elia, et al. Patient-specific deep learning tracking framework for real-time 2D target localization in MRI-guided radiotherapy. International Journal of Radiation Oncology, Biology, Physics (2024). DOI: 10.1016/j.ijrobp.2024.10.021

[3] www.trackrad.ch