Image synthesis is expected to provide value for the translation of machine learning methods into clinical practice. Fundamental problems like model robustness, domain transfer, causal modelling, and operator training become approachable through synthetic data. In particular, heavily operator-dependent modalities like ultrasound imaging require robust frameworks for image and video generation.
So far, video generation has only been possible by providing input data that is as rich as the output data, e.g., an image sequence conditioning a video. However, clinical documentation is usually scarce, and only single images are reported and stored; thus, retrospective patient-specific analysis and the generation of rich training data are impossible with current approaches.
In this paper, we extend elucidated diffusion models for video modelling to generate plausible video sequences from single images and arbitrary conditioning with clinical parameters. We explore this idea within the context of echocardiograms by varying the Left Ventricular Ejection Fraction (LVEF), the most essential clinical metric obtained from these examinations. We use the publicly available EchoNet-Dynamic dataset for all our experiments. Our image-to-sequence approach achieves an R² score of 93%, which is 38 points higher than recently proposed sequence-to-sequence generation methods.
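The R² score above measures how well the LVEF re-estimated from a generated video agrees with the LVEF value the model was conditioned on. A minimal sketch of the coefficient-of-determination computation, using illustrative placeholder EF values (not figures from the paper):

```python
import numpy as np

def r2_score(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

# Hypothetical conditioning EFs vs. EFs re-estimated from generated videos
target_ef = np.array([0.35, 0.45, 0.55, 0.65, 0.75])
estimated_ef = np.array([0.36, 0.44, 0.57, 0.63, 0.74])

print(r2_score(target_ef, estimated_ef))  # approaches 1.0 for faithful generation
```

An equivalent implementation is available as `sklearn.metrics.r2_score`; the point is simply that a high R² indicates the generated videos respect their EF conditioning.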
Hover over a video to see whether it is an original sample, a factual generated sample, or a counterfactual generated sample. You can also click here to transform the mosaic into a game in which you have to guess whether a sample is real or not.
This work was supported by Ultromics Ltd. and the UKRI Centre for Doctoral Training in Artificial Intelligence for Healthcare (EP/S023283/1). The authors gratefully acknowledge the scientific support and HPC resources provided by the Erlangen National High Performance Computing Center (NHR@FAU) of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) under the NHR project b143dc PatRo-MRI. NHR funding is provided by federal and Bavarian state authorities. NHR@FAU hardware is partially funded by the German Research Foundation (DFG) – 440719683.
@misc{reynaud2023featureconditioned,
title={Feature-Conditioned Cascaded Video Diffusion Models for Precise Echocardiogram Synthesis},
author={Hadrien Reynaud and Mengyun Qiao and Mischa Dombrowski and Thomas Day and Reza Razavi and Alberto Gomez and Paul Leeson and Bernhard Kainz},
year={2023},
eprint={2303.12644},
archivePrefix={arXiv},
primaryClass={cs.CV}
}