SeLoRA: Self-Expanding Low-Rank Adaptation of Latent Diffusion Model for Medical Image Synthesis

Yuchen Mao¹, Hongwei Li², Wei Pang³, Giorgos Papanastasiou⁴, Guang Yang⁵, Chengjia Wang³

¹University of Edinburgh, ²Harvard Medical School ³Heriot-Watt University ⁴Archimedes Unit, Athena Research Centre ⁵Imperial College London

arXiv

Interpolate start reference image.

SeLoRA Self-Expanding Low-Rank Adaptation of Latent Diffusion Model for Medical Image Synthesis

Abstract

The persistent challenge of medical image synthesis posed by the scarcity of annotated data and the need to synthesize `missing modalities' for multi-modal analysis, underscored the imperative development of effective synthesis methods.

Recently, the combination of Low-Rank Adaptation (LoRA) with latent diffusion models (LDMs) has emerged as a viable approach for efficiently adapting pre-trained large language models, in the medical field. However, the direct application of LoRA assumes uniform ranking across all linear layers, overlooking the significance of different weight matrices, and leading to sub-optimal outcomes. Prior works on LoRA prioritize the reduction of trainable parameters, and there exists an opportunity to further tailor this adaptation process to the intricate demands of medical image synthesis.

In response, we present SeLoRA, a Self-Expanding Low-Rank Adaptation Module, that dynamically expands its ranking across layers during training, strategically placing additional ranks on crucial layers, to allow the model to elevate synthesis quality where it matters most. The proposed method not only enables LDMs to fine-tune on medical data efficiently but also empowers the model to achieve improved image quality with minimal ranking. The code of our SeLoRA method is publicly available at Here

Synthesized Images

Below are some examples of images synthesized by fine-tuning a Stable Diffusion model with SeLoRA on the IU X-Ray dataset. The synthesized images are from the test set of the dataset, so patients' privacy is protected. Please hover over the images to see the synthesized image.

Hover to Reveal Synthesized Image from Fine-Tunined Stable Diffusion with SeLORA on IU X-Ray Dataset

Cardiac and mediastinal contours are within normal limits. The lungs are clear. Bony structures are intact.

Hover to Reveal Synthesized Image from Fine-Tunined Stable Diffusion with SeLORA on IU X-Ray Dataset

The lungs are clear bilaterally. Specifically, no evidence of focal consolidation, pneumothorax, or pleural effusion. Cardio mediastinal silhouette is unremarkable. Visualized osseous structures of the thorax are without acute abnormality.

Hover to Reveal Synthesized Image from Fine-Tunined Stable Diffusion with SeLORA on IU X-Ray Dataset

The cardiac silhouette, upper mediastinum and pulmonary vasculature are within normal limits. There is no acute air space infiltrate, pleural effusion or pneumothorax. No pulmonary nodules are identified.

Hover to Reveal Synthesized Image from Fine-Tunined Stable Diffusion with SeLORA on IU X-Ray Dataset

Negative for cardiac enlargement. Negative for vascular congestion. There are several small circular opacities in the right upper lung, some of which are centrally lucent. Negative for bony abnormality.

Hover to Reveal Synthesized Image from Fine-Tunined Stable Diffusion with SeLORA on IU X-Ray Dataset

The lungs appear clear. The heart and pulmonary XXXX are normal. The pleural spaces are clear. Surgical clips and suture material are noted in the right hilar region suggesting prior lung surgery. The mediastinal contours are stable.

SeLoRA Training Procedure

The algorithm below describes the training procedure of SeLoRA. When training a model with SeLoRA, SeLoRA initilize itself with a rank of 1 and trains just like the vanilla LoRA under the hood. However, SeLoRA test the fisher information score (FI-Score) of the model at regular intervals and expands its rank if the FI score is not satisfactory. The rank expansion is done by adding a new rank to the weight matrices of the model. The rank expansion is done in a nice way that the new rank is added to the weight matrices that have the most impact on the model's performance.

Interpolate start reference image.

Learnt Rank Visulization

The image visualizes the final ranking of SeLoRA across the layers of a Stable Diffusion model after training on the IU-XRay dataset. Each cell represents a layer in the model, and the number within indicates the layer's rank. The layers are arranged from input to output, top to bottom and left to right.

In the text encoding part, "q", "k", and "v" represent the query, key, and value sections of the attention weights, respectively. "out", "fc1", and "fc2" denote the output, first fully connected layer, and second fully connected layer of the U-Net part.

Similarly, in the U-Net part, "q", "k", and "v" represent the query, key, and value sections of the attention weights. "attn1" and "attn2" refer to the first and second attention layers, where the first is self-attention and the second is cross-attention between text and image embeddings.

Text Encoder

In the text encoder part, large ranks for SeLoRA lie at the q and k parts of the attention weights.

Interpolate start reference image.

U-Net

For the U-Net part, large ranks are allocated to the q and k parts of the second attention layer, which is the cross-attention layer.

Interpolate start reference image.

Across different layers of the stable diffusion model, we observe that the rank allocation aligns with the intuition that weight updates would change most dramatically at locations where the latent representations of text and image intersect (where conditioning is more apparent).

Qualitative Results

The image below compares the qualitative results of SeLoRA with vanilla LoRA and other LoRA variants. The first row displays synthesized images trained on the IU-XRay dataset, while the second row shows synthesized images trained on the Montgomery County CXR dataset, which is significantly smaller.

Due to SeLoRA's strategic nature, it was able to achieve similar image quality to other LoRA variants using only half the rank. Also notably, for the Montgomery County CXR dataset, which has a limited number of samples (a common constraint in medical imaging), SeLoRA outperforms vanilla LoRA and other LoRA variants in terms of image quality.

Interpolate start reference image.

Related Links

Several excellent works have been introduced concurrently with ours, advancing the field of model adaptation.

All low-rank adaptation techniques stem from the original LoRA paper, which pioneered the concept of adapting large language models using low-rank matrices.

AdaLoRA, DyLoRA, and SoRA expand on the LoRA concept by implementing adaptive pruning, dynamically reducing the rank of matrices during training.

Other research, such as QLoRA, focuses on quantizing LoRA to further reduce memory usage.

Given the rapid pace of research in this field, many more related works may have emerged since this writing. For a further overview, please refer to my summary of LoRA papers, which details their techniques and mathematical formulations.

BibTeX

@misc{mao2024seloraselfexpandinglowrankadaptation,
      title={SeLoRA: Self-Expanding Low-Rank Adaptation of Latent Diffusion Model for Medical Image Synthesis}, 
      author={Yuchen Mao and Hongwei Li and Wei Pang and Giorgos Papanastasiou and Guang Yang and Chengjia Wang},
      year={2024},
      eprint={2408.07196},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.07196}, 
}