Accepted at ICML 2026

When Is Rank-1 Enough?

Geometry-guided initialization for parameter-efficient fine-tuning of multimodal large language models.

Haoran Zhao, Soyeon Caren Han, and Eduard Hovy
The University of Melbourne

Gap-Init aligns rank-1 LoRA updates with the modality-gap direction, restoring gradient flow along the geometry-salient axis.

Overview

Gap-Init is a geometry-aware initialization strategy that enables stable and effective rank-1 LoRA adaptation for multimodal large language models. It estimates a modality-gap vector from a small calibration set, aligns the LoRA update direction with this geometry-salient axis, and preserves the pretrained model behavior at initialization.

140.59 CIDEr with rank-1 Gap-Init on COCO captioning.
8x fewer LoRA parameters than rank-8, while matching or surpassing its performance.
256 calibration samples are enough for the training-free initialization step.
1.44 CIDEr standard deviation across seeds, down from 7.10.
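
The 8x parameter figure follows from how LoRA scales with rank: an adapter on a d_out x d_in weight trains r * (d_in + d_out) parameters, so the ratio between rank-8 and rank-1 does not depend on the layer size. A small sketch of the arithmetic, where the 4096-dimensional layer is an illustrative assumption rather than a value taken from the paper:

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of one LoRA adapter.

    B has shape (d_out, rank) and A has shape (rank, d_in).
    """
    return rank * (d_in + d_out)


# Hypothetical layer size, used only for illustration.
d_in, d_out = 4096, 4096

rank1 = lora_param_count(d_in, d_out, rank=1)
rank8 = lora_param_count(d_in, d_out, rank=8)

print(rank1, rank8, rank8 // rank1)  # 8192 65536 8
```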

Method

In high-dimensional spaces, a randomly initialized rank-1 LoRA direction is nearly orthogonal to the modality-gap axis with high probability, so it receives almost no gradient signal along that axis. Gap-Init uses this geometry to choose an update direction that receives meaningful gradient signal from the start.
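
The near-orthogonality claim can be checked numerically. The sketch below draws random unit directions and measures their average absolute cosine with a fixed unit vector standing in for the gap axis; the stand-in vector, the dimensions, and the helper name mean_abs_cosine are illustrative assumptions, not part of the released code. The average shrinks roughly like 1/sqrt(d) as the dimension grows.

```python
import numpy as np


def mean_abs_cosine(dim: int, trials: int = 200, seed: int = 0) -> float:
    """Average |cos| between random unit directions and one fixed unit axis."""
    rng = np.random.default_rng(seed)

    # Stand-in for the modality-gap axis (a random unit vector here).
    gap = rng.standard_normal(dim)
    gap /= np.linalg.norm(gap)

    # Random candidate directions, one per row, normalized to unit length.
    dirs = rng.standard_normal((trials, dim))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

    return float(np.mean(np.abs(dirs @ gap)))


for dim in (16, 256, 4096):
    print(dim, round(mean_abs_cosine(dim), 4))
```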

1. Identify the modality gap
   Estimate the translation-like mismatch between vision and text representations from a small calibration set.

2. Align rank-1 LoRA
   Set the LoRA B matrix to the normalized gap direction while keeping A at zero.

3. Restore gradient flow
   The update direction now projects onto the geometry-salient axis instead of collapsing into near-orthogonality.
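
One way to see why alignment matters: for an update delta_W = B A with A = 0, the chain rule gives dL/dA = B^T (dL/dW), so the first gradient step only sees the part of the weight gradient that lies along B. The toy comparison below assumes, purely for illustration, that the weight gradient is dominated by a component along the gap axis; the matrix G, the dimensions, and the noise scale are made up and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in = 1024, 1024

# Assumed unit gap direction in the layer's output space (illustrative).
gap = rng.standard_normal(d_out)
gap /= np.linalg.norm(gap)

# Toy weight gradient G = dL/dW dominated by a component along the gap axis.
coeff = rng.standard_normal(d_in)
G = np.outer(gap, coeff) + 0.01 * rng.standard_normal((d_out, d_in))


def grad_wrt_A(B: np.ndarray, G: np.ndarray) -> np.ndarray:
    """For delta_W = B @ A, the chain rule gives dL/dA = B.T @ G."""
    return B.T @ G


B_gap = gap[:, None]                      # aligned: B is the unit gap direction
B_rand = rng.standard_normal((d_out, 1))
B_rand /= np.linalg.norm(B_rand)          # baseline: random unit direction

A = np.zeros((1, d_in))                   # A = 0, so delta_W = B @ A = 0 at init
assert not np.any(B_gap @ A)

norm_gap = np.linalg.norm(grad_wrt_A(B_gap, G))
norm_rand = np.linalg.norm(grad_wrt_A(B_rand, G))
print(f"gap-aligned: {norm_gap:.3f}  random: {norm_rand:.3f}")
```

Under this assumption the aligned direction receives a gradient many times larger than the random one, while both leave the pretrained weights unchanged at initialization.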

Input:  Pretrained MLLM, calibration set D_cal, LoRA rank r
Output: Initialized LoRA matrices (B, A)

1. Extract aligned embeddings:
   z_v = projection(vision_encoder(x_img))
   z_t = text_encoder(x_txt)
   g_i = z_t - z_v

2. Estimate layer-wise gaps:
   g_l = mean(smooth(g_i))

3. Initialize LoRA:
   B[:, 0] = g_l / ||g_l||
   A = 0

Return (B, A)
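
The procedure above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the released implementation: the function names estimate_gap and gap_init are hypothetical, the embeddings are synthetic stand-ins for real encoder outputs, and the smoothing step is omitted because its form is not specified here.

```python
import numpy as np


def estimate_gap(z_v: np.ndarray, z_t: np.ndarray) -> np.ndarray:
    """Mean text-minus-vision offset over paired embeddings of shape (n, d)."""
    return (z_t - z_v).mean(axis=0)


def gap_init(gap: np.ndarray, d_in: int, eps: float = 1e-12):
    """Return rank-1 (B, A): B's single column is the unit gap direction, A is zero."""
    norm = np.linalg.norm(gap)
    if norm < eps:
        raise ValueError("gap vector is (near) zero; cannot normalize")
    B = (gap / norm)[:, None]   # shape (d_out, 1)
    A = np.zeros((1, d_in))     # shape (1, d_in); makes B @ A == 0
    return B, A


# Synthetic stand-ins for 256 paired calibration embeddings (not model outputs).
rng = np.random.default_rng(0)
n, d = 256, 64
offset = rng.standard_normal(d)                      # planted "gap" to recover
z_v = rng.standard_normal((n, d))
z_t = z_v + offset + 0.1 * rng.standard_normal((n, d))

B, A = gap_init(estimate_gap(z_v, z_t), d_in=d)
print(B.shape, A.shape, np.allclose(B @ A, 0.0))
```

Because A starts at zero, the product B @ A is exactly zero, so the adapted model reproduces the pretrained model's outputs at initialization.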

Results

Gap-Init turns rank-1 LoRA from an unstable low-rank bottleneck into a competitive adaptation regime.

Configuration            CIDEr
Naive + Random Init      98.08
GG-Safe + Random Init    102.15
GG-Safe + Gap-Init       140.06
Naive + Gap-Init         140.59

Noise Level    CIDEr     BLEU-4
0.0            141.84    42.55
0.2            140.04    41.62
0.6            142.00    43.43
1.0            133.45    39.62

Quick Start

The codebase supports gap-vector diagnostics, Gap-Init training, and evaluation for captioning and VQA experiments.

Install

Create the project environment.

git clone https://github.com/HaoranZhao2000/Gap-Init.git
cd Gap-Init
conda env create -f environment.yml
conda activate gapinit

Extract gaps

Estimate gap vectors from a calibration set.

python run_diagnostics_master.py \
  --config configs/gap_init_rank1_naive.yaml \
  --target_module text \
  --num_samples 256

Train and evaluate

Run rank-1 Gap-Init adaptation.

python train_caption_master.py \
  --config configs/gap_init_rank1_naive.yaml \
  --seed 42

python eval_master.py \
  --model_path output/gap_init_rank1_naive_seed42 \
  --task caption

Citation

If you find this work useful, please cite the paper.

@inproceedings{zhao2026rank,
  title={When Is Rank-1 Enough? Geometry-Guided Initialization for Parameter-Efficient Fine-Tuning},
  author={Zhao, Haoran and Han, Soyeon Caren and Hovy, Eduard},
  booktitle={Forty-third International Conference on Machine Learning},
  year={2026},
  url={https://openreview.net/pdf?id=Umu6IsAUbS}
}