Gap-Init | Geometry-Guided Initialization for PEFT

Overview

Gap-Init is a geometry-aware initialization strategy that enables stable and effective rank-1 LoRA adaptation for multimodal large language models. It estimates a modality-gap vector from a small calibration set, aligns the LoRA update direction with this geometry-salient axis, and preserves the pretrained model behavior at initialization.

140.59 CIDEr with rank-1 Gap-Init on COCO captioning.

8x fewer LoRA parameters than rank-8, while matching or surpassing its performance.

256 calibration samples are enough for the training-free initialization step.

1.44 CIDEr standard deviation across seeds, down from 7.10.
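
The 8x figure follows from how LoRA parameters scale with rank: an update of the form BA, with B of shape (d_out, r) and A of shape (r, d_in), has r * (d_in + d_out) trainable parameters per adapted weight matrix. The sketch below works through that arithmetic; the layer size of 4096 is a hypothetical value chosen for illustration and is not taken from the paper.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of one LoRA update BA with B (d_out, r) and A (r, d_in)."""
    return rank * (d_in + d_out)


# Hypothetical layer size, used only to illustrate the ratio.
d_in, d_out = 4096, 4096

rank1_params = lora_param_count(d_in, d_out, rank=1)
rank8_params = lora_param_count(d_in, d_out, rank=8)

print("rank-1 parameters per layer:", rank1_params)
print("rank-8 parameters per layer:", rank8_params)
print("ratio:", rank8_params // rank1_params)
```

The ratio depends only on the two ranks, not on the layer dimensions.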

Method

In high-dimensional spaces, a randomly drawn rank-1 LoRA direction is, with high probability, nearly orthogonal to the modality-gap axis. Gap-Init uses this geometry to choose an update direction that receives meaningful gradient signal from the start.
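
The orthogonality effect can be checked numerically. The following is a minimal sketch, assuming a fixed unit vector as a stand-in for the gap direction and Gaussian random directions as a stand-in for random initialization; neither is taken from the Gap-Init codebase. It measures the average absolute cosine between random unit vectors and the fixed direction, which shrinks roughly like 1/sqrt(d) as the dimension d grows.

```python
import numpy as np


def mean_abs_cosine(dim: int, num_trials: int = 1000, seed: int = 0) -> float:
    """Average |cos| between random unit vectors and one fixed unit direction."""
    rng = np.random.default_rng(seed)

    # Stand-in for the normalized gap direction (assumption for illustration).
    gap = np.zeros(dim)
    gap[0] = 1.0

    # Random directions: Gaussian samples normalized to unit length.
    random_dirs = rng.standard_normal((num_trials, dim))
    random_dirs /= np.linalg.norm(random_dirs, axis=1, keepdims=True)

    # Both sides are unit vectors, so the dot product is the cosine.
    return float(np.mean(np.abs(random_dirs @ gap)))


for dim in (16, 256, 4096):
    print(f"dim={dim:5d}  mean |cos| = {mean_abs_cosine(dim):.4f}")
```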

Identify the modality gap

Estimate the translation-like mismatch between vision and text representations from a small calibration set.

Align rank-1 LoRA

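Set the LoRA B matrix to the normalized gap direction while keeping A at zero. Because the update BA is then zero, the pretrained model's outputs are unchanged at initialization.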

Restore gradient flow

The update direction now projects onto the geometry-salient axis instead of collapsing into near-orthogonality.
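
To see why the choice of B matters for the first gradient step, consider a single adapted layer h = (W + BA) x with rank-1 B of shape (d_out, 1) and A of shape (1, d_in). By the chain rule, dL/dA = B^T (dL/dh) x^T, so the gradient reaching A is scaled by the projection of the upstream gradient onto B. The sketch below compares a gap-aligned B with a random B. It assumes, purely for illustration, that the upstream gradient lies mostly along the gap axis; the vectors are synthetic and the helper name `grad_A_norm` is hypothetical.

```python
import numpy as np


def grad_A_norm(B: np.ndarray, upstream_grad: np.ndarray, x: np.ndarray) -> float:
    """Norm of dL/dA for h = (W + B @ A) @ x, given the upstream gradient dL/dh."""
    # Chain rule: dL/dA = B^T (dL/dh) x^T, with shape (1, d_in).
    grad_A = (B.T @ upstream_grad[:, None]) @ x[None, :]
    return float(np.linalg.norm(grad_A))


rng = np.random.default_rng(0)
d_out, d_in = 1024, 64

# Synthetic unit gap direction.
gap = rng.standard_normal(d_out)
gap /= np.linalg.norm(gap)

# Assumption: the upstream gradient is dominated by the gap axis plus small noise.
upstream_grad = gap + 0.05 * rng.standard_normal(d_out) / np.sqrt(d_out)
x = rng.standard_normal(d_in)

# Gap-aligned B versus a random unit-norm B.
B_gap = gap[:, None]
B_rand = rng.standard_normal((d_out, 1))
B_rand /= np.linalg.norm(B_rand)

norm_gap = grad_A_norm(B_gap, upstream_grad, x)
norm_rand = grad_A_norm(B_rand, upstream_grad, x)

print(f"|dL/dA| with gap-aligned B: {norm_gap:.4f}")
print(f"|dL/dA| with random B:      {norm_rand:.4f}")
```

Under this assumption the random direction captures only a small fraction of the upstream gradient, while the gap-aligned direction captures nearly all of it.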

Input:  Pretrained MLLM, calibration set D_cal, LoRA rank r
Output: Initialized LoRA matrices (B, A)

1. Extract aligned embeddings:
   z_v = projection(vision_encoder(x_img))
   z_t = text_encoder(x_txt)
   g_i = z_t - z_v

2. Estimate layer-wise gaps:
   g_l = mean(smooth(g_i))

3. Initialize LoRA (per layer, using that layer's gap g_l):
   B[:, 0] = g_l / ||g_l||
   A = 0

Return (B, A)
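
The procedure above can be sketched in NumPy for a single layer. This is an illustration under stated assumptions, not the repository's implementation: the embeddings are synthetic, the smoothing step is omitted because it is not specified here, and the function name `gap_init_rank1` is hypothetical.

```python
import numpy as np


def gap_init_rank1(text_emb: np.ndarray, vision_emb: np.ndarray, d_in: int):
    """Return (B, A) for a rank-1 LoRA update BA with B set to the normalized mean gap."""
    # Per-sample gaps g_i = z_t - z_v over the calibration set.
    gaps = text_emb - vision_emb
    # Mean gap for this layer (the smoothing step is omitted in this sketch).
    gap = gaps.mean(axis=0)

    norm = np.linalg.norm(gap)
    if norm == 0.0:
        raise ValueError("gap vector is zero; cannot normalize")

    B = (gap / norm)[:, None]   # shape (d_out, 1), unit norm
    A = np.zeros((1, d_in))     # shape (1, d_in), all zeros
    return B, A


# Synthetic calibration data: text embeddings are vision embeddings shifted
# by a shared offset plus small noise (assumption for illustration only).
rng = np.random.default_rng(0)
num_samples, d_out, d_in = 256, 512, 512
offset = rng.standard_normal(d_out)
vision_emb = rng.standard_normal((num_samples, d_out))
text_emb = vision_emb + offset + 0.1 * rng.standard_normal((num_samples, d_out))

B, A = gap_init_rank1(text_emb, vision_emb, d_in)
delta_W = B @ A  # initial LoRA update

cosine = float(B[:, 0] @ offset / np.linalg.norm(offset))
print("B shape:", B.shape, "A shape:", A.shape)
print("initial update is zero:", bool(np.all(delta_W == 0)))
print(f"cosine between B and the true offset: {cosine:.4f}")
```

Since A starts at zero, the product BA vanishes and the adapted layer initially matches the pretrained one, while B already points along the estimated gap.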

Results

Gap-Init turns rank-1 LoRA from an unstable low-rank bottleneck into a competitive adaptation regime.

Configuration	CIDEr
Naive + Random Init	98.08
GG-Safe + Random Init	102.15
GG-Safe + Gap-Init	140.06
Naive + Gap-Init	140.59

Noise Level	CIDEr	BLEU-4
0.0	141.84	42.55
0.2	140.04	41.62
0.6	142.00	43.43
1.0	133.45	39.62

Quick Start

The codebase supports gap-vector diagnostics, Gap-Init training, and evaluation for captioning and VQA experiments.

Install

Create the project environment.

git clone https://github.com/HaoranZhao2000/Gap-Init.git
cd Gap-Init
conda env create -f environment.yml
conda activate gapinit

Extract gaps

Estimate gap vectors from a calibration set.

python run_diagnostics_master.py \
  --config configs/gap_init_rank1_naive.yaml \
  --target_module text \
  --num_samples 256

Train and evaluate

Run rank-1 Gap-Init adaptation.

python train_caption_master.py \
  --config configs/gap_init_rank1_naive.yaml \
  --seed 42

python eval_master.py \
  --model_path output/gap_init_rank1_naive_seed42 \
  --task caption

Citation

If you find this work useful, please cite the paper.

@inproceedings{zhao2026rank,
  title={When Is Rank-1 Enough? Geometry-Guided Initialization for Parameter-Efficient Fine-Tuning},
  author={Zhao, Haoran and Han, Soyeon Caren and Hovy, Eduard},
  booktitle={Forty-third International Conference on Machine Learning},
  year={2026},
  url={https://openreview.net/pdf?id=Umu6IsAUbS}
}