Beyond LoRA: choosing the best PEFT technique | Keryc
If you're going to fine-tune an open model with your own data, you've probably heard of PEFT (parameter-efficient fine-tuning) and its apparent king: LoRA. Does that mean LoRA is always the best choice? Not necessarily. In this technical piece I explain why, how to evaluate alternatives, and which tools help you make an informed decision.
Why PEFT exists and why it matters
Training a model from scratch is expensive and slow. Fine-tuning a large model seems like the natural solution, but it eats a lot of memory: during training you need space for the model weights, their gradients and the optimizer states. Quantization shrinks the memory footprint, but quantized weights can't be trained directly. That's where PEFT comes in: a set of techniques that let you adapt models using only a fraction of the memory.
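To make the memory argument concrete, here is a rough back-of-the-envelope sketch. It assumes bf16 weights and gradients (2 bytes each) and an Adam-style optimizer keeping two fp32 states per trainable parameter (8 bytes). It ignores activations and framework overhead, so real VRAM use will be higher. The function name and the 1% trainable fraction are illustrative assumptions, not figures from the benchmark.

```python
def training_memory_gb(n_params, trainable_fraction=1.0,
                       weight_bytes=2, grad_bytes=2, optim_bytes=8):
    """Rough estimate of weights + gradients + optimizer states, in GB.

    Assumes bf16 weights/gradients and two fp32 optimizer states per
    trainable parameter. Activations are deliberately ignored.
    """
    weights = n_params * weight_bytes          # every weight must be held in memory
    trainable = n_params * trainable_fraction  # only these need gradients and optimizer states
    grads = trainable * grad_bytes
    optim = trainable * optim_bytes
    return (weights + grads + optim) / 1e9


full_ft = training_memory_gb(3_000_000_000)                          # all weights trainable
peft_ft = training_memory_gb(3_000_000_000, trainable_fraction=0.01)  # hypothetical 1% trainable

print(f"full fine-tuning: {full_ft:.1f} GB, PEFT: {peft_ft:.1f} GB")
```

Under these assumptions, a 3B-parameter model needs about 36 GB for full fine-tuning before activations, while training 1% of that many parameters needs a little over 6 GB, almost all of it the frozen weights.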
What do you gain with PEFT? Small checkpoints, lower risk of catastrophic forgetting, the ability to fine-tune quantized models and to serve multiple adaptations on the same base. In practice that means you can experiment on consumer hardware and scale with less cost.
LoRA: why it dominates and why that can be misleading
LoRA (Low-Rank Adaptation) adds pairs of small, low-rank matrices alongside the frozen weights of the base model and trains only those new matrices. It's simple, effective and easy to integrate: enough reasons for its huge popularity.
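The saving comes from simple arithmetic: instead of updating a full d_out x d_in weight matrix, LoRA learns two factors of shapes d_out x r and r x d_in. A minimal sketch of the parameter count follows; the 3072 x 3072 layer size and rank 16 are hypothetical values chosen for illustration.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable parameter counts for one linear layer."""
    full = d_out * d_in           # updating the whole weight matrix
    lora = rank * (d_in + d_out)  # B is (d_out x rank), A is (rank x d_in)
    return full, lora


full, lora = lora_param_counts(d_out=3072, d_in=3072, rank=16)
ratio = lora / full
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {ratio:.2%}")
```

For this layer the low-rank pair holds roughly 1% of the parameters of the full matrix, which is why gradients, optimizer states and checkpoints all shrink.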
But popularity can be self-reinforcing: more tutorials, more tooling support (vLLM, llama.cpp, etc.), more checkpoints on Hugging Face Hub. That doesn't prove it's the best for every case. There are many variants and newer techniques that claim academic superiority, yet comparing papers can be misleading because of tuning biases, heterogeneous benchmarks and limited reproducibility. Some studies even show LoRA matches new methods if hyperparameters are tuned carefully (paper).
The unified Hugging Face benchmark: what they did
To offer a fair comparison, Hugging Face integrated many PEFT techniques into the peft library and created benchmarks that run each technique with the same base, data, code and hardware. Two standout tasks:
MetaMathQA: fine-tuning LLMs for mathematical reasoning (chain-of-thought and output format).
Image generation: learning a new concept (a cat plushy) and generalizing it to new prompts.
Each technique was evaluated not only by test accuracy, but also by max VRAM, runtime, checkpoint size and drift/forgetting metrics. Result: comparisons on equal terms and the ability to see clear tradeoffs.
Key results (what you need to remember)
LoRA remains a solid option, but it doesn't always dominate. In some experiments other techniques beat LoRA on accuracy and/or memory use.
In MetaMathQA (example with meta-llama/Llama-3.2-3B): LoRA with stabilized initialization reached 53.2% accuracy using 22.6 GB VRAM, while vanilla LoRA fell short at 48.1% using 22.5 GB. Other methods, like BEFT or Lily, appear on the Pareto frontier depending on whether you prioritize memory or accuracy.
In the image task (FLUX.2-klein-base-4B), OFT achieved higher DINO similarity (0.708) and lower VRAM use (9.01 GB) than LoRA (0.697 and 9.97 GB). In that experiment OFT dominated LoRA on both axes.
LoRA variants matter: LoRA-FA (optimization and partial freezing) and initialization tweaks change results a lot. Don't confuse "vanilla LoRA" with the whole LoRA ecosystem.
Interpretation: tradeoffs and the Pareto frontier
Think of the problem as a multi-way tradeoff: accuracy vs memory vs time vs checkpoint size. A technique is on the Pareto frontier if no alternative matches or beats it on every axis while being strictly better on at least one; in other words, you can't switch to another method and gain on one axis without losing on another. LoRA sits on that frontier in some benchmarks, but not in all of them. The practical conclusion: try several methods and choose according to your priorities.
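A minimal sketch of that dominance check on two axes (score to maximize, VRAM to minimize), using only the figures quoted in the key results. The function names and dictionary layout are illustrative assumptions; the real benchmark tracks more axes and many more methods.

```python
def dominates(a, b):
    """True if run a is at least as good as b on both axes and strictly better on one."""
    no_worse = a["score"] >= b["score"] and a["vram_gb"] <= b["vram_gb"]
    better = a["score"] > b["score"] or a["vram_gb"] < b["vram_gb"]
    return no_worse and better


def pareto_frontier(runs):
    """Names of the runs that no other run dominates."""
    return [r["name"] for r in runs if not any(dominates(o, r) for o in runs)]


# Image task figures quoted above (DINO similarity, max VRAM).
image_runs = [
    {"name": "LoRA", "score": 0.697, "vram_gb": 9.97},
    {"name": "OFT", "score": 0.708, "vram_gb": 9.01},
]
# MetaMathQA figures quoted above (accuracy in %, max VRAM).
math_runs = [
    {"name": "LoRA (stabilized)", "score": 53.2, "vram_gb": 22.6},
    {"name": "LoRA (vanilla)", "score": 48.1, "vram_gb": 22.5},
]

image_frontier = pareto_frontier(image_runs)
math_frontier = pareto_frontier(math_runs)
print(image_frontier, math_frontier)
```

In the image task OFT dominates LoRA, so only OFT remains. In the MetaMathQA pair neither run dominates the other, because the more accurate one uses slightly more VRAM. Between just those two, the choice depends on which axis you care about.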
Benchmark limitations and how to contribute
No benchmark captures everything. Hyperparameters, support for specific layers, compatibility with quantized models and particular capabilities (for example, Cartridges for long prompts) can change the decision.
The good news: the peft library makes it easy to add configurations and contribute results. If you believe a method improves with other hyperparameters, you can open a PR. If you want to add a new benchmark, you can collaborate too.
Ecosystem compatibility and conversion to LoRA
A practical reason to prefer LoRA is compatibility: many inference tools only load LoRA directly. To mitigate this, peft now supports converting non-LoRA adapters to LoRA checkpoints. In image tests, converting GraLoRA to LoRA kept scores nearly identical. Still, not all techniques have conversion implemented; expansion depends on demand.
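To build intuition for why such a conversion can preserve quality, here is one way it could work in principle: take the weight change an adapter produces and approximate it with a truncated SVD, which yields the two low-rank factors a LoRA checkpoint stores. This is an illustrative assumption, not a description of peft's actual conversion code. The helper name delta_to_lora and the matrix sizes are hypothetical.

```python
import numpy as np


def delta_to_lora(delta_w, rank):
    """Approximate a weight delta as B @ A using a rank-truncated SVD."""
    u, s, vt = np.linalg.svd(delta_w, full_matrices=False)
    b = u[:, :rank] * s[:rank]  # shape (d_out, rank), singular values folded into B
    a = vt[:rank, :]            # shape (rank, d_in)
    return b, a


rng = np.random.default_rng(0)
# A synthetic delta that is exactly rank 4, so a rank-4 LoRA can reproduce it.
delta = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 32))

b, a = delta_to_lora(delta, rank=4)
max_err = float(np.abs(delta - b @ a).max())

# Truncating below the true rank loses information.
b2, a2 = delta_to_lora(delta, rank=2)
max_err_rank2 = float(np.abs(delta - b2 @ a2).max())
```

When the adapter's update is close to low-rank, the converted factors reproduce it almost exactly. When it is not, the chosen rank controls how much is lost, which is one reason converted scores are worth re-checking.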
How to choose in practice (short recipe)
Define your priorities: less VRAM, better accuracy, shorter training time, smaller checkpoints?
Use the peft library to try multiple configurations with the same base and your data.
Check LoRA variants before discarding it: DoRA, rs-LoRA, LoRA-FA, etc.
If you need to serve in vLLM/llama.cpp and choose another technique, convert the checkpoint to LoRA if the tool allows it.
Minimal code change to try another technique:
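from transformers import AutoModelForCausalLM
from peft import OFTConfig, get_peft_model

# Load the base model in bfloat16 to keep memory use down.
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B", dtype="bfloat16")
# Swap this config class (e.g. for LoraConfig) to try a different PEFT technique.
config = OFTConfig(target_modules=["q_proj", "v_proj"])
# Wrap the base model so that only the adapter parameters are trainable.
model = get_peft_model(base_model, config)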
With a one-line change in the configuration you can evaluate OFT instead of LoRA and compare results on your dataset.
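The same swap covers the LoRA variants mentioned in the recipe. The configuration fragment below is a hedged sketch: it assumes a recent peft release in which LoraConfig exposes the use_rslora and use_dora flags, and the rank and alpha values are illustrative, not the benchmark's settings.

```python
from peft import LoraConfig, OFTConfig

target_modules = ["q_proj", "v_proj"]

# Candidate configurations to compare; r and lora_alpha are illustrative values only.
candidates = {
    "lora": LoraConfig(r=16, lora_alpha=32, target_modules=target_modules),
    "rs-lora": LoraConfig(r=16, lora_alpha=32, target_modules=target_modules, use_rslora=True),
    "dora": LoraConfig(r=16, lora_alpha=32, target_modules=target_modules, use_dora=True),
    "oft": OFTConfig(target_modules=target_modules),
}
```

Each entry can be passed to get_peft_model in place of the OFT config. Load a fresh copy of the base model for every run so that adapters from one technique don't leak into the next.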
Final reflection
LoRA isn't the definitive answer, but it is a practical standard with huge support. Fair comparisons show other techniques can win depending on the goal. The technical and practical recommendation is to widen your horizon: use the unified peft API, try several techniques on your data and prioritize based on real tradeoffs. Experimenting costs little and can save memory or gain accuracy.