After pretraining, language models go through several mid- and post-training stages to become useful: following instructions, reasoning, using tools, and staying safe. What happens when you want to add or improve a skill after all that work? Do you rerun the whole pipeline and wait weeks, or fine-tune on top and risk the model forgetting what it already knows?
BAR offers another route: train experts separately and combine them with a mixture-of-experts (MoE) architecture, so you keep flexibility and avoid regressions in existing skills.
What BAR is and why it matters
BAR stands for Branch-Adapt-Route. The key idea is to modularize post-training: each domain (for example math, code, tool use, safety) is trained as an independent expert that goes through its own mini pipeline. Then you combine them into a single model using a mixture-of-experts and a router that decides which expert to activate for each input.
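The combine-and-route step can be sketched in a few lines. This is a toy illustration, not BAR's actual implementation: the experts are stand-in functions (hypothetical names), and the learned gating network is replaced by a simple keyword-scoring router so the example runs self-contained.

```python
# Toy sketch of Branch-Adapt-Route: independently built domain experts
# (here, plain functions) plus a router that picks which one handles
# each input. All names are illustrative.

def math_expert(prompt):
    return f"[math] solving: {prompt}"

def code_expert(prompt):
    return f"[code] writing code for: {prompt}"

def safety_expert(prompt):
    return f"[safety] reviewing: {prompt}"

# Adding a new domain later means training one new expert and
# registering it here -- the others stay untouched.
EXPERTS = {
    "math": math_expert,
    "code": code_expert,
    "safety": safety_expert,
}

def route(prompt):
    """Toy router: score each domain by keyword overlap and dispatch
    to the best-scoring expert. A real MoE router would be a learned
    gating network operating on token representations."""
    keywords = {
        "math": {"integral", "prove", "equation"},
        "code": {"function", "bug", "compile"},
        "safety": {"harmful", "refuse", "policy"},
    }
    tokens = set(prompt.lower().split())
    scores = {name: len(tokens & kws) for name, kws in keywords.items()}
    best = max(scores, key=scores.get)
    return EXPERTS[best](prompt)

print(route("fix the bug in this function"))  # dispatched to the code expert
```

The dictionary of experts is the point: updating or adding a domain touches one entry, which is what lets the approach avoid re-running the whole post-training pipeline.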
Why does this matter to you? In real model development you have different teams, different timelines, and datasets that arrive asynchronously. Re-running the whole pipeline every time you update code or data is expensive and impractical. With BAR, updating a domain means retraining only that domain's expert, so the cost scales with the size of the update rather than the full pipeline.
