Apriel-1.6-15B-Thinker arrives as the newest iteration of the Apriel SLM series: a 15-billion-parameter multimodal model designed to reason over text and images, with a clear focus on token and cost efficiency. The result? Performance comparable to models ten times its size and a reduction of more than 30% in reasoning-token usage compared to the previous version.
What is Apriel-1.6-15B-Thinker?
Apriel-1.6-15B-Thinker is a 15B-parameter multimodal model aimed at deep reasoning across text and vision. It was trained on NVIDIA DGX Cloud using GB200 Grace Blackwell Superchips, and its explicit goal is to maximize the ratio between reasoning capability and inference efficiency.
On the Artificial Analysis Index (AA) it scores 57, outperforming models like Gemini 2.5 Flash, Claude Haiku 4.5 and GPT-OSS-20B, and matching Qwen3 235B A22B on some evaluations, all with a much smaller compute footprint. Surprising, right? You don't always need the biggest model to get top results.
Technical novelties and why they matter
- Architecture and scale: it keeps 15B parameters but introduces improvements in tokenization and data mixing to boost multimodal reasoning capacity.
- Data and pretraining strategy: the depth-upscaling phase uses a blend of 35% diverse, high-quality content (web, scientific literature, math problems, code), 15% high-quality NVIDIA Nemotron datasets, and 50% pretraining-style replay data. This helps stabilize reasoning representations before the later fine-tuning stages (a minimal sketch of this kind of weighted mixing follows the list).
- CPT and long sequences: it repeats Apriel-1.5's two-stage Continual Pretraining (CPT) strategy. In particular, a text-only CPT stage with sequence length extended to 49K is added to improve memory and long-context handling.
- Multimodal training: Stage 1's mix was expanded with synthetic text-only samples (reasoning, knowledge, code, creative writing) and image-text pairs covering OCR, chart understanding, visual reasoning and SVG/web-code synthesis.
- Efficient compute: mid-training consumed roughly 10,000 GPU hours on GB200, which the release presents as a small footprint thanks to the hardware's high throughput and a careful data strategy.
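To make the 35/15/50 blend concrete, here is a minimal sketch of proportion-weighted sampling over data buckets. The bucket names and the sampling mechanism are assumptions for illustration; the actual data pipeline is not published in this form.

```python
import random

# Illustrative mixing weights from the depth-upscaling blend described above.
# Bucket names are placeholders; only the proportions come from the release notes.
MIX_WEIGHTS = {
    "diverse_high_quality": 0.35,   # web, scientific literature, math, code
    "nemotron": 0.15,               # NVIDIA Nemotron datasets
    "pretraining_replay": 0.50,     # pretraining-style replay data
}

def sample_bucket(rng: random.Random) -> str:
    """Pick a data bucket according to the blend proportions."""
    buckets, weights = zip(*MIX_WEIGHTS.items())
    return rng.choices(buckets, weights=weights, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_bucket(rng) for _ in range(10_000)]
    for name in MIX_WEIGHTS:
        print(name, round(draws.count(name) / len(draws), 3))
```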
Post-training: SFT and RL
- Supervised Fine-Tuning (SFT): 2.4 million high-signal examples with step-by-step reasoning traces. The aim is for the model to internalize transparent reasoning processes, not just final answers.
- SFT phases: the first phase is text-only for 4 epochs at 32K context; the second is multimodal for 3 epochs with rejection sampling to preserve image performance after introducing new special tokens.
- Special tokens added to the tokenizer: <tool_calls>, </tool_calls>, [BEGIN FINAL RESPONSE] and <|end|>, to ease parsing and tool-call handling (see the parsing sketch after this list).
- Reinforcement Learning: a multi-stage setup uses Group Sequence Policy Optimization (GSPO) and the VeRL framework. Rewards encourage correct answers and penalize verbosity or incorrect formats, aiming to reduce unnecessary reasoning-token usage (an illustrative reward sketch also follows the list).
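The delimiter strings below come from the special tokens listed above; how the model actually interleaves reasoning, tool calls and the final answer is an assumption of this sketch, not a documented output contract.

```python
import re

# Split a raw completion into reasoning, tool calls and the final response,
# using the special tokens the release adds to the tokenizer.
TOOL_CALL_RE = re.compile(r"<tool_calls>(.*?)</tool_calls>", re.DOTALL)
FINAL_MARKER = "[BEGIN FINAL RESPONSE]"
END_MARKER = "<|end|>"

def parse_output(raw: str) -> dict:
    tool_calls = [m.strip() for m in TOOL_CALL_RE.findall(raw)]
    reasoning, final = raw, ""
    if FINAL_MARKER in raw:
        reasoning, final = raw.split(FINAL_MARKER, 1)
    final = final.split(END_MARKER, 1)[0].strip()
    return {"reasoning": reasoning.strip(), "tool_calls": tool_calls, "final": final}

# Example with a made-up completion:
demo = 'Let me check.<tool_calls>{"name": "get_weather"}</tool_calls>[BEGIN FINAL RESPONSE]It is sunny.<|end|>'
print(parse_output(demo)["final"])  # -> "It is sunny."
```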
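And here is an illustrative reward-shaping sketch in the spirit of the RL setup described above (reward correct answers, penalize bad formats and verbosity). The actual GSPO/VeRL reward functions are not published here, so every constant and check below is an assumption.

```python
# Illustrative reward shaping only; weights and penalties are made up for the sketch.
FINAL_MARKER = "[BEGIN FINAL RESPONSE]"
END_MARKER = "<|end|>"

def reward(completion: str, reference_answer: str, max_reasoning_tokens: int = 2048) -> float:
    score = 0.0

    # Format: the completion should contain a delimited final response and end cleanly.
    well_formed = FINAL_MARKER in completion and completion.rstrip().endswith(END_MARKER)
    score += 0.2 if well_formed else -0.5

    # Correctness: crude exact-match check on the extracted final answer.
    final = completion.split(FINAL_MARKER, 1)[-1].replace(END_MARKER, "").strip()
    score += 1.0 if final == reference_answer.strip() else 0.0

    # Verbosity: penalize reasoning that exceeds a budget (whitespace-split "tokens" here).
    reasoning = completion.split(FINAL_MARKER, 1)[0]
    overflow = max(0, len(reasoning.split()) - max_reasoning_tokens)
    score -= 0.0005 * overflow

    return score
```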
Metrics and benchmarks (what the numbers say)
- Artificial Analysis Index: Apriel-1.6 scores 57 on AA, placing it above several much larger models.
- Token efficiency: over 30% reduction in reasoning-token usage compared to Apriel-1.5-15B-Thinker, a critical point for production deployments where every token costs money (a sketch for measuring this on your own prompts follows the list).
- Internal and public evaluations: tests covered VQA, OCR, math, code, instruction following and long-context scenarios. On a set of 13 math-vision benchmarks it improves by 4 points over its predecessor.
- Summary table (highlight): Apriel-1.6 shows average improvements over Apriel-1.5 across many categories (function calling, instruction following, some coding tasks and visual reasoning), although on a few very specialized benchmarks larger models still beat it on raw scores.
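If you want to check the token-efficiency claim on your own prompts, a rough approach is to count the tokens spent before the final-response marker. The repository id below is an assumption based on the release naming (check the Hugging Face Hub for the exact checkpoint), and the completion lists are placeholders standing in for real outputs from the two model versions.

```python
from transformers import AutoTokenizer

MODEL_ID = "ServiceNow-AI/Apriel-1.6-15b-Thinker"  # assumed repo id, verify on the Hub
FINAL_MARKER = "[BEGIN FINAL RESPONSE]"

def reasoning_tokens(completion: str, tokenizer) -> int:
    """Count the tokens spent before the final response begins."""
    reasoning = completion.split(FINAL_MARKER, 1)[0]
    return len(tokenizer.encode(reasoning, add_special_tokens=False))

if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained(MODEL_ID)
    old_runs = ["step 1 ... step 9 [BEGIN FINAL RESPONSE]42<|end|>"]  # placeholder completions
    new_runs = ["step 1 ... [BEGIN FINAL RESPONSE]42<|end|>"]         # placeholder completions
    old = sum(reasoning_tokens(c, tok) for c in old_runs)
    new = sum(reasoning_tokens(c, tok) for c in new_runs)
    print(f"reasoning-token reduction: {100 * (1 - new / old):.1f}%")
```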
Practical implications: why this matters for companies and developers
- Cost vs performance: Apriel-1.6 sits on the "cost-efficiency frontier": reasoning abilities comparable to much larger models, but with a smaller hardware and token footprint. For teams with limited compute budgets, that's huge.
- Deployment in enterprise settings: its design favors extended memory (49K context in internal stages) and inference efficiency, which is useful for assistants with long histories, document analysis, or agents that combine tools.
- Tool integration: the special tokens and emphasis on function calling make it easier to use as the backend for agents that call APIs or external tools (a minimal loading-and-generation sketch follows this list).
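Here is a minimal text-only generation sketch, assuming the checkpoint exposes a standard transformers causal-LM interface and chat template. The repository id is an assumption, and the multimodal checkpoint may instead require a different Auto class or a processor; follow the model card for the supported loading path.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; the multimodal checkpoint may need a different class (see model card).
MODEL_ID = "ServiceNow-AI/Apriel-1.6-15b-Thinker"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize the trade-offs of small reasoning models."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=1024)
raw = tokenizer.decode(output[0, inputs.shape[-1]:], skip_special_tokens=False)

# The final answer can then be extracted with the parse_output() sketch shown earlier.
print(raw)
```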
Known limitations
It’s not perfect. The team acknowledges vision limitations: OCR can struggle with low-quality images; dense scenes or many objects complicate counting and fine grounding; very complex charts or unusual formats can produce imperfect interpretations. In short: excellent for many enterprise multimodal tasks, but not a flawless solution for every vision problem.
Final reflection
Apriel-1.6-15B-Thinker is a reminder that AI progress doesn't always mean multiplying parameters. With smart data design, well-designed training phases and clear efficiency goals, you can approach frontier performance while keeping costs manageable. If you work on product or infrastructure, this release shows that prioritizing token efficiency and data quality can deliver practical, powerful models without relying on hundreds of thousands of GPU hours.
Original source
https://huggingface.co/blog/ServiceNow-AI/apriel-1p6-15b-thinker
