Leanstral: open-source agent for testing formal code | Keryc
Mistral introduces Leanstral, an open-source agent designed to reduce the human bottleneck in code verification and formal proofs. Can you imagine generating code that's also automatically verified against strict specifications? That's the idea: that the proof isn't an endless manual chore, but part of the agent itself.
What is Leanstral
Leanstral is the first open-source agent trained specifically for Lean 4, the proof assistant capable of expressing complex mathematical objects and software specifications. Unlike approaches that wrap generalist models, Leanstral was designed to be efficient: it has around 6B active parameters and a sparse architecture optimized for proof engineering tasks.
What does this mean in practice? It's not just about proposing guesses: Leanstral generates, proves, and verifies within the formal workflow, using Lean as a perfect checker.
Open and accessible
Weights under an Apache 2.0 license so anyone can download and run the model on their own infrastructure.
Agent mode integrated in Mistral Vibe to get started without setups using the /leanstall command.
Public API and free/low-cost access for a limited time under the endpoint labs-leanstral-2603 to gather real feedback.
The bet is clear: visibility, reproducibility, and real use in formal repositories.
Efficiency and evaluation
Mistral presented FLTEval, an evaluation suite that measures usefulness in real proof-engineering scenarios instead of isolated math puzzles. In that test, Leanstral competes with commercial agents and much larger open-source models.
Key results (summary):
Leanstral achieves better scores with much less computational cost than giant open models.
Compared to the Claude family, Leanstral offers a notable cost/benefit ratio: for example, pass@2 reaches 26.3 points costing only 36 USD, while Sonnet costs 549 USD for 23.7 points and Opus reaches higher quality but with much larger costs.
Simplified evaluation table (score | cost):
Model
Cost (USD)
Score
Haiku
184
23.0
Sonnet
549
23.7
Opus
1650
39.6
Leanstral
18
21.9
Leanstral pass@2
36
26.3
Leanstral pass@4
72
29.3
Leanstral pass@16
290
31.9
The practical takeaway: Leanstral scales efficiently and offers competitive results without needing to run to monstrous sizes.
Real use cases
Mistral showed concrete examples so this doesn't sound like a vague promise:
Migrations after Lean changes: Leanstral diagnosed a real problem from a Proof Assistants Stack Exchange post. It didn't guess; it built test code, reproduced the error, and explained why def blocked the rw tactic. The proposed solution was to use abbrev to create a transparent alias. Result: a clear explanation and an applicable fix.
Converting and reasoning about programs: given material from Rocq, Leanstral translated definitions to Lean, added useful notation, and proved properties about simple programs—like demonstrating that an assignment adds 2 to a variable's value.
These examples show that Leanstral doesn't just generate code; it reasons about it within the formal ecosystem.
How to try it today
In Mistral Vibe: use /leanstall to start without installations.
Labs API: call labs-leanstral-2603 while the endpoint remains available to collect real usage.
Local download: if you prefer full control, download the Apache 2.0 weights and run them on your hardware.
Are you a researcher, a formal proof developer, or just curious? You can integrate it into MCP workflows through vibe; Leanstral is optimized to get the most out of lean-lsp-mcp.
Final reflection
Leanstral represents an important step toward agents that not only generate code but also verify it formally. It doesn't solve every problem, but it lowers the barrier to entry: less time spent on human review, faster iteration on frontier mathematics and critical software. The best part? It's open-source and accessible so the community can test it, improve it, and make it genuinely useful.