Nemotron 3 Nano 4B: Compact AI optimized for the edge | Keryc
Nemotron 3 Nano 4B is NVIDIA's new bet to bring powerful models to the edge: a hybrid Mamba-Transformer model with 4 billion parameters designed to run on Jetson devices, GeForce GPUs and clusters like DGX Spark, with a small VRAM footprint and good instruction following and tool-use behavior.
What is Nemotron 3 Nano 4B?
It's a hybrid model that combines Mamba (SSM) components with transformer-style layers to achieve efficient reasoning. With 4B parameters, it's specifically optimized for local and edge deployments: Jetson Thor, Jetson Orin Nano, RTX and DGX Spark.
Why does this matter to you? Because it lets you run conversational agents and "agentic" behaviors close to your data, with lower latency, better privacy guarantees and reduced inference costs.
Performance and key benchmarks
NVIDIA reports top-of-class results for several relevant metrics:
- Instruction following: state of the art in its class (IFBench, IFEval).
- Agentic gaming intelligence (Orak): also leading at its size, evaluated on tactical games such as Super Mario, Darkest Dungeon and Stardew Valley.
- VRAM efficiency: smallest footprint in its class under both low and high ISL/OSL configurations.
- Latency: best TTFT (time to first token) in its class under high ISL.
Efficiency tests were measured on an RTX 4070 using Llama.cpp with quantized Q4_K_M builds.
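TTFT is also easy to measure on your own hardware. A minimal sketch, assuming any engine's streaming API stands in for the dummy generator used here (the `fake_stream` helper is purely illustrative):

```python
import time

def measure_ttft(token_stream):
    """Return (ttft_seconds, total_tokens) for a streaming generator.

    TTFT is the delay between issuing the request and receiving the
    first token; throughput follows from the remaining tokens.
    """
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in token_stream:
        if ttft is None:
            ttft = time.perf_counter() - start
        count += 1
    return ttft, count

# Dummy stand-in for a real engine's streaming output.
def fake_stream(n_tokens=8, delay=0.001):
    for _ in range(n_tokens):
        time.sleep(delay)
        yield "tok"

ttft, n = measure_ttft(fake_stream())
print(f"TTFT: {ttft * 1000:.1f} ms over {n} tokens")
```

Swap the dummy generator for your engine's streaming call and you get comparable TTFT numbers for your own prompts.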
How it was compressed and why it's different
Nemotron 3 Nano 4B wasn't trained from scratch: it was created by pruning and distillation from Nemotron Nano 9B v2 using Nemotron Elastic technology. Instead of separate stages, Nemotron Elastic trains a router that performs an architecture search jointly with distillation, optimizing what to prune and by how much to meet a parameter budget.
The router considered four pruning axes:
- Mamba heads (number of SSM heads)
- Hidden dimension (embedding dimension)
- FFN channels (MLP intermediate channels)
- Depth (full layers)
Based on convergence for the 4B target, the decisions were (summary):
| Axis | Parent 9B v2 | Nemotron 3 Nano 4B |
| --- | --- | --- |
| Depth | 56 layers (27 Mamba, 4 Attention, 25 MLP) | 42 layers (21 Mamba, 4 Attention, 17 MLP) |
| Mamba heads | 128 | 96 |
| FFN intermediate dim | 15680 | 12544 |
| Embedding dim | 4480 | 3136 |
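The per-axis shrinkage works out to 70–80% of the parent on each dimension. Quick arithmetic on the published numbers (the overall ~0.42x figure is a rough back-of-the-envelope product for MLP-style blocks, not NVIDIA's accounting):

```python
# Published pruning decisions: (parent 9B v2, Nemotron 3 Nano 4B)
axes = {
    "depth":       (56, 42),
    "mamba_heads": (128, 96),
    "ffn_dim":     (15680, 12544),
    "embedding":   (4480, 3136),
}

ratios = {name: child / parent for name, (parent, child) in axes.items()}
for name, r in ratios.items():
    print(f"{name}: kept {r:.0%}")

# Very rough scaling estimate: MLP parameters grow with
# depth * embedding * ffn_dim (ignores attention/Mamba details).
rough = ratios["depth"] * ratios["embedding"] * ratios["ffn_dim"]
print(f"rough MLP-parameter scaling: {rough:.2f}x")
```

That ~0.42x rough estimate lines up with the 9B-to-4B parameter budget the router was asked to hit.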
After defining the pruned architecture, the student model was retrained with distillation from the original 9B.
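The distillation objective itself is the standard one: train the student to match the teacher's output distribution. A minimal sketch of temperature-scaled KL distillation on toy logits (illustrative only, not NVIDIA's training code; the temperature value is an assumption):

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [2.0, 1.0, 0.1]  # toy per-token logits from the 9B parent
student = [1.5, 1.2, 0.3]  # toy logits from the pruned 4B student
print(f"KD loss: {kd_loss(teacher, student):.4f}")
```

Minimizing this loss over the training corpus pulls the pruned 4B student's token distribution back toward the 9B parent's.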
Recovery and post-training stages
Precision recovery was done in two main stages:
1. Short-context distillation: 8K window, 63B tokens, roughly a 70% post-training / 30% pretraining mix from the parent. This stage recovers initial accuracy.
2. Long-context extension: 49K window, 150B tokens, to restore performance on long reasoning chains.
After that came two SFT phases with Megatron-LM (on reasoning and non-reasoning data), followed by a three-stage RL pipeline with NeMo-RL to refine instruction following and tool use, moving from single-turn to multi-turn with NeMo-Gym environments and a preliminary Nemotron-RL-Agentic-Conversational-Tool-Use-Pivot-v1. Training kept a 50-50 balance between reasoning and non-reasoning data and progressively increased the KL penalty.
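The "progressively increased KL penalty" refers to the standard KL-regularized RL objective: the policy is rewarded for the task but penalized for drifting from a reference model. A toy sketch of the per-token shaped reward (the schedule values and log-probabilities are made up for illustration; the real NeMo-RL pipeline is far more involved):

```python
import math

def kl_regularized_reward(task_reward, logprob_policy, logprob_ref, kl_coef):
    """Shaped reward: task_reward - beta * KL, with KL approximated
    per-token as log pi(a) - log pi_ref(a)."""
    kl = logprob_policy - logprob_ref
    return task_reward - kl_coef * kl

# A progressive KL schedule across RL stages: the penalty grows.
kl_schedule = [0.01, 0.05, 0.1]

for stage, beta in enumerate(kl_schedule, start=1):
    r = kl_regularized_reward(
        task_reward=1.0,
        logprob_policy=math.log(0.6),  # toy numbers
        logprob_ref=math.log(0.5),
        kl_coef=beta,
    )
    print(f"stage {stage}: beta={beta}, shaped reward={r:.4f}")
```

Raising the coefficient over successive stages lets early RL explore freely while later stages anchor the policy closer to the reference, which helps keep instruction following stable.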
Quantization and on-device deployment
For the edge, quantization is key. Nemotron 3 Nano 4B is released in FP8 and Q4_K_M GGUF for Llama.cpp:
FP8: PTQ with ModelOpt using 1k samples for calibration. Selective quantization was used: keeping some self-attention layers and the 4 prior Mamba layers in BF16 gave the best balance. Weights, activations and KV-cache in FP8; Conv1D in Mamba in BF16. Result: 100% median accuracy recovery versus BF16 and up to 1.8x improvement in latency and throughput on DGX Spark and Jetson Thor.
Q4_K_M (GGUF): the 4-bit version used in Llama.cpp also reached 100% median accuracy recovery and is suitable for Jetson. On Jetson Orin Nano 8GB, the Q4_K_M checkpoint with Llama.cpp delivers 18 tokens/s, up to 2x throughput compared to Nemotron Nano 9B v2.
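The reported 18 tokens/s (roughly 2x the 9B parent) translates directly into latency budgets for interactive use. Quick arithmetic on the published figure (the 9B number is implied by the ~2x claim, not separately reported):

```python
def seconds_for(n_tokens, tokens_per_s):
    """Wall-clock time to generate n_tokens at a steady throughput."""
    return n_tokens / tokens_per_s

nano_4b_tps = 18.0             # reported Q4_K_M throughput on Orin Nano 8GB
nano_9b_tps = nano_4b_tps / 2  # implied by the ~2x claim

for label, tps in [("Nano 4B", nano_4b_tps), ("Nano 9B v2", nano_9b_tps)]:
    print(f"{label}: 256-token reply in {seconds_for(256, tps):.1f} s")
```

For a chat-style agent on an 8GB Jetson, that difference is roughly a 14-second reply versus a 28-second one.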
The model supports multiple inference engines: Transformers, vLLM, TRT-LLM and Llama.cpp, so you can pick the stack that fits your use case.
Where it fits and practical use cases
Want a conversational agent that replies fast without sending data to the cloud? Or a robot or a local game NPC that reasons and calls tools? Nemotron 3 Nano 4B is made for that: local agents, embedded assistants, inference on robot fleets, and gaming scenarios with tactical logic.
Because it's open source, you can fine-tune it for a specific domain, experiment with more quantizations, or integrate it with SDKs like NVIGI to accelerate inference alongside graphics workloads.
Quick recommendations if you're going to try it
- For Jetson: follow the Jetson AI Lab guides and try the Q4_K_M version in Llama.cpp first to evaluate throughput.
- If you need maximum server-side accuracy, use FP8 on compatible hardware and compare with BF16 on your workload.
- If you're fine-tuning, remember it started as a distillation from 9B: the architecture already retains structured reasoning, so SFT/RL tuning can be more efficient.
Nemotron 3 Nano 4B shows how combining guided structured pruning and distillation can deliver practical models for the edge without giving up reasoning and tool-use capabilities. Ready to try an LLM that fits on embedded devices and performs like a much larger one?