Gemma 4 ejecuta VLA en Jetson Orin Nano Super

Cargando...

Gemma 4 ejecuta VLA en Jetson Orin Nano Super | Keryc

sudo apt update
sudo apt install -y git build-essential cmake curl wget pkg-config \
  python3-pip python3-venv python3-dev \
  alsa-utils pulseaudio-utils v4l-utils psmisc \
  ffmpeg libsndfile1

python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install opencv-python-headless onnx_asr kokoro-onnx soundfile huggingface-hub numpy

sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

cd ~
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build \
  -DGGML_CUDA=ON \
  -DCMAKE_CUDA_ARCHITECTURES="87" \
  -DGGML_NATIVE=ON \
  -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j4

mkdir -p ~/models && cd ~/models
wget -O gemma-4-E2B-it-Q4_K_M.gguf \
  https://huggingface.co/unsloth/gemma-4-E2B-it-GGUF/resolve/main/gemma-4-E2B-it-Q4_K_M.gguf
wget -O mmproj-gemma4-e2b-f16.gguf \
  https://huggingface.co/ggml-org/gemma-4-E2B-it-GGUF/resolve/main/mmproj-gemma4-e2b-f16.gguf

~/llama.cpp/build/bin/llama-server \
  -m ~/models/gemma-4-E2B-it-Q4_K_M.gguf \
  --mmproj ~/models/mmproj-gemma4-e2b-f16.gguf \
  -c 2048 \
  --image-min-tokens 70 --image-max-tokens 70 \
  --ubatch-size 512 --batch-size 512 \
  --host 0.0.0.0 --port 8080 \
  -ngl 99 --flash-attn on \
  --no-mmproj-offload --jinja -np 1

curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gemma4","messages":[{"role":"user","content":"Hi!"}],"max_tokens":32}' \
  | python3 -m json.tool

git clone https://github.com/asierarranz/Google_Gemma.git
cd Google_Gemma/Gemma4

wget https://raw.githubusercontent.com/asierarranz/Google_Gemma/main/Gemma4/Gemma4_vla.py

export MIC_DEVICE="plughw:3,0"
export SPK_DEVICE="alsa_output.usb-Generic_USB2.0_Device_20130100ph0-00.analog-stereo"
export WEBCAM=0
export VOICE="af_jessica"
source .venv/bin/activate
python3 Gemma4_vla.py

python3 Gemma4_vla.py --text

{
  "name": "look_and_answer",
  "description": "Take a photo with the webcam and analyze what is visible."
}

sudo docker run -it --rm --pull always \
  --runtime=nvidia --network host \
  -v $HOME/.cache/huggingface:/root/.cache/huggingface \
  ghcr.io/nvidia-ai-iot/llama_cpp:latest-jetson-orin \
  llama-server -hf unsloth/gemma-4-E2B-it-GGUF:Q4_K_S

Qué hace esta demo VLA

Hardware y software que se usó

Preparación del sistema (resumen técnico)

Swap y limpieza de procesos (para evitar OOM)

Modelos, quantizaciones y por qué elegir Q4_K_M

Build nativo de llama.cpp y lanzamiento de llama-server

Demo Python: Gemma4_vla.py y flujo completo

Docker: opción más rápida pero limitada

Problemas comunes y soluciones rápidas

Un par de detalles finales para que funcione bien

Fuente original

¡Mantente al día!

Gemma 4 ejecuta VLA en Jetson Orin Nano Super