Kaggle lets you create AI benchmarks from your PC

Jun 4, 2026 · Keryc Díaz · 3 minutes

Kaggle Benchmarks now lets you design and test AI evaluations directly from your local environment, without relying solely on the web notebook editor. Sounds like a small change? It has real impact: it shortens the path from an idea to a working evaluation and puts the tools in the hands of the people who use models every day.

What changes with local development

Until now, creating evaluation tasks on Kaggle meant working in its web notebook editor. Great if you live there, but awkward if your workflow runs through VS Code, Cursor, Antigravity, or coding agents.

With the new local development feature you can create, validate, push, run, and download tasks from your machine. In short, the infrastructure adapts to your stack, not the other way around. Less friction, more experimentation.

How it works in simple terms

The integration uses the kaggle-benchmarks SDK and new commands in the Kaggle CLI. You don’t need to be an expert: you can write the evaluation in natural language and let a coding agent transform it into a task ready for Kaggle.

Kaggle provides a skill called write-kaggle-benchmarks, a set of structured instructions that teaches an agent how to create tasks. To add it, just ask your agent to install the skill from the Kaggle Skills repository.

Quick example: ask the agent to generate a task that checks whether a model can tell if the statement 300+140=460 is correct (it is not; the sum is 440). The agent uses the skill, builds the task structure with the SDK, and uploads it as an executable task in your benchmark.
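To make the example concrete, here is a minimal sketch of the check such a task could encode. It is plain Python and deliberately does not use the kaggle-benchmarks SDK, whose API this article does not show. The names `claim_is_correct` and `grade_answer` are hypothetical, and so is the yes/no answer format.

```python
import re

# Hypothetical helpers for illustration only; this is NOT the kaggle-benchmarks SDK.
# Matches claims of the form "a+b=c" with non-negative integers.
CLAIM_PATTERN = re.compile(r"^\s*(\d+)\s*\+\s*(\d+)\s*=\s*(\d+)\s*$")


def claim_is_correct(claim: str) -> bool:
    """Return True if a claim of the form 'a+b=c' is arithmetically true."""
    match = CLAIM_PATTERN.match(claim)
    if match is None:
        raise ValueError(f"Unsupported claim format: {claim!r}")
    a, b, c = (int(group) for group in match.groups())
    return a + b == c


def grade_answer(claim: str, model_answer: str) -> bool:
    """Return True if the model's yes/no answer matches the ground truth."""
    normalized = model_answer.strip().lower()
    if normalized not in {"yes", "no"}:
        return False  # Anything other than a clear yes/no counts as a failure.
    return (normalized == "yes") == claim_is_correct(claim)


if __name__ == "__main__":
    claim = "300+140=460"
    print(claim_is_correct(claim))    # False, because 300 + 140 is 440
    print(grade_answer(claim, "no"))  # True: the model rejected a false claim
```

A real task would send the claim to a model and grade its reply; here the reply is passed in as a string so the grading logic can be read on its own.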

Why community evaluations matter

Kaggle Benchmarks was born to democratize reliable, transparent evaluations. The community has already produced over 10,000 tasks, and those tasks feed public leaderboards that let you compare models with clear signals.

Why does that matter? When a capability can be measured objectively, AI teams can focus on improving it. And when evaluations reflect a diversity of real cases, models stop optimizing for artificial scenarios and start solving problems that actually matter.

Public, well-designed evaluations push research toward useful improvements rather than toward metrics that were optimized by accident.

What you can try today

Install the Kaggle CLI and explore the new Benchmarks commands.
Add the write-kaggle-benchmarks skill to your agent from the Kaggle Skills repository.
Describe the evaluation idea you want in natural language and let the agent generate the task.
Validate the task locally, run it, and push it so it can be added to the public leaderboard.
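Before pushing anything, it can help to dry-run the task logic against a stand-in model. The sketch below is only an assumption about what such a local sanity check might look like, written in plain Python. `Task`, `run_locally`, `pass_rate`, and `stub_model` are hypothetical names, and nothing here reflects the actual Kaggle CLI or SDK interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Task:
    """Hypothetical task record: a prompt and the exact answer expected."""
    name: str
    prompt: str
    expected: str


def run_locally(tasks: List[Task], model: Callable[[str], str]) -> Dict[str, bool]:
    """Run every task against `model` and record pass/fail per task name."""
    results: Dict[str, bool] = {}
    for task in tasks:
        answer = model(task.prompt).strip().lower()
        results[task.name] = answer == task.expected.strip().lower()
    return results


def pass_rate(results: Dict[str, bool]) -> float:
    """Fraction of tasks that passed; 0.0 when there are no results."""
    if not results:
        return 0.0
    return sum(results.values()) / len(results)


def stub_model(prompt: str) -> str:
    """Stand-in for a real model: it always answers 'no'."""
    return "no"


if __name__ == "__main__":
    tasks = [
        Task("false-sum", "Is 300+140=460 correct? Answer yes or no.", "no"),
        Task("true-sum", "Is 300+140=440 correct? Answer yes or no.", "yes"),
    ]
    results = run_locally(tasks, stub_model)
    print(results)             # {'false-sum': True, 'true-sum': False}
    print(pass_rate(results))  # 0.5
```

A stub that always gives the same answer is a cheap way to confirm that a task set cannot be passed by a constant reply, which is worth knowing before the task reaches a leaderboard.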

If you build products that integrate models, do research, or simply enjoy evaluation challenges, this will make you more productive. It’s a practical way to build quality signals others can use to improve their models.

The novelty isn’t just technical: it’s social. Letting anyone, anywhere design evaluations changes who gets to define what it means for a model to be good.

Source

https://blog.google/innovation-and-ai/technology/developers-tools/build-kaggle--benchmarks-locally
