ScarfBench: benchmark for enterprise Java migrations

AI-assisted modernization sounds like a magic trick: an agent scans your repo and leaves it production-ready. But can an agent really migrate a complex enterprise application without breaking anything? ScarfBench sets out to answer that question with data, not promises.

What is ScarfBench

ScarfBench (Self-Contained Application Refactoring Benchmark) is an open benchmark designed to evaluate code agents on real migration tasks between enterprise Java ecosystems: Spring, Jakarta EE and Quarkus.

It doesn’t stop at comparing source files against a reference. Instead, it requires every migrated application to compile, deploy, and preserve its functional behavior. Why does that matter? Because a useful migration isn’t just pretty code: it’s code that runs in a real environment and does what it’s supposed to.
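One way to picture these three requirements is as a short-circuiting pipeline: a migrated application only earns credit if it compiles, then deploys, then behaves correctly. The sketch below illustrates that idea under this assumption. The names `MigrationGate`, `Stage`, `Result`, and `evaluate` are hypothetical and do not describe ScarfBench's actual harness; each stage is stubbed as a `BooleanSupplier` instead of calling a real build tool, server, or test suite.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a gated migration check; not ScarfBench's real harness.
public class MigrationGate {

    // The three gates from the text, in the order they are checked.
    public enum Stage { COMPILE, DEPLOY, BEHAVIOR }

    // Outcome of an evaluation: the first stage that failed, or null if all passed.
    public static final class Result {
        public final Stage failedAt;

        Result(Stage failedAt) {
            this.failedAt = failedAt;
        }

        public boolean passed() {
            return failedAt == null;
        }
    }

    // Runs the stages in order and stops at the first failure, so an app that
    // does not compile is never deployed and one that does not deploy is never
    // tested for behavior. A missing check counts as a failure.
    public static Result evaluate(Map<Stage, BooleanSupplier> checks) {
        for (Stage stage : Stage.values()) {
            BooleanSupplier check = checks.get(stage);
            if (check == null || !check.getAsBoolean()) {
                return new Result(stage);
            }
        }
        return new Result(null);
    }

    public static void main(String[] args) {
        // Stubbed checks: this migration compiles and deploys but changes behavior.
        Map<Stage, BooleanSupplier> checks = new EnumMap<>(Stage.class);
        checks.put(Stage.COMPILE, () -> true);
        checks.put(Stage.DEPLOY, () -> true);
        checks.put(Stage.BEHAVIOR, () -> false);

        Result result = evaluate(checks);
        System.out.println(result.passed() ? "migration accepted" : "failed at " + result.failedAt);
    }
}
```

Running `main` prints `failed at BEHAVIOR`: the stubbed application compiles and deploys, yet under this sketch it is still rejected, which is precisely the case a source-to-reference comparison alone would miss.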

How ScarfBench evaluates

ScarfBench includes two types of tasks: focused migrations, which target individual components or layers, and full-application migrations. It builds on a taxonomy based on JSRs (Java Specification Requests) and uses expert-verified migrations to generate implementations for each target framework.
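Having an implementation of each application in every framework means any framework can serve as a migration source or target. As a rough sketch, assuming every ordered pair of the three frameworks is a valid task, each application and scope yields 3 × 2 = 6 directed migrations. The names `MigrationTasks`, `Scope`, `Task`, `tasksFor`, and `orders-app` are hypothetical and do not reflect ScarfBench's actual data format; the JSR-based taxonomy itself is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of benchmark tasks; not ScarfBench's actual data format.
public class MigrationTasks {

    // The three ecosystems the benchmark covers.
    public enum Framework { SPRING, JAKARTA_EE, QUARKUS }

    // The two task kinds: a focused slice of an app, or the whole app.
    public enum Scope { FOCUSED, FULL_APPLICATION }

    // One directed migration task: move an app (or a slice of it) from source to target.
    public static final class Task {
        public final String app;
        public final Scope scope;
        public final Framework source;
        public final Framework target;

        Task(String app, Scope scope, Framework source, Framework target) {
            this.app = app;
            this.scope = scope;
            this.source = source;
            this.target = target;
        }

        @Override
        public String toString() {
            return app + " [" + scope + "]: " + source + " -> " + target;
        }
    }

    // Assumes a verified implementation exists in every framework, so every
    // ordered (source, target) pair with source != target is a valid task.
    public static List<Task> tasksFor(String app, Scope scope) {
        List<Task> tasks = new ArrayList<>();
        for (Framework source : Framework.values()) {
            for (Framework target : Framework.values()) {
                if (source != target) {
                    tasks.add(new Task(app, scope, source, target));
                }
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        // "orders-app" is a made-up application name used only for illustration.
        List<Task> tasks = tasksFor("orders-app", Scope.FULL_APPLICATION);
        System.out.println(tasks.size() + " tasks");
        tasks.forEach(System.out::println);
    }
}
```

Running `main` prints `6 tasks` followed by the six directed pairs, from `orders-app [FULL_APPLICATION]: SPRING -> JAKARTA_EE` to `orders-app [FULL_APPLICATION]: QUARKUS -> JAKARTA_EE`.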

