ITBench-AA: AI models perform below 50% on SRE tasks

May 27, 2026 | Keryc Díaz | 1 minute

IBM Research and Artificial Analysis present ITBench-AA, the first benchmark focused on agentic IT tasks for enterprise environments. It starts with Site Reliability Engineering (SRE) tasks on Kubernetes snapshots and shows that frontier models still perform poorly: none scores above 50%.

Does that mean AI is useless for ops? Not at all. It means the tasks are hard in a specific way — you need minimal, evidence-based answers, not long lists of guesses. ITBench-AA is designed to test exactly that.
