In August 2024, Anthropic ran an experiment that was both fun and revealing: two teams of employees had to get a robodog to retrieve a beach ball. One team could use Claude; the other relied only on the internet and their own ingenuity. For phase two, Anthropic came back with Claude Opus 4.7 to see how much had changed. Curious what happened? The model moved fast, and the implications are both interesting and practical.
What the experiment did (Phase Two)
The original version asked participants to complete several steps: operate the robodog with the manufacturer's controller, connect to video and lidar, write and run a program for manual control, monitor the robot's trajectory, detect the ball and finally achieve autonomous retrieval.
In the autonomous update they didn't ask the model to use a physical controller. Instead they ran three trials of Opus 4.7 in Claude Code with adaptive thinking and the effort parameter set to maximum. The human role was limited to plugging in the laptop, entering the initial prompt, approving commands, and allowing the model to move to the next step.
Measurement: they recorded the elapsed time for each objective and qualitatively assessed whether each task succeeded.
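To make the measurement concrete, here is a minimal sketch of how elapsed time per objective could be logged. The `ObjectiveTimer` class and the objective names are hypothetical illustrations, not the tooling Anthropic used.

```python
import time
from contextlib import contextmanager


class ObjectiveTimer:
    """Accumulate wall-clock seconds spent on each named objective.

    Hypothetical helper for illustration; not the experiment's tooling.
    """

    def __init__(self, clock=time.monotonic):
        # A monotonic clock is unaffected by system clock adjustments.
        self._clock = clock
        self.elapsed = {}

    @contextmanager
    def track(self, objective):
        start = self._clock()
        try:
            yield
        finally:
            # Add to any earlier time recorded for the same objective.
            spent = self._clock() - start
            self.elapsed[objective] = self.elapsed.get(objective, 0.0) + spent


if __name__ == "__main__":
    timer = ObjectiveTimer()
    with timer.track("connect_to_video"):  # hypothetical objective name
        time.sleep(0.01)  # stand-in for the real work
    print(timer.elapsed)
```

Wrapping each objective in its own `track` block keeps the timing separate from the qualitative success judgment, which would still be recorded by hand.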
Key results and technical metrics
For every task that at least one human team completed in phase one, Opus 4.7 finished the same task at least 10 times faster.
If we take the four tasks that both human teams completed in the original phase, Opus 4.7 was on average more than 37 times faster than the team without Claude (Team Claude-less) and more than 18 times faster than the team that did use Claude.
Overall, across the tasks repeated between the 2024 and 2026 runs, Opus 4.7 was approximately 20 times faster than the fastest human team.
It wrote close to one-tenth as much code as Team Claude while achieving equal or better success on those tasks.
These numbers aren’t empty marketing: they come from repeated tests (three trials) and from keeping human intervention to the minimum to isolate the model’s capability.
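The "times faster" figures above are ratios of completion times. The sketch below shows one way such an average could be computed, assuming it is the mean of per-task ratios; the source does not say whether it used that or a ratio of total times, and the two can differ. The function names are hypothetical, and the numbers in the usage example are made-up placeholders, not the experiment's data.

```python
from statistics import mean


def per_task_speedups(baseline_s, model_s):
    """Return baseline/model time ratios for tasks present in both dicts."""
    shared = baseline_s.keys() & model_s.keys()
    return {task: baseline_s[task] / model_s[task] for task in shared}


def mean_speedup(baseline_s, model_s):
    """Average of the per-task ratios (assumed definition of 'on average')."""
    return mean(per_task_speedups(baseline_s, model_s).values())


def total_time_speedup(baseline_s, model_s):
    """Alternative definition: ratio of total time over the shared tasks."""
    shared = baseline_s.keys() & model_s.keys()
    return sum(baseline_s[t] for t in shared) / sum(model_s[t] for t in shared)


if __name__ == "__main__":
    # Placeholder times in seconds, invented purely to show the arithmetic.
    humans = {"task_a": 100.0, "task_b": 300.0}
    model = {"task_a": 10.0, "task_b": 10.0}
    print(mean_speedup(humans, model))        # mean of 10x and 30x
    print(total_time_speedup(humans, model))  # 400 s / 20 s
```

With these placeholder values the two definitions happen to agree; with uneven task lengths they generally would not, so it is worth checking which one a reported average uses.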
What worked well
Interface and discovery: where humans hesitated between several approaches to access the dog’s sensors, Opus 4.7 quickly identified the most effective route.
Effective code on the first try: much of the code generated by the model worked on the first attempt, reducing manual iterations.
Operational robustness: despite defaulting to an outdated object-detection algorithm in some cases, the model adapted the flow and found effective solutions.
Consistency: execution times varied little across completed attempts, which suggests the behavior is reliable within the model's operational envelope.
Technical limitations — what Claude didn’t solve
Fine closed-loop control: moving the ball precisely requires fast perception, estimating the error between command and result, and adjusting inputs to correct the trajectory. Here Opus 4.7 still struggles: it can position the robot behind the ball, but its movements are too imprecise to achieve a clean fetch.
Low-level actuator policies weren’t evaluated: designing a specific actuation policy to stabilize physical interaction remains outside the scope of these tests.
Dependence on existing components: the model used available algorithms and APIs, which helps in practical tasks but limits performance if those components are suboptimal.
It didn't fully replace the need for experts: a more experienced robotics researcher succeeded in programming autonomous retrieval. With more time and scaffolding, current models might reproduce that, but it's not automatic yet.
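To illustrate what the closed-loop control mentioned in the limitations involves, here is a minimal sketch of the perceive, compare, correct cycle using a proportional controller on a one-dimensional heading. It is a toy model under strong assumptions (perfect sensing, no robot dynamics, no latency); the function names, gain, and limit are hypothetical and do not come from the robodog's API or from the experiment.

```python
def p_control_step(error, gain, limit):
    """Return a command proportional to the error, clamped to +/- limit."""
    command = gain * error
    return max(-limit, min(limit, command))


def simulate_heading(target_deg, start_deg=0.0, gain=0.5, limit=10.0, steps=50):
    """Toy 1-D loop in which each step changes the heading by the command.

    A real robot adds sensor noise, delay, and inertia, which is what
    makes fine control hard in practice.
    """
    heading = start_deg
    for _ in range(steps):
        # Perceive: how far is the current heading from the target?
        error = target_deg - heading
        # Act: apply a bounded correction, then measure again next step.
        heading += p_control_step(error, gain, limit)
    return heading


if __name__ == "__main__":
    print(simulate_heading(30.0))
```

The point of the sketch is the structure: every command is followed by a fresh measurement, so errors are corrected continuously instead of accumulating. Doing that fast enough on real hardware is the part the tests found difficult.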
Why this matters (technically)
This shows a dynamic we've seen in software: 1) models boost non-experts; 2) humans and models collaborate; 3) models end up executing concrete tasks on their own. What's new is that this progression now appears in the physical world.
Technically, these improvements didn't come from robot-specific tweaks, but from the model's general increase in capability (scaling). That suggests emergent abilities for manipulation and use of physical tools can appear without targeted training, provided software-hardware interfaces exist.
Put another way: the availability of APIs, drivers and diagnostic tools turns large language models into agents capable of orchestrating physical systems for bounded tasks.
Risks, opportunities and recommendations for researchers
Opportunities: speed up robotic prototypes, reduce hardware-software integration time, democratize access to robots for non-experts. Imagine a maintenance crew using a model to integrate sensors and run basic tasks without an expert on site.
Risks: rapid automation can scale vulnerabilities. Anthropic points to parallels with "N-days" in cybersecurity (vulnerabilities that are publicly known but not yet patched everywhere): exploitation or reconnaissance becomes easy to automate once models can find and use exposed interfaces on their own.
Practical recommendations:
For robotics teams: document and standardize APIs and instrumentation points; models work better when interfaces are clear.
For security evaluators: red-team scenarios where the model automates interaction with exposed hardware.
For LLM developers: investigate integrating closed-loop control and actuator policies, perhaps via fine-tuning on simulator/real data and hybrid reinforcement learning.
Looking ahead
We’re in an early era of what Anthropic calls "physical agentic AI": models that use existing physical tools for concrete purposes. That doesn't mean they’ve solved all of robotics; it means the gap between orchestrating systems and manipulating them finely is shrinking fast.
More research is needed on low-level policies, safety, and how models can design or adapt hardware for new tasks. But beware: capabilities that today help you integrate sensors might tomorrow enable automations that require firmer regulatory and security attention.
Summary: Anthropic repeated Project Fetch with Claude Opus 4.7 and found that the model, running solo in Claude Code at maximum effort, completed integration and remote-control tasks between 10 and 37 times faster than the earlier human teams. Still, it failed at the fine closed-loop control needed to retrieve a ball precisely, which marks clear limits and points to future directions for research and security.
Project Fetch: Claude Opus 4.7 accelerates robotics