Five months ago a team decided to try something that sounds like science fiction but is very real: build and launch a software product with zero lines of code written by humans. Everything — from application logic to documentation and infrastructure — was generated by agents guided by Codex.
What they did
They opened an empty repository in August 2025 and, in five months, let Codex and a set of agents fill it. The result: around a million lines of code, 1,500 merged pull requests, and an internal user base that uses the app daily.
The key difference isn’t that the AI is magical. It’s that engineering changed tasks: humans stopped typing implementations and started designing environments, specifying intentions, and building the feedback loops that let agents do reliable work.
How the agents worked (and what that means)
- The agents wrote the initial scaffold: repo structure, CI, formatting rules, and even an AGENTS.md that explains how to operate in the repository (a minimal sketch follows this list).
- Engineers interact with the system via prompts. You describe a task; the agent opens a pull request, reviews its own change, responds to feedback from other agents and humans, and iterates until the task is complete.
- Many reviews happen agent-to-agent. Humans review when necessary, but they aren’t the bottleneck.
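To make that concrete, here is a minimal, entirely hypothetical sketch of what such an AGENTS.md might contain; the paths and commands are illustrative assumptions, not the team’s actual setup.

```markdown
# AGENTS.md (hypothetical sketch)

## How to work in this repo
- Read docs/architecture.md before changing module boundaries.
- Run `make lint test` before opening a pull request; both must pass.
- Keep pull requests small and focused on a single task.

## Where things live
- Application code: src/ (layered; see docs/architecture.md)
- Executable plans and specs: docs/plans/
- Observability queries and dashboards: ops/queries/
```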
Can you imagine going to sleep while an agent works for six hours on a task, reproduces a bug, applies the fix, and produces video evidence? That happened here.
Changes in the engineering discipline
With agents up front, the discipline doesn’t disappear: it shifts. It’s less about writing perfect lines and more about creating scaffolding, rules and tools the agent understands. Some practices they adopted:
- Structured documentation in docs/ and a short AGENTS.md that acts as a map. The rest is cataloged and versioned so agents can find it.
- Linters and structural tests that not only detect errors but offer remediation instructions agents can read (see the sketch after this list).
- A rigid, domain-driven architecture with defined layers so the agent doesn’t invent erratic dependencies.
- Observability and AI-readable UIs: logs, metrics and snapshots the agent can query with LogQL and PromQL to validate behavior.
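To make the linter idea concrete, here is a minimal sketch in Python of a structural check that enforces layer boundaries and fails with a remediation message an agent can act on. The layer names, the src/ layout, and the module-naming convention are assumptions for illustration, not the team’s actual rules.

```python
"""Structural check: enforce layer boundaries and tell the agent how to fix violations.

Hypothetical layering, lowest to highest: domain -> application -> interface.
Lower layers must not import from higher layers.
"""
import ast
import pathlib
import sys

# Hypothetical layer order; adjust to the repository's real architecture.
LAYERS = {"domain": 0, "application": 1, "interface": 2}

def layer_of(path: pathlib.Path) -> str | None:
    # src/<layer>/... is an assumed convention, not a universal one.
    parts = path.parts
    if len(parts) >= 2 and parts[0] == "src" and parts[1] in LAYERS:
        return parts[1]
    return None

def check_file(path: pathlib.Path) -> list[str]:
    violations = []
    layer = layer_of(path)
    if layer is None:
        return violations
    tree = ast.parse(path.read_text(), filename=str(path))
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            names = [a.name for a in node.names] if isinstance(node, ast.Import) else [node.module or ""]
            for name in names:
                top = name.split(".")[0]
                if top in LAYERS and LAYERS[top] > LAYERS[layer]:
                    # The remediation text is written for an agent, not just a human.
                    violations.append(
                        f"{path}:{node.lineno}: '{layer}' imports '{top}', a higher layer. "
                        f"Remediation: move the shared logic into '{layer}' or depend on an "
                        f"abstraction defined in '{layer}' and implemented in '{top}'."
                    )
    return violations

if __name__ == "__main__":
    problems = [v for f in pathlib.Path("src").rglob("*.py") for v in check_file(f)]
    print("\n".join(problems) or "layering: OK")
    sys.exit(1 if problems else 0)
```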
In practice they had to prioritize what’s accessible to the agent inside the repo. What lives in Google Docs or chats doesn’t exist for the AI unless you put it in the repository.
Practical lessons and simple rules
Give the agent a map, not a 1,000-page manual.
Context is a limited resource. One giant file confuses more than it helps. Better: small, linked, and verifiable entries. Some useful rules:
- Short documentation as a landing point and versioned sources of truth.
- Executable, verifiable plans inside the repo so the agent knows what to change and why.
- Enforce invariants, not micromanagement: define clear boundaries (for example, validations at the edges) and leave freedom in implementation.
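As a sketch of that last rule, assuming a Python codebase: validate untrusted input once at the boundary, then pass an already-trusted value inward, leaving the internals free to change. The order example and names are hypothetical.

```python
"""Enforce invariants at the edge, not throughout the code.

Hypothetical example: an order amount is validated once, where input enters the
system; everything inside can then trust the type instead of re-checking.
"""
from dataclasses import dataclass

@dataclass(frozen=True)
class OrderAmount:
    cents: int

    def __post_init__(self):
        # The invariant lives here, at the boundary type.
        if self.cents <= 0:
            raise ValueError("order amount must be a positive number of cents")

def handle_request(raw: dict) -> str:
    # Edge of the system: parse and validate untrusted input once.
    amount = OrderAmount(cents=int(raw["amount_cents"]))
    return place_order(amount)

def place_order(amount: OrderAmount) -> str:
    # Internal code is free to change; it never re-validates the invariant.
    return f"order placed for {amount.cents} cents"

if __name__ == "__main__":
    print(handle_request({"amount_cents": "1999"}))
```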
Also, the boring, predictable parts of a stack tend to suit agents better: they’re easier to model and less prone to opaque behavior.
Quality control and automated technical debt
When agent output scales, human review capacity becomes the bottleneck. The answer was automated cleanup and coherence checks:
- Encoded “golden” principles an agent can apply consistently.
- Cleanup agents that detect bad patterns, open refactor PRs and allow autocommit on minor changes.
- Architectural rules and lints that inject remediation instructions into the agent’s context.
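A minimal sketch of how such a check might feed an agent, assuming a Python repo and a hypothetical golden rule (no direct datetime.now() calls; use an injected clock): the scanner emits machine-readable remediation instructions a cleanup agent could turn into a small refactor PR. The rule, the JSON shape, and the autocommit flag are all assumptions for illustration.

```python
"""Scan for a banned pattern and emit remediation instructions for a cleanup agent.

The banned pattern (datetime.now() outside an injected clock) is a hypothetical
example of a 'golden' principle; the output format is also an assumption.
"""
import json
import pathlib
import re

BANNED = re.compile(r"datetime\.now\(")

def scan(root: str = "src") -> list[dict]:
    findings = []
    for path in pathlib.Path(root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if BANNED.search(line):
                findings.append({
                    "file": str(path),
                    "line": lineno,
                    "rule": "no-direct-clock",
                    # Written as an instruction the agent can execute, not just a warning.
                    "remediation": "Replace datetime.now() with the injected Clock "
                                   "dependency; add a Clock parameter if the caller "
                                   "does not already receive one.",
                    "autocommit_ok": False,  # refactors touch signatures, so require review
                })
    return findings

if __name__ == "__main__":
    # The JSON output can be injected into the cleanup agent's context verbatim.
    print(json.dumps(scan(), indent=2))
```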
Technical debt was paid in small daily installments instead of large monthly cleanups. That reduces the interest you pay on bad design.
Risks, limits and what they still don’t know
Not everything is perfect or automatically generalizable. Important limits:
- This repository was designed from the start to be agent-readable. Not all legacy code adapts the same way without investment.
- Agents replicate existing patterns, even bad ones. Without firm principles, code quality can degrade.
- Open questions remain about maintaining long-term architectural coherence and about when human intervention provides the greatest leverage.
What can you do tomorrow?
If someone on your team wants to experiment with agents to speed up engineering, think first about these investments:
- Structure the repo as a versioned source of truth.
- Define clear boundaries and structural tests agents can run.
- Automate AI-readable observability: logs, metrics and snapshots.
- Start by encapsulating dependencies and critical behaviors with tests and contracts.
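To illustrate that last step, here is a sketch of a dependency contract in Python; the PaymentGateway name and its methods are hypothetical. The same checks run against the fake used in agent-driven tests and, when credentials are available, against the real adapter, so neither can drift silently.

```python
"""Contract test for an encapsulated dependency (hypothetical payment gateway).

Any implementation, real or fake, must satisfy the same contract.
"""
from typing import Protocol

class PaymentGateway(Protocol):
    def charge(self, cents: int) -> str:
        """Charge the amount and return a non-empty transaction id."""

class FakeGateway:
    def __init__(self):
        self.charged: list[int] = []

    def charge(self, cents: int) -> str:
        if cents <= 0:
            raise ValueError("amount must be positive")
        self.charged.append(cents)
        return f"fake-{len(self.charged)}"

def check_contract(gateway: PaymentGateway) -> None:
    # The contract every implementation must satisfy.
    tx = gateway.charge(500)
    assert isinstance(tx, str) and tx, "charge must return a non-empty transaction id"
    try:
        gateway.charge(0)
    except ValueError:
        pass
    else:
        raise AssertionError("charging zero must raise ValueError")

if __name__ == "__main__":
    check_contract(FakeGateway())
    print("contract satisfied by FakeGateway")
```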
It’s not magic: it’s engineering with a new palette of tools. If you prepare the ground well, the main benefit is multiplying human time, not replacing it.
The experience from this experiment shows that speed is possible, but it takes discipline to maintain coherence and quality. In practice, building with agents is less about the code and more about the paths you let the AI travel.
