PostgreSQL, a database that sometimes gets dismissed as a museum piece, sits at the heart of ChatGPT and OpenAI's API. Surprised? So was I. OpenAI managed to support hundreds of millions of users with a single primary and nearly 50 read replicas, and it wasn't by accident: it took optimizations, rigorous engineering, and hard practical decisions.
What happened and why it matters
In one year the load on PostgreSQL grew more than 10x. OpenAI needed to sustain millions of queries per second for 800 million users. The main strategy: keep a single-node primary for writes and offload reads to replicas. Why not split everything from the start? Because sharding PostgreSQL for existing applications means changing hundreds of endpoints and can take months or years.
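To make the strategy concrete, here is a minimal sketch of what application-level read/write routing can look like: writes always go to the single primary, reads are spread across the replica pool. This is an illustration of the general pattern, not OpenAI's code; the DSNs, table names, and the choice of psycopg2 are assumptions.

```python
# Minimal read/write routing sketch: writes go to the primary,
# reads are spread across replicas. DSNs and tables are placeholders.
import random
import psycopg2

PRIMARY_DSN = "host=primary.internal dbname=app user=app"
REPLICA_DSNS = [
    "host=replica-01.internal dbname=app user=app",
    "host=replica-02.internal dbname=app user=app",
]

def get_connection(readonly: bool):
    """Pick a random replica for read-only work, the primary otherwise."""
    dsn = random.choice(REPLICA_DSNS) if readonly else PRIMARY_DSN
    conn = psycopg2.connect(dsn)
    conn.set_session(readonly=readonly)  # fail fast if a write sneaks onto a replica
    return conn

# Reads land on a replica...
with get_connection(readonly=True) as conn, conn.cursor() as cur:
    cur.execute("SELECT count(*) FROM conversations")
    print(cur.fetchone())

# ...writes always hit the single primary.
with get_connection(readonly=False) as conn, conn.cursor() as cur:
    cur.execute("INSERT INTO conversations (title) VALUES (%s)", ("hello",))
```

The appeal of this pattern is that adding read capacity means adding replicas, not re-architecting the write path.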
But working with a single primary brings risks: write bursts, heavy queries, and cache failures can saturate the primary and trigger retry cycles that make the overload worse. The story shows that, with engineering and prudence, PostgreSQL can scale much farther than many thought, especially for read-dominated workloads.
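The standard defenses against those retry cycles are capping query runtime on the server side and retrying with exponential backoff and jitter, so clients don't hammer an already struggling primary in lockstep. The sketch below shows that general pattern; the timeout values, attempt count, and DSN are illustrative assumptions, not OpenAI's actual settings.

```python
# Sketch: backoff-with-jitter retries plus a per-session statement timeout,
# so runaway queries get cut off and retries spread out instead of piling up.
import random
import time
import psycopg2

def run_with_backoff(dsn, query, params=(), attempts=5):
    for attempt in range(attempts):
        try:
            with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
                cur.execute("SET statement_timeout = '2s'")  # cap heavy queries
                cur.execute(query, params)
                return cur.fetchall()
        except psycopg2.OperationalError:
            if attempt == attempts - 1:
                raise
            # Exponential backoff with full jitter: wait a random slice of an
            # exponentially growing window before trying again.
            time.sleep(random.uniform(0, min(8, 0.2 * 2 ** attempt)))
```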
