Wayfair moved from small tests in 2024 to running OpenAI models in production to improve catalog quality and speed up supplier support. What was the result? Less manual work, faster decisions, and better data across millions of products.
How Wayfair integrated OpenAI into critical operations
The company didn’t treat AI as an isolated experiment; they made it part of the core operational flow. They started where complexity and scale were highest: product attribute classification and routing supplier requests.
Wayfair manages about 30 million products and roughly 47,000 distinct tags. Consistent attributes such as color, material, or size are key for search, recommendations, and merchandising. Previously, tag improvements depended on suppliers or customers reporting errors, which was slow and insufficient.
To scale, they moved away from per-tag bespoke models and built a tag-agnostic system using a single OpenAI model. A definition agent gathers public and internal definitions to produce the context for each tag. That layer of meaning, combined with product data, feeds a framework that classifies attributes across product classes.
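One way to picture the tag-agnostic design is a single prompt template that takes a tag's definition and a product record, so that no per-tag model is needed. The sketch below is illustrative only: `TagDefinition`, `build_classification_prompt`, and the field layout are assumptions rather than Wayfair's actual framework, and the model call itself is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class TagDefinition:
    """Context a definition agent might produce for one tag (hypothetical shape)."""
    name: str
    description: str
    allowed_values: list[str] = field(default_factory=list)


def build_classification_prompt(tag: TagDefinition, product: dict[str, str]) -> str:
    """Combine tag context and product data into one model-ready prompt.

    The same function serves every tag; only the inputs change.
    """
    product_lines = "\n".join(f"- {key}: {value}" for key, value in sorted(product.items()))
    values = ", ".join(tag.allowed_values) if tag.allowed_values else "(free text)"
    return (
        f"Tag: {tag.name}\n"
        f"Definition: {tag.description}\n"
        f"Allowed values: {values}\n"
        f"Product data:\n{product_lines}\n"
        "Return the best value for this tag, or 'unknown' if the data is insufficient."
    )


# Two unrelated tags go through the identical code path.
color = TagDefinition("color", "Dominant visible color of the product.", ["blue", "gray", "white"])
material = TagDefinition("material", "Primary material of the main surface.")
sofa = {"title": "Three-seat sofa in gray linen", "class": "sofas"}

color_prompt = build_classification_prompt(color, sofa)
material_prompt = build_classification_prompt(material, sofa)
```

Adding a new tag in this sketch means supplying a new definition, not training or maintaining another model.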
Measurable results in the catalog
The system has already run in production on more than 1 million products and corrected 2.5 million tags on the most visible and most purchased items. An A/B experiment showed significant increases in impressions, clicks, and search page ranking for the treated products.
The team is also expanding attribute coverage 70 times faster than a year ago. Most importantly, when model confidence is high, the system can overwrite data and notify the supplier; when it is not, it asks for human confirmation.
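The confidence rule can be sketched as a small gating function. This is a minimal illustration under stated assumptions: the 0.9 threshold, the `TagPrediction` fields, and the action names are hypothetical, since the source gives neither the cutoff nor the data model.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the source does not state the value Wayfair uses.
CONFIDENCE_THRESHOLD = 0.9


@dataclass
class TagPrediction:
    tag: str
    current_value: str
    predicted_value: str
    confidence: float  # assumed to be a score in [0, 1]


def decide_action(prediction: TagPrediction, threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Return what to do with a predicted tag value."""
    if prediction.predicted_value == prediction.current_value:
        return "no_change"
    if prediction.confidence >= threshold:
        return "overwrite_and_notify_supplier"
    return "request_human_confirmation"


confident = TagPrediction("material", "wood", "metal", confidence=0.97)
uncertain = TagPrediction("color", "blue", "teal", confidence=0.62)
```

Here `decide_action(confident)` takes the automatic path, while `decide_action(uncertain)` falls back to a person.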
Supplier support: triage and agentic flows
Historically, every incoming ticket was reviewed by an associate who had to identify intent and route it. That took time and was error-prone. Wayfair built Wilma, an agentic layer that improves that flow.
One of the first production features was automatic triage: the model reads the request, fills in missing context, and routes the ticket to the right team. Wilma went from prototype to live in about a month because it relied on a platform already integrated with the OpenAI APIs.
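The triage step amounts to classifying intent and then looking up a destination. The sketch below assumes a hypothetical routing table and uses a trivial keyword matcher as a stand-in for the model call; the intents, team names, and function names are invented for illustration and do not describe Wilma's internals.

```python
from typing import Callable

# Hypothetical intents and team names, for illustration only.
ROUTING_TABLE = {
    "replacement_parts": "parts-operations",
    "catalog_update": "catalog-quality",
    "billing": "supplier-finance",
}
FALLBACK_TEAM = "manual-review"


def keyword_intent(text: str) -> str:
    """Stand-in for the model call: a trivial keyword matcher."""
    lowered = text.lower()
    if "replacement" in lowered or "missing part" in lowered:
        return "replacement_parts"
    if "invoice" in lowered or "payment" in lowered:
        return "billing"
    if "listing" in lowered or "attribute" in lowered:
        return "catalog_update"
    return "unknown"


def triage(ticket: dict, classify: Callable[[str], str] = keyword_intent) -> dict:
    """Return a copy of the ticket with intent and destination team filled in."""
    intent = classify(ticket["body"])
    team = ROUTING_TABLE.get(intent, FALLBACK_TEAM)
    return {**ticket, "intent": intent, "team": team}


routed = triage({"id": 101, "body": "Customer needs a replacement leg for a chair."})
unrouted = triage({"id": 102, "body": "Hello, quick question."})
```

Passing the classifier in as a parameter keeps the routing logic testable without a model, and unrecognized intents fall back to a human queue rather than being guessed.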
Beyond triage, they deployed a dozen agentic flows for specific resolution teams. For example, a co-pilot for replacement parts operations synthesizes the case history, proposes next steps, and drafts responses that a human reviews.
Human control, auditing and trust
Wayfair didn’t hand over critical decisions without control. They put in place hands-on audits in which associates physically inspect samples to validate model outputs, and they work with suppliers to confirm changes when a tag is high risk.
They use a metric called alignment rate to measure how often the model’s recommendations match the final human decision. Once a team reaches a set threshold, some of its flows can move from assistant mode to semi-autonomous operation. It’s a gradual transition that protects quality while scaling.
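As described, alignment rate is the fraction of reviewed cases in which the model's recommendation equals the final human decision. A minimal sketch follows; the 0.95 promotion threshold, the mode names, and the sample decisions are made-up example values, not figures from Wayfair.

```python
def alignment_rate(decisions: list[tuple[str, str]]) -> float:
    """Share of cases where the model's recommendation equals the human's final decision.

    Each item is (model_recommendation, final_human_decision).
    """
    if not decisions:
        raise ValueError("need at least one reviewed decision")
    matches = sum(1 for model, human in decisions if model == human)
    return matches / len(decisions)


def next_mode(rate: float, threshold: float = 0.95) -> str:
    """Pick an operating mode; the 0.95 threshold is a made-up example value."""
    return "semi_autonomous" if rate >= threshold else "assistant"


reviewed = [
    ("approve", "approve"),
    ("reship_part", "reship_part"),
    ("refund", "reship_part"),
    ("approve", "approve"),
]
rate = alignment_rate(reviewed)  # 3 of 4 recommendations match
```

With three matches out of four, the rate is 0.75, so this hypothetical flow would stay in assistant mode until agreement improves.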
Operational and satisfaction impact
The numbers speak for themselves: in support, triage and agentic flows automate 41,000 tickets a month, with increases of up to 70% in certain processes and shorter response times. In the catalog, they expect to quadruple the impact of corrections in the next six months.
Operationally, teams notice:
Faster routing and resolution
Higher supplier satisfaction
Less manual data entry and classification work
Broader coverage without needing experts in every area
More confidence in attributes before publishing
Wayfair also rolled out more than 1,200 seats of ChatGPT Enterprise across their 12,000 employees for ad hoc tasks and experimentation.
Practical lessons and where they’re headed
Wayfair’s story shows that AI works best when integrated with processes, human controls, and external validation. The key? Not replacing people but amplifying their ability to see context and make more informed decisions.
Advances in multimodal models matter for home retail because products are visual and subjective. Natural language and multimodal capabilities help close the gap between what customers are looking for and how they describe it.
For Wayfair, the partnership with OpenAI was more than access to models: it was guidance on model selection, deployment practices, and internal adoption. That moved slow pilots into production services that already impact the customer experience.
In the end, the bet is clear: scale quality and speed without losing human control. Isn’t that exactly what you expect when choosing between two sofas online?