Tolan is a conversational assistant designed to be spoken to, not typed to. Imagine an animated, personalized character you talk to openly, one that learns with you as conversations unfold. It’s not a quick-answer chatbot: it’s a continuous dialogue experience built to sustain long, shifting conversations.
What is Tolan and why does it matter?
Tolan was created by Portola, a team with startup experience that decided voice was the next big frontier after the ChatGPT boom. Why voice? Because speaking demands low latency, dynamic context handling, and a consistent personality. That makes it harder — but also more natural and exploratory than text.
Quinten Farmer, cofounder and CEO of Portola, says ChatGPT made the opportunity clear: voice would come next, but it meant solving different problems than text.
Tolan focuses on keeping a recognizable personality, adapting when topics shift mid-sentence, and responding instantly so conversations don’t feel mechanical. Have you ever been in a call where the assistant lags or forgets who you are? Tolan aims to avoid exactly that.
How they leverage GPT-5.1 and the Responses API
The launch of GPT-5.1 was pivotal for Tolan. Portola needed three things: better steerability (the ability to follow tone and personality instructions), low latency, and consistency over long conversations. GPT-5.1 and the Responses API delivered enough improvements to combine those elements.
- Latency: introducing GPT-5.1 and the Responses API cut speech-start time by more than 0.7 seconds, a perceptible difference in conversational flow.
- Steerability: chained instructions (tone schemes, character traits, and memory nudges) began to be followed faithfully, reducing personality drift.
Voice-oriented architecture
Tolan doesn’t use the classic approach of caching prompts across turns. Each turn rebuilds the context from scratch, combining:
- a summary of recent messages,
- a persona card that defines core traits,
- memories retrieved via vectors,
- tone guides and real-time signals from the app.
This reconstruction lets the system adapt instantly when you change topic, without relying on huge prompts that end up fragile.
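To make the idea concrete, here is a minimal sketch of what rebuilding context on every turn could look like. It is not Portola's implementation: the `TurnInputs` fields, the section labels, and the `build_context` function are all hypothetical names chosen for illustration. The only point it demonstrates is that the prompt is assembled fresh from small, independent pieces instead of being appended to a growing transcript.

```python
from dataclasses import dataclass, field


@dataclass
class TurnInputs:
    """Everything gathered fresh for one turn (all field names are hypothetical)."""
    recent_summary: str
    persona_card: str
    memories: list[str]
    tone_guide: str
    app_signals: dict[str, str] = field(default_factory=dict)
    user_utterance: str = ""


def build_context(turn: TurnInputs, max_memories: int = 5) -> str:
    """Assemble the prompt for a single turn from scratch; no state is carried over."""
    # Cap the number of memories so the prompt stays small and predictable.
    memory_lines = "\n".join(f"- {m}" for m in turn.memories[:max_memories])
    signal_lines = "\n".join(
        f"- {key}: {value}" for key, value in sorted(turn.app_signals.items())
    )
    sections = [
        ("PERSONA", turn.persona_card),
        ("TONE", turn.tone_guide),
        ("RECENT SUMMARY", turn.recent_summary),
        ("MEMORIES", memory_lines or "(none)"),
        ("APP SIGNALS", signal_lines or "(none)"),
        ("USER", turn.user_utterance),
    ]
    return "\n\n".join(f"## {name}\n{body}" for name, body in sections)
```

Because every section is recomputed per turn, a sudden topic change only alters the retrieved memories and the summary; the persona and tone sections stay stable, which is one plausible way to keep the character consistent.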
Memory and fast retrieval
Memories are embedded with OpenAI's text-embedding-3-large model and stored in Turbopuffer, a vector store capable of sub-50 ms searches. That speed is essential for real-time voice interactions.
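The core operation behind vector retrieval is a nearest-neighbor search over embeddings. The sketch below shows it with plain Python lists and cosine similarity. It deliberately does not call the embeddings API or Turbopuffer: the tiny two-dimensional vectors, the `store` list, and the `top_k` function are illustrative assumptions standing in for real embeddings and a real vector store.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: list[float],
    store: list[tuple[str, list[float]]],
    k: int = 3,
) -> list[str]:
    """Return the k memory texts whose vectors are most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _vec in ranked[:k]]


# Toy store: (memory text, embedding). Real embeddings have thousands of dimensions.
store = [
    ("Married to Sam", [1.0, 0.0]),
    ("Likes jazz", [0.0, 1.0]),
    ("Wedding was in 2019", [0.9, 0.1]),
]
matches = top_k([1.0, 0.0], store, k=2)
```

A production vector store replaces the linear scan with an index, which is what makes searches in the tens of milliseconds feasible at scale.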
Each turn can trigger memory retrievals using the user’s last phrase and questions synthesized by the system, for example "Who is the user married to?". Portola also runs nightly compression jobs to remove low-value memories and resolve contradictions, so memory doesn’t turn into noise.
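A compression pass of this kind can be sketched in a few lines. The source does not say how Portola scores memories or resolves contradictions, so everything here is an assumption: the `Memory` fields, the `min_value` threshold, and the "newest memory per subject wins" rule are stand-ins chosen to keep the example small.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    subject: str     # what the memory is about, e.g. "spouse" (hypothetical field)
    text: str
    value: float     # hypothetical usefulness score in [0, 1]
    updated_at: int  # Unix timestamp of the last update


def compress(memories: list[Memory], min_value: float = 0.2) -> list[Memory]:
    """Drop low-value memories and keep only the newest memory per subject."""
    latest: dict[str, Memory] = {}
    for memory in memories:
        if memory.value < min_value:
            continue  # low-value memory: treated as noise and discarded
        kept = latest.get(memory.subject)
        # Assumed contradiction rule: the most recently updated memory wins.
        if kept is None or memory.updated_at > kept.updated_at:
            latest[memory.subject] = memory
    return sorted(latest.values(), key=lambda m: m.updated_at)
```

Keeping one memory per subject is a simplification; a real job would likely merge related memories or ask a model to reconcile them before anything is deleted.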
Personality and tone
Every Tolan starts with a character scaffold created by a science-fiction writer and tuned by a behavioral researcher. A parallel system evaluates the emotional tenor of the conversation and adjusts delivery: it can shift from playful to more serious without losing coherence.
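One way to read "parallel" here is that tone assessment runs concurrently with the rest of the turn's preparation, so it adds no extra latency. The sketch below illustrates that reading with `asyncio.gather`. It is an assumption, not a description of Tolan's system: the keyword set, `assess_tone`, `fetch_memories`, and `prepare_turn` are hypothetical, and the keyword match is a toy stand-in for a real emotional-tenor model.

```python
import asyncio

# Hypothetical cue words; a real system would use a model, not a keyword list.
SERIOUS_CUES = {"worried", "sad", "scared", "lost", "grief"}


async def assess_tone(utterance: str) -> str:
    """Toy classifier standing in for an emotional-tenor model."""
    await asyncio.sleep(0)  # placeholder for an awaited model call
    words = {word.strip(".,!?").lower() for word in utterance.split()}
    return "serious" if words & SERIOUS_CUES else "playful"


async def fetch_memories(utterance: str) -> list[str]:
    """Placeholder for memory retrieval; returns nothing in this sketch."""
    await asyncio.sleep(0)
    return []


async def prepare_turn(utterance: str) -> dict:
    """Run tone assessment and memory retrieval concurrently."""
    tone, memories = await asyncio.gather(
        assess_tone(utterance), fetch_memories(utterance)
    )
    return {"tone": tone, "memories": memories}


result = asyncio.run(prepare_turn("I'm worried about my exam."))
```

The resulting tone label would then feed the tone guidance for that turn, letting delivery shift without touching the underlying character scaffold.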
Results and metrics
The improvements with GPT-5.1 translated into real metrics:
- Fewer memory failures, with a 30% drop in product frustration signals.
- Over 20% increase in next-day retention after activating the new, GPT-5.1–driven personas.
- Since its launch in February 2025, Tolan has exceeded 200,000 monthly active users, holds a 4.8-star rating, and has more than 100,000 App Store reviews.
One user comment sums it up: they remember things you said days ago and bring them back into today’s conversation. That’s exactly what a voice AI aiming to feel alive and connected should do.
Principles and lessons for building for voice
Portola shares clear lessons useful to any team building voice interfaces:
- Design for conversational volatility: people change topics mid-sentence.
- Treat latency as part of the product: the difference between 0.3 s and 1 s changes how the agent is perceived.
- Build memory as a retrieval system, not a giant transcript: compression and fast search are worth more than massive contexts.
- Rebuild context each turn: regenerating context keeps the agent anchored as the conversation drifts.
Those rules aren’t just technical; they’re product decisions that affect whether a voice experience feels human or artificial.
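Treating latency as part of the product starts with measuring it. A useful number for voice is the time until the first chunk of speech arrives, not the time until the full reply is done. The sketch below measures that for any streaming source; `first_chunk_latency` and `fake_speech` are hypothetical helpers, and the generator simply simulates a delayed audio stream.

```python
import time
from collections.abc import Iterable, Iterator


def first_chunk_latency(
    stream: Iterable[bytes],
) -> tuple[float, bytes, Iterator[bytes]]:
    """Return (seconds until first chunk, the first chunk, the remaining stream)."""
    start = time.perf_counter()
    iterator = iter(stream)
    first = next(iterator)  # blocks until the source produces its first chunk
    return time.perf_counter() - start, first, iterator


def fake_speech(delay_s: float) -> Iterator[bytes]:
    """Simulated audio stream that waits delay_s before producing anything."""
    time.sleep(delay_s)
    yield b"hello"
    yield b" there"


latency, first, rest = first_chunk_latency(fake_speech(0.05))
```

Tracking this value per turn makes it possible to compare a change such as a new model or a smaller prompt against the 0.3 s versus 1 s gap mentioned above.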
Where Tolan is headed
Portola plans to refine memory compression, improve retrieval logic, and expand persona tuning. The long-term goal is to make voice a truly multimodal interface, where voice, vision, and context integrate into a single steerable system.
What’s next? Voice agents that not only respond fast but understand broad context and act coherently over time.
Tolan shows voice isn’t just a layer on top of text: it’s a new way to design personality, memory, and latency as a whole.
