Retell AI adds A/B testing for phone calls, letting you split traffic across agent configs and measure performance on live inbound support calls.

Run experiments on live inbound calls and measure real agent performance before scaling changes.
Signal analysis
Retell AI's latest release is A/B testing for phone calls. The feature lets you split inbound call traffic by percentage across multiple agent configurations, running parallel experiments on live production calls. Instead of testing new prompts in isolation, you can route 20% of calls to an experimental agent while 80% hit your baseline, then compare conversion rates, resolution times, and customer satisfaction metrics side by side.
This is straightforward infrastructure. You define agent variants (different system prompts, voice settings, tool configurations), set traffic percentages, and Retell handles the routing and logging. The platform captures call recordings, transcripts, and custom metrics for each variant, giving you structured data to make decisions.
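To make the percentage split concrete, here is a minimal sketch of weighted variant assignment. It is not Retell's API or configuration schema: the variant names, the weights, and the `assign_variant` helper are all hypothetical, and Retell performs this routing for you. The sketch only illustrates the general technique of hashing a call identifier into a bucket so that the same call always maps to the same variant.

```python
import hashlib

# Hypothetical variant table. Names and weights are illustrative only,
# not taken from Retell's configuration format.
VARIANTS = [("baseline", 80), ("experimental", 20)]


def assign_variant(call_id: str, variants=VARIANTS) -> str:
    """Map a call ID to a variant name in proportion to the weights."""
    total = sum(weight for _, weight in variants)
    if total <= 0:
        raise ValueError("variant weights must sum to a positive number")

    # Hash the call ID so the assignment is stable across retries and restarts.
    digest = hashlib.sha256(call_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total

    # Walk the cumulative weights until the bucket falls inside a variant's range.
    cumulative = 0
    for name, weight in variants:
        cumulative += weight
        if bucket < cumulative:
            return name
    raise RuntimeError("unreachable when weights are non-negative")


if __name__ == "__main__":
    sample = [assign_variant(f"call-{i}") for i in range(10_000)]
    print("experimental share:", sample.count("experimental") / len(sample))
```

Hashing rather than drawing a random number per call is a design choice: it keeps assignment reproducible, which makes it easier to audit which variant handled a given call.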
Phone call AI has a credibility problem: lab results don't match production. Your agent performs well in synthetic tests, then fails on real customer calls because of edge cases, accents, background noise, or unexpected customer requests. A/B testing addresses this by letting you validate changes on real traffic before full rollout.
For support teams using Retell, this removes the binary choice between 'deploy and hope' or 'never iterate.' You can test prompt changes, new tools, different voice profiles, or even different system personalities on 10-20% of calls and measure the actual impact within days. This shifts control to operators instead of leaving them guessing whether a new agent is better or worse.
The competitive angle is clear: Retell is positioning itself as a production-grade tool for serious voice AI work. If you're building customer-facing phone systems, this feature acknowledges that iteration is non-negotiable and provides the mechanism to do it safely.
If you're running a support operation on Retell, map out your top three agent pain points right now. Is your agent failing on specific customer types? Does it handle refunds poorly? Is the tone off? Pick one, write a better prompt variant, and set up an A/B test with 15% traffic. Run it for 500 calls over a week and measure the specific KPI tied to that problem. You need a control-variant comparison on real data before you scale.
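One common way to run the control-versus-variant comparison, assuming the KPI is a yes/no outcome per call (for example, resolved without escalation), is a two-proportion z-test. The sketch below is a generic statistical check, not something Retell provides; the function name is hypothetical and the counts in the usage example are made up for illustration.

```python
import math


def two_proportion_z_test(success_a: int, total_a: int,
                          success_b: int, total_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between two rates.

    A is the control group and B is the variant. A positive z means the
    variant's success rate is higher than the control's.
    """
    if total_a <= 0 or total_b <= 0:
        raise ValueError("both groups need at least one call")

    rate_a = success_a / total_a
    rate_b = success_b / total_b

    # Pooled rate under the null hypothesis that both groups behave the same.
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        return 0.0, 1.0  # every call had the same outcome in both groups

    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Made-up counts: resolved calls out of total calls for each group.
    z, p = two_proportion_z_test(1700, 2833, 330, 500)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference is unlikely to be noise, but it says nothing about whether the difference is large enough to matter for your operation; that judgment still depends on the KPI you picked.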
Structurally, this means defining your metrics up front. What does 'better' mean in your use case? Faster resolution? Higher customer satisfaction? Lower escalations? Retell gives you the data collection, but you need to decide what you're actually optimizing for. Build that measurement framework before you run experiments, or you'll end up with data that doesn't answer your questions.
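Part of defining metrics up front is deciding how large a change you want to be able to detect, because that determines how many calls each variant needs. The sketch below uses the standard normal-approximation sample size formula for comparing two proportions. The function name, the default significance level and power, and the example rates are assumptions chosen for illustration, not figures from Retell.

```python
import math
from statistics import NormalDist


def calls_needed_per_variant(baseline_rate: float, target_rate: float,
                             alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate calls per variant needed to detect baseline -> target.

    Uses the normal approximation for a two-sided test on two proportions.
    """
    if not (0 < baseline_rate < 1 and 0 < target_rate < 1):
        raise ValueError("rates must be strictly between 0 and 1")
    if baseline_rate == target_rate:
        raise ValueError("target rate must differ from baseline rate")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired power

    variance = (baseline_rate * (1 - baseline_rate)
                + target_rate * (1 - target_rate))
    effect = target_rate - baseline_rate

    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


if __name__ == "__main__":
    # Illustrative rates only: a 60% resolution rate today, hoping for 70%.
    print(calls_needed_per_variant(0.60, 0.70))
```

The smaller the improvement you want to detect, the more calls each variant needs, so it is worth running this kind of estimate before choosing a traffic percentage and test duration.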
For teams shipping phone AI products, A/B testing becomes a core feature you need to expose to your customers. Retell is setting the expectation that voice agents should be iterable. If you're building on top of voice APIs, you'll need equivalent testing infrastructure or risk losing credibility.