Jun 2, 2025
A case study on how multi-LoRA support on accelerated compute is upgrading conversational AI
Introduction
Phonely has partnered with Maitai and Groq to enhance the speed and accuracy of its AI phone support agents. By leveraging the new GroqCloud™ capability to hotswap Low-Rank Adaptation (LoRA) adapters at inference, Maitai provides model inference that enables Phonely to improve real-time response performance, addressing a key challenge in conversational AI.
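For illustration, here is a minimal sketch of what calling a fine-tuned model on GroqCloud looks like through Groq's Python SDK. The model ID shown is a hypothetical placeholder; actual fine-tuned model IDs are provisioned per account.

```python
# Minimal sketch: calling a fine-tuned (LoRA) model on GroqCloud via
# the Groq Python SDK. The model ID below is a hypothetical placeholder.
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

response = client.chat.completions.create(
    model="my-org/llama-3.1-8b-support-lora",  # hypothetical fine-tuned model ID
    messages=[
        {"role": "system", "content": "You are a phone support agent."},
        {"role": "user", "content": "I'd like to reschedule my appointment."},
    ],
)
print(response.choices[0].message.content)
```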
The Challenge: Breaking Through the Limits of Closed-Source Models
As a company delivering real-time AI phone support, Phonely’s success depends on how quickly and naturally its agents can interact with humans. While large closed-source general-purpose models like GPT-4o offered high-quality outputs, Phonely faced growing limitations in several key areas:
Latency and Performance Bottlenecks: Slow time to first token (TTFT) and long completion times consistently hindered the flow of conversation between AI agents and human callers. Even minor delays disrupted live, spoken dialogue (see the measurement sketch after this list).
Limited Control Over Model Improvements: Phonely relied on its model providers’ release schedules and feature updates, with no ability to incorporate its own data into the model efficiently.
Accuracy Ceiling: While previous closed-source models reached relatively high accuracy, their generalized nature imposed a hard accuracy limit. To continue advancing product quality, Phonely needed to leverage its own data to iteratively improve.
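To make the latency numbers concrete, here is a rough sketch of how TTFT and total completion time can be measured against a streaming endpoint such as GroqCloud; the model ID and prompt are illustrative.

```python
# Rough sketch: measuring time to first token (TTFT) and total
# completion time with a streaming request via the Groq Python SDK.
import time

from groq import Groq

client = Groq()
start = time.perf_counter()
ttft = None

stream = client.chat.completions.create(
    model="llama-3.1-8b-instant",  # illustrative model choice
    messages=[{"role": "user", "content": "Hello, can you hear me?"}],
    stream=True,
)
for chunk in stream:
    if ttft is None and chunk.choices and chunk.choices[0].delta.content:
        ttft = time.perf_counter() - start  # first token has arrived
total = time.perf_counter() - start

print(f"TTFT: {ttft * 1000:.0f} ms, completion: {total * 1000:.0f} ms")
```

For a P90 figure like the ones reported below, this measurement would be repeated across many requests and the 90th percentile taken.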
In short, Phonely had outgrown the "off-the-shelf" approach to Large Language Models (LLMs). A solution was needed that delivered speed, precision, and control—without compromising response quality.
The Solution: LoRAs on GroqCloud, Built by Maitai
By adopting Maitai’s platform, Phonely transitioned from closed-source general-purpose models to custom open-source models hosted on GroqCloud and powered by the Groq LPU, Groq’s purpose-built AI inference chip. This advancement is multi-faceted.
The Breakthrough with Groq
Groq worked with Maitai to build multi-LoRA support on Groq’s AI inference infrastructure, enabling a single instance to serve dozens of LoRA adapters and hotswap them at inference with no added latency. This allows efficient, scalable hosting of fine-tuned models and opens new possibilities for enterprises requiring high-performance AI.
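Why is hotswapping essentially free? A LoRA adapter leaves the frozen base weight matrix W untouched and adds a low-rank update scaled by alpha/r, so switching adapters only means switching two small matrices per layer. The toy NumPy illustration below shows the idea; it is a conceptual sketch, not Groq's serving implementation.

```python
# Toy illustration of why LoRA adapters are cheap to hotswap: the frozen
# base weight W stays resident, and each adapter is just a pair of small
# low-rank matrices (A, B). Conceptual sketch, not Groq's implementation.
import numpy as np

d, r = 512, 8  # hidden size and LoRA rank (r << d)
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))  # frozen base weight, shared by all adapters

adapters = {  # per-customer adapters: ~2*d*r parameters each, vs d*d for W
    "customer_a": (rng.standard_normal((r, d)), rng.standard_normal((d, r)) * 0.01),
    "customer_b": (rng.standard_normal((r, d)), rng.standard_normal((d, r)) * 0.01),
}

def forward(x, adapter_name, alpha=16):
    A, B = adapters[adapter_name]  # "hotswap" = selecting a different (A, B) pair
    return W @ x + (alpha / r) * (B @ (A @ x))  # base path plus low-rank update

y = forward(rng.standard_normal(d), "customer_a")
```

Because W never moves, an inference server can keep the base model's weights resident and route each request through whichever adapter it needs, which is what makes serving dozens of fine-tunes on one instance practical.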
Maitai’s Advantage
Custom-built models powered by ultra-fast compute enable unprecedented inference speeds, enhanced accuracy, and the ability to scale efficiently:
TTFT (P90) reduced by 73.4%
Completion time (P90) reduced by 74.6%
Accuracy improved from 81.5% to 99.2% across four model iterations, surpassing GPT-4o by 4.5 percentage points
Seamless scaling from 10 to 30,000 requests/min with no additional perceived latency
The Results: A New Standard for AI Phone Support
Phonely’s AI phone agents now operate in real time, delivering instantaneous, natural responses that improve customer satisfaction. With the ability to leverage their own data, Phonely can now iterate more efficiently and progress toward near-perfect accuracy. With Maitai, Phonely can serve distinct custom models tailored to each enterprise customer, enabling significant enhancements in agent performance through specialized fine-tuning.
Since adopting Maitai’s inference platform, Phonely has observed dramatic gains in responsiveness and model quality, as shown in the table below:

| Checkpoint | Model | TTFT (P90) | Completion Time (P90) | Accuracy |
| --- | --- | --- | --- | --- |
| Legacy | GPT-4o | 661 ms | 1446 ms | 94.7% |
| Switch to Maitai | Maitai m0 | 186 ms | 316 ms | 81.5% |
| 1st Iteration | Maitai m1 | 189 ms | 378 ms | 91.8% |
| 2nd Iteration | Maitai m2 | 176 ms | 342 ms | 95.2% |
| 3rd Iteration | Maitai m3 | 179 ms | 339 ms | 99.2% |

"Through Maitai, our customers are able to get access to custom fine-tuned models running on the fastest infrastructure in a matter of minutes, not months. This has allowed enterprises running on Phonely to scale to tens of thousands of calls per day with lower latency and higher accuracy than any closed-source model." — Will Bodewes, CEO, Phonely
Industry Impact: What This Means for Enterprise
This collaboration illustrates a practical and scalable path for enterprises building with LLMs. Groq’s ability to host fine-tuned models with zero-latency LoRA hotswapping on GroqCloud, coupled with Maitai’s proxy-layer orchestration and iterative model improvement, gives companies a viable way to improve performance and customize output without needing to manage infrastructure or training.
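As a generic sketch of the proxy-layer pattern (not Maitai's documented API), the application keeps an OpenAI-compatible client and points it at the proxy, which resolves a stable model alias to whichever fine-tune currently performs best. The endpoint and alias below are hypothetical.

```python
# Generic sketch of the proxy-layer pattern: the app talks to an
# OpenAI-compatible proxy, which routes the request to the current best
# fine-tuned model. Endpoint and model alias are hypothetical examples.
from openai import OpenAI

client = OpenAI(
    base_url="https://proxy.example.com/v1",  # hypothetical proxy endpoint
    api_key="YOUR_PROXY_KEY",
)

response = client.chat.completions.create(
    model="phone-support",  # alias the proxy resolves to the latest fine-tune
    messages=[{"role": "user", "content": "What are your business hours?"}],
)
print(response.choices[0].message.content)
```

Because the interface stays OpenAI-compatible, swapping in a newly fine-tuned model requires no application changes.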
"This partnership shows how Maitai and Groq are giving enterprises a real edge. By combining our orchestration with zero-latency LoRA hotswapping via GroqCloud, we make it easy to run fine-tuned models at high speed and scale. Our customers get faster, more accurate, continuously improving models, without all the overhead." — Christian Dal Santo, CEO, Maitai
About Phonely
Phonely provides AI-powered phone support agents for industries requiring fast, reliable, and human-like AI interactions. Its AI solutions reduce wait times, improve customer experiences, and enable seamless automated conversations.
Contact: sales@phonely.ai
Visit: phonely.ai
About Maitai
Maitai provides reliable, fast, and optimized LLM inference for enterprise companies. Acting as a proxy and iteratively improving the model stack passively, Maitai enables businesses to build AI agents without all the overhead or complexity.
Contact: sales@trymaitai.ai
Visit: trymaitai.ai
About Groq
Groq is the AI inference platform redefining price performance. Its custom-built LPU and cloud have been specifically designed to run powerful models instantly, reliably, and at the lowest cost per token—without compromise. Over 1.6 million developers and Fortune 500 companies trust Groq to build fast and scale smarter.
Contact: pr-media@groq.com
Build on Groq: console.groq.com