Jun 2, 2025
A case study on how multi-LoRA support on accelerated compute is upgrading conversational AI
Introduction
Phonely has partnered with Maitai and Groq to enhance the speed and accuracy of its AI phone support agents. By leveraging the new GroqCloud™ capability to hotswap Low-Rank Adaptation (LoRA) adapters at inference, Maitai provides model inference that enables Phonely to improve real-time response performance, addressing a key challenge in conversational AI.
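For illustration, here is a minimal sketch of what calling a fine-tuned model on GroqCloud looks like through Groq's Python SDK. The model ID shown is a hypothetical placeholder; actual fine-tuned model IDs are provisioned per account.

```python
# Minimal sketch: calling a fine-tuned (LoRA) model on GroqCloud via
# the Groq Python SDK. The model ID below is a hypothetical placeholder.
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

response = client.chat.completions.create(
    model="my-org/llama-3.1-8b-support-lora",  # hypothetical fine-tuned model ID
    messages=[
        {"role": "system", "content": "You are a phone support agent."},
        {"role": "user", "content": "I'd like to reschedule my appointment."},
    ],
)
print(response.choices[0].message.content)
```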
The Challenge: Breaking Through the Limits of Closed-Source Models
As a company delivering real-time AI phone support, Phonely’s success depends on how quickly and naturally its agents can interact with humans. While large closed-source general-purpose models like GPT-4o offered high-quality outputs, Phonely faced growing limitations in several key areas:
Latency and Performance Bottlenecks: Slow time to first token (TTFT) and long completion times consistently hindered the flow of conversation between AI agents and human callers. Even minor delays disrupted live, spoken dialogue (see the measurement sketch after this list).
Limited Control Over Model Improvements: Phonely relied on its model providers’ release schedules and feature updates, with no ability to incorporate its own data into the model efficiently.
Accuracy Ceiling: While previous closed-source models reached relatively high accuracy, their generalized nature imposed a hard accuracy limit. To continue advancing product quality, Phonely needed to leverage its own data to iteratively improve.
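To make the latency numbers concrete, here is a rough sketch of how TTFT and total completion time can be measured against a streaming endpoint such as GroqCloud; the model ID and prompt are illustrative.

```python
# Rough sketch: measuring time to first token (TTFT) and total
# completion time with a streaming request via the Groq Python SDK.
import time

from groq import Groq

client = Groq()
start = time.perf_counter()
ttft = None

stream = client.chat.completions.create(
    model="llama-3.1-8b-instant",  # illustrative model choice
    messages=[{"role": "user", "content": "Hello, can you hear me?"}],
    stream=True,
)
for chunk in stream:
    if ttft is None and chunk.choices and chunk.choices[0].delta.content:
        ttft = time.perf_counter() - start  # first token has arrived
total = time.perf_counter() - start

print(f"TTFT: {ttft * 1000:.0f} ms, completion: {total * 1000:.0f} ms")
```

For a P90 figure like the ones reported below, this measurement would be repeated across many requests and the 90th percentile taken.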
In short, Phonely had outgrown the "off-the-shelf" approach to Large Language Models (LLMs). A solution was needed that delivered speed, precision, and control—without compromising response quality.
The Solution: LoRAs on GroqCloud, Built by Maitai
By adopting Maitai’s platform, Phonely transitioned from closed-source general-purpose models to custom open-source models hosted on GroqCloud and powered by the Groq LPU, Groq’s purpose-built AI inference chip. This advancement is multi-faceted.
The Breakthrough with Groq
Groq worked with Maitai to build multi-LoRA support on Groq’s AI inference infrastructure, enabling a single instance to serve dozens of LoRA adapters and hotswap them at inference with no added latency. This allows efficient, scalable hosting of fine-tuned models and opens new possibilities for enterprises requiring high-performance AI.
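Why is hotswapping essentially free? A LoRA adapter leaves the frozen base weight matrix W untouched and adds a low-rank update scaled by alpha/r, so switching adapters only means switching two small matrices per layer. The toy NumPy illustration below shows the idea; it is a conceptual sketch, not Groq's serving implementation.

```python
# Toy illustration of why LoRA adapters are cheap to hotswap: the frozen
# base weight W stays resident, and each adapter is just a pair of small
# low-rank matrices (A, B). Conceptual sketch, not Groq's implementation.
import numpy as np

d, r = 512, 8  # hidden size and LoRA rank (r << d)
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))  # frozen base weight, shared by all adapters

adapters = {  # per-customer adapters: ~2*d*r parameters each, vs d*d for W
    "customer_a": (rng.standard_normal((r, d)), rng.standard_normal((d, r)) * 0.01),
    "customer_b": (rng.standard_normal((r, d)), rng.standard_normal((d, r)) * 0.01),
}

def forward(x, adapter_name, alpha=16):
    A, B = adapters[adapter_name]  # "hotswap" = selecting a different (A, B) pair
    return W @ x + (alpha / r) * (B @ (A @ x))  # base path plus low-rank update

y = forward(rng.standard_normal(d), "customer_a")
```

Because W never moves, an inference server can keep the base model's weights resident and route each request through whichever adapter it needs, which is what makes serving dozens of fine-tunes on one instance practical.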
Maitai’s Advantage
Custom-built models powered by ultra-fast compute enable unprecedented inference speeds, enhanced accuracy, and the ability to scale efficiently:
TTFT (P90) reduced by 73.4%
Completion time (P90) reduced by 74.6%
Accuracy improved from 81.5% to 99.2% across four model iterations, surpassing GPT-4o by 4.5 percentage points
Seamless scaling from 10 to 30,000 requests/min with no additional perceived latency
The Results: A New Standard for AI Phone Support
Phonely’s AI phone agents now operate in real time, delivering instantaneous, natural responses that improve customer satisfaction. With the ability to leverage their own data, Phonely can now iterate more efficiently and progress toward near-perfect accuracy. With Maitai, Phonely can serve distinct custom models tailored to each enterprise customer, enabling significant enhancements in agent performance through specialized fine-tuning.
Since adopting Maitai’s inference platform, Phonely has observed dramatic gains in responsiveness and model quality, as shown in the table below:

| Checkpoint | Model | TTFT (P90) | Completion Time (P90) | Accuracy |
| --- | --- | --- | --- | --- |
| Legacy | GPT-4o | 661 ms | 1446 ms | 94.7% |
| Switch to Maitai | Maitai m0 | 186 ms | 316 ms | 81.5% |
| 1st Iteration | Maitai m1 | 189 ms | 378 ms | 91.8% |
| 2nd Iteration | Maitai m2 | 176 ms | 342 ms | 95.2% |
| 3rd Iteration | Maitai m3 | 179 ms | 339 ms | 99.2% |

"Through Maitai, our customers are able to get access to custom fine-tuned models running on the fastest infrastructure in a matter of minutes, not months. This has allowed enterprises running on Phonely to scale to tens of thousands of calls per day with lower latency and higher accuracy than any closed-source model." — Will Bodewes, CEO, Phonely
Industry Impact: What This Means for Enterprise
This collaboration illustrates a practical and scalable path for enterprises building with LLMs. Groq’s ability to host fine-tuned models with zero-latency LoRA hotswapping on GroqCloud, coupled with Maitai’s proxy-layer orchestration and iterative model improvement, gives companies a viable way to improve performance and customize output without needing to manage infrastructure or training.
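As a generic sketch of the proxy-layer pattern (not Maitai's documented API), the application keeps an OpenAI-compatible client and points it at the proxy, which resolves a stable model alias to whichever fine-tune currently performs best. The endpoint and alias below are hypothetical.

```python
# Generic sketch of the proxy-layer pattern: the app talks to an
# OpenAI-compatible proxy, which routes the request to the current best
# fine-tuned model. Endpoint and model alias are hypothetical examples.
from openai import OpenAI

client = OpenAI(
    base_url="https://proxy.example.com/v1",  # hypothetical proxy endpoint
    api_key="YOUR_PROXY_KEY",
)

response = client.chat.completions.create(
    model="phone-support",  # alias the proxy resolves to the latest fine-tune
    messages=[{"role": "user", "content": "What are your business hours?"}],
)
print(response.choices[0].message.content)
```

Because the interface stays OpenAI-compatible, swapping in a newly fine-tuned model requires no application changes.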
"This partnership shows how Maitai and Groq are giving enterprises a real edge. By combining our orchestration with zero-latency LoRA hotswapping via GroqCloud, we make it easy to run fine-tuned models at high speed and scale. Our customers get faster, more accurate, continuously improving models, without all the overhead." — Christian Dal Santo, CEO, Maitai
About Phonely
Phonely provides AI-powered phone support agents for industries requiring fast, reliable, and human-like AI interactions. Its AI solutions reduce wait times, improve customer experiences, and enable seamless automated conversations.
Contact: sales@phonely.ai
Visit: phonely.ai
About Maitai
Maitai provides reliable, fast, and optimized LLM inference for enterprise companies. Acting as a proxy and iteratively improving the model stack passively, Maitai enables businesses to build AI agents without all the overhead or complexity.
Contact: sales@trymaitai.ai
Visit: trymaitai.ai
About Groq
Groq is the AI inference platform redefining price performance. Its custom-built LPU and cloud have been specifically designed to run powerful models instantly, reliably, and at the lowest cost per token—without compromise. Over 1.6 million developers and Fortune 500 companies trust Groq to build fast and scale smarter.
Contact: pr-media@groq.com
Build on Groq: console.groq.com