Voximplant now lets developers build full-cascade voice AI pipelines in VoxEngine without sacrificing turn-taking quality. This release adds five new capabilities: Voice Activity Detection (VAD), end-of-turn detection, an OpenAI Chat Completions connector, an OpenAI Responses connector, and "bring your own LLM" support for third-party LLMs via OpenAI-compatible APIs. Combined with Voximplant's extensive Cloud Communications and Voice AI capabilities, developers get complete control over every stage of a speech-to-speech pipeline while keeping conversations fast and natural.
Voximplant's existing Voice AI connectors bundle speech input handling into a single end-to-end integration. That works well when a provider's built-in models cover your language, dialect, and voice requirements. However, when you need a specialized speech-to-text (STT) engine, a specific Large Language Model (LLM), or a particular text-to-speech (TTS) voice, the ability to assemble your own pipeline, choosing the best STT, LLM, and TTS components independently, becomes critical. With today's release, Voximplant gives you the building blocks to create flexible Voice AI pipelines without sacrificing the conversational quality that makes voice agents feel human.
Highlights
- End-of-turn detection — Understand when a caller has actually finished their thought, even through pauses, filler words like "ahh" and "ummm," and natural speech disfluencies. This prevents the agent from cutting in prematurely and enables rapid back-and-forth without awkward delays or crosstalk.
- Voice Activity Detection (VAD) — Detects when a caller starts and stops speaking so your application knows exactly when to capture audio, route it to an STT engine, or stop recording. Available at no additional cost.
- OpenAI Chat Completions API connector — A native VoxEngine module that connects directly to OpenAI's Chat Completions API. Ideal for developers who already use Chat Completions for text-based bots and want to extend the same LLM configuration to voice without rebuilding their infrastructure.
- OpenAI Responses API connector — A native VoxEngine module for OpenAI's newer Responses API, which supports multi-turn state handling, built-in tools, and WebSocket transport. The WebSocket interface makes it particularly well-suited for long-running, tool-call-heavy voice workflows.
- OpenAI-compatible connectors for third-party LLMs — Because several third-party LLM vendors implement the OpenAI API specification, both the Chat Completions and Responses connectors work with OpenAI-compatible models beyond OpenAI itself — allowing you to “Bring your own LLM” while using a consistent integration pattern.
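To make the "bring your own LLM" idea concrete, here is a minimal framework-free sketch of why OpenAI compatibility works: the Chat Completions request shape stays identical across providers, and only the base URL and model name change. The endpoints shown are real provider URLs, but the helper function and payload below are illustrative, not part of the Voximplant SDK:

```javascript
// Illustrative only: the same OpenAI-style Chat Completions payload works
// against any OpenAI-compatible endpoint; only baseUrl and model differ.
function buildChatRequest(baseUrl, model, userText) {
  return {
    url: `${baseUrl}/chat/completions`,
    body: {
      model,
      messages: [
        {role: "system", content: "You are a concise phone assistant."},
        {role: "user", content: userText},
      ],
      stream: true, // stream chunks so TTS can start before the full reply
    },
  };
}

// Same builder, two providers: only the endpoint and model name change.
const openai = buildChatRequest("https://api.openai.com/v1", "gpt-4o-mini", "Hi");
const groq = buildChatRequest("https://api.groq.com/openai/v1", "llama-3.3-70b-versatile", "Hi");
```

This is the property the connectors rely on: pointing the same client at a different base URL is enough to swap model vendors.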
About Voice AI pipelines
Voximplant's existing Voice AI connectors — for platforms like Grok, Deepgram, Cartesia Line, and others — handle speech input, reasoning, and speech output as a single integrated stream. They are the fastest way to ship a voice agent when the provider's built-in speech and model capabilities match your requirements.
However, many production voice applications need more flexibility. Developers may need a speech-to-text engine that handles a specific dialect or industry vocabulary. You may want to route reasoning through a fine-tuned model that is not available inside an integrated connector. Or you may need a particular TTS voice that matches your brand. A full cascade pipeline lets you choose each component independently: your preferred STT feeds text to your preferred LLM, which feeds text to your preferred TTS — all orchestrated through VoxEngine and connected to phone numbers, SIP trunks, WhatsApp, or WebRTC.
The challenge with cascaded pipelines has always been interactivity. Without integrated speech detection and turn-taking, the result feels robotic — the agent either talks over the caller or waits too long to respond. Today's VAD and end-of-turn detection modules solve this directly inside VoxEngine, so full cascade pipelines can deliver the same natural conversational flow as integrated connectors.
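Conceptually, VAD applies hysteresis: speech "starts" when the per-frame speech probability crosses a threshold, and "ends" only after the probability stays below it for a minimum silence duration, so brief pauses do not split an utterance. Here is a simplified, framework-free sketch of that logic (parameter names mirror the VAD options used in the code example later in this post; the real module runs directly on the call's audio stream inside VoxEngine):

```javascript
// Simplified VAD-style segmenter. Frames are {tMs, prob} speech probabilities.
// Speech starts when prob >= threshold; it ends only after prob stays below
// the threshold for at least minSilenceDurationMs.
function segmentSpeech(frames, {threshold, minSilenceDurationMs, frameMs}) {
  const events = [];
  let speaking = false;
  let silenceMs = 0;
  for (const f of frames) {
    if (f.prob >= threshold) {
      if (!speaking) {
        speaking = true;
        events.push({type: "speech_start", tMs: f.tMs});
      }
      silenceMs = 0; // speech resumed, reset the silence counter
    } else if (speaking) {
      silenceMs += frameMs;
      if (silenceMs >= minSilenceDurationMs) {
        speaking = false;
        events.push({type: "speech_end", tMs: f.tMs});
        silenceMs = 0;
      }
    }
  }
  return events;
}
```

Note how a pause shorter than minSilenceDurationMs never emits a speech_end event, which is exactly what keeps mid-sentence hesitations from being treated as the end of a turn.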
Developer notes
- VAD module — Load with the Silero module. VAD detects voice activity on the audio stream and fires an event once the threshold and minimum silence duration you set are exceeded. You can use this event to trigger actions like starting STT capture or stopping a recording. The interface also includes a speech padding parameter that lets you adjust how aggressively audio is clipped around detected speech.
- Turn detection module — Load with the Pipecat module. Turn detection analyzes speech patterns to determine when a caller has finished speaking and expects a response. It handles variable pauses and speech disfluencies so the agent responds at the right moment. This API currently takes a single threshold parameter. We recommend using the turn-taking helper referenced in the example below, which integrates VAD with the additional timers that production Voice AI applications often need.
- OpenAI Chat Completions client — Load the OpenAI module and create a client via OpenAI.createChatCompletionsClient(). Pass your API key, model, and messages array. The client manages the WebSocket connection and streams completion chunks back to your scenario for TTS playback.
- OpenAI Responses client — Load the OpenAI module and create a client via OpenAI.createResponsesAPIClient(). The Responses API supports multi-turn conversation state, built-in tools, and a persistent WebSocket connection — well suited for agentic workflows that require function calling and extended interactions.
- Third-party OpenAI-compatible models — Both the Chat Completions and Responses clients accept a custom baseUrl parameter, so you can point them at any LLM provider that implements the OpenAI API specification. This gives you a single integration pattern for multiple model providers.
- Combining existing modules — VAD and turn detection work alongside Voximplant's existing STT modules (ASR, Deepgram ASR, etc.) and TTS modules (Cartesia, Inworld, ElevenLabs, etc.). Wire them together in a single VoxEngine scenario to build a complete pipeline.
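To illustrate the kind of timer logic the turn-taking helper layers on top of raw end-of-turn probability, here is a simplified sketch of choosing a submit timeout for a final transcript. This is not the helper's actual implementation; the parameter names mirror the policy options used in the full code example below:

```javascript
// Simplified sketch of a submit-timeout policy: short fragments that might
// continue ("I want to...") get a longer hold, while short complete
// utterances ("hey", "yes") are submitted quickly.
function submitTimeoutMs(transcript, confidence, policy) {
  const trimmed = transcript.trim();
  const words = trimmed.split(/\s+/).filter(Boolean);
  const isShort = trimmed.length <= policy.shortUtteranceMaxChars &&
                  words.length <= policy.shortUtteranceMaxWords;
  if (!isShort) return policy.userSpeechTimeoutMs;
  // Low-confidence short finals are held longer so a correction can replace them.
  if (confidence < policy.lowConfidenceShortUtteranceThreshold) {
    return policy.shortUtteranceExtensionMs;
  }
  return policy.fastShortUtteranceTimeoutMs;
}
```

The point of this shape is that the end-of-turn model's probability alone is not enough in production; the helper also has to reason about transcript length and STT confidence before committing a turn to the LLM.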
Demo
See the demo and code walkthrough video below.
Pricing and availability
All five capabilities are generally available and ready for use inside VoxEngine today.
End-of-turn detection is priced at $0.001 per stream for every 15 seconds of activity (0.4¢ per minute). We expect to halve this price in the near future.
Everything else is free from Voximplant:
Voice Activity Detection (VAD) is completely free, with no limits.
There is also no Voximplant charge for the OpenAI Chat Completions or Responses API clients; as always, text-based communication over our WebSocket gateways is free of charge. You provide your own API key for OpenAI or another LLM provider and are billed by that provider according to your account terms with them.
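For budgeting, the end-of-turn rate works out as follows. This assumes billing rounds up to whole 15-second increments per stream, which is our reading of the rate above rather than a documented guarantee:

```javascript
// End-of-turn detection: $0.001 per stream for every 15 seconds of activity.
// Assumes activity is billed in whole 15-second increments (an assumption,
// not confirmed pricing-page wording).
function endOfTurnCostUsd(activeSeconds) {
  const increments = Math.ceil(activeSeconds / 15);
  return increments * 0.001;
}
```

One minute of activity is four increments, i.e. $0.004, which is where the 0.4¢-per-minute figure comes from; a 10-minute fully active call costs about $0.04.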
Code example
This example includes:
- Full turn-taking controls
- The OpenAI Responses API client
- Use of a third-party LLM vendor (Groq) via OpenAI compatibility
- A full cascaded pipeline using Voximplant's built-in speech recognition (ASR/STT) options and streaming TTS
Load the turn-taking helper code from here into a new scenario. Then create another new scenario with the code below. Make sure the vox-turn-taking scenario is included in your routing rule together with the scenario below.
See the full guide for more details.
/**
* Full-cascade Voice AI demo: Deepgram STT + Groq Llama Responses API + Inworld TTS
* Scenario: answer an incoming call using VoxTurnTaking for turn management.
*
* Include `vox-turn-taking` in the routing rule sequence.
*
* Groq's Responses API is OpenAI-compatible, but it does not currently support
* `previous_response_id`. To keep this example simple, each turn is submitted
* independently instead of rebuilding prior conversation history locally.
*/
require(Modules.ASR);
require(Modules.OpenAI);
require(Modules.Inworld);
require(Modules.ApplicationStorage);
const SYSTEM_PROMPT = `
You are Voxi, a helpful phone assistant for Voximplant. Keep responses short, polite, and telephony-friendly (usually 1-2 sentences).
Reply in English.
`;
VoxEngine.addEventListener(AppEvents.CallAlerting, async ({call}) => {
  let stt;
  let responsesClient;
  let ttsPlayer;
  let turnTaking;

  const terminate = () => {
    stt?.stop();
    responsesClient?.close();
    turnTaking?.close();
    VoxEngine.terminate();
  };

  call.addEventListener(CallEvents.Disconnected, terminate);
  call.addEventListener(CallEvents.Failed, terminate);

  try {
    call.answer();
    call.record({hd_audio: true, stereo: true}); // optional recording

    stt = VoxEngine.createASR({
      profile: ASRProfileList.Deepgram.en_US,
      interimResults: true,
      request: {
        language: "en-US",
        model: "nova-2-phonecall",
        keywords: ["Voximplant:4", "OpenAI:2"],
      },
    });

    responsesClient = await OpenAI.createResponsesAPIClient({
      apiKey: (await ApplicationStorage.get("GROQ_API_KEY")).value,
      baseUrl: "https://api.groq.com/openai/v1",
      storeContext: false,
      onWebSocketClose: (event) => {
        Logger.write("===Groq.WebSocket.Close===");
        if (event) Logger.write(JSON.stringify(event));
        terminate();
      },
    });

    ttsPlayer = Inworld.createRealtimeTTSPlayer({
      createContextParameters: {
        create: {
          voiceId: "Ashley",
          modelId: "inworld-tts-1.5-mini",
          speakingRate: 1.1,
          temperature: 1.3,
        },
      },
    });

    // The VoxTurnTaking module is loaded as part of the routing rule
    turnTaking = await VoxTurnTaking.create({
      call,
      stt,
      vadOptions: {
        threshold: 0.5, // sensitivity for detecting speech vs silence
        minSilenceDurationMs: 350, // silence required before VAD marks speech end
        speechPadMs: 10, // small padding around detected speech
      },
      turnDetectorOptions: {
        threshold: 0.5, // end-of-turn probability needed from Pipecat
      },
      policy: {
        transcriptSettleMs: 500, // grace period for a final STT chunk after end-of-turn
        userSpeechTimeoutMs: 1000, // default fallback submit timeout after speech ends
        shortUtteranceExtensionMs: 1800, // longer hold for fragments that may continue
        fastShortUtteranceTimeoutMs: 700, // faster submit for short complete utterances like "hey"
        shortUtteranceMaxChars: 12, // max chars still treated as a short fragment
        shortUtteranceMaxWords: 2, // max words still treated as a short fragment
        lowConfidenceShortUtteranceThreshold: 0.75, // keep short low-confidence finals replaceable
      },
      enableLogging: true,
      onUserTurn: (input) => { // send the transcript text on end-of-turn
        responsesClient.createResponses({
          model: "llama-3.3-70b-versatile",
          instructions: SYSTEM_PROMPT,
          input,
        });
      },
      onInterrupt: () => {
        ttsPlayer?.clearBuffer(); // stop any in-progress TTS audio
      },
    });

    responsesClient.addEventListener(OpenAI.ResponsesAPIEvents.ResponseTextDelta, (event) => {
      const text = event?.data?.payload?.delta;
      if (!text || !turnTaking.canPlayAgentAudio()) return;
      ttsPlayer.send({send_text: {text}});
    });

    responsesClient.addEventListener(OpenAI.ResponsesAPIEvents.ResponseTextDone, (event) => {
      const text = event?.data?.payload?.text;
      Logger.write(`===AGENT=== ${text}`);
      ttsPlayer.send({flush_context: {}}); // tell TTS to process all buffered text immediately
    });

    // Event logging to illustrate available OpenAI Responses API client events
    [
      OpenAI.ResponsesAPIEvents.ResponseCreated,
      OpenAI.ResponsesAPIEvents.ResponseFailed,
      OpenAI.ResponsesAPIEvents.ResponsesAPIError,
      OpenAI.ResponsesAPIEvents.ResponseInProgress,
      OpenAI.ResponsesAPIEvents.ResponseCompleted,
      OpenAI.ResponsesAPIEvents.ResponseOutputItemAdded,
      OpenAI.ResponsesAPIEvents.ResponseContentPartAdded,
      OpenAI.ResponsesAPIEvents.ConnectorInformation,
      OpenAI.ResponsesAPIEvents.Unknown,
      OpenAI.Events.WebSocketMediaStarted,
      OpenAI.Events.WebSocketMediaEnded,
    ].forEach((eventName) => {
      responsesClient.addEventListener(eventName, (event) => {
        Logger.write(`===${event?.name || eventName}===`);
        if (event?.data) Logger.write(JSON.stringify(event.data));
      });
    });

    // Attach the caller media
    call.sendMediaTo(stt);
    ttsPlayer.sendMediaTo(call);

    // Tell the LLM to talk first and greet the user
    responsesClient.createResponses({
      model: "llama-3.3-70b-versatile",
      instructions: SYSTEM_PROMPT,
      input: "Greet the caller briefly.",
    });
  } catch (error) {
    Logger.write("===UNHANDLED_ERROR===");
    Logger.write(error);
    terminate();
  }
});
References
General Voice AI
- Voximplant Voice AI platform — https://voximplant.ai
- Full-cascade with “bring your own LLM” guide — https://docs.voximplant.ai/voice-ai-connectors/openai/full-cascade-groq
- Pricing information — https://voximplant.com/pricing
- Sign up for Voximplant — https://manage.voximplant.com/auth/sign_up
OpenAI
- OpenAI product page — https://voximplant.com/products/openai-client
- Chat Completions API Client Guide — https://voximplant.com/docs/voice-ai/openai/chat-completions-client
- Responses API Client Guide — https://voximplant.com/docs/voice-ai/openai/responses-client
- OpenAI module API reference — https://voximplant.com/docs/references/voxengine/openai
VAD and Turn Detection
- VAD and Turn Detection product page — https://voximplant.com/products/turn-detection
- VAD and Turn Guides — https://docs.voximplant.ai/capabilities/speech-flow-control/
- Silero Module (VAD) API reference — https://voximplant.com/docs/references/voxengine/silero
- Pipecat Module (Turn detection) API reference — https://voximplant.com/docs/references/voxengine/pipecat