
Resources

Everything you need to build amazing speech-enabled applications with SpeechCortex. Choose between real-time streaming or batch processing for post-call analysis.

Voice AI Engine

COVE - Conversational Voice Engine for next-generation Voice AI applications

Feature Overview

COVE (Conversational Voice Engine) is purpose-built for Voice AI applications, providing intelligent turn detection and seamless integration with voice assistants and conversational AI systems.

Intelligent Turn Detection

Context-aware end-of-turn detection

Ultra-Low Latency

Sub-200ms response times

Integrated Streaming STT

Built-in speech-to-text pipeline

COVE Engine

The Conversational Voice Engine (COVE) is the core of our Voice AI stack, providing intelligent speech processing optimized for conversational applications.

🎯 Purpose-Built for Voice AI

Unlike general-purpose STT, COVE is specifically designed for interactive voice applications where natural conversation flow is critical.

🔄 Full Duplex Communication

Supports simultaneous listening and speaking, enabling natural conversational interactions without artificial turn-taking constraints.

🧠 Context-Aware Processing

Leverages conversation context to improve accuracy and make intelligent decisions about turn boundaries.

End of Turn Detection

Intelligent detection of when a speaker has finished their turn, enabling natural conversation flow.

⏱️ Adaptive Timing

Dynamically adjusts silence thresholds based on conversation context, avoiding premature cutoffs during natural pauses.

📝 Linguistic Analysis

Analyzes sentence structure and semantics to determine completion, not just silence detection.

🎭 Prosodic Features

Uses pitch, intonation, and rhythm patterns to identify natural turn boundaries with high accuracy.
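The adaptive-timing idea above can be pictured as a silence threshold that shrinks when the speech so far already looks complete. The sketch below is purely illustrative; the heuristic, word lists, and threshold values are assumptions for the example, not COVE's actual model:

```python
# Illustrative sketch of adaptive end-of-turn timing (not COVE's real model):
# wait less after speech that looks syntactically complete, longer otherwise.

COMPLETE_ENDINGS = (".", "?", "!")
TRAILING_WORDS = {"and", "but", "so", "because", "um", "uh"}

def silence_threshold_ms(partial_transcript: str) -> int:
    """Pick a silence threshold based on how finished the utterance looks."""
    text = partial_transcript.strip().lower()
    if not text:
        return 1500                      # no speech yet: be patient
    if text.endswith(COMPLETE_ENDINGS):
        return 300                       # sentence looks done: cut quickly
    last_word = text.rstrip(".?!,").rsplit(maxsplit=1)[-1]
    if last_word in TRAILING_WORDS:
        return 1200                      # mid-thought connective: keep waiting
    return 700                           # default middle ground

print(silence_threshold_ms("Could you check my balance?"))  # 300
print(silence_threshold_ms("I was calling because, um"))    # 1200
```

A production detector would combine this kind of linguistic signal with the prosodic features described above rather than rely on text alone.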

Start of Turn Detection

Rapid detection of when a user begins speaking, enabling responsive Voice AI interactions.

⚡ Instant Detection

Sub-100ms detection of speech onset, enabling immediate system response and natural conversation pacing.

🔇 Noise Filtering

Distinguishes actual speech from background noise, coughs, and non-speech sounds to prevent false triggers.

🎤 Barge-In Support

Enables users to interrupt system speech naturally, just like in human conversations.

Getting Started

Integrate COVE into your Voice AI application in minutes.

1. Connect via WebSocket

Establish a connection to the COVE endpoint.

wss://api.speechcortex.ai/v1/cove?api_key=YOUR_API_KEY

2. Configure COVE Settings

Enable turn detection features.

{ "type": "config", "end_of_turn": true, "start_of_turn": true }

3. Stream Audio & Receive Events

Send audio and receive transcription + turn events.
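The message-handling side of these steps can be sketched in Python. The config fields and event names mirror the examples on this page; the helper names themselves are illustrative, not part of the official SDK:

```python
import json

# Sketch of the COVE handshake and event routing from the steps above.

def build_config(end_of_turn: bool = True, start_of_turn: bool = True) -> str:
    """Step 2: the JSON config message sent right after connecting."""
    return json.dumps({
        "type": "config",
        "end_of_turn": end_of_turn,
        "start_of_turn": start_of_turn,
    })

def handle_event(raw: str) -> str:
    """Step 3: route an incoming COVE event to application logic."""
    event = json.loads(raw)
    kind = event.get("type")
    if kind == "turn.start":
        return "stop TTS playback"        # barge-in: user interrupted
    if kind == "turn.end":
        return f"respond to: {event.get('transcript', '')}"
    if kind in ("transcript.partial", "transcript.final"):
        return f"caption: {event.get('text', '')}"
    return "ignore"

print(handle_event('{"type": "turn.end", "transcript": "book a table"}'))
# respond to: book a table
```

In a real client these helpers would sit inside your WebSocket library's send/receive loop.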

WebSocket API

Full reference for the COVE WebSocket API.

Connection Endpoint
Connect to COVE for Voice AI processing.
WSS /v1/cove
Events
transcript.partial: Interim transcription
transcript.final: Final transcription
turn.start: User started speaking
turn.end: User finished speaking

SDKs & Libraries

Official SDKs with COVE integration for Voice AI applications.

Python

v2.1.0

JavaScript

v1.8.0

React Native

v1.2.0

Swift (iOS)

v1.4.0

Kotlin (Android)

v1.3.0

Go

v1.5.0

Code Samples

Python - COVE Voice AI Integration
import speechcortex

client = speechcortex.Client(api_key="YOUR_API_KEY")

def on_turn_end(event):
    print(f"User finished: {event.transcript}")
    # Trigger your AI response here

def on_turn_start(event):
    print("User started speaking")
    # Stop any ongoing TTS playback

# Start COVE session
session = client.cove.start(
    on_turn_start=on_turn_start,
    on_turn_end=on_turn_end
)

session.stream_microphone()
JavaScript - Voice AI Agent
import { SpeechCortex } from '@speechcortex/sdk';

const client = new SpeechCortex({ apiKey: 'YOUR_API_KEY' });

const session = await client.cove.start({
  onTurnStart: () => {
    // User interrupted - stop AI speech
    ttsPlayer.stop();
  },
  onTurnEnd: async (event) => {
    // User finished - process with your LLM
    const response = await llm.generate(event.transcript);
    ttsPlayer.speak(response);
  }
});

await session.startMicrophone();

Use Cases

Voice AI Agents

Build natural conversational AI assistants with intelligent turn-taking and barge-in support.

Contact Center AI

Deploy AI agents that can handle customer calls with human-like conversation flow.

Voice-First Applications

Create hands-free applications with responsive voice interaction for IoT and automotive.

Interactive IVR

Replace rigid menu systems with natural language voice interfaces.

Streaming - STT

Real-time streaming speech recognition for live applications

Feature Overview

Stream audio in real-time and receive transcriptions with ultra-low latency. Perfect for voice assistants, live captioning, and interactive applications.

~200ms Latency

Near real-time transcription

WebSocket API

Persistent bi-directional connection

Interim Results

Get partial transcripts as you speak

Media Settings

Configure audio input parameters for optimal streaming transcription quality.

Sample Rate

Supported sample rates for audio input.

8000 Hz, 16000 Hz, 22050 Hz, 44100 Hz, 48000 Hz

Audio Encoding

Supported audio encoding formats.

PCM (Linear16), μ-law, A-law

Channels

Mono (1 channel) or Stereo (2 channels) audio input supported.
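Sample rate, encoding, and channel count together determine how many bytes each audio chunk carries. A quick helper makes the arithmetic concrete (the sample sizes assume 16-bit PCM at 2 bytes per sample and μ-law/A-law at 1; the encoding key names here are illustrative):

```python
# Bytes per sample for common encodings (key names are illustrative).
BYTES_PER_SAMPLE = {"pcm_s16le": 2, "mulaw": 1, "alaw": 1}

def chunk_bytes(sample_rate: int, encoding: str, channels: int, ms: int) -> int:
    """Size in bytes of one audio chunk of the given duration."""
    samples = sample_rate * ms // 1000
    return samples * BYTES_PER_SAMPLE[encoding] * channels

# 100 ms of 16 kHz mono 16-bit PCM:
print(chunk_bytes(16000, "pcm_s16le", 1, 100))  # 3200
# 100 ms of 8 kHz mono mu-law (typical telephony):
print(chunk_bytes(8000, "mulaw", 1, 100))       # 800
```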

Results

Understanding the transcription results returned by the streaming API.

⏳ Interim & Final

Receive both interim (partial) and final transcription results. Interim results update in real-time as speech is processed, while final results are confirmed and won't change.

🎙️ Speech Final

Indicates when a complete utterance has been recognized. Triggered when natural speech boundaries are detected, such as pauses or sentence endings.

📂 Finalize

Force finalization of the current transcript segment. Useful for ending a session or when you need immediate final results without waiting for natural speech boundaries.

🕒 Word Timing

Precise start and end timestamps for each word in the transcript. Enables accurate audio-text alignment for captions, highlights, and playback synchronization.

💡 Word Confidence

Individual confidence scores (0.0 to 1.0) for each recognized word. Helps identify uncertain transcriptions and enables quality-based filtering or highlighting.
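Word timing and confidence combine naturally: the snippet below flags low-confidence words for review while keeping the full line for display. The word-list shape (`word`, `start`, `end`, `confidence`) and the threshold are assumptions for the example:

```python
LOW_CONFIDENCE = 0.80  # review threshold; tune for your application

def review_words(words):
    """Return (display line, words worth double-checking)."""
    line = " ".join(w["word"] for w in words)
    flagged = [w["word"] for w in words if w["confidence"] < LOW_CONFIDENCE]
    return line, flagged

words = [
    {"word": "refund", "start": 0.00, "end": 0.42, "confidence": 0.97},
    {"word": "order",  "start": 0.45, "end": 0.80, "confidence": 0.91},
    {"word": "8216",   "start": 0.83, "end": 1.40, "confidence": 0.61},
]
line, flagged = review_words(words)
print(line)     # refund order 8216
print(flagged)  # ['8216']
```

The same per-word timestamps also drive caption alignment and playback highlighting.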

Controls

Control messages to manage the streaming session.

🧩 Variable Chunk

Send audio in variable-sized chunks based on your application's needs. Supports flexible chunk sizes for optimal latency and throughput balance.

♻️ Keep Alive

Maintain the WebSocket connection during periods of silence or inactivity. Prevents timeout disconnections and ensures seamless resumption of audio streaming.

🔚 End Pointing

Automatic detection of speech endpoints to determine when a speaker has finished talking. Enables natural conversation flow and timely transcript finalization.

Format

Output formatting options for transcription results.

✒️ Punctuations

Automatically add punctuation marks (periods, commas, question marks) to the transcript for improved readability and natural text flow.

🗯️ Filler Words

Control whether filler words (um, uh, like, you know) are included or filtered out from the transcript. Useful for verbatim transcription or cleaner output.
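If you keep fillers in the transcript for verbatim records but want a cleaner string for display, a small client-side filter is easy to sketch. The filler list below is an assumption; note that multi-word fillers like "you know" would need phrase matching rather than the word-by-word pass shown here:

```python
FILLERS = {"um", "uh", "like"}  # assumed list; extend as needed

def strip_fillers(text: str) -> str:
    """Remove standalone filler words, keeping everything else verbatim."""
    kept = [w for w in text.split() if w.lower().strip(",.") not in FILLERS]
    return " ".join(kept)

print(strip_fillers("Um, I was like calling about, uh, my invoice"))
# I was calling about, my invoice
```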

Getting Started

Set up real-time streaming transcription in minutes.

1. Establish WebSocket Connection

Connect to our streaming endpoint with your API key.

wss://api.speechcortex.ai/v1/stream?api_key=YOUR_API_KEY

2. Configure Audio Settings

Send a configuration message with sample rate and encoding.

{ "type": "config", "sample_rate": 16000, "encoding": "pcm_s16le" }

3. Stream Audio Data

Send binary audio chunks and receive transcription events.
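Step 3 amounts to slicing raw audio into binary frames. A minimal sketch, with the chunk size chosen for roughly 100 ms of the 16 kHz 16-bit mono audio configured in step 2:

```python
def iter_chunks(pcm: bytes, chunk_size: int = 3200):
    """Yield fixed-size binary chunks ready to send over the WebSocket."""
    for offset in range(0, len(pcm), chunk_size):
        yield pcm[offset:offset + chunk_size]

# One second of silence at 16 kHz, 16-bit mono = 32000 bytes -> 10 chunks.
silence = bytes(32000)
chunks = list(iter_chunks(silence))
print(len(chunks), len(chunks[0]))  # 10 3200
```

Since variable chunk sizes are supported, the fixed size here is just a convenient latency/throughput trade-off, not a requirement.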

WebSocket API

Full reference for the streaming WebSocket API.

Connection Endpoint
Establish a persistent WebSocket connection for streaming.
WSS /v1/stream
Events
transcript.partial: Interim transcription results
transcript.final: Final transcription segment
vad.speech_end: End of speech detected

SDKs & Libraries

Official SDKs with built-in WebSocket handling and audio capture.

Python

v2.1.0

JavaScript

v1.8.0

React Native

v1.2.0

Swift (iOS)

v1.4.0

Kotlin (Android)

v1.3.0

Go

v1.5.0

Use Cases

Voice Assistants

Build conversational AI with instant speech recognition and natural turn-taking.

Live Captioning

Real-time captions for video calls, broadcasts, and live events.

Voice Commands

Enable hands-free control in apps, games, and IoT devices.

Live Agent Assist

Provide real-time suggestions to customer service agents during calls.

Code Samples

JavaScript - Browser Streaming
import { SpeechCortex } from '@speechcortex/sdk';

const client = new SpeechCortex({ apiKey: 'YOUR_API_KEY' });

// Start streaming from microphone
const stream = await client.startStreaming({
  onPartial: (text) => console.log('Partial:', text),
  onFinal: (text) => console.log('Final:', text),
  onError: (err) => console.error(err)
});

// Later: stop streaming
stream.stop();
Python - Real-time Streaming
import speechcortex

client = speechcortex.Client(api_key="YOUR_API_KEY")

def on_transcript(event):
    if event.is_final:
        print(f"Final: {event.text}")
    else:
        print(f"Partial: {event.text}")

# Stream from microphone
client.stream_microphone(on_transcript=on_transcript)

Batch - STT

Batch speech-to-text processing for recorded audio files

Feature Overview

Process recorded audio files with high accuracy. Ideal for call center analytics, meeting transcription, and content archival with speaker diarization and advanced features.

High Accuracy

<5% Word Error Rate

Speaker Diarization

Identify who said what

Batch Processing

Process thousands of files

Media Settings

Supported audio formats and configuration options for batch transcription.

Supported File Formats

Upload audio in any of these formats.

WAV, MP3, M4A, FLAC, OGG, WebM, AAC, WMA

File Size Limits

Maximum file sizes for different tiers.

Standard: 500 MB
Enterprise: 2 GB

Audio Duration

Maximum audio duration of 4 hours per file. Longer recordings can be split automatically.
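The limits above can be checked client-side before uploading. A sketch, with the tier limits, format list, and four-hour cap taken from this section (the duration value must come from your own audio probe; the helper itself is illustrative):

```python
TIER_LIMIT_BYTES = {"standard": 500 * 1024**2, "enterprise": 2 * 1024**3}
MAX_DURATION_S = 4 * 60 * 60  # 4 hours per file
FORMATS = {".wav", ".mp3", ".m4a", ".flac", ".ogg", ".webm", ".aac", ".wma"}

def validate_upload(filename: str, size_bytes: int, duration_s: float,
                    tier: str = "standard") -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in FORMATS:
        problems.append(f"unsupported format: {ext or 'none'}")
    if size_bytes > TIER_LIMIT_BYTES[tier]:
        problems.append(f"file exceeds {tier} tier size limit")
    if duration_s > MAX_DURATION_S:
        problems.append("over 4 hours; rely on automatic splitting or pre-split")
    return problems

print(validate_upload("call.wav", 10 * 1024**2, 1800))   # []
print(validate_upload("call.xyz", 600 * 1024**2, 1800))  # two problems
```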

Results

Understanding the transcription output from batch processing.

Full Transcript

Complete text transcription with punctuation and formatting.

Word-Level Timestamps

Precise start and end times for each word in the transcript.

Speaker Labels

When diarization is enabled, each segment is labeled with the speaker identifier.

Confidence Scores

Overall and word-level confidence scores for quality assessment.

Controls

Options to control transcription behavior and output.

Speaker Diarization

diarization: true

Identify and separate different speakers in the audio.

Punctuation

punctuation: true

Automatically add punctuation to the transcript.

Profanity Filter

profanity_filter: true

Mask profane words in the transcript output.

Custom Vocabulary

custom_vocabulary: [...]

Boost recognition of domain-specific terms and names.

Format

API request and response formats for batch transcription.

Request Body
{
  "audio_url": "https://storage.example.com/recording.wav",
  "language": "en-US",
  "diarization": true,
  "punctuation": true,
  "webhook_url": "https://your-app.com/webhook"
}
Response Format
{
  "id": "tx_abc123",
  "status": "completed",
  "text": "Hello, how can I help you today?",
  "confidence": 0.96,
  "duration": 45.2,
  "segments": [
    {
      "speaker": "Speaker 1",
      "text": "Hello, how can I help you today?",
      "start": 0.0,
      "end": 2.5
    }
  ]
}
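Given the response format above, grouping segments by speaker takes only a few lines. The `segments` shape matches the example response; the grouping helper itself is illustrative, not an SDK function:

```python
import json

# The example response from this section, parsed as a client would receive it.
response = json.loads("""{
  "id": "tx_abc123",
  "status": "completed",
  "text": "Hello, how can I help you today?",
  "confidence": 0.96,
  "duration": 45.2,
  "segments": [
    {"speaker": "Speaker 1", "text": "Hello, how can I help you today?",
     "start": 0.0, "end": 2.5}
  ]
}""")

def by_speaker(segments):
    """Collect each speaker's lines in order of appearance."""
    grouped = {}
    for seg in segments:
        grouped.setdefault(seg["speaker"], []).append(seg["text"])
    return grouped

print(by_speaker(response["segments"]))
# {'Speaker 1': ['Hello, how can I help you today?']}
```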

Getting Started

Transcribe your first audio file in minutes.

1. Get Your API Key

Sign up and generate an API key from your dashboard.

2. Upload Audio File

Send your audio file to the transcription endpoint.

curl -X POST https://api.speechcortex.ai/v1/transcribe \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "audio=@call_recording.wav"

3. Get Transcription Results

Receive structured JSON with text, timestamps, and speaker labels.

REST API

Simple REST endpoints for audio transcription.

Transcribe Audio
Upload and transcribe an audio file.
POST /v1/transcribe
Get Transcription
Retrieve transcription by ID.
GET /v1/transcriptions/:id
Speaker Diarization
Identify and label different speakers.
POST /v1/diarize
Language Detection
Automatically detect spoken language.
POST /v1/detect-language

Batch Processing

Process large volumes of audio files efficiently with our batch API.

How Batch Processing Works

  1. Submit a batch job with multiple audio file URLs
  2. Our system queues and processes files in parallel
  3. Receive webhook notifications as transcriptions complete
  4. Retrieve all results via the batch status endpoint
Create Batch Job
curl -X POST https://api.speechcortex.ai/v1/batch \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [
      "https://storage.example.com/call1.wav",
      "https://storage.example.com/call2.wav"
    ],
    "webhook_url": "https://your-app.com/webhook",
    "options": {
      "diarization": true,
      "punctuation": true
    }
  }'
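The same request body as the curl example, assembled in Python. This is pure payload construction; the actual POST depends on whichever HTTP client you use, and the helper name is illustrative:

```python
import json

def build_batch_job(file_urls, webhook_url, diarization=True, punctuation=True):
    """Build the JSON body for POST /v1/batch, matching the curl example."""
    return json.dumps({
        "files": list(file_urls),
        "webhook_url": webhook_url,
        "options": {"diarization": diarization, "punctuation": punctuation},
    })

body = build_batch_job(
    ["https://storage.example.com/call1.wav",
     "https://storage.example.com/call2.wav"],
    "https://your-app.com/webhook",
)
print(json.loads(body)["options"])  # {'diarization': True, 'punctuation': True}
```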

Use Cases

Contact Center Analytics

Analyze call recordings to improve agent performance and customer satisfaction.

Meeting Transcription

Convert recorded meetings into searchable, shareable transcripts.

Compliance & Quality

Ensure regulatory compliance with complete call documentation.

Content Archival

Make audio and video archives searchable and accessible.

Code Samples

Python - Transcribe File with Diarization
import speechcortex

client = speechcortex.Client(api_key="YOUR_API_KEY")

# Transcribe with speaker diarization
result = client.transcribe(
    "call_recording.wav",
    diarization=True,
    punctuation=True
)

for segment in result.segments:
    print(f"Speaker {segment.speaker}: {segment.text}")
    
# Output:
# Speaker 1: Hello, how can I help you today?
# Speaker 2: I'd like to check my account balance.
Node.js - Async Transcription
import { SpeechCortex } from '@speechcortex/sdk';

const client = new SpeechCortex({ apiKey: 'YOUR_API_KEY' });

// Submit for async processing
const job = await client.transcribeAsync({
  audioUrl: 'https://storage.example.com/recording.wav',
  webhookUrl: 'https://your-app.com/webhook'
});

console.log('Job ID:', job.id);
// Results delivered via webhook when ready

Ready to Get Started?

Sign up for free and start building with SpeechCortex today. No credit card required.