Connection Flow
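After obtaining a session token, connect to the WebSocket endpoint and follow this sequence:

1. Open the WebSocket connection.
2. Send your session token as the first message, within 10 seconds of connecting.
3. Wait for the `connected` message, then start streaming microphone audio as binary frames.
4. Wait for the `agent_ready` message. The full-duplex voice conversation is now active, and agent audio arrives as binary frames.
5. Handle `session_ended` and `error` messages. When the session ends, clean up resources.

Authentication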
The first message you send over the WebSocket must be a JSON string containing your session token. You have 10 seconds to send this message before the connection is closed.

Server Messages
The server sends two types of messages: text (JSON control messages) and binary (PCM audio frames from the AI agent).

| Message Type | Format | Description |
|---|---|---|
| `connected` | JSON text | Authentication succeeded. You can start sending microphone audio. |
| `agent_ready` | JSON text | The AI agent is listening. Full-duplex voice conversation is active. |
| `session_ended` | JSON text | The session has ended. Includes a `reason` field. Clean up resources. |
| `error` | JSON text | An error occurred. Includes `code` and `message` fields. |
| (binary) | Raw bytes | PCM audio frame from the AI agent. Play it back to the user. |
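The sketch below shows one way a TypeScript client might build the authentication message and sort incoming messages into the types in the table. It is not an official client: the JSON field names `token` (in the auth message) and `type` (the control-message discriminator) are assumptions, because these docs do not show the exact message shapes, and `authMessage`, `parseServerMessage`, and `ServerEvent` are hypothetical names.

```typescript
// Sketch only. Assumed (not confirmed by these docs): the auth message uses a
// `token` field, and control messages carry their type in a `type` field.

type ServerEvent =
  | { kind: "connected" }
  | { kind: "agent_ready" }
  | { kind: "session_ended"; reason: unknown }
  | { kind: "error"; code: unknown; message: unknown }
  | { kind: "audio"; frame: ArrayBuffer }
  | { kind: "unknown"; raw: string };

// Build the first message to send once the socket opens. The server closes
// the connection if this message does not arrive within 10 seconds.
function authMessage(sessionToken: string): string {
  return JSON.stringify({ token: sessionToken }); // `token` is a hypothetical field name
}

// Classify one incoming WebSocket message. Text messages are JSON control
// messages; binary messages are PCM audio frames from the agent.
// Assumes ws.binaryType = "arraybuffer", so binary data arrives as ArrayBuffer.
function parseServerMessage(data: string | ArrayBuffer): ServerEvent {
  if (typeof data !== "string") {
    return { kind: "audio", frame: data };
  }
  const parsed: unknown = JSON.parse(data);
  if (typeof parsed !== "object" || parsed === null) {
    return { kind: "unknown", raw: data };
  }
  const msg = parsed as Record<string, unknown>;
  switch (msg["type"]) { // `type` is an assumed discriminator field
    case "connected":
      return { kind: "connected" };
    case "agent_ready":
      return { kind: "agent_ready" };
    case "session_ended":
      return { kind: "session_ended", reason: msg["reason"] };
    case "error":
      return { kind: "error", code: msg["code"], message: msg["message"] };
    default:
      return { kind: "unknown", raw: data };
  }
}
```

In a browser, set `ws.binaryType = "arraybuffer"` so binary frames arrive as `ArrayBuffer` rather than `Blob`, call `ws.send(authMessage(token))` from the `open` handler so the token goes out well inside the 10-second window, and pass each `event.data` from the `message` handler to `parseServerMessage`.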
Message Formats
Audio Format
Audio is streamed as raw PCM in both directions, using the same format for microphone input and agent output.

| Property | Value |
|---|---|
| Encoding | PCM signed 16-bit little-endian (s16le) |
| Sample Rate | 16,000 Hz |
| Channels | 1 (mono) |
| Frame Duration | 20 ms |
| Samples per Frame | 320 |
| Bytes per Frame | 640 |
Send microphone audio as binary messages of 640 bytes (one 20 ms frame: 16,000 Hz × 20 ms = 320 samples × 2 bytes). Agent audio arrives in the same format.
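A minimal sketch of converting between Web Audio style `Float32Array` samples (range -1 to 1) and the s16le frames described above. It assumes the microphone audio has already been captured or resampled to 16 kHz mono; resampling is not shown. `encodeFrame`, `decodeFrame`, and `FrameChunker` are hypothetical helpers. The chunker is there because capture APIs rarely deliver exactly 320 samples at a time; an AudioWorklet, for example, processes audio in blocks of 128 samples.

```typescript
// Audio format constants from the table above.
const SAMPLE_RATE = 16_000; // Hz, mono
const FRAME_MS = 20;
const SAMPLES_PER_FRAME = (SAMPLE_RATE * FRAME_MS) / 1000; // 320
const BYTES_PER_FRAME = SAMPLES_PER_FRAME * 2; // 640: 2 bytes per 16-bit sample

// Encode one 20 ms frame of Float32 samples in [-1, 1] as s16le bytes.
function encodeFrame(samples: Float32Array): ArrayBuffer {
  if (samples.length !== SAMPLES_PER_FRAME) {
    throw new Error(`expected ${SAMPLES_PER_FRAME} samples, got ${samples.length}`);
  }
  const buffer = new ArrayBuffer(BYTES_PER_FRAME);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp out-of-range input
    const value = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
    view.setInt16(i * 2, value, true); // true = little-endian
  }
  return buffer;
}

// Decode an s16le frame received from the agent into Float32 samples for playback.
function decodeFrame(frame: ArrayBuffer): Float32Array {
  const view = new DataView(frame);
  const out = new Float32Array(Math.floor(frame.byteLength / 2));
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 0x8000;
  }
  return out;
}

// Buffers arbitrarily sized capture chunks and emits complete encoded frames.
class FrameChunker {
  private pending: number[] = [];

  push(chunk: Float32Array): ArrayBuffer[] {
    for (let i = 0; i < chunk.length; i++) {
      this.pending.push(chunk[i]);
    }
    const frames: ArrayBuffer[] = [];
    while (this.pending.length >= SAMPLES_PER_FRAME) {
      const samples = new Float32Array(this.pending.splice(0, SAMPLES_PER_FRAME));
      frames.push(encodeFrame(samples));
    }
    return frames;
  }
}
```

Each `ArrayBuffer` returned by `FrameChunker.push` is one 640-byte frame that can be sent as a single binary WebSocket message. Leftover samples stay buffered until the next chunk completes a frame.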
Session Lifecycle
| State | Description |
|---|---|
| `idle` | No active session. Ready to connect. |
| `connecting` | Fetching the token and opening the WebSocket. |
| `connected` | WebSocket authenticated. You can start sending audio. |
| `agent_ready` | The AI agent is listening. Full-duplex conversation is active. |
| `ended` | Session terminated. Clean up resources. |
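One way to track these states on the client is a small state machine. The allowed transitions below are inferred from the table and the server messages rather than specified by these docs: in particular, `connecting -> ended` (for example, when the socket closes before authentication completes) and `ended -> idle` (after cleanup) are assumptions. `error` messages are deliberately left unmapped, because the docs do not say whether an error always ends the session. `SessionLifecycle` is a hypothetical name.

```typescript
type SessionState = "idle" | "connecting" | "connected" | "agent_ready" | "ended";

// Allowed transitions, inferred from the lifecycle table (an assumption;
// the docs list the states but not every edge between them).
const TRANSITIONS: Record<SessionState, readonly SessionState[]> = {
  idle: ["connecting"],
  connecting: ["connected", "ended"], // e.g. socket closed before authentication
  connected: ["agent_ready", "ended"],
  agent_ready: ["ended"],
  ended: ["idle"], // assumed: return to idle after cleanup
};

class SessionLifecycle {
  private current: SessionState = "idle";

  get state(): SessionState {
    return this.current;
  }

  // Move to `next`, rejecting transitions not listed in TRANSITIONS.
  transition(next: SessionState): void {
    if (TRANSITIONS[this.current].indexOf(next) === -1) {
      throw new Error(`invalid transition: ${this.current} -> ${next}`);
    }
    this.current = next;
  }

  // Advance the lifecycle from a server control message type.
  // `error` is not mapped: the docs do not say whether it always ends the session.
  onServerMessage(type: string): void {
    switch (type) {
      case "connected":
        this.transition("connected");
        break;
      case "agent_ready":
        this.transition("agent_ready");
        break;
      case "session_ended":
        this.transition("ended");
        break;
    }
  }
}
```

The client calls `transition("connecting")` itself when it starts fetching the token, then feeds the type of each JSON control message to `onServerMessage`; rejecting unexpected transitions makes out-of-order messages visible instead of silently corrupting UI state.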