Connection Flow
After obtaining a session token, connect to the WebSocket endpoint and follow this sequence:
Client                            Server
|                                      |
| 1. Open WebSocket connection         |
|------------------------------------>>|
|                                      |
| 2. Send: { "token": "<jwt>" }        |
|------------------------------------>>|
|                                      |
| 3. Receive: { "type": "connected" }  |
|<<------------------------------------|
|                                      |
| 4. Start sending mic audio (PCM)     |
|====================================>>|
|                                      |
| 5. Receive: { "type": "agent_ready" }|
|<<------------------------------------|
|                                      |
| 6. Bidirectional PCM audio           |
|<<==================================>>|
|                                      |
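The sequence above can be sketched as a single coroutine. This is a minimal illustration rather than an official client: `ws` is a hypothetical connection object exposing async `send()` and `recv()`, and `run_session`, `mic_frames`, and `play` are names invented for this example.

```python
import asyncio
import json


async def run_session(ws, token, mic_frames, play):
    """Walk steps 2-6 of the connection flow and return the end reason.

    ws:         hypothetical object with async send(data) and recv()
    mic_frames: async iterator yielding PCM frames as bytes
    play:       callable that receives each agent audio frame
    """
    # Step 2: the first message carries the session token.
    await ws.send(json.dumps({"token": token}))

    # Step 3: wait for the server to confirm authentication.
    reply = json.loads(await ws.recv())
    if reply.get("type") != "connected":
        raise ConnectionError(f"authentication not confirmed: {reply}")

    # Step 4: start streaming microphone audio in the background.
    async def pump_mic():
        async for frame in mic_frames:
            await ws.send(frame)

    mic_task = asyncio.create_task(pump_mic())
    try:
        while True:
            message = await ws.recv()
            if isinstance(message, bytes):
                play(message)  # Step 6: agent audio frame
                continue
            event = json.loads(message)
            if event.get("type") == "session_ended":
                return event.get("reason")
            if event.get("type") == "error":
                raise RuntimeError(f'{event.get("code")}: {event.get("message")}')
            # Step 5: "agent_ready" needs no action here; keep streaming.
    finally:
        # Stop the microphone pump whether the session ended or failed.
        mic_task.cancel()
        await asyncio.gather(mic_task, return_exceptions=True)
```

Passing the connection in as a parameter keeps the sketch independent of any particular WebSocket library; any object with matching `send()` and `recv()` coroutines would do.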
Authentication
The first message you send over the WebSocket must be a JSON string containing your session token. You have 10 seconds to send this message before the connection is closed.
{ "token": "<your_session_token>" }
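A small sketch of the authentication step follows. It assumes the 10-second window is measured from the moment the socket opens, which the text does not state explicitly; `build_auth_message` and `seconds_left_to_authenticate` are hypothetical helper names.

```python
import json
import time

AUTH_DEADLINE_SECONDS = 10.0  # limit stated in the text


def build_auth_message(token):
    """Serialize the first message: a JSON object with a single token field."""
    if not token:
        raise ValueError("a session token is required")
    return json.dumps({"token": token})


def seconds_left_to_authenticate(opened_at, now=None):
    """Time remaining before the server is expected to drop the connection.

    opened_at and now are time.monotonic() readings. Starting the window
    at socket open is an assumption of this sketch.
    """
    if now is None:
        now = time.monotonic()
    return max(0.0, AUTH_DEADLINE_SECONDS - (now - opened_at))
```

In practice the simplest way to stay inside the window is to fetch the token before opening the socket and send it as soon as the connection is established.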
Server Messages
The server sends two types of messages: text (JSON control messages) and binary (PCM audio frames from the AI agent).
| Message Type | Format | Description |
|---|---|---|
| connected | JSON text | Authentication succeeded. You can start sending microphone audio. |
| agent_ready | JSON text | The AI agent is listening. Full duplex voice conversation is active. |
| session_ended | JSON text | Session has ended. Includes a reason field (agent_disconnected, room_closed, or server_shutdown). Clean up resources. |
| error | JSON text | An error occurred. Includes code and message fields. |
| (binary) | Raw bytes | PCM audio frame from the AI agent. Play it back to the user. |
// Connected
{ "type": "connected" }
// Agent Ready
{ "type": "agent_ready" }
// Session Ended
{ "type": "session_ended", "reason": "agent_disconnected" | "room_closed" | "server_shutdown" }
// Error
{ "type": "error", "code": "AUTH_FAILED", "message": "Invalid or expired token" }
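Because control messages and audio share one socket, a client has to tell them apart on arrival. The sketch below shows one way to do that, assuming the WebSocket library delivers text frames as `str` and binary frames as `bytes`; `ControlMessage`, `AgentAudio`, and `classify` are illustrative names, not part of the API.

```python
import json
from dataclasses import dataclass
from typing import Union

# Control message types listed in the table above.
KNOWN_TYPES = {"connected", "agent_ready", "session_ended", "error"}


@dataclass
class ControlMessage:
    kind: str     # value of the "type" field
    fields: dict  # full decoded JSON payload


@dataclass
class AgentAudio:
    pcm: bytes    # one raw PCM frame from the agent


def classify(message: Union[str, bytes]) -> Union[ControlMessage, AgentAudio]:
    """Map an incoming WebSocket message to a control event or an audio frame."""
    if isinstance(message, (bytes, bytearray)):
        return AgentAudio(bytes(message))
    payload = json.loads(message)
    kind = payload.get("type")
    if kind not in KNOWN_TYPES:
        raise ValueError(f"unknown control message type: {kind!r}")
    return ControlMessage(kind, payload)
```

Rejecting unknown types is a choice made for this sketch; a production client might prefer to log and ignore them so that new message types do not break older clients.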
Audio Format
Audio is streamed as raw PCM in both directions, using the same format for microphone input and agent output.
| Property | Value |
|---|---|
| Encoding | PCM signed 16-bit little-endian (s16le) |
| Sample Rate | 16,000 Hz |
| Channels | 1 (mono) |
| Frame Duration | 20 ms |
| Samples per Frame | 320 |
| Bytes per Frame | 640 |
Send microphone audio as binary WebSocket messages. Each message should be exactly 640 bytes (one 20 ms frame). Agent audio arrives in the same format.
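The frame size follows directly from the format: 16,000 samples per second × 0.020 s = 320 samples, and 320 samples × 2 bytes = 640 bytes. The sketch below derives those constants and slices a captured buffer into frames; `pack_samples` and `split_frames` are hypothetical helpers written for this example.

```python
import struct

SAMPLE_RATE_HZ = 16_000
FRAME_MS = 20
BYTES_PER_SAMPLE = 2  # signed 16-bit, mono

SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000   # 320
BYTES_PER_FRAME = SAMPLES_PER_FRAME * BYTES_PER_SAMPLE  # 640


def pack_samples(samples):
    """Encode integer samples (-32768..32767) as s16le bytes."""
    samples = list(samples)
    return struct.pack(f"<{len(samples)}h", *samples)


def split_frames(pcm):
    """Split a PCM buffer into 640-byte frames plus any leftover tail.

    The tail holds fewer than 640 bytes and should be prepended to the
    next captured buffer rather than sent as a short frame.
    """
    usable = len(pcm) - len(pcm) % BYTES_PER_FRAME
    frames = [pcm[i:i + BYTES_PER_FRAME] for i in range(0, usable, BYTES_PER_FRAME)]
    return frames, pcm[usable:]
```

For example, a 1,500-byte capture buffer yields two full frames and a 220-byte tail to carry over.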
Session Lifecycle
idle --> connecting --> connected --> agent_ready --> ended
             |                              |
             +---------- error <------------+
| State | Description |
|---|---|
| idle | No active session. Ready to connect. |
| connecting | Fetching token and opening WebSocket. |
| connected | WebSocket authenticated. You can start sending audio. |
| agent_ready | AI agent is listening. Full duplex conversation active. |
| ended | Session terminated. Clean up resources. |
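The lifecycle can be modeled as a small state machine that rejects out-of-order transitions. This is a sketch under one stated assumption: the diagram does not pin down exactly which states can fail, so `error` is treated here as reachable from `connecting`, `connected`, and `agent_ready`. `SessionState`, `ALLOWED`, and `advance` are illustrative names.

```python
from enum import Enum


class SessionState(Enum):
    IDLE = "idle"
    CONNECTING = "connecting"
    CONNECTED = "connected"
    AGENT_READY = "agent_ready"
    ENDED = "ended"
    ERROR = "error"


# Forward path taken from the diagram; the ERROR edges are an assumption.
ALLOWED = {
    SessionState.IDLE: {SessionState.CONNECTING},
    SessionState.CONNECTING: {SessionState.CONNECTED, SessionState.ERROR},
    SessionState.CONNECTED: {SessionState.AGENT_READY, SessionState.ERROR},
    SessionState.AGENT_READY: {SessionState.ENDED, SessionState.ERROR},
    SessionState.ENDED: set(),
    SessionState.ERROR: set(),
}


def advance(current, target):
    """Return target if the transition is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Treating `ended` and `error` as terminal means each new conversation starts from a fresh `idle` state object, which keeps cleanup logic in one place.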
Next Steps
Now that you understand the protocol, head to the Web Guide or Flutter Guide for a complete implementation walkthrough.