
Prerequisites

Before you begin, ensure you have:
  • A Cartesia API key
  • Node.js 18+ or Python 3.8+
  • Basic understanding of async/await patterns

Installation

npm install @cartesia/cartesia-js

Basic Setup

1. Initialize the Client

import Cartesia from '@cartesia/cartesia-js';

const cartesia = new Cartesia({
  apiKey: process.env.CARTESIA_API_KEY,
});

2. Choose a Voice

// List available voices
const voices = await cartesia.voices.list();

// Use a voice ID
const voiceId = "a0e99841-438c-4a64-b679-ae501e7d6091";

3. Generate Speech

const response = await cartesia.tts.bytes({
  model_id: "sonic",
  transcript: "Hello, I'm CARTER!",
  voice: {
    mode: "id",
    id: voiceId
  },
  output_format: {
    container: "mp3",
    encoding: "mp3",
    sample_rate: 44100
  }
});

// Save or play the audio
const audioBlob = new Blob([response.audio], { type: 'audio/mpeg' });

Streaming Integration

For real-time applications like CARTER:
// Server-Sent Events (SSE) streaming
const response = await cartesia.tts.sse({
  model_id: "sonic",
  transcript: "Streaming response in real-time",
  voice: {
    mode: "id",
    id: voiceId
  },
  output_format: {
    container: "raw",
    encoding: "pcm_s16le",
    sample_rate: 16000
  }
});

// Process chunks as they arrive
for await (const chunk of response) {
  // Play audio chunk
  audioPlayer.play(chunk);
}

WebSocket Integration

For lowest latency (like CARTER’s voice interface):
// Connect to WebSocket
const ws = await cartesia.tts.websocket({
  model_id: "sonic",
  voice: {
    mode: "id",
    id: voiceId
  },
  output_format: {
    container: "raw",
    encoding: "pcm_s16le",
    sample_rate: 16000
  }
});

// Send text to convert
await ws.send({
  transcript: "Ultra-low latency response",
  context_id: "conversation-1"
});

// Listen for audio
ws.on('message', (audioChunk) => {
  playAudio(audioChunk);
});

// Close when done
await ws.close();

Adding Emotions

Control voice emotions like CARTER:
const response = await cartesia.tts.sse({
  model_id: "sonic",
  transcript: "I'm so excited about this!",
  voice: {
    mode: "id",
    id: voiceId
  },
  _experimental_voice_controls: {
    emotion: ["positivity:highest", "excitement"],
    speed: "fast"
  }
});

Available Emotions

  • positivity (lowest, low, high, highest)
  • curiosity
  • surprise
  • anger
  • sadness

Error Handling

Implement robust error handling:
// Simple Promise-based sleep helper
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

try {
  const response = await cartesia.tts.bytes({
    model_id: "sonic",
    transcript: message,
    voice: { mode: "id", id: voiceId }
  });

  return response.audio;
} catch (error) {
  if (error.status === 429) {
    // Rate limited - back off, then retry
    // (retry() is your own wrapper that re-invokes this request)
    await delay(1000);
    return retry();
  } else if (error.status === 401) {
    // Invalid API key
    console.error('Invalid API key');
  } else {
    // Other errors
    console.error('TTS error:', error);
  }
}

Rate Limiting

Handle rate limits gracefully:
// delay(ms) is a simple sleep helper:
// (ms) => new Promise((resolve) => setTimeout(resolve, ms))
async function generateWithRetry(text, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await cartesia.tts.bytes({
        model_id: "sonic",
        transcript: text,
        voice: { mode: "id", id: voiceId }
      });
    } catch (error) {
      if (error.status === 429 && i < maxRetries - 1) {
        // Exponential backoff: 1s, 2s, 4s, ...
        await delay(Math.pow(2, i) * 1000);
        continue;
      }
      throw error;
    }
  }
}

Best Practices

Use the same context_id for related messages to maintain conversation flow and improve latency.
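For example, a small helper (illustrative only, not part of the SDK) can stamp every message in one conversation with the same context_id before it is sent:

```javascript
// Build TTS payloads that share one context_id so related messages
// are treated as a single conversation (helper is hypothetical).
function makeContextSender(contextId) {
  return (transcript) => ({ transcript, context_id: contextId });
}

const send = makeContextSender('conversation-1');
const first = send('Hello!');
const second = send('How can I help?');
// Both payloads carry context_id "conversation-1"
```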
  • Use PCM for lowest latency
  • Use MP3 for file storage
  • Match sample rate to your playback system
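Those guidelines can be captured in a small lookup helper. This is a sketch: the use-case names are made up, and the field values simply mirror the earlier examples.

```javascript
// Map a use case to an output_format, following the guidelines above.
// Values mirror the earlier examples; adjust to your playback system.
function pickOutputFormat(useCase) {
  if (useCase === 'realtime') {
    // PCM for lowest latency
    return { container: 'raw', encoding: 'pcm_s16le', sample_rate: 16000 };
  }
  // Default: compressed audio for file storage
  return { container: 'mp3', encoding: 'mp3', sample_rate: 44100 };
}
```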
Implement reconnection logic for WebSocket connections:
ws.on('close', () => {
  setTimeout(() => reconnect(), 1000);
});
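A fixed one-second delay can hammer the server during an outage; a capped exponential backoff is usually safer. A sketch (reconnect() stands in for your own connection routine):

```javascript
// Exponential backoff for reconnects: 1s, 2s, 4s, ... capped at 30s.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Usage inside the close handler (attempt is your own counter):
//   ws.on('close', () => setTimeout(reconnect, backoffDelay(attempt++)));
```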
Fetch and cache voice IDs at startup rather than on each request.
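A minimal cache might look like this sketch, where listVoices stands in for cartesia.voices.list():

```javascript
// Cache voice metadata once at startup instead of refetching per request.
// listVoices is a placeholder for cartesia.voices.list().
let voiceCache = null;

async function getVoices(listVoices) {
  if (!voiceCache) {
    voiceCache = await listVoices();
  }
  return voiceCache;
}
```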

Next Steps

For production use, always implement proper error handling, rate limiting, and monitoring.