Prerequisites

Before you begin, ensure you have:
  • A Cartesia API key (Sign up here)
  • Node.js 18+ or Python 3.8+
  • Basic understanding of async/await patterns

Installation

npm install @cartesia/cartesia-js

Basic Setup

1. Initialize the Client

import Cartesia from '@cartesia/cartesia-js';

const cartesia = new Cartesia({
  apiKey: process.env.CARTESIA_API_KEY,
});
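
If CARTESIA_API_KEY is unset, the client above is constructed with an undefined key and the problem only surfaces on the first request. A minimal sketch of a fail-fast check follows; requireEnv is a hypothetical helper, not part of the Cartesia SDK.

```javascript
// Hypothetical helper (not part of the SDK): read a required environment
// variable, or throw immediately with a clear message.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== 'string' || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value.trim();
}

// Usage with the client from this step:
// const cartesia = new Cartesia({ apiKey: requireEnv('CARTESIA_API_KEY') });
```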

2. Choose a Voice

// List available voices
const voices = await cartesia.voices.list();

// Use a voice ID
const voiceId = "a0e99841-438c-4a64-b679-ae501e7d6091";

3. Generate Speech

const response = await cartesia.tts.bytes({
  model_id: "sonic",
  transcript: "Hello, I'm CARTER!",
  voice: {
    mode: "id",
    id: voiceId
  },
  output_format: {
    container: "mp3",
    encoding: "mp3",
    sample_rate: 44100
  }
});

// Save or play the audio
const audioBlob = new Blob([response.audio], { type: 'audio/mpeg' });
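
The exact type of the returned audio depends on the SDK version, so it can help to normalize it before wrapping it in a Blob. The sketch below assumes the audio arrives as an ArrayBuffer or a typed array (including a Node.js Buffer); toAudioBlob is a hypothetical helper. It defaults to audio/mpeg, the registered MIME type for MP3.

```javascript
// Hypothetical helper: wrap raw audio bytes in a Blob.
// Assumes `audio` is an ArrayBuffer or a typed array such as Uint8Array or Buffer.
function toAudioBlob(audio, mimeType = 'audio/mpeg') {
  let bytes;
  if (audio instanceof ArrayBuffer) {
    bytes = new Uint8Array(audio);
  } else if (ArrayBuffer.isView(audio)) {
    // Respect the view's offset and length; Node.js Buffers often share a pool.
    bytes = new Uint8Array(audio.buffer, audio.byteOffset, audio.byteLength);
  } else {
    throw new TypeError('Expected an ArrayBuffer or a typed array');
  }
  return new Blob([bytes], { type: mimeType });
}

// Usage: const audioBlob = toAudioBlob(response.audio);
```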

Streaming Integration

For real-time applications like CARTER:
// Server-Sent Events (SSE) streaming
const response = await cartesia.tts.sse({
  model_id: "sonic",
  transcript: "Streaming response in real-time",
  voice: {
    mode: "id",
    id: voiceId
  },
  output_format: {
    container: "raw",
    encoding: "pcm_s16le",
    sample_rate: 16000
  }
});

// Process chunks as they arrive
for await (const chunk of response) {
  // Play audio chunk
  audioPlayer.play(chunk);
}
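
Raw pcm_s16le chunks are signed 16-bit little-endian samples, while browser playback through the Web Audio API works with 32-bit floats in the range -1 to 1. A minimal conversion sketch follows. It assumes each chunk is available as a Uint8Array; how the bytes are extracted from an SSE chunk depends on the SDK and is not shown. pcm16leToFloat32 is a hypothetical helper.

```javascript
// Hypothetical helper: convert signed 16-bit little-endian PCM bytes to floats.
// A chunk may end in the middle of a sample, so a trailing odd byte is returned
// as `remainder` for the caller to prepend to the next chunk.
function pcm16leToFloat32(bytes) {
  const usable = bytes.byteLength - (bytes.byteLength % 2);
  const view = new DataView(bytes.buffer, bytes.byteOffset, usable);
  const samples = new Float32Array(usable / 2);
  for (let i = 0; i < samples.length; i++) {
    // `true` selects little-endian byte order; 32768 maps int16 onto [-1, 1).
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return { samples, remainder: bytes.subarray(usable) };
}
```

The returned samples can be copied into an AudioBuffer created with the same sample rate as the requested output_format.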

WebSocket Integration

For lowest latency (like CARTER’s voice interface):
// Connect to WebSocket
const ws = await cartesia.tts.websocket({
  model_id: "sonic",
  voice: {
    mode: "id",
    id: voiceId
  },
  output_format: {
    container: "raw",
    encoding: "pcm_s16le",
    sample_rate: 16000
  }
});

// Listen for audio (register the handler before sending so no chunks are missed)
ws.on('message', (audioChunk) => {
  playAudio(audioChunk);
});

// Send text to convert
await ws.send({
  transcript: "Ultra-low latency response",
  context_id: "conversation-1"
});

// Close only after all audio has been received
await ws.close();
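
When several requests share one socket, audio chunks for different context_id values can interleave. One way to keep them apart is to buffer chunks per context. The sketch below is hypothetical throughout: ContextAudioCollector is not part of the SDK, and how a message exposes its context ID and audio bytes depends on the SDK and is not shown.

```javascript
// Hypothetical helper: buffer audio chunks per context ID and join them on demand.
// Assumes each chunk is a Uint8Array.
class ContextAudioCollector {
  constructor() {
    this.chunksByContext = new Map();
  }

  // Store one chunk under its context ID, preserving arrival order.
  add(contextId, chunk) {
    if (!this.chunksByContext.has(contextId)) {
      this.chunksByContext.set(contextId, []);
    }
    this.chunksByContext.get(contextId).push(chunk);
  }

  // Concatenate and return all audio for a context, then forget it.
  finish(contextId) {
    const chunks = this.chunksByContext.get(contextId) || [];
    this.chunksByContext.delete(contextId);
    const total = chunks.reduce((sum, c) => sum + c.byteLength, 0);
    const out = new Uint8Array(total);
    let offset = 0;
    for (const c of chunks) {
      out.set(c, offset);
      offset += c.byteLength;
    }
    return out;
  }
}
```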

Adding Emotions

Control the voice's emotion and speed, as CARTER does:
const response = await cartesia.tts.sse({
  model_id: "sonic",
  transcript: "I'm so excited about this!",
  voice: {
    mode: "id",
    id: voiceId
  },
  _experimental_voice_controls: {
    emotion: ["positivity:highest", "curiosity"],
    speed: "fast"
  }
});

Available Emotions

  • positivity (levels: lowest, low, high, highest; written as a suffix, e.g. "positivity:highest")
  • curiosity
  • surprise
  • anger
  • sadness

Error Handling

Implement robust error handling:
try {
  const response = await cartesia.tts.bytes({
    model_id: "sonic",
    transcript: message,
    voice: { mode: "id", id: voiceId }
  });
  
  return response.audio;
} catch (error) {
  if (error.status === 429) {
    // Rate limited: wait, then retry (delay and retry are helpers you define)
    await delay(1000);
    return retry();
  }
  if (error.status === 401) {
    // Invalid API key
    console.error('Invalid API key');
  } else {
    // Other errors
    console.error('TTS error:', error);
  }
  // Rethrow so the caller never receives undefined audio
  throw error;
}
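
The branching above can be pulled into one small function so every call site treats status codes the same way. A minimal sketch, assuming errors expose a numeric status property as in the snippet above; classifyTtsError is a hypothetical helper.

```javascript
// Hypothetical helper: map an error to the action the caller should take.
// Assumes the error carries an HTTP status code in `error.status`.
function classifyTtsError(error) {
  const status = error && error.status;
  if (status === 429) return 'retry'; // rate limited: back off and try again
  if (status === 401) return 'auth';  // invalid API key: retrying will not help
  return 'fatal';                     // anything else: log and surface the error
}

// Usage inside a catch block:
// if (classifyTtsError(error) === 'retry') { /* wait, then retry */ }
```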

Rate Limiting

Handle rate limits gracefully:
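// Resolve after the given number of milliseconds
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function generateWithRetry(text, maxRetries = 3) {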
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await cartesia.tts.bytes({
        model_id: "sonic",
        transcript: text,
        voice: { mode: "id", id: voiceId }
      });
    } catch (error) {
      if (error.status === 429 && i < maxRetries - 1) {
        // Exponential backoff
        await delay(Math.pow(2, i) * 1000);
        continue;
      }
      throw error;
    }
  }
}
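
When many clients hit the limit at once, plain exponential backoff makes them all retry at the same moments. Adding random jitter and a delay cap spreads the retries out. The sketch below is one way to do that; withRetry and backoffDelayMs are hypothetical names, the 30-second cap is an arbitrary choice, and the sleep function is injectable so the logic can be tested without waiting.

```javascript
// Exponential delay for a zero-based attempt number, capped at capMs.
function backoffDelayMs(attempt, baseMs = 1000, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: call fn, retrying only on HTTP 429, for at most
// maxRetries attempts in total. Other errors are rethrown immediately.
async function withRetry(fn, { maxRetries = 3, sleep = defaultSleep } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (error) {
      const retryable = Boolean(error) && error.status === 429;
      if (!retryable || attempt >= maxRetries - 1) {
        throw error;
      }
      // "Full jitter": wait a random time between 0 and the exponential delay.
      await sleep(Math.random() * backoffDelayMs(attempt));
    }
  }
}

// Usage: const audio = await withRetry(() => cartesia.tts.bytes({ /* ... */ }));
```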

Best Practices

Use the same context_id for related messages to maintain conversation flow and improve latency.
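
As an illustration, a long reply can be split into sentences that are each sent with the same context_id. The sketch below only builds the message objects, using the transcript and context_id fields from the WebSocket example; buildContextMessages is a hypothetical helper, the sentence splitting is deliberately naive, and the API may require further fields to mark continuation, so check the API reference.

```javascript
// Hypothetical helper: split text into sentences and tag each with one context ID.
// Splitting is naive: it breaks after runs of '.', '!' or '?'.
function buildContextMessages(text, contextId) {
  const sentences = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) || [];
  return sentences
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .map((transcript) => ({ transcript, context_id: contextId }));
}

// Usage: for (const msg of buildContextMessages(reply, 'conversation-1')) await ws.send(msg);
```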

Choose an output format that fits the use case:
  • Use PCM for lowest latency
  • Use MP3 for file storage
  • Match sample rate to your playback system
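
To see why the sample rate matters, it helps to compute how long a raw PCM chunk lasts. The sketch below assumes mono 16-bit audio by default, matching the pcm_s16le examples in this guide; pcmDurationMs is a hypothetical helper.

```javascript
// Hypothetical helper: playback duration of raw PCM data in milliseconds.
// Defaults assume mono audio with 2 bytes per sample (pcm_s16le).
function pcmDurationMs(byteLength, sampleRate, channels = 1, bytesPerSample = 2) {
  return (byteLength * 1000) / (sampleRate * channels * bytesPerSample);
}

// One second of 16 kHz mono pcm_s16le is 32000 bytes.
// Played back at 44.1 kHz, the same bytes would last well under half a second
// and sound sped up, which is why the rates must match.
```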

Implement reconnection logic for WebSocket connections:
ws.on('close', () => {
  setTimeout(() => reconnect(), 1000);
});
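
A fixed one-second delay reconnects in a tight loop if the server stays unavailable. One alternative is to grow the delay after each failed attempt and reset it once a connection succeeds. A minimal sketch follows; ReconnectPolicy is a hypothetical class and the default delays are arbitrary.

```javascript
// Hypothetical helper: track reconnect attempts and hand out growing delays.
class ReconnectPolicy {
  constructor({ baseMs = 1000, capMs = 30000 } = {}) {
    this.baseMs = baseMs;
    this.capMs = capMs;
    this.attempt = 0;
  }

  // Delay before the next attempt: base, 2x base, 4x base, ... up to capMs.
  nextDelayMs() {
    const delayMs = Math.min(this.capMs, this.baseMs * 2 ** this.attempt);
    this.attempt += 1;
    return delayMs;
  }

  // Call after a successful connection so the next outage starts from baseMs.
  reset() {
    this.attempt = 0;
  }
}

// Usage with the handler above:
// const policy = new ReconnectPolicy();
// ws.on('close', () => setTimeout(() => reconnect(), policy.nextDelayMs()));
```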

Fetch and cache voice IDs at startup rather than on each request.
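
A minimal caching sketch follows. It assumes the voice list is an array of objects with id and name fields, which this guide does not confirm; createVoiceCache is a hypothetical helper that takes the listing function as an argument, so it works with any client.

```javascript
// Hypothetical helper: fetch the voice list once and serve lookups from memory.
// Assumption: listVoices resolves to an array of { id, name } objects.
function createVoiceCache(listVoices) {
  let pending = null;
  return {
    async getVoiceId(name) {
      if (!pending) {
        // Share one in-flight request; clear it on failure so a later call retries.
        pending = Promise.resolve()
          .then(() => listVoices())
          .catch((error) => {
            pending = null;
            throw error;
          });
      }
      const voices = await pending;
      const match = voices.find((voice) => voice.name === name);
      if (!match) {
        throw new Error(`Voice not found: ${name}`);
      }
      return match.id;
    },
  };
}

// Usage: const voiceCache = createVoiceCache(() => cartesia.voices.list());
```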

Next Steps

  • Voice API Reference: Detailed API documentation
  • Code Examples: Working code samples

For production use, always implement proper error handling, rate limiting, and monitoring.