
Prerequisites

Before you begin, ensure you have:
  • A Cartesia API key
  • Node.js 18+ or Python 3.8+
  • Basic understanding of async/await patterns

Installation

npm install @cartesia/cartesia-js

Basic Setup

1. Initialize the Client

import Cartesia from '@cartesia/cartesia-js';

const cartesia = new Cartesia({
  apiKey: process.env.CARTESIA_API_KEY,
});

2. Choose a Voice

// List available voices
const voices = await cartesia.voices.list();

// Use a voice ID
const voiceId = "a0e99841-438c-4a64-b679-ae501e7d6091";

3. Generate Speech

const response = await cartesia.tts.bytes({
  model_id: "sonic",
  transcript: "Hello, I'm CARTER!",
  voice: {
    mode: "id",
    id: voiceId
  },
  output_format: {
    container: "mp3",
    encoding: "mp3",
    sample_rate: 44100
  }
});

// Save or play the audio
const audioBlob = new Blob([response.audio], { type: 'audio/mpeg' });

Streaming Integration

For real-time applications like CARTER:
// Server-Sent Events (SSE) streaming
const response = await cartesia.tts.sse({
  model_id: "sonic",
  transcript: "Streaming response in real-time",
  voice: {
    mode: "id",
    id: voiceId
  },
  output_format: {
    container: "raw",
    encoding: "pcm_s16le",
    sample_rate: 16000
  }
});

// Process chunks as they arrive
for await (const chunk of response) {
  // Play audio chunk
  audioPlayer.play(chunk);
}

WebSocket Integration

For lowest latency (like CARTER’s voice interface):
// Connect to WebSocket
const ws = await cartesia.tts.websocket({
  model_id: "sonic",
  voice: {
    mode: "id",
    id: voiceId
  },
  output_format: {
    container: "raw",
    encoding: "pcm_s16le",
    sample_rate: 16000
  }
});

// Send text to convert
await ws.send({
  transcript: "Ultra-low latency response",
  context_id: "conversation-1"
});

// Listen for audio
ws.on('message', (audioChunk) => {
  playAudio(audioChunk);
});

// Close when done
await ws.close();

Adding Emotions

Control voice emotions like CARTER:
const response = await cartesia.tts.sse({
  model_id: "sonic",
  transcript: "I'm so excited about this!",
  voice: {
    mode: "id",
    id: voiceId
  },
  _experimental_voice_controls: {
    emotion: ["positivity:highest", "excitement"],
    speed: "fast"
  }
});

Available Emotions

  • positivity (lowest, low, high, highest)
  • curiosity
  • surprise
  • anger
  • sadness

Error Handling

Implement robust error handling:
// Simple Promise-based sleep helper
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

try {
  const response = await cartesia.tts.bytes({
    model_id: "sonic",
    transcript: message,
    voice: { mode: "id", id: voiceId }
  });

  return response.audio;
} catch (error) {
  if (error.status === 429) {
    // Rate limited - back off, then retry
    // (retry() is your own wrapper that re-invokes this request)
    await delay(1000);
    return retry();
  } else if (error.status === 401) {
    // Invalid API key
    console.error('Invalid API key');
  } else {
    // Other errors
    console.error('TTS error:', error);
  }
}

Rate Limiting

Handle rate limits gracefully:
// delay(ms) is a simple sleep helper:
// (ms) => new Promise((resolve) => setTimeout(resolve, ms))
async function generateWithRetry(text, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await cartesia.tts.bytes({
        model_id: "sonic",
        transcript: text,
        voice: { mode: "id", id: voiceId }
      });
    } catch (error) {
      if (error.status === 429 && i < maxRetries - 1) {
        // Exponential backoff: 1s, 2s, 4s, ...
        await delay(Math.pow(2, i) * 1000);
        continue;
      }
      throw error;
    }
  }
}

Best Practices

Use the same context_id for related messages to maintain conversation flow and improve latency.
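For example, a small helper (illustrative only, not part of the SDK) can stamp every message in one conversation with the same context_id before it is sent:

```javascript
// Build TTS payloads that share one context_id so related messages
// are treated as a single conversation (helper is hypothetical).
function makeContextSender(contextId) {
  return (transcript) => ({ transcript, context_id: contextId });
}

const send = makeContextSender('conversation-1');
const first = send('Hello!');
const second = send('How can I help?');
// Both payloads carry context_id "conversation-1"
```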
  • Use PCM for lowest latency
  • Use MP3 for file storage
  • Match sample rate to your playback system
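Those guidelines can be captured in a small lookup helper. This is a sketch: the use-case names are made up, and the field values simply mirror the earlier examples.

```javascript
// Map a use case to an output_format, following the guidelines above.
// Values mirror the earlier examples; adjust to your playback system.
function pickOutputFormat(useCase) {
  if (useCase === 'realtime') {
    // PCM for lowest latency
    return { container: 'raw', encoding: 'pcm_s16le', sample_rate: 16000 };
  }
  // Default: compressed audio for file storage
  return { container: 'mp3', encoding: 'mp3', sample_rate: 44100 };
}
```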
Implement reconnection logic for WebSocket connections:
ws.on('close', () => {
  setTimeout(() => reconnect(), 1000);
});
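A fixed one-second delay can hammer the server during an outage; a capped exponential backoff is usually safer. A sketch (reconnect() stands in for your own connection routine):

```javascript
// Exponential backoff for reconnects: 1s, 2s, 4s, ... capped at 30s.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Usage inside the close handler (attempt is your own counter):
//   ws.on('close', () => setTimeout(reconnect, backoffDelay(attempt++)));
```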
Fetch and cache voice IDs at startup rather than on each request.
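A minimal cache might look like this sketch, where listVoices stands in for cartesia.voices.list():

```javascript
// Cache voice metadata once at startup instead of refetching per request.
// listVoices is a placeholder for cartesia.voices.list().
let voiceCache = null;

async function getVoices(listVoices) {
  if (!voiceCache) {
    voiceCache = await listVoices();
  }
  return voiceCache;
}
```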

Next Steps

For production use, always implement proper error handling, rate limiting, and monitoring.