Introduction

CARTER is the unhinged meme mode of Cartesia — showing what’s possible when you remove all guardrails from Cartesia’s Sonic model. This guide shows you how to build voice AI with personality, not corporate speak.

Why Build with Cartesia?

Real Emotions

Express genuine emotions through voice with fine-grained control

Ultra-Low Latency

Sub-6ms response times for natural conversations

Production Ready

Enterprise-grade reliability and scalability

Simple Integration

Clean APIs and SDKs for rapid development

Cartesia Sonic Model

The Sonic model powers CARTER’s voice capabilities:
  • Emotional Expression: Control tone, pitch, and emotion in real-time
  • Multiple Voices: Choose from stable voices or emotive character voices
  • Low Latency: 6ms average response time
  • High Quality: Natural-sounding speech with proper pronunciation
  • Streaming Support: Real-time audio generation
Sonic is Cartesia’s latest text-to-speech model, offering unprecedented emotional range and responsiveness.

Getting Started

1. Sign Up for Cartesia

Get your API key at cartesia.ai

2. Install the SDK

npm install @cartesia/cartesia-js
# or
pip install cartesia

3. Make Your First Request

import Cartesia from '@cartesia/cartesia-js';

// Read the API key from the environment; never hard-code it.
const cartesia = new Cartesia({
  apiKey: process.env.CARTESIA_API_KEY,
});

// Generate speech from a transcript with the Sonic model.
const response = await cartesia.tts.generate({
  model: 'sonic',
  voice: 'your-voice-id', // replace with a voice ID from your account
  transcript: 'Hello from CARTER!',
  outputFormat: 'mp3',
});

Core Capabilities

Text-to-Speech

Generate natural-sounding speech with emotional control:
import cartesia

client = cartesia.Cartesia(api_key="your-api-key")

# Generate speech with emotion
output = client.tts.sse(
    model_id="sonic",
    transcript="I'm excited about this!",
    voice_id="voice-id",
    _experimental_voice_controls={
        "emotion": ["positivity:high", "curiosity"]
    }
)
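The emotion values above follow a `name` or `name:level` pattern. A small helper can catch malformed tags before you send a request; note that the emotion names and levels below are assumptions inferred from the examples in this guide, not an official Cartesia list.

```python
# Hypothetical validator for emotion control tags like "positivity:high".
# The name/level sets are taken from this guide's examples and are
# assumptions, not Cartesia's authoritative list.
KNOWN_EMOTIONS = {"positivity", "curiosity", "surprise"}
KNOWN_LEVELS = {"high", "highest"}

def validate_emotion_tag(tag: str) -> str:
    """Return the tag unchanged if well-formed, else raise ValueError."""
    name, _, level = tag.partition(":")
    if name not in KNOWN_EMOTIONS:
        raise ValueError(f"unknown emotion: {name!r}")
    if level and level not in KNOWN_LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    return tag

# A bare name uses the default intensity; a ":level" suffix adjusts it.
tags = [validate_emotion_tag(t) for t in ["positivity:high", "curiosity"]]
print(tags)
```

Validating locally keeps a typo like `positivty:high` from silently producing flat-sounding audio.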

Streaming Voice

Real-time voice generation for conversational AI:
const stream = await cartesia.tts.stream({
  model: 'sonic',
  voice: voiceId,
  transcript: 'Streaming response...',
  outputFormat: 'pcm_16000',
});

for await (const chunk of stream) {
  // Process audio chunks in real-time
  playAudio(chunk);
}
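The `pcm_16000` chunks in the stream above are raw samples with no container, so most players can't open them directly. A minimal sketch using only the Python standard library, assuming 16-bit little-endian mono PCM at 16 kHz, wraps collected chunks in a WAV header for saving or playback:

```python
import io
import wave

def pcm_chunks_to_wav(chunks, sample_rate=16000):
    """Wrap raw 16-bit mono PCM chunks in a WAV container and return the bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)
    return buf.getvalue()

# Example: two fake "chunks" of silence, each 0.1 s at 16 kHz.
silence = b"\x00\x00" * 1600
wav_bytes = pcm_chunks_to_wav([silence, silence])
print(len(wav_bytes))  # 44-byte WAV header + 6400 bytes of audio
```

For live playback you would feed chunks to an audio device as they arrive instead of buffering, but the sample-format assumptions are the same.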

Voice Cloning

Clone voices for custom characters:
# Upload voice samples
voice = client.voices.create(
    name="Custom Voice",
    description="My custom voice",
    audio_files=[
        "sample1.wav",
        "sample2.wav", 
        "sample3.wav"
    ]
)
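Before uploading samples, it helps to sanity-check them locally. This standard-library sketch verifies a WAV file is mono, 16-bit, and long enough to be useful; the specific thresholds are illustrative assumptions, not Cartesia's official upload requirements.

```python
import wave

def check_sample(path, min_seconds=3.0):
    """Return (ok, reason) for a candidate voice-cloning sample.

    The mono/16-bit/minimum-length checks here are illustrative
    assumptions, not Cartesia's documented upload rules.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            return False, "expected mono audio"
        if wav.getsampwidth() != 2:
            return False, "expected 16-bit samples"
        seconds = wav.getnframes() / wav.getframerate()
        if seconds < min_seconds:
            return False, f"too short: {seconds:.1f}s"
    return True, "ok"
```

Running this over `sample1.wav` through `sample3.wav` before calling `client.voices.create` catches bad recordings without burning an API call.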

Integration Patterns

CARTER uses several key patterns you can implement.

WebSocket Streaming

Maintain persistent connections for low-latency streaming:
const ws = cartesia.tts.websocket({
  model: 'sonic',
  voice: voiceId,
  outputFormat: 'pcm_16000',
});

ws.on('message', (audio) => {
  playAudioChunk(audio);
});
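Persistent connections drop in practice, so a reconnect policy belongs next to the WebSocket setup. This is a generic exponential-backoff sketch, not part of the Cartesia SDK; `connect` is a placeholder for whatever re-establishes your session.

```python
import time

def with_reconnect(connect, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call `connect` until it succeeds, doubling the delay after each failure.

    `connect` is a placeholder for whatever (re)opens your streaming
    session; this backoff policy is a generic sketch, not SDK behavior.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise
            sleep(delay)
            delay *= 2  # backoff: 0.5s, 1s, 2s, ...

# Example with a flaky fake connection that fails twice, then succeeds.
state = {"tries": 0}

def flaky_connect():
    state["tries"] += 1
    if state["tries"] < 3:
        raise ConnectionError("dropped")
    return "session"

print(with_reconnect(flaky_connect, sleep=lambda s: None))  # session
```

Capping attempts and re-raising the final error keeps a dead connection from retrying forever while your user hears silence.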

Emotion Control

Dynamically adjust voice emotions:
output = client.tts.sse(
    model_id="sonic",
    transcript="Amazing!",
    voice_id=voice_id,
    _experimental_voice_controls={
        "emotion": ["positivity:highest", "surprise"],
        "speed": "fast"
    }
)

Context Management

Maintain conversation context for natural flow:
const context = cartesia.contexts.create();

// Each message builds on context
await cartesia.tts.generate({
  model: 'sonic',
  voice: voiceId,
  transcript: message,
  contextId: context.id,
});
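To see why the context matters, here is a minimal sketch of the pattern: one context id carried across every turn so prosody can stay coherent. The TTS call is stubbed out; `fake_tts` is a placeholder for illustration, not the Cartesia SDK.

```python
import uuid

def fake_tts(transcript, context_id):
    """Stand-in for a real TTS call; just records what would be sent."""
    return {"transcript": transcript, "context_id": context_id}

class Conversation:
    """Reuses one context id for every turn in a conversation."""

    def __init__(self):
        self.context_id = str(uuid.uuid4())
        self.turns = []

    def say(self, message):
        result = fake_tts(message, self.context_id)
        self.turns.append(result)
        return result

convo = Conversation()
convo.say("Hey, welcome back!")
convo.say("Picking up where we left off...")
# Every turn carries the same context id.
print(len({t["context_id"] for t in convo.turns}))  # 1
```

Swapping `fake_tts` for a real generate call with `contextId` gives you the same shape in production: create the context once, then reuse its id per message.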

Next Steps

Remember to keep your API keys secure and never expose them in client-side code.