Prerequisites
Before you begin, ensure you have:
- A Cartesia API key (sign up for a Cartesia account to get one)
- Node.js 18+ or Python 3.8+
- Basic understanding of async/await patterns
Installation
npm install @cartesia/cartesia-js
Basic Setup
Initialize the Client
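import Cartesia from '@cartesia/cartesia-js';
// Read the API key from the environment rather than hard-coding it.
// The snippets that follow use top-level await, so run them in an ES module
// (or wrap them in an async function).
const cartesia = new Cartesia({
  apiKey: process.env.CARTESIA_API_KEY,
});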
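If `CARTESIA_API_KEY` is unset, `process.env.CARTESIA_API_KEY` is simply `undefined`, and the problem may only surface on the first request. A minimal sketch of failing fast at startup instead; `requireEnv` is a hypothetical helper, not part of the Cartesia SDK:

```javascript
// Hypothetical helper: return a required environment variable or fail fast.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage (assumes the client setup shown above):
// const cartesia = new Cartesia({ apiKey: requireEnv("CARTESIA_API_KEY") });
```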
Choose a Voice
// List available voices
const voices = await cartesia.voices.list();
// ...or use a known voice ID directly
const voiceId = "a0e99841-438c-4a64-b679-ae501e7d6091";
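To pick a voice by name instead of hard-coding its ID, something like the sketch below could work. It assumes `cartesia.voices.list()` resolves to an array whose entries have `id` and `name` string fields; check the actual response shape in the Voice API reference. `findVoiceId` is a hypothetical helper.

```javascript
// Hypothetical helper. Assumes `voices` is an array of objects with
// `id` and `name` string fields (verify against the real response).
function findVoiceId(voices, name) {
  const wanted = name.toLowerCase();
  const match = voices.find(
    (v) => typeof v.name === "string" && v.name.toLowerCase() === wanted
  );
  if (!match) {
    throw new Error(`No voice named "${name}"`);
  }
  return match.id;
}

// Usage (assumes the client from "Initialize the Client"):
// const voiceId = findVoiceId(await cartesia.voices.list(), "Some Voice Name");
```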
Generate Speech
const response = await cartesia.tts.bytes({
  model_id: "sonic",
  transcript: "Hello, I'm CARTER!",
  voice: {
    mode: "id",
    id: voiceId
  },
  output_format: {
    container: "mp3",
    encoding: "mp3",
    sample_rate: 44100
  }
});
// Wrap the audio in a Blob to save or play it (the MIME type for MP3 is audio/mpeg)
const audioBlob = new Blob([response.audio], { type: 'audio/mpeg' });
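Players choose a decoder from a Blob's MIME type, and the registered type for MP3 is `audio/mpeg`. A small sketch that maps the `output_format.container` values used in this guide (`mp3`, `raw`), plus `wav` (assumed to be supported; confirm in the API reference), to MIME types. `toAudioBlob` is a hypothetical helper; `Blob` is a global in Node.js 18+ and in browsers.

```javascript
// Hypothetical mapping from output container name to MIME type.
const CONTAINER_MIME_TYPES = new Map([
  ["mp3", "audio/mpeg"],
  ["wav", "audio/wav"],
  // Headerless PCM: players need the encoding and sample rate out of band.
  ["raw", "application/octet-stream"],
]);

// Wrap raw audio bytes in a Blob with the matching MIME type.
function toAudioBlob(audio, container) {
  const type = CONTAINER_MIME_TYPES.get(container);
  if (!type) {
    throw new Error(`Unknown container: ${container}`);
  }
  return new Blob([audio], { type });
}

// Usage: const audioBlob = toAudioBlob(response.audio, "mp3");
```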
Streaming Integration
For real-time applications like CARTER, stream the audio so playback can begin before the whole clip has been generated:
// Server-Sent Events (SSE) streaming
const response = await cartesia.tts.sse({
  model_id: "sonic",
  transcript: "Streaming response in real-time",
  voice: {
    mode: "id",
    id: voiceId
  },
  output_format: {
    container: "raw",
    encoding: "pcm_s16le",
    sample_rate: 16000
  }
});
// Process chunks as they arrive
for await (const chunk of response) {
  // Play the audio chunk (audioPlayer stands for your app's own playback object)
  audioPlayer.play(chunk);
}
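With `encoding: "pcm_s16le"`, each chunk carries raw 16-bit signed little-endian samples, while the Web Audio API works with `Float32Array` samples in [-1, 1). A minimal conversion sketch, assuming mono audio and that each chunk is a `Uint8Array` (Node Buffers qualify) of raw PCM bytes; if the SDK yields base64 strings or event objects, decode those first. `pcm16leToFloat32` and `pcm16DurationSeconds` are hypothetical helpers.

```javascript
// Hypothetical helper: decode 16-bit signed little-endian PCM into floats.
// Assumes `bytes` is a Uint8Array holding whole 2-byte samples.
function pcm16leToFloat32(bytes) {
  if (bytes.byteLength % 2 !== 0) {
    throw new Error("PCM16 data must contain whole 2-byte samples");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const samples = new Float32Array(bytes.byteLength / 2);
  for (let i = 0; i < samples.length; i++) {
    // getInt16(offset, true) reads little-endian; dividing by 32768 maps to [-1, 1)
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}

// Playback length of a mono PCM16 chunk: bytes / 2 bytes per sample / sample rate.
function pcm16DurationSeconds(byteLength, sampleRate = 16000) {
  return byteLength / 2 / sampleRate;
}
```

Network chunks are not guaranteed to end on a sample boundary, so if a chunk has an odd length, carry its last byte over to the next chunk before converting.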
WebSocket Integration
For the lowest latency, as in CARTER’s voice interface, keep a WebSocket connection open and send text over it:
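// Connect to WebSocket
const ws = await cartesia.tts.websocket({
  model_id: "sonic",
  voice: {
    mode: "id",
    id: voiceId
  },
  output_format: {
    container: "raw",
    encoding: "pcm_s16le",
    sample_rate: 16000
  }
});
// Register the audio listener before sending, so no early chunks are missed
// (playAudio stands for your app's own playback function)
ws.on('message', (audioChunk) => {
  playAudio(audioChunk);
});
// Send text to convert
await ws.send({
  transcript: "Ultra-low latency response",
  context_id: "conversation-1"
});
// Later, when the conversation is over, close the connection
// (closing right after send() can drop audio that is still arriving)
await ws.close();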
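When one WebSocket carries several contexts, you may want to assemble the audio for each `context_id` separately, for example to cache a complete reply. A sketch of a per-context collector; it assumes each incoming chunk can be tagged with its `context_id` and is a `Uint8Array`, which may not match the actual message shape (check the WebSocket API reference). `ContextAudioCollector` is hypothetical.

```javascript
// Hypothetical helper: accumulate audio chunks per context_id.
class ContextAudioCollector {
  constructor() {
    this.chunks = new Map(); // context_id -> Uint8Array[]
  }

  add(contextId, chunk) {
    if (!this.chunks.has(contextId)) {
      this.chunks.set(contextId, []);
    }
    this.chunks.get(contextId).push(chunk);
  }

  // Concatenate all chunks received for a context, then forget them.
  take(contextId) {
    const parts = this.chunks.get(contextId) ?? [];
    this.chunks.delete(contextId);
    const total = parts.reduce((sum, p) => sum + p.byteLength, 0);
    const out = new Uint8Array(total);
    let offset = 0;
    for (const p of parts) {
      out.set(p, offset);
      offset += p.byteLength;
    }
    return out;
  }
}
```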
Adding Emotions
Control the emotion and speed of the voice, as CARTER does:
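const response = await cartesia.tts.sse({
  model_id: "sonic",
  transcript: "I'm so excited about this!",
  voice: {
    mode: "id",
    id: voiceId
  },
  output_format: {
    container: "raw",
    encoding: "pcm_s16le",
    sample_rate: 16000
  },
  _experimental_voice_controls: {
    // Each entry is an emotion from the list below, optionally with a level
    emotion: ["positivity:highest", "curiosity"],
    speed: "fast"
  }
});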
Available Emotions
- positivity (lowest, low, high, highest)
- curiosity
- surprise
- anger
- sadness
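Emotion entries are strings of the form `name` or `name:level`, as in `"positivity:highest"`. A small validator can catch typos before a request goes out. This sketch assumes the names and levels listed above are the complete set and that the levels apply to every emotion, not only positivity; both are assumptions to check against the API reference. `emotionTag` is hypothetical.

```javascript
// Emotion names and levels taken from the list above (assumed complete).
const EMOTIONS = new Set(["positivity", "curiosity", "surprise", "anger", "sadness"]);
const LEVELS = new Set(["lowest", "low", "high", "highest"]);

// Hypothetical helper: build an "emotion[:level]" string, rejecting unknown values.
function emotionTag(name, level) {
  if (!EMOTIONS.has(name)) {
    throw new Error(`Unknown emotion: ${name}`);
  }
  if (level === undefined) {
    return name;
  }
  if (!LEVELS.has(level)) {
    throw new Error(`Unknown level: ${level}`);
  }
  return `${name}:${level}`;
}

// Usage: emotion: [emotionTag("positivity", "highest"), emotionTag("curiosity")]
```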
Error Handling
Handle rate-limit and authentication errors explicitly:
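async function synthesize(message) {
  try {
    const response = await cartesia.tts.bytes({
      model_id: "sonic",
      transcript: message,
      voice: { mode: "id", id: voiceId }
    });
    return response.audio;
  } catch (error) {
    if (error.status === 429) {
      // Rate limit: wait briefly, then hand off to generateWithRetry (below),
      // which retries a bounded number of times with exponential backoff
      await new Promise((resolve) => setTimeout(resolve, 1000));
      const retried = await generateWithRetry(message);
      return retried.audio;
    } else if (error.status === 401) {
      // Invalid API key: retrying will not help
      console.error('Invalid API key');
    } else {
      // Other errors
      console.error('TTS error:', error);
    }
    // Re-throw so callers know synthesis failed
    throw error;
  }
}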
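The branching above can be pulled into one function that decides what to do with a failed request. This sketch mirrors the cases above (429 means retry, 401 means a bad key) and additionally treats 5xx responses as retryable, which is a common convention but an assumption here rather than documented Cartesia behavior. It also assumes errors carry a numeric `status`, as in the example above. `classifyTtsError` is hypothetical.

```javascript
// Hypothetical helper: map a failed request's error to an action.
// Assumes `error.status` holds the HTTP status code, as in the example above.
function classifyTtsError(error) {
  const status = error?.status;
  if (status === 429) return "retry"; // rate limited: back off and retry
  if (status === 401) return "auth"; // invalid API key: do not retry
  if (typeof status === "number" && status >= 500) return "retry"; // assumed transient
  return "fail"; // anything else: surface to the caller
}
```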
Rate Limiting
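Retry rate-limited requests (HTTP 429) with exponential backoff:
// Resolve after `ms` milliseconds
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
async function generateWithRetry(text, maxRetries = 3) {
  // maxRetries is the total number of attempts, including the first
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await cartesia.tts.bytes({
        model_id: "sonic",
        transcript: text,
        voice: { mode: "id", id: voiceId }
      });
    } catch (error) {
      if (error.status === 429 && i < maxRetries - 1) {
        // Exponential backoff: wait 1 s, 2 s, 4 s, ...
        await delay(Math.pow(2, i) * 1000);
        continue;
      }
      throw error;
    }
  }
}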
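Plain exponential backoff makes clients that were throttled at the same moment retry at the same moment. A common refinement, swapped in here and not taken from Cartesia's documentation, is "full jitter": wait a random time between 0 and the exponential delay, capped at a maximum. This generic sketch takes the request as a function, so it can wrap `cartesia.tts.bytes` or any other call; `withRetry` and its option names are hypothetical.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical generic retry helper: capped exponential backoff with full jitter.
async function withRetry(
  fn,
  {
    maxAttempts = 3,
    baseMs = 1000,
    maxMs = 10000,
    isRetryable = (error) => error?.status === 429,
  } = {}
) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= maxAttempts - 1 || !isRetryable(error)) {
        throw error;
      }
      // Exponential ceiling baseMs * 2^attempt, capped at maxMs
      const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
      // Full jitter: sleep a random duration in [0, ceiling)
      await sleep(Math.random() * ceiling);
    }
  }
}

// Usage: const audio = await withRetry(() => cartesia.tts.bytes({ ... }));
```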
Best Practices
Keep Context IDs Consistent
Use the same context_id for related messages to maintain conversation flow and improve latency.
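One way to keep IDs consistent is to assign a single `context_id` per conversation and reuse it for every message in that conversation. A sketch using a `Map` and a counter that produces IDs in the same `conversation-N` style as the WebSocket example; for IDs that must be unique across processes, a UUID would be the safer choice. The `ContextIds` class and the idea of a per-conversation key (such as a user ID) are hypothetical.

```javascript
// Hypothetical helper: one stable context_id per conversation.
class ContextIds {
  constructor(prefix = "conversation") {
    this.prefix = prefix;
    this.next = 1;
    this.ids = new Map(); // conversation key -> context_id
  }

  // Returns the same context_id every time for a given conversation key.
  get(conversationKey) {
    if (!this.ids.has(conversationKey)) {
      this.ids.set(conversationKey, `${this.prefix}-${this.next++}`);
    }
    return this.ids.get(conversationKey);
  }

  // Forget a finished conversation; a later message under the same key gets a fresh ID.
  end(conversationKey) {
    this.ids.delete(conversationKey);
  }
}

// Usage: await ws.send({ transcript, context_id: contextIds.get(userId) });
```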
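Implement reconnection logic for WebSocket connections:
// Reconnect after a short pause when the connection drops
// (reconnect stands for your own function that reopens the socket)
ws.on('close', () => {
  setTimeout(() => reconnect(), 1000);
});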
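A fixed one-second retry keeps hitting the server at the same rate during an outage. A sketch of reconnecting with capped exponential backoff; each call to the returned function starts again from `baseMs`. `connect` stands for whatever opens your WebSocket (for example the `cartesia.tts.websocket(...)` call above) and is passed in as a function; `createReconnector` is hypothetical.

```javascript
const pause = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: retry `connect` with capped exponential backoff.
function createReconnector(connect, { baseMs = 500, maxMs = 30000, maxAttempts = Infinity } = {}) {
  return async function reconnect() {
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        return await connect(); // success: the caller re-attaches its listeners
      } catch (error) {
        if (attempt === maxAttempts - 1) break; // no point waiting after the last attempt
        const wait = Math.min(maxMs, baseMs * 2 ** attempt);
        console.warn(`Reconnect attempt ${attempt + 1} failed (${error?.message}); retrying in ${wait} ms`);
        await pause(wait);
      }
    }
    throw new Error(`Could not reconnect after ${maxAttempts} attempts`);
  };
}
```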
Fetch and cache voice IDs at startup rather than on each request.
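A minimal way to fetch voices once and reuse them: cache the pending promise, so concurrent callers share a single request, and clear it on failure so a later call can try again. `fetchVoices` stands for `() => cartesia.voices.list()`; `createVoiceCache` is hypothetical.

```javascript
// Hypothetical helper: fetch the voice list once and share the result.
function createVoiceCache(fetchVoices) {
  let pending = null;
  return function getVoices() {
    if (!pending) {
      pending = fetchVoices().catch((error) => {
        pending = null; // allow a retry on the next call
        throw error;
      });
    }
    return pending;
  };
}

// Usage: const getVoices = createVoiceCache(() => cartesia.voices.list());
```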
Next Steps
- Voice API Reference: detailed API documentation
- Code Examples: working code samples
For production use, always implement proper error handling, rate limiting, and monitoring.