## Prerequisites

Before you begin, ensure you have:

- A Cartesia API key (Sign up here)
- Node.js 18+ or Python 3.8+
- A basic understanding of async/await patterns
## Installation

```bash
npm install @cartesia/cartesia-js
```
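The client examples below read the API key from the `CARTESIA_API_KEY` environment variable. Set it in your shell before running (shown for bash/zsh; adapt for your shell):

```bash
export CARTESIA_API_KEY="your-api-key-here"
```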
## Basic Setup

### Initialize the Client

```javascript
import Cartesia from '@cartesia/cartesia-js';

const cartesia = new Cartesia({
  apiKey: process.env.CARTESIA_API_KEY,
});
```
### Choose a Voice

```javascript
// List available voices
const voices = await cartesia.voices.list();

// Or use a known voice ID directly
const voiceId = "a0e99841-438c-4a64-b679-ae501e7d6091";
```
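If you'd rather look a voice up by name than hard-code an ID, a small helper over the list works. This is a sketch, assuming each entry in the list exposes `id` and `name` fields; the sample data below is a stand-in for the real `list()` response:

```javascript
// Pick a voice ID by (case-insensitive) name from a voices list.
// Assumes each entry has `id` and `name` fields.
function findVoiceId(voices, name) {
  const match = voices.find(
    (v) => v.name.toLowerCase() === name.toLowerCase()
  );
  return match ? match.id : null;
}

// Stand-in data for illustration (not real voice names):
const sampleVoices = [
  { id: "a0e99841-438c-4a64-b679-ae501e7d6091", name: "Narrator" },
  { id: "11111111-2222-3333-4444-555555555555", name: "Newscaster" },
];
```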
### Generate Speech

```javascript
const response = await cartesia.tts.bytes({
  model_id: "sonic",
  transcript: "Hello, I'm CARTER!",
  voice: {
    mode: "id",
    id: voiceId
  },
  output_format: {
    container: "mp3",
    encoding: "mp3",
    sample_rate: 44100
  }
});

// Save or play the audio (in the browser)
const audioBlob = new Blob([response.audio], { type: 'audio/mpeg' });
```
## Streaming Integration

For real-time applications like CARTER:

```javascript
// Server-Sent Events (SSE) streaming
const response = await cartesia.tts.sse({
  model_id: "sonic",
  transcript: "Streaming response in real-time",
  voice: {
    mode: "id",
    id: voiceId
  },
  output_format: {
    container: "raw",
    encoding: "pcm_s16le",
    sample_rate: 16000
  }
});

// Process chunks as they arrive
for await (const chunk of response) {
  // Play each audio chunk as soon as it is received
  audioPlayer.play(chunk);
}
```
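If you also want the full clip after streaming finishes (for caching or saving), you can accumulate the raw PCM chunks into one buffer. A sketch, where `chunks` stands in for the byte arrays an SSE loop would collect:

```javascript
// Concatenate raw PCM chunks (pcm_s16le bytes) into a single buffer.
function concatChunks(chunks) {
  const total = chunks.reduce((sum, c) => sum + c.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}

// Stand-in data for illustration:
const chunks = [new Uint8Array([1, 2]), new Uint8Array([3, 4, 5])];
const pcm = concatChunks(chunks); // → 5 bytes of interleaved PCM
```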
## WebSocket Integration

For the lowest latency (like CARTER's voice interface):

```javascript
// Connect to the WebSocket endpoint
const ws = await cartesia.tts.websocket({
  model_id: "sonic",
  voice: {
    mode: "id",
    id: voiceId
  },
  output_format: {
    container: "raw",
    encoding: "pcm_s16le",
    sample_rate: 16000
  }
});

// Send text to convert
await ws.send({
  transcript: "Ultra-low latency response",
  context_id: "conversation-1"
});

// Listen for audio chunks
ws.on('message', (audioChunk) => {
  playAudio(audioChunk);
});

// Close the connection when done
await ws.close();
```
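Because chunks arrive asynchronously, playing each one the moment it lands can cause overlap. One common pattern is a small queue that chains playback promises so chunks always play in arrival order. A sketch, where `playFn` is whatever actually plays one chunk in your app (hypothetical here, not an SDK API):

```javascript
// Play chunks strictly in order, even if playback of one chunk
// is still running when the next arrives.
class PlaybackQueue {
  constructor(playFn) {
    this.playFn = playFn;
    this.queue = Promise.resolve();
  }
  enqueue(chunk) {
    // Chain onto the previous chunk's playback promise.
    this.queue = this.queue.then(() => this.playFn(chunk));
    return this.queue;
  }
}
```

Usage would look like `ws.on('message', (chunk) => queue.enqueue(chunk));`.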
## Adding Emotions

Control voice emotions like CARTER does:

```javascript
const response = await cartesia.tts.sse({
  model_id: "sonic",
  transcript: "I'm so excited about this!",
  voice: {
    mode: "id",
    id: voiceId
  },
  _experimental_voice_controls: {
    emotion: ["positivity:highest", "excitement"],
    speed: "fast"
  }
});
```
### Available Emotions

- positivity (lowest, low, high, highest)
- curiosity
- surprise
- anger
- sadness
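The `emotion` array takes strings of the form `"emotion"` or `"emotion:level"`. A small helper that builds and validates these tags against the lists above can catch typos before they reach the API (this is an illustrative sketch, not part of the SDK):

```javascript
// Build an "emotion" or "emotion:level" tag, validating inputs.
const EMOTIONS = ["positivity", "curiosity", "surprise", "anger", "sadness"];
const LEVELS = ["lowest", "low", "high", "highest"];

function emotionTag(emotion, level) {
  if (!EMOTIONS.includes(emotion)) {
    throw new Error(`Unknown emotion: ${emotion}`);
  }
  if (level === undefined) {
    return emotion; // bare emotion, default intensity
  }
  if (!LEVELS.includes(level)) {
    throw new Error(`Unknown level: ${level}`);
  }
  return `${emotion}:${level}`;
}
```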
## Error Handling

Implement robust error handling:

```javascript
try {
  const response = await cartesia.tts.bytes({
    model_id: "sonic",
    transcript: message,
    voice: { mode: "id", id: voiceId }
  });
  return response.audio;
} catch (error) {
  if (error.status === 429) {
    // Rate limit hit - back off, then retry.
    // delay and retry are app-level helpers, not part of the SDK.
    await delay(1000);
    return retry();
  } else if (error.status === 401) {
    // Invalid API key
    console.error('Invalid API key');
  } else {
    // Other errors
    console.error('TTS error:', error);
  }
}
```
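The `delay` helper used in these snippets is not part of the SDK; a minimal promise-based sleep covers it:

```javascript
// Promise-based sleep: resolves after `ms` milliseconds.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
```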
## Rate Limiting

Handle rate limits gracefully:

```javascript
// delay(ms) is a promise-based sleep helper (not part of the SDK).
async function generateWithRetry(text, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await cartesia.tts.bytes({
        model_id: "sonic",
        transcript: text,
        voice: { mode: "id", id: voiceId }
      });
    } catch (error) {
      if (error.status === 429 && i < maxRetries - 1) {
        // Exponential backoff: 1s, 2s, 4s, ...
        await delay(Math.pow(2, i) * 1000);
        continue;
      }
      throw error;
    }
  }
}
```
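The backoff above grows deterministically as 2^i seconds. A common refinement (not something the API requires) is to add random jitter and a cap, so many clients rate-limited at the same moment don't all retry in lockstep:

```javascript
// Full-jitter backoff: pick a random delay in [0, base * 2^attempt],
// capped so the worst-case wait stays bounded.
function backoffMs(attempt, base = 1000, cap = 30000) {
  const max = Math.min(cap, base * Math.pow(2, attempt));
  return Math.floor(Math.random() * max);
}
```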
## Best Practices

### Keep Context IDs Consistent

Use the same context_id for related messages to maintain conversation flow and improve latency.

### Reconnect Dropped WebSockets

Implement reconnection logic for WebSocket connections:

```javascript
ws.on('close', () => {
  setTimeout(() => reconnect(), 1000);
});
```

### Cache Voice IDs

Fetch and cache voice IDs at startup rather than on each request.
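The caching advice can be sketched as a tiny memoized lookup. Here `fetchVoices` stands in for `cartesia.voices.list()`; caching the promise (rather than the resolved value) means concurrent callers share a single in-flight request:

```javascript
// Fetch the voice list once and reuse it for all later lookups.
function makeVoiceCache(fetchVoices) {
  let cached = null; // caches the promise, so concurrent calls share one fetch
  return function getVoices() {
    if (!cached) {
      cached = fetchVoices(); // only hits the API the first time
    }
    return cached;
  };
}
```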
## Next Steps

For production use, always implement proper error handling, rate limiting, and monitoring.