create_speech, the SDK splits at sentence boundaries, generates each segment, and merges the result.
What happens under the hood
- The SDK detects that `SCRIPT.length > 300`.
- It splits on sentence punctuation first, then on word boundaries if a sentence is itself too long.
- The Python SDK fires up to 3 parallel `create_speech` requests; the TypeScript SDK runs them sequentially.
- Each segment returns a complete audio file.
- The SDK strips the WAV header from every segment after the first and concatenates the bytes into a single continuous clip.
- You get one playable file back, identical in form to a single-segment response.
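
To make the steps above concrete, here is a minimal Python sketch of the same split, generate, and merge flow. It is an illustration, not the SDK's actual code: the helper names (`split_text`, `merge_wav`, `narrate`) and the `synthesize` callable are hypothetical, packing several short sentences into one segment is an assumption, and the merge rewrites a single header with the standard `wave` module rather than slicing header bytes by hand (for plain PCM WAV the result should be equivalent).

```python
import io
import re
import wave
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

MAX_CHARS = 300  # per-request limit named in this guide


def _split_words(sentence: str, limit: int) -> list[str]:
    """Break one over-long sentence at word boundaries."""
    pieces, current = [], ""
    for word in sentence.split():
        candidate = f"{current} {word}".strip()
        # A single word longer than the limit is kept whole (not handled here).
        if len(candidate) <= limit or not current:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split on sentence punctuation first, then on word boundaries."""
    sentences = re.split(r"(?<=[.?!])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        pieces = [sentence] if len(sentence) <= limit else _split_words(sentence, limit)
        for piece in pieces:
            # Assumption: short sentences are packed greedily into one segment.
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= limit:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


def merge_wav(segments: list[bytes]) -> bytes:
    """Join WAV segments into one clip that carries a single header."""
    if not segments:
        raise ValueError("no segments to merge")
    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        for index, segment in enumerate(segments):
            with wave.open(io.BytesIO(segment), "rb") as reader:
                if index == 0:
                    # Audio parameters come from the first segment only.
                    writer.setparams(reader.getparams())
                writer.writeframes(reader.readframes(reader.getnframes()))
    return out.getvalue()


def narrate(text: str, synthesize: Callable[[str], bytes], workers: int = 3) -> bytes:
    """Generate every segment (up to `workers` at a time) and merge them."""
    chunks = split_text(text)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so segments stay in sequence.
        segments = list(pool.map(synthesize, chunks))
    return merge_wav(segments)
```

Here `synthesize` stands in for whatever wrapper you write around a single `create_speech` call that returns WAV bytes. Setting `workers=1` mirrors the sequential behavior described for the TypeScript SDK.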
Tips
- Punctuation pays off. Well-punctuated source text produces cleaner cuts. If your script comes from machine translation or transcription, adding `.`/`?`/`!` improves the result.
- Voice settings travel. The same `voice_settings` are applied to every segment, so the merged audio sounds consistent.
- Estimate first. `predict_duration` doesn’t auto-chunk, but you can split your script into a few sentences, call `predict_duration` on each, and sum the durations to estimate cost.
- Pick the right model. For long narration, `sona_speech_2` produces the most natural delivery. Switch to `sona_speech_2_flash` if you need to generate many narrations quickly.
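
The "Estimate first" tip can be sketched as follows. This is a rough outline under stated assumptions: `estimate_duration` and `predict_one` are hypothetical names, and `predict_one` stands in for your own wrapper that calls `predict_duration` for one piece of text and returns seconds as a float. The exact `predict_duration` signature and response shape are not shown in this guide, so they are left out.

```python
import re
from typing import Callable


def estimate_duration(script: str, predict_one: Callable[[str], float]) -> float:
    """Sum per-sentence duration predictions for a long script."""
    # Split after ., ?, or ! so each piece is a single sentence.
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", script.strip()) if s]
    total = 0.0
    for sentence in sentences:
        # Each sentence is assumed to fit within the 300-character limit.
        total += predict_one(sentence)
    return total
```

Because the estimate is a sum of separate predictions, it may differ slightly from the duration of the final merged clip.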
Related
Long text
Full reference on the 300-character limit and chunking behavior.
Voice settings
Tune the delivery of your narration.