

The Supertone TTS API caps text at 300 characters per request. Anything longer returns 400 Bad Request. To synthesize longer scripts you have two options:
  1. Use the SDKs. Both the Python and TypeScript SDKs automatically split long input, generate each segment, and merge the audio for you. You get one final clip back even if the input was 2,000 characters.
  2. Chunk yourself. If you call the REST API directly, split the text on sentence boundaries and concatenate the resulting audio.

The 300-character limit, at a glance

| Layer | Behavior on >300 characters |
| --- | --- |
| REST API (POST /v1/text-to-speech/{voice_id}) | Returns 400 Bad Request. |
| Python SDK (create_speech, stream_speech) | Auto-chunks at 300 characters, generates segments in parallel for create_speech (sequentially for streaming), and merges the audio. |
| TypeScript SDK (createSpeech, streamSpeech) | Auto-chunks at 300 characters, generates segments sequentially, and merges the audio. |
| predict_duration | Not auto-chunked; the same 300-character limit applies. |
The threshold is configurable in both SDKs:
# Python — pass maxTextLength via SDK options
# (the default is 300; lower it to chunk earlier)
// TypeScript — pass maxTextLength in the options object
const response = await client.textToSpeech.createSpeech(
  { voiceId, apiConvertTextToSpeechUsingCharacterRequest: { text, language: "en" } },
  { maxTextLength: 250 },
);

SDK auto-chunking, end to end

import os
from supertone import Supertone

VOICE_ID = "20160a4c5ba38967330c84"  # replace with your voice ID

LONG_TEXT = (
    "Once upon a time, in a faraway land, there lived a quiet librarian "
    "who collected stories of forgotten kingdoms. Every evening she would "
    "open a leather-bound notebook and continue writing the next chapter "
    "of a tale she had been telling herself for years. ...continue with "
    "many more sentences spanning over 300 characters..."
)

with Supertone(api_key=os.environ["SUPERTONE_API_KEY"]) as client:
    response = client.text_to_speech.create_speech(
        voice_id=VOICE_ID,
        text=LONG_TEXT,
        language="en",
    )

    with open("narration.wav", "wb") as f:
        f.write(response.result.read())
Internally, the SDK splits LONG_TEXT at sentence boundaries (then word boundaries, then character boundaries if a single word is too long), runs up to 3 parallel create_speech requests, and merges the resulting WAV/MP3 audio with intermediate file headers stripped.
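The splitting order and bounded parallelism described above can be approximated in a few lines. The following is a minimal sketch, not the SDK's actual code: it treats only ASCII `.`, `!`, and `?` followed by whitespace as sentence boundaries, packs consecutive short sentences into chunks of up to 300 characters, and uses a hypothetical `synthesize` callable (text in, audio bytes out) in place of a real API call. `split_text`, `synthesize_all`, and `_pack` are illustrative names.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

MAX_TEXT_LENGTH = 300  # per-request limit of the TTS endpoint

# Assumption: a sentence boundary is ., !, or ? followed by whitespace (ASCII only).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def _pack(pieces: List[str], max_len: int) -> List[str]:
    """Greedily join pieces with single spaces while each chunk stays <= max_len."""
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= max_len:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


def split_text(text: str, max_len: int = MAX_TEXT_LENGTH) -> List[str]:
    """Split at sentence boundaries, then word boundaries, then characters."""
    pieces: List[str] = []
    for sentence in _SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        if len(sentence) <= max_len:
            pieces.append(sentence)
            continue
        # Sentence too long: fall back to word boundaries.
        for word in sentence.split():
            if len(word) <= max_len:
                pieces.append(word)
            else:
                # A single word longer than max_len: cut it by characters.
                pieces.extend(word[i:i + max_len] for i in range(0, len(word), max_len))
    return _pack(pieces, max_len)


def synthesize_all(
    text: str,
    synthesize: Callable[[str], bytes],  # hypothetical: one TTS request per chunk
    max_workers: int = 3,
) -> List[bytes]:
    """Synthesize every chunk with up to max_workers concurrent calls, in input order."""
    chunks = split_text(text)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of `chunks`, not completion order.
        return list(pool.map(synthesize, chunks))
```

Keeping results in submission order matters: with three requests in flight, a later segment can finish first, and the merged audio must still follow the script.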

Streaming long text

The SDKs also auto-chunk on stream_speech / streamSpeech. The audio is delivered to your iterator as if it were a single continuous stream — you don’t need to know how many segments were used. See Stream speech for the streaming pattern.

Chunking yourself (cURL or raw HTTP)

If you call the REST API directly, you need to split before sending. A reasonable strategy:
  1. Split on sentence-ending punctuation (., !, ?, and any language-specific equivalents).
  2. If a sentence is still over 300 characters, split on commas, then on word boundaries.
  3. For each segment, call POST /v1/text-to-speech/{voice_id} and append the returned audio to your output file.
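  4. For WAV concatenation, strip the header from every segment after the first (44 bytes for a canonical PCM WAV file; headers with extra chunks are longer), then rewrite the RIFF and data chunk sizes in the first header to cover the combined audio so the final file plays as one clip.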
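Step 4 can also be done without patching header bytes by hand. The sketch below uses Python's standard `wave` module, which writes a single header and fills in the final sizes when the output is closed. It assumes every segment is PCM WAV with the same channel count, sample width, and sample rate; `merge_wav_segments` is a hypothetical helper name, not part of any Supertone SDK.

```python
import io
import wave
from typing import List, Tuple


def merge_wav_segments(segments: List[bytes]) -> bytes:
    """Join PCM WAV byte strings into one WAV file with a single header.

    Assumes all segments share channel count, sample width, and sample rate.
    """
    if not segments:
        raise ValueError("no segments to merge")

    # Decode every segment first so a bad input fails before anything is written.
    decoded: List[Tuple[Tuple[int, int, int], bytes]] = []
    for index, data in enumerate(segments):
        with wave.open(io.BytesIO(data), "rb") as reader:
            fmt = (reader.getnchannels(), reader.getsampwidth(), reader.getframerate())
            frames = reader.readframes(reader.getnframes())
        if decoded and fmt != decoded[0][0]:
            raise ValueError(f"segment {index} has format {fmt}, expected {decoded[0][0]}")
        decoded.append((fmt, frames))

    channels, sample_width, frame_rate = decoded[0][0]
    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(sample_width)
        writer.setframerate(frame_rate)
        for _, frames in decoded:
            # Only the audio frames are copied; each segment's own header is dropped.
            writer.writeframes(frames)
    # Closing the writer patched the RIFF and data sizes to the combined length.
    return out.getvalue()
```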
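# Pseudo-shell: split a script and concatenate WAV chunks.
# split_into_sentences is a placeholder for your own splitter; it should print
# one segment of at most 300 characters per line. Requires curl, jq, and ffmpeg.
VOICE_ID="20160a4c5ba38967330c84"

split_into_sentences "$INPUT_TEXT" > sentences.txt
: > chunks.txt
i=0
while IFS= read -r line; do
  i=$((i + 1))
  # jq builds the JSON body so quotes and backslashes in the text are escaped.
  jq -n --arg text "$line" '{text: $text, language: "en"}' |
    curl -sf -X POST "https://supertoneapi.com/v1/text-to-speech/$VOICE_ID" \
      -H "x-sup-api-key: $SUPERTONE_API_KEY" \
      -H "Content-Type: application/json" \
      --data-binary @- \
      -o "chunk_$i.wav"
  echo "file 'chunk_$i.wav'" >> chunks.txt
done < sentences.txt
# Concatenate the segments into one file with a single, correct WAV header.
ffmpeg -f concat -safe 0 -i chunks.txt -c copy narration.wav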
For most projects, prefer the SDKs — they handle these edge cases (cross-segment timing, header stripping, retry on partial failures) so you don’t have to.
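If you call the REST API from Python rather than the shell, step 3 might look like the following standard-library sketch. The endpoint, `x-sup-api-key` header, and JSON fields are taken from the cURL example; the function names, the 60-second timeout, and the client-side length check are assumptions for illustration.

```python
import json
import urllib.request

API_BASE = "https://supertoneapi.com"  # host from the cURL example
MAX_TEXT_LENGTH = 300


def build_tts_request(
    voice_id: str, text: str, api_key: str, language: str = "en"
) -> urllib.request.Request:
    """Build one POST /v1/text-to-speech/{voice_id} request for a single segment."""
    if len(text) > MAX_TEXT_LENGTH:
        # Fail locally instead of spending a request on a guaranteed 400.
        raise ValueError(f"segment is {len(text)} characters; split it first")
    # json.dumps escapes quotes and non-ASCII text safely, unlike string interpolation.
    body = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v1/text-to-speech/{voice_id}",
        data=body,
        method="POST",
        headers={"x-sup-api-key": api_key, "Content-Type": "application/json"},
    )


def synthesize_segment(
    voice_id: str, text: str, api_key: str, language: str = "en", timeout: float = 60.0
) -> bytes:
    """Send one segment and return the raw audio bytes (urlopen raises HTTPError on 4xx/5xx)."""
    request = build_tts_request(voice_id, text, api_key, language)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```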

Tips

  • Punctuation matters. Auto-chunking prefers sentence boundaries. Well-punctuated input produces cleaner cuts and more natural transitions.
  • Estimate cost first. predict_duration doesn’t auto-chunk, but you can split text yourself and sum durations to estimate total credits.
  • Watch rate limits. A single long input becomes multiple TTS requests — track your account’s rate limits and consider throttling in your own caller.
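The last two tips can be combined in a small client-side helper. This is a sketch under stated assumptions: `predict` stands in for your own wrapper around predict_duration (called once per segment of at most 300 characters and returning a number in whatever unit you choose), and `per_minute` is whatever request limit applies to your tier. Neither `Throttle` nor `estimate_total_duration` is part of the SDK.

```python
import threading
import time
from typing import Callable, Iterable


class Throttle:
    """Spaces calls at least 60 / per_minute seconds apart; safe to share across threads."""

    def __init__(self, per_minute: int) -> None:
        if per_minute <= 0:
            raise ValueError("per_minute must be positive")
        self._interval = 60.0 / per_minute
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self) -> None:
        """Reserve the next free slot, then sleep until it arrives."""
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self._interval
        # Sleep outside the lock so other threads can reserve later slots meanwhile.
        time.sleep(slot - now)


def estimate_total_duration(
    segments: Iterable[str],
    predict: Callable[[str], float],  # hypothetical wrapper around predict_duration
    throttle: Throttle,
) -> float:
    """Sum predicted durations over segments, issuing one throttled call per segment."""
    total = 0.0
    for segment in segments:
        throttle.wait()
        total += predict(segment)
    return total
```

The same `Throttle` instance can be shared by the synthesis calls themselves, since predict_duration and text-to-speech requests both count toward your request budget.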
