Long-form narration - Supertone API Documentation

This example takes a multi-paragraph script and produces a single audio file. It demonstrates the SDK’s automatic chunking: you pass the whole script to create_speech, and the SDK splits it at sentence boundaries, generates each segment, and merges the results into one file.

Python

import os
from supertone import Supertone

VOICE_ID = "20160a4c5ba38967330c84"  # replace with your voice ID

SCRIPT = """
Chapter one. The clocktower struck midnight, and the wind through the old
square carried whispers from the workshops below. Hana adjusted her coat,
checked the address one more time, and stepped through the iron gate.

Inside, the air smelled of copper and lemon polish. Rows of half-finished
automatons stared back from the shelves, each one waiting for a name. She
set her satchel on the bench and opened her notebook to a fresh page.

By dawn, the room had changed. One of the figures by the window was no
longer half-finished, and Hana, very quietly, was no longer alone.
""".strip()

with Supertone(api_key=os.environ["SUPERTONE_API_KEY"]) as client:
    response = client.text_to_speech.create_speech(
        voice_id=VOICE_ID,
        text=SCRIPT,
        language="en",
        model="sona_speech_2",
        voice_settings={"pitch_variance": 0.9, "speed": 0.95},
    )

    with open("narration.wav", "wb") as f:
        f.write(response.result.read())

print("Saved narration.wav")

TypeScript

import { Supertone } from "@supertone/supertone";
import * as fs from "node:fs";

const VOICE_ID = "20160a4c5ba38967330c84"; // replace with your voice ID

const SCRIPT = `
Chapter one. The clocktower struck midnight, and the wind through the old
square carried whispers from the workshops below. Hana adjusted her coat,
checked the address one more time, and stepped through the iron gate.

Inside, the air smelled of copper and lemon polish. Rows of half-finished
automatons stared back from the shelves, each one waiting for a name. She
set her satchel on the bench and opened her notebook to a fresh page.

By dawn, the room had changed. One of the figures by the window was no
longer half-finished, and Hana, very quietly, was no longer alone.
`.trim();

const client = new Supertone({ apiKey: process.env.SUPERTONE_API_KEY });

const response = await client.textToSpeech.createSpeech({
  voiceId: VOICE_ID,
  apiConvertTextToSpeechUsingCharacterRequest: {
    text: SCRIPT,
    language: "en",
    model: "sona_speech_2",
    voiceSettings: { pitchVariance: 0.9, speed: 0.95 },
  },
});

if (response.result instanceof Uint8Array) {
  fs.writeFileSync("narration.wav", response.result);
} else if (response.result && "getReader" in response.result) {
  const reader = (response.result as ReadableStream<Uint8Array>).getReader();
  const chunks: Uint8Array[] = [];
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    if (value) chunks.push(value);
  }
  fs.writeFileSync("narration.wav", Buffer.concat(chunks));
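} else {
  // Fail loudly instead of reporting success when nothing was written.
  throw new Error("Unexpected response type from createSpeech");
}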

console.log("Saved narration.wav");

What happens under the hood

1. The SDK detects that the text is longer than 300 characters.
2. It splits on sentence punctuation first, then on word boundaries if a single sentence is itself too long.
3. The Python SDK sends up to 3 create_speech requests in parallel; the TypeScript SDK runs them sequentially.
4. Each segment comes back as a complete audio file.
5. The SDK strips the WAV header from every segment after the first and concatenates the bytes into a single continuous clip.
6. You get one playable file back, identical in form to a single-segment response.
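The steps above can be approximated in plain Python. This is a minimal sketch for illustration, not the SDK's actual implementation: split_text, merge_wav, and narrate are hypothetical names, synthesize is a hypothetical callable that takes one chunk of text and returns WAV bytes, and the greedy packing of sentences into chunks is an assumption. The merge step uses the standard-library wave module to rewrite a single header rather than slicing raw header bytes.

```python
import io
import re
import wave
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

MAX_CHARS = 300  # per-request character limit described above


def _split_words(sentence: str, limit: int) -> list[str]:
    """Break one over-long sentence on word boundaries."""
    pieces, current = [], ""
    for word in sentence.split():
        candidate = f"{current} {word}".strip()
        # A single word longer than the limit is kept whole (not handled here).
        if len(candidate) <= limit or not current:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split on sentence punctuation first, then on word boundaries."""
    # Collapse whitespace, then break after ., ?, or ! followed by a space.
    sentences = re.split(r"(?<=[.?!])\s+", " ".join(text.split()))
    chunks, current = [], ""
    for sentence in sentences:
        pieces = [sentence] if len(sentence) <= limit else _split_words(sentence, limit)
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= limit:
                current = candidate  # assumption: pack sentences greedily
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


def merge_wav(segments: list[bytes]) -> bytes:
    """Join WAV segments that share one audio format into a single file."""
    if not segments:
        raise ValueError("no segments to merge")
    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        for index, segment in enumerate(segments):
            with wave.open(io.BytesIO(segment), "rb") as reader:
                if index == 0:
                    # Reuse the first segment's format for the merged header.
                    writer.setparams(reader.getparams())
                writer.writeframes(reader.readframes(reader.getnframes()))
    return out.getvalue()


def narrate(text: str, synthesize: Callable[[str], bytes], max_workers: int = 3) -> bytes:
    """Split, synthesize up to `max_workers` chunks at a time, then merge."""
    chunks = split_text(text)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so segments stay in sequence.
        segments = list(pool.map(synthesize, chunks))
    return merge_wav(segments)
```

In normal use you do not need any of this, because create_speech handles it for you. A sketch like this is mainly useful for previewing where a script is likely to be cut before you spend credits on it.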

Tips

Punctuation pays off. Well-punctuated source text produces cleaner cuts. If your script comes from machine translation or transcription, adding sentence-ending punctuation (., ?, or !) improves the result.
Voice settings travel. The same voice_settings are applied to every segment, so the merged audio sounds consistent.
Estimate first. predict_duration doesn’t auto-chunk, but you can split your script into groups of a few sentences, call predict_duration on each group, and sum the durations to estimate cost.
Pick the right model. For long narration, sona_speech_2 produces the most natural delivery. Switch to sona_speech_2_flash if you need to generate many narrations quickly.

Related

Long text

Full reference on the 300-character limit and chunking behavior.

Voice settings

Tune the delivery of your narration.