This example takes a multi-paragraph script and produces a single audio file. It demonstrates the SDK’s automatic chunking: you pass the whole script to create_speech, the SDK splits at sentence boundaries, generates each segment, and merges the result.
```python
import os

from supertone import Supertone

VOICE_ID = "20160a4c5ba38967330c84"  # replace with your voice ID

SCRIPT = """
Chapter one. The clocktower struck midnight, and the wind through the old
square carried whispers from the workshops below. Hana adjusted her coat,
checked the address one more time, and stepped through the iron gate.

Inside, the air smelled of copper and lemon polish. Rows of half-finished
automatons stared back from the shelves, each one waiting for a name. She
set her satchel on the bench and opened her notebook to a fresh page.

By dawn, the room had changed. One of the figures by the window was no
longer half-finished, and Hana, very quietly, was no longer alone.
""".strip()

with Supertone(api_key=os.environ["SUPERTONE_API_KEY"]) as client:
    response = client.text_to_speech.create_speech(
        voice_id=VOICE_ID,
        text=SCRIPT,
        language="en",
        model="sona_speech_2",
        voice_settings={"pitch_variance": 0.9, "speed": 0.95},
    )

    with open("narration.wav", "wb") as f:
        f.write(response.result.read())

print("Saved narration.wav")
```
```typescript
import { Supertone } from "@supertone/supertone";
import * as fs from "node:fs";

const VOICE_ID = "20160a4c5ba38967330c84"; // replace with your voice ID

const SCRIPT = `
Chapter one. The clocktower struck midnight, and the wind through the old
square carried whispers from the workshops below. Hana adjusted her coat,
checked the address one more time, and stepped through the iron gate.

Inside, the air smelled of copper and lemon polish. Rows of half-finished
automatons stared back from the shelves, each one waiting for a name. She
set her satchel on the bench and opened her notebook to a fresh page.

By dawn, the room had changed. One of the figures by the window was no
longer half-finished, and Hana, very quietly, was no longer alone.
`.trim();

const client = new Supertone({ apiKey: process.env.SUPERTONE_API_KEY });

const response = await client.textToSpeech.createSpeech({
  voiceId: VOICE_ID,
  apiConvertTextToSpeechUsingCharacterRequest: {
    text: SCRIPT,
    language: "en",
    model: "sona_speech_2",
    voiceSettings: { pitchVariance: 0.9, speed: 0.95 },
  },
});

if (response.result instanceof Uint8Array) {
  fs.writeFileSync("narration.wav", response.result);
} else if (response.result && "getReader" in response.result) {
  const reader = (response.result as ReadableStream<Uint8Array>).getReader();
  const chunks: Uint8Array[] = [];
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    if (value) chunks.push(value);
  }
  fs.writeFileSync("narration.wav", Buffer.concat(chunks));
}

console.log("Saved narration.wav");
```
Punctuation pays off. Because the SDK splits at sentence boundaries, well-punctuated source text produces cleaner cuts. If your script comes from machine translation or transcription, add sentence-ending punctuation (., ?, or !) before sending it.
Voice settings travel. The same voice_settings are applied to every segment, so the merged audio sounds consistent.
Estimate first. predict_duration doesn’t auto-chunk, but you can split your script into groups of a few sentences, call predict_duration on each group, and sum the durations to estimate cost.
Pick the right model. For long narration, sona_speech_2 produces the most natural delivery. Switch to sona_speech_2_flash if you need to generate many narrations quickly.
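The "Estimate first" tip can be sketched roughly as follows. This is an illustration only, not the SDK's internal chunking logic: `split_into_chunks`, `estimate_total_duration`, and the `max_chars` default of 300 are hypothetical names and values chosen for the example. The exact signature of predict_duration is not shown on this page, so the sketch takes a `predict` callable that you would implement as a thin wrapper around it.

```python
import re
from typing import Callable, List


def split_into_chunks(script: str, max_chars: int = 300) -> List[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    Hypothetical helper: the 300-character default is arbitrary and is
    not a documented SDK limit.
    """
    # Collapse newlines and repeated spaces so paragraph breaks do not matter.
    text = " ".join(script.split())
    # Split after '.', '?' or '!' when the mark is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", text) if s]

    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    # Note: a single sentence longer than max_chars is kept whole.
    return chunks


def estimate_total_duration(
    chunks: List[str], predict: Callable[[str], float]
) -> float:
    """Sum the per-chunk estimates (in seconds) returned by predict.

    predict is an assumption: a function you write that calls the SDK's
    predict_duration for one chunk and returns the duration as a float.
    """
    return sum(predict(chunk) for chunk in chunks)
```

To use it, pass your script to `split_into_chunks`, then call `estimate_total_duration` with a function that requests a duration for a single chunk using the same voice, language, and model you plan to use for the final narration. Text without sentence-ending punctuation comes back as one oversized chunk, which is one more reason to punctuate the source first.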