Supertone offers five TTS models with different trade-offs between quality, latency, language coverage, and configurability. Use this page to choose the model that fits your product.

How to choose

| If you need… | Pick |
|---|---|
| The best overall quality, 23 languages (narration, audiobooks) | sona_speech_2 |
| A balance of speed and quality (interactive apps with a quality bar) | sona_speech_2_flash |
| The fastest response with high speech stability, 31 languages (voice agents, real-time interaction) | supertonic_api_3 |
| Chunked streaming or the full voice-settings surface | sona_speech_1 |
The model is selected per-request via the model field. If omitted, the default is sona_speech_1.
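If your application picks the model in code, the table above can be encoded as a small lookup. This is a minimal sketch: `MODEL_FOR_NEED`, `choose_model`, and the requirement labels ("quality", "balanced", and so on) are hypothetical names for illustration, not API values.

```python
from typing import Optional

# Hypothetical mapping from the "How to choose" table to model names.
# The keys are illustrative labels, not values the API accepts.
MODEL_FOR_NEED = {
    "quality": "sona_speech_2",         # best overall quality, 23 languages
    "balanced": "sona_speech_2_flash",  # balance of speed and quality
    "latency": "supertonic_api_3",      # fastest response, 31 languages
    "streaming": "sona_speech_1",       # chunked streaming, full voice settings
}

# Model the API uses when the `model` field is omitted.
DEFAULT_MODEL = "sona_speech_1"


def choose_model(need: Optional[str] = None) -> str:
    """Return the model for a requirement label, or the API default if none is given."""
    if need is None:
        return DEFAULT_MODEL
    if need not in MODEL_FOR_NEED:
        raise ValueError(
            f"unknown need {need!r}; expected one of {sorted(MODEL_FOR_NEED)}"
        )
    return MODEL_FOR_NEED[need]
```

Passing `model=choose_model("latency")` explicitly, rather than omitting the field, keeps requests from falling back to the legacy default without anyone noticing.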

Model summary

| Model | Positioning | Languages | Voice settings | Notable features |
|---|---|---|---|---|
| sona_speech_2 | Highest quality | 23 | All except subharmonic_amplitude_control | Phonemes, normalized text |
| sona_speech_2_flash | Balanced speed and quality | 23 | pitch_shift, pitch_variance, speed, duration | Phonemes, normalized text |
| supertonic_api_3 | Ultra-lightweight, lowest latency, improved speech stability | 31 | speed only | None |
| supertonic_api_1 | Legacy supertonic model | 5 | speed only | None |
| sona_speech_1 | Legacy flagship | 3 | All voice settings | Streaming, phonemes |

Models in detail

sona_speech_2

The most natural, highest-quality voice on the platform with broad multilingual coverage. Recommended for narration, audiobooks, character dialogue, and production-quality marketing audio — anywhere quality matters more than latency.
  • Languages (23): en, ko, ja, bg, cs, da, el, es, et, fi, hu, it, nl, pl, pt, ro, ar, de, fr, hi, id, ru, vi
  • Voice settings: all parameters except subharmonic_amplitude_control
  • Extras: include_phonemes (timestamps for lip-sync), normalized_text (pronunciation control)
  • Streaming: not supported

sona_speech_2_flash

A lightweight variant of sona_speech_2 optimized for lower latency while keeping the same multilingual coverage. Use it when you care about response time and want acceptable quality — for example, interactive agents or batch generation at scale.
  • Languages (23): same as sona_speech_2
  • Voice settings: pitch_shift, pitch_variance, speed, duration
  • Extras: include_phonemes, normalized_text
  • Streaming: not supported

supertonic_api_3

The next-generation successor to supertonic_api_1 with significantly improved speech stability. Trained differently from the open-weights Supertonic 3 release, this API variant inherits the ultra-low latency profile of supertonic_api_1 while delivering far more reliable pronunciation and reduced reading errors. The best default for voice agents, chatbots, and any real-time experience where time-to-first-audio is the top priority.
  • Languages (31): en, ko, ja, ar, bg, cs, da, de, el, es, et, fi, fr, hi, hr, hu, id, it, lt, lv, nl, pl, pt, ro, ru, sk, sl, sv, tr, uk, vi
  • Voice settings: speed only — all other settings are silently ignored
  • Extras: none
  • Streaming: not supported (but per-call latency is so low that streaming is usually unnecessary)

supertonic_api_1

The legacy supertonic model. Superseded by supertonic_api_3, which offers broader language coverage and dramatically better speech stability at the same latency profile. Pick supertonic_api_1 only if you have an existing integration pinned to it; new projects should use supertonic_api_3.
  • Languages (5): en, ko, ja, es, pt
  • Voice settings: speed only — all other settings are silently ignored
  • Extras: none
  • Streaming: not supported

sona_speech_1

The legacy flagship. It supports the full voice-settings surface and is the only model that currently supports chunked streaming (stream_speech). For most use cases the newer models are a better starting point; pick sona_speech_1 if you specifically need stream_speech output or the full set of fine-tuning parameters (similarity, text_guidance, subharmonic_amplitude_control).
  • Languages (3): en, ko, ja
  • Voice settings: all parameters
  • Extras: include_phonemes
  • Streaming: supported

Supported languages

The language field is required on every TTS request, and its value must be supported by both the model and the chosen voice (check the voice’s language array).
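| Code | Language | sona_speech_2 | sona_speech_2_flash | supertonic_api_3 | supertonic_api_1 | sona_speech_1 |
|---|---|---|---|---|---|---|
| en | English | ✓ | ✓ | ✓ | ✓ | ✓ |
| ko | Korean | ✓ | ✓ | ✓ | ✓ | ✓ |
| ja | Japanese | ✓ | ✓ | ✓ | ✓ | ✓ |
| es | Spanish | ✓ | ✓ | ✓ | ✓ | ✗ |
| pt | Portuguese | ✓ | ✓ | ✓ | ✓ | ✗ |
| de | German | ✓ | ✓ | ✓ | ✗ | ✗ |
| fr | French | ✓ | ✓ | ✓ | ✗ | ✗ |
| it | Italian | ✓ | ✓ | ✓ | ✗ | ✗ |
| nl | Dutch | ✓ | ✓ | ✓ | ✗ | ✗ |
| pl | Polish | ✓ | ✓ | ✓ | ✗ | ✗ |
| ro | Romanian | ✓ | ✓ | ✓ | ✗ | ✗ |
| cs | Czech | ✓ | ✓ | ✓ | ✗ | ✗ |
| da | Danish | ✓ | ✓ | ✓ | ✗ | ✗ |
| el | Greek | ✓ | ✓ | ✓ | ✗ | ✗ |
| et | Estonian | ✓ | ✓ | ✓ | ✗ | ✗ |
| fi | Finnish | ✓ | ✓ | ✓ | ✗ | ✗ |
| hu | Hungarian | ✓ | ✓ | ✓ | ✗ | ✗ |
| bg | Bulgarian | ✓ | ✓ | ✓ | ✗ | ✗ |
| ar | Arabic | ✓ | ✓ | ✓ | ✗ | ✗ |
| hi | Hindi | ✓ | ✓ | ✓ | ✗ | ✗ |
| id | Indonesian | ✓ | ✓ | ✓ | ✗ | ✗ |
| ru | Russian | ✓ | ✓ | ✓ | ✗ | ✗ |
| vi | Vietnamese | ✓ | ✓ | ✓ | ✗ | ✗ |
| hr | Croatian | ✗ | ✗ | ✓ | ✗ | ✗ |
| lt | Lithuanian | ✗ | ✗ | ✓ | ✗ | ✗ |
| lv | Latvian | ✗ | ✗ | ✓ | ✗ | ✗ |
| sk | Slovak | ✗ | ✗ | ✓ | ✗ | ✗ |
| sl | Slovenian | ✗ | ✗ | ✓ | ✗ | ✗ |
| sv | Swedish | ✗ | ✗ | ✓ | ✗ | ✗ |
| tr | Turkish | ✗ | ✗ | ✓ | ✗ | ✗ |
| uk | Ukrainian | ✗ | ✗ | ✓ | ✗ | ✗ |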
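Because the language must be valid for both the model and the voice, it can help to check it client-side before calling the API. This is a minimal sketch, assuming you already have the voice's language array as a list of codes. `SUPPORTED_LANGUAGES` and `check_language` are hypothetical names, and the sets are transcribed from this page, so they need updating if model coverage changes.

```python
# Language codes per model, transcribed from the tables on this page.
_SONA_2_LANGS = frozenset({
    "en", "ko", "ja", "bg", "cs", "da", "el", "es", "et", "fi", "hu", "it",
    "nl", "pl", "pt", "ro", "ar", "de", "fr", "hi", "id", "ru", "vi",
})

SUPPORTED_LANGUAGES = {
    "sona_speech_2": _SONA_2_LANGS,
    "sona_speech_2_flash": _SONA_2_LANGS,  # same coverage as sona_speech_2
    # supertonic_api_3 adds eight languages on top of the 23 above.
    "supertonic_api_3": _SONA_2_LANGS | {"hr", "lt", "lv", "sk", "sl", "sv", "tr", "uk"},
    "supertonic_api_1": frozenset({"en", "ko", "ja", "es", "pt"}),
    "sona_speech_1": frozenset({"en", "ko", "ja"}),
}


def check_language(model: str, language: str, voice_languages: list[str]) -> None:
    """Raise ValueError unless both the model and the voice support `language`."""
    if model not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unknown model {model!r}")
    if language not in SUPPORTED_LANGUAGES[model]:
        raise ValueError(f"{model} does not support language {language!r}")
    if language not in voice_languages:
        raise ValueError(f"the chosen voice does not support language {language!r}")
```

Failing locally this way surfaces a model/voice/language mismatch before a request is spent on it. The server remains the authority on what it accepts.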
Pass the language as a lowercase ISO code string:
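```python
# Assumes `client` is an initialized Supertone SDK client and VOICE_ID
# is the ID of a voice whose language array includes "en".
response = client.text_to_speech.create_speech(
    voice_id=VOICE_ID,
    text="Hello!",
    language="en",  # lowercase ISO code
    model="sona_speech_2",
)
```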
For multilingual content, send one request per language rather than mixing languages within a single text. For Japanese input containing kanji, numbers, units, or symbols, see Normalized text.
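Splitting mixed-language content into one request per language can be sketched as follows. This assumes the text has already been segmented and tagged by language (language detection is not shown), and that the chosen voice supports every language involved. `build_requests` is a hypothetical helper, and the payload keys simply mirror the keyword arguments of the create_speech example above.

```python
from typing import Iterable


def build_requests(
    segments: Iterable[tuple[str, str]],
    voice_id: str,
    model: str,
    sep: str = " ",
) -> list[dict]:
    """Turn (language, text) segments into single-language request payloads.

    Consecutive segments in the same language are merged with `sep`, so each
    payload carries exactly one language.
    """
    requests: list[dict] = []
    for language, text in segments:
        text = text.strip()
        if not text:
            continue  # skip empty segments
        if requests and requests[-1]["language"] == language:
            requests[-1]["text"] += sep + text  # same language: extend previous payload
        else:
            requests.append(
                {"voice_id": voice_id, "text": text, "language": language, "model": model}
            )
    return requests
```

Each payload could then be sent on its own (for example `client.text_to_speech.create_speech(**payload)`, assuming the SDK client from the example above) and the resulting audio played in order. For languages written without spaces, such as Japanese, passing `sep=""` avoids inserting a space when merging.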

Feature support matrix

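| Feature | sona_speech_2 | sona_speech_2_flash | supertonic_api_3 | supertonic_api_1 | sona_speech_1 |
|---|---|---|---|---|---|
| Streaming (stream_speech) | ✗ | ✗ | ✗ | ✗ | ✓ |
| include_phonemes | ✓ | ✓ | ✗ | ✗ | ✓ |
| normalized_text | ✓ | ✓ | ✗ | ✗ | ✗ |
| pitch_shift, pitch_variance, speed, duration | ✓ | ✓ | speed only | speed only | ✓ |
| similarity, text_guidance | ✓ | ✗ | ✗ | ✗ | ✓ |
| subharmonic_amplitude_control | ✗ | ✗ | ✗ | ✗ | ✓ |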
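Because supertonic_api_3 and supertonic_api_1 silently ignore every voice setting except speed, it can be worth checking client-side which settings a model will actually honor. This is a minimal sketch. `HONORED_SETTINGS` and `split_voice_settings` are hypothetical names, and voice settings are assumed to be a flat dict of parameter name to value. The sets cover only the parameters listed in the matrix above, and the Voice settings reference remains authoritative.

```python
# Which voice-setting names each model honors, transcribed from the matrix above.
_BASIC = {"pitch_shift", "pitch_variance", "speed", "duration"}
_FINE = {"similarity", "text_guidance"}

HONORED_SETTINGS = {
    "sona_speech_2": _BASIC | _FINE,  # everything except subharmonic_amplitude_control
    "sona_speech_2_flash": _BASIC,
    "supertonic_api_3": {"speed"},
    "supertonic_api_1": {"speed"},
    "sona_speech_1": _BASIC | _FINE | {"subharmonic_amplitude_control"},
}


def split_voice_settings(model: str, settings: dict) -> tuple[dict, set]:
    """Return (settings the model honors, names of settings it would ignore)."""
    honored = HONORED_SETTINGS[model]
    kept = {name: value for name, value in settings.items() if name in honored}
    ignored = set(settings) - honored
    return kept, ignored
```

Logging the `ignored` names, instead of sending them and relying on the API to drop them, makes the silent behavior visible when switching models.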

Voice settings

Reference for every voice-setting parameter and its supported models.

Voices

Find a voice ID that matches your language and style requirements.

On-device TTS

Looking to run TTS locally on CPU, with no API call and no network round-trip? Supertone also publishes an open-weights model in the same Supertonic 3 family — Supertonic 3 (99M parameters, ONNX Runtime, OpenRAIL-M license).
Supertonic 3 (open-weights) is a different model from supertonic_api_3. They share the same family name and lineage, but were trained differently and produce different audio. The API model (supertonic_api_3) is what’s exposed by this API; the open-weights model is a separate on-device release. Don’t assume parity in voice quality, supported voices, or behavior.

Supertonic 3: On-device TTS

99M-parameter open-weights TTS that runs locally on CPU via ONNX Runtime — 31 languages, no GPU, no cloud, no API. A separate model from supertonic_api_3; visit the project site for weights, samples, and SDKs (Python, Node.js, Web, iOS, Android, C++).