Models - Supertone API Documentation

Supertone offers five TTS models with different trade-offs between quality, latency, language coverage, and configurability. Use this page to choose the model that fits your product.

How to choose

If you need…	Pick
The best overall quality, 23 languages — narration, audiobooks	`sona_speech_2`
A balance of speed and quality — interactive apps with quality bar	`sona_speech_2_flash`
The fastest response with high speech stability, 31 languages — voice agents, real-time interaction	`supertonic_api_3`
Chunked streaming or the full voice-settings surface	`sona_speech_1`

The model is selected per-request via the model field. If omitted, the default is sona_speech_1.

Model summary

Model	Positioning	Languages	Voice settings	Notable features
`sona_speech_2`	Highest quality	23	All except `subharmonic_amplitude_control`	Phonemes, normalized text
`sona_speech_2_flash`	Balanced speed and quality	23	`pitch_shift`, `pitch_variance`, `speed`, `duration`	Phonemes, normalized text
`supertonic_api_3`	Ultra-lightweight, lowest latency, improved speech stability	31	`speed` only	—
`supertonic_api_1`	Legacy supertonic model	5	`speed` only	—
`sona_speech_1`	Legacy flagship	3	All voice settings	Streaming, phonemes

Models in detail

sona_speech_2

The most natural, highest-quality voice on the platform with broad multilingual coverage. Recommended for narration, audiobooks, character dialogue, and production-quality marketing audio — anywhere quality matters more than latency.

Languages (23): en, ko, ja, bg, cs, da, el, es, et, fi, hu, it, nl, pl, pt, ro, ar, de, fr, hi, id, ru, vi
Voice settings: all parameters except subharmonic_amplitude_control
Extras: include_phonemes (timestamps for lip-sync), normalized_text (pronunciation control)
Streaming: not supported

sona_speech_2_flash

A lightweight variant of sona_speech_2 optimized for lower latency while keeping the same multilingual coverage. Use it when you care about response time and want acceptable quality — for example, interactive agents or batch generation at scale.

Languages (23): same as sona_speech_2
Voice settings: pitch_shift, pitch_variance, speed, duration
Extras: include_phonemes, normalized_text
Streaming: not supported

supertonic_api_3

The next-generation successor to supertonic_api_1 with significantly improved speech stability. Trained differently from the open-weights Supertonic 3 release, this API variant inherits the ultra-low latency profile of supertonic_api_1 while delivering far more reliable pronunciation and reduced reading errors. The best default for voice agents, chatbots, and any real-time experience where time-to-first-audio is the top priority.

Languages (31): en, ko, ja, ar, bg, cs, da, de, el, es, et, fi, fr, hi, hr, hu, id, it, lt, lv, nl, pl, pt, ro, ru, sk, sl, sv, tr, uk, vi
Voice settings: speed only — all other settings are silently ignored
Extras: —
Streaming: not supported (but per-call latency is so low that streaming is usually unnecessary)

supertonic_api_1

The legacy supertonic model. Superseded by supertonic_api_3, which offers broader language coverage and dramatically better speech stability at the same latency profile. Pick supertonic_api_1 only if you have an existing integration pinned to it; new projects should use supertonic_api_3.

Languages (5): en, ko, ja, es, pt
Voice settings: speed only — all other settings are silently ignored
Extras: —
Streaming: not supported

sona_speech_1

The legacy flagship. It supports the full voice-settings surface and is the only model that currently supports chunked streaming (stream_speech). For most use cases the newer models are a better starting point; pick sona_speech_1 if you specifically need stream_speech output or the full set of fine-tuning parameters (similarity, text_guidance, subharmonic_amplitude_control).

Languages (3): en, ko, ja
Voice settings: all parameters
Extras: include_phonemes
Streaming: supported

Supported languages

language is required on every TTS request and must be a value supported by both the model and the chosen voice (check the voice’s language array).

Code	Language	`sona_speech_2`	`sona_speech_2_flash`	`supertonic_api_3`	`supertonic_api_1`	`sona_speech_1`
`en`	English	✅	✅	✅	✅	✅
`ko`	Korean	✅	✅	✅	✅	✅
`ja`	Japanese	✅	✅	✅	✅	✅
`es`	Spanish	✅	✅	✅	✅	—
`pt`	Portuguese	✅	✅	✅	✅	—
`de`	German	✅	✅	✅	—	—
`fr`	French	✅	✅	✅	—	—
`it`	Italian	✅	✅	✅	—	—
`nl`	Dutch	✅	✅	✅	—	—
`pl`	Polish	✅	✅	✅	—	—
`ro`	Romanian	✅	✅	✅	—	—
`cs`	Czech	✅	✅	✅	—	—
`da`	Danish	✅	✅	✅	—	—
`el`	Greek	✅	✅	✅	—	—
`et`	Estonian	✅	✅	✅	—	—
`fi`	Finnish	✅	✅	✅	—	—
`hu`	Hungarian	✅	✅	✅	—	—
`bg`	Bulgarian	✅	✅	✅	—	—
`ar`	Arabic	✅	✅	✅	—	—
`hi`	Hindi	✅	✅	✅	—	—
`id`	Indonesian	✅	✅	✅	—	—
`ru`	Russian	✅	✅	✅	—	—
`vi`	Vietnamese	✅	✅	✅	—	—
`hr`	Croatian	—	—	✅	—	—
`lt`	Lithuanian	—	—	✅	—	—
`lv`	Latvian	—	—	✅	—	—
`sk`	Slovak	—	—	✅	—	—
`sl`	Slovenian	—	—	✅	—	—
`sv`	Swedish	—	—	✅	—	—
`tr`	Turkish	—	—	✅	—	—
`uk`	Ukrainian	—	—	✅	—	—

Pass the language as a lowercase ISO code string:

response = client.text_to_speech.create_speech(
    voice_id=VOICE_ID,
    text="Hello!",
    language="en",
    model="sona_speech_2",
)

For multilingual content, fire one request per language rather than mixing languages inside a single text. For Japanese inputs with kanji, numbers, units, or symbols, see Normalized text.

Feature support matrix

Feature	`sona_speech_2`	`sona_speech_2_flash`	`supertonic_api_3`	`supertonic_api_1`	`sona_speech_1`
Streaming (`stream_speech`)	—	—	—	—	✅
`include_phonemes`	✅	✅	—	—	✅
`normalized_text`	✅	✅	—	—	—
`pitch_shift`, `pitch_variance`, `speed`, `duration`	✅	✅	`speed` only	`speed` only	✅
`similarity`, `text_guidance`	✅	—	—	—	✅
`subharmonic_amplitude_control`	—	—	—	—	✅

Voice settings

Reference for every voice-setting parameter and its supported models.

Voices

Find a voice ID that matches your language and style requirements.

On-device TTS

Looking to run TTS locally on CPU, with no API call and no network round-trip? Supertone also publishes an open-weights model in the same Supertonic 3 family — Supertonic 3 (99M parameters, ONNX Runtime, OpenRAIL-M license).

Supertonic 3 (open-weights) is a different model from supertonic_api_3. They share the same family name and lineage, but were trained differently and produce different audio. The API model (supertonic_api_3) is what’s exposed by this API; the open-weights model is a separate on-device release. Don’t assume parity in voice quality, supported voices, or behavior.

Supertonic 3 — On-device TTS ↗

99M-parameter open-weights TTS that runs locally on CPU via ONNX Runtime — 31 languages, no GPU, no cloud, no API. A separate model from supertonic_api_3; visit the project site for weights, samples, and SDKs (Python, Node.js, Web, iOS, Android, C++).

​How to choose

​Model summary

​Models in detail

​sona_speech_2

​sona_speech_2_flash

​supertonic_api_3

​supertonic_api_1

​sona_speech_1

​Supported languages

​Feature support matrix

​Related

Voice settings

Voices

​On-device TTS

Supertonic 3 — On-device TTS ↗

How to choose

Model summary

Models in detail

sona_speech_2

sona_speech_2_flash

supertonic_api_3

supertonic_api_1

sona_speech_1

Supported languages

Feature support matrix

Related

On-device TTS