Supertone offers five TTS models with different trade-offs between quality, latency, language coverage, and configurability. Use this page to choose the model that fits your product.
How to choose
| If you need… | Pick |
|---|---|
| The best overall quality, 23 languages (narration, audiobooks) | sona_speech_2 |
| A balance of speed and quality (interactive apps with a quality bar) | sona_speech_2_flash |
| The fastest response with high speech stability, 31 languages (voice agents, real-time interaction) | supertonic_api_3 |
| Chunked streaming or the full voice-settings surface | sona_speech_1 |
Select a model with the model field of the request. If omitted, the default is sona_speech_1.
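The selection table above can be expressed as a small helper. This is a minimal sketch, not part of the Supertone API: the names choose_model and with_explicit_model and their parameters are hypothetical, and only the model identifiers and the sona_speech_1 default come from this page.

```python
from typing import Literal

Priority = Literal["quality", "balanced", "latency"]

# Default applied when the model field is omitted (per this page).
API_DEFAULT_MODEL = "sona_speech_1"


def choose_model(
    priority: Priority,
    needs_streaming: bool = False,
    needs_full_voice_settings: bool = False,
) -> str:
    """Map product requirements to a model ID, following the table above."""
    # Chunked streaming and the full voice-settings surface exist only on sona_speech_1.
    if needs_streaming or needs_full_voice_settings:
        return "sona_speech_1"
    if priority == "quality":
        return "sona_speech_2"
    if priority == "balanced":
        return "sona_speech_2_flash"
    if priority == "latency":
        return "supertonic_api_3"
    raise ValueError(f"unknown priority: {priority!r}")


def with_explicit_model(payload: dict, model: str) -> dict:
    """Return a copy of a request payload with the model field set explicitly,
    so the request does not fall back to the API default."""
    return {**payload, "model": model}
```

Setting the model explicitly, for example `with_explicit_model({"text": "Hello", "language": "en"}, choose_model("latency"))`, avoids relying on the sona_speech_1 default when another model was intended.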
Model summary
| Model | Positioning | Languages | Voice settings | Notable features |
|---|---|---|---|---|
| sona_speech_2 | Highest quality | 23 | All except subharmonic_amplitude_control | Phonemes, normalized text |
| sona_speech_2_flash | Balanced speed and quality | 23 | pitch_shift, pitch_variance, speed, duration | Phonemes, normalized text |
| supertonic_api_3 | Ultra-lightweight, lowest latency, improved speech stability | 31 | speed only | — |
| supertonic_api_1 | Legacy supertonic model | 5 | speed only | — |
| sona_speech_1 | Legacy flagship | 3 | All voice settings | Streaming, phonemes |
Models in detail
sona_speech_2
The most natural, highest-quality voice on the platform with broad multilingual coverage. Recommended for narration, audiobooks, character dialogue, and production-quality marketing audio: anywhere quality matters more than latency.
- Languages (23): en, ko, ja, bg, cs, da, el, es, et, fi, hu, it, nl, pl, pt, ro, ar, de, fr, hi, id, ru, vi
- Voice settings: all parameters except subharmonic_amplitude_control
- Extras: include_phonemes (timestamps for lip-sync), normalized_text (pronunciation control)
- Streaming: not supported
sona_speech_2_flash
A lightweight variant of sona_speech_2 optimized for lower latency while keeping the same multilingual coverage. Use it when you care about response time and want acceptable quality, for example in interactive agents or batch generation at scale.
- Languages (23): same as sona_speech_2
- Voice settings: pitch_shift, pitch_variance, speed, duration
- Extras: include_phonemes, normalized_text
- Streaming: not supported
supertonic_api_3
The next-generation successor to supertonic_api_1 with significantly improved speech stability. Trained differently from the open-weights Supertonic 3 release, this API variant inherits the ultra-low latency profile of supertonic_api_1 while delivering far more reliable pronunciation and fewer reading errors. It is the best default for voice agents, chatbots, and any real-time experience where time-to-first-audio is the top priority.
- Languages (31): en, ko, ja, ar, bg, cs, da, de, el, es, et, fi, fr, hi, hr, hu, id, it, lt, lv, nl, pl, pt, ro, ru, sk, sl, sv, tr, uk, vi
- Voice settings: speed only; all other settings are silently ignored
- Extras: none
- Streaming: not supported (but per-call latency is so low that streaming is usually unnecessary)
supertonic_api_1
The legacy supertonic model. Superseded by supertonic_api_3, which offers broader language coverage and dramatically better speech stability at the same latency profile. Pick supertonic_api_1 only if you have an existing integration pinned to it; new projects should use supertonic_api_3.
- Languages (5): en, ko, ja, es, pt
- Voice settings: speed only; all other settings are silently ignored
- Extras: none
- Streaming: not supported
sona_speech_1
The legacy flagship. It supports the full voice-settings surface and is the only model that currently supports chunked streaming (stream_speech). For most use cases the newer models are a better starting point; pick sona_speech_1 if you specifically need stream_speech output or the full set of fine-tuning parameters (similarity, text_guidance, subharmonic_amplitude_control).
- Languages (3): en, ko, ja
- Voice settings: all parameters
- Extras: include_phonemes
- Streaming: supported
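Because stream_speech is limited to sona_speech_1 and its three languages, a real-time client may want a fallback for other languages. The sketch below shows one possible policy, assuming the caller prefers streaming when available and otherwise accepts the low-latency supertonic_api_3. The SynthesisPlan type and plan_synthesis function are hypothetical; the model IDs and language lists are copied from this page.

```python
from dataclasses import dataclass

# Per this page: sona_speech_1 is the only model with stream_speech (3 languages).
STREAMING_MODEL = "sona_speech_1"
STREAMING_LANGUAGES = frozenset({"en", "ko", "ja"})

# Per this page: supertonic_api_3 has no streaming but the lowest latency (31 languages).
FALLBACK_MODEL = "supertonic_api_3"
FALLBACK_LANGUAGES = frozenset(
    "en ko ja ar bg cs da de el es et fi fr hi hr hu id it lt lv "
    "nl pl pt ro ru sk sl sv tr uk vi".split()
)


@dataclass(frozen=True)
class SynthesisPlan:
    model: str
    use_streaming: bool


def plan_synthesis(language: str, want_streaming: bool) -> SynthesisPlan:
    """Hypothetical policy: stream when possible, otherwise use the low-latency model."""
    if want_streaming and language in STREAMING_LANGUAGES:
        return SynthesisPlan(STREAMING_MODEL, use_streaming=True)
    if language in FALLBACK_LANGUAGES:
        return SynthesisPlan(FALLBACK_MODEL, use_streaming=False)
    raise ValueError(f"no model in this sketch supports language {language!r}")
```

With this policy, a Korean request that asks for streaming is planned on sona_speech_1, while a German request falls back to a non-streaming call on supertonic_api_3.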
Supported languages
language is required on every TTS request and must be a value supported by both the model and the chosen voice (check the voice’s language array).
| Code | Language | sona_speech_2 | sona_speech_2_flash | supertonic_api_3 | supertonic_api_1 | sona_speech_1 |
|---|---|---|---|---|---|---|
| en | English | ✅ | ✅ | ✅ | ✅ | ✅ |
| ko | Korean | ✅ | ✅ | ✅ | ✅ | ✅ |
| ja | Japanese | ✅ | ✅ | ✅ | ✅ | ✅ |
| es | Spanish | ✅ | ✅ | ✅ | ✅ | — |
| pt | Portuguese | ✅ | ✅ | ✅ | ✅ | — |
| de | German | ✅ | ✅ | ✅ | — | — |
| fr | French | ✅ | ✅ | ✅ | — | — |
| it | Italian | ✅ | ✅ | ✅ | — | — |
| nl | Dutch | ✅ | ✅ | ✅ | — | — |
| pl | Polish | ✅ | ✅ | ✅ | — | — |
| ro | Romanian | ✅ | ✅ | ✅ | — | — |
| cs | Czech | ✅ | ✅ | ✅ | — | — |
| da | Danish | ✅ | ✅ | ✅ | — | — |
| el | Greek | ✅ | ✅ | ✅ | — | — |
| et | Estonian | ✅ | ✅ | ✅ | — | — |
| fi | Finnish | ✅ | ✅ | ✅ | — | — |
| hu | Hungarian | ✅ | ✅ | ✅ | — | — |
| bg | Bulgarian | ✅ | ✅ | ✅ | — | — |
| ar | Arabic | ✅ | ✅ | ✅ | — | — |
| hi | Hindi | ✅ | ✅ | ✅ | — | — |
| id | Indonesian | ✅ | ✅ | ✅ | — | — |
| ru | Russian | ✅ | ✅ | ✅ | — | — |
| vi | Vietnamese | ✅ | ✅ | ✅ | — | — |
| hr | Croatian | — | — | ✅ | — | — |
| lt | Lithuanian | — | — | ✅ | — | — |
| lv | Latvian | — | — | ✅ | — | — |
| sk | Slovak | — | — | ✅ | — | — |
| sl | Slovenian | — | — | ✅ | — | — |
| sv | Swedish | — | — | ✅ | — | — |
| tr | Turkish | — | — | ✅ | — | — |
| uk | Ukrainian | — | — | ✅ | — | — |
For Japanese inputs with kanji, numbers, units, or symbols, see Normalized text.
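The rule that language must be supported by both the model and the chosen voice can be checked before sending a request. The following is a minimal sketch with language sets copied from the table above. The language_problems helper is hypothetical, and voice_languages is assumed to be the voice's language array as fetched by the caller.

```python
from typing import Iterable, List

# 23 languages shared by sona_speech_2 and sona_speech_2_flash (from the table above).
_SONA_2 = frozenset(
    "en ko ja bg cs da el es et fi hu it nl pl pt ro ar de fr hi id ru vi".split()
)

MODEL_LANGUAGES = {
    "sona_speech_2": _SONA_2,
    "sona_speech_2_flash": _SONA_2,
    # supertonic_api_3 adds 8 languages on top of the 23 above.
    "supertonic_api_3": _SONA_2 | frozenset("hr lt lv sk sl sv tr uk".split()),
    "supertonic_api_1": frozenset("en ko ja es pt".split()),
    "sona_speech_1": frozenset("en ko ja".split()),
}


def language_problems(
    model: str, language: str, voice_languages: Iterable[str]
) -> List[str]:
    """Return human-readable problems; an empty list means the combination is valid."""
    problems = []
    supported = MODEL_LANGUAGES.get(model)
    if supported is None:
        problems.append(f"unknown model {model!r}")
    elif language not in supported:
        problems.append(f"{model} does not support language {language!r}")
    if language not in set(voice_languages):
        problems.append(f"the chosen voice does not support language {language!r}")
    return problems
```

For example, `language_problems("sona_speech_2", "sv", ["sv", "en"])` reports a model mismatch, because Swedish is available only on supertonic_api_3.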
Feature support matrix
| Feature | sona_speech_2 | sona_speech_2_flash | supertonic_api_3 | supertonic_api_1 | sona_speech_1 |
|---|---|---|---|---|---|
| Streaming (stream_speech) | — | — | — | — | ✅ |
| include_phonemes | ✅ | ✅ | — | — | ✅ |
| normalized_text | ✅ | ✅ | — | — | — |
| pitch_shift, pitch_variance, speed, duration | ✅ | ✅ | speed only | speed only | ✅ |
| similarity, text_guidance | ✅ | — | — | — | ✅ |
| subharmonic_amplitude_control | — | — | — | — | ✅ |
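Since the supertonic models silently ignore unsupported voice settings, a client-side check can surface them before a request is sent. This sketch encodes the voice-settings rows of the matrix above. The split_voice_settings helper is hypothetical, and it covers only the seven setting names listed on this page.

```python
from typing import Dict, List, Tuple

_BASIC = frozenset({"pitch_shift", "pitch_variance", "speed", "duration"})
_FINE = frozenset({"similarity", "text_guidance"})
_SUBHARMONIC = frozenset({"subharmonic_amplitude_control"})

# Voice settings each model accepts, transcribed from the matrix above.
SUPPORTED_SETTINGS = {
    "sona_speech_2": _BASIC | _FINE,
    "sona_speech_2_flash": _BASIC,
    "supertonic_api_3": frozenset({"speed"}),
    "supertonic_api_1": frozenset({"speed"}),
    "sona_speech_1": _BASIC | _FINE | _SUBHARMONIC,
}


def split_voice_settings(
    model: str, settings: Dict[str, float]
) -> Tuple[Dict[str, float], List[str]]:
    """Split settings into those the model supports and the names it does not."""
    supported = SUPPORTED_SETTINGS[model]  # raises KeyError for an unknown model
    kept = {name: value for name, value in settings.items() if name in supported}
    unsupported = sorted(name for name in settings if name not in supported)
    return kept, unsupported
```

A caller could log or reject the unsupported names rather than letting them be dropped without notice, for example when switching an existing sona_speech_1 integration to supertonic_api_3.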
Related
Voice settings
Reference for every voice-setting parameter and its supported models.
Voices
Find a voice ID that matches your language and style requirements.
On-device TTS
Looking to run TTS locally on CPU, with no API call and no network round-trip? Supertone also publishes an open-weights model in the same Supertonic 3 family: Supertonic 3 (99M parameters, ONNX Runtime, OpenRAIL-M license).
Supertonic 3: On-device TTS
99M-parameter open-weights TTS that runs locally on CPU via ONNX Runtime: 31 languages, no GPU, no cloud, no API. A separate model from supertonic_api_3; visit the project site for weights, samples, and SDKs (Python, Node.js, Web, iOS, Android, C++).