voice_settings is an optional object on every TTS request that tunes how the audio is delivered — pitch, intonation, speed, and a few advanced parameters for the flagship models.
Quick reference
| Setting | Range | Default | What it does |
|---|---|---|---|
| pitch_shift | -24 → 24 | 0 | Semitone shift. ±12 is one full octave. |
| pitch_variance | 0 → 2 | 1 | How much the pitch varies: lower is flatter, higher is more animated. |
| speed | 0.5 → 2 | 1 | Playback rate multiplier. Applied after duration. |
| duration | 0 → 60 | 0 | Forces the generated audio to a target length in seconds (0 = no target). |
| similarity | 1 → 5 | 3 | How closely the output matches the original character voice. |
| text_guidance | 0 → 4 | 1 | How sensitively delivery adapts to the text content. |
| subharmonic_amplitude_control | 0 → 2 | 1 | Amount of subharmonic amplitude in the generated speech. |
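The ranges in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of any official SDK: `VOICE_SETTING_RANGES` and `validate_voice_settings` are hypothetical names, and the numbers are copied from the table above. How the API itself responds to out-of-range values is not stated on this page.

```python
# Hypothetical client-side check; ranges and defaults copied from the quick-reference table.
# Each entry is (minimum, maximum, default).
VOICE_SETTING_RANGES = {
    "pitch_shift": (-24, 24, 0),
    "pitch_variance": (0, 2, 1),
    "speed": (0.5, 2, 1),
    "duration": (0, 60, 0),
    "similarity": (1, 5, 3),
    "text_guidance": (0, 4, 1),
    "subharmonic_amplitude_control": (0, 2, 1),
}


def validate_voice_settings(settings: dict) -> dict:
    """Return a copy of settings; raise ValueError on unknown keys or out-of-range values."""
    checked = {}
    for name, value in settings.items():
        if name not in VOICE_SETTING_RANGES:
            raise ValueError(f"unknown voice setting: {name}")
        low, high, _default = VOICE_SETTING_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside the range {low} to {high}")
        checked[name] = value
    return checked


print(validate_voice_settings({"pitch_shift": -2, "speed": 1.2}))
```

Raising on bad input, rather than clamping, keeps a typo such as `pitch_shift=30` from being quietly turned into a different value.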
Setting voice parameters
Support by model
Not every model honors every setting. Unsupported settings are silently ignored, so a subharmonic_amplitude_control value on supertonic_api_3 won’t error; it just won’t change the output.
| Setting | sona_speech_2 | sona_speech_2_flash | supertonic_api_3 | supertonic_api_1 | sona_speech_1 |
|---|---|---|---|---|---|
| pitch_shift | ✅ | ✅ | — | — | ✅ |
| pitch_variance | ✅ | ✅ | — | — | ✅ |
| speed | ✅ | ✅ | ✅ | ✅ | ✅ |
| duration | ✅ | ✅ | — | — | ✅ |
| similarity | ✅ | — | — | — | ✅ |
| text_guidance | ✅ | — | — | — | ✅ |
| subharmonic_amplitude_control | — | — | — | — | ✅ |
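Because unsupported settings are ignored without an error, it can help to find out locally which ones a model will drop. The sketch below transcribes the support table into a lookup; `SUPPORTED_SETTINGS` and `split_by_support` are hypothetical names, and the table would need updating by hand if model support changes.

```python
# Transcribed from the support table above; not fetched from the API.
_PITCH_SPEED_DURATION = {"pitch_shift", "pitch_variance", "speed", "duration"}

SUPPORTED_SETTINGS = {
    "sona_speech_2": _PITCH_SPEED_DURATION | {"similarity", "text_guidance"},
    "sona_speech_2_flash": set(_PITCH_SPEED_DURATION),
    "supertonic_api_3": {"speed"},
    "supertonic_api_1": {"speed"},
    "sona_speech_1": _PITCH_SPEED_DURATION
    | {"similarity", "text_guidance", "subharmonic_amplitude_control"},
}


def split_by_support(model: str, settings: dict) -> tuple[dict, list[str]]:
    """Split settings into those the model honors and the names it would silently ignore."""
    supported = SUPPORTED_SETTINGS[model]
    honored = {name: value for name, value in settings.items() if name in supported}
    ignored = sorted(name for name in settings if name not in supported)
    return honored, ignored


honored, ignored = split_by_support(
    "supertonic_api_3",
    {"speed": 1.1, "subharmonic_amplitude_control": 1.5},
)
print("honored:", honored)
print("ignored:", ignored)
```

Logging the ignored names during development makes it easier to notice when a setting has no effect on the chosen model.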
How parameters interact
- pitch_shift is in semitones. +12 raises the voice by a full octave. Use small values (±1 to ±4) for natural-sounding adjustments; large values start to sound robotic.
- pitch_variance controls expressiveness. Set to 0 for monotone (good for instructional, news-reading style), or up to 2 for very expressive delivery.
- duration then speed. If both are set, the engine first targets duration seconds, then speed is applied as a multiplier. Setting duration=5 with speed=2 produces roughly 10 seconds of audio.
- similarity and text_guidance are most useful on cloned voices and sona_speech_2 / sona_speech_1. Higher similarity adheres more strictly to the source voice. Higher text_guidance lets delivery shift to match the emotional tone of the text.
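To get a feel for what a semitone value means, the standard equal-temperament relationship maps a shift of n semitones to a frequency ratio of 2^(n/12). The sketch below applies that general formula; it is an illustration of the unit, not a description of how the engine implements the shift, and `semitones_to_ratio` is a hypothetical helper.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a shift of the given number of semitones (equal temperament)."""
    return 2.0 ** (semitones / 12.0)


# Small shifts change the pitch by a few percent; a full octave doubles or halves it.
for shift in (-12, -4, -1, 0, 1, 4, 12):
    print(f"{shift:+d} semitones -> frequency x {semitones_to_ratio(shift):.3f}")
```

A shift of ±1 to ±4 semitones keeps the ratio within roughly 0.79 to 1.26, which is consistent with the advice to stay in that band for natural-sounding results.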
Recipes
Calm, slow narration:
Related
Models
See which model supports which voice settings.
API reference
Full request and response schema.