Converts text into speech using a voice of your choice, with configurable voice settings.
| Name | Required | Description |
|---|---|---|
| voice_id | Yes | The ID of the target voice. |

Supported languages by model:

- sona_speech_1: en, ko, ja
- supertonic_api_1: en, ko, ja, es, pt
- sona_speech_2: en, ko, ja, bg, cs, da, el, es, et, fi, hu, it, nl, pl, pt, ro, ar, de, fr, hi, id, ru, vi

| Name | Required | Description |
|---|---|---|
| text | Yes | The text to convert (maximum 300 characters). |
| language | Yes | Language code of the text. Supported: en, ko, ja, bg, cs, da, el, es, et, fi, hu, it, nl, pl, pt, ro, ar, de, fr, hi, id, ru, vi. The languages actually available depend on the selected model. |
| style | No | Emotional style, for example neutral, happy, or sad. If not specified, the character's default style is applied. |
| model | No | TTS model: sona_speech_1, sona_speech_2, or supertonic_api_1. Default: sona_speech_1. |
| output_format | No | Output audio format: wav or mp3. Default: wav. |
| voice_settings | No | Advanced voice parameters (see below). |
| include_phonemes | No | If true, the response is JSON containing the Base64-encoded audio together with phoneme timing data. Default: false. |
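
The request body above can be assembled and checked on the client before it is sent. The following is a minimal sketch, assuming the body is sent as JSON using the field names from the table and that "characters" means Unicode code points as counted by Python's `len`; `build_tts_body` and `MODEL_LANGUAGES` are hypothetical helpers, not part of the API, and the per-model language lists are copied from this page.

```python
import json

# Per-model language support, copied from the documentation.
MODEL_LANGUAGES = {
    "sona_speech_1": {"en", "ko", "ja"},
    "supertonic_api_1": {"en", "ko", "ja", "es", "pt"},
    "sona_speech_2": {
        "en", "ko", "ja", "bg", "cs", "da", "el", "es", "et", "fi", "hu", "it",
        "nl", "pl", "pt", "ro", "ar", "de", "fr", "hi", "id", "ru", "vi",
    },
}

MAX_TEXT_LENGTH = 300


def build_tts_body(text, language, model="sona_speech_1", style=None,
                   output_format="wav", voice_settings=None,
                   include_phonemes=False):
    """Hypothetical helper: validate inputs and return a JSON request body."""
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text is {len(text)} characters; the limit is {MAX_TEXT_LENGTH}")
    if model not in MODEL_LANGUAGES:
        raise ValueError(f"unknown model: {model}")
    if language not in MODEL_LANGUAGES[model]:
        raise ValueError(f"{model} does not support language {language!r}")
    if output_format not in ("wav", "mp3"):
        raise ValueError("output_format must be 'wav' or 'mp3'")

    body = {
        "text": text,
        "language": language,
        "model": model,
        "output_format": output_format,
        "include_phonemes": include_phonemes,
    }
    # Optional fields are left out so the server applies its own defaults
    # (for example, the character's default style).
    if style is not None:
        body["style"] = style
    if voice_settings:
        body["voice_settings"] = voice_settings
    return json.dumps(body)
```

For example, `build_tts_body("Hola", "es")` raises `ValueError` because the default model, sona_speech_1, does not list Spanish, while the same call with `model="supertonic_api_1"` succeeds.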

Voice settings support by model:

- sona_speech_1: supports all voice settings listed below.
- supertonic_api_1: supports only speed; all other settings are ignored.
- sona_speech_2: supports only pitch_shift, pitch_variance, and speed.

| Name | Range | Default | Description |
|---|---|---|---|
| pitch_shift | -24 → 24 | 0 | Pitch adjustment in semitones. |
| pitch_variance | 0 → 2 | 1 | Degree of pitch variation. |
| speed | 0.5 → 2 | 1 | Speed ratio applied uniformly to the generated audio; values above 1 are faster, values below 1 are slower. |
| duration | 0 → 60 | 0 | Target length in seconds. When provided, speech is generated to match this duration. |
| similarity | 1 → 5 | 3 | Controls how closely the generated speech matches the original character voice. |
| text_guidance | 0 → 4 | 1 | Controls how sensitively speech characteristics adapt to the input text content. |
| subharmonic_amplitude_control | 0 → 2 | 1 | Controls the subharmonic amplitude of the generated speech. |
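
Because each model honors a different subset of these settings, a client may want to drop the keys a model ignores and reject out-of-range values before sending. A minimal sketch, assuming the ranges in the table are inclusive; `prepare_voice_settings` and the two lookup tables are hypothetical names, populated from this page.

```python
# Inclusive (min, max) ranges from the voice settings table.
VOICE_SETTING_RANGES = {
    "pitch_shift": (-24, 24),
    "pitch_variance": (0, 2),
    "speed": (0.5, 2),
    "duration": (0, 60),
    "similarity": (1, 5),
    "text_guidance": (0, 4),
    "subharmonic_amplitude_control": (0, 2),
}

# Settings each model applies; anything else is ignored by that model.
MODEL_SUPPORTED_SETTINGS = {
    "sona_speech_1": set(VOICE_SETTING_RANGES),
    "supertonic_api_1": {"speed"},
    "sona_speech_2": {"pitch_shift", "pitch_variance", "speed"},
}


def prepare_voice_settings(model, settings):
    """Hypothetical helper: keep only settings the model supports and check ranges."""
    supported = MODEL_SUPPORTED_SETTINGS[model]
    result = {}
    for name, value in settings.items():
        if name not in VOICE_SETTING_RANGES:
            raise ValueError(f"unknown voice setting: {name}")
        low, high = VOICE_SETTING_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}..{high}")
        # Silently drop settings this model would ignore anyway.
        if name in supported:
            result[name] = value
    return result
```

For instance, `prepare_voice_settings("sona_speech_2", {"speed": 1.2, "similarity": 4})` returns `{"speed": 1.2}`, since sona_speech_2 does not use similarity. Dropping ignored keys keeps the request explicit about which settings will actually take effect.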

Response: the format depends on include_phonemes.

- include_phonemes false or omitted: a binary audio file in the requested output_format.
- include_phonemes true: JSON containing the Base64-encoded audio together with phoneme timing data.

Notes:

- An error is returned when the text length exceeds 300 characters.
- speed is applied after duration. For example, duration=5 seconds with speed=2 produces final audio of about 2.5 seconds.
- If style is not specified, the character's default style is applied. Default styles vary by character, so call the Get Voices API to check them (the first value in a voice's styles array is its default).
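
Since the response type depends on include_phonemes, a client has to branch on it. A minimal sketch, assuming the server labels JSON responses with an `application/json` Content-Type; the JSON field holding the Base64 audio is not named on this page, so `audio_key` is a hypothetical parameter to be set to the real field name, and `parse_tts_response` is likewise hypothetical.

```python
import base64
import json


def parse_tts_response(content_type, body, audio_key="audio"):
    """Hypothetical helper: return (audio_bytes, phoneme_data_or_None).

    content_type: value of the response Content-Type header.
    body: raw response bytes.
    audio_key: assumed name of the JSON field holding Base64 audio.
    """
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        # include_phonemes=true: decode the audio, keep the rest as phoneme data.
        payload = json.loads(body)
        audio = base64.b64decode(payload.pop(audio_key))
        return audio, payload
    # include_phonemes=false or omitted: the body is the audio file itself.
    return body, None
```

The remaining JSON fields are returned untouched rather than parsed further, because their exact structure is not specified here.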