Endpoint
Path Parameters
Name | Required | Description |
---|---|---|
voice_id | Yes | The ID of the target voice. |
Request Body
Content-Type: application/json
Name | Required | Description |
---|---|---|
text | Yes | The text to convert (max 300 characters). |
language | Yes | Language code. Supported: en, ko, ja. |
style | No | Emotional style, e.g. neutral, happy, sad. If not specified, the character’s default style is applied. |
model | No | TTS model. Default: sona_speech_1. |
output_format | No | Output format. Options: wav, mp3. Default: wav. |
voice_settings | No | Advanced voice parameters (see below). |
include_phonemes | No | If true, returns phoneme timing data along with the audio (Base64-encoded). Default: false. |
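As a rough illustration of the request body table above, the following sketch assembles and checks a body on the client before sending it. The helper name `build_tts_body` is hypothetical, and rejecting empty text and counting characters with `len()` are assumptions about how the 300-character limit is enforced; no endpoint URL or authentication header is shown because this section does not state them.

```python
import json

# Values taken from the request body table.
SUPPORTED_LANGUAGES = {"en", "ko", "ja"}
SUPPORTED_FORMATS = {"wav", "mp3"}
MAX_TEXT_LENGTH = 300


def build_tts_body(text, language, style=None, output_format="wav",
                   include_phonemes=False, voice_settings=None):
    """Build the JSON request body (hypothetical client-side helper)."""
    # Assumption: empty text is rejected, and len() matches the server's count.
    if not text or len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text must be 1-{MAX_TEXT_LENGTH} characters")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    if output_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")

    body = {
        "text": text,
        "language": language,
        "output_format": output_format,
        "include_phonemes": include_phonemes,
    }
    # Optional fields are omitted so that the documented defaults apply.
    if style is not None:
        body["style"] = style
    if voice_settings:
        body["voice_settings"] = voice_settings
    return body


body = build_tts_body("Hello, world!", "en", style="happy", output_format="mp3")
print(json.dumps(body, indent=2))
```

Leaving `style` and `model` out of the body, rather than sending empty values, lets the character's default style and the default model apply.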
Voice Settings (optional)
Name | Range | Default | Description |
---|---|---|---|
pitch_shift | -24 → 24 | 0 | Pitch adjustment in semitones. |
pitch_variance | 0 → 2 | 1 | Degree of pitch variation. |
speed | 0.5 → 2 | 1 | Speed ratio; makes the generated audio uniformly faster or slower. |
duration | 0 → 60 | 0 | When provided, speech is generated to match the given duration, in seconds. |
similarity | 1 → 5 | 3 | Controls how closely the generated speech matches the original character voice. |
text_guidance | 0 → 4 | 1 | Controls how sensitively speech characteristics adapt to the input text content. |
subharmonic_amplitude_control | 0 → 2 | 1 | Controls the amount of subharmonic amplitude of the generated speech. |
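The ranges in the Voice Settings table can be checked before a request is made. This is a minimal sketch, assuming the range bounds are inclusive (the table does not say); `validate_voice_settings` is a hypothetical helper name, not part of the API.

```python
# (min, max, default) per field, copied from the Voice Settings table.
VOICE_SETTING_RANGES = {
    "pitch_shift": (-24, 24, 0),
    "pitch_variance": (0, 2, 1),
    "speed": (0.5, 2, 1),
    "duration": (0, 60, 0),
    "similarity": (1, 5, 3),
    "text_guidance": (0, 4, 1),
    "subharmonic_amplitude_control": (0, 2, 1),
}


def validate_voice_settings(settings):
    """Return a checked copy of settings; raise ValueError on bad input."""
    checked = {}
    for name, value in settings.items():
        if name not in VOICE_SETTING_RANGES:
            raise ValueError(f"unknown voice setting: {name!r}")
        low, high, _default = VOICE_SETTING_RANGES[name]
        # Assumption: both ends of each range are allowed values.
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}..{high}")
        checked[name] = value
    return checked


settings = validate_voice_settings({"pitch_shift": -2, "speed": 1.25})
print(settings)  # {'pitch_shift': -2, 'speed': 1.25}
```

Only the fields that differ from their defaults need to be sent in `voice_settings`.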
Response
Depending on include_phonemes, returns:
Binary Audio (default, and when include_phonemes=false)
- audio/wav – Raw WAV file.
- audio/mpeg – Raw MP3 file.
JSON with Phoneme Data (when include_phonemes=true)
Headers:
- X-Audio-Length (number) – Duration of the audio in seconds.
Notes
- A 400 error will occur if the text length exceeds 300 characters.
- speed is applied after duration. (Example: duration=5 seconds, speed=2 times → final audio ≈ 10 seconds)
- Calls are possible even without style, but default styles may vary by character, so call the Get Voices API to check the default style (the first value in the styles array is the default).
- The audio file in the response can be saved or played directly (appropriate handling is required depending on the client).
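The note about default styles can be sketched as follows. The shape of the voice record is an assumption: only the `styles` array, with its first value as the default, comes from the note above; the `voice_id` value and the helper name `resolve_style` are hypothetical, and whether the server rejects a style missing from the array is not stated here.

```python
def resolve_style(voice, requested=None):
    """Pick the style to send: the requested one, else the voice's default."""
    styles = voice.get("styles") or []
    if requested is not None:
        # Assumption: a style not listed for the voice should not be sent.
        if styles and requested not in styles:
            raise ValueError(f"style {requested!r} not offered by this voice")
        return requested
    # Per the note above, the first entry in the styles array is the default.
    return styles[0] if styles else None


# Hypothetical record, shaped like one entry returned by the Get Voices API.
voice = {"voice_id": "example-voice-id", "styles": ["neutral", "happy", "sad"]}
print(resolve_style(voice))           # neutral
print(resolve_style(voice, "happy"))  # happy
```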
Body
application/json
Name | Description |
---|---|
text | The text to convert to speech. Maximum length: 300. |
language | The language code of the text. Available options: en, ko, ja. |
style | The style of character to use for the text-to-speech conversion. |
model | The model type to use for the text-to-speech conversion. |
output_format | The desired output format of the audio file. Available options: wav, mp3. Default is wav. |
include_phonemes | Return phoneme timing data with the audio. |
Response
Returns either binary audio or JSON with phoneme data, depending on the include_phonemes parameter.
Binary audio file (when include_phonemes=false or omitted).
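A client has to branch on the response type. The sketch below shows one way to do that, under several assumptions that this page does not confirm: that the JSON variant is served as `application/json`, and that its audio and phoneme fields are named `audio_base64` and `phonemes` (placeholder key names). The function `decode_tts_response` is hypothetical and works on values any HTTP client would already have.

```python
import base64
import json

# Content types listed in the Response section, mapped to file extensions.
EXTENSIONS = {"audio/wav": ".wav", "audio/mpeg": ".mp3"}


def decode_tts_response(content_type, headers, payload):
    """Return (audio_bytes, extension, audio_length_seconds, phonemes)."""
    media_type = content_type.split(";")[0].strip().lower()
    # Plain-dict lookup; many HTTP clients expose case-insensitive headers.
    raw_length = headers.get("X-Audio-Length")
    length = float(raw_length) if raw_length is not None else None

    if media_type in EXTENSIONS:
        # Binary audio: the body can be written to disk as-is.
        return payload, EXTENSIONS[media_type], length, None

    if media_type == "application/json":
        data = json.loads(payload)
        # ASSUMPTION: "audio_base64" and "phonemes" are placeholder key names.
        audio = base64.b64decode(data["audio_base64"])
        return audio, None, length, data.get("phonemes")

    raise ValueError(f"unexpected content type: {content_type!r}")


audio, ext, seconds, phonemes = decode_tts_response(
    "audio/wav", {"X-Audio-Length": "1.5"}, b"RIFF"
)
print(ext, seconds)  # .wav 1.5
```

In the JSON case the extension is returned as `None`, since the audio format then follows from the `output_format` that was requested rather than from the content type.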