It’s useful for understanding expected credit consumption or adjusting text length before making TTS calls.
Endpoint
Notes
- The calling method and request body are almost identical to those of the text-to-speech API.
- However, only the `duration` value is returned, not audio.
- No credits are consumed when calling the Predict Duration API, because no speech is generated.
- The predicted duration is very close to that of the audio actually generated from the same text.
- Because `voice_settings.speed` changes the audio length, it is better to test with a fixed speech speed.
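As a rough illustration of adjusting text length before a TTS call, the sketch below adds sentences one at a time while the predicted duration stays within a time budget. The `predict` callable is hypothetical: it stands in for whatever wrapper you write around the Predict Duration API, and it is assumed to always use the same speech speed.

```python
from typing import Callable, List

MAX_TEXT_LENGTH = 300  # documented per-request text limit


def fit_text_to_duration(
    sentences: List[str],
    max_seconds: float,
    predict: Callable[[str], float],
) -> str:
    """Keep adding sentences while the predicted duration stays within budget.

    `predict` is a hypothetical callable that returns the predicted
    duration in seconds for a given text.
    """
    kept: List[str] = []
    for sentence in sentences:
        candidate = " ".join(kept + [sentence])
        if len(candidate) > MAX_TEXT_LENGTH:
            break  # the API would reject longer text
        if predict(candidate) > max_seconds:
            break  # adding this sentence would exceed the budget
        kept.append(sentence)
    return " ".join(kept)
```

Each candidate costs one prediction call, which is acceptable here because prediction consumes no credits.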
Request Body
Item | Required | Description |
---|---|---|
text | Yes | Text to analyze. Maximum 300 characters |
language | Yes | Text language. One of ko, en, ja |
style | No | Emotional style. Default style is used if not specified |
model | No | Default is sona_speech_1. Currently only this model is available |
voice_settings | No | Speech speed or pitch adjustment values. May affect result length |
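The constraints in the table can be checked on the client before sending anything. The following is a minimal sketch, not an official client: the function name `build_body` is hypothetical, and only the field names, the 300-character limit, and the language codes come from the table.

```python
from typing import Any, Dict, Optional

ALLOWED_LANGUAGES = {"ko", "en", "ja"}
MAX_TEXT_LENGTH = 300


def build_body(
    text: str,
    language: str,
    style: Optional[str] = None,
    model: Optional[str] = None,
    voice_settings: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Validate the required fields and build the JSON request body."""
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text exceeds {MAX_TEXT_LENGTH} characters")
    if language not in ALLOWED_LANGUAGES:
        raise ValueError(f"language must be one of {sorted(ALLOWED_LANGUAGES)}")

    body: Dict[str, Any] = {"text": text, "language": language}
    # Optional fields are sent only when set, so server defaults apply otherwise.
    if style is not None:
        body["style"] = style
    if model is not None:
        body["model"] = model
    if voice_settings is not None:
        body["voice_settings"] = voice_settings
    return body
```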
Request Example
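A request might be assembled roughly as follows. This is a sketch under stated assumptions: the URL, the `VOICE_ID` path segment, and the `x-api-key` header name are placeholders, since this page does not spell out the endpoint or the authorization scheme. Only the JSON body fields and the `application/json` content type are taken from this page.

```python
import json
import urllib.request


def make_request(
    url: str,
    api_key: str,
    body: dict,
    auth_header: str = "x-api-key",  # placeholder header name (assumption)
) -> urllib.request.Request:
    """Build a POST request carrying the JSON body; does not send it."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", auth_header: api_key},
        method="POST",
    )


request = make_request(
    "https://api.example.com/predict-duration/VOICE_ID",  # placeholder URL
    "YOUR_API_KEY",
    {"text": "Hello, world!", "language": "en"},
)
# To send it: urllib.request.urlopen(request)
```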
Response Example
Authorizations
Path Parameters
Body
application/json
The text to convert to speech. Max length is 300 characters.
Maximum length: 300
Language code of the voice
Available options: en, ko, ja
The style of character to use for the text-to-speech conversion
The model type to use for the text-to-speech conversion
The desired output format of the audio file (wav, mp3). Default is wav.
Available options: wav, mp3
Response
Returns predicted duration of the audio in seconds
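Reading the result could look like the sketch below. It assumes the response is a JSON object with a numeric `duration` field in seconds. The exact response shape is not shown on this page, so treat the key lookup as an assumption, and note that `parse_duration` is a hypothetical helper name.

```python
import json


def parse_duration(response_body: str) -> float:
    """Extract the predicted duration (seconds) from a JSON response body."""
    payload = json.loads(response_body)
    duration = payload["duration"]  # assumed field name, per the notes above
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(duration, bool) or not isinstance(duration, (int, float)):
        raise ValueError("duration is not a number")
    if duration < 0:
        raise ValueError("duration must not be negative")
    return float(duration)
```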