Predict text-to-speech duration
Estimate the length of generated speech for a given text without producing audio or consuming credits.
Method: POST
Returns the expected length of the speech that would be generated from the given input. Useful for cost forecasting, UI hints, and pre-flight checks on batch jobs.
Duration is returned in seconds as a float.
This endpoint does not consume credits. The same 300-character text limit as Create speech applies, and longer text is not automatically chunked.
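Because text over 300 characters is not chunked for you, longer scripts have to be split before prediction. Below is a minimal Python sketch. It assumes that splitting on spaces is acceptable for your content and that Python's character count (code points) matches how the API counts characters. The helper name `split_for_prediction` is hypothetical.

```python
def split_for_prediction(text: str, limit: int = 300) -> list[str]:
    """Split text into pieces of at most `limit` characters, preferring space boundaries."""
    chunks: list[str] = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Last space inside the allowed window; a chunk may end right before it.
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            # No usable space (e.g. one very long word): hard-split at the limit.
            cut = limit
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Predict each chunk separately and add the results to get a total estimate. If the same chunks will later be sent to Create speech, splitting on sentence boundaries instead is a reasonable refinement.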
Endpoint
Path parameters
| Name | Required | Description |
|---|---|---|
| voice_id | ✅ | The ID of the target voice. |
Request body
Same shape as the Create speech body (text, language, style, model, voice_settings), minus output_format, include_phonemes, and normalized_text, none of which affect duration.
| Name | Required | Description |
|---|---|---|
| text | ✅ | Text to analyze. Max 300 characters. |
| language | ✅ | Language code. Must be supported by both the voice and the model. |
| style | — | Emotional style. Defaults to the voice’s first style. |
| model | — | TTS model. Defaults to sona_speech_1. |
| voice_settings | — | Affects duration through its speed and duration fields. See Create speech for the full table. |
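The body mirrors Create speech minus three fields. That makes it possible to derive the prediction body from the Create speech body you already plan to send, which keeps the two in sync. Here is a sketch under that assumption. `predict_body_from_create_body` is a hypothetical helper, and it only checks the rules stated in the tables on this page.

```python
import copy
from typing import Any

MAX_TEXT_CHARS = 300
# Create speech fields that, per this page, do not affect duration.
GENERATION_ONLY_FIELDS = ("output_format", "include_phonemes", "normalized_text")


def predict_body_from_create_body(create_body: dict[str, Any]) -> dict[str, Any]:
    """Return a predict-duration body derived from a Create speech body."""
    for required in ("text", "language"):
        if not create_body.get(required):
            raise ValueError(f"'{required}' is required")
    text = create_body["text"]
    if len(text) > MAX_TEXT_CHARS:
        # The endpoint does not auto-chunk, so fail early instead of sending it.
        raise ValueError(f"text has {len(text)} characters; the limit is {MAX_TEXT_CHARS}")
    # Deep copy so edits to nested voice_settings don't leak back into create_body.
    body = copy.deepcopy(create_body)
    for name in GENERATION_ONLY_FIELDS:
        body.pop(name, None)
    return body
```

For example, a Create speech body that sets `output_format` to `"mp3"` yields a prediction body that is identical except that `output_format` is gone.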
Request example
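The following is a minimal Python sketch that uses the standard library's `urllib` to build the POST request for a given `voice_id` and JSON body. This page does not spell out the endpoint path or the authentication scheme, so the sketch takes both the URL template and the auth headers as parameters. Supply them from your own configuration. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any


def build_predict_request(
    endpoint_template: str,
    voice_id: str,
    body: dict[str, Any],
    auth_headers: dict[str, str],
) -> urllib.request.Request:
    """Build (but do not send) a POST request; the template must contain `{voice_id}`."""
    # Percent-encode the ID so characters such as '/' or spaces cannot break the path.
    url = endpoint_template.format(voice_id=urllib.parse.quote(voice_id, safe=""))
    data = json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json", **auth_headers}
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def send_predict_request(request: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

The builder makes no network calls, so you can test it in isolation. Only `send_predict_request` needs a real endpoint and credentials.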
Response
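This page states only that the response carries the predicted duration in seconds, as a float. The sketch below therefore assumes the body is either a bare JSON number or a JSON object that holds the number under a `duration` key. That key name is a guess, so it is exposed as a parameter. The sketch also shows one way to turn the value into a UI hint. Both function names are hypothetical.

```python
import json
import math
from typing import Union


def parse_predicted_duration(raw: Union[bytes, str], field: str = "duration") -> float:
    """Return the predicted duration in seconds from a response body.

    ASSUMPTION: the payload is a bare JSON number or an object with the
    number under `field`; adjust `field` to the real response schema.
    """
    payload = json.loads(raw)
    value = payload.get(field) if isinstance(payload, dict) else payload
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"no numeric duration found in response: {payload!r}")
    seconds = float(value)
    if not math.isfinite(seconds) or seconds < 0:
        raise ValueError(f"implausible duration: {seconds}")
    return seconds


def format_for_ui(seconds: float) -> str:
    """Render seconds as m:ss for a UI hint, rounding to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"
```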
Notes
- Use the same `model` and `speed` in your prediction as in your eventual `create_speech` call, because both affect the result. Predicting at one speed and generating at another produces mismatched durations.
- No credits are deducted, so it is safe to use as a UI hint or budget pre-flight.
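The first note can be checked in code before you spend credits on generation. Below is a minimal sketch that compares a prediction body against the Create speech body you intend to send. It assumes the inputs that matter for duration are the body fields listed on this page (`text`, `language`, `style`, `model`) plus `voice_settings.speed` and `voice_settings.duration`. The helper name is hypothetical.

```python
from typing import Any

# Top-level fields compared literally (assumed to influence duration).
TOP_LEVEL_FIELDS = ("text", "language", "style", "model")
# voice_settings keys that this page says affect duration.
VOICE_SETTING_FIELDS = ("speed", "duration")


def duration_mismatches(predict_body: dict[str, Any], create_body: dict[str, Any]) -> list[str]:
    """List the duration-relevant fields that differ between the two bodies."""
    mismatches = [
        name for name in TOP_LEVEL_FIELDS
        if predict_body.get(name) != create_body.get(name)
    ]
    predict_vs = predict_body.get("voice_settings") or {}
    create_vs = create_body.get("voice_settings") or {}
    for name in VOICE_SETTING_FIELDS:
        if predict_vs.get(name) != create_vs.get(name):
            mismatches.append(f"voice_settings.{name}")
    return mismatches
```

An empty list suggests the estimate should carry over to the generated audio. The comparison is literal, so omitting `model` on one side and passing the default `sona_speech_1` on the other is reported as a mismatch. This errs on the side of caution.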
See also
- Cost and usage (docs): How to use predict_duration for forecasting and budgeting.
- Create speech: Generate the audio once you’ve validated the estimate.
Body (application/json)
- text: The text to convert to speech. Maximum length: 300 characters.
- language: Language code of the voice. Available options: en, ko, ja, bg, cs, da, el, es, et, fi, hu, it, nl, pl, pt, ro, ar, de, fr, hi, id, ru, vi, hr, lt, lv, sk, sl, sv, tr, uk.
- style: The style of character to use for the text-to-speech conversion.
- model: The model type to use for the text-to-speech conversion. Available options: sona_speech_1, sona_speech_2, sona_speech_2_flash, supertonic_api_1, supertonic_api_3.
- output_format: The desired output format of the audio file. Available options: wav, mp3. Default: wav. Does not affect the predicted duration.
Response
Returns the predicted duration of the audio in seconds.