POST /v1/predict-duration/{voice_id}
Predict text-to-speech duration
curl --request POST \
  --url https://supertoneapi.com/v1/predict-duration/{voice_id} \
  --header 'Content-Type: application/json' \
  --header 'x-sup-api-key: <api-key>' \
  --data '
{
  "text": "<string>",
  "language": "<string>",
  "style": "<string>",
  "model": "sona_speech_1",
  "output_format": "wav",
  "voice_settings": {
    "pitch_shift": 0,
    "pitch_variance": 1,
    "speed": 1,
    "duration": 0,
    "similarity": 3,
    "text_guidance": 1,
    "subharmonic_amplitude_control": 1
  }
}
'
{
  "duration": 123
}
Returns the expected length (in seconds) of speech generated from the given input. Useful for cost forecasting, UI hints, and pre-flighting batch jobs.
This endpoint does not consume credits. The same 300-character text limit applies; longer text is not auto-chunked.
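
Because the endpoint does not split long input itself, text over 300 characters has to be chunked by the caller before prediction. The following is a minimal sketch, assuming whitespace is an acceptable place to break; `split_text` and `MAX_CHARS` are hypothetical names, not part of the API.

```python
MAX_CHARS = 300  # per-request text limit stated in this reference


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, breaking on whitespace.

    Hypothetical client-side helper; a single word longer than `limit` is hard-cut.
    """
    chunks: list[str] = []
    current = ""
    for word in text.split():
        # Hard-cut words that cannot fit in one chunk on their own.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this word would overflow, so close the current chunk.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request. Whether the sum of per-chunk predictions matches the audio produced from the same chunks is not stated in this reference, so treat the total as an estimate.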

Endpoint

POST https://supertoneapi.com/v1/predict-duration/{voice_id}

Path parameters

Name      Required  Description
voice_id  Yes       The ID of the target voice.

Request body

Same shape as Create speech (text, language, style, model, voice_settings), minus output_format, include_phonemes, and normalized_text, none of which affect duration.
Name            Required  Description
text            Yes       Text to analyze. Max 300 characters.
language        Yes       Language code. Must be supported by the voice and the model.
style           No        Emotional style. Defaults to the voice’s first style.
model           No        TTS model. Defaults to sona_speech_1.
voice_settings  No        Affects duration through speed and duration. See Create speech for the full table.
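
Since the body mirrors Create speech minus three fields, one way to keep the two calls consistent is to derive the prediction body from the Create speech body. This is a sketch only; `to_predict_payload` and `IGNORED_FIELDS` are hypothetical names, and the example values are illustrative.

```python
# Fields this reference lists as not affecting duration.
IGNORED_FIELDS = {"output_format", "include_phonemes", "normalized_text"}


def to_predict_payload(create_speech_body: dict) -> dict:
    """Return a copy of a Create speech body without the duration-irrelevant fields."""
    return {k: v for k, v in create_speech_body.items() if k not in IGNORED_FIELDS}


# Illustrative Create speech body (values are examples, not recommendations).
create_body = {
    "text": "Hello there.",
    "language": "en",
    "model": "sona_speech_1",
    "output_format": "mp3",
    "voice_settings": {"speed": 1.2},
}
predict_body = to_predict_payload(create_body)
```

The original dictionary is left untouched, so it can still be sent to Create speech afterwards.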

Request example

POST /v1/predict-duration/20160a4c5ba38967330c84
x-sup-api-key: $SUPERTONE_API_KEY
Content-Type: application/json

{
  "text": "This is a long-form sentence for duration prediction.",
  "language": "en",
  "style": "neutral"
}

Response

{
  "duration": 3.57
}
Duration is returned in seconds as a float.
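
The request and response above could be issued from Python roughly as follows. This is a minimal sketch using only the standard library; `build_request` and `predict_duration` are hypothetical helper names, the 30-second timeout is an arbitrary choice, and error handling is omitted.

```python
import json
import urllib.request

BASE_URL = "https://supertoneapi.com/v1"


def build_request(voice_id: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build the POST request for /predict-duration/{voice_id} without sending it."""
    return urllib.request.Request(
        url=f"{BASE_URL}/predict-duration/{voice_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-sup-api-key": api_key},
        method="POST",
    )


def predict_duration(voice_id: str, payload: dict, api_key: str) -> float:
    """Send the request and return the predicted duration in seconds."""
    request = build_request(voice_id, payload, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return float(json.load(response)["duration"])


# Example call (requires a valid key and network access):
# predict_duration(
#     "20160a4c5ba38967330c84",
#     {"text": "This is a long-form sentence for duration prediction.",
#      "language": "en", "style": "neutral"},
#     api_key="...",
# )
```

Keeping request construction separate from sending makes the URL, headers, and body easy to inspect without calling the API.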

Notes

  • Use the same model and speed in your prediction as in your eventual create_speech call, because both affect the result. Predicting at one speed and generating at another produces mismatched durations.
  • No credits are deducted. Safe to use as a UI hint or budget pre-flight.
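
Both notes can be combined into a batch pre-flight: predict every item with one shared set of duration-relevant values, then compare the total against a cap. The sketch below is an assumption-laden illustration; `preflight`, `shared`, and `max_seconds` are hypothetical names, the cap is expressed in seconds because this page does not state how seconds convert to credits, and `predict` stands for any callable that sends one body to predict-duration.

```python
from typing import Callable, Iterable


def preflight(
    texts: Iterable[str],
    shared: dict,
    predict: Callable[[dict], float],
    max_seconds: float,
) -> tuple[float, bool]:
    """Sum predicted durations for a batch and check them against a cap in seconds.

    `shared` carries language, model, and voice_settings. Reuse the same dict
    when building the eventual Create speech bodies so both calls see the
    same model and speed.
    """
    total = sum(predict({**shared, "text": text}) for text in texts)
    return total, total <= max_seconds
```

Passing the prediction function in as an argument keeps the helper independent of any particular HTTP client and makes it easy to exercise with a stub.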

See also

Docs: Cost and usage

How to use predict_duration for forecasting and budgeting.

Create speech

Actually generate the audio once you’ve validated the estimate.

Authorizations

x-sup-api-key
string
header
required

Path Parameters

voice_id
string
required

Body

application/json
text
string
required

The text to convert to speech. Max length is 300 characters.

Maximum string length: 300
language
enum<string>
required

Language code of the voice

Available options: en, ko, ja, bg, cs, da, el, es, et, fi, hu, it, nl, pl, pt, ro, ar, de, fr, hi, id, ru, vi, hr, lt, lv, sk, sl, sv, tr, uk
style
string

The style of character to use for the text-to-speech conversion

model
enum<string>
default:sona_speech_1

The model type to use for the text-to-speech conversion

Available options: sona_speech_1, sona_speech_2, sona_speech_2_flash, supertonic_api_1, supertonic_api_3
output_format
enum<string>
default:wav

The desired output format of the audio file (wav, mp3). Default is wav.

Available options: wav, mp3
voice_settings
object

Response

Returns predicted duration of the audio in seconds

duration
number