POST /v1/predict-duration/{voice_id}
Predict text-to-speech duration
curl --request POST \
  --url https://supertoneapi.com/v1/predict-duration/{voice_id} \
  --header 'Content-Type: application/json' \
  --header 'x-sup-api-key: <x-sup-api-key>' \
  --data '{
  "text": "<string>",
  "language": "en",
  "style": "<string>",
  "model": "sona_speech_1",
  "output_format": "wav",
  "voice_settings": {
    "pitch_shift": 0,
    "pitch_variance": 1,
    "speed": 1
  }
}'
{
  "duration": 123
}
This API does not generate speech; it only returns the predicted speech length (in seconds) for the input text.
It is useful for estimating credit consumption or adjusting text length before making text-to-speech calls.

Request

  • The calling method and request body are almost identical to those of the text-to-speech API.
  • However, the response contains only the duration value, not audio.
  • No credits are consumed when calling the Predict Duration API.
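
As a rough illustration of the points above, the following Python sketch builds the request with the standard library only. The host, path, and header names are taken from the curl example on this page; the function name `build_predict_duration_request` and its parameters are hypothetical and not part of any official SDK.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://supertoneapi.com"  # host taken from the curl example on this page


def build_predict_duration_request(voice_id, api_key, text, language="en", **optional):
    """Build (but do not send) a POST request for /v1/predict-duration/{voice_id}.

    Hypothetical helper: optional fields such as style, model, or
    voice_settings are passed through to the JSON body unchanged.
    """
    body = {"text": text, "language": language}
    body.update(optional)
    # Percent-encode the voice ID so it is safe to place in the URL path.
    path = f"/v1/predict-duration/{urllib.parse.quote(voice_id, safe='')}"
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-sup-api-key": api_key},
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to perform the call; building the request separately from sending it keeps the helper easy to inspect and test offline.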

Request Body

Item            Required  Description
text                      Text to analyze. Maximum 300 characters.
language                  Text language. One of ko, en, ja.
style                     Emotional style. The default style is used if not specified.
model                     Default is sona_speech_1. Currently only this model is available.
voice_settings            Speech speed or pitch adjustment values. May affect the resulting length.
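
The constraints listed in the table can be checked on the client before a request is sent. The sketch below is one possible way to do that, not an official validator: `validate_body` is a hypothetical name, it assumes that `text` and `language` must both be present, and it assumes that Python's `len()` counts characters the same way the API does.

```python
SUPPORTED_LANGUAGES = {"ko", "en", "ja"}  # values listed in the table above
MAX_TEXT_LENGTH = 300                     # maximum text length from the table above


def validate_body(body):
    """Return a list of problems found in a request body; an empty list means none."""
    problems = []
    text = body.get("text")
    if not isinstance(text, str) or not text:
        # Assumption: the API needs a non-empty text field.
        problems.append("text must be a non-empty string")
    elif len(text) > MAX_TEXT_LENGTH:
        problems.append(f"text is longer than {MAX_TEXT_LENGTH} characters")
    if body.get("language") not in SUPPORTED_LANGUAGES:
        problems.append("language must be one of ko, en, ja")
    return problems
```

For example, `validate_body({"text": "Hello", "language": "en"})` returns an empty list, while a body with a 301-character text is reported as too long.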

Request Example

POST /v1/predict-duration/{voice_id}
Content-Type: application/json
x-sup-api-key: [YOUR_API_KEY]

{
  "text": "This is a long-form sentence for duration prediction.",
  "language": "en",
  "style": "neutral"
}

Response Example

{
  "duration": 3.57381983
}
This means that generating speech from this text would produce approximately 3.57 seconds of audio.
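
A minimal sketch of reading that response in Python follows. It assumes only what the example above shows, namely a JSON object with a numeric `duration` field; `parse_duration` is a hypothetical helper name.

```python
import json


def parse_duration(response_body):
    """Extract the predicted duration (in seconds) from a JSON response body."""
    payload = json.loads(response_body)
    duration = payload.get("duration") if isinstance(payload, dict) else None
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(duration, bool) or not isinstance(duration, (int, float)):
        raise ValueError(f"unexpected response: {payload!r}")
    return float(duration)


seconds = parse_duration('{"duration": 3.57381983}')
print(f"{seconds:.2f} s")  # prints "3.57 s"
```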

Tips

  • No credits are deducted, because no speech is generated.
  • The predicted duration is very close to the length of the audio produced by an actual text-to-speech call with the same text.
  • Because changing voice_settings.speed changes the length, it’s better to test with a fixed speech speed.
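
One way to apply these tips is to shorten a text until its predicted duration fits a time budget, keeping the speed fixed so that successive predictions are comparable. The sketch below is illustrative only: `fit_text_to_duration` is a hypothetical helper, and `predict` is a hypothetical callable that takes a request body and returns the predicted duration in seconds (for instance, a wrapper around this endpoint).

```python
def fit_text_to_duration(text, max_seconds, predict, language="en", speed=1.0):
    """Drop trailing words until the predicted duration fits within max_seconds.

    Returns the longest leading part of the text that fits, or an empty
    string if even a single word is predicted to be too long.
    """
    words = text.split()
    while words:
        candidate = " ".join(words)
        body = {
            "text": candidate,
            "language": language,
            # Keep the speed fixed so every prediction uses the same setting.
            "voice_settings": {"speed": speed},
        }
        if predict(body) <= max_seconds:
            return candidate
        words.pop()  # remove the last word and try again
    return ""
```

Each loop iteration costs one prediction call, which is acceptable here because the endpoint consumes no credits; for long texts, a binary search over the word count would need fewer calls.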

Headers

x-sup-api-key
string
required

API key for the service

Path Parameters

voice_id
string
required

Body

application/json

Response

200
application/json

Returns predicted duration of the audio in seconds

The response is of type object.