POST /v1/predict-duration/{voice_id}
curl --request POST \
  --url https://supertoneapi.com/v1/predict-duration/{voice_id} \
  --header 'Content-Type: application/json' \
  --header 'x-sup-api-key: <x-sup-api-key>' \
  --data '{
  "text": "<string>",
  "language": "en",
  "style": "<string>",
  "model": "sona_speech_1",
  "voice_settings": {
    "pitch_shift": 0,
    "pitch_variance": 1,
    "speed": 1
  }
}'
{
  "duration": 123
}

This API does not generate speech; it only returns the expected speech length (in seconds) for the input text. It's useful for estimating credit consumption or adjusting text length before making TTS calls.

Request

  • The calling method and Request Body are almost identical to the text-to-speech API.
  • However, only the duration value is returned as a result, not audio.
  • No credits are consumed when calling the Predict Duration API.

Request Body

Item            Required  Description
text            ✅        Text to analyze. Maximum 300 characters
language        ✅        Text language. One of ko, en, ja
style           ❌        Emotional style. Default style is used if not specified
model           ❌        Default is sona_speech_1. Currently only this model is available
voice_settings  ❌        Speech speed or pitch adjustment values. May affect result length
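
The constraints listed for the request body can be checked on the client before a request is sent. The following is a minimal Python sketch, not part of the official API: the helper name build_predict_duration_body is hypothetical, and it assumes the 300-character limit is counted in Unicode code points, which this page does not specify.

```python
from typing import Any, Dict, Optional

MAX_TEXT_LENGTH = 300                      # "Maximum 300 characters"
SUPPORTED_LANGUAGES = ("ko", "en", "ja")   # "One of ko, en, ja"


def build_predict_duration_body(
    text: str,
    language: str,
    style: Optional[str] = None,
    model: Optional[str] = None,
    voice_settings: Optional[Dict[str, float]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate the inputs and return a JSON-ready dict."""
    if not text:
        raise ValueError("text is required")
    # Assumption: the limit is counted in Unicode code points (len of a str).
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text exceeds {MAX_TEXT_LENGTH} characters")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"language must be one of {SUPPORTED_LANGUAGES}")

    body: Dict[str, Any] = {"text": text, "language": language}
    # Optional fields are omitted when unset so the server applies its defaults.
    if style is not None:
        body["style"] = style
    if model is not None:
        body["model"] = model
    if voice_settings is not None:
        body["voice_settings"] = dict(voice_settings)
    return body
```

Leaving optional fields out, rather than sending empty values, keeps the payload consistent with the "default is used if not specified" behavior described for style and model.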

Request Example

POST /v1/predict-duration/{voice_id}
Content-Type: application/json
x-sup-api-key: [YOUR_API_KEY]

{
  "text": "This is a long-form sentence for duration prediction.",
  "language": "en",
  "style": "neutral"
}
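
For callers who prefer Python over curl, the same request could be assembled with the standard library. This is a sketch under assumptions: build_request and predict_duration are hypothetical helper names, the host is taken from the curl example on this page, and the 10-second timeout is an arbitrary choice. Error handling for non-200 responses is omitted.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://supertoneapi.com"  # host taken from the curl example


def build_request(voice_id: str, api_key: str, body: dict) -> urllib.request.Request:
    """Hypothetical helper: prepare (but do not send) the POST request."""
    # Percent-encode the voice ID so it is safe to place in the URL path.
    url = f"{BASE_URL}/v1/predict-duration/{quote(voice_id, safe='')}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-sup-api-key": api_key,
        },
        method="POST",
    )


def predict_duration(voice_id: str, api_key: str, body: dict) -> float:
    """Send the request and return the predicted duration in seconds."""
    request = build_request(voice_id, api_key, body)
    # The timeout value is an assumption, not a documented recommendation.
    with urllib.request.urlopen(request, timeout=10) as response:
        return float(json.load(response)["duration"])
```

Separating request construction from sending makes the headers and payload easy to inspect or unit-test without network access.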

Response Example

{
  "duration": 3.57381983
}

This means that synthesizing this text would produce approximately 3.57 seconds of audio.
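
Reading the value out of the response body might look like the sketch below. The helper name parse_duration is hypothetical; the only field it relies on is duration, as shown in the response example.

```python
import json


def parse_duration(response_body: str) -> float:
    """Hypothetical helper: extract the predicted duration in seconds."""
    payload = json.loads(response_body)
    duration = payload["duration"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(duration, bool) or not isinstance(duration, (int, float)):
        raise ValueError("duration must be a number")
    return float(duration)


seconds = parse_duration('{"duration": 3.57381983}')
print(f"about {seconds:.2f} seconds of audio")  # about 3.57 seconds of audio
```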

Tips

  • No credits are deducted, because no speech generation occurs.
  • The predicted duration is very close to the length of the audio produced by an actual text-to-speech call with the same text.
  • Because voice_settings.speed changes the length, it's better to test with a fixed speech speed.
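
One way to keep the speech speed fixed across test calls is to pin it in the request body before sending. The sketch below is illustrative only: with_fixed_speed is a hypothetical helper, and the value 1.0 is assumed to be the neutral speed because the curl example on this page uses "speed": 1.

```python
import copy
from typing import Any, Dict


def with_fixed_speed(body: Dict[str, Any], speed: float = 1.0) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of the body with speed pinned."""
    pinned = copy.deepcopy(body)  # leave the caller's dict untouched
    # Create voice_settings if absent; other settings such as pitch are kept.
    settings = pinned.setdefault("voice_settings", {})
    settings["speed"] = speed
    return pinned
```

Applying the same helper to every body under comparison means any difference in predicted duration comes from the text or style rather than from the speed setting.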

Headers

x-sup-api-key
string
required

API key for the service

Path Parameters

voice_id
string
required

Body

application/json

Response

200
application/json

Returns predicted duration of the audio in seconds

The response is of type object.