Predict Duration

Predict text-to-speech duration

curl --request POST \
  --url https://supertoneapi.com/v1/predict-duration/{voice_id} \
  --header 'Content-Type: application/json' \
  --header 'x-sup-api-key: <x-sup-api-key>' \
  --data '{
  "text": "<string>",
  "language": "en",
  "style": "<string>",
  "model": "sona_speech_1",
  "voice_settings": {
    "pitch_shift": 0,
    "pitch_variance": 1,
    "speed": 1
  }
}'

{
  "duration": 123
}

POST /v1/predict-duration/{voice_id}

This API does not generate speech. It only returns the expected speech length (in seconds) for the input text. It’s useful for estimating credit consumption or adjusting text length before making TTS calls.

Request

The HTTP method and request body are almost identical to those of the text-to-speech API.
However, the response contains only the duration value, not audio.
Calling the Predict Duration API consumes no credits.

Request Body

Item	Required	Description
`text`	✅	Text to analyze. Maximum 300 characters
`language`	✅	Text language. One of `ko`, `en`, `ja`
`style`	❌	Emotional style. Default style is used if not specified
`model`	❌	Default is `sona_speech_1`. Currently only this model is available
`voice_settings`	❌	Speech speed or pitch adjustment values. May affect result length
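
The field rules in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the official API: the helper name `build_predict_payload` is hypothetical, and it assumes the 300-character limit can be approximated with Python's `len()`, which may not match how the server counts characters.

```python
MAX_TEXT_LENGTH = 300                    # limit stated in the table above
SUPPORTED_LANGUAGES = {"ko", "en", "ja"}  # languages listed in the table above


def build_predict_payload(text, language, style=None, model=None, voice_settings=None):
    """Hypothetical helper: validate the fields and return a JSON-ready dict."""
    if not text:
        raise ValueError("text is required")
    # Assumption: len() approximates the server-side character count.
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(
            f"text has {len(text)} characters; the maximum is {MAX_TEXT_LENGTH}"
        )
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"language must be one of {sorted(SUPPORTED_LANGUAGES)}")

    payload = {"text": text, "language": language}
    # Optional fields are included only when set, so the server applies its defaults.
    if style is not None:
        payload["style"] = style
    if model is not None:
        payload["model"] = model
    if voice_settings is not None:
        payload["voice_settings"] = dict(voice_settings)
    return payload
```

Omitting unset optional fields keeps the request equivalent to the minimal example in this page, where only `text`, `language`, and `style` are sent.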

Request Example

POST /v1/predict-duration/{voice_id}
Content-Type: application/json
x-sup-api-key: [YOUR_API_KEY]

{
  "text": "This is a long-form sentence for duration prediction.",
  "language": "en",
  "style": "neutral"
}

Response Example

{
  "duration": 3.57381983
}

This means that synthesizing this text would produce approximately 3.57 seconds of audio.
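
The request and response examples above could be exercised from Python roughly as follows. This is a sketch using only the standard library; the function names `make_request` and `predict_duration` and the default timeout are assumptions for illustration, while the URL, headers, and the `duration` field come from the examples in this page.

```python
import json
import urllib.request

BASE_URL = "https://supertoneapi.com/v1/predict-duration"


def make_request(voice_id, api_key, payload):
    """Build the POST request without sending it (hypothetical helper)."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{voice_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-sup-api-key": api_key,
        },
        method="POST",
    )


def predict_duration(voice_id, api_key, payload, timeout=10.0):
    """Send the request and return the predicted duration in seconds."""
    request = make_request(voice_id, api_key, payload)
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return float(body["duration"])
```

A call such as `predict_duration(voice_id, api_key, {"text": "...", "language": "en"})` would then return a float like the `duration` value shown in the response example.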

Tips

No credits are deducted, because no speech is generated.
The predicted duration is very close to the length of the audio produced by an actual text-to-speech call with the same text.
Because `voice_settings.speed` changes the resulting length, it’s better to test with a fixed speech speed.
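
One way to use these predictions for adjusting text length is to trim a script until it fits a time budget. The sketch below is illustrative only: `fit_sentences` and `fake_predict` are hypothetical names, and `fake_predict` is a stand-in that guesses 0.1 seconds per character. In practice `predict` would wrap a Predict Duration call with a fixed `voice_settings.speed`, and each call would still need to stay within the 300-character limit.

```python
from typing import Callable, List


def fit_sentences(
    sentences: List[str],
    max_seconds: float,
    predict: Callable[[str], float],
) -> List[str]:
    """Drop trailing sentences until the predicted duration fits max_seconds."""
    kept = list(sentences)
    while kept and predict(" ".join(kept)) > max_seconds:
        kept.pop()  # remove the last sentence and try again
    return kept


def fake_predict(text: str) -> float:
    """Stand-in predictor for local testing; NOT the real API's behavior."""
    return len(text) * 0.1


script = ["One two.", "Three four.", "Five six."]
trimmed = fit_sentences(script, max_seconds=2.5, predict=fake_predict)
```

Passing the predictor in as a function keeps the trimming logic independent of the HTTP client, so it can be tested without network access.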

Headers

x-sup-api-key (string, required): API key for the service

Path Parameters

voice_id (string, required)

Body

application/json

Response

200 (application/json): Returns the predicted duration of the audio in seconds. The response is of type object.
