Predict duration

Predict text-to-speech duration

curl --request POST \
  --url https://supertoneapi.com/v1/predict-duration/{voice_id} \
  --header 'Content-Type: application/json' \
  --header 'x-sup-api-key: <api-key>' \
  --data '
{
  "text": "<string>",
  "language": "en",
  "style": "<string>",
  "model": "sona_speech_1",
  "output_format": "wav",
  "voice_settings": {
    "pitch_shift": 0,
    "pitch_variance": 1,
    "speed": 1,
    "duration": 0,
    "similarity": 3,
    "text_guidance": 1,
    "subharmonic_amplitude_control": 1
  }
}
'

{
  "duration": 123
}
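The same request can also be sent from Python using only the standard library. This is a minimal sketch: `build_predict_duration_request` and `predict_duration` are hypothetical helper names, and you must supply your own voice ID and API key in place of the placeholders.

```python
import json
import urllib.parse
import urllib.request

# Base URL taken from the curl example above.
BASE_URL = "https://supertoneapi.com/v1/predict-duration/"


def build_predict_duration_request(
    voice_id: str, api_key: str, payload: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl example."""
    # Percent-encode the voice ID so it is safe to place in the URL path.
    url = BASE_URL + urllib.parse.quote(voice_id, safe="")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "x-sup-api-key": api_key,  # authentication header
        },
    )


def predict_duration(voice_id: str, api_key: str, payload: dict) -> float:
    """Send the request and return the predicted duration in seconds."""
    request = build_predict_duration_request(voice_id, api_key, payload)
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(request) as response:
        return float(json.load(response)["duration"])
```

For example, `predict_duration("<voice_id>", "<api-key>", {"text": "Hello", "language": "en"})` returns the `duration` field of the response as a float.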

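This API does not generate any audio; it returns only the expected duration (in seconds) of the speech for the input text.
It is useful for estimating credit consumption before calling the text-to-speech API, or for adjusting the length of your text.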
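Since prediction does not consume credits, you can call it repeatedly while trimming a script to fit a time budget. A minimal sketch, assuming a `predict` callable that maps text to predicted seconds (for example, a wrapper around this endpoint); `fit_sentences_to_budget` is a hypothetical helper, not part of the Supertone API.

```python
from typing import Callable, List

MAX_TEXT_LENGTH = 300  # documented limit for the `text` field


def fit_sentences_to_budget(
    sentences: List[str],
    max_seconds: float,
    predict: Callable[[str], float],
) -> str:
    """Greedily keep leading sentences whose predicted duration fits the budget.

    `predict` is an assumed callable (text -> seconds), e.g. a wrapper around
    the predict-duration endpoint.
    """
    kept: List[str] = []
    for sentence in sentences:
        candidate = " ".join(kept + [sentence])
        # Stay within the 300-character limit of a single request.
        if len(candidate) > MAX_TEXT_LENGTH:
            break
        # Stop at the first sentence that would push the audio past the budget.
        if predict(candidate) > max_seconds:
            break
        kept.append(sentence)
    return " ".join(kept)
```

Each probe sends the whole candidate text rather than summing per-sentence results, so the prediction reflects the text exactly as it would be synthesized.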

Endpoint

https://supertoneapi.com/v1/predict-duration/{voice_id}

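Notes

The call pattern and request body are almost identical to those of the text-to-speech API.
However, no audio is returned; the response contains only a duration value.
Calling the Predict Duration API does not consume credits, because no speech is generated.
The result is very close to the duration of an actual text-to-speech call with the same text.
Changing `voice_settings.speed` changes the duration, so test with a fixed speaking speed.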

Request parameters

Parameter	Required	Description
`text`	Yes	The text to analyze. Maximum 300 characters.
`language`	Yes	Language of the text. One of `ko`, `en`, `ja`.
`style`	No	Emotion style. If omitted, the default style is used.
`model`	No	Defaults to `sona_speech_1`. This is currently the only available model.
`voice_settings`	No	Speaking speed or pitch adjustments. These can affect the resulting duration.
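The constraints above can be checked on the client before sending a request. A minimal sketch: `check_payload` is a hypothetical helper, the allowed values are copied from this page, and length is counted in Python characters (code points), which may not match how the server counts.

```python
from typing import Any, Dict, List

# Constraints as documented on this page (client-side copy; may drift).
MAX_TEXT_LENGTH = 300
SUPPORTED_LANGUAGES = {"ko", "en", "ja"}
SUPPORTED_OUTPUT_FORMATS = {"wav", "mp3"}
DEFAULT_MODEL = "sona_speech_1"  # currently the only documented model


def check_payload(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a predict-duration request body."""
    problems: List[str] = []
    text = payload.get("text")
    if not isinstance(text, str) or not text:
        problems.append("`text` is required")
    elif len(text) > MAX_TEXT_LENGTH:
        problems.append(f"`text` exceeds {MAX_TEXT_LENGTH} characters")
    if payload.get("language") not in SUPPORTED_LANGUAGES:
        problems.append("`language` must be one of ko, en, ja")
    # Optional fields fall back to their documented defaults.
    if payload.get("model", DEFAULT_MODEL) != DEFAULT_MODEL:
        problems.append(f"`model` must be {DEFAULT_MODEL}")
    if payload.get("output_format", "wav") not in SUPPORTED_OUTPUT_FORMATS:
        problems.append("`output_format` must be wav or mp3")
    return problems
```

An empty list means the payload passed these local checks; the server remains the authority on what it accepts.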

Authorizations

x-sup-api-key (string, header, required)

Path Parameters

voice_id (string, required)

Body (application/json)

text (string, required)

The text to convert to speech. Maximum length: 300 characters.

language (enum<string>, required)

Language code of the voice. Available options: en, ko, ja.

style (string)

The character style to use for the text-to-speech conversion.

model (string, default: sona_speech_1)

The model type to use for the text-to-speech conversion.

output_format (enum<string>, default: wav)

The desired output format of the audio file. Available options: wav, mp3.

voice_settings (object)

Optional voice adjustments such as pitch_shift, pitch_variance, and speed, as in the request example. These values can change the predicted duration.

Response

Returns the predicted duration of the audio, in seconds.

duration (number)
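Because `text` is limited to 300 characters, a longer script can be estimated by splitting it into chunks and summing the predicted durations. A minimal sketch: `parse_duration`, `split_text`, and `total_duration` are hypothetical helpers, and `predict` is an assumed callable (text to seconds). Summing is an approximation, since pauses at chunk boundaries may differ from a single continuous synthesis.

```python
import json
from typing import Callable, List

MAX_TEXT_LENGTH = 300  # documented limit for the `text` field


def parse_duration(body: bytes) -> float:
    """Extract the `duration` field (seconds) from a response body."""
    return float(json.loads(body)["duration"])


def split_text(text: str, limit: int = MAX_TEXT_LENGTH) -> List[str]:
    """Split text on whitespace into chunks of at most `limit` characters."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        if len(word) > limit:
            raise ValueError("a single word is longer than the limit")
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            # Current chunk is full; start a new one with this word.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def total_duration(text: str, predict: Callable[[str], float]) -> float:
    """Sum predicted durations over chunks that respect the length limit."""
    return sum(predict(chunk) for chunk in split_text(text))
```

Whitespace splitting suits English and Korean; Japanese text, which is usually written without spaces, would need a different splitter, for example one that breaks on sentence-ending punctuation such as 。.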
