POST /v1/text-to-speech/{voice_id}/stream
Convert text to speech with streaming response
curl --request POST \
  --url https://supertoneapi.com/v1/text-to-speech/{voice_id}/stream \
  --header 'Content-Type: application/json' \
  --header 'x-sup-api-key: <x-sup-api-key>' \
  --data '{
  "text": "<string>",
  "language": "en",
  "style": "<string>",
  "model": "sona_speech_1",
  "output_format": "wav",
  "voice_settings": {
    "pitch_shift": 0,
    "pitch_variance": 1,
    "speed": 1
  }
}'

Basic Usage

  • {voice_id}: Only character-level IDs are supported
  • Parameters such as language, style, and model must be included in the Request Body

Request Body Field Descriptions

Field           Required  Description
text                      Text to be converted. Up to 300 characters allowed
language                  Language of the text. One of ko, en, or ja
style                     Emotion style, e.g., neutral, happy, sad. Defaults to the character's base style if unspecified
model                     Model to use. Default is sona_speech_1. Currently, only this model is supported
voice_settings            Controls pitch and speed. Includes pitch_shift, pitch_variance, and speed (defaults: 0, 1, 1)

Usage Example

POST /v1/text-to-speech/{voice_id}/stream
Content-Type: application/json
x-sup-api-key: [YOUR_API_KEY]

{
  "text": "Thank you for calling.",
  "language": "en",
  "style": "happy",
  "model": "sona_speech_1",
  "voice_settings": {
    "pitch_shift": 0,
    "pitch_variance": 1,
    "speed": 1
  }
}
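
The same request can be assembled programmatically. The following is a minimal Python sketch using only the standard library; it builds the request object without sending it. The helper name build_tts_request and the placeholder values YOUR_VOICE_ID and YOUR_API_KEY are illustrative assumptions, not part of the API. The host is taken from the curl example above.

```python
import json
import urllib.request

BASE_URL = "https://supertoneapi.com"  # host as shown in the curl example


def build_tts_request(voice_id, api_key, text, language="en", style=None,
                      model="sona_speech_1", voice_settings=None):
    """Build (but do not send) a POST request for the streaming endpoint.

    Hypothetical helper for illustration only.
    """
    body = {"text": text, "language": language, "model": model}
    if style is not None:
        # Leaving style out lets the API use the character's default style.
        body["style"] = style
    if voice_settings is not None:
        body["voice_settings"] = voice_settings
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/text-to-speech/{voice_id}/stream",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-sup-api-key": api_key,
        },
        method="POST",
    )


req = build_tts_request(
    voice_id="YOUR_VOICE_ID",  # placeholder, not a real voice ID
    api_key="YOUR_API_KEY",    # placeholder
    text="Thank you for calling.",
    style="happy",
    voice_settings={"pitch_shift": 0, "pitch_variance": 1, "speed": 1},
)
print(req.get_method(), req.full_url)
```

Passing the resulting object to urllib.request.urlopen() would send the request; that step is left out here so the sketch runs without network access or a valid key.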

Response

  • The response body is streamed as binary audio data, delivered in chunks. The default format is wav
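
One way to consume a chunked binary response is to copy it to disk as it arrives instead of buffering the whole file in memory. The sketch below assumes a response object with a read(n) method, such as the one returned by urllib.request.urlopen(). The function name save_audio_stream and the 4096-byte chunk size are illustrative choices, and an in-memory stand-in replaces the real response so the example runs offline.

```python
import io
import os
import tempfile


def save_audio_stream(response, path, chunk_size=4096):
    """Copy a streamed binary response to a file, one chunk at a time.

    `response` is any object with a read(n) method. Returns the number
    of bytes written. Hypothetical helper for illustration only.
    """
    total = 0
    with open(path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # empty bytes means the stream has ended
                break
            out.write(chunk)
            total += len(chunk)
    return total


# Stand-in for a real HTTP response: 4 header bytes plus 10000 zero bytes.
fake_response = io.BytesIO(b"RIFF" + b"\x00" * 10000)
out_path = os.path.join(tempfile.gettempdir(), "speech.wav")
written = save_audio_stream(fake_response, out_path)
print(written)
```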

Notes

  • A 400 error will occur if the text length exceeds 300 characters.
  • The API can be called without specifying style, but the default style may vary by character.
    Please use the Get Voices API to check the default (the first value in the style array is the default).
  • The returned audio file can be saved or played directly. (Appropriate handling may be required depending on the client.)
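
Because text longer than 300 characters is rejected with a 400 error, a client may want to split long input before calling the API. The following sketch shows one possible approach: break at the last space within the limit, and fall back to a hard cut when no space is available. The helper name split_text is hypothetical, and the sketch assumes the limit is counted the same way as Python's len(), which the documentation does not confirm.

```python
MAX_CHARS = 300  # limit stated in the notes above


def split_text(text, limit=MAX_CHARS):
    """Split text into pieces of at most `limit` characters.

    Breaks at the last space within the limit where possible.
    Assumption: the API counts characters like Python's len().
    """
    pieces = []
    remaining = text.strip()
    while len(remaining) > limit:
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no usable space found: hard cut at the limit
        pieces.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        pieces.append(remaining)
    return pieces


long_text = "word " * 100  # about 500 characters
for piece in split_text(long_text):
    print(len(piece))
```

Each piece can then be sent as a separate request. Note that splitting mid-sentence may affect intonation at the boundaries, so breaking at sentence ends may give better results where the input allows it.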

Headers

x-sup-api-key
string
required

API key for the service

Path Parameters

voice_id
string
required

Query Parameters

output_format
enum<string>
default: wav

The desired output format of the audio file (wav, mp3). Default is wav.

Available options: wav, mp3
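
As a sketch of how output_format could be supplied as a query parameter, the snippet below builds the endpoint URL with the standard library. The helper name stream_url is hypothetical, and validating the value on the client side is an optional design choice rather than an API requirement.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://supertoneapi.com"  # host as shown in the curl example
ALLOWED_FORMATS = ("wav", "mp3")       # options listed above


def stream_url(voice_id, output_format="wav"):
    """Return the streaming endpoint URL with output_format as a query parameter.

    Hypothetical helper for illustration only.
    """
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    # Percent-encode the voice ID so it is safe to place in the path.
    path = f"/v1/text-to-speech/{quote(voice_id, safe='')}/stream"
    return f"{BASE_URL}{path}?{urlencode({'output_format': output_format})}"


print(stream_url("YOUR_VOICE_ID", "mp3"))  # YOUR_VOICE_ID is a placeholder
```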

Body

application/json

Response

Streaming audio data in binary format

The response is of type file.