This guide explains the request structure, parameter usage, and error handling precautions for the TTS API, which converts text into speech and returns it as an audio stream.
- `{voice_id}`: Only character-level IDs are supported.
- `language`, `style`, and `model` must be included in the request body.

Field | Required | Description |
---|---|---|
text | Yes | Text to convert. Up to 300 characters. |
language | Yes | Language of the text. One of `ko`, `en`, or `ja`. |
style | No | Emotion style, e.g. `neutral`, `happy`, `sad`. Defaults to the character's base style if unspecified. |
model | No | Model to use. Default is `sona_speech_1`, which is currently the only supported model. |
voice_settings | No | Controls pitch and speed through `pitch_shift`, `pitch_variance`, and `speed` (defaults: 0, 1, and 1). |
output_format | No | Desired audio file format: `wav` or `mp3`. Default is `wav`. |
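The constraints in the table can be checked on the client before a request is sent. The Python sketch below is illustrative only: `build_tts_body` is a hypothetical helper, it assumes the 300-character limit counts Unicode code points (which is what `len()` returns), and it leaves out `style` and `voice_settings` when they are not given so the server applies its documented defaults.

```python
# Limits taken from the request-body table; the helper itself is hypothetical.
MAX_TEXT_LENGTH = 300
SUPPORTED_LANGUAGES = {"ko", "en", "ja"}


def build_tts_body(text, language, style=None, model="sona_speech_1", voice_settings=None):
    """Validate inputs client-side and return the JSON request body as a dict."""
    if not text:
        raise ValueError("text is required")
    # Assumption: the server counts code points, as Python's len() does.
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text is {len(text)} characters; the limit is {MAX_TEXT_LENGTH}")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    body = {"text": text, "language": language, "model": model}
    # Omit optional fields so the server falls back to its defaults.
    if style is not None:
        body["style"] = style
    if voice_settings is not None:
        # e.g. {"pitch_shift": 0, "pitch_variance": 1, "speed": 1}
        body["voice_settings"] = voice_settings
    return body
```

Validating locally avoids a round trip for requests the API would reject anyway, such as text longer than 300 characters.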
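The response is returned as `wav` by default. To receive MP3 instead, pass `output_format=mp3` as a query parameter.

The request fails with an error if the `text` length exceeds 300 characters.

If `style` is omitted, the character's default style is used, but the default style may vary by character.

API key: the API key for the service.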
output_format (query parameter): The desired output format of the audio file. Available options: `wav`, `mp3`. Default is `wav`.

Response: Streaming audio data in binary format. The response is of type file.
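The following Python sketch shows one way to send the request and save the streamed audio using only the standard library. It is not an official client: `BASE_URL` and `API_KEY_HEADER` are placeholders, because this guide does not give the endpoint URL or the name of the API-key header, and `build_tts_request` and `save_audio_stream` are hypothetical helpers.

```python
import json
import shutil
import urllib.parse
import urllib.request

# HYPOTHETICAL placeholders: the real endpoint URL and API-key header name are
# not stated in this guide; replace them with the values for your account.
BASE_URL = "https://api.example.com/v1/text-to-speech"
API_KEY_HEADER = "x-api-key"


def build_tts_request(voice_id, body, api_key, output_format="wav"):
    """Build (but do not send) a POST request with output_format as a query parameter."""
    if output_format not in ("wav", "mp3"):
        raise ValueError(f"unsupported output_format: {output_format!r}")
    query = urllib.parse.urlencode({"output_format": output_format})
    # Percent-encode the voice ID so it is safe as a single path segment.
    url = f"{BASE_URL}/{urllib.parse.quote(voice_id, safe='')}?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={API_KEY_HEADER: api_key, "Content-Type": "application/json"},
        method="POST",
    )


def save_audio_stream(response, path, chunk_size=8192):
    """Copy the binary audio stream to a file in chunks instead of buffering it all."""
    with open(path, "wb") as f:
        shutil.copyfileobj(response, f, chunk_size)


# Example usage (performs a network call, so it is left commented out):
# body = {"text": "Hello", "language": "en", "model": "sona_speech_1"}
# req = build_tts_request("YOUR_VOICE_ID", body, "YOUR_API_KEY", output_format="mp3")
# with urllib.request.urlopen(req) as resp:
#     save_audio_stream(resp, "hello.mp3")
```

Because `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, a rejected request (for example, `text` longer than 300 characters) surfaces as an exception rather than as a saved audio file.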