A step-by-step guide to parameter structure and usage for converting text to speech.
voice_id: Unique ID of the voice to use.
output_format (optional): Audio format to generate. Choose between wav (default) and mp3.

Field | Required | Description |
---|---|---|
text | Yes | Text to convert to speech (max 300 characters) |
language | Yes | Language of the text. Choose from the languages supported by the voice (ko, en, ja) |
style | No | Emotion style to apply (neutral, happy, etc.). If omitted, the default style is used. The first value becomes the default style. |
model | No | Voice model to use (sona_speech_1). Applied automatically if omitted |
voice_settings | No | Advanced options to adjust voice pitch, intonation, and speed (see below) |
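
As an illustration of how the fields in this table fit together, the following is a minimal sketch of a request-body builder. The helper name `build_tts_body` and its client-side checks are assumptions made for this example, not part of the API; only the field names, the 300-character limit, and the language codes come from the table. It performs no network call, since the endpoint and authentication are not covered here.

```python
from typing import Any, Dict, Optional

MAX_TEXT_LENGTH = 300                      # limit stated in the table
SUPPORTED_LANGUAGES = {"ko", "en", "ja"}   # codes listed in the table; a given voice may support fewer


def build_tts_body(
    text: str,
    language: str,
    style: Optional[str] = None,
    model: Optional[str] = None,
    voice_settings: Optional[Dict[str, float]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble the request body, leaving out unset optional fields."""
    if not text:
        raise ValueError("text must not be empty")
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text is {len(text)} characters; the limit is {MAX_TEXT_LENGTH}")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")

    body: Dict[str, Any] = {"text": text, "language": language}
    # Optional fields are omitted so that the service applies its own defaults.
    if style is not None:
        body["style"] = style
    if model is not None:
        body["model"] = model
    if voice_settings is not None:
        body["voice_settings"] = dict(voice_settings)
    return body


if __name__ == "__main__":
    import json
    print(json.dumps(build_tts_body("Hello, world!", "en", style="neutral")))
```

Omitting optional keys, rather than sending empty values, keeps the defaults described in the table in the hands of the service.
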

voice_settings Options

voice_settings is an advanced option for fine-tuning how the generated speech sounds.

Parameter | Description | Allowed Range | Default |
---|---|---|---|
pitch_shift | Adjusts the pitch level. 0 is the original voice pitch, and shifts of up to ±12 steps are possible. 1 step is a semitone. | -12 ~ +12 | 0 |
pitch_variance | Controls the degree of intonation variation during speech. Smaller values create flatter intonation, larger values create richer intonation. | 0.1 ~ 2 | 1 |
speed | Controls speech speed. Values less than 1 make it slower, values greater than 1 make it faster. | 0.5 ~ 2 | 1 |
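
The ranges and defaults in this table can be checked on the client before a request is sent. The sketch below is one possible way to do that; `make_voice_settings` is a hypothetical helper, and whether the service itself rejects or clamps out-of-range values is not stated here. The table also does not say whether pitch_shift must be a whole number, so the sketch does not enforce that.

```python
from typing import Dict, Tuple

# (minimum, maximum, default) for each parameter, copied from the table
VOICE_SETTING_RANGES: Dict[str, Tuple[float, float, float]] = {
    "pitch_shift": (-12, 12, 0),
    "pitch_variance": (0.1, 2, 1),
    "speed": (0.5, 2, 1),
}


def make_voice_settings(**overrides: float) -> Dict[str, float]:
    """Hypothetical helper: start from the documented defaults and apply checked overrides."""
    settings = {name: default for name, (_, _, default) in VOICE_SETTING_RANGES.items()}
    for name, value in overrides.items():
        if name not in VOICE_SETTING_RANGES:
            raise KeyError(f"unknown voice setting: {name}")
        low, high, _ = VOICE_SETTING_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside the allowed range {low} ~ {high}")
        settings[name] = value
    return settings


if __name__ == "__main__":
    # One semitone up and slightly faster; pitch_variance keeps its default.
    print(make_voice_settings(pitch_shift=1, speed=1.2))
```
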
The generated audio is returned as audio/wav or audio/mpeg.
sona_speech_1 is supported.
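
The two media types named above line up with the two output_format values (wav and mp3). As a rough sketch, a client could use that pairing to pick a file extension when saving the audio. This assumes the media type is available to the client, for example from a Content-Type response header, which is not confirmed here; `extension_for` is a hypothetical helper.

```python
# Pairing of output_format values with their standard media types
FORMAT_TO_CONTENT_TYPE = {"wav": "audio/wav", "mp3": "audio/mpeg"}
CONTENT_TYPE_TO_EXTENSION = {
    content_type: "." + fmt for fmt, content_type in FORMAT_TO_CONTENT_TYPE.items()
}


def extension_for(content_type: str) -> str:
    """Hypothetical helper: map a media type to a file extension."""
    # Drop any parameters (e.g. "; charset=...") and normalise case before the lookup.
    media_type = content_type.split(";", 1)[0].strip().lower()
    try:
        return CONTENT_TYPE_TO_EXTENSION[media_type]
    except KeyError:
        raise ValueError(f"unexpected content type: {content_type!r}") from None


if __name__ == "__main__":
    print("speech" + extension_for("audio/mpeg"))
```
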