# Text-to-Speech Guide

> A step-by-step guide to parameter structure and usage for converting text to speech.

To convert text to speech with the Supertone API, send the text, its language, and an optional style to the endpoint for a specific voice ID.
This guide walks through the call structure of the Text-to-Speech endpoint, its parameters, the response format, and the voice adjustment options.

## 1. Endpoint and Basic Structure

```http theme={"dark"}
POST /v1/text-to-speech/{voice_id}
```

### Required Headers

```http theme={"dark"}
x-sup-api-key: [YOUR_API_KEY]
Content-Type: application/json
```

### Path Parameters

* `voice_id`: Unique ID of the voice to use

### Query Parameters

* `output_format` (optional): Audio format to generate. Choose between `wav` (default) and `mp3`

## 2. Request Body

Requests are sent in JSON format and can include the following fields:

| Field            | Required | Description                                                                                                                               |
| :--------------- | :------- | :---------------------------------------------------------------------------------------------------------------------------------------- |
| `text`           | ✅        | Text to convert to speech (max 300 characters)                                                                                            |
| `language`       | ✅        | Language of the text. Choose within languages supported by the voice (`ko`, `en`, `ja`)                                                   |
| `style`          | ❌        | Emotion style to apply (`neutral`, `happy`, etc.). If omitted, the default style is used; the default is the first value among the voice's available styles. |
| `model`          | ❌        | Voice model to use (`sona_speech_1`). If omitted, the model is applied automatically.                                                     |
| `voice_settings` | ❌        | Advanced options to adjust voice pitch, intonation, and speed (see below)                                                                 |

## 3. Complete Request Example

```http theme={"dark"}
POST /v1/text-to-speech/91992bbd4758bdcf9c9b01?output_format=mp3
x-sup-api-key: [YOUR_API_KEY]
Content-Type: application/json

{
  "text": "안녕하세요, 수퍼톤 API입니다.",
  "language": "ko",
  "style": "neutral",
  "model": "sona_speech_1",
  "voice_settings": {
    "pitch_shift": 0,
    "pitch_variance": 1,
    "speed": 1
  }
}
```
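The request above could be assembled in Python roughly as follows. This is a minimal sketch using only the standard library, not an official client. The `BASE_URL` value and the `build_tts_request` helper are assumptions for illustration, since this guide shows only the path; substitute the host given in the API reference.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the host is not stated in this guide; replace with the real base URL.
BASE_URL = "https://supertoneapi.com"


def build_tts_request(api_key, voice_id, body, output_format="wav"):
    """Build (but do not send) a POST request for the Text-to-Speech endpoint."""
    query = urllib.parse.urlencode({"output_format": output_format})
    url = f"{BASE_URL}/v1/text-to-speech/{urllib.parse.quote(voice_id)}?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-sup-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


body = {
    "text": "안녕하세요, 수퍼톤 API입니다.",
    "language": "ko",
    "style": "neutral",
    "model": "sona_speech_1",
    "voice_settings": {"pitch_shift": 0, "pitch_variance": 1, "speed": 1},
}
request = build_tts_request("YOUR_API_KEY", "91992bbd4758bdcf9c9b01", body, "mp3")

# Sending requires a valid API key and network access:
# with urllib.request.urlopen(request) as response:
#     audio_bytes = response.read()
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before any credits are spent.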

## 4. `voice_settings` Options

`voice_settings` is an advanced option for fine-tuning how the generated speech sounds.

| Parameter        | Description                                                                                                                                       | Allowed Range | Default |
| :--------------- | :------------------------------------------------------------------------------------------------------------------------------------------------ | :------------ | :------ |
| `pitch_shift`    | Adjusts the pitch level.<br />0 keeps the original voice pitch. Each step is one semitone, up to 12 steps in either direction.                   | -12 to +12    | 0       |
| `pitch_variance` | Controls the degree of intonation variation during speech.<br />Smaller values create flatter intonation, larger values create richer intonation. | 0.1 to 2      | 1       |
| `speed`          | Controls speech speed.<br />Values less than 1 make it slower, values greater than 1 make it faster.                                              | 0.5 to 2      | 1       |
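The ranges in the table above can be checked on the client before a request is sent. The sketch below is illustrative only: `make_voice_settings` is a hypothetical helper, and this guide does not say whether the API rejects or clamps out-of-range values, so the helper simply raises an error.

```python
# Allowed ranges and defaults copied from the table above.
VOICE_SETTING_RANGES = {
    "pitch_shift": (-12, 12),
    "pitch_variance": (0.1, 2),
    "speed": (0.5, 2),
}
VOICE_SETTING_DEFAULTS = {"pitch_shift": 0, "pitch_variance": 1, "speed": 1}


def make_voice_settings(**overrides):
    """Return a voice_settings dict, raising ValueError for invalid values."""
    settings = dict(VOICE_SETTING_DEFAULTS)
    for name, value in overrides.items():
        if name not in VOICE_SETTING_RANGES:
            raise ValueError(f"unknown voice setting: {name}")
        low, high = VOICE_SETTING_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
        settings[name] = value
    return settings


# Slightly faster speech, two semitones lower, default intonation.
voice_settings = make_voice_settings(speed=1.2, pitch_shift=-2)
```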

## 5. Response

On success, the API responds with an audio stream (`audio/wav` or `audio/mpeg`).\
The audio length, in seconds, is returned in the `X-Audio-Length` header.

```http theme={"dark"}
X-Audio-Length: 3.42
```

The above example indicates that 3.42 seconds of speech was generated.
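A client might read these headers as sketched below. `describe_audio_response` is a hypothetical helper that takes the response headers as a plain dictionary; it assumes the content type is exactly one of the two values listed above, possibly followed by parameters.

```python
def describe_audio_response(headers):
    """Return (file_extension, length_in_seconds) from TTS response headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    # Drop any parameters such as "; charset=..." from the content type.
    content_type = normalized.get("content-type", "").split(";")[0].strip()
    extensions = {"audio/wav": ".wav", "audio/mpeg": ".mp3"}
    if content_type not in extensions:
        raise ValueError(f"unexpected content type: {content_type!r}")
    length = normalized.get("x-audio-length")
    seconds = float(length) if length is not None else None
    return extensions[content_type], seconds


extension, seconds = describe_audio_response(
    {"Content-Type": "audio/mpeg", "X-Audio-Length": "3.42"}
)
```

The returned extension can be used to name the saved file, and the length can be shown to the user without decoding the audio.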

## 6. Text Input Considerations

* Text can be at most **300 characters** long.
* Very short sentences may result in unnatural-sounding speech.
* **Only Korean, English, and Japanese are supported**; other languages may produce unexpected results.
* Emojis and special symbols may be ignored rather than read aloud.
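Longer text has to be split across several requests to stay within the 300-character limit. The following is one possible sketch, not a prescribed method: `split_text` is a hypothetical helper, it assumes characters are counted the way Python's `len` counts them, and it assumes sentences are separated by whitespace. Text without such breaks is cut at the limit.

```python
import re

MAX_CHARS = 300  # limit stated in this guide


def split_text(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Break after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?。！？])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that exceeds the limit on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


long_text = "Hello world. " * 40
chunks = split_text(long_text)
```

Each chunk can then be sent as the `text` field of its own request. Keeping whole sentences together also avoids the very short fragments that tend to sound unnatural.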

## 7. Check Speech Duration First with the Predict Duration API

You can predict how many seconds of speech the input text will produce without generating any audio.

```http theme={"dark"}
POST /v1/predict-duration/{voice_id}
```

* The request is made the same way as a Text-to-Speech request
* Response example:

```json theme={"dark"}
{
  "duration": 2.87
}
```

**This API does not deduct credits.** It can be useful for usage prediction or implementing preview UI features.
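A preview feature could call this endpoint before synthesizing. The sketch below assumes the same headers and JSON body as the Text-to-Speech request; `BASE_URL`, `build_predict_request`, and `parse_duration` are hypothetical names introduced for illustration, and the host should be taken from the API reference.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the host is not stated in this guide; replace with the real base URL.
BASE_URL = "https://supertoneapi.com"


def build_predict_request(api_key, voice_id, body):
    """Build (but do not send) a POST request for the Predict Duration endpoint."""
    url = f"{BASE_URL}/v1/predict-duration/{urllib.parse.quote(voice_id)}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-sup-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def parse_duration(response_body):
    """Return the predicted duration in seconds from the JSON response body."""
    return float(json.loads(response_body)["duration"])


predict_request = build_predict_request(
    "YOUR_API_KEY", "91992bbd4758bdcf9c9b01", {"text": "Hello.", "language": "en"}
)
duration = parse_duration('{"duration": 2.87}')

# Sending requires a valid API key and network access:
# with urllib.request.urlopen(predict_request) as response:
#     duration = parse_duration(response.read())
```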

## 8. Stream Text-to-Speech

Streaming TTS is designed for real-time services such as AI chatbots and character-based chat.\
It returns audio output quickly, without waiting for the entire text to be synthesized.\
For detailed usage instructions, see the reference below:

* [Stream Text-to-Speech Reference](/en/api-reference/endpoints/stream-text-to-speech)
