# Create speech

> **Converts text into speech using a voice of your choice, with configurable voice settings.**

### Endpoint

```http theme={"dark"}
POST https://supertoneapi.com/v1/text-to-speech/{voice_id}
```

### Path Parameters

| Name       | Required | Description                 |
| ---------- | -------- | --------------------------- |
| `voice_id` | Yes      | The ID of the target voice. |

### Request Body

**Note on Supported Languages by Model**

The set of supported input languages varies depending on the TTS model:

* **`sona_speech_1`** — `en`, `ko`, `ja`
* **`supertonic_api_1`** — `en`, `ko`, `ja`, `es`, `pt`
* **`sona_speech_2`** — `en`, `ko`, `ja`, `bg`, `cs`, `da`, `el`, `es`, `et`, `fi`, `hu`, `it`, `nl`, `pl`, `pt`, `ro`, `ar`, `de`, `fr`, `hi`, `id`, `ru`, `vi`
* **`sona_speech_2_flash`** — `en`, `ko`, `ja`, `bg`, `cs`, `da`, `el`, `es`, `et`, `fi`, `hu`, `it`, `nl`, `pl`, `pt`, `ro`, `ar`, `de`, `fr`, `hi`, `id`, `ru`, `vi`
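The per-model language lists above can be expressed as a lookup table for a client-side check before sending a request. This is only a sketch: the names `SUPPORTED_LANGUAGES` and `is_language_supported` are hypothetical, and the table simply copies the lists above.

```python
# Language codes shared by sona_speech_2 and sona_speech_2_flash (copied from the list above).
_EXTENDED = frozenset({
    "en", "ko", "ja", "bg", "cs", "da", "el", "es", "et", "fi", "hu", "it",
    "nl", "pl", "pt", "ro", "ar", "de", "fr", "hi", "id", "ru", "vi",
})

# Hypothetical lookup table: model name -> supported input languages.
SUPPORTED_LANGUAGES = {
    "sona_speech_1": frozenset({"en", "ko", "ja"}),
    "supertonic_api_1": frozenset({"en", "ko", "ja", "es", "pt"}),
    "sona_speech_2": _EXTENDED,
    "sona_speech_2_flash": _EXTENDED,
}


def is_language_supported(model: str, language: str) -> bool:
    """Return True if `language` is listed for `model`; unknown models return False."""
    return language in SUPPORTED_LANGUAGES.get(model, frozenset())
```

For example, `is_language_supported("sona_speech_1", "es")` is `False`, while the same call with `"supertonic_api_1"` is `True`.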

| Name               | Required | Description                                                                                                                                                        |
| ------------------ | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `text`             | Yes      | The text to convert (max 300 characters).                                                                                                                          |
| `language`         | Yes      | Language code. Supported: `en`, `ko`, `ja`, `bg`, `cs`, `da`, `el`, `es`, `et`, `fi`, `hu`, `it`, `nl`, `pl`, `pt`, `ro`, `ar`, `de`, `fr`, `hi`, `id`, `ru`, `vi` |
| `style`            | No       | Emotional style, e.g. `neutral`, `happy`, `sad`. If omitted, the character's default style is applied.                                                             |
| `model`            | No       | TTS model. Default: `sona_speech_1`.                                                                                                                               |
| `output_format`    | No       | Output format. Options: `wav`, `mp3`. Default: `wav`.                                                                                                              |
| `voice_settings`   | No       | Advanced voice parameters (see below).                                                                                                                             |
| `include_phonemes` | No       | If `true`, the response is JSON containing Base64-encoded audio plus phoneme timing data instead of binary audio. Default: `false`.                                |
| `normalized_text`  | No       | Pronunciation-normalized Japanese text, sent together with `text` to improve TTS accuracy. Used only by `sona_speech_2` and `sona_speech_2_flash`.                 |
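As an illustration of how the endpoint, path parameter, and body fields fit together, the following sketch builds (but does not send) a request with Python's standard library. The helper name `build_tts_request` is hypothetical. The `x-sup-api-key` header name comes from the OpenAPI security scheme for this endpoint. The length check uses Python's `len`, which counts Unicode code points; whether the API counts characters the same way is an assumption.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://supertoneapi.com"
MAX_TEXT_LENGTH = 300  # documented limit for `text`


def build_tts_request(api_key, voice_id, text, language, **optional_fields):
    """Build a POST request for /v1/text-to-speech/{voice_id} without sending it.

    `optional_fields` may carry style, model, output_format, voice_settings,
    include_phonemes, or normalized_text.
    """
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text is longer than {MAX_TEXT_LENGTH} characters")
    body = {"text": text, "language": language, **optional_fields}
    # Percent-encode the voice ID so it is safe to place in the URL path.
    url = f"{BASE_URL}/v1/text-to-speech/{urllib.parse.quote(voice_id, safe='')}"
    return urllib.request.Request(
        url=url,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-sup-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

A request built this way could be sent with `urllib.request.urlopen(request)`; any voice ID and API key you pass in must be real values from your own account.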

### Voice Settings (optional)

**Note on Voice Settings by Model**

The available Voice Settings vary depending on the TTS model:

* **`sona_speech_1`** — Supports **all** Voice Settings listed below.
* **`supertonic_api_1`** — Supports **only** the `speed` setting; all other settings are ignored.
* **`sona_speech_2`** — Supports all Voice Settings except `subharmonic_amplitude_control`.
* **`sona_speech_2_flash`** — Supports **only** `pitch_shift`, `pitch_variance`, `speed`, and `duration`.

| Name                            | Range    | Default | Description                                                                      |
| ------------------------------- | -------- | ------- | -------------------------------------------------------------------------------- |
| `pitch_shift`                   | -24 → 24 | 0       | Pitch adjustment in semitones.                                                   |
| `pitch_variance`                | 0 → 2    | 1       | Degree of pitch variation.                                                       |
| `speed`                         | 0.5 → 2  | 1       | Speed ratio; makes the generated audio uniformly faster or slower.               |
| `duration`                      | 0 → 60   | 0       | When provided, speech is generated to match the given duration, in seconds.      |
| `similarity`                    | 1 → 5    | 3       | Controls how closely the generated speech matches the original character voice.  |
| `text_guidance`                 | 0 → 4    | 1       | Controls how sensitively speech characteristics adapt to the input text content. |
| `subharmonic_amplitude_control` | 0 → 2    | 1       | Controls the amount of subharmonic amplitude of the generated speech.            |
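The ranges in the table and the per-model support notes can be combined into a client-side check. This is a minimal sketch with hypothetical names (`RANGES`, `SUPPORTED_BY_MODEL`, `check_voice_settings`). It only reports problems locally and makes no claim about how the server treats out-of-range or unsupported values beyond what is stated above.

```python
# Allowed (min, max) per setting, copied from the table above.
RANGES = {
    "pitch_shift": (-24, 24),
    "pitch_variance": (0, 2),
    "speed": (0.5, 2),
    "duration": (0, 60),
    "similarity": (1, 5),
    "text_guidance": (0, 4),
    "subharmonic_amplitude_control": (0, 2),
}

_ALL = frozenset(RANGES)

# Settings each model supports, copied from the note above.
SUPPORTED_BY_MODEL = {
    "sona_speech_1": _ALL,
    "supertonic_api_1": frozenset({"speed"}),
    "sona_speech_2": _ALL - {"subharmonic_amplitude_control"},
    "sona_speech_2_flash": frozenset(
        {"pitch_shift", "pitch_variance", "speed", "duration"}
    ),
}


def check_voice_settings(model, settings):
    """Return a list of problems found in `settings`; an empty list means none."""
    supported = SUPPORTED_BY_MODEL.get(model)
    if supported is None:
        return [f"unknown model: {model}"]
    problems = []
    for name, value in settings.items():
        if name not in RANGES:
            problems.append(f"{name}: unknown setting")
            continue
        if name not in supported:
            problems.append(f"{name}: not supported by {model}")
        low, high = RANGES[name]
        if not low <= value <= high:
            problems.append(f"{name}: {value} is outside {low}..{high}")
    return problems
```

Returning a list of messages rather than raising lets a caller decide whether an unsupported setting is an error or merely worth a warning.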

### Response

Depending on `include_phonemes`, returns:

**Binary Audio**\
**(default, and when `include_phonemes=false`)**\
`audio/wav` – Raw WAV file.\
`audio/mpeg` – Raw MP3 file.

**JSON with Phoneme Data**\
**(when `include_phonemes=true`)**

```json theme={"dark"}
{
  "audio_base64": "UklGRnoGAABXQVZF...",
  "phonemes": {
    "symbols": ["", "h", "ɐ", "ɡ", "ʌ", ""],
    "start_times_seconds": [0, 0.092, 0.197, 0.255, 0.29, 0.58],
    "durations_seconds": [0.092, 0.104, 0.058, 0.034, 0.29, 0.162]
  }
}
```
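The JSON shape above can be unpacked by Base64-decoding the audio and zipping the three parallel phoneme arrays into one timeline. This is a sketch under the assumption that the arrays have equal length, as in the example; the function name `decode_phoneme_response` is hypothetical. The OpenAPI schema marks only `audio_base64` as required, so the sketch tolerates a missing `phonemes` object.

```python
import base64
import json


def decode_phoneme_response(body):
    """Split a JSON response body into (audio_bytes, timeline).

    `timeline` is a list of (symbol, start_seconds, duration_seconds) tuples.
    """
    payload = json.loads(body)
    audio = base64.b64decode(payload["audio_base64"])
    phonemes = payload.get("phonemes") or {}
    timeline = list(
        zip(
            phonemes.get("symbols", []),
            phonemes.get("start_times_seconds", []),
            phonemes.get("durations_seconds", []),
        )
    )
    return audio, timeline
```

The decoded bytes can be written to a file as-is; the timeline is convenient for tasks such as lip-sync or subtitle alignment.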

#### Headers:

`X-Audio-Length` (number) – Duration of the audio in seconds.
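A client typically needs to inspect the response headers to decide how to handle the body. The sketch below maps the documented content types to a file extension and parses `X-Audio-Length`. The function name and the extension choices are assumptions, and the code does not assume the length header is always present.

```python
from typing import Mapping, Optional, Tuple

# Hypothetical mapping from documented response content types to file extensions.
EXTENSIONS = {
    "audio/wav": ".wav",
    "audio/mpeg": ".mp3",
    "application/json": ".json",
}


def summarize_response_headers(headers: Mapping[str, str]) -> Tuple[str, Optional[float]]:
    """Return (file_extension, audio_length_seconds) from response headers.

    The extension is "" for an unrecognized content type, and the length is
    None when the X-Audio-Length header is absent.
    """
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Drop parameters such as "; charset=utf-8" before the lookup.
    content_type = lowered.get("content-type", "").split(";")[0].strip().lower()
    raw_length = lowered.get("x-audio-length")
    length = float(raw_length) if raw_length is not None else None
    return EXTENSIONS.get(content_type, ""), length
```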

### Notes

* A 400 error will occur if the `text` length exceeds 300 characters.
* `speed` is applied after `duration`. (Example: `duration` = 5 seconds, `speed` = 2 → final audio ≈ 10 seconds.)
* `style` is optional, but the default style varies by character. Call the Get Voices API to check it: the first value in the `styles` array is the default.
* The audio in the response can be saved or played directly; the appropriate handling depends on the client.
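Because requests with more than 300 characters of `text` are rejected, longer input has to be split on the client and sent as several requests. The following is one possible sketch, not a prescribed approach: it splits on whitespace, hard-cuts any single word that exceeds the limit, and uses the hypothetical name `split_text`. Languages written without spaces would need a smarter splitter, and `len` counts Unicode code points, which is assumed to match the API's notion of a character.

```python
from typing import List

MAX_CHARS = 300  # documented limit for `text`


def split_text(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Split `text` into chunks of at most `limit` characters, on whitespace."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is cut into fixed-size pieces.
        pieces = [word[i:i + limit] for i in range(0, len(word), limit)]
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= limit:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request and the resulting audio concatenated in order.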


## OpenAPI

````yaml openapi.json post /v1/text-to-speech/{voice_id}
openapi: 3.0.0
info:
  title: Supertone Public API
  description: >-
    Supertone API is a RESTful API for using our state-of-the-art AI voice
    models.
  version: 0.9.0
  contact: {}
servers:
  - url: https://supertoneapi.com
    description: Production
security: []
tags:
  - name: voices
    description: Voice Library API endpoints
  - name: custom_voices
    description: Custom Voice Management API endpoints
  - name: text_to_speech
    description: Text-to-Speech API endpoints
  - name: usage
    description: Usage Analytics API endpoints
paths:
  /v1/text-to-speech/{voice_id}:
    post:
      tags:
        - text_to_speech
      summary: Convert text to speech
      description: Convert text to speech using the specified voice
      operationId: create_speech
      parameters:
        - name: voice_id
          required: true
          in: path
          schema:
            type: string
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/APIConvertTextToSpeechUsingCharacterRequest'
      responses:
        '200':
          description: >-
            Returns either binary audio or JSON with phoneme data based on
            include_phonemes parameter
          content:
            audio/wav:
              schema:
                type: string
                format: binary
                description: Binary audio file (when include_phonemes=false or omitted)
            audio/mpeg:
              schema:
                type: string
                format: binary
                description: Binary audio file (when include_phonemes=false or omitted)
            application/json:
              schema:
                type: object
                description: >-
                  JSON response with base64 audio and phoneme data (when
                  include_phonemes=true)
                properties:
                  audio_base64:
                    type: string
                    description: Base64 encoded audio data
                    example: >-
                      UklGRnoGAABXQVZFZm10IBAAAAABAAEAQB8AAEAfAAABAAgAZGF0YQoGAACBhY...
                  phonemes:
                    type: object
                    description: Phoneme timing data with IPA symbols
                    properties:
                      symbols:
                        type: array
                        items:
                          type: string
                        description: List of IPA phonetic symbols
                        example:
                          - ''
                          - h
                          - ɐ
                          - ɡ
                          - ʌ
                          - ''
                      start_times_seconds:
                        type: array
                        items:
                          type: number
                        description: Start times for each phoneme in seconds
                        example:
                          - 0
                          - 0.092
                          - 0.197
                          - 0.255
                          - 0.29
                          - 0.58
                      durations_seconds:
                        type: array
                        items:
                          type: number
                        description: Duration for each phoneme in seconds
                        example:
                          - 0.092
                          - 0.104
                          - 0.058
                          - 0.034
                          - 0.29
                          - 0.162
                required:
                  - audio_base64
              examples:
                english-sample:
                  summary: English "Hello" with phoneme data
                  description: >-
                    Example response for English text "Hello" with phoneme
                    timing information
                  value:
                    audio_base64: >-
                      UklGRnoGAABXQVZFZm10IBAAAAABAAEAQB8AAEAfAAABAAgAZGF0YQoGAACBhYqFbF1fdJivrJBhNjVgodDbq2EcBj+a2/LDciUFLIHO8tiJNwgZaLvt559NEAxQp+PwtmMcBjiR1/LMeSwFJHfH8N2QQAoUXrTp66hVFApGn+DyvmwhBTuW1fzGfS8GI3fE8NyTQQoUXbPn7K5YFApCn+H0vWYhBTuY1vzCfiwGIXbC8d+WSAoTXLbm7K5ZEwpBnOL0vWQiBDyb1v3CfiwGIn+/8t+QSAkTW7Pp7K1XEglEM+DzvmclBTuY1fy/fysMJna/8t6WSAoSW7Lp7KlXEwhEM+H0vWQjBTub1vu/fyoLKHLA8t6UQAoOWbHo7K1ZEwpBnOL0vWMhBTyY1vy/fyoLJXfA8t+UQAoNWLPo7K1ZEwo/nOL0vWUiBDqY1vy/gCsNKHLA8t6SQgkOV7Hp7K1YEwhGm+L0vWYhBTue1vm/fyoLKHLA8t2UQgkPWLPo7KxbEgkAm+L0vWUIBD2b1fy7gCsNKHLA8tyXRAkSWbLm7K5cEglBm+DzvmUkBDya1vy+fyoLJ3fA8t2USgkMWLPo7KxbEgkAm+H0vWUIBD2b1fy8giwMJ3bB8tyXRAkSWbPm7K5bEgkBm+D0vWQkBDya1vy/fyoKKHfA8t2USgkOWLPo7KxZEgkCnODyvmUIBD2a1fy/gCsLJ3bA8t2WTAkNWLPo7KxZEggCnODyvmUJBT2a1vy/gCsKJ3bB8tyWTAkSWbPm7KxbEghCnODyvmQkBDya1v2/fyoLKHfA8t2USgkPWLPo7KtbEgkCnODyvmQkBDya1vy+fyoNKHfA8t2UTAkPWLPo7KtZEgkCnOH0vWQkBDua1vy/gCsLJ3fA8t2USwkPWLPo7KtZEgkCnOHzvWQkBDua1vy/gCsLJ3fA8t2USwkMWLPo7KtZEgkCnODyvmQkBDya1vy/gCsLJ3fA8t2UTAkMWLPo7KtZEgkCnODyvmQkBDya1vy+gCsLJ3fA8t2UTAkMWLPo7KtZEgkCnODyvmQkBDya1vy/gCsLJ3fA8t2UTAkLWLPo7KtZEgkCnOH0vWQkBDua1vy/gCsLJ3fA8t2UTAkLWLPo7KtZEgkCnOH0vWQkBDua1vy/gCsLJ3fA8t2UTAkLWLPo7KtZEgkCnOH0vWQkBDua
                    phonemes:
                      symbols:
                        - ''
                        - h
                        - ɐ
                        - ɡ
                        - ʌ
                        - ''
                      start_times_seconds:
                        - 0
                        - 0.0928798185941043
                        - 0.197369614512472
                        - 0.255419501133787
                        - 0.290249433106576
                        - 0.580498866213152
                      durations_seconds:
                        - 0.0928798185941043
                        - 0.104489795918367
                        - 0.0580498866213152
                        - 0.0348299319727891
                        - 0.290249433106576
                        - 0.162539682539683
                korean-sample:
                  summary: Korean "안녕하세요" with phoneme data
                  description: >-
                    Example response for Korean text "안녕하세요" with phoneme timing
                    information
                  value:
                    audio_base64: >-
                      UklGRnoGAABXQVZFZm10IBAAAAABAAEAQB8AAEAfAAABAAgAZGF0YQoGAACBhY...
                    phonemes:
                      symbols:
                        - ''
                        - ɐ
                        - nf
                        - 'n'
                        - iʌ
                        - ŋ
                        - ɐ
                        - s
                        - e
                        - io
                        - iʌ
                        - ''
                      start_times_seconds:
                        - 0
                        - 0.11609977324263
                        - 0.174149659863946
                        - 0.208979591836735
                        - 0.243809523809524
                        - 0.290249433106576
                        - 0.325079365079365
                        - 0.394739229024943
                        - 0.464399092970522
                        - 0.510839002267574
                        - 0.626938775510204
                        - 0.661768707482993
                      durations_seconds:
                        - 0.11609977324263
                        - 0.0580498866213152
                        - 0.0348299319727891
                        - 0.0348299319727891
                        - 0.0464399092970522
                        - 0.0348299319727891
                        - 0.0696598639455782
                        - 0.0696598639455782
                        - 0.0464399092970522
                        - 0.11609977324263
                        - 0.0348299319727891
                        - 0.0812698412698413
          headers:
            X-Audio-Length:
              description: Duration of the audio in seconds
              schema:
                type: number
        '400':
          description: >-
            Bad Request: Invalid request data for duration prediction or invalid
            request body/headers
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BadRequestErrorResponse'
        '401':
          description: 'Unauthorized: Invalid API key'
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/UnauthorizedErrorResponse'
        '402':
          description: Not Enough Credits
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/PaymentRequiredErrorResponse'
        '403':
          description: 'Forbidden: Permission denied'
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ForbiddenErrorResponse'
        '404':
          description: 'Not Found: Voice not found'
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/NotFoundErrorResponse'
        '408':
          description: Request Timeout
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/RequestTimeoutErrorResponse'
        '429':
          description: Rate Limit Exceeded
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/TooManyRequestsErrorResponse'
        '500':
          description: 'Internal Server Error: Failed to convert text to speech'
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/InternalServerErrorResponse'
      security:
        - api-key: []
components:
  schemas:
    APIConvertTextToSpeechUsingCharacterRequest:
      type: object
      properties:
        text:
          type: string
          description: The text to convert to speech
          maxLength: 300
        language:
          type: string
          description: The language code of the text
          enum:
            - en
            - ko
            - ja
            - bg
            - cs
            - da
            - el
            - es
            - et
            - fi
            - hu
            - it
            - nl
            - pl
            - pt
            - ro
            - ar
            - de
            - fr
            - hi
            - id
            - ru
            - vi
        style:
          type: string
          description: The style of character to use for the text-to-speech conversion
        model:
          type: string
          description: The model type to use for the text-to-speech conversion
          enum:
            - sona_speech_1
            - sona_speech_2
            - sona_speech_2_flash
            - sona_speech_2t
            - supertonic_api_1
          default: sona_speech_1
        output_format:
          type: string
          description: >-
            The desired output format of the audio file (wav, mp3). Default is
            wav.
          enum:
            - wav
            - mp3
          default: wav
        voice_settings:
          $ref: '#/components/schemas/ConvertTextToSpeechParameters'
        include_phonemes:
          type: boolean
          description: Return phoneme timing data with the audio
          default: false
        normalized_text:
          type: string
          description: >-
            Pre-normalized text for TTS. Only used with sona_speech_2 and
            sona_speech_2_flash models.
      required:
        - text
        - language
    BadRequestErrorResponse:
      type: object
      properties:
        status:
          type: string
          description: Response status
          example: error
        message:
          type: string
          description: Bad request error message
          example: Invalid request data
      required:
        - status
        - message
    UnauthorizedErrorResponse:
      type: object
      properties:
        status:
          type: string
          description: Response status
          example: error
        message:
          description: Unauthorized error details
          example:
            message: Invalid API Key
            error: Unauthorized
            statusCode: 401
          allOf:
            - $ref: '#/components/schemas/ErrorMessageData'
      required:
        - status
        - message
    PaymentRequiredErrorResponse:
      type: object
      properties:
        status:
          type: string
          description: Response status
          example: error
        message:
          description: Payment required error details
          example:
            message: Not enough credits
            error: Payment Required
            statusCode: 402
          allOf:
            - $ref: '#/components/schemas/ErrorMessageData'
      required:
        - status
        - message
    ForbiddenErrorResponse:
      type: object
      properties:
        status:
          type: string
          description: Response status
          example: error
        message:
          description: Forbidden error details
          example:
            message: Permission denied
            error: Forbidden
            statusCode: 403
          allOf:
            - $ref: '#/components/schemas/ErrorMessageData'
      required:
        - status
        - message
    NotFoundErrorResponse:
      type: object
      properties:
        status:
          type: string
          description: Response status
          example: error
        message:
          description: Not found error details
          example:
            message: Voice not found
            error: Not Found
            statusCode: 404
          allOf:
            - $ref: '#/components/schemas/ErrorMessageData'
      required:
        - status
        - message
    RequestTimeoutErrorResponse:
      type: object
      properties:
        status:
          type: string
          description: Response status
          example: error
        message:
          description: Request timeout error details
          example:
            message: Request timed out
            error: Request Timeout
            statusCode: 408
          allOf:
            - $ref: '#/components/schemas/ErrorMessageData'
      required:
        - status
        - message
    TooManyRequestsErrorResponse:
      type: object
      properties:
        status:
          type: string
          description: Response status
          example: error
        message:
          description: Too many requests error details
          example:
            message: rate limit exceeded
            error: Too Many Requests
            statusCode: 429
          allOf:
            - $ref: '#/components/schemas/ErrorMessageData'
      required:
        - status
        - message
    InternalServerErrorResponse:
      type: object
      properties:
        status:
          type: string
          description: Response status
          example: error
        message:
          description: Internal server error details
          example:
            message: Failed to convert text to speech
            error: Internal Server Error
            statusCode: 500
          allOf:
            - $ref: '#/components/schemas/ErrorMessageData'
      required:
        - status
        - message
    ConvertTextToSpeechParameters:
      type: object
      properties:
        pitch_shift:
          type: number
          default: 0
          minimum: -24
          maximum: 24
        pitch_variance:
          type: number
          default: 1
          minimum: 0
          maximum: 2
        speed:
          type: number
          default: 1
          minimum: 0.5
          maximum: 2
        duration:
          type: number
          description: Duration parameter for TTS generation
          default: 0
          minimum: 0
          maximum: 60
        similarity:
          type: number
          description: Similarity parameter for voice matching
          default: 3
          minimum: 1
          maximum: 5
        text_guidance:
          type: number
          description: Text guidance parameter for generation control
          default: 1
          minimum: 0
          maximum: 4
        subharmonic_amplitude_control:
          type: number
          description: Subharmonic amplitude control parameter
          default: 1
          minimum: 0
          maximum: 2
    ErrorMessageData:
      type: object
      properties:
        message:
          type: string
          description: Error message
          example: Invalid API Key
        error:
          type: string
          description: Error type
          example: Unauthorized
        status_code:
          type: number
          description: HTTP status code
          example: 401
      required:
        - message
        - error
        - status_code
  securitySchemes:
    api-key:
      type: apiKey
      in: header
      name: x-sup-api-key

````
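The error schemas in the spec above come in two shapes: for `400`, `message` is a plain string, while for the other error codes it is an object with `message`, `error`, and a status code. The schema names that last field `status_code`, whereas the embedded examples show `statusCode`, so this sketch accepts either. The function name `parse_error` and the flattened return shape are hypothetical.

```python
import json


def parse_error(body):
    """Flatten an error response body into a dict with uniform keys."""
    payload = json.loads(body)
    message = payload.get("message")
    if isinstance(message, dict):
        # Nested error details: accept both spellings of the status code field.
        code = message.get("status_code", message.get("statusCode"))
        return {
            "status": payload.get("status"),
            "message": message.get("message"),
            "error": message.get("error"),
            "status_code": code,
        }
    # Plain-string message, as in the 400 response schema.
    return {
        "status": payload.get("status"),
        "message": message,
        "error": None,
        "status_code": None,
    }
```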