> ## Documentation Index
> Fetch the complete documentation index at: https://docs.supertoneapi.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Stream speech

> Converts text to speech and returns it as a chunked audio stream.

<Note>
  This document was automatically translated from the English original, and some phrasing may be unnatural. For the authoritative wording, also check the [English original](/en/api-reference/endpoints/stream-text-to-speech).
</Note>

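Generated audio is streamed back in chunks, so playback can start before the whole clip has finished generating. For guidance on choosing between streaming and the faster non-streaming models, see [Docs: Stream speech](/ja/docs/text-to-speech/stream-speech) and [Latency optimization](/ja/docs/production/latency-optimization).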

<Note>
  Streaming is currently supported only on **`sona_speech_1`**.
</Note>

## Endpoint

```http theme={"dark"}
POST https://supertoneapi.com/v1/text-to-speech/{voice_id}/stream
```

## Path parameters

| Name       | Required | Description                 |
| ---------- | :------: | --------------------------- |
| `voice_id` |     ✅    | The ID of the target voice. |

## Request body

`Content-Type: application/json`

| Name               | Required | Description                                                                                                                              |
| ------------------ | :------: | ---------------------------------------------------------------------------------------------------------------------------------------- |
| `text`             |     ✅    | The text to convert. **Maximum 300 characters.**                                                                                         |
| `language`         |     ✅    | Language code. Supported: `en`, `ko`, `ja`.                                                                                              |
| `style`            |     —    | Emotional style (e.g., `neutral`, `happy`). If omitted, the voice's default style is used.                                               |
| `model`            |     —    | Must be `sona_speech_1` (the only model that supports streaming).                                                                        |
| `output_format`    |     —    | `wav` (default) or `mp3`.                                                                                                                |
| `voice_settings`   |     —    | Advanced voice parameters. Fields and value ranges are the same as for [Generate speech](/ja/api-reference/endpoints/text-to-speech).    |
| `include_phonemes` |     —    | If `true`, the response is NDJSON with phoneme data in each chunk. Default: `false`.                                                     |

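As a rough illustration of how the request fields fit together, the sketch below builds (but does not send) the POST request using only Python's standard library. The helper name `build_stream_request` and the placeholder voice ID and API key are assumptions for illustration. The `x-sup-api-key` header name is taken from the security scheme in the OpenAPI definition at the end of this page.

```python
import json
import urllib.request

BASE_URL = "https://supertoneapi.com"
MAX_TEXT_LENGTH = 300  # documented limit for `text`


def build_stream_request(voice_id, api_key, text, language, **optional):
    """Build (but do not send) a streaming TTS request.

    `optional` may carry documented body fields such as `style`,
    `model`, `output_format`, `voice_settings`, or `include_phonemes`.
    """
    # Assumption: len() (Unicode code points) approximates how the
    # server counts characters; the exact rule is not documented here.
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text exceeds {MAX_TEXT_LENGTH} characters")
    body = {"text": text, "language": language, **optional}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/text-to-speech/{voice_id}/stream",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-sup-api-key": api_key,
        },
        method="POST",
    )


# Hypothetical values; replace with a real voice ID and API key.
req = build_stream_request(
    "YOUR_VOICE_ID", "YOUR_API_KEY", "Hello, world!", "en",
    model="sona_speech_1", output_format="mp3",
)
```

Passing `req` to `urllib.request.urlopen` would send it, and the returned response object can then be read incrementally.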
## Response

**Default (`include_phonemes=false`):** returns a binary audio stream.

* `Content-Type: audio/wav` or `audio/mpeg` (matching `output_format`).
* The first chunk contains the audio file header; subsequent chunks are raw audio data.
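Because the first chunk carries the file header and later chunks are raw audio data, writing the chunks out in arrival order should produce a playable file. The following is a minimal sketch, assuming `response` is any file-like object with a `read(size)` method (for example, the object returned by `urllib.request.urlopen`). The helper name `save_audio_stream` and the 4096-byte read size are illustrative choices, not part of the API.

```python
import io


def save_audio_stream(response, out, chunk_size=4096):
    """Copy a streamed audio body to `out` chunk by chunk.

    Returns the total number of bytes written.
    """
    total = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # empty bytes means the stream has ended
            break
        out.write(chunk)
        total += len(chunk)
    return total


# Simulated stream: a stand-in header followed by raw audio bytes.
fake_response = io.BytesIO(b"RIFF-header" + b"\x00" * 10000)
buffer = io.BytesIO()
written = save_audio_stream(fake_response, buffer)
```

In a real client, `out` could be an open file or a buffer feeding an audio player, so playback can start as soon as the first chunks arrive.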

**When `include_phonemes=true`:** returns newline-delimited JSON (NDJSON), one object per chunk.

```jsonl theme={"dark"}
{"audio_base64":"...","phonemes":{"symbols":["","h"],"start_times_seconds":[0,0.05],"durations_seconds":[0.05,0.08]}}
{"audio_base64":"...","phonemes":{"symbols":["ɐ","ɡ"],"start_times_seconds":[0.13,0.19],"durations_seconds":[0.06,0.04]}}
```
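One way to consume this NDJSON stream is to decode each line as it arrives, append the decoded audio, and collect the phoneme timings. The sketch below is an assumption-laden illustration rather than SDK code: the function name `parse_phoneme_stream` is hypothetical, and it tolerates `null` in either field because the example in the OpenAPI definition on this page shows objects with only one of the two fields populated.

```python
import base64
import json


def parse_phoneme_stream(lines):
    """Split an NDJSON stream into audio bytes and phoneme timings.

    `lines` is any iterable of NDJSON lines (str or bytes).
    """
    audio = bytearray()
    phonemes = []  # (symbol, start_seconds, duration_seconds)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between objects
        obj = json.loads(line)
        if obj.get("audio_base64"):
            audio += base64.b64decode(obj["audio_base64"])
        ph = obj.get("phonemes")
        if ph:
            phonemes.extend(zip(
                ph["symbols"],
                ph["start_times_seconds"],
                ph["durations_seconds"],
            ))
    return bytes(audio), phonemes


# Sample data shaped like the documented chunks (audio is a stand-in).
sample = [
    json.dumps({
        "audio_base64": base64.b64encode(b"abc").decode("ascii"),
        "phonemes": {"symbols": ["", "h"],
                     "start_times_seconds": [0, 0.05],
                     "durations_seconds": [0.05, 0.08]},
    }),
    json.dumps({"audio_base64": None,
                "phonemes": {"symbols": ["ɐ"],
                             "start_times_seconds": [0.13],
                             "durations_seconds": [0.06]}}),
]
audio, timings = parse_phoneme_stream(sample)
```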

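## Notes

* Speech streaming is currently in **beta** and supports only `sona_speech_1`.
* If `text` exceeds 300 characters, the endpoint returns `400`. The SDKs automatically chunk long input and forward the resulting chunks to the iterator.
* `speed` is applied after `duration` (e.g., `duration=5` + `speed=2` yields about 10 seconds).
* If `style` is omitted, the voice's default style is used. You can check the default via [Get voice](/ja/api-reference/endpoints/get-voice).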
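When calling the endpoint directly rather than through an SDK, longer input has to be split to fit the 300-character limit before it is sent. The sketch below shows one possible client-side approach: pack whole sentences into pieces of at most 300 characters and hard-split any single sentence that is still too long. It is not the SDKs' actual chunking algorithm, which this page does not describe. The function name `split_text` and the sentence-boundary pattern are assumptions.

```python
import re

MAX_TEXT_LENGTH = 300  # documented limit for `text`


def split_text(text, limit=MAX_TEXT_LENGTH):
    """Split `text` into stripped pieces of at most `limit` characters."""
    # Each token is a sentence plus its trailing whitespace, so
    # concatenating tokens reproduces the original text.
    sentences = re.findall(r".+?(?:[.!?。！？]+\s*|$)", text, flags=re.S)
    pieces, current = [], ""
    for sentence in sentences:
        if len((current + sentence).strip()) <= limit:
            current += sentence  # still fits in the current piece
            continue
        if current.strip():
            pieces.append(current.strip())
        sentence = sentence.lstrip()
        # Hard-split a single sentence that is itself too long.
        while len(sentence.rstrip()) > limit:
            pieces.append(sentence[:limit].rstrip())
            sentence = sentence[limit:].lstrip()
        current = sentence
    if current.strip():
        pieces.append(current.strip())
    return pieces


long_text = "This is a sentence. " * 40  # 800 characters
pieces = split_text(long_text)
```

Each piece can then be sent as its own request, with the resulting audio played or concatenated in order. Note that `len()` counts Unicode code points, which may differ from how the server counts characters.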

## See also

<CardGroup cols={2}>
  <Card title="Docs: Stream speech" icon="bolt" href="/ja/docs/text-to-speech/stream-speech">
    When to use streaming and how to consume chunks in each SDK.
  </Card>

  <Card title="LLM streaming TTS" icon="robot" href="/ja/docs/examples/llm-streaming-tts">
    End-to-end recipes using OpenAI and Anthropic.
  </Card>
</CardGroup>


## OpenAPI

````yaml /openapi.json POST /v1/text-to-speech/{voice_id}/stream
openapi: 3.0.0
info:
  title: Supertone Public API
  description: >-
    Supertone API is a RESTful API for using our state-of-the-art AI voice
    models.
  version: 0.9.6
  contact: {}
servers:
  - url: https://supertoneapi.com
    description: Production
security: []
tags:
  - name: voices
    description: Voice Library API endpoints
  - name: custom_voices
    description: Custom Voice Management API endpoints
  - name: text_to_speech
    description: Text-to-Speech API endpoints
  - name: usage
    description: Usage Analytics API endpoints
paths:
  /v1/text-to-speech/{voice_id}/stream:
    post:
      tags:
        - text_to_speech
      summary: Convert text to speech with streaming response
      description: >-
        Convert text to speech using the specified voice with streaming
        response. Returns binary audio stream.
      operationId: stream_speech
      parameters:
        - name: voice_id
          required: true
          in: path
          schema:
            type: string
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/APIConvertTextToSpeechUsingCharacterRequest'
      responses:
        '200':
          description: >-
            Streaming audio data in binary format or NDJSON format with phoneme
            data based on includePhonemes parameter
          content:
            audio/wav:
              schema:
                type: string
                format: binary
                description: Binary audio stream (when includePhonemes=false or omitted)
            audio/mpeg:
              schema:
                type: string
                format: binary
                description: Binary audio stream (when includePhonemes=false or omitted)
            application/x-ndjson:
              schema:
                type: string
                description: >-
                  NDJSON stream with consistent format - each chunk contains
                  audio_base64 and phonemes fields (one null, one populated)
                example: >
                  {"audio_base64":"UklGRnoGAABXQVZF...","phonemes":null}

                  {"audio_base64":null,"phonemes":{"symbols":["","h","ɐ","l","oʊ"],"start_times_seconds":[0,0.1,0.2,0.3,0.4],"durations_seconds":[0.1,0.1,0.1,0.1,0.2]}}

                  {"audio_base64":"E4ATABFAD4AMQAp...","phonemes":null}

                  {"audio_base64":null,"phonemes":{"symbols":["w","ɝ","l","d"],"start_times_seconds":[0.5,0.6,0.7,0.8],"durations_seconds":[0.1,0.1,0.1,0.1]}}
          headers:
            Content-Type:
              description: >-
                Content type: audio/* for binary stream, application/x-ndjson
                for phoneme data
              schema:
                type: string
                enum:
                  - audio/wav
                  - audio/mpeg
                  - application/x-ndjson
                example: audio/mpeg
            Transfer-Encoding:
              description: Chunked transfer encoding
              schema:
                type: string
                example: chunked
            Cache-Control:
              description: No cache headers
              schema:
                type: string
                example: no-cache
            X-Content-Type-Options:
              description: Security header to prevent MIME sniffing
              schema:
                type: string
                example: nosniff
            Trailer:
              description: Announces that X-Audio-Length will be sent as a trailer header
              schema:
                type: string
                example: X-Audio-Length
            X-Audio-Length:
              description: >-
                Total duration of the audio in seconds (sent as trailer header
                after streaming completes)
              schema:
                type: number
        '400':
          description: 'Bad Request: Invalid request data or parameters'
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BadRequestErrorResponse'
        '401':
          description: 'Unauthorized: Invalid API key'
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/UnauthorizedErrorResponse'
        '402':
          description: 'Payment Required: Not enough credits'
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/PaymentRequiredErrorResponse'
        '403':
          description: 'Forbidden: Permission denied'
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ForbiddenErrorResponse'
        '404':
          description: 'Not Found: Voice not found'
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/NotFoundErrorResponse'
        '408':
          description: Request Timeout
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/RequestTimeoutErrorResponse'
        '429':
          description: 'Too Many Requests: Rate limit exceeded'
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/TooManyRequestsErrorResponse'
        '500':
          description: 'Internal Server Error: Failed to process streaming TTS'
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/InternalServerErrorResponse'
      security:
        - api-key: []
components:
  schemas:
    APIConvertTextToSpeechUsingCharacterRequest:
      type: object
      properties:
        text:
          type: string
          description: The text to convert to speech
          maxLength: 300
        language:
          type: string
          description: The language code of the text
          enum:
            - en
            - ko
            - ja
            - bg
            - cs
            - da
            - el
            - es
            - et
            - fi
            - hu
            - it
            - nl
            - pl
            - pt
            - ro
            - ar
            - de
            - fr
            - hi
            - id
            - ru
            - vi
            - hr
            - lt
            - lv
            - sk
            - sl
            - sv
            - tr
            - uk
        style:
          type: string
          description: The style of character to use for the text-to-speech conversion
        model:
          type: string
          description: The model type to use for the text-to-speech conversion
          enum:
            - sona_speech_1
            - sona_speech_2
            - sona_speech_2_flash
            - supertonic_api_1
            - supertonic_api_3
          default: sona_speech_1
        output_format:
          type: string
          description: >-
            The desired output format of the audio file (wav, mp3). Default is
            wav.
          enum:
            - wav
            - mp3
          default: wav
        voice_settings:
          $ref: '#/components/schemas/ConvertTextToSpeechParameters'
        include_phonemes:
          type: boolean
          description: Return phoneme timing data with the audio
          default: false
        normalized_text:
          type: string
          description: >-
            Pre-normalized text for TTS. Only used with sona_speech_2 and
            sona_speech_2_flash models.
      required:
        - text
        - language
    BadRequestErrorResponse:
      type: object
      properties:
        status:
          type: string
          description: Response status
          example: error
        message:
          type: string
          description: Bad request error message
          example: Invalid request data
      required:
        - status
        - message
    UnauthorizedErrorResponse:
      type: object
      properties:
        status:
          type: string
          description: Response status
          example: error
        message:
          description: Unauthorized error details
          example:
            message: Invalid API Key
            error: Unauthorized
            statusCode: 401
          allOf:
            - $ref: '#/components/schemas/ErrorMessageData'
      required:
        - status
        - message
    PaymentRequiredErrorResponse:
      type: object
      properties:
        status:
          type: string
          description: Response status
          example: error
        message:
          description: Payment required error details
          example:
            message: Not enough credits
            error: Payment Required
            statusCode: 402
          allOf:
            - $ref: '#/components/schemas/ErrorMessageData'
      required:
        - status
        - message
    ForbiddenErrorResponse:
      type: object
      properties:
        status:
          type: string
          description: Response status
          example: error
        message:
          description: Forbidden error details
          example:
            message: Permission denied
            error: Forbidden
            statusCode: 403
          allOf:
            - $ref: '#/components/schemas/ErrorMessageData'
      required:
        - status
        - message
    NotFoundErrorResponse:
      type: object
      properties:
        status:
          type: string
          description: Response status
          example: error
        message:
          description: Not found error details
          example:
            message: Voice not found
            error: Not Found
            statusCode: 404
          allOf:
            - $ref: '#/components/schemas/ErrorMessageData'
      required:
        - status
        - message
    RequestTimeoutErrorResponse:
      type: object
      properties:
        status:
          type: string
          description: Response status
          example: error
        message:
          description: Request timeout error details
          example:
            message: Request timed out
            error: Request Timeout
            statusCode: 408
          allOf:
            - $ref: '#/components/schemas/ErrorMessageData'
      required:
        - status
        - message
    TooManyRequestsErrorResponse:
      type: object
      properties:
        status:
          type: string
          description: Response status
          example: error
        message:
          description: Too many requests error details
          example:
            message: rate limit exceeded
            error: Too Many Requests
            statusCode: 429
          allOf:
            - $ref: '#/components/schemas/ErrorMessageData'
      required:
        - status
        - message
    InternalServerErrorResponse:
      type: object
      properties:
        status:
          type: string
          description: Response status
          example: error
        message:
          description: Internal server error details
          example:
            message: Failed to convert text to speech
            error: Internal Server Error
            statusCode: 500
          allOf:
            - $ref: '#/components/schemas/ErrorMessageData'
      required:
        - status
        - message
    ConvertTextToSpeechParameters:
      type: object
      properties:
        pitch_shift:
          type: number
          default: 0
          minimum: -24
          maximum: 24
        pitch_variance:
          type: number
          default: 1
          minimum: 0
          maximum: 2
        speed:
          type: number
          default: 1
          minimum: 0.5
          maximum: 2
        duration:
          type: number
          description: Duration parameter for TTS generation
          default: 0
          minimum: 0
          maximum: 60
        similarity:
          type: number
          description: Similarity parameter for voice matching
          default: 3
          minimum: 1
          maximum: 5
        text_guidance:
          type: number
          description: Text guidance parameter for generation control
          default: 1
          minimum: 0
          maximum: 4
        subharmonic_amplitude_control:
          type: number
          description: Subharmonic amplitude control parameter
          default: 1
          minimum: 0
          maximum: 2
    ErrorMessageData:
      type: object
      properties:
        message:
          type: string
          description: Error message
          example: Invalid API Key
        error:
          type: string
          description: Error type
          example: Unauthorized
        status_code:
          type: number
          description: HTTP status code
          example: 401
      required:
        - message
        - error
        - status_code
  securitySchemes:
    api-key:
      type: apiKey
      in: header
      name: x-sup-api-key

````