# Pronunciation and Phonemes

> Retrieve phoneme symbols and timestamps for lip sync, animation, and pronunciation control.

<Note>
  This page was machine-translated from the English original. Some phrasing may be awkward or ambiguous, so check the [English original](/en/docs/text-to-speech/pronunciation-and-phonemes) for the exact wording.
</Note>

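The Supertone API can return **phoneme data** alongside the audio. A phoneme is an individual unit of sound spoken by the model, and each one is returned with its start time and duration. You can use this data to drive lip sync in games and animation, karaoke-style word highlighting, pronunciation analysis, and similar features.

To use this feature, set `include_phonemes: true` in your TTS request.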

<Note>
  Supported on `sona_speech_2`, `sona_speech_2_flash`, and `sona_speech_1`. Not supported on `supertonic_api_3` or `supertonic_api_1`.
</Note>

## Usage

<Tabs>
  <Tab title="Python">
    ```python theme={"dark"}
    import base64
    import os
    from supertone import Supertone

    VOICE_ID = "20160a4c5ba38967330c84"  # replace with your voice ID

    with Supertone(api_key=os.environ["SUPERTONE_API_KEY"]) as client:
        response = client.text_to_speech.create_speech(
            voice_id=VOICE_ID,
            text="Hello, world.",
            language="en",
            include_phonemes=True,
        )

        result = response.result
        with open("speech.wav", "wb") as f:
            f.write(base64.b64decode(result.audio_base64))

        for symbol, start, duration in zip(
            result.phonemes.symbols,
            result.phonemes.start_times_seconds,
            result.phonemes.durations_seconds,
        ):
            print(f"{symbol!r} at {start:.3f}s for {duration:.3f}s")
    ```
  </Tab>

  <Tab title="TypeScript">
    ```typescript theme={"dark"}
    import { Supertone } from "@supertone/supertone";
    import * as fs from "node:fs";

    const VOICE_ID = "20160a4c5ba38967330c84"; // replace with your voice ID

    const client = new Supertone({ apiKey: process.env.SUPERTONE_API_KEY });

    const response = await client.textToSpeech.createSpeech({
      voiceId: VOICE_ID,
      apiConvertTextToSpeechUsingCharacterRequest: {
        text: "Hello, world.",
        language: "en",
        includePhonemes: true,
      },
    });

    const result = response.result as {
      audioBase64: string;
      phonemes?: {
        symbols?: string[];
        startTimesSeconds?: number[];
        durationsSeconds?: number[];
      };
    };

    fs.writeFileSync("speech.wav", Buffer.from(result.audioBase64, "base64"));

    const symbols = result.phonemes?.symbols ?? [];
    const starts = result.phonemes?.startTimesSeconds ?? [];
    const durations = result.phonemes?.durationsSeconds ?? [];

    for (let i = 0; i < symbols.length; i++) {
      console.log(`${symbols[i]} at ${starts[i].toFixed(3)}s for ${durations[i].toFixed(3)}s`);
    }
    ```
  </Tab>

  <Tab title="cURL">
    ```bash theme={"dark"}
    VOICE_ID="20160a4c5ba38967330c84"

    curl -X POST "https://supertoneapi.com/v1/text-to-speech/$VOICE_ID" \
      -H "x-sup-api-key: $SUPERTONE_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "text": "Hello, world.",
        "language": "en",
        "include_phonemes": true
      }'
    ```

    With `include_phonemes` enabled, the response is JSON rather than binary audio.

    ```json theme={"dark"}
    {
      "audio_base64": "UklGRnoGAABXQVZF...",
      "phonemes": {
        "symbols": ["", "h", "ɐ", "ɡ", "ʌ", ""],
        "start_times_seconds": [0, 0.092, 0.197, 0.255, 0.29, 0.58],
        "durations_seconds": [0.092, 0.104, 0.058, 0.034, 0.29, 0.162]
      }
    }
    ```
  </Tab>
</Tabs>

## Response structure

| Field                          | Description                                                                      |
| ------------------------------ | -------------------------------------------------------------------------------- |
| `audio_base64`                 | Base64-encoded audio in the requested `output_format` (`wav` or `mp3`).          |
| `phonemes.symbols`             | Phoneme symbols in IPA-style notation. An empty string marks a silence or pause. |
| `phonemes.start_times_seconds` | Start time of each symbol within the clip, in seconds.                           |
| `phonemes.durations_seconds`   | Duration of each symbol, in seconds.                                             |

The three phoneme arrays are index-aligned: `symbols[i]`, `start_times_seconds[i]`, and `durations_seconds[i]` all describe the same phoneme.
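
Because the arrays are index-aligned, it is often convenient to fold them into one record per phoneme. The following is a minimal sketch, assuming the `phonemes` object has already been parsed into a plain Python dict shaped like the JSON response above. `PhonemeEvent` and `to_events` are hypothetical helper names, not part of the Supertone SDK.

```python
from dataclasses import dataclass


@dataclass
class PhonemeEvent:
    """One phoneme with its timing (hypothetical helper type)."""
    symbol: str
    start: float
    duration: float

    @property
    def end(self) -> float:
        return self.start + self.duration


def to_events(phonemes: dict) -> list[PhonemeEvent]:
    """Zip the three aligned arrays into one list of records."""
    symbols = phonemes["symbols"]
    starts = phonemes["start_times_seconds"]
    durations = phonemes["durations_seconds"]
    # Guard against mismatched arrays before pairing by index.
    if not (len(symbols) == len(starts) == len(durations)):
        raise ValueError("phoneme arrays must have equal length")
    return [PhonemeEvent(s, t, d) for s, t, d in zip(symbols, starts, durations)]


# Values copied from the sample JSON response above.
sample = {
    "symbols": ["", "h", "ɐ", "ɡ", "ʌ", ""],
    "start_times_seconds": [0, 0.092, 0.197, 0.255, 0.29, 0.58],
    "durations_seconds": [0.092, 0.104, 0.058, 0.034, 0.29, 0.162],
}
events = to_events(sample)
spoken = [e for e in events if e.symbol]  # empty symbols are silences
```

Filtering on the empty symbol, as in `spoken`, is one way to separate voiced phonemes from pauses.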

## Streaming with phonemes

Calling `stream_speech` with `include_phonemes: true` switches the response to **NDJSON** (newline-delimited JSON). Each line is a chunk that carries its own `audio_base64` and `phonemes` data.

```jsonl theme={"dark"}
{"audio_base64":"...","phonemes":{"symbols":["","h"],"start_times_seconds":[0,0.05],"durations_seconds":[0.05,0.08]}}
{"audio_base64":"...","phonemes":{"symbols":["ɐ","ɡ"],"start_times_seconds":[0.13,0.19],"durations_seconds":[0.06,0.04]}}
```

You can parse each line as it arrives and drive a lip-sync renderer in real time.
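
As a rough illustration of line-by-line parsing, the sketch below decodes NDJSON chunks from any iterable of text lines. It deliberately does not call the SDK: how you obtain the lines from `stream_speech` depends on your client, so `iter_chunks` is a hypothetical helper, and the `"AAAA"` audio payload is a stand-in for real base64 audio.

```python
import base64
import json
from typing import Iterable, Iterator

PhonemeTuple = tuple[str, float, float]  # (symbol, start, duration)


def iter_chunks(lines: Iterable[str]) -> Iterator[tuple[bytes, list[PhonemeTuple]]]:
    """Yield (audio_bytes, phoneme_tuples) for each NDJSON line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between chunks
        chunk = json.loads(line)
        audio = base64.b64decode(chunk["audio_base64"])
        ph = chunk.get("phonemes") or {}
        events = list(
            zip(
                ph.get("symbols", []),
                ph.get("start_times_seconds", []),
                ph.get("durations_seconds", []),
            )
        )
        yield audio, events


# Stand-in stream: same phoneme data as the sample above, placeholder audio.
sample_lines = [
    '{"audio_base64":"AAAA","phonemes":{"symbols":["","h"],"start_times_seconds":[0,0.05],"durations_seconds":[0.05,0.08]}}',
    '{"audio_base64":"AAAA","phonemes":{"symbols":["ɐ","ɡ"],"start_times_seconds":[0.13,0.19],"durations_seconds":[0.06,0.04]}}',
]
chunks = list(iter_chunks(sample_lines))
```

Because `iter_chunks` is a generator, a renderer can consume each chunk as soon as its line has been read instead of waiting for the full stream.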

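## Use cases

* **Lip sync for games and animation.** Map each phoneme to a viseme (mouth shape) and play the visemes back in time with the audio. Most engines provide a default phoneme-to-viseme mapping table, and because Supertone's symbols follow standard IPA-style notation, they are compatible with most rigs.
* **Karaoke / word highlighting.** Use the phoneme start times to highlight each word at the moment it is spoken.
* **Pronunciation analysis.** Compare the actual phonemes against an expected sequence to check pronunciation in language-learning apps.

For an end-to-end example, see [Generating phonemes for lip sync](/ko/docs/examples/lip-sync-phonemes).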
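
To make the lip-sync case more concrete, here is a small sketch that turns phoneme start times into a viseme timeline and looks up the active viseme at a given playback time. The mapping table and viseme names are invented for illustration only. A real rig or engine defines its own table, and `build_timeline` and `viseme_at` are hypothetical helpers.

```python
from bisect import bisect_right

# Hypothetical, deliberately tiny phoneme-to-viseme table (not from Supertone).
VISEME_BY_SYMBOL = {
    "": "rest",  # empty symbol = silence/pause
    "h": "open",
    "ɐ": "open",
    "ɡ": "closed_back",
    "ʌ": "open",
}


def build_timeline(symbols: list[str], starts: list[float]) -> list[tuple[float, str]]:
    """Pair each start time with a viseme; unknown symbols fall back to 'rest'."""
    return [(t, VISEME_BY_SYMBOL.get(s, "rest")) for s, t in zip(symbols, starts)]


def viseme_at(timeline: list[tuple[float, str]], t: float) -> str:
    """Return the viseme whose start time is the latest one not after t."""
    starts = [start for start, _ in timeline]
    i = bisect_right(starts, t) - 1
    return timeline[max(i, 0)][1]


# Values copied from the sample JSON response earlier on this page.
symbols = ["", "h", "ɐ", "ɡ", "ʌ", ""]
starts = [0, 0.092, 0.197, 0.255, 0.29, 0.58]
timeline = build_timeline(symbols, starts)
current = viseme_at(timeline, 0.26)  # falls inside the "ɡ" phoneme
```

A binary search over start times keeps the per-frame lookup cheap. That matters when the renderer queries the timeline on every animation frame.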

## Related docs

<CardGroup cols={2}>
  <Card title="Lip sync example" icon="face-smile" href="/ko/docs/examples/lip-sync-phonemes">
    Build a phoneme-to-viseme pipeline.
  </Card>

  <Card title="Normalized text" icon="language" href="/ko/docs/text-to-speech/normalized-text">
    Improve the pronunciation of ambiguous input.
  </Card>
</CardGroup>