# Normalized text

> Improve pronunciation accuracy — especially for Japanese — by pairing your input text with a pronunciation-oriented version.

For inputs that contain kanji, numbers, units, or symbols, the spoken form often diverges from the written form. The `normalized_text` field lets you provide a **pronunciation-oriented version** of your input alongside the original — the engine uses both to produce more accurate speech.

The original `text` preserves meaning and context. The `normalized_text` describes how the sentence should be **spoken**.

<Note>
  `normalized_text` is currently used by `sona_speech_2` and `sona_speech_2_flash` and is **primarily designed for Japanese**.
</Note>

## When normalized text helps

Pair `normalized_text` with `text` whenever your input contains:

* **Numbers** with implicit pronunciation (years, prices, phone numbers)
* **Units and symbols** (`10%`, `170cm`, `$50`)
* **Mixed scripts** (Japanese with English abbreviations, Latin words inside Korean)
* **Kanji with ambiguous readings**
* **Special symbols** (`〜`, `※`, `→`)

`normalized_text` is **strongly recommended** for audiobooks, narration, announcements, and character voice work, where pronunciation accuracy matters. It is usually unnecessary for short, casual conversational lines.
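
The checklist above can be approximated with a quick pre-filter before you spend an LLM call on a line. The sketch below is only a heuristic: the function name `may_need_normalized_text` and the character ranges are assumptions made for illustration, not part of the API. It flags any kanji, since code cannot tell which readings are ambiguous.

```python
import re

# Illustrative character classes; the ranges are assumptions, not exhaustive.
_NEEDS_NORMALIZATION = re.compile(
    r"[0-9０-９]"            # ASCII and full-width digits
    r"|[\u4e00-\u9fff々]"    # kanji (CJK Unified Ideographs) and the iteration mark
    r"|[A-Za-z]"             # Latin letters (abbreviations, unit names)
    r"|[%$〜※→]"             # symbols listed in the checklist above
)


def may_need_normalized_text(text: str) -> bool:
    """Return True if `text` contains characters whose spoken form often differs."""
    return _NEEDS_NORMALIZATION.search(text) is not None
```

Lines for which this returns `False` (pure kana with ordinary punctuation, for example) can usually be sent with `text` alone.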

## Basic usage

<Tabs>
  <Tab title="Python">
    ```python theme={"dark"}
    # Assumes `client` is an already-initialized Supertone SDK client.
    VOICE_ID = "20160a4c5ba38967330c84"  # replace with your voice ID

    response = client.text_to_speech.create_speech(
        voice_id=VOICE_ID,
        text="今日は10%オフだよ。身長は170cm、体重は60kg。",
        normalized_text="きょうはじゅっパーセントオフだよ。しんちょうはひゃくななじゅっセンチメートル、たいじゅうはろくじゅっキログラム。",
        language="ja",
        model="sona_speech_2",
    )
    ```
  </Tab>

  <Tab title="TypeScript">
    ```typescript theme={"dark"}
    // Assumes `client` is an already-initialized Supertone SDK client.
    const VOICE_ID = "20160a4c5ba38967330c84"; // replace with your voice ID

    const response = await client.textToSpeech.createSpeech({
      voiceId: VOICE_ID,
      apiConvertTextToSpeechUsingCharacterRequest: {
        text: "今日は10%オフだよ。身長は170cm、体重は60kg。",
        normalizedText:
          "きょうはじゅっパーセントオフだよ。しんちょうはひゃくななじゅっセンチメートル、たいじゅうはろくじゅっキログラム。",
        language: "ja",
        model: "sona_speech_2",
      },
    });
    ```
  </Tab>

  <Tab title="cURL">
    ```bash theme={"dark"}
    # Replace the voice ID in the URL with your own.
    curl -X POST "https://supertoneapi.com/v1/text-to-speech/20160a4c5ba38967330c84" \
      -H "x-sup-api-key: $SUPERTONE_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "text": "今日は10%オフだよ。身長は170cm、体重は60kg。",
        "normalized_text": "きょうはじゅっパーセントオフだよ。しんちょうはひゃくななじゅっセンチメートル、たいじゅうはろくじゅっキログラム。",
        "language": "ja",
        "model": "sona_speech_2"
      }' \
      --output speech.wav
    ```
  </Tab>
</Tabs>

## Generating normalized Japanese text with an LLM

The most common pattern is to call an LLM once to produce the normalized version, then pass both `text` and `normalized_text` to the TTS API. The prompt below instructs the model to return JSON only, so you can map its two fields directly to the request.

```text theme={"dark"}
You will receive a Japanese sentence that may contain kanji, numbers, symbols, and units.
For the given input, provide:
- the original text (natural Japanese using standard kanji–kana mixed notation, without furigana)
- the normalized text, converted according to the rules below.

Important:
- You must respond only with pure JSON format.
- Do not include any explanations or additional text.
- In original_text, do not include furigana (ruby annotations).

Response Format

{
  "original_text": "[natural Japanese Text]",
  "normalized_text": "[converted Text]"
}

Transcription Conversion Rules
1. Convert all kanji into hiragana using context-appropriate readings.
2. Keep katakana as is.
3. Preserve punctuation exactly as written.
4. Convert Arabic numerals into hiragana.
5. Expand units and English abbreviations into full katakana forms.
6. Apply natural phonological changes such as gemination and sound alternations.

Conversion Examples

{
  "original_text": "今日はどんな一日だったの？",
  "normalized_text": "きょうはどんないちにちだったの？"
}

{
  "original_text": "今日は10%オフだよ。身長は170cm、体重は60kgだって！",
  "normalized_text": "きょうはじゅっパーセントオフだよ。しんちょうはひゃくななじゅっセンチメートル、たいじゅうはろくじゅっキログラムだって！"
}
```
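
Mapping the model's reply to the request can be sketched as follows. This is a minimal example that assumes the LLM returned exactly the JSON shape requested in the prompt; the helper name `build_tts_fields` is hypothetical, and the validation is deliberately basic.

```python
import json


def build_tts_fields(llm_output: str, language: str = "ja",
                     model: str = "sona_speech_2") -> dict:
    """Parse the LLM's JSON reply and map it to TTS request fields."""
    data = json.loads(llm_output)  # raises json.JSONDecodeError on non-JSON replies
    original = data["original_text"]      # KeyError if the model omitted a field
    normalized = data["normalized_text"]
    if not isinstance(original, str) or not isinstance(normalized, str):
        raise ValueError("original_text and normalized_text must be strings")
    if not normalized.strip():
        raise ValueError("normalized_text is empty")
    return {
        "text": original,
        "normalized_text": normalized,
        "language": language,
        "model": model,
    }
```

The returned dictionary can be unpacked into the call shown under Basic usage, for example `client.text_to_speech.create_speech(voice_id=VOICE_ID, **fields)`. If parsing or validation fails, one reasonable fallback is to send `text` alone.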

## Tips

* **Keep `text` natural.** Don't include furigana or other annotations inside `text`. Put all pronunciation hints in `normalized_text`.
* **Match word-for-word.** `normalized_text` should follow `text` word for word, in the same order. Don't paraphrase or rewrite; only respell.
* **Cache LLM outputs.** For fixed inputs (UI strings, recurring announcements), generate `normalized_text` once and store it alongside the original.
* **Skip if not needed.** Casual conversational lines without numbers, units, or ambiguous kanji usually don't benefit from `normalized_text`.
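
The caching tip can be sketched as a small file-backed store keyed by a hash of the original text. The class name `NormalizedTextCache` and the `normalize` callback (whatever function wraps your LLM call) are hypothetical. A production setup would more likely use a database or key-value store.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


class NormalizedTextCache:
    """Store normalized_text in a JSON file, keyed by a hash of the original text."""

    def __init__(self, path: str) -> None:
        self._path = Path(path)
        self._entries = {}
        if self._path.exists():
            self._entries = json.loads(self._path.read_text(encoding="utf-8"))

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get_or_create(self, text: str, normalize: Callable[[str], str]) -> str:
        """Return the cached value, calling `normalize` only on a cache miss."""
        key = self._key(text)
        if key not in self._entries:
            self._entries[key] = normalize(text)
            # Persist after every miss so later runs can reuse the result.
            self._path.write_text(
                json.dumps(self._entries, ensure_ascii=False, indent=2),
                encoding="utf-8",
            )
        return self._entries[key]
```

Because the key depends only on `text`, editing the source string produces a new entry, while unchanged strings never trigger a second LLM call.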

## Related

<CardGroup cols={2}>
  <Card title="Models" icon="layer-group" href="/en/docs/core-concepts/models">
    Which models accept `normalized_text`.
  </Card>

  <Card title="Create speech" icon="comment" href="/en/docs/text-to-speech/create-speech">
    Full TTS request reference.
  </Card>
</CardGroup>
