For inputs that contain kanji, numbers, units, or symbols, the spoken form often diverges from the written form. The normalized_text field lets you provide a pronunciation-oriented version of your input alongside the original, and the engine uses both to produce more accurate speech. The original text preserves meaning and context, while normalized_text describes how the sentence should be spoken.
normalized_text is currently used by sona_speech_2 and sona_speech_2_flash and is primarily designed for Japanese.

When normalized text helps

Pair normalized_text with text whenever your input contains:
  • Numbers with implicit pronunciation (years, prices, phone numbers)
  • Units and symbols (10%, 170cm, $50)
  • Mixed scripts (Japanese with English abbreviations, Latin words inside Korean)
  • Kanji with ambiguous readings
  • Special symbols
It is strongly recommended for audiobooks, narration, announcements, and character voice work where pronunciation accuracy matters. For casual short conversational lines, it’s usually not necessary.
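
The checklist above can be approximated in code. The sketch below is a rough heuristic and not part of the API: normalization_triggers and the character ranges it uses are hypothetical, chosen only for illustration. It reports which categories from the list appear in a string. A pattern cannot tell whether a kanji reading is ambiguous, so any kanji is flagged conservatively.

```python
import re

# Category name -> pattern. The ranges are assumptions for this sketch,
# not rules defined by the TTS API.
_CHECKS = [
    ("digits", re.compile(r"[0-9０-９]")),          # ASCII and full-width digits
    ("symbols", re.compile(r"[%％$＄¥￥&+=#@]")),    # a small sample of common symbols
    ("latin", re.compile(r"[A-Za-zＡ-Ｚａ-ｚ]")),    # Latin letters, e.g. units like "cm"
    ("kanji", re.compile(r"[\u4e00-\u9fff々]")),    # CJK unified ideographs plus 々
]


def normalization_triggers(text: str) -> list[str]:
    """Return the categories found in text that may call for normalized_text."""
    return [name for name, pattern in _CHECKS if pattern.search(text)]


if __name__ == "__main__":
    print(normalization_triggers("今日は10%オフだよ。"))
    print(normalization_triggers("ありがとう。"))
```

An empty result suggests a line can be sent with text alone. A non-empty result is only a hint that a normalized version may be worth generating.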

Basic usage

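# Assumes `client` is an already-initialized Supertone SDK client.
VOICE_ID = "20160a4c5ba38967330c84"  # replace with your voice ID

# `text` keeps the written form; `normalized_text` spells out how it is read.
response = client.text_to_speech.create_speech(
    voice_id=VOICE_ID,
    text="今日は10%オフだよ。身長は170cm、体重は60kg。",
    normalized_text="きょうはじゅっパーセントオフだよ。しんちょうはひゃくななじゅっセンチメートル、たいじゅうはろくじゅっキログラム。",
    language="ja",
    model="sona_speech_2",
)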
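
When some lines have a normalized version and others do not, it can help to assemble the request arguments in one place. The following is a minimal sketch under stated assumptions: build_speech_kwargs and NORMALIZED_TEXT_MODELS are hypothetical helper names, and dropping normalized_text for other models is a choice made here, not documented API behavior. The model names come from the note at the top of this page.

```python
# Models that use normalized_text, per the note on this page.
NORMALIZED_TEXT_MODELS = {"sona_speech_2", "sona_speech_2_flash"}


def build_speech_kwargs(voice_id, text, normalized_text=None,
                        language="ja", model="sona_speech_2"):
    """Build keyword arguments for create_speech.

    normalized_text is included only when it is provided and the chosen
    model is one of the models listed as using it.
    """
    kwargs = {
        "voice_id": voice_id,
        "text": text,
        "language": language,
        "model": model,
    }
    if normalized_text and model in NORMALIZED_TEXT_MODELS:
        kwargs["normalized_text"] = normalized_text
    return kwargs


if __name__ == "__main__":
    print(build_speech_kwargs("your-voice-id", "今日は10%オフだよ。",
                              normalized_text="きょうはじゅっパーセントオフだよ。"))
```

The resulting dictionary can be unpacked into the call from the basic example, as in client.text_to_speech.create_speech(**kwargs).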

Generating normalized Japanese text with an LLM

The most common pattern is to call an LLM once to produce the normalized version, then pass both text and normalized_text to the TTS API. The prompt below produces clean JSON output that you can map directly to the request.

You will receive a Japanese sentence that may contain kanji, numbers, symbols, and units.
For the given input, provide:
- the original text (natural Japanese using standard kanji–kana mixed notation, without furigana)
- the normalized text, converted according to the rules below.

Important:
- You must respond only with pure JSON format.
- Do not include any explanations or additional text.
- In original_text, do not include furigana (ruby annotations).

Response Format

{
  "original_text": "[natural Japanese Text]",
  "normalized_text": "[converted Text]"
}

Transcription Conversion Rules
1. Convert all kanji into hiragana using context-appropriate readings.
2. Keep katakana as is.
3. Preserve punctuation exactly as written.
4. Convert Arabic numerals into hiragana.
5. Expand units and English abbreviations into full katakana forms.
6. Apply natural phonological changes such as gemination and sound alternations.

Conversion Examples

{
  "original_text": "今日はどんな一日だったの?",
  "normalized_text": "きょうはどんないちにちだったの?"
}

{
  "original_text": "今日は10%オフだよ。身長は170cm、体重は60kgだって!",
  "normalized_text": "きょうはじゅっパーセントオフだよ。しんちょうはひゃくななじゅっセンチメートル、たいじゅうはろくじゅっキログラムだって!"
}
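
Mapping the LLM reply onto the request can be sketched as follows. This is an illustration, not an official helper: parse_normalization_reply is a hypothetical name, and the punctuation check is an assumption that covers only a handful of sentence marks, used here to enforce rule 3 of the prompt. It assumes the reply is the bare JSON object that the prompt asks for.

```python
import json

# Sentence punctuation compared between the two fields (an assumed, partial set).
SENTENCE_PUNCTUATION = "。、!?！？"


def _punctuation(s: str) -> list[str]:
    """Return the sentence punctuation marks of s, in order."""
    return [ch for ch in s if ch in SENTENCE_PUNCTUATION]


def parse_normalization_reply(reply: str) -> dict:
    """Turn the LLM's JSON reply into text / normalized_text request fields."""
    data = json.loads(reply)
    original = data["original_text"]
    normalized = data["normalized_text"]

    # Rule 3 of the prompt: punctuation must be preserved exactly.
    if _punctuation(original) != _punctuation(normalized):
        raise ValueError("punctuation differs between original and normalized text")

    return {"text": original, "normalized_text": normalized}


if __name__ == "__main__":
    reply = ('{"original_text": "今日はどんな一日だったの?", '
             '"normalized_text": "きょうはどんないちにちだったの?"}')
    print(parse_normalization_reply(reply))
```

Rejecting a reply whose punctuation changed is a cheap way to catch paraphrased output before it reaches the TTS request. Stricter checks are possible but depend on your content.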

Tips

  • Keep text natural. Don’t include furigana or other annotations inside text. Put all pronunciation hints in normalized_text.
  • Match word-for-word. normalized_text should say exactly what text says. Don’t paraphrase or rewrite; only respell.
  • Cache LLM outputs. For deterministic inputs (UI strings, recurring announcements), generate normalized_text once and store it alongside the original.
  • Skip if not needed. Casual conversational lines without numbers, units, or ambiguous kanji usually don’t benefit from normalized_text.
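
The caching tip can be sketched as a small file-backed store. Everything here is hypothetical and for illustration only: the NormalizedTextCache class, the JSON file layout, and the normalize callable, which stands in for whatever LLM call you use to produce the normalized version.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


class NormalizedTextCache:
    """Stores normalized_text next to its original text in a JSON file."""

    def __init__(self, path: str):
        self.path = Path(path)
        if self.path.exists():
            self.entries = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.entries = {}

    def get_or_create(self, text: str, normalize: Callable[[str], str]) -> str:
        """Return the cached normalized text, calling normalize only on a miss."""
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.entries:
            self.entries[key] = {"text": text, "normalized_text": normalize(text)}
            # Persist after every new entry so later runs can reuse it.
            self.path.write_text(
                json.dumps(self.entries, ensure_ascii=False, indent=2),
                encoding="utf-8",
            )
        return self.entries[key]["normalized_text"]
```

Keying on a hash of the exact original string means any edit to the text produces a new entry, so a stale normalized version is never reused for changed wording.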

Models

Which models accept normalized_text.

Create speech

Full TTS request reference.