# Stream TTS from an LLM response

> Pipe an LLM's text output through Supertone so the user hears the answer as it is generated — runnable examples for OpenAI and Anthropic.

For voice agents and chatbots, the user should hear the answer **as the LLM is producing it** — not after the full response is done. The pattern is:

1. Stream tokens from your LLM.
2. Group them into sentence-sized chunks.
3. Send each chunk to Supertone TTS and forward the audio.

Below are end-to-end recipes you can paste into a fresh project and run. Set the two API keys, swap in a `voice_id`, and you're done.

## You may not need streaming

`stream_speech` is supported on `sona_speech_1` only. If your priority is **overall time-to-first-audio**, you'll often get there faster by picking a non-streaming model that simply finishes each request quickly:

* **`supertonic_api_3`** — fastest inference, lowest latency, with significantly improved speech stability. Best for voice agents where time-to-first-audio matters most.
* **`sona_speech_2_flash`** — balanced; lower latency than `sona_speech_2` with similar quality.
* **`sona_speech_1`** with `stream_speech` — only useful when a single chunk of text is long enough that chunked streaming meaningfully starts playback earlier.

For the sentence-by-sentence LLM pattern below, each TTS call covers one short sentence — and **a non-streaming call on a fast model usually returns before streaming on `sona_speech_1` even starts emitting chunks.** The examples default to `supertonic_api_3`; switch the `model` string to try the others.
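
To compare models on your own voice and text, you can time one complete synthesis call per model. The sketch below is a minimal, SDK-agnostic harness. `measure_model_latency` and the `synthesize(text, model)` callback are hypothetical names, not part of the Supertone SDK. For the non-streaming models, the full call duration is roughly how long the listener waits before that sentence can start playing.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, List


def measure_model_latency(
    synthesize: Callable[[str, str], bytes],
    models: Iterable[str],
    sentence: str,
    repeats: int = 3,
) -> Dict[str, float]:
    """Return the median wall-clock seconds of one synthesis call per model.

    `synthesize(text, model)` is a hypothetical callback that performs one TTS
    request and returns the audio bytes. With the Supertone Python SDK it could
    wrap the same call the recipes below use, e.g.:

        lambda text, model: supertone.text_to_speech.create_speech(
            voice_id=VOICE_ID, text=text, language="en", model=model,
        ).result.read()
    """
    results: Dict[str, float] = {}
    for model in models:
        # One untimed warm-up call so connection setup is not counted.
        synthesize(sentence, model)
        samples: List[float] = []
        for _ in range(repeats):
            start = time.perf_counter()
            synthesize(sentence, model)
            samples.append(time.perf_counter() - start)
        results[model] = statistics.median(samples)
    return results
```

Taking the median of a few calls after a warm-up keeps one slow request or the initial connection handshake from skewing the comparison.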

## Recipes

Pick your LLM and language stack below. All four recipes follow the same sentence-batching pattern — only the LLM streaming bit differs.

<Tabs>
  <Tab title="Python · Anthropic">
    ```bash theme={"dark"}
    pip install supertone anthropic
    export SUPERTONE_API_KEY="Kp9mZ3xQ7v..."
    export ANTHROPIC_API_KEY="sk-ant-..."
    ```

    ```python theme={"dark"}
    import os
    import re
    from anthropic import Anthropic
    from supertone import Supertone

    VOICE_ID = "20160a4c5ba38967330c84"  # replace with your voice ID
    MODEL = "supertonic_api_3"            # try sona_speech_2_flash for higher quality

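    # Latin punctuation must be followed by whitespace (so "3.14" isn't split);
    # CJK sentence punctuation is usually not followed by a space, so it isn't required.
    SENTENCE_END = re.compile(r"[.!?]\s+|[。！？]\s*")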
    # Supertone TTS rejects text containing '#' (reserved). Instruction-tuned
    # LLMs often emit markdown — strip the common inline markers before sending.
    MARKDOWN_MARKERS = re.compile(r"[#*_`]+")

    def for_tts(text: str) -> str:
        return MARKDOWN_MARKERS.sub("", text).strip()

    def sentences_from_stream(token_stream):
        """Yield sentence-sized strings from an iterable of text tokens."""
        buffer = ""
        for token in token_stream:
            buffer += token
            while True:
                match = SENTENCE_END.search(buffer)
                if not match:
                    break
                sentence = for_tts(buffer[: match.end()])
                if sentence:
                    yield sentence
                buffer = buffer[match.end():]
        tail = for_tts(buffer)
        if tail:
            yield tail

    def stream_claude_tokens(prompt: str):
        anthropic = Anthropic()  # reads ANTHROPIC_API_KEY from env
        with anthropic.messages.stream(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        ) as stream:
            for text in stream.text_stream:
                yield text

    def play_or_save(audio_bytes: bytes, path: str):
        """Replace with your audio player. Here we write each clip to its own file."""
        with open(path, "wb") as f:
            f.write(audio_bytes)

    def main():
        prompt = "Tell me a short story about a curious robot in three sentences."

        # Create the client once and reuse it for every sentence.
        with Supertone(api_key=os.environ["SUPERTONE_API_KEY"]) as supertone:
            sentences = sentences_from_stream(stream_claude_tokens(prompt))
            for i, sentence in enumerate(sentences):
                print(f"→ {sentence}")
                response = supertone.text_to_speech.create_speech(
                    voice_id=VOICE_ID,
                    text=sentence,
                    language="en",
                    model=MODEL,
                )
                # Each response is a standalone WAV file with its own header, so
                # appending clips into one file would make most players stop after
                # the first sentence. Save one numbered file per sentence instead.
                out_path = f"response_{i:03d}.wav"
                play_or_save(response.result.read(), out_path)
                print(f"Saved {out_path}")

    if __name__ == "__main__":
        main()
    ```
  </Tab>

  <Tab title="Python · OpenAI">
    ```bash theme={"dark"}
    pip install supertone openai
    export SUPERTONE_API_KEY="Kp9mZ3xQ7v..."
    export OPENAI_API_KEY="sk-..."
    ```

    ```python theme={"dark"}
    import os
    import re
    from openai import OpenAI
    from supertone import Supertone

    VOICE_ID = "20160a4c5ba38967330c84"  # replace with your voice ID
    MODEL = "supertonic_api_3"            # try sona_speech_2_flash for higher quality

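    # Latin punctuation must be followed by whitespace (so "3.14" isn't split);
    # CJK sentence punctuation is usually not followed by a space, so it isn't required.
    SENTENCE_END = re.compile(r"[.!?]\s+|[。！？]\s*")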
    # Supertone TTS rejects text containing '#' (reserved). Instruction-tuned
    # LLMs often emit markdown — strip the common inline markers before sending.
    MARKDOWN_MARKERS = re.compile(r"[#*_`]+")

    def for_tts(text: str) -> str:
        return MARKDOWN_MARKERS.sub("", text).strip()

    def sentences_from_stream(token_stream):
        buffer = ""
        for token in token_stream:
            buffer += token
            while True:
                match = SENTENCE_END.search(buffer)
                if not match:
                    break
                sentence = for_tts(buffer[: match.end()])
                if sentence:
                    yield sentence
                buffer = buffer[match.end():]
        tail = for_tts(buffer)
        if tail:
            yield tail

    def stream_openai_tokens(prompt: str):
        openai = OpenAI()  # reads OPENAI_API_KEY from env
        stream = openai.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            stream=True,
        )
        for chunk in stream:
            # Some chunks (e.g. usage-only chunks) carry no choices; skip them.
            if not chunk.choices:
                continue
            delta = chunk.choices[0].delta.content
            if delta:
                yield delta

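    def main():
        prompt = "Tell me a short story about a curious robot in three sentences."

        # Create the client once and reuse it for every sentence.
        with Supertone(api_key=os.environ["SUPERTONE_API_KEY"]) as supertone:
            sentences = sentences_from_stream(stream_openai_tokens(prompt))
            for i, sentence in enumerate(sentences):
                print(f"→ {sentence}")
                response = supertone.text_to_speech.create_speech(
                    voice_id=VOICE_ID,
                    text=sentence,
                    language="en",
                    model=MODEL,
                )
                # Each response is a standalone WAV file with its own header, so
                # appending clips into one file would make most players stop after
                # the first sentence. Save one numbered file per sentence instead.
                out_path = f"response_{i:03d}.wav"
                with open(out_path, "wb") as f:
                    f.write(response.result.read())
                print(f"Saved {out_path}")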

    if __name__ == "__main__":
        main()
    ```
  </Tab>

  <Tab title="TypeScript · Anthropic">
    ```bash theme={"dark"}
    npm add @supertone/supertone @anthropic-ai/sdk
    export SUPERTONE_API_KEY="Kp9mZ3xQ7v..."
    export ANTHROPIC_API_KEY="sk-ant-..."
    ```

    ```typescript theme={"dark"}
    import Anthropic from "@anthropic-ai/sdk";
    import { Supertone } from "@supertone/supertone";
    import * as fs from "node:fs";

    const VOICE_ID = "20160a4c5ba38967330c84"; // replace with your voice ID
    const MODEL = "supertonic_api_3";          // try sona_speech_2_flash for higher quality

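    // Latin punctuation must be followed by whitespace (so "3.14" isn't split);
    // CJK sentence punctuation is usually not followed by a space, so it isn't required.
    const SENTENCE_END = /[.!?]\s+|[。！？]\s*/;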
    // Supertone TTS rejects text containing '#' (reserved). Instruction-tuned
    // LLMs often emit markdown — strip the common inline markers before sending.
    const MARKDOWN_MARKERS = /[#*_`]+/g;
    const forTts = (text: string) => text.replace(MARKDOWN_MARKERS, "").trim();

    async function* sentencesFromStream(tokenStream: AsyncIterable<string>) {
      let buffer = "";
      for await (const token of tokenStream) {
        buffer += token;
        while (true) {
          const match = SENTENCE_END.exec(buffer);
          if (!match) break;
          const sentence = forTts(buffer.slice(0, match.index + match[0].length));
          if (sentence) yield sentence;
          buffer = buffer.slice(match.index + match[0].length);
        }
      }
      const tail = forTts(buffer);
      if (tail) yield tail;
    }

    async function* streamClaudeTokens(prompt: string) {
      const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from env
      const stream = anthropic.messages.stream({
        model: "claude-sonnet-4-5",
        max_tokens: 1024,
        messages: [{ role: "user", content: prompt }],
      });
      for await (const event of stream) {
        if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
          yield event.delta.text;
        }
      }
    }

    async function main() {
      const prompt = "Tell me a short story about a curious robot in three sentences.";

      // Create the client once and reuse it for every sentence.
      const supertone = new Supertone({ apiKey: process.env.SUPERTONE_API_KEY });

      let i = 0;
      for await (const sentence of sentencesFromStream(streamClaudeTokens(prompt))) {
        console.log(`→ ${sentence}`);
        const response = await supertone.textToSpeech.createSpeech({
          voiceId: VOICE_ID,
          apiConvertTextToSpeechUsingCharacterRequest: {
            text: sentence,
            language: "en",
            model: MODEL,
          },
        });

        // Each response is a standalone WAV file with its own header, so write one
        // numbered file per sentence rather than appending them into one file.
        const outPath = `response_${String(i++).padStart(3, "0")}.wav`;
        fs.writeFileSync(outPath, Buffer.alloc(0)); // create / truncate
        if (response.result instanceof Uint8Array) {
          fs.appendFileSync(outPath, response.result);
        } else if (response.result && "getReader" in response.result) {
          const reader = (response.result as ReadableStream<Uint8Array>).getReader();
          while (true) {
            const { done, value } = await reader.read();
            if (done) break;
            if (value) fs.appendFileSync(outPath, value);
          }
        }
        console.log(`Saved ${outPath}`);
      }
    }

    main().catch((err) => {
      console.error(err);
      process.exit(1);
    });
    ```
  </Tab>

  <Tab title="TypeScript · OpenAI">
    ```bash theme={"dark"}
    npm add @supertone/supertone openai
    export SUPERTONE_API_KEY="Kp9mZ3xQ7v..."
    export OPENAI_API_KEY="sk-..."
    ```

    ```typescript theme={"dark"}
    import OpenAI from "openai";
    import { Supertone } from "@supertone/supertone";
    import * as fs from "node:fs";

    const VOICE_ID = "20160a4c5ba38967330c84"; // replace with your voice ID
    const MODEL = "supertonic_api_3";          // try sona_speech_2_flash for higher quality

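    // Latin punctuation must be followed by whitespace (so "3.14" isn't split);
    // CJK sentence punctuation is usually not followed by a space, so it isn't required.
    const SENTENCE_END = /[.!?]\s+|[。！？]\s*/;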
    // Supertone TTS rejects text containing '#' (reserved). Instruction-tuned
    // LLMs often emit markdown — strip the common inline markers before sending.
    const MARKDOWN_MARKERS = /[#*_`]+/g;
    const forTts = (text: string) => text.replace(MARKDOWN_MARKERS, "").trim();

    async function* sentencesFromStream(tokenStream: AsyncIterable<string>) {
      let buffer = "";
      for await (const token of tokenStream) {
        buffer += token;
        while (true) {
          const match = SENTENCE_END.exec(buffer);
          if (!match) break;
          const sentence = forTts(buffer.slice(0, match.index + match[0].length));
          if (sentence) yield sentence;
          buffer = buffer.slice(match.index + match[0].length);
        }
      }
      const tail = forTts(buffer);
      if (tail) yield tail;
    }

    async function* streamOpenAITokens(prompt: string) {
      const openai = new OpenAI(); // reads OPENAI_API_KEY from env
      const stream = await openai.chat.completions.create({
        model: "gpt-4o-mini",
        messages: [{ role: "user", content: prompt }],
        stream: true,
      });
      for await (const chunk of stream) {
        const delta = chunk.choices[0]?.delta?.content;
        if (delta) yield delta;
      }
    }

    async function main() {
      const prompt = "Tell me a short story about a curious robot in three sentences.";

      // Create the client once and reuse it for every sentence.
      const supertone = new Supertone({ apiKey: process.env.SUPERTONE_API_KEY });

      let i = 0;
      for await (const sentence of sentencesFromStream(streamOpenAITokens(prompt))) {
        console.log(`→ ${sentence}`);
        const response = await supertone.textToSpeech.createSpeech({
          voiceId: VOICE_ID,
          apiConvertTextToSpeechUsingCharacterRequest: {
            text: sentence,
            language: "en",
            model: MODEL,
          },
        });

        // Each response is a standalone WAV file with its own header, so write one
        // numbered file per sentence rather than appending them into one file.
        const outPath = `response_${String(i++).padStart(3, "0")}.wav`;
        fs.writeFileSync(outPath, Buffer.alloc(0)); // create / truncate
        if (response.result instanceof Uint8Array) {
          fs.appendFileSync(outPath, response.result);
        } else if (response.result && "getReader" in response.result) {
          const reader = (response.result as ReadableStream<Uint8Array>).getReader();
          while (true) {
            const { done, value } = await reader.read();
            if (done) break;
            if (value) fs.appendFileSync(outPath, value);
          }
        }
        console.log(`Saved ${outPath}`);
      }
    }

    main().catch((err) => {
      console.error(err);
      process.exit(1);
    });
    ```
  </Tab>
</Tabs>
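
The recipes handle sentences strictly one after another. If you swap the file writes for a blocking audio player, the listener hears a gap while the next sentence is being synthesized. One way to hide that gap, sketched below under the assumption that your player exposes a blocking `play(clip)` call, is to play finished clips on a background thread while the main loop keeps requesting the next sentence. `speak_overlapped`, `synthesize`, and `play` are hypothetical names, not part of the Supertone SDK.

```python
import queue
import threading
from typing import Callable, Iterable, Optional


def speak_overlapped(
    sentences: Iterable[str],
    synthesize: Callable[[str], bytes],
    play: Callable[[bytes], None],
) -> None:
    """Synthesize on the calling thread; play finished clips in order on a worker.

    `synthesize` and `play` are hypothetical hooks: wrap the recipe's
    create_speech(...).result.read() call and your (blocking) audio output.
    """
    clips: "queue.Queue[Optional[bytes]]" = queue.Queue()

    def player() -> None:
        while True:
            clip = clips.get()
            if clip is None:  # sentinel: no more clips are coming
                return
            play(clip)  # blocks while this clip plays; later clips wait in the queue

    worker = threading.Thread(target=player, daemon=True)
    worker.start()
    try:
        for sentence in sentences:
            # While this request is in flight, the worker may still be playing
            # the previous clip, so synthesis and playback overlap.
            clips.put(synthesize(sentence))
    finally:
        clips.put(None)
        worker.join()  # wait for the last clip to finish playing
```

Because a single player thread drains a FIFO queue, clips play in the order the sentences were produced, even though synthesis of the next sentence runs while the current one is playing.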

## Design notes

* **Sentence batching matters.** Sending one token at a time produces choppy, unnatural speech. The sentence splitter above flushes on `.`, `!`, `?`, `。`, `！`, `？`. For lower latency, you can also flush on a comma once the buffer exceeds \~60 characters.
* **Strip markdown before sending.** Instruction-tuned models (Claude especially) often wrap their answers in markdown — headings like `# Title`, bold `**text**`, code spans, etc. Supertone TTS rejects text containing `#` (it's a reserved character), so the snippets above pipe every sentence through a small `for_tts` / `forTts` helper that removes `#`, `*`, `_`, and backticks. Without it, the first sentence of a Claude response will commonly fail with a 400.
* **Model choice for latency.** Reserve `sona_speech_2` for offline / high-quality use cases where the user can wait. `sona_speech_2_flash` is a good balance of quality and speed. `supertonic_api_3` gives the fastest time-to-first-audio with high speech stability. `sona_speech_1` is the only model that supports `stream_speech` chunked streaming — useful if a single sentence is long and you want to start playing before it finishes.
* **Saving vs playing.** The examples write each sentence's audio to disk so you can check the result. In a real agent you'd send each clip to your audio output (Web Audio, PortAudio, etc.) as soon as it arrives, instead of (or in addition to) writing it to disk.
* **Connection reuse.** Reuse the Supertone client across requests — don't recreate it per sentence.
* **Long sentences.** If a single sentence exceeds 300 characters, the SDK auto-chunks it internally, so you don't need to split further.
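
The first note suggests also flushing on a comma once the buffer passes about 60 characters. The sketch below folds that rule into the recipes' splitter; `chunks_from_stream`, `COMMA`, and `MIN_CHARS_BEFORE_COMMA` are illustrative names, and 60 is a tunable threshold rather than a Supertone requirement. It flushes at a sentence end as soon as one appears, and only when the buffer has grown past the threshold without one does it cut at the last comma followed by whitespace (or a full-width `，`), so numbers like `1,000` stay intact.

```python
import re
from typing import Iterable, Iterator

SENTENCE_END = re.compile(r"[.!?]\s+|[。！？]\s*")
# A comma only counts as a split point if whitespace follows (keeps "1,000" whole);
# the full-width comma used in CJK text needs no trailing space.
COMMA = re.compile(r",\s+|，\s*")
MIN_CHARS_BEFORE_COMMA = 60  # tune for your latency / naturalness trade-off


def chunks_from_stream(tokens: Iterable[str]) -> Iterator[str]:
    """Yield sentence-sized chunks, falling back to comma splits in long runs."""
    buffer = ""
    for token in tokens:
        buffer += token
        while True:
            match = SENTENCE_END.search(buffer)
            if match is None and len(buffer) > MIN_CHARS_BEFORE_COMMA:
                # No sentence end yet and the buffer is long: cut at the last comma.
                commas = list(COMMA.finditer(buffer))
                match = commas[-1] if commas else None
            if match is None:
                break
            chunk = buffer[: match.end()].strip()
            if chunk:
                yield chunk
            buffer = buffer[match.end():]
    tail = buffer.strip()
    if tail:
        yield tail
```

To use it in a recipe, swap it in for `sentences_from_stream` and pass each chunk through `for_tts` before sending it to Supertone.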

## Related

<CardGroup cols={2}>
  <Card title="Models" icon="layer-group" href="/en/docs/core-concepts/models">
    Pick the right model for your latency budget.
  </Card>

  <Card title="Latency optimization" icon="gauge-high" href="/en/docs/production/latency-optimization">
    More tips for reducing time-to-audio.
  </Card>
</CardGroup>
