The official Python SDK is published as supertone on PyPI. Source: supertone-inc/supertone-python.

Installation

pip install supertone
Requires Python 3.9+.

Set your API key

The SDK does not auto-read from the environment. Pass api_key explicitly — the convention is to read from SUPERTONE_API_KEY.
export SUPERTONE_API_KEY="Kp9mZ3xQ7v..."
import os
from supertone import Supertone

client = Supertone(api_key=os.environ["SUPERTONE_API_KEY"])
Voice IDs are not environment variables — they change per use case, so keep them as plain strings in your code (or pass them from your request payload).
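Because the SDK does not read the environment itself, a missing variable shows up as a bare KeyError from os.environ[...]. The following is a minimal sketch of a loader that fails with a clearer message. load_api_key is a hypothetical helper for illustration, not part of the supertone package.

```python
import os


def load_api_key(env_var: str = "SUPERTONE_API_KEY") -> str:
    """Return the API key from the environment or raise a descriptive error.

    Hypothetical helper, not part of the supertone package.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it before creating the client."
        )
    return key
```

With this in place, the client can be created as Supertone(api_key=load_api_key()).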

Generate speech (sync)

The recommended pattern uses a context manager so the underlying HTTP connection is closed cleanly:
import os
from supertone import Supertone

VOICE_ID = "20160a4c5ba38967330c84"  # replace with your voice ID

with Supertone(api_key=os.environ["SUPERTONE_API_KEY"]) as client:
    response = client.text_to_speech.create_speech(
        voice_id=VOICE_ID,
        text="Hello from the Python SDK.",
        language="en",
        output_format="wav",
    )

    with open("speech.wav", "wb") as f:
        f.write(response.result.read())

Generate speech (async)

Use the _async suffix and async with:
import asyncio
import os
from supertone import Supertone

VOICE_ID = "20160a4c5ba38967330c84"  # replace with your voice ID

async def main():
    async with Supertone(api_key=os.environ["SUPERTONE_API_KEY"]) as client:
        response = await client.text_to_speech.create_speech_async(
            voice_id=VOICE_ID,
            text="Hello from the async Python SDK.",
            language="en",
        )

        with open("speech.wav", "wb") as f:
            f.write(response.result.read())

asyncio.run(main())
Every resource method on the SDK has both forms: create_speech / create_speech_async, stream_speech / stream_speech_async, list_voices / list_voices_async, and so on.
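One reason to reach for the async forms is to run several requests concurrently. The sketch below assumes an already-open client (inside async with) and reuses only the create_speech_async call and the response.result.read() accessor shown above. synthesize_many is a hypothetical helper name, and language is fixed to "en" purely for illustration.

```python
import asyncio


async def synthesize_many(client, voice_id: str, lines: list[str]) -> list[bytes]:
    """Generate one audio clip per line concurrently and return the raw bytes.

    `client` is expected to be an open Supertone client (inside `async with`).
    Hypothetical helper, not part of the SDK.
    """

    async def one(text: str) -> bytes:
        response = await client.text_to_speech.create_speech_async(
            voice_id=voice_id,
            text=text,
            language="en",
        )
        # Same accessor the async example above uses.
        return response.result.read()

    # gather preserves input order, so clips line up with `lines`.
    return await asyncio.gather(*(one(text) for text in lines))
```

Large batches may run into the 429 rate limit listed under Error handling, so it is worth keeping the number of concurrent lines modest.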

Stream speech

Streaming returns an iterator (or async iterator) of audio chunks:
import os
from supertone import Supertone

VOICE_ID = "20160a4c5ba38967330c84"  # replace with your voice ID

with Supertone(api_key=os.environ["SUPERTONE_API_KEY"]) as client:
    response = client.text_to_speech.stream_speech(
        voice_id=VOICE_ID,
        text="This response is streamed chunk by chunk.",
        language="en",
        model="sona_speech_1",
    )

    with open("streamed.wav", "wb") as f:
        for chunk in response.result.iter_bytes():
            f.write(chunk)
The async equivalent uses async for chunk in response.result.aiter_bytes(). Streaming is currently supported on sona_speech_1 only.
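The async variant described above can be sketched as a small helper. It assumes stream_speech_async is awaited in the same way as create_speech_async (the page does not show this call in full), and stream_to_file is a hypothetical name used only for illustration.

```python
async def stream_to_file(client, voice_id: str, text: str, path: str) -> int:
    """Stream synthesized audio to `path` and return the number of bytes written.

    Assumes `client` is an open Supertone client and that
    `stream_speech_async` is awaited like `create_speech_async`.
    """
    response = await client.text_to_speech.stream_speech_async(
        voice_id=voice_id,
        text=text,
        language="en",
        model="sona_speech_1",  # streaming is supported on this model only
    )
    written = 0
    with open(path, "wb") as f:
        # Write each chunk as it arrives instead of buffering the whole clip.
        async for chunk in response.result.aiter_bytes():
            f.write(chunk)
            written += len(chunk)
    return written
```

Call it from inside an async with Supertone(...) as client block, for example await stream_to_file(client, VOICE_ID, "Hello", "streamed.wav").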

Long text auto-chunking

create_speech, create_speech_async, stream_speech, and stream_speech_async automatically split text longer than 300 characters. create_speech runs up to 3 segments in parallel and merges the audio; stream_speech runs segments sequentially and forwards chunks to your iterator.
LONG_TEXT = "..."  # any length, including thousands of characters

response = client.text_to_speech.create_speech(
    voice_id=VOICE_ID,
    text=LONG_TEXT,
    language="en",
)

with open("narration.wav", "wb") as f:
    f.write(response.result.read())   # single merged file
predict_duration does not auto-chunk — keep that input under 300 characters and sum durations manually for longer scripts. See Long text for details and tuning.
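Summing durations manually means splitting the script yourself. The sketch below packs sentences greedily into segments of at most 300 characters and adds up a per-segment estimate. Both split_text and total_duration are hypothetical helpers, and the splitting rule is an assumption that may differ from how the SDK chunks text internally. Since the shape of the predict_duration response is not shown on this page, the sketch takes a caller-supplied predict callable that returns seconds.

```python
import re
from typing import Callable, List


def split_text(text: str, limit: int = 300) -> List[str]:
    """Greedily pack sentences into segments of at most `limit` characters."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    segments: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that exceeds the limit on its own.
        pieces = [sentence[i:i + limit] for i in range(0, len(sentence), limit)]
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= limit:
                current = candidate
            else:
                segments.append(current)
                current = piece
    if current:
        segments.append(current)
    return segments


def total_duration(
    text: str, predict: Callable[[str], float], limit: int = 300
) -> float:
    """Sum predicted durations; `predict` wraps a predict_duration call."""
    return sum(predict(segment) for segment in split_text(text, limit))
```

Here predict would be a small function of your own that calls client.text_to_speech.predict_duration for one segment and returns the duration as a number. The sum is an estimate; merged audio may differ slightly at segment boundaries.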

Common operations

# List voices with pagination
result = client.voices.list_voices(page_size=20)

# Search voices
result = client.voices.search_voices(language="ko,en", style="happy")

# Get a single voice
voice = client.voices.get_voice(voice_id=VOICE_ID)

# Predict duration (no credits deducted)
duration = client.text_to_speech.predict_duration(
    voice_id=VOICE_ID,
    text="How long will this be?",
    language="en",
)

# Get credit balance
balance = client.usage.get_credit_balance()

Type-safe enums (optional)

For type safety, the SDK exposes enum constants in supertone.models:
from supertone import models

models.APIConvertTextToSpeechUsingCharacterRequestLanguage.EN       # "en"
models.APIConvertTextToSpeechUsingCharacterRequestModel.SONA_SPEECH_1  # "sona_speech_1"
Plain strings ("en", "sona_speech_1") work too — use whichever style you prefer.

Error handling

Errors live in supertone.errors and all extend SupertoneError:
from supertone import Supertone, errors

try:
    response = client.text_to_speech.create_speech(...)
except errors.TooManyRequestsErrorResponse as e:
    # 429 — back off and retry
    print("Rate limited:", e.message)
except errors.UnauthorizedErrorResponse as e:
    # 401 — bad or missing API key
    print("Auth failed:", e.message)
except errors.PaymentRequiredErrorResponse as e:
    # 402 — out of credits
    print("Buy more credits:", e.message)
except errors.SupertoneError as e:
    # Any other API error
    print(f"HTTP {e.status_code}: {e.message}")
Error class                          HTTP status
BadRequestErrorResponse              400
UnauthorizedErrorResponse            401
PaymentRequiredErrorResponse         402
ForbiddenErrorResponse               403
NotFoundErrorResponse                404
RequestTimeoutErrorResponse          408
PayloadTooLargeErrorResponse         413
UnsupportedMediaTypeErrorResponse    415
TooManyRequestsErrorResponse         429
InternalServerErrorResponse          500
Network errors (DNS failure, broken pipe, etc.) come from httpx and don’t inherit from SupertoneError.
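The "back off and retry" advice for 429 responses can be sketched as a generic wrapper. call_with_backoff is a hypothetical helper, not part of the SDK. The exception types to retry on are passed in, so the sketch makes no assumption about the SDK beyond the class names listed above. The delays and jitter are illustrative values, and the SDK's own RetryConfig (see Configuration) may already cover the same ground.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying with exponential backoff and jitter on `retry_on`."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts; surface the last error
            # Delay doubles each attempt, is capped, and gets up to 10% jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

A call might look like call_with_backoff(lambda: client.text_to_speech.create_speech(...), retry_on=(errors.TooManyRequestsErrorResponse,)). Other errors, such as 401 or 402, pass straight through because retrying them would not help.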

Configuration

import os

from supertone import Supertone
from supertone.utils.retries import RetryConfig

client = Supertone(
    api_key=os.environ["SUPERTONE_API_KEY"],
    timeout_ms=30_000,
    retry_config=RetryConfig(
        strategy="backoff",
        backoff={"initial_interval": 500, "max_interval": 60_000},
        retry_connection_errors=True,
    ),
)
