Voice IDs are not environment variables — they change per use case, so keep them as plain strings in your code (or pass them from your request payload).
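As a minimal sketch of the "pass them from your request payload" option, a handler could read the voice ID from the incoming payload and fall back to a default. The `resolve_voice_id` helper and the `voice_id` payload key are hypothetical names chosen for illustration; neither is part of the SDK.

```python
DEFAULT_VOICE_ID = "20160a4c5ba38967330c84"  # replace with your voice ID


def resolve_voice_id(payload: dict, default: str = DEFAULT_VOICE_ID) -> str:
    """Return the voice ID from a request payload, or `default` if absent.

    The "voice_id" key is an assumed payload field, not an SDK convention.
    """
    voice_id = payload.get("voice_id")
    if isinstance(voice_id, str) and voice_id.strip():
        return voice_id.strip()
    return default
```

The API key, by contrast, stays in the environment and never travels in the payload.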
The recommended pattern uses a context manager so the underlying HTTP connection is closed cleanly:
import os

from supertone import Supertone

VOICE_ID = "20160a4c5ba38967330c84"  # replace with your voice ID

with Supertone(api_key=os.environ["SUPERTONE_API_KEY"]) as client:
    response = client.text_to_speech.create_speech(
        voice_id=VOICE_ID,
        text="Hello from the Python SDK.",
        language="en",
        output_format="wav",
    )

    with open("speech.wav", "wb") as f:
        f.write(response.result.read())
import asyncio
import os

from supertone import Supertone

VOICE_ID = "20160a4c5ba38967330c84"  # replace with your voice ID


async def main():
    async with Supertone(api_key=os.environ["SUPERTONE_API_KEY"]) as client:
        response = await client.text_to_speech.create_speech_async(
            voice_id=VOICE_ID,
            text="Hello from the async Python SDK.",
            language="en",
        )

        with open("speech.wav", "wb") as f:
            f.write(response.result.read())


asyncio.run(main())
Every resource method on the SDK has both a synchronous and an asynchronous form: create_speech / create_speech_async, stream_speech / stream_speech_async, list_voices / list_voices_async, and so on.
Streaming returns an iterator (or async iterator) of audio chunks:
import os

from supertone import Supertone

VOICE_ID = "20160a4c5ba38967330c84"  # replace with your voice ID

with Supertone(api_key=os.environ["SUPERTONE_API_KEY"]) as client:
    response = client.text_to_speech.stream_speech(
        voice_id=VOICE_ID,
        text="This response is streamed chunk by chunk.",
        language="en",
        model="sona_speech_1",
    )

    with open("streamed.wav", "wb") as f:
        for chunk in response.result.iter_bytes():
            f.write(chunk)
The async equivalent iterates with async for chunk in response.result.aiter_bytes(). Streaming is currently supported on sona_speech_1 only.
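The async loop can be sketched as a small helper that drains any async iterator of byte chunks into a file. `save_stream` and `fake_chunks` are hypothetical names for illustration; `fake_chunks` only stands in for `response.result.aiter_bytes()` so the sketch runs without network access.

```python
from typing import AsyncIterator


async def save_stream(chunks: AsyncIterator[bytes], path: str) -> int:
    """Write audio chunks to `path` as they arrive; return total bytes written."""
    total = 0
    with open(path, "wb") as f:
        async for chunk in chunks:
            f.write(chunk)
            total += len(chunk)
    return total


async def fake_chunks() -> AsyncIterator[bytes]:
    # Stand-in for response.result.aiter_bytes(); yields three dummy chunks.
    for chunk in (b"RIFF", b"....", b"WAVE"):
        yield chunk
```

With the SDK, the call would presumably look like `await save_stream(response.result.aiter_bytes(), "streamed.wav")`, where `response` comes from `await client.text_to_speech.stream_speech_async(...)` inside an `async with Supertone(...)` block.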
create_speech, create_speech_async, stream_speech, and stream_speech_async automatically split text longer than 300 characters. create_speech runs up to 3 segments in parallel and merges the audio; stream_speech runs segments sequentially and forwards chunks to your iterator.
LONG_TEXT = "..."  # any length, including thousands of characters

response = client.text_to_speech.create_speech(
    voice_id=VOICE_ID,
    text=LONG_TEXT,
    language="en",
)

with open("narration.wav", "wb") as f:
    f.write(response.result.read())  # single merged file
predict_duration does not auto-chunk: keep its input under 300 characters, and for longer scripts split the text yourself and sum the per-segment durations.
See Long text for details and tuning.
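For longer scripts, the manual approach could look like the sketch below: split the text into segments that fit the limit, predict each one, and add up the results. `split_text` and `total_duration` are hypothetical helpers, and the whitespace-based split is an assumption that may not match how the SDK chunks text internally, so treat the total as an estimate. The `predict` callable is supplied by the caller because the shape of the `predict_duration` response is not shown here.

```python
from typing import Callable, List

MAX_CHARS = 300  # segment size limit taken from the text above


def split_text(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Greedily pack whitespace-separated words into segments of at most `limit` chars."""
    segments: List[str] = []
    current = ""
    for word in text.split():
        # Hard-split any single word that is longer than the limit.
        while len(word) > limit:
            if current:
                segments.append(current)
                current = ""
            segments.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            segments.append(current)
            current = word
    if current:
        segments.append(current)
    return segments


def total_duration(text: str, predict: Callable[[str], float]) -> float:
    """Sum predicted durations over all segments; `predict` maps one segment to seconds."""
    return sum(predict(segment) for segment in split_text(text))
```

In practice `predict` would wrap `client.text_to_speech.predict_duration(voice_id=VOICE_ID, text=segment, language="en")` and pull the numeric duration out of the response.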
# List voices with pagination
result = client.voices.list_voices(page_size=20)

# Search voices
result = client.voices.search_voices(language="ko,en", style="happy")

# Get a single voice
voice = client.voices.get_voice(voice_id=VOICE_ID)

# Predict duration (no credits deducted)
duration = client.text_to_speech.predict_duration(
    voice_id=VOICE_ID,
    text="How long will this be?",
    language="en",
)

# Get credit balance
balance = client.usage.get_credit_balance()
Errors live in supertone.errors and all extend SupertoneError:
from supertone import Supertone, errors

try:
    response = client.text_to_speech.create_speech(...)
except errors.TooManyRequestsErrorResponse as e:
    # 429: back off and retry
    print("Rate limited:", e.message)
except errors.UnauthorizedErrorResponse as e:
    # 401: bad or missing API key
    print("Auth failed:", e.message)
except errors.PaymentRequiredErrorResponse as e:
    # 402: out of credits
    print("Buy more credits:", e.message)
except errors.SupertoneError as e:
    # Any other API error
    print(f"HTTP {e.status_code}: {e.message}")
| Error class | HTTP status |
| --- | --- |
| BadRequestErrorResponse | 400 |
| UnauthorizedErrorResponse | 401 |
| PaymentRequiredErrorResponse | 402 |
| ForbiddenErrorResponse | 403 |
| NotFoundErrorResponse | 404 |
| RequestTimeoutErrorResponse | 408 |
| PayloadTooLargeErrorResponse | 413 |
| UnsupportedMediaTypeErrorResponse | 415 |
| TooManyRequestsErrorResponse | 429 |
| InternalServerErrorResponse | 500 |
Network errors (DNS failure, broken pipe, etc.) come from httpx and don’t inherit from SupertoneError.
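Because rate-limit errors and network errors come from different exception hierarchies, a retry wrapper has to be told about both. The following is a minimal sketch of exponential backoff with jitter; `call_with_retry` and its parameters are hypothetical and not part of the SDK, and the delay values are arbitrary defaults.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on the given exception types with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (0.5 s, 1 s, 2 s, ...) plus up to 100 ms of jitter.
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1))
    raise RuntimeError("max_attempts must be at least 1")
```

A call might pass `retry_on=(errors.TooManyRequestsErrorResponse, httpx.TransportError)` and wrap the request in a lambda, assuming `httpx` is importable alongside the SDK. Errors such as 401 or 402 are deliberately left out of `retry_on`, since retrying will not fix a bad key or an empty credit balance.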