Transient failures (429, 408, 500, network drops) are normal in any networked service. The right response is exponential backoff with jitter, applied only to retryable errors. Both SDKs ship with a configurable retry policy out of the box.

What to retry

| Status / failure | Retryable? | Notes |
|---|---|---|
| 408 Request Timeout | ✅ | Server side; wait briefly and retry. |
| 429 Too Many Requests | ✅ | Rate limit; back off. |
| 500 Internal Server Error | ✅ | Treat as transient. |
| 502, 503, 504 | ✅ | Network/upstream; retry with backoff. |
| Network errors (DNS, broken pipe, connect timeout) | ✅ | Retry; these are pure transport failures. |
| 400, 401, 402, 403, 404, 413, 415 | ❌ | Caller-side problem; fix the request first. |
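
The table can be turned into a small classifier. This is a minimal sketch rather than SDK code: the names `RETRYABLE_STATUS`, `TRANSPORT_ERRORS`, and `is_retryable` are hypothetical, and the exception types are the ones the Python standard library raises for transport failures. An HTTP client library may wrap them in its own exception classes.

```python
import socket

# Status codes the table marks as retryable.
RETRYABLE_STATUS = frozenset({408, 429, 500, 502, 503, 504})

# Transport-level failures (DNS lookup, broken pipe, connect timeout) as the
# standard library surfaces them. BrokenPipeError is a ConnectionError.
TRANSPORT_ERRORS = (ConnectionError, TimeoutError, socket.timeout, socket.gaierror)


def is_retryable(status=None, exc=None):
    """Return True if a failed call is worth retrying.

    Pass `exc` for a transport failure, or `status` for an HTTP response.
    """
    if exc is not None:
        return isinstance(exc, TRANSPORT_ERRORS)
    return status in RETRYABLE_STATUS
```

Anything the classifier rejects, such as a 400 or 404, should be surfaced to the caller immediately instead of being retried.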

SDK configuration

Both SDKs accept a retry config at the client level and per-call.
import os

from supertone import Supertone
from supertone.utils.retries import RetryConfig

client = Supertone(
    api_key=os.environ["SUPERTONE_API_KEY"],
    retry_config=RetryConfig(
        strategy="backoff",
        backoff={
            "initial_interval": 500,        # ms
            "max_interval": 60_000,         # ms
            "exponent": 1.5,
            "max_elapsed_time": 3_600_000,  # ms (1 hour cap)
        },
        retry_connection_errors=True,
    ),
)
This retries on the SDK's default set of retryable status codes (429, 5xx) with exponential backoff, capped at 1 minute between retries and 1 hour total. Tune the numbers to your latency SLO.
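
To get a feel for these numbers, it can help to print the nominal wait schedule. The sketch below assumes the wait before retry n is `initial_interval * exponent**n`, capped at `max_interval`, and ignores any jitter the SDK may add; the SDK's exact formula may differ. `backoff_schedule` is a hypothetical helper, not part of the SDK.

```python
def backoff_schedule(initial_ms=500, max_ms=60_000, exponent=1.5, retries=14):
    """Return the nominal wait (in ms) before each retry, without jitter."""
    return [min(initial_ms * exponent ** n, max_ms) for n in range(retries)]


schedule = backoff_schedule()
for n, wait_ms in enumerate(schedule, start=1):
    print(f"retry {n:2d}: wait {wait_ms / 1000:6.2f} s")

# Zero-based index of the first retry whose wait reaches the max_interval cap.
first_capped = next(i for i, w in enumerate(schedule) if w >= 60_000)
```

Under that assumption the waits start at 0.5 s, 0.75 s, and 1.125 s, and reach the 60 s cap at the 13th retry. From there, `max_elapsed_time` is the only remaining bound, so a much smaller value is worth considering for interactive workloads.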

Manual retry pattern

If you call the REST API directly or want to wrap the SDK with your own logic:
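import random
import time

# Assumed import path for the SDK's error classes; adjust to your SDK version.
from supertone import errors


def call_with_backoff(fn, *, max_attempts=5, base_ms=500, max_ms=60_000):
    """Call fn(), retrying retryable SDK errors with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except (
            errors.TooManyRequestsErrorResponse,
            errors.InternalServerErrorResponse,
            errors.RequestTimeoutErrorResponse,
        ):
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts - 1:
                raise
            # Exponential growth, capped at max_ms.
            delay_ms = min(base_ms * (2 ** attempt), max_ms)
            # Jitter to avoid thundering herd: wait 50-100% of the computed delay.
            delay_ms = delay_ms / 2 + random.uniform(0, delay_ms / 2)
            time.sleep(delay_ms / 1000)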

Choosing the right backoff

  • 429 from rate limiting: start with a wait of 500–1000 ms and double it on each retry, up to a ceiling of 30–60 s; cap retries at 3–5 attempts.
  • 500/5xx from transient errors: same shape, but slightly more aggressive (300 ms initial wait), since these usually clear quickly.
  • Streaming requests: retrying a partial stream is fine if you haven't started playback yet; once playback has begun, it's usually better to fail than to splice in fresh audio. Decide based on UX.
  • Long-text auto-chunking: both SDKs apply the same retry policy to each underlying segment, so the retry budget of a single long-text call applies per segment rather than once for the whole call.
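
One way to encode the first two bullets is a pair of backoff profiles selected by status code. This is an illustrative sketch: `BackoffProfile`, `RATE_LIMIT`, `SERVER_ERROR`, and `profile_for` are hypothetical names, the values are picked from within the suggested ranges, and grouping 408 with the 5xx profile is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackoffProfile:
    initial_ms: int    # wait before the first retry
    max_ms: int        # ceiling for any single wait
    max_attempts: int  # total attempts, including the first call


# Example values from within the suggested ranges; tune to your workload.
RATE_LIMIT = BackoffProfile(initial_ms=1_000, max_ms=60_000, max_attempts=5)
SERVER_ERROR = BackoffProfile(initial_ms=300, max_ms=30_000, max_attempts=5)


def profile_for(status):
    """Pick a backoff profile for a status code, or None to fail fast."""
    if status == 429:
        return RATE_LIMIT
    # Assumption: 408 is treated like a transient server-side error.
    if status == 408 or 500 <= status <= 599:
        return SERVER_ERROR
    return None
```

Returning `None` for every other status keeps caller-side errors out of the retry path.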

Idempotency

Supertone API calls are idempotent in effect: the same request produces the same audio output (modulo small synthesis variability). Retrying a successful-but-aborted request won't double-bill you for credits as long as the original request didn't produce billed audio. Voice-cloning uploads are the exception: every upload that completes creates a separate voice, so retries can create duplicates. If you re-upload after a network blip, verify whether the previous attempt actually finished before posting again; otherwise you'll end up with duplicate voices in list_custom_voices.

Anti-patterns to avoid

  • Retrying 4xx (other than 429): these don't get better with time. Fix the request.
  • No upper bound: always cap the number of attempts and the total elapsed time.
  • No jitter: fixed-delay retries cause thundering-herd patterns under widespread failures.
  • Retrying inside a loop without backoff: burns through credits and rate-limit budget in seconds.
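
A retry loop that avoids all four anti-patterns could be sketched as follows. `retry_bounded` and its parameters are hypothetical, and the caller supplies the `is_retryable` predicate. The `sleep` and `clock` arguments exist only so the loop can be tested without real waiting.

```python
import random
import time


def retry_bounded(fn, is_retryable, *, max_attempts=5, max_elapsed_s=30.0,
                  base_s=0.5, max_delay_s=60.0, sleep=time.sleep,
                  clock=time.monotonic):
    """Call fn() with jittered exponential backoff, bounded two ways."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    start = clock()
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Non-retryable errors and the final attempt fail immediately.
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            delay = min(base_s * 2 ** attempt, max_delay_s)
            # Jitter: wait between 50% and 100% of the computed delay.
            delay = delay / 2 + random.uniform(0, delay / 2)
            # Give up rather than sleep past the total time budget.
            if clock() - start + delay > max_elapsed_s:
                raise
            sleep(delay)
```

The attempt cap bounds how many requests are sent, while the elapsed-time cap bounds how long the caller can be blocked; either one alone leaves the other unbounded.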

Rate limits

What triggers 429 in the first place.

Error handling

Full reference of error codes and SDK error classes.