What you can do
- Synthesize speech with control over voice, language, speed, pitch, and emotion style.
- Discover voices by language, gender, age, use case, or style — and preview samples before committing.
- Clone and manage custom voices from a local audio file.
- Track usage — check credit balance and usage history.
- Stitch audio — merge multiple clips into one file, with optional silence gaps or crossfades.
Prerequisites
- uv installed (provides uvx), or Python with pip.
- A Supertone API key from the Developer Console.
Install
Every client runs the same server, uvx supertone-mcp, with your API key passed as an environment variable. Pick your client below.
- Cursor
- Claude Desktop
- Claude Code
- VS Code
- Windsurf
Add the server entry to ~/.cursor/mcp.json (global) or .cursor/mcp.json (per-project), then fill in your API key.
Environment variables
| Variable | Required | Default | Purpose |
|---|---|---|---|
| SUPERTONE_API_KEY | Yes | None | Authentication |
| SUPERTONE_MCP_VOICE_ID | No | Aiden (multilingual) | Default voice_id for text_to_speech |
| SUPERTONE_OUTPUT_DIR | No | ~/supertone-tts-output/ | Where generated audio files are saved |
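As a minimal sketch of how the command and these variables fit together, the Python below builds a client config entry and prints it as JSON. The top-level mcpServers key follows the layout Cursor and Claude Desktop commonly use, and the server name supertone and the build_server_entry helper are illustrative assumptions rather than names confirmed by this page; check your client's documentation for its exact schema.

```python
import json


def build_server_entry(api_key, voice_id=None, output_dir=None):
    """Return an MCP client config dict for the Supertone server.

    Assumes a Cursor-style "mcpServers" layout; the "supertone" key is a
    hypothetical name chosen for this example.
    """
    env = {"SUPERTONE_API_KEY": api_key}  # required
    if voice_id is not None:
        env["SUPERTONE_MCP_VOICE_ID"] = voice_id  # optional default voice
    if output_dir is not None:
        env["SUPERTONE_OUTPUT_DIR"] = output_dir  # optional output folder
    return {
        "mcpServers": {
            "supertone": {
                "command": "uvx",
                "args": ["supertone-mcp"],
                "env": env,
            }
        }
    }


# Print a config with only the required variable set.
print(json.dumps(build_server_entry("YOUR_API_KEY"), indent=2))
```

Optional variables are left out of the env block unless you pass them, so the server falls back to the defaults listed in the table.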
Tools
The server exposes its capabilities as composable building blocks the agent can chain.
Speech synthesis
| Tool | Description |
|---|---|
| text_to_speech | Generate audio with control over speed, pitch, emotion style, and output format. |
| predict_duration | Estimate synthesis duration and credit cost before generating. |
Voice discovery
| Tool | Description |
|---|---|
| search_voice | Filter preset voices by language, gender, age, use case, or style. |
| get_voice | Retrieve full details for a voice. |
| preview_voice | Fetch sample audio URLs to evaluate a voice. |
Voice cloning
| Tool | Description |
|---|---|
| clone_voice | Create a cloned voice from a local WAV/MP3 (≤ 3 MB). |
| search_custom_voice | List and filter your cloned voices. |
| get_custom_voice | Fetch details for a cloned voice. |
| edit_custom_voice | Update a cloned voice’s name or description. |
| delete_custom_voice | Permanently remove a cloned voice (irreversible). |
Usage & credits
| Tool | Description |
|---|---|
| get_credit_balance | Check remaining credits. |
| get_usage_history | View usage over a time window. |
| get_voice_usage | Usage metrics for a specific voice. |
Audio editing
| Tool | Description |
|---|---|
| merge_audio_files | Merge two or more local audio files into one using plain concatenation, silence gaps (gap_ms), or crossfade blending (crossfade_ms). Useful for stitching multiple text_to_speech outputs. |
Key text_to_speech parameters
- text (required), voice_id, language, output_format (mp3/wav)
- model: e.g. sona_speech_2_flash, sona_speech_1
- speed (0.5–2.0), pitch_shift (−24 to +24 semitones), style
- output_mode (files/resources/both), autoplay (default false), streaming (sona_speech_1 only)
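The ranges and allowed values above can be checked on the client side before a call is made. The sketch below is one way to do that; check_tts_args is a hypothetical helper, not part of the server, and it assumes the numeric bounds are inclusive and that streaming needs the model set explicitly, since the default model is not stated here.

```python
ALLOWED_FORMATS = {"mp3", "wav"}
ALLOWED_OUTPUT_MODES = {"files", "resources", "both"}


def check_tts_args(args):
    """Return a list of problems found in a text_to_speech argument dict."""
    problems = []
    if not args.get("text"):
        problems.append("text is required")

    fmt = args.get("output_format")
    if fmt is not None and fmt not in ALLOWED_FORMATS:
        problems.append("output_format must be mp3 or wav")

    speed = args.get("speed")
    if speed is not None and not 0.5 <= speed <= 2.0:  # bounds assumed inclusive
        problems.append("speed must be between 0.5 and 2.0")

    pitch = args.get("pitch_shift")
    if pitch is not None and not -24 <= pitch <= 24:  # semitones, assumed inclusive
        problems.append("pitch_shift must be between -24 and +24 semitones")

    mode = args.get("output_mode")
    if mode is not None and mode not in ALLOWED_OUTPUT_MODES:
        problems.append("output_mode must be files, resources, or both")

    # Assumption: an unset model is treated as unsuitable for streaming.
    if args.get("streaming") and args.get("model") != "sona_speech_1":
        problems.append("streaming is only available with sona_speech_1")

    return problems


print(check_tts_args({"text": "Hello", "speed": 3.0}))
# prints ['speed must be between 0.5 and 2.0']
```

An empty list means the arguments match the documented constraints; the server remains the authority on what it accepts.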
Key merge_audio_files parameters
- input_paths (required): two or more local audio file paths, in order. (A single path is returned unchanged.)
- gap_ms: silence inserted between clips, in milliseconds.
- crossfade_ms: crossfade blend between clips, in milliseconds. Mutually exclusive with gap_ms.
- output_format: override the output format. By default it’s auto-detected: all inputs sharing an extension → that extension; mixed → mp3.
Differing sample rates or channel counts are normalized automatically before merging. ffmpeg is provided by imageio-ffmpeg, so merging works out of the box with uvx supertone-mcp; no system ffmpeg install is required.
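The output format rule and the gap_ms/crossfade_ms restriction described for merge_audio_files can be sketched as follows. This is an illustration of the documented behavior, not the server's code: resolve_output_format is a hypothetical helper, and comparing extensions case-insensitively and falling back to mp3 for extensionless files are assumptions.

```python
from pathlib import Path


def resolve_output_format(input_paths, gap_ms=None, crossfade_ms=None,
                          output_format=None):
    """Pick the merged file's format the way the parameter notes describe."""
    if not input_paths:
        raise ValueError("input_paths is required")
    if gap_ms is not None and crossfade_ms is not None:
        raise ValueError("gap_ms and crossfade_ms are mutually exclusive")

    # An explicit output_format overrides auto-detection.
    if output_format is not None:
        return output_format

    # Collect the distinct extensions, ignoring case (an assumption).
    extensions = {Path(p).suffix.lower().lstrip(".") for p in input_paths}
    if len(extensions) == 1:
        shared = extensions.pop()
        if shared:  # all inputs share a non-empty extension
            return shared
    return "mp3"  # mixed (or missing) extensions fall back to mp3


print(resolve_output_format(["intro.wav", "body.wav"]))  # prints wav
print(resolve_output_format(["intro.wav", "body.mp3"]))  # prints mp3
```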
Example workflows
Discover → preview → estimate → synthesize
“Find a calm Korean female voice, let me hear a sample, check the cost, then make this announcement as mp3.”
Chains: search_voice() → preview_voice() → predict_duration() + get_credit_balance() → text_to_speech().
Clone and use immediately
“Create a cloned voice from ~/recordings/sample.wav named MyVoice, then read this greeting with it and play it.”
Chains: clone_voice() → get_custom_voice() → text_to_speech(autoplay=true).
Narrate a script and stitch it together
“Generate each paragraph of this script, then merge them into one mp3 with a short pause between each.”
Chains: text_to_speech() per segment → merge_audio_files(gap_ms=...).
Troubleshooting
The client doesn't list the Supertone tools
Make sure the config file is valid JSON and the client was fully restarted. Most clients only load MCP servers at startup.
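A quick way to rule out a malformed config is to parse it yourself. The sketch below reports JSON syntax errors with their position and also checks that the API key sits in the server's env block. It assumes a Cursor-style mcpServers layout and a server entry named supertone, both of which are illustrative; some clients use a different top-level key.

```python
import json


def diagnose_config(raw_text, server_name="supertone"):
    """Return a list of problems found in an MCP client config string."""
    try:
        config = json.loads(raw_text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"]

    # Assumed layout: {"mcpServers": {"<name>": {"command": ..., "env": {...}}}}
    servers = config.get("mcpServers") if isinstance(config, dict) else None
    if not isinstance(servers, dict) or not isinstance(servers.get(server_name), dict):
        return [f"no mcpServers entry named {server_name!r}"]

    problems = []
    env = servers[server_name].get("env")
    if not isinstance(env, dict) or not env.get("SUPERTONE_API_KEY"):
        problems.append("SUPERTONE_API_KEY is missing from the env block")
    return problems


# A trailing comma is a common cause of invalid JSON.
print(diagnose_config('{"mcpServers": {},}'))
```

To check a real file, pass its contents, for example diagnose_config(open(path).read()) with path set to your client's config location.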
uvx: command not found
Install uv (which provides uvx): see the uv install guide. Alternatively, run pip install supertone-mcp and set the command to supertone-mcp.
Authentication errors
Confirm SUPERTONE_API_KEY is set in the server’s env block (not just your shell) and is valid. Get a key from the Developer Console.
Where did my audio go?
With output_mode: files, audio is written to SUPERTONE_OUTPUT_DIR (default ~/supertone-tts-output/). Set autoplay: true to also play it immediately.
Related
CLI
The same capabilities from your terminal and scripts.
Custom voices
How voice cloning works on Supertone.