Stable Diffusion API

Text to Voice API — Clone & Speak

Clone a voice from reference audio and synthesize speech in that voice. Multiple languages are supported.

Text to Voice (Cloning)

Clone a target voice from a reference clip, then synthesize new speech in that voice.

curl -X POST 'https://stablediffusionapi.com/api/v6/text_to_voice' \
  -H 'Content-Type: application/json' \
  -d '{
    "key": "YOUR_API_KEY",
    "text": "Speak this in the cloned voice.",
    "reference_audio": "https://example.com/voice-sample.wav",
    "language": "en"
  }'
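
The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the function names `build_text_to_voice_request` and `send` are hypothetical, the `Content-Type: application/json` header is an assumption about what the endpoint expects, and the response format is not documented here, so the raw body is returned unparsed.

```python
import json
import urllib.request

# Endpoint copied from the curl example above.
API_URL = "https://stablediffusionapi.com/api/v6/text_to_voice"


def build_text_to_voice_request(api_key, text, reference_audio, language="en"):
    """Build (but do not send) a POST request mirroring the curl example.

    Hypothetical helper: the field names come from the curl example;
    everything else is an illustrative assumption.
    """
    payload = {
        "key": api_key,
        "text": text,
        "reference_audio": reference_audio,
        "language": language,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        # Assumption: the endpoint expects a JSON request body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

To use it, build a request with `build_text_to_voice_request("YOUR_API_KEY", "Speak this in the cloned voice.", "https://example.com/voice-sample.wav")` and pass the result to `send`. Keeping request construction separate from the network call makes the payload easy to inspect before anything is sent.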

Best results: use a clean 5-10 second reference clip with a single speaker and no music.
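
The length guidance above can be checked locally before uploading a clip. The sketch below is an assumption-laden illustration: it handles only uncompressed WAV files via Python's standard `wave` module, the helper names are hypothetical, and it does not attempt to detect multiple speakers or music.

```python
import wave

# Recommended reference clip length, from the guidance above.
MIN_SECONDS = 5.0
MAX_SECONDS = 10.0


def wav_duration_seconds(source):
    """Return the duration in seconds of a WAV file.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_recommended_length(source):
    """True if the clip falls inside the recommended 5-10 second range."""
    return MIN_SECONDS <= wav_duration_seconds(source) <= MAX_SECONDS
```

For example, `is_recommended_length("voice-sample.wav")` returns `False` for a 2-second clip, which would be a cue to record a longer sample.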