Multi-Voice Text-to-Speech for Stories and Audiobooks
Enter your text below. Use speaker tags like [narrator]text[/narrator] or
[speaker1]text[/speaker1] for multi-voice stories, or plain text for single voice.
You can use any name: [alice], [bob], [narrator], etc.
💡 Tip: You can submit multiple jobs! Check the Job Queue tab to monitor all your generations.
Monitor active and pending audio generation jobs.
Loading queue...
Browse built-in, custom, and prompt-based voice libraries.
Explore all Kokoro voices. Click any voice to hear a preview sample.
Blend two Kokoro voices and manage your custom library.
Loading custom voices...
Design a Qwen3 voice, preview it, then save it to your prompts.
Manage prompt libraries for Chatterbox, VoxCPM, and Qwen3.
Drop multiple WAV/MP3 clips (5–10s). We'll keep any clips ≥5s and name them after the file.
| Name | Gender | Language | Duration | Source | Actions | |
|---|---|---|---|---|---|---|
| No saved voices yet. Add one above. | ||||||
Hidden voices live here until you restore them.
| Name | Gender | Language | Duration | Source | Actions | |
|---|---|---|---|---|---|---|
| No archived voices. | ||||||
Configure defaults, engine details, audio generation, and LLM prep behavior.
Default engine and output format used when TTS-Story starts.
Kokoro uses built-in voices. Configure the default voice on the Generate tab.
For Replicate, add your API key in the API Keys tab.
Local Chatterbox Turbo. Requires CUDA GPU with ~8GB VRAM.
Cloud-hosted Chatterbox on Replicate. Requires API key.
VoxCPM 1.5 for expressive speech and voice cloning. Uses same prompts as Chatterbox.
CPU-only Pocket TTS with voice prompts (.wav/.mp3 or .safetensors). Built-in voices run faster than cloning prompts.
Ultra-lightweight (<25MB) CPU-only TTS. No GPU required. Install: pip install https://github.com/KittenML/KittenTTS/releases/download/0.8/kittentts-0.8.0-py3-none-any.whl
Zero-shot voice cloning by Bilibili. Requires an isolated venv under engines/index-tts/. Uses the existing Voice Prompts library for reference audio.
engines/index-tts/checkpoints/.
Qwen3-TTS CustomVoice with built-in speakers and instruction control.
For faster GPU inference, install Microsoft Visual C++ Build Tools and select “Desktop development with C++” to enable flash-attn.
API keys for cloud-based engines.
Use Gemini or a local LLM (LM Studio / Ollama) to clean up or enhance text before TTS.
/v1. Ollama uses http://localhost:11434.
Browse tips and walkthroughs for each part of the Generate page.