Meet OmniVoice Studio: The Local, Open Source Alternative to ElevenLabs

OmniVoice Studio – How to Use it
What is OmniVoice Studio?
OmniVoice Studio is an open source desktop application for voice cloning, video dubbing, real-time dictation, and speaker diarization. Everything runs locally on your device. No API keys, no cloud account, no registration required.
- TTS in 646 languages with the default OmniVoice engine
- Transcription in 99 languages with WhisperX
- Available on macOS, Windows, and Linux
- GPU is optional – the full pipeline runs on the CPU
- Free for personal use, teaching, and research (FSL-1.1-ALv2)
System Requirements
GPU is optional. Without one, TTS runs roughly 3× slower on the CPU. With ≤8 GB VRAM, TTS is automatically offloaded to the CPU during transcription, with no configuration required.
| Component | Minimum | Recommended |
|---|---|---|
| OS | Win 10 / macOS 12+ / Ubuntu 20.04+ | Any modern 64-bit OS |
| RAM | 8 GB | 16 GB+ |
| VRAM | 4 GB (auto-offload) | 8 GB+ (RTX 3060+) |
| Disk | 10 GB free | 20 GB+ SSD |
| Python | 3.10+ | 3.11–3.12 |
| GPU | Optional | CUDA / MPS / ROCm |
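The CPU fallback described in this section can be pictured as a small decision function. This is a minimal sketch, not the project's code: `choose_tts_device` and its parameters are hypothetical names, and the 8 GB threshold is the only value taken from the text.

```python
from typing import Optional

# From the text: with <=8 GB VRAM, TTS moves to the CPU during transcription.
VRAM_OFFLOAD_THRESHOLD_GB = 8.0


def choose_tts_device(vram_gb: Optional[float], transcribing: bool) -> str:
    """Return "gpu" or "cpu" for the TTS model (hypothetical helper).

    vram_gb is None when no GPU is present.
    """
    if vram_gb is None:
        return "cpu"  # no GPU: the full pipeline runs on the CPU
    if transcribing and vram_gb <= VRAM_OFFLOAD_THRESHOLD_GB:
        return "cpu"  # leave the VRAM to transcription
    return "gpu"


if __name__ == "__main__":
    for vram, busy in [(None, False), (4, True), (8, True), (12, True), (4, False)]:
        print(f"vram={vram} transcribing={busy} -> {choose_tts_device(vram, busy)}")
```

How the app actually detects VRAM and when it moves the model back are not covered in the text, so the sketch leaves both out.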
Installation
The project recommends running from source. Install the three prerequisites first: ffmpeg, Bun (JS runtime), and uv (Python package manager).
git clone https://github.com/debpalash/OmniVoice-Studio
cd OmniVoice-Studio
uv sync
bun install
bun dev
This starts the frontend and the API; the API runs on port 8000.
Model weights are downloaded automatically on the first generation.
Prebuilt installers are available: macOS DMG, Windows MSI, Linux AppImage and .deb — see the Releases page on GitHub.
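Before running the commands above, it can help to confirm that the three prerequisites are on your PATH. The check below is a small sketch using only the Python standard library. The executable names `ffmpeg`, `bun`, and `uv` are assumed to match the tool names, and `missing_tools` is a hypothetical helper, not part of the project.

```python
import shutil

# Executable names are assumptions: the text names the tools, not their binaries.
PREREQUISITES = ("ffmpeg", "bun", "uv")


def missing_tools(tools=PREREQUISITES):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Install before continuing:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

Run it once before `uv sync`. An empty result means all three tools were found.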
Voice Cloning
Voice cloning uses zero-shot inference: it reproduces a voice from a clip as short as 3 seconds, without prior training on that voice. The default OmniVoice engine conditions a diffusion-based TTS model on the reference audio.
- Go to the Voice Clone tab in the UI
- Upload or record a 3-second audio clip of the target voice
- Enter your text and select the target language (646 available)
- Click Generate; the output is saved to your project library
Voice Gallery: Search YouTube, browse categories, and download reference clips right within the app to build your voice library.
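If you prepare reference clips outside the app, a quick length check avoids uploading audio that is too short. The sketch below reads the duration of an uncompressed WAV file with the standard `wave` module. Treating 3 seconds as a hard minimum is an assumption based on the figure quoted above, and both function names are hypothetical.

```python
import wave

# Assumption: the 3-second figure from the text is used here as a minimum length.
MIN_REFERENCE_SECONDS = 3.0


def wav_duration_seconds(path: str) -> float:
    """Duration of an uncompressed WAV file, computed from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(path: str, minimum: float = MIN_REFERENCE_SECONDS) -> bool:
    """True when the clip is at least `minimum` seconds long."""
    return wav_duration_seconds(path) >= minimum
```

Example: `is_usable_reference("my_voice.wav")` returns `False` for a 1-second clip. Other containers such as MP3 or M4A would need a different reader.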
Video Dubbing
The full dubbing pipeline runs locally: transcribe → translate → synthesize → mux. Demucs separates the vocals so that the original background audio is preserved in the final export.
- Go to the Dub tab, then paste a YouTube URL or upload a local file
- WhisperX transcribes the speech with word-level alignment
- Select the target language; translation starts automatically
- The TTS engine re-voices the translated transcript; Demucs keeps the background audio
- Export the final MP4 with the dubbed audio muxed in
Batch queue: Queue up to 50 videos and walk away. Each job has its own progress bar covering the full pipeline.
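The four-stage flow and the per-job progress reporting can be sketched as a chain of functions. This is an illustration only: the stage bodies are placeholders rather than WhisperX, a translator, a TTS engine, or a muxer, and Demucs separation is not modeled. `run_job`, `run_batch`, and the job dictionary layout are hypothetical; only the stage order and the 50-video limit come from the text.

```python
from typing import Callable, List, Optional, Tuple

MAX_BATCH = 50  # the text gives a limit of 50 videos per batch

Stage = Tuple[str, Callable[[dict], dict]]
ProgressFn = Callable[[str, str, float], None]


def transcribe(job: dict) -> dict:
    return {**job, "transcript": f"<transcript of {job['source']}>"}


def translate(job: dict) -> dict:
    return {**job, "translation": f"<{job['target_lang']}: {job['transcript']}>"}


def synthesize(job: dict) -> dict:
    return {**job, "speech": f"<speech for {job['translation']}>"}


def mux(job: dict) -> dict:
    return {**job, "output": job["source"] + ".dubbed.mp4"}


PIPELINE: List[Stage] = [
    ("transcribe", transcribe),
    ("translate", translate),
    ("synthesize", synthesize),
    ("mux", mux),
]


def run_job(source: str, target_lang: str, on_progress: Optional[ProgressFn] = None) -> dict:
    """Run one video through every stage, reporting progress after each."""
    job = {"source": source, "target_lang": target_lang}
    for done, (name, stage) in enumerate(PIPELINE, start=1):
        job = stage(job)
        if on_progress:
            on_progress(source, name, done / len(PIPELINE))
    return job


def run_batch(sources: List[str], target_lang: str,
              on_progress: Optional[ProgressFn] = None) -> List[dict]:
    """Run several videos in order, enforcing the batch limit."""
    if len(sources) > MAX_BATCH:
        raise ValueError(f"batch limited to {MAX_BATCH} videos")
    return [run_job(source, target_lang, on_progress) for source in sources]


if __name__ == "__main__":
    run_batch(["talk.mp4"], "es", lambda src, stage, frac: print(src, stage, f"{frac:.0%}"))
```

Each stage returns a new dictionary instead of mutating its input, so a failed stage leaves the earlier results intact for a retry.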
Dictation & Speaker Diarization
Dictation works system-wide, in any application on any supported operating system. Diarization identifies individual speakers in a multi-speaker audio file using Pyannote + WhisperX.
- Press ⌘+⇧+Space (macOS) to open the floating dictation widget
- The transcript streams via WebSocket and is pasted automatically into the active input field
- Upload a multi-speaker file to the Diarization tab
- Pyannote labels who said what; each speaker gets an automatically generated voice profile
- Assign a TTS voice to each speaker for per-speaker dubbing
A Hugging Face token is required for Pyannote diarization. See docs/setup/huggingface-token.md in the repo.
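To make the per-speaker step concrete, here is a sketch of how diarized segments might be grouped and paired with voices. The `Segment` layout, the `SPEAKER_00` style labels, and both helper functions are assumptions for illustration, not the app's data model.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Segment(NamedTuple):
    start: float   # seconds
    end: float     # seconds
    speaker: str   # diarization label, e.g. "SPEAKER_00" (assumed format)
    text: str


def speaking_time(segments: List[Segment]) -> Dict[str, float]:
    """Total seconds of speech per speaker."""
    totals: Dict[str, float] = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end - seg.start
    return dict(totals)


def assign_voices(segments: List[Segment], voices: Dict[str, str],
                  default_voice: str) -> List[Tuple[Segment, str]]:
    """Pair each segment with the TTS voice chosen for its speaker."""
    return [(seg, voices.get(seg.speaker, default_voice)) for seg in segments]


if __name__ == "__main__":
    demo = [
        Segment(0.0, 2.0, "SPEAKER_00", "Welcome back."),
        Segment(2.0, 3.5, "SPEAKER_01", "Thanks for having me."),
    ]
    print(speaking_time(demo))
    for seg, voice in assign_voices(demo, {"SPEAKER_00": "host"}, "narrator"):
        print(f"{seg.start:>4.1f}s {voice}: {seg.text}")
```

Speakers without an explicit mapping fall back to `default_voice`, so a new speaker in the file never blocks the export.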
TTS Engines
Six TTS engines are built in. Switch between them via Settings → TTS Engine or an env var: OMNIVOICE_TTS_BACKEND=cosyvoice
| Engine | Languages | Clone | Platform |
|---|---|---|---|
| OmniVoice (default) | 600+ | ✓ | CUDA / MPS / CPU |
| CosyVoice 3 | 9 + 18 dialects | ✓ | CUDA / MPS / CPU |
| MLX-Audio | Multiple | Varies | Apple Silicon only |
| VoxCPM2 | 30 | ✓ | CUDA / MPS / CPU |
| MOSS-TTS-Nano | 20 | ✓ | CUDA / CPU |
| KittenTTS | English | ✗ | CPU only |
Custom engine: Subclass TTSBackend in backend/services/tts_backend.py and add it to _REGISTRY. About 50 lines of Python.
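The subclass-and-register pattern looks roughly like the sketch below. Everything here is a stand-in: the real base class lives in backend/services/tts_backend.py and its method names and signatures may differ, so `synthesize`, `SilenceBackend`, and `load_backend` are assumptions. Only the names `TTSBackend`, `_REGISTRY`, and `OMNIVOICE_TTS_BACKEND` come from the text.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict, Optional, Type


class TTSBackend(ABC):
    """Stand-in for the project's base class (interface assumed, not copied)."""

    name: str

    @abstractmethod
    def synthesize(self, text: str, language: str,
                   reference_audio: Optional[bytes] = None) -> bytes:
        """Return raw audio bytes for `text`."""


# Stand-in registry mapping engine names to classes.
_REGISTRY: Dict[str, Type[TTSBackend]] = {}


class SilenceBackend(TTSBackend):
    """Toy engine: emits 160 silent 16-bit samples per input character."""

    name = "silence"

    def synthesize(self, text: str, language: str,
                   reference_audio: Optional[bytes] = None) -> bytes:
        samples_per_char = 160  # 10 ms at an assumed 16 kHz sample rate
        return b"\x00\x00" * samples_per_char * len(text)


_REGISTRY[SilenceBackend.name] = SilenceBackend


def load_backend(default: str = "silence") -> TTSBackend:
    """Pick an engine from OMNIVOICE_TTS_BACKEND, falling back to `default`."""
    key = os.environ.get("OMNIVOICE_TTS_BACKEND", default)
    if key not in _REGISTRY:
        raise ValueError(f"unknown TTS backend {key!r}; registered: {sorted(_REGISTRY)}")
    return _REGISTRY[key]()
```

Check the actual base class in the repo before copying this shape. The point is only that a new engine is one class plus one registry entry.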
MCP Server and Resources
OmniVoice Studio ships with a built-in MCP server, which exposes its voice and dubbing capabilities to any MCP-compatible client (Claude, Cursor, or your own tools) without opening the desktop UI.
- The MCP server starts with the FastAPI backend when you run bun dev
- Point your MCP client to a local server to access all endpoints
- AudioSeal (Meta) embeds an invisible neural watermark in all AI-generated audio
- GitHub: github.com/debpalash/OmniVoice-Studio
- Install docs: docs/install/ (macos/windows/linux/docker)
- Troubleshooting: docs/install/troubleshooting.md
- Discord: discord.gg/bzQavDfVV9
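If you are writing your own MCP client, the first messages it sends are plain JSON-RPC 2.0. The sketch below only builds the `initialize` and `tools/list` requests defined by the Model Context Protocol. It does not send them, because the text does not state the server's transport or address. The helper names and the client name are hypothetical, and the tool names the server returns are not listed in the text.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC request ids must be unique within a session


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


def initialize_request(client_name: str, client_version: str) -> str:
    """First message of an MCP session: the client announces itself."""
    return jsonrpc_request("initialize", {
        "protocolVersion": "2025-03-26",  # one published MCP revision; match your server
        "capabilities": {},
        "clientInfo": {"name": client_name, "version": client_version},
    })


def list_tools_request() -> str:
    """Ask the server which tools it exposes (sent after initialization completes)."""
    return jsonrpc_request("tools/list")


if __name__ == "__main__":
    print(initialize_request("my-client", "0.1.0"))
    print(list_tools_request())
```

Existing clients such as Claude or Cursor handle this handshake for you. Building the messages by hand is mainly useful for debugging.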



