ElBruno.VibeVoiceTTS
0.5.0
VibeVoiceTTS
A .NET library for text-to-speech synthesis using Microsoft's VibeVoice-Realtime-0.5B: native C# inference via ONNX Runtime, no Python required at runtime.
Features
- Natural Text-to-Speech: High-quality speech synthesis powered by VibeVoice-Realtime-0.5B
- NuGet Package: ElBruno.VibeVoiceTTS, install and start generating speech in minutes
- Pure C# Inference: ONNX Runtime, zero Python dependency at runtime
- GPU Acceleration: DirectML (any Windows GPU) and CUDA (NVIDIA) support with automatic CPU fallback
- Auto-Download: Models automatically downloaded from HuggingFace on first use
- 6 Voice Presets: Carter, Davis, Emma, Frank, Grace, Mike (English voices with experimental multilingual support)
- Dependency Injection: First-class IServiceCollection integration
- Cross-Platform: Windows, Linux, macOS, MAUI-ready
Installation
dotnet add package ElBruno.VibeVoiceTTS
Quick Start
1) Generate speech and save to WAV
using ElBruno.VibeVoiceTTS;
using var tts = new VibeVoiceSynthesizer();
await tts.EnsureModelAvailableAsync(); // auto-downloads ~1.5 GB on first run
float[] audio = await tts.GenerateAudioAsync("Hello! Welcome to VibeVoiceTTS.", "Carter");
tts.SaveWav("output.wav", audio);
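Because GenerateAudioAsync returns raw samples as a float[], it can be useful to sanity-check a clip before saving it. The sketch below is not part of the library: AudioMath and its methods are hypothetical helpers, and it assumes mono audio at the 24000 Hz default sample rate with samples normalized to roughly [-1, 1].

```csharp
using System;

// Hypothetical helper, not part of ElBruno.VibeVoiceTTS.
public static class AudioMath
{
    // Duration of a mono buffer: sample count divided by samples per second.
    // Assumption: the buffer holds a single channel.
    public static TimeSpan Duration(float[] samples, int sampleRate = 24000)
    {
        if (samples is null) throw new ArgumentNullException(nameof(samples));
        if (sampleRate <= 0) throw new ArgumentOutOfRangeException(nameof(sampleRate));
        return TimeSpan.FromSeconds(samples.Length / (double)sampleRate);
    }

    // Largest absolute sample value; returns 0 for an empty buffer.
    // Assumption: samples are normalized, so a peak near 1.0 hints at clipping.
    public static float Peak(float[] samples)
    {
        if (samples is null) throw new ArgumentNullException(nameof(samples));
        float peak = 0f;
        foreach (float s in samples)
        {
            float abs = Math.Abs(s);
            if (abs > peak) peak = abs;
        }
        return peak;
    }
}
```

With the audio array from the snippet above, AudioMath.Duration(audio) gives the clip length and AudioMath.Peak(audio) gives its loudest sample.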
2) Use voice presets
// Use the enum (recommended)
float[] carter = await tts.GenerateAudioAsync("Hello from Carter!", VibeVoicePreset.Carter);
float[] emma = await tts.GenerateAudioAsync("Hello from Emma!", VibeVoicePreset.Emma);
// Or use a string name; both short and internal names work
float[] audio = await tts.GenerateAudioAsync("Hello!", "Carter");
float[] audio2 = await tts.GenerateAudioAsync("Hello!", "en-Carter_man"); // also works
3) Discover available voices
// Voices currently downloaded on disk
string[] available = tts.GetAvailableVoices();
// → ["Carter", "Emma"] (default download includes Carter and Emma)
// All supported voices (including those not yet downloaded)
string[] supported = tts.GetSupportedVoices();
// → ["Carter", "Davis", "Emma", "Frank", "Grace", "Mike"]
// Detailed metadata for all supported voices
VoiceInfo[] details = tts.GetSupportedVoiceDetails();
foreach (var voice in details)
    Console.WriteLine($"{voice.Name} ({voice.Gender}, {voice.Language})");
On-demand voice download: Only Carter and Emma are downloaded by default with EnsureModelAvailableAsync(). Other voices (Davis, Frank, Grace, Mike) are automatically downloaded on first use when you call GenerateAudioAsync(). You can also pre-download a specific voice: await tts.EnsureVoiceAvailableAsync("Davis", progress);
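To pre-download every voice that is not yet on disk, one option is to diff the supported list against the available list. This is a minimal sketch: VoiceCheck is a hypothetical helper, and the name comparison is case-insensitive as a precaution rather than because the library is known to need it.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of ElBruno.VibeVoiceTTS.
public static class VoiceCheck
{
    // Returns the supported voices that are missing from the available list,
    // in the order they appear in the supported list.
    public static string[] MissingVoices(string[] supported, string[] available) =>
        supported.Except(available, StringComparer.OrdinalIgnoreCase).ToArray();
}

// Possible usage with the calls shown in this README:
//   var missing = VoiceCheck.MissingVoices(tts.GetSupportedVoices(), tts.GetAvailableVoices());
//   foreach (var voice in missing)
//       await tts.EnsureVoiceAvailableAsync(voice);
```

Downloading everything up front trades disk space and startup time for predictable latency on the first call to each voice.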
4) Track download progress
var progress = new Progress<DownloadProgress>(p =>
{
    if (p.Stage == DownloadStage.Downloading)
        Console.Write($"\r⬇️ [{p.CurrentFile}] {p.PercentComplete:F0}%");
    else
        Console.WriteLine($"{p.Stage}: {p.Message}");
});
await tts.EnsureModelAvailableAsync(progress);
5) Configure options
var options = new VibeVoiceOptions
{
    ModelPath = @"D:\models\vibevoice", // Custom model location (default: OS cache)
    DiffusionSteps = 20,                // Quality vs speed tradeoff
    CfgScale = 1.5f,                    // Classifier-free guidance scale
    SampleRate = 24000,                 // Output sample rate
};
using var tts = new VibeVoiceSynthesizer(options);
| Option | Default | Description |
|---|---|---|
| ModelPath | OS cache* | Directory where ONNX models are stored and downloaded |
| HuggingFaceRepo | elbruno/VibeVoice-Realtime-0.5B-ONNX | HuggingFace repo for model downloads |
| DiffusionSteps | 20 | Number of diffusion denoising steps |
| CfgScale | 1.5 | Classifier-free guidance scale |
| SampleRate | 24000 | Output audio sample rate (Hz) |
| Seed | 42 | Random seed for reproducible output |
| ExecutionProvider | Cpu | ONNX Runtime execution provider (Cpu, DirectML, Cuda) |
| GpuDeviceId | 0 | GPU device index (used with DirectML or CUDA) |
*Default model cache: Windows: %LOCALAPPDATA%\ElBruno\VibeVoice\models · Linux/macOS: ~/.local/share/elbruno/vibevoice/models
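If you want to inspect or clean the cache, the documented default locations can be rebuilt in code. The sketch below simply reconstructs the two paths listed above; it is not the library's own resolution logic, and ModelCache is a hypothetical helper.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Runtime.InteropServices;

// Hypothetical helper that mirrors the documented default cache locations.
public static class ModelCache
{
    public static string DefaultPath()
    {
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        {
            // %LOCALAPPDATA%\ElBruno\VibeVoice\models
            string localAppData = Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData);
            return Path.Combine(localAppData, "ElBruno", "VibeVoice", "models");
        }

        // ~/.local/share/elbruno/vibevoice/models on Linux and macOS
        string home = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);
        return Path.Combine(home, ".local", "share", "elbruno", "vibevoice", "models");
    }

    // Total size in bytes of all files under the directory, or 0 if it does not exist.
    public static long SizeOnDisk(string path) =>
        Directory.Exists(path)
            ? new DirectoryInfo(path).EnumerateFiles("*", SearchOption.AllDirectories).Sum(f => f.Length)
            : 0L;
}
```

Calling ModelCache.SizeOnDisk(ModelCache.DefaultPath()) before and after the first run is a quick way to confirm that the download landed where you expect.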
6) GPU Acceleration
Enable GPU acceleration by setting the execution provider and installing the corresponding NuGet package:
# For DirectML (any Windows GPU: NVIDIA, AMD, Intel):
dotnet add package Microsoft.ML.OnnxRuntime.DirectML
# For CUDA (NVIDIA only, Windows and Linux):
dotnet add package Microsoft.ML.OnnxRuntime.Gpu
// DirectML: recommended for Windows desktop apps
var options = new VibeVoiceOptions
{
    ExecutionProvider = ExecutionProvider.DirectML,
    GpuDeviceId = 0 // optional, selects which GPU
};
using var tts = new VibeVoiceSynthesizer(options);
// CUDA: for NVIDIA GPUs with CUDA drivers
var options = new VibeVoiceOptions
{
    ExecutionProvider = ExecutionProvider.Cuda,
    GpuDeviceId = 0
};
using var tts = new VibeVoiceSynthesizer(options);
Note: If the selected GPU provider is unavailable (missing NuGet package or no compatible GPU), the library automatically falls back to CPU inference. When using DirectML, models with dynamic tensor shapes (LM models, acoustic decoder) run on CPU while fixed-shape models (prediction head, connector, EOS classifier) use the GPU; this works around known DirectML limitations with dynamic Reshape and ConvTranspose operations.
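An app that ships on several platforms may want to choose the provider at startup instead of hard-coding it. The following is only a sketch of one possible policy, based on the guidance above (DirectML on Windows, CUDA where an NVIDIA setup is known to exist, CPU otherwise). ProviderChoice is a hypothetical helper, and the returned strings assume that ExecutionProvider is an enum whose members are named Cpu, DirectML, and Cuda as in the options table.

```csharp
using System.Runtime.InteropServices;

// Hypothetical helper, not part of ElBruno.VibeVoiceTTS.
public static class ProviderChoice
{
    // Picks a provider name from two facts the caller supplies.
    // hasCuda is the caller's own knowledge; this sketch does not probe for drivers.
    public static string Pick(bool isWindows, bool hasCuda)
    {
        if (isWindows) return "DirectML"; // recommended for Windows desktop apps
        if (hasCuda) return "Cuda";       // NVIDIA GPUs with CUDA drivers
        return "Cpu";                     // default provider
    }

    // Convenience overload that detects the OS at runtime.
    public static string PickForCurrentOs(bool hasCuda = false) =>
        Pick(RuntimeInformation.IsOSPlatform(OSPlatform.Windows), hasCuda);
}

// Possible usage, assuming ExecutionProvider is an enum:
//   var options = new VibeVoiceOptions
//   {
//       ExecutionProvider = Enum.Parse<ExecutionProvider>(ProviderChoice.PickForCurrentOs())
//   };
```

Because the library already falls back to CPU when a provider is unavailable, a wrong guess here should cost performance rather than correctness.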
7) Dependency Injection
builder.Services.AddVibeVoice(options =>
{
    options.DiffusionSteps = 20;
});
// Then inject IVibeVoiceSynthesizer in your services
Tip: For best results, keep sentences short (~10 words). Longer text may produce artifacts due to model limitations. Consider splitting long text into sentences.
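One way to follow the tip above is to split the input on sentence-ending punctuation, synthesize each piece separately, and join the results with a short pause. The sketch below is an assumption-laden illustration, not library functionality: TextChunker is a hypothetical helper, the regex is naive (abbreviations such as "Dr." will be split), and the default gap of 4800 samples corresponds to 0.2 s only at 24000 Hz.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of ElBruno.VibeVoiceTTS.
public static class TextChunker
{
    // Splits at whitespace that follows '.', '!' or '?', keeping the punctuation
    // with its sentence and dropping empty pieces.
    public static List<string> SplitSentences(string text) =>
        Regex.Split(text ?? string.Empty, @"(?<=[.!?])\s+")
             .Select(s => s.Trim())
             .Where(s => s.Length > 0)
             .ToList();

    // Concatenates sample buffers, inserting gapSamples zeros (silence) between them.
    public static float[] Join(IEnumerable<float[]> parts, int gapSamples = 4800)
    {
        var output = new List<float>();
        foreach (float[] part in parts)
        {
            if (output.Count > 0) output.AddRange(new float[gapSamples]);
            output.AddRange(part);
        }
        return output.ToArray();
    }
}

// Possible usage with the calls shown in this README:
//   var parts = new List<float[]>();
//   foreach (string sentence in TextChunker.SplitSentences(longText))
//       parts.Add(await tts.GenerateAudioAsync(sentence, "Carter"));
//   tts.SaveWav("long.wav", TextChunker.Join(parts));
```

Sentences longer than about 10 words would still need a further split, for example on commas, which this sketch does not attempt.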
Voices & Languages
| Voice | Gender | Preset Enum | Internal Name |
|---|---|---|---|
| Carter | Male | VibeVoicePreset.Carter | en-Carter_man |
| Davis | Male | VibeVoicePreset.Davis | en-Davis_man |
| Emma | Female | VibeVoicePreset.Emma | en-Emma_woman |
| Frank | Male | VibeVoicePreset.Frank | en-Frank_man |
| Grace | Female | VibeVoicePreset.Grace | en-Grace_woman |
| Mike | Male | VibeVoicePreset.Mike | en-Mike_man |
All 6 voice presets are available on HuggingFace and are downloaded on-demand when first used.
Migration note: In versions prior to 0.2.0, GetAvailableVoices() returned all 6 voices regardless of download status. Starting with 0.2.0, it returns only voices actually downloaded on disk. Use GetSupportedVoices() to see all 6 known presets. Voices are auto-downloaded on first use with GenerateAudioAsync(), or you can pre-download them with EnsureVoiceAvailableAsync("Davis").
Language support: The model is primarily trained on English, with experimental multilingual capabilities (e.g., Spanish, French, German). Results may vary for non-English text.
For full details on the model, supported languages, and voice characteristics, see the official VibeVoice documentation on HuggingFace and the VibeVoice GitHub repository.
For the complete API reference and advanced usage, see the Getting Started Guide.
Scenarios
This repository includes example projects demonstrating different ways to use VibeVoice:
| # | Status | Scenario | Stack | Level | Description |
|---|---|---|---|---|---|
| 1 | ✅ | Simple Python Script | Python | Beginner | Minimal TTS demo, useful for model export and testing |
| 2 | ✅ | Full-Stack App | Python + Blazor + Aspire | Intermediate | Web app with FastAPI backend and Blazor frontend |
| 3 | ✅ | C# Console App | C# (.NET 8) | Beginner | Recommended starting point: pure C# with ElBruno.VibeVoiceTTS |
| 4 | ✅ | Full C# with Aspire | C# + Blazor + Aspire | Intermediate | Full-stack C# app with WebAPI + Blazor frontend |
| 5 | ✅ | Batch Processing | Python | Intermediate | CLI to convert folders of .txt to .wav |
| 6 | ✅ | Real-Time Streaming | Python | Intermediate | Chunked audio playback for low latency |
| 7 | ✅ | MAUI Mobile | C# (.NET 10 MAUI) | Advanced | Cross-platform app with in-process ONNX TTS via ElBruno.VibeVoiceTTS NuGet package |
| 8 | ✅ | ONNX Export | Python → C# | Advanced | ONNX model export tools and pipeline docs |
Note: Python scenarios (1, 2, 5, 6) are primarily for ONNX model export, testing, and reference. The C# scenarios (3, 4) run entirely in .NET with no Python dependency. See the Scenarios Guide for details.
ONNX Models on HuggingFace
Pre-exported ONNX models are available on HuggingFace; the C# library downloads them automatically:
elbruno/VibeVoice-Realtime-0.5B-ONNX
The model includes 9 ONNX files (autoregressive pipeline with KV-cache) and 6 voice presets. See Scenario 8 for export details.
Documentation
| Topic | Description |
|---|---|
| Getting Started | Prerequisites, setup, and first steps |
| Scenarios Guide | Detailed descriptions of all 8 scenarios |
| Architecture | System design, ONNX pipeline, and data flow |
| Project Structure | Repository layout and file organization |
| API Reference | REST API documentation (for web scenarios) |
| User Manual | End-user guide for web interfaces |
| Publishing | NuGet publishing with GitHub Actions |
Tech Stack
| Layer | Technology | Purpose |
|---|---|---|
| C# TTS Library | ElBruno.VibeVoiceTTS | Reusable .NET library with HuggingFace auto-download |
| TTS Model | VibeVoice-Realtime-0.5B | Microsoft's text-to-speech model |
| Inference | ONNX Runtime | Native C# model inference |
| Frontend | Blazor (.NET 10) | Interactive web UI |
| Orchestration | .NET Aspire | Service discovery & health checks |
Building from Source
git clone https://github.com/elbruno/ElBruno.VibeVoiceTTS.git
cd ElBruno.VibeVoiceTTS
dotnet build src/ElBruno.VibeVoiceTTS/ElBruno.VibeVoiceTTS.csproj
dotnet test src/ElBruno.VibeVoiceTTS.Tests/ElBruno.VibeVoiceTTS.Tests.csproj
Requirements
- .NET 8.0 SDK or later
- ONNX Runtime compatible platform (Windows, Linux, macOS)
- Python 3.11+ (only needed for ONNX model export, not for runtime use)
Related Projects
- <a href="https://github.com/elbruno/ElBruno.PersonaPlex">ElBruno.PersonaPlex</a>: C# wrapper for NVIDIA's PersonaPlex-7B-v1 full-duplex speech-to-speech model, using ONNX Runtime for local inference. Pre-exported ONNX models: <a href="https://huggingface.co/elbruno/personaplex-7b-v1-onnx">elbruno/personaplex-7b-v1-onnx</a>
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License; see the LICENSE file for details.
About the Author
Hi! I'm ElBruno 🧡, a passionate developer and content creator exploring AI, .NET, and modern development practices.
Made with ❤️ by ElBruno
If you like this project, consider following my work across platforms:
- Podcast: No Tienen Nombre, Spanish-language episodes on AI, development, and tech culture
- Blog: ElBruno.com, deep dives on embeddings, RAG, .NET, and local AI
- YouTube: youtube.com/elbruno, demos, tutorials, and live coding
- LinkedIn: @elbruno, professional updates and insights
- Twitter: @elbruno, quick tips, releases, and tech news
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
- net8.0
  - ElBruno.HuggingFace.Downloader (>= 0.5.0)
  - Microsoft.Extensions.DependencyInjection.Abstractions (>= 9.0.15)
  - Microsoft.ML.OnnxRuntime (>= 1.20.1)
NuGet packages (1)
Showing the top 1 NuGet packages that depend on ElBruno.VibeVoiceTTS:
| Package | Downloads |
|---|---|
| ElBruno.VibeVoiceTTS.Realtime: Bridge between ElBruno.VibeVoiceTTS and ElBruno.Realtime; provides ITextToSpeechClient adapter and DI extensions for VibeVoiceTTS integration with the real-time conversation pipeline. | |
GitHub repositories
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 0.5.0 | 116 | 4/30/2026 |
| 0.2.1-preview | 88 | 4/30/2026 |
| 0.2.0 | 123 | 4/10/2026 |
| 0.1.9 | 154 | 2/28/2026 |
| 0.1.8 | 178 | 2/27/2026 |
| 0.1.7-preview | 121 | 2/23/2026 |
| 0.1.6-preview | 115 | 2/22/2026 |
| 0.1.5-preview | 111 | 2/22/2026 |
| 0.1.4-preview | 111 | 2/22/2026 |
| 0.1.2-preview | 104 | 2/22/2026 |
| 0.1.1-preview | 112 | 2/22/2026 |
| 0.1.0-preview | 120 | 2/22/2026 |
| 0.0.1-preview | 112 | 2/22/2026 |