ElBruno.VibeVoiceTTS 0.5.0

.NET 8.0

dotnet add package ElBruno.VibeVoiceTTS --version 0.5.0

NuGet\Install-Package ElBruno.VibeVoiceTTS -Version 0.5.0

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

<PackageReference Include="ElBruno.VibeVoiceTTS" Version="0.5.0" />

For projects that support PackageReference, copy this XML node into the project file to reference the package.

For projects that support Central Package Management (CPM), copy this XML node into the solution's Directory.Packages.props file to version the package:

<PackageVersion Include="ElBruno.VibeVoiceTTS" Version="0.5.0" />

Then reference the package from the project file:

<PackageReference Include="ElBruno.VibeVoiceTTS" />

paket add ElBruno.VibeVoiceTTS --version 0.5.0

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

#r "nuget: ElBruno.VibeVoiceTTS, 0.5.0"

The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or the script's source code to reference the package.

#:package ElBruno.VibeVoiceTTS@0.5.0

The #:package directive can be used in C# file-based apps starting with .NET 10 Preview 4. Copy this into a .cs file before any lines of code to reference the package.

Install as a Cake Addin:

#addin nuget:?package=ElBruno.VibeVoiceTTS&version=0.5.0

Install as a Cake Tool:

#tool nuget:?package=ElBruno.VibeVoiceTTS&version=0.5.0

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

🎙️ VibeVoiceTTS

A .NET library for text-to-speech synthesis using Microsoft's VibeVoice-Realtime-0.5B — native C# inference via ONNX Runtime, no Python required at runtime.

Features

🔊 Natural Text-to-Speech — High-quality speech synthesis powered by VibeVoice-Realtime-0.5B
📦 NuGet Package — ElBruno.VibeVoiceTTS — install and start generating speech in minutes
🤖 Pure C# Inference — ONNX Runtime, zero Python dependency at runtime
🚀 GPU Acceleration — DirectML (any Windows GPU) and CUDA (NVIDIA) support with automatic CPU fallback
📥 Auto-Download — Models automatically downloaded from 🤗 HuggingFace on first use
🌍 6 Voice Presets — Carter, Davis, Emma, Frank, Grace, Mike (English voices, with experimental multilingual support)
💉 Dependency Injection — First-class IServiceCollection integration
🖥️ Cross-Platform — Windows, Linux, macOS, MAUI-ready

Installation

dotnet add package ElBruno.VibeVoiceTTS

Quick Start

1) Generate speech and save to WAV

using ElBruno.VibeVoiceTTS;

using var tts = new VibeVoiceSynthesizer();
await tts.EnsureModelAvailableAsync(); // auto-downloads ~1.5 GB on first run

float[] audio = await tts.GenerateAudioAsync("Hello! Welcome to VibeVoiceTTS.", "Carter");
tts.SaveWav("output.wav", audio);
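SaveWav writes the generated float[] straight to disk. If you need the WAV bytes in memory instead (for example, to return them from a web API), the sketch below encodes the buffer as 16-bit PCM. It is a hypothetical helper, not the library's SaveWav implementation, and it assumes mono samples in the range [-1, 1] at the default 24000 Hz SampleRate.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of ElBruno.VibeVoiceTTS.
// Assumes mono float samples in [-1, 1]; encodes them as a 16-bit PCM WAV in memory.
public static class WavSketch
{
    public static byte[] ToPcm16Wav(float[] samples, int sampleRate = 24000)
    {
        const short channels = 1;          // assumption: mono output
        const short bitsPerSample = 16;
        short blockAlign = (short)(channels * bitsPerSample / 8);
        int byteRate = sampleRate * blockAlign;
        int dataSize = samples.Length * blockAlign;

        using var ms = new MemoryStream(44 + dataSize);
        using var w = new BinaryWriter(ms, Encoding.ASCII);

        // RIFF header
        w.Write(Encoding.ASCII.GetBytes("RIFF"));
        w.Write(36 + dataSize);            // size of everything after this field
        w.Write(Encoding.ASCII.GetBytes("WAVE"));

        // "fmt " sub-chunk describing uncompressed PCM
        w.Write(Encoding.ASCII.GetBytes("fmt "));
        w.Write(16);                       // fmt sub-chunk size for PCM
        w.Write((short)1);                 // audio format 1 = PCM
        w.Write(channels);
        w.Write(sampleRate);
        w.Write(byteRate);
        w.Write(blockAlign);
        w.Write(bitsPerSample);

        // "data" sub-chunk: clamp each sample and scale it to a signed 16-bit value
        w.Write(Encoding.ASCII.GetBytes("data"));
        w.Write(dataSize);
        foreach (float s in samples)
        {
            float clamped = Math.Clamp(s, -1f, 1f);
            w.Write((short)(clamped * short.MaxValue));
        }

        w.Flush();
        return ms.ToArray();
    }
}
```

For example, `File.WriteAllBytes("output.wav", WavSketch.ToPcm16Wav(audio));` should produce a playable file, provided the mono assumption above holds for the model's output.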

2) Use voice presets

// Use the enum (recommended)
float[] carter = await tts.GenerateAudioAsync("Hello from Carter!", VibeVoicePreset.Carter);
float[] emma = await tts.GenerateAudioAsync("Hello from Emma!", VibeVoicePreset.Emma);

// Or use a string name — both short and internal names work
float[] fromShortName = await tts.GenerateAudioAsync("Hello!", "Carter");
float[] fromInternalName = await tts.GenerateAudioAsync("Hello!", "en-Carter_man"); // also works

3) Discover available voices

// Voices currently downloaded on disk
string[] available = tts.GetAvailableVoices();
// → ["Carter", "Emma"]  (default download includes Carter and Emma)

// All supported voices (including those not yet downloaded)
string[] supported = tts.GetSupportedVoices();
// → ["Carter", "Davis", "Emma", "Frank", "Grace", "Mike"]

// Detailed metadata for all supported voices
VoiceInfo[] details = tts.GetSupportedVoiceDetails();
foreach (var voice in details)
    Console.WriteLine($"{voice.Name} ({voice.Gender}, {voice.Language})");

💡 On-demand voice download: Only Carter and Emma are downloaded by default with EnsureModelAvailableAsync(). Other voices (Davis, Frank, Grace, Mike) are automatically downloaded on first use when you call GenerateAudioAsync(). You can also pre-download a specific voice:
await tts.EnsureVoiceAvailableAsync("Davis", progress); // progress: the Progress<DownloadProgress> from step 4
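To pre-download every voice that is supported but not yet on disk, you can compare the two lists returned by GetSupportedVoices() and GetAvailableVoices(). The helper below is a hypothetical sketch of that comparison; it uses only LINQ and makes no assumptions about the library beyond those two methods returning string arrays.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of ElBruno.VibeVoiceTTS.
public static class VoicePlanning
{
    // Returns the supported voices that are not yet available on disk,
    // keeping the order of the supported list. Names compare case-insensitively.
    public static string[] MissingVoices(string[] supported, string[] available) =>
        supported.Except(available, StringComparer.OrdinalIgnoreCase).ToArray();
}
```

Each returned name could then be passed to `EnsureVoiceAvailableAsync` in a loop, so that all six presets are cached before an offline session.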

4) Track download progress

var progress = new Progress<DownloadProgress>(p =>
{
    if (p.Stage == DownloadStage.Downloading)
        Console.Write($"\r⬇️ [{p.CurrentFile}] {p.PercentComplete:F0}%");
    else
        Console.WriteLine($"{p.Stage}: {p.Message}");
});

await tts.EnsureModelAvailableAsync(progress);
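Progress<T> invokes its handler through the SynchronizationContext captured when it is constructed. A console app has none, so the callbacks are queued to the thread pool and can print out of order or after EnsureModelAvailableAsync has already returned. A common workaround is a small IProgress<T> that runs the handler inline. The sketch below assumes the progress parameter accepts any IProgress<DownloadProgress>; the README only shows Progress<DownloadProgress> being passed, so treat that as an assumption.

```csharp
using System;

// Hypothetical helper, not part of ElBruno.VibeVoiceTTS:
// an IProgress<T> that runs the handler synchronously on the reporting thread.
public sealed class InlineProgress<T> : IProgress<T>
{
    private readonly Action<T> _handler;

    public InlineProgress(Action<T> handler) =>
        _handler = handler ?? throw new ArgumentNullException(nameof(handler));

    // The handler finishes before Report returns, so updates are
    // observed in exactly the order the producer reported them.
    public void Report(T value) => _handler(value);
}
```

Swap `new Progress<DownloadProgress>(...)` for `new InlineProgress<DownloadProgress>(...)` with the same lambda. In UI apps (WPF, WinForms, MAUI) keep Progress<T>, since there you do want callbacks marshalled back to the UI thread.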

5) Configure options

var options = new VibeVoiceOptions
{
    ModelPath = @"D:\models\vibevoice",  // Custom model location (default: OS cache)
    DiffusionSteps = 20,                 // Quality vs speed tradeoff
    CfgScale = 1.5f,                     // Classifier-free guidance scale
    SampleRate = 24000,                  // Output sample rate
};

using var tts = new VibeVoiceSynthesizer(options);
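CfgScale is described as the classifier-free guidance scale. In the standard formulation of classifier-free guidance, the denoiser is evaluated with and without conditioning, and the two predictions are combined as uncond + scale * (cond - uncond): a scale of 1 reproduces the conditional prediction, and larger values push the result further toward the conditioning. The sketch below illustrates only that standard formula; it is not code from ElBruno.VibeVoiceTTS, and VibeVoice's diffusion head may differ in the details.

```csharp
using System;

// Illustration of the standard classifier-free guidance formula;
// not taken from ElBruno.VibeVoiceTTS.
public static class GuidanceSketch
{
    // guided[i] = uncond[i] + scale * (cond[i] - uncond[i])
    public static float[] Apply(float[] cond, float[] uncond, float scale)
    {
        if (cond.Length != uncond.Length)
            throw new ArgumentException("Predictions must have the same length.");

        var guided = new float[cond.Length];
        for (int i = 0; i < cond.Length; i++)
            guided[i] = uncond[i] + scale * (cond[i] - uncond[i]);
        return guided;
    }
}
```

This is also why CfgScale interacts with DiffusionSteps on cost: in the usual setup each denoising step needs both a conditional and an unconditional evaluation.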

Option	Default	Description
`ModelPath`	OS cache*	Directory where ONNX models are stored and downloaded
`HuggingFaceRepo`	`elbruno/VibeVoice-Realtime-0.5B-ONNX`	HuggingFace repo for model downloads
`DiffusionSteps`	`20`	Number of diffusion denoising steps
`CfgScale`	`1.5`	Classifier-free guidance scale
`SampleRate`	`24000`	Output audio sample rate (Hz)
`Seed`	`42`	Random seed for reproducible output
`ExecutionProvider`	`Cpu`	ONNX Runtime execution provider (`Cpu`, `DirectML`, `Cuda`)
`GpuDeviceId`	`0`	GPU device index (used with DirectML or CUDA)

*Default model cache: Windows: %LOCALAPPDATA%\ElBruno\VibeVoice\models · Linux/macOS: ~/.local/share/elbruno/vibevoice/models
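If you want to check whether models are already cached, or pre-seed that folder, the hypothetical helper below rebuilds the default paths listed above. It mirrors the documented locations literally; the library may compute the path differently (for example, by honoring XDG_DATA_HOME on Linux), so treat it as a sketch.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of ElBruno.VibeVoiceTTS:
// reproduces the documented default model cache locations.
public static class ModelCacheSketch
{
    public static string DefaultModelPath()
    {
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        {
            // %LOCALAPPDATA%\ElBruno\VibeVoice\models
            string localAppData = Environment.GetFolderPath(
                Environment.SpecialFolder.LocalApplicationData);
            return Path.Combine(localAppData, "ElBruno", "VibeVoice", "models");
        }

        // ~/.local/share/elbruno/vibevoice/models on Linux and macOS.
        // Built from the home directory because LocalApplicationData does not
        // map to ~/.local/share on macOS in recent .NET versions.
        string home = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);
        return Path.Combine(home, ".local", "share", "elbruno", "vibevoice", "models");
    }
}
```

For example, `Directory.Exists(ModelCacheSketch.DefaultModelPath())` gives a quick hint of whether the first-run download has already happened.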

6) GPU Acceleration

Enable GPU acceleration by setting the execution provider and installing the corresponding NuGet package:

# For DirectML (any Windows GPU — NVIDIA, AMD, Intel):
dotnet add package Microsoft.ML.OnnxRuntime.DirectML

# For CUDA (NVIDIA only — Windows and Linux):
dotnet add package Microsoft.ML.OnnxRuntime.Gpu

// Pick one of the following configurations.

// DirectML: recommended for Windows desktop apps
var dmlOptions = new VibeVoiceOptions
{
    ExecutionProvider = ExecutionProvider.DirectML,
    GpuDeviceId = 0   // optional, selects which GPU
};
using var dmlTts = new VibeVoiceSynthesizer(dmlOptions);

// CUDA: for NVIDIA GPUs with CUDA drivers
var cudaOptions = new VibeVoiceOptions
{
    ExecutionProvider = ExecutionProvider.Cuda,
    GpuDeviceId = 0
};
using var cudaTts = new VibeVoiceSynthesizer(cudaOptions);

💡 Note: If the selected GPU provider is unavailable (missing NuGet package or no compatible GPU), the library automatically falls back to CPU inference. When using DirectML, models with dynamic tensor shapes (LM models, acoustic decoder) run on CPU while fixed-shape models (prediction head, connector, EOS classifier) use GPU — this works around known DirectML limitations with dynamic Reshape and ConvTranspose operations.

7) Dependency Injection

builder.Services.AddVibeVoice(options =>
{
    options.DiffusionSteps = 20;
});

// Then inject IVibeVoiceSynthesizer in your services

💡 Tip: For best results, keep sentences short (~10 words). Longer text may produce artifacts due to model limitations. Consider splitting long text into sentences.
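One way to follow this tip is to split the input at sentence boundaries, break any sentence that is still too long into chunks of at most about 10 words, synthesize each chunk, and join the results with a short silence. The helpers below are a hypothetical sketch of the text and audio plumbing only. The regex split is naive (it will also split after abbreviations such as "Dr."), and the 10-word default simply follows the tip above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helpers, not part of ElBruno.VibeVoiceTTS.
public static class TextChunker
{
    // Splits after '.', '!' or '?' followed by whitespace, then breaks any
    // sentence longer than maxWords into chunks of at most maxWords words.
    public static List<string> Split(string text, int maxWords = 10)
    {
        var chunks = new List<string>();
        string[] sentences = Regex.Split(text.Trim(), @"(?<=[.!?])\s+");
        foreach (string sentence in sentences)
        {
            // An empty separator array means "split on any whitespace".
            string[] words = sentence.Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries);
            for (int i = 0; i < words.Length; i += maxWords)
                chunks.Add(string.Join(" ", words.Skip(i).Take(maxWords)));
        }
        return chunks;
    }

    // Concatenates per-chunk audio, inserting silenceSamples zeros between chunks.
    public static float[] Join(IReadOnlyList<float[]> parts, int silenceSamples)
    {
        int total = parts.Sum(p => p.Length) + Math.Max(0, parts.Count - 1) * silenceSamples;
        var result = new float[total];
        int offset = 0;
        for (int i = 0; i < parts.Count; i++)
        {
            if (i > 0) offset += silenceSamples; // the gap is already zero-initialized
            Array.Copy(parts[i], 0, result, offset, parts[i].Length);
            offset += parts[i].Length;
        }
        return result;
    }
}
```

In practice you would call `GenerateAudioAsync` once per chunk, collect the float[] results in a list, and pass them to `TextChunker.Join(parts, 4800)` (200 ms of silence at 24 kHz) before calling `SaveWav`.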

🗣️ Voices & Languages

Voice	Gender	Preset Enum	Internal Name
Carter	Male	`VibeVoicePreset.Carter`	`en-Carter_man`
Davis	Male	`VibeVoicePreset.Davis`	`en-Davis_man`
Emma	Female	`VibeVoicePreset.Emma`	`en-Emma_woman`
Frank	Male	`VibeVoicePreset.Frank`	`en-Frank_man`
Grace	Female	`VibeVoicePreset.Grace`	`en-Grace_woman`
Mike	Male	`VibeVoicePreset.Mike`	`en-Mike_man`
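The table follows a naming convention of en-<Name>_<man|woman>, and GenerateAudioAsync accepts either form. If your app takes voice names from user input or configuration, a lookup built from this table can validate and normalize them up front. The helper below is hypothetical; it is not part of the library's API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of ElBruno.VibeVoiceTTS:
// maps the short names in the table above to their internal preset names.
public static class VoiceNames
{
    private static readonly Dictionary<string, string> ShortToInternal =
        new(StringComparer.OrdinalIgnoreCase)
        {
            ["Carter"] = "en-Carter_man",
            ["Davis"] = "en-Davis_man",
            ["Emma"] = "en-Emma_woman",
            ["Frank"] = "en-Frank_man",
            ["Grace"] = "en-Grace_woman",
            ["Mike"] = "en-Mike_man",
        };

    // Accepts a short name ("Carter") or an internal name ("en-Carter_man")
    // and returns the internal name, or null if the voice is unknown.
    public static string? ToInternal(string name)
    {
        if (ShortToInternal.TryGetValue(name, out var internalName))
            return internalName;

        foreach (string value in ShortToInternal.Values)
            if (string.Equals(value, name, StringComparison.OrdinalIgnoreCase))
                return value;

        return null;
    }
}
```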

All 6 voice presets are hosted on HuggingFace. Carter and Emma are downloaded with the model by EnsureModelAvailableAsync(); the others are downloaded on demand when first used.

⚡ Migration note: In versions prior to 0.2.0, GetAvailableVoices() returned all 6 voices regardless of download status. Starting with 0.2.0, it returns only voices actually downloaded on disk. Use GetSupportedVoices() to see all 6 known presets. Voices are auto-downloaded on first use with GenerateAudioAsync(), or pre-download with EnsureVoiceAvailableAsync("Davis").

Language support: The model is primarily trained on English, with experimental multilingual capabilities (e.g., Spanish, French, German). Results may vary for non-English text.

📖 For full details on the model, supported languages, and voice characteristics, see the official VibeVoice documentation on HuggingFace and the VibeVoice GitHub repository.

For the complete API reference and advanced usage, see the Getting Started Guide.

Scenarios

This repository includes example projects demonstrating different ways to use VibeVoice:

#	Status	Scenario	Stack	Level	Description
1	✅	Simple Python Script	Python	Beginner	Minimal TTS demo — useful for model export and testing
2	✅	Full-Stack App	Python + Blazor + Aspire	Intermediate	Web app with FastAPI backend and Blazor frontend
3	✅	C# Console App	C# (.NET 8)	Beginner	Recommended starting point — pure C# with `ElBruno.VibeVoiceTTS`
4	✅	Full C# with Aspire	C# + Blazor + Aspire	Intermediate	Full-stack C# app with WebAPI + Blazor frontend
5	✅	Batch Processing	Python	Intermediate	CLI to convert folders of .txt to .wav
6	✅	Real-Time Streaming	Python	Intermediate	Chunked audio playback for low-latency
7	✅	MAUI Mobile	C# (.NET 10 MAUI)	Advanced	Cross-platform app with in-process ONNX TTS via `ElBruno.VibeVoiceTTS` NuGet package
8	✅	ONNX Export	Python → C#	Advanced	ONNX model export tools and pipeline docs

Note: Python scenarios (1, 2, 5, 6) are primarily for ONNX model export, testing, and reference. The C# scenarios (3, 4, 7) run entirely in .NET with no Python dependency. See the Scenarios Guide for details.

ONNX Models on HuggingFace

Pre-exported ONNX models are available on HuggingFace — the C# library downloads them automatically:

🤗 elbruno/VibeVoice-Realtime-0.5B-ONNX

The model includes 9 ONNX files (autoregressive pipeline with KV-cache) and 6 voice presets. See Scenario 8 for export details.

Documentation

Topic	Description
Getting Started	Prerequisites, setup, and first steps
Scenarios Guide	Detailed descriptions of all 8 scenarios
Architecture	System design, ONNX pipeline, and data flow
Project Structure	Repository layout and file organization
API Reference	REST API documentation (for web scenarios)
User Manual	End-user guide for web interfaces
Publishing	NuGet publishing with GitHub Actions

Tech Stack

Layer	Technology	Purpose
C# TTS Library	ElBruno.VibeVoiceTTS	Reusable .NET library with HuggingFace auto-download
TTS Model	VibeVoice-Realtime-0.5B	Microsoft's text-to-speech model
Inference	ONNX Runtime	Native C# model inference
Frontend	Blazor (.NET 10)	Interactive web UI
Orchestration	.NET Aspire	Service discovery & health checks

Building from Source

git clone https://github.com/elbruno/ElBruno.VibeVoiceTTS.git
cd ElBruno.VibeVoiceTTS
dotnet build src/ElBruno.VibeVoiceTTS/ElBruno.VibeVoiceTTS.csproj
dotnet test src/ElBruno.VibeVoiceTTS.Tests/ElBruno.VibeVoiceTTS.Tests.csproj

Requirements

.NET 8.0 SDK or later
ONNX Runtime compatible platform (Windows, Linux, macOS)
Python 3.11+ (only needed for ONNX model export — not for runtime use)

ElBruno.PersonaPlex (https://github.com/elbruno/ElBruno.PersonaPlex): a C# wrapper for NVIDIA's PersonaPlex-7B-v1 full-duplex speech-to-speech model, using ONNX Runtime for local inference. Pre-exported ONNX models: elbruno/personaplex-7b-v1-onnx (https://huggingface.co/elbruno/personaplex-7b-v1-onnx)

🤝 Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

This project is licensed under the MIT License — see the LICENSE file for details.

👋 About the Author

Hi! I'm ElBruno 🧡, a passionate developer and content creator exploring AI, .NET, and modern development practices.

Made with ❤️ by ElBruno

If you like this project, consider following my work across platforms:

📻 Podcast: No Tienen Nombre — Spanish-language episodes on AI, development, and tech culture
💻 Blog: ElBruno.com — Deep dives on embeddings, RAG, .NET, and local AI
📺 YouTube: youtube.com/elbruno — Demos, tutorials, and live coding
🔗 LinkedIn: @elbruno — Professional updates and insights
𝕏 Twitter: @elbruno — Quick tips, releases, and tech news

Compatible target frameworks

.NET: net8.0 is compatible. Additional computed target frameworks: net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows, net9.0, net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows.


Dependencies (net8.0):
- ElBruno.HuggingFace.Downloader (>= 0.5.0)
- Microsoft.Extensions.DependencyInjection.Abstractions (>= 9.0.15)
- Microsoft.ML.OnnxRuntime (>= 1.20.1)

NuGet packages (1)

NuGet packages that depend on ElBruno.VibeVoiceTTS:

Package	Downloads
ElBruno.VibeVoiceTTS.Realtime Bridge between ElBruno.VibeVoiceTTS and ElBruno.Realtime — provides ITextToSpeechClient adapter and DI extensions for VibeVoiceTTS integration with the real-time conversation pipeline.	92

GitHub repositories

This package is not used by any popular GitHub repositories.

Version	Downloads	Last Updated
0.5.0	116	4/30/2026
0.2.1-preview	88	4/30/2026
0.2.0	123	4/10/2026
0.1.9	154	2/28/2026
0.1.8	178	2/27/2026
0.1.7-preview	121	2/23/2026
0.1.6-preview	115	2/22/2026
0.1.5-preview	111	2/22/2026
0.1.4-preview	111	2/22/2026
0.1.2-preview	104	2/22/2026
0.1.1-preview	112	2/22/2026
0.1.0-preview	120	2/22/2026
0.0.1-preview	112	2/22/2026