ElBruno.VibeVoiceTTS 0.5.0


🎙️ VibeVoiceTTS


A .NET library for text-to-speech synthesis using Microsoft's VibeVoice-Realtime-0.5B, with native C# inference via ONNX Runtime and no Python required at runtime.

Features

  • 🔊 Natural Text-to-Speech: High-quality speech synthesis powered by VibeVoice-Realtime-0.5B
  • 📦 NuGet Package: ElBruno.VibeVoiceTTS, install and start generating speech in minutes
  • 🤖 Pure C# Inference: ONNX Runtime, zero Python dependency at runtime
  • 🚀 GPU Acceleration: DirectML (any Windows GPU) and CUDA (NVIDIA) support with automatic CPU fallback
  • 📥 Auto-Download: Models are downloaded automatically from 🤗 HuggingFace on first use
  • 🌍 6 Voice Presets: Carter, Davis, Emma, Frank, Grace, Mike (English voices with experimental multilingual support)
  • 💉 Dependency Injection: First-class IServiceCollection integration
  • 🖥️ Cross-Platform: Windows, Linux, macOS, MAUI-ready

Installation

dotnet add package ElBruno.VibeVoiceTTS

Quick Start

1) Generate speech and save to WAV

using ElBruno.VibeVoiceTTS;

using var tts = new VibeVoiceSynthesizer();
await tts.EnsureModelAvailableAsync(); // auto-downloads ~1.5 GB on first run

float[] audio = await tts.GenerateAudioAsync("Hello! Welcome to VibeVoiceTTS.", "Carter");
tts.SaveWav("output.wav", audio);
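
The quick start returns raw samples as a float[]. The helper below is a minimal sketch for reasoning about such a buffer; it is not part of ElBruno.VibeVoiceTTS. It assumes the audio is mono, that samples lie in [-1, 1], and that the sample rate is the documented default of 24000 Hz. AudioMath, DurationSeconds, and ToPcm16 are hypothetical names introduced for illustration.

```csharp
using System;

// Hypothetical helper, not part of the library.
// Assumes mono audio with float samples in the range [-1, 1].
public static class AudioMath
{
    // Duration in seconds of a mono buffer at the given sample rate.
    public static double DurationSeconds(int sampleCount, int sampleRate = 24000)
    {
        if (sampleRate <= 0)
            throw new ArgumentOutOfRangeException(nameof(sampleRate));
        return (double)sampleCount / sampleRate;
    }

    // Converts float samples to 16-bit PCM, clamping out-of-range values.
    public static short[] ToPcm16(float[] samples)
    {
        var pcm = new short[samples.Length];
        for (int i = 0; i < samples.Length; i++)
        {
            float clamped = Math.Clamp(samples[i], -1f, 1f);
            pcm[i] = (short)Math.Round(clamped * short.MaxValue);
        }
        return pcm;
    }
}
```

For example, AudioMath.DurationSeconds(audio.Length) gives the clip length in seconds under those assumptions. For writing files, the library's own tts.SaveWav call shown above remains the simpler route.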

2) Use voice presets

// Use the enum (recommended)
float[] carter = await tts.GenerateAudioAsync("Hello from Carter!", VibeVoicePreset.Carter);
float[] emma = await tts.GenerateAudioAsync("Hello from Emma!", VibeVoicePreset.Emma);

// Or use a string name: both short and internal names work
float[] audio = await tts.GenerateAudioAsync("Hello!", "Carter");
float[] audio2 = await tts.GenerateAudioAsync("Hello!", "en-Carter_man"); // also works

3) Discover available voices

// Voices currently downloaded on disk
string[] available = tts.GetAvailableVoices();
// → ["Carter", "Emma"]  (default download includes Carter and Emma)

// All supported voices (including those not yet downloaded)
string[] supported = tts.GetSupportedVoices();
// → ["Carter", "Davis", "Emma", "Frank", "Grace", "Mike"]

// Detailed metadata for all supported voices
VoiceInfo[] details = tts.GetSupportedVoiceDetails();
foreach (var voice in details)
    Console.WriteLine($"{voice.Name} ({voice.Gender}, {voice.Language})");

💡 On-demand voice download: Only Carter and Emma are downloaded by default with EnsureModelAvailableAsync(). The other voices (Davis, Frank, Grace, Mike) are downloaded automatically the first time you pass them to GenerateAudioAsync(). You can also pre-download a specific voice:

// progress is an optional Progress<DownloadProgress> (see step 4)
await tts.EnsureVoiceAvailableAsync("Davis", progress);
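
To pre-download everything that is not yet on disk, you need the difference between the supported and the available voice lists. The following is a small sketch of that set difference using plain LINQ. VoiceDiff and Missing are hypothetical names, not library members, and the case-insensitive comparison is an assumption.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the library.
public static class VoiceDiff
{
    // Returns the supported voices that are not in the available list,
    // in the order they appear in 'supported'. Names are compared
    // case-insensitively (an assumption for convenience).
    public static string[] Missing(string[] supported, string[] available) =>
        supported.Except(available, StringComparer.OrdinalIgnoreCase).ToArray();
}
```

In an application you would pass tts.GetSupportedVoices() and tts.GetAvailableVoices() and call await tts.EnsureVoiceAvailableAsync(name) for each returned name.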

4) Track download progress

var progress = new Progress<DownloadProgress>(p =>
{
    if (p.Stage == DownloadStage.Downloading)
        Console.Write($"\r⬇️ [{p.CurrentFile}] {p.PercentComplete:F0}%");
    else
        Console.WriteLine($"{p.Stage}: {p.Message}");
});

await tts.EnsureModelAvailableAsync(progress);
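
One general .NET detail applies here: Progress<T> captures the SynchronizationContext at construction, and when there is none (typical for console apps) it posts callbacks to the thread pool. Console lines can then appear out of order or after the awaited call has returned. If that matters, a synchronous IProgress<T> is one option. The class below is a sketch with a hypothetical name, InlineProgress<T>. It assumes the download method accepts any IProgress<DownloadProgress>, which the example above does not confirm.

```csharp
using System;

// Hypothetical helper, not part of the library.
// Invokes the handler immediately on the thread that calls Report,
// so updates are seen in the order they are reported.
public sealed class InlineProgress<T> : IProgress<T>
{
    private readonly Action<T> _handler;

    public InlineProgress(Action<T> handler) =>
        _handler = handler ?? throw new ArgumentNullException(nameof(handler));

    public void Report(T value) => _handler(value);
}
```

Because the handler runs on the reporting thread, keep it short and thread-safe. In UI apps the built-in Progress<T> is usually the better fit, since it marshals callbacks back to the UI thread.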

5) Configure options

var options = new VibeVoiceOptions
{
    ModelPath = @"D:\models\vibevoice",  // Custom model location (default: OS cache)
    DiffusionSteps = 20,                 // Quality vs speed tradeoff
    CfgScale = 1.5f,                     // Classifier-free guidance scale
    SampleRate = 24000,                  // Output sample rate
};

using var tts = new VibeVoiceSynthesizer(options);

| Option | Default | Description |
|---|---|---|
| ModelPath | OS cache* | Directory where ONNX models are stored and downloaded |
| HuggingFaceRepo | elbruno/VibeVoice-Realtime-0.5B-ONNX | HuggingFace repo for model downloads |
| DiffusionSteps | 20 | Number of diffusion denoising steps |
| CfgScale | 1.5 | Classifier-free guidance scale |
| SampleRate | 24000 | Output audio sample rate (Hz) |
| Seed | 42 | Random seed for reproducible output |
| ExecutionProvider | Cpu | ONNX Runtime execution provider (Cpu, DirectML, Cuda) |
| GpuDeviceId | 0 | GPU device index (used with DirectML or CUDA) |

*Default model cache: Windows: %LOCALAPPDATA%\ElBruno\VibeVoice\models · Linux/macOS: ~/.local/share/elbruno/vibevoice/models
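
If you want to inspect or clean up the cache from your own code, the documented default locations can be rebuilt with standard .NET APIs. This is only a sketch that mirrors the paths listed above. It is not the library's own resolution logic, and ModelCachePath, GetDefault, and SizeInBytes are hypothetical names.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the library.
// Mirrors the default cache paths stated in this README.
public static class ModelCachePath
{
    public static string GetDefault()
    {
        if (OperatingSystem.IsWindows())
        {
            // %LOCALAPPDATA%\ElBruno\VibeVoice\models
            string localAppData = Environment.GetFolderPath(
                Environment.SpecialFolder.LocalApplicationData);
            return Path.Combine(localAppData, "ElBruno", "VibeVoice", "models");
        }

        // ~/.local/share/elbruno/vibevoice/models on Linux and macOS
        string home = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);
        return Path.Combine(home, ".local", "share", "elbruno", "vibevoice", "models");
    }

    // Total size of all files under the directory, or 0 if it does not exist.
    public static long SizeInBytes(string directory)
    {
        if (!Directory.Exists(directory)) return 0;
        return new DirectoryInfo(directory)
            .EnumerateFiles("*", SearchOption.AllDirectories)
            .Sum(file => file.Length);
    }
}
```

ModelCachePath.SizeInBytes(ModelCachePath.GetDefault()) then reports how much disk space the cache uses. If you set ModelPath in VibeVoiceOptions, pass that directory instead.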

6) GPU Acceleration

Enable GPU acceleration by setting the execution provider and installing the corresponding NuGet package:

# For DirectML (any Windows GPU: NVIDIA, AMD, Intel):
dotnet add package Microsoft.ML.OnnxRuntime.DirectML

# For CUDA (NVIDIA only, Windows and Linux):
dotnet add package Microsoft.ML.OnnxRuntime.Gpu

// DirectML: recommended for Windows desktop apps
var options = new VibeVoiceOptions
{
    ExecutionProvider = ExecutionProvider.DirectML,
    GpuDeviceId = 0   // optional, selects which GPU
};
using var tts = new VibeVoiceSynthesizer(options);

// CUDA: for NVIDIA GPUs with CUDA drivers (alternative to DirectML; variables renamed so both snippets can share a scope)
var cudaOptions = new VibeVoiceOptions
{
    ExecutionProvider = ExecutionProvider.Cuda,
    GpuDeviceId = 0
};
using var cudaTts = new VibeVoiceSynthesizer(cudaOptions);

💡 Note: If the selected GPU provider is unavailable (missing NuGet package or no compatible GPU), the library automatically falls back to CPU inference. When using DirectML, models with dynamic tensor shapes (LM models, acoustic decoder) run on CPU while fixed-shape models (prediction head, connector, EOS classifier) use the GPU. This works around known DirectML limitations with dynamic Reshape and ConvTranspose operations.

7) Dependency Injection

builder.Services.AddVibeVoice(options =>
{
    options.DiffusionSteps = 20;
});

// Then inject IVibeVoiceSynthesizer in your services

💡 Tip: For best results, keep sentences short (~10 words). Longer text may produce artifacts due to model limitations. Consider splitting long text into sentences.
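
One way to follow this tip is to split the input on sentence-ending punctuation, synthesize each sentence separately, and join the resulting buffers. The sketch below shows only the splitting and joining parts in plain .NET. TextChunker, SplitSentences, and Concat are hypothetical names, not library members. The regular expression is deliberately naive and will also split after abbreviations such as "Dr.".

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the library.
public static class TextChunker
{
    // Splits text after '.', '!' or '?' when followed by whitespace.
    // The punctuation stays attached to its sentence.
    public static List<string> SplitSentences(string text)
    {
        var sentences = new List<string>();
        if (string.IsNullOrWhiteSpace(text)) return sentences;

        foreach (string part in Regex.Split(text.Trim(), @"(?<=[.!?])\s+"))
        {
            if (part.Length > 0) sentences.Add(part);
        }
        return sentences;
    }

    // Joins several sample buffers into one, in order.
    public static float[] Concat(IReadOnlyList<float[]> chunks)
    {
        int total = 0;
        foreach (float[] chunk in chunks) total += chunk.Length;

        var result = new float[total];
        int offset = 0;
        foreach (float[] chunk in chunks)
        {
            Array.Copy(chunk, 0, result, offset, chunk.Length);
            offset += chunk.Length;
        }
        return result;
    }
}
```

In an application you would call GenerateAudioAsync once per sentence, collect the float[] results in a list, and pass the list to Concat before saving. Whether joined clips sound seamless depends on the model, so treat this as a starting point.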

🗣️ Voices & Languages

| Voice | Gender | Preset Enum | Internal Name |
|---|---|---|---|
| Carter | Male | VibeVoicePreset.Carter | en-Carter_man |
| Davis | Male | VibeVoicePreset.Davis | en-Davis_man |
| Emma | Female | VibeVoicePreset.Emma | en-Emma_woman |
| Frank | Male | VibeVoicePreset.Frank | en-Frank_man |
| Grace | Female | VibeVoicePreset.Grace | en-Grace_woman |
| Mike | Male | VibeVoicePreset.Mike | en-Mike_man |

All 6 voice presets are available on HuggingFace and are downloaded on-demand when first used.

⚡ Migration note: In versions prior to 0.2.0, GetAvailableVoices() returned all 6 voices regardless of download status. Starting with 0.2.0, it returns only the voices actually downloaded on disk. Use GetSupportedVoices() to see all 6 known presets. Voices are downloaded automatically on first use with GenerateAudioAsync(), or you can pre-download them with EnsureVoiceAvailableAsync("Davis").

Language support: The model is primarily trained on English, with experimental multilingual capabilities (e.g., Spanish, French, German). Results may vary for non-English text.

📖 For full details on the model, supported languages, and voice characteristics, see the official VibeVoice documentation on HuggingFace and the VibeVoice GitHub repository.

For the complete API reference and advanced usage, see the Getting Started Guide.

Scenarios

This repository includes example projects demonstrating different ways to use VibeVoice:

| # | Status | Scenario | Stack | Level | Description |
|---|---|---|---|---|---|
| 1 | ✅ | Simple Python Script | Python | Beginner | Minimal TTS demo, useful for model export and testing |
| 2 | ✅ | Full-Stack App | Python + Blazor + Aspire | Intermediate | Web app with FastAPI backend and Blazor frontend |
| 3 | ✅ | C# Console App | C# (.NET 8) | Beginner | Recommended starting point: pure C# with ElBruno.VibeVoiceTTS |
| 4 | ✅ | Full C# with Aspire | C# + Blazor + Aspire | Intermediate | Full-stack C# app with WebAPI + Blazor frontend |
| 5 | ✅ | Batch Processing | Python | Intermediate | CLI to convert folders of .txt to .wav |
| 6 | ✅ | Real-Time Streaming | Python | Intermediate | Chunked audio playback for low latency |
| 7 | ✅ | MAUI Mobile | C# (.NET 10 MAUI) | Advanced | Cross-platform app with in-process ONNX TTS via the ElBruno.VibeVoiceTTS NuGet package |
| 8 | ✅ | ONNX Export | Python → C# | Advanced | ONNX model export tools and pipeline docs |

Note: Python scenarios (1, 2, 5, 6) are primarily for ONNX model export, testing, and reference. The C# scenarios (3, 4) run entirely in .NET with no Python dependency. See the Scenarios Guide for details.

ONNX Models on HuggingFace

Pre-exported ONNX models are available on HuggingFace, and the C# library downloads them automatically:

🤗 elbruno/VibeVoice-Realtime-0.5B-ONNX

The model includes 9 ONNX files (autoregressive pipeline with KV-cache) and 6 voice presets. See Scenario 8 for export details.

Documentation

| Topic | Description |
|---|---|
| Getting Started | Prerequisites, setup, and first steps |
| Scenarios Guide | Detailed descriptions of all 8 scenarios |
| Architecture | System design, ONNX pipeline, and data flow |
| Project Structure | Repository layout and file organization |
| API Reference | REST API documentation (for web scenarios) |
| User Manual | End-user guide for web interfaces |
| Publishing | NuGet publishing with GitHub Actions |

Tech Stack

| Layer | Technology | Purpose |
|---|---|---|
| C# TTS Library | ElBruno.VibeVoiceTTS | Reusable .NET library with HuggingFace auto-download |
| TTS Model | VibeVoice-Realtime-0.5B | Microsoft's text-to-speech model |
| Inference | ONNX Runtime | Native C# model inference |
| Frontend | Blazor (.NET 10) | Interactive web UI |
| Orchestration | .NET Aspire | Service discovery & health checks |

Building from Source

git clone https://github.com/elbruno/ElBruno.VibeVoiceTTS.git
cd ElBruno.VibeVoiceTTS
dotnet build src/ElBruno.VibeVoiceTTS/ElBruno.VibeVoiceTTS.csproj
dotnet test src/ElBruno.VibeVoiceTTS.Tests/ElBruno.VibeVoiceTTS.Tests.csproj

Requirements

  • .NET 8.0 SDK or later
  • ONNX Runtime compatible platform (Windows, Linux, macOS)
  • Python 3.11+ (only needed for ONNX model export, not for runtime use)
  • Related project: ElBruno.PersonaPlex (https://github.com/elbruno/ElBruno.PersonaPlex), a C# wrapper for NVIDIA's PersonaPlex-7B-v1 full-duplex speech-to-speech model that uses ONNX Runtime for local inference. Pre-exported ONNX models: elbruno/personaplex-7b-v1-onnx (https://huggingface.co/elbruno/personaplex-7b-v1-onnx)

🤝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

👋 About the Author

Hi! I'm ElBruno 🧡, a passionate developer and content creator exploring AI, .NET, and modern development practices.

Made with ❤️ by ElBruno

If you like this project, consider following my work across platforms:

  • 📻 Podcast: No Tienen Nombre (Spanish-language episodes on AI, development, and tech culture)
  • 💻 Blog: ElBruno.com (deep dives on embeddings, RAG, .NET, and local AI)
  • 📺 YouTube: youtube.com/elbruno (demos, tutorials, and live coding)
  • 🔗 LinkedIn: @elbruno (professional updates and insights)
  • 𝕏 Twitter: @elbruno (quick tips, releases, and tech news)

Target frameworks: net8.0 is included in the package. The platform-specific net8.0 targets (android, browser, ios, maccatalyst, macos, tvos, windows) and the net9.0 and net10.0 families, including the same platform-specific targets, are computed as compatible.

NuGet packages (1)

Showing the top 1 NuGet packages that depend on ElBruno.VibeVoiceTTS:

Package Downloads
ElBruno.VibeVoiceTTS.Realtime

Bridge between ElBruno.VibeVoiceTTS and ElBruno.Realtime. Provides an ITextToSpeechClient adapter and DI extensions for VibeVoiceTTS integration with the real-time conversation pipeline.

GitHub repositories

This package is not used by any popular GitHub repositories.

| Version | Downloads | Last Updated |
|---|---|---|
| 0.5.0 | 116 | 4/30/2026 |
| 0.2.1-preview | 88 | 4/30/2026 |
| 0.2.0 | 123 | 4/10/2026 |
| 0.1.9 | 154 | 2/28/2026 |
| 0.1.8 | 178 | 2/27/2026 |
| 0.1.7-preview | 121 | 2/23/2026 |
| 0.1.6-preview | 115 | 2/22/2026 |
| 0.1.5-preview | 111 | 2/22/2026 |
| 0.1.4-preview | 111 | 2/22/2026 |
| 0.1.2-preview | 104 | 2/22/2026 |
| 0.1.1-preview | 112 | 2/22/2026 |
| 0.1.0-preview | 120 | 2/22/2026 |
| 0.0.1-preview | 112 | 2/22/2026 |