SharpInference.Audio 1.0.0-alpha.2

This is a prerelease version of SharpInference.Audio.

SharpInference

A pure C#/.NET AI inference engine for image generation, speech, vision, video, and interactive world models — no Python, no native runtime DLLs.

SharpInference loads .safetensors and .gguf checkpoints directly and runs them on NVIDIA CUDA, cross-vendor Vulkan, or CPU SIMD — entirely in managed .NET. GPU kernels are PTX/SPIR-V shipped with the package and JIT-compiled at runtime; there are no C++ wrappers, no bundled native inference library, and no external Python process to manage. Just NuGet packages.

It is the non-LLM companion to dotLLM: together they form a complete AI stack in pure .NET.


⚠️ Alpha software

This is 1.0.0-alpha — an early, fast-moving preview. Use it to experiment, not in production.

  • APIs will change without notice between alpha releases. Pin an exact version.
  • Model coverage is broad but maturity varies. Many architectures are implemented and load/run end-to-end but are still being validated numerically against their reference implementations. Treat output quality per-model as "verify before you rely on it."
  • No support guarantees, no semver stability until 1.0.0.
  • The OpenAI-compatible server and CLI are not published as packages in this alpha — they live in the source repository.

Found a bug or a mismatch against a reference? Please open an issue.


Install

One package pulls in the whole stack (all backends + every modality):

dotnet add package SharpInference --prerelease

Or reference only the pieces you need (see Packages):

dotnet add package SharpInference.Audio --prerelease
dotnet add package SharpInference.Cpu   --prerelease

Requires .NET 8 or .NET 10.


Quick start — speech-to-text

The Whisper pipeline downloads a checkpoint from HuggingFace on first use and runs on whichever backend you pass:

using SharpInference.Audio.Pipelines;
using SharpInference.Core.Backends;
using SharpInference.Cpu;          // or SharpInference.Cuda / SharpInference.Vulkan

// Downloads the checkpoint from HuggingFace on first use.
using WhisperPipeline pipeline = await WhisperPipeline.LoadAsync("openai/whisper-base");
using IBackend backend = new CpuBackend();

// Transcribe a WAV file as English text, without timestamps.
var options = new WhisperOptions { Language = "en", WithTimestamps = false };
string text = pipeline.TranscribeWav(backend, "audio.wav", options);

Console.WriteLine(text);

Swap new CpuBackend() for new CudaBackend() or new VulkanBackend() — the pipeline is backend-agnostic. The same LoadAsync / pipeline pattern applies across modalities (StableDiffusion15Pipeline, WanVideoPipeline, KokoroPipeline, …); see the samples in the repo for image, video, and TTS walkthroughs.
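
Because the pipeline is backend-agnostic, one option is to try the fastest backend first and fall back to the next. The sketch below is not part of SharpInference: `BackendSelector.FirstAvailable` is a hypothetical helper, and it assumes that a backend constructor throws when its device or driver is missing, which this README does not confirm.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of SharpInference.
public static class BackendSelector
{
    // Tries each factory in order and returns the first value that is
    // constructed without throwing. Assumption: an unavailable backend
    // signals that by throwing from its constructor.
    public static T FirstAvailable<T>(params Func<T>[] factories)
    {
        var failures = new List<Exception>();
        foreach (Func<T> factory in factories)
        {
            try
            {
                return factory();
            }
            catch (Exception ex)
            {
                failures.Add(ex); // remember why this candidate was skipped
            }
        }
        throw new AggregateException("No candidate could be created.", failures);
    }
}

// Intended use with the types named above (untested against the package):
//   using IBackend backend = BackendSelector.FirstAvailable<IBackend>(
//       () => new CudaBackend(),
//       () => new VulkanBackend(),
//       () => new CpuBackend());
```

If every candidate fails, the `AggregateException` carries each individual failure, so the reason a GPU backend was skipped is not lost.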


What it can do

| Modality | Highlights |
| --- | --- |
| Image generation | SD 1.5, SDXL, SD3, Flux.1 / Flux.2, AuraFlow, Chroma, HiDream, Qwen-Image, Lumina 2, OmniGen2, HunyuanImage, Ideogram, Kandinsky 5, and more, with LoRA, img2img, and tiling |
| Video generation | LTX-Video, Wan 2.x, Lance, Kandinsky 5 video |
| Interactive / world models | Matrix-Game 2 & 3, Oasis (action-conditioned, frame-by-frame generation) |
| Speech-to-text | Whisper (tiny → large-v3), Moonshine, with streaming and timestamps |
| Text-to-speech & voice | Kokoro, F5-TTS, StyleTTS2, Bark, CosyVoice, Spark-TTS, VibeVoice, CSM |
| Music | ACE-Step, MusicGen, YuE |
| Vision | CLIP & SigLIP embeddings, YOLO detection, SAM segmentation, face detection |

Checkpoints load directly from .safetensors / .gguf, including quantized weights (GGUF, MXFP4/8, NVFP4, block-scaled).
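
For orientation, a .safetensors file begins with an 8-byte little-endian header length, followed by a UTF-8 JSON header that lists each tensor's dtype, shape, and byte offsets into the data section. That layout is what makes direct loading straightforward. A minimal sketch of reading the header follows; `SafetensorsHeader` and `TensorEntry` are hypothetical names for illustration and are not the SharpInference.ModelHandler API.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical types for illustration; not SharpInference's loader.
public sealed record TensorEntry(string Name, string DType, long[] Shape, long Begin, long End);

public static class SafetensorsHeader
{
    public static List<TensorEntry> Read(Stream stream)
    {
        // First 8 bytes: header length as an unsigned little-endian 64-bit integer.
        Span<byte> lengthBytes = stackalloc byte[8];
        stream.ReadExactly(lengthBytes);
        ulong headerLength = BinaryPrimitives.ReadUInt64LittleEndian(lengthBytes);
        if (headerLength > int.MaxValue) throw new InvalidDataException("Header too large.");

        // Next headerLength bytes: JSON mapping tensor names to their metadata.
        byte[] json = new byte[(int)headerLength];
        stream.ReadExactly(json);

        var entries = new List<TensorEntry>();
        using JsonDocument doc = JsonDocument.Parse(json);
        foreach (JsonProperty prop in doc.RootElement.EnumerateObject())
        {
            if (prop.Name == "__metadata__") continue; // free-form string map, not a tensor

            JsonElement tensor = prop.Value;
            var shape = new List<long>();
            foreach (JsonElement dim in tensor.GetProperty("shape").EnumerateArray())
            {
                shape.Add(dim.GetInt64());
            }

            // data_offsets are [begin, end) relative to the start of the data section.
            JsonElement offsets = tensor.GetProperty("data_offsets");
            entries.Add(new TensorEntry(
                prop.Name,
                tensor.GetProperty("dtype").GetString() ?? "",
                shape.ToArray(),
                offsets[0].GetInt64(),
                offsets[1].GetInt64()));
        }
        return entries;
    }
}
```

Calling `SafetensorsHeader.Read(File.OpenRead("model.safetensors"))` lists the tensors without reading any weight data. A real loader would then memory-map the data section and slice it by these offsets.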

Coverage is wide because the engine shares a common core (tensors, schedulers, VAEs, text encoders, DSP) across architectures. Per-model numerical validation is ongoing — see the alpha note above.


Design pillars

| Pillar | What it means |
| --- | --- |
| Pure C# | GPU access via PTX (CUDA Driver API) and SPIR-V (Vulkan); no native shared inference libraries |
| Eager execution | Ops run immediately; no computation graph to compile |
| Zero-allocation hot paths | Tensor storage in NativeMemory.AlignedAlloc; weights memory-mapped; Span<T> throughout |
| Modular packages | Pull in only the modality and backend you need |
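
To illustrate the zero-allocation pillar, here is a minimal sketch of an aligned native buffer exposed as a `Span<float>`. `AlignedFloatBuffer` is a hypothetical type and the 64-byte default alignment is an assumption; SharpInference's own tensor storage may look quite different. Only `NativeMemory.AlignedAlloc`, `NativeMemory.AlignedFree`, and `Span<T>` are real .NET APIs here.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical type for illustration; not a SharpInference class.
// Requires <AllowUnsafeBlocks>true</AllowUnsafeBlocks> in the project file.
public sealed unsafe class AlignedFloatBuffer : IDisposable
{
    private float* _data;

    public int Length { get; }

    // alignment must be a power of two; 64 bytes matches a typical cache line.
    public AlignedFloatBuffer(int length, int alignment = 64)
    {
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        Length = length;

        // Unmanaged allocation: the GC neither tracks nor moves this block.
        _data = (float*)NativeMemory.AlignedAlloc(
            (nuint)length * (nuint)sizeof(float), (nuint)alignment);
    }

    // A Span<T> view over the native block; creating it allocates nothing.
    public Span<float> Span => _data == null
        ? throw new ObjectDisposedException(nameof(AlignedFloatBuffer))
        : new Span<float>(_data, Length);

    // True when the block's address is a multiple of the given alignment.
    public bool IsAligned(int alignment) => (nuint)_data % (nuint)alignment == 0;

    public void Dispose()
    {
        if (_data != null)
        {
            NativeMemory.AlignedFree(_data); // must pair with AlignedAlloc
            _data = null;
        }
    }
}
```

Typical use is `using var buf = new AlignedFloatBuffer(1024); buf.Span.Fill(0f);`. Kernels then work on `buf.Span` in place, so the hot path creates no managed objects.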

Packages

| Package | Description |
| --- | --- |
| SharpInference | Meta-package that references everything below |
| SharpInference.Core | Tensor types, IBackend, schedulers, pipeline base types |
| SharpInference.ModelHandler | Safetensors/GGUF loaders, quant dequant, HuggingFace download, model registry |
| SharpInference.Tokenizers | CLIP, T5, Whisper, and LLM-style tokenizers |
| SharpInference.Cpu | CPU backend with AVX2 / AVX-512 / NEON SIMD kernels |
| SharpInference.Cuda | CUDA backend: PTX kernels + cuBLAS |
| SharpInference.Vulkan | Cross-vendor Vulkan backend (NVIDIA / AMD / Intel) via SPIR-V |
| SharpInference.Diffusion | Image + music diffusion pipelines, VAEs, text encoders, LoRA |
| SharpInference.Audio | Whisper/Moonshine STT, TTS, voice conversion, music |
| SharpInference.Vision | CLIP/SigLIP embeddings, YOLO, SAM, face detection |
| SharpInference.Video | LTX-Video, Wan, Lance, Kandinsky 5 video |
| SharpInference.Interactive | Action-conditioned world models (Matrix-Game, Oasis) |

Requirements

  • .NET 8 or .NET 10 SDK

CUDA backend (NVIDIA, fastest)

  • CUDA 12.x runtime
  • NVIDIA GPU, compute capability 8.0+ (RTX 30xx/40xx, A100, H100)

Vulkan backend (NVIDIA / AMD / Intel, cross-vendor)

  • Vulkan 1.3+ runtime (ships with the GPU vendor driver)
  • GPU with FP16 compute (shaderFloat16) — most discrete GPUs from 2019+

CPU backend

  • Any x86-64 (AVX2+) or ARM64 (NEON) machine — no GPU required
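
To check whether a machine meets the CPU requirement, the .NET hardware-intrinsics flags can be queried directly. This is a sketch using only standard .NET APIs; `SimdProbe` is a hypothetical name, and how SharpInference.Cpu actually selects its kernels is not documented here.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper for illustration; not part of SharpInference.
public static class SimdProbe
{
    // Reports the widest SIMD instruction set the current process can use.
    public static string Describe()
    {
        Architecture arch = RuntimeInformation.ProcessArchitecture;
        string isa = arch switch
        {
            Architecture.X64 when Avx512F.IsSupported => "AVX-512F",
            Architecture.X64 when Avx2.IsSupported => "AVX2",
            Architecture.Arm64 when AdvSimd.IsSupported => "NEON (AdvSimd)",
            _ => "no AVX2/NEON detected",
        };

        // Vector<float>.Count is the number of floats per hardware vector
        // as seen by the portable System.Numerics API.
        return $"{arch}: {isa}, Vector<float>.Count = {Vector<float>.Count}";
    }
}
```

Printing `SimdProbe.Describe()` at startup is a quick way to confirm which instruction set is available before debugging performance.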


License

MIT © 2026 kalebbroo

Product Compatible and additional computed target framework versions.
.NET net8.0 is compatible.  net8.0-android was computed.  net8.0-browser was computed.  net8.0-ios was computed.  net8.0-maccatalyst was computed.  net8.0-macos was computed.  net8.0-tvos was computed.  net8.0-windows was computed.  net9.0 was computed.  net9.0-android was computed.  net9.0-browser was computed.  net9.0-ios was computed.  net9.0-maccatalyst was computed.  net9.0-macos was computed.  net9.0-tvos was computed.  net9.0-windows was computed.  net10.0 is compatible.  net10.0-android was computed.  net10.0-browser was computed.  net10.0-ios was computed.  net10.0-maccatalyst was computed.  net10.0-macos was computed.  net10.0-tvos was computed.  net10.0-windows was computed. 


| Version | Downloads | Last Updated |
| --- | --- | --- |
| 1.0.0-alpha.2 | 47 | 6/14/2026 |