HartsyInference.Diffusion 1.0.0-alpha.9

This is a prerelease version of HartsyInference.Diffusion.

There is a newer prerelease version of this package available.
See the version list below for details.

dotnet add package HartsyInference.Diffusion --version 1.0.0-alpha.9

NuGet\Install-Package HartsyInference.Diffusion -Version 1.0.0-alpha.9

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

<PackageReference Include="HartsyInference.Diffusion" Version="1.0.0-alpha.9" />

For projects that support PackageReference, copy this XML node into the project file to reference the package.

Directory.Packages.props:

<PackageVersion Include="HartsyInference.Diffusion" Version="1.0.0-alpha.9" />

Project file:

<PackageReference Include="HartsyInference.Diffusion" />

For projects that support Central Package Management (CPM), copy the PackageVersion node into the solution's Directory.Packages.props file to version the package, and the PackageReference node, without a version, into the project file.

paket add HartsyInference.Diffusion --version 1.0.0-alpha.9

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

#r "nuget: HartsyInference.Diffusion, 1.0.0-alpha.9"

The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy it into the interactive tool or the script's source code to reference the package.

#:package HartsyInference.Diffusion@1.0.0-alpha.9

The #:package directive can be used in C# file-based apps starting with .NET 10 Preview 4. Copy it into a .cs file before any lines of code to reference the package.

Install as a Cake Addin:

#addin nuget:?package=HartsyInference.Diffusion&version=1.0.0-alpha.9&prerelease

Install as a Cake Tool:

#tool nuget:?package=HartsyInference.Diffusion&version=1.0.0-alpha.9&prerelease

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

HartsyInference

A pure C#/.NET AI inference engine for image generation, speech, vision, video, and interactive world models. No Python, no bundled native runtime DLLs.

HartsyInference loads .safetensors and .gguf checkpoints directly and runs them on NVIDIA CUDA, cross-vendor Vulkan, or CPU SIMD, entirely in managed .NET. GPU kernels are PTX/SPIR-V shipped with the package and JIT-compiled at runtime; there are no C++ wrappers, no bundled native inference library, and no external Python process to manage. Just NuGet packages.

It is the non-LLM companion to dotLLM: together they form a complete AI stack in pure .NET.

⚠️ Alpha software

This is 1.0.0-alpha, an early, fast-moving preview. Use it to experiment, not in production.

- APIs will change without notice between alpha releases. Pin an exact version.
- Model coverage is broad, but maturity varies. Many architectures are implemented and load and run end-to-end but are still being validated numerically against their reference implementations. Treat each model's output quality as "verify before you rely on it."
- No support guarantees and no semver stability until 1.0.0.
- The OpenAI-compatible server and CLI are not published as packages in this alpha; they live in the source repository. Publishing them is on the roadmap.

Found a bug or a mismatch against a reference? Please open an issue.

Install

One package pulls in the whole stack (all backends + every modality):

dotnet add package HartsyInference --prerelease

Or reference only the pieces you need (see Packages):

dotnet add package HartsyInference.Audio --prerelease
dotnet add package HartsyInference.Cpu   --prerelease

Requires .NET 8 or .NET 10.

Quick start: speech-to-text

The Whisper pipeline downloads a checkpoint from HuggingFace on first use and runs on whichever backend you pass:

```csharp
using HartsyInference.Audio.Pipelines;
using HartsyInference.Core.Backends;
using HartsyInference.Cpu;          // or HartsyInference.Cuda / HartsyInference.Vulkan

// Downloads the openai/whisper-base checkpoint from HuggingFace on first use.
using WhisperPipeline pipeline = await WhisperPipeline.LoadAsync("openai/whisper-base");
// Any IBackend works here; CpuBackend needs no GPU.
using IBackend backend = new CpuBackend();

// Transcribe as English and return plain text without timestamps.
WhisperOptions options = new() { Language = "en", WithTimestamps = false };
string text = pipeline.TranscribeWav(backend, "audio.wav", options);

Console.WriteLine(text);
```

Swap new CpuBackend() for new CudaBackend() or new VulkanBackend(); the pipeline is backend-agnostic. Audio pipelines that auto-download (WhisperPipeline, KokoroPipeline) share this LoadAsync convention, while image and video pipelines (StableDiffusion15Pipeline, WanVideoPipeline) are constructed from pre-loaded components. See the samples in the repo for image, video, and TTS walkthroughs.

What it can do

Modality	Highlights
Image generation	SD 1.5, SDXL, SD3, Flux.1 / Flux.2, AuraFlow, Chroma, HiDream, Qwen-Image, Lumina 2, OmniGen2, HunyuanImage, Ideogram, Kandinsky 5, and more, with LoRA, img2img, and tiling
Video generation	LTX-Video, Wan 2.x, Lance, Kandinsky 5 video
Interactive / world models	Matrix-Game 2 & 3, Oasis: action-conditioned, frame-by-frame generation
Speech-to-text	Whisper (tiny → large-v3), Moonshine, with streaming and timestamps
Text-to-speech & voice	Kokoro, F5-TTS, StyleTTS2, Bark, CosyVoice, Spark-TTS, VibeVoice, CSM
Music	ACE-Step, MusicGen, YuE
Vision	CLIP & SigLIP embeddings, YOLO detection, SAM segmentation, face detection
3D generation	Hunyuan3D-2 (flow-match DiT + ShapeVAE) & TripoSR (feed-forward triplane/NeRF) image to mesh, via marching cubes to glTF/OBJ/PLY
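The Vision row lists CLIP and SigLIP embeddings. Embeddings like these are usually compared with cosine similarity, for example to rank images against a text prompt. The sketch below is a standalone C# helper, not part of HartsyInference; `CosineSimilarity` is a hypothetical name, and the sketch assumes you already hold two embeddings as `float` arrays of equal length, since the vision pipelines' output types are not documented on this page.

```csharp
using System;

// Hypothetical helper (not part of HartsyInference): cosine similarity between two
// embedding vectors, e.g. a CLIP image embedding and a CLIP text embedding.
static float CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Embeddings must have the same dimension.");

    // Accumulate in double to limit rounding error on long vectors.
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += (double)a[i] * b[i];
        normA += (double)a[i] * a[i];
        normB += (double)b[i] * b[i];
    }

    if (normA == 0 || normB == 0)
        return 0f; // similarity with a zero vector is defined as 0 here

    return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
}

// Usage with toy 3-dimensional vectors; real CLIP embeddings have hundreds of dimensions.
float[] image = { 1f, 0f, 1f };
float[] text = { 1f, 0f, 0f };
Console.WriteLine(CosineSimilarity(image, text)); // about 0.7071, i.e. 1/sqrt(2)
```

Because the result is normalized by both vector lengths, it ranges from -1 to 1 and ignores embedding magnitude, which is why it is the usual score for ranking.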

Checkpoints load directly from .safetensors / .gguf, including quantized weights (GGUF, MXFP4/8, NVFP4, block-scaled).
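To make "block-scaled" concrete, here is a minimal sketch of dequantizing the GGUF `Q8_0` type, where each block of 32 signed 8-bit values shares one fp16 scale and each weight is reconstructed as `scale * q`. `DequantizeQ8_0` is a hypothetical helper, not the `HartsyInference.ModelHandler` API, and it covers only `Q8_0`; MXFP4/8, NVFP4, and the other GGUF types use different block layouts.

```csharp
using System;
using System.Buffers.Binary;

// Q8_0 block layout (ggml): a 2-byte little-endian fp16 scale, then 32 int8 quants.
const int BlockSize = 32;
const int BlockBytes = 2 + BlockSize;

// Hypothetical helper, not the HartsyInference API: expands raw Q8_0 bytes to floats.
float[] DequantizeQ8_0(ReadOnlySpan<byte> data)
{
    if (data.Length % BlockBytes != 0)
        throw new ArgumentException("Q8_0 data must be a whole number of 34-byte blocks.");

    int blockCount = data.Length / BlockBytes;
    var weights = new float[blockCount * BlockSize];
    for (int b = 0; b < blockCount; b++)
    {
        ReadOnlySpan<byte> block = data.Slice(b * BlockBytes, BlockBytes);
        float scale = (float)BinaryPrimitives.ReadHalfLittleEndian(block);
        for (int i = 0; i < BlockSize; i++)
            weights[b * BlockSize + i] = scale * unchecked((sbyte)block[2 + i]);
    }
    return weights;
}

// Usage: build one block by hand with scale 0.5 and quants -2, 4, 0, 0, ...
byte[] raw = new byte[BlockBytes];
BinaryPrimitives.WriteHalfLittleEndian(raw, (Half)0.5f);
raw[2] = unchecked((byte)(sbyte)-2);
raw[3] = 4;
float[] w = DequantizeQ8_0(raw);
// w[0] == -1, w[1] == 2, and the remaining 30 weights are 0.
Console.WriteLine(string.Join(", ", w[..4]));
```

One fp16 scale per 32 values costs 2 extra bytes per block, so `Q8_0` stores 8.5 bits per weight instead of 16 or 32.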

Coverage is wide because the engine shares a common core (tensors, schedulers, VAEs, text encoders, DSP) across architectures. Per-model numerical validation is ongoing; see the alpha note above.

Coming soon

HartsyInference is moving fast, and the roadmap is broad. On deck:

Area	Planned
Image	ControlNet, IP-Adapter, LCM/Turbo distillation across more architectures, regional prompting
Vision	Grounding DINO, YOLO-World, OWLv2, Florence-2, RT-DETR, depth & pose estimation, OCR, tracking
Video	HunyuanVideo, CogVideoX, longer-context temporal generation
3D	Gaussian-splat output, texture synthesis, multi-view to mesh
World models	Broader action spaces, longer memory horizons, multiplayer state
Serving	OpenAI-compatible REST server and CLI, published as NuGet packages
Tooling	Wider quantized inference (MXFP4 / MXFP8 / NVFP4), model hot-swap, per-modality CLI subcommands

Track progress and releases on the GitHub repo.

Design pillars

Pillar	What it means
Pure C#	GPU access via PTX (CUDA Driver API) and SPIR-V (Vulkan), with no native shared inference libraries
Eager execution	Ops run immediately; no computation graph to compile
Zero-allocation hot paths	Tensor storage in `NativeMemory.AlignedAlloc`; weights memory-mapped; `Span<T>` throughout
Modular packages	Pull in only the modality and backend you need
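The "weights memory-mapped" pillar can be illustrated with the public safetensors layout: an 8-byte little-endian header length, a UTF-8 JSON header, then the raw tensor bytes. The sketch below maps a file read-only, parses only the header, and reads a tensor value straight from the mapped view without copying the weights. `ReadSafetensorsIndex` is a hypothetical helper, not HartsyInference's loader, and reading values through the accessor assumes a little-endian host.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;
using System.Text.Json;

// Hypothetical helper, not HartsyInference's loader. Safetensors layout: an 8-byte
// little-endian header length N, N bytes of UTF-8 JSON, then the raw tensor bytes.
// Each tensor's "data_offsets" are relative to the first byte after the JSON header.
List<(string Name, string Dtype, long Begin, long End)> ReadSafetensorsIndex(MemoryMappedViewAccessor accessor)
{
    var lengthBytes = new byte[8];
    accessor.ReadArray(0, lengthBytes, 0, lengthBytes.Length);
    long headerLength = checked((long)BinaryPrimitives.ReadUInt64LittleEndian(lengthBytes));

    // Only the small JSON header is copied into managed memory.
    var headerBytes = new byte[headerLength];
    accessor.ReadArray(8, headerBytes, 0, headerBytes.Length);
    long dataStart = 8 + headerLength;

    var index = new List<(string Name, string Dtype, long Begin, long End)>();
    using JsonDocument header = JsonDocument.Parse(headerBytes.AsMemory());
    foreach (JsonProperty tensor in header.RootElement.EnumerateObject())
    {
        if (tensor.Name == "__metadata__") continue; // optional metadata entry, not a tensor
        JsonElement offsets = tensor.Value.GetProperty("data_offsets");
        index.Add((tensor.Name,
                   tensor.Value.GetProperty("dtype").GetString()!,
                   dataStart + offsets[0].GetInt64(),
                   dataStart + offsets[1].GetInt64()));
    }
    return index;
}

// Usage: write a tiny file holding one F32 tensor "w" with shape [2] and values 1.5, -3.
string path = Path.GetTempFileName();
byte[] json = Encoding.UTF8.GetBytes("{\"w\":{\"dtype\":\"F32\",\"shape\":[2],\"data_offsets\":[0,8]}}");
byte[] payload = new byte[8 + json.Length + 8];
BinaryPrimitives.WriteUInt64LittleEndian(payload, (ulong)json.Length);
json.CopyTo(payload, 8);
BinaryPrimitives.WriteSingleLittleEndian(payload.AsSpan(8 + json.Length), 1.5f);
BinaryPrimitives.WriteSingleLittleEndian(payload.AsSpan(12 + json.Length), -3f);
File.WriteAllBytes(path, payload);

using var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read);
using var view = mmf.CreateViewAccessor(0, 0, MemoryMappedFileAccess.Read);
foreach (var t in ReadSafetensorsIndex(view))
{
    // ReadSingle reads host byte order straight from the mapping (little-endian on x64/ARM64).
    Console.WriteLine($"{t.Name} {t.Dtype} bytes [{t.Begin}, {t.End}), first value {view.ReadSingle(t.Begin)}");
}
```

Keeping the file mapped lets the OS page weights in on demand, so a multi-gigabyte checkpoint does not have to be read into managed arrays up front.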

Packages

Package	Description
`HartsyInference`	Meta-package that references everything below
`HartsyInference.Core`	Tensor types, `IBackend`, schedulers, pipeline base types
`HartsyInference.ModelHandler`	Safetensors/GGUF loaders, quant dequant, HuggingFace download, model registry
`HartsyInference.Tokenizers`	CLIP, T5, Whisper, and LLM-style tokenizers
`HartsyInference.Cpu`	CPU backend with AVX2 / AVX-512 / NEON SIMD kernels
`HartsyInference.Cuda`	CUDA backend with PTX kernels + cuBLAS
`HartsyInference.Vulkan`	Cross-vendor Vulkan backend (NVIDIA / AMD / Intel) via SPIR-V
`HartsyInference.Diffusion`	Image + music diffusion pipelines, VAEs, text encoders, LoRA
`HartsyInference.Audio`	Whisper/Moonshine STT, TTS, voice conversion, music
`HartsyInference.Vision`	CLIP/SigLIP embeddings, YOLO, SAM, face detection
`HartsyInference.Video`	LTX-Video, Wan, Lance, Kandinsky 5 video
`HartsyInference.Interactive`	Action-conditioned world models (Matrix-Game, Oasis)
`HartsyInference.ThreeD`	3D asset generation: mesh/splat foundation (marching cubes, glTF/OBJ/PLY) + Hunyuan3D-2 image to mesh
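`HartsyInference.ThreeD` exports meshes to glTF/OBJ/PLY. Of the three, Wavefront OBJ is the simplest: one `v x y z` line per vertex and one `f a b c` line per triangle, with 1-based vertex indices. The sketch below is a hypothetical exporter (`ToObj` is an illustrative name, not the package's API); it assumes the mesh is held as a flat `xyz` position array plus a flat triangle index array, a common way to store marching-cubes output.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper, not the HartsyInference.ThreeD exporter: writes a triangle mesh
// (e.g. the output of marching cubes) as Wavefront OBJ text.
string ToObj(ReadOnlySpan<float> positions, ReadOnlySpan<int> triangles)
{
    if (positions.Length % 3 != 0 || triangles.Length % 3 != 0)
        throw new ArgumentException("Expected xyz triples and 3 indices per triangle.");

    var sb = new StringBuilder();
    var inv = CultureInfo.InvariantCulture; // always '.' as the decimal separator

    // One "v x y z" line per vertex.
    for (int i = 0; i < positions.Length; i += 3)
        sb.Append("v ").Append(positions[i].ToString(inv)).Append(' ')
          .Append(positions[i + 1].ToString(inv)).Append(' ')
          .Append(positions[i + 2].ToString(inv)).Append('\n');

    // One "f a b c" line per triangle; OBJ indices are 1-based.
    for (int i = 0; i < triangles.Length; i += 3)
        sb.Append("f ").Append(triangles[i] + 1).Append(' ')
          .Append(triangles[i + 1] + 1).Append(' ')
          .Append(triangles[i + 2] + 1).Append('\n');

    return sb.ToString();
}

// Usage: a single triangle in the z = 0 plane.
float[] verts = { 0, 0, 0, 1, 0, 0, 0, 1, 0 };
int[] tris = { 0, 1, 2 };
Console.Write(ToObj(verts, tris));
// v 0 0 0
// v 1 0 0
// v 0 1 0
// f 1 2 3
```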

Requirements

.NET 8 or .NET 10 SDK

CUDA backend (NVIDIA, fastest)

CUDA 12.x runtime
NVIDIA GPU, compute capability 8.0+ (RTX 30xx/40xx, A100, H100)

Vulkan backend (NVIDIA / AMD / Intel, cross-vendor)

Vulkan 1.3+ runtime (ships with the GPU vendor driver)
GPU with FP16 compute support (shaderFloat16); most discrete GPUs from 2019 onward qualify

CPU backend

Any x86-64 (AVX2+) or ARM64 (NEON) machine, no GPU required
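To see which of the SIMD tiers named above (AVX2, AVX-512, NEON) a machine exposes to .NET, you can query the runtime's hardware-intrinsics classes. This sketch reports only what the .NET runtime detects; `DetectSimdTier` is a hypothetical name, and it makes no claim about which kernel `HartsyInference.Cpu` selects internally.

```csharp
using System;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

// Reports the widest SIMD instruction set the .NET runtime exposes on this machine.
// Avx512F requires .NET 8 or later, matching the documented target frameworks.
string DetectSimdTier()
{
    if (Avx512F.IsSupported) return "AVX-512";
    if (Avx2.IsSupported) return "AVX2";
    if (AdvSimd.IsSupported) return "NEON";
    return "none (below the documented CPU requirement)";
}

Console.WriteLine($"{Environment.ProcessorCount} logical cores, SIMD tier: {DetectSimdTier()}");
```

On x64 an AVX2 result meets the requirement above; on ARM64 NEON (AdvSimd) is always present, so the last branch should only appear on older x86 CPUs or when intrinsics are disabled through runtime configuration.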

License

Product	Compatible and additional computed target framework versions
.NET	Compatible (included in the package): net8.0, net10.0. Computed: net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows, net9.0, net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows.

Dependencies

net10.0
- HartsyInference.Core (>= 1.0.0-alpha.9)
- HartsyInference.ModelHandler (>= 1.0.0-alpha.9)
- HartsyInference.Tokenizers (>= 1.0.0-alpha.9)
net8.0
- HartsyInference.Core (>= 1.0.0-alpha.9)
- HartsyInference.ModelHandler (>= 1.0.0-alpha.9)
- HartsyInference.Tokenizers (>= 1.0.0-alpha.9)

NuGet packages (4)

Showing the top 4 NuGet packages that depend on HartsyInference.Diffusion:

Package	Downloads
HartsyInference.Video Video generation for HartsyInference — LTX-Video, Wan, temporal attention, and video VAE. (Phase 3 — stub)	133
HartsyInference.Vision Vision inference for HartsyInference — CLIP embeddings, YOLO detection (planned), SAM segmentation (planned), and face detection (planned).	125
HartsyInference Meta-package for HartsyInference — a pure C#/.NET AI inference engine for non-LLM modalities (diffusion image generation, speech-to-text, text-to-speech, vision, video, interactive world models). Adds all backends (CPU, CUDA, Vulkan) and modality packages in one reference.	121
HartsyInference.ThreeD 3D asset generation for HartsyInference — image/text → mesh and Gaussian-splat. Representation-agnostic foundation (marching cubes, glTF/OBJ/PLY export, 3D sampling) plus model pipelines (Hunyuan3D-2 image→mesh).	75

GitHub repositories

This package is not used by any popular GitHub repositories.

Version	Downloads	Last Updated
1.0.0-alpha.11	0	6/19/2026
1.0.0-alpha.10	0	6/19/2026
1.0.0-alpha.9	28	6/18/2026
1.0.0-alpha.8	53	6/17/2026
1.0.0-alpha.3	66	6/14/2026