HartsyInference.Diffusion
1.0.0-alpha.9
A newer prerelease version of this package is available. See the version list below for details.
- .NET CLI: dotnet add package HartsyInference.Diffusion --version 1.0.0-alpha.9
- Package Manager: NuGet\Install-Package HartsyInference.Diffusion -Version 1.0.0-alpha.9
- PackageReference: <PackageReference Include="HartsyInference.Diffusion" Version="1.0.0-alpha.9" />
- Central package management: <PackageVersion Include="HartsyInference.Diffusion" Version="1.0.0-alpha.9" /> in Directory.Packages.props, plus <PackageReference Include="HartsyInference.Diffusion" /> in the project file
- Paket CLI: paket add HartsyInference.Diffusion --version 1.0.0-alpha.9
- Script & Interactive: #r "nuget: HartsyInference.Diffusion, 1.0.0-alpha.9"
- File-based apps: #:package HartsyInference.Diffusion@1.0.0-alpha.9
- Cake addin: #addin nuget:?package=HartsyInference.Diffusion&version=1.0.0-alpha.9&prerelease
- Cake tool: #tool nuget:?package=HartsyInference.Diffusion&version=1.0.0-alpha.9&prerelease
HartsyInference
A pure C#/.NET AI inference engine for image generation, speech, vision, video, and interactive world models. No Python, no native runtime DLLs.
HartsyInference loads .safetensors and .gguf checkpoints directly and runs them on NVIDIA CUDA, cross-vendor Vulkan, or CPU SIMD, entirely in managed .NET. GPU kernels are PTX/SPIR-V shipped with the package and JIT-compiled at runtime; there are no C++ wrappers, no bundled native inference library, and no external Python process to manage. Just NuGet packages.
It is the non-LLM companion to dotLLM: together they form a complete AI stack in pure .NET.
⚠️ Alpha software
This is 1.0.0-alpha, an early, fast-moving preview. Use it to experiment, not in production.
- APIs will change without notice between alpha releases. Pin an exact version.
- Model coverage is broad but maturity varies. Many architectures are implemented and load/run end-to-end but are still being validated numerically against their reference implementations. Treat output quality per-model as "verify before you rely on it."
- No support guarantees, no semver stability until 1.0.0.
- The OpenAI-compatible server and CLI are not published as packages in this alpha; they live in the source repository. Publishing them is on the roadmap.
Found a bug or a mismatch against a reference? Please open an issue.
Install
One package pulls in the whole stack (all backends + every modality):
dotnet add package HartsyInference --prerelease
Or reference only the pieces you need (see Packages):
dotnet add package HartsyInference.Audio --prerelease
dotnet add package HartsyInference.Cpu --prerelease
Requires .NET 8 or .NET 10.
Quick start: speech-to-text
The Whisper pipeline downloads a checkpoint from HuggingFace on first use and runs on whichever backend you pass:
using HartsyInference.Audio.Pipelines;
using HartsyInference.Core.Backends;
using HartsyInference.Cpu; // or HartsyInference.Cuda / HartsyInference.Vulkan

// Downloads the checkpoint from HuggingFace on first use.
using WhisperPipeline pipeline = await WhisperPipeline.LoadAsync("openai/whisper-base");
// Any IBackend works here; the pipeline is backend-agnostic.
using IBackend backend = new CpuBackend();

WhisperOptions options = new() { Language = "en", WithTimestamps = false };
string text = pipeline.TranscribeWav(backend, "audio.wav", options);
Console.WriteLine(text);
Swap new CpuBackend() for new CudaBackend() or new VulkanBackend(); the pipeline is backend-agnostic. Audio pipelines that auto-download (WhisperPipeline, KokoroPipeline) share this LoadAsync convention, while image and video pipelines (StableDiffusion15Pipeline, WanVideoPipeline) are constructed from pre-loaded components. See the samples in the repo for image, video, and TTS walkthroughs.
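Because the pipeline accepts any backend, an application can probe backends in order of preference and fall back to the CPU. The sketch below is one way to do that. `BackendSelector.FirstAvailable` is a hypothetical helper, not part of HartsyInference, and it assumes that constructing a backend throws when the required driver or hardware is missing, which this README does not confirm.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of HartsyInference): tries each factory in
// order and returns the first instance that is created without throwing.
public static class BackendSelector
{
    public static T FirstAvailable<T>(params Func<T>[] factories) where T : class
    {
        List<Exception> failures = new List<Exception>();
        foreach (Func<T> factory in factories)
        {
            try
            {
                T candidate = factory();
                if (candidate != null)
                {
                    return candidate;
                }
            }
            catch (Exception ex)
            {
                // Assumption: an unavailable backend signals this by throwing.
                failures.Add(ex);
            }
        }
        throw new AggregateException("No backend could be created.", failures);
    }
}
```

With the types from the quick start, usage might look like `BackendSelector.FirstAvailable<IBackend>(() => new CudaBackend(), () => new VulkanBackend(), () => new CpuBackend())`. If the backends report unavailability some other way, the `catch` branch would need to be replaced with that check.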
What it can do
| Modality | Highlights |
|---|---|
| Image generation | SD 1.5, SDXL, SD3, Flux.1 / Flux.2, AuraFlow, Chroma, HiDream, Qwen-Image, Lumina 2, OmniGen2, HunyuanImage, Ideogram, Kandinsky 5, and more, with LoRA, img2img, and tiling |
| Video generation | LTX-Video, Wan 2.x, Lance, Kandinsky 5 video |
| Interactive / world models | Matrix-Game 2 & 3, Oasis: action-conditioned, frame-by-frame generation |
| Speech-to-text | Whisper (tiny → large-v3), Moonshine, with streaming and timestamps |
| Text-to-speech & voice | Kokoro, F5-TTS, StyleTTS2, Bark, CosyVoice, Spark-TTS, VibeVoice, CSM |
| Music | ACE-Step, MusicGen, YuE |
| Vision | CLIP & SigLIP embeddings, YOLO detection, SAM segmentation, face detection |
| 3D generation | Hunyuan3D-2 (flow-match DiT + ShapeVAE) & TripoSR (feed-forward triplane/NeRF) image to mesh, via marching cubes to glTF/OBJ/PLY |
Checkpoints load directly from .safetensors / .gguf, including quantized weights (GGUF, MXFP4/8, NVFP4, block-scaled).
Coverage is wide because the engine shares a common core (tensors, schedulers, VAEs, text encoders, DSP) across architectures. Per-model numerical validation is ongoing; see the alpha note above.
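To make the direct .safetensors loading concrete: the container starts with an 8-byte little-endian header length, followed by that many bytes of UTF-8 JSON describing each tensor's dtype, shape, and data offsets. The sketch below reads only that header using the standard library. `SafetensorsHeader` is an illustrative name, and this is not the loader that HartsyInference.ModelHandler uses.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Illustrative reader for the safetensors header only (no tensor data).
public static class SafetensorsHeader
{
    public static Dictionary<string, (string DType, long[] Shape)> Read(Stream stream)
    {
        // First 8 bytes: unsigned little-endian length of the JSON header.
        Span<byte> lengthBytes = stackalloc byte[8];
        stream.ReadExactly(lengthBytes);
        ulong headerLength = BinaryPrimitives.ReadUInt64LittleEndian(lengthBytes);
        if (headerLength > 100_000_000)
        {
            // Arbitrary sanity limit chosen for this sketch.
            throw new InvalidDataException("Header length is implausibly large.");
        }

        byte[] json = new byte[(int)headerLength];
        stream.ReadExactly(json);

        var tensors = new Dictionary<string, (string DType, long[] Shape)>();
        using JsonDocument doc = JsonDocument.Parse(new ReadOnlyMemory<byte>(json));
        foreach (JsonProperty entry in doc.RootElement.EnumerateObject())
        {
            // "__metadata__" holds free-form string metadata, not a tensor.
            if (entry.Name == "__metadata__") continue;

            string dtype = entry.Value.GetProperty("dtype").GetString() ?? "";
            JsonElement shapeElement = entry.Value.GetProperty("shape");
            long[] shape = new long[shapeElement.GetArrayLength()];
            int i = 0;
            foreach (JsonElement dim in shapeElement.EnumerateArray())
            {
                shape[i++] = dim.GetInt64();
            }
            tensors[entry.Name] = (dtype, shape);
        }
        return tensors;
    }
}
```

Calling `SafetensorsHeader.Read(File.OpenRead(path))` lists the tensors in a checkpoint without reading any weight data, which is a quick way to check that a file matches the architecture you expect.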
Coming soon
HartsyInference is moving fast, and the roadmap is broad. On deck:
| Area | Planned |
|---|---|
| Image | ControlNet, IP-Adapter, LCM/Turbo distillation across more architectures, regional prompting |
| Vision | Grounding DINO, YOLO-World, OWLv2, Florence-2, RT-DETR, depth & pose estimation, OCR, tracking |
| Video | HunyuanVideo, CogVideoX, longer-context temporal generation |
| 3D | Gaussian-splat output, texture synthesis, multi-view to mesh |
| World models | Broader action spaces, longer memory horizons, multiplayer state |
| Serving | OpenAI-compatible REST server and CLI, published as NuGet packages |
| Tooling | Wider quantized inference (MXFP4 / MXFP8 / NVFP4), model hot-swap, per-modality CLI subcommands |
Track progress and releases on the GitHub repo.
Design pillars
| Pillar | What it means |
|---|---|
| Pure C# | GPU access via PTX (CUDA Driver API) and SPIR-V (Vulkan), with no native shared inference libraries |
| Eager execution | Ops run immediately; no computation graph to compile |
| Zero-allocation hot paths | Tensor storage in NativeMemory.AlignedAlloc; weights memory-mapped; Span<T> throughout |
| Modular packages | Pull in only the modality and backend you need |
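The "weights memory-mapped" entry can be illustrated with the standard `System.IO.MemoryMappedFiles` API. This is a minimal sketch of the general technique, not the engine's code. `MappedWeights` and its method are illustrative names, and the sketch assumes float32 values stored in the machine's native byte order.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Illustrative only: reads float32 values through a read-only mapped view,
// so the operating system pages data in on demand instead of copying the
// whole file into a managed array.
public static class MappedWeights
{
    public static float SumFloat32(string path, long byteOffset, int count)
    {
        if (count <= 0)
        {
            return 0f;
        }

        using MemoryMappedFile file = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read);
        using MemoryMappedViewAccessor view = file.CreateViewAccessor(
            byteOffset, (long)count * sizeof(float), MemoryMappedFileAccess.Read);

        float sum = 0f;
        for (int i = 0; i < count; i++)
        {
            // Positions are relative to the start of the view.
            sum += view.ReadSingle((long)i * sizeof(float));
        }
        return sum;
    }
}
```

A real tensor loader would more likely hand out spans over the mapped region than read element by element; the loop here only keeps the example free of `unsafe` code.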
Packages
| Package | Description |
|---|---|
| HartsyInference | Meta-package that references everything below |
| HartsyInference.Core | Tensor types, IBackend, schedulers, pipeline base types |
| HartsyInference.ModelHandler | Safetensors/GGUF loaders, quant dequant, HuggingFace download, model registry |
| HartsyInference.Tokenizers | CLIP, T5, Whisper, and LLM-style tokenizers |
| HartsyInference.Cpu | CPU backend with AVX2 / AVX-512 / NEON SIMD kernels |
| HartsyInference.Cuda | CUDA backend with PTX kernels + cuBLAS |
| HartsyInference.Vulkan | Cross-vendor Vulkan backend (NVIDIA / AMD / Intel) via SPIR-V |
| HartsyInference.Diffusion | Image + music diffusion pipelines, VAEs, text encoders, LoRA |
| HartsyInference.Audio | Whisper/Moonshine STT, TTS, voice conversion, music |
| HartsyInference.Vision | CLIP/SigLIP embeddings, YOLO, SAM, face detection |
| HartsyInference.Video | LTX-Video, Wan, Lance, Kandinsky 5 video |
| HartsyInference.Interactive | Action-conditioned world models (Matrix-Game, Oasis) |
| HartsyInference.ThreeD | 3D asset generation: mesh/splat foundation (marching cubes, glTF/OBJ/PLY) + Hunyuan3D-2 image to mesh |
Requirements
- .NET 8 or .NET 10 SDK
CUDA backend (NVIDIA, fastest)
- CUDA 12.x runtime
- NVIDIA GPU, compute capability 8.0+ (RTX 30xx/40xx, A100, H100)
Vulkan backend (NVIDIA / AMD / Intel, cross-vendor)
- Vulkan 1.3+ runtime (ships with the GPU vendor driver)
- GPU with FP16 compute (shaderFloat16), most discrete GPUs from 2019+
CPU backend
- Any x86-64 (AVX2+) or ARM64 (NEON) machine, no GPU required
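To check whether a machine meets the CPU backend requirement before installing, the .NET hardware-intrinsics flags can be queried directly. This sketch uses only `System.Runtime.Intrinsics` from the base class library (`Avx512F` requires .NET 8 or later). `SimdProbe` is an illustrative name, and the check is not necessarily how HartsyInference.Cpu selects its kernels.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

// Illustrative probe: reports the widest SIMD instruction set that the
// .NET runtime exposes on the current machine.
public static class SimdProbe
{
    public static string Describe()
    {
        if (Avx512F.IsSupported) return "x86-64 AVX-512";
        if (Avx2.IsSupported) return "x86-64 AVX2";
        if (AdvSimd.Arm64.IsSupported) return "ARM64 NEON";
        return "no listed SIMD set (" + RuntimeInformation.ProcessArchitecture + ")";
    }
}
```

Running `Console.WriteLine(SimdProbe.Describe());` on the target machine shows which of the listed instruction sets is available; the last branch indicates hardware outside the stated requirements.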
Links
- Source & docs: https://github.com/kalebbroo/HartsyInference
- Issues: https://github.com/kalebbroo/HartsyInference/issues
- LLM companion: dotLLM
License
MIT © 2026 kalebbroo
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
- net10.0
  - HartsyInference.Core (>= 1.0.0-alpha.9)
  - HartsyInference.ModelHandler (>= 1.0.0-alpha.9)
  - HartsyInference.Tokenizers (>= 1.0.0-alpha.9)
- net8.0
  - HartsyInference.Core (>= 1.0.0-alpha.9)
  - HartsyInference.ModelHandler (>= 1.0.0-alpha.9)
  - HartsyInference.Tokenizers (>= 1.0.0-alpha.9)
NuGet packages (4)
Showing the top 4 NuGet packages that depend on HartsyInference.Diffusion:
| Package | Description |
|---|---|
| HartsyInference.Video | Video generation for HartsyInference: LTX-Video, Wan, temporal attention, and video VAE. (Phase 3, stub) |
| HartsyInference.Vision | Vision inference for HartsyInference: CLIP embeddings, YOLO detection (planned), SAM segmentation (planned), and face detection (planned). |
| HartsyInference | Meta-package for HartsyInference, a pure C#/.NET AI inference engine for non-LLM modalities (diffusion image generation, speech-to-text, text-to-speech, vision, video, interactive world models). Adds all backends (CPU, CUDA, Vulkan) and modality packages in one reference. |
| HartsyInference.ThreeD | 3D asset generation for HartsyInference: image/text → mesh and Gaussian-splat. Representation-agnostic foundation (marching cubes, glTF/OBJ/PLY export, 3D sampling) plus model pipelines (Hunyuan3D-2 image→mesh). |
GitHub repositories
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 1.0.0-alpha.11 | 0 | 6/19/2026 |
| 1.0.0-alpha.10 | 0 | 6/19/2026 |
| 1.0.0-alpha.9 | 28 | 6/18/2026 |
| 1.0.0-alpha.8 | 53 | 6/17/2026 |
| 1.0.0-alpha.3 | 66 | 6/14/2026 |