ElBruno.LocalLLMs
0.1.7
ElBruno.LocalLLMs
Run local LLMs in .NET through IChatClient — the same interface you'd use for Azure OpenAI, Ollama, or any other provider. Powered by ONNX Runtime GenAI.
Features
- 🔌 IChatClient implementation — seamless integration with Microsoft.Extensions.AI
- 📦 Automatic model download — models are fetched from HuggingFace on first use
- 🚀 Zero friction — works out of the box with sensible defaults (Phi-3.5 mini)
- 🖥️ Multi-hardware — CPU, CUDA, and DirectML execution providers
- 💉 DI-friendly — register with AddLocalLLMs() in ASP.NET Core
- 🔄 Streaming — token-by-token streaming via GetStreamingResponseAsync
- 📊 Multi-model — switch between Phi-3.5, Phi-4, Qwen2.5, Llama 3.2, and more
Installation
dotnet add package ElBruno.LocalLLMs
Quick Start
```csharp
using ElBruno.LocalLLMs;
using Microsoft.Extensions.AI;

// Create a local chat client (downloads Phi-3.5 mini on first run)
using var client = await LocalChatClient.CreateAsync();

var response = await client.GetResponseAsync([
    new(ChatRole.User, "What is the capital of France?")
]);

Console.WriteLine(response.Text);
```
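GetResponseAsync also accepts the standard ChatOptions type from Microsoft.Extensions.AI for generation settings. A minimal sketch, assuming the local runtime honors Temperature and MaxOutputTokens (this README does not confirm which settings are applied):

```csharp
using ElBruno.LocalLLMs;
using Microsoft.Extensions.AI;

using var client = await LocalChatClient.CreateAsync();

// ChatOptions is part of Microsoft.Extensions.AI; whether the local
// ONNX runtime honors each setting is an assumption here.
var response = await client.GetResponseAsync(
    [new(ChatRole.User, "Summarize the plot of Hamlet in one sentence.")],
    new ChatOptions
    {
        Temperature = 0.2f,    // lower = more deterministic output
        MaxOutputTokens = 128  // cap the length of the reply
    });

Console.WriteLine(response.Text);
```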
Streaming
```csharp
using ElBruno.LocalLLMs;
using Microsoft.Extensions.AI;

using var client = await LocalChatClient.CreateAsync(new LocalLLMsOptions
{
    Model = KnownModels.Phi35MiniInstruct
});

await foreach (var update in client.GetStreamingResponseAsync([
    new(ChatRole.System, "You are a helpful assistant."),
    new(ChatRole.User, "Explain quantum computing in simple terms.")
]))
{
    Console.Write(update.Text);
}
```
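When the complete reply is also needed after streaming (for logging or caching, say), the updates can be accumulated as they arrive. A sketch using only the streaming API above plus a standard StringBuilder:

```csharp
using System.Text;
using ElBruno.LocalLLMs;
using Microsoft.Extensions.AI;

using var client = await LocalChatClient.CreateAsync();

// Print tokens as they arrive while collecting the full reply.
var sb = new StringBuilder();
await foreach (var update in client.GetStreamingResponseAsync([
    new(ChatRole.User, "Name three prime numbers.")
]))
{
    Console.Write(update.Text);
    sb.Append(update.Text);
}

string fullReply = sb.ToString();
```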
Dependency Injection
```csharp
builder.Services.AddLocalLLMs(options =>
{
    options.Model = KnownModels.Phi35MiniInstruct;
    options.ExecutionProvider = ExecutionProvider.DirectML;
});

// Inject IChatClient anywhere
public class MyService(IChatClient chatClient) { ... }
```
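A consuming service only depends on IChatClient, so it stays provider-agnostic. A sketch of one such service — the class name, endpoint, and prompt are illustrative, not part of the library; only the IChatClient shape comes from Microsoft.Extensions.AI:

```csharp
using Microsoft.Extensions.AI;

// Hypothetical consumer of the IChatClient registered by AddLocalLLMs().
// Swapping the registration to Azure OpenAI or Ollama would require no
// changes here, since only the abstraction is referenced.
public class SummaryService(IChatClient chatClient)
{
    public async Task<string> SummarizeAsync(string text)
    {
        var response = await chatClient.GetResponseAsync(
        [
            new(ChatRole.System, "Summarize the user's text in two sentences."),
            new(ChatRole.User, text)
        ]);
        return response.Text;
    }
}
```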
Supported Models
| Tier | Model | Parameters | ONNX | ID |
|---|---|---|---|---|
| ⚪ Tiny | TinyLlama-1.1B-Chat | 1.1B | ✅ Native | tinyllama-1.1b-chat |
| ⚪ Tiny | SmolLM2-1.7B-Instruct | 1.7B | ✅ Native | smollm2-1.7b-instruct |
| ⚪ Tiny | Qwen2.5-0.5B-Instruct | 0.5B | ✅ Native | qwen2.5-0.5b-instruct |
| ⚪ Tiny | Qwen2.5-1.5B-Instruct | 1.5B | ✅ Native | qwen2.5-1.5b-instruct |
| ⚪ Tiny | Gemma-2B-IT | 2B | ✅ Native | gemma-2b-it |
| ⚪ Tiny | StableLM-2-1.6B-Chat | 1.6B | 🔄 Convert | stablelm-2-1.6b-chat |
| 🟢 Small | Phi-3.5 mini instruct | 3.8B | ✅ Native | phi-3.5-mini-instruct |
| 🟢 Small | Qwen2.5-3B-Instruct | 3B | ✅ Native | qwen2.5-3b-instruct |
| 🟢 Small | Llama-3.2-3B-Instruct | 3B | ✅ Native | llama-3.2-3b-instruct |
| 🟢 Small | Gemma-2-2B-IT | 2B | ✅ Native | gemma-2-2b-it |
| 🟡 Medium | Qwen2.5-7B-Instruct | 7B | ✅ Native | qwen2.5-7b-instruct |
| 🟡 Medium | Llama-3.1-8B-Instruct | 8B | ✅ Native | llama-3.1-8b-instruct |
| 🟡 Medium | Mistral-7B-Instruct-v0.3 | 7B | ✅ Native | mistral-7b-instruct-v0.3 |
| 🟡 Medium | Gemma-2-9B-IT | 9B | ✅ Native | gemma-2-9b-it |
| 🟡 Medium | Phi-4 | 14B | ✅ Native | phi-4 |
| 🟡 Medium | DeepSeek-R1-Distill-Qwen-14B | 14B | ✅ Native | deepseek-r1-distill-qwen-14b |
| 🟡 Medium | Mistral-Small-24B-Instruct | 24B | ✅ Native | mistral-small-24b-instruct |
| 🔴 Large | Qwen2.5-14B-Instruct | 14B | ✅ Native | qwen2.5-14b-instruct |
| 🔴 Large | Qwen2.5-32B-Instruct | 32B | ✅ Native | qwen2.5-32b-instruct |
| 🔴 Large | Llama-3.3-70B-Instruct | 70B | 🔄 Convert | llama-3.3-70b-instruct |
| 🔴 Large | Mixtral-8x7B-Instruct-v0.1 | 8x7B | 🔄 Convert | mixtral-8x7b-instruct-v0.1 |
| 🔴 Large | DeepSeek-R1-Distill-Llama-70B | 70B | 🔄 Convert | deepseek-r1-distill-llama-70b |
| 🔴 Large | Command-R (35B) | 35B | 🔄 Convert | command-r-35b |
See the Supported Models Guide for detailed model cards, performance benchmarks, and selection guidance.
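Model selection happens through LocalLLMsOptions. A sketch of picking a tier to match the hardware — KnownModels.Phi35MiniInstruct and ExecutionProvider.DirectML appear earlier in this README, but the CPU provider value is assumed from the "CPU (default)" note and may be named differently in the actual API:

```csharp
using ElBruno.LocalLLMs;
using Microsoft.Extensions.AI;

// 🟢 Small-tier Phi-3.5 mini is the documented default and a reasonable
// CPU choice; larger tiers from the table trade speed for quality.
var options = new LocalLLMsOptions
{
    Model = KnownModels.Phi35MiniInstruct,
    ExecutionProvider = ExecutionProvider.CPU // or CUDA / DirectML on a GPU
};

using var client = await LocalChatClient.CreateAsync(options);
```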
Samples
| Sample | Description |
|---|---|
| HelloChat | Minimal console chat |
| StreamingChat | Token-by-token streaming |
| MultiModelChat | Switch models at runtime |
| DependencyInjection | ASP.NET Core DI registration |
Requirements
- .NET 8.0 or .NET 10.0
- CPU (default), NVIDIA GPU (CUDA), or Windows GPU (DirectML)
- ~2-8 GB disk space per model (depending on size and quantization)
Documentation
- Getting Started — installation, first steps, configuration
- Supported Models — full model reference with tiers, specs, decision tree
- Architecture — design decisions and internal structure
- Samples Guide — walkthrough of each sample application
- Benchmarks — how to run and interpret performance benchmarks
- ONNX Conversion — converting HuggingFace models to ONNX format
- Publishing — NuGet package publishing with OIDC
- Contributing — how to contribute
- Changelog — version history
License
MIT © Bruno Capuano
| Product | Compatible and additional computed target framework versions |
|---|---|
| .NET | net8.0 and net10.0 are compatible. net9.0 and the platform-specific targets (android, browser, ios, maccatalyst, macos, tvos, windows) for net8.0, net9.0, and net10.0 were computed. |
Dependencies

net10.0
- ElBruno.HuggingFace.Downloader (>= 0.6.0)
- Microsoft.Extensions.AI.Abstractions (>= 10.4.0)
- Microsoft.Extensions.DependencyInjection.Abstractions (>= 10.0.5)
- Microsoft.ML.OnnxRuntimeGenAI (>= 0.8.3)

net8.0
- ElBruno.HuggingFace.Downloader (>= 0.6.0)
- Microsoft.Extensions.AI.Abstractions (>= 10.4.0)
- Microsoft.Extensions.DependencyInjection.Abstractions (>= 10.0.5)
- Microsoft.ML.OnnxRuntimeGenAI (>= 0.8.3)
NuGet packages (2)
Showing the top 2 NuGet packages that depend on ElBruno.LocalLLMs:
| Package | Description |
|---|---|
| ElBruno.ModelContextProtocol.MCPToolRouter | Semantic routing for Model Context Protocol (MCP) tool definitions using local embeddings. Indexes MCP tools and returns the most relevant tools for a given prompt via vector search. |
| ElBruno.LocalLLMs.Rag | RAG (Retrieval-Augmented Generation) pipeline for ElBruno.LocalLLMs. Provides document chunking, embedding storage, and semantic search. |
GitHub repositories
This package is not used by any popular GitHub repositories.