LocalAI.Embedder 0.5.0

There is a newer version of this package available.
See the version list below for details.
.NET CLI
dotnet add package LocalAI.Embedder --version 0.5.0

Package Manager
NuGet\Install-Package LocalAI.Embedder -Version 0.5.0
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

PackageReference
<PackageReference Include="LocalAI.Embedder" Version="0.5.0" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.

Central Package Management (CPM)
For projects that support Central Package Management, copy the PackageVersion node into the solution Directory.Packages.props file to version the package, then reference it from the project file without a version:

Directory.Packages.props
<PackageVersion Include="LocalAI.Embedder" Version="0.5.0" />

Project file
<PackageReference Include="LocalAI.Embedder" />

Paket CLI
paket add LocalAI.Embedder --version 0.5.0

Script & Interactive
#r "nuget: LocalAI.Embedder, 0.5.0"
The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

File-based apps
#:package LocalAI.Embedder@0.5.0
The #:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.

Cake
#addin nuget:?package=LocalAI.Embedder&version=0.5.0
Install as a Cake Addin.

#tool nuget:?package=LocalAI.Embedder&version=0.5.0
Install as a Cake Tool.

LocalAI


Philosophy

Start small. Download what you need. Run locally.

// This is all you need. No setup. No configuration. No API keys.
await using var model = await LocalEmbedder.LoadAsync("default");
float[] embedding = await model.EmbedAsync("Hello, world!");

LocalAI is designed around three core principles:

🪶 Minimal Footprint

Your application ships with zero bundled models. The base package is tiny. Models, tokenizers, and runtime components are downloaded only when first requested and cached for reuse.

⚡ Lazy Everything

First run:  LoadAsync("default") → Downloads model → Caches → Runs inference
Next runs:  LoadAsync("default") → Uses cached model → Runs inference instantly

No pre-download scripts. No model management. Just use it.

🎯 Zero Boilerplate

Traditional approach:

// ❌ Without LocalAI: 50+ lines of setup
var tokenizer = LoadTokenizer(modelPath);
var session = new InferenceSession(modelPath, sessionOptions);
var inputIds = tokenizer.Encode(text);
var attentionMask = CreateAttentionMask(inputIds);
var inputs = new List<NamedOnnxValue> { ... };
var outputs = session.Run(inputs);
var embeddings = PostProcess(outputs);
// ... error handling, pooling, normalization, cleanup ...

// ✅ With LocalAI: 2 lines
await using var model = await LocalEmbedder.LoadAsync("default");
float[] embedding = await model.EmbedAsync("Hello, world!");

Packages

Package              Description                     Status
LocalAI.Embedder     Text → Vector embeddings        ✅ Available
LocalAI.Reranker     Semantic reranking for search   ✅ Available
LocalAI.Generator    Text generation & chat          ✅ Available
LocalAI.Transcriber  Speech → Text (Whisper)         📋 Planned
LocalAI.Synthesizer  Text → Speech                   📋 Planned
LocalAI.Translator   Neural machine translation      📋 Planned
LocalAI.Detector     Object detection                📋 Planned
LocalAI.Segmenter    Image segmentation              📋 Planned
LocalAI.Ocr          Document OCR                    📋 Planned
LocalAI.Captioner    Image → Text                    📋 Planned

Quick Start

Text Embeddings

using LocalAI.Embedder;

await using var model = await LocalEmbedder.LoadAsync("default");

// Single text
float[] embedding = await model.EmbedAsync("Hello, world!");

// Batch processing
float[][] embeddings = await model.EmbedBatchAsync(new[]
{
    "First document",
    "Second document",
    "Third document"
});

// Similarity
float similarity = model.CosineSimilarity(embeddings[0], embeddings[1]);
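
These calls compose into a small semantic search. Below is a minimal sketch built only from the EmbedAsync, EmbedBatchAsync, and CosineSimilarity calls shown above; the corpus and query are illustrative.

using System;
using System.Linq;
using LocalAI.Embedder;

await using var model = await LocalEmbedder.LoadAsync("default");

var documents = new[]
{
    "Machine learning is a subset of artificial intelligence.",
    "The weather today is sunny and warm.",
    "Deep learning uses neural networks."
};

// Embed the query once and all documents in a single batched call
float[] query = await model.EmbedAsync("What is machine learning?");
float[][] vectors = await model.EmbedBatchAsync(documents);

// Rank documents by cosine similarity to the query (higher = more similar)
var ranked = documents
    .Select((text, i) => (Text: text, Score: model.CosineSimilarity(query, vectors[i])))
    .OrderByDescending(r => r.Score);

foreach (var (text, score) in ranked)
    Console.WriteLine($"[{score:F4}] {text}");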

Semantic Reranking

using LocalAI.Reranker;

await using var reranker = await LocalReranker.LoadAsync("default");

var results = await reranker.RerankAsync(
    query: "What is machine learning?",
    documents: new[]
    {
        "Machine learning is a subset of artificial intelligence...",
        "The weather today is sunny and warm...",
        "Deep learning uses neural networks..."
    },
    topK: 2
);

foreach (var result in results)
{
    Console.WriteLine($"[{result.Score:F4}] {result.Document}");
}
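
Embedding search and reranking combine naturally into two-stage retrieval: a cheap embedding pass recalls candidates, then the reranker reorders the shortlist. The sketch below uses only the APIs shown above; the corpus and the Take/topK cutoffs are illustrative.

using System;
using System.Linq;
using LocalAI.Embedder;
using LocalAI.Reranker;

await using var embedder = await LocalEmbedder.LoadAsync("default");
await using var reranker = await LocalReranker.LoadAsync("default");

string query = "What is machine learning?";
var corpus = new[]
{
    "Machine learning is a subset of artificial intelligence...",
    "The weather today is sunny and warm...",
    "Deep learning uses neural networks...",
    "Gradient descent minimizes a loss function..."
};

// Stage 1: fast recall - rank the whole corpus by embedding similarity
float[] q = await embedder.EmbedAsync(query);
float[][] vectors = await embedder.EmbedBatchAsync(corpus);
string[] candidates = corpus
    .Select((text, i) => (text, score: embedder.CosineSimilarity(q, vectors[i])))
    .OrderByDescending(c => c.score)
    .Take(3)  // shortlist for the more expensive reranker
    .Select(c => c.text)
    .ToArray();

// Stage 2: precise rerank of the shortlist
var results = await reranker.RerankAsync(query: query, documents: candidates, topK: 2);
foreach (var result in results)
    Console.WriteLine($"[{result.Score:F4}] {result.Document}");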

Text Generation

using LocalAI.Generator;

// Simple generation
var generator = await TextGeneratorBuilder.Create()
    .WithDefaultModel()  // Phi-3.5 Mini
    .BuildAsync();

string response = await generator.GenerateCompleteAsync("What is machine learning?");
Console.WriteLine(response);

// Chat format
var messages = new[]
{
    new ChatMessage(ChatRole.System, "You are a helpful assistant."),
    new ChatMessage(ChatRole.User, "Explain quantum computing simply.")
};

string chatResponse = await generator.GenerateChatCompleteAsync(messages);

// Streaming
await foreach (var token in generator.GenerateAsync("Write a story:"))
{
    Console.Write(token);
}
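
The chat API also composes into a simple console loop by accumulating the message history. A minimal sketch, assuming ChatRole defines an Assistant member (the example above shows only System and User):

using System;
using System.Collections.Generic;
using System.Linq;
using LocalAI.Generator;

var generator = await TextGeneratorBuilder.Create()
    .WithDefaultModel()
    .BuildAsync();

var history = new List<ChatMessage>
{
    new ChatMessage(ChatRole.System, "You are a helpful assistant.")
};

while (true)
{
    Console.Write("> ");
    string? input = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(input)) break;

    history.Add(new ChatMessage(ChatRole.User, input));

    // Send the full history so the model keeps conversational context
    string reply = await generator.GenerateChatCompleteAsync(history.ToArray());
    Console.WriteLine(reply);

    // Assumption: ChatRole.Assistant exists alongside System and User
    history.Add(new ChatMessage(ChatRole.Assistant, reply));
}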

Available Models

Embedder

Alias Model Dimensions Size
default all-MiniLM-L6-v2 384 ~90MB
large all-mpnet-base-v2 768 ~420MB
multilingual paraphrase-multilingual-MiniLM-L12-v2 384 ~470MB
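
The alias is the string passed to LoadAsync, so switching models is a one-word change. For example, to get 768-dimensional vectors from all-mpnet-base-v2:

// "large" resolves to all-mpnet-base-v2 (~420MB, downloaded and cached on first use)
await using var large = await LocalEmbedder.LoadAsync("large");
float[] embedding = await large.EmbedAsync("Hello, world!");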

Reranker

Alias Model Max Tokens Size
default ms-marco-MiniLM-L-6-v2 512 ~90MB
quality ms-marco-MiniLM-L-12-v2 512 ~134MB
fast ms-marco-TinyBERT-L-2-v2 512 ~18MB
multilingual bge-reranker-v2-m3 8192 ~1.1GB

Generator

Alias Model Parameters License
default Phi-3.5-mini-instruct 3.8B MIT
fast Llama-3.2-1B-Instruct 1B Llama 3.2
quality phi-4 14B MIT
small Llama-3.2-1B-Instruct 1B Llama 3.2

GPU Acceleration

GPU acceleration is automatic when detected:

// Auto-detect (default) - uses GPU if available, falls back to CPU
var options = new EmbedderOptions { Provider = ExecutionProvider.Auto };

// Or force a specific provider:
options = new EmbedderOptions { Provider = ExecutionProvider.Cuda };     // NVIDIA CUDA
options = new EmbedderOptions { Provider = ExecutionProvider.DirectML }; // Windows GPU (AMD, Intel, NVIDIA)
options = new EmbedderOptions { Provider = ExecutionProvider.CoreML };   // macOS

Install the appropriate ONNX Runtime package for GPU support:

dotnet add package Microsoft.ML.OnnxRuntime.Gpu       # NVIDIA CUDA
dotnet add package Microsoft.ML.OnnxRuntime.DirectML  # Windows (AMD, Intel, NVIDIA)
dotnet add package Microsoft.ML.OnnxRuntime.CoreML    # macOS
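
Neither snippet above shows how the options object reaches the model. The sketch below assumes a LoadAsync overload that accepts EmbedderOptions, which is not shown in this README; verify against the API docs:

// Assumption: an options-taking LoadAsync overload exists
var options = new EmbedderOptions { Provider = ExecutionProvider.Cuda };
await using var model = await LocalEmbedder.LoadAsync("default", options);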

Model Caching

Models are cached following HuggingFace Hub conventions:

  • Default: ~/.cache/huggingface/hub
  • Environment variables: HF_HUB_CACHE, HF_HOME, or XDG_CACHE_HOME
  • Manual override: new EmbedderOptions { CacheDirectory = "/path/to/cache" }
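
For example, to pin everything to a project-local folder (the second form relies on the same assumed options-taking LoadAsync overload as the GPU section above):

// Process-wide: redirect the HuggingFace cache before any model loads
Environment.SetEnvironmentVariable("HF_HUB_CACHE", "/data/models");

// Per instance, via the manual override (assumed LoadAsync overload)
var options = new EmbedderOptions { CacheDirectory = "/data/models" };
await using var model = await LocalEmbedder.LoadAsync("default", options);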

Requirements

  • .NET 10.0+
  • Windows, Linux, or macOS

Documentation


License

MIT License - see LICENSE for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Release Process

Releases are automated via GitHub Actions when Directory.Build.props is updated:

  1. Update the <Version> in Directory.Build.props
  2. Commit and push to main
  3. CI automatically publishes all packages to NuGet and creates a GitHub release

Requires NUGET_API_KEY secret configured in GitHub repository settings.
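
The version bump in step 1 is a single MSBuild property. A minimal Directory.Build.props for illustration (the version number here is an example):

<Project>
  <PropertyGroup>
    <!-- Bumping this value and pushing to main triggers the automated release -->
    <Version>0.5.1</Version>
  </PropertyGroup>
</Project>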

Compatible and additional computed target framework versions
.NET: net10.0 is compatible. net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, and net10.0-windows were computed.
Learn more about Target Frameworks and .NET Standard.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last Updated
0.7.2 121 12/15/2025
0.7.1 86 12/15/2025
0.7.0 143 12/14/2025
0.6.0 102 12/13/2025
0.5.0 96 12/13/2025
0.4.0 104 12/13/2025