Retrievo.AzureOpenAI 0.3.0-preview.2

This is a prerelease version of Retrievo.AzureOpenAI.
There is a newer prerelease version of this package available.
See the version list below for details.
.NET CLI
dotnet add package Retrievo.AzureOpenAI --version 0.3.0-preview.2

Package Manager
NuGet\Install-Package Retrievo.AzureOpenAI -Version 0.3.0-preview.2
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

PackageReference
<PackageReference Include="Retrievo.AzureOpenAI" Version="0.3.0-preview.2" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.

Central Package Management
For projects that support Central Package Management (CPM), copy the first XML node into the solution Directory.Packages.props file to version the package, and the second into the project file.
Directory.Packages.props:
<PackageVersion Include="Retrievo.AzureOpenAI" Version="0.3.0-preview.2" />
Project file:
<PackageReference Include="Retrievo.AzureOpenAI" />

Paket CLI
paket add Retrievo.AzureOpenAI --version 0.3.0-preview.2

Script & Interactive
#r "nuget: Retrievo.AzureOpenAI, 0.3.0-preview.2"
The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

File-based Apps
#:package Retrievo.AzureOpenAI@0.3.0-preview.2
The #:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.

Cake
Install as a Cake Addin:
#addin nuget:?package=Retrievo.AzureOpenAI&version=0.3.0-preview.2&prerelease
Install as a Cake Tool:
#tool nuget:?package=Retrievo.AzureOpenAI&version=0.3.0-preview.2&prerelease

Retrievo

Hybrid search for .NET — BM25 + vectors + RRF fusion, zero infrastructure


Retrievo is an open-source, in-process, in-memory search library for .NET that combines BM25 lexical matching with vector similarity search. Results are merged via Reciprocal Rank Fusion (RRF) into a single ranked list — no external servers, no databases, no infrastructure. Designed for corpora up to ~10k documents: local agent memory, small RAG pipelines, developer tools, and offline/edge scenarios.


Packages

| Package | Description |
| --- | --- |
| Retrievo | Core library: BM25 lexical search, brute-force vector search, RRF fusion, builder, mutable index. Zero external service dependencies. |
| Retrievo.AzureOpenAI | Azure OpenAI embedding provider. Install this if you want automatic document/query embedding via Azure OpenAI. Adds a dependency on Azure.AI.OpenAI. |

dotnet add package Retrievo --prerelease
dotnet add package Retrievo.AzureOpenAI --prerelease  # optional, for Azure OpenAI embeddings

Key Features

  • Hybrid Retrieval: Combine BM25 and cosine similarity using RRF fusion.
  • Standalone Modes: Use lexical-only or vector-only search when needed.
  • Explain Mode: Detailed score breakdown for every search result.
  • Fielded Search: Title and body fields with independent boost weights.
  • Metadata Filters: Exact-match, range, and contains filtering post-fusion.
  • Field Definitions: Declare field types (String, StringArray) at index time for automatic filter semantics.
  • Finite Vector Validation: Rejects NaN/Infinity embeddings and query vectors with clear exceptions.
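
As an illustration of the finite vector validation listed above, here is a minimal sketch of such a check. The helper name EnsureFinite, the exception type, and the message are assumptions made for this example; they are not taken from Retrievo's source, which may validate differently.

```csharp
using System;

// Hypothetical helper (not Retrievo's API): rejects NaN and +/-Infinity components.
static void EnsureFinite(ReadOnlySpan<float> vector, string paramName)
{
    for (int i = 0; i < vector.Length; i++)
    {
        if (!float.IsFinite(vector[i]))
            throw new ArgumentException(
                $"Component {i} is {vector[i]}; embeddings must contain only finite values.",
                paramName);
    }
}

float[] good = { 0.12f, -0.5f, 0.9f };
float[] bad = { 0.12f, float.NaN, 0.9f };

EnsureFinite(good, nameof(good)); // passes silently

try
{
    EnsureFinite(bad, nameof(bad));
}
catch (ArgumentException ex)
{
    Console.WriteLine(ex.Message); // reports the offending component index
}
```

Checking at the boundary matters because a single NaN component propagates through every dot product and makes all similarity comparisons against that vector meaningless.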

Index Management

  • Fluent Builder: Clean API for batch construction and folder ingestion.
  • Mutable Index: Incremental upserts and deletes with thread-safe commits.
  • Zero Infrastructure: Runs entirely in-process with no external service dependencies.
  • Auto-Embedding: Transparently embed documents at index time.

Developer Experience

  • SIMD Accelerated: Hardware intrinsics for fast brute-force vector math.
  • Query Diagnostics: Detailed timing breakdown for every pipeline stage.
  • Pluggable Providers: Easy integration with any embedding model or API.
  • CLI Tool: Powerful terminal interface for indexing and querying.
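
To make the "SIMD accelerated brute-force vector math" item concrete, the sketch below computes cosine similarity with System.Numerics.Vector<float> and ranks a tiny corpus by scanning every embedding. It is an illustrative approximation of the technique, not Retrievo's implementation; the function names and sample data are made up for this example.

```csharp
using System;
using System.Linq;
using System.Numerics;

// Dot product that processes Vector<float>.Count lanes per iteration,
// then finishes any leftover elements with a scalar loop.
static float Dot(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same dimension.");

    int width = Vector<float>.Count;
    int i = 0;
    float sum = 0f;

    for (; i <= a.Length - width; i += width)
        sum += Vector.Dot(new Vector<float>(a, i), new Vector<float>(b, i));

    for (; i < a.Length; i++)
        sum += a[i] * b[i];

    return sum;
}

// Cosine similarity; returns 0 when either vector has zero length.
static float CosineSimilarity(float[] a, float[] b)
{
    float denom = MathF.Sqrt(Dot(a, a)) * MathF.Sqrt(Dot(b, b));
    return denom == 0f ? 0f : Dot(a, b) / denom;
}

// Brute-force search: score every document, then sort (O(n) similarity computations per query).
var corpus = new (string Id, float[] Embedding)[]
{
    ("doc-1", new[] { 0.9f, 0.1f, 0.0f }),
    ("doc-2", new[] { 0.0f, 1.0f, 0.2f }),
};
float[] queryVector = { 1.0f, 0.0f, 0.0f };

var ranked = corpus
    .Select(d => (d.Id, Score: CosineSimilarity(queryVector, d.Embedding)))
    .OrderByDescending(r => r.Score)
    .ToList();

foreach (var hit in ranked)
    Console.WriteLine($"{hit.Id}: {hit.Score:F4}"); // doc-1 is listed first
```

A linear scan like this stays simple and exact, which is a reasonable trade-off at the corpus sizes the library targets (up to roughly 10k documents).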

Quick Start

Build an index and search in a few lines:

using Retrievo;
using Retrievo.Models;

// The index is disposable; "using var" releases it when it goes out of scope.
using var index = new HybridSearchIndexBuilder()
    .AddDocument(new Document { Id = "1", Body = "Neural networks learn complex patterns." })
    .AddDocument(new Document { Id = "2", Body = "Kubernetes orchestrates container deployments." })
    .Build();

var response = index.Search(new HybridQuery { Text = "neural network training", TopK = 5 });

foreach (var r in response.Results)
    Console.WriteLine($"  {r.Id}: {r.Score:F4}");

Field Definitions

Declare field types at index time so filters automatically use the right matching strategy:

using var index = new HybridSearchIndexBuilder()
    .DefineField("tags", FieldType.StringArray)         // pipe-delimited by default
    .DefineField("categories", FieldType.StringArray, delimiter: ',')
    .AddDocument(new Document
    {
        Id = "1",
        Body = "Deep learning fundamentals",
        Metadata = new Dictionary<string, string>
        {
            ["tags"] = "ml|deep-learning|neural-nets",
            ["categories"] = "ai,education"
        }
    })
    .Build();

// StringArray fields auto-split and do contains-match; undeclared fields use exact-match
var response = index.Search(new HybridQuery
{
    Text = "deep learning",
    MetadataFilters = new Dictionary<string, string> { ["tags"] = "ml" }
});
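
The matching rule described in the comment above (declared StringArray fields are split and matched by containment, undeclared fields by exact match) can be sketched independently of the library. Everything in this block is hypothetical: MatchesFilter, the delimiter map, and the "author" key are illustrative names, and the ordinal, case-sensitive comparison is an assumption rather than documented Retrievo behavior.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical filter check: array fields split on their delimiter and test membership;
// every other field must equal the expected value exactly.
static bool MatchesFilter(
    IReadOnlyDictionary<string, string> metadata,
    IReadOnlyDictionary<string, char> arrayFieldDelimiters,
    string field,
    string expected)
{
    if (!metadata.TryGetValue(field, out var raw))
        return false; // a document without the field never matches

    if (arrayFieldDelimiters.TryGetValue(field, out char delimiter))
        return Array.IndexOf(raw.Split(delimiter), expected) >= 0;

    return string.Equals(raw, expected, StringComparison.Ordinal);
}

var delimiters = new Dictionary<string, char> { ["tags"] = '|', ["categories"] = ',' };
var docMetadata = new Dictionary<string, string>
{
    ["tags"] = "ml|deep-learning|neural-nets",
    ["categories"] = "ai,education",
    ["author"] = "jane" // undeclared field, so exact-match applies
};

Console.WriteLine(MatchesFilter(docMetadata, delimiters, "tags", "ml"));                 // True
Console.WriteLine(MatchesFilter(docMetadata, delimiters, "categories", "ai,education")); // False
Console.WriteLine(MatchesFilter(docMetadata, delimiters, "author", "jane"));             // True
```

The second call returns False because the filter value is compared against individual elements ("ai", "education"), not against the raw delimited string.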

Azure OpenAI Embeddings

Plug in an embedding provider and Retrievo handles the rest — documents are embedded at build time, queries at search time.

using Retrievo;
using Retrievo.Models;
using Retrievo.AzureOpenAI;

var provider = new AzureOpenAIEmbeddingProvider(
    new Uri("https://your-resource.openai.azure.com/"),
    "your-api-key",
    "text-embedding-3-small");

// Documents are auto-embedded during build
using var index = await new HybridSearchIndexBuilder()
    .AddFolder("./docs")  // loads *.md and *.txt recursively
    .WithEmbeddingProvider(provider)
    .BuildAsync();

// Query text is automatically converted to a vector
var response = await index.SearchAsync(new HybridQuery { Text = "how to deploy", TopK = 5 });

Architecture

HybridQuery
    |
    v
+-------------------+
|  HybridSearchIndex |  (orchestrator)
+-------------------+
    |           |
    v           v
+--------+  +----------+
| Lucene |  | Brute-   |
| BM25   |  | Force    |
| Search |  | Cosine   |
+--------+  +----------+
    |           |
    v           v
  ranked      ranked
  list        list
    \         /
     v       v
  +----------+
  | RRF      |
  | Fusion   |
  +----------+
       |
       v
  SearchResponse

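Reciprocal Rank Fusion merges multiple ranked lists without score normalization: score(doc) = Σ weight_i / (k + rank_i(doc)), summed over every list i in which the document appears, where weight_i is that list's weight (lexical or vector) and k is the RRF constant (RrfK). Documents that rank high on both the lexical and vector lists get the biggest boost, which surfaces results that are semantically relevant and also contain the right keywords.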
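
A small worked example may help here. The sketch below applies the weighted RRF formula to two hand-written rankings, using the default weights and RrfK quoted in the Benchmarks section (0.5, 1.0, 20). It assumes 1-based ranks and that a document missing from a list contributes nothing for that list; FuseWithRrf is an illustrative function, not the library's API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Weighted RRF: each list adds weight / (k + rank) to the documents it contains.
// Assumption: rank is 1-based (the top hit has rank 1).
static Dictionary<string, double> FuseWithRrf(
    IEnumerable<(IReadOnlyList<string> Ranking, double Weight)> lists, int k)
{
    var scores = new Dictionary<string, double>();
    foreach (var (ranking, weight) in lists)
    {
        for (int i = 0; i < ranking.Count; i++)
        {
            int rank = i + 1;
            scores.TryGetValue(ranking[i], out double current); // 0 if not seen yet
            scores[ranking[i]] = current + weight / (k + rank);
        }
    }
    return scores;
}

var lexical = new[] { "a", "b", "c" }; // BM25 order, best first
var vector = new[] { "b", "d", "a" };  // cosine order, best first

var fused = FuseWithRrf(
    new (IReadOnlyList<string>, double)[] { (lexical, 0.5), (vector, 1.0) },
    k: 20);

// Fused order is b, a, d, c:
//   b = 0.5/22 + 1.0/21 ≈ 0.0703    a = 0.5/21 + 1.0/23 ≈ 0.0673
//   d = 1.0/22 ≈ 0.0455             c = 0.5/23 ≈ 0.0217
foreach (var (id, score) in fused.OrderByDescending(p => p.Value))
    Console.WriteLine($"{id}: {score:F4}");
```

Documents "a" and "b" appear in both lists and end up ahead of "d", even though "d" is second in the more heavily weighted vector list.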


Benchmarks

Retrieval Quality (NDCG@10)

Validated against BEIR with 245-configuration parameter sweeps per dataset:

| Dataset | BM25 | Vector-only | Hybrid (default) | Hybrid (tuned) | Anserini BM25 |
| --- | --- | --- | --- | --- | --- |
| NFCorpus | 0.325 | 0.384 | 0.392 | 0.392 | 0.325 |
| SciFact | 0.665 | 0.731 | 0.756 | 0.757 | 0.679 |

The default parameters (LexicalWeight=0.5, VectorWeight=1.0, RrfK=20, TitleBoost=0.5) were tuned via cross-dataset harmonic mean optimization.
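
For readers unfamiliar with the metric in the table above, the following sketch shows how NDCG@k can be computed for a single query. It assumes the linear-gain form of DCG (relevance divided by log2 of position + 1); the benchmark harness behind the reported numbers is not shown in this document and may differ in details. The relevance grades are invented sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// DCG@k with linear gain: sum of rel / log2(position + 1), positions starting at 1.
static double DcgAtK(IEnumerable<int> relevances, int k) =>
    relevances.Take(k)
              .Select((rel, index) => rel / Math.Log2(index + 2))
              .Sum();

// NDCG@k: DCG of the returned ranking divided by the DCG of the ideal ordering
// of all judged documents for the query.
static double NdcgAtK(
    IReadOnlyList<int> rankedRelevances, IReadOnlyList<int> judgedRelevances, int k)
{
    double ideal = DcgAtK(judgedRelevances.OrderByDescending(r => r), k);
    return ideal == 0 ? 0 : DcgAtK(rankedRelevances, k) / ideal;
}

var ranked = new[] { 3, 2, 0, 1 }; // relevance grades in the order the system returned them
var judged = new[] { 3, 2, 1, 0 }; // all judged grades for this query

// About 0.985: the only mistake is placing the grade-1 document below an irrelevant one.
Console.WriteLine($"NDCG@10 = {NdcgAtK(ranked, judged, 10):F3}");
```

A dataset-level score such as those in the table is typically the mean of this per-query value over all test queries.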

Query Latency

3,000 documents × 768-dimensional embeddings (text-embedding-3-small):

| Operation | Latency |
| --- | --- |
| Vector-only query | < 5 ms |
| Lexical-only query | < 5 ms |
| Hybrid query (BM25 + vector + RRF) | < 10 ms |
| Index build (3k docs) | < 2 s |

Roadmap

| Phase | Status | Description |
| --- | --- | --- |
| Phase 1 | Done | MVP hybrid retrieval, CLI, Azure OpenAI provider |
| Phase 2 | Done | Mutable index, fielded search, filters (exact, range, contains), field definitions, diagnostics |
| Phase 3 | Planned | Snapshot export and import |
| Phase 4 | Planned | ANN support for larger corpora |

Build & Test

Requires .NET 8 SDK or later.

dotnet build
dotnet test

238 tests covering retrieval, vector math, fusion, mutable index, filters, field definitions, cancellation, and CLI integration — 0 warnings.

Known Limitations

  • Lexical (BM25) search is English-only: The lexical retrieval pipeline uses EnglishStemAnalyzer (StandardTokenizer → EnglishPossessiveFilter → LowerCaseFilter → English StopWords → PorterStemmer). Non-English text will not be properly tokenized or stemmed for BM25 matching.
  • Vector search is language-agnostic: Semantic search works with any language supported by your embedding model (e.g., multilingual embeddings). Hybrid search inherits the English-only limitation for its lexical component.
  • Brute-force vector search is O(n) per query: Designed for corpora up to ~10k documents. For larger corpora, consider ANN-based solutions (planned for Phase 4).
  • In-memory only: No persistence or crash recovery. The index must be rebuilt from source documents on each application start.
  • No concurrent writers on HybridSearchIndex: The immutable index is built once via the builder. Use MutableHybridSearchIndex for incremental upserts and deletes.
  • Single-process: No distributed or shared index support. The index lives in a single process's memory.
  • Workaround for non-English corpora: Use vector-only search by omitting lexical configuration, or configure a custom analyzer for your language in a fork.

License

MIT

Product Compatible and additional computed target framework versions.
.NET net8.0 is compatible.  net8.0-android was computed.  net8.0-browser was computed.  net8.0-ios was computed.  net8.0-maccatalyst was computed.  net8.0-macos was computed.  net8.0-tvos was computed.  net8.0-windows was computed.  net9.0 was computed.  net9.0-android was computed.  net9.0-browser was computed.  net9.0-ios was computed.  net9.0-maccatalyst was computed.  net9.0-macos was computed.  net9.0-tvos was computed.  net9.0-windows was computed.  net10.0 was computed.  net10.0-android was computed.  net10.0-browser was computed.  net10.0-ios was computed.  net10.0-maccatalyst was computed.  net10.0-macos was computed.  net10.0-tvos was computed.  net10.0-windows was computed. 

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

| Version | Downloads | Last Updated |
| --- | --- | --- |
| 0.3.0-preview.4 | 31 | 3/6/2026 |
| 0.3.0-preview.3 | 34 | 3/5/2026 |
| 0.3.0-preview.2 | 27 | 3/4/2026 |
| 0.3.0-preview.1 | 31 | 3/4/2026 |