Retrievo.AzureOpenAI
0.3.0-preview.1
See the version list below for details.
Retrievo
Hybrid search for .NET — BM25 + vectors + RRF fusion, zero infrastructure
Retrievo is an open-source, in-process, in-memory search library for .NET that combines BM25 lexical matching with vector similarity search. Results are merged via Reciprocal Rank Fusion (RRF) into a single ranked list — no external servers, no databases, no infrastructure. Designed for corpora up to ~10k documents: local agent memory, small RAG pipelines, developer tools, and offline/edge scenarios.
Quick Install
dotnet add package Retrievo --prerelease
For Azure OpenAI embeddings:
dotnet add package Retrievo.AzureOpenAI --prerelease
Key Features
Core Search
- Hybrid Retrieval: Combine BM25 and cosine similarity using RRF fusion.
- Standalone Modes: Use lexical-only or vector-only search when needed.
- Explain Mode: Detailed score breakdown for every search result.
- Fielded Search: Title and body fields with independent boost weights.
- Metadata Filters: Exact-match, range, and contains filtering post-fusion.
- Field Definitions: Declare field types (String, StringArray) at index time for automatic filter semantics.
- Finite Vector Validation: Rejects NaN/Infinity embeddings and query vectors with clear exceptions.
Index Management
- Fluent Builder: Clean API for batch construction and folder ingestion.
- Mutable Index: Incremental upserts and deletes with thread-safe commits.
- Zero Infrastructure: Runs entirely in-process with no external dependencies.
- Auto-Embedding: Transparently embed documents at index time.
Developer Experience
- SIMD Accelerated: Hardware intrinsics for fast brute-force vector math.
- Query Diagnostics: Detailed timing breakdown for every pipeline stage.
- Pluggable Providers: Easy integration with any embedding model or API.
- CLI Tool: Powerful terminal interface for indexing and querying.
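As a rough illustration of the SIMD-accelerated vector math mentioned above, the sketch below computes a dot product with `System.Numerics.Vector<float>`, which uses whatever vector width the hardware offers. The `SimdSketch` class is hypothetical and is not Retrievo's implementation; the source does not say which intrinsics API the library uses.

```csharp
using System;
using System.Numerics;

float[] a = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
float[] b = { 2, 2, 2, 2, 2, 2, 2, 2, 2, 2 };
Console.WriteLine(SimdSketch.Dot(a, b)); // 110

// Hypothetical helper for illustration only; not part of Retrievo.
public static class SimdSketch
{
    public static float Dot(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        int width = Vector<float>.Count;   // lanes per hardware vector
        var acc = Vector<float>.Zero;
        int i = 0;

        // Multiply and accumulate one full hardware vector at a time.
        for (; i <= a.Length - width; i += width)
            acc += new Vector<float>(a.Slice(i, width)) * new Vector<float>(b.Slice(i, width));

        float sum = Vector.Sum(acc);       // horizontal add of all lanes

        // Scalar tail for the elements that did not fill a whole vector.
        for (; i < a.Length; i++)
            sum += a[i] * b[i];

        return sum;
    }
}
```

If the stored vectors are normalized to unit length when they are added, cosine similarity reduces to this dot product, so the per-document cost at query time is a single pass over the vector.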
Quick Start
Build an index and search in a few lines:
using Retrievo;
using Retrievo.Models;
var index = new HybridSearchIndexBuilder()
    .AddDocument(new Document { Id = "1", Body = "Neural networks learn complex patterns." })
    .AddDocument(new Document { Id = "2", Body = "Kubernetes orchestrates container deployments." })
    .Build();
using var _ = index;
var response = index.Search(new HybridQuery { Text = "neural network training", TopK = 5 });
foreach (var r in response.Results)
    Console.WriteLine($" {r.Id}: {r.Score:F4}");
Field Definitions
Declare field types at index time so filters automatically use the right matching strategy:
using var index = new HybridSearchIndexBuilder()
    .DefineField("tags", FieldType.StringArray) // pipe-delimited by default
    .DefineField("categories", FieldType.StringArray, delimiter: ',')
    .AddDocument(new Document
    {
        Id = "1",
        Body = "Deep learning fundamentals",
        Metadata = new Dictionary<string, string>
        {
            ["tags"] = "ml|deep-learning|neural-nets",
            ["categories"] = "ai,education"
        }
    })
    .Build();
// StringArray fields auto-split and do contains-match; undeclared fields use exact-match
var response = index.Search(new HybridQuery
{
    Text = "deep learning",
    MetadataFilters = new Dictionary<string, string> { ["tags"] = "ml" }
});
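To make the filter semantics in the comment above concrete, the sketch below splits a delimited metadata value and checks membership for declared array fields, and falls back to whole-value comparison for undeclared fields. `FilterSketch`, `Matches`, the trimming of elements, and the case-sensitive ordinal comparison are all assumptions made for illustration; they are not Retrievo's API or its confirmed behavior.

```csharp
using System;
using System.Collections.Generic;

// "tags" is declared as a pipe-delimited array; "author" is left undeclared.
var delimiters = new Dictionary<string, char> { ["tags"] = '|' };
var metadata = new Dictionary<string, string>
{
    ["tags"] = "ml|deep-learning|neural-nets",
    ["author"] = "Ada Lovelace"
};

Console.WriteLine(FilterSketch.Matches(metadata, delimiters, "tags", "ml"));    // True
Console.WriteLine(FilterSketch.Matches(metadata, delimiters, "author", "Ada")); // False

// Hypothetical helper for illustration only; not part of Retrievo.
public static class FilterSketch
{
    public static bool Matches(
        IReadOnlyDictionary<string, string> metadata,
        IReadOnlyDictionary<string, char> arrayDelimiters,
        string field,
        string wanted)
    {
        if (!metadata.TryGetValue(field, out var raw))
            return false; // the document has no such field

        if (arrayDelimiters.TryGetValue(field, out var delimiter))
        {
            // Declared array field: split on the delimiter and match any element.
            foreach (var item in raw.Split(delimiter))
                if (string.Equals(item.Trim(), wanted, StringComparison.Ordinal))
                    return true;
            return false;
        }

        // Undeclared field: the whole value must match exactly.
        return string.Equals(raw, wanted, StringComparison.Ordinal);
    }
}
```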
Azure OpenAI Embeddings
Plug in an embedding provider and Retrievo handles the rest — documents are embedded at build time, queries at search time.
using Retrievo.AzureOpenAI;
var provider = new AzureOpenAIEmbeddingProvider(
    new Uri("https://your-resource.openai.azure.com/"),
    "your-api-key",
    "text-embedding-3-small");
// Documents are auto-embedded during build
using var index = await new HybridSearchIndexBuilder()
    .AddFolder("./docs") // loads *.md and *.txt recursively
    .WithEmbeddingProvider(provider)
    .BuildAsync();
// Query text is automatically converted to a vector
var response = await index.SearchAsync(new HybridQuery { Text = "how to deploy", TopK = 5 });
Architecture
             HybridQuery
                  |
                  v
        +-------------------+
        | HybridSearchIndex |  (orchestrator)
        +-------------------+
            |           |
            v           v
       +--------+  +----------+
       | Lucene |  | Brute-   |
       | BM25   |  | Force    |
       | Search |  | Cosine   |
       +--------+  +----------+
            |           |
            v           v
         ranked      ranked
          list        list
             \        /
              v      v
            +----------+
            |   RRF    |
            |  Fusion  |
            +----------+
                 |
                 v
          SearchResponse
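Reciprocal Rank Fusion merges multiple ranked lists without score normalization: score(doc) = Σ weight / (k + rank), summed over every list in which the document appears, where rank is the document's position in that list, weight is that list's weight, and k is the RRF constant (RrfK). Documents that rank high on both the lexical and vector lists get the biggest boost, which surfaces results that are semantically relevant and also contain the right keywords.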
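The formula above can be sketched in a few lines of plain C#. This is a minimal illustration, not Retrievo's code: `RrfSketch` and `Fuse` are hypothetical names, ranks are assumed to be 1-based, and ties are broken by document ID purely to keep the output deterministic. The weights and k are the documented defaults (LexicalWeight=0.5, VectorWeight=1.0, RrfK=20).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var lexical = new[] { "a", "b", "c" }; // BM25 order, best first
var vector  = new[] { "b", "d", "a" }; // cosine order, best first

var fused = RrfSketch.Fuse(
    new (IReadOnlyList<string>, double)[] { (lexical, 0.5), (vector, 1.0) },
    k: 20);

foreach (var (id, score) in fused)
    Console.WriteLine($"{id}: {score:F4}");

// Hypothetical helper for illustration only; not part of Retrievo.
public static class RrfSketch
{
    public static List<(string Id, double Score)> Fuse(
        IEnumerable<(IReadOnlyList<string> Ranking, double Weight)> lists, int k)
    {
        var scores = new Dictionary<string, double>();

        foreach (var (ranking, weight) in lists)
        {
            for (int i = 0; i < ranking.Count; i++)
            {
                int rank = i + 1; // assumption: ranks start at 1
                scores.TryGetValue(ranking[i], out var current);
                scores[ranking[i]] = current + weight / (k + rank);
            }
        }

        // Highest fused score first; ties broken by ID for determinism.
        return scores
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)
            .Select(kv => (Id: kv.Key, Score: kv.Value))
            .ToList();
    }
}
```

With these inputs the fused order is b, a, d, c. Documents "a" and "b" appear in both lists and outrank "d", even though "d" sits second in the more heavily weighted vector list.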
Benchmarks
Retrieval Quality (NDCG@10)
Validated against BEIR with 245-configuration parameter sweeps per dataset:
| Dataset | BM25 | Vector-only | Hybrid (default) | Hybrid (tuned) | Anserini BM25 |
|---|---|---|---|---|---|
| NFCorpus | 0.325 | 0.384 | 0.392 | 0.392 | 0.325 |
| SciFact | 0.665 | 0.731 | 0.756 | 0.757 | 0.679 |
The default parameters (LexicalWeight=0.5, VectorWeight=1.0, RrfK=20, TitleBoost=0.5) were tuned via cross-dataset harmonic mean optimization.
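For readers unfamiliar with the metric in the table, here is a minimal sketch of NDCG@k using the linear-gain form, DCG = Σ rel_i / log2(i + 1). Which gain variant the benchmark harness uses is not stated in the source, so treat the formula choice, the `NdcgSketch` class, and its parameter names as assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Relevance grades of the returned documents, best-ranked first,
// and the grades of every judged document for the same query.
var returned = new[] { 0, 1, 0 };
var judged   = new[] { 1, 0, 0 };

Console.WriteLine(NdcgSketch.NdcgAtK(returned, judged, k: 10).ToString("F3"));

// Hypothetical helper for illustration only; not part of Retrievo.
public static class NdcgSketch
{
    public static double NdcgAtK(
        IReadOnlyList<int> rankedRelevance,
        IEnumerable<int> judgedRelevance,
        int k = 10)
    {
        double dcg = Dcg(rankedRelevance.Take(k));
        // The ideal ranking lists the judged documents by descending relevance.
        double ideal = Dcg(judgedRelevance.OrderByDescending(r => r).Take(k));
        return ideal == 0 ? 0 : dcg / ideal;
    }

    // Position i is 0-based here, so the discount is log2(i + 2).
    private static double Dcg(IEnumerable<int> gains) =>
        gains.Select((gain, i) => gain / Math.Log2(i + 2)).Sum();
}
```

A ranking that places all relevant documents first scores 1; pushing a relevant document further down lowers the score through the logarithmic discount.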
Query Latency
3,000 documents × 768-dimensional embeddings (text-embedding-3-small):
| Operation | Latency |
|---|---|
| Vector-only query | < 5 ms |
| Lexical-only query | < 5 ms |
| Hybrid query (BM25 + vector + RRF) | < 10 ms |
| Index build (3k docs) | < 2 s |
Roadmap
| Phase | Status | Description |
|---|---|---|
| Phase 1 | Done | MVP hybrid retrieval, CLI, Azure OpenAI provider |
| Phase 2 | Done | Mutable index, fielded search, filters (exact, range, contains), field definitions, diagnostics |
| Phase 3 | Planned | Snapshot export and import |
| Phase 4 | Planned | ANN support for larger corpora |
Build & Test
Requires .NET 8 SDK or later.
dotnet build
dotnet test
238 tests covering retrieval, vector math, fusion, mutable index, filters, field definitions, cancellation, and CLI integration — 0 warnings.
Known Limitations
- Lexical (BM25) search is English-only: The lexical retrieval pipeline uses EnglishStemAnalyzer (StandardTokenizer → EnglishPossessiveFilter → LowerCaseFilter → English StopWords → PorterStemmer). Non-English text will not be properly tokenized or stemmed for BM25 matching.
- Vector search is language-agnostic: Semantic search works with any language supported by your embedding model (e.g., multilingual embeddings). Hybrid search inherits the English-only limitation for its lexical component.
- Brute-force vector search is O(n) per query: Designed for corpora up to ~10k documents. For larger corpora, consider ANN-based solutions (planned for Phase 4).
- In-memory only: No persistence or crash recovery. The index must be rebuilt from source documents on each application start.
- No concurrent writers on HybridSearchIndex: The immutable index is built once via the builder. Use MutableHybridSearchIndex for incremental upserts and deletes.
- Single-process: No distributed or shared index support. The index lives in a single process's memory.
- Workaround for non-English corpora: Use vector-only search by omitting lexical configuration, or configure a custom analyzer for your language in a fork.
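The O(n) cost noted in the limitations above comes from scoring every stored vector against each query. The sketch below shows that pattern in plain C#: compute cosine similarity for every document, then keep the top results. `BruteForceSketch` is a hypothetical name, and the finite-value check only mirrors the validation described in the feature list; none of this is Retrievo's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var vectors = new Dictionary<string, float[]>
{
    ["x"] = new float[] { 1, 0 },
    ["y"] = new float[] { 0, 1 },
    ["z"] = new float[] { 1, 1 }
};

foreach (var (id, score) in BruteForceSketch.TopK(vectors, new float[] { 1, 0 }, topK: 2))
    Console.WriteLine($"{id}: {score:F4}");

// Hypothetical helper for illustration only; not part of Retrievo.
public static class BruteForceSketch
{
    // Scores every stored vector, so work grows linearly with corpus size.
    public static List<(string Id, double Score)> TopK(
        IReadOnlyDictionary<string, float[]> vectors, float[] query, int topK)
    {
        return vectors
            .Select(kv => (Id: kv.Key, Score: Cosine(kv.Value, query)))
            .OrderByDescending(x => x.Score)
            .Take(topK)
            .ToList();
    }

    public static double Cosine(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Dimension mismatch.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            if (!float.IsFinite(a[i]) || !float.IsFinite(b[i]))
                throw new ArgumentException("Vectors must not contain NaN or Infinity.");

            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        // A zero vector has no direction; treat its similarity as 0.
        return normA == 0 || normB == 0 ? 0 : dot / Math.Sqrt(normA * normB);
    }
}
```

For this query the two results are "x" followed by "z". At the ~10k-document scale the library targets, a full scan like this is cheap, and ANN indexes (Phase 4 on the roadmap) only become necessary for much larger corpora.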
License
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
- net8.0
  - Azure.AI.OpenAI (>= 2.1.0)
  - Retrievo (>= 0.3.0-preview.1)
| Version | Downloads | Last Updated |
|---|---|---|
| 0.3.0-preview.4 | 31 | 3/6/2026 |
| 0.3.0-preview.3 | 34 | 3/5/2026 |
| 0.3.0-preview.2 | 27 | 3/4/2026 |
| 0.3.0-preview.1 | 31 | 3/4/2026 |