Vorcyc.Quiver 1.0.0


Vorcyc Quiver 1.0 Technical Documentation

Vorcyc Quiver 1.0

Product Positioning: A pure .NET embedded vector database — zero native dependencies, runs in-process, no standalone database server deployment required
Framework Version: .NET 10
Namespace: Vorcyc.Quiver
Design Philosophy: Similar to EF Core's DbContext pattern, achieving automatic discovery, index construction, and persistence of the vector database through declarative attribute annotations
Core Features: Code-First declarative entity definition · Multiple ANN indexes (Flat / HNSW / IVF / KDTree) · Multiple persistence formats (JSON / XML / Binary) · WAL incremental persistence · Reader-writer lock concurrency safety · SIMD-accelerated similarity computation
Keywords: Embedded Vector Database · Pure .NET · ANN · Approximate Nearest Neighbor Search · Similarity Retrieval · HNSW · IVF · KDTree · Code-First · EF Core Style · Embedding · Semantic Search · Face Recognition · Image-to-Image Search · RAG · SIMD · WAL · Write-Ahead Log · Incremental Persistence · Crash Recovery
Name Origin: Quiver — a container for arrows (Arrow), and the mathematical essence of a vector is an arrow

Creation Overview

The inspiration for creating Quiver can be traced back to my development of the Vorcyc.AwesomeAI.Ash class, which provided simple vector storage and retrieval functionality to meet some lightweight semantic search needs. Although Ash pursued minimalism and ease of use in its design, as application scenarios evolved, its design bottlenecks became increasingly apparent:

  • Non-customizable table structure: Ash's storage architecture was fixed internally by the framework. Users could only access data according to a preset field layout and could not define entity properties and structures to match their business requirements. This limitation was particularly prominent when designing differentiated data models for different scenarios (such as face recognition, document retrieval, multimodal search).
  • Only brute-force search supported: Ash's retrieval method was brute-force search, traversing each record and computing similarity one by one, with time complexity O(n*d). While acceptable for small data volumes, search latency increased dramatically when the vector count grew to tens or even hundreds of thousands. The lack of Approximate Nearest Neighbor (ANN) index support made it unsuitable for production scenarios requiring fast response times.
  • No concurrent operations supported: Ash's internal data structures had no thread synchronization protection. Performing read and write operations simultaneously in a multi-threaded environment would cause data races and unpredictable exceptions. For server-side scenarios requiring concurrent queries (such as an ASP.NET Web API handling multiple search requests simultaneously), users had to add their own external locks, which increased usage complexity and could easily lead to performance bottlenecks or deadlocks when the lock granularity was chosen poorly.

While reflecting on these pain points, EF Core's design philosophy provided key inspiration — especially its "Code-First" concept: developers simply annotate entity class properties with attributes, and the framework automatically completes model discovery, relationship mapping, and data persistence, all in a declarative and non-intrusive manner. Meanwhile, the Python library Annoy (Approximate Nearest Neighbors Oh Yeah) also provided inspiration, but its .NET wrapper HNSWSharp did not support a structured database-like design and only offered a single HNSW index type, lacking flexibility and diversity.

Therefore, I decided to design a brand-new vector database framework that would maintain EF Core-style ease of use and declarative modeling, support multiple ANN index algorithms to accommodate scenarios with different scales and performance requirements, and also include built-in concurrency safety mechanisms and efficient persistence solutions.


Product Introduction

Quiver is a pure .NET embedded vector database with zero native dependencies, running as an in-process library without requiring standalone database server deployment. It draws on EF Core's DbContext design pattern, allowing developers to define entities and indexing strategies through declarative attributes such as [QuiverKey], [QuiverVector], and [QuiverIndex], with the framework automatically completing model discovery, index construction, and persistence management at runtime.

Core Capabilities at a Glance:

  • Code-First Declarative Modeling — Like EF Core, annotate entity classes with attributes, and the framework automatically discovers and registers QuiverSet<T> collections via reflection — zero configuration required.
  • Multiple ANN Index Algorithms — Built-in Flat (brute-force search), HNSW (Hierarchical Navigable Small World graph), IVF (Inverted File Index), and KDTree indexes, covering the full range from small-scale exact search to million-scale approximate search.
  • Flexible Persistence Options — Supports JSON (human-readable for debugging), XML (compatibility), and Binary (high-performance production) storage formats, plus a WAL (Write-Ahead Log) incremental persistence mechanism that reduces persistence complexity from O(N) to O(delta) in high-frequency write scenarios.
  • Out-of-the-box Concurrency Safety: QuiverSet<T> internally implements reader-writer separation locks via ReaderWriterLockSlim, making concurrent multi-threaded searching and writing safe without external locking.
  • SIMD Hardware Acceleration — Leverages TensorPrimitives-based SIMD instructions to accelerate vector similarity computation and L2 normalization, fully utilizing modern CPU vectorization capabilities.

Typical Use Cases: Semantic search, RAG (Retrieval-Augmented Generation), face recognition, image-to-image search, recommendation systems, multimodal retrieval, etc.


Table of Contents

  1. Architecture Overview
  2. Quick Start
  3. Core Concepts
  4. Distance Metrics
  5. Index Types
  6. CRUD Operations
  7. Vector Search
  8. Persistent Storage
  9. Multi-Vector Field Support
  10. Thread Safety and Concurrency
  11. Lifecycle Management
  12. Configuration Options
  13. Internal Implementation Details
  14. Complete Examples
  15. API Reference Cheat Sheet

1. Architecture Overview

1.1 Layered Architecture

graph TB
    subgraph User Code Layer
        Entity["Entity<br/>[QuiverKey] [QuiverVector] [QuiverIndex]"]
        Context["MyDbContext : QuiverDbContext<br/>QuiverSet&lt;Entity&gt; Entities"]
    end

    subgraph QuiverDb Framework Layer
        VDC["QuiverDbContext<br/>* Auto-discover QuiverSet<br/>* SaveAsync / LoadAsync persistence<br/>* SaveChangesAsync WAL incremental<br/>* IDisposable / IAsyncDisposable"]
        VS["QuiverSet&lt;TEntity&gt;<br/>* CRUD (Add / Upsert / Remove)<br/>* Search (Top-K / Threshold / Filtered)<br/>* ReaderWriterLockSlim concurrency<br/>* Change tracking (_changeLog)"]
    end

    subgraph Indexing Layer
        IVI["IVectorIndex Interface"]
        Flat["FlatIndex<br/>Brute-force O(n*d)<br/>100% exact"]
        HNSW["HnswIndex<br/>Layered graph O(log n)<br/>Approximate search preferred"]
        IVF["IvfIndex<br/>K-Means clustering O(n/k)<br/>Large datasets"]
        KDT["KDTreeIndex<br/>Spatial bisection O(log n)<br/>Low-dimensional exact"]
    end

    subgraph Storage Layer
        ISP["IStorageProvider Interface"]
        JSON["JsonStorageProvider<br/>System.Text.Json<br/>Readable, for debugging"]
        XML["XmlStorageProvider<br/>XDocument + Base64<br/>Compatibility"]
        BIN["BinaryStorageProvider<br/>Custom protocol + MemoryMarshal<br/>Smallest size, zero-copy"]
    end

    subgraph WAL Layer Write-Ahead Log
        WAL["WriteAheadLog<br/>Custom binary format<br/>CRC32 checksum<br/>Crash recovery safe"]
        WE["WalEntry<br/>Operation record<br/>Add / Remove / Clear"]
    end

    Entity --> Context
    Context --> VDC
    VDC --> VS
    VS --> IVI
    IVI --> Flat
    IVI --> HNSW
    IVI --> IVF
    IVI --> KDT
    VDC --> ISP
    ISP --> JSON
    ISP --> XML
    ISP --> BIN
    VDC --> WAL
    WAL --> WE

1.2 Core Components Overview

| Component | Type | Responsibility |
| --- | --- | --- |
| QuiverDbContext | abstract class | Database context base class; manages reflection-based discovery of QuiverSet collections, persistence read/write, and lifecycle |
| QuiverSet<TEntity> | class | Vector collection; provides full CRUD plus multiple search modes, guarded internally by a ReaderWriterLockSlim reader-writer lock |
| IVectorIndex | internal interface | Unified vector index contract; defines Add / Remove / Clear / Search / SearchByThreshold |
| IStorageProvider | internal interface | Unified persistence contract; supports SaveAsync / LoadAsync |
| StorageProviderFactory | internal static class | Factory method; creates the IStorageProvider instance that corresponds to the StorageFormat enum value |
| QuiverVectorAttribute | Attribute | Marks a vector field; specifies dimensions and distance metric |
| QuiverKeyAttribute | Attribute | Marks the entity primary key (exactly one per entity) |
| QuiverIndexAttribute | Attribute | Configures index type and tuning parameters (optional, defaults to Flat) |
| QuiverDbOptions | class | Global configuration: storage path, default metric, format, JSON options, WAL configuration |
| QuiverSearchResult<T> | record | Search result DTO; contains Entity and Similarity |
| WriteAheadLog | internal sealed class | WAL file read/write engine; custom binary format with CRC32 checksum, crash recovery safe |
| WalEntry | internal sealed record | WAL change record; contains operation type, target type name, JSON payload |
| WalOperation | internal enum | WAL operation types: Add / Remove / Clear |

1.3 Class Relationship Diagram

classDiagram
    class QuiverDbContext {
        <<abstract>>
        -Dictionary~Type, object~ _sets
        -Dictionary~string, Type~ _typeMap
        -IStorageProvider _storageProvider
        -QuiverDbOptions _options
        -WriteAheadLog? _wal
        +Set~TEntity~() QuiverSet~TEntity~
        +SaveAsync(path?) Task
        +SaveChangesAsync() Task
        +CompactAsync() Task
        +LoadAsync(path?) Task
        +Dispose()
        +DisposeAsync() ValueTask
        -InitializeSets()
        -ReplayWal(walFilePath)
    }

    class QuiverSet~TEntity~ {
        -Dictionary~int, TEntity~ _entities
        -Dictionary~object, int~ _keyToId
        -FrozenDictionary~string, QuiverFieldInfo~ _vectorFields
        -FrozenDictionary~string, Func~ _vectorGetters
        -FrozenDictionary~string, IVectorIndex~ _indices
        -List _changeLog
        -ReaderWriterLockSlim _lock
        -int _nextId
        +int Count
        +IReadOnlyDictionary VectorFields
        +Add(entity)
        +AddRange(entities)
        +Upsert(entity)
        +Remove(entity) bool
        +RemoveByKey(key) bool
        +Find(key) TEntity?
        +Clear()
        +Search(...) List~QuiverSearchResult~
        +SearchTop1(...) QuiverSearchResult?
        +SearchByThreshold(...) List~QuiverSearchResult~
    }

    class IVectorIndex {
        <<interface>>
        +int Count
        +Add(id, vector)
        +Remove(id)
        +Clear()
        +Search(query, topK) List
        +SearchByThreshold(query, threshold) List
    }

    class FlatIndex {
        -Dictionary~int, float[]~ _vectors
        -SimilarityFunc similarityFunc
        -SequentialSearchCore()
        -ParallelSearchCore()
    }

    class HnswIndex {
        -Dictionary~int, HnswNode~ _nodes
        -int _entryPointId
        -int _maxLevel
        -int _m, _mMax0
        -int _efConstruction, _efSearch
        +int EfSearch
    }

    class IvfIndex {
        -Dictionary~int, float[]~ _vectors
        -float[][] _centroids
        -List~int~[] _invertedLists
        -int _numClusters, _numProbes
        -Build()
        -KMeansPlusPlusInit()
    }

    class KDTreeIndex {
        -Dictionary~int, float[]~ _vectors
        -KDNode? _root
        -BuildTree()
        -SearchNode()
    }

    class IStorageProvider {
        <<interface>>
        +SaveAsync(filePath, sets) Task
        +LoadAsync(filePath, typeMap) Task
    }

    class QuiverDbOptions {
        +string? DatabasePath
        +DistanceMetric DefaultMetric
        +StorageFormat StorageFormat
        +JsonSerializerOptions JsonOptions
        +bool EnableWal
        +int WalCompactionThreshold
        +bool WalFlushToDisk
    }

    class WriteAheadLog {
        -FileStream _stream
        -BinaryWriter _writer
        -Lock _writeLock
        -long _nextSeqNo
        -long _recordCount
        +string FilePath
        +long RecordCount
        +Append(entries, flushToDisk)
        +ReadAll(filePath)$ List~WalEntry~
        +Truncate()
        +Dispose()
    }

    class WalEntry {
        <<record>>
        +WalOperation Operation
        +string TypeName
        +string PayloadJson
    }

    class QuiverSearchResult~TEntity~ {
        <<record>>
        +TEntity Entity
        +float Similarity
    }

    QuiverDbContext o-- QuiverSet~TEntity~ : contains N
    QuiverDbContext --> IStorageProvider : uses
    QuiverDbContext --> QuiverDbOptions : configured by
    QuiverDbContext --> WriteAheadLog : WAL incremental persistence
    WriteAheadLog --> WalEntry : records
    QuiverSet~TEntity~ --> IVectorIndex : one per vector field
    IVectorIndex <|.. FlatIndex
    IVectorIndex <|.. HnswIndex
    IVectorIndex <|.. IvfIndex
    IVectorIndex <|.. KDTreeIndex
    IStorageProvider <|.. JsonStorageProvider
    IStorageProvider <|.. XmlStorageProvider
    IStorageProvider <|.. BinaryStorageProvider
    QuiverSet~TEntity~ ..> QuiverSearchResult~TEntity~ : returns

2. Quick Start

2.1 Define Entity Class

using Vorcyc.Quiver;

public class Document
{
    [QuiverKey]
    public string Id { get; set; } = string.Empty;

    public string Title { get; set; } = string.Empty;

    public string Category { get; set; } = string.Empty;

    [QuiverVector(384, DistanceMetric.Cosine)]
    public float[] Embedding { get; set; } = [];
}

2.2 Define Database Context

public class MyDocumentDb : QuiverDbContext
{
    public QuiverSet<Document> Documents { get; set; } = null!;

    public MyDocumentDb() : base(new QuiverDbOptions
    {
        DatabasePath = "documents.json",
        StorageFormat = StorageFormat.Json,
        DefaultMetric = DistanceMetric.Cosine
    })
    { }
}

2.3 Basic Usage

// Create database, use await using to ensure automatic saving
await using var db = new MyDocumentDb();
await db.LoadAsync(); // Load existing data (silently returns if file doesn't exist)

// Add entity
db.Documents.Add(new Document
{
    Id = "doc-001",
    Title = "Introduction to Vector Databases",
    Category = "Tutorial",
    Embedding = new float[384] // Should be embedding vector output by a model
});

// Search Top-5 most similar documents
float[] queryVector = new float[384]; // Query vector
var results = db.Documents.Search(
    e => e.Embedding,
    queryVector,
    topK: 5
);

foreach (var result in results)
{
    Console.WriteLine($"Document: {result.Entity.Title}, Similarity: {result.Similarity:F4}");
}

// DisposeAsync automatically saves data to disk when scope ends

2.4 WAL Incremental Mode Quick Start

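With WAL enabled, routine writes only append the changes to the WAL file, so the cost of a save is O(delta) in the number of changes rather than O(N) in the size of the dataset. For a small change set against a large collection this is far cheaper than writing a full snapshot: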

// WAL mode database context
public class MyWalDb : QuiverDbContext
{
    public QuiverSet<Document> Documents { get; set; } = null!;

    public MyWalDb() : base(new QuiverDbOptions
    {
        DatabasePath = "documents.vdb",
        StorageFormat = StorageFormat.Binary,
        EnableWal = true,              // Enable WAL
        WalCompactionThreshold = 10_000, // Auto-compact when WAL exceeds 10K records
        WalFlushToDisk = true            // fsync to guarantee durability
    })
    { }
}

// Usage
await using var db = new MyWalDb();
await db.LoadAsync(); // Load snapshot + replay WAL incremental changes

db.Documents.Add(new Document
{
    Id = "doc-001",
    Title = "Introduction to Vector Databases",
    Category = "Tutorial",
    Embedding = new float[384]
});

// Only append changes to WAL, O(delta) complexity
await db.SaveChangesAsync();

// Manually compact when needed: create full snapshot + clear WAL
await db.CompactAsync();

// DisposeAsync automatically calls SaveChangesAsync when scope ends
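
To make the O(delta) idea more concrete, the following is a minimal sketch of an append-only log whose records carry a CRC32 checksum, so that a record torn by a crash can be detected and skipped on replay. It is not Quiver's WAL: the `MiniWal` class, its method names, and the record layout (length, payload, checksum) are assumptions made for illustration, and the CRC32 is implemented by hand so the example needs no extra packages.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Demo: append two records, then simulate a crash that tears the tail of the log.
var log = new MemoryStream();
MiniWal.Append(log, "Add:doc-001");
MiniWal.Append(log, "Remove:doc-001");
log.SetLength(log.Length - 2); // cut the last record short, as a crash mid-write might

foreach (var payload in MiniWal.ReadAll(log))
    Console.WriteLine(payload); // only intact records are replayed

// Hypothetical record layout: [int32 payload length][payload bytes][uint32 CRC32 of payload]
static class MiniWal
{
    private static readonly uint[] Table = BuildTable();

    private static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint c = i;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            table[i] = c;
        }
        return table;
    }

    // Standard CRC-32 (reflected polynomial 0xEDB88320).
    public static uint Crc32(ReadOnlySpan<byte> data)
    {
        uint crc = 0xFFFFFFFFu;
        foreach (byte b in data)
            crc = Table[(crc ^ b) & 0xFF] ^ (crc >> 8);
        return crc ^ 0xFFFFFFFFu;
    }

    // Appending costs O(size of this change), independent of how much data already exists.
    public static void Append(Stream stream, string payload)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(payload);
        stream.Seek(0, SeekOrigin.End);
        using var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true);
        writer.Write(bytes.Length);
        writer.Write(bytes);
        writer.Write(Crc32(bytes));
    }

    // Replays records in order and stops at the first truncated or corrupted one.
    public static List<string> ReadAll(Stream stream)
    {
        var result = new List<string>();
        stream.Seek(0, SeekOrigin.Begin);
        using var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true);
        while (stream.Length - stream.Position >= sizeof(int))
        {
            int length = reader.ReadInt32();
            if (length < 0 || stream.Length - stream.Position < (long)length + sizeof(uint))
                break; // torn record
            byte[] bytes = reader.ReadBytes(length);
            if (reader.ReadUInt32() != Crc32(bytes))
                break; // checksum mismatch
            result.Add(Encoding.UTF8.GetString(bytes));
        }
        return result;
    }
}
```

In this sketch the second record is incomplete, so replay yields only the first one. A compaction step would then correspond to writing a full snapshot and truncating the log.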

2.5 End-to-End Flow

sequenceDiagram
    participant User as User Code
    participant Ctx as QuiverDbContext
    participant Set as QuiverSet
    participant Idx as IVectorIndex
    participant SP as IStorageProvider
    participant WAL as WriteAheadLog

    User->>Ctx: new MyDb(options)
    activate Ctx
    Ctx->>Ctx: InitializeSets() discover QuiverSet properties via reflection
    Ctx->>Set: Activator.CreateInstance() create QuiverSet
    Set->>Set: Scan entity attributes via reflection [QuiverKey] [QuiverVector] [QuiverIndex]
    Set->>Set: Compile expression tree accessors
    Set->>Idx: CreateIndex() create index for each vector field
    opt EnableWal = true
        Ctx->>WAL: new WriteAheadLog(path + ".wal")
    end
    deactivate Ctx

    User->>Ctx: LoadAsync()
    Ctx->>SP: LoadAsync(filePath, typeMap)
    SP-->>Ctx: Dictionary<typeName, List<object>>
    Ctx->>Set: LoadEntities(entities) invoke via reflection
    Set->>Idx: Add(id, vector) rebuild index one by one
    opt WAL enabled
        Ctx->>WAL: ReadAll(walFilePath)
        WAL-->>Ctx: List<WalEntry>
        Ctx->>Set: ReplayAdd/ReplayRemove/ReplayClear
    end

    User->>Set: Add(entity)
    Set->>Set: Acquire write lock -> AddCore()
    Set->>Set: PrepareVectors() validate dimensions + normalize/copy
    Set->>Idx: Add(id, vector)
    Set->>Set: _changeLog.Add() record change
    Set->>Set: Release write lock

    User->>Set: Search(selector, query, topK)
    Set->>Set: Acquire read lock -> ResolveField()
    Set->>Set: NormalizeIfNeeded() pre-normalize query vector
    Set->>Idx: Search(query, topK)
    Idx-->>Set: List<(id, similarity)>
    Set->>Set: MapResults() -> reverse lookup entity by id
    Set-->>User: List<QuiverSearchResult>

    alt WAL mode
        User->>Ctx: SaveChangesAsync()
        Ctx->>Set: DrainChanges() get and clear change log
        Set-->>Ctx: List<(operation, key, entity)>
        Ctx->>WAL: Append(walEntries, flushToDisk)
    else Full snapshot mode
        User->>Ctx: SaveAsync()
        Ctx->>SP: SaveAsync(filePath, setsData)
    end

    User->>Ctx: DisposeAsync()
    alt WAL mode
        Ctx->>Ctx: SaveChangesAsync()
    else Full snapshot mode
        Ctx->>Ctx: SaveAsync()
    end
    Ctx->>WAL: Dispose()
    Ctx->>Set: Dispose() release reader-writer lock

3. Core Concepts

3.1 Entity Definition and Attribute Annotations

Entity classes declare vector database metadata through Attributes. QuiverSet<TEntity> scans these attributes via reflection during construction to automatically discover and register fields.

[QuiverKey] — Primary Key Annotation

Each entity must have exactly one [QuiverKey] property. Any key type is supported (string, int, Guid, etc.). At runtime the key value is read through a compiled expression-tree accessor and stored internally as a boxed object in a Dictionary<object, int>, which gives O(1) lookup and deduplication.

[QuiverKey]
public string PersonId { get; set; } = string.Empty;

Constraints:

  • Primary key value cannot be null (validated on write)
  • Primary key must be unique within the collection (Add validates, Upsert handles automatically)
  • A missing [QuiverKey] attribute causes QuiverSet construction to throw InvalidOperationException

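
The key constraints can be pictured with a small stand-alone sketch of a boxed-key dictionary that maps user keys to internal integer IDs. `KeyRegistry`, its method names, and the exception types used for null and duplicate keys are assumptions for illustration; the source only states that these cases are validated.

```csharp
using System;
using System.Collections.Generic;

var registry = new KeyRegistry();
int id1 = registry.Register("person-1");   // string key
int id2 = registry.Register(42);           // int key, boxed to object
Console.WriteLine($"{id1}, {id2}, found={registry.TryGetId(42, out _)}");

try { registry.Register("person-1"); }
catch (InvalidOperationException ex) { Console.WriteLine(ex.Message); }

// Hypothetical helper; exception types are assumptions, not Quiver's documented behavior.
sealed class KeyRegistry
{
    private readonly Dictionary<object, int> _keyToId = new();
    private int _nextId;

    public int Register(object key)
    {
        ArgumentNullException.ThrowIfNull(key);            // keys must not be null
        if (_keyToId.ContainsKey(key))
            throw new InvalidOperationException($"Duplicate key: {key}");
        int id = _nextId++;                                // auto-increment internal ID
        _keyToId.Add(key, id);                             // O(1) average insert
        return id;
    }

    public bool TryGetId(object key, out int id) => _keyToId.TryGetValue(key, out id);
}
```

Boxed value types such as int and Guid compare by value inside the dictionary, which is why a lookup with a freshly boxed 42 finds the entry registered earlier.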
[QuiverVector(dimensions, metric)] — Vector Field Annotation

Marks a property as a vector feature field. Property type must be float[]. An entity can have multiple vector fields annotated (multimodal scenarios).

// 128-dimensional vector, using cosine similarity (default)
[QuiverVector(128)]
public float[] Embedding { get; set; } = [];

// 384-dimensional vector, explicitly specifying Euclidean distance
[QuiverVector(384, DistanceMetric.Euclidean)]
public float[] TextFeature { get; set; } = [];

Parameter Description:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| dimensions | int | (required) | Vector dimensions; validated at runtime (vector.Length == dimensions) |
| metric | DistanceMetric | Cosine | Distance metric type |

Common Dimensions: 128 (lightweight models), 384 (MiniLM), 768 (BERT-base), 1024 (BERT-large), 1536 (OpenAI Ada-002), 3072 (OpenAI text-embedding-3-large).

Runtime Behavior:

  • On write (AddCore / PrepareVectors): validates dimension match, throws ArgumentException on mismatch
  • Cosine metric: performs L2 normalization before storing in index (NormalizeToArray)
  • Non-Cosine metrics: performs a defensive copy (vector.Clone()) to prevent external array modifications from corrupting the index

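
The three runtime behaviors listed here can be sketched in a few lines. This is an illustrative stand-in, not the library's PrepareVectors: `VectorPrep.Prepare` and its parameters are hypothetical names, and a plain loop replaces the SIMD routines so that the example depends only on the base class library.

```csharp
using System;

float[] source = { 3f, 4f };
float[] stored = VectorPrep.Prepare(source, expectedDimensions: 2, normalize: true);
source[0] = 100f; // a later caller-side mutation does not affect the stored copy
Console.WriteLine($"{stored[0]}, {stored[1]}");

static class VectorPrep
{
    public static float[] Prepare(float[] vector, int expectedDimensions, bool normalize)
    {
        // 1. Dimension check: reject vectors whose length does not match the declaration.
        if (vector.Length != expectedDimensions)
            throw new ArgumentException(
                $"Expected {expectedDimensions} dimensions but got {vector.Length}.", nameof(vector));

        // 2. Defensive copy: the index never shares an array with the caller.
        var copy = (float[])vector.Clone();
        if (!normalize) return copy;

        // 3. L2 normalization (the Cosine case).
        double sumSquares = 0;
        foreach (float x in copy) sumSquares += (double)x * x;
        float norm = (float)Math.Sqrt(sumSquares);
        if (norm == 0f) return copy;   // zero vector stays all zeros, avoiding NaN
        for (int i = 0; i < copy.Length; i++) copy[i] /= norm;
        return copy;
    }
}
```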
[QuiverIndex(indexType)] — Index Configuration (Optional)

Used on the same property as [QuiverVector] to specify the indexing strategy for that vector field. Defaults to Flat brute-force search when not annotated.

// HNSW index: preferred for approximate search of high-dimensional vectors
[QuiverVector(768)]
[QuiverIndex(VectorIndexType.HNSW, M = 32, EfConstruction = 300, EfSearch = 100)]
public float[] Embedding { get; set; } = [];

// IVF index: large dataset scenarios
[QuiverVector(128)]
[QuiverIndex(VectorIndexType.IVF, NumClusters = 100, NumProbes = 15)]
public float[] Feature { get; set; } = [];

// KDTree index: only suitable for low dimensions < 20
[QuiverVector(16)]
[QuiverIndex(VectorIndexType.KDTree)]
public float[] LowDimFeature { get; set; } = [];

QuiverIndexAttribute Complete Parameters:

| Parameter | Applicable Index | Default | Description |
| --- | --- | --- | --- |
| IndexType | All | Flat | Index type enum |
| M | HNSW | 16 | Max neighbor connections per layer; layer 0 automatically uses M * 2 |
| EfConstruction | HNSW | 200 | Candidate set size during construction |
| EfSearch | HNSW | 50 | Candidate set size during search; must be >= topK |
| NumClusters | IVF | 0 (auto: sqrt(n)) | K-Means cluster count |
| NumProbes | IVF | 10 | Number of clusters to probe during search |

3.2 Database Context QuiverDbContext

QuiverDbContext is the core entry point for the vector database, designed to mimic EF Core's DbContext.

Auto-Discovery Mechanism

flowchart TD
    A["new MyDb(options)"] --> B["InitializeSets()"]
    B --> C["GetType().GetProperties()<br/>filter QuiverSet&lt;T&gt; type properties"]
    C --> D{"Iterate each property"}
    D --> E["Extract generic argument T<br/>typeof(QuiverSet&lt;&gt;).GetGenericArguments()"]
    E --> F["Activator.CreateInstance()<br/>invoke internal constructor"]
    F --> G["Register to _sets: Type -> QuiverSet<br/>Register to _typeMap: FullName -> Type"]
    G --> H["PropertyInfo.SetValue()<br/>inject into subclass property"]
    H --> D

Key Behaviors:

  • Auto-discovery: During construction, scans all QuiverSet<T> public properties of the subclass via reflection, automatically creates instances and injects them (no manual new required).
  • Persistence: Delegates all collection data serialization/deserialization to IStorageProvider via SaveAsync() / LoadAsync().
  • Lifecycle: Implements IDisposable and IAsyncDisposable. Synchronous Dispose only releases resources; asynchronous DisposeAsync automatically saves before releasing.

public class MyDb : QuiverDbContext
{
    // Declare to register, no manual initialization needed.
    // Property values are automatically injected by the framework after construction.
    public QuiverSet<FaceFeature> Faces { get; set; } = null!;
    public QuiverSet<Document> Documents { get; set; } = null!;

    public MyDb(string path, StorageFormat format)
        : base(new QuiverDbOptions
        {
            DatabasePath = path,
            StorageFormat = format
        })
    { }
}
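
The discovery-and-injection pattern described in the flowchart can be reproduced in miniature with plain reflection. The sketch below uses hypothetical stand-ins (`MiniSet<T>`, `MiniContext`, `DemoContext`) rather than the Quiver types, and it assumes a public parameterless constructor on the set type, whereas the source says the real set is created through an internal constructor.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

var db = new DemoContext();
Console.WriteLine(db.Notes is not null);                         // injected by the base constructor
Console.WriteLine(ReferenceEquals(db.Notes, db.Set<string>()));  // same instance either way

// Hypothetical stand-ins for QuiverSet<T> and QuiverDbContext.
class MiniSet<T> { public List<T> Items { get; } = new(); }

abstract class MiniContext
{
    private readonly Dictionary<Type, object> _sets = new();

    protected MiniContext()
    {
        // Scan the public instance properties of the concrete subclass.
        foreach (PropertyInfo prop in GetType().GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            Type type = prop.PropertyType;
            if (!type.IsGenericType || type.GetGenericTypeDefinition() != typeof(MiniSet<>)) continue;
            if (!prop.CanWrite) continue;

            Type entityType = type.GetGenericArguments()[0];
            object set = Activator.CreateInstance(type)!;   // create MiniSet<T>
            _sets[entityType] = set;                        // register by entity type
            prop.SetValue(this, set);                       // inject into the subclass property
        }
    }

    public MiniSet<T> Set<T>() =>
        _sets.TryGetValue(typeof(T), out var set)
            ? (MiniSet<T>)set
            : throw new InvalidOperationException($"No set registered for {typeof(T).Name}.");
}

class DemoContext : MiniContext
{
    public MiniSet<string> Notes { get; set; } = null!;
    public MiniSet<int> Numbers { get; set; } = null!;
}
```

The `= null!` initializers are safe in this pattern because C# runs a derived class's property initializers before the base constructor, so the injected values are not overwritten afterwards.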

Generic Method Access:

// The following two approaches are equivalent:
var set1 = db.Faces;              // Direct property access
var set2 = db.Set<FaceFeature>(); // Generic method access (supports dynamic type lookup)
// Set<T>() internally looks up _sets dictionary, throws InvalidOperationException if not found

3.3 Vector Collection QuiverSet<TEntity>

QuiverSet<TEntity> is a vector collection for a single entity type, providing complete CRUD and search capabilities.

Internal Data Structures

graph LR
    subgraph QuiverSet Internals
        E["_entities<br/>Dictionary&lt;int, TEntity&gt;<br/>Internal ID -> Entity"]
        K["_keyToId<br/>Dictionary&lt;object, int&gt;<br/>User Key -> Internal ID"]
        VF["_vectorFields<br/>FrozenDictionary&lt;string, QuiverFieldInfo&gt;<br/>Field Name -> Metadata"]
        VG["_vectorGetters<br/>FrozenDictionary&lt;string, Func&gt;<br/>Field Name -> Compiled Property Accessor"]
        IDX["_indices<br/>FrozenDictionary&lt;string, IVectorIndex&gt;<br/>Field Name -> Index Instance"]
        LK["_lock<br/>ReaderWriterLockSlim<br/>Reader-Writer Lock"]
        NID["_nextId: int<br/>Auto-increment ID Counter"]
    end

    subgraph External Access
        ADD["Add / Upsert / Remove<br/>-> Write Lock"]
        SEARCH["Search / Find / Count<br/>-> Read Lock"]
    end

    ADD --> LK
    SEARCH --> LK
    LK --> E
    LK --> K
    LK --> IDX
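
As a rough illustration of the locking pattern in the diagram, the sketch below guards a dictionary with a ReaderWriterLockSlim: writes take the exclusive write lock, while reads take the shared read lock. `LockedStore` is a hypothetical miniature, not the actual QuiverSet implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

using var store = new LockedStore();
Parallel.For(0, 1000, i =>
{
    store.Add(i, $"item-{i}");   // writers serialize on the write lock
    _ = store.Count;             // readers may run concurrently with each other
});
Console.WriteLine(store.Count);

// Hypothetical miniature of the locking pattern; not Quiver's actual class.
sealed class LockedStore : IDisposable
{
    private readonly Dictionary<int, string> _items = new();
    private readonly ReaderWriterLockSlim _lock = new();

    public void Add(int id, string value)
    {
        _lock.EnterWriteLock();
        try { _items[id] = value; }
        finally { _lock.ExitWriteLock(); }   // always release, even on exceptions
    }

    public int Count
    {
        get
        {
            _lock.EnterReadLock();
            try { return _items.Count; }
            finally { _lock.ExitReadLock(); }
        }
    }

    public void Dispose() => _lock.Dispose();
}
```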

Initialization Flow During Construction

flowchart TD
    A["QuiverSet&lt;TEntity&gt;(defaultMetric)"]
    A --> B["Discover [QuiverKey] property via reflection"]
    B --> C{"Key found?"}
    C -- "No" --> ERR1["throw InvalidOperationException"]
    C -- "Yes" --> D["CompileGetter&lt;object?&gt;(keyProp)<br/>Compile expression tree key accessor"]
    D --> E["Discover all [QuiverVector] properties via reflection"]
    E --> F{"Iterate each vector property"}
    F --> G["Verify property type == float[]"]
    G --> H["Read QuiverVectorAttribute: dimensions, metric"]
    H --> I["Read QuiverIndexAttribute (optional)"]
    I --> J["Determine preNormalize = metric == Cosine"]
    J --> K["CompileGetter&lt;float[]&gt;(prop)"]
    K --> L["Determine SimilarityFunc:<br/>preNormalize? Dot : CreateSimilarityFunc(metric)"]
    L --> M["CreateIndex(indexAttr, simFunc)"]
    M --> F
    F -- "All processed" --> N{"Vector field count == 0?"}
    N -- "Yes" --> ERR2["throw InvalidOperationException"]
    N -- "No" --> O["ToFrozenDictionary() freeze all dictionaries"]
    O --> P["Single field? Cache _defaultField"]

Performance Optimization Highlights:

| Optimization | Technique | Effect |
| --- | --- | --- |
| Property access | Expression tree compiled Func<TEntity, T> | Nanosecond-level, ~100x faster than reflection PropertyInfo.GetValue |
| Metadata lookup | FrozenDictionary | Zero heap allocation, optimized hash strategy for small key sets |
| Cosine computation | Pre-normalization + TensorPrimitives.Dot | Avoids recomputing norms on every search |
| L2 normalization | TensorPrimitives.Norm + Divide | SIMD accelerated |
| Similarity function | Direct binding to TensorPrimitives method groups | Zero lambda overhead |
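
For readers unfamiliar with the first row of the table, this is roughly what compiling a property accessor from an expression tree looks like. `GetterCompiler` and `Item` are hypothetical names for this sketch; the library's own CompileGetter may differ in its details.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

PropertyInfo vectorProp = typeof(Item).GetProperty(nameof(Item.Embedding))!;
Func<Item, float[]> getVector = GetterCompiler.Compile<Item, float[]>(vectorProp);

PropertyInfo keyProp = typeof(Item).GetProperty(nameof(Item.Id))!;
Func<Item, object> getKey = GetterCompiler.Compile<Item, object>(keyProp); // boxes the int key

var item = new Item { Id = 7, Embedding = new[] { 1f, 2f } };
Console.WriteLine($"{getKey(item)}, {getVector(item).Length}");

static class GetterCompiler
{
    // Builds and compiles the equivalent of: (TEntity e) => (TValue)e.Property
    public static Func<TEntity, TValue> Compile<TEntity, TValue>(PropertyInfo property)
    {
        ParameterExpression entity = Expression.Parameter(typeof(TEntity), "e");
        Expression body = Expression.Property(entity, property);
        if (body.Type != typeof(TValue))
            body = Expression.Convert(body, typeof(TValue)); // e.g. box a value-type key to object
        return Expression.Lambda<Func<TEntity, TValue>>(body, entity).Compile();
    }
}

class Item
{
    public int Id { get; set; }
    public float[] Embedding { get; set; } = Array.Empty<float>();
}
```

The compilation cost is paid once per property. Afterwards each call is an ordinary delegate invocation, which is where the speed-up over PropertyInfo.GetValue comes from.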

4. Distance Metrics

The DistanceMetric enum defines three vector similarity computation methods:

| Metric Type | Mathematical Formula | Range | Use Case | Pre-normalization |
| --- | --- | --- | --- | --- |
| Cosine | $\cos(\theta) = \frac{a \cdot b}{\lVert a \rVert \times \lVert b \rVert}$ | [-1, 1] | Text embeddings, semantic search | ✅ Automatically enabled |
| Euclidean | $\frac{1}{1 + \lVert a - b \rVert_2}$ | (0, 1] | Spatial coordinates, physical distances | No |
| DotProduct | $a \cdot b = \sum_i a_i b_i$ | $(-\infty, +\infty)$ | Pre-normalized vectors, MIPS | No |
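
The three formulas can be checked numerically with a short, dependency-free sketch. The `Metrics` class is a hypothetical name for this example and uses plain loops; the library itself is described as relying on TensorPrimitives for these computations.

```csharp
using System;

float[] a = { 1f, 0f };
float[] b = { 0f, 1f };
Console.WriteLine(Metrics.Cosine(a, b));      // orthogonal vectors
Console.WriteLine(Metrics.Euclidean(a, a));   // identical vectors: distance 0
Console.WriteLine(Metrics.Dot(a, a));

static class Metrics
{
    // a . b = sum_i a_i * b_i
    public static float Dot(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        float sum = 0f;
        for (int i = 0; i < a.Length; i++) sum += a[i] * b[i];
        return sum;
    }

    // cos(theta) = (a . b) / (||a|| * ||b||); defined as 0 here when either vector is zero
    public static float Cosine(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        float denominator = MathF.Sqrt(Dot(a, a)) * MathF.Sqrt(Dot(b, b));
        return denominator == 0f ? 0f : Dot(a, b) / denominator;
    }

    // 1 / (1 + ||a - b||_2): maps a distance in [0, inf) to a similarity in (0, 1]
    public static float Euclidean(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        float sumSquares = 0f;
        for (int i = 0; i < a.Length; i++)
        {
            float diff = a[i] - b[i];
            sumSquares += diff * diff;
        }
        return 1f / (1f + MathF.Sqrt(sumSquares));
    }
}
```

Note that the Euclidean entry is a similarity derived from the distance, so larger values mean closer vectors for all three metrics and results can always be sorted in descending order.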

4.1 Cosine Pre-normalization Optimization Principle

flowchart LR
    subgraph On Write
        V["Original vector v"] --> NORM["L2 normalize<br/>v-hat = v / ||v||"]
        NORM --> IDX["Store in index: v-hat"]
    end

    subgraph On Search
        Q["Query vector q"] --> QNORM["L2 normalize<br/>q-hat = q / ||q||"]
        QNORM --> DOT["Dot(q-hat, v-hat)"]
        DOT --> RES["= CosineSimilarity(q, v)"]
    end

    style NORM fill:#d4edda
    style QNORM fill:#d4edda
    style DOT fill:#cce5ff

Why is Dot faster than Cosine?

  • CosineSimilarity(a, b) = one dot product + two norm computations = 3 vector traversals
  • After pre-normalization, Dot(a-hat, b-hat) = one dot product = 1 vector traversal
  • Normalization overhead is incurred only once during write/query, while search only performs dot products against N candidates
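
The equivalence behind this optimization follows directly from the definition of cosine similarity. Writing $\hat{q}$ and $\hat{v}$ for the normalized query and stored vectors, and assuming both are non-zero:

```latex
\hat{q} = \frac{q}{\lVert q \rVert_2}, \qquad
\hat{v} = \frac{v}{\lVert v \rVert_2}
\quad\Longrightarrow\quad
\hat{q} \cdot \hat{v}
  = \frac{q \cdot v}{\lVert q \rVert_2 \, \lVert v \rVert_2}
  = \cos(\theta)
```

For a search against N stored vectors, this replaces N dot products plus 2N norm computations with one normalization of the query plus N dot products.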

SIMD-Accelerated Implementation:

// SIMD-optimized implementation using TensorPrimitives
private static void NormalizeVector(ReadOnlySpan<float> source, Span<float> destination)
{
    var norm = TensorPrimitives.Norm(source);    // SIMD-accelerated L2 norm
    if (norm > 0f)
        TensorPrimitives.Divide(source, norm, destination); // SIMD-accelerated vector division
    else
        destination.Clear(); // Zero vector safety, avoid NaN
}

4.2 Metric Selection Recommendations

// Cosine — most common, text/semantic search
[QuiverVector(384, DistanceMetric.Cosine)]
public float[] TextEmbedding { get; set; } = [];

// Euclidean — scenarios caring about absolute distance (geographic coordinates, physical space)
[QuiverVector(3, DistanceMetric.Euclidean)]
public float[] Position { get; set; } = [];

// DotProduct — vectors already pre-normalized or needing Maximum Inner Product Search (MIPS)
[QuiverVector(128, DistanceMetric.DotProduct)]
public float[] Feature { get; set; } = [];

4.3 Similarity Function Mapping

The framework internally creates different SimilarityFunc delegates (ReadOnlySpan<float>, ReadOnlySpan<float> -> float) based on metric type, which can directly bind to TensorPrimitives static method groups:

| Metric | PreNormalize | Bound Function |
| --- | --- | --- |
| Cosine | true | TensorPrimitives.Dot |
| DotProduct | false | TensorPrimitives.Dot |
| Euclidean | false | (a, b) => 1f / (1f + TensorPrimitives.Distance(a, b)) |
| Cosine (fallback) | false | TensorPrimitives.CosineSimilarity |
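
The mapping in the table amounts to choosing a delegate once, at construction time, and reusing it for every comparison. The sketch below shows one way to express that selection. The `SimilaritySelector` class and `Metric` enum are hypothetical, and hand-written static methods stand in for the TensorPrimitives functions so that the example compiles without extra packages.

```csharp
using System;

SimilarityFunc cosine = SimilaritySelector.Create(Metric.Cosine, preNormalize: true);
SimilarityFunc euclidean = SimilaritySelector.Create(Metric.Euclidean, preNormalize: false);
float[] x = { 0.6f, 0.8f };   // already unit length
Console.WriteLine(cosine(x, x));
Console.WriteLine(euclidean(x, x));

delegate float SimilarityFunc(ReadOnlySpan<float> a, ReadOnlySpan<float> b);
enum Metric { Cosine, Euclidean, DotProduct }

static class SimilaritySelector
{
    public static SimilarityFunc Create(Metric metric, bool preNormalize)
    {
        switch (metric)
        {
            case Metric.Cosine:
                if (preNormalize) return Dot;   // unit vectors: dot product equals cosine
                return Cosine;                  // fallback for vectors that were not normalized
            case Metric.DotProduct:
                return Dot;
            case Metric.Euclidean:
                return EuclideanSimilarity;
            default:
                throw new ArgumentOutOfRangeException(nameof(metric));
        }
    }

    private static float Dot(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        float sum = 0f;
        for (int i = 0; i < a.Length; i++) sum += a[i] * b[i];
        return sum;
    }

    private static float Cosine(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        float denominator = MathF.Sqrt(Dot(a, a)) * MathF.Sqrt(Dot(b, b));
        return denominator == 0f ? 0f : Dot(a, b) / denominator;
    }

    private static float EuclideanSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        float sumSquares = 0f;
        for (int i = 0; i < a.Length; i++) { float d = a[i] - b[i]; sumSquares += d * d; }
        return 1f / (1f + MathF.Sqrt(sumSquares));
    }
}
```

A custom delegate type is needed here because ReadOnlySpan<float> has traditionally not been usable as a generic type argument, which rules out Func<,,> for this signature.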

5. Index Types

5.1 Flat (Brute-Force Search)

The Flat index traverses every vector and computes its similarity to the query, so results are 100% exact. It is the default index type.

| Property | Value |
| --- | --- |
| Implementation | FlatIndex |
| Time Complexity | O(n * d) |
| Space Complexity | O(n * d) |
| Accuracy | 100% |
| Suitable Data Size | < 10,000 |
| Parallel Threshold | Automatically enables Parallel.ForEach when > 10,000 entries |

flowchart TD
    Q["Query vector q"] --> CHECK{"Vector count > 10,000?"}
    CHECK -- "No" --> SEQ["Sequential search<br/>Traverse all vectors computing sim(q, v)"]
    CHECK -- "Yes" --> PAR["Parallel search<br/>Parallel.ForEach + ConcurrentBag"]
    SEQ --> SORT["OrderByDescending(sim)<br/>.Take(topK)"]
    PAR --> SORT
    SORT --> RES["Top-K results"]

Search Strategy Switching:

// Small data (<=10K): sequential traversal is faster, avoids thread scheduling overhead
private List<(int, float)> SequentialSearchCore(float[] query, int topK)
{
    var results = new List<(int Id, float Sim)>(_vectors.Count);
    foreach (var (id, vector) in _vectors)
        results.Add((id, similarityFunc(query, vector)));
    return results.OrderByDescending(r => r.Sim).Take(topK).ToList();
}

// Large data (>10K): Parallel.ForEach for multi-threaded parallel computation
private List<(int, float)> ParallelSearchCore(float[] query, int topK)
{
    var results = new ConcurrentBag<(int Id, float Similarity)>();
    Parallel.ForEach(_vectors, kvp =>
        results.Add((kvp.Key, similarityFunc(query, kvp.Value))));
    return results.OrderByDescending(r => r.Similarity).Take(topK).ToList();
}
// Usage: default index, no [QuiverIndex] annotation needed
[QuiverVector(128)]
public float[] Embedding { get; set; } = [];

5.2 HNSW (Hierarchical Navigable Small World Graph)

Multi-layer proximity graph structure, the universal preferred choice for approximate search. Similar to "highway → regional road → local road" layered navigation.

| Property | Value |
|---|---|
| Implementation | HnswIndex |
| Search Complexity | O(log n) |
| Insert Complexity | O(log n * efConstruction) |
| Space Complexity | O(n * M) |
| Suitable Data Size | 10K ~ 10M |
| Deletion Strategy | Lazy deletion (residual references auto-cleaned) |

HNSW Layered Structure
graph TD
    subgraph "Layer 2 (Sparse, Highway)"
        L2A((A)) --- L2D((D))
    end

    subgraph "Layer 1 (Medium Density, Regional Road)"
        L1A((A)) --- L1B((B))
        L1B --- L1D((D))
        L1A --- L1D
    end

    subgraph "Layer 0 (Densest, Local Road - All Nodes)"
        L0A((A)) --- L0B((B))
        L0B --- L0C((C))
        L0C --- L0D((D))
        L0D --- L0E((E))
        L0A --- L0C
        L0B --- L0E
        L0A --- L0E
    end

    L2A -.->|Same node| L1A
    L2D -.->|Same node| L1D
    L1A -.->|Same node| L0A
    L1B -.->|Same node| L0B
    L1D -.->|Same node| L0D
Insertion Algorithm Flow
flowchart TD
    START["Add(id, vector)"] --> RL["RandomLevel()<br/>Exponential decay random level l"]
    RL --> CREATE["Create HnswNode(id, vector, l)"]
    CREATE --> EMPTY{"Graph empty?"}
    EMPTY -- "Yes" --> EP["Set as entry point<br/>_entryPointId = id"]
    EMPTY -- "No" --> GREEDY["Start from entry point<br/>Greedy search from maxLevel to l+1<br/>(ef=1)<br/>Quickly locate target region"]
    GREEDY --> LAYER["From min(l, maxLevel) to layer 0<br/>Build bidirectional connections layer by layer"]
    LAYER --> SRCH["SearchLayer(ef=efConstruction)<br/>Search best neighbors at current layer"]
    SRCH --> SELECT["Select Top-mMax neighbors<br/>Layer 0: mMax = M×2<br/>Other layers: mMax = M"]
    SELECT --> CONNECT["Establish bidirectional connections<br/>node ↔ neighbor"]
    CONNECT --> PRUNE{"Neighbor connection count > mMax?"}
    PRUNE -- "Yes" --> TRIM["PruneConnections()<br/>Keep the mMax with highest similarity"]
    PRUNE -- "No" --> NEXT{"More layers remaining?"}
    TRIM --> NEXT
    NEXT -- "Yes" --> LAYER
    NEXT -- "No" --> UPEP{"l > maxLevel?"}
    UPEP -- "Yes" --> NEWEP["Update entry point to new node"]
    UPEP -- "No" --> DONE["Done"]
    NEWEP --> DONE
Search Algorithm Flow
flowchart TD
    START["Search(query, topK)"] --> GREEDY["Start from entry point<br/>Greedy search from maxLevel to layer 1<br/>(ef=1)<br/>Quickly approach target region"]
    GREEDY --> FINE["At layer 0<br/>ef = max(efSearch, topK)<br/>Fine-grained search"]
    FINE --> TOPK["Take top topK by similarity"]
    TOPK --> RES["Return (id, similarity) list"]

Parameter Tuning Guide:

| Parameter | Default | Recommended Range | Increase Effect | Decrease Effect |
|---|---|---|---|---|
| M | 16 | 12 ~ 48 | Higher recall, more memory, longer build time | Less memory, lower recall |
| EfConstruction | 200 | 100 ~ 500 | Better graph quality, slower insertion | Faster insertion, lower graph quality |
| EfSearch | 50 | 50 ~ 500 | Higher recall, slower search | Faster search, lower recall |

EfSearch can be dynamically adjusted at runtime without rebuilding the index: hnswIndex.EfSearch = 200;

[QuiverVector(768, DistanceMetric.Cosine)]
[QuiverIndex(VectorIndexType.HNSW, M = 32, EfConstruction = 300, EfSearch = 100)]
public float[] Embedding { get; set; } = [];
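
The RandomLevel() step in the insertion flow draws a level with exponentially decaying probability. A minimal sketch, assuming the normalization constant mL = 1 / ln(M) suggested in the original HNSW paper; whether HnswIndex uses exactly this constant is an assumption, and the helper below is hypothetical.

```csharp
using System;

// Hypothetical helper: level = floor(-ln(u) * mL) with u uniform in (0, 1].
static int RandomLevel(Random random, int m)
{
    double mL = 1.0 / Math.Log(m);
    double u = 1.0 - random.NextDouble();   // NextDouble is in [0, 1), so u is in (0, 1]
    return (int)Math.Floor(-Math.Log(u) * mL);
}

var rng = new Random(42);
var histogram = new int[8];
for (int i = 0; i < 100_000; i++)
{
    // Clamp very rare high levels into the last bucket.
    int level = Math.Min(RandomLevel(rng, 16), histogram.Length - 1);
    histogram[level]++;
}

for (int level = 0; level < histogram.Length; level++)
    Console.WriteLine($"level {level}: {histogram[level]} nodes");
```

Under this assumption the chance of reaching level k or higher is M^-k, so with M = 16 roughly 15 of every 16 nodes exist only on layer 0 and each higher layer holds about 1/16 of the one below it.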

5.3 IVF (Inverted File Index)

Partitions vector space based on K-Means clustering, only probing the nearest clusters during search.

| Property | Value |
|---|---|
| Implementation | IvfIndex |
| Build Complexity | O(n * k * d * iter) |
| Search Complexity | O(k * d + nProbe * n/k * d) |
| Suitable Data Size | 100K+ |
| Build Method | Lazy (triggered on first search) |
| Auto-Rebuild | Flagged for rebuild after 50% data growth |
| Centroid Initialization | K-Means++ |
| Iteration Algorithm | Lloyd (max 50 rounds) |
| SIMD Acceleration | TensorPrimitives.Add / TensorPrimitives.Divide |

IVF Search Flow
flowchart TD
    Q["Query vector q"] --> ENSURE["EnsureBuilt()<br/>Build on first search or data growth"]
    ENSURE --> BUILD{"Need to build?"}
    BUILD -- "Yes" --> KMEANS["K-Means clustering<br/>1. K-Means++ initialize centroids<br/>2. Lloyd iteration (max 50 rounds)<br/>3. Build inverted lists"]
    BUILD -- "No" --> CENT["Compute similarity of q against all K centroids"]
    KMEANS --> CENT
    CENT --> PROBE["Select the nProbe most similar clusters"]
    PROBE --> SCAN["Traverse inverted lists of selected clusters<br/>Compute exact similarity"]
    SCAN --> TOPK["OrderByDescending(sim)<br/>.Take(topK)"]
    TOPK --> RES["Return Top-K results"]
K-Means Clustering Build
flowchart TD
    START["Build()"] --> K["Determine cluster count K<br/>Explicitly specified or auto √n"]
    K --> INIT["K-Means++ initialize centroids<br/>Probability proportional to distance²"]
    INIT --> ITER["Lloyd iteration loop"]
    ITER --> ASSIGN["Assignment phase: each vector → nearest centroid"]
    ASSIGN --> CHK{"Assignments changed?"}
    CHK -- "No (converged)" --> IL["Build inverted lists"]
    CHK -- "Yes" --> UPDATE["Update phase:<br/>Centroid = mean of member vectors<br/>TensorPrimitives.Add (SIMD accumulation)<br/>TensorPrimitives.Divide (SIMD division)"]
    UPDATE --> MAX{"Reached 50 rounds?"}
    MAX -- "No" --> ASSIGN
    MAX -- "Yes" --> IL
    IL --> DONE["Record _lastBuildCount<br/>_isBuilt = true"]

Parameter Tuning:

| Parameter | Default | Recommended Range | Description |
|---|---|---|---|
| NumClusters | 0 (auto sqrt(n)) | sqrt(n) ~ 4*sqrt(n) | Cluster count. Larger → smaller clusters → faster search but more centroid comparisons |
| NumProbes | 10 | 1 ~ 20 | Probe count. When = total clusters, degrades to brute-force search |

Threshold search automatically expands probe range to nProbe * 2, reducing missed results from cluster partitioning.

[QuiverVector(128, DistanceMetric.Cosine)]
[QuiverIndex(VectorIndexType.IVF, NumClusters = 100, NumProbes = 15)]
public float[] Feature { get; set; } = [];
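
To make the NumClusters / NumProbes trade-off concrete, the search complexity O(k * d + nProbe * n/k * d) can be evaluated for sample sizes. This is only arithmetic on that formula with the shared factor d dropped, assuming evenly sized clusters; IvfComparisons is a hypothetical helper.

```csharp
using System;

// Similarity computations per query: k centroid comparisons plus
// nProbe clusters of about n/k vectors each.
static long IvfComparisons(long n, long k, long nProbe) => k + nProbe * (n / k);

long vectorCount = 1_000_000;
long clusters = (long)Math.Sqrt(vectorCount);   // auto cluster count sqrt(n) = 1000

long defaultProbes = IvfComparisons(vectorCount, clusters, 10);
long allProbes = IvfComparisons(vectorCount, clusters, clusters);

Console.WriteLine(defaultProbes);   // 11000
Console.WriteLine(allProbes);       // 1001000
```

With the default 10 probes the query touches about 1% of the vectors; probing every cluster costs more than a flat scan, which is the brute-force degradation noted in the table.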

5.4 KDTree

Spatial binary partition tree for exact search. Alternately splits space along dimensions, using pruning to skip impossible subtrees.

| Property | Value |
|---|---|
| Implementation | KDTreeIndex |
| Search Complexity | O(log n) (low dim), O(n) (high dim) |
| Accuracy | 100% |
| Suitable Dimensions | < 20 |
| Build Method | Lazy (triggered on first search, full rebuild) |
| Rebuild Trigger | Flagged for rebuild after every Add/Remove |

KD-Tree Structure Diagram
graph TD
    ROOT["Root Node<br/>SplitDim=X, SplitVal=5"]
    L1["Left Subtree<br/>SplitDim=Y, SplitVal=3<br/>(X ≤ 5)"]
    R1["Right Subtree<br/>SplitDim=Y, SplitVal=7<br/>(X > 5)"]
    LL["Left-Left<br/>SplitDim=Z, SplitVal=1"]
    LR["Left-Right<br/>SplitDim=Z, SplitVal=4"]
    RL["Right-Left<br/>SplitDim=Z, SplitVal=6"]
    RR["Right-Right<br/>SplitDim=Z, SplitVal=9"]

    ROOT --> L1
    ROOT --> R1
    L1 --> LL
    L1 --> LR
    R1 --> RL
    R1 --> RR
Search Pruning Strategy
flowchart TD
    START["SearchNode(node, query, topK)"] --> CALC["Compute sim(query, node.Vector)"]
    CALC --> HEAP["Min-heap update<br/>heap &lt; topK → push<br/>heap full and sim > heap top → replace"]
    HEAP --> DIFF["diff = query[splitDim] - node.splitValue"]
    DIFF --> FIRST["Search subtree on query point's side<br/>(diff ≤ 0 ? Left : Right)"]
    FIRST --> CHECK{"heap &lt; topK<br/>OR |diff| &lt; search radius?"}
    CHECK -- "Yes" --> SECOND["Search the other subtree<br/>(may contain better results)"]
    CHECK -- "No (prune)" --> SKIP["Skip the other side<br/>Cannot have better results ✂️"]
    SECOND --> DONE["Return"]
    SKIP --> DONE

⚠️ Curse of Dimensionality: When dimensions exceed ~20, nearly every subtree must be visited (pruning fails), degrading to O(n). Use HNSW for high-dimensional scenarios.
⚠️ Threshold search degrades to brute-force traversal (KD-Tree pruning cannot be directly applied to threshold search).

[QuiverVector(16, DistanceMetric.Euclidean)]
[QuiverIndex(VectorIndexType.KDTree)]
public float[] LowDimFeature { get; set; } = [];
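
The "min-heap update" step in the pruning diagram keeps only the best topK candidates seen so far. A minimal sketch using the standard PriorityQueue<TElement, TPriority>; the Offer helper is hypothetical, and KDTreeIndex may use a different heap type.

```csharp
using System;
using System.Collections.Generic;

// The heap root is always the weakest of the current best candidates.
static void Offer(PriorityQueue<int, float> heap, int topK, int id, float sim)
{
    if (heap.Count < topK)
        heap.Enqueue(id, sim);                  // heap not full: push
    else if (heap.TryPeek(out _, out float worst) && sim > worst)
        heap.EnqueueDequeue(id, sim);           // full and better than the root: replace
}

var candidates = new PriorityQueue<int, float>();   // min-heap keyed by similarity
float[] sims = [0.10f, 0.90f, 0.40f, 0.75f, 0.20f];
for (int i = 0; i < sims.Length; i++)
    Offer(candidates, 2, i, sims[i]);

// Dequeue order is ascending similarity, so the best match comes out last.
var best = new List<int>();
while (candidates.TryDequeue(out int bestId, out _))
    best.Add(bestId);

Console.WriteLine(string.Join(", ", best));   // 3, 1
```

The root value is also what the pruning check compares against: once the heap is full, a subtree is only worth visiting if it could contain something better than the current weakest candidate.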

5.5 Index Selection Decision Guide

flowchart TD
    START["Choose Index Type"] --> Q1{"Data size < 10K?"}
    Q1 -- "Yes" --> FLAT["Flat<br/>Brute-force, 100% exact<br/>Simple and reliable"]
    Q1 -- "No" --> Q2{"Dimensions < 20?"}
    Q2 -- "Yes" --> KDT["KDTree<br/>Exact search, O(log n)<br/>Best for low dimensions"]
    Q2 -- "No" --> Q3{"Data size > 100K<br/>and batch queries needed?"}
    Q3 -- "Yes" --> IVF["IVF<br/>Cluster search, high throughput<br/>Adjustable accuracy"]
    Q3 -- "No" --> HNSW["HNSW<br/>Universal preferred<br/>O(log n), high recall"]

    style FLAT fill:#d4edda
    style HNSW fill:#cce5ff
    style IVF fill:#fff3cd
    style KDT fill:#f8d7da

Comprehensive Comparison Table:

| Dimension | Flat | HNSW | IVF | KDTree |
|---|---|---|---|---|
| Search Speed | O(n*d) | O(log n) | O(k*d + nProbe*n/k*d) | O(log n) ~ O(n) |
| Accuracy | 100% | ~95-99%+ | ~90-99% | 100% |
| Insert Speed | O(1) | O(log n) | O(1)* | O(1)** |
| Memory | n*d | n*(d+M) | n*d + k*d | n*d + tree structure |
| Suitable Data Size | <10K | 10K~10M | 100K+ | <10K (low dim) |
| Suitable Dimensions | Any | Any | Any | <20 |
| Build Method | Immediate | Immediate | Lazy | Lazy |
| Parallelization | Yes (>10K) | No | No | No |

* IVF insertion is immediate, but the index needs rebuilding.
** KDTree insertion is immediate, but the tree needs rebuilding.


6. CRUD Operations

6.1 Adding Entities

// Add a single entity
db.Documents.Add(new Document
{
    Id = "doc-001",
    Title = "Getting Started Guide",
    Embedding = new float[384]
});

// Batch add (atomic semantics: if any validation fails, all are rolled back)
var batch = new List<Document>
{
    new() { Id = "doc-002", Title = "Advanced Tutorial", Embedding = new float[384] },
    new() { Id = "doc-003", Title = "Best Practices", Embedding = new float[384] },
};
db.Documents.AddRange(batch);

// Async batch add (offloads CPU-intensive computation to thread pool)
await db.Documents.AddRangeAsync(batch, cancellationToken);
AddRange Two-Phase Commit
flowchart TD
    START["AddRange(entities)"] --> LOCK["Acquire write lock"]
    LOCK --> P1["Phase 1: Pre-validation<br/>(no state modified, exception-safe)"]

    P1 --> V1["Validate each entity's key is not null"]
    V1 --> V2["Validate key does not duplicate existing data"]
    V2 --> V3["Validate keys are unique within the batch (HashSet)"]
    V3 --> V4["PrepareVectors() validate vector dimensions"]
    V4 --> CHK{"All passed?"}
    CHK -- "Any failure" --> ROLLBACK["Throw exception<br/>No data modified ✅"]
    CHK -- "All passed" --> P2["Phase 2: Atomic commit<br/>(no more exceptions after this point)"]

    P2 --> W1["Assign internal ID (_nextId++)"]
    W1 --> W2["Write to _entities + _keyToId"]
    W2 --> W3["Write to all _indices"]
    W3 --> UNLOCK["Release write lock"]

    style ROLLBACK fill:#f8d7da
    style P2 fill:#d4edda
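
The validate-then-commit shape of the diagram can be sketched on a plain dictionary. This is an illustration under simplifying assumptions: the dictionary stands in for _entities and _keyToId, the write lock and index updates are omitted, and the AddRange local function is hypothetical rather than Quiver's implementation.

```csharp
using System;
using System.Collections.Generic;

var entities = new Dictionary<string, float[]>();   // stand-in for _entities + _keyToId
const int Dimensions = 3;

void AddRange(params (string Key, float[] Vector)[] batch)
{
    // Phase 1: pre-validation. Nothing is modified, so throwing here is safe.
    var seen = new HashSet<string>();
    foreach (var (key, vector) in batch)
    {
        if (key is null) throw new ArgumentException("Key must not be null.");
        if (entities.ContainsKey(key)) throw new InvalidOperationException($"Key '{key}' already exists.");
        if (!seen.Add(key)) throw new InvalidOperationException($"Key '{key}' repeats within the batch.");
        if (vector.Length != Dimensions) throw new ArgumentException($"Expected {Dimensions} dimensions.");
    }

    // Phase 2: commit. Every entry has already passed validation.
    foreach (var (key, vector) in batch)
        entities.Add(key, vector);
}

AddRange(("a", new float[3]), ("b", new float[3]));

try
{
    // The second item has the wrong dimension, so "c" must not be stored either.
    AddRange(("c", new float[3]), ("d", new float[2]));
}
catch (ArgumentException ex)
{
    Console.WriteLine($"Rejected: {ex.Message}");
}

Console.WriteLine(entities.Count);   // 2
```

Because all checks run before the first write, a failed batch needs no undo step: "rollback" simply means phase 2 never started.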

6.2 Insert or Update (Upsert)

Upsert completes within a single write lock, so it is atomic and more efficient than calling Remove followed by Add from outside.

db.Documents.Upsert(new Document
{
    Id = "doc-001",
    Title = "Updated Getting Started Guide",
    Embedding = new float[384]
});
// Key exists -> RemoveCore() + AddCore()
// Key doesn't exist -> AddCore() directly

6.3 Removing Entities

// Remove by entity (matched by primary key, not by reference comparison)
bool removed = db.Documents.Remove(entity);

// Remove by primary key directly (no entity reference needed)
bool removedByKey = db.Documents.RemoveByKey("doc-001");

Internal removal flow (RemoveCore):

  1. Reverse lookup internal ID via _keyToId
  2. Remove entity from _entities dictionary
  3. Remove key mapping from _keyToId dictionary
  4. Remove vector from all _indices (index implementations handle residual references internally)

6.4 Finding Entities

// Find by primary key, O(1) complexity (dual dictionary: key -> internal ID -> entity)
Document? doc = db.Documents.Find("doc-001");

6.5 Clearing the Collection

db.Documents.Clear();
// Clears _entities + _keyToId + all indices
// Resets _nextId = 0

6.6 Getting Information

int count = db.Documents.Count; // Thread-safe (read lock)

// View vector field metadata
foreach (var (name, dimensions) in db.Documents.VectorFields)
    Console.WriteLine($"Field: {name}, Dimensions: {dimensions}");

7. Vector Search

7.1 Top-K Search

Returns the top K entities with the highest similarity, sorted in descending order of similarity.

float[] queryVector = GetEmbedding("search keywords");

var results = db.Documents.Search(
    vectorSelector: e => e.Embedding,  // Expression tree selector
    queryVector: queryVector,
    topK: 10
);

foreach (var result in results)
{
    Console.WriteLine($"ID: {result.Entity.Id}");
    Console.WriteLine($"Title: {result.Entity.Title}");
    Console.WriteLine($"Similarity: {result.Similarity:F4}");
}

Internal flow:

flowchart TD
    A["Search(selector, query, topK)"] --> B["Acquire read lock"]
    B --> C["ResolveField(selector)<br/>Parse expression tree to extract property name<br/>Look up corresponding QuiverFieldInfo"]
    C --> D["Validate query.Length == dimensions"]
    D --> E["NormalizeIfNeeded(field, query)<br/>Cosine -> normalized copy<br/>Other -> original array (zero-copy)"]
    E --> F["_indices[name].Search(query, topK)"]
    F --> G["MapResults():<br/>Iterate (id, sim) list<br/>Reverse lookup entity via _entities[id]<br/>Skip deleted invalid IDs"]
    G --> H["Release read lock"]
    H --> I["Return List&lt;QuiverSearchResult&lt;T&gt;&gt;"]

7.2 Threshold Search

Returns all entities whose similarity is at least the specified threshold, so the number of results varies.

var results = db.Documents.SearchByThreshold(
    vectorSelector: e => e.Embedding,
    queryVector: queryVector,
    threshold: 0.85f
);

Console.WriteLine($"Found {results.Count} results with similarity >= 0.85");

7.3 Filtered Search

Supports both expression filtering and delegate filtering.

// Approach 1: Expression filtering
// ⚠️ Compiles expression tree on each call, overhead ~50μs
var results = db.Documents.Search(
    e => e.Embedding,
    queryVector,
    topK: 10,
    filter: e => e.Title.Contains("tutorial")
);

// Approach 2: Delegate filtering (recommended for high-frequency calls)
// Cache the compiled delegate externally to avoid repeated compilation
Func<Document, bool> myFilter = e => e.Title.Contains("tutorial");
var results = db.Documents.Search(
    e => e.Embedding,
    queryVector,
    topK: 10,
    filter: myFilter,
    overFetchMultiplier: 4
);
Over-Fetch Strategy
flowchart LR
    IDX["Index retrieval<br/>topK × overFetchMultiplier<br/>candidates"] --> FILTER["Filter one by one<br/>filter(entity)"]
    FILTER --> COLLECT{"Collected topK?"}
    COLLECT -- "No" --> FILTER
    COLLECT -- "Yes (early termination)" --> RES["Return results"]

| overFetchMultiplier | Description |
|---|---|
| 4 (default) | Suitable when the filter rejects fewer than 75% of candidates |
| 8~16 | High filter rate scenarios (e.g., filtering by category) |
| Larger values | Extreme filter rates (e.g., searching for specific tags only) |
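
The over-fetch loop can be sketched as follows. The candidate list stands in for what the index would return, already sorted by similarity; FilterTopK and the sample titles are hypothetical and only illustrate the early-termination logic.

```csharp
using System;
using System.Collections.Generic;

// Walks ranked candidates and stops as soon as `limit` of them pass the filter.
static List<(string Title, float Sim)> FilterTopK(
    IEnumerable<(string Title, float Sim)> ranked, Func<string, bool> filter, int limit)
{
    var results = new List<(string Title, float Sim)>(limit);
    foreach (var candidate in ranked)
    {
        if (!filter(candidate.Title)) continue;
        results.Add(candidate);
        if (results.Count == limit) break;      // early termination
    }
    return results;
}

int requestedTopK = 2;
int overFetchMultiplier = 4;

(string Title, float Sim)[] indexOrder =
[
    ("intro tutorial", 0.95f), ("release notes", 0.93f), ("faq", 0.90f),
    ("advanced tutorial", 0.88f), ("changelog", 0.85f), ("api tutorial", 0.80f),
    ("roadmap", 0.78f), ("license", 0.70f), ("glossary", 0.65f), ("credits", 0.60f)
];

// The index is asked for topK * overFetchMultiplier = 8 candidates, not all 10.
var fetched = indexOrder[..(requestedTopK * overFetchMultiplier)];
var hits = FilterTopK(fetched, title => title.Contains("tutorial"), requestedTopK);

foreach (var hit in hits)
    Console.WriteLine(hit.Title);   // intro tutorial, then advanced tutorial
```

If fewer than topK candidates survive the filter, the result is simply shorter, which is the signal to raise overFetchMultiplier.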

7.4 Top-1 Search

Searches for the single most similar entity. An internal fast path avoids the intermediate List allocation: MapTop1 takes only the first valid result.

var top1 = db.Documents.SearchTop1(
    e => e.Embedding,
    queryVector
);

if (top1 != null)
    Console.WriteLine($"Most similar: {top1.Entity.Title} ({top1.Similarity:F4})");
else
    Console.WriteLine("No similar document found");

7.5 Async Search

All search methods provide Async suffix overloads that offload CPU-intensive computation to the thread pool via Task.Run, avoiding blocking UI threads or ASP.NET request threads.

// Async Top-K
var results = await db.Documents.SearchAsync(
    e => e.Embedding, queryVector, topK: 10, cancellationToken);

// Async with filter
var results = await db.Documents.SearchAsync(
    e => e.Embedding, queryVector, topK: 10,
    filter: e => e.Category == "Tutorial",
    overFetchMultiplier: 4, cancellationToken);

// Async threshold search
var results = await db.Documents.SearchByThresholdAsync(
    e => e.Embedding, queryVector, threshold: 0.8f, cancellationToken);

// Async Top-1
var top1 = await db.Documents.SearchTop1Async(
    e => e.Embedding, queryVector, cancellationToken);

7.6 Default Field Convenience Methods

When an entity has only one [QuiverVector] field, the vectorSelector parameter can be omitted. The framework caches _defaultField to avoid calling _vectorFields.First() every time.

// Single vector field entity — omit vectorSelector
var results = db.Documents.Search(queryVector, topK: 5);
var top1 = db.Documents.SearchTop1(queryVector);

// Async versions
var results = await db.Documents.SearchAsync(queryVector, topK: 5);
var top1 = await db.Documents.SearchTop1Async(queryVector);

⚠️ Calling default methods on multi-vector-field entities throws InvalidOperationException("Entity has N vector fields. Use the overload with a vectorSelector expression.")


8. Persistent Storage

8.1 Save and Load

// Save to configured DatabasePath (full snapshot)
await db.SaveAsync();

// Save to a specified path (overrides DatabasePath)
await db.SaveAsync(@"C:\backup\mydata.json");

// WAL incremental save — only append changes to WAL file, O(Δ) complexity
await db.SaveChangesAsync();

// Manual compaction: create full snapshot + clear WAL
await db.CompactAsync();

// Load (silently returns if file doesn't exist, no exception — suitable for first startup)
// Automatically replays incremental changes when WAL is enabled
await db.LoadAsync();

// Load from a specified path
await db.LoadAsync(@"C:\backup\mydata.json");
Persistence Internal Flow
sequenceDiagram
    participant Ctx as QuiverDbContext
    participant Set as QuiverSet
    participant SP as IStorageProvider
    participant WAL as WriteAheadLog

    Note over Ctx: SaveAsync() full snapshot
    loop Each QuiverSet
        Ctx->>Set: GetAll() invoke via reflection (internal)
        Set-->>Ctx: IEnumerable<TEntity> snapshot
    end
    Ctx->>SP: SaveAsync(tempPath, setsData)
    SP->>SP: Serialize and write to temp file
    Ctx->>Ctx: File.Move(tempPath, filePath) atomic replace
    opt WAL enabled
        Ctx->>WAL: Truncate() clear WAL
        Ctx->>Set: DrainChanges() clear change log
    end

    Note over Ctx: SaveChangesAsync() WAL incremental
    loop Each QuiverSet
        Ctx->>Set: DrainChanges()
        Set-->>Ctx: List<(operation, key, entity)>
    end
    Ctx->>Ctx: Serialize to WalEntry list
    Ctx->>WAL: Append(walEntries, flushToDisk)
    alt RecordCount >= WalCompactionThreshold
        Ctx->>Ctx: CompactAsync() auto compaction
    end

    Note over Ctx: LoadAsync() two-phase load
    Ctx->>SP: LoadAsync(path, typeMap)
    SP-->>Ctx: Dictionary<typeName, List<object>>
    loop Each registered type
        Ctx->>Set: LoadEntities(entities) invoke via reflection (internal)
        Set->>Set: AddCore() × N rebuild indices
    end
    opt WAL enabled
        Ctx->>WAL: ReadAll(walFilePath)
        WAL-->>Ctx: List<WalEntry>
        loop Each WAL record
            Ctx->>Set: ReplayAdd/ReplayRemove/ReplayClear
        end
    end
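
The "write to temp file, then File.Move" step in the SaveAsync flow can be sketched on its own. SaveAtomicallyAsync and the ".tmp" suffix are assumptions for illustration; the diagram does not say how Quiver names its temp file.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Writes the snapshot next to the target, then swaps it into place so a
// reader never observes a half-written database file.
static async Task SaveAtomicallyAsync(string filePath, byte[] snapshot)
{
    string tempPath = filePath + ".tmp";
    await File.WriteAllBytesAsync(tempPath, snapshot);
    File.Move(tempPath, filePath, overwrite: true);
}

string path = Path.Combine(Path.GetTempPath(), $"quiver-demo-{Guid.NewGuid():N}.bin");

await SaveAtomicallyAsync(path, [1, 2, 3]);
await SaveAtomicallyAsync(path, [4, 5, 6, 7]);   // replaces the first snapshot

byte[] onDisk = await File.ReadAllBytesAsync(path);
bool tempLeftBehind = File.Exists(path + ".tmp");
Console.WriteLine(onDisk.Length);                // 4
File.Delete(path);
```

If the process dies during the write, only the temp file is incomplete and the previous snapshot at the real path is still intact.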

8.2 Storage Format Comparison

| Format | Implementation | Readability | File Size | Read/Write Speed | Use Case |
|---|---|---|---|---|---|
| Json | JsonStorageProvider | ✅ Excellent | Largest | Average | Development & debugging |
| Xml | XmlStorageProvider | ✅ Good | Large | Average | Compatibility requirements |
| Binary | BinaryStorageProvider | ❌ Not readable | Smallest | Fastest | Production environments |

8.3 JSON Format Details

Uses System.Text.Json for serialization. Output structure:

{
  "MyNamespace.FaceFeature": [
    { "personId": "P001", "name": "Alice", "embedding": [0.1, 0.2, ...] },
    { "personId": "P002", "name": "Bob", "embedding": [0.3, 0.4, ...] }
  ]
}
  • Supports customizing indentation and naming policy via QuiverDbOptions.JsonOptions
  • Defaults to WriteIndented = true + CamelCase
  • Uses JsonDocument DOM parsing during loading, deserializing element by element
  • Unrecognized type names are automatically skipped (forward compatible)

8.4 XML Format Details

Uses System.Xml.Linq (XDocument). Output structure:

<?xml version="1.0" encoding="utf-8"?>
<QuiverDb version="1">
  <Set type="FaceFeature" count="2">
    <Entity>
      <PersonId>P001</PersonId>
      <Name>Alice</Name>
      <Embedding>Base64EncodedBytes...</Embedding>
    </Entity>
  </Set>
</QuiverDb>
  • Vector data uses Base64 encoding (MemoryMarshal.AsBytes → Convert.ToBase64String), which is compact and loses no precision
  • DateTime uses ISO 8601 round-trip format ("O")
  • Numeric values use CultureInfo.InvariantCulture, ensuring cross-region consistency
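
The Base64 vector encoding named in the first bullet round-trips as shown below. This sketch only demonstrates the two BCL calls; how XmlStorageProvider wraps them is not shown in this document.

```csharp
using System;
using System.Runtime.InteropServices;

float[] embedding = [0.1f, -2.5f, 3.14159f];

// Reinterpret the floats as raw bytes (no per-element conversion), then encode.
string encoded = Convert.ToBase64String(MemoryMarshal.AsBytes(embedding.AsSpan()));

// Decode: Base64 -> bytes -> reinterpret as floats -> copy into a new array.
byte[] raw = Convert.FromBase64String(encoded);
float[] decoded = MemoryMarshal.Cast<byte, float>(raw.AsSpan()).ToArray();

Console.WriteLine(encoded.Length);   // 16 characters for 12 bytes
Console.WriteLine(embedding.AsSpan().SequenceEqual(decoded.AsSpan()));   // True
```

Because the bytes are copied bit for bit, the decoded values are identical to the originals, unlike a decimal text representation. The raw byte order follows the machine's endianness.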

8.5 Binary Format Details

Custom compact binary protocol with optimal performance:

┌─ File Header ─────────────────────────────────────────────
│  Magic: "QDB\x01" (4B)              ← File identifier + version
│  SetCount (int32)                    ← Number of vector collections
├─ Set × SetCount ──────────────────────────────────────────
│  TypeName (string)                   ← BinaryWriter length-prefixed
│  PropCount (int32)                   ← Number of property descriptors
│  ┌─ PropDescriptor × PropCount ───────────────────────────
│  │  PropName (string)
│  │  TypeCode (byte)                  ← 0=string 1=int32 ... 9=float[]
│  ├─ Entity × EntityCount ─────────────────────────────────
│  │  [null flag 1B] + [field value]   ← Written field by field in descriptor order
│  │  float[] → [len int32][raw bytes] ← MemoryMarshal.AsBytes zero-copy
└───────────────────────────────────────────────────────────

Supported Property Type Codes:

| TypeCode | CLR Type | Storage Method |
|---|---|---|
| 0 | string | BinaryWriter.Write (length-prefixed) |
| 1 | int | 4 bytes |
| 2 | long | 8 bytes |
| 3 | float | 4 bytes |
| 4 | double | 8 bytes |
| 5 | bool | 1 byte |
| 6 | DateTime | ToBinary() → 8 bytes |
| 7 | Guid | 16 bytes |
| 8 | decimal | 16 bytes |
| 9 | float[] | [length int32] + [raw bytes zero-copy] |
| 10 | string[] | [length int32] + [element-by-element strings] |
| 11 | byte | 1 byte |
| 12 | short | 2 bytes |
| 13 | Half | 2 bytes (half-precision float, common in ML/AI scenarios) |
| 14 | DateTimeOffset | [Ticks int64] + [OffsetMinutes int16] = 10 bytes |
| 15 | TimeSpan | Ticks → 8 bytes |
| 16 | byte[] | [length int32] + [raw bytes] |
| 17 | double[] | [length int32] + [raw bytes zero-copy] |

8.6 WAL Incremental Persistence

WAL (Write-Ahead Log) is an incremental persistence mechanism that records write operations by appending them to a log file, avoiding full serialization on every save and significantly reducing persistence overhead.

Two Persistence Modes Compared
| Dimension | Full Snapshot Mode (SaveAsync) | WAL Incremental Mode (SaveChangesAsync) |
|---|---|---|
| Persisted Content | Complete snapshot of all entities | Only changes since last save (Δ) |
| Time Complexity | O(N) (N = total entity count) | O(Δ) (Δ = number of changes) |
| File Writing | Full overwrite | Append-only |
| Applicable Scenario | Small data, low save frequency | Large data, high-frequency writes |
| Enable Method | Default | EnableWal = true |

Enabling WAL
var options = new QuiverDbOptions
{
    DatabasePath = "mydata.vdb",
    StorageFormat = StorageFormat.Binary,
    EnableWal = true,                // Enable WAL
    WalCompactionThreshold = 10_000, // Auto-compaction threshold
    WalFlushToDisk = true            // fsync durability guarantee
};
Core API
// Incremental save: only append unpersisted changes to the WAL file
await db.SaveChangesAsync();

// Full snapshot + clear WAL (equivalent to CompactAsync)
await db.SaveAsync();

// Manual compaction: create full snapshot + clear WAL
await db.CompactAsync();

// Load: read full snapshot + replay WAL incremental changes in order
await db.LoadAsync();
WAL Workflow
flowchart TD
    subgraph Write Phase
        W1["User calls Add/Upsert/Remove/Clear"] --> W2["QuiverSet updates in-memory data + indices"]
        W2 --> W3["_changeLog records change<br/>(Op, Key, Entity)"]
    end

    subgraph SaveChangesAsync
        S1["DrainChanges() get and clear change log"] --> S2["Serialize to WalEntry list<br/>(JSON payload)"]
        S2 --> S3["WriteAheadLog.Append()<br/>Batch append + CRC32 checksum"]
        S3 --> S4{"RecordCount >= threshold?"}
        S4 -- "Yes" --> S5["CompactAsync()<br/>Create full snapshot + clear WAL"]
        S4 -- "No" --> S6["Done"]
    end

    subgraph LoadAsync
        L1["Phase 1: Load full snapshot<br/>IStorageProvider.LoadAsync()"] --> L2["Phase 2: Read WAL file<br/>WriteAheadLog.ReadAll()"]
        L2 --> L3["Replay each record in order<br/>ReplayAdd / ReplayRemove / ReplayClear"]
        L3 --> L4["Memory state = snapshot + Δ"]
    end

    W3 --> S1
    S6 --> L1
Change Tracking Mechanism

QuiverSet<T> internally maintains a _changeLog list, recording each write operation within the write lock:

| Operation | Op Code | Key | Entity |
|---|---|---|---|
| Add | 1 | Key value | Entity instance |
| Remove | 2 | Key value | null |
| Clear | 3 | null | null |

Special Behaviors:

  • Upsert is recorded as two changes: Remove + Add
  • LoadEntities (snapshot loading) and ReplayAdd/Remove/Clear (WAL replay) do not record changes, avoiding circular writes
  • DrainChanges() uses snapshot + clear semantics: retrieves the change list and immediately clears it, ensuring each change is persisted only once
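
The snapshot + clear semantics of DrainChanges() can be sketched with a list swap. This is a simplified stand-in: it guards the list with its own lock object instead of the QuiverSet write lock, tracks only the op code and key, and the Record helper is hypothetical.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var gate = new object();
var changes = new List<(byte Op, string? Key)>();

void Record(byte op, string? key)
{
    lock (gate) changes.Add((op, key));
}

// Hands the accumulated list to the caller and starts a fresh one,
// so every change is returned exactly once.
List<(byte Op, string? Key)> DrainChanges()
{
    lock (gate)
    {
        var drained = changes;
        changes = new List<(byte Op, string? Key)>();
        return drained;
    }
}

Record(1, "doc-001");   // Add
Record(2, "doc-001");   // Remove
Record(3, null);        // Clear

var first = DrainChanges();
var second = DrainChanges();
Console.WriteLine($"{first.Count} then {second.Count}");   // 3 then 0
```

Swapping the list reference instead of copying keeps the time spent under the lock constant, regardless of how many changes have accumulated.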
WAL File Format

Custom compact binary format with CRC32 checksum per record:

┌─ File Header (5 bytes) ──────────────────────────────────
│  [4B] Magic = "WLOG"                 ← File identifier
│  [1B] Version = 0x01                 ← Protocol version
├─ Record × N ─────────────────────────────────────────────
│  [4B uint32] DataLength              ← Data area length (excludes this field and CRC)
│  ┌─ Data Area (DataLength bytes) ────────────────────────
│  │  [8B int64]  SeqNo                ← Monotonically increasing sequence number
│  │  [1B]        OpCode               ← 1=Add, 2=Remove, 3=Clear
│  │  [string]    TypeName             ← BinaryWriter length-prefixed UTF-8
│  │  [string]    PayloadJson          ← BinaryWriter length-prefixed UTF-8
│  ├───────────────────────────────────────────────────────
│  [4B uint32] CRC32                   ← Covers SeqNo through PayloadJson
└──────────────────────────────────────────────────────────

Crash Recovery Safety:

  • Validates CRC32 record by record during reading; stops reading upon checksum failure or truncated records
  • Incomplete trailing records are safely discarded (only losing the most recent batch of unflushed changes)
  • Automatically scans and truncates corrupted trailing data when opening an existing WAL file
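
The record layout and the recovery rule (stop at the first truncated or corrupted record) can be exercised in memory. This sketch omits the 5-byte file header, uses a simple bitwise CRC-32 instead of whatever checksum routine Quiver uses internally, and all helper names are hypothetical.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Bitwise CRC-32 (reflected polynomial 0xEDB88320); slow but dependency-free.
static uint Crc32(ReadOnlySpan<byte> data)
{
    uint crc = 0xFFFFFFFF;
    foreach (byte b in data)
    {
        crc ^= b;
        for (int i = 0; i < 8; i++)
            crc = (crc & 1) != 0 ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return ~crc;
}

// Layout: [DataLength][SeqNo, OpCode, TypeName, PayloadJson][CRC32 over the data area].
static void AppendRecord(Stream stream, long seqNo, byte opCode, string typeName, string payloadJson)
{
    using var buffer = new MemoryStream();
    using (var w = new BinaryWriter(buffer, Encoding.UTF8, leaveOpen: true))
    {
        w.Write(seqNo);
        w.Write(opCode);
        w.Write(typeName);      // length-prefixed UTF-8
        w.Write(payloadJson);   // length-prefixed UTF-8
    }
    byte[] bytes = buffer.ToArray();
    using var outer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true);
    outer.Write((uint)bytes.Length);
    outer.Write(bytes);
    outer.Write(Crc32(bytes));
}

// Returns the sequence numbers of all records that are complete and pass the CRC check.
static List<long> ReadValidSeqNos(byte[] walBytes)
{
    var seqNos = new List<long>();
    int pos = 0;
    while (pos + 4 <= walBytes.Length)
    {
        uint len = BinaryPrimitives.ReadUInt32LittleEndian(walBytes.AsSpan(pos, 4));
        long end = (long)pos + 4 + len + 4;
        if (len < 8 || end > walBytes.Length) break;               // truncated or malformed tail
        ReadOnlySpan<byte> data = walBytes.AsSpan(pos + 4, (int)len);
        uint storedCrc = BinaryPrimitives.ReadUInt32LittleEndian(walBytes.AsSpan((int)end - 4, 4));
        if (Crc32(data) != storedCrc) break;                       // corrupted record
        seqNos.Add(BinaryPrimitives.ReadInt64LittleEndian(data));  // SeqNo is the first field
        pos = (int)end;
    }
    return seqNos;
}

var wal = new MemoryStream();
AppendRecord(wal, 1, 1, "Document", "{\"id\":\"doc-001\"}");
AppendRecord(wal, 2, 2, "Document", "{\"id\":\"doc-001\"}");

byte[] intact = wal.ToArray();
byte[] torn = intact[..^3];   // simulate a crash in the middle of the last record

Console.WriteLine(ReadValidSeqNos(intact).Count);   // 2
Console.WriteLine(ReadValidSeqNos(torn).Count);     // 1
```

The torn file still yields the first record, which matches the guarantee above: only the incomplete trailing record is lost.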
WAL Thread Safety
| Operation | Thread Safety Mechanism |
|---|---|
| Append | Lock object serializes all write operations |
| ReadAll | Static method, uses an independent read-only file stream |
| Truncate | Executed within _writeLock |
| RecordCount | Volatile.Read ensures cross-thread visibility |

Auto-Compaction Strategy
flowchart LR
    SC["SaveChangesAsync()"] --> CHK{"WAL record count >= threshold?"}
    CHK -- "No" --> DONE["Done"]
    CHK -- "Yes" --> COMPACT["CompactAsync()"]
    COMPACT --> SNAP["Create full snapshot<br/>(atomic write: temp file -> replace)"]
    SNAP --> TRUNC["WAL.Truncate()<br/>Keep only file header"]
    TRUNC --> DRAIN["DrainChanges()<br/>Clear in-memory change log"]
    DRAIN --> DONE

Recommended threshold range: 1,000 ~ 100,000, depending on individual record size (vector dimensions) and loading speed requirements. Default is 10,000.


9. Multi-Vector Field Support

An entity can declare multiple [QuiverVector] properties. Each field maintains its own independent index and can use a different dimension, metric, and indexing strategy.

9.1 Defining Multi-Vector Entities

public class MultiModalItem
{
    [QuiverKey]
    public string Id { get; set; } = string.Empty;

    public string Title { get; set; } = string.Empty;
    public string Category { get; set; } = string.Empty;
    public bool IsPublished { get; set; }

    [QuiverVector(384, DistanceMetric.Cosine)]
    [QuiverIndex(VectorIndexType.HNSW, M = 32, EfConstruction = 200, EfSearch = 100)]
    public float[] TextEmbedding { get; set; } = [];

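    // No [QuiverIndex]: falls back to the default Flat index
    [QuiverVector(512, DistanceMetric.Euclidean)]
    public float[] ImageEmbedding { get; set; } = [];

    [QuiverVector(256, DistanceMetric.DotProduct)]
    [QuiverIndex(VectorIndexType.IVF)]
    public float[] AudioEmbedding { get; set; } = [];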
}

9.2 Internal Structure

graph TD
    subgraph "QuiverSet&lt;MultiModalItem&gt;"
        E["_entities"]
        K["_keyToId"]

        subgraph "_indices (independent index per field)"
            TI["TextEmbedding<br/>-> HnswIndex<br/>384d, Cosine"]
            II["ImageEmbedding<br/>-> FlatIndex<br/>512d, Euclidean"]
            AI["AudioEmbedding<br/>-> IvfIndex<br/>256d, DotProduct"]
        end
    end

    ADD["Add(entity)"] --> E
    ADD --> K
    ADD --> TI
    ADD --> II
    ADD --> AI

9.3 Searching by Vector Field

// Search by text vector
var textResults = db.Items.Search(e => e.TextEmbedding, textQuery, topK: 5);

// Search by image vector
var imageResults = db.Items.Search(e => e.ImageEmbedding, imageQuery, topK: 5);

// Search by audio vector
var audioResults = db.Items.Search(e => e.AudioEmbedding, audioQuery, topK: 5);

// Search results from the three fields are mutually independent (different vector spaces)

9.4 Viewing Vector Field Information

foreach (var (name, dimensions) in db.Items.VectorFields)
    Console.WriteLine($"Field: {name}, Dimensions: {dimensions}");
// Output:
// Field: TextEmbedding, Dimensions: 384
// Field: ImageEmbedding, Dimensions: 512
// Field: AudioEmbedding, Dimensions: 256

10. Thread Safety and Concurrency

10.1 Lock Model

QuiverSet<TEntity> internally uses ReaderWriterLockSlim to implement reader-writer separation:

flowchart LR
    subgraph "Read Operations (Shared Lock)"
        S["Search"]
        F["Find"]
        C["Count"]
        GA["GetAll"]
    end

    subgraph "Write Operations (Exclusive Lock)"
        A["Add"]
        AR["AddRange"]
        U["Upsert"]
        R["Remove"]
        CL["Clear"]
        LE["LoadEntities"]
    end

    S & F & C & GA -->|"Parallel execution ✅"| RLock["EnterReadLock"]
    A & AR & U & R & CL & LE -->|"Mutually exclusive 🔒"| WLock["EnterWriteLock"]
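
A miniature of this lock model, assuming nothing about QuiverSet beyond its use of ReaderWriterLockSlim. The Count and Add local functions are hypothetical stand-ins for a read operation and a write operation.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

using var rwLock = new ReaderWriterLockSlim();
var items = new List<string>();

int Count()
{
    rwLock.EnterReadLock();             // many readers may hold this at once
    try { return items.Count; }
    finally { rwLock.ExitReadLock(); }
}

void Add(string item)
{
    rwLock.EnterWriteLock();            // excludes all readers and other writers
    try { items.Add(item); }
    finally { rwLock.ExitWriteLock(); }
}

var writer = Task.Run(() => { for (int i = 0; i < 1_000; i++) Add($"doc-{i}"); });
var reader = Task.Run(() => { for (int i = 0; i < 1_000; i++) _ = Count(); });
await Task.WhenAll(writer, reader);

int finalCount = Count();
Console.WriteLine(finalCount);   // 1000
```

The try/finally pairs matter: if an operation throws while holding the lock, the lock is still released and later callers are not blocked forever.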

10.2 Concurrency Safety Examples

var db = new MyDocumentDb();

// ✅ Safe: multi-threaded concurrent search (shared read lock)
var tasks = Enumerable.Range(0, 24).Select(_ => Task.Run(() =>
{
    var query = GenerateRandomVector(384);
    return db.Documents.Search(e => e.Embedding, query, topK: 5);
}));
await Task.WhenAll(tasks);

// ✅ Safe: concurrent read-write (read operations wait while write holds exclusive lock)
var writerTask = Task.Run(() =>
{
    db.Documents.Upsert(new Document
    {
        Id = "new-doc",
        Title = "New Document",
        Embedding = new float[384]
    });
});

var readerTask = Task.Run(() =>
    db.Documents.Search(e => e.Embedding, queryVector, topK: 5));

await Task.WhenAll(writerTask, readerTask);

10.3 Dispose Thread Safety

QuiverSet uses Interlocked.Exchange(ref _disposed, 1) to guarantee concurrent Dispose safety. All operation entry points call ThrowIfDisposed(), using Volatile.Read to ensure cross-thread visibility.
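
The pattern described above can be sketched in isolation. GuardedResource is a hypothetical class; it shows the Interlocked.Exchange guard and the Volatile.Read check, not QuiverSet's actual code.

```csharp
using System;
using System.Threading;

var resource = new GuardedResource();
resource.Dispose();
resource.Dispose();   // the second call is a harmless no-op

bool rejected = false;
try { resource.DoWork(); }
catch (ObjectDisposedException) { rejected = true; }

Console.WriteLine($"released {resource.ReleaseCount} time(s), rejected = {rejected}");

sealed class GuardedResource : IDisposable
{
    private int _disposed;   // 0 = alive, 1 = disposed
    public int ReleaseCount { get; private set; }

    public void DoWork() => ThrowIfDisposed();

    public void Dispose()
    {
        // Only the first caller sees the previous value 0 and performs the release.
        if (Interlocked.Exchange(ref _disposed, 1) != 0) return;
        ReleaseCount++;
    }

    private void ThrowIfDisposed()
    {
        if (Volatile.Read(ref _disposed) != 0)
            throw new ObjectDisposedException(nameof(GuardedResource));
    }
}
```

Even if several threads call Dispose at the same moment, the exchange lets exactly one of them through, so the underlying lock is never released twice.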

10.4 Concurrency Performance Reference

| Test Scenario | Data Size | Configuration | Result |
|---|---|---|---|
| Pure read concurrency | 3,000 entries × 3 vectors | 24 threads × 100 searches | 2,400 searches with zero exceptions |
| Mixed read-write | 1,000 entries × 3 vectors | 4 writers + 8 readers + 2 deleters, 3 seconds | Zero exceptions |
| Batch write + search | Dynamically growing | 3 writer threads (50 per batch) + 6 search threads, 3 seconds | Zero exceptions |

11. Lifecycle Management

11.1 QuiverDbContext Lifecycle

stateDiagram-v2
    [*] --> Created: new MyDb(options)
    Created --> Active: InitializeSets() + Initialize WAL
    Active --> Active: Add / Search / ...
    Active --> Saving: SaveAsync() full snapshot
    Active --> WalAppend: SaveChangesAsync() incremental append
    Saving --> Active: Save complete
    WalAppend --> Active: Append complete
    WalAppend --> Saving: Auto-compaction (threshold exceeded)
    Active --> Disposing_Async: DisposeAsync()
    Disposing_Async --> WalAppend_Final: WAL mode -> SaveChangesAsync()
    Disposing_Async --> Saving_Final: Full mode -> SaveAsync()
    WalAppend_Final --> Disposed: Release WAL + all QuiverSets
    Saving_Final --> Disposed: Release all QuiverSets
    Active --> Disposed: Dispose() (no save)
    Disposed --> [*]

| Disposal Method | Auto-Save | WAL Mode Behavior | Recommended Scenario |
| --- | --- | --- | --- |
| Dispose() | ❌ No save | No save, only releases resources | When manual save timing control is needed |
| DisposeAsync() | ✅ Save then release | Calls SaveChangesAsync() for incremental save | Recommended, use with await using |

// ✅ Recommended: await using with auto-save (full snapshot mode)
await using var db = new MyDocumentDb();
await db.LoadAsync();
db.Documents.Add(new Document { ... });
// Scope ends -> DisposeAsync -> SaveAsync -> Dispose all resources

// ✅ Recommended: await using with auto-save (WAL mode)
await using var walDb = new MyWalDb();
await walDb.LoadAsync(); // Load snapshot + replay WAL
walDb.Documents.Add(new Document { ... });
// Can explicitly call incremental save after each batch write
await walDb.SaveChangesAsync();
// Scope ends -> DisposeAsync -> SaveChangesAsync -> Dispose all resources

// Manual control approach
var db2 = new MyDocumentDb();
try
{
    db2.Documents.Add(new Document { ... });
    await db2.SaveAsync(); // Manual save
}
finally
{
    db2.Dispose(); // Only releases resources, does not save
}

11.3 QuiverSet Disposal

QuiverSet implements IDisposable, releasing the internal ReaderWriterLockSlim. All operations throw ObjectDisposedException after disposal.


12. Configuration Options

QuiverDbOptions exposes the following configuration options:

var options = new QuiverDbOptions
{
    // Database file path. null for in-memory mode (no persistence)
    // Directory is auto-created by storage provider if it doesn't exist
    DatabasePath = @"C:\Data\MyQuiverDb.json",

    // Default distance metric (entity-level [QuiverVector] attribute can override)
    DefaultMetric = DistanceMetric.Cosine,

    // Persistence storage format
    StorageFormat = StorageFormat.Json,

    // JSON serialization options (only used when StorageFormat.Json)
    JsonOptions = new JsonSerializerOptions
    {
        WriteIndented = true,                           // Indented output
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase  // Camel case naming
    },

    // ── WAL Incremental Persistence Configuration ──

    // Whether to enable WAL incremental persistence
    EnableWal = true,

    // Auto-trigger compaction (full snapshot + clear WAL) when WAL record count reaches this threshold
    WalCompactionThreshold = 10_000,

    // Whether to fsync to physical disk immediately after WAL write
    // true = strongest durability (no data loss on power failure), false = relies on OS buffer (better performance)
    WalFlushToDisk = true
};

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| DatabasePath | string? | null | Storage path, null for in-memory mode (SaveAsync requires explicit path) |
| DefaultMetric | DistanceMetric | Cosine | Default distance metric |
| StorageFormat | StorageFormat | Json | Persistence format: Json / Xml / Binary |
| JsonOptions | JsonSerializerOptions | Indented + CamelCase | JSON serialization options |
| EnableWal | bool | false | Whether to enable WAL incremental persistence |
| WalCompactionThreshold | int | 10,000 | Auto-compact when WAL record count reaches this value |
| WalFlushToDisk | bool | true | Whether to fsync to disk after WAL write |

13. Internal Implementation Details

13.1 Expression Tree Compiled Property Accessors

The framework uses expression trees to compile high-performance accessors for each primary key and vector property, replacing runtime reflection calls:

// Before compilation (reflection): ~200ns / call
var value = propertyInfo.GetValue(entity);

// After compilation (expression tree): ~2ns / call, ~100x improvement
private static Func<TEntity, TResult> CompileGetter<TResult>(PropertyInfo prop)
{
    var param = Expression.Parameter(typeof(TEntity), "e");
    Expression body = Expression.Property(param, prop);
    // If TResult differs from the property type (e.g., int -> object), insert a Convert node; this boxes value types
    if (prop.PropertyType != typeof(TResult))
        body = Expression.Convert(body, typeof(TResult));
    return Expression.Lambda<Func<TEntity, TResult>>(body, param).Compile();
}

13.2 SimilarityFunc Delegate Design

The similarity delegate takes ReadOnlySpan<float> parameters, so it can bind directly to TensorPrimitives method groups without an extra lambda wrapper:

// Delegate signature
internal delegate float SimilarityFunc(ReadOnlySpan<float> a, ReadOnlySpan<float> b);

// Direct binding to TensorPrimitives method groups (zero overhead)
SimilarityFunc dotSim = TensorPrimitives.Dot;
SimilarityFunc cosineSim = TensorPrimitives.CosineSimilarity;

// Euclidean distance must be transformed into a similarity
SimilarityFunc euclideanSim = (a, b) => 1f / (1f + TensorPrimitives.Distance(a, b));
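
As a worked form of the Euclidean transform in the last line above, with the Euclidean distance written as a norm:

```latex
s(a, b) = \frac{1}{1 + \lVert a - b \rVert_2}, \qquad
s(a, b) = 1 \ \text{when } a = b, \qquad
s(a, b) \to 0 \ \text{as } \lVert a - b \rVert_2 \to \infty
```

Because this mapping is strictly decreasing in the distance, ranking by descending similarity gives the same order as ranking by ascending distance, and the score stays in the range (0, 1].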

13.3 HNSW Level Random Generation

Levels follow an exponential decay distribution, ensuring upper layers are sparse and lower layers are dense:

level = floor(-ln(uniform(0, 1)) × ml)
where ml = 1 / ln(M)

Most nodes (~93.75% when M=16) exist only on layer 0, while a few nodes exist on higher layers serving as "highway" entry points.
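
The 93.75% figure follows directly from the sampling formula. Writing U for the uniform(0, 1) draw and ℓ for a non-negative integer level:

```latex
\begin{aligned}
P(\text{level} \ge \ell)
  &= P\!\left(\frac{-\ln U}{\ln M} \ge \ell\right)
   = P\!\left(U \le M^{-\ell}\right)
   = M^{-\ell}, \\
P(\text{level} = 0)
  &= 1 - P(\text{level} \ge 1)
   = 1 - \frac{1}{M}
   = 1 - \frac{1}{16}
   = 93.75\% \quad (M = 16).
\end{aligned}
```

Each additional layer therefore holds roughly 1/M of the nodes of the layer beneath it.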

13.4 IVF K-Means++ Initialization

K-Means++ seeding typically converges faster and yields higher-quality clusters than random initialization:

  1. Randomly select the first centroid
  2. For each vector not yet selected as a centroid, compute its distance D(x) to the nearest centroid
  3. Select the next centroid with probability proportional to D(x)²
  4. Repeat until K centroids are selected
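
The four steps above can be sketched as follows. This is an illustration rather than Quiver's internal code: the function names SeedCentroids and SquaredDistance, the use of squared Euclidean distance, and the Random parameter are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: squared Euclidean distance, i.e. D(x)^2 against one centroid.
float SquaredDistance(float[] a, float[] b)
{
    float sum = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Hypothetical seeding routine following the K-Means++ steps.
List<float[]> SeedCentroids(IReadOnlyList<float[]> vectors, int k, Random rng)
{
    if (k < 1 || k > vectors.Count) throw new ArgumentOutOfRangeException(nameof(k));

    // Step 1: pick the first centroid uniformly at random.
    var centroids = new List<float[]>(k) { vectors[rng.Next(vectors.Count)] };

    // minDist[i] tracks D(x)^2: squared distance from vector i to its nearest centroid so far.
    var minDist = new float[vectors.Count];
    Array.Fill(minDist, float.MaxValue);

    while (centroids.Count < k) // Step 4: repeat until k centroids exist.
    {
        // Step 2: refresh D(x)^2 against the most recently added centroid.
        float[] last = centroids[^1];
        double total = 0;
        for (int i = 0; i < vectors.Count; i++)
        {
            minDist[i] = Math.Min(minDist[i], SquaredDistance(vectors[i], last));
            total += minDist[i];
        }

        // Step 3: sample an index with probability minDist[i] / total.
        // Vectors that coincide with a centroid have weight 0 and are never hit by the scan.
        double target = rng.NextDouble() * total;
        double acc = 0;
        int chosen = vectors.Count - 1; // fallback when every remaining weight is 0
        for (int i = 0; i < vectors.Count; i++)
        {
            acc += minDist[i];
            if (acc > target) { chosen = i; break; }
        }
        centroids.Add(vectors[chosen]);
    }
    return centroids;
}

var sample = new List<float[]> { new[] { 0f, 0f }, new[] { 1f, 1f }, new[] { 9f, 9f } };
var seeds = SeedCentroids(sample, 2, new Random(1));
Console.WriteLine($"Selected {seeds.Count} seed centroids");
```

Keeping a running minimum per vector means each round costs one distance computation per vector instead of one per vector-centroid pair.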

13.5 KDTree Pruning Optimization

Uses split hyperplane distance for pruning during search:

  • diff = query[splitDim] - node.splitValue
  • Prioritize searching the side containing the query point
  • For the other side: explore only when the heap is not full or |diff| < current search radius
  • Pruning can skip large numbers of subtrees in low dimensions but becomes ineffective in high dimensions
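
The pruning rule above can be sketched with an implicit KD-tree stored in a plain array. This is an assumption-laden illustration, not Quiver's data structure: the median-in-place layout, the function names Build, Search and Distance, and the use of PriorityQueue as a bounded max-heap are all choices made for the example.

```csharp
using System;
using System.Collections.Generic;

float Distance(float[] a, float[] b)
{
    float sum = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return MathF.Sqrt(sum);
}

// points[lo..hi) is one subtree; its root is the median after sorting on (depth % dims).
void Build(float[][] points, int lo, int hi, int depth)
{
    if (hi - lo <= 1) return;
    int dim = depth % points[0].Length;
    Array.Sort(points, lo, hi - lo,
        Comparer<float[]>.Create((a, b) => a[dim].CompareTo(b[dim])));
    int mid = (lo + hi) / 2;
    Build(points, lo, mid, depth + 1);
    Build(points, mid + 1, hi, depth + 1);
}

// heap holds at most k point indices; priority is -distance, so the worst candidate is on top.
void Search(float[][] points, int lo, int hi, int depth,
            float[] query, int k, PriorityQueue<int, float> heap)
{
    if (lo >= hi) return;
    int mid = (lo + hi) / 2;
    int dim = depth % query.Length;

    float dist = Distance(points[mid], query);
    if (heap.Count < k)
    {
        heap.Enqueue(mid, -dist);
    }
    else if (heap.TryPeek(out _, out float negWorst) && dist < -negWorst)
    {
        heap.Dequeue();
        heap.Enqueue(mid, -dist);
    }

    // diff = query[splitDim] - node.splitValue; its sign selects the near side.
    float diff = query[dim] - points[mid][dim];
    bool leftIsNear = diff < 0;
    if (leftIsNear) Search(points, lo, mid, depth + 1, query, k, heap);
    else Search(points, mid + 1, hi, depth + 1, query, k, heap);

    // Far side: explore only if the heap is not full or |diff| < current search radius.
    float radius = float.PositiveInfinity;
    if (heap.Count >= k && heap.TryPeek(out _, out float negRadius)) radius = -negRadius;
    if (MathF.Abs(diff) < radius)
    {
        if (leftIsNear) Search(points, mid + 1, hi, depth + 1, query, k, heap);
        else Search(points, lo, mid, depth + 1, query, k, heap);
    }
}

var rng = new Random(7);
var pts = new float[200][];
for (int i = 0; i < pts.Length; i++)
    pts[i] = new[] { (float)rng.NextDouble(), (float)rng.NextDouble() };
Build(pts, 0, pts.Length, 0);

var q = new[] { 0.5f, 0.5f };
var heap = new PriorityQueue<int, float>();
Search(pts, 0, pts.Length, 0, q, 3, heap);
Console.WriteLine($"Kept {heap.Count} nearest candidates");
```

Every point on the far side is at least |diff| away from the query along the split dimension, which is why the far subtree can be skipped once |diff| reaches the current radius.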

13.6 StorageProviderFactory

Simple factory pattern, invoked during QuiverDbContext construction:

internal static IStorageProvider Create(QuiverDbOptions options) => options.StorageFormat switch
{
    StorageFormat.Json   => new JsonStorageProvider(options.JsonOptions),
    StorageFormat.Xml    => new XmlStorageProvider(),
    StorageFormat.Binary => new BinaryStorageProvider(),
    _ => throw new ArgumentOutOfRangeException(nameof(options.StorageFormat))
};

13.7 Change Tracking and WAL Replay

The _changeLog inside QuiverSet<T> records each write operation within the write lock, enabling incremental persistence:

// Change log buffer
private readonly List<(byte Op, object? Key, object? Entity)> _changeLog = [];

// Record change in AddCore
if (logChanges)
    _changeLog.Add((1, key, entity)); // Op=1: Add

// Record change in RemoveCore
if (logChanges)
    _changeLog.Add((2, key, null)); // Op=2: Remove

logChanges Parameter Control:

| Call Scenario | logChanges | Reason |
| --- | --- | --- |
| User calls Add/Remove/Upsert/Clear | true | Needs to be recorded to WAL |
| LoadEntities (snapshot loading) | false | Snapshot data doesn't need re-recording |
| ReplayAdd/Remove/Clear (WAL replay) | false | Replay data comes from WAL, avoiding circular writes |

DrainChanges() Snapshot + Clear Semantics:

internal List<(byte Op, object? Key, object? Entity)> DrainChanges()
{
    _lock.EnterWriteLock();
    try
    {
        if (_changeLog.Count == 0) return [];
        var snapshot = new List<(byte, object?, object?)>(_changeLog);
        _changeLog.Clear();
        return snapshot;
    }
    finally { _lock.ExitWriteLock(); }
}

WAL Replay Method Special Handling:

  • ReplayAdd: Silently skips when primary key already exists (snapshot may already contain this entity)
  • ReplayRemove: Returns false when primary key doesn't exist (entity may be re-added in subsequent WAL records)
  • ReplayClear: Directly clears all data and indices
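
The tolerant replay behavior above can be sketched against a plain dictionary. This is a simplified illustration, not Quiver's implementation: the string-keyed store is a stand-in for a QuiverSet, op codes 1 and 2 follow the change-log snippet earlier in this section, and op code 3 for Clear is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the entity store of one set (hypothetical).
var store = new Dictionary<string, string>();

// Existing key: skip silently, because the snapshot may already contain the entity.
void ReplayAdd(string key, string value) => store.TryAdd(key, value);

// Missing key: return false instead of throwing.
bool ReplayRemove(string key) => store.Remove(key);

void ReplayClear() => store.Clear();

void Replay(IEnumerable<(byte Op, string? Key, string? Value)> records)
{
    foreach (var (op, key, value) in records)
    {
        switch (op)
        {
            case 1: ReplayAdd(key!, value!); break;
            case 2: ReplayRemove(key!); break;
            case 3: ReplayClear(); break; // assumed op code for Clear
            default: throw new InvalidOperationException($"Unknown op code {op}");
        }
    }
}

// The snapshot already holds "a"; the WAL re-adds it, removes an absent key, then adds that key.
store["a"] = "from-snapshot";
Replay(new (byte, string?, string?)[]
{
    (1, "a", "from-wal"),
    (2, "b", null),
    (1, "b", "late-add"),
});

Console.WriteLine($"{store.Count} entries after replay");
```

Because neither a duplicate add nor a missing remove throws, replaying the same WAL over a snapshot that already reflects part of it leaves the store in the same final state.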

13.8 Atomic Write (SaveAsync)

SaveAsync writes to a temporary file first and then atomically replaces the target file, so a crash mid-write cannot corrupt the existing snapshot:

var tempPath = filePath + ".tmp";
await _storageProvider.SaveAsync(tempPath, setsData);
File.Move(tempPath, filePath, overwrite: true); // Atomic replace

13.9 WAL CRC32 Checksum

Each WAL record's data area (SeqNo through PayloadJson) is checksummed using System.IO.Hashing.Crc32, appended at the end of the record:

var data = ms.ToArray(); // SeqNo + OpCode + TypeName + PayloadJson
var crc = Crc32.HashToUInt32(data);
_writer.Write(data.Length);
_writer.Write(data);
_writer.Write(crc);

During reading, the CRC is recomputed over the data area and compared with the stored value. A mismatch is treated as corruption, and reading stops at that record.
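
The read-side check can be sketched as follows, using the record layout from the write snippet above (length, data, CRC). To keep the example free of package references, Crc32Of is a bitwise stand-in for System.IO.Hashing.Crc32 that implements the common reflected CRC-32 (polynomial 0xEDB88320); WriteRecord and ReadValidRecords are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Bitwise CRC-32 stand-in (reflected, polynomial 0xEDB88320).
uint Crc32Of(byte[] data)
{
    uint crc = 0xFFFFFFFFu;
    foreach (byte b in data)
    {
        crc ^= b;
        for (int i = 0; i < 8; i++)
            crc = (crc & 1) != 0 ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return ~crc;
}

// Same layout as the write path: [int length][data][uint crc].
void WriteRecord(BinaryWriter writer, byte[] data)
{
    writer.Write(data.Length);
    writer.Write(data);
    writer.Write(Crc32Of(data));
}

// Returns the records read before the first truncated or corrupted one.
List<byte[]> ReadValidRecords(Stream stream)
{
    var records = new List<byte[]>();
    var reader = new BinaryReader(stream);
    while (stream.Length - stream.Position >= sizeof(int))
    {
        int length = reader.ReadInt32();
        long remaining = stream.Length - stream.Position;
        if (length < 0 || length + sizeof(uint) > remaining) break; // truncated tail

        byte[] data = reader.ReadBytes(length);
        uint stored = reader.ReadUInt32();
        if (stored != Crc32Of(data)) break; // corruption: stop reading here
        records.Add(data);
    }
    return records;
}

var ms = new MemoryStream();
var writer = new BinaryWriter(ms);
WriteRecord(writer, new byte[] { 1, 2, 3 });
WriteRecord(writer, new byte[] { 4, 5 });
WriteRecord(writer, new byte[] { 6 });
writer.Flush();

// The first record occupies 4 + 3 + 4 = 11 bytes, so offset 15 is the second record's first payload byte.
byte[] log = ms.ToArray();
log[15] ^= 0xFF;
var valid = ReadValidRecords(new MemoryStream(log));
Console.WriteLine($"{valid.Count} record(s) accepted before corruption");
```

Stopping at the first bad record also drops the intact third record, which matches the "reading stops" rule: record boundaries after a corrupted length or payload cannot be trusted.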


14. Complete Examples

14.1 Face Recognition System

using Vorcyc.Quiver;

// ═══ Define Entity ═══
public class FaceFeature
{
    [QuiverKey]
    public string PersonId { get; set; } = string.Empty;
    public string Name { get; set; } = string.Empty;
    public DateTime RegisterTime { get; set; }

    [QuiverVector(128, DistanceMetric.Cosine)]
    public float[] Embedding { get; set; } = [];
}

// ═══ Define Database Context ═══
public class FaceDb : QuiverDbContext
{
    public QuiverSet<FaceFeature> Faces { get; set; } = null!;

    public FaceDb(string path) : base(new QuiverDbOptions
    {
        DatabasePath = path,
        StorageFormat = StorageFormat.Binary,
        DefaultMetric = DistanceMetric.Cosine
    })
    { }
}

// ═══ Usage ═══
await using var db = new FaceDb("faces.vdb");
await db.LoadAsync();

// Batch register faces
var faces = employees.Select(e => new FaceFeature
{
    PersonId = e.Id,
    Name = e.Name,
    RegisterTime = DateTime.UtcNow,
    Embedding = GetFaceEmbedding(e.Photo)
}).ToList();
db.Faces.AddRange(faces);

// Real-time face recognition
float[] probeVector = GetFaceEmbedding(cameraFrame);
var match = db.Faces.SearchTop1(probeVector);

if (match is { Similarity: > 0.9f })
{
    Console.WriteLine($"Recognition successful: {match.Entity.Name} (confidence: {match.Similarity:P1})");
}
else
{
    Console.WriteLine("No matching face recognized");
}

14.2 Multimodal Search Engine (HNSW Index)

using Vorcyc.Quiver;

// ═══ Multimodal Entity ═══
public class MediaItem
{
    [QuiverKey]
    public string Id { get; set; } = string.Empty;
    public string Title { get; set; } = string.Empty;
    public string Category { get; set; } = string.Empty;
    public bool IsPublished { get; set; }

    [QuiverVector(384, DistanceMetric.Cosine)]
    [QuiverIndex(VectorIndexType.HNSW, M = 32, EfConstruction = 200, EfSearch = 100)]
    public float[] TextEmbedding { get; set; } = [];

    [QuiverVector(512, DistanceMetric.Cosine)]
    [QuiverIndex(VectorIndexType.HNSW, M = 24, EfConstruction = 200, EfSearch = 80)]
    public float[] ImageEmbedding { get; set; } = [];
}

// ═══ Database Context ═══
public class MediaDb : QuiverDbContext
{
    public QuiverSet<MediaItem> Items { get; set; } = null!;

    public MediaDb() : base(new QuiverDbOptions
    {
        DatabasePath = "media.vdb",
        StorageFormat = StorageFormat.Binary
    })
    { }
}

// ═══ Usage ═══
await using var db = new MediaDb();
await db.LoadAsync();

// Batch import
await db.Items.AddRangeAsync(LoadMediaItems());

// Text search + published status filtering
float[] textQuery = GetTextEmbedding("machine learning tutorial");
var textResults = db.Items.Search(
    e => e.TextEmbedding,
    textQuery,
    topK: 10,
    filter: e => e.IsPublished
);

// Image search
float[] imageQuery = GetImageEmbedding(uploadedImage);
var imageResults = db.Items.Search(
    e => e.ImageEmbedding, imageQuery, topK: 10);

// Category filtering + high over-fetch rate
Func<MediaItem, bool> categoryFilter = e => e.Category == "Technology";
var filtered = db.Items.Search(
    e => e.TextEmbedding, textQuery, topK: 20,
    filter: categoryFilter,
    overFetchMultiplier: 8);

14.3 Simplifying Context with Primary Constructor

public class MyFaceDb(string path, StorageFormat format)
    : QuiverDbContext(new QuiverDbOptions
    {
        DatabasePath = path,
        StorageFormat = format,
        DefaultMetric = DistanceMetric.Cosine
    })
{
    public QuiverSet<FaceFeature> Faces { get; set; } = null!;
}

// Usage
var jsonDb = new MyFaceDb("data.json", StorageFormat.Json);
var binaryDb = new MyFaceDb("data.vdb", StorageFormat.Binary);

14.4 WAL Incremental Persistence Service

using Vorcyc.Quiver;

// ═══ WAL Mode Database Context ═══
public class MyWalDocDb(string path) : QuiverDbContext(new QuiverDbOptions
{
    DatabasePath = path,
    StorageFormat = StorageFormat.Binary,
    EnableWal = true,
    WalCompactionThreshold = 10_000,
    WalFlushToDisk = true
})
{
    public QuiverSet<Document> Documents { get; set; } = null!;
}

// ═══ Usage: High-Frequency Write Scenario ═══
await using var db = new MyWalDocDb("documents.vdb");
await db.LoadAsync(); // Load snapshot + replay WAL

// Batch write
for (int i = 0; i < 1000; i++)
{
    db.Documents.Add(new Document
    {
        Id = $"doc-{i:D5}",
        Title = $"Document {i}",
        Category = "Technology",
        Embedding = GetEmbedding($"Document content {i}")
    });
}

// Incremental save: only append 1000 changes to WAL, O(Δ) complexity
await db.SaveChangesAsync();

// Continue incremental operations
db.Documents.Upsert(new Document
{
    Id = "doc-00000",
    Title = "Updated Document 0",
    Category = "Tutorial",
    Embedding = GetEmbedding("Updated content")
});
db.Documents.RemoveByKey("doc-00999");

// Incremental save again (only 2 changes: 1 Upsert + 1 Remove)
await db.SaveChangesAsync();

// Manually trigger compaction when needed
await db.CompactAsync(); // Full snapshot + clear WAL

// Scope ends -> DisposeAsync -> SaveChangesAsync (auto-save unpersisted changes)

14.5 Async Concurrent Search Service

public class SearchService
{
    private readonly MyDocumentDb _db;

    public SearchService(string dbPath)
    {
        _db = new MyDocumentDb(dbPath, StorageFormat.Binary);
        _db.LoadAsync().GetAwaiter().GetResult();
    }

    /// <summary>
    /// Concurrency-safe search method that can be called by multiple ASP.NET requests simultaneously.
    /// The reader-writer lock inside QuiverSet guarantees thread safety.
    /// </summary>
    public async Task<List<QuiverSearchResult<Document>>> SearchAsync(
        float[] queryVector, int topK, CancellationToken ct)
    {
        return await _db.Documents.SearchAsync(
            e => e.Embedding, queryVector, topK, ct);
    }

    /// <summary>Search with category filtering.</summary>
    public async Task<List<QuiverSearchResult<Document>>> SearchByCategoryAsync(
        float[] queryVector, string category, int topK, CancellationToken ct)
    {
        Func<Document, bool> filter = e => e.Category == category;
        return await _db.Documents.SearchAsync(
            e => e.Embedding, queryVector, topK,
            filter, overFetchMultiplier: 8, ct);
    }
}

15. API Reference Cheat Sheet

QuiverDbContext

| Method / Property | Return Type | Description |
| --- | --- | --- |
| Set<TEntity>() | QuiverSet<TEntity> | Get vector collection by type (throws if not registered) |
| SaveAsync(path?) | Task | Async full save of all collections to disk (also clears WAL when enabled) |
| SaveChangesAsync() | Task | Only append unpersisted changes to WAL file, O(Δ) (equivalent to SaveAsync when WAL is not enabled) |
| CompactAsync() | Task | Create full snapshot + clear WAL (equivalent to SaveAsync) |
| LoadAsync(path?) | Task | Async load snapshot + replay WAL (silently returns if file doesn't exist) |
| Dispose() | void | Synchronous disposal (no save) |
| DisposeAsync() | ValueTask | Async disposal (WAL mode calls SaveChangesAsync, otherwise calls SaveAsync) |

QuiverSet<TEntity>

Properties

| Property | Type | Description |
| --- | --- | --- |
| Count | int | Entity count (read lock protected, thread-safe) |
| VectorFields | IReadOnlyDictionary<string, int> | Read-only mapping of vector field name → dimensions (lazily cached) |

CRUD Methods

| Method | Return Type | Lock | Description |
| --- | --- | --- | --- |
| Add(entity) | void | Write | Add single entity (throws on duplicate key) |
| AddRange(entities) | void | Write | Batch add (atomic, two-phase commit) |
| AddRangeAsync(entities, ct) | Task | Write | Async batch add (Task.Run) |
| Upsert(entity) | void | Write | Insert or update (completed within single write lock) |
| Remove(entity) | bool | Write | Remove by entity primary key |
| RemoveByKey(key) | bool | Write | Remove by key value |
| Find(key) | TEntity? | Read | Find by primary key, O(1) |
| Clear() | void | Write | Clear all data + indices |

Search Methods (Synchronous)

| Method | Return Type | Description |
| --- | --- | --- |
| Search(selector, query, topK) | List<QuiverSearchResult<T>> | Top-K search |
| Search(selector, query, topK, Expression filter) | List<QuiverSearchResult<T>> | With expression filter |
| Search(selector, query, topK, Func filter, overFetchMultiplier) | List<QuiverSearchResult<T>> | With delegate filter + over-fetch |
| SearchByThreshold(selector, query, threshold) | List<QuiverSearchResult<T>> | Threshold search |
| SearchTop1(selector, query) | QuiverSearchResult<T>? | Most similar single entity |
| Search(query, topK) | List<QuiverSearchResult<T>> | Default field Top-K |
| SearchTop1(query) | QuiverSearchResult<T>? | Default field Top-1 |

Search Methods (Asynchronous)

All synchronous search methods have corresponding Async suffix versions with an additional CancellationToken parameter, offloaded to the thread pool via Task.Run.
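
The Task.Run offloading pattern can be sketched as follows. SearchSync is a hypothetical stand-in for a CPU-bound search, and passing the token to Task.Run is an assumption about how the Async variants use it, not something this section confirms.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical synchronous search: returns the first topK integers as fake results.
List<int> SearchSync(int topK)
{
    var hits = new List<int>();
    for (int i = 0; i < topK; i++) hits.Add(i);
    return hits;
}

// Async wrapper: same arguments plus a CancellationToken, executed on the thread pool.
Task<List<int>> SearchAsync(int topK, CancellationToken ct = default)
    => Task.Run(() => SearchSync(topK), ct);

var results = await SearchAsync(5);
Console.WriteLine($"{results.Count} results");

// A token that is already cancelled prevents the work from starting.
var cts = new CancellationTokenSource();
cts.Cancel();
bool cancelled = false;
try { await SearchAsync(5, cts.Token); }
catch (OperationCanceledException) { cancelled = true; }
Console.WriteLine($"Cancelled before start: {cancelled}");
```

In this sketch the token only stops work that has not started yet; interrupting a search that is already running would require the search loop itself to check the token.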

Attribute Annotations

| Attribute | Target | Required | Description |
| --- | --- | --- | --- |
| [QuiverKey] | Property | Yes | Marks primary key (exactly one required) |
| [QuiverVector(dim, metric)] | Property | Yes | Marks vector field (at least one, type must be float[]) |
| [QuiverIndex(type, ...)] | Property | No | Configures index type and parameters (defaults to Flat) |

Enums

DistanceMetric

| Value | Description |
| --- | --- |
| Cosine | Cosine similarity (pre-normalization optimized) |
| Euclidean | Euclidean distance (converted to similarity) |
| DotProduct | Dot product |

VectorIndexType

| Value | Description |
| --- | --- |
| Flat | Brute-force search, 100% exact |
| HNSW | Hierarchical Navigable Small World graph |
| IVF | Inverted File Index |
| KDTree | KD Tree |

StorageFormat

| Value | Description |
| --- | --- |
| Json | JSON format |
| Xml | XML format |
| Binary | Binary format |

Search Result

public record QuiverSearchResult<TEntity>(TEntity Entity, float Similarity);

| Property | Type | Description |
| --- | --- | --- |
| Entity | TEntity | The matched entity instance |
| Similarity | float | Similarity score (higher is more similar) |

Version Downloads Last Updated
4.0.1 0 6/2/2026
3.2.1 103 5/9/2026
3.2.0 91 5/6/2026
3.1.0 93 5/5/2026
3.0.0 94 5/5/2026
2.0.0 115 4/15/2026
1.2.2 134 4/4/2026
1.1.2 171 3/30/2026
1.1.1 147 3/30/2026
1.1.0 142 3/30/2026
1.0.1 146 3/30/2026
1.0.0 158 3/29/2026