HnswIndex.SqliteStorage 1.0.6

.NET 8.0

dotnet add package HnswIndex.SqliteStorage --version 1.0.6

NuGet\Install-Package HnswIndex.SqliteStorage -Version 1.0.6

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

<PackageReference Include="HnswIndex.SqliteStorage" Version="1.0.6" />

For projects that support PackageReference, copy this XML node into the project file to reference the package.

<PackageVersion Include="HnswIndex.SqliteStorage" Version="1.0.6" />
                    

                            Directory.Packages.props

<PackageReference Include="HnswIndex.SqliteStorage" />
                    

                            Project file

For projects that support Central Package Management (CPM), copy this XML node into the solution Directory.Packages.props file to version the package.

paket add HnswIndex.SqliteStorage --version 1.0.6

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

#r "nuget: HnswIndex.SqliteStorage, 1.0.6"

#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

#:package HnswIndex.SqliteStorage@1.0.6

#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.

#addin nuget:?package=HnswIndex.SqliteStorage&version=1.0.6
                    

                            Install as a Cake Addin

#tool nuget:?package=HnswIndex.SqliteStorage&version=1.0.6
                    

                            Install as a Cake Tool

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

HnswLite

A pure C# implementation of Hierarchical Navigable Small World (HNSW) graphs for approximate nearest neighbor search. This library provides a thread-safe, embeddable solution for vector similarity search in .NET applications.

Note: This library is in its early stages of development. We welcome your patience, constructive feedback, and contributions! Please be kind and considerate when reporting issues or suggesting improvements. I am not an expert on this topic and relied heavily on available AI tools to build this library. Pull requests are greatly appreciated!

Overview

HnswLite implements the Hierarchical Navigable Small World algorithm, which provides fast approximate nearest neighbor search with excellent recall rates. The library is designed to be embeddable, extensible, and easy to use in any .NET application.

Key Features

Pure C# implementation - No native dependencies
Thread-safe - Safe for concurrent operations
Async/await support - Modern async APIs with cancellation tokens
Pluggable storage - Extensible storage backend interface
Multiple distance metrics - Euclidean, Cosine, and Dot Product
Flexible deployment - Support for RAM or Sqlite, or build-your-own backend
Batch operations - Efficient bulk insert and remove
Comprehensive validation - Input validation and error handling

New in v1.0.x

SQLite backend with optimized binary serialization (4x faster than JSON)
Deferred flush for batch operations (100x improvement for large insertions)
SearchContext caching reduces database queries by 90%+
WAL mode and optimized PRAGMA settings for SQLite
Standalone REST server (HnswIndex.Server) with Docker image and Postman collection
Core HNSW algorithm implementation
In-memory storage backend
SQLite storage backend
Async APIs with cancellation support
Three distance functions (Euclidean, Cosine, Dot Product)
Batch add/remove operations
State export/import functionality
Thread-safety

For version history, see CHANGELOG.md.

Use Cases

HnswLite is ideal for:

Semantic Search - Find similar documents, sentences, or paragraphs based on embeddings
Recommendation Systems - Discover similar items, users, or content
Image Similarity - Search for visually similar images using feature vectors
Anomaly Detection - Identify outliers by finding distant neighbors
Clustering - Group similar items together based on vector proximity
RAG Applications - Retrieval-Augmented Generation for LLM applications
Duplicate Detection - Find near-duplicate content in large datasets

Performance and Scalability Recommendations

Recommended Limits

Vector dimensions: 50-1000 (optimal: 128-768)
Dataset size: Up to 1-10M vectors (depending on dimensions and available RAM)
Memory usage: Approximately (vector_count * dimension * 4 bytes) + (vector_count * M * 32 bytes)

Note: the above are estimations. This library has not been tested (yet) at any level of scale.

Performance Characteristics

Index build time: O(N log N) expected
Search time: O(log N) expected
Memory complexity: O(N * M) where M is the connectivity parameter

Optimization Tips

Parameter Tuning:

M: Number of connections per vector (default: 16). Think of this as how many "friends" each vector has in the network. More connections mean better search quality but use more memory. For most cases, 16-32 works well.
EfConstruction: Size of the candidate list when building the index (default: 200). This controls how thoroughly the algorithm searches for connections when adding new vectors. Higher values create better quality indices but take longer to build. For faster batch insertion, consider reducing to 50-100.
Ef (search parameter): Size of the candidate list during search (default: 50-200). This controls how many paths the algorithm explores when searching. Higher values find better results but take more time. Set this based on your speed/quality needs.
Seed: Set a consistent seed value for reproducible index builds (useful for testing).

For Large Datasets:

Use SQLite backend for persistence and lower memory usage
Always use AddNodesAsync for batch operations instead of individual AddAsync calls
SQLite backend automatically handles flushing - no manual flush needed
Consider reducing EfConstruction for faster insertion (trade-off with search quality)

For High-Dimensional Data:

Keep dimensions under 384 for optimal performance
Consider dimensionality reduction before indexing (PCA, UMAP, etc.)
Use cosine distance for normalized embeddings (common with text embeddings)
Monitor memory usage: ~4 bytes per dimension per vector plus graph overhead

Bugs, Feedback, or Enhancement Requests

We value your input! If you encounter any issues or have suggestions:

Bug Reports: Please file an issue with reproduction steps
Feature Requests: Start a discussion or create an issue
Questions: Use the discussions forum for general questions
Contributions: Pull requests are welcome! Please read our contributing guidelines first

Simple Example

using Hnsw;
using Hnsw.RamStorage;
using Hnsw.SqliteStorage;

// Create an index for 128-dimensional vectors in RAM
HnswIndex index = new HnswIndex(128, new RamHnswStorage(), new RamHnswLayerStorage());

// Or using SQLite (with proper disposal)
using SqliteHnswStorage sqliteStorage = new SqliteHnswStorage("my-index.db");
using SqliteHnswLayerStorage sqliteLayerStorage = new SqliteHnswLayerStorage(sqliteStorage.Connection);
HnswIndex sqliteIndex = new HnswIndex(128, sqliteStorage, sqliteLayerStorage);

// Configure parameters (optional)
index.M = 16;
index.EfConstruction = 200;
index.DistanceFunction = new CosineDistance();

// Add vectors to the index
Guid vectorId = Guid.NewGuid();
List<float> vector = new List<float>(128); // Your 128-dimensional embedding
// ... populate vector with data ...

await index.AddAsync(vectorId, vector);

// Add multiple vectors
Dictionary<Guid, List<float>> batch = new Dictionary<Guid, List<float>>();
for (int i = 0; i < 1000; i++)
{
    Guid id = Guid.NewGuid();
    List<float> v = GenerateRandomVector(128); // Your vector generation logic
    batch[id] = v;
}
await index.AddNodesAsync(batch);

// Search for nearest neighbors
List<float> queryVector = new List<float>(128); // Your query embedding
// ... populate query vector ...

List<SearchResult> neighbors = await index.GetTopKAsync(queryVector, k: 10);

foreach (SearchResult result in neighbors)
{
    Console.WriteLine($"ID: {result.GUID}, Distance: {result.Distance:F4}");
}

// Save the index
HnswState state = await index.ExportStateAsync();
// ... serialize state to disk ...

// Load the index
HnswIndex newIndex = new HnswIndex(128, new RamHnswStorage(), new RamHnswLayerStorage());
await newIndex.ImportStateAsync(state);

Best Practices

Resource Management:
- Always use using statements with SQLite storage to ensure proper cleanup
- The SQLite backend automatically flushes pending changes on disposal
- No manual flush is needed - the library handles this internally

Batch Operations:

// GOOD: Use batch operations for multiple vectors
Dictionary<Guid, List<float>> batch = new Dictionary<Guid, List<float>>();
// ... populate batch ...
await index.AddNodesAsync(batch);

// AVOID: Individual adds in a loop
foreach (Item item in items)
{
    await index.AddAsync(item.Id, item.Vector); // Slower
}

Search Performance:

// Adjust ef parameter based on your needs
List<SearchResult> quickResults = await index.GetTopKAsync(query, k: 10, ef: 50);  // Faster, lower quality
List<SearchResult> bestResults = await index.GetTopKAsync(query, k: 10, ef: 400);  // Slower, higher quality

Custom Storage Example

Refer to Hnsw.RamStorage and Hnsw.SqliteStorage for actual implementations. To implement your own backend, you need to implement:

IHnswLayerStorage - Manages layer assignments for nodes
IHnswNode - Represents a single node with its vector and neighbors
IHnswStorage - Handles node persistence and retrieval

Running in Docker

Refer to the src/Docker directory for assets related to running in Docker. The Docker image can be found on Docker Hub and a Postman collection is contained within this repository's root directory.

License

This library is available under the MIT license.

Acknowledgments

This implementation is based on the paper: Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs by Yu. A. Malkov and D. A. Yashunin.

Product	Compatible and additional computed target framework versions.
.NET	net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.

Product

.NET

Compatible target framework(s)

Included target framework(s) (in package)

Learn more about Target Frameworks and .NET Standard.

net8.0
- HnswLite (>= 1.0.6)
- Microsoft.Data.Sqlite (>= 9.0.7)

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version	Downloads	Last Updated
1.0.6	117	9/2/2025
1.0.5	163	8/29/2025
1.0.4	148	8/25/2025
1.0.3	97	8/24/2025
1.0.2	390	7/21/2025

Initial release