HnswLite.RamStorage
1.0.2
.NET CLI:
dotnet add package HnswLite.RamStorage --version 1.0.2
Package Manager:
NuGet\Install-Package HnswLite.RamStorage -Version 1.0.2
PackageReference:
<PackageReference Include="HnswLite.RamStorage" Version="1.0.2" />
Central Package Management (PackageVersion + PackageReference):
<PackageVersion Include="HnswLite.RamStorage" Version="1.0.2" />
<PackageReference Include="HnswLite.RamStorage" />
Paket CLI:
paket add HnswLite.RamStorage --version 1.0.2
Script & Interactive:
#r "nuget: HnswLite.RamStorage, 1.0.2"
File-based apps:
#:package HnswLite.RamStorage@1.0.2
Cake:
#addin nuget:?package=HnswLite.RamStorage&version=1.0.2
#tool nuget:?package=HnswLite.RamStorage&version=1.0.2
<img src="https://github.com/jchristn/HnswLite/blob/main/assets/logo.png" width="256" height="256">
HnswLite
A pure C# implementation of Hierarchical Navigable Small World (HNSW) graphs for approximate nearest neighbor search. This library provides a thread-safe, embeddable solution for vector similarity search in .NET applications.
Note: This library is in its early stages of development. We appreciate your patience and welcome constructive feedback and contributions! Please be kind and considerate when reporting issues or suggesting improvements. I am not an expert on this topic and relied heavily on available AI tools to build this library. Pull requests are greatly appreciated!
Overview
HnswLite implements the Hierarchical Navigable Small World algorithm, which provides fast approximate nearest neighbor search with excellent recall rates. The library is designed to be embeddable, extensible, and easy to use in any .NET application.
Key Features
- Pure C# implementation - No native dependencies
- Thread-safe - Safe for concurrent operations
- Async/await support - Modern async APIs with cancellation tokens
- Pluggable storage - Extensible storage backend interface
- Multiple distance metrics - Euclidean, Cosine, and Dot Product
- Flexible deployment - RAM or Sqlite storage, or build your own backend
- Batch operations - Efficient bulk insert and remove
- Comprehensive validation - Input validation and error handling
New in v1.0.x
Initial version featuring:
- Core HNSW algorithm implementation
- In-memory storage backend
- Sqlite storage backend
- Async APIs with cancellation support
- Three distance functions (Euclidean, Cosine, Dot Product)
- Batch add/remove operations
- State export/import functionality
- Thread-safety
For version history, see CHANGELOG.md.
Use Cases
HnswLite is ideal for:
- Semantic Search - Find similar documents, sentences, or paragraphs based on embeddings
- Recommendation Systems - Discover similar items, users, or content
- Image Similarity - Search for visually similar images using feature vectors
- Anomaly Detection - Identify outliers by finding distant neighbors
- Clustering - Group similar items together based on vector proximity
- RAG Applications - Retrieval-Augmented Generation for LLM applications
- Duplicate Detection - Find near-duplicate content in large datasets
Performance and Scalability Recommendations
Recommended Limits
- Vector dimensions: 50-1000 (optimal: 128-768)
- Dataset size: Up to 1-10M vectors (depending on dimensions and available RAM)
- Memory usage: approximately (vector_count * dimension * 4 bytes) + (vector_count * M * 32 bytes)
Note: the above are estimates; this library has not yet been tested at any significant scale.
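For example, a rough back-of-the-envelope application of the formula above, assuming 1,000,000 vectors of dimension 384 with the default M = 16:
vector data:  1,000,000 * 384 * 4 bytes ≈ 1.5 GB
graph links:  1,000,000 * 16 * 32 bytes ≈ 0.5 GB
total:        ≈ 2 GB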
Performance Characteristics
- Index build time: O(N log N) expected
- Search time: O(log N) expected
- Memory complexity: O(N * M) where M is the connectivity parameter
Optimization Tips
- Parameter Tuning (see the configuration sketch after this list):
  - M: Number of connections per vector (default: 16). Think of this as how many "friends" each vector has in the network. More connections mean better search quality but use more memory. For most cases, 16-32 works well.
  - EfConstruction: Size of the candidate list when building the index (default: 200). This controls how thoroughly the algorithm searches for connections when adding new vectors. Higher values create a better-quality index but take longer to build.
  - Ef (search parameter): Size of the candidate list during search (default: 50-200). This controls how many paths the algorithm explores when searching. Higher values find better results but take more time. Set this based on your speed/quality needs.
- For Large Datasets:
- Leverage the Sqlite implementation or build your own disk-based storage backend
- Use batch operations for initial index building
- Monitor memory usage closely
- For High-Dimensional Data:
- Consider dimensionality reduction before indexing
- Use cosine distance for normalized embeddings
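A minimal tuning sketch, using the same constructor and properties shown in the Simple Example below. The values are illustrative rather than recommendations, and defaults may differ between versions:
using Hnsw;
using HnswIndex.RamStorage;
// Higher-quality (and more memory-hungry) configuration for an in-memory index.
var tunedIndex = new HnswIndex(384, new RamHnswStorage(), new RamHnswLayerStorage());
tunedIndex.M = 32;               // more connections per vector: better recall, more memory
tunedIndex.EfConstruction = 400; // larger build-time candidate list: better index, slower build
tunedIndex.DistanceFunction = new CosineDistance(); // suits normalized embeddings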
Bugs, Feedback, or Enhancement Requests
We value your input! If you encounter any issues or have suggestions:
- Bug Reports: Please file an issue with reproduction steps
- Feature Requests: Start a discussion or create an issue
- Questions: Use the discussions forum for general questions
- Contributions: Pull requests are welcome! Please read our contributing guidelines first
Simple Example
using Hnsw;
using HnswIndex.RamStorage;
using HnswIndex.SqliteStorage;
// Create an index for 128-dimensional vectors in RAM
var index = new HnswIndex(128, new RamHnswStorage(), new RamHnswLayerStorage());
// Or using Sqlite
var sqliteStorage = new SqliteHnswStorage("my-index.db");
var sqliteLayerStorage = new SqliteHnswLayerStorage(sqliteStorage.Connection);
var sqliteIndex = new HnswIndex(128, sqliteStorage, sqliteLayerStorage);
// Configure parameters (optional)
index.M = 16;
index.EfConstruction = 200;
index.DistanceFunction = new CosineDistance();
// Add vectors to the index
var vectorId = Guid.NewGuid();
var vector = new List<float>(128); // Your 128-dimensional embedding
// ... populate vector with data ...
await index.AddAsync(vectorId, vector);
// Add multiple vectors
var batch = new List<(Guid id, List<float> vector)>();
for (int i = 0; i < 1000; i++)
{
var id = Guid.NewGuid();
var v = GenerateRandomVector(128); // Your vector generation logic
batch.Add((id, v));
}
await index.AddBatchAsync(batch);
// Search for nearest neighbors
var queryVector = new List<float>(128); // Your query embedding
// ... populate query vector ...
var neighbors = await index.GetTopKAsync(queryVector, k: 10);
foreach (var result in neighbors)
{
Console.WriteLine($"ID: {result.GUID}, Distance: {result.Distance:F4}");
}
// Save the index
var state = await index.ExportStateAsync();
// ... serialize state to disk ...
// Load the index
var newIndex = new HnswIndex(128, new RamHnswStorage(), new RamHnswLayerStorage());
await newIndex.ImportStateAsync(state);
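One way to handle the "serialize state to disk" step is System.Text.Json. This is only a sketch and assumes the object returned by ExportStateAsync serializes cleanly to JSON, which may not hold for every version of the library:
using System.Text.Json;
// Persist the exported state to disk (assumption: the state type is JSON-serializable).
var stateJson = JsonSerializer.Serialize(state);
await File.WriteAllTextAsync("my-index.json", stateJson);
To reload later, deserialize the file back into the concrete state type returned by ExportStateAsync (check the package for its name) and pass it to ImportStateAsync as shown above.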
Custom Storage Example
Refer to HnswIndex.RamStorage and HnswIndex.SqliteStorage for actual implementations. To implement your own backend, you need to implement:
- IHnswLayerStorage
- IHnswNode
- IHnswStorage
License
This library is available under the MIT license.
Acknowledgments
This implementation is based on the paper: Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs by Yu. A. Malkov and D. A. Yashunin.
Product | Compatible and additional computed target framework versions
---|---
.NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.
Dependencies
net8.0
- HnswLite (>= 1.0.2)
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.
Release Notes
Initial release