HnswLite.RamStorage
2.0.1
dotnet add package HnswLite.RamStorage --version 2.0.1
<img src="https://raw.githubusercontent.com/jchristn/HnswLite/main/assets/logo.png" width="256" height="256">
HnswLite
A pure C# implementation of Hierarchical Navigable Small World (HNSW) graphs for approximate nearest neighbor search. HnswLite ships as an embeddable library, a REST server, a React dashboard, and SDKs in three languages.
Note: This library is in its early stages of development. We welcome your patience, constructive feedback, and contributions! Please be kind and considerate when reporting issues or suggesting improvements. I am not an expert on this topic and relied heavily on available AI tools to build this library. Pull requests are greatly appreciated!
Overview
HnswLite implements the Hierarchical Navigable Small World algorithm, which provides fast approximate nearest-neighbor search with excellent recall rates. The library is designed to be embeddable, extensible, and easy to use from any .NET application — or from Python / JavaScript / any HTTP client via the REST server.
Repository layout
| Path | Purpose |
|---|---|
| src/HnswIndex/ | Core library (HnswLite on NuGet) |
| src/HnswIndex.RamStorage/ | In-memory storage provider |
| src/HnswIndex.SqliteStorage/ | SQLite storage provider |
| src/HnswIndex.PostgresqlStorage/ | PostgreSQL storage provider |
| src/HnswIndex.Server/ | Standalone REST server (Watson 7) |
| src/Test.Shared/ + src/Test.{Automated,XUnit,NUnit,MSTest}/ | Touchstone-driven test suites |
| dashboard/ | React 19 + Vite dashboard |
| sdk/csharp/, sdk/python/, sdk/js/ | Client SDKs with 100% endpoint coverage |
| docker/ | compose.yaml for server + dashboard, plus factory-reset scripts |
Key features
- Pure C# implementation — no native dependencies.
- Thread-safe, async/await with cancellation tokens throughout.
- Async-only IStorageProvider interface - build your own backend by implementing one provider contract.
- Storage backends - PostgreSQL, SQLite, and RAM.
- Multiple distance metrics - Euclidean, Cosine, Dot Product, with SIMD acceleration via System.Numerics.Vector<float>.
- Batch operations - efficient bulk insert and remove.
- Persistence by default - the REST server and Docker deployment default to PostgreSQL; SQLite remains available for embedded and fallback deployments.
- Paginated enumeration contract across every GET collection endpoint (EnumerationQuery / EnumerationResult<T>).
- OPTIONS preflight + CORS out of the box in the REST server.
New in v2.0.0
- Breaking async storage API. IHnswStorage, IHnswLayerStorage, IHnswNode, and IStorageProvider are async-only. Node metadata is updated through SetMetadataAsync(...), and providers implement IAsyncDisposable.
- PostgreSQL provider. HnswLite.PostgresqlStorage stores index metadata, vectors, graph layers, neighbors, and vector metadata in PostgreSQL using async Npgsql APIs.
- PostgreSQL multi-index model. One PostgreSQL schema stores all logical indexes in shared tables partitioned by hnsw_indexes.id / index_id.
- PostgreSQL server default. New server-created indexes default to PostgreSQL unless StorageType is explicitly set to SQLite or RAM.
- Docker PostgreSQL deployment. docker/compose.yaml starts PostgreSQL, runs a one-shot provisioner for schema/default records, then starts the server and dashboard.
- SDK and test updates. C#, Python, and JS/TS SDK examples and harnesses default to Docker's PostgreSQL-backed server.
New in v1.2.0
- Metadata filters. Both POST /v1.0/indexes/{name}/search and GET /v1.0/indexes/{name}/vectors now accept optional Labels, Tags, and CaseInsensitive parameters. Filtering uses AND semantics on both: every label must be present and every tag key/value must match for a record to be kept. When CaseInsensitive is true, labels, tag keys, and tag values are all compared using StringComparison.OrdinalIgnoreCase.
- FilteredCount on responses. Both SearchResponse and EnumerationResult<T> now include a FilteredCount integer reporting how many candidates/records were dropped by the metadata filter, so callers can tell at a glance whether a restrictive filter is responsible for a short page.
- Full coverage across C# / Python / JS SDKs and the dashboard (Search and Vectors pages).
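The AND semantics described above can be sketched in a few lines of C#. This is an illustration under stated assumptions, not the library's filter code: the Matches helper, its parameter names, and the sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper (not part of HnswLite): a record is kept only if every
// requested label is present AND every requested tag key/value pair matches.
static bool Matches(
    IReadOnlyCollection<string> recordLabels,
    IReadOnlyDictionary<string, object> recordTags,
    IReadOnlyCollection<string> wantLabels,
    IReadOnlyDictionary<string, string> wantTags,
    bool caseInsensitive)
{
    StringComparison cmp = caseInsensitive
        ? StringComparison.OrdinalIgnoreCase
        : StringComparison.Ordinal;

    // Every requested label must be present on the record.
    foreach (string label in wantLabels)
        if (!recordLabels.Any(l => string.Equals(l, label, cmp))) return false;

    // Every requested tag must match on key and on stringified value.
    foreach (KeyValuePair<string, string> want in wantTags)
    {
        bool found = recordTags.Any(t =>
            string.Equals(t.Key, want.Key, cmp) &&
            string.Equals(Convert.ToString(t.Value, CultureInfo.InvariantCulture), want.Value, cmp));
        if (!found) return false;
    }
    return true;
}

List<string> labels = new List<string> { "red", "small", "round" };
Dictionary<string, object> tags = new Dictionary<string, object> { ["env"] = "prod", ["owner"] = "alice" };

bool exact = Matches(labels, tags, new[] { "red", "small" }, new Dictionary<string, string> { ["env"] = "prod" }, false);
bool wrongCase = Matches(labels, tags, new[] { "RED" }, new Dictionary<string, string>(), false);
bool ignoreCase = Matches(labels, tags, new[] { "RED" }, new Dictionary<string, string> { ["ENV"] = "PROD" }, true);

Console.WriteLine($"{exact} {wrongCase} {ignoreCase}"); // True False True
```

A record with extra labels or tags still matches; only missing or mismatched ones cause it to be dropped.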
Filtering by labels and tags
Request body (search):
POST /v1.0/indexes/demo/search
{
"Vector": [0.1, 0.2, 0.3, 0.4],
"K": 10,
"Labels": ["red", "small"],
"Tags": { "env": "prod", "owner": "alice" },
"CaseInsensitive": false
}
Query-string (enumerate):
curl -H "x-api-key: $API_KEY" \
"http://localhost:8080/v1.0/indexes/demo/vectors?labels=red,small&tags=env:prod,owner:alice&caseInsensitive=true&includeVectors=false"
Both endpoints return a FilteredCount alongside the existing fields:
{
"Results": [ ... ],
"SearchTimeMs": 2.41,
"FilteredCount": 3
}
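A caller might use FilteredCount to tell a short page caused by filtering from one caused by a small index. The sketch below parses a response shaped like the sample above with System.Text.Json; the requested K value and the message text are illustrative assumptions.

```csharp
using System;
using System.Text.Json;

// Sample payload shaped like the response above (Results left empty for brevity).
string json = "{ \"Results\": [], \"SearchTimeMs\": 2.41, \"FilteredCount\": 3 }";
int requestedK = 10; // assumed K from the original request

using JsonDocument doc = JsonDocument.Parse(json);
JsonElement root = doc.RootElement;

int returned = root.GetProperty("Results").GetArrayLength();
int filtered = root.GetProperty("FilteredCount").GetInt32();

// A short page with a non-zero FilteredCount points at the metadata filter.
if (returned < requestedK && filtered > 0)
    Console.WriteLine($"{returned} of {requestedK} results returned; {filtered} dropped by the filter.");
```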
Limitations (v1.2):
- Labels passed via query string cannot contain ","; tag keys cannot contain ":" or ","; tag values cannot contain ",". Use the JSON body form (POST /search) when filter tokens contain these characters.
- Tag values are compared as strings (via Convert.ToString(value, InvariantCulture) on the stored side). Numeric / boolean tag values stringify predictably (42 → "42", true → "True").
- Search applies the filter after HNSW traversal, so restrictive filters can return fewer than K results; FilteredCount tells you how many were dropped.
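The first two limitations can be illustrated on the client side. The sketch below is hypothetical: BuildVectorsUrl is not an SDK method, it simply composes the enumerate URL from the curl example and rejects tokens that the query-string form cannot carry. The last lines show how Convert.ToString renders stored tag values.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper (not part of any HnswLite SDK).
static string BuildVectorsUrl(
    string baseUrl,
    string indexName,
    IEnumerable<string> labels,
    IEnumerable<(string Key, string Value)> tags)
{
    List<string> labelTokens = new List<string>();
    foreach (string label in labels)
    {
        if (label.Contains(','))
            throw new ArgumentException("Label contains ','; use the JSON body form instead.");
        labelTokens.Add(Uri.EscapeDataString(label));
    }

    List<string> tagTokens = new List<string>();
    foreach ((string key, string value) in tags)
    {
        if (key.Contains(':') || key.Contains(','))
            throw new ArgumentException("Tag key contains ':' or ','; use the JSON body form instead.");
        if (value.Contains(','))
            throw new ArgumentException("Tag value contains ','; use the JSON body form instead.");
        tagTokens.Add(Uri.EscapeDataString(key) + ":" + Uri.EscapeDataString(value));
    }

    return baseUrl.TrimEnd('/') + "/v1.0/indexes/" + Uri.EscapeDataString(indexName)
        + "/vectors?labels=" + string.Join(",", labelTokens)
        + "&tags=" + string.Join(",", tagTokens);
}

string url = BuildVectorsUrl(
    "http://localhost:8080", "demo",
    new[] { "red", "small" },
    new[] { ("env", "prod"), ("owner", "alice") });
Console.WriteLine(url);
// http://localhost:8080/v1.0/indexes/demo/vectors?labels=red,small&tags=env:prod,owner:alice

// Stored tag values are stringified with the invariant culture before comparison.
object[] storedValues = { 42, true, "prod" };
foreach (object stored in storedValues)
    Console.WriteLine(Convert.ToString(stored, CultureInfo.InvariantCulture)); // 42, True, prod
```

Because a stored boolean true becomes "True", a filter value of "true" would presumably match only when CaseInsensitive is set.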
New in v1.1.x
See CHANGELOG.md for the full list. Highlights:
Platform
- Multi-target net8.0 + net10.0 across the library, server, and tests.
- Watson web server upgraded to 7.0.11. OPTIONS pre-flight is handled by Watson's native hook and bypasses authentication; CORS response headers are emitted on every response from a configurable Cors block in hnswindex.json.
Vector metadata
- Every vector now carries optional Name (string), Labels (list of strings), and Tags (string → object dictionary) alongside its GUID and float array.
- Metadata is exposed as mutable properties on IHnswNode. SQLite writes are immediate: every setter commits an UPDATE, so metadata survives even an unclean process crash.
- The REST API accepts and returns metadata on every vector endpoint (add, batch-add, enumerate, get-single, search).
- The dashboard Vectors table shows Name and Labels; the Add / Edit / Search-result-detail modals all expose all three fields.
Storage abstraction
- IStorageProvider - a single interface that combines IHnswStorage, IHnswLayerStorage, transaction/flush hooks, and IAsyncDisposable. HnswIndex accepts it via a provider constructor.
- RamStorageProvider and SqliteStorageProvider consolidate the previous pair-of-objects setup into one lifecycle-managed instance.
Server persistence
- Historical v1.1 default: StorageType changed from RAM to SQLite in v1.1. In v2.0.0, the server and Docker default is PostgreSQL.
- Server-owned metadata (GUID / dimension / distance function / M / MaxM / EfConstruction / created timestamp) is persisted inside each SQLite .db file via the library's hnsw_metadata key/value table under a server.* key prefix. There is no manifest file: the database is the manifest.
- IndexManager scans the SQLite directory on startup, opens every .db, and re-registers the index. Indexes survive restarts.
Paginated enumeration across every GET
- GET /v1.0/indexes is paginated. Query-string parameters populate an EnumerationQuery; the response is an EnumerationResult<T>. No more "get all" endpoints.
- New GET /v1.0/indexes/{name}/vectors: paginated vector enumeration with an includeVectors=true|false switch for whether vector bodies are inlined.
- New GET /v1.0/indexes/{name}/vectors/{guid}: fetch a single vector (always includes the Vector array).
Performance
- SIMD-accelerated distance functions (Euclidean, Cosine, DotProduct) via System.Numerics.Vector<float> + CollectionsMarshal.AsSpan, with a scalar fallback.
- Task.Run wrappers removed from SelectNeighborsHeuristicAsync, GreedySearchLayerAsync, and SearchLayerAsync; async state-machine allocation eliminated on the search hot path.
- Pre-fetch + cached node references in neighbor selection; O(N²) storage round-trips collapsed to O(N).
- In-place sort in neighbor selection (no .OrderBy().ToList() allocations).
- ContainsKey + indexer → TryGetValue across hot paths.
- ConfigureAwait(false) on every library-internal await.
- Bounded SearchContext cache (default 50k nodes) to prevent unbounded memory growth on large searches.
- Span-based SQLite vector serialization (MemoryMarshal.AsBytes / MemoryMarshal.Cast<byte, float>).
- Sparse neighbor map in RamHnswNode: HashSet<Guid>?[] indexed by layer (max 64) replaces Dictionary<int, HashSet<Guid>>.
- MinHeap.GetAll() switched from LINQ .OrderBy().ThenBy() to in-place heap extraction.
- SQLite connection consolidation: both constructors now share a single helper that applies WAL + synchronous + cache + mmap_size=256MB + wal_autocheckpoint=1000 PRAGMAs (previously only the default-table-name constructor was configured).
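To make the first item concrete, here is a minimal sketch of a SIMD Euclidean distance with a scalar fallback, built on System.Numerics.Vector<float> and CollectionsMarshal.AsSpan. It illustrates the general technique only; the function name and structure are assumptions and may differ from the library's implementation (for example, in whether a square root is taken).

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;
using System.Runtime.InteropServices;

// Illustrative sketch, not HnswLite's code.
static float EuclideanDistance(List<float> a, List<float> b)
{
    if (a.Count != b.Count) throw new ArgumentException("Vectors must have the same dimension.");

    // View the lists' backing arrays without copying.
    // The lists must not be resized while these spans are in use.
    ReadOnlySpan<float> x = CollectionsMarshal.AsSpan(a);
    ReadOnlySpan<float> y = CollectionsMarshal.AsSpan(b);

    float sum = 0f;
    int i = 0;

    // SIMD path: process Vector<float>.Count lanes per iteration.
    if (Vector.IsHardwareAccelerated && x.Length >= Vector<float>.Count)
    {
        int width = Vector<float>.Count;
        Vector<float> acc = Vector<float>.Zero;
        for (; i <= x.Length - width; i += width)
        {
            Vector<float> diff = new Vector<float>(x.Slice(i, width)) - new Vector<float>(y.Slice(i, width));
            acc += diff * diff;
        }
        sum = Vector.Sum(acc);
    }

    // Scalar fallback: handles the tail, or the whole vector without SIMD support.
    for (; i < x.Length; i++)
    {
        float d = x[i] - y[i];
        sum += d * d;
    }

    return MathF.Sqrt(sum);
}

List<float> p = new List<float> { 3f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f };
List<float> q = new List<float> { 0f, 4f, 0f, 0f, 0f, 0f, 0f, 0f, 0f, 0f };
float distance = EuclideanDistance(p, q);
Console.WriteLine(distance); // 5
```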
See archive/PERFORMANCE_IMPROVEMENTS.md for details and remaining future work.
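As a rough illustration of the span-based serialization item, the sketch below round-trips a float vector through a byte blob with MemoryMarshal.AsBytes and MemoryMarshal.Cast<byte, float>. The ToBytes and FromBytes names are hypothetical; the SQLite provider's actual column layout is not shown here.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helpers: reinterpret floats as bytes and back.
// The byte order is the platform's native order (little-endian on common hardware).
static byte[] ToBytes(ReadOnlySpan<float> vector)
{
    return MemoryMarshal.AsBytes(vector).ToArray();
}

static float[] FromBytes(ReadOnlySpan<byte> blob)
{
    return MemoryMarshal.Cast<byte, float>(blob).ToArray();
}

float[] original = { 0.1f, 0.2f, 0.3f, 0.4f };
byte[] blob = ToBytes(original);
float[] roundTripped = FromBytes(blob);

Console.WriteLine(blob.Length);                                     // 16
Console.WriteLine(original.AsSpan().SequenceEqual(roundTripped));   // True
```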
Testing
- Unified Touchstone test suite: tests are defined once in Test.Shared and executed by four runners (Test.Automated console, Test.XUnit, Test.NUnit, Test.MSTest). Coverage grew from 23 to 53 cases across 11 suites including concurrency, cross-storage parity, and cluster-recall scenarios.
Dashboard
- React 19 + Vite 6 + TypeScript dashboard at dashboard/ with pages for Indices, Vectors (browse / edit / add / delete with an index dropdown and Add-vector modal), Search, Request History (30-day browser-local capture with hour / day / week / month ranges), API Explorer, Server Info, Settings, plus a login flow.
- Docker image jchristn77/hnswlite-dashboard with nginx serving the SPA and proxying /v1.0/ to the server container.
SDKs
Three new SDKs with 100% endpoint coverage + integration test harnesses:
- C# (HnswLite.Sdk): net8.0 / net10.0.
- Python (hnswlite-sdk): Python 3.9+, requests.
- JS / TS (hnswlite-sdk): Node 18+, zero runtime deps, native fetch.
Docker
- docker/compose.yaml runs the server and dashboard together.
- docker/factory/reset.bat + reset.sh: factory-reset scripts.
- clean.bat + clean.sh in the server output directory: delete hnswindex.json / data/ / logs/ for a fresh start.
Use cases
- Semantic search — find similar documents / sentences from embeddings.
- Recommendation systems — discover similar items / users / content.
- Image similarity — search on feature vectors.
- Anomaly detection — identify outliers by neighbour distance.
- Clustering — group similar items.
- RAG — retrieval-augmented generation for LLM applications.
- Duplicate detection — find near-duplicate content at scale.
Performance & scalability
Recommended limits
- Vector dimensions: 50–1000 (optimal: 128–768).
- Dataset size: up to 1–10M vectors depending on dimension and RAM.
- Memory usage: approximately (vector_count × dimension × 4 bytes) + (vector_count × M × 32 bytes).
These are estimates. The library has not been exhaustively load-tested.
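As a worked instance of the memory formula above, the following sketch evaluates it for 1,000,000 vectors of 128 dimensions with M = 16. The EstimateBytes helper is hypothetical and carries the same caveat: it is an estimate, not a measurement.

```csharp
using System;

// Hypothetical helper implementing the estimate from the list above.
static long EstimateBytes(long vectorCount, int dimension, int m)
{
    long vectorBytes = vectorCount * dimension * 4L; // 4 bytes per float component
    long graphBytes = vectorCount * m * 32L;         // rough per-connection overhead
    return vectorBytes + graphBytes;
}

long estimate = EstimateBytes(1_000_000, 128, 16);
Console.WriteLine(estimate); // 1024000000
```

That is 512,000,000 bytes of vector data plus 512,000,000 bytes of graph overhead, or roughly 0.95 GiB in total.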
Parameters
- M - connections per vector (default 16). More connections → better recall, more memory. 16–32 works well for most cases.
- EfConstruction - construction search depth (default 200). Higher → better graph quality, slower builds. Drop to 50–100 for fast batch insertion.
- Ef - search depth (default 50–200). Higher → better recall, slower search.
- Seed - set a fixed value for reproducible builds.
Tips
- Use AddNodesAsync(...) / RemoveNodesAsync(...) for batches; they acquire the write lock once.
- Prefer PostgresqlStorageProvider for server and Docker persistence, SqliteStorageProvider for local embedded persistence, and RamStorageProvider for ephemeral in-memory indexes.
- For high-dimensional embeddings use CosineDistance.
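One reason cosine distance tends to suit embeddings is that it ignores vector magnitude. The sketch below uses the common definition of cosine distance as 1 minus cosine similarity, which is an assumption here; the library's CosineDistance class may define it differently. Both helper names are hypothetical.

```csharp
using System;

// Assumed definition: 1 - (a·b) / (|a| |b|).
static double CosineDistanceOf(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return 1.0 - dot / Math.Sqrt(normA * normB);
}

static double EuclideanDistanceOf(float[] a, float[] b)
{
    double sum = 0;
    for (int i = 0; i < a.Length; i++)
    {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return Math.Sqrt(sum);
}

float[] v = { 1f, 2f, 3f };
float[] scaled = { 2f, 4f, 6f }; // same direction, twice the length

double cosine = CosineDistanceOf(v, scaled);
double euclidean = EuclideanDistanceOf(v, scaled);

Console.WriteLine(cosine == 0.0);                       // True: same direction
Console.WriteLine(euclidean > 3.7 && euclidean < 3.8);  // True: sqrt(14)
```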
Simple example (embedded)
using Hnsw;
using Hnsw.RamStorage;
using HnswIndex.PostgresqlStorage;
using HnswIndex.SqliteStorage;
// RAM
await using RamStorageProvider ram = new RamStorageProvider();
HnswIndex index = new HnswIndex(128, ram);
// PostgreSQL (server/Docker default)
string connectionString = "Host=localhost;Port=5432;Database=hnswlite;Username=hnswlite;Password=hnswlite";
await using PostgresqlStorageProvider postgres = await PostgresqlStorageProvider.CreateAsync(
connectionString,
"my-index",
dimension: 128);
HnswIndex persistentIndex = new HnswIndex(128, postgres);
// Or SQLite (local embedded persistence)
await using SqliteStorageProvider sqlite = await SqliteStorageProvider.CreateAsync("my-index.db");
HnswIndex sqliteIndex = new HnswIndex(128, sqlite);
// Configure
index.M = 16;
index.EfConstruction = 200;
index.DistanceFunction = new CosineDistance();
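// Hypothetical helper (not part of HnswLite): builds a random vector of the given dimension.
static List<float> GenerateRandomVector(int dimension)
{
    List<float> values = new List<float>(dimension);
    for (int i = 0; i < dimension; i++) values.Add(Random.Shared.NextSingle());
    return values;
}
// Add a single vector
Guid id = Guid.NewGuid();
// Replace with your 128-d embedding. Note that new List<float>(128) only reserves capacity and contains no elements.
List<float> vector = GenerateRandomVector(128);
await index.AddAsync(id, vector);
// Add a batch
Dictionary<Guid, List<float>> batch = new Dictionary<Guid, List<float>>();
for (int i = 0; i < 1000; i++) batch[Guid.NewGuid()] = GenerateRandomVector(128);
await index.AddNodesAsync(batch);
// Search
List<float> query = GenerateRandomVector(128); // replace with your query embedding
IEnumerable<VectorResult> neighbors = await index.GetTopKAsync(query, count: 10);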
foreach (VectorResult r in neighbors)
Console.WriteLine($"id={r.GUID} distance={r.Distance:F4}");
// Export / import state
HnswState state = await index.ExportStateAsync();
await using RamStorageProvider restoredStorage = new RamStorageProvider();
HnswIndex restored = new HnswIndex(128, restoredStorage);
await restored.ImportStateAsync(state);
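Because HNSW search is approximate, it can be useful to compare its results with an exact baseline on a small sample. The sketch below is a hypothetical brute-force top-K helper, independent of HnswLite, using squared Euclidean distance (which ranks neighbors the same way as Euclidean distance). Swap in a different distance if your index uses one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: exact top-K by scanning every vector.
static List<Guid> ExactTopK(Dictionary<Guid, List<float>> vectors, List<float> query, int k)
{
    List<(Guid Id, float Distance)> scored = new List<(Guid Id, float Distance)>();
    foreach (KeyValuePair<Guid, List<float>> entry in vectors)
    {
        float sum = 0f;
        for (int i = 0; i < query.Count; i++)
        {
            float d = entry.Value[i] - query[i];
            sum += d * d;
        }
        scored.Add((entry.Key, sum));
    }
    // Closest first, then keep the first k ids.
    return scored.OrderBy(s => s.Distance).Take(k).Select(s => s.Id).ToList();
}

Guid near = Guid.NewGuid();
Guid mid = Guid.NewGuid();
Guid far = Guid.NewGuid();
Dictionary<Guid, List<float>> data = new Dictionary<Guid, List<float>>
{
    [near] = new List<float> { 0.1f, 0.1f },
    [mid] = new List<float> { 1.0f, 1.0f },
    [far] = new List<float> { 5.0f, 5.0f },
};

List<Guid> top2 = ExactTopK(data, new List<float> { 0f, 0f }, 2);
Console.WriteLine(top2[0] == near && top2[1] == mid); // True
```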
Best practices
- Resource management. IStorageProvider is IAsyncDisposable - use await using to guarantee flush on scope exit.
- Prefer batches. Calling AddNodesAsync is substantially faster than a loop of AddAsync because it acquires the write lock once.
- Tune Ef at search time.
IEnumerable<VectorResult> quick = await index.GetTopKAsync(query, 10, ef: 50); // fast, lower recall
IEnumerable<VectorResult> better = await index.GetTopKAsync(query, 10, ef: 400); // slower, higher recall
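One way to choose an ef value is to measure recall: the fraction of the true nearest neighbors that the approximate search returned. The RecallAtK helper below is a hypothetical sketch; in practice the approximate ids would come from the GUID of each VectorResult, and the exact ids from a brute-force scan over the same data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: share of the exact neighbors found by the approximate search.
static double RecallAtK(IEnumerable<Guid> approximate, IEnumerable<Guid> exact)
{
    HashSet<Guid> truth = new HashSet<Guid>(exact);
    if (truth.Count == 0) return 1.0;
    int hits = approximate.Distinct().Count(id => truth.Contains(id));
    return (double)hits / truth.Count;
}

Guid a = Guid.NewGuid();
Guid b = Guid.NewGuid();
Guid c = Guid.NewGuid();
Guid d = Guid.NewGuid();
Guid e = Guid.NewGuid();

List<Guid> exactIds = new List<Guid> { a, b, c, d };
List<Guid> approxIds = new List<Guid> { a, b, c, e }; // one true neighbor missed

double recall = RecallAtK(approxIds, exactIds);
Console.WriteLine(recall == 0.75); // True
```

Raising ef until recall stops improving on a representative query sample is a reasonable starting point.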
Custom storage backend
Implement IStorageProvider (which aggregates IHnswStorage, IHnswLayerStorage, transactional hooks, flush, and IAsyncDisposable). See RamStorageProvider, SqliteStorageProvider, and PostgresqlStorageProvider as reference implementations. The server and dashboard are provider-agnostic.
REST server
cd src/HnswIndex.Server
dotnet run -- --setup # writes hnswindex.json with a generated admin API key
dotnet run
The server listens on http://localhost:8080 by default. Authentication uses the x-api-key header (configurable via Server.AdminApiKeyHeader). OPTIONS pre-flight is unauthenticated and served by Watson's preflight hook; CORS headers are emitted on every response and configured under the Cors block in hnswindex.json.
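For clients that do not use one of the SDKs, an authenticated request only needs the x-api-key header. The sketch below builds a GET /v1.0/indexes request with HttpClient types; the helper name and the key value are placeholders, and the request is constructed but not sent.

```csharp
using System;
using System.Net.Http;

// Hypothetical helper: builds an authenticated request for the index list.
static HttpRequestMessage BuildListIndexesRequest(string baseUrl, string apiKey)
{
    HttpRequestMessage message = new HttpRequestMessage(
        HttpMethod.Get,
        baseUrl.TrimEnd('/') + "/v1.0/indexes");
    message.Headers.Add("x-api-key", apiKey); // default header name; configurable on the server
    return message;
}

using HttpRequestMessage request = BuildListIndexesRequest("http://localhost:8080", "my-admin-key");
Console.WriteLine($"{request.Method} {request.RequestUri}"); // GET http://localhost:8080/v1.0/indexes

// To send it against a running server:
// using HttpClient client = new HttpClient();
// using HttpResponseMessage response = await client.SendAsync(request);
```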
Full endpoint reference: REST_API.md. Interactive reference: HNSW Index.postman_collection.json.
Test runners
The shared Touchstone tests can be run through Test.Automated, xUnit, NUnit, or MSTest. Test.Automated accepts storage overrides directly:
dotnet run --project src/Test.Automated/Test.Automated.csproj -- --storage sqlite --filename test.db
dotnet run --project src/Test.Automated/Test.Automated.csproj -- --storage postgresql --host localhost --user hnsw --pass password --schema public --databasename hnswtest
The adapter projects use the same shared configuration through environment variables before dotnet test: HNSWLITE_TEST_STORAGE, HNSWLITE_TEST_SQLITE_FILENAME, HNSWLITE_TEST_POSTGRES_CONNECTION, or PostgreSQL components HNSWLITE_TEST_POSTGRES_HOST, HNSWLITE_TEST_POSTGRES_PORT, HNSWLITE_TEST_POSTGRES_USER, HNSWLITE_TEST_POSTGRES_PASSWORD, HNSWLITE_TEST_POSTGRES_DATABASE, and HNSWLITE_TEST_POSTGRES_SCHEMA.
Dashboard
React 19 + Vite 6 + TypeScript dashboard at dashboard/. Pages include Indices, Vectors (browse / edit / add / delete), Search, Request History with an activity chart, API Explorer, Server Info, Settings, plus a login flow.
# Local development
cd dashboard
npm install
HNSWLITE_SERVER_URL=http://localhost:8080 npm run dev
# Production build (static assets in dashboard/dist)
npm run build
SDKs
| Language | Directory | Package | Runtime |
|---|---|---|---|
| C# | sdk/csharp/ | HnswLite.Sdk | .NET 8 or .NET 10 |
| Python | sdk/python/ | hnswlite-sdk | Python 3.9+ |
| JavaScript / TypeScript | sdk/js/ | hnswlite-sdk | Node 18+ (native fetch) |
Each SDK has 100% endpoint coverage and a test harness. See sdk/README.md for the method matrix and per-language READMEs.
Docker
cd docker
docker compose up -d --build
- Server: http://localhost:8080/
- Dashboard: http://localhost:8081/dashboard/
- Storage: PostgreSQL by default, provisioned by the Compose stack
Build and push both release images with one tag:
build-all.bat v2.0.0
Factory reset (with RESET confirmation):
cd docker/factory
./reset.sh # or reset.bat on Windows
See docker/README.md for image tags and environment overrides.
Bugs, feedback, or enhancement requests
- Bug reports: please file an issue with reproduction steps.
- Feature requests: open a discussion or create an issue.
- Questions: use the discussions forum.
- Contributions: pull requests welcome.
License
MIT. See LICENSE.md.
Acknowledgments
Based on Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs by Yu. A. Malkov and D. A. Yashunin.