Mythosia.VectorDb.Postgres
10.6.1
dotnet add package Mythosia.VectorDb.Postgres --version 10.6.1
NuGet\Install-Package Mythosia.VectorDb.Postgres -Version 10.6.1
<PackageReference Include="Mythosia.VectorDb.Postgres" Version="10.6.1" />
<PackageVersion Include="Mythosia.VectorDb.Postgres" Version="10.6.1" />
<PackageReference Include="Mythosia.VectorDb.Postgres" />
paket add Mythosia.VectorDb.Postgres --version 10.6.1
#r "nuget: Mythosia.VectorDb.Postgres, 10.6.1"
#:package Mythosia.VectorDb.Postgres@10.6.1
#addin nuget:?package=Mythosia.VectorDb.Postgres&version=10.6.1
#tool nuget:?package=Mythosia.VectorDb.Postgres&version=10.6.1
Mythosia.VectorDb.Postgres
PostgreSQL (pgvector) implementation of IVectorStore.
Single-table design with namespace column for logical isolation.
Migration
Run the following migration before deploying if your Postgres table was created by any earlier version of Mythosia.VectorDb.Postgres.
This normalizes the table to the current schema and is safe to run on already-updated tables.
If you configured custom names via PostgresOptions.SchemaName or PostgresOptions.TableName, replace every occurrence of "public"."vectors" in the SQL below to match your actual schema and table name.
-- Legacy rename step for older schemas that still use `collection`
DO $$
BEGIN
IF EXISTS (
SELECT 1
FROM information_schema.columns
WHERE table_schema = 'public'
AND table_name = 'vectors'
AND column_name = 'collection'
) AND EXISTS (
SELECT 1
FROM information_schema.columns
WHERE table_schema = 'public'
AND table_name = 'vectors'
AND column_name = 'namespace'
) THEN
EXECUTE 'ALTER TABLE "public"."vectors" RENAME COLUMN namespace TO scope';
EXECUTE 'ALTER TABLE "public"."vectors" RENAME COLUMN collection TO namespace';
END IF;
END $$;
-- Recreate index/primary key if the legacy schema was renamed above
DROP INDEX IF EXISTS idx_vectors_collection_ns;
CREATE INDEX IF NOT EXISTS idx_vectors_ns_scope ON "public"."vectors" (namespace, scope);
DO $$
BEGIN
IF EXISTS (
SELECT 1
FROM pg_constraint c
JOIN pg_class t ON t.oid = c.conrelid
JOIN pg_namespace n ON n.oid = t.relnamespace
WHERE n.nspname = 'public'
AND t.relname = 'vectors'
AND c.conname = 'vectors_pkey'
) THEN
EXECUTE 'ALTER TABLE "public"."vectors" DROP CONSTRAINT vectors_pkey';
END IF;
EXCEPTION
WHEN undefined_table THEN NULL;
END $$;
ALTER TABLE "public"."vectors"
ADD PRIMARY KEY (namespace, id);
-- Current v10.6.0 hybrid-search schema
ALTER TABLE "public"."vectors"
ADD COLUMN IF NOT EXISTS content_tsv tsvector;
UPDATE "public"."vectors"
SET content_tsv = to_tsvector('simple',
regexp_replace(
regexp_replace(coalesce(content, ''),
'([a-zA-Z0-9])([^\u0001-\u007F\s])', E'\\1 \\2', 'g'),
'([^\u0001-\u007F\s])([a-zA-Z0-9])', E'\\1 \\2', 'g'));
ALTER TABLE "public"."vectors"
ALTER COLUMN content_tsv SET NOT NULL;
COMMENT ON COLUMN "public"."vectors".content IS
'Nullable to support customer policies that prohibit storing original text while still allowing hybrid search via content_tsv.';
CREATE INDEX IF NOT EXISTS idx_vectors_content_tsv
ON "public"."vectors" USING gin (content_tsv);
Notes:
- The
collection→namespacerename block only affects old legacy schemas. - The
content_tsvmigration is the current breaking schema change because hybrid search now depends on a persistedcontent_tsvcolumn instead of recalculatingto_tsvector(content)at query time. - If you are creating a new deployment from scratch, you do not need this migration; the current schema below is sufficient.
- The sample SQL assumes the default
SchemaName = "public"andTableName = "vectors".
Prerequisites
- PostgreSQL 12+
- pgvector extension installed:
CREATE EXTENSION IF NOT EXISTS vector;
- (Optional, for
TextSearchMode.Trigram) pg_trgm extension:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
pg_trgmis a standard PostgreSQL contrib module, available on most managed services (Azure, Supabase, RDS, etc.). WhenEnsureSchema = trueandTextSearchMode = Trigram, the extension is created automatically.
Quick Start
using Mythosia.VectorDb;
using Mythosia.VectorDb.Postgres;
var store = new PostgresStore(new PostgresOptions
{
ConnectionString = "Host=localhost;Database=mydb;Username=postgres;Password=secret",
Dimension = 1536,
EnsureSchema = true, // auto-creates table + indexes
Index = new HnswIndexOptions { M = 16, EfConstruction = 64, EfSearch = 40 }
});
// Fluent API (recommended)
var ns = store.InNamespace("my-namespace");
await ns.UpsertAsync(record);
var results = await ns.SearchAsync(queryVector, topK: 5);
// With scope
var scoped = ns.InScope("tenant-1");
await scoped.UpsertAsync(record); // record.Scope set automatically
var scopedResults = await scoped.SearchAsync(queryVector);
ERD
erDiagram
vectors {
text namespace PK "NOT NULL — logical namespace"
text id PK "NOT NULL — unique record ID within namespace"
text scope "NULL — optional sub-namespace isolation"
text content "NULL — original text content, optional when only content_tsv may be stored"
tsvector content_tsv "NOT NULL — persisted full-text search vector for hybrid search"
jsonb metadata "NOT NULL DEFAULT '{}' — arbitrary key-value pairs"
vector embedding "NOT NULL — vector(dimension) for similarity search"
timestamptz created_at "NOT NULL DEFAULT now()"
timestamptz updated_at "NOT NULL DEFAULT now()"
}
Single-table design: All namespaces share one table. The composite primary key
(namespace, id)ensures uniqueness per namespace.
Indexes
| Index | Type | Target | Purpose |
|---|---|---|---|
| PK | btree | (namespace, id) |
Primary key / upsert conflict |
idx_*_embedding |
hnsw / ivfflat | embedding vector_*_ops |
ANN similarity search (distance strategy dependent) |
idx_*_metadata |
gin | metadata |
jsonb containment filter (@>) |
idx_*_content_tsv |
gin | content_tsv |
Full-text search index for hybrid lexical retrieval (TsVector mode) |
idx_*_content_trgm |
gin | content gin_trgm_ops |
Trigram similarity index for hybrid search (Trigram mode, auto-created when configured) |
idx_*_ns_scope |
btree | (namespace, scope) |
Scope-scoped queries |
Schema
When EnsureSchema = true, the following is created automatically:
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS "public"."vectors" (
namespace text NOT NULL,
id text NOT NULL,
scope text NULL,
content text NULL,
content_tsv tsvector NOT NULL,
metadata jsonb NOT NULL DEFAULT '{}'::jsonb,
embedding vector(1536) NOT NULL,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
PRIMARY KEY (namespace, id)
);
ALTER TABLE "public"."vectors"
ADD COLUMN IF NOT EXISTS content_tsv tsvector;
UPDATE "public"."vectors"
SET content_tsv = to_tsvector('simple',
regexp_replace(
regexp_replace(coalesce(content, ''),
'([a-zA-Z0-9])([^\u0001-\u007F\s])', E'\\1 \\2', 'g'),
'([^\u0001-\u007F\s])([a-zA-Z0-9])', E'\\1 \\2', 'g'));
ALTER TABLE "public"."vectors"
ALTER COLUMN content_tsv SET NOT NULL;
COMMENT ON COLUMN "public"."vectors".content IS
'Nullable to support customer policies that prohibit storing original text while still allowing hybrid search via content_tsv.';
-- Indexes
CREATE INDEX IF NOT EXISTS idx_vectors_metadata
ON "public"."vectors" USING gin (metadata);
CREATE INDEX IF NOT EXISTS idx_vectors_content_tsv
ON "public"."vectors" USING gin (content_tsv);
CREATE INDEX IF NOT EXISTS idx_vectors_ns_scope
ON "public"."vectors" (namespace, scope);
-- vector index (default: HNSW)
CREATE INDEX IF NOT EXISTS idx_vectors_embedding
ON "public"."vectors" USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);
Notes:
- The vector index SQL changes by
Indextype (HnswIndexOptions/IvfFlatIndexOptions/NoIndexOptions). - The operator class changes by
DistanceStrategy:Cosine→vector_cosine_opsEuclidean→vector_l2_opsInnerProduct→vector_ip_ops
When EnsureSchema = false (recommended for production), the table must already exist.
An InvalidOperationException is thrown with a clear message if the table is missing.
Manual Schema Setup (Production)
For production deployments, create the schema manually before starting the application:
-- 1. Enable pgvector
CREATE EXTENSION IF NOT EXISTS vector;
-- 2. Create table (adjust dimension as needed)
CREATE TABLE public.vectors (
namespace text NOT NULL,
id text NOT NULL,
scope text NULL,
content text NULL,
content_tsv tsvector NOT NULL,
metadata jsonb NOT NULL DEFAULT '{}'::jsonb,
embedding vector(1536) NOT NULL,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
PRIMARY KEY (namespace, id)
);
COMMENT ON COLUMN public.vectors.content IS
'Nullable to support customer policies that prohibit storing original text while still allowing hybrid search via content_tsv.';
-- 3. Indexes
CREATE INDEX idx_vectors_metadata
ON public.vectors USING gin (metadata);
CREATE INDEX idx_vectors_content_tsv
ON public.vectors USING gin (content_tsv);
CREATE INDEX idx_vectors_ns_scope
ON public.vectors (namespace, scope);
-- 4-A. Option A (recommended default): HNSW
CREATE INDEX idx_vectors_embedding
ON public.vectors USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);
-- 4-B. Option B: IVFFlat (create after loading data)
-- ivfflat requires rows to exist for training
-- CREATE INDEX idx_vectors_embedding
-- ON public.vectors USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
-- 5. Analyze for query planner (recommended)
ANALYZE public.vectors;
-- 6. (Optional, for TextSearchMode.Trigram) Trigram index
-- CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- CREATE INDEX idx_vectors_content_trgm
-- ON public.vectors USING gin (content gin_trgm_ops);
Options
| Option | Default | Description |
|---|---|---|
ConnectionString |
(required) | PostgreSQL connection string |
Dimension |
(required) | Embedding vector dimension (e.g., 1536 for OpenAI) |
SchemaName |
"public" |
Database schema |
TableName |
"vectors" |
Table name |
EnsureSchema |
false |
Auto-create extension/table/indexes |
DistanceStrategy |
Cosine |
Similarity metric (Cosine, Euclidean, InnerProduct) |
Index |
new HnswIndexOptions() |
Vector index settings object (HnswIndexOptions, IvfFlatIndexOptions, NoIndexOptions) |
HnswIndexOptions.M |
16 |
HNSW build param (m), typical range 8-64 |
HnswIndexOptions.EfConstruction |
64 |
HNSW build param (ef_construction), typical range 32-400 |
HnswIndexOptions.EfSearch |
40 |
HNSW runtime ef_search default |
IvfFlatIndexOptions.Lists |
100 |
Number of IVF lists for the ivfflat index |
IvfFlatIndexOptions.Probes |
10 |
IVFFlat runtime probes default |
FailFastOnIndexCreationFailure |
true |
Throw when vector index creation fails (recommended for production) |
TextSearchMode |
TsVector |
Text search strategy for hybrid search (TsVector or Trigram) |
TextSearchConfig |
"simple" |
PostgreSQL text search configuration for to_tsvector / to_tsquery (only used in TsVector mode) |
Runtime Tuning Guide (DX)
IvfFlatSearchRuntimeOptions.Probes: increase for better recall, decrease for lower latency.HnswSearchRuntimeOptions.EfSearch: increase for better recall, decrease for lower latency.IvfFlatIndexOptions.Lists: start aroundsqrt(total_rows)and tune from there.
Use runtime options matching your index settings:
Index = new HnswIndexOptions(...)→HnswSearchRuntimeOptionsIndex = new IvfFlatIndexOptions(...)→IvfFlatSearchRuntimeOptions
Use named presets for convenience, or set explicit values:
// Named presets (static properties)
await store.SearchAsync(queryVector, topK: 10, filter, HnswSearchRuntimeOptions.Fast);
await store.SearchAsync(queryVector, topK: 10, filter, HnswSearchRuntimeOptions.Balanced);
await store.SearchAsync(queryVector, topK: 10, filter, HnswSearchRuntimeOptions.HighRecall);
// Explicit override
await store.SearchAsync(queryVector, topK: 10, filter, new HnswSearchRuntimeOptions { EfSearch = 80 });
await store.SearchAsync(queryVector, topK: 10, filter, new IvfFlatSearchRuntimeOptions { Probes = 20 });
Recommended starting points:
| Goal | IvfFlatSearchRuntimeOptions.Probes |
HnswSearchRuntimeOptions.EfSearch |
|---|---|---|
| Fast | 4 | 16 |
| Balanced | 10 | 40 |
| HighRecall | 32 | 120 |
These are practical ranges, not strict hard limits. Final values should be chosen from production latency/recall measurements.
Namespace & Filter Behavior
⚠ Deprecation Notice —
VectorRecord.Namespace,VectorRecord.Scope, and theInNamespace()/InScope()fluent API are deprecated and will be removed in a future major version. UseMetadataentries (e.g.Metadata["namespace"],Metadata["scope"]) andVectorFilter.Where("namespace", value)for logical isolation instead. This aligns with industry-standard vector database designs where all filtering is done via metadata/payload.
- Namespaces are currently stored as a
namespacecolumn (this column will be removed in a future version). - Namespace is optional for queries — when
VectorFilter.Namespaceis null, theWHERE namespace = @nsclause is omitted, allowing searches and deletes across all namespaces. - Upsert always requires a namespace — if
record.Namespaceis null, it falls back to"default"because the DB column isNOT NULLand part of the primary key. - Recommended — store isolation keys in
Metadataand filter withVectorFilter.Where()instead of usingNamespace/Scopeproperties. - Metadata filter — SQL per operator:
Eq:metadata @> @val(JSONB containment — uses GIN index)Ne:metadata->>'key' != @valIn:metadata->>'key' = ANY(@vals)NotIn:NOT (metadata->>'key' = ANY(@vals))Gt / Gte / Lt / Lte:metadata->>'key' > @val(lexicographic)Like:metadata->>'key' LIKE @valExists:jsonb_exists(metadata, 'key')NotExists:NOT jsonb_exists(metadata, 'key')And / Or groups: wrapped in(...)withAND/OR
- MinScore filter (distance-strategy dependent):
Cosine:1 - (embedding <=> @q::vector) >= @minScoreEuclidean:1 / (1 + (embedding <-> @q::vector)) >= @minScoreInnerProduct:-(embedding <#> @q::vector) >= @minScore
VectorFilter Examples
For the full operator reference and fluent API examples (Where, WhereNot, WhereIn, WhereLike, WhereExists, Or, And, WithMinScore, etc.), see the Mythosia.VectorDb.Abstractions README.
Hybrid Search
PostgresStore supports native IVectorStore.HybridSearchAsync for hybrid search. When called, it executes a single SQL query using CTEs — combining the vector similarity leg and the text search leg — then merges results via Reciprocal Rank Fusion (RRF).
TextSearchMode.TsVector (default)
Uses PostgreSQL tsvector / tsquery full-text search with a persisted content_tsv column plus GIN index. Works well for European languages with good built-in text search configurations.
var store = new PostgresStore(new PostgresOptions
{
ConnectionString = connString,
Dimension = 1536,
EnsureSchema = true,
// TextSearchMode = TextSearchMode.TsVector (default)
});
TextSearchMode.Trigram
Uses pg_trgm extension with word_similarity matching. Better for CJK languages (Korean, Japanese, Chinese) and agglutinative languages where PostgreSQL lacks built-in morphological analysis.
var store = new PostgresStore(new PostgresOptions
{
ConnectionString = connString,
Dimension = 1536,
EnsureSchema = true,
TextSearchMode = TextSearchMode.Trigram // pg_trgm word_similarity
});
Why Trigram for Korean/CJK?
PostgreSQL’ssimpletext search config tokenizes by whitespace only. Korean particles (조사/어미) attach to words, so"opm에"≠"opm은"— no match. Trigram splits text into 3-character grams and uses substring similarity, bypassing morphological analysis entirely.
content may remain nullable for deployments where original text storage is prohibited, but content_tsv is required for lexical retrieval in TsVector mode.
Batch Get & Count
// Fetch multiple records by ID — single query using WHERE id = ANY(@ids)
var records = await store.InNamespace("docs").GetBatchAsync(new[] { "id-1", "id-2", "id-3" });
// Count all records in a namespace
long count = await store.InNamespace("docs").CountAsync();
// Count with additional metadata filter
long filtered = await store.InNamespace("docs").CountAsync(
new VectorFilter().Where("storage_id", storageId));
// Count across all namespaces
long total = await store.CountAsync();
GetBatchAsync uses a single WHERE id = ANY(@ids) query with Npgsql array binding. Applies the full filter (namespace, scope, metadata) in the same WHERE clause. CountAsync uses SELECT COUNT(*) with optional clauses for namespace, scope, and jsonb containment.
Atomic Vector Replacement
ReplaceByFilterAsync wraps DELETE + INSERT in a single PostgreSQL transaction, eliminating the query gap that occurs during re-embedding:
// Delete all vectors matching the filter, then insert new ones — atomically
var filter = new VectorFilter { Namespace = "default" }
.Where("full_path", "/docs/policy.md");
await store.ReplaceByFilterAsync(filter, newRecords);
How it works:
BEGIN TRANSACTION
DELETE FROM vectors WHERE metadata->>'full_path' = '/docs/policy.md'
INSERT INTO vectors (...) VALUES (...), (...), ...
COMMIT
If the INSERT fails, the DELETE is rolled back and existing data remains intact. Queries always see either the old data or the new data, never an empty result.
RAG Integration
var store = await RagStore.BuildAsync(config => config
.AddText("Your document text here", id: "doc-1")
.UseLocalEmbedding(512)
.UseVectorStore(new PostgresStore(new PostgresOptions
{
ConnectionString = Environment.GetEnvironmentVariable("MYTHOSIA_PG_CONN")!,
Dimension = 512,
EnsureSchema = true,
Index = new HnswIndexOptions()
}))
.WithTopK(5)
);
Connection Verification
Call VerifyConnectionAsync to test TCP connectivity and authentication before running queries:
var store = new PostgresStore(new PostgresOptions
{
ConnectionString = connString,
Dimension = 1536
});
try
{
await store.VerifyConnectionAsync();
Console.WriteLine("Connected!");
}
catch (Exception ex)
{
Console.WriteLine($"Connection failed: {ex.Message}");
}
Resource Disposal
PostgresStore implements IDisposable. Dispose the store when it is no longer needed to release internal resources (schema initialization lock):
using var store = new PostgresStore(options);
// ... use store
Each operation opens and closes its own NpgsqlConnection from the connection pool — the store itself holds no open connections between calls.
Performance Tips
- ivfflat lists: Rule of thumb —
lists = sqrt(total_rows). Default 100 is good for up to ~10K rows. - Run
ANALYZE vectors;after bulk inserts for optimal query plans. - For large datasets (1M+ rows), consider HNSW index (
CREATE INDEX ... USING hnsw) instead of ivfflat. - Use connection pooling (e.g.,
Npgsqlconnection stringPooling=true;Maximum Pool Size=20).
EnsureSchema Guidance
EnsureSchema = true: Development, testing, local Docker — auto-provisions everything.EnsureSchema = false: Production — schema managed by DBA/migration tools; fails fast with clear error if missing.- For
ivfflat, index creation can fail on empty tables (PostgreSQL/pgvector behavior). In that case, useHnswor createivfflatafter loading data. FailFastOnIndexCreationFailure = true(default): throws immediately if vector index creation fails.FailFastOnIndexCreationFailure = false: startup continues even if vector index creation fails.
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
-
net10.0
- Mythosia.VectorDb.Abstractions (>= 3.0.1)
- Npgsql (>= 10.0.2)
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.
v10.6.1: Namespace filtering is now optional — null namespace omits WHERE clause. Deprecated: VectorRecord.Namespace/Scope, VectorFilter.Namespace/Scope/WithNamespace, INamespaceContext, IScopeContext, InNamespace()/InScope() — use Metadata and VectorFilter.Where() for logical isolation instead.