D4S.Indexer.Domain
1.0.18
- .NET CLI: dotnet add package D4S.Indexer.Domain --version 1.0.18
- Package Manager Console: NuGet\Install-Package D4S.Indexer.Domain -Version 1.0.18
- PackageReference: <PackageReference Include="D4S.Indexer.Domain" Version="1.0.18" />
- Central Package Management: <PackageVersion Include="D4S.Indexer.Domain" Version="1.0.18" /> in Directory.Packages.props, plus <PackageReference Include="D4S.Indexer.Domain" /> in the project file
- Paket CLI: paket add D4S.Indexer.Domain --version 1.0.18
- Script & Interactive: #r "nuget: D4S.Indexer.Domain, 1.0.18"
- File-based apps: #:package D4S.Indexer.Domain@1.0.18
- Cake addin: #addin nuget:?package=D4S.Indexer.Domain&version=1.0.18
- Cake tool: #tool nuget:?package=D4S.Indexer.Domain&version=1.0.18
D4S.Indexer
A document indexing library for Azure AI Search. It extracts text from documents, splits it into chunks, generates vector embeddings for the chunks, and uploads them to a search index as searchable chunks.
Architecture
| Project | Contents |
|---|---|
| D4S.Indexer.Domain | Entities, abstractions (interfaces) |
| D4S.Indexer.Application | Orchestration (DocumentIndexerService, DocumentExtractor) |
| D4S.Indexer.Infrastructure | Azure implementations, builder, document processors, sources |
Quick Start
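```csharp
// searchEndpoint, searchKey, aoaiEndpoint, aoaiKey, embeddingDeployment and
// embeddingDimensions are your own Azure AI Search and Azure OpenAI settings.
var indexer = IndexerBuilder.Create("my-index")
    .WithAzureSearch(searchEndpoint, searchKey)
    .WithAzureOpenAI(aoaiEndpoint, aoaiKey, embeddingDeployment, embeddingDimensions)
    .WithLocalFiles("./documents")
    .WithFileMetadataFields()
    .Build();

var result = await indexer.IndexAsync();
```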
Key Interfaces
| Interface | Purpose |
|---|---|
| IDocumentSource | Enumerates documents from a data source |
| IDocumentProcessor | Extracts text and metadata from a document |
| IEmbeddingService | Generates vector embeddings |
| ISearchIndexService | Manages the search index (CRUD on chunks) |
| ITextChunker | Splits text into chunks |
| IOcrService | OCR for scanned/image documents |
| IKeywordExtractor | AI-based keyword extraction |
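To make the data flow between these interfaces concrete, here is a minimal sketch that wires simplified stand-ins for them together. It assumes each interface can be reduced to a single delegate; the real interface members and the orchestration inside DocumentIndexerService are not shown in this README, so RunPipeline and every delegate shape below are hypothetical. OCR and keyword extraction are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the interfaces, expressed as delegates:
//   source  ~ IDocumentSource      (yields document ids)
//   extract ~ IDocumentProcessor   (document id -> extracted text)
//   chunk   ~ ITextChunker         (text -> chunks)
//   embed   ~ IEmbeddingService    (chunk -> vector)
//   upload  ~ ISearchIndexService  (stores one chunk record)
int RunPipeline(
    Func<IEnumerable<string>> source,
    Func<string, string> extract,
    Func<string, IEnumerable<string>> chunk,
    Func<string, float[]> embed,
    Action<(string DocId, int Index, string Text, float[] Vector)> upload)
{
    var uploaded = 0;
    foreach (var docId in source())
    {
        var text = extract(docId);
        var i = 0;
        foreach (var piece in chunk(text))
        {
            // Each chunk is embedded and uploaded with its position in the document.
            upload((docId, i++, piece, embed(piece)));
            uploaded++;
        }
    }
    return uploaded;
}

// Usage with toy implementations: two in-memory "documents", paragraph-based
// chunking, and a fake 2-dimensional vector (length, word count) instead of a model.
var docs = new Dictionary<string, string>
{
    ["doc-1"] = "First paragraph.\n\nSecond paragraph.",
    ["doc-2"] = "Only one paragraph here.",
};
var index = new List<(string DocId, int Index, string Text, float[] Vector)>();

var count = RunPipeline(
    source: () => docs.Keys,
    extract: id => docs[id],
    chunk: t => t.Split("\n\n"),
    embed: c => new float[] { c.Length, c.Split(' ').Length },
    upload: index.Add);

Console.WriteLine($"uploaded {count} chunks");
```

Passing the stages in as delegates keeps the orchestration independent of any concrete source, model, or index, which is the same separation the Domain/Infrastructure split above aims for.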
Built-in Document Sources
- LocalFileSystemDocumentSource — local filesystem with filtering and subdirectory scanning
- MultiSiteSharePointDocumentSource — multiple SharePoint sites via PnP Core
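LocalFileSystemDocumentSource is described as supporting filtering and subdirectory scanning. The sketch below approximates that behaviour with the standard Directory.EnumerateFiles API. It assumes filtering means a case-insensitive match on file extension (as with the FileExtensions option of WithLocalFiles); EnumerateDocuments is a hypothetical helper, not the library's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper (not part of D4S.Indexer): lists files under `directory`,
// optionally recursing into subdirectories, keeping only the allowed extensions.
IReadOnlyList<string> EnumerateDocuments(
    string directory, IEnumerable<string> extensions, bool includeSubdirectories)
{
    // Case-insensitive so ".PDF" and ".pdf" are treated alike (an assumption).
    var allowed = new HashSet<string>(extensions, StringComparer.OrdinalIgnoreCase);
    var option = includeSubdirectories ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly;

    return Directory.EnumerateFiles(directory, "*", option)
        .Where(path => allowed.Contains(Path.GetExtension(path)))
        .OrderBy(path => path, StringComparer.Ordinal)
        .ToList();
}

// Usage: build a small temporary tree and list the PDF and DOCX files in it.
var root = Path.Combine(Path.GetTempPath(), "d4s-sketch-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(Path.Combine(root, "sub"));
File.WriteAllText(Path.Combine(root, "a.pdf"), "");
File.WriteAllText(Path.Combine(root, "notes.txt"), "");
File.WriteAllText(Path.Combine(root, "sub", "b.DOCX"), "");

var topLevel = EnumerateDocuments(root, new[] { ".pdf", ".docx" }, includeSubdirectories: false);
var withSubdirs = EnumerateDocuments(root, new[] { ".pdf", ".docx" }, includeSubdirectories: true);
Console.WriteLine($"top level: {topLevel.Count}, with subdirectories: {withSubdirs.Count}");
```

Directory.EnumerateFiles streams results lazily, so a large tree is not loaded into memory before filtering starts.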
Built-in Document Processors
PDF, DOCX, XLSX, PPTX, TXT/Markdown.
Indexing Modes
Full Mode (default)
All documents are fetched from every source. Any document that is in the index but is no longer returned by any source is automatically deleted from the index.
Delta Mode
Enabled via .WithDeltaMode(). The source returns only new, changed, or deleted documents. Deletion is driven by DocumentMetadata.DeletedDate: documents with a non-null DeletedDate are removed from the index. Because the source does not return the full document set, the implicit cleanup step (deleting indexed documents that are missing from the source) is skipped.
```csharp
var indexer = IndexerBuilder.Create("my-index")
    .WithAzureSearch(searchEndpoint, searchKey)
    .WithAzureOpenAI(aoaiEndpoint, aoaiKey, deployment, dimensions)
    .WithDeltaMode()
    .WithCustomDocumentSource<MyDeltaSource>(serviceProvider, "delta")
    .Build();
```
The source yields SourceDocument instances. For a document that should be deleted, set DeletedDate and pass null for GetContentAsync, since there is no content to extract:
```csharp
new SourceDocument(
    new DocumentMetadata
    {
        Id = "doc-123",
        LastModifiedDate = DateTimeOffset.UtcNow,
        Extension = ".pdf",
        DeletedDate = DateTimeOffset.UtcNow // signals deletion
    },
    GetContentAsync: null);
```
In both modes, the indexer compares each document's LastModifiedDate with the value stored in the index: changed documents are reindexed and unchanged ones are skipped.
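The rules for the two modes can be summarized as a small planning function. This is a minimal sketch, not DocumentIndexerService's actual code: it assumes the index state is available as a map from document Id to the stored LastModifiedDate, that a document is reindexed when it is new or its LastModifiedDate differs from the stored value, and that a delta-mode deletion is only planned for documents actually present in the index. PlanSync and the tuple shapes are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical planner. `indexed` maps document Id to the LastModifiedDate stored in
// the index; each source entry mirrors DocumentMetadata (Id, LastModifiedDate, DeletedDate).
(List<string> Reindex, List<string> Skip, List<string> Delete) PlanSync(
    IReadOnlyDictionary<string, DateTimeOffset> indexed,
    IEnumerable<(string Id, DateTimeOffset LastModified, DateTimeOffset? Deleted)> source,
    bool deltaMode)
{
    var reindex = new List<string>();
    var skip = new List<string>();
    var delete = new List<string>();
    var seen = new HashSet<string>();

    foreach (var doc in source)
    {
        seen.Add(doc.Id);
        if (deltaMode && doc.Deleted is not null)
        {
            // Delta mode: a non-null DeletedDate signals removal.
            if (indexed.ContainsKey(doc.Id)) delete.Add(doc.Id);
        }
        else if (indexed.TryGetValue(doc.Id, out var stored) && stored == doc.LastModified)
        {
            // Unchanged since the last run.
            skip.Add(doc.Id);
        }
        else
        {
            // New document, or LastModifiedDate differs from the indexed value.
            reindex.Add(doc.Id);
        }
    }

    if (!deltaMode)
    {
        // Full mode only: indexed documents that no source returned are stale.
        delete.AddRange(indexed.Keys.Where(id => !seen.Contains(id)));
    }

    return (reindex, skip, delete);
}

var t0 = new DateTimeOffset(2024, 1, 1, 0, 0, 0, TimeSpan.Zero);
var indexState = new Dictionary<string, DateTimeOffset> { ["a"] = t0, ["b"] = t0, ["c"] = t0 };

// Full mode: "a" unchanged, "b" modified, "d" new; "c" is missing from the source.
var full = PlanSync(indexState, new[]
{
    ("a", t0, (DateTimeOffset?)null),
    ("b", t0.AddDays(1), (DateTimeOffset?)null),
    ("d", t0, (DateTimeOffset?)null),
}, deltaMode: false);

// Delta mode: only "b" is reported, flagged as deleted; "a" and "c" are left alone.
var delta = PlanSync(indexState, new[]
{
    ("b", t0.AddDays(1), (DateTimeOffset?)t0.AddDays(1)),
}, deltaMode: true);

Console.WriteLine($"full: reindex={string.Join(",", full.Reindex)} skip={string.Join(",", full.Skip)} delete={string.Join(",", full.Delete)}");
Console.WriteLine($"delta: reindex={string.Join(",", delta.Reindex)} skip={string.Join(",", delta.Skip)} delete={string.Join(",", delta.Delete)}");
```

The contrast on "c" is the point of the two modes: full mode infers deletions from absence, while delta mode never does, because an absent document simply did not change.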
Builder Options
```csharp
IndexerBuilder.Create("index-name")
    // Required
    .WithAzureSearch(endpoint, apiKey)
    .WithAzureOpenAI(endpoint, apiKey, deployment, dimensions)
    // Sources (at least one required)
    .WithLocalFiles("./docs")
    .WithLocalFiles(opts => { opts.Path = "./docs"; opts.FileExtensions = [".pdf", ".docx"]; })
    .WithSharePointMultiSite(spOptions, contextFactory)
    .WithCustomDocumentSource<T>(serviceProvider, serviceKey)
    // Optional
    .WithDeltaMode()
    .WithFileMetadataFields()
    .WithChunkSize(maxSize: 1000, overlap: 200)
    .WithBatchSize(50)
    .WithKeywordExtraction(gptDeployment, maxKeywords: 10)
    .WithAzureDocumentIntelligence(endpoint, apiKey) // OCR
    .WithCustomDocumentProcessor<T>(serviceProvider, serviceKey)
    .ContinueOnError(true)
    .Filter(meta => meta.Extension == ".pdf")
    .ConfigureMetadata(meta => meta with { CustomFields = ... })
    .AddCustomField("Status", CustomFieldType.String, filterable: true)
    .AddIndexFieldsFromAttributes<MyModel>()
    .OnProgress(p => Console.WriteLine(p.Phase))
    .WithLogging()
    .Build();
```
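The .WithChunkSize(maxSize: 1000, overlap: 200) option suggests a sliding-window chunker in which consecutive chunks share up to `overlap` characters, so text near a boundary appears in both neighbours. A minimal sketch, assuming sizes are measured in characters and chunks are cut at fixed positions; the library's ITextChunker may instead count tokens or respect sentence boundaries, and ChunkText is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical fixed-size chunker with overlap (sizes in characters, an assumption).
// Each chunk starts `maxSize - overlap` characters after the previous one.
List<string> ChunkText(string text, int maxSize, int overlap)
{
    if (maxSize <= 0) throw new ArgumentOutOfRangeException(nameof(maxSize));
    if (overlap < 0 || overlap >= maxSize) throw new ArgumentOutOfRangeException(nameof(overlap));

    var chunks = new List<string>();
    var step = maxSize - overlap;
    for (var start = 0; start < text.Length; start += step)
    {
        var length = Math.Min(maxSize, text.Length - start);
        chunks.Add(text.Substring(start, length));
        // Stop once a chunk reaches the end; a further window would only repeat overlap.
        if (start + length >= text.Length) break;
    }
    return chunks;
}

// Usage: 2,500 characters of varied text, chunked with maxSize 1000 and overlap 200.
var sample = string.Concat(Enumerable.Range(0, 2500).Select(i => (char)('a' + i % 26)));
var pieces = ChunkText(sample, maxSize: 1000, overlap: 200);
Console.WriteLine(string.Join(", ", pieces.ConvertAll(p => p.Length))); // prints "1000, 1000, 900"
```

Chunks start at offsets 0, 800, and 1600; the overlap trades some index size for better recall when an answer straddles a chunk boundary.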
Samples
See src/Rag/samples/ for working examples: local files, SharePoint, OCR, and agentic retrieval.
| Product | Compatible and computed target framework versions |
|---|---|
| .NET | net10.0 is compatible. Computed: net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows. |
Dependencies
- net10.0: No dependencies.
NuGet packages (2)
Showing the top 2 NuGet packages that depend on D4S.Indexer.Domain:
| Package | Description |
|---|---|
| D4S.Indexer.Application | Application services and configuration for D4S Indexer. |
| D4S.Indexer | D4S document indexer for Azure AI Search and RAG workflows. |
GitHub repositories
This package is not used by any popular GitHub repositories.