MarkZither.Rag.Chunking
0.0.1-alpha.1
Token-aware sliding-window chunking primitives for retrieval-augmented generation (RAG) pipelines in .NET.
Splits long documents into overlapping token-bounded chunks ready for embedding, with optional HTML stripping. No HtmlAgilityPack dependency.
Installation
dotnet add package MarkZither.Rag.Chunking --version 0.0.1-alpha.1
Quick Start
// Register via DI
services.AddChunking();

// Or construct the services manually, without DI
ITokenEncoder encoder = new TiktokenEncoder();
IChunkingService chunker = new SlideWindowChunkingService(encoder);

var options = new ChunkOptions
{
    ChunkSize = 512,            // maximum tokens per chunk
    ChunkOverlap = 128,         // tokens shared between consecutive chunks
    MaxChunksPerDocument = 200, // hard cap on chunks per document
    StripHtml = true            // strip HTML tags before chunking
};

// text is the document string to split; ct is a CancellationToken
IReadOnlyList<TextChunk> chunks = await chunker.ChunkAsync(text, options, ct);

foreach (var chunk in chunks)
{
    Console.WriteLine($"[{chunk.ChunkIndex + 1}/{chunk.TotalChunks}] {chunk.TokenCount} tokens: {chunk.Text[..Math.Min(80, chunk.Text.Length)]}...");
}
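The README does not describe how `StripHtml` works internally, only that it avoids HtmlAgilityPack. As a rough illustration of what dependency-free stripping can look like, here is a minimal regex-based sketch. The `HtmlStripSketch` class and its `Strip` method are hypothetical names invented for this example and are not part of the package; the package's own behaviour may differ.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; NOT the package's implementation.
public static class HtmlStripSketch
{
    // Whole <script>...</script> and <style>...</style> elements, contents included.
    private static readonly Regex ScriptOrStyle = new(
        @"<(script|style)\b[^>]*>.*?</\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.Compiled);

    // Any remaining tag, such as <p>, </b> or <br/>.
    private static readonly Regex Tag = new(@"<[^>]+>", RegexOptions.Compiled);

    // Runs of whitespace left behind once tags are gone.
    private static readonly Regex Whitespace = new(@"\s+", RegexOptions.Compiled);

    public static string Strip(string html)
    {
        // Replace with a space so words on either side of a tag do not fuse.
        string text = ScriptOrStyle.Replace(html, " ");
        text = Tag.Replace(text, " ");

        // Turn entities such as &amp; back into plain characters.
        text = WebUtility.HtmlDecode(text);

        return Whitespace.Replace(text, " ").Trim();
    }
}
```

A regex pass like this is not a full HTML parser and can mis-handle malformed markup or a `>` inside an attribute value, so treat it as a sketch of the idea rather than a drop-in replacement.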
API
IChunkingService
Task<IReadOnlyList<TextChunk>> ChunkAsync(string text, ChunkOptions options, CancellationToken cancellationToken = default);
ChunkOptions
| Property | Default | Description |
|---|---|---|
| ChunkSize | 512 | Maximum tokens per chunk |
| ChunkOverlap | 128 | Token overlap between consecutive chunks |
| MaxChunksPerDocument | 200 | Hard cap on chunks produced per document |
| StripHtml | true | Strip HTML tags before chunking |
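To make the interaction between `ChunkSize`, `ChunkOverlap`, and `MaxChunksPerDocument` concrete, the sketch below computes window boundaries for a generic sliding window over a token sequence. It assumes each window advances by `ChunkSize - ChunkOverlap` tokens and that the final window may be shorter than `ChunkSize`. The `SlidingWindowSketch` class and `Windows` method are hypothetical names for this example; the package's actual boundary handling is not documented here and may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of a token sliding window; NOT the package's implementation.
public static class SlidingWindowSketch
{
    // Returns (Start, Length) token ranges for a document of tokenCount tokens.
    // Default values mirror the ChunkOptions defaults listed in the table above.
    public static List<(int Start, int Length)> Windows(
        int tokenCount, int chunkSize = 512, int chunkOverlap = 128, int maxChunks = 200)
    {
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (chunkOverlap < 0 || chunkOverlap >= chunkSize)
            throw new ArgumentOutOfRangeException(nameof(chunkOverlap));

        var windows = new List<(int Start, int Length)>();
        int step = chunkSize - chunkOverlap; // tokens advanced per window

        for (int start = 0; start < tokenCount && windows.Count < maxChunks; start += step)
        {
            // The last window is truncated to the remaining tokens.
            int length = Math.Min(chunkSize, tokenCount - start);
            windows.Add((start, length));

            // Stop once a window reaches the end of the document.
            if (start + length >= tokenCount) break;
        }

        return windows;
    }
}
```

Under these assumptions, a 1,000-token document with the default options yields three windows starting at tokens 0, 384, and 768, with the last one holding the remaining 232 tokens. A very long document stops at 200 windows regardless of its length.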
TextChunk
record TextChunk(string Text, int ChunkIndex, int TotalChunks, int TokenCount);
ITokenEncoder
Abstraction over the tokenizer. The default implementation, TiktokenEncoder, uses the cl100k_base encoding (compatible with nomic-embed-text, text-embedding-ada-002, and similar models).
DI Registration
// Registers TiktokenEncoder as ITokenEncoder and SlideWindowChunkingService as IChunkingService
services.AddChunking();
Supported Frameworks
net9.0, net10.0
Versioning
This package follows SemVer 2.0. While the version is 0.*, minor bumps may include breaking changes. See ADR-0021 for the full versioning policy.
License
MIT
| Product | Compatible and additional computed target framework versions |
|---|---|
| .NET | net9.0 is compatible. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
Dependencies
- net10.0
  - Microsoft.Extensions.DependencyInjection.Abstractions (>= 10.0.8)
  - Tiktoken (>= 2.0.3)
- net9.0
  - Microsoft.Extensions.DependencyInjection.Abstractions (>= 10.0.8)
  - Tiktoken (>= 2.0.3)
| Version | Downloads | Last Updated |
|---|---|---|
| 0.0.1-alpha.2 | 32 | 5/22/2026 |
| 0.0.1-alpha.1 | 174 | 5/20/2026 |
Initial alpha release. API surface is unstable until v1.0.0.