Chonkie.Embeddings
0.1.0-preview.87
dotnet add package Chonkie.Embeddings --version 0.1.0-preview.87
Chonkie.Net - The Lightweight RAG Ingestion Library
Chonkie.Net is an experimental .NET/C# port of Python Chonkie, providing fast, efficient, and robust text chunking for Retrieval-Augmented Generation (RAG) systems. This is an independent port and is not officially affiliated with the original Chonkie project.
Key Features
- Fast & Efficient - 10-100x faster than Python implementations
- 11 Specialized Chunkers - Choose the right chunker for your data type
- 7 Embedding Providers - OpenAI, Azure, Gemini, Cohere, VoyageAI, Jina, and ONNX local models
- 9 Vector Database Integrations - Pinecone, Qdrant, Chroma, Weaviate, MongoDB, Pgvector, Elasticsearch, Milvus, Turbopuffer
- 5 LLM Providers - OpenAI, Azure, Groq, Cerebras, Gemini
- ONNX Support - Local embeddings with SentenceTransformers
- Complete RAG Pipeline - End-to-end document processing for RAG
- No Dependency Bloat - Minimal, modular architecture
- Type-Safe - Full support for C# 14 nullable reference types
- 900+ Tests - Comprehensive unit and integration test suite
Quick Start
Installation
dotnet add package Chonkie.Net --prerelease
Basic Chunking (30 seconds)
using Chonkie.Chunkers;
using Chonkie.Tokenizers;

// Create a chunker
var chunker = new RecursiveChunker(
    tokenizer: new WordTokenizer(),
    chunkSize: 512
);

// Chunk your text
var text = "Your document here...";
var chunks = chunker.Chunk(text);

// Use the chunks
foreach (var chunk in chunks)
{
    Console.WriteLine($"Text: {chunk.Text}");
    Console.WriteLine($"Tokens: {chunk.TokenCount}");
}
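To make the `chunkSize` idea concrete, here is a minimal, dependency-free sketch of fixed-size chunking over whitespace-delimited words. It does not use Chonkie.Net and is not how `RecursiveChunker` is implemented; `ChunkByWords` is a hypothetical helper written only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Chonkie.Net): splits text into chunks of
// at most `chunkSize` whitespace-delimited words.
List<(string Text, int TokenCount)> ChunkByWords(string text, int chunkSize)
{
    if (chunkSize <= 0)
        throw new ArgumentOutOfRangeException(nameof(chunkSize));

    var words = text.Split(
        new[] { ' ', '\t', '\r', '\n' },
        StringSplitOptions.RemoveEmptyEntries);
    var pieces = new List<(string Text, int TokenCount)>();

    // Walk the word array in windows of `chunkSize`; the last window may be shorter.
    for (var start = 0; start < words.Length; start += chunkSize)
    {
        var count = Math.Min(chunkSize, words.Length - start);
        pieces.Add((string.Join(" ", words, start, count), count));
    }
    return pieces;
}

foreach (var piece in ChunkByWords("one two three four five", chunkSize: 2))
{
    Console.WriteLine($"Text: {piece.Text} | Tokens: {piece.TokenCount}");
}
```

A real tokenizer usually counts subword tokens rather than words, so the counts reported by the library may differ from this word-based approximation.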
With Embeddings & Vector Database
using Chonkie.Embeddings;
using Chonkie.Handshakes;

// Create embeddings
var embeddings = new OpenAIEmbeddings(
    apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY")!
);

// Create vector database connection
var vectorDb = new PineconeHandshake(
    apiKey: "your-pinecone-key",
    indexName: "my-index",
    embeddingModel: embeddings
);

// Store chunks with embeddings (vectorDb embeds internally).
// `chunks` is the result of chunker.Chunk(text) from the Basic Chunking example.
await vectorDb.WriteAsync(chunks);
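The example above reads one key with the null-forgiving `!` operator and hardcodes the other. One way to fail early with a clear message is a small guard around `Environment.GetEnvironmentVariable`. This is a sketch only: `RequireEnv` and the `DEMO_API_KEY` variable name are hypothetical and not part of Chonkie.Net.

```csharp
using System;

// Hypothetical helper (not part of Chonkie.Net): returns the value of an
// environment variable, or throws a descriptive error if it is missing or blank.
string RequireEnv(string name)
{
    var value = Environment.GetEnvironmentVariable(name);
    if (string.IsNullOrWhiteSpace(value))
        throw new InvalidOperationException(
            $"Environment variable '{name}' is not set.");
    return value;
}

// Demo only: set a placeholder so the sketch runs standalone.
// In real use, export the variable in your shell or secret store instead.
Environment.SetEnvironmentVariable("DEMO_API_KEY", "placeholder");
var key = RequireEnv("DEMO_API_KEY");
Console.WriteLine($"Loaded a key of length {key.Length}.");
```

The returned string could then be passed wherever an `apiKey` argument is expected, keeping secrets out of source code.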
Documentation
- Quick Start Guide - Get started in 5 minutes
- RAG System Tutorial - Build a complete RAG system
- Chunker Selection Guide - Choose the right chunker
- Vector Database Integration - Connect to any vector DB
- Python Migration Guide - Coming from Python Chonkie?
Chunkers (11 Types)
| Chunker | Best For | Speed |
|---|---|---|
| TokenChunker | Simple, fast splitting | ⚡⚡⚡ |
| RecursiveChunker | Natural documents (RECOMMENDED) | ⚡⚡ |
| SentenceChunker | Sentence boundaries | ⚡⚡ |
| SemanticChunker | Meaning-aware grouping | ⚡ |
| CodeChunker | Source code | ⚡⚡ |
| TableChunker | Structured data | ⚡⚡ |
| MarkdownChunker | Markdown documents | ⚡⚡ |
| LateChunker | Two-stage processing | ⚡ |
| NeuralChunker | ONNX embeddings | ⚡ |
| SlumberChunker | Complex documents | ⚡ |
| FastChunker | High-speed splitting | ⚡⚡⚡ |
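As a rough illustration of the idea behind recursive chunking, the sketch below splits text on progressively finer separators (paragraph breaks, then sentence ends, then spaces) until each piece fits a word budget. It is a simplified stand-in, not the `RecursiveChunker` implementation: `SplitRecursively` and `CountWords` are hypothetical helpers, the separator list is an assumption, and there is no step that merges small pieces back together.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: counts whitespace-delimited words.
int CountWords(string s) =>
    s.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries).Length;

// Hypothetical helper (not part of Chonkie.Net): splits on coarse separators
// first and only falls back to finer ones for pieces that are still too long.
List<string> SplitRecursively(string text, int maxWords, int level = 0)
{
    // Assumed separator hierarchy: paragraphs, sentences, words.
    // Note: splitting on ". " drops the period at the split point.
    string[] separators = { "\n\n", ". ", " " };
    var output = new List<string>();
    var trimmed = text.Trim();
    if (trimmed.Length == 0) return output;

    // Stop when the piece fits, or when no finer separator is left.
    if (CountWords(trimmed) <= maxWords || level >= separators.Length)
    {
        output.Add(trimmed);
        return output;
    }

    foreach (var segment in trimmed.Split(separators[level], StringSplitOptions.RemoveEmptyEntries))
    {
        output.AddRange(SplitRecursively(segment, maxWords, level + 1));
    }
    return output;
}

var sample = "First paragraph has five words.\n\nSecond one. It has two sentences here.";
foreach (var part in SplitRecursively(sample, maxWords: 5))
{
    Console.WriteLine(part);
}
```

Trying coarse separators first keeps paragraphs and sentences intact whenever they already fit, which is why this family of chunkers tends to suit natural-language documents.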
Embeddings (7 Providers)
- OpenAI
- Azure OpenAI
- Google Gemini
- Cohere
- VoyageAI
- Jina
- Local ONNX (SentenceTransformers)
LLM Providers (5 Types)
- OpenAI
- Azure OpenAI
- Groq (fast inference)
- Cerebras (ultra-fast)
- Google Gemini
Vector Databases (9 Integrations)
- Pinecone - Fully managed serverless
- Qdrant - Open-source vector search
- Chroma - Lightweight local embedding DB
- Weaviate - Open-source, flexible
- MongoDB - MongoDB Atlas Vector Search
- PostgreSQL - pgvector extension
- Elasticsearch - Search-optimized
- Milvus - High-performance distributed
- Turbopuffer - Real-time, edge-optimized
Common Use Cases
1. Document Ingestion for RAG
// Chunk documents, embed, and store in vector DB
var chunks = chunker.Chunk(document);
await vectorDb.WriteAsync(chunks);
2. Code Analysis
var codeChunker = new CodeChunker(
    tokenizer: new WordTokenizer(),
    chunkSize: 1024
);
var chunks = codeChunker.Chunk(sourceCode);
3. Semantic Search
var semanticChunker = new SemanticChunker(
    tokenizer: new WordTokenizer(),
    embeddingModel: embeddings,
    threshold: 0.5f
);
var chunks = semanticChunker.Chunk(text);
// Chunks grouped by semantic meaning
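The README does not spell out how `threshold` is applied. A common approach, assumed here, is to compare the embeddings of neighbouring sentences with cosine similarity and start a new group whenever the similarity drops below the threshold. The sketch below uses hand-written toy vectors instead of a real embedding model; `CosineSimilarity` and `GroupBySimilarity` are hypothetical helpers, not Chonkie.Net APIs.

```csharp
using System;
using System.Collections.Generic;

// Cosine similarity of two equal-length vectors; returns 0 for zero vectors.
float CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same length.");

    float dot = 0f, normA = 0f, normB = 0f;
    for (var i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if (normA == 0f || normB == 0f) return 0f;
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}

// Hypothetical helper: starts a new group whenever a sentence is less similar
// to its predecessor than `threshold`.
List<List<string>> GroupBySimilarity(
    IReadOnlyList<string> sentences, IReadOnlyList<float[]> vectors, float threshold)
{
    if (sentences.Count != vectors.Count)
        throw new ArgumentException("Each sentence needs exactly one vector.");

    var groups = new List<List<string>>();
    for (var i = 0; i < sentences.Count; i++)
    {
        var startNew = i == 0 || CosineSimilarity(vectors[i - 1], vectors[i]) < threshold;
        if (startNew) groups.Add(new List<string>());
        groups[^1].Add(sentences[i]);
    }
    return groups;
}

// Toy 2-D "embeddings": the first two point the same way, the third does not.
var demoSentences = new[] { "Cats purr.", "Dogs bark.", "Stocks fell." };
var demoVectors = new[]
{
    new[] { 1.0f, 0.0f },
    new[] { 0.9f, 0.1f },
    new[] { 0.0f, 1.0f },
};

var demoGroups = GroupBySimilarity(demoSentences, demoVectors, threshold: 0.5f);
foreach (var group in demoGroups)
{
    Console.WriteLine(string.Join(" ", group));
}
```

Under this assumption, raising the threshold produces more, smaller groups, while lowering it merges loosely related sentences into fewer, larger ones.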
4. RAG Pipeline
// RunAsync is asynchronous, so await it to get the pipeline result
var result = await new Pipeline()
    .ProcessWith("text")
    .ChunkWith("recursive", new { chunk_size = 1024 })
    .RunAsync(texts: documentText);
Why Chonkie.Net?
✅ Type Safety - Full C# 14 support
✅ Almost Production Ready - 900+ tests, zero warnings
✅ Extensively Documented - Tutorials and guides
✅ Complete Features - Feature parity with Python Chonkie, all major RAG components included
Minimum Requirements
- .NET 10.0 or higher
- C# 14 features enabled
- Windows, Linux, or macOS
Contributing
Contributions are welcome! Please visit the GitHub repository at https://github.com/gianni-rg/Chonkie.Net.
License
Licensed under Apache License 2.0. See LICENSE for details.
Learn More
- Official Repo: https://github.com/gianni-rg/Chonkie.Net
- Python Chonkie: https://github.com/chonkie-inc/chonkie
- Documentation: Check the /docs folder in the repository
| Product | Compatible and additional computed target framework versions |
|---|---|
| .NET | net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
- net10.0
  - Azure.AI.OpenAI (>= 2.8.0-beta.1)
  - Chonkie.Core (>= 0.1.0-preview.87)
  - Microsoft.Extensions.AI (>= 10.2.0)
  - Microsoft.Extensions.AI.Abstractions (>= 10.2.0)
  - Microsoft.Extensions.AI.OpenAI (>= 10.2.0-preview.1.26063.2)
  - Microsoft.Extensions.Logging.Abstractions (>= 10.0.2)
  - Microsoft.ML.OnnxRuntime (>= 1.24.1)
  - Microsoft.ML.Tokenizers (>= 2.0.0)
  - OllamaSharp (>= 5.4.16)
  - OpenAI (>= 2.8.0)
  - System.Numerics.Tensors (>= 10.0.2)
NuGet packages (5)
Showing the top 5 NuGet packages that depend on Chonkie.Embeddings:
| Package | Downloads |
|---|---|
| Chonkie.Net: Meta-package that depends on all Chonkie.Net libraries. | |
| Chonkie.Handshakes: The lightweight ingestion library for fast, efficient and robust RAG pipelines. Chonkie.Net provides production-ready chunkers, embeddings, vector database integrations, and complete RAG system support. | |
| Chonkie.Refineries: The lightweight ingestion library for fast, efficient and robust RAG pipelines. Chonkie.Net provides production-ready chunkers, embeddings, vector database integrations, and complete RAG system support. | |
| Chonkie.Pipeline: The lightweight ingestion library for fast, efficient and robust RAG pipelines. Chonkie.Net provides production-ready chunkers, embeddings, vector database integrations, and complete RAG system support. | |
| Chonkie.Chunkers: The lightweight ingestion library for fast, efficient and robust RAG pipelines. Chonkie.Net provides production-ready chunkers, embeddings, vector database integrations, and complete RAG system support. | |
GitHub repositories
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 0.1.0-preview.87 | 86 | 2/16/2026 |