AiGeekSquad.AIContext
1.0.42
A comprehensive C# library for AI-powered context management, providing intelligent text processing capabilities for modern AI applications. This library combines semantic text chunking and Maximum Marginal Relevance (MMR) algorithms to help you build better RAG systems, search engines, and content recommendation platforms.
✨ Key Features
- 🧠 Semantic Text Chunking: Intelligent text splitting based on semantic similarity analysis
- 🎯 Maximum Marginal Relevance (MMR): High-performance algorithm for relevance-diversity balance
- ⚖️ Generic Ranking Engine: Multi-criteria ranking with weighted scoring functions and normalization strategies
- 🛠️ Extensible Architecture: Dependency injection ready with clean interfaces
- 📊 High Performance: Optimized for .NET 9.0 with comprehensive benchmarks
🚀 Quick Start
Installation
dotnet add package AiGeekSquad.AIContext
Basic Usage
Semantic Text Chunking
using AiGeekSquad.AIContext.Chunking;
// Create a chunker with your embedding provider
var tokenCounter = new MLTokenCounter();
var embeddingGenerator = new YourEmbeddingProvider(); // Implement IEmbeddingGenerator
var chunker = SemanticTextChunker.Create(tokenCounter, embeddingGenerator);
// Configure chunking for your use case
var options = new SemanticChunkingOptions
{
    MaxTokensPerChunk = 512,
    MinTokensPerChunk = 10,
    BreakpointPercentileThreshold = 0.75 // Higher = fewer, more selective semantic breaks
};
// Process a document with metadata
var text = @"
Artificial intelligence is transforming how we work and live. Machine learning
algorithms can process vast amounts of data to find patterns humans might miss.
In the business world, companies adopt AI for customer service, fraud detection,
and process automation. Chatbots handle routine inquiries while algorithms
detect suspicious transactions in real-time.";
var metadata = new Dictionary<string, object>
{
    ["Source"] = "AI Technology Overview",
    ["DocumentId"] = "doc-123"
};
await foreach (var chunk in chunker.ChunkAsync(text, metadata, options))
{
    Console.WriteLine($"Chunk {chunk.StartIndex}-{chunk.EndIndex}:");
    Console.WriteLine($" Text: {chunk.Text.Trim()}");
    Console.WriteLine($" Tokens: {chunk.Metadata["TokenCount"]}");
    Console.WriteLine($" Segments: {chunk.Metadata["SegmentCount"]}");
    Console.WriteLine();
}
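To make the role of BreakpointPercentileThreshold more concrete, here is a minimal sketch of percentile-based breakpoint detection in general. It is not the library's implementation: the helper names (CosineDistance, Percentile, FindBreakpoints) and the toy 2-D embeddings are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy 2-D "embeddings" for five consecutive sentences (hypothetical values).
var embeddings = new List<double[]>
{
    new[] { 1.0, 0.0 },
    new[] { 0.9, 0.1 },
    new[] { 0.0, 1.0 }, // topic shift happens here
    new[] { 0.1, 0.9 },
    new[] { 0.2, 0.8 }
};

var breaks = FindBreakpoints(embeddings, percentile: 0.75);
Console.WriteLine($"Break after sentence index: {string.Join(", ", breaks)}");

// Cosine distance = 1 - cosine similarity.
static double CosineDistance(double[] a, double[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (var i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return 1.0 - dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Linear-interpolation percentile of a non-empty sequence, with p in [0, 1].
static double Percentile(IEnumerable<double> values, double p)
{
    var sorted = values.OrderBy(v => v).ToArray();
    var position = p * (sorted.Length - 1);
    var lower = (int)Math.Floor(position);
    var upper = (int)Math.Ceiling(position);
    return sorted[lower] + (sorted[upper] - sorted[lower]) * (position - lower);
}

// Returns each index i where the distance between sentence i and i + 1
// is above the percentile cut-off.
static List<int> FindBreakpoints(IReadOnlyList<double[]> vectors, double percentile)
{
    var distances = new List<double>();
    for (var i = 0; i < vectors.Count - 1; i++)
        distances.Add(CosineDistance(vectors[i], vectors[i + 1]));

    var cutoff = Percentile(distances, percentile);
    return Enumerable.Range(0, distances.Count)
        .Where(i => distances[i] > cutoff)
        .ToList();
}
```

In this sketch only the jump between the second and third sentence exceeds the cut-off, so a single break is reported after index 1. Raising the percentile toward 1.0 leaves fewer distances above the cut-off.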
Maximum Marginal Relevance for Diverse Results
using MathNet.Numerics.LinearAlgebra;
using AiGeekSquad.AIContext.Ranking;
// Simulate document embeddings (from your vector database)
var documents = new List<Vector<double>>
{
    Vector<double>.Build.DenseOfArray(new double[] { 0.9, 0.1, 0.0 }),   // ML intro
    Vector<double>.Build.DenseOfArray(new double[] { 0.85, 0.15, 0.0 }), // Advanced ML (similar!)
    Vector<double>.Build.DenseOfArray(new double[] { 0.1, 0.8, 0.1 }),   // Sports content
    Vector<double>.Build.DenseOfArray(new double[] { 0.0, 0.1, 0.9 })    // Cooking content
};
var documentTitles = new[]
{
    "Introduction to Machine Learning",
    "Advanced Machine Learning Techniques", // Very similar to first
    "Basketball Training Guide",
    "Italian Cooking Recipes"
};
// User query: interested in machine learning
var query = Vector<double>.Build.DenseOfArray(new double[] { 0.9, 0.1, 0.0 });
// Compare pure relevance vs MMR
Console.WriteLine("Pure Relevance (λ = 1.0):");
var pureRelevance = MaximumMarginalRelevance.ComputeMMR(
    vectors: documents, query: query, lambda: 1.0, topK: 3);
foreach (var (index, score) in pureRelevance)
    Console.WriteLine($" {documentTitles[index]} (score: {score:F3})");
Console.WriteLine("\nMMR Balanced (λ = 0.7):");
var mmrResults = MaximumMarginalRelevance.ComputeMMR(
    vectors: documents, query: query, lambda: 0.7, topK: 3);
foreach (var (index, score) in mmrResults)
    Console.WriteLine($" {documentTitles[index]} (score: {score:F3})");
// Lowering λ increases the penalty for documents similar to ones already selected
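For readers who want to see the selection rule itself, the following is a minimal sketch of textbook MMR using plain arrays and cosine similarity. At each step it picks the candidate maximizing λ · sim(candidate, query) − (1 − λ) · max sim(candidate, already selected). The names Cosine and SelectMmr are hypothetical, and the library's ComputeMMR may differ in details such as the similarity measure or tie-breaking.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var docs = new List<double[]>
{
    new[] { 0.9, 0.1, 0.0 },   // ML intro
    new[] { 0.85, 0.15, 0.0 }, // Advanced ML (near-duplicate of the first)
    new[] { 0.1, 0.8, 0.1 },   // Sports
    new[] { 0.0, 0.1, 0.9 }    // Cooking
};
var q = new[] { 0.9, 0.1, 0.0 };

Console.WriteLine(string.Join(", ", SelectMmr(docs, q, lambda: 1.0, topK: 3))); // 0, 1, 2
Console.WriteLine(string.Join(", ", SelectMmr(docs, q, lambda: 0.3, topK: 3))); // 0, 3, 2

static double Cosine(double[] a, double[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (var i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Greedy MMR: returns the indices of the selected vectors in selection order.
static List<int> SelectMmr(IReadOnlyList<double[]> vectors, double[] query, double lambda, int topK)
{
    var selected = new List<int>();
    var remaining = Enumerable.Range(0, vectors.Count).ToList();

    while (selected.Count < topK && remaining.Count > 0)
    {
        var best = remaining[0];
        var bestScore = double.NegativeInfinity;
        foreach (var i in remaining)
        {
            var relevance = Cosine(vectors[i], query);
            // Redundancy is the highest similarity to anything already picked.
            var redundancy = selected.Count == 0
                ? 0.0
                : selected.Max(j => Cosine(vectors[i], vectors[j]));
            var score = lambda * relevance - (1 - lambda) * redundancy;
            if (score > bestScore)
            {
                best = i;
                bestScore = score;
            }
        }
        selected.Add(best);
        remaining.Remove(best); // removes the value, not the position
    }
    return selected;
}
```

With these toy vectors the near-duplicate is so close to the query that, in this sketch, it is still picked second at λ = 0.7; it only drops out at lower values such as λ = 0.3. How strongly λ needs to favor diversity depends on your data.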
⚖️ Generic Ranking Engine for Multi-Criteria Ranking
The Generic Ranking Engine combines multiple scoring functions with configurable weights and normalization strategies to rank items based on multiple criteria. It supports positive weights for similarity scoring and negative weights for dissimilarity scoring.
Key Features:
- Multiple scoring functions with individual weights and normalizers
- Built-in normalizers: MinMax, ZScore, Percentile
- Combination strategies: WeightedSum, Reciprocal Rank Fusion, Hybrid
- Extensible architecture for custom scoring functions and strategies
using System;
using System.Collections.Generic;
using System.Linq;
using AiGeekSquad.AIContext.Ranking;
using AiGeekSquad.AIContext.Ranking.Normalizers;
using AiGeekSquad.AIContext.Ranking.Strategies;
// Example: Ranking search results with multiple criteria
public class SearchResult
{
    public string Title { get; set; }
    public double RelevanceScore { get; set; }
    public int PopularityRank { get; set; }
    public DateTime PublishedDate { get; set; }
}
// Custom scoring functions
public class RelevanceScorer : IScoringFunction<SearchResult>
{
    public string Name => "Relevance";
    public double ComputeScore(SearchResult item) => item.RelevanceScore;
    public double[] ComputeScores(IReadOnlyList<SearchResult> items) =>
        items.Select(ComputeScore).ToArray();
}
public class PopularityScorer : IScoringFunction<SearchResult>
{
    public string Name => "Popularity";
    public double ComputeScore(SearchResult item) => 1.0 / item.PopularityRank;
    public double[] ComputeScores(IReadOnlyList<SearchResult> items) =>
        items.Select(ComputeScore).ToArray();
}
// Create search results
var results = new List<SearchResult>
{
    new() { Title = "AI Guide", RelevanceScore = 0.9, PopularityRank = 5 },
    new() { Title = "ML Tutorial", RelevanceScore = 0.7, PopularityRank = 1 },
    new() { Title = "Data Science", RelevanceScore = 0.8, PopularityRank = 3 }
};
// Configure scoring functions with weights and normalization
var scoringFunctions = new List<WeightedScoringFunction<SearchResult>>
{
    new(new RelevanceScorer(), weight: 0.7) { Normalizer = new MinMaxNormalizer() },
    new(new PopularityScorer(), weight: 0.3) { Normalizer = new ZScoreNormalizer() }
};
// Rank using WeightedSum strategy
var engine = new RankingEngine<SearchResult>();
var rankedResults = engine.Rank(results, scoringFunctions, new WeightedSumStrategy());
foreach (var result in rankedResults)
{
    Console.WriteLine($"Rank {result.Rank}: {result.Item.Title} (Score: {result.FinalScore:F3})");
}
Available Normalizers:
- MinMaxNormalizer: Scales scores to the [0, 1] range
- ZScoreNormalizer: Standardizes scores using mean and standard deviation
- PercentileNormalizer: Converts scores to percentile ranks
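As a rough illustration of what min-max and z-score normalization do to a set of raw scores, here is a small standalone sketch. The functions MinMax and ZScore are hypothetical helpers written for this example; they are not the library's normalizer classes, and the handling of all-equal input is a convention chosen here.

```csharp
using System;
using System.Linq;

var scores = new[] { 2.0, 4.0, 6.0 };
var minMax = MinMax(scores);
var zScores = ZScore(scores);
for (var i = 0; i < scores.Length; i++)
    Console.WriteLine($"{scores[i]} -> min-max {minMax[i]:F3}, z-score {zScores[i]:F3}");

// Rescales values linearly so the smallest maps to 0 and the largest to 1.
static double[] MinMax(double[] values)
{
    var min = values.Min();
    var max = values.Max();
    var range = max - min;
    // All-equal input has no spread; map everything to 0.5 (sketch convention).
    return values.Select(v => range == 0 ? 0.5 : (v - min) / range).ToArray();
}

// Centers values on the mean and divides by the population standard deviation.
static double[] ZScore(double[] values)
{
    var mean = values.Average();
    var std = Math.Sqrt(values.Sum(v => (v - mean) * (v - mean)) / values.Length);
    return values.Select(v => std == 0 ? 0.0 : (v - mean) / std).ToArray();
}
```

Min-max output is bounded, which makes weights easy to reason about, while z-scores are unbounded and centered on zero, so mixing the two in one ranking changes how much each weight effectively contributes.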
Available Strategies:
- WeightedSumStrategy: Simple weighted combination of normalized scores
- ReciprocalRankFusionStrategy: Combines rankings using reciprocal rank fusion
- HybridStrategy: Combines multiple strategies with configurable weights
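Reciprocal rank fusion is easiest to grasp from a tiny example. The sketch below implements the usual formulation, where each item receives the sum of 1 / (k + rank) over all input rankings. The function name ReciprocalRankFusion, the sample rankings, and k = 60 (a commonly used constant) are assumptions for illustration; the library's ReciprocalRankFusionStrategy may use different defaults.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Two rankings of the same three items, best first (hypothetical data).
var byRelevance = new[] { "AI Guide", "Data Science", "ML Tutorial" };
var byPopularity = new[] { "ML Tutorial", "AI Guide", "Data Science" };

var fused = ReciprocalRankFusion(new[] { byRelevance, byPopularity });
foreach (var (item, score) in fused)
    Console.WriteLine($"{item}: {score:F5}");

// Sums 1 / (k + rank) for every item across all rankings, then sorts by the total.
static List<(string Item, double Score)> ReciprocalRankFusion(
    IEnumerable<IReadOnlyList<string>> rankings, int k = 60)
{
    var totals = new Dictionary<string, double>();
    foreach (var ranking in rankings)
    {
        for (var position = 0; position < ranking.Count; position++)
        {
            var rank = position + 1; // ranks are 1-based
            totals.TryGetValue(ranking[position], out var current); // current is 0 if absent
            totals[ranking[position]] = current + 1.0 / (k + rank);
        }
    }
    return totals
        .OrderByDescending(pair => pair.Value)
        .Select(pair => (Item: pair.Key, Score: pair.Value))
        .ToList();
}
```

Because only rank positions matter, this approach needs no score normalization, which is why it is often used when the underlying scores are on incomparable scales.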
🎯 Real-World Examples
Complete RAG System Pipeline
using System.Collections.Generic;
using System.Linq;
using AiGeekSquad.AIContext.Chunking;
using AiGeekSquad.AIContext.Ranking;
// Assumes chunker, metadata, and embeddingGenerator are set up as in the Quick Start;
// vectorDb and llm stand in for your own vector store and LLM client.
// 1. INDEXING: Chunk documents for vector storage
var documents = new[] { "AI research paper content...", "ML tutorial content..." };
var allChunks = new List<TextChunk>();
foreach (var doc in documents)
{
    await foreach (var chunk in chunker.ChunkAsync(doc, metadata))
    {
        allChunks.Add(chunk);
        // Store chunk.Text and embedding in your vector database
    }
}
// 2. RETRIEVAL: User asks a question
var userQuestion = "What are the applications of machine learning?";
var queryEmbedding = await embeddingGenerator.GenerateEmbeddingAsync(userQuestion);
// Get candidate chunks from vector database (similarity search)
var candidates = await vectorDb.SearchSimilarAsync(queryEmbedding, topK: 20);
// 3. CONTEXT SELECTION: Use MMR for diverse, relevant context
var selectedContext = MaximumMarginalRelevance.ComputeMMR(
    vectors: candidates.Select(c => c.Embedding).ToList(),
    query: queryEmbedding,
    lambda: 0.8, // Prioritize relevance but ensure diversity
    topK: 5      // Limit context for LLM token limits
);
// 4. GENERATION: Send to LLM with selected context
var contextText = string.Join("\n\n",
    selectedContext.Select(s => candidates[s.Index].Text));
var prompt = $"Context:\n{contextText}\n\nQuestion: {userQuestion}\nAnswer:";
var response = await llm.GenerateAsync(prompt);
Smart Document Processing
// Custom splitter for legal documents
var legalSplitter = SentenceTextSplitter.WithPattern(
    @"(?<=\d+\.)\s+(?=[A-Z])"); // Split on numbered sections
var chunker = SemanticTextChunker.Create(tokenCounter, embeddingGenerator, legalSplitter);
// Process with domain-specific options
var options = new SemanticChunkingOptions
{
    MaxTokensPerChunk = 1024,           // Larger chunks for legal context
    BreakpointPercentileThreshold = 0.8 // More conservative splitting
};
await foreach (var chunk in chunker.ChunkAsync(legalDocument, metadata, options))
{
    // Each chunk maintains legal context integrity
    await indexService.AddChunkAsync(chunk);
}
Content Recommendation with Diversity
// User has read these articles (represented as embeddings)
var userHistory = new List<Vector<double>> { /* user's read articles */ };
// Available articles to recommend
var availableArticles = new List<(string title, Vector<double> embedding)>
{
    ("Machine Learning Basics", mlBasicsEmbedding),
    ("Advanced ML Techniques", advancedMlEmbedding), // Similar to above
    ("Data Science Career Guide", dataScienceEmbedding),
    ("Python Programming Tips", pythonEmbedding)
};
// User's interests (derived from their history)
var userInterestVector = ComputeUserInterestVector(userHistory);
// Get diverse recommendations (avoid recommending similar content)
var recommendations = MaximumMarginalRelevance.ComputeMMR(
    vectors: availableArticles.Select(a => a.embedding).ToList(),
    query: userInterestVector,
    lambda: 0.6, // Balance relevance with diversity
    topK: 3
);
foreach (var (index, score) in recommendations)
{
    Console.WriteLine($"Recommended: {availableArticles[index].title}");
}
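ComputeUserInterestVector is not defined in the example above. One simple way to write it, assumed here purely for illustration, is to average the embeddings of the articles the user has read. The sketch uses plain double[] arrays instead of MathNet vectors so that it runs without extra packages.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reading history: two 3-dimensional embeddings.
var history = new List<double[]>
{
    new[] { 1.0, 0.0, 0.0 },
    new[] { 0.0, 1.0, 0.0 }
};

var interest = ComputeUserInterestVector(history);
Console.WriteLine(string.Join(", ", interest));

// Averages the history embeddings component-wise (the centroid of what the user has read).
static double[] ComputeUserInterestVector(IReadOnlyList<double[]> history)
{
    if (history.Count == 0)
        throw new ArgumentException("History must contain at least one embedding.", nameof(history));

    var dimension = history[0].Length;
    var mean = new double[dimension];
    foreach (var vector in history)
    {
        if (vector.Length != dimension)
            throw new ArgumentException("All embeddings must have the same dimension.", nameof(history));
        for (var i = 0; i < dimension; i++)
            mean[i] += vector[i] / history.Count;
    }
    return mean;
}
```

A plain average treats every article equally. Weighting recent reads more heavily is a common refinement, but that is a design decision outside what this README describes.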
📝 Markdown Support
The SentenceTextSplitter now supports markdown-aware text splitting, providing intelligent handling of markdown documents while preserving the structural integrity of markdown elements. This feature is especially powerful for processing documentation, README files, and other markdown content in RAG systems.
Key Features
- 🎯 Atomic Markdown Elements: Lists, headers, code blocks, and other markdown elements are treated as indivisible segments
- 📋 Comprehensive List Support: Handles unordered (-, *, +), ordered (1., 2.), and nested lists
- 💻 Code Preservation: Maintains fenced code blocks, indented code, and inline code as complete units
- 🔗 Link and Image Handling: Preserves markdown links and images within their containing paragraphs
- 📖 Blockquote Support: Treats blockquote lines as atomic segments
- 🏷️ Header Recognition: Each markdown header becomes a separate segment
Factory Methods
using AiGeekSquad.AIContext.Chunking;
// Enable markdown-aware splitting with default sentence pattern
var markdownSplitter = SentenceTextSplitter.ForMarkdown();
// Use custom pattern with markdown awareness
var customMarkdownSplitter = SentenceTextSplitter.WithPatternForMarkdown(@"(?<=\.)\s+(?=[A-Z])");
// Compare with regular mode (backward compatible)
var regularSplitter = SentenceTextSplitter.Default;
Markdown vs Regular Mode Comparison
var markdownText = @"
# Introduction
This is a paragraph with a sentence. Another sentence here.
- First list item with text
- Second item
  - Nested item
" + "```csharp" + @"
var code = ""Hello World"";
" + "```" + @"
> This is a blockquote.
> Second line of quote.
";
// Regular mode - splits by sentences, ignores markdown structure
var regularSplitter = SentenceTextSplitter.Default;
await foreach (var segment in regularSplitter.SplitAsync(markdownText))
{
    Console.WriteLine($"Regular: {segment.Text}");
}
// Markdown mode - preserves markdown structure
var markdownSplitter = SentenceTextSplitter.ForMarkdown();
await foreach (var segment in markdownSplitter.SplitAsync(markdownText))
{
    Console.WriteLine($"Markdown: {segment.Text}");
}
Regular Mode Output:
Regular: # Introduction
Regular: This is a paragraph with a sentence.
Regular: Another sentence here.
Regular: - First list item with text
Regular: - Second item
Regular: - Nested item
// ... splits code block and quotes by sentences
Markdown Mode Output:
Markdown: # Introduction
Markdown: This is a paragraph with a sentence.
Markdown: Another sentence here.
Markdown: - First list item with text
Markdown: - Second item
Markdown: - Nested item
Markdown: ```csharp
var code = "Hello World";
List Handling Examples
Markdown mode excels at handling various list types as atomic segments:
Unordered Lists
var splitter = SentenceTextSplitter.ForMarkdown();
var listText = @"
- First item with multiple sentences. This stays together!
* Second item using asterisk
+ Third item using plus sign
";
await foreach (var segment in splitter.SplitAsync(listText))
{
    Console.WriteLine($"List item: {segment.Text}");
}
// Output:
// List item: - First item with multiple sentences. This stays together!
// List item: * Second item using asterisk
// List item: + Third item using plus sign
Ordered Lists
var orderedText = @"
1. First numbered item
2. Second item with details. Multiple sentences preserved.
3. Third item
";
await foreach (var segment in splitter.SplitAsync(orderedText))
{
    Console.WriteLine($"Ordered: {segment.Text}");
}
// Each numbered item becomes one segment, regardless of internal sentences
Nested Lists
var nestedText = @"
- Parent item
  - Child item one
  - Child item two
    * Grandchild item
- Another parent
";
await foreach (var segment in splitter.SplitAsync(nestedText))
{
    Console.WriteLine($"Nested: '{segment.Text}'");
}
// Output preserves indentation:
// Nested: '- Parent item'
// Nested: '  - Child item one'
// Nested: '  - Child item two'
// Nested: '    * Grandchild item'
// Nested: '- Another parent'
Best Practices
- Documentation Processing: Use markdown mode when processing technical documentation, README files, or any structured markdown content
- Preserve Context: Markdown elements like lists and code blocks maintain their formatting and context
- RAG Systems: Ideal for RAG systems that need to preserve markdown structure for better context retrieval
- Mixed Content: Handles documents that mix regular prose with markdown elements seamlessly
When to Use Each Mode
Use Case | Regular Mode | Markdown Mode |
---|---|---|
Plain text documents | ✅ | ❌ |
Email content | ✅ | ❌ |
Technical documentation | ❌ | ✅ |
README files | ❌ | ✅ |
API documentation | ❌ | ✅ |
Mixed markdown/prose | ❌ | ✅ |
Code-heavy documents | ❌ | ✅ |
⚙️ Configuration
Chunking Options
Option | Default | Description |
---|---|---|
MaxTokensPerChunk | 512 | Maximum tokens per chunk |
MinTokensPerChunk | 10 | Minimum tokens per chunk |
BreakpointPercentileThreshold | 0.75 | Semantic breakpoint sensitivity |
BufferSize | 1 | Context window for embedding generation |
EnableEmbeddingCaching | true | Cache embeddings for performance |
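The table describes BufferSize only as a context window for embedding generation. One plausible reading, assumed here and not confirmed by the package documentation, is that each sentence is embedded together with up to BufferSize neighbouring sentences on each side. The helper BuildWindows below is a hypothetical sketch of that idea.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var sentences = new[] { "A.", "B.", "C.", "D." };
var windows = BuildWindows(sentences, bufferSize: 1);
foreach (var window in windows)
    Console.WriteLine(window);
// Prints:
// A. B.
// A. B. C.
// B. C. D.
// C. D.

// For each sentence, joins it with up to bufferSize neighbours on each side.
static List<string> BuildWindows(IReadOnlyList<string> sentences, int bufferSize)
{
    var result = new List<string>();
    for (var i = 0; i < sentences.Count; i++)
    {
        var start = Math.Max(0, i - bufferSize);
        var end = Math.Min(sentences.Count - 1, i + bufferSize);
        result.Add(string.Join(" ", sentences.Skip(start).Take(end - start + 1)));
    }
    return result;
}
```

Under this reading, a larger buffer smooths the embeddings of adjacent sentences, which tends to make distances between neighbours smaller and topic shifts more gradual.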
Custom Text Splitters
The SentenceTextSplitter class provides intelligent sentence boundary detection with support for both regular text and markdown content:
- Default Pattern: Optimized for English text with built-in handling of common titles and abbreviations
- Handled Abbreviations: Mr., Mrs., Ms., Dr., Prof., Sr., Jr.
- Custom Patterns: Create domain-specific splitters for specialized content
- 🆕 Markdown Support: New markdown-aware mode preserves markdown structure and elements
// Default splitter - handles English titles automatically
var defaultSplitter = SentenceTextSplitter.Default;
// Custom pattern for numbered sections (e.g., legal documents)
var customSplitter = SentenceTextSplitter.WithPattern(@"(?<=\.)\s+(?=\d+\.)");
// NEW: Markdown-aware splitters
var markdownSplitter = SentenceTextSplitter.ForMarkdown();
var customMarkdownSplitter = SentenceTextSplitter.WithPatternForMarkdown(@"(?<=\.)\s+(?=[A-Z])");
// Use with semantic chunker
var chunker = SemanticTextChunker.Create(tokenCounter, embeddingGenerator, customSplitter);
// Use markdown splitter for documentation processing
var markdownChunker = SemanticTextChunker.Create(tokenCounter, embeddingGenerator, markdownSplitter);
Note: The default pattern prevents incorrect sentence breaks after common English titles like "Dr. Smith" or "Mrs. Johnson", ensuring better semantic coherence in your chunks. For markdown content, use the new markdown-aware factory methods to preserve the structural integrity of lists, code blocks, headers, and other markdown elements.
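To illustrate how a pattern can avoid breaking after titles, here is a small standalone regex sketch. It is not the library's default pattern, which this README does not show; the expression below is an assumption written for this example and covers only the titles listed above.

```csharp
using System;
using System.Text.RegularExpressions;

// Split on whitespace that follows sentence-ending punctuation and precedes a capital letter,
// unless the punctuation belongs to one of the listed titles.
var pattern = new Regex(@"(?<!\b(?:Mr|Mrs|Ms|Dr|Prof|Sr|Jr)\.)(?<=[.!?])\s+(?=[A-Z])");

var parts = pattern.Split("Dr. Smith met Mrs. Johnson. They talked. It went well!");
foreach (var part in parts)
    Console.WriteLine(part);
// Prints:
// Dr. Smith met Mrs. Johnson.
// They talked.
// It went well!
```

The negative lookbehind relies on .NET's support for variable-length lookbehind, so the same expression may not work unchanged in other regex engines.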
🏗️ Core Interfaces
Implement these interfaces to integrate with your AI infrastructure:
// Implement for your embedding provider
public interface IEmbeddingGenerator
{
    IAsyncEnumerable<Vector<double>> GenerateBatchEmbeddingsAsync(
        IEnumerable<string> texts,
        CancellationToken cancellationToken = default);
}
// Implement for custom text splitting
public interface ITextSplitter
{
    IAsyncEnumerable<TextSegment> SplitAsync(
        string text,
        CancellationToken cancellationToken = default);
}
// Implement for custom token counting
public interface ITokenCounter
{
    Task<int> CountTokensAsync(string text, CancellationToken cancellationToken = default);
}
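As an example of implementing one of these interfaces, the sketch below provides a very rough token counter that treats whitespace-separated words as tokens. WhitespaceTokenCounter is a hypothetical class, and the interface is redeclared locally only so the snippet compiles on its own; in a real project you would reference the library's ITokenCounter. Word counts are not model tokens, so this is suitable for tests or prototypes only.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

ITokenCounter counter = new WhitespaceTokenCounter();
var count = await counter.CountTokensAsync("Chatbots handle routine inquiries.");
Console.WriteLine(count); // 4

// Local copy of the interface shown above so the sketch is self-contained.
public interface ITokenCounter
{
    Task<int> CountTokensAsync(string text, CancellationToken cancellationToken = default);
}

// Approximates tokens as whitespace-separated words (not a real model tokenizer).
public sealed class WhitespaceTokenCounter : ITokenCounter
{
    private static readonly char[] Separators = { ' ', '\t', '\r', '\n' };

    public Task<int> CountTokensAsync(string text, CancellationToken cancellationToken = default)
    {
        cancellationToken.ThrowIfCancellationRequested();
        var count = string.IsNullOrWhiteSpace(text)
            ? 0
            : text.Split(Separators, StringSplitOptions.RemoveEmptyEntries).Length;
        return Task.FromResult(count);
    }
}
```

A stub like this keeps unit tests fast and deterministic, while production code can swap in a tokenizer that matches the target model.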
📊 Performance
- Semantic Chunking: Streaming processing with IAsyncEnumerable for large documents
- MMR Algorithm: ~2ms for 1,000 vectors, ~120KB memory allocation
- Token Counting: Real GPT-4 compatible tokenizer using Microsoft.ML.Tokenizers
📦 Dependencies
- MathNet.Numerics (>= 5.0.0): Vector operations and similarity calculations
- Microsoft.ML.Tokenizers (>= 1.0.2): Real tokenization for accurate token counting
- Markdig (>= 0.41.3): Markdown parsing
- .NET Standard 2.1: Target framework listed in the package metadata for 1.0.42
📖 Additional Resources
- Repository: Source code and development information
- MMR Documentation: Detailed MMR algorithm documentation
- Examples: Sample implementations and use cases
- API Reference: Complete API documentation
🌟 Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Wiki
Built with ❤️ for the AI community by AiGeekSquad
Product | Versions Compatible and additional computed target framework versions. |
---|---|
.NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
.NET Core | netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
.NET Standard | netstandard2.1 is compatible. |
MonoAndroid | monoandroid was computed. |
MonoMac | monomac was computed. |
MonoTouch | monotouch was computed. |
Tizen | tizen60 was computed. |
Xamarin.iOS | xamarinios was computed. |
Xamarin.Mac | xamarinmac was computed. |
Xamarin.TVOS | xamarintvos was computed. |
Xamarin.WatchOS | xamarinwatchos was computed. |
- .NETStandard 2.1
  - Markdig (>= 0.41.3)
  - MathNet.Numerics (>= 5.0.0)
  - Microsoft.ML.Tokenizers (>= 1.0.2)
  - Microsoft.ML.Tokenizers.Data.Cl100kBase (>= 1.0.2)
NuGet packages (1)
Showing the top 1 NuGet packages that depend on AiGeekSquad.AIContext:
Package | Downloads |
---|---|
AiGeekSquad.AIContext.MEAI
Microsoft Extensions AI Abstractions adapter for AiGeekSquad.AIContext semantic chunking library. Enables seamless integration between Microsoft's AI abstractions and AIContext's semantic text chunking capabilities by providing an adapter that converts between Microsoft's IEmbeddingGenerator interface and AIContext's embedding requirements. |
GitHub repositories
This package is not used by any popular GitHub repositories.
Version | Downloads | Last Updated |
---|---|---|
1.0.42 | 119 | 8/19/2025 |
1.0.41 | 138 | 8/14/2025 |
1.0.39 | 138 | 8/14/2025 |
1.0.38 | 134 | 8/14/2025 |
1.0.37 | 136 | 8/14/2025 |
1.0.35 | 216 | 8/6/2025 |
1.0.33 | 200 | 8/4/2025 |
1.0.32 | 112 | 7/29/2025 |
1.0.31 | 113 | 7/29/2025 |
1.0.30 | 109 | 7/29/2025 |
1.0.27 | 109 | 7/29/2025 |
1.0.26 | 479 | 7/22/2025 |
1.0.25 | 475 | 7/22/2025 |
1.0.24 | 476 | 7/22/2025 |
1.0.21 | 476 | 7/22/2025 |
1.0.20 | 475 | 7/21/2025 |
1.0.19 | 74 | 7/11/2025 |
1.0.18 | 70 | 7/11/2025 |
1.0.17 | 80 | 7/11/2025 |
1.0.16 | 78 | 7/11/2025 |
1.0.15 | 79 | 7/11/2025 |
1.0.14 | 87 | 7/11/2025 |