FluxCurator.Core 0.1.0

There is a newer version of this package available.
See the version list below for details.
dotnet add package FluxCurator.Core --version 0.1.0
                    
NuGet\Install-Package FluxCurator.Core -Version 0.1.0
                    
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="FluxCurator.Core" Version="0.1.0" />
                    
For projects that support PackageReference, copy this XML node into the project file to reference the package.
<PackageVersion Include="FluxCurator.Core" Version="0.1.0" />
                    
Directory.Packages.props
<PackageReference Include="FluxCurator.Core" />
                    
Project file
For projects that support Central Package Management (CPM), copy this XML node into the solution Directory.Packages.props file to version the package.
paket add FluxCurator.Core --version 0.1.0
                    
#r "nuget: FluxCurator.Core, 0.1.0"
                    
#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.
#:package FluxCurator.Core@0.1.0
                    
#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.
#addin nuget:?package=FluxCurator.Core&version=0.1.0
                    
Install as a Cake Addin
#tool nuget:?package=FluxCurator.Core&version=0.1.0
                    
Install as a Cake Tool

FluxCurator

Clean, protect, and chunk your text for RAG pipelines — no dependencies required.

NuGet .NET License

Overview

FluxCurator is a text preprocessing library for RAG (Retrieval-Augmented Generation) pipelines. It provides PII masking, content filtering, and intelligent text chunking with first-class Korean language support.

Zero Dependencies Philosophy: Core functionality (FluxCurator.Core) works standalone with no external dependencies. The main package (FluxCurator) adds optional LocalEmbedder integration for semantic chunking.

Features

  • PII Masking - Auto-detect and mask emails, phone numbers, Korean RRN, credit cards
  • Content Filtering - Filter harmful content with customizable rules and blocklists
  • Smart Chunking - Rule-based chunking (sentence, paragraph, token)
  • Semantic Chunking - Embedding-based chunking for semantic boundaries
  • Hierarchical Chunking - Document structure-aware chunking with parent-child relationships
  • Korean-First Design - Optimized for Korean text (습니다체, 해요체, sentence endings)
  • Multi-Language Support - 11 languages including Korean, English, Japanese, Chinese
  • Pipeline Processing - Combine filtering, masking, and chunking in one call
  • Dependency Injection - Full DI support with IServiceCollection extensions
  • FileFlux Integration - Seamless integration with FileFlux document processing

Installation

# Main package (includes LocalEmbedder for semantic chunking)
dotnet add package FluxCurator

# Core package only (zero dependencies)
dotnet add package FluxCurator.Core

Quick Start

Basic Chunking

using FluxCurator;
using FluxCurator.Core.Domain;

// Create curator with default options
var curator = new FluxCurator();

// Chunk text using sentence strategy
var chunks = await curator.ChunkAsync(text);

foreach (var chunk in chunks)
{
    Console.WriteLine($"Chunk {chunk.Index + 1}/{chunk.TotalChunks}:");
    Console.WriteLine(chunk.Content);
    Console.WriteLine($"Tokens: ~{chunk.Metadata.EstimatedTokenCount}");
}

Dependency Injection

// Program.cs or Startup.cs
services.AddFluxCurator(options =>
{
    options.DefaultChunkOptions = ChunkOptions.ForRAG;
    options.EnablePIIMasking = true;
    options.EnableContentFiltering = true;
});

// Or with LocalEmbedder for semantic chunking
services.AddFluxCuratorWithLocalEmbedder(options =>
{
    options.DefaultChunkOptions = new ChunkOptions
    {
        Strategy = ChunkingStrategy.Semantic,
        TargetChunkSize = 512
    };
});

Using IChunkerFactory

// Inject IChunkerFactory for flexible chunker creation
public class MyService
{
    private readonly IChunkerFactory _chunkerFactory;

    public MyService(IChunkerFactory chunkerFactory)
    {
        _chunkerFactory = chunkerFactory;
    }

    public async Task<IReadOnlyList<DocumentChunk>> ProcessAsync(string text)
    {
        // Create specific chunker
        var chunker = _chunkerFactory.CreateChunker(ChunkingStrategy.Hierarchical);
        return await chunker.ChunkAsync(text, ChunkOptions.Default);
    }
}

PII Masking

// Enable PII masking
var curator = new FluxCurator()
    .WithPIIMasking();

// Mask PII in text
var result = curator.MaskPII("Contact: 010-1234-5678, Email: test@example.com");
Console.WriteLine(result.MaskedText);
// Output: "Contact: [PHONE], Email: [EMAIL]"

Korean RRN Detection

var curator = new FluxCurator()
    .WithPIIMasking(PIIMaskingOptions.ForKorean);

var result = curator.MaskPII("RRN: 901231-1234567");
// Output: "RRN: [RRN]"
// Validates using Modulo-11 checksum algorithm

Hierarchical Chunking

var curator = new FluxCurator()
    .WithChunkingOptions(opt =>
    {
        opt.Strategy = ChunkingStrategy.Hierarchical;
        opt.MaxChunkSize = 1024;
    });

var chunks = await curator.ChunkAsync(markdownText);

foreach (var chunk in chunks)
{
    // Access hierarchy information
    var level = chunk.Metadata.Custom?["HierarchyLevel"];
    var parentId = chunk.Metadata.Custom?["ParentId"];
    var sectionPath = chunk.Location.SectionPath;

    Console.WriteLine($"[Level {level}] {sectionPath}");
    Console.WriteLine(chunk.Content);
}

Full Pipeline Processing

// Complete preprocessing pipeline
var curator = new FluxCurator()
    .WithContentFiltering()
    .WithPIIMasking()
    .WithChunkingOptions(ChunkOptions.ForKorean);

// Process: Filter → Mask PII → Chunk
var result = await curator.PreprocessAsync(text);

Console.WriteLine(result.GetSummary());
// Output: "Produced 5 chunk(s). Filtered 2 content item(s). Masked 3 PII item(s)."

Semantic Chunking

// With LocalEmbedder integration (auto-loaded via DI)
var curator = new FluxCurator()
    .UseEmbedder(myEmbedder)
    .WithChunkingOptions(opt =>
    {
        opt.Strategy = ChunkingStrategy.Semantic;
        opt.SemanticSimilarityThreshold = 0.5f;
    });

var chunks = await curator.ChunkAsync(text);
// Chunks at natural semantic boundaries

Chunking Strategies

Strategy Description Embedder Required Best For
Auto Automatically select best strategy No General use
Sentence Split by sentence boundaries No Conversational text
Paragraph Split by paragraph boundaries No Structured documents
Token Split by token count No Consistent chunk sizes
Semantic Split by semantic similarity Yes RAG applications
Hierarchical Preserve document structure with parent-child relationships No Technical docs, Markdown

Supported Languages

FluxCurator includes language profiles for accurate sentence detection and token estimation:

Language Code Features
Korean ko 습니다체/해요체 endings, Korean sentence markers
English en Standard sentence boundaries
Japanese ja Japanese sentence endings (。、!?)
Chinese (Simplified) zh Chinese punctuation
Chinese (Traditional) zh-TW Traditional Chinese support
Spanish es Spanish punctuation
French fr French punctuation
German de German punctuation
Portuguese pt Portuguese punctuation
Russian ru Cyrillic support
Arabic ar RTL and Arabic punctuation

PII Types Supported

Type Description Validation
Email Email addresses TLD validation
Phone Phone numbers (KR/US/International) Format validation
KoreanRRN Korean Resident Registration Number Modulo-11 checksum
CreditCard Credit card numbers Luhn algorithm
KoreanBRN Korean Business Registration Number Format validation

Configuration Options

ChunkOptions

var options = new ChunkOptions
{
    Strategy = ChunkingStrategy.Sentence,
    TargetChunkSize = 512,
    MinChunkSize = 100,
    MaxChunkSize = 1024,
    OverlapSize = 50,
    LanguageCode = "ko",  // null = auto-detect
    PreserveSentences = true,
    PreserveParagraphs = true,
    SemanticSimilarityThreshold = 0.5f
};

// Preset configurations
ChunkOptions.Default       // General purpose
ChunkOptions.ForRAG        // Optimized for RAG (512 target, semantic)
ChunkOptions.ForKorean     // Optimized for Korean (400 target)
ChunkOptions.FixedSize(256, 32)  // Fixed token size with overlap

Masking Strategies

Strategy Example Output
Token [EMAIL], [PHONE]
Asterisk ****@****.com
Redact [REDACTED]
Partial jo**@ex****.com
Hash [HASH:a1b2c3d4]
Remove (empty)

Integration with Iyulab Ecosystem

FluxCurator is part of the Iyulab open-source RAG ecosystem:

┌─────────────────────────────────────────────────────────────┐
│                    Foundation Layer                          │
├─────────────────────────────────────────────────────────────┤
│  LocalEmbedder    LocalReranker    FluxCurator  FluxImprover│
│  (Embeddings)     (Reranking)      (Chunking)   (LLM-based) │
└───────────┬───────────────────────────┬─────────────────────┘
            │                           │
            ▼                           ▼
┌───────────────────────────────────────────────────────────────┐
│                    Processing Layer                           │
├───────────────────────────────────────────────────────────────┤
│        FileFlux (Document Processing)    WebFlux (Web)        │
└───────────────────────────┬───────────────────────────────────┘
                            │
                            ▼
┌───────────────────────────────────────────────────────────────┐
│                    Storage Layer                              │
├───────────────────────────────────────────────────────────────┤
│                    FluxIndex (Vector DB)                      │
└───────────────────────────┬───────────────────────────────────┘
                            │
                            ▼
┌───────────────────────────────────────────────────────────────┐
│                    Application Layer                          │
├───────────────────────────────────────────────────────────────┤
│                        Filer (App)                            │
└───────────────────────────────────────────────────────────────┘

FileFlux Integration

using FileFlux.Infrastructure.Strategies;
using FileFlux.Infrastructure.Adapters;

// Use FluxCurator chunking in FileFlux
var chunkerFactory = new ChunkerFactory(embedder);
var strategy = new FluxCuratorChunkingStrategy(
    chunkerFactory,
    ChunkingStrategy.Hierarchical);

var chunks = await strategy.ChunkAsync(documentContent, options);

// Convert between chunk types
var fileFluxChunks = fluxCuratorChunks.ToFileFluxChunks();
var curatorChunks = fileFluxChunks.ToFluxCuratorChunks();

Project Structure

FluxCurator/
├── src/
│   ├── FluxCurator.Core/              # Zero-dependency core
│   │   ├── Core/                      # Interfaces
│   │   │   ├── IChunker.cs
│   │   │   ├── IChunkerFactory.cs
│   │   │   ├── IEmbedder.cs
│   │   │   └── ILanguageProfile.cs
│   │   ├── Domain/                    # Models
│   │   │   ├── ChunkOptions.cs
│   │   │   ├── DocumentChunk.cs
│   │   │   ├── ChunkingStrategy.cs
│   │   │   └── PIIMaskingOptions.cs
│   │   └── Infrastructure/            # Implementations
│   │       ├── Chunking/
│   │       │   ├── ChunkerBase.cs
│   │       │   ├── SentenceChunker.cs
│   │       │   ├── ParagraphChunker.cs
│   │       │   ├── TokenChunker.cs
│   │       │   ├── SemanticChunker.cs
│   │       │   └── HierarchicalChunker.cs
│   │       └── Languages/
│   │           ├── LanguageProfileRegistry.cs
│   │           ├── KoreanLanguageProfile.cs
│   │           └── EnglishLanguageProfile.cs
│   │
│   └── FluxCurator/                   # Main package
│       ├── Infrastructure/
│       │   └── Chunking/
│       │       └── ChunkerFactory.cs  # Factory with all strategies
│       ├── ServiceCollectionExtensions.cs
│       └── FluxCurator.cs             # Main API
│
└── docs/                              # Documentation
    ├── getting-started.md
    ├── chunking-strategies.md
    ├── di-integration.md
    └── fileflux-integration.md

Documentation

Roadmap

  • Core chunking strategies (Sentence, Paragraph, Token)
  • Korean language profile with 11 language support
  • Language detection
  • Batch processing
  • PII masking (Korean RRN, phone, email, credit card)
  • Content filtering
  • Semantic chunking
  • Hierarchical chunking
  • Dependency Injection support
  • FileFlux integration
  • Additional language profiles (Vietnamese, Thai)
  • Custom detector registration
  • Streaming chunk support

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE for details.


Part of the Iyulab Open Source Ecosystem

Product Compatible and additional computed target framework versions.
.NET net10.0 is compatible.  net10.0-android was computed.  net10.0-browser was computed.  net10.0-ios was computed.  net10.0-maccatalyst was computed.  net10.0-macos was computed.  net10.0-tvos was computed.  net10.0-windows was computed. 
Compatible target framework(s)
Included target framework(s) (in package)
Learn more about Target Frameworks and .NET Standard.
  • net10.0

    • No dependencies.

NuGet packages (1)

Showing the top 1 NuGet packages that depend on FluxCurator.Core:

Package Downloads
FluxCurator

Text preprocessing library for RAG pipelines: PII masking, content filtering, and intelligent chunking including semantic chunking with Korean language support.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last Updated
0.6.6 98 12/28/2025
0.6.5 300 12/22/2025
0.6.4 302 12/18/2025
0.6.3 267 12/17/2025
0.6.2 282 12/17/2025
0.6.1 268 12/16/2025
0.6.0 231 12/14/2025
0.5.1 410 12/11/2025
0.5.0 990 12/1/2025
0.4.0 492 12/1/2025
0.3.0 484 12/1/2025
0.2.0 411 12/1/2025
0.1.0 354 11/30/2025