WebFlux 0.1.2

WebFlux

A .NET SDK for preprocessing web content for RAG (Retrieval-Augmented Generation) systems.

Overview

WebFlux is a .NET library that processes web content into chunks suitable for RAG systems. It handles the complete pipeline from web crawling to content chunking, with support for various content formats and processing strategies.

What is WebFlux?

WebFlux transforms web content into structured, semantic chunks optimized for retrieval systems. The library provides:

  • Content Extraction: Parse HTML, Markdown, JSON, XML, and other web formats
  • Content Analysis: Analyze document structure, quality, and metadata
  • Content Reconstruction: Optionally enhance content with LLM-based strategies
  • Content Chunking: Split content into semantic chunks with configurable strategies

Architecture

WebFlux follows an interface-based architecture where the library defines the contracts, and consuming applications provide implementations for AI services:

What WebFlux Provides:

  • Processing pipeline and orchestration
  • Content extraction and parsing
  • Chunking strategies and algorithms
  • Web crawling and metadata analysis
  • Interface definitions for AI services

What You Provide:

  • LLM service implementation (ITextCompletionService)
  • Embedding service implementation (ITextEmbeddingService)
  • Image processing implementation (IImageToTextService) - optional
  • Vector storage implementation (IVectorStore)

This design allows you to use any LLM provider (OpenAI, Anthropic, Azure, local models) while maintaining a consistent processing pipeline.

Features

  • 4-Stage Processing Pipeline: Extract → Analyze → Reconstruct → Chunk
  • Multiple Chunking Strategies: Auto, Smart, Semantic, Intelligent, MemoryOptimized, Paragraph, FixedSize
  • Content Reconstruction: Optional LLM-based enhancement with None, Summarize, Expand, Rewrite, Enrich strategies
  • Web Metadata Support: robots.txt, sitemap.xml, ai.txt, llms.txt, manifest.json, and 10+ other standards
  • Multimodal Processing: Text and image content processing
  • Streaming Support: Process large websites with AsyncEnumerable
  • Parallel Processing: Concurrent crawling and processing
  • Extensible Design: Implement custom extractors, strategies, and processors
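
The streaming feature can be pictured with a small, self-contained sketch of the IAsyncEnumerable pattern. The names StreamingSketch and CrawlAsync are hypothetical and are not part of the WebFlux API; the delay only stands in for fetching and processing a page.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Consume results as they become available instead of waiting for the whole crawl.
await foreach (var page in StreamingSketch.CrawlAsync(new[] { "https://example.com/a", "https://example.com/b" }))
{
    Console.WriteLine(page);
}

// Hypothetical stand-in for a streaming processor; not the WebFlux implementation.
public static class StreamingSketch
{
    public static async IAsyncEnumerable<string> CrawlAsync(IEnumerable<string> urls)
    {
        foreach (var url in urls)
        {
            await Task.Delay(10);            // placeholder for network and processing work
            yield return $"processed {url}"; // each result is handed to the caller immediately
        }
    }
}
```

Because each result is yielded as soon as it is ready, the caller never has to hold the output for the whole site in memory at once.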

Installation

NuGet Package Manager:

Install-Package WebFlux

dotnet CLI:

dotnet add package WebFlux

.csproj:

<PackageReference Include="WebFlux" Version="0.1.2" />

Quick Start

using WebFlux;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

// Register your AI service implementations
services.AddScoped<ITextCompletionService, YourLLMService>();
services.AddScoped<ITextEmbeddingService, YourEmbeddingService>();
services.AddScoped<IImageToTextService, YourVisionService>(); // Optional

// Register your vector store implementation
services.AddScoped<IVectorStore, YourVectorStore>();

// Register WebFlux
services.AddWebFlux();

var provider = services.BuildServiceProvider();
var processor = provider.GetRequiredService<IWebContentProcessor>();

// Process a website
var options = new CrawlOptions
{
    MaxDepth = 3,
    MaxPages = 100,
    RespectRobotsTxt = true
};

await foreach (var result in processor.ProcessWithProgressAsync("https://example.com", options))
{
    if (result.IsSuccess && result.Result != null)
    {
        foreach (var chunk in result.Result)
        {
            // Store chunks in your vector database
            // (StoreChunk is a placeholder for your own persistence helper)
            await StoreChunk(chunk);
        }
    }
}

Core Concepts

Processing Pipeline

WebFlux processes web content through four stages:

  1. Extract: Fetch and parse web content (HTML, Markdown, JSON, XML, PDF)
  2. Analyze: Analyze document structure, quality metrics, and metadata
  3. Reconstruct: Optionally enhance content using LLM strategies
  4. Chunk: Split content into semantic chunks for retrieval
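
As a rough illustration of how the four stages compose, the sketch below chains four stand-in functions. Everything in PipelineSketch is hypothetical and greatly simplified: extraction is a regex tag stripper, analysis only counts words and paragraphs, reconstruction behaves like the None strategy, and chunking splits on paragraph boundaries. It is not how WebFlux implements these stages.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

string html = "<h1>Title</h1><p>First paragraph.</p><p>Second paragraph.</p>";

string text = PipelineSketch.Extract(html);              // 1. Extract
var stats = PipelineSketch.Analyze(text);                // 2. Analyze
string reconstructed = PipelineSketch.Reconstruct(text); // 3. Reconstruct
List<string> chunks = PipelineSketch.Chunk(reconstructed); // 4. Chunk

Console.WriteLine($"{stats.WordCount} words, {stats.ParagraphCount} paragraphs");
foreach (var chunk in chunks) Console.WriteLine(chunk);

// Hypothetical stand-ins for the four stages; not the WebFlux implementation.
public static class PipelineSketch
{
    // Stage 1: turn closing block tags into paragraph breaks, then strip remaining tags.
    public static string Extract(string html)
    {
        string withBreaks = Regex.Replace(html, @"</(p|h[1-6]|div)>", "\n\n", RegexOptions.IgnoreCase);
        return Regex.Replace(withBreaks, "<[^>]+>", string.Empty).Trim();
    }

    // Stage 2: compute two trivial structure metrics.
    public static (int WordCount, int ParagraphCount) Analyze(string text)
    {
        int words = text.Split(new[] { ' ', '\n', '\r', '\t' }, StringSplitOptions.RemoveEmptyEntries).Length;
        int paragraphs = text.Split("\n\n", StringSplitOptions.RemoveEmptyEntries).Length;
        return (words, paragraphs);
    }

    // Stage 3: behaves like the None strategy and returns the text unchanged.
    public static string Reconstruct(string text) => text;

    // Stage 4: one chunk per paragraph.
    public static List<string> Chunk(string text) =>
        text.Split("\n\n", StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries).ToList();
}
```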

Chunking Strategies

Choose a chunking strategy based on your content and requirements:

Strategy         Use Case
Auto             Automatically selects the best strategy
Smart            HTML documentation, structured content
Semantic         General web pages, articles
Intelligent      Blogs, news, knowledge bases
MemoryOptimized  Large documents, memory constraints
Paragraph        Markdown docs, natural boundaries
FixedSize        Uniform chunks, testing
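
To make the idea behind a fixed-size strategy with overlap concrete, here is a minimal sketch. FixedSizeChunker is a hypothetical helper, not the WebFlux implementation, and it assumes sizes are measured in characters; the library may well count tokens or use a different unit.

```csharp
using System;
using System.Collections.Generic;

// Usage: 10-character chunks, each sharing 3 characters with its predecessor.
foreach (var chunk in FixedSizeChunker.Split("The quick brown fox jumps over the lazy dog", 10, 3))
{
    Console.WriteLine($"[{chunk}]");
}

// Hypothetical helper for illustration only.
public static class FixedSizeChunker
{
    public static List<string> Split(string text, int maxChunkSize, int overlapSize)
    {
        if (maxChunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxChunkSize));
        if (overlapSize < 0 || overlapSize >= maxChunkSize)
            throw new ArgumentOutOfRangeException(nameof(overlapSize));

        var chunks = new List<string>();
        int step = maxChunkSize - overlapSize; // how far the window advances each time

        for (int start = 0; start < text.Length; start += step)
        {
            int length = Math.Min(maxChunkSize, text.Length - start);
            chunks.Add(text.Substring(start, length));

            // Stop once the window has reached the end of the text,
            // so no trailing chunk consists only of overlap.
            if (start + length >= text.Length) break;
        }
        return chunks;
    }
}
```

The overlap repeats the tail of each chunk at the head of the next, which keeps a sentence that straddles a boundary retrievable from at least one chunk.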

Reconstruction Strategies

Optionally enhance content quality before chunking:

Strategy   Description                      Requires LLM
None       Use original content             No
Summarize  Create condensed version         Yes
Expand     Add explanations and examples    Yes
Rewrite    Improve clarity and consistency  Yes
Enrich     Add context and metadata         Yes

Note: LLM-based strategies require an ITextCompletionService implementation. If none is registered, the system automatically falls back to the "None" strategy and issues a warning.
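
The fallback behaviour described in the note could look roughly like the sketch below. ReconstructStrategy and StrategyResolver are hypothetical names chosen for illustration, and the warning callback is an assumption; WebFlux's actual types and logging mechanism are not shown in this README.

```csharp
using System;

// Usage: an LLM strategy is requested, but no completion service is available.
var effective = StrategyResolver.Resolve(
    ReconstructStrategy.Summarize,
    hasCompletionService: false,
    warn: message => Console.WriteLine($"warning: {message}"));

Console.WriteLine(effective);

// Hypothetical enum mirroring the strategy names in the table above.
public enum ReconstructStrategy { None, Summarize, Expand, Rewrite, Enrich }

// Hypothetical resolver; not the WebFlux implementation.
public static class StrategyResolver
{
    public static ReconstructStrategy Resolve(
        ReconstructStrategy requested,
        bool hasCompletionService,
        Action<string> warn)
    {
        // Every strategy except None needs an LLM.
        bool needsLlm = requested != ReconstructStrategy.None;

        if (needsLlm && !hasCompletionService)
        {
            warn($"{requested} requires a text completion service; falling back to None.");
            return ReconstructStrategy.None;
        }
        return requested;
    }
}
```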

Web Metadata Standards

WebFlux analyzes multiple web standards to optimize crawling and content extraction:

  • robots.txt: Crawling rules and permissions
  • sitemap.xml: Site structure and URL discovery
  • ai.txt: AI usage policies and guidelines
  • llms.txt: Site structure for AI agents
  • manifest.json: PWA metadata
  • security.txt: Security policies
  • .well-known: Standard metadata
  • And more (package.json, ads.txt, humans.txt, etc.)
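
As an example of how one of these files can steer a crawl, the sketch below reads the Disallow rules that robots.txt gives for the wildcard user agent and checks a path against them. RobotsSketch is a hypothetical helper, not WebFlux's parser, and it is deliberately simplified: it ignores Allow rules, wildcards inside paths, and agent-specific groups.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string robotsTxt = "User-agent: *\nDisallow: /private/\nDisallow: /tmp/\n\nUser-agent: OtherBot\nDisallow: /\n";

var disallowed = RobotsSketch.WildcardDisallows(robotsTxt);
Console.WriteLine(RobotsSketch.IsAllowed("/docs/intro", disallowed));
Console.WriteLine(RobotsSketch.IsAllowed("/private/report", disallowed));

// Hypothetical, simplified robots.txt reader for illustration only.
public static class RobotsSketch
{
    // Collects Disallow prefixes from groups that apply to "User-agent: *".
    public static List<string> WildcardDisallows(string robotsTxt)
    {
        var prefixes = new List<string>();
        bool inWildcardGroup = false;
        bool lastWasUserAgent = false;

        foreach (var rawLine in robotsTxt.Split('\n'))
        {
            string line = rawLine.Split('#')[0].Trim(); // drop comments and whitespace
            int colon = line.IndexOf(':');
            if (colon < 0) continue;                    // blank or malformed line

            string field = line.Substring(0, colon).Trim().ToLowerInvariant();
            string value = line.Substring(colon + 1).Trim();

            if (field == "user-agent")
            {
                bool isWildcard = value == "*";
                // Consecutive User-agent lines belong to the same group.
                inWildcardGroup = lastWasUserAgent ? (inWildcardGroup || isWildcard) : isWildcard;
                lastWasUserAgent = true;
                continue;
            }

            lastWasUserAgent = false;
            if (field == "disallow" && inWildcardGroup && value.Length > 0)
                prefixes.Add(value);
        }
        return prefixes;
    }

    // A path is allowed when no disallowed prefix matches its start.
    public static bool IsAllowed(string path, IEnumerable<string> disallowedPrefixes) =>
        !disallowedPrefixes.Any(prefix => path.StartsWith(prefix, StringComparison.Ordinal));
}
```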

Documentation

For detailed guides and advanced usage:

Example: Basic Usage

// Simple single-page processing
var processor = provider.GetRequiredService<IWebContentProcessor>();

var chunks = await processor.ChunkAsync(
    "https://example.com/article",
    new ChunkingOptions
    {
        Strategy = "Auto",
        MaxChunkSize = 512,
        OverlapSize = 64
    }
);

foreach (var chunk in chunks)
{
    Console.WriteLine($"Chunk {chunk.ChunkIndex}: {chunk.Content}");
}

Example: Custom Implementation

// Implement your LLM service
public class MyLLMService : ITextCompletionService
{
    public async Task<string> CompleteAsync(
        string prompt,
        TextCompletionOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        // Your implementation here
        return await CallYourLLMAPI(prompt, options, cancellationToken);
    }
}

// Register and use
services.AddScoped<ITextCompletionService, MyLLMService>();

Contributing

Contributions are welcome! Please see our contributing guidelines for details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

Target frameworks (.NET):

  • Compatible: net8.0, net9.0
  • Computed: net10.0, plus the android, browser, ios, maccatalyst, macos, tvos, and windows variants of net8.0, net9.0, and net10.0

NuGet packages (1)

Showing the top 1 NuGet packages that depend on WebFlux:

  • FluxIndex.Extensions.WebFlux: Web content processing integration for FluxIndex using the WebFlux library

GitHub repositories

This package is not used by any popular GitHub repositories.

Version  Downloads  Last Updated
0.1.2    35         10/2/2025
0.1.1    325        9/18/2025
0.1.0    245        9/17/2025