WebFlux 0.1.2

.NET 8.0

dotnet add package WebFlux --version 0.1.2

NuGet\Install-Package WebFlux -Version 0.1.2

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

<PackageReference Include="WebFlux" Version="0.1.2" />

For projects that support PackageReference, copy this XML node into the project file to reference the package.

Directory.Packages.props:

<PackageVersion Include="WebFlux" Version="0.1.2" />

Project file:

<PackageReference Include="WebFlux" />

For projects that support Central Package Management (CPM), copy the PackageVersion node into the solution's Directory.Packages.props file to version the package, and add the versionless PackageReference node to the project file.

paket add WebFlux --version 0.1.2

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

#r "nuget: WebFlux, 0.1.2"

#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

#:package WebFlux@0.1.2

#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.

Install as a Cake Addin:

#addin nuget:?package=WebFlux&version=0.1.2

Install as a Cake Tool:

#tool nuget:?package=WebFlux&version=0.1.2

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

WebFlux

A .NET SDK for preprocessing web content for RAG (Retrieval-Augmented Generation) systems.

Overview

WebFlux is a .NET library that processes web content into chunks suitable for RAG systems. It handles the complete pipeline from web crawling to content chunking, with support for various content formats and processing strategies.

What is WebFlux?

WebFlux transforms web content into structured, semantic chunks optimized for retrieval systems. The library provides:

Content Extraction: Parse HTML, Markdown, JSON, XML, and other web formats
Content Analysis: Analyze document structure, quality, and metadata
Content Reconstruction: Optionally enhance content with LLM-based strategies
Content Chunking: Split content into semantic chunks with configurable strategies

Architecture

WebFlux follows an interface-based architecture where the library defines the contracts, and consuming applications provide implementations for AI services:

What WebFlux Provides:

Processing pipeline and orchestration
Content extraction and parsing
Chunking strategies and algorithms
Web crawling and metadata analysis
Interface definitions for AI services

What You Provide:

LLM service implementation (ITextCompletionService)
Embedding service implementation (ITextEmbeddingService)
Image processing implementation (IImageToTextService) - optional
Vector storage implementation (IVectorStore)

This design allows you to use any LLM provider (OpenAI, Anthropic, Azure, local models) while maintaining a consistent processing pipeline.

Features

4-Stage Processing Pipeline: Extract → Analyze → Reconstruct → Chunk
Multiple Chunking Strategies: Auto, Smart, Semantic, Intelligent, MemoryOptimized, Paragraph, FixedSize
Content Reconstruction: Optional LLM-based enhancement with None, Summarize, Expand, Rewrite, Enrich strategies
Web Metadata Support: robots.txt, sitemap.xml, ai.txt, llms.txt, manifest.json, and 10+ other standards
Multimodal Processing: Text and image content processing
Streaming Support: Process large websites incrementally with IAsyncEnumerable
Parallel Processing: Concurrent crawling and processing
Extensible Design: Implement custom extractors, strategies, and processors
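
The streaming feature listed above builds on C# asynchronous streams. The following is a minimal sketch of that pattern only, not the WebFlux API: ProcessPagesAsync is a hypothetical stand-in for a crawler, and the delay simulates fetching a page. It shows how a consumer can handle each page as soon as it is ready instead of waiting for the whole crawl to finish.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for a crawler (not a WebFlux type): yields one result
// per URL as soon as that page has been "processed".
static async IAsyncEnumerable<string> ProcessPagesAsync(IEnumerable<string> urls)
{
    foreach (var url in urls)
    {
        await Task.Delay(10);            // simulate fetching and chunking the page
        yield return $"processed {url}"; // hand the result to the consumer immediately
    }
}

var pages = new[] { "https://example.com/", "https://example.com/docs" };

// Each iteration runs as soon as the producer yields the next item.
await foreach (var line in ProcessPagesAsync(pages))
{
    Console.WriteLine(line);
}
```

Because results are produced lazily, memory use stays proportional to one page rather than to the whole site, which is the usual reason to prefer this shape for large crawls.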

Installation

NuGet Package Manager:

Install-Package WebFlux

dotnet CLI:

dotnet add package WebFlux

.csproj:

<PackageReference Include="WebFlux" Version="0.1.2" />

Quick Start

using WebFlux;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

// Register your AI service implementations
services.AddScoped<ITextCompletionService, YourLLMService>();
services.AddScoped<ITextEmbeddingService, YourEmbeddingService>();
services.AddScoped<IImageToTextService, YourVisionService>(); // Optional

// Register your vector store implementation
services.AddScoped<IVectorStore, YourVectorStore>();

// Register WebFlux
services.AddWebFlux();

var provider = services.BuildServiceProvider();
var processor = provider.GetRequiredService<IWebContentProcessor>();

// Process a website
var options = new CrawlOptions
{
    MaxDepth = 3,
    MaxPages = 100,
    RespectRobotsTxt = true
};

await foreach (var result in processor.ProcessWithProgressAsync("https://example.com", options))
{
    if (result.IsSuccess && result.Result != null)
    {
        foreach (var chunk in result.Result)
        {
            // Store chunks in your vector database
            await StoreChunk(chunk);
        }
    }
}

Core Concepts

Processing Pipeline

WebFlux processes web content through four stages:

Extract: Fetch and parse web content (HTML, Markdown, JSON, XML, PDF)
Analyze: Analyze document structure, quality metrics, and metadata
Reconstruct: Optionally enhance content using LLM strategies
Chunk: Split content into semantic chunks for retrieval

Chunking Strategies

Choose a chunking strategy based on your content and requirements:

Strategy	Use Case
Auto	Automatically selects the best strategy
Smart	HTML documentation, structured content
Semantic	General web pages, articles
Intelligent	Blogs, news, knowledge bases
MemoryOptimized	Large documents, memory constraints
Paragraph	Markdown docs, natural boundaries
FixedSize	Uniform chunks, testing
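
To make the simplest row of the table concrete, here is a rough sketch of fixed-size chunking with overlap. It is an illustration under stated assumptions, not the library's implementation: SplitFixedSize is a hypothetical helper, and sizes are assumed to be measured in characters (WebFlux may count tokens or use other units).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of WebFlux: split text into windows of at most
// maxChunkSize characters, where each window repeats the last overlapSize
// characters of the previous one.
static List<string> SplitFixedSize(string text, int maxChunkSize, int overlapSize)
{
    if (maxChunkSize <= 0)
        throw new ArgumentOutOfRangeException(nameof(maxChunkSize));
    if (overlapSize < 0 || overlapSize >= maxChunkSize)
        throw new ArgumentOutOfRangeException(nameof(overlapSize));

    var chunks = new List<string>();
    int step = maxChunkSize - overlapSize; // how far the window advances each time

    for (int start = 0; start < text.Length; start += step)
    {
        int length = Math.Min(maxChunkSize, text.Length - start);
        chunks.Add(text.Substring(start, length));

        // Stop once the window has reached the end of the text.
        if (start + length >= text.Length)
            break;
    }

    return chunks;
}

var sample = SplitFixedSize("abcdefghij", maxChunkSize: 4, overlapSize: 1);
Console.WriteLine(string.Join(" | ", sample)); // prints "abcd | defg | ghij"
```

The overlap means a sentence cut at a chunk boundary still appears partly in the next chunk, which tends to help retrieval at the cost of some duplicated text.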

Reconstruction Strategies

Optionally enhance content quality before chunking:

Strategy	Description	Requires LLM
None	Use original content	No
Summarize	Create condensed version	Yes
Expand	Add explanations and examples	Yes
Rewrite	Improve clarity and consistency	Yes
Enrich	Add context and metadata	Yes

Note: LLM-based strategies require an ITextCompletionService implementation. If none is provided, the system automatically falls back to the None strategy and emits a warning.
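
The fallback described in the note can be pictured roughly as follows. This is a sketch of the decision logic only: ResolveStrategy and its warning text are hypothetical, and the library's actual mechanism and message wording are not shown in this document.

```csharp
#nullable enable
using System;

// Hypothetical helper, not the WebFlux implementation: decide which
// reconstruction strategy to run given whether an LLM service is available.
static (string Strategy, string? Warning) ResolveStrategy(string requested, bool hasCompletionService)
{
    // Per the table above, every strategy except None needs an LLM.
    bool needsLlm = requested is "Summarize" or "Expand" or "Rewrite" or "Enrich";

    if (needsLlm && !hasCompletionService)
    {
        return ("None", $"'{requested}' requires ITextCompletionService; falling back to 'None'.");
    }

    return (requested, null);
}

var (strategy, warning) = ResolveStrategy("Summarize", hasCompletionService: false);
if (warning is not null)
{
    Console.WriteLine($"warning: {warning}");
}
Console.WriteLine($"strategy: {strategy}");
```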

Web Metadata Standards

WebFlux analyzes multiple web standards to optimize crawling and content extraction:

robots.txt: Crawling rules and permissions
sitemap.xml: Site structure and URL discovery
ai.txt: AI usage policies and guidelines
llms.txt: Site structure for AI agents
manifest.json: PWA metadata
security.txt: Security policies
.well-known: Standard metadata
And more (package.json, ads.txt, humans.txt, etc.)
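
As an illustration of how one of these standards can steer a crawl, the sketch below checks a path against robots.txt rules. It is deliberately simplified and is not the WebFlux parser: IsPathAllowed is a hypothetical helper that reads only the "User-agent: *" group, uses plain prefix matching without * or $ wildcards, and lets the longest matching rule win, with Allow winning ties.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not the WebFlux parser. Simplifications: only the
// "User-agent: *" group is read, and rules are matched as plain prefixes.
static bool IsPathAllowed(string robotsTxt, string path)
{
    var rules = new List<(bool Allow, string Prefix)>();
    bool inWildcardGroup = false;
    bool lastLineWasUserAgent = false;

    foreach (var rawLine in robotsTxt.Split('\n'))
    {
        // Drop comments and surrounding whitespace (including '\r').
        var line = rawLine.Split('#')[0].Trim();
        int colon = line.IndexOf(':');
        if (colon <= 0) continue;

        var field = line.Substring(0, colon).Trim().ToLowerInvariant();
        var value = line.Substring(colon + 1).Trim();

        if (field == "user-agent")
        {
            // Consecutive User-agent lines belong to the same group.
            if (!lastLineWasUserAgent) inWildcardGroup = false;
            if (value == "*") inWildcardGroup = true;
            lastLineWasUserAgent = true;
            continue;
        }

        lastLineWasUserAgent = false;
        if (!inWildcardGroup || value.Length == 0) continue;

        if (field == "allow") rules.Add((true, value));
        else if (field == "disallow") rules.Add((false, value));
    }

    bool allowed = true; // no matching rule means the path is allowed
    int bestLength = -1;
    foreach (var (allow, prefix) in rules)
    {
        if (!path.StartsWith(prefix, StringComparison.Ordinal)) continue;
        if (prefix.Length > bestLength || (prefix.Length == bestLength && allow))
        {
            allowed = allow;
            bestLength = prefix.Length;
        }
    }
    return allowed;
}

var robots = "User-agent: *\nDisallow: /private/\nAllow: /private/docs/\n";
Console.WriteLine(IsPathAllowed(robots, "/blog/post"));          // no rule matches this path
Console.WriteLine(IsPathAllowed(robots, "/private/report"));     // matched by the Disallow rule
Console.WriteLine(IsPathAllowed(robots, "/private/docs/intro")); // the longer Allow rule wins
```

A production crawler would also need wildcard patterns, agent-specific groups, and caching of the fetched file, all of which this sketch leaves out.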

Documentation

For detailed guides and advanced usage:

Tutorial: Step-by-step installation and usage guide
Pipeline Design: Processing pipeline architecture
Interfaces: Interface contracts and implementations
Chunking Strategies: Detailed strategy guide

Example: Basic Usage

// Simple single-page processing
var processor = provider.GetRequiredService<IWebContentProcessor>();

var chunks = await processor.ChunkAsync(
    "https://example.com/article",
    new ChunkingOptions
    {
        Strategy = "Auto",
        MaxChunkSize = 512,
        OverlapSize = 64
    }
);

foreach (var chunk in chunks)
{
    Console.WriteLine($"Chunk {chunk.ChunkIndex}: {chunk.Content}");
}

Example: Custom Implementation

// Implement your LLM service
public class MyLLMService : ITextCompletionService
{
    public async Task<string> CompleteAsync(
        string prompt,
        TextCompletionOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        // Your implementation here
        return await CallYourLLMAPI(prompt, options, cancellationToken);
    }
}

// Register and use
services.AddScoped<ITextCompletionService, MyLLMService>();

Contributing

Contributions are welcome! Please see our contributing guidelines for details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

Issues: GitHub Issues
Documentation: docs/
NuGet: WebFlux Package

Product	Compatible and additional computed target framework versions.
.NET	net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 is compatible. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.


net8.0
- HtmlAgilityPack (>= 1.12.3)
- Markdig (>= 0.42.0)
- Microsoft.Extensions.Caching.Abstractions (>= 9.0.9)
- Microsoft.Extensions.Caching.Memory (>= 9.0.9)
- Microsoft.Extensions.Configuration (>= 9.0.9)
- Microsoft.Extensions.Configuration.Abstractions (>= 9.0.9)
- Microsoft.Extensions.Configuration.Binder (>= 9.0.9)
- Microsoft.Extensions.DependencyInjection (>= 9.0.9)
- Microsoft.Extensions.DependencyInjection.Abstractions (>= 9.0.9)
- Microsoft.Extensions.Http (>= 9.0.9)
- Microsoft.Extensions.Logging (>= 9.0.9)
- Microsoft.Extensions.Logging.Abstractions (>= 9.0.9)
- Microsoft.Playwright (>= 1.55.0)
- Polly (>= 8.6.3)
- Polly.Extensions.Http (>= 3.0.0)
- System.Text.Json (>= 9.0.9)
- System.Threading.Channels (>= 9.0.9)
- YamlDotNet (>= 16.3.0)
net9.0
- HtmlAgilityPack (>= 1.12.3)
- Markdig (>= 0.42.0)
- Microsoft.Extensions.Caching.Abstractions (>= 9.0.9)
- Microsoft.Extensions.Caching.Memory (>= 9.0.9)
- Microsoft.Extensions.Configuration (>= 9.0.9)
- Microsoft.Extensions.Configuration.Abstractions (>= 9.0.9)
- Microsoft.Extensions.Configuration.Binder (>= 9.0.9)
- Microsoft.Extensions.DependencyInjection (>= 9.0.9)
- Microsoft.Extensions.DependencyInjection.Abstractions (>= 9.0.9)
- Microsoft.Extensions.Http (>= 9.0.9)
- Microsoft.Extensions.Logging (>= 9.0.9)
- Microsoft.Extensions.Logging.Abstractions (>= 9.0.9)
- Microsoft.Playwright (>= 1.55.0)
- Polly (>= 8.6.3)
- Polly.Extensions.Http (>= 3.0.0)
- System.Text.Json (>= 9.0.9)
- System.Threading.Channels (>= 9.0.9)
- YamlDotNet (>= 16.3.0)

NuGet packages (1)

The top NuGet package that depends on WebFlux:

Package	Description	Downloads
FluxIndex.Extensions.WebFlux	Web content processing integration for FluxIndex using the WebFlux library	679

GitHub repositories

This package is not used by any popular GitHub repositories.

Version	Downloads	Last Updated
0.1.2	35	10/2/2025
0.1.1	325	9/18/2025
0.1.0	245	9/17/2025