Kreuzberg 4.3.7

C#

Extract text, tables, images, and metadata from 75+ file formats including PDF, Office documents, and images. .NET bindings with full type safety, async/await support, and .NET 10.0+ compatibility.

Installation

Package Installation

Install via NuGet:

dotnet add package Kreuzberg

Or via NuGet Package Manager:

Install-Package Kreuzberg

System Requirements

.NET 10.0+ required
Optional: ONNX Runtime version 1.24+ for embeddings support
Optional: Tesseract OCR for OCR functionality

Quick Start

Basic Extraction

Extract text, metadata, and structure from any supported document format:

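using System;
using Kreuzberg;

// Enable result caching and quality processing for this extraction.
var config = new ExtractionConfig
{
    UseCache = true,
    EnableQualityProcessing = true
};

// Synchronous extraction: blocks until the document has been processed.
var result = KreuzbergClient.ExtractFileSync("document.pdf", config);

Console.WriteLine(result.Content);
Console.WriteLine($"MIME Type: {result.MimeType}");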

Common Use Cases

Extract with Custom Configuration

Most use cases benefit from configuration to control extraction behavior:

With OCR (for scanned documents):

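using System;
using Kreuzberg;

var config = new ExtractionConfig
{
    Ocr = new OcrConfig
    {
        Backend = "tesseract",
        // Tesseract language codes joined with '+': English, German, French.
        Language = "eng+deu+fra",
        TesseractConfig = new TesseractConfig
        {
            // Tesseract page segmentation mode 3: fully automatic, no orientation/script detection.
            Psm = 3
        }
    }
};

var result = KreuzbergClient.ExtractFileSync("document.pdf", config);
Console.WriteLine(result.Content);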
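
The Language value above combines several Tesseract language codes with '+'. When the codes come from user input or settings, a small helper can build that string. The following is a minimal sketch; BuildOcrLanguage is a hypothetical name and not part of the Kreuzberg API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Kreuzberg): builds a Tesseract-style
// language string such as "eng+deu+fra" from individual language codes.
static string BuildOcrLanguage(IEnumerable<string> codes)
{
    var seen = new HashSet<string>(StringComparer.Ordinal);
    var ordered = new List<string>();

    foreach (var code in codes)
    {
        var trimmed = code.Trim();
        // Skip blanks and duplicates while keeping first-seen order.
        if (trimmed.Length > 0 && seen.Add(trimmed))
        {
            ordered.Add(trimmed);
        }
    }

    if (ordered.Count == 0)
    {
        throw new ArgumentException("At least one language code is required.", nameof(codes));
    }

    return string.Join("+", ordered);
}

var language = BuildOcrLanguage(new[] { "eng", "deu", "fra" });
Console.WriteLine(language); // prints "eng+deu+fra"
```

The returned string can be assigned to OcrConfig.Language as in the example above.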

Table Extraction

See the Table Extraction Guide for detailed examples.

Processing Multiple Files

using Kreuzberg;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        var config = new ExtractionConfig
        {
            UseCache = true,
            EnableQualityProcessing = true
        };

        var filePaths = new[]
        {
            "document1.pdf",
            "document2.pdf",
            "document3.pdf"
        };

        try
        {
            // Option 1: sequential. One file at a time, with per-file progress output.
            var batchResults = new List<ExtractionResult>();

            foreach (var filePath in filePaths)
            {
                var result = await KreuzbergClient.ExtractFileAsync(filePath, config);
                batchResults.Add(result);
                Console.WriteLine($"Processed {filePath}: {result.Content.Length} chars");
            }

            // Option 2: concurrent. Start every extraction, then await them together.
            // Pick one option in real code; running both extracts each file twice.
            var tasks = filePaths.Select(path =>
                KreuzbergClient.ExtractFileAsync(path, config)
            ).ToArray();

            var results = await Task.WhenAll(tasks);

            var totalChars = results.Sum(r => r.Content.Length);
            Console.WriteLine($"Total extracted: {totalChars} characters");
        }
        catch (KreuzbergException ex)
        {
            Console.WriteLine($"Batch processing error: {ex.Message}");
        }
    }
}
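
The concurrent option above starts every extraction at once, which may use a lot of memory for large batches. One common way to cap the number of extractions in flight is a SemaphoreSlim gate. The sketch below is an illustration under stated assumptions: ExtractBoundedAsync and FakeExtractAsync are hypothetical names, and the stand-in extractor exists only so the example runs without documents on disk.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of Kreuzberg): runs `extract` over all paths
// with at most `maxConcurrency` calls in flight, returning results in input order.
static async Task<TResult[]> ExtractBoundedAsync<TResult>(
    IReadOnlyList<string> paths,
    Func<string, Task<TResult>> extract,
    int maxConcurrency)
{
    using var gate = new SemaphoreSlim(maxConcurrency);

    var tasks = paths.Select(async path =>
    {
        await gate.WaitAsync();
        try
        {
            return await extract(path);
        }
        finally
        {
            // Always free the slot, even when the extraction throws.
            gate.Release();
        }
    }).ToArray();

    return await Task.WhenAll(tasks);
}

// Stand-in extractor that returns the path length after a short delay.
static async Task<int> FakeExtractAsync(string path)
{
    await Task.Delay(10);
    return path.Length;
}

var lengths = await ExtractBoundedAsync<int>(
    new[] { "document1.pdf", "document2.pdf", "document3.pdf" },
    path => FakeExtractAsync(path),
    maxConcurrency: 2);

Console.WriteLine(string.Join(", ", lengths)); // prints "13, 13, 13"
```

With Kreuzberg, the delegate would presumably be path => KreuzbergClient.ExtractFileAsync(path, config), using the call shown in the example above.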

Async Processing

For non-blocking document processing:

using Kreuzberg;
using System;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        try
        {
            // Single file with the default configuration.
            var result = await KreuzbergClient.ExtractFileAsync("document.pdf");

            Console.WriteLine($"Content length: {result.Content.Length}");
            Console.WriteLine($"MIME type: {result.MimeType}");

            // Several files concurrently: start all tasks, then await them together.
            var tasks = new[]
            {
                KreuzbergClient.ExtractFileAsync("file1.pdf"),
                KreuzbergClient.ExtractFileAsync("file2.pdf"),
                KreuzbergClient.ExtractFileAsync("file3.pdf")
            };

            var results = await Task.WhenAll(tasks);

            foreach (var r in results)
            {
                Console.WriteLine($"Extracted {r.Content.Length} characters");
            }
        }
        catch (KreuzbergException ex)
        {
            Console.WriteLine($"Extraction failed: {ex.Message}");
        }
    }
}
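
In the example above, one failing file causes the awaited Task.WhenAll to throw, and the results of the other files are lost to the caller. If per-file outcomes matter, each call can be wrapped so that a failure becomes data instead of an exception. This is a minimal sketch: TryExtractAsync and FakeExtractAsync are hypothetical names, and the stand-in extractor simulates one failure so the example runs on its own.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper (not part of Kreuzberg): reports success or failure per file.
static async Task<(string Path, bool Succeeded, string Detail)> TryExtractAsync(
    string path, Func<string, Task<string>> extract)
{
    try
    {
        var content = await extract(path);
        return (path, true, $"{content.Length} chars");
    }
    catch (Exception ex) // with Kreuzberg, this could be narrowed to KreuzbergException
    {
        return (path, false, ex.Message);
    }
}

// Stand-in extractor: fails for one path, returns fixed text otherwise.
static async Task<string> FakeExtractAsync(string path)
{
    await Task.Yield();
    if (path == "file2.pdf")
    {
        throw new InvalidOperationException("simulated failure");
    }
    return "hello";
}

var paths = new[] { "file1.pdf", "file2.pdf", "file3.pdf" };
var outcomes = await Task.WhenAll(
    paths.Select(p => TryExtractAsync(p, FakeExtractAsync)));

foreach (var (file, ok, detail) in outcomes)
{
    Console.WriteLine($"{file}: {(ok ? "ok" : "failed")} ({detail})");
}
```

With Kreuzberg, the delegate could be async p => (await KreuzbergClient.ExtractFileAsync(p)).Content, assuming Content is a string as the examples above suggest.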

Next Steps

Installation Guide - Platform-specific setup
API Documentation - Complete API reference
Examples & Guides - Full code examples and usage guides
Configuration Guide - Advanced configuration options

Features

Supported File Formats (75+)

75+ file formats across 8 major categories with intelligent format detection and comprehensive metadata extraction.

Office Documents

Category	Formats	Capabilities
Word Processing	`.docx`, `.odt`	Full text, tables, images, metadata, styles
Spreadsheets	`.xlsx`, `.xlsm`, `.xlsb`, `.xls`, `.xla`, `.xlam`, `.xltm`, `.ods`	Sheet data, formulas, cell metadata, charts
Presentations	`.pptx`, `.ppt`, `.ppsx`	Slides, speaker notes, images, metadata
PDF	`.pdf`	Text, tables, images, metadata, OCR support
eBooks	`.epub`, `.fb2`	Chapters, metadata, embedded resources

Images (OCR-Enabled)

Category	Formats	Features
Raster	`.png`, `.jpg`, `.jpeg`, `.gif`, `.webp`, `.bmp`, `.tiff`, `.tif`	OCR, table detection, EXIF metadata, dimensions, color space
Advanced	`.jp2`, `.jpx`, `.jpm`, `.mj2`, `.jbig2`, `.jb2`, `.pnm`, `.pbm`, `.pgm`, `.ppm`	OCR via hayro-jpeg2000 (pure Rust decoder), JBIG2 support, table detection, format-specific metadata
Vector	`.svg`	DOM parsing, embedded text, graphics metadata

Web & Data

Category	Formats	Features
Markup	`.html`, `.htm`, `.xhtml`, `.xml`, `.svg`	DOM parsing, metadata (Open Graph, Twitter Card), link extraction
Structured Data	`.json`, `.yaml`, `.yml`, `.toml`, `.csv`, `.tsv`	Schema detection, nested structures, validation
Text & Markdown	`.txt`, `.md`, `.markdown`, `.djot`, `.rst`, `.org`, `.rtf`	CommonMark, GFM, Djot, reStructuredText, Org Mode

Email & Archives

Category	Formats	Features
Email	`.eml`, `.msg`	Headers, body (HTML/plain), attachments, threading
Archives	`.zip`, `.tar`, `.tgz`, `.gz`, `.7z`	File listing, nested archives, metadata

Academic & Scientific

Category	Formats	Features
Citations	`.bib`, `.biblatex`, `.ris`, `.nbib`, `.enw`, `.csl`	Structured parsing of RIS, PubMed/MEDLINE, EndNote XML, BibTeX, and CSL JSON
Scientific	`.tex`, `.latex`, `.typst`, `.jats`, `.ipynb`, `.docbook`	LaTeX, Jupyter notebooks, PubMed JATS
Documentation	`.opml`, `.pod`, `.mdoc`, `.troff`	Technical documentation formats

Complete Format Reference
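
When scanning a directory, it can be useful to skip files whose extension is clearly unsupported before calling the extractor. The sketch below uses an illustrative subset of the extensions from the tables above; IsSupported is a hypothetical helper, and the library's own format detection remains the authority on what can be extracted.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Illustrative subset of the extensions listed in the tables above; extend as needed.
var supported = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    ".pdf", ".docx", ".odt", ".xlsx", ".pptx", ".png", ".jpg", ".jpeg",
    ".html", ".json", ".csv", ".md", ".eml", ".zip", ".epub"
};

// Hypothetical helper: true when the file extension is in the given set.
static bool IsSupported(string path, ISet<string> extensions) =>
    extensions.Contains(Path.GetExtension(path));

var candidates = new[] { "report.PDF", "notes.md", "movie.mp4", "README" };
var accepted = candidates.Where(p => IsSupported(p, supported)).ToArray();

Console.WriteLine(string.Join(", ", accepted)); // prints "report.PDF, notes.md"
```

The case-insensitive comparer means "report.PDF" and "report.pdf" are treated alike, and files without an extension are rejected.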

Key Capabilities

Text Extraction - Extract all text content with position and formatting information
Metadata Extraction - Retrieve document properties, creation date, author, etc.
Table Extraction - Parse tables with structure and cell content preservation
Image Extraction - Extract embedded images and render page previews
OCR Support - Integrate multiple OCR backends for scanned documents
Async/Await - Non-blocking document processing with concurrent operations
Plugin System - Extensible post-processing for custom text transformation
Embeddings - Generate vector embeddings using ONNX Runtime models
Batch Processing - Efficiently process multiple documents in parallel
Memory Efficient - Stream large files without loading entirely into memory
Language Detection - Detect and support multiple languages in documents
Configuration - Fine-grained control over extraction behavior

Performance Characteristics

Format	Speed	Memory	Notes
PDF (text)	10-100 MB/s	~50MB per doc	Fastest extraction
Office docs	20-200 MB/s	~100MB per doc	DOCX, XLSX, PPTX
Images (OCR)	1-5 MB/s	Variable	Depends on OCR backend
Archives	5-50 MB/s	~200MB per doc	ZIP, TAR, etc.
Web formats	50-200 MB/s	Streaming	HTML, XML, JSON
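
The throughput ranges in the table translate directly into rough time estimates: size divided by the fastest rate gives the best case, and size divided by the slowest rate gives the worst case. The sketch below works this through for 500 MB of scanned images; EstimateSeconds is a hypothetical helper, and actual timings will depend on hardware and document content.

```csharp
using System;

// Hypothetical helper: converts a size and a throughput range into a time range.
static (double BestSeconds, double WorstSeconds) EstimateSeconds(
    double sizeMb, double minMbPerSecond, double maxMbPerSecond)
{
    if (sizeMb < 0 || minMbPerSecond <= 0 || maxMbPerSecond < minMbPerSecond)
    {
        throw new ArgumentOutOfRangeException(nameof(sizeMb), "Invalid size or throughput range.");
    }

    return (sizeMb / maxMbPerSecond, sizeMb / minMbPerSecond);
}

// 500 MB of scanned images at the table's 1-5 MB/s OCR range.
var (best, worst) = EstimateSeconds(500, 1, 5);
Console.WriteLine($"{best:F0}-{worst:F0} s"); // prints "100-500 s"
```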

OCR Support

Kreuzberg supports multiple OCR backends for extracting text from scanned documents and images:

Tesseract
PaddleOCR

OCR Configuration Example

The Tesseract configuration shown under "With OCR (for scanned documents)" above applies here unchanged: set Backend, Language, and TesseractConfig on OcrConfig, then pass the resulting ExtractionConfig to the extraction call.

Async Support

This binding provides full async/await support for non-blocking document processing. See the example under Async Processing above, which awaits a single ExtractFileAsync call and then runs several concurrently with Task.WhenAll.

Plugin System

Kreuzberg supports extensible post-processing plugins for custom text transformation and filtering.

For detailed plugin documentation, see the Plugin System Guide.

Embeddings Support

Generate vector embeddings for extracted text using the built-in ONNX Runtime support. Requires ONNX Runtime installation.

Embeddings Guide

Batch Processing

Process multiple documents efficiently by awaiting ExtractFileAsync calls one at a time, or by starting them together and awaiting Task.WhenAll. See the example under Processing Multiple Files above.

Configuration

For advanced configuration options including language detection, table extraction, OCR settings, and more:

Configuration Guide

Documentation

Contributing

Contributions are welcome! See the Contributing Guide.

License

MIT License - see LICENSE file for details.

Support

Discord Community: Join our Discord
GitHub Issues: Report bugs
Discussions: Ask questions

Product	Compatible and additional computed target framework versions.
.NET	net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.

net10.0
- No dependencies.

Version	Downloads	Last Updated
4.3.7	0	2/20/2026
4.3.6	37	2/19/2026
4.3.5	79	2/17/2026
4.3.4	88	2/16/2026
4.3.3	88	2/14/2026
4.3.2	93	2/13/2026
4.3.1	103	2/12/2026
4.3.0	102	2/11/2026
4.2.15	98	2/8/2026
4.2.14	89	2/7/2026
4.2.13	90	2/7/2026
4.2.12	90	2/6/2026
4.2.11	101	2/6/2026
4.2.10	92	2/5/2026
4.2.9	92	2/3/2026
4.2.8	99	2/2/2026
4.2.7	88	2/1/2026
4.2.6	104	1/31/2026
4.2.5	99	1/30/2026
4.2.4	104	1/29/2026
