PaddleOcrNet 1.0.0
<p align="center"> <img src="icon.png" alt="PaddleOcrNet" width="140" height="140" /> </p>
<h1 align="center">PaddleOcrNet</h1>
<p align="center"> <strong>The complete PaddleOCR document pipeline — natively in .NET, on ONNX Runtime.</strong><br/> Turn scans, photos, and PDFs into text, tables, formulas — and answers.<br/> <em>No Python. No native PaddlePaddle. No sidecar server. Just a NuGet package.</em> </p>
<p align="center"> <a href="https://www.nuget.org/packages/PaddleOcrNet"><img src="https://img.shields.io/nuget/v/PaddleOcrNet.svg?label=NuGet&color=004880" alt="NuGet"/></a> <a href="https://www.nuget.org/packages/PaddleOcrNet"><img src="https://img.shields.io/nuget/dt/PaddleOcrNet.svg?label=Downloads&color=004880" alt="Downloads"/></a> <img src="https://img.shields.io/badge/models-PP--OCRv5%20%2B%20PP--StructureV3-ff6f00" alt="PP-OCRv5 + PP-StructureV3"/> <img src="https://img.shields.io/badge/languages-80%2B-1f6feb" alt="80+ languages"/> <img src="https://img.shields.io/badge/.NET-10.0-512BD4" alt=".NET 10"/> <img src="https://img.shields.io/badge/AOT-ready-2ea44f" alt="AOT ready"/> <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License: MIT"/></a> </p>
PaddleOcrNet turns scanned documents, photos, and PDFs into structured text — and into answers. It runs the full PP-OCRv5 + PP-StructureV3 pipeline — text detection, recognition, orientation correction, layout analysis, table extraction, and formula recognition — entirely in managed .NET on ONNX Runtime, then layers on LLM-backed key-information extraction and document Q&A through any provider you choose. Models download and cache on first use; everything after that runs in-process, offline-capable, and trim/AOT-friendly.
Highlights
- High-accuracy text OCR — DB detection + SVTR recognition (PP-OCRv5) handles dense invoices, forms, receipts, handwriting, rotated scans, and curved text.
- 80+ languages across 12 script families, with one shared detector and per-script recognizer packs.
- Automatic language detection — pass OcrLanguage.Auto and PaddleOcrNet identifies the script and pulls the right model on demand (Python PaddleOCR requires you to name the language up front).
- Document understanding — AnalyzeDocumentAsync returns layout regions, reading order, tables as HTML, and formulas as LaTeX, and serializes the whole document to Markdown, HTML, JSON, Word, or Excel.
- Ask your documents — LLM-backed key-information extraction, Q&A, and chart-to-data parsing, provider-agnostic: bring your own IChatModel or use the built-in OpenAI-compatible adapter (OpenAI, Azure, Ollama, vLLM, Groq, …). Charts are reconstructed to Markdown tables via a vision model — no local GPU. Or extract labeled fields offline with the heuristic, layout-based KIE extractor — no LLM, no network.
- PDF in, searchable PDF out — rasterize and OCR PDFs, or emit a searchable PDF with an invisible text layer.
- Robust by design — singleton-safe, thread-safe ONNX sessions; DI + health checks; OpenTelemetry metrics; typed exceptions; input/decompression-bomb guards; checksum-verified model downloads.
- Deploys anywhere — pure-managed (no OpenCV), CPU by default, optional CUDA, Native AOT and single-file publish supported. Mobile models are a few MB each.
Installation
dotnet add package PaddleOcrNet
# Optional — NVIDIA CUDA 12+ acceleration (used automatically when present):
dotnet add package PaddleOcrNet.Gpu
Requires .NET 10 (net10.0). Windows, Linux, and macOS (x64/arm64).
Quick start
using PaddleOcrNet.Models;
using PaddleOcrNet.Services;
// ONNX models download + cache on first use; construction itself loads nothing.
await using var ocr = new PaddleOcrService();
OcrResult result = await ocr.ExtractTextFromImage("invoice.png", OcrLanguage.English);
Console.WriteLine(result.FullText);
foreach (var line in result.Lines)
    Console.WriteLine($"[{line.Confidence:F2}] {line.Text}");
Input can be a file path, byte[], Stream, or an already-decoded Image<Rgb24>:
await ocr.ExtractTextFromImage(bytes, OcrLanguage.English);
await ocr.ExtractTextFromImage(stream, new[] { OcrLanguage.English, OcrLanguage.German });
// Detect-only (bounding boxes for redaction / cropping — no recognition):
var regions = await ocr.DetectRegionsAsync("page.png");
// Recognize caller-supplied regions (skip detection):
var partial = await ocr.RecognizeRegionsAsync(image, regions, new[] { OcrLanguage.English });
Automatic language detection
// OcrLanguage.Auto → PaddleOcrNet detects the dominant script, downloads the matching pack, and reports it.
OcrResult r = await ocr.ExtractTextFromImage("multilingual.png", OcrLanguage.Auto);
Console.WriteLine(string.Join(", ", r.DetectedLanguages)); // e.g. "arabic, latin, ch"
Document structure analysis
AnalyzeDocumentAsync runs the PP-StructureV3 pipeline — orientation → layout detection → per-region
OCR / table / formula → reading-order reconstruction — and returns a structured document you can export
straight to Markdown or JSON.
using PaddleOcrNet.Models;
using PaddleOcrNet.Services;
using PaddleOcrNet.Structure;
await using var ocr = new PaddleOcrService();
StructureResult doc = await ocr.AnalyzeDocumentAsync("report.png", new StructureOptions
{
    Languages = new[] { OcrLanguage.English },
    UseDocOrientation = true,  // auto-rotate pages scanned at 0/90/180/270°
    RecognizeTables = true,    // tables → HTML
    RecognizeFormulas = true,  // formulas → LaTeX
});
foreach (var block in doc.Blocks)
    Console.WriteLine($"#{block.Order} {block.Type} — {block.Text}");
string markdown = doc.ToMarkdown(); // titles, paragraphs, tables (HTML), formulas ($$…$$)
string json = doc.ToJson(); // structured blocks with bounding boxes + reading order
| Stage | Model | Output |
|---|---|---|
| Layout analysis | PP-DocLayoutV3 (RT-DETR) | region boxes + 25 block types |
| Table recognition | SLANet_plus (default) · SLANeXt v2 | <table> HTML with cell text matched into the grid |
| Formula recognition | LaTeX-OCR | LaTeX string |
| Orientation / unwarp | PP-LCNet · UVDoc | upright (rotation-corrected), de-warped page |
| Reading order | XY-cut | multi-column document order |
For tables, set StructureOptions.TableModel = TableRecognitionModel.SlaNeXt to use the PP-StructureV3 v2
path: a PP-LCNet classifier decides wired (bordered) vs wireless (borderless) and runs the matching
SLANeXt model — often more accurate on clearly bordered/borderless tables (downloads three small models on
first use). The default, SlanetPlus, is a single end-to-end model.
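The XY-cut reading-order stage listed in the table above can be illustrated with a small, self-contained sketch. It recursively splits a set of boxes at whitespace gaps, first into horizontal bands and then into columns. The Box record and the XyCut class are hypothetical names made up for this example. This is a simplified variant of the general technique, not PaddleOcrNet's implementation, which may choose its cuts differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A full-width title above two columns, deliberately supplied out of order.
var page = new List<Box>
{
    new("R2",    55, 75, 100, 90),
    new("L1",     0, 20,  45, 50),
    new("Title",  0,  0, 100, 10),
    new("R1",    55, 20, 100, 70),
    new("L2",     0, 55,  45, 90),
};

Console.WriteLine(string.Join(" -> ", XyCut.Order(page).Select(b => b.Id)));
// Title -> L1 -> L2 -> R1 -> R2

// Hypothetical box type for this sketch; not a PaddleOcrNet type.
public readonly record struct Box(string Id, int Left, int Top, int Right, int Bottom);

public static class XyCut
{
    // Returns the boxes in reading order: bands top-to-bottom, columns left-to-right.
    public static List<Box> Order(IReadOnlyList<Box> boxes)
    {
        var output = new List<Box>();
        Recurse(boxes.ToList(), output);
        return output;
    }

    private static void Recurse(List<Box> boxes, List<Box> output)
    {
        if (boxes.Count <= 1) { output.AddRange(boxes); return; }

        // Try a horizontal cut first (gap in the Y projection), then a vertical one (gap in X).
        var groups = Split(boxes, b => b.Top, b => b.Bottom);
        if (groups.Count == 1) groups = Split(boxes, b => b.Left, b => b.Right);

        if (groups.Count == 1)
        {
            // No whitespace gap on either axis: fall back to a plain top-then-left sort.
            output.AddRange(boxes.OrderBy(b => b.Top).ThenBy(b => b.Left));
            return;
        }
        foreach (var group in groups) Recurse(group, output);
    }

    // Sorts boxes by their start along one axis and opens a new group wherever the
    // next box starts after everything seen so far has ended (a whitespace gap).
    private static List<List<Box>> Split(List<Box> boxes, Func<Box, int> start, Func<Box, int> end)
    {
        var groups = new List<List<Box>>();
        int reach = int.MinValue;
        foreach (var box in boxes.OrderBy(start))
        {
            if (groups.Count == 0 || start(box) > reach) groups.Add(new List<Box>());
            groups[^1].Add(box);
            reach = Math.Max(reach, end(box));
        }
        return groups;
    }
}
```

Trying the horizontal cut first keeps a full-width title ahead of the columns beneath it. Columns whose rows line up exactly would be split into bands instead, so a fuller implementation would likely need gap thresholds or other tie-breaking rules.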
Exports with embedded figures & native equations
The DOCX and HTML exporters take an optional image overload — pass the same image you analyzed and
figure / chart / seal regions are cropped and embedded as real pixels (DOCX gets an inline word/media/
image part; HTML gets a data:image/png;base64,… <img>). The no-image overloads keep their bbox-placeholder
behavior. The image-aware DOCX path also renders recovered formula LaTeX as native Word equations (OMML)
via a best-effort LaTeX→OMML converter (PaddleOcrNet.Structure.Export.LatexToOmml) — fractions,
sub/superscripts, roots, Greek letters, n-ary sum/integral/product, and common operators; unsupported
constructs degrade gracefully to text.
using PaddleOcrNet.Structure.Export;
StructureResult doc = await ocr.AnalyzeDocumentAsync("report.png");
using var page = SixLabors.ImageSharp.Image.Load<SixLabors.ImageSharp.PixelFormats.Rgb24>("report.png");
byte[] docx = doc.ToDocx(page); // figures/charts/seals as inline images; formulas as OMML equations
string html = doc.ToHtml(page, "Report"); // figures/charts/seals as inline <img data:image/png;base64,…>
Supported languages
A single DB detector serves every language; recognition selects a per-script recognizer pack
(PP-OCRv5 mobile + the matching character dictionary). Languages are expressed with the OcrLanguage
enum; each value maps to one of the representative recognizer codes below:
| Pack | Codes |
|---|---|
| Chinese / English / Japanese (default) | ch zh en ja |
| Latin | latin fr de es it pt nl pl tr vi … |
| Cyrillic | cyrillic ru uk bg sr be mn … |
| Arabic | arabic ar fa ur ug |
| Devanagari | devanagari hi mr ne sa … |
| Korean | korean ko |
| Japanese (full) | japan |
| Thai · Greek · Telugu · Tamil | thai/th · greek/el · telugu/te · tamil/ta |
| Traditional Chinese | chinese_cht cht zh_tra |
| East-Slavic | eslav ru_eslav uk_eslav be_eslav |
Or pass OcrLanguage.Auto to detect the script automatically.
Languages are enum-only — the OCR methods take OcrLanguage (there are no raw-string overloads). The
single-language overload defaults to OcrLanguage.Auto, so ExtractTextFromImage("x.png") auto-detects
with zero configuration:
using PaddleOcrNet.Models;
await ocr.ExtractTextFromImage("page.png"); // zero-config: defaults to OcrLanguage.Auto
await ocr.ExtractTextFromImage("page.png", OcrLanguage.French); // single language
await ocr.ExtractTextFromImage("page.png", new[] { OcrLanguage.English, OcrLanguage.German }); // multiple
await ocr.ExtractTextFromImage("page.png", OcrLanguage.Auto); // explicit auto-detect
// Got raw codes from config or the command line? Parse them into the enum:
OcrLanguage lang = OcrLanguageExtensions.FromCode("en");
IReadOnlyList<OcrLanguage> langs = OcrLanguageExtensions.FromCodes(new[] { "en", "de" });
ASP.NET Core / dependency injection
using PaddleOcrNet.Models;
builder.Services.AddPaddleOcrNet(o =>
{
    o.UseTextLineOrientation = true; // correct 180°-flipped lines
    o.ModelCachePath = "/var/cache/ocr";
});
// Readiness probe — Healthy once models for these languages are cached:
builder.Services.AddHealthChecks()
    .AddPaddleOcrHealthCheck(languages: new[] { OcrLanguage.English, OcrLanguage.ChineseSimplified });
IPaddleOcrService is registered as a singleton — ONNX sessions are expensive to build and safe to
share across threads. Call WarmUp(...) to pre-load models off the request path.
Configuration
| Concern | How |
|---|---|
| GPU | Add PaddleOcrNet.Gpu; CUDA 12+ is detected and used automatically, otherwise CPU. |
| Model cache | %LOCALAPPDATA% / ~/.local/share by default; override via ModelCachePath or PADDLEOCRNET_CACHE. |
| Model host | Defaults to the public Hugging Face repo; point at a private mirror via PADDLEOCRNET_MODEL_BASE_URL or ModelDownloadOptions.BaseUrlOverride. |
| Offline / air-gapped | Pre-seed the cache (or a mirror) and run fully offline; downloads are SHA-256 verified. |
| Throughput | BatchSize, MaxDegreeOfParallelism, and reading-order / paragraph grouping via RecognitionOptions. |
| Input limits | Built-in max-pixel / PDF page guards against decompression bombs. |
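When pre-seeding a cache or mirror for an air-gapped machine, it can help to check each file against a known digest before copying it over. The sketch below does that with the .NET base class library only. The MatchesSha256 helper, the temp file, and the expected digest are placeholders for illustration. It is not PaddleOcrNet's own verification code and does not assume anything about the cache layout.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Demo input: a temp file holding the ASCII bytes "abc", standing in for a model file.
string path = Path.Combine(Path.GetTempPath(), "demo-model.bin");
File.WriteAllBytes(path, Encoding.ASCII.GetBytes("abc"));

// SHA-256 of "abc" (a standard test vector), standing in for a published digest.
const string expected = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad";

Console.WriteLine(MatchesSha256(path, expected));            // True
Console.WriteLine(MatchesSha256(path, new string('0', 64))); // False

// Hypothetical helper for this sketch; not part of PaddleOcrNet.
static bool MatchesSha256(string filePath, string expectedHex)
{
    using FileStream stream = File.OpenRead(filePath);
    byte[] hash = SHA256.HashData(stream); // hashes from the stream without loading the whole file
    return Convert.ToHexString(hash).Equals(expectedHex, StringComparison.OrdinalIgnoreCase);
}
```

Comparing hex strings case-insensitively avoids false mismatches when a digest list uses upper-case hex.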
Output formats
OcrResult exports to plain text, JSON, hOCR, ALTO XML, and TSV; documents export to
Markdown, HTML, JSON, Word (.docx), and Excel (.xlsx) (with native tables / merged
cells). The ToDocx(image) / ToHtml(image) overloads embed figure/chart/seal regions as real pixels,
and DOCX formulas render as native Word equations (OMML). Multi-page Markdown can be stitched with
ConcatenateMarkdownPages; PDFs can be re-emitted as searchable PDFs. All exporters are AOT-safe via a
source-generated JSON context.
Document intelligence (LLM-backed KIE & Q&A)
The PaddleOcrNet.Intelligence layer adds key-information extraction and document Q&A on top of OCR/structure
analysis — provider-agnostic. Plug in any LLM by implementing IChatModel, or use the built-in
OpenAI-compatible adapter, which targets OpenAI, Azure OpenAI, Ollama, vLLM, LM Studio, Groq, Together,
DeepSeek, Mistral, and any other OpenAI-style /chat/completions endpoint.
using PaddleOcrNet.Intelligence;
// Pick any provider — here OpenAI; swap for .AzureOpenAi(...), .Ollama(...), or .Generic(...).
var chat = new OpenAiCompatibleChatModel(OpenAiCompatibleOptions.OpenAi(apiKey, "gpt-4o-mini"));
var docs = new DocumentIntelligenceEngine(ocrService, chat);
// Key-information extraction (returns a JSON-grounded key → value result).
var info = await docs.ExtractKeyInformationAsync("invoice.png", new[] { "Invoice Number", "Vendor", "Total" });
Console.WriteLine(info["Total"]);
// Document question-answering.
var answer = await docs.AskAsync("contract.pdf", "What is the termination notice period?");
Console.WriteLine(answer.Answer);
DI: services.AddOpenAiCompatibleChatModel(...) (or AddChatModel(myModel)) + AddPaddleOcrDocumentIntelligence().
The model is grounded on the parsed document Markdown by default; set DocumentIntelligenceOptions.UseVision
to also attach the page image when the model is multimodal.
Chart → data (vision-LLM, PP-Chart2Table equivalent)
ParseChartsAsync detects chart/plot regions in a document and reconstructs the data behind each one as a
GitHub-flavored Markdown table — the provider-agnostic equivalent of PaddleOCR's PP-Chart2Table. It crops
each detected chart region and sends only those pixels to a vision-capable model, so it works with
any vision provider (OpenAI gpt-4o, Azure, or a local Ollama qwen2.5-vl / llama3.2-vision) and needs
no local GPU — the provider performs the vision inference.
using PaddleOcrNet.Intelligence;
// Reconstruct the data behind every chart in a document (needs a vision-capable model).
ChartParseResult charts = await docs.ParseChartsAsync("report.png");
foreach (ParsedChart chart in charts.Charts)
{
    Console.WriteLine($"{chart.ChartType}: {chart.Title}");
    Console.WriteLine(chart.DataMarkdown); // a Markdown table of the chart's data
}
With the built-in OpenAiCompatibleChatModel, vision is on by default
(OpenAiCompatibleOptions.SupportsVision defaults to true); if the configured model isn't vision-capable
and the document has charts, the call throws NotSupportedException. Customize the extraction prompt via
DocumentIntelligenceOptions.ChartExtractionSystemPromptOverride. This is the vision-LLM path — there is no
bundled offline chart ONNX model.
Offline (non-LLM) KIE
IOfflineKeyInformationExtractor is the offline alternative to LLM-backed ExtractKeyInformationAsync —
use it when you can't or don't want to call a model. It's a heuristic, geometry-based extractor (no LLM, no
network, CPU-only): for each key it finds the label in the OCR text and reads the value inline (Key: value),
to the right (same row), or below. It returns the same KeyInformationResult (with Usage / Model /
RawJson left null). Best-effort — it works best on clearly labeled forms and invoices.
using PaddleOcrNet.Intelligence.Offline;
var extractor = new OfflineKeyInformationExtractor(ocrService);
OcrResult ocr = await ocrService.ExtractTextFromImage("invoice.png");
KeyInformationResult result = extractor.Extract(ocr, new[] { "Invoice Number", "Total" });
Console.WriteLine(result["Total"]);
// Or OCR + extract in one call (uses OcrLanguage.Auto):
result = await extractor.ExtractAsync("invoice.png", new[] { "Invoice Number", "Total" });
DI: services.AddPaddleOcrOfflineKie(); (requires AddPaddleOcrNet()), then inject
IOfflineKeyInformationExtractor.
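The three lookup rules described above (inline, same row, below) can be sketched on plain text lines with bounding boxes. The Line record and the Kie class are hypothetical names for this illustration only. The actual extractor's matching rules, tolerances, and types are not documented here and may well differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var lines = new List<Line>
{
    new("Invoice Number: INV-0042", 10,  10, 300,  30),
    new("Total",                    10,  50,  60,  70),
    new("$1,250.00",               200,  50, 290,  70), // same row, to the right of "Total"
    new("Vendor",                   10,  90,  70, 110),
    new("Acme Corp",                10, 115, 110, 135), // directly below "Vendor"
};

foreach (var key in new[] { "Invoice Number", "Total", "Vendor" })
    Console.WriteLine($"{key} = {Kie.Find(lines, key)}");
// Invoice Number = INV-0042
// Total = $1,250.00
// Vendor = Acme Corp

// Hypothetical OCR line type for this sketch; not a PaddleOcrNet type.
public record Line(string Text, int Left, int Top, int Right, int Bottom);

public static class Kie
{
    public static string? Find(IReadOnlyList<Line> lines, string key)
    {
        var label = lines.FirstOrDefault(l => l.Text.StartsWith(key, StringComparison.OrdinalIgnoreCase));
        if (label is null) return null;

        // 1. Inline: "Key: value" on the label line itself.
        string rest = label.Text[key.Length..].TrimStart(' ', ':').Trim();
        if (rest.Length > 0) return rest;

        // 2. Same row: nearest line to the right whose vertical span overlaps the label's.
        var right = lines
            .Where(l => l != label && l.Left >= label.Right && l.Top < label.Bottom && l.Bottom > label.Top)
            .OrderBy(l => l.Left)
            .FirstOrDefault();
        if (right is not null) return right.Text;

        // 3. Below: nearest line underneath whose horizontal span overlaps the label's.
        var below = lines
            .Where(l => l != label && l.Top >= label.Bottom && l.Left < label.Right && l.Right > label.Left)
            .OrderBy(l => l.Top)
            .FirstOrDefault();
        return below?.Text;
    }
}
```

Checking the inline case first means a label that already carries its value is never paired with an unrelated neighbor.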
Models & licensing
PaddleOcrNet ships no weights — on first use it downloads PP-OCRv5 / PP-StructureV3 ONNX models and their dictionaries (SHA-256 verified) to the local cache. The models are derived from PaddleOCR (Apache-2.0, © PaddlePaddle/Baidu); the formula model is RapidLaTeXOCR (MIT). See NOTICE for attribution. The library itself is MIT — see LICENSE.
Note on formula recognition: PaddleOCR's PP-FormulaNet cannot be exported to ONNX, so PaddleOcrNet uses the equivalent LaTeX-OCR model for formula → LaTeX.
Roadmap
Already shipped: detection, recognition (multilingual + auto-detect), orientation, unwarp, layout, tables, formulas, reading order, Markdown/HTML/JSON/DOCX/XLSX export (with embedded figure/chart/seal pixels and native Word equations via OMML), the PDF pipeline, LLM-backed document intelligence (key-information extraction, Q&A, and chart-to-data parsing — the PP-Chart2Table-equivalent vision-LLM path), and a heuristic, layout-based offline KIE extractor as the non-LLM alternative.
Under consideration:
- Optional RT-DETR table-cell detector path for table recognition (SLANeXt already recovers cells from its own location head, so this is an accuracy enhancement rather than a gap)
- A model-based on-device (ONNX VI-LayoutXLM) KIE path to complement the current heuristic offline extractor
- PP-OCRv6 model line
- Additional per-language recognizer packs
Why PaddleOcrNet?
- vs. Python PaddleOCR — same models and accuracy, but no Python runtime, no paddlepaddle native dependency, and no server process. Ships as a single NuGet package with first-class .NET ergonomics (DI, health checks, AOT) and adds automatic language detection.
- vs. cloud OCR APIs — runs entirely in-process and offline; no per-page fees, no data leaving your infrastructure.
- vs. EasyOCR-based libraries — PP-OCRv5 is materially stronger on dense documents, tables, rotated scans, handwriting, and CJK, and adds full document-structure understanding.
Contact
For work inquiries, collaboration, feature requests, or any questions, reach out to:
Farhan Lodi — farhanlodi31@gmail.com
License
MIT © PaddleOcrNet contributors. Downloaded models are Apache-2.0 / MIT and attributed to their authors (see NOTICE).
| Product | Versions (compatible and additional computed target framework versions) |
|---|---|
| .NET | net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
- net10.0
  - Clipper2 (>= 2.0.0)
  - Docnet.Core (>= 2.6.0)
  - Microsoft.Extensions.DependencyInjection.Abstractions (>= 10.0.9)
  - Microsoft.Extensions.Diagnostics.HealthChecks (>= 10.0.9)
  - Microsoft.Extensions.Logging.Abstractions (>= 10.0.9)
  - Microsoft.ML.OnnxRuntime (>= 1.27.0)
  - SixLabors.ImageSharp (>= 3.1.12)
NuGet packages (1)
Showing the top 1 NuGet packages that depend on PaddleOcrNet:
| Package | Downloads |
|---|---|
| EasyOcrSharp: High-accuracy native .NET OCR powered by EasyOCR's neural models running on ONNX Runtime. No Python required. | |
GitHub repositories
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 1.0.0 | 89 | 6/21/2026 |