Angri450.Nong.MultiModal 1.0.0

There is a newer version of this package available.
See the version list below for details.

dotnet add package Angri450.Nong.MultiModal --version 1.0.0

NuGet\Install-Package Angri450.Nong.MultiModal -Version 1.0.0

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

<PackageReference Include="Angri450.Nong.MultiModal" Version="1.0.0" />

For projects that support PackageReference, copy this XML node into the project file to reference the package.

<PackageVersion Include="Angri450.Nong.MultiModal" Version="1.0.0" />
                    

                            Directory.Packages.props

<PackageReference Include="Angri450.Nong.MultiModal" />
                    

                            Project file

For projects that support Central Package Management (CPM), copy this XML node into the solution Directory.Packages.props file to version the package.

paket add Angri450.Nong.MultiModal --version 1.0.0

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

#r "nuget: Angri450.Nong.MultiModal, 1.0.0"

#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

#:package Angri450.Nong.MultiModal@1.0.0

#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.

#addin nuget:?package=Angri450.Nong.MultiModal&version=1.0.0
                    

                            Install as a Cake Addin

#tool nuget:?package=Angri450.Nong.MultiModal&version=1.0.0
                    

                            Install as a Cake Tool

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

Angri450.Nong.MultiModal

Multi-modal document processing library. OCR first, speech-to-text and text-to-speech coming. Pure .NET, minimal dependencies.

Install

dotnet add package Angri450.Nong.MultiModal

Optional: local OCR

pip install paddlepaddle paddleocr

Skip this if you only use the cloud API.

Quick Start

Cloud API (PaddleOCR-VL-1.6)

using MultiModalCore;

// Token read from PADDLEOCR_TOKEN environment variable
var client = new PaddleOcrVlClient();

// Process a local file → Markdown
var mdFiles = await client.ProcessAsync("scan.pdf", "output/");
// → output/doc_0.md, output/doc_1.md, ...

// Process a URL
await client.ProcessAsync("https://example.com/contract.png", "output/");

// Process raw bytes
byte[] fileBytes = ...;
await client.ProcessAsync(fileBytes, "scan.png", "output/");

// Process → Word (layout-preserving, depends on Angri450.Nong.Docx)
var docxPath = await client.ProcessToWordAsync("scan.pdf", "output/result.docx");

Local OCR

var local = new LocalOcrClient(pythonExe: "python", lang: "ch");

// Check environment
var (ok, msg) = await local.CheckEnvironmentAsync();
if (!ok) Console.WriteLine("Install: pip install paddlepaddle paddleocr");

// Recognize a single image
var blocks = await local.RecognizeAsync("crop.png");
foreach (var b in blocks)
    Console.WriteLine($"[{b.Confidence:P0}] {b.Text}");

Step-by-step control

var client = new PaddleOcrVlClient();
var jobId = await client.SubmitFileAsync("scan.pdf");
var resultUrl = await client.WaitForJobAsync(jobId, TimeSpan.FromSeconds(5));
var mdFiles = await client.DownloadResultsAsync(resultUrl, "output/");

// Or get structured data for custom processing
var ocrResult = await client.DownloadResultsStructuredAsync(resultUrl, "output/");

API Reference

PaddleOcrVlClient (cloud)

Method	Description
`ProcessAsync(input, outputDir)`	One-shot: submit → wait → download Markdown
`ProcessToWordAsync(input, docxPath)`	One-shot: submit → wait → download → Word
`SubmitFileAsync(path)`	Submit local file, returns jobId
`SubmitBytesAsync(bytes, name)`	Submit in-memory data, returns jobId
`SubmitUrlAsync(url)`	Submit remote URL, returns jobId
`WaitForJobAsync(jobId, interval)`	Poll until done, returns result URL
`DownloadResultsAsync(resultUrl, dir)`	Download and save Markdown + images
`DownloadResultsStructuredAsync(resultUrl, dir)`	Download and return `OcrResult`

LocalOcrClient (local CPU)

Method	Description
`RecognizeAsync(imagePath)`	OCR a single image
`RecognizeAsync(imageBytes)`	OCR from memory
`RecognizeBatchAsync(paths)`	OCR multiple images
`CheckEnvironmentAsync()`	Verify Python + PaddleOCR installation

Options

var options = new OcrOptions
{
    UseDocOrientationClassify = true,   // orientation detection
    UseDocUnwarping = true,             // document unwarping
    UseChartRecognition = true,         // chart parsing
};
await client.ProcessAsync("scan.pdf", "output/", options);

Word Output Pipeline

ProcessToWordAsync produces layout-preserving .docx files:

Cloud API returns prunedResult.parsing_res_list — each block has block_label, block_content, and block_bbox
LayoutToWordConverter maps blocks to Angri450.Nong.Docx primitives:
- doc_title → DocumentWriter.Title()
- paragraph_title → DocumentWriter.Heading(2)
- text → DocumentWriter.Body()
- image → ImageEmbedder.EmbedSingleImage() (actual download + embed)
- table → DocumentWriter.Table() (HTML → OpenXML)
- vision_footnote → DocumentWriter.Footnote()
Multi-column pages auto-detected from block_bbox coordinates, rendered with borderless tables
ElementOrder.RectifyTree() fixes OpenXML element ordering before save

Dependency Chain

Angri450.Nong.MultiModal
└── Angri450.Nong.Docx
    └── DocumentFormat.OpenXml

No System.Drawing.Common, no SixLabors.ImageSharp. Image dimensions are read via ImageHeaderReader.cs — 120 lines of pure C# binary header parsing for PNG/JPEG/GIF/BMP/TIFF.

Roadmap

Hybrid mode: cloud layout analysis + local CPU OCR → save quota, increase speed
ONNX Runtime migration for local OCR (remove Python dependency)
Speech-to-text (STT)
Text-to-speech (TTS)

License

Apache-2.0

Product	Compatible and additional computed target framework versions.
.NET	net11.0 is compatible.

Compatible target framework(s)

Included target framework(s) (in package)

Learn more about Target Frameworks and .NET Standard.

net11.0
- Angri450.Nong.Docx (>= 2.0.0)
- DocumentFormat.OpenXml (>= 3.5.1)

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version	Downloads	Last Updated
3.0.6	85	6/1/2026
3.0.5	86	6/1/2026
3.0.2	86	6/1/2026
3.0.1	83	6/1/2026
3.0.0	92	6/1/2026
1.0.0	86	5/30/2026