Angri450.Nong.MultiModal 3.0.6

.NET 8.0

dotnet add package Angri450.Nong.MultiModal --version 3.0.6

NuGet\Install-Package Angri450.Nong.MultiModal -Version 3.0.6

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

<PackageReference Include="Angri450.Nong.MultiModal" Version="3.0.6" />

For projects that support PackageReference, copy this XML node into the project file to reference the package.

<PackageVersion Include="Angri450.Nong.MultiModal" Version="3.0.6" />
                    

                            Directory.Packages.props

<PackageReference Include="Angri450.Nong.MultiModal" />
                    

                            Project file

For projects that support Central Package Management (CPM), copy this XML node into the solution Directory.Packages.props file to version the package.

paket add Angri450.Nong.MultiModal --version 3.0.6

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

#r "nuget: Angri450.Nong.MultiModal, 3.0.6"

#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

#:package Angri450.Nong.MultiModal@3.0.6

#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.

#addin nuget:?package=Angri450.Nong.MultiModal&version=3.0.6
                    

                            Install as a Cake Addin

#tool nuget:?package=Angri450.Nong.MultiModal&version=3.0.6
                    

                            Install as a Cake Tool

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

Angri450.Nong.MultiModal

Multi-modal document processing library. Cloud OCR (PaddleOCR-VL-1.6), local CPU OCR (PaddleOCR), and pure .NET image structure analysis. Speech-to-text and text-to-speech planned.

Supported Platforms

.NET 8.0 and above (net8.0, net9.0, net10.0, net11.0). Windows, macOS, Linux.

Install

dotnet add package Angri450.Nong.MultiModal

Optional: local OCR

pip install paddlepaddle paddleocr

Skip if you only use cloud API or image analysis — no Python required for those.

Three Capabilities

Capability	Class	Dependencies
Cloud OCR	`PaddleOcrVlClient`	Network + `PADDLEOCR_TOKEN`
Image Analysis	`ImageAnalyzer`	None (pure .NET via SkiaSharp in ThirdParty)
Local OCR	`LocalOcrClient`	Python + PaddleOCR

ImageAnalyzer — Pure .NET Image Structure Analysis

Load any PNG/JPEG image and get a detailed structural report — no OCR, no Python, no cloud API. Understand image layout in code.

using MultiModalCore;

var analyzer = new ImageAnalyzer();

// From file
var layout = analyzer.Analyze("diagram.png", targetWidth: 50);

// From bytes
var layout = analyzer.Analyze(imageBytes, targetWidth: 60);

// Print an ASCII map of the image layout
Console.WriteLine(layout.AsciiMap);

// Check whitespace ratio
Console.WriteLine($"Whitespace: {layout.WhitespaceRatio:P0}");

// Get detected content regions
foreach (var region in layout.Regions)
    Console.WriteLine($"[{region.Type}] at ({region.X},{region.Y}) {region.Width}x{region.Height}");

// Get content bounding box (useful for auto-cropping)
Console.WriteLine($"Content area: {layout.ContentWidth}x{layout.ContentHeight}");

ImageLayout Properties

Property	Type	Description
`AsciiMap`	`string`	Pixel-to-character text map — print to "see" layout
`WhitespaceRatio`	`double`	Percentage of white/blank pixels (0–1)
`Regions`	`List<ContentRegion>`	Connected non-white content blocks
`ContentMinX/Y`	`int`	Top-left of content bounding box
`ContentWidth/Height`	`int`	Size of content bounding box
`BlackPixelCount`	`int`	Text-like dark pixels
`GraphicPixelCount`	`int`	Color/graphic pixels
`EdgePixelCount`	`int`	Edge/border pixels

ContentRegion Properties

Property	Type	Description
`X, Y, Width, Height`	`int`	Region bounding box in sample coordinates
`Type`	`RegionType`	`Text`, `Graphic`, `Edge`, or `Background`
`PixelCount`	`int`	Number of connected pixels in this region

How It Works

Load image via SkiaSharp → decode pixels
Downsample to target width (default 60 chars wide)
Classify each sample block: white (>240 brightness), black (<40), graphic (colorful), edge (dark gray)
Flood-fill to find connected non-white regions
Return ASCII map + region list + whitespace stats

ASCII Map Characters

Char	Meaning
(space)	White/background
`#`	Black/dark text
`O`	Colored graphics
`+`	Edge/border

Use Cases

Debug diagram/chart output — verify layout, whitespace ratio, content positioning
Pre-process before cloud OCR — find text regions, skip blank pages
Validate generated images — check that charts and diagrams render correctly
Image quality checks — detect excessive whitespace, broken rendering

Cloud OCR (PaddleOCR-VL-1.6)

var client = new PaddleOcrVlClient();  // Token from PADDLEOCR_TOKEN env var

// File → Markdown
await client.ProcessAsync("scan.pdf", "output/");

// File → Word (layout-preserving)
await client.ProcessToWordAsync("scan.pdf", "output/result.docx");

// URL → Markdown
await client.ProcessAsync("https://example.com/document.png", "output/");

// Raw bytes → Markdown
await client.ProcessAsync(fileBytes, "scan.png", "output/");

Step-by-Step Control

var jobId = await client.SubmitFileAsync("scan.pdf");
var resultUrl = await client.WaitForJobAsync(jobId, TimeSpan.FromSeconds(5));
var mdFiles = await client.DownloadResultsAsync(resultUrl, "output/");

// Structured data for custom processing
var ocrResult = await client.DownloadResultsStructuredAsync(resultUrl, "output/");
foreach (var page in ocrResult.Pages)
    foreach (var block in page.Blocks)
        Console.WriteLine($"[{block.Label}] {block.Content}");

Options

var options = new OcrOptions
{
    UseDocOrientationClassify = true,   // Auto-detect orientation
    UseDocUnwarping = true,             // Document unwarping
    UseChartRecognition = true,         // Chart parsing
};
await client.ProcessAsync("scan.pdf", "output/", options);

Local CPU OCR (PaddleOCR)

var local = new LocalOcrClient(pythonExe: "python", lang: "ch");

var (ok, msg) = await local.CheckEnvironmentAsync();
if (!ok) Console.WriteLine("Install: pip install paddlepaddle paddleocr");

var blocks = await local.RecognizeAsync("crop.png");
foreach (var b in blocks)
    Console.WriteLine($"[{b.Confidence:P0}] {b.Text}");

Word Output Pipeline

ProcessToWordAsync produces layout-preserving .docx:

Cloud API returns parsing_res_list — each block has block_label, block_content, block_bbox
LayoutToWordConverter maps blocks to Docx primitives:
- doc_title → Title, paragraph_title → Heading, text → Body
- image → embedded image (actual download), table → OpenXML table
- vision_footnote → Footnote
Multi-column pages auto-detected from block_bbox coordinates
ElementOrder.RectifyTree() fixes OpenXML ordering before save

Dependencies

Angri450.Nong.Docx — Word generation for ProcessToWordAsync output
Angri450.Nong.ThirdParty — SkiaSharp (merged, used by ImageAnalyzer)

API Reference

ImageAnalyzer

Method	Description
`Analyze(path, targetWidth)`	Analyze image from file path
`Analyze(bytes, targetWidth)`	Analyze image from byte array
`Analyze(bitmap, targetWidth)`	Analyze from SKBitmap

PaddleOcrVlClient (Cloud)

Method	Description
`ProcessAsync(input, outputDir)`	Submit → wait → download Markdown
`ProcessToWordAsync(input, docxPath)`	Submit → wait → download → Word
`SubmitFileAsync(path)`	Submit local file, returns jobId
`SubmitBytesAsync(bytes, name)`	Submit in-memory data
`SubmitUrlAsync(url)`	Submit remote URL
`WaitForJobAsync(jobId, interval)`	Poll until done, returns result URL
`DownloadResultsAsync(resultUrl, dir)`	Download Markdown + images
`DownloadResultsStructuredAsync(resultUrl, dir)`	Download and return `OcrResult`

LocalOcrClient (CPU)

Method	Description
`RecognizeAsync(path)`	OCR a single image
`RecognizeAsync(bytes)`	OCR from memory
`RecognizeBatchAsync(paths)`	OCR multiple images
`CheckEnvironmentAsync()`	Verify Python + PaddleOCR

Source

https://github.com/angri450/Nong.NET — Issues and PRs welcome.

License

MIT

Product	Compatible and additional computed target framework versions.
.NET	net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.

Product

.NET

Compatible target framework(s)

Included target framework(s) (in package)

Learn more about Target Frameworks and .NET Standard.

net8.0
- Angri450.Nong.Docx (>= 3.0.2)

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version	Downloads	Last Updated
3.0.6	75	6/1/2026
3.0.5	76	6/1/2026
3.0.2	76	6/1/2026
3.0.1	75	6/1/2026
3.0.0	84	6/1/2026
1.0.0	78	5/30/2026