Angri450.Nong.MultiModal
3.0.6
dotnet add package Angri450.Nong.MultiModal --version 3.0.6
NuGet\Install-Package Angri450.Nong.MultiModal -Version 3.0.6
<PackageReference Include="Angri450.Nong.MultiModal" Version="3.0.6" />
<PackageVersion Include="Angri450.Nong.MultiModal" Version="3.0.6" />
<PackageReference Include="Angri450.Nong.MultiModal" />
paket add Angri450.Nong.MultiModal --version 3.0.6
#r "nuget: Angri450.Nong.MultiModal, 3.0.6"
#:package Angri450.Nong.MultiModal@3.0.6
#addin nuget:?package=Angri450.Nong.MultiModal&version=3.0.6
#tool nuget:?package=Angri450.Nong.MultiModal&version=3.0.6
Angri450.Nong.MultiModal
Multi-modal document processing library. Cloud OCR (PaddleOCR-VL-1.6), local CPU OCR (PaddleOCR), and pure .NET image structure analysis. Speech-to-text and text-to-speech planned.
Supported Platforms
.NET 8.0 and above (net8.0, net9.0, net10.0, net11.0). Windows, macOS, Linux.
Install
dotnet add package Angri450.Nong.MultiModal
Optional: local OCR
pip install paddlepaddle paddleocr
Skip if you only use cloud API or image analysis — no Python required for those.
Three Capabilities
| Capability | Class | Dependencies |
|---|---|---|
| Cloud OCR | PaddleOcrVlClient |
Network + PADDLEOCR_TOKEN |
| Image Analysis | ImageAnalyzer |
None (pure .NET via SkiaSharp in ThirdParty) |
| Local OCR | LocalOcrClient |
Python + PaddleOCR |
ImageAnalyzer — Pure .NET Image Structure Analysis
Load any PNG/JPEG image and get a detailed structural report — no OCR, no Python, no cloud API. Understand image layout in code.
using MultiModalCore;
var analyzer = new ImageAnalyzer();
// From file
var layout = analyzer.Analyze("diagram.png", targetWidth: 50);
// From bytes
var layout = analyzer.Analyze(imageBytes, targetWidth: 60);
// Print an ASCII map of the image layout
Console.WriteLine(layout.AsciiMap);
// Check whitespace ratio
Console.WriteLine($"Whitespace: {layout.WhitespaceRatio:P0}");
// Get detected content regions
foreach (var region in layout.Regions)
Console.WriteLine($"[{region.Type}] at ({region.X},{region.Y}) {region.Width}x{region.Height}");
// Get content bounding box (useful for auto-cropping)
Console.WriteLine($"Content area: {layout.ContentWidth}x{layout.ContentHeight}");
ImageLayout Properties
| Property | Type | Description |
|---|---|---|
AsciiMap |
string |
Pixel-to-character text map — print to "see" layout |
WhitespaceRatio |
double |
Percentage of white/blank pixels (0–1) |
Regions |
List<ContentRegion> |
Connected non-white content blocks |
ContentMinX/Y |
int |
Top-left of content bounding box |
ContentWidth/Height |
int |
Size of content bounding box |
BlackPixelCount |
int |
Text-like dark pixels |
GraphicPixelCount |
int |
Color/graphic pixels |
EdgePixelCount |
int |
Edge/border pixels |
ContentRegion Properties
| Property | Type | Description |
|---|---|---|
X, Y, Width, Height |
int |
Region bounding box in sample coordinates |
Type |
RegionType |
Text, Graphic, Edge, or Background |
PixelCount |
int |
Number of connected pixels in this region |
How It Works
- Load image via SkiaSharp → decode pixels
- Downsample to target width (default 60 chars wide)
- Classify each sample block: white (>240 brightness), black (<40), graphic (colorful), edge (dark gray)
- Flood-fill to find connected non-white regions
- Return ASCII map + region list + whitespace stats
ASCII Map Characters
| Char | Meaning |
|---|---|
(space) |
White/background |
# |
Black/dark text |
O |
Colored graphics |
+ |
Edge/border |
Use Cases
- Debug diagram/chart output — verify layout, whitespace ratio, content positioning
- Pre-process before cloud OCR — find text regions, skip blank pages
- Validate generated images — check that charts and diagrams render correctly
- Image quality checks — detect excessive whitespace, broken rendering
Cloud OCR (PaddleOCR-VL-1.6)
var client = new PaddleOcrVlClient(); // Token from PADDLEOCR_TOKEN env var
// File → Markdown
await client.ProcessAsync("scan.pdf", "output/");
// File → Word (layout-preserving)
await client.ProcessToWordAsync("scan.pdf", "output/result.docx");
// URL → Markdown
await client.ProcessAsync("https://example.com/document.png", "output/");
// Raw bytes → Markdown
await client.ProcessAsync(fileBytes, "scan.png", "output/");
Step-by-Step Control
var jobId = await client.SubmitFileAsync("scan.pdf");
var resultUrl = await client.WaitForJobAsync(jobId, TimeSpan.FromSeconds(5));
var mdFiles = await client.DownloadResultsAsync(resultUrl, "output/");
// Structured data for custom processing
var ocrResult = await client.DownloadResultsStructuredAsync(resultUrl, "output/");
foreach (var page in ocrResult.Pages)
foreach (var block in page.Blocks)
Console.WriteLine($"[{block.Label}] {block.Content}");
Options
var options = new OcrOptions
{
UseDocOrientationClassify = true, // Auto-detect orientation
UseDocUnwarping = true, // Document unwarping
UseChartRecognition = true, // Chart parsing
};
await client.ProcessAsync("scan.pdf", "output/", options);
Local CPU OCR (PaddleOCR)
var local = new LocalOcrClient(pythonExe: "python", lang: "ch");
var (ok, msg) = await local.CheckEnvironmentAsync();
if (!ok) Console.WriteLine("Install: pip install paddlepaddle paddleocr");
var blocks = await local.RecognizeAsync("crop.png");
foreach (var b in blocks)
Console.WriteLine($"[{b.Confidence:P0}] {b.Text}");
Word Output Pipeline
ProcessToWordAsync produces layout-preserving .docx:
- Cloud API returns
parsing_res_list— each block hasblock_label,block_content,block_bbox LayoutToWordConvertermaps blocks to Docx primitives:doc_title→ Title,paragraph_title→ Heading,text→ Bodyimage→ embedded image (actual download),table→ OpenXML tablevision_footnote→ Footnote
- Multi-column pages auto-detected from
block_bboxcoordinates ElementOrder.RectifyTree()fixes OpenXML ordering before save
Dependencies
Angri450.Nong.Docx— Word generation forProcessToWordAsyncoutputAngri450.Nong.ThirdParty— SkiaSharp (merged, used by ImageAnalyzer)
API Reference
ImageAnalyzer
| Method | Description |
|---|---|
Analyze(path, targetWidth) |
Analyze image from file path |
Analyze(bytes, targetWidth) |
Analyze image from byte array |
Analyze(bitmap, targetWidth) |
Analyze from SKBitmap |
PaddleOcrVlClient (Cloud)
| Method | Description |
|---|---|
ProcessAsync(input, outputDir) |
Submit → wait → download Markdown |
ProcessToWordAsync(input, docxPath) |
Submit → wait → download → Word |
SubmitFileAsync(path) |
Submit local file, returns jobId |
SubmitBytesAsync(bytes, name) |
Submit in-memory data |
SubmitUrlAsync(url) |
Submit remote URL |
WaitForJobAsync(jobId, interval) |
Poll until done, returns result URL |
DownloadResultsAsync(resultUrl, dir) |
Download Markdown + images |
DownloadResultsStructuredAsync(resultUrl, dir) |
Download and return OcrResult |
LocalOcrClient (CPU)
| Method | Description |
|---|---|
RecognizeAsync(path) |
OCR a single image |
RecognizeAsync(bytes) |
OCR from memory |
RecognizeBatchAsync(paths) |
OCR multiple images |
CheckEnvironmentAsync() |
Verify Python + PaddleOCR |
Source
https://github.com/angri450/Nong.NET — Issues and PRs welcome.
License
MIT
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
-
net8.0
- Angri450.Nong.Docx (>= 3.0.2)
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.