Angri450.Nong.MultiModal
1.0.0
There is a newer version of this package available.
See the version list below for details.
Angri450.Nong.MultiModal
Multi-modal document processing library. OCR is available now; speech-to-text and text-to-speech are planned. Pure .NET, minimal dependencies.
Install
dotnet add package Angri450.Nong.MultiModal
Optional: local OCR
pip install paddlepaddle paddleocr
Skip this if you only use the cloud API.
Quick Start
Cloud API (PaddleOCR-VL-1.6)
using MultiModalCore;
// Token read from PADDLEOCR_TOKEN environment variable
var client = new PaddleOcrVlClient();
// Process a local file → Markdown
var mdFiles = await client.ProcessAsync("scan.pdf", "output/");
// → output/doc_0.md, output/doc_1.md, ...
// Process a URL
await client.ProcessAsync("https://example.com/contract.png", "output/");
// Process raw bytes
byte[] fileBytes = ...; // placeholder: any in-memory image or PDF bytes, e.g. from File.ReadAllBytesAsync
await client.ProcessAsync(fileBytes, "scan.png", "output/");
// Process → Word (layout-preserving, depends on Angri450.Nong.Docx)
var docxPath = await client.ProcessToWordAsync("scan.pdf", "output/result.docx");
Local OCR
var local = new LocalOcrClient(pythonExe: "python", lang: "ch");
// Check environment
var (ok, msg) = await local.CheckEnvironmentAsync();
if (!ok) Console.WriteLine("Install: pip install paddlepaddle paddleocr");
// Recognize a single image
var blocks = await local.RecognizeAsync("crop.png");
foreach (var b in blocks)
    Console.WriteLine($"[{b.Confidence:P0}] {b.Text}");
Step-by-step control
var client = new PaddleOcrVlClient();
var jobId = await client.SubmitFileAsync("scan.pdf");
var resultUrl = await client.WaitForJobAsync(jobId, TimeSpan.FromSeconds(5));
var mdFiles = await client.DownloadResultsAsync(resultUrl, "output/");
// Or get structured data for custom processing
var ocrResult = await client.DownloadResultsStructuredAsync(resultUrl, "output/");
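WaitForJobAsync is described as polling until the job is done. The sketch below shows the general poll-with-interval pattern in isolation. It is not the library's implementation: PollUntilAsync and the checkAsync delegate are hypothetical names, and the delegate stands in for whatever status request the real client makes.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PollingSketch
{
    // Hypothetical helper: calls checkAsync until it returns a non-null result,
    // waiting `interval` between attempts. Cancellation surfaces as
    // OperationCanceledException from ThrowIfCancellationRequested or Task.Delay.
    public static async Task<T> PollUntilAsync<T>(
        Func<CancellationToken, Task<T?>> checkAsync,
        TimeSpan interval,
        CancellationToken ct = default) where T : class
    {
        while (true)
        {
            ct.ThrowIfCancellationRequested();
            T? result = await checkAsync(ct);
            if (result is not null)
                return result;          // job finished, hand back the result
            await Task.Delay(interval, ct); // not ready yet, wait and retry
        }
    }
}
```

A caller would pass a delegate that returns null while the job is still running and the result URL once it completes. Passing a CancellationToken gives the caller a way to bound the total wait, which a bare polling loop does not provide.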
API Reference
PaddleOcrVlClient (cloud)
| Method | Description |
|---|---|
| ProcessAsync(input, outputDir) | One-shot: submit → wait → download Markdown |
| ProcessToWordAsync(input, docxPath) | One-shot: submit → wait → download → Word |
| SubmitFileAsync(path) | Submit local file, returns jobId |
| SubmitBytesAsync(bytes, name) | Submit in-memory data, returns jobId |
| SubmitUrlAsync(url) | Submit remote URL, returns jobId |
| WaitForJobAsync(jobId, interval) | Poll until done, returns result URL |
| DownloadResultsAsync(resultUrl, dir) | Download and save Markdown + images |
| DownloadResultsStructuredAsync(resultUrl, dir) | Download and return OcrResult |
LocalOcrClient (local CPU)
| Method | Description |
|---|---|
| RecognizeAsync(imagePath) | OCR a single image |
| RecognizeAsync(imageBytes) | OCR from memory |
| RecognizeBatchAsync(paths) | OCR multiple images |
| CheckEnvironmentAsync() | Verify Python + PaddleOCR installation |
Options
var options = new OcrOptions
{
    UseDocOrientationClassify = true, // orientation detection
    UseDocUnwarping = true,           // document unwarping
    UseChartRecognition = true,       // chart parsing
};
await client.ProcessAsync("scan.pdf", "output/", options);
Word Output Pipeline
ProcessToWordAsync produces layout-preserving .docx files:
- Cloud API returns prunedResult.parsing_res_list; each block has block_label, block_content, and block_bbox
- LayoutToWordConverter maps blocks to Angri450.Nong.Docx primitives:
  - doc_title → DocumentWriter.Title()
  - paragraph_title → DocumentWriter.Heading(2)
  - text → DocumentWriter.Body()
  - image → ImageEmbedder.EmbedSingleImage() (actual download + embed)
  - table → DocumentWriter.Table() (HTML → OpenXML)
  - vision_footnote → DocumentWriter.Footnote()
- Multi-column pages auto-detected from block_bbox coordinates, rendered with borderless tables
- ElementOrder.RectifyTree() fixes OpenXML element ordering before save
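The multi-column step can be illustrated with a simple geometric heuristic. The sketch below is an assumption-laden illustration, not the library's detection logic: the Block type, the (X1, Y1, X2, Y2) box layout, the pageWidth parameter, and the LooksTwoColumn name are all hypothetical. It treats a page as two-column when a block lying entirely left of the page midline sits side by side with a block lying entirely right of it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical block shape: label plus a bounding box in page coordinates,
// assumed here to be (X1, Y1) top-left and (X2, Y2) bottom-right.
public readonly record struct Block(string Label, double X1, double Y1, double X2, double Y2);

public static class ColumnSketch
{
    // Returns true when at least one left-half block and one right-half block
    // overlap vertically, i.e. they sit side by side on the page.
    public static bool LooksTwoColumn(IReadOnlyList<Block> blocks, double pageWidth)
    {
        double mid = pageWidth / 2;

        // Blocks that cross the midline (titles, wide tables) belong to neither side.
        var left = blocks.Where(b => b.X2 <= mid).ToList();
        var right = blocks.Where(b => b.X1 >= mid).ToList();

        // Two vertical ranges overlap when each starts before the other ends.
        return left.Any(l => right.Any(r => l.Y1 < r.Y2 && r.Y1 < l.Y2));
    }
}
```

A full-width title above two side-by-side text blocks is still reported as two-column, because the title is ignored rather than counted against the layout. A real detector would likely need tolerances for gutters and skewed scans.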
Dependency Chain
Angri450.Nong.MultiModal
└── Angri450.Nong.Docx
    └── DocumentFormat.OpenXml
No System.Drawing.Common, no SixLabors.ImageSharp. Image dimensions are read by ImageHeaderReader.cs, 120 lines of pure C# that parse the binary headers of PNG, JPEG, GIF, BMP, and TIFF files.
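To show what header-only dimension reading looks like, here is a minimal sketch for PNG alone. It is not the contents of ImageHeaderReader.cs; the PngHeaderSketch class and TryReadPngSize method are hypothetical names. It relies only on the PNG file layout: an 8-byte signature, then the IHDR chunk, whose data begins with the width and height as 4-byte big-endian integers.

```csharp
using System;
using System.Buffers.Binary;

public static class PngHeaderSketch
{
    // Fixed 8-byte PNG file signature.
    private static ReadOnlySpan<byte> Signature =>
        new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    // ASCII "IHDR", the type of the first chunk in every PNG file.
    private static ReadOnlySpan<byte> Ihdr =>
        new byte[] { 0x49, 0x48, 0x44, 0x52 };

    // Reads width and height from the first 24 bytes without decoding pixels.
    public static bool TryReadPngSize(ReadOnlySpan<byte> data, out int width, out int height)
    {
        width = 0;
        height = 0;

        // signature (8) + chunk length (4) + chunk type (4) + width (4) + height (4)
        if (data.Length < 24) return false;
        if (!data.Slice(0, 8).SequenceEqual(Signature)) return false;
        if (!data.Slice(12, 4).SequenceEqual(Ihdr)) return false;

        width = BinaryPrimitives.ReadInt32BigEndian(data.Slice(16, 4));
        height = BinaryPrimitives.ReadInt32BigEndian(data.Slice(20, 4));
        return width > 0 && height > 0;
    }
}
```

Only the first 24 bytes of the file are needed, so a caller can read a small prefix instead of loading the whole image. The other formats named above each need their own parser; JPEG in particular requires walking marker segments to find the frame header.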
Roadmap
- Hybrid mode: cloud layout analysis + local CPU OCR → save quota, increase speed
- ONNX Runtime migration for local OCR (remove Python dependency)
- Speech-to-text (STT)
- Text-to-speech (TTS)
License
Apache-2.0
Frameworks
| Product | Compatible target frameworks |
|---|---|
| .NET | net11.0 |
Dependencies (net11.0)
- Angri450.Nong.Docx (>= 2.0.0)
- DocumentFormat.OpenXml (>= 3.5.1)