HPD-TextExtract 0.5.5

HPD-TextExtract

.NET-native text and document extraction library with rich PDF structure, diagnostics, OCR planning, and layout-aware outputs.

Install

dotnet add package HPD-TextExtract

Use When

Use this package when an app or library needs to turn files, byte payloads, or URLs into text plus structured extraction metadata.

Supported inputs include:

  • PDF
  • Plain text, Markdown, JSON, and XML
  • HTML and web URLs
  • Microsoft Word, Excel, and PowerPoint Open XML documents
  • Images through an injected OCR engine

For HPD agent middleware, use HPD-Agent.TextExtraction, which builds on this package.

Quick Start

using HPD.TextExtract;

using var extractor = new TextExtractionUtility();
var result = await extractor.ExtractTextAsync("document.pdf");

if (!result.IsSuccess)
{
    throw new InvalidOperationException(result.ErrorMessage);
}

Console.WriteLine(result.ExtractedText);

Binary Payloads

using HPD.TextExtract;
using HPD.TextExtract.Models;

var bytes = await File.ReadAllBytesAsync("document.pdf");

using var extractor = new TextExtractionUtility();
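var result = await extractor.ExtractTextAsync(
    bytes,
    mimeType: MimeTypes.Pdf,
    fileName: "document.pdf");

// Check the outcome the same way as for file-path input.
if (!result.IsSuccess)
{
    throw new InvalidOperationException(result.ErrorMessage);
}

Console.WriteLine(result.ExtractedText);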
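
The byte-payload call above passes the MIME type explicitly. When a caller only has a file name, one option is to derive a media type from the extension first. The sketch below is a hypothetical helper, not part of HPD-TextExtract: the name GuessMimeType and the fallback value are assumptions, and the strings are the standard IANA media types for the input kinds this README lists. Whether the library's MimeTypes constants use the same strings is not confirmed here, so prefer those constants where they exist.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not a library API): map a file extension to a standard
// IANA media type for the input kinds listed under "Supported inputs".
string GuessMimeType(string fileName)
{
    var byExtension = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
    {
        [".pdf"] = "application/pdf",
        [".txt"] = "text/plain",
        [".md"] = "text/markdown",
        [".json"] = "application/json",
        [".xml"] = "application/xml",
        [".html"] = "text/html",
        [".htm"] = "text/html",
        [".docx"] = "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        [".xlsx"] = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        [".pptx"] = "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    };

    // Path.GetExtension returns an empty string when there is no extension,
    // which simply misses the lookup and falls through to the generic type.
    var extension = Path.GetExtension(fileName);
    return byExtension.TryGetValue(extension, out var mime)
        ? mime
        : "application/octet-stream"; // assumed fallback for unknown extensions
}

Console.WriteLine(GuessMimeType("report.PDF"));
Console.WriteLine(GuessMimeType("notes.md"));
```

The lookup is case-insensitive so that names such as report.PDF resolve the same way as report.pdf.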

Dependency Injection

using HPD.TextExtract;

// `builder` is the application's host builder, e.g. WebApplication.CreateBuilder(args).
builder.Services.AddTextExtraction();

Register custom decoders or OCR engines when the built-in behavior is not enough:

using HPD.TextExtract;
using HPD.TextExtract.Decoders;

// MyOcrEngine is a placeholder for your own OCR engine implementation.
builder.Services.AddTextExtractionWithOcr<MyOcrEngine>();

PDF Notes

PDF extraction is powered by PDFium through PDFiumCore.

The PDF pipeline extracts native text, glyph geometry, font metadata, colors, embedded image regions, optional screenshots, and diagnostics. It can also plan OCR for scanned or low-quality pages when an OCR executor is configured.

PDFium is a native dependency. The upstream PDFium packages supply the platform-specific native assets, including macOS (arm64 and x64), Windows (x64 and x86), and Linux (x64). These native libraries are deployed as sidecar runtime assets next to the application; they are not embedded inside HPD.TextExtract.dll.
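
Because the native library has to match the running process, it can help to log at startup whether the current platform is one of those named above. The sketch below uses only the .NET base class library. IsListedPlatform and CurrentPlatformIsListed are hypothetical names, and the platform table is copied from the paragraph above, which says "including", so it may not be exhaustive.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical check (not a library API): true when the OS/architecture pair
// appears in the platform list quoted in the README.
bool IsListedPlatform(OSPlatform os, Architecture arch)
{
    if (os == OSPlatform.OSX) return arch is Architecture.Arm64 or Architecture.X64;
    if (os == OSPlatform.Windows) return arch is Architecture.X64 or Architecture.X86;
    if (os == OSPlatform.Linux) return arch == Architecture.X64;
    return false;
}

// Detect the current OS and test it together with the process architecture,
// since a native library must match the process rather than the machine.
bool CurrentPlatformIsListed()
{
    var arch = RuntimeInformation.ProcessArchitecture;
    foreach (var os in new[] { OSPlatform.Windows, OSPlatform.OSX, OSPlatform.Linux })
    {
        if (RuntimeInformation.IsOSPlatform(os))
        {
            return IsListedPlatform(os, arch);
        }
    }
    return false;
}

Console.WriteLine(
    $"{RuntimeInformation.RuntimeIdentifier}: listed in README = {CurrentPlatformIsListed()}");
```

A false result does not prove PDF extraction will fail; it only means the platform is not among those the README names explicitly.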

Output Shape

TextExtractionResult gives the simple text view:

  • IsSuccess
  • ExtractedText
  • FileName
  • MimeType
  • ProcessingTime
  • ErrorMessage
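
As a rough illustration of the fields listed above, the following sketch prints a short summary of one extraction. It reuses the calls from the Quick Start and the property names from this list. The property types are not documented here, so the values are only formatted through string interpolation, and the null-tolerant length check is an assumption about ExtractedText.

```csharp
using HPD.TextExtract;

using var extractor = new TextExtractionUtility();
var result = await extractor.ExtractTextAsync("document.pdf");

if (result.IsSuccess)
{
    // Property names come from the README list; types are assumed to be
    // printable, which holds for any .NET object via ToString().
    Console.WriteLine($"File:    {result.FileName}");
    Console.WriteLine($"Type:    {result.MimeType}");
    Console.WriteLine($"Elapsed: {result.ProcessingTime}");
    Console.WriteLine($"Chars:   {result.ExtractedText?.Length ?? 0}");
}
else
{
    // On failure, ErrorMessage carries the reason reported by the extractor.
    Console.Error.WriteLine($"Extraction failed: {result.ErrorMessage}");
}
```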

For callers that need more than plain text, TextExtractionResult.Extraction exposes:

  • Content.Sections
  • Pages
  • TextItems
  • Assets
  • Diagnostics
  • Metadata

Target Frameworks

This package targets the repo-standard modern frameworks:

  • net8.0
  • net9.0
  • net10.0

It is configured for trimming, single-file analysis, and Native AOT analysis.

Compatible target frameworks (included in the package): net8.0, net9.0, net10.0.
Computed target frameworks, for each of net8.0, net9.0, and net10.0: -android, -browser, -ios, -maccatalyst, -macos, -tvos, -windows.

NuGet packages (1)

One NuGet package depends on HPD-TextExtract:

HPD-Agent.TextExtraction
HPD Agent document handling middleware and builder extensions powered by HPD-TextExtract.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last Updated
0.5.5 32 6/21/2026
0.5.0 103 6/11/2026