FieldCure.DocumentParsers.Pdf 1.1.0

This package has been deprecated as it is legacy and is no longer maintained.

Additional Details

Replaced by FieldCure.DocumentParsers 2.0 (PDF text) + FieldCure.DocumentParsers.Imaging 1.0 (page rendering). See the 2.0.0 release notes for migration.

FieldCure.DocumentParsers.Pdf

PDF text extraction and page image rendering — an extension package for FieldCure.DocumentParsers.

Features

- Text extraction: page-by-page text extraction via PdfPig, with `## Page {n}` headers
- Page rendering: each page rendered as PNG via PDFtoImage (PDFium)
- `IMediaDocumentParser`: implements both `ExtractText` and `ExtractImages` for PDF
- Factory integration: one-line registration with `DocumentParserFactory`

Install

dotnet add package FieldCure.DocumentParsers.Pdf

Quick Start

using FieldCure.DocumentParsers;
using FieldCure.DocumentParsers.Pdf;

// Register PDF support (call once at startup)
DocumentParserFactoryExtensions.AddPdfSupport();

// Text extraction
var parser = DocumentParserFactory.GetParser(".pdf");
var text = parser!.ExtractText(File.ReadAllBytes("document.pdf"));

// Page image rendering
var mediaParser = (IMediaDocumentParser)parser;
var images = mediaParser.ExtractImages(File.ReadAllBytes("document.pdf"), dpi: 150);
foreach (var img in images)
    File.WriteAllBytes($"{img.Label}.png", img.Data);
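Because the extracted text carries `## Page {n}` headers, it can be split back into per-page strings after the fact. The following is a minimal sketch, not part of the package: `PageSplitter` and `SplitByPage` are hypothetical names, and the sketch assumes `ExtractText` returns a single string in which each header sits alone on its own line in exactly the form `## Page 12`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; not part of FieldCure.DocumentParsers.Pdf.
public static class PageSplitter
{
    // Assumed header shape: a line containing only "## Page <number>".
    // "\r?$" tolerates Windows line endings.
    private static readonly Regex PageHeader =
        new Regex(@"^## Page (\d+)[ \t]*\r?$", RegexOptions.Multiline);

    // Returns a map from page number to that page's trimmed text.
    // Text before the first header (if any) is ignored.
    public static Dictionary<int, string> SplitByPage(string text)
    {
        var pages = new Dictionary<int, string>();
        var matches = PageHeader.Matches(text);

        for (int i = 0; i < matches.Count; i++)
        {
            int number = int.Parse(matches[i].Groups[1].Value);
            int start = matches[i].Index + matches[i].Length;
            // A page runs until the next header or the end of the text.
            int end = i + 1 < matches.Count ? matches[i + 1].Index : text.Length;
            pages[number] = text.Substring(start, end - start).Trim();
        }

        return pages;
    }
}
```

With the Quick Start's `text` variable, `PageSplitter.SplitByPage(text)` would give one entry per page. If the real header format differs from the assumption above, only the regular expression needs adjusting.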

Dependencies

- FieldCure.DocumentParsers: `IDocumentParser` interface
- PdfPig: PDF text extraction
- PDFtoImage: PDF page rendering (PDFium native)

- FieldCure.DocumentParsers: DOCX, HWPX, XLSX, PPTX text extraction
- FieldCure.AssistStudio.Core: AI provider client library

License

Target frameworks (.NET)

Compatible: net8.0
Computed: net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows, net9.0, net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows

net8.0
- FieldCure.DocumentParsers (>= 1.1.0)
- PdfPig (>= 0.1.14)
- PDFtoImage (>= 5.2.0)

NuGet packages (1)

Showing the top NuGet package that depends on FieldCure.DocumentParsers.Pdf:

Package	Description	Downloads
FieldCure.DocumentParsers.Pdf.Ocr	Tesseract OCR fallback for scanned PDFs in FieldCure.DocumentParsers.Pdf	273

GitHub repositories

This package is not used by any popular GitHub repositories.

Version	Downloads	Last Updated
1.1.0	181	4/8/2026
1.0.0	138	4/6/2026
0.2.0	137	3/27/2026
0.1.0	111	3/26/2026

# Release Notes — FieldCure.DocumentParsers.Pdf

## [1.1.0] - 2026-04-08

### Added
- `IOcrEngine` interface for pluggable OCR engines
- `PdfParser(IOcrEngine)` constructor overload with OCR fallback for scanned pages
- `AddPdfSupport(IOcrEngine)` factory registration overload
- Pages with < 5% meaningful text are automatically rendered at 300 DPI and sent to OCR

### Note
Backward compatible — existing parameterless constructor and `AddPdfSupport()` unchanged.
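
The release note does not define "meaningful text" or say what the 5% is measured against. As an illustration only, the sketch below shows one way such a check could work. It assumes that letters and digits count as meaningful and that the ratio is taken over all characters in the page's extracted text. `OcrFallbackSketch`, `MeaningfulRatio`, and `NeedsOcr` are hypothetical names, not the package's API, and the actual heuristic may differ.

```csharp
using System;
using System.Linq;

// Hypothetical illustration; not the package's actual implementation.
public static class OcrFallbackSketch
{
    // Threshold taken from the release note: below 5% triggers OCR.
    public const double Threshold = 0.05;

    // Assumed definition: share of letter-or-digit characters
    // among all characters of the extracted page text.
    public static double MeaningfulRatio(string pageText)
    {
        if (string.IsNullOrEmpty(pageText)) return 0.0;
        int meaningful = pageText.Count(c => char.IsLetterOrDigit(c));
        return (double)meaningful / pageText.Length;
    }

    // True when the page looks scanned (little or no usable text layer).
    public static bool NeedsOcr(string pageText) =>
        MeaningfulRatio(pageText) < Threshold;
}
```

Under these assumptions, a page with an empty text layer always falls back to OCR, while a page of ordinary prose does not.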

## [1.0.0] - 2026-04-06

### Note
First stable release, aligned with FieldCure.DocumentParsers 1.0.0.
No API changes from 0.2.0 — public surface (`PdfParser`, `AddPdfSupport()`) is now committed as stable.

## [0.2.0] - 2026-03-27

### Added
- Unit test project (`DocumentParsers.Pdf.Tests`) with 11 tests covering text extraction, page headers, content ordering, and image rendering
- Higher DPI rendering test (`ExtractImages_HigherDpiProducesLargerImage`)

### Changed
- Migrated to independent repository (fieldcure/fieldcure-document-parsers)
- `RepositoryUrl` updated to `https://github.com/fieldcure/fieldcure-document-parsers`

## [0.1.0] - 2026-03-25

### Added
- `PdfParser` — PDF text extraction via PdfPig with `## Page {n}` page headers
- `PdfParser.ExtractImages` — PDF page rendering to PNG via PDFtoImage (PDFium)
- `IMediaDocumentParser` support for combined text + image extraction
- `DocumentParserFactoryExtensions.AddPdfSupport()` — one-line factory registration
- Extracted from AssistStudio.Core to enable independent package consumption