FieldCure.DocumentParsers.Pdf
0.2.0
FieldCure.DocumentParsers 2.0.0
Additional Details: Replaced by FieldCure.DocumentParsers 2.0 (PDF text) + FieldCure.DocumentParsers.Imaging 1.0 (page rendering). See the 2.0.0 release notes for migration.
dotnet add package FieldCure.DocumentParsers.Pdf --version 0.2.0
NuGet\Install-Package FieldCure.DocumentParsers.Pdf -Version 0.2.0
<PackageReference Include="FieldCure.DocumentParsers.Pdf" Version="0.2.0" />
<PackageVersion Include="FieldCure.DocumentParsers.Pdf" Version="0.2.0" />
<PackageReference Include="FieldCure.DocumentParsers.Pdf" />
paket add FieldCure.DocumentParsers.Pdf --version 0.2.0
#r "nuget: FieldCure.DocumentParsers.Pdf, 0.2.0"
#:package FieldCure.DocumentParsers.Pdf@0.2.0
#addin nuget:?package=FieldCure.DocumentParsers.Pdf&version=0.2.0
#tool nuget:?package=FieldCure.DocumentParsers.Pdf&version=0.2.0
FieldCure.DocumentParsers.Pdf
PDF text extraction and page image rendering — an extension package for FieldCure.DocumentParsers.
Features
- Text extraction: Page-by-page text extraction via PdfPig with `## Page {n}` headers
- Page rendering: Each page rendered as PNG via PDFtoImage (PDFium)
- IMediaDocumentParser: Implements both `ExtractText` and `ExtractImages` for PDF
- Factory integration: One-line registration with `DocumentParserFactory`
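Because the extracted text carries `## Page {n}` headers, a caller can split it back into per-page chunks. The following is a minimal sketch, not part of the package: `PageSplitter` and `Split` are hypothetical names, and it assumes each header sits on a line of its own in exactly the `## Page {n}` form.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of FieldCure.DocumentParsers.Pdf.
public static class PageSplitter
{
    // Assumption: a header looks like "## Page 3" and occupies its own line.
    // "\r?" tolerates Windows line endings before the end-of-line anchor.
    private static readonly Regex Header =
        new Regex(@"^## Page (\d+)[ \t]*\r?$", RegexOptions.Multiline);

    // Returns (page number, page text) pairs in the order the headers appear.
    public static List<(int Page, string Text)> Split(string extracted)
    {
        var pages = new List<(int Page, string Text)>();
        var matches = Header.Matches(extracted);

        for (int i = 0; i < matches.Count; i++)
        {
            // The page body runs from the end of this header to the next header
            // (or to the end of the text for the last page).
            int start = matches[i].Index + matches[i].Length;
            int end = i + 1 < matches.Count ? matches[i + 1].Index : extracted.Length;

            int number = int.Parse(matches[i].Groups[1].Value);
            pages.Add((number, extracted.Substring(start, end - start).Trim()));
        }

        return pages;
    }
}
```

Passing the string returned by `ExtractText` to `PageSplitter.Split` would give one entry per page. Any text that precedes the first header is ignored by this sketch.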
Install
dotnet add package FieldCure.DocumentParsers.Pdf
Quick Start
using FieldCure.DocumentParsers;
using FieldCure.DocumentParsers.Pdf;
// Register PDF support (call once at startup)
DocumentParserFactoryExtensions.AddPdfSupport();
// Text extraction
var parser = DocumentParserFactory.GetParser(".pdf");
var text = parser!.ExtractText(File.ReadAllBytes("document.pdf"));
// Page image rendering
var mediaParser = (IMediaDocumentParser)parser;
var images = mediaParser.ExtractImages(File.ReadAllBytes("document.pdf"), dpi: 150);
foreach (var img in images)
File.WriteAllBytes($"{img.Label}.png", img.Data);
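The `dpi` argument controls how many pixels each rendered page gets. As a rough guide, PDF page sizes are expressed in points (1/72 inch), so the pixel size scales linearly with DPI. The sketch below only illustrates that arithmetic. `RenderSize` and `EstimatePixels` are hypothetical names, and the exact rounding applied by the renderer is an assumption, so actual output may differ by a pixel.

```csharp
using System;

// Hypothetical helper for estimating rendered image size; not part of the package.
public static class RenderSize
{
    // PDF user space uses 72 points per inch by default.
    public const double PointsPerInch = 72.0;

    // Converts a page size in points to an approximate pixel size at the given DPI.
    public static (int Width, int Height) EstimatePixels(double widthPt, double heightPt, int dpi)
    {
        if (dpi <= 0) throw new ArgumentOutOfRangeException(nameof(dpi));

        int width = (int)Math.Round(widthPt / PointsPerInch * dpi);
        int height = (int)Math.Round(heightPt / PointsPerInch * dpi);
        return (width, height);
    }
}
```

For a US Letter page (612 x 792 points), `EstimatePixels(612, 792, 150)` gives 1275 x 1650, and doubling the DPI to 300 doubles both dimensions to 2550 x 3300.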
Dependencies
- FieldCure.DocumentParsers: `IDocumentParser` interface
- PdfPig: PDF text extraction
- PDFtoImage: PDF page rendering (PDFium native)
Related Packages
- FieldCure.DocumentParsers — DOCX, HWPX, XLSX, PPTX text extraction
- FieldCure.AssistStudio.Core — AI provider client library
License
MIT — Copyright (c) 2026 FieldCure Co., Ltd.
| Product | Versions (compatible and additional computed target framework versions) |
|---|---|
| .NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
- net8.0
  - FieldCure.DocumentParsers (>= 0.3.0)
  - PdfPig (>= 0.1.14)
  - PDFtoImage (>= 5.2.0)
NuGet packages (1)
Showing the top 1 NuGet packages that depend on FieldCure.DocumentParsers.Pdf:
| Package | Downloads |
|---|---|
| FieldCure.DocumentParsers.Pdf.Ocr: Tesseract OCR fallback for scanned PDFs in FieldCure.DocumentParsers.Pdf | |
GitHub repositories
This package is not used by any popular GitHub repositories.
# Release Notes — FieldCure.DocumentParsers.Pdf
## [0.2.0] - 2026-03-27
### Added
- Unit test project (`DocumentParsers.Pdf.Tests`) with 11 tests covering text extraction, page headers, content ordering, and image rendering
- Higher DPI rendering test (`ExtractImages_HigherDpiProducesLargerImage`)
### Changed
- Migrated to independent repository (fieldcure/fieldcure-document-parsers)
- `RepositoryUrl` updated to `https://github.com/fieldcure/fieldcure-document-parsers`
## [0.1.0] - 2026-03-25
### Added
- `PdfParser` — PDF text extraction via PdfPig with `## Page {n}` page headers
- `PdfParser.ExtractImages` — PDF page rendering to PNG via PDFtoImage (PDFium)
- `IMediaDocumentParser` support for combined text + image extraction
- `DocumentParserFactoryExtensions.AddPdfSupport()` — one-line factory registration
- Extracted from AssistStudio.Core to enable independent package consumption