GroupDocs.Parser 22.2.0

There is a newer version of this package available.
See the version list below for details.
.NET CLI:
dotnet add package GroupDocs.Parser --version 22.2.0

Package Manager:
NuGet\Install-Package GroupDocs.Parser -Version 22.2.0
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

PackageReference:
<PackageReference Include="GroupDocs.Parser" Version="22.2.0" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.

Paket CLI:
paket add GroupDocs.Parser --version 22.2.0

Script & Interactive:
#r "nuget: GroupDocs.Parser, 22.2.0"
The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or into the source code of a script to reference the package.

Cake:
// Install GroupDocs.Parser as a Cake Addin
#addin nuget:?package=GroupDocs.Parser&version=22.2.0

// Install GroupDocs.Parser as a Cake Tool
#tool nuget:?package=GroupDocs.Parser&version=22.2.0

Image Detection & Extraction Parser .NET API

Product Page | Docs | Demos | API Reference | Examples | Blog | Search | Free Support | Temporary License

This on-premise text parser API searches and extracts formatted text as well as raw text from documents in a wide range of supported file formats.

Document Parser Processing Features

  • Parse documents by user-defined templates.
  • Extract plain and structured text.
  • Extract text areas with coordinates, text styles, and other information.
  • Search text by keyword or regular expression and extract the text around each match.
  • Extract HTML or Markdown (MD) formatted text for a fast preview.
  • Increase performance by extracting raw text.
  • Extract formatted text, metadata, images, containers, and attachments.
  • Extract the table of contents from some supported document formats.
  • Parse form data from PDF documents.
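Several of the text features listed above (plain text, raw text, keyword or regular-expression search, and Markdown output) can be sketched with a few Parser calls. This is a minimal sketch, assuming the GetText, Search and GetFormattedText methods and the TextOptions, SearchOptions, FormattedTextOptions and SearchResult types as documented for recent GroupDocs.Parser for .NET releases, in the GroupDocs.Parser.Options and GroupDocs.Parser.Data namespaces. The class ParserFeaturesSketch and its method names are hypothetical; check the API Reference for the exact signatures in your version.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;     // assumed namespace of SearchResult
using GroupDocs.Parser.Options;  // assumed namespace of TextOptions, SearchOptions, FormattedTextOptions

// Hypothetical helper class illustrating the text-related features.
public static class ParserFeaturesSketch
{
    // Returns the document text, or null when text extraction isn't supported.
    // TextOptions(true) requests raw mode where the format allows it (faster, less accurate layout).
    public static string ExtractText(string filePath, bool rawMode)
    {
        using (Parser parser = new Parser(filePath))
        using (TextReader reader = parser.GetText(new TextOptions(rawMode)))
        {
            return reader?.ReadToEnd();
        }
    }

    // Prints every regular-expression match with its position and returns the match count.
    public static int PrintRegexMatches(string filePath, string pattern)
    {
        using (Parser parser = new Parser(filePath))
        {
            // Assumed argument order: matchCase, matchWholeWord, useRegularExpression
            IEnumerable<SearchResult> results = parser.Search(pattern, new SearchOptions(true, false, true));
            if (results == null)
            {
                Console.WriteLine("Text search isn't supported");
                return 0;
            }

            int count = 0;
            foreach (SearchResult result in results)
            {
                Console.WriteLine("At {0}: {1}", result.Position, result.Text);
                count++;
            }
            return count;
        }
    }

    // Returns the document text formatted as Markdown, or null when formatted text isn't supported.
    public static string ExtractMarkdown(string filePath)
    {
        using (Parser parser = new Parser(filePath))
        using (TextReader reader = parser.GetFormattedText(new FormattedTextOptions(FormattedTextMode.Markdown)))
        {
            return reader?.ReadToEnd();
        }
    }
}
```

Each method opens its own Parser so it can be used on its own; when you need several operations on one large document, reusing a single Parser instance is likely cheaper.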

What's new in v22.2.0

Extract Barcodes from Images

This release introduces barcode extraction from images.

The following C# code snippet shows how to get barcodes from an image via API:

// Create an instance of Parser class
using (Parser parser = new Parser(Constants.SampleImageWithBarcodes))
{
    // Check if the file supports barcodes extraction
    if (!parser.Features.Barcodes)
    {
        Console.WriteLine("File doesn't support barcodes extraction.");
        return;
    }

    // Extract barcodes from the image.
    IEnumerable<PageBarcodeArea> barcodes = parser.GetBarcodes();

    // Iterate over barcodes
    foreach (PageBarcodeArea barcode in barcodes)
    {
        // Print the page index
        Console.WriteLine("Page: " + barcode.Page.Index.ToString());
        // Print the barcode value
        Console.WriteLine("Value: " + barcode.Value);
    }
}

Parse Barcodes from Documents

Templates can now include barcode fields (TemplateBarcode).

// Define a barcode field
TemplateBarcode barcode = new TemplateBarcode(
    new Rectangle(new Point(590, 80), new Size(150, 150)),
    "QR");

// Create a template
Template template = new Template(new TemplateItem[] { barcode });

// Create an instance of Parser class
using (Parser parser = new Parser(Constants.SamplePdfWithBarcodes))
{
    // Parse the document by the template
    DocumentData data = parser.ParseByTemplate(template);

    // Print all extracted data
    for (int i = 0; i < data.Count; i++)
    {
        Console.Write(data[i].Name + ": ");
        PageBarcodeArea area = data[i].PageArea as PageBarcodeArea;
        Console.WriteLine(area == null ? "Not a template barcode field" : area.Value);
    }
}

Image Detection by Content

Images are now detected by their content, which makes it possible to extract images from ZIP archives and email attachments.

// Create an instance of Parser class
using (Parser parser = new Parser(filePath))
{
    // Extract images
    IEnumerable<PageImageArea> images = parser.GetImages();
    // Check if images extraction is supported
    if (images == null)
    {
        Console.WriteLine("Images extraction isn't supported");
        return;
    }
    // Iterate over images
    foreach (PageImageArea image in images)
    {
        // Print a page index, rectangle and image type:
        Console.WriteLine(string.Format("Page: {0}, R: {1}, Type: {2}", image.Page.Index, image.Rectangle, image.FileType));
    }
}

Support for .NET 6

This release adds support for .NET 6.

Get Info of Password Protected File

Auto-detect the file type of a password-protected OOXML document.

// Get a file info
Options.FileInfo info = Parser.GetFileInfo(filePath, new LoadOptions("password"));
// Check IsEncrypted property
Console.WriteLine(info.IsEncrypted ? "Password is required" : "Password isn't required");
// Print the file type
Console.WriteLine(info.FileType.ToString());

For a complete list of features, enhancements, and bug fixes in this release, please see the GroupDocs.Parser for .NET 22.2 Release Notes.

Parse Document by Template

Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF, TXT
Spreadsheet: XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, ODS, OTS, XLA, XLAM, NUMBERS
Presentation: PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
Portable: PDF

Extract Text (Accurate)

Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF
Spreadsheet: XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, ODS, OTS, CSV, XLA, XLAM, NUMBERS
Presentation: PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
Email: EML, EMLX, MSG
Markup: XHTML, MHTML, MD, XML
eBook: CHM, EPUB, FB2
Portable: PDF
OneNote: ONE
Databases: supported via ADO.NET. To work with a particular database format, install its ADO.NET database provider.

Extract Text (Raw)

Spreadsheet: XLS, XLT, XLSX, XLSM, XLTX, XLTM, XLA, XLAM
Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM
Portable: PDF

Extract Structured Text and Formatted Text

Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF
Spreadsheet: XLS, XLT, XLSX, XLSM, XLTX, XLTM, XLA, XLAM
Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
Email: EML, EMLX, MSG
Markup: MD (formatted text is not supported)
eBook: CHM, EPUB, FB2

Please see Supported Document Formats for more details.

Platform Independence

GroupDocs.Parser for .NET does not require any external software or third-party tools to be installed. It supports any 32-bit or 64-bit operating system where the .NET or Mono framework is installed. Supported platforms and environments:

Microsoft Windows: Microsoft Windows Desktop (x86, x64) (XP & up), Microsoft Windows Server (x86, x64) (2000 & up), Windows Azure
Mac OS: Mac OS X
Linux: Linux (Ubuntu, OpenSUSE, CentOS and others)
Development Environments: Microsoft Visual Studio (2010 & up), Xamarin.Android, Xamarin.iOS, Xamarin.Mac, MonoDevelop 2.4 and later.
Supported Frameworks: GroupDocs.Parser for .NET supports the .NET and Mono frameworks.

Get Started

Ready to try GroupDocs.Parser for .NET? Run Install-Package GroupDocs.Parser from the Package Manager Console in Visual Studio to download and reference the GroupDocs.Parser assembly in your project. If you already have GroupDocs.Parser for .NET and want to upgrade, run Update-Package GroupDocs.Parser to get the latest version.
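Once the package is referenced, a quick way to check the setup is to print the text of a single document. This is a minimal sketch, assuming the Parser(string) constructor and the GetText() method (which returns null when text extraction isn't supported for the format) as documented for recent releases; the class name ParserQuickStart and its Run method are hypothetical.

```csharp
using System;
using System.IO;
using GroupDocs.Parser;

// Hypothetical quick-start helper: prints the text of the document passed as the only argument.
public static class ParserQuickStart
{
    // Returns 0 on success, 1 on bad usage, 2 when text extraction isn't supported.
    public static int Run(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("Usage: ParserQuickStart <path-to-document>");
            return 1;
        }

        // Open the document with the Parser class
        using (Parser parser = new Parser(args[0]))
        {
            // GetText returns null when text extraction isn't supported for the format
            using (TextReader reader = parser.GetText())
            {
                if (reader == null)
                {
                    Console.WriteLine("Text extraction isn't supported");
                    return 2;
                }
                Console.WriteLine(reader.ReadToEnd());
            }
        }
        return 0;
    }
}
```

Call ParserQuickStart.Run(args) from your program's Main method (or from top-level statements) and pass the path of a document in one of the supported formats, for example a DOCX or PDF file.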

Please check the GitHub Repository for other common usage scenarios.

Extract all Images and Save them in PNG Format via C# Code

// Create an instance of Parser class
using (Parser parser = new Parser(Constants.SampleZip))
{
    // Extract images from the document
    IEnumerable<PageImageArea> images = parser.GetImages();
    // Check if images extraction is supported
    if (images == null)
    {
        Console.WriteLine("Page images extraction isn't supported");
        return;
    }
    // Create the options to save images in PNG format
    ImageOptions options = new ImageOptions(ImageFormat.Png);
    int imageNumber = 0;
    // Iterate over images
    foreach (PageImageArea image in images)
    {
        // Save the image to a PNG file
        image.Save(imageNumber.ToString() + ".png", options);
        imageNumber++;
    }
}

Compatible and additional computed target framework versions

Compatible: .NET Standard netstandard2.0; .NET Framework net20.

Computed:
.NET: net5.0, net5.0-windows, net6.0, net6.0-android, net6.0-ios, net6.0-maccatalyst, net6.0-macos, net6.0-tvos, net6.0-windows, net7.0, net7.0-android, net7.0-ios, net7.0-maccatalyst, net7.0-macos, net7.0-tvos, net7.0-windows, net8.0, net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows
.NET Core: netcoreapp2.0, netcoreapp2.1, netcoreapp2.2, netcoreapp3.0, netcoreapp3.1
.NET Standard: netstandard2.1
.NET Framework: net35, net40, net403, net45, net451, net452, net46, net461, net462, net463, net47, net471, net472, net48, net481
MonoAndroid: monoandroid
MonoMac: monomac
MonoTouch: monotouch
Tizen: tizen40, tizen60
Xamarin.iOS: xamarinios
Xamarin.Mac: xamarinmac
Xamarin.TVOS: xamarintvos
Xamarin.WatchOS: xamarinwatchos

Learn more about Target Frameworks and .NET Standard.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last updated
24.11.0 500 11/29/2024
24.10.0 2,438 11/1/2024
24.9.0 2,417 9/30/2024
24.8.0 41,248 8/30/2024
24.7.0 1,563 7/24/2024
24.6.0 2,837 6/29/2024
24.5.0 6,086 5/31/2024
24.4.0 6,540 4/23/2024
24.2.1 7,740 3/13/2024
24.2.0 1,313 2/29/2024
23.12.0 134,721 12/23/2023
23.11.0 37,457 11/24/2023
23.10.0 13,946 10/21/2023
23.8.0 65,643 8/18/2023
23.5.0 85,893 5/31/2023
23.3.0 16,152 3/31/2023
23.2.0 22,871 3/1/2023
22.11.1 26,754 1/17/2023
22.11.0 38,899 11/29/2022
22.8.0 74,931 8/12/2022
22.6.0 31,447 6/7/2022
22.2.0 37,633 2/25/2022
21.5.0 63,591 5/31/2021
21.2.0 51,183 2/22/2021
20.12.0 24,522 12/30/2020
20.10.0 171,440 10/27/2020
20.8.0 49,255 8/19/2020
20.6.1 47,577 6/30/2020
20.6.0 20,164 6/19/2020
20.5.0 35,328 5/8/2020
20.3.0 48,583 3/19/2020
20.1.0 35,856 1/31/2020
19.12.0 33,538 12/27/2019
19.11.0 28,459 11/22/2019
19.9.0 2,810 9/27/2019
19.5.0 3,040 5/29/2019
18.12.0 3,215 12/11/2018
18.11.0 2,702 11/8/2018
18.10.0 2,793 10/10/2018
18.9.0 2,773 9/5/2018
18.8.0 2,842 8/7/2018
18.7.0 2,792 7/3/2018
18.5.0 3,021 5/23/2018