GroupDocs.Parser.NETFramework
24.10.0
dotnet add package GroupDocs.Parser.NETFramework --version 24.10.0
NuGet\Install-Package GroupDocs.Parser.NETFramework -Version 24.10.0
<PackageReference Include="GroupDocs.Parser.NETFramework" Version="24.10.0" />
paket add GroupDocs.Parser.NETFramework --version 24.10.0
#r "nuget: GroupDocs.Parser.NETFramework, 24.10.0"
// Install GroupDocs.Parser.NETFramework as a Cake Addin
#addin nuget:?package=GroupDocs.Parser.NETFramework&version=24.10.0
// Install GroupDocs.Parser.NETFramework as a Cake Tool
#tool nuget:?package=GroupDocs.Parser.NETFramework&version=24.10.0
Advanced Document Parsing API for .NET
Important Note: Starting from 24.2.0, the GroupDocs.Parser package has been split into two distinct platform packages: .NET Standard and .NET Framework. The GroupDocs.Parser package is specifically designed to support the .NET Standard platform, making it compatible with .NET Core, .NET 5, .NET 6, etc. It includes backward compatibility improvements, allowing it to function with .NET Framework versions starting from 4.6.2. In addition, we have introduced the GroupDocs.Parser.NETFramework package, which is optimized to run seamlessly in the .NET Framework runtime because it includes all the GroupDocs product libraries in their respective .NET Framework versions. It is tailored specifically for .NET Framework users and offers better dependency resolution for those utilizing the .NET Framework. We hope these changes will enhance your experience and provide a more streamlined approach to using the GroupDocs.Parser package. If you have any further questions or concerns, please don't hesitate to reach out to our free support forum.
GroupDocs.Parser vs GroupDocs.Parser.NETFramework
The list of features for both versions, GroupDocs.Parser (for .NET Standard) and GroupDocs.Parser.NETFramework (for .NET Framework), is generally the same in terms of core functionality. Both versions offer advanced capabilities such as text extraction, metadata extraction, image extraction, and parsing of complex document structures across a wide range of document formats like PDFs, Word, Excel, and more.
However, the key differences lie in their compatibility and performance optimizations:
- GroupDocs.Parser: This version is tailored for .NET Standard and is compatible with .NET Core, .NET 5, .NET 6, and later versions. It is designed to be cross-platform, working across Windows, Linux, and macOS environments.
- GroupDocs.Parser.NETFramework: This version is optimized specifically for the .NET Framework environment. It includes all the necessary libraries specifically built for .NET Framework, ensuring better dependency management and performance within this environment. This version is ideal for developers working with legacy .NET Framework projects.
GroupDocs.Parser.NETFramework is a specialized .NET API designed for developers working exclusively within the .NET Framework environment. This package offers advanced text, metadata, and image extraction capabilities across a wide range of document formats including PDF, Word, Excel, PowerPoint, and more. Optimized for integration into .NET Framework projects, GroupDocs.Parser.NETFramework provides parsing features such as template-based data extraction, OCR support, and the ability to handle complex document structures. It includes all necessary libraries for the .NET Framework, so it runs without additional configuration. It is well suited to developers who need high performance and precision in document parsing within legacy .NET Framework applications.
Text Extraction
- Extract text from various document formats (PDF, Microsoft Word, Excel, etc.).
- Extract text with its formatting retained, including font styles, sizes, and colors.
- Search for specific text within a document and extract it.
- Optical Character Recognition (OCR) support to extract text from images within documents.
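The search scenario from the list above could look roughly like the sketch below. The file name `sample.docx` and the keyword `invoice` are placeholders chosen for illustration. The sketch assumes the `Parser.Features`, `Parser.Search`, and `SearchResult` members of the GroupDocs.Parser API, so check the names against the package version you install.

```csharp
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

public class SearchTextExample
{
    public static void Run()
    {
        // "sample.docx" and the keyword are placeholders, not files shipped with the package
        using (Parser parser = new Parser("sample.docx"))
        {
            // Not every format supports search, so check the feature flag first
            if (!parser.Features.Search)
            {
                Console.WriteLine("Search isn't supported for this document.");
                return;
            }

            // Find every occurrence of the keyword
            IEnumerable<SearchResult> results = parser.Search("invoice");
            if (results == null)
            {
                return;
            }

            foreach (SearchResult result in results)
            {
                // Position is the offset of the match within the document text
                Console.WriteLine($"At {result.Position}: {result.Text}");
            }
        }
    }
}
```

Checking the feature flag before the call keeps the example from depending on a specific file format.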
Metadata Extraction
- Extract metadata from documents, including properties like author, title, subject, etc.
- Extract document properties such as creation date, modification date, and more.
- Extract field-specific data such as invoice numbers, dates, and other custom fields.
Image and Attachment Extraction
- Extract images embedded within documents.
- Extract file attachments from documents such as PDF and email files.
- Extract and recognize barcodes from documents.
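As a rough illustration of barcode extraction, the sketch below lists the barcodes found in a document. The file name `sample.pdf` is a placeholder. The members used (`Features.Barcodes`, `GetBarcodes()`, `PageBarcodeArea.Value`, `Page.Index`) are assumed from the GroupDocs.Parser API and should be verified against your installed version.

```csharp
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

public class ExtractBarcodesExample
{
    public static void Run()
    {
        // "sample.pdf" is a placeholder file name
        using (Parser parser = new Parser("sample.pdf"))
        {
            // Barcode scanning is only available for some formats
            if (!parser.Features.Barcodes)
            {
                Console.WriteLine("Barcode extraction isn't supported for this document.");
                return;
            }

            IEnumerable<PageBarcodeArea> barcodes = parser.GetBarcodes();
            foreach (PageBarcodeArea barcode in barcodes)
            {
                // Print the page index and the decoded value of each barcode
                Console.WriteLine($"Page {barcode.Page.Index}: {barcode.Value}");
            }
        }
    }
}
```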
Document Structure Analysis
- Analyze and extract information from structured documents like tables, lists, and paragraphs.
- Extract tables and their content from documents.
- Extract hyperlinks from documents.
- Extract bookmarks from documents like PDF files.
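Hyperlink extraction, one of the structure features listed above, might be sketched as follows. The file name `sample.pdf` is a placeholder, and the `Features.Hyperlinks`, `GetHyperlinks()`, and `PageHyperlinkArea` members are assumed from the GroupDocs.Parser API rather than confirmed by this page.

```csharp
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

public class ExtractHyperlinksExample
{
    public static void Run()
    {
        // "sample.pdf" is a placeholder file name
        using (Parser parser = new Parser("sample.pdf"))
        {
            // Skip documents whose format doesn't expose hyperlinks
            if (!parser.Features.Hyperlinks)
            {
                Console.WriteLine("Hyperlink extraction isn't supported for this document.");
                return;
            }

            IEnumerable<PageHyperlinkArea> hyperlinks = parser.GetHyperlinks();
            foreach (PageHyperlinkArea link in hyperlinks)
            {
                // Text is the visible link text, Url is the link target
                Console.WriteLine($"{link.Text} -> {link.Url}");
            }
        }
    }
}
```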
PDF-Specific Parsing
- Specialized parsing capabilities for PDF documents, including text extraction, image extraction, and metadata retrieval.
- Extract page count and other PDF-specific properties.
- Extract and manage PDF bookmarks.
Email Parsing
- Extract text, attachments, and metadata from email formats like EML, MSG, etc.
- Extract email-specific properties such as sender, receiver, subject, and email body.
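For email files, attachments are typically reached through the container API. The sketch below lists attachment paths for a placeholder file `message.msg`. It assumes `Parser.GetContainer()` and `ContainerItem.FilePath` from the GroupDocs.Parser API; treat it as an outline, not a definitive implementation.

```csharp
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

public class ListEmailAttachmentsExample
{
    public static void Run()
    {
        // "message.msg" is a placeholder file name
        using (Parser parser = new Parser("message.msg"))
        {
            // For emails, container items correspond to attachments
            IEnumerable<ContainerItem> attachments = parser.GetContainer();
            if (attachments == null)
            {
                Console.WriteLine("Container extraction isn't supported for this document.");
                return;
            }

            foreach (ContainerItem item in attachments)
            {
                // FilePath identifies the attachment inside the container
                Console.WriteLine(item.FilePath);
            }
        }
    }
}
```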
Spreadsheet Parsing
- Extract text, metadata, and other data from Excel spreadsheets.
- Extract specific ranges, cells, or entire sheets from Excel documents.
Presentation Parsing
- Extract text, images, and metadata from PowerPoint presentations.
- Extract slide-specific content, including notes, shapes, and text.
Template-Based Data Extraction
- Define and use templates to extract data based on specific document structures.
- Template editor for creating and editing templates for structured data extraction.
- Custom parsing logic to implement specific content extraction rules based on custom templates.
Advanced Features
- Support for multiple file formats such as PDF, DOCX, XLSX, PPTX, RTF, TXT, and more.
- Cross-Platform Compatibility: Works across Windows, Linux, and macOS.
- Integration with .NET applications for seamless functionality.
- High performance optimized for handling large documents efficiently.
- Secure parsing that does not compromise the document's security and integrity.
- Scalability to handle large volumes of documents in batch processing.
Additional Features
- Retrieve the number of pages in a document.
- Extract data from forms and other interactive elements within documents.
- Support for content-aware parsing to detect and extract specific types of data.
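Retrieving the page count could be sketched as below. The file name `sample.pdf` is a placeholder, and `GetDocumentInfo()` with its `FileType`, `PageCount`, and `Size` properties is assumed from the GroupDocs.Parser API. The sketch uses `var` so it does not depend on the namespace of the returned type.

```csharp
using System;
using GroupDocs.Parser;

public class DocumentInfoExample
{
    public static void Run()
    {
        // "sample.pdf" is a placeholder file name
        using (Parser parser = new Parser("sample.pdf"))
        {
            // General information about the loaded document
            var info = parser.GetDocumentInfo();

            Console.WriteLine($"File type: {info.FileType}");
            Console.WriteLine($"Page count: {info.PageCount}");
            Console.WriteLine($"Size: {info.Size} bytes");
        }
    }
}
```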
Supported Document Formats
Word Processing
Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents | Scan Barcode |
---|---|---|---|---|---|---|---|---|---|---|---|
DOC - Microsoft Word Document | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||
DOT - Microsoft Word Document Template | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||
DOCX - Office Open XML Document | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||
DOCM - Office Open XML Macro-Enabled Document | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||
DOTX - Office Open XML Document Template | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||
DOTM - Office Open XML Document Macro-Enabled Template | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||
TXT - Plain text | ✔ | ||||||||||
ODT - Open Document Text | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | |||
OTT - Open Document Text Template | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | |||
RTF - Rich Text Format | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
PDF
Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents | Scan Barcode |
---|---|---|---|---|---|---|---|---|---|---|---|
PDF - Portable Document Format | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
Markup
Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents | Scan Barcode |
---|---|---|---|---|---|---|---|---|---|---|---|
XHTML - Extensible Hypertext Markup Language File | ✔ | ✔ | |||||||||
MHTML - MIME HTML File | ✔ | ✔ | |||||||||
MD - Markdown | ✔ | ✔ (Formatted Text is Not supported) | |||||||||
XML - XML File | ✔ |
Ebook
Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents | Scan Barcode |
---|---|---|---|---|---|---|---|---|---|---|---|
CHM - Compiled HTML Help File | ✔ | ✔ | ✔ | ✔ | ✔ | ||||||
EPUB - Digital E-Book File Format | ✔ | ✔ | ✔ | ✔ | ✔ | ||||||
FB2 - FictionBook 2.0 File | ✔ | ✔ | |||||||||
MOBI - Mobipocket | ✔ | ||||||||||
AZW3 - Kindle Format 8 | ✔ |
Spreadsheet
Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents | Scan Barcode |
---|---|---|---|---|---|---|---|---|---|---|---|
XLS - Microsoft Excel Spreadsheet | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | |||
XLT - Microsoft Excel Template | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | |||
XLSX - Office Open XML Spreadsheet | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | |||
XLSM - Office Open XML Macro-Enabled Spreadsheet | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | |||
XLSB - Office Open XML Binary Spreadsheet | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||||
XLTX - Office Open XML Spreadsheet Template | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | |||
XLTM - Office Open XML Macro-Enabled Spreadsheet Template | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | |||
ODS - Open Document Spreadsheet | ✔ | ✔ | ✔ | ✔ | ✔ | ||||||
OTS - Open Document Spreadsheet Template | ✔ | ✔ | ✔ | ✔ | ✔ | ||||||
CSV - Comma Separated Values | ✔ | ||||||||||
XLA - Excel Add-In File | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||||
XLAM - Excel Open XML Macro-Enabled Add-In | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||||
NUMBERS - Apple iWork Numbers | ✔ | ✔ | ✔ | ✔ |
Presentation
Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents | Scan Barcode |
---|---|---|---|---|---|---|---|---|---|---|---|
PPT - PowerPoint Presentation | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||
PPS - PowerPoint Slideshow | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||
POT - PowerPoint Template | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||
PPTX - Office Open XML Presentation | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||
PPTM - Office Open XML Macro-Enabled Presentation | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||
POTX - Office Open XML Presentation Template | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||
POTM - Office Open XML Macro-Enabled Presentation Template | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||
PPSX - Office Open XML Presentation Slideshow | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||
PPSM - Office Open XML Macro-Enabled Presentation Slideshow | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||
ODP - Open Document Presentation | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | |||||
OTP - Open Document Presentation Template | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
Email
Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents | Scan Barcode |
---|---|---|---|---|---|---|---|---|---|---|---|
PST - Outlook Personal Information Store File | ✔ | ||||||||||
OST - Outlook Offline Data File | ✔ | ||||||||||
EML - E-Mail Message | ✔ | ✔ | ✔ | ✔ | ✔ | ||||||
EMLX - Apple Mail Message | ✔ | ✔ | ✔ | ✔ | ✔ | ||||||
MSG - Outlook Mail Message | ✔ | ✔ | ✔ | ✔ | ✔ |
Note
Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents | Scan Barcode |
---|---|---|---|---|---|---|---|---|---|---|---|
ONE - OneNote Document | ✔ |
Archive
Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents | Scan Barcode |
---|---|---|---|---|---|---|---|---|---|---|---|
7Z* - 7Z File | ✔ | ✔ | |||||||||
ZIP - Zipped File | ✔ | ✔ | |||||||||
RAR - Rar File | ✔ | ✔ | |||||||||
TAR - Tar File | ✔ | ✔ | |||||||||
GZ - GZip file | ✔ | ✔ | |||||||||
BZ2 - BZip2 File | ✔ | ✔ |
Note: Encrypted 7-zip archives are not supported.
Image*
Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents | Scan Barcode |
---|---|---|---|---|---|---|---|---|---|---|---|
BMP - Bitmap Image file | ✔ | ✔ | |||||||||
GIF - Graphics Interchange Format | ✔ | ||||||||||
JP2 - JPEG 2000 | ✔ | ||||||||||
JPG, JPEG - JPEG Image file | ✔ | ✔ | |||||||||
PNG - Portable Network Graphics | ✔ | ✔ | |||||||||
TIF, TIFF - Tagged Image File Format | ✔ | ✔ | |||||||||
DICOM - DICOM (Digital Imaging and Communications in Medicine) | ✔ | ||||||||||
DJVU - DjVu File Format | ✔ | ✔ | |||||||||
EMF - Enhanced metafile | ✔ | ||||||||||
J2K - JPEG 2000 | ✔ | ||||||||||
PS - PostScript File Format | ✔ | ||||||||||
PSD - Photoshop Document | ✔ | ||||||||||
SVG - Scalable Vector Graphics file | ✔ | ||||||||||
SVGZ - Scalable Vector Graphics file (with gzip compression) | ✔ | ||||||||||
WEBP - WebP Image File Format | ✔ | ||||||||||
WMF - Microsoft Windows Metafile | ✔ |
Database
Document Type | Parse Document by Template | Extract Text (Accurate) | Extract Text (Raw) | Extract Structured Text and Formatted Text | Extract Text Areas | Extract Metadata | Extract Images | Extract Containers and Attachments | Parse Form Data | Extract Table of Contents | Scan Barcode |
---|---|---|---|---|---|---|---|---|---|---|---|
ADO.NET | ✔ | ✔ |
Platform Independence
GroupDocs.Parser for .NET does not require any external software or third-party tool to be installed. GroupDocs.Parser for .NET supports any 32-bit or 64-bit operating system where .NET or Mono framework is installed. The other details are as follows:
Microsoft Windows: Microsoft Windows Desktop (x86, x64) (XP & up), Microsoft Windows Server (x86, x64) (2000 & up), Windows Azure
Mac OS: Mac OS X
Linux: Linux (Ubuntu, OpenSUSE, CentOS and others)
Development Environments: Microsoft Visual Studio (2010 & up), Xamarin.Android, Xamarin.iOS, Xamarin.Mac, MonoDevelop 2.4 and later.
Supported Frameworks: GroupDocs.Parser for .NET supports .NET and Mono frameworks.
Get Started
Are you ready to give GroupDocs.Parser for .NET a try? Simply execute Install-Package GroupDocs.Parser from the Package Manager Console in Visual Studio to fetch and reference the GroupDocs.Parser assembly in your project. If you already have GroupDocs.Parser for .NET and want to upgrade it, execute Update-Package GroupDocs.Parser to get the latest version.
Please check the GitHub Repository for other common usage scenarios.
How to Install GroupDocs.Parser for .NET
1. Install from NuGet
Option 1: Using Package Manager GUI
Open Visual Studio:
- Load your solution/project.
Access NuGet Package Manager:
- Go to Tools -> NuGet Package Manager -> Manage NuGet Packages for Solution.
- Alternatively, right-click the solution or project in Solution Explorer and select Manage NuGet Packages.
Search for GroupDocs.Parser:
- Navigate to the Browse tab.
- Type "GroupDocs.Parser" in the search box.
Install the Package:
- Click the Install button to add the latest version of GroupDocs.Parser to your project.
Option 2: Using Package Manager Console
Open Visual Studio:
- Load your solution/project.
Open Package Manager Console:
- Go to Tools -> NuGet Package Manager -> Package Manager Console.
Install GroupDocs.Parser:
- Type the command Install-Package GroupDocs.Parser and press Enter.
Verify Installation:
- GroupDocs.Parser should now be referenced in your application.
2. Handling .NET Framework and .NET Standard
- Starting with version 24.2, GroupDocs.Parser is split into two packages: one for .NET Framework and one for .NET Standard.
- For .NET Framework projects:
  - Ensure AutoGenerateBindingRedirects is enabled.
  - Add the following to your project file for unit tests:

<PropertyGroup>
  <AutoGenerateBindingRedirects>true</AutoGenerateBindingRedirects>
  <GenerateBindingRedirectsOutputType>true</GenerateBindingRedirectsOutputType>
</PropertyGroup>
3. Install from the Official GroupDocs Website
Download GroupDocs.Parser:
- Visit the official GroupDocs website and download the package.
Unpack or Install:
- Unzip the archive or run the MSI installer.
Add a Reference in Visual Studio:
- In Solution Explorer, right-click the References node of your project and select Add Reference.
- If you used the MSI installer, select GroupDocs.Parser from the .NET tab. Otherwise, browse to the location of the GroupDocs.Parser.dll file.
Confirm Reference:
- Ensure GroupDocs.Parser appears under the References node in your project.
4. Additional Considerations
.NET Standard 2.0 Version:
- This version has external references to several packages like System.Drawing.Common, System.Text.Encoding.CodePages, SkiaSharp, etc.
Linux Environment:
- Install the following packages for proper functionality:
  - libgdiplus
  - libc6-dev
  - ttf-mscorefonts-installer (e.g., sudo apt-get install ttf-mscorefonts-installer)
- Also, ensure SkiaSharp.NativeAssets.Linux.NoDependencies is installed.
GroupDocs.Parser for .NET Coding Samples
Code Sample 1: Extracting Text from a PDF Document
This code loads a PDF file (sample.pdf) and extracts its text content using the GetText() method, which returns a TextReader (or null if text extraction isn't supported for the document). The extracted text is then displayed in the console.

using System;
using System.IO;
using GroupDocs.Parser;

public class ExtractTextFromPdf
{
    public static void Run()
    {
        // Load the PDF document
        using (Parser parser = new Parser("sample.pdf"))
        {
            // Extract text from the document; the reader is null if extraction isn't supported
            using (TextReader reader = parser.GetText())
            {
                // Output the extracted text
                Console.WriteLine(reader == null
                    ? "Text extraction isn't supported."
                    : reader.ReadToEnd());
            }
        }
    }
}
Code Sample 2: Extracting Images from a Word Document
This code loads a Word document (sample.docx) and extracts all images found within the document. Each image is saved as a separate PNG file.

using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

public class ExtractImagesFromWord
{
    public static void Run()
    {
        // Load the Word document
        using (Parser parser = new Parser("sample.docx"))
        {
            // Get images from the document (null if image extraction isn't supported)
            IEnumerable<PageImageArea> images = parser.GetImages();
            if (images == null)
            {
                return;
            }

            // Convert each image to PNG on save so the content matches the file extension
            ImageOptions options = new ImageOptions(ImageFormat.Png);
            int imageNumber = 1;
            foreach (PageImageArea image in images)
            {
                image.Save($"image{imageNumber++}.png", options);
            }
        }
    }
}
Code Sample 3: Parsing Metadata from an Excel Spreadsheet
This code loads an Excel spreadsheet (sample.xlsx) and extracts its metadata, such as author, title, and creation date. The metadata is then displayed in the console.

using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

public class ExtractMetadataFromExcel
{
    public static void Run()
    {
        // Load the Excel spreadsheet
        using (Parser parser = new Parser("sample.xlsx"))
        {
            // Get the document's metadata (null if metadata extraction isn't supported)
            IEnumerable<MetadataItem> metadata = parser.GetMetadata();
            if (metadata == null)
            {
                return;
            }

            // Output each metadata item as "name: value"
            foreach (MetadataItem item in metadata)
            {
                Console.WriteLine($"{item.Name}: {item.Value}");
            }
        }
    }
}
Tags
.NET Framework | Document Parsing | Metadata Extraction | Text Extraction | Image Extraction | PDF Parsing | Word Parsing | Excel Parsing | Email Parsing | Data Extraction | Structured Data Parsing | Barcode Extraction | Document Management | API Integration | Legacy Systems | Performance Optimization | Document Processing | Visual Studio | Windows | C# | Programming
Product | Versions Compatible and additional computed target framework versions. |
---|---|
.NET Framework | net462 is compatible. net463 was computed. net47 was computed. net471 was computed. net472 was computed. net48 was computed. net481 was computed. |
- .NETFramework 4.6.2
  - Aspose.Drawing.Common (>= 24.9.0)
  - Microsoft.Extensions.Logging.Abstractions (>= 7.0.0)
  - Microsoft.ML.OnnxRuntime (>= 1.18.1)
  - System.Diagnostics.DiagnosticSource (>= 7.0.0)
  - System.Drawing.Common (>= 5.0.3)
  - System.Numerics.Vectors (>= 4.5.0)
  - System.Security.Permissions (>= 4.5.0)
  - System.Text.Encoding.CodePages (>= 8.0.0)