GroupDocs.Parser-Cloud
23.7.0
dotnet add package GroupDocs.Parser-Cloud --version 23.7.0
NuGet\Install-Package GroupDocs.Parser-Cloud -Version 23.7.0
<PackageReference Include="GroupDocs.Parser-Cloud" Version="23.7.0" />
paket add GroupDocs.Parser-Cloud --version 23.7.0
#r "nuget: GroupDocs.Parser-Cloud, 23.7.0"
// Install GroupDocs.Parser-Cloud as a Cake Addin
#addin nuget:?package=GroupDocs.Parser-Cloud&version=23.7.0
// Install GroupDocs.Parser-Cloud as a Cake Tool
#tool nuget:?package=GroupDocs.Parser-Cloud&version=23.7.0
Document Parsing & Data Extraction API for .NET Cloud
GroupDocs.Parser Cloud is a robust REST API designed to streamline document parsing and data extraction in your cloud-based .NET applications. Whether you need to extract text, images, metadata, or structured data using custom templates, this API offers high accuracy, fast processing, and scalability for enterprise-level operations. It supports a wide range of document formats, integrates seamlessly with multiple programming languages, and ensures secure API access with JWT authentication. With features like batch processing, Docker support, and comprehensive SDKs, GroupDocs.Parser Cloud is ideal for any document processing or data extraction task.
General Features
Text Extraction
Extract text from a wide range of document formats.
Document Info Extraction
Extract metadata and other document information such as title, author, and subject.
Image Extraction
Extract images embedded within documents.
Container Items Info Extraction
Extract information from container file formats like ZIP, PST, and OST.
Parse by Template
Parse documents by using custom templates for structured data extraction.
Document Processing Features
Metadata Extraction
Extracts metadata such as author, creation date, etc., from supported file formats.
Template-Based Parsing
Define templates for structured data extraction, ideal for processing forms, invoices, and other structured documents.
Batch Processing
Process multiple documents in a single request, making it efficient for large-scale operations.
Integration Features
RESTful API
Access the parser features via a REST API for easy integration into any platform.
SDK Availability
SDKs available for multiple programming languages including .NET, Java, Python, PHP, and more.
Platform Agnostic
Can be used across various platforms such as Windows, macOS, and Linux.
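Because the features are exposed over REST, any HTTP client can call them without an SDK. The following is a minimal sketch rather than the SDK's own implementation: the endpoint URL, the JSON body shape, and the helper name `BuildTextRequest` are assumptions that should be checked against the API reference. It only builds the request; sending it requires a valid JWT access token.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

public static class RestSketch
{
    // Assumed endpoint for text extraction; confirm it in the API reference.
    public const string TextEndpoint = "https://api.groupdocs.cloud/v1.0/parser/text";

    // Builds (but does not send) a text extraction request for a file in cloud storage.
    public static HttpRequestMessage BuildTextRequest(string filePath, string accessToken)
    {
        // Assumed body shape: a FileInfo object carrying the storage path.
        string json = JsonSerializer.Serialize(new { FileInfo = new { FilePath = filePath } });

        var request = new HttpRequestMessage(HttpMethod.Post, TextEndpoint)
        {
            Content = new StringContent(json, Encoding.UTF8, "application/json")
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);
        request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
        return request;
    }

    public static void Main()
    {
        // Placeholder path and token; nothing is sent over the network here.
        HttpRequestMessage request = BuildTextRequest("folder/document.docx", "YOUR_JWT_TOKEN");
        Console.WriteLine(request.Method + " " + request.RequestUri);
        Console.WriteLine(request.Content.ReadAsStringAsync().Result);
    }
}
```

Passing the returned message to `HttpClient.SendAsync` would perform the call. Keeping construction separate from sending makes the request easy to inspect in tests.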
Security and Authentication
JWT Authentication
Ensures secure API access through JSON Web Token (JWT) authentication.
Client ID and Secret
Use Client ID and Secret for making secure API calls.
Data Encryption
Supports secure and encrypted communication between the client and the API.
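As an illustration of how a Client ID and Secret could be exchanged for a JWT, here is a hedged sketch of an OAuth 2.0 client-credentials request. The token endpoint URL and the `access_token` response field are assumptions not stated in this document, and the type and method names are hypothetical; the SDK normally handles this step for you.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public static class TokenSketch
{
    // Assumed token endpoint; confirm it in the official documentation.
    public const string TokenEndpoint = "https://api.groupdocs.cloud/connect/token";

    // Builds the form body for an OAuth 2.0 client-credentials grant.
    public static FormUrlEncodedContent BuildTokenForm(string clientId, string clientSecret)
    {
        return new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("grant_type", "client_credentials"),
            new KeyValuePair<string, string>("client_id", clientId),
            new KeyValuePair<string, string>("client_secret", clientSecret)
        });
    }

    // Reads the "access_token" property from a JSON token response.
    public static string ReadAccessToken(string responseJson)
    {
        using (JsonDocument document = JsonDocument.Parse(responseJson))
        {
            return document.RootElement.GetProperty("access_token").GetString();
        }
    }

    // Requests a token over HTTPS; throws if the server returns a non-success status.
    public static async Task<string> RequestTokenAsync(HttpClient client, string clientId, string clientSecret)
    {
        using (FormUrlEncodedContent form = BuildTokenForm(clientId, clientSecret))
        using (HttpResponseMessage response = await client.PostAsync(TokenEndpoint, form))
        {
            response.EnsureSuccessStatusCode();
            return ReadAccessToken(await response.Content.ReadAsStringAsync());
        }
    }

    public static void Main()
    {
        // Offline demonstration with a made-up response; no network call is made.
        Console.WriteLine(ReadAccessToken("{\"access_token\":\"abc\",\"expires_in\":3600}"));
    }
}
```

The returned token would then be sent as a `Bearer` value in the `Authorization` header of each API call.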
Performance Features
High Accuracy
Provides accurate text extraction using advanced algorithms.
Fast Processing
Optimized for quick data extraction, suitable for high-performance applications.
Scalability
Can handle large volumes of documents efficiently, supporting enterprise-level operations.
Usability Features
Comprehensive Documentation
Extensive documentation and code samples available to help developers get started quickly.
API Explorer
Built-in API explorer for testing and exploring the API functionalities directly in the browser.
Multi-Platform Support
Compatible with various operating systems including Windows, Linux, and macOS.
Deployment and Hosting
Docker Support
Can be deployed in a Docker container for private cloud or on-premises hosting.
Self-Hosting
Allows running the API on your infrastructure with full control over the environment.
Automatic Scaling
Automatically scales to meet varying workloads, ensuring high availability.
Supported Document Formats
The following table indicates the file formats from which GroupDocs.Parser Cloud can extract data.
| Document Type | File Format | Parse Document by Template | Extract Text | Extract Document Info | Extract Images | Extract Container Items Info |
|---|---|---|---|---|---|---|
| Word Processing | DOC - Microsoft Word Document | ✔ | ✔ | ✔ | ✔ | |
| | DOT - Microsoft Word Document Template | ✔ | ✔ | ✔ | ✔ | |
| | DOCX - Office Open XML Document | ✔ | ✔ | ✔ | ✔ | |
| | DOCM - Office Open XML Macro-Enabled Document | ✔ | ✔ | ✔ | ✔ | |
| | DOTX - Office Open XML Document Template | ✔ | ✔ | ✔ | ✔ | |
| | DOTM - Office Open XML Document Macro-Enabled Template | ✔ | ✔ | ✔ | ✔ | |
| | TXT - Plain Text | ✔ | ✔ | | | |
| | ODT - Open Document Text | ✔ | ✔ | ✔ | ✔ | |
| | OTT - Open Document Text Template | ✔ | ✔ | ✔ | ✔ | |
| | RTF - Rich Text Format | ✔ | ✔ | ✔ | ✔ | |
| | PDF - Portable Document Format File | ✔ | ✔ | ✔ | ✔ | |
| Markup | HTML - Hypertext Markup Language File | ✔ | ✔ | | | |
| | XHTML - Extensible Hypertext Markup Language File | ✔ | ✔ | | | |
| | MHTML - MIME HTML File | ✔ | ✔ | | | |
| | MD - Markdown | ✔ | ✔ | | | |
| | XML - XML File | ✔ | ✔ | | | |
| Ebooks | CHM - Compiled HTML Help File | ✔ | ✔ | | | |
| | EPUB - Digital E-Book File Format | ✔ | ✔ | | | |
| | FB2 - FictionBook 2.0 File | ✔ | ✔ | | | |
| Spreadsheet | XLS - Microsoft Excel Spreadsheet | ✔ | ✔ | ✔ | ✔ | |
| | XLT - Microsoft Excel Template | ✔ | ✔ | ✔ | ✔ | |
| | XLSX - Office Open XML Spreadsheet | ✔ | ✔ | ✔ | ✔ | |
| | XLSM - Office Open XML Macro-Enabled Spreadsheet | ✔ | ✔ | ✔ | ✔ | |
| | XLSB - Office Open XML Binary Spreadsheet | ✔ | ✔ | ✔ | ✔ | |
| | XLTX - Office Open XML Spreadsheet Template | ✔ | ✔ | ✔ | ✔ | |
| | XLTM - Office Open XML Macro-Enabled Spreadsheet Template | ✔ | ✔ | ✔ | ✔ | |
| | ODS - Open Document Spreadsheet | ✔ | ✔ | ✔ | ✔ | |
| | OTS - Open Document Spreadsheet Template | ✔ | ✔ | ✔ | ✔ | |
| | CSV - Comma Separated Values | ✔ | ✔ | | | |
| | XLA - Excel Add-In File | ✔ | ✔ | ✔ | ✔ | |
| | XLAM - Excel Open XML Macro-Enabled Add-In | ✔ | ✔ | ✔ | ✔ | |
| | NUMBERS - Apple iWork Numbers | ✔ | ✔ | ✔ | ✔ | |
| Presentations | PPT - PowerPoint Presentation | ✔ | ✔ | ✔ | ✔ | |
| | PPS - PowerPoint Slideshow | ✔ | ✔ | ✔ | ✔ | |
| | POT - PowerPoint Template | ✔ | ✔ | ✔ | ✔ | |
| | PPTX - Office Open XML Presentation | ✔ | ✔ | ✔ | ✔ | |
| | PPTM - Office Open XML Macro-Enabled Presentation | ✔ | ✔ | ✔ | ✔ | |
| | POTX - Office Open XML Presentation Template | ✔ | ✔ | ✔ | ✔ | |
| | POTM - Office Open XML Macro-Enabled Presentation Template | ✔ | ✔ | ✔ | ✔ | |
| | PPSX - Office Open XML Presentation Slideshow | ✔ | ✔ | ✔ | ✔ | |
| | PPSM - Office Open XML Macro-Enabled Presentation Slideshow | ✔ | ✔ | ✔ | ✔ | |
| | ODP - Open Document Presentation | ✔ | ✔ | ✔ | ✔ | |
| | OTP - Open Document Presentation Template | ✔ | ✔ | ✔ | ✔ | |
| Emails | PST - Outlook Personal Information Store File | ✔ | ✔ | | | |
| | OST - Outlook Offline Data File | ✔ | ✔ | | | |
| | EML - E-Mail Message | ✔ | ✔ | ✔ | | |
| | EMLX - Apple Mail Message | ✔ | ✔ | ✔ | | |
| | MSG - Outlook Mail Message | ✔ | ✔ | ✔ | | |
| Notes | ONE - OneNote Document | ✔ | ✔ | | | |
| Archives | ZIP - Zipped File | ✔ | ✔ | | | |
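A client can use the format list in the table above to skip files the API is unlikely to accept before uploading them. The sketch below is one way to do that, assuming each file's extension matches the format abbreviation in the table (variants such as `.htm` are not covered); the class and method names are hypothetical and not part of the SDK.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SupportedFormats
{
    // Extensions copied from the table above; lookups ignore case.
    private static readonly HashSet<string> Extensions = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
    {
        "doc", "dot", "docx", "docm", "dotx", "dotm", "txt", "odt", "ott", "rtf", "pdf",
        "html", "xhtml", "mhtml", "md", "xml",
        "chm", "epub", "fb2",
        "xls", "xlt", "xlsx", "xlsm", "xlsb", "xltx", "xltm", "ods", "ots", "csv",
        "xla", "xlam", "numbers",
        "ppt", "pps", "pot", "pptx", "pptm", "potx", "potm", "ppsx", "ppsm", "odp", "otp",
        "pst", "ost", "eml", "emlx", "msg",
        "one",
        "zip"
    };

    // Returns true when the file name ends in one of the listed extensions.
    public static bool IsSupported(string fileName)
    {
        string extension = Path.GetExtension(fileName).TrimStart('.');
        return Extensions.Contains(extension);
    }

    public static void Main()
    {
        Console.WriteLine(IsSupported("invoice.PDF")); // True
        Console.WriteLine(IsSupported("photo.png"));   // False
    }
}
```

This check only looks at the file name, so the API may still reject a file whose contents do not match its extension.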
Get Started
You do not need to install anything beyond the NuGet package to get started with GroupDocs.Parser Cloud SDK for .NET. Create an account at GroupDocs for Cloud and get your application information.
Run `Install-Package GroupDocs.Parser-Cloud` from the Package Manager Console in Visual Studio to fetch and reference the GroupDocs.Parser Cloud assembly in your project. If you already have GroupDocs.Parser Cloud SDK for .NET and want to upgrade it, run `Update-Package GroupDocs.Parser-Cloud` to get the latest version.
Please check the GitHub Repository for common usage scenarios.
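Once you have your application information, one option is to keep it out of source code and read it from the environment at startup. The following is a small sketch of that idea; the variable names `GROUPDOCS_APP_SID` and `GROUPDOCS_APP_KEY` and the `Credentials` helper are hypothetical choices for illustration, not something the SDK defines.

```csharp
using System;

public static class Credentials
{
    // Hypothetical environment variable names chosen for this sketch.
    public const string SidVariable = "GROUPDOCS_APP_SID";
    public const string KeyVariable = "GROUPDOCS_APP_KEY";

    // Returns the variable's value, or throws if it is missing or blank.
    public static string Require(string variableName)
    {
        string value = Environment.GetEnvironmentVariable(variableName);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException("Environment variable " + variableName + " is not set.");
        }
        return value;
    }

    public static void Main()
    {
        try
        {
            string appSid = Require(SidVariable);
            string appKey = Require(KeyVariable);
            // The values would be passed to the SDK configuration in place of string literals.
            Console.WriteLine("Credentials loaded.");
        }
        catch (InvalidOperationException e)
        {
            Console.WriteLine(e.Message);
        }
    }
}
```

Failing fast on a missing variable gives a clearer error than an authentication failure later in the first API call.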
GroupDocs.Parser Cloud API Code Samples
These code samples demonstrate various parsing capabilities of GroupDocs.Parser Cloud, including extracting text, extracting images, and parsing documents by template.
Extracting Text from a Document
Learn how to extract text from a document using the GroupDocs.Parser Cloud API. This example demonstrates the text extraction process in C#.

    using System;
    using GroupDocs.Parser.Cloud.Sdk.Api;
    using GroupDocs.Parser.Cloud.Sdk.Client;
    using GroupDocs.Parser.Cloud.Sdk.Model;
    using GroupDocs.Parser.Cloud.Sdk.Model.Requests;

    namespace GroupDocs.Parser.Cloud.Sdk.Examples
    {
        class Extract_Text_From_Document
        {
            public static void Run()
            {
                // Get your AppSID and AppKey from https://dashboard.groupdocs.cloud/ (free registration required)
                var configuration = new Configuration
                {
                    AppSid = "YOUR_APP_SID",
                    AppKey = "YOUR_APP_KEY"
                };

                // Initialize the Parser API instance
                var apiInstance = new ParserApi(configuration);

                try
                {
                    // Define the document to parse
                    var fileInfo = new FileInfo { Folder = "path/to/folder", Name = "document.docx" };

                    // Create a text extraction request
                    var request = new ExtractTextRequest(fileInfo);

                    // Extract text from the document
                    var response = apiInstance.ExtractText(request);

                    // Output the extracted text to the console
                    Console.WriteLine("Extracted Text: " + response.Text);
                }
                catch (Exception e)
                {
                    // Report any exception raised during the API call
                    Console.WriteLine("Exception when calling ParserApi.ExtractText: " + e.Message);
                }
            }
        }
    }

Extracting Images from a Document
Learn how to extract images embedded within a document using the GroupDocs.Parser Cloud API. This example illustrates the process in C#.

    using System;
    using GroupDocs.Parser.Cloud.Sdk.Api;
    using GroupDocs.Parser.Cloud.Sdk.Client;
    using GroupDocs.Parser.Cloud.Sdk.Model;
    using GroupDocs.Parser.Cloud.Sdk.Model.Requests;

    namespace GroupDocs.Parser.Cloud.Sdk.Examples
    {
        class Extract_Images_From_Document
        {
            public static void Run()
            {
                // Get your AppSID and AppKey from https://dashboard.groupdocs.cloud/ (free registration required)
                var configuration = new Configuration
                {
                    AppSid = "YOUR_APP_SID",
                    AppKey = "YOUR_APP_KEY"
                };

                // Initialize the Parser API instance
                var apiInstance = new ParserApi(configuration);

                try
                {
                    // Define the document to parse
                    var fileInfo = new FileInfo { Folder = "path/to/folder", Name = "document.pdf" };

                    // Create an image extraction request
                    var request = new ExtractImagesRequest(fileInfo);

                    // Extract images from the document
                    var response = apiInstance.ExtractImages(request);

                    // Print the format and path of each extracted image
                    foreach (var image in response.Images)
                    {
                        Console.WriteLine("Image Format: " + image.Format + ", Image Path: " + image.Path);
                    }
                }
                catch (Exception e)
                {
                    // Report any exception raised during the API call
                    Console.WriteLine("Exception when calling ParserApi.ExtractImages: " + e.Message);
                }
            }
        }
    }

Parsing Document by Template
Learn how to parse a document by using a custom template for structured data extraction with the GroupDocs.Parser Cloud API. This example shows the template-based parsing in C#.

    using System;
    using GroupDocs.Parser.Cloud.Sdk.Api;
    using GroupDocs.Parser.Cloud.Sdk.Client;
    using GroupDocs.Parser.Cloud.Sdk.Model;
    using GroupDocs.Parser.Cloud.Sdk.Model.Requests;

    namespace GroupDocs.Parser.Cloud.Sdk.Examples
    {
        class Parse_Document_By_Template
        {
            public static void Run()
            {
                // Get your AppSID and AppKey from https://dashboard.groupdocs.cloud/ (free registration required)
                var configuration = new Configuration
                {
                    AppSid = "YOUR_APP_SID",
                    AppKey = "YOUR_APP_KEY"
                };

                // Initialize the Parser API instance
                var apiInstance = new ParserApi(configuration);

                try
                {
                    // Define the document and the template file
                    var fileInfo = new FileInfo { Folder = "path/to/folder", Name = "invoice.pdf" };
                    var templatePath = "path/to/template.json";

                    // Create a template-based parsing request
                    var request = new ParseRequest(fileInfo, templatePath);

                    // Parse the document using the template
                    var response = apiInstance.Parse(request);

                    // Print the name and value of each parsed field
                    foreach (var field in response.Fields)
                    {
                        Console.WriteLine("Field Name: " + field.Name + ", Field Value: " + field.Value);
                    }
                }
                catch (Exception e)
                {
                    // Report any exception raised during the API call
                    Console.WriteLine("Exception when calling ParserApi.Parse: " + e.Message);
                }
            }
        }
    }

Tags
Document Data Extraction | REST API | GroupDocs.Parser | Text Extraction | Image Extraction | Template Parsing | Markdown Extraction | HTML Extraction | Container Files | Data Parsing | Document Information | File Management | Cloud Storage | SDKs | Cross Platform | Storage API | File Operations | Folder Operations | Security and Authentication | Document Parsing | API Integration | Data Extraction | ZIP Files | PDF | PST/OST Files | Extract Images | Document Processing | Data Extraction API | GroupDocs SDK | API Explorer | Metadata Extraction
Product | Versions Compatible and additional computed target framework versions. |
---|---|
.NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
.NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
.NET Standard | netstandard2.0 is compatible. netstandard2.1 was computed. |
.NET Framework | net20 is compatible. net35 was computed. net40 was computed. net403 was computed. net45 was computed. net451 was computed. net452 was computed. net46 was computed. net461 was computed. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 was computed. net48 was computed. net481 was computed. |
MonoAndroid | monoandroid was computed. |
MonoMac | monomac was computed. |
MonoTouch | monotouch was computed. |
Tizen | tizen40 was computed. tizen60 was computed. |
Xamarin.iOS | xamarinios was computed. |
Xamarin.Mac | xamarinmac was computed. |
Xamarin.TVOS | xamarintvos was computed. |
Xamarin.WatchOS | xamarinwatchos was computed. |
- .NETFramework 2.0
  - Newtonsoft.Json (>= 9.0.1)
- .NETStandard 2.0
  - Newtonsoft.Json (>= 9.0.1)
  - System.Diagnostics.TraceSource (>= 4.3.0)
  - System.Net.Requests (>= 4.3.0)