DocumentAtom.Sdk 1.0.0

dotnet add package DocumentAtom.Sdk --version 1.0.0
                    
NuGet\Install-Package DocumentAtom.Sdk -Version 1.0.0
                    
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="DocumentAtom.Sdk" Version="1.0.0" />
                    
For projects that support PackageReference, copy this XML node into the project file to reference the package.
<PackageVersion Include="DocumentAtom.Sdk" Version="1.0.0" />
                    
Directory.Packages.props
<PackageReference Include="DocumentAtom.Sdk" />
                    
Project file
For projects that support Central Package Management (CPM), copy this XML node into the solution Directory.Packages.props file to version the package.
paket add DocumentAtom.Sdk --version 1.0.0
                    
#r "nuget: DocumentAtom.Sdk, 1.0.0"
                    
#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.
#:package DocumentAtom.Sdk@1.0.0
                    
#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.
#addin nuget:?package=DocumentAtom.Sdk&version=1.0.0
                    
Install as a Cake Addin
#tool nuget:?package=DocumentAtom.Sdk&version=1.0.0
                    
Install as a Cake Tool

<img src="https://raw.githubusercontent.com/jchristn/DocumentAtom/refs/heads/main/assets/icon.png" width="256" height="256">

DocumentAtom.Sdk

NuGet Version NuGet

DocumentAtom.Sdk is a C# SDK that provides a simple and efficient way to interact with DocumentAtom server instances. It enables document atomization, type detection, and health monitoring through a clean, async API.

IMPORTANT - DocumentAtom.Sdk assumes you have deployed the DocumentAtom REST server. If you are integrating a DocumentAtom library directly into your code, use of this SDK is not necessary.

Overview

DocumentAtom.Sdk allows you to:

  • Process Documents: Extract atoms from various document formats (PDF, Word, Excel, PowerPoint, HTML, Markdown, etc.)
  • Detect Document Types: Automatically identify document types based on content analysis
  • Monitor Health: Check server status and health
  • Async Operations: Full async/await support for all operations

Installation

Install the package via NuGet:

Install-Package DocumentAtom.Sdk

Or via .NET CLI:

dotnet add package DocumentAtom.Sdk

Quick Start

Basic Usage

using DocumentAtom.Sdk;
using DocumentAtom.Core.Atoms;

// Initialize the SDK
var sdk = new DocumentAtomSdk("http://localhost:8000");

// Check server health
bool isHealthy = await sdk.Health.IsHealthy();
Console.WriteLine($"Server is healthy: {isHealthy}");

// Process a document
byte[] documentData = await File.ReadAllBytesAsync("document.pdf");
List<Atom>? atoms = await sdk.Atom.ProcessPdf(documentData);

if (atoms != null)
{
    foreach (var atom in atoms)
    {
        Console.WriteLine($"Atom Type: {atom.Type}, Content: {atom.Text}");
    }
}

With Logging

var sdk = new DocumentAtomSdk("http://localhost:8000");
sdk.LogRequests = true;
sdk.LogResponses = true;
sdk.Logger = (severity, message) => Console.WriteLine($"[{severity}] {message}");

API Reference

DocumentAtomSdk Class

The main SDK class that provides access to all functionality.

Constructor
public DocumentAtomSdk(string endpoint)
  • endpoint: DocumentAtom server endpoint URL
Properties
  • Endpoint: Server endpoint URL
  • TimeoutMs: Request timeout in milliseconds (default: 300000)
  • LogRequests: Enable request logging
  • LogResponses: Enable response logging
  • Logger: Custom logger delegate
Main API Groups
  • Atom: Document processing methods
  • TypeDetection: Document type detection
  • Health: Server health and status

Document Processing (Atom Methods)

Process various document types and extract atoms:

Supported Document Types
  • CSV: ProcessCsv(byte[] data, bool extractOcr = false)
  • Excel: ProcessExcel(byte[] data, bool extractOcr = false)
  • HTML: ProcessHtml(byte[] data)
  • JSON: ProcessJson(byte[] data)
  • Markdown: ProcessMarkdown(byte[] data)
  • OCR: ProcessOcr(byte[] data)
  • PDF: ProcessPdf(byte[] data, bool extractOcr = false)
  • PNG: ProcessPng(byte[] data)
  • PowerPoint: ProcessPowerPoint(byte[] data, bool extractOcr = false)
  • RTF: ProcessRtf(byte[] data, bool extractOcr = false)
  • Text: ProcessText(byte[] data)
  • Word: ProcessWord(byte[] data, bool extractOcr = false)
  • XML: ProcessXml(byte[] data)
Example: Processing Multiple Document Types
// Process a PDF document
byte[] pdfData = await File.ReadAllBytesAsync("document.pdf");
List<Atom>? pdfAtoms = await sdk.Atom.ProcessPdf(pdfData, extractOcr: true);

// Process a Word document
byte[] wordData = await File.ReadAllBytesAsync("document.docx");
List<Atom>? wordAtoms = await sdk.Atom.ProcessWord(wordData, extractOcr: true);

// Process an Excel spreadsheet
byte[] excelData = await File.ReadAllBytesAsync("spreadsheet.xlsx");
List<Atom>? excelAtoms = await sdk.Atom.ProcessExcel(excelData);

Type Detection

Automatically detect document types:

// Detect document type
byte[] documentData = await File.ReadAllBytesAsync("unknown-document");
TypeResult? result = await sdk.TypeDetection.DetectType(documentData);

if (result != null)
{
    Console.WriteLine($"MIME Type: {result.MimeType}");
    Console.WriteLine($"Extension: {result.Extension}");
    Console.WriteLine($"Document Type: {result.Type}");
}

// With content type hint
TypeResult? resultWithHint = await sdk.TypeDetection.DetectType(
    documentData, 
    contentType: "application/pdf"
);

Health Monitoring

Check server health and status:

// Check if server is healthy
bool isHealthy = await sdk.Health.IsHealthy();

// Get server status
string? status = await sdk.Health.GetStatus();
Console.WriteLine($"Server status: {status}");

Atom Types

The SDK returns Atom objects that contain structured document content:

Atom Properties

  • GUID: Unique identifier
  • ParentGUID: Parent atom identifier (for hierarchical structures)
  • Type: Atom type (Text, Image, Binary, Table, List)
  • PageNumber: Page number where available
  • Position: Ordinal position
  • Length: Content length
  • Text: Text content
  • MD5Hash, SHA1Hash, SHA256Hash: Content hashes
  • Quarks: Sub-atomic particles (chunks)

Specialized Atom Types

  • TableAtom: Contains table data with rows and columns
  • ImageAtom: Contains image data and OCR text
  • BinaryAtom: Contains binary data
  • ListAtom: Contains ordered or unordered lists

Error Handling

The SDK handles errors gracefully and returns null for failed operations:

try
{
    List<Atom>? atoms = await sdk.Atom.ProcessPdf(documentData);
    
    if (atoms == null)
    {
        Console.WriteLine("Failed to process document or no atoms extracted");
    }
    else
    {
        Console.WriteLine($"Successfully extracted {atoms.Count} atoms");
    }
}
catch (Exception ex)
{
    Console.WriteLine($"Error: {ex.Message}");
}

Configuration

Timeout Configuration

var sdk = new DocumentAtomSdk("http://localhost:8000");
sdk.TimeoutMs = 600000; // 10 minutes

Logging Configuration

var sdk = new DocumentAtomSdk("http://localhost:8000");
sdk.LogRequests = true;
sdk.LogResponses = true;
sdk.Logger = (severity, message) => 
{
    // Custom logging implementation
    File.AppendAllText("sdk.log", $"[{DateTime.UtcNow}] [{severity}] {message}\n");
};

Server Requirements

The SDK requires a running DocumentAtom server instance. You can:

  1. Run locally: Start the DocumentAtom.Server project
  2. Use Docker: Pull and run the jchristn/documentatom Docker image
  3. Deploy: Deploy to your preferred hosting environment

Server Setup

# Using Docker
docker run -p 8000:8000 jchristn/documentatom

# Or run the DocumentAtom.Server project locally
# Default endpoint: http://localhost:8000

Examples

Complete Example: Document Processing Pipeline

using DocumentAtom.Sdk;
using DocumentAtom.Core.Atoms;

public class DocumentProcessor
{
    private readonly DocumentAtomSdk _sdk;
    
    public DocumentProcessor(string endpoint)
    {
        _sdk = new DocumentAtomSdk(endpoint);
        _sdk.Logger = (severity, message) => Console.WriteLine($"[{severity}] {message}");
    }
    
    public async Task ProcessDocumentAsync(string filePath)
    {
        try
        {
            // Check server health
            if (!await _sdk.Health.IsHealthy())
            {
                throw new Exception("DocumentAtom server is not healthy");
            }
            
            // Read document
            byte[] documentData = await File.ReadAllBytesAsync(filePath);
            
            // Detect document type
            TypeResult? typeResult = await _sdk.TypeDetection.DetectType(documentData);
            Console.WriteLine($"Detected type: {typeResult?.Type}");
            
            // Process based on type
            List<Atom>? atoms = typeResult?.Type switch
            {
                DocumentTypeEnum.Pdf => await _sdk.Atom.ProcessPdf(documentData, extractOcr: true),
                DocumentTypeEnum.Word => await _sdk.Atom.ProcessWord(documentData, extractOcr: true),
                DocumentTypeEnum.Excel => await _sdk.Atom.ProcessExcel(documentData),
                DocumentTypeEnum.PowerPoint => await _sdk.Atom.ProcessPowerPoint(documentData, extractOcr: true),
                DocumentTypeEnum.Html => await _sdk.Atom.ProcessHtml(documentData),
                DocumentTypeEnum.Markdown => await _sdk.Atom.ProcessMarkdown(documentData),
                DocumentTypeEnum.Json => await _sdk.Atom.ProcessJson(documentData),
                DocumentTypeEnum.Xml => await _sdk.Atom.ProcessXml(documentData),
                DocumentTypeEnum.Csv => await _sdk.Atom.ProcessCsv(documentData),
                DocumentTypeEnum.Text => await _sdk.Atom.ProcessText(documentData),
                DocumentTypeEnum.Rtf => await _sdk.Atom.ProcessRtf(documentData, extractOcr: true),
                DocumentTypeEnum.Png => await _sdk.Atom.ProcessPng(documentData),
                _ => throw new NotSupportedException($"Document type {typeResult?.Type} is not supported")
            };
            
            if (atoms != null)
            {
                Console.WriteLine($"Extracted {atoms.Count} atoms from {filePath}");
                
                // Process atoms
                foreach (var atom in atoms)
                {
                    ProcessAtom(atom);
                }
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Error processing {filePath}: {ex.Message}");
        }
    }
    
    private void ProcessAtom(Atom atom)
    {
        Console.WriteLine($"Atom: {atom.Type} - {atom.Text?.Substring(0, Math.Min(100, atom.Text.Length ?? 0))}...");
        
        // Process based on atom type
        switch (atom.Type)
        {
            case AtomTypeEnum.Table:
                Console.WriteLine($"  Table: {atom.Rows} rows, {atom.Columns} columns");
                break;
            case AtomTypeEnum.Image:
                Console.WriteLine($"  Image: {atom.Binary?.Length ?? 0} bytes");
                break;
            case AtomTypeEnum.Text:
                Console.WriteLine($"  Text: {atom.Text?.Length ?? 0} characters");
                break;
        }
    }
    
    public void Dispose()
    {
        _sdk?.Dispose();
    }
}

Dependencies

  • DocumentAtom (1.1.0): Core document processing library
  • DocumentAtom.TypeDetection (1.0.37): Document type detection
  • RestWrapper (3.1.6): HTTP client wrapper
  • System.Text.Json (8.0.5): JSON serialization

Version History

Please refer to CHANGELOG.md for version history.

Contributing

Contributions are welcome! Please feel free to submit issues, enhancement requests, or pull requests.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For support, please:

  1. Check the DocumentAtom documentation
  2. Review the test projects in the solution
  3. File an issue in the repository
Product Compatible and additional computed target framework versions.
.NET net8.0 is compatible.  net8.0-android was computed.  net8.0-browser was computed.  net8.0-ios was computed.  net8.0-maccatalyst was computed.  net8.0-macos was computed.  net8.0-tvos was computed.  net8.0-windows was computed.  net9.0 was computed.  net9.0-android was computed.  net9.0-browser was computed.  net9.0-ios was computed.  net9.0-maccatalyst was computed.  net9.0-macos was computed.  net9.0-tvos was computed.  net9.0-windows was computed.  net10.0 was computed.  net10.0-android was computed.  net10.0-browser was computed.  net10.0-ios was computed.  net10.0-maccatalyst was computed.  net10.0-macos was computed.  net10.0-tvos was computed.  net10.0-windows was computed. 
Compatible target framework(s)
Included target framework(s) (in package)
Learn more about Target Frameworks and .NET Standard.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last Updated
1.0.0 127 11/28/2025

Initial release