Mythosia.Documents.Abstractions
1.2.0
Core document abstractions for structured document loading and parsing. Framework-agnostic — usable with any RAG pipeline or document processing system.
Installation
dotnet add package Mythosia.Documents.Abstractions
Key Types
DoclingDocument
Unified structured document representation following the docling convention. Content items are stored in flat lists; the tree structure is maintained via body/furniture root nodes.
using Mythosia.Documents;
using Mythosia.Documents.Elements;
var doc = new DoclingDocument
{
    Name = "report",
    Source = "docs/report.pdf",
};
// Builder API
doc.AddTitle("Annual Report");
doc.AddHeading("Revenue", level: 2);
doc.AddParagraph("Total revenue increased by 15%.");
doc.AddCode("var x = 42;", language: "csharp");
// Export to Markdown
string markdown = doc.ToMarkdown();
// Optional: override table rendering strategy
doc.TableSerializer = new SemanticTableSerializer();
string semanticMarkdown = doc.ToMarkdown();
For plain-text content that should be preserved as-is, use RawContent:
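// Placeholder input; any plain-text string works here
string rawText = "Plain text kept exactly as written.";
var doc = new DoclingDocument
{
    Name = "notes",
    Source = "notes.txt",
    RawContent = rawText, // ToMarkdown() returns this directly
};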
Markdown Serialization
DoclingDocument.ToMarkdown() uses MarkdownSerializer to render the body tree. Body text is escaped by default, so source text such as *literal*, [brackets], | pipes, and backticks is emitted as literal text rather than being interpreted as Markdown formatting. Set EscapeText to false on a MarkdownSerializer to emit body text unescaped.
using Mythosia.Documents.Elements;
var doc = new DoclingDocument();
doc.AddParagraph("Keep *this* literal and preserve [brackets].");
string safeMarkdown = doc.ToMarkdown();
// Keep \*this\* literal and preserve \[brackets\].
// Opt out of escaping when the body text is already valid Markdown
var serializer = new MarkdownSerializer
{
    EscapeText = false,
};
string rawMarkdown = serializer.Serialize(doc);
MarkdownSerializer also clamps heading output to Markdown # through ###### and inserts a blank line when a list is followed by another block element, preventing the next paragraph, heading, table, code block, formula, or image placeholder from being absorbed into the list.
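To make the escaping and heading-clamp rules concrete, here is a minimal standalone sketch. It is not the library's implementation: the class `MarkdownSketch`, its methods, and the exact set of escaped characters are assumptions chosen for illustration, and the real MarkdownSerializer may escape a different set.

```csharp
using System;
using System.Text;

// Hypothetical illustration only; not part of Mythosia.Documents.Abstractions.
public static class MarkdownSketch
{
    // Characters this sketch treats as Markdown-significant (an assumption).
    private const string Significant = "\\`*_[]|";

    // Prefix each significant character with a backslash so it renders literally.
    public static string Escape(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (Significant.IndexOf(c) >= 0)
            {
                sb.Append('\\');
            }
            sb.Append(c);
        }
        return sb.ToString();
    }

    // Clamp the requested level to 1..6, the range Markdown headings support.
    public static string Heading(string text, int level)
    {
        int clamped = Math.Min(Math.Max(level, 1), 6);
        return new string('#', clamped) + " " + Escape(text);
    }
}
```

With this sketch, `MarkdownSketch.Escape("Keep *this* literal and preserve [brackets].")` returns `Keep \*this\* literal and preserve \[brackets\].`, and `MarkdownSketch.Heading("Deep", 9)` returns `###### Deep`.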
Table Serialization
Table rendering is pluggable via ITableSerializer. The default is GridTableSerializer (standard Markdown pipe table). Switch to SemanticTableSerializer for form-style documents:
using Mythosia.Documents.Elements;
// Default: pipe table
var doc = new DoclingDocument { Name = "report" };
string md = doc.ToMarkdown(); // uses GridTableSerializer
// Semantic: bold group labels for form-style tables
doc.TableSerializer = new SemanticTableSerializer();
string md2 = doc.ToMarkdown(); // uses SemanticTableSerializer
| Serializer | Output Style |
|---|---|
| GridTableSerializer | Standard Markdown pipe table (default) |
| SemanticTableSerializer | Form-style with **bold labels** and inline data |
IDocumentLoader
public interface IDocumentLoader
{
    Task<IReadOnlyList<DoclingDocument>> LoadAsync(
        string source, CancellationToken cancellationToken = default);
}
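As an illustration of the interface above, the following sketch implements a loader for plain-text files using the RawContent bypass. `PlainTextLoader` is a hypothetical class, not one shipped with the package, and the sketch assumes IDocumentLoader lives in the `Mythosia.Documents` namespace alongside DoclingDocument.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Mythosia.Documents; // assumed namespace for IDocumentLoader and DoclingDocument

// Hypothetical loader: one text file in, one DoclingDocument out.
public sealed class PlainTextLoader : IDocumentLoader
{
    public async Task<IReadOnlyList<DoclingDocument>> LoadAsync(
        string source, CancellationToken cancellationToken = default)
    {
        // Read the whole file; the token lets callers cancel slow reads.
        string rawText = await File.ReadAllTextAsync(source, cancellationToken);

        var doc = new DoclingDocument
        {
            Name = Path.GetFileNameWithoutExtension(source),
            Source = source,
            RawContent = rawText, // preserved as-is by ToMarkdown()
        };

        // The interface returns a list, so wrap the single document in an array.
        return new[] { doc };
    }
}
```

Returning a list rather than a single document leaves room for sources that yield several documents.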
IDocumentParser
public interface IDocumentParser
{
    bool CanParse(string source);
    Task<DoclingDocument> ParseAsync(string source, CancellationToken ct = default);
}
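One way to use CanParse is to pick the first registered parser that accepts a source. The helper below is a sketch under that assumption: `ParserDispatch` and `ParseWithFirstMatchAsync` are hypothetical names, and the `Mythosia.Documents` namespace for IDocumentParser is assumed rather than confirmed.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Mythosia.Documents; // assumed namespace for IDocumentParser and DoclingDocument

// Hypothetical helper; not part of the package.
public static class ParserDispatch
{
    // Tries parsers in order and delegates to the first one that accepts the source.
    public static Task<DoclingDocument> ParseWithFirstMatchAsync(
        IEnumerable<IDocumentParser> parsers, string source, CancellationToken ct = default)
    {
        foreach (IDocumentParser parser in parsers)
        {
            if (parser.CanParse(source))
            {
                return parser.ParseAsync(source, ct);
            }
        }

        // No parser claimed the source; surface that to the caller explicitly.
        throw new NotSupportedException($"No registered parser can handle '{source}'.");
    }
}
```

Because parsers are tried in order, more specific parsers should come before general-purpose ones in the sequence.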
Element Types (Mythosia.Documents.Elements)
| Type | Description |
|---|---|
| TextItem | Paragraph, generic text |
| TitleItem | Document title rendered as Markdown H1 |
| SectionHeaderItem | Section heading rendered as Markdown H2-H6 for standard heading levels |
| CodeItem | Code block with language |
| DocListItem | List item (ordered/unordered) |
| TableItem / TableData / TableCell | Table structure |
| TableSemanticView | Semantic group/column analysis for table layout |
| PictureItem | Image placeholder |
| GroupItem | Container (chapter, slide, sheet) |
Related Packages
| Package | Description |
|---|---|
| Mythosia.Documents.Hwp | HWP (Korean word processor) loader |
| Mythosia.Documents.Office | Word / Excel / PowerPoint loaders |
| Mythosia.Documents.Pdf | PDF loader (PdfPig) |
| Mythosia.AI.Rag | RAG pipeline that consumes DoclingDocument |
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
| .NET Core | netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
| .NET Standard | netstandard2.1 is compatible. |
| MonoAndroid | monoandroid was computed. |
| MonoMac | monomac was computed. |
| MonoTouch | monotouch was computed. |
| Tizen | tizen60 was computed. |
| Xamarin.iOS | xamarinios was computed. |
| Xamarin.Mac | xamarinmac was computed. |
| Xamarin.TVOS | xamarintvos was computed. |
| Xamarin.WatchOS | xamarinwatchos was computed. |
Dependencies for .NETStandard 2.1:
- System.Text.Json (>= 10.0.7)
NuGet packages (3)
Showing the top 3 NuGet packages that depend on Mythosia.Documents.Abstractions:
| Package | Description |
|---|---|
| Mythosia.Documents.Pdf | PDF document loader. Parses PDF files into DoclingDocument structured models via PdfPig. Font-size based heading detection, bullet/numbered list recognition, spatial paragraph grouping. Supports encrypted PDFs, metadata extraction, and page number headers. |
| Mythosia.Documents.Office | Office document loaders for Word (.docx), Excel (.xlsx), and PowerPoint (.pptx). Parses documents into DoclingDocument structured models via OpenXml, preserving heading hierarchy and slide content order. |
| Mythosia.Documents.Hwp | HWP document loader. Parses Korean word-processor (.hwp) files into DoclingDocument structured models via HwpLibSharp. Section/paragraph text extraction with table support. |
v1.2.0: MarkdownSerializer now escapes Markdown-significant characters in body text by default via EscapeText, clamps heading output to H1-H6, and inserts blank lines when leaving list blocks. RawContent continues to bypass serialization. Updated System.Text.Json to 10.0.7.
v1.1.0: Added pluggable table serialization. ITableSerializer strategy interface with GridTableSerializer (pipe table, default) and SemanticTableSerializer (form-style group rendering with bold labels). DoclingDocument.TableSerializer property allows per-document override. TableData and TableSemanticView for structural table analysis.
v1.0.0: Initial release as Mythosia.Documents.Abstractions. DoclingDocument structured model with body tree, RawContent bypass, Metadata, Builder API, and Markdown export. IDocumentLoader returns DoclingDocument. Element types in Mythosia.Documents.Elements namespace.