DocumentFormat.OpenXml.Markdown
1.0.1
A lightweight extension library for the official Open XML SDK. It converts Word (.docx), Excel (.xlsx), and PowerPoint (.pptx) documents into clean GitHub-Flavored Markdown.
Useful for migrating internal documents, formatting wiki content, or pre-processing Office files into LLM-friendly plain text.
Features
- 📝 Wordprocessing (.docx)
  Maps paragraph styles to standard Markdown headings. Parses inline text formatting like bold (**), italics (*), and strikethrough (~~). Supports multi-level nested lists and extracts hyperlinks and tables.
- 📊 Spreadsheet (.xlsx)
  Resolves the SharedString table and extracts cached formula values. Renders worksheets as elegant GitHub-Flavored Markdown (GFM) tables.
- 📽️ Presentation (.pptx)
  Iterates through slides sequentially. Re-maps slide titles to headings, and extracts nested text from Shapes, TextBoxes, and Tables.
- 🖼️ Flexible Image Handling
  Easily control how embedded images are handled. You can safely ignore them, inline them directly via Base64 Data URIs, or export them to a local asset directory with custom URL prefixes.
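To make the style-to-heading mapping in the Wordprocessing feature concrete, here is a minimal sketch of the idea. It is an illustration only: HeadingPrefix is a hypothetical helper, not part of this package's API, and it assumes the default English style IDs ("Heading1" through "Heading9") that Word assigns to its built-in heading styles.

```csharp
using System;

// Hypothetical helper (an assumption for illustration, not the library's API).
// Maps a Word paragraph style ID such as "Heading2" to a Markdown heading prefix.
static string HeadingPrefix(string? styleId)
{
    const string marker = "Heading";
    if (styleId is null || !styleId.StartsWith(marker, StringComparison.OrdinalIgnoreCase))
        return string.Empty; // not a heading style: emit a plain paragraph

    // The digits after "Heading" give the outline level.
    if (!int.TryParse(styleId.AsSpan(marker.Length), out int level) || level < 1)
        return string.Empty;

    // Markdown defines six heading levels while Word allows nine, so clamp.
    return new string('#', Math.Min(level, 6)) + " ";
}

Console.WriteLine(HeadingPrefix("Heading2") + "Quarterly results"); // prints: ## Quarterly results
```

Documents authored in other UI languages or with custom styles may use different style IDs, so a real mapping would likely need to consult the document's style definitions as well.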
Getting Started
Installation
Note: Instructions will be finalized when the package is published to NuGet.
dotnet add package DocumentFormat.OpenXml.Markdown
Basic Usage
Usage is simple and stream-aware. You can pass a document you have already opened with the Open XML SDK, or pass a standard Stream and let the library infer the document type.
Convert from a Stream
The converter inspects the package's contents and automatically identifies whether it is a Word, Excel, or PowerPoint document.
using System;
using System.IO;
using DocumentFormat.OpenXml.Markdown;

// Dispose the file stream once the conversion is done.
using var fileStream = File.OpenRead("SampleDocument.docx");
string markdown = MarkdownConverter.ConvertToMarkdown(fileStream);
Console.WriteLine(markdown);
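For readers curious how a package can be identified from a bare Stream, the sketch below shows one possible approach: read [Content_Types].xml from the ZIP container and look for the main-part content type. This is an assumption about how such detection can work, not a description of this library's internals, and DetectOfficeKind is a hypothetical helper name. It uses only System.IO.Compression from the base class library.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper (not the library's API): classifies an Office Open XML
// package by the main-part content type declared in [Content_Types].xml.
static string DetectOfficeKind(Stream package)
{
    // leaveOpen: true so the caller can rewind and reuse the stream afterwards.
    using var zip = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true);
    ZipArchiveEntry? entry = zip.GetEntry("[Content_Types].xml");
    if (entry is null) return "unknown";

    using var reader = new StreamReader(entry.Open(), Encoding.UTF8);
    string contentTypes = reader.ReadToEnd();

    // A plain substring check keeps the sketch short; a real implementation
    // would parse the XML and match the Override element for the main part.
    if (contentTypes.Contains("wordprocessingml.document.main+xml")) return "word";
    if (contentTypes.Contains("spreadsheetml.sheet.main+xml")) return "excel";
    if (contentTypes.Contains("presentationml.presentation.main+xml")) return "powerpoint";
    return "unknown";
}

// Usage: dotnet run -- SampleDocument.docx
if (args.Length > 0)
{
    using FileStream file = File.OpenRead(args[0]);
    Console.WriteLine(DetectOfficeKind(file));
}
```

Macro-enabled and template variants declare different content types and would report "unknown" here. If the same stream is passed to a converter afterwards, reset its Position to 0 first.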
Convert from OpenXML objects
If you are already interacting with the Open XML SDK, you can convert specific documents you've already opened.
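using System;
using DocumentFormat.OpenXml.Markdown;
using DocumentFormat.OpenXml.Packaging;

// Open read-only (isEditable: false); conversion only needs to read the document.
using (var wordDoc = WordprocessingDocument.Open("SampleDocument.docx", false))
{
    string markdown = MarkdownConverter.ConvertToMarkdown(wordDoc);
    Console.WriteLine(markdown);
}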
Advanced Usage: Image Handling Configurations
By default, every image embedded in the document is inlined as a Base64 string. You can change this behavior by passing a MarkdownConverterSettings instance:
var settings = new MarkdownConverterSettings
{
    // Mode: Ignore         -> Ignores images entirely
    // Mode: Base64         -> Default behavior
    // Mode: ExportToFolder -> Writes images to disk and generates links
    ImageExportMode = ImageExportMode.ExportToFolder,

    // If using ExportToFolder, define where to persist the binaries:
    AssetExportDirectory = "./docs/assets/",

    // (Optional) Define a prefix for generated URLs (e.g. for CDN hosting):
    AssetLinkUrlPrefix = "https://cdn.example.com/assets/"
};
using var wordDoc = WordprocessingDocument.Open("Report.docx", false);
string markdown = MarkdownConverter.ConvertToMarkdown(wordDoc, settings);
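For context on what the Base64 mode implies for the output, the sketch below builds a Markdown image whose source is a Base64 data URI. It is a minimal illustration using only the base class library; ToDataUriImage is a hypothetical helper, and the exact markup this package emits may differ.

```csharp
using System;

// Hypothetical helper (not the library's API): wraps raw image bytes in a
// Markdown image that points at a Base64 data URI instead of a file or URL.
static string ToDataUriImage(byte[] imageBytes, string mimeType, string altText)
{
    string payload = Convert.ToBase64String(imageBytes);
    return $"![{altText}](data:{mimeType};base64,{payload})";
}

// The first four bytes of the PNG signature stand in for a real image here.
byte[] pngSignature = { 0x89, 0x50, 0x4E, 0x47 };
Console.WriteLine(ToDataUriImage(pngSignature, "image/png", "logo"));
// prints: ![logo](data:image/png;base64,iVBORw==)
```

Inlined images keep the Markdown self-contained but grow the text by roughly a third over the binary size, which is one reason to prefer ExportToFolder for image-heavy documents.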
Complex Formatting Notes
- Tables: Small, simple tables work best. For Word and PowerPoint tables containing lists or multiple paragraphs in a single cell, the library uses <br> tags to preserve visual structure within GFM-compliant rows.
- Lists: Multi-level bullets are supported for both Word and PowerPoint, utilizing standard Markdown indentation logic.
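The table note above can be illustrated with a short sketch. A GFM table row must stay on one line, so multiple paragraphs in a cell are joined with <br> and literal pipe characters are escaped. ToGfmCell and ToGfmRow are hypothetical helpers written for this example, not functions exposed by the package.

```csharp
using System;
using System.Linq;

// Hypothetical helpers (not the library's API).
// Joins the paragraphs of one cell with <br> and escapes pipes so the
// cell content cannot break out of its column.
static string ToGfmCell(params string[] paragraphs) =>
    string.Join("<br>", paragraphs.Select(p => p.Replace("|", "\\|").Trim()));

// Assembles already-escaped cells into a single GFM table row.
static string ToGfmRow(params string[] cells) =>
    "| " + string.Join(" | ", cells) + " |";

string cell = ToGfmCell("Step 1: export", "Step 2: review | sign off");
Console.WriteLine(ToGfmRow("Process", cell));
// prints: | Process | Step 1: export<br>Step 2: review \| sign off |
```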
Contributing
We welcome contributions! To collaborate on this repository:
- Ensure your changes follow the .editorconfig formatting rules.
- Verify that all modifications still compile across .NET 10+ targets.
- Add xUnit tests that cover the relevant edge cases.
License
This project is licensed under the MIT License, the same license used by the official DocumentFormat.OpenXml SDK repository.
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
- net10.0
  - DocumentFormat.OpenXml (>= 3.5.1)
  - IsNullEnforcer (>= 1.0.0)