KreuzbergDev.HtmlToMarkdown
2.25.2
dotnet add package KreuzbergDev.HtmlToMarkdown --version 2.25.2
NuGet\Install-Package KreuzbergDev.HtmlToMarkdown -Version 2.25.2
<PackageReference Include="KreuzbergDev.HtmlToMarkdown" Version="2.25.2" />
<PackageVersion Include="KreuzbergDev.HtmlToMarkdown" Version="2.25.2" />
<PackageReference Include="KreuzbergDev.HtmlToMarkdown" />
paket add KreuzbergDev.HtmlToMarkdown --version 2.25.2
#r "nuget: KreuzbergDev.HtmlToMarkdown, 2.25.2"
#:package KreuzbergDev.HtmlToMarkdown@2.25.2
#addin nuget:?package=KreuzbergDev.HtmlToMarkdown&version=2.25.2
#tool nuget:?package=KreuzbergDev.HtmlToMarkdown&version=2.25.2
html-to-markdown
<div align="center" style="display: flex; flex-wrap: wrap; gap: 8px; justify-content: center; margin: 20px 0;">
<a href="https://crates.io/crates/html-to-markdown-rs"> <img src="https://img.shields.io/crates/v/html-to-markdown-rs?label=Rust&color=007ec6" alt="Rust"> </a> <a href="https://pypi.org/project/html-to-markdown/"> <img src="https://img.shields.io/pypi/v/html-to-markdown?label=Python&color=007ec6" alt="Python"> </a> <a href="https://www.npmjs.com/package/@kreuzberg/html-to-markdown-node"> <img src="https://img.shields.io/npm/v/@kreuzberg/html-to-markdown-node?label=Node.js&color=007ec6" alt="Node.js"> </a> <a href="https://www.npmjs.com/package/@kreuzberg/html-to-markdown-wasm"> <img src="https://img.shields.io/npm/v/@kreuzberg/html-to-markdown-wasm?label=WASM&color=007ec6" alt="WASM"> </a> <a href="https://central.sonatype.com/artifact/dev.kreuzberg/html-to-markdown"> <img src="https://img.shields.io/maven-central/v/dev.kreuzberg/html-to-markdown?label=Java&color=007ec6" alt="Java"> </a> <a href="https://pkg.go.dev/github.com/kreuzberg-dev/html-to-markdown/packages/go/v2/htmltomarkdown"> <img src="https://img.shields.io/badge/Go-v2.24.1-007ec6" alt="Go"> </a> <a href="https://www.nuget.org/packages/KreuzbergDev.HtmlToMarkdown/"> <img src="https://img.shields.io/nuget/v/KreuzbergDev.HtmlToMarkdown?label=C%23&color=007ec6" alt="C#"> </a> <a href="https://packagist.org/packages/kreuzberg-dev/html-to-markdown"> <img src="https://img.shields.io/packagist/v/kreuzberg-dev/html-to-markdown?label=PHP&color=007ec6" alt="PHP"> </a> <a href="https://rubygems.org/gems/html-to-markdown"> <img src="https://img.shields.io/gem/v/html-to-markdown?label=Ruby&color=007ec6" alt="Ruby"> </a> <a href="https://hex.pm/packages/html_to_markdown"> <img src="https://img.shields.io/hexpm/v/html_to_markdown?label=Elixir&color=007ec6" alt="Elixir"> </a> <a href="https://kreuzberg-dev.r-universe.dev/htmltomarkdown"> <img src="https://img.shields.io/badge/R-htmltomarkdown-007ec6" alt="R"> </a>
<a href="https://github.com/kreuzberg-dev/html-to-markdown/blob/main/LICENSE"> <img src="https://img.shields.io/badge/License-MIT-blue.svg" alt="License"> </a> </div>
<img width="3384" height="573" alt="Linkedin- Banner" src="https://github.com/user-attachments/assets/478a83da-237b-446b-b3a8-e564c13e00a8" />
<div align="center" style="margin-top: 20px;"> <a href="https://discord.gg/pXxagNK2zN"> <img height="22" src="https://img.shields.io/badge/Discord-Join%20our%20community-7289da?logo=discord&logoColor=white" alt="Discord"> </a> </div>
High-performance HTML → Markdown conversion powered by Rust. It ships as a Rust crate, Python package, PHP extension, Ruby gem, Elixir Rustler NIF, Node.js bindings, WebAssembly, and a standalone CLI, with identical rendering behavior across all runtimes.
Key Features
- Blazing Fast – Rust-powered core delivers 10-80× faster conversion than pure Python alternatives (150–280 MB/s)
- Polyglot – Native bindings for Rust, Python, TypeScript/Node.js, Ruby, PHP, Go, Java, C#, and Elixir
- Smart Conversion – Handles complex documents including nested tables, code blocks, task lists, and hOCR OCR output
- Metadata Extraction – Extract document metadata (title, description, headers, links, images, structured data) alongside conversion
- Visitor Pattern – Custom callbacks for domain-specific dialects, content filtering, URL rewriting, accessibility validation
- Highly Configurable – Control heading styles, code block fences, list formatting, whitespace handling, and HTML sanitization
- Tag Preservation – Keep specific HTML tags unconverted when markdown isn't expressive enough
- Secure by Default – Built-in HTML sanitization prevents malicious content
- Consistent Output – Identical markdown rendering across all language bindings
Installation
Each language binding provides comprehensive documentation with installation instructions, examples, and best practices. Choose your platform to get started:
Scripting Languages:
- Python – PyPI package, metadata extraction, visitor pattern, CLI included
- Ruby – RubyGems package, RBS type definitions, Steep checking
- PHP – Composer package + PIE extension, PHP 8.4+, PHPStan level 9
- Elixir – Hex package, Rustler NIF bindings, Elixir 1.19+
- R – r-universe package, extendr bindings, R 4.3+
JavaScript/TypeScript:
- Node.js / TypeScript – Native NAPI-RS bindings for Node.js/Bun, fastest performance, WebAssembly for browsers/Deno
Compiled Languages:
- Go – Go module with FFI bindings, automatic library download
- Java – Maven Central, Panama Foreign Function & Memory API, Java 24+
- C# – NuGet package, .NET 8.0+, P/Invoke FFI bindings
Native:
- Rust – Core library, flexible feature flags, zero-copy APIs
Command-Line:
- CLI – Cross-platform binary via `cargo install html-to-markdown-cli` or Homebrew: `brew install kreuzberg-dev/tap/html-to-markdown`
<details> <summary><strong>Metadata Extraction</strong></summary>
Extract comprehensive metadata during conversion: title, description, headers, links, images, structured data (JSON-LD, Microdata, RDFa). Use cases: SEO extraction, table-of-contents generation, link validation, accessibility auditing, content migration.
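
To make the idea concrete, here is a minimal sketch of what collecting metadata during a single parsing pass can look like. It uses only Python's standard `html.parser` module and is not the library's API: the names `MetadataCollector` and `collect_metadata`, and the shape of the returned dictionary, are assumptions made for illustration. Structured data (JSON-LD, Microdata, RDFa) is left out.

```python
from html.parser import HTMLParser


class MetadataCollector(HTMLParser):
    """Toy collector (hypothetical): gathers title, headings, links, and images."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.title = None
        self.headers = []     # (level, text) pairs
        self.links = []       # href values
        self.images = []      # (src, alt) pairs
        self._capture = None  # tag whose text is currently being captured
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title" or tag in self.HEADINGS:
            self._capture, self._buffer = tag, []
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img":
            self.images.append((attrs.get("src"), attrs.get("alt", "")))

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._capture:
            return
        text = "".join(self._buffer).strip()
        if tag == "title":
            self.title = text
        else:
            self.headers.append((int(tag[1]), text))
        self._capture = None


def collect_metadata(html):
    """Parse `html` once and return the collected metadata as a dict."""
    collector = MetadataCollector()
    collector.feed(html)
    collector.close()
    return {
        "title": collector.title,
        "headers": collector.headers,
        "links": collector.links,
        "images": collector.images,
    }
```

The `headers` list is enough to build a table of contents, and `links` can feed a link checker; the real bindings expose their own metadata types, so consult the language-specific documentation for the actual field names.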
</details>
<details> <summary><strong>Visitor Pattern</strong></summary>
Customize HTML→Markdown conversion with callbacks for specific elements. Intercept links, images, headings, lists, and more. Use cases: domain-specific Markdown dialects (Obsidian, Notion), content filtering, URL rewriting, accessibility validation, analytics.
Supported in: Rust, Python (sync & async), TypeScript/Node.js (sync & async), Ruby, and PHP.
Visitor Support Matrix
| Binding | Visitor Support | Async Support | Best For |
|---|---|---|---|
| Rust | ✅ Yes | ✅ Tokio | Core library, performance-critical code |
| Python | ✅ Yes | ✅ asyncio | Server-side, bulk processing |
| TypeScript/Node.js | ✅ Yes | ✅ Promise-based | Server-side Node.js/Bun, best performance |
| Ruby | ✅ Yes | ❌ No | Server-side Ruby on Rails, Sinatra |
| PHP | ✅ Yes | ❌ No | Server-side PHP, content management |
| Go | ❌ No | — | Basic conversion only |
| Java | ❌ No | — | Basic conversion only |
| C# | ❌ No | — | Basic conversion only |
| Elixir | ❌ No | — | Basic conversion only |
| WebAssembly | ❌ No | — | Browser, Edge, Deno (FFI limitations) |
For WASM users needing visitor functionality, see WASM Visitor Alternatives for recommended approaches.
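
The visitor idea itself can be sketched independently of any binding. The example below is a toy built on Python's standard `html.parser`, not the library's visitor API: `LinkVisitor`, `WikiLinkVisitor`, `TinyConverter`, and `convert_with_visitor` are hypothetical names, and the converter handles only text and `<a>` elements. It shows the core contract: the converter offers each link to a callback, and falls back to default rendering when the callback returns `None`.

```python
from html.parser import HTMLParser


class LinkVisitor:
    """Hypothetical visitor: return Markdown for a link, or None for the default."""

    def visit_link(self, href, text):
        return None


class WikiLinkVisitor(LinkVisitor):
    """Render internal /wiki/ links as Obsidian-style [[wikilinks]]."""

    def visit_link(self, href, text):
        if href.startswith("/wiki/"):
            return f"[[{href[len('/wiki/'):]}|{text}]]"
        return None


class TinyConverter(HTMLParser):
    """Toy converter that understands only text and <a> elements."""

    def __init__(self, visitor):
        super().__init__()
        self.visitor = visitor
        self.out = []
        self._href = None      # not None while inside an <a> element
        self._link_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._link_text = []

    def handle_data(self, data):
        if self._href is not None:
            self._link_text.append(data)
        else:
            self.out.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._link_text)
            # Give the visitor the first chance to render the link.
            custom = self.visitor.visit_link(self._href, text)
            default = f"[{text}]({self._href})"
            self.out.append(custom if custom is not None else default)
            self._href = None


def convert_with_visitor(html, visitor):
    parser = TinyConverter(visitor)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

With this sketch, `convert_with_visitor('<p>See <a href="/wiki/Rust">Rust</a>.</p>', WikiLinkVisitor())` yields `See [[Rust|Rust]].`, while external links keep the standard `[text](url)` form. The same hook shape covers URL rewriting and content filtering.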
</details>
<details> <summary><strong>Performance & Benchmarking</strong></summary>
Rust-powered core delivers 150–280 MB/s throughput (10-80× faster than pure Python alternatives). Includes benchmarking tools, memory profiling, streaming strategies, and optimization tips.
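
To check throughput figures like these against your own documents, a small timing harness is usually enough. The sketch below is not one of the project's benchmarking tools; `measure_throughput` is a hypothetical helper that accepts any `convert(html)` callable, and it assumes MB means 10^6 bytes of UTF-8 input.

```python
import time


def measure_throughput(convert, html, repeats=5):
    """Return the best observed throughput of convert(html) in MB/s."""
    size_mb = len(html.encode("utf-8")) / 1_000_000
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        convert(html)
        best = min(best, time.perf_counter() - start)
    if best == 0:
        # The timer could not resolve such a short run.
        return float("inf")
    return size_mb / best


if __name__ == "__main__":
    sample = "<p>Hello <strong>world</strong></p>" * 50_000
    # Stand-in workload; pass the real converter's function here instead.
    print(f"{measure_throughput(lambda s: s.upper(), sample):.1f} MB/s")
```

Taking the best of several runs reduces noise from warm-up and scheduling; results still depend heavily on document structure and hardware, so compare converters on the same inputs.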
</details>
<details> <summary><strong>Tag Preservation</strong></summary>
Keep specific HTML tags unconverted when Markdown isn't expressive enough. Useful for tables, SVG, custom elements, or when you need mixed HTML/Markdown output.
See language-specific documentation for preserveTags configuration.
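
The behavior can be illustrated with a toy converter that turns emphasis tags into Markdown but re-emits a configurable set of elements as HTML. This is a sketch on top of Python's standard `html.parser`, not the library's implementation: `PreservingConverter` and `convert_preserving` are hypothetical names, and the sketch assumes well-formed input in which every non-void element is explicitly closed.

```python
import html as html_lib
from html.parser import HTMLParser


class PreservingConverter(HTMLParser):
    """Toy converter: emphasis becomes Markdown, preserved tags stay as HTML."""

    MARKERS = {"strong": "**", "b": "**", "em": "*", "i": "*"}
    VOID = {"area", "base", "br", "col", "embed", "hr", "img",
            "input", "link", "meta", "source", "track", "wbr"}

    def __init__(self, preserve_tags):
        super().__init__(convert_charrefs=False)
        self.preserve_tags = set(preserve_tags)
        self.out = []
        self._depth = 0  # > 0 while inside a preserved element

    def handle_starttag(self, tag, attrs):
        if self._depth or tag in self.preserve_tags:
            self.out.append(self.get_starttag_text())  # raw tag, attributes intact
            if tag not in self.VOID:
                self._depth += 1
        elif tag in self.MARKERS:
            self.out.append(self.MARKERS[tag])

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: keep it only inside preserved HTML.
        if self._depth or tag in self.preserve_tags:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self._depth:
            if tag not in self.VOID:
                self._depth -= 1
                self.out.append(f"</{tag}>")
        elif tag in self.MARKERS:
            self.out.append(self.MARKERS[tag])

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        ref = f"&{name};"
        # Keep entities escaped inside preserved HTML, decode them elsewhere.
        self.out.append(ref if self._depth else html_lib.unescape(ref))

    def handle_charref(self, name):
        ref = f"&#{name};"
        self.out.append(ref if self._depth else html_lib.unescape(ref))


def convert_preserving(html, preserve_tags):
    parser = PreservingConverter(preserve_tags)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling `convert_preserving(html, ["table"])` leaves every `<table>` subtree untouched, including nested `<b>` tags that would otherwise become `**`, which is the mixed HTML/Markdown output described above.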
</details>
<details> <summary><strong>Skipping Images</strong></summary>
Skip all images during conversion using the skip_images option. Useful for text-only extraction or when you want to filter out visual content.
Rust:

```rust
use html_to_markdown_rs::{convert, ConversionOptions};

let options = ConversionOptions {
    skip_images: true,
    ..Default::default()
};
let html = r#"<p>Text with <img src="image.jpg" alt="pic"> image</p>"#;
let markdown = convert(html, Some(options))?;
// Output: "Text with image" (image tags are removed)
```

Python:

```python
from html_to_markdown import convert, ConversionOptions

html = '<p>Text with <img src="image.jpg" alt="pic"> image</p>'
options = ConversionOptions(skip_images=True)
markdown = convert(html, options)
```

TypeScript/Node.js:

```typescript
import { convert, ConversionOptions } from '@kreuzberg/html-to-markdown-node';

const html = '<p>Text with <img src="image.jpg" alt="pic"> image</p>';
const options: ConversionOptions = {
  skipImages: true,
};
const markdown = convert(html, options);
```

Ruby:

```ruby
require 'html_to_markdown'

html = '<p>Text with <img src="image.jpg" alt="pic"> image</p>'
options = HtmlToMarkdown::ConversionOptions.new(skip_images: true)
markdown = HtmlToMarkdown.convert(html, options)
```

PHP:

```php
use Goldziher\HtmlToMarkdown\HtmlToMarkdown;
use Goldziher\HtmlToMarkdown\Options;

$html = '<p>Text with <img src="image.jpg" alt="pic"> image</p>';
$options = new Options(['skip_images' => true]);
$markdown = HtmlToMarkdown::convert($html, $options);
```

This option is available across all language bindings. When enabled, <img> elements are dropped during conversion, so no Markdown image syntax appears in the output.
</details>
<details> <summary><strong>Secure by Default</strong></summary>
Built-in HTML sanitization prevents XSS attacks and malicious content. Powered by ammonia with safe defaults. Configurable via sanitize options.
</details>
Contributing
Contributions are welcome! See CONTRIBUTING.md for guidelines on:
- Setting up the development environment
- Running tests locally (Rust 95%+ coverage, language bindings 80%+)
- Submitting pull requests
- Reporting issues
All contributions must follow code quality standards enforced via pre-commit hooks (prek).
License
MIT License – see LICENSE for details. You can use html-to-markdown freely in both commercial and closed-source products. The only condition is that the copyright and license notice accompany copies of the software; there are no copyleft ("viral") requirements.
| Product | Compatible and additional computed target framework versions |
|---|---|
| .NET | net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
- net10.0: No dependencies.
| Version | Downloads | Last Updated |
|---|---|---|
| 2.25.2 | 32 | 2/25/2026 |
| 2.25.1 | 87 | 2/17/2026 |
| 2.25.0 | 88 | 2/15/2026 |
| 2.24.6 | 90 | 2/14/2026 |
| 2.24.5 | 94 | 2/1/2026 |
| 2.24.4 | 98 | 1/31/2026 |
| 2.24.3 | 91 | 1/31/2026 |
| 2.24.2 | 89 | 1/30/2026 |
| 2.24.1 | 90 | 1/28/2026 |
| 2.24.0 | 101 | 1/24/2026 |
| 2.23.6 | 100 | 1/21/2026 |
| 2.23.5 | 96 | 1/21/2026 |
| 2.23.4 | 96 | 1/20/2026 |
| 2.23.3 | 96 | 1/20/2026 |
| 2.23.2 | 93 | 1/20/2026 |
| 2.23.1 | 101 | 1/20/2026 |
| 2.23.0 | 90 | 1/18/2026 |
| 2.22.6 | 92 | 1/16/2026 |
| 2.22.5 | 89 | 1/16/2026 |
| 2.22.2 | 95 | 1/13/2026 |