DocSharp.Common
0.3.0
.NET CLI:
dotnet add package DocSharp.Common --version 0.3.0
Package Manager:
NuGet\Install-Package DocSharp.Common -Version 0.3.0
PackageReference:
<PackageReference Include="DocSharp.Common" Version="0.3.0" />
Central Package Management (version in Directory.Packages.props, reference in the project file):
<PackageVersion Include="DocSharp.Common" Version="0.3.0" />
<PackageReference Include="DocSharp.Common" />
Paket CLI:
paket add DocSharp.Common --version 0.3.0
Script & Interactive:
#r "nuget: DocSharp.Common, 0.3.0"
Cake:
#addin nuget:?package=DocSharp.Common&version=0.3.0
#tool nuget:?package=DocSharp.Common&version=0.3.0
DocSharp
DocSharp is a pure C# library to convert between document formats without Office interop or native dependencies.
The following packages are currently available:
- DocSharp.Binary: convert Office 97-2003 binary documents (doc, xls, ppt) to OpenXML documents (docx, xlsx, pptx). It is a fork of the abandoned b2xtranslator project with critical fixes.
- DocSharp.Docx: convert DOCX to Markdown, RTF and plain text (.txt). Possible applications include generating Open XML documents in C# and exporting for other editors, or loading Word documents in a RichTextBox / RichEditBox control.
- DocSharp.Markdown: convert Markdown to DOCX using a custom Markdig renderer.
Packages can be installed via NuGet.
There is no common DOM for manipulating or generating documents; this library is mainly intended for conversion. However, the Docx package provides some helper methods on top of the Open XML SDK that may be extended in the future.
If your main goal is creating documents from scratch, consider the following libraries: OfficeIMO, OpenXML-Office, ClosedXML, ShapeCrawler, QuestPDF, MigraDoc.
Supported features
- Binary formats: almost all doc/xls/ppt features were supported by the original project, but exceptions occurred when running on .NET (rather than .NET Framework) or when loading specific documents or encodings. Most errors should now be fixed, but more work is needed to make the library reliable; if you find other bugs, you are welcome to open an issue (please attach a sample file if the issue only occurs with specific documents).
- DOCX to RTF:
  - Font formatting, paragraphs, tables and lists
    - Not all properties are supported; for example, advanced table positioning, conditional table formatting and some list types are not recognized.
  - Images:
    - JPEG, PNG, EMF and WMF are supported.
    - Only inline images are supported (wrap layouts are not yet implemented).
  - Hyperlinks and bookmarks
  - Page setup: size, orientation, margins, borders, background color
  - Header and footer
  - Endnotes and footnotes
  - Drop caps
  - Fields (partial) and page numbers
  - TODO: math formulas, drawings, OLE objects, comments, improve support for right-to-left and complex script languages
- DOCX to Markdown:
  - Text and basic formatting
    - Bold, italic, underline, strikethrough, superscript, subscript
    - Heading 1-6 styles
    - Any highlight color is converted to <mark>
  - Inline images
    - ImagesOutputFolder needs to be set to an existing directory, otherwise images are skipped. An absolute URI is used by default; to produce a relative URI, set ImagesBaseUriOverride to any non-null folder path (an empty string or "." means the same folder as the Markdown file, "../images" means the images subfolder in the parent folder).
    - Some image types are not recognized (e.g. WordPad embeds images differently than MS Word and other word processors).
    - Images should be in JPEG, PNG or GIF format to be supported by browsers; BMP is partially supported but not recommended. No automatic image conversion is currently implemented.
    - Crop and effects are not supported.
  - Lists (partial)
  - Tables (values only)
  - Hyperlinks and bookmarks
  - Page breaks are converted to horizontal lines
  - TODO: math formulas, charts
- Markdown to DOCX:
  - Basic Markdown features (headings, bold, italic, strikethrough, superscript, subscript)
    - A few basic HTML tags such as <u>, <sup>, <sub> and <mark> are also supported
  - Quotes and code blocks
  - Lists
  - External hyperlinks
  - Bookmarks for internal hyperlinks to headings (GitHub-like auto-identifiers)
  - Images
    - The converter attempts to read local images and download online images (http/https URLs only). If this behavior is not desired, set SkipImages to true.
    - Images specified as absolute URLs are processed by default. For relative URLs, ImagesBaseUri needs to be set to an absolute local directory path or http(s) URL, which is combined with the relative image path at runtime, for example C:\Data + ./images/image1.jpg.
    - WEBP and AVIF images are ignored as they are not supported in DOCX documents; base64 images are also ignored, as they are rarely used and not supported by many Markdown processors.
    - Width and height must be specified in DOCX. The converter tries to scale the original image dimensions to fit the page, but the result is not always accurate.
  - Tables (experimental)
  - TODO: other internal hyperlink types, math and other extensions, raw HTML blocks, async functions/progress callback (some tasks such as downloading images may take some time)
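The DOCX to Markdown formatting support can be pictured as wrapping each text run in markers. The sketch below uses a hypothetical FormatRun helper that picks its own markers and nesting order (inline HTML such as <u>, <sup> and <sub> for formatting that plain Markdown lacks); only the highlight-to-<mark> rule comes from the feature list, and DocSharp's actual output may differ.

```csharp
using System;

// Hypothetical helper: wraps a run's text in Markdown/HTML markers.
// Marker choices and nesting order are assumptions, except highlight -> <mark>.
static string FormatRun(string text, bool bold = false, bool italic = false,
    bool underline = false, bool strike = false, bool highlight = false,
    bool superscript = false, bool subscript = false)
{
    string s = text;

    // Vertical alignment has no Markdown syntax, so inline HTML is used.
    if (superscript) s = $"<sup>{s}</sup>";
    else if (subscript) s = $"<sub>{s}</sub>";

    // Plain Markdown has no underline either.
    if (underline) s = $"<u>{s}</u>";

    if (strike) s = $"~~{s}~~";
    if (italic) s = $"*{s}*";
    if (bold) s = $"**{s}**";

    // Any highlight color becomes <mark>, as stated in the feature list.
    if (highlight) s = $"<mark>{s}</mark>";

    return s;
}

string boldItalic = FormatRun("important", bold: true, italic: true);
string highlighted = FormatRun("note", bold: true, highlight: true);
string formula = "H" + FormatRun("2", subscript: true) + "O";

Console.WriteLine(boldItalic);   // ***important***
Console.WriteLine(highlighted);  // <mark>**note**</mark>
Console.WriteLine(formula);      // H<sub>2</sub>O
```

Applying markers from the innermost (vertical alignment) to the outermost (highlight) keeps the HTML tags properly nested around the Markdown emphasis.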
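The ImagesBaseUriOverride rules for DOCX to Markdown (an absolute URI by default, a link relative to the Markdown file when an override is set) can be mirrored by a small helper. BuildImageLink below is hypothetical and only reproduces the documented behavior; it is not part of the DocSharp API, and the real converter may escape or format links differently.

```csharp
using System;
using System.IO;

// Hypothetical helper mirroring the documented ImagesBaseUriOverride rules;
// not DocSharp's implementation.
static string BuildImageLink(string imagesOutputFolder, string fileName, string? baseUriOverride)
{
    if (baseUriOverride is null)
    {
        // Default: absolute file URI pointing into ImagesOutputFolder.
        string fullPath = Path.GetFullPath(Path.Combine(imagesOutputFolder, fileName));
        return new Uri(fullPath).AbsoluteUri;
    }

    // "" or "." means the image sits next to the Markdown file.
    string prefix = baseUriOverride.Trim();
    if (prefix.Length == 0 || prefix == ".")
        return fileName;

    // Otherwise the override is a folder relative to the Markdown file, e.g. "../images".
    return prefix.Replace('\\', '/').TrimEnd('/') + "/" + fileName;
}

string folder = Path.Combine(Path.GetTempPath(), "out");
string absolute = BuildImageLink(folder, "image1.png", null);
string sameFolder = BuildImageLink(folder, "image1.png", ".");
string parentImages = BuildImageLink(folder, "image1.png", "../images");

Console.WriteLine($"![]({absolute})");     // file:/// URI, platform-dependent
Console.WriteLine($"![]({sameFolder})");   // ![](image1.png)
Console.WriteLine($"![]({parentImages})"); // ![](../images/image1.png)
```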
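For Markdown to DOCX, GitHub-like auto-identifiers turn heading text into IDs that internal links such as [Usage](#usage) can target; in DOCX such an ID can serve as the name of the bookmark an internal hyperlink points to. The sketch below is a hypothetical approximation of GitHub's rules (lowercase, drop punctuation, spaces to hyphens, numeric suffixes for duplicates); Markdig's AutoIdentifiers extension, which DocSharp.Markdown builds on, may differ in edge cases.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical approximation of GitHub-style heading identifiers.
// 'seen' tracks how many times each base identifier was produced.
static string Slugify(string heading, Dictionary<string, int> seen)
{
    var sb = new StringBuilder();
    foreach (char c in heading.Trim().ToLowerInvariant())
    {
        if (char.IsLetterOrDigit(c) || c == '-' || c == '_')
            sb.Append(c);
        else if (c == ' ')
            sb.Append('-');
        // Any other punctuation is dropped.
    }

    string slug = sb.ToString();

    // Repeated headings get -1, -2, ... suffixes.
    if (seen.TryGetValue(slug, out int count))
    {
        seen[slug] = count + 1;
        return $"{slug}-{count}";
    }

    seen[slug] = 1;
    return slug;
}

var seen = new Dictionary<string, int>();
var ids = new List<string>();
foreach (string heading in new[] { "Supported features", "DOCX to RTF!", "Usage", "Usage" })
    ids.Add(Slugify(heading, seen));

Console.WriteLine(string.Join(", ", ids));
// supported-features, docx-to-rtf, usage, usage-1
```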
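The image rules for Markdown to DOCX amount to a resolution step before each image is read or downloaded. ResolveImageSource below is a hypothetical sketch of those documented rules, not DocSharp's code: it skips base64, WEBP and AVIF sources, keeps absolute http(s) URLs and absolute local paths, and combines relative paths with an ImagesBaseUri value (a local folder or an http(s) URL), returning null when an image would be skipped.

```csharp
using System;
using System.IO;

// Hypothetical sketch of the documented image-source rules; returns null when
// the image would be skipped.
static string? ResolveImageSource(string src, string? imagesBaseUri)
{
    // Base64 data URIs are ignored.
    if (src.StartsWith("data:", StringComparison.OrdinalIgnoreCase))
        return null;

    // WEBP and AVIF cannot be embedded in DOCX.
    string ext = Path.GetExtension(src).ToLowerInvariant();
    if (ext == ".webp" || ext == ".avif")
        return null;

    // Absolute http(s) URLs are used as-is (they would be downloaded).
    if (Uri.TryCreate(src, UriKind.Absolute, out Uri? abs) &&
        (abs.Scheme == Uri.UriSchemeHttp || abs.Scheme == Uri.UriSchemeHttps))
        return abs.AbsoluteUri;

    // Absolute local paths are read directly.
    if (Path.IsPathFullyQualified(src))
        return Path.GetFullPath(src);

    // Relative sources need a base; without ImagesBaseUri they are skipped.
    if (string.IsNullOrEmpty(imagesBaseUri))
        return null;

    if (Uri.TryCreate(imagesBaseUri, UriKind.Absolute, out Uri? baseUri) &&
        (baseUri.Scheme == Uri.UriSchemeHttp || baseUri.Scheme == Uri.UriSchemeHttps))
    {
        // A trailing slash makes the last segment count as a folder.
        var folderUri = new Uri(baseUri.AbsoluteUri.TrimEnd('/') + "/");
        return new Uri(folderUri, src).AbsoluteUri;
    }

    // Local base directory, e.g. C:\Data + ./images/image1.jpg.
    return Path.GetFullPath(Path.Combine(imagesBaseUri, src));
}

string localBase = Path.GetTempPath();
string? fromFolder = ResolveImageSource("./images/image1.jpg", localBase);
string? fromWeb = ResolveImageSource("./images/image1.jpg", "https://example.com/docs");
string? absoluteUrl = ResolveImageSource("https://example.com/logo.png", null);
string? webp = ResolveImageSource("photo.webp", localBase);
string? noBase = ResolveImageSource("./images/image1.jpg", null);

Console.WriteLine(fromFolder);            // <temp folder>/images/image1.jpg
Console.WriteLine(fromWeb);               // https://example.com/docs/images/image1.jpg
Console.WriteLine(absoluteUrl);           // https://example.com/logo.png
Console.WriteLine(webp ?? "(skipped)");   // (skipped)
Console.WriteLine(noBase ?? "(skipped)"); // (skipped)
```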
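Because DOCX requires explicit image dimensions, the converter has to derive them from the image file. DOCX measures drawing extents in EMUs (914,400 per inch), so a minimal sketch, assuming a known pixel density (96 DPI here) and a usable width equal to the page width minus margins, converts pixels to EMUs and shrinks oversized images proportionally. FitToPage is hypothetical; DocSharp's own scaling logic is not described here.

```csharp
using System;

// Hypothetical helper: converts pixel dimensions to EMUs and, if the image is
// wider than the usable page width, scales it down keeping the aspect ratio.
static (long Cx, long Cy) FitToPage(int widthPx, int heightPx, double dpi, long maxWidthEmu)
{
    const double EmuPerInch = 914400;

    long cx = (long)Math.Round(widthPx * EmuPerInch / dpi);
    long cy = (long)Math.Round(heightPx * EmuPerInch / dpi);

    if (cx <= maxWidthEmu)
        return (cx, cy); // already fits: keep the original size

    // Scale the height by the same factor as the width.
    long scaledCy = (long)Math.Round((double)cy * maxWidthEmu / cx);
    return (maxWidthEmu, scaledCy);
}

// Letter page (8.5 in) minus 1 in margins on each side = 6.5 in.
long usableWidth = (long)(6.5 * 914400);

var large = FitToPage(1920, 1080, 96, usableWidth);
var small = FitToPage(96, 48, 96, usableWidth);

Console.WriteLine($"large: {large.Cx} x {large.Cy} EMU"); // large: 5943600 x 3343275 EMU
Console.WriteLine($"small: {small.Cx} x {small.Cy} EMU"); // small: 914400 x 457200 EMU
```

The result is only as accurate as the DPI assumption: an image whose real resolution differs from 96 DPI would come out larger or smaller than in other editors, which is one way such scaling can be "not always accurate".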
Usage
You can refer to the project Wiki or sample app.
Roadmap
- Support more elements and attributes, and fix edge-case issues
- Reverse RTF to DOCX conversion
Credits
Dependencies:
- Open XML SDK
- Markdig - for DocSharp.Markdown only
Forked:
Others:
- Html2OpenXml for image header decoding and unit conversions.
License
DocSharp is licensed under the MIT license and can be used for both open source and commercial projects.
If you find the library useful, adding a star is highly appreciated.
Product | Compatible and computed target framework versions |
---|---|
.NET | Compatible: net8.0. Computed: net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows, net9.0, net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows. |
Dependencies:
- net8.0
  - System.Text.Encoding.CodePages (>= 8.0.0)
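The System.Text.Encoding.CodePages dependency relates to the encoding issues mentioned under Binary formats: on .NET (unlike .NET Framework), legacy code pages such as Windows-1252 are unavailable until a provider is registered. The sketch below shows the standard registration call from the .NET base library; whether and where DocSharp performs this registration internally is not stated here, so treat it as background rather than a required setup step.

```csharp
using System;
using System.Text;

// Returns true if the runtime can currently supply the given code page.
static bool IsCodePageAvailable(int codePage)
{
    try
    {
        Encoding.GetEncoding(codePage);
        return true;
    }
    catch (NotSupportedException)
    {
        return false;
    }
}

// On .NET (Core), only Unicode encodings, ASCII and Latin-1 are built in,
// so Windows-1252 is not available yet.
bool before = IsCodePageAvailable(1252);

// Register the provider from System.Text.Encoding.CodePages.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

bool after = IsCodePageAvailable(1252);

// "\u00E9" (e with acute accent) is the single byte 0xE9 in Windows-1252.
byte[] bytes = Encoding.GetEncoding(1252).GetBytes("\u00E9");

Console.WriteLine($"before: {before}, after: {after}, bytes: {BitConverter.ToString(bytes)}");
// before: False, after: True, bytes: E9
```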
NuGet packages (4)
Showing the top 4 NuGet packages that depend on DocSharp.Common:
Package | Description |
---|---|
DocSharp.Docx | .NET library for converting documents. The DocSharp.Docx package provides DOCX to RTF, Markdown and plain text conversions. |
DocSharp.Markdown | .NET library for converting documents. The DocSharp.Markdown package provides Markdown to DOCX and RTF conversion. |
DocSharp.SystemDrawing | .NET library for converting documents. The DocSharp.SystemDrawing package provides helper functions to convert unsupported images when creating documents, using System.Drawing.Common (Windows only) as graphics library. |
DocSharp.ImageSharp | .NET library for converting documents. The DocSharp.ImageSharp package provides helper functions to convert unsupported images when creating documents, using ImageSharp as graphics library. |
GitHub repositories
This package is not used by any popular GitHub repositories.