WikitextParser 0.1.2

.NET 9.0

WikitextParser

WikitextParser is a dependency-free .NET 9 library for parsing Wikipedia's wikitext markup into a structured object model. It provides both a low-level API for direct element access and a high-level API for parsing a full page into a semantic structure of sections, subsections, and an infobox.

The library was developed to parse real-world Wikipedia pages and handles a variety of common wikitext syntax.

Features

- Dual-Level API:
  - High-Level API (Parser.ParsePage): Parses an entire page into a structured Page object with a lead section, an infobox, and a nested hierarchy of sections.
  - Low-Level API (Parser.Parse): Parses wikitext into a flat IEnumerable<WikitextElement> for custom processing and analysis.
- Content Conversion: Convert any parsed element, section, or the entire page into either HTML or plain text.
- No Dependencies: The core library is self-contained and does not require any external packages.

Installation

This library is available via NuGet. You can install it using the .NET CLI:

dotnet add package WikitextParser

Or through the NuGet Package Manager in Visual Studio.

Usage Guide

Using the parser is straightforward. The recommended approach is to use the high-level ParsePage API for most use cases.

High-Level API: Parsing a Page

This is the easiest way to get a structured representation of a wiki page.

using WikitextParser;
using WikitextParser.Models;

string wikitext = File.ReadAllText("my_wikitext_file.txt");

// 1. Parse the entire page into a Page object
Page page = Parser.ParsePage(wikitext);

// 2. Access the Infobox
if (page.Infobox != null)
{
    var creatorParam = page.Infobox.Parameters.FirstOrDefault(p => p.Key == "creator");
    if (creatorParam != null)
    {
        Console.WriteLine($"Creator: {creatorParam.Value.ConvertToText()}");
    }
}

// 3. Access the Lead Content (content before the first heading)
foreach (var element in page.LeadContent)
{
    // Ignores metadata templates like {{Short description|...}}
    if (element is not WikiKeyValuePairElement)
    {
        Console.WriteLine(element.ConvertToText());
    }
}

// 4. Recursively process all sections and subsections
void PrintSection(Section section, int indent = 0)
{
    string indentStr = new string(' ', indent * 2);
    Console.WriteLine($"{indentStr}{section.Heading.ConvertToText().Trim()}");

    // Print main article links for the section
    foreach (var mainLink in section.MainArticleLinks)
    {
        Console.WriteLine($"{indentStr}  (See also: {mainLink.Parameters.First().Value.ConvertToText()})");
    }

    // Print the section's direct content
    foreach (var content in section.ContentElements)
    {
        Console.WriteLine($"{indentStr}  [Content: {content.GetType().Name}]");
    }
    
    // Recurse into subsections
    foreach (var sub in section.Subsections)
    {
        PrintSection(sub, indent + 1);
    }
}

Console.WriteLine("\n--- PAGE STRUCTURE ---");
foreach (var section in page.Sections)
{
    PrintSection(section);
}

// 5. Convert the entire page to HTML or Text
string fullHtml = page.ConvertToHtml();
string fullText = page.ConvertToText();
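
As a follow-on to step 2, the infobox parameters can be gathered into a dictionary for repeated lookups. This is a minimal sketch that relies only on what the example above shows: `Infobox.Parameters` is enumerable, and each parameter exposes a string `Key` and a `Value` with `ConvertToText()`. The sample wikitext and the variable name `infoboxValues` are illustrative. How the parser reports positional (unnamed) parameters is not stated here, so entries without a key are skipped as a precaution.

```csharp
using System;
using System.Collections.Generic;
using WikitextParser;
using WikitextParser.Models;

// Illustrative input; any page wikitext containing an infobox would do.
string wikitext = "{{Infobox software\n| name = Example\n| creator = Jane Doe\n}}\n\nLead paragraph.";

Page page = Parser.ParsePage(wikitext);

// Case-insensitive lookup table of infobox values (hypothetical helper variable).
var infoboxValues = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

if (page.Infobox != null)
{
    foreach (var parameter in page.Infobox.Parameters)
    {
        // Assumption: Key may be null or empty for unnamed parameters, so skip those.
        if (string.IsNullOrWhiteSpace(parameter.Key))
        {
            continue;
        }

        // Indexer assignment: a repeated key overwrites the earlier value instead of throwing.
        infoboxValues[parameter.Key.Trim()] = parameter.Value.ConvertToText().Trim();
    }
}

foreach (var (key, value) in infoboxValues)
{
    Console.WriteLine($"{key}: {value}");
}
```

Using the indexer rather than `Add` is a deliberate choice here, since real infoboxes occasionally repeat a parameter name.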

Low-Level API: Parsing Elements

If you need fine-grained control or want to process elements as a simple stream, you can use the low-level Parse method.

using WikitextParser;
using WikitextParser.Elements;

string wikitext = "== Section 1 ==\nThis is ''italic'' text.\n\n[[Category:My Category]]";

var elements = Parser.Parse(wikitext);

foreach (var element in elements)
{
    switch (element.Type)
    {
        case WikitextElementType.Heading:
            var heading = (HeadingElement)element;
            Console.WriteLine($"Found H{heading.HeadingLevel}: {heading.ConvertToText().Trim()}");
            break;
            
        case WikitextElementType.Paragraph:
            var paragraph = (ParagraphElement)element;
            Console.WriteLine($"Found Paragraph: {paragraph.ConvertToText().Trim()}");
            break;
            
        case WikitextElementType.Category:
            var category = (CategoryElement)element;
            Console.WriteLine($"Found Category: {category.CategoryName}");
            break;
            
        default:
            Console.WriteLine($"Found other element type: {element.Type}");
            break;
    }
}
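
When only one kind of element matters, the stream can be filtered with LINQ instead of a `switch`. The sketch below collects category names. It assumes, as the example above suggests, that `Parser.Parse` returns an `IEnumerable<WikitextElement>`, that categories appear as top-level `CategoryElement` objects, and that `CategoryName` holds the name. The input string is illustrative.

```csharp
using System;
using System.Linq;
using WikitextParser;
using WikitextParser.Elements;

// Illustrative input with two category links.
string wikitext = "Some text.\n\n[[Category:First]]\n[[Category:Second]]";

// OfType keeps only the CategoryElement instances and casts them in one step.
var categoryNames = Parser.Parse(wikitext)
    .OfType<CategoryElement>()
    .Select(category => category.CategoryName)
    .ToList();

Console.WriteLine($"Categories found: {categoryNames.Count}");
foreach (var name in categoryNames)
{
    Console.WriteLine($"- {name}");
}
```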

Supported Elements

The parser currently supports the following wikitext syntax:

| Element | Wikitext Syntax | Notes |
|---|---|---|
| Paragraphs | `Some text.` | Paragraphs are separated by one or more blank lines (`\n\n`). Single newlines are treated as line breaks within a paragraph. |
| Headings | `== H2 ==`, `=== H3 ===`, etc. | Parses heading levels 2 through 6. |
| Bold | `'''Bold Text'''` | |
| Italic | `''Italic Text''` | |
| Bold & Italic | `'''''Bold Italic'''''` | |
| Internal Links | `[[Page Name]]`, `[[Page Name\|Display Text]]` | |
| Templates | `{{TemplateName\|param1\|key=value}}` | Supports named and positional parameters, as well as nested templates. |
| Infoboxes | `{{Infobox ... }}` | Recognized as a special `TemplateElement` and separated in the high-level API. |
| Simple KVP | `{{Short description\|...}}` | Simple single-line templates are parsed as `WikiKeyValuePairElement`. |
| Tables | `{\| ... \|}` | Supports table attributes, rows (`\|-`), headers (`!`), cells (`\|`), and cell attributes. |
| References | `<ref>...</ref>`, `<ref name="..."/>`, `<ref name="...">...</ref>` | Parses standard, named, and self-closing reference tags. |
| File/Image | `[[File:Name.jpg\|thumb\|caption]]` | Parses filename, options (like `thumb`, `right`, `220px`), and the final caption. |
| Category Links | `[[Category:Category Name\|Sort Key]]` | Parsed as distinct `CategoryElement` objects. |
| HTML Comments | `<!-- ... -->` | Comments are parsed and preserved. |
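
The paragraph rule in the first row is easy to check empirically. The sketch below feeds the low-level API a string containing one single newline and one blank line, then lists the elements reported as paragraphs. If the rule holds as described, the first two lines should end up in the same paragraph; the input string is illustrative and no particular output is guaranteed here.

```csharp
using System;
using System.Linq;
using WikitextParser;
using WikitextParser.Elements;

// One single newline (inside the first paragraph) and one blank line (between paragraphs).
string wikitext = "First line\nsecond line of the same paragraph.\n\nA separate paragraph.";

// Keep only the elements whose Type is Paragraph.
var paragraphs = Parser.Parse(wikitext)
    .Where(element => element.Type == WikitextElementType.Paragraph)
    .ToList();

Console.WriteLine($"Paragraph elements: {paragraphs.Count}");
foreach (var paragraph in paragraphs)
{
    // Brackets make the boundaries of each paragraph visible in the console.
    Console.WriteLine($"[{paragraph.ConvertToText().Trim()}]");
}
```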

Conversion to HTML and Text

Every parsed element (WikitextElement), as well as the high-level Page and Section objects, includes two methods: ConvertToHtml() and ConvertToText().

- ConvertToHtml(): Produces a basic HTML representation of the content.
- ConvertToText(): Produces a plain text representation, stripping out all markup.

Page page = Parser.ParsePage(wikitext);

// Convert the entire page
string fullHtml = page.ConvertToHtml();
string fullText = page.ConvertToText();

Notice: The built-in conversion methods are intentionally simple and provide a reasonable default output. They are not customizable. For more advanced formatting, or to handle specific templates in a particular way, traverse the parsed element tree and build your own conversion logic.
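
A custom conversion along those lines can be sketched as a loop over the low-level element stream. The example below emits a rough Markdown rendering using only members that appear elsewhere in this README (`HeadingLevel`, `CategoryName`, `ConvertToText()`). `MarkdownHeading` is a hypothetical helper, `HeadingLevel` is assumed to be an `int`, and the output format is just one possible choice, not something the library provides.

```csharp
using System;
using System.Text;
using WikitextParser;
using WikitextParser.Elements;

// Hypothetical helper: renders a heading as Markdown, e.g. level 2 gives "## Title".
static string MarkdownHeading(int level, string text) => $"{new string('#', level)} {text}";

string wikitext = "== Section 1 ==\nThis is ''italic'' text.\n\n[[Category:My Category]]";
var output = new StringBuilder();

foreach (var element in Parser.Parse(wikitext))
{
    switch (element)
    {
        case HeadingElement heading:
            // Assumption: HeadingLevel is an int matching the number of '=' signs.
            output.AppendLine(MarkdownHeading(heading.HeadingLevel, heading.ConvertToText().Trim()));
            output.AppendLine();
            break;

        case ParagraphElement paragraph:
            // Falls back to the built-in plain-text conversion for paragraph content.
            output.AppendLine(paragraph.ConvertToText().Trim());
            output.AppendLine();
            break;

        case CategoryElement category:
            output.AppendLine($"Category: {category.CategoryName}");
            break;

        default:
            // Other element types are skipped in this sketch.
            break;
    }
}

Console.WriteLine(output.ToString());
```

Pattern matching on the element's runtime type replaces the explicit casts used in the low-level example; handling inline formatting or templates differently would mean descending into each element's children, which this sketch does not attempt.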

Building from Source

To build the project yourself:

1. Install the .NET SDK (9.0 or newer).
2. Clone the repository.
3. Navigate to the root directory.
4. Run dotnet build.

License

This project is licensed under the MIT License.

History

0.0.1 - Initial release
0.1.2 - Fix "plainlist" template not parsing correctly

Product	Compatible and additional computed target framework versions.
.NET	net9.0 is compatible. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.

net9.0
- No dependencies.

Version	Downloads	Last Updated
0.1.2	61	8/23/2025
0.1.1	55	8/23/2025
0.0.1	106	8/17/2025