TurboXml 2.0.0
dotnet add package TurboXml --version 2.0.0
NuGet\Install-Package TurboXml -Version 2.0.0
<PackageReference Include="TurboXml" Version="2.0.0" />
paket add TurboXml --version 2.0.0
#r "nuget: TurboXml, 2.0.0"
// Install TurboXml as a Cake Addin
#addin nuget:?package=TurboXml&version=2.0.0
// Install TurboXml as a Cake Tool
#tool nuget:?package=TurboXml&version=2.0.0
TurboXml
<img align="right" width="160px" src="https://raw.githubusercontent.com/xoofx/TurboXml/main/img/TurboXml.png">
TurboXml is a .NET library that provides a lightweight and fast SAX (Simple API for XML) parser driven by callbacks.
It is the equivalent of System.Xml.XmlReader, but faster and with no allocations. 🚀
✨ Features
- Should be slightly faster than System.Xml.XmlReader
- Zero-allocation XML parser
  - Callbacks receive ReadOnlySpan<char> for the parsed elements.
  - Parse from small to very large XML documents, without allocating!
- Optimized with SIMD
  - TurboXml uses SIMD to speed up parsing of large portions of XML documents.
- Provides the precise source location of the parsed XML elements (to report warnings/errors)
- Compatible with net8.0+
- NativeAOT ready
📃 User Guide
TurboXml belongs to the SAX family of parsers, so you need to implement the callbacks defined by IXmlReadHandler.
By default, this interface provides empty implementations of its methods, so you only need to override the ones you are interested in:
using System;
using TurboXml;

var xml = "<?xml version=\"1.0\"?><root enabled=\"true\">Hello World!</root>";
var handler = new MyXmlHandler();
XmlParser.Parse(xml, ref handler);
// Will print:
//
// BeginTag(1:23): root
// Attribute(1:28)-(1:36): enabled="true"
// Content(1:43): Hello World!
// EndTag(1:57): root

// Lines and columns are reported zero-based, hence the "+ 1" when printing.
struct MyXmlHandler : IXmlReadHandler
{
    public void OnBeginTag(ReadOnlySpan<char> name, int line, int column)
        => Console.WriteLine($"BeginTag({line + 1}:{column + 1}): {name}");

    public void OnEndTagEmpty()
        => Console.WriteLine("EndTagEmpty");

    public void OnEndTag(ReadOnlySpan<char> name, int line, int column)
        => Console.WriteLine($"EndTag({line + 1}:{column + 1}): {name}");

    public void OnAttribute(ReadOnlySpan<char> name, ReadOnlySpan<char> value, int nameLine, int nameColumn, int valueLine, int valueColumn)
        => Console.WriteLine($"Attribute({nameLine + 1}:{nameColumn + 1})-({valueLine + 1}:{valueColumn + 1}): {name}=\"{value}\"");

    public void OnText(ReadOnlySpan<char> text, int line, int column)
        => Console.WriteLine($"Content({line + 1}:{column + 1}): {text}");
}
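Because the handler is passed by ref, a struct handler can also accumulate state across callbacks instead of printing. The following is a minimal sketch, not part of the library: StatsHandler and its fields are hypothetical names, and it assumes that a self-closing tag such as <item/> raises OnBeginTag followed by OnEndTagEmpty (rather than OnEndTag), as the callback names suggest.

```csharp
using System;
using TurboXml;

var xml = "<root><item/><item><child>text</child></item></root>";
var stats = new StatsHandler();
XmlParser.Parse(xml, ref stats);
Console.WriteLine($"Elements: {stats.ElementCount}, max depth: {stats.MaxDepth}");

// Hypothetical handler: counts elements and tracks the deepest nesting level.
// Callbacks not declared here fall back to the interface's empty defaults.
struct StatsHandler : IXmlReadHandler
{
    public int ElementCount;
    public int MaxDepth;
    private int _depth;

    public void OnBeginTag(ReadOnlySpan<char> name, int line, int column)
    {
        ElementCount++;
        _depth++;
        if (_depth > MaxDepth) MaxDepth = _depth;
    }

    // Assumption: raised for self-closing tags such as <item/>.
    public void OnEndTagEmpty() => _depth--;

    public void OnEndTag(ReadOnlySpan<char> name, int line, int column) => _depth--;
}
```

The results are read from the same local variable after parsing, which only works because the parser mutates the handler in place through the ref parameter.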
📊 Benchmarks
The solution contains 2 benchmarks:
- BenchStream, which parses 240+ MSBuild XML files (targets and props) from the installed .NET 8 SDK (or latest SDK)
- BenchString, which parses Tiger.svg in memory from a string.
In general, the advantages of TurboXml over System.Xml.XmlReader are:
- It should be slightly faster, especially when tag names, attributes or even content are longer than 8 consecutive characters, because it then uses SIMD instructions.
- It makes almost zero allocations, apart from the internal buffers used to pass data as ReadOnlySpan<char> back to the XML handler.
Stream Results
BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3085/23H2/2023Update/SunValley3)
AMD Ryzen 9 7950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 8.0.101
[Host] : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
DefaultJob : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
---|---|---|---|---|---|---|
TurboXml - Stream | 3.993 ms | 0.0780 ms | 0.0729 ms | - | - | 13.18 KB |
System.Xml.XmlReader - Stream | 4.163 ms | 0.0386 ms | 0.0361 ms | 328.1250 | 46.8750 | 5415.45 KB |
String Results
Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
---|---|---|---|---|---|---|
TurboXml | 52.14 us | 1.040 us | 1.491 us | - | - | - |
System.Xml.XmlReader | 56.98 us | 0.393 us | 0.348 us | 2.9297 | 0.2441 | 49304 B |
🚨 XML Conformance and Known Limitations
This parser follows the Extensible Markup Language (XML) 1.0 (Fifth Edition) specification and should support any valid XML document, except for the known limitations described below:
- For simplicity of implementation, this parser does not support DTDs, custom entities, or XML directives (<!DOCTYPE ...>). If you need these, use System.Xml.XmlReader instead.
- This parser checks for well-formed XML, including matching begin and end tags, and reports an error if they do not match.
- This parser does not check for duplicate attributes.
  - It is the responsibility of the XML handler to implement such a check. The rationale is that the check can be performed more efficiently depending on the user scenario (e.g. bit flags, etc.)
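As an illustration of the bit-flag idea mentioned above, here is a minimal sketch of a handler that detects duplicate attributes without allocating. It is not part of the library: DuplicateCheckHandler, its fields, and the "id"/"name" attribute set are hypothetical, and the approach only works when the set of expected attribute names is known in advance (at most 32 with a uint).

```csharp
using System;
using TurboXml;

var xml = "<root id=\"1\" name=\"a\" id=\"2\" />";
var handler = new DuplicateCheckHandler();
XmlParser.Parse(xml, ref handler);
Console.WriteLine($"Duplicate attributes found: {handler.DuplicateCount}");

// Hypothetical handler: one bit per known attribute name of the current tag.
struct DuplicateCheckHandler : IXmlReadHandler
{
    public int DuplicateCount;
    private uint _seen;

    // Attributes are reported right after their begin tag, so reset the flags here.
    public void OnBeginTag(ReadOnlySpan<char> name, int line, int column)
        => _seen = 0;

    public void OnAttribute(ReadOnlySpan<char> name, ReadOnlySpan<char> value, int nameLine, int nameColumn, int valueLine, int valueColumn)
    {
        // Map each known attribute name to a bit; unknown names map to 0 and are not tracked.
        uint bit = name.SequenceEqual("id") ? 1u
                 : name.SequenceEqual("name") ? 2u
                 : 0u;

        if ((_seen & bit) != 0)
        {
            DuplicateCount++;
            Console.WriteLine($"Duplicate attribute '{name}' at ({nameLine + 1}:{nameColumn + 1})");
        }
        _seen |= bit;
    }
}
```

For documents with arbitrary attribute names, a handler would need a different structure (for example a reusable set of names), at the cost of some allocations.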
🏗️ Build
You need to install the .NET 8 SDK. Then from the root folder:
$ dotnet build src -c Release
🪪 License
This software is released under the BSD-2-Clause license.
🤗 Author
Alexandre Mutel aka xoofx.
Product | Versions (compatible and additional computed target framework versions) |
---|---|
.NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
- net8.0: No dependencies.
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories (1)
Showing the top 1 popular GitHub repositories that depend on TurboXml:
Repository | Stars |
---|---|
MichalStrehovsky/sizoscope (.NET tool to analyze size of Native AOT binaries.) | |