TinyTokenizer 0.1.0

There is a newer version of this package available.
See the version list below for details.
.NET CLI
dotnet add package TinyTokenizer --version 0.1.0

Package Manager
NuGet\Install-Package TinyTokenizer -Version 0.1.0
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

PackageReference
<PackageReference Include="TinyTokenizer" Version="0.1.0" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.

Central Package Management
Directory.Packages.props:
<PackageVersion Include="TinyTokenizer" Version="0.1.0" />
Project file:
<PackageReference Include="TinyTokenizer" />
For projects that support Central Package Management (CPM), copy the PackageVersion node into the solution Directory.Packages.props file to version the package, and the PackageReference node into the project file.

Paket CLI
paket add TinyTokenizer --version 0.1.0

Script & Interactive
#r "nuget: TinyTokenizer, 0.1.0"
The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

File-based Apps
#:package TinyTokenizer@0.1.0
The #:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.

Cake
#addin nuget:?package=TinyTokenizer&version=0.1.0
Install as a Cake Addin.
#tool nuget:?package=TinyTokenizer&version=0.1.0
Install as a Cake Tool.

TinyTokenizer

A high-performance, zero-allocation tokenizer library for .NET that parses text into abstract tokens using ReadOnlySpan<char> for maximum efficiency.

Features

  • Zero-allocation parsing — Uses ReadOnlySpan<char> internally for fast, allocation-free text traversal
  • Recursive declaration blocks — Automatically parses nested {}, [], and () blocks with child tokens
  • Configurable symbols — Define which characters are recognized as symbol tokens
  • Immutable tokens — All token types are immutable record classes
  • Error recovery — Gracefully handles malformed input with ErrorToken and continues parsing
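
To illustrate what allocation-free traversal with ReadOnlySpan<char> can look like, here is a minimal, self-contained sketch. It is not TinyTokenizer's implementation; CountWords is a hypothetical helper that only shows the general technique of advancing through text by slicing a span instead of creating substrings.

```csharp
using System;

Console.WriteLine(CountWords("func(a, b)  next"));

// Hypothetical helper (not part of TinyTokenizer): counts whitespace-separated
// words by repeatedly slicing the span. Slices are views over the original
// text, so no substrings are allocated while scanning.
static int CountWords(ReadOnlySpan<char> text)
{
    int count = 0;
    while (true)
    {
        text = text.TrimStart();            // skip leading whitespace
        if (text.IsEmpty)
        {
            return count;
        }

        int end = 0;
        while (end < text.Length && !char.IsWhiteSpace(text[end]))
        {
            end++;
        }

        count++;
        text = text[end..];                 // advance past the word just scanned
    }
}
```

A string converts implicitly to ReadOnlySpan<char>, so the call site needs no explicit conversion.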

Installation

Install the TinyTokenizer package from NuGet using one of the commands above. Alternatively, add a reference to the TinyTokenizer project or include the source files in your solution.

Quick Start

using TinyTokenizer;

// Create a tokenizer with source text
var tokenizer = new Tokenizer("func(a, b)".AsMemory());
var tokens = tokenizer.Tokenize();

// tokens contains:
// - TextToken("func")
// - BlockToken("(a, b)") with children:
//   - TextToken("a")
//   - SymbolToken(",")
//   - WhitespaceToken(" ")
//   - TextToken("b")
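
The token tree from the Quick Start can be inspected with a small recursive walk. The sketch below uses only members documented in this README (Tokenize, Type, Content, Children); Dump is a hypothetical helper, not part of the library, and the exact text printed for each token depends on the library's actual behavior.

```csharp
using System;
using System.Collections.Immutable;
using TinyTokenizer;

// Tokenize the Quick Start input and print the resulting tree.
var tokenizer = new Tokenizer("func(a, b)".AsMemory());
ImmutableArray<Token> tokens = tokenizer.Tokenize();
Dump(tokens, depth: 0);

// Hypothetical helper (not part of TinyTokenizer): prints one token per line
// and indents the children of block tokens by two spaces per level.
static void Dump(ImmutableArray<Token> tokens, int depth)
{
    foreach (Token token in tokens)
    {
        string indent = new string(' ', depth * 2);
        Console.WriteLine($"{indent}{token.Type}: \"{token.Content}\"");

        if (token is BlockToken block)
        {
            Dump(block.Children, depth + 1);
        }
    }
}
```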

Token Types

| Type | Description | Example |
|------|-------------|---------|
| TextToken | Plain text content | hello, func, 123 |
| WhitespaceToken | Spaces, tabs, newlines | ' ', \t, \n |
| SymbolToken | Configurable symbol characters | /, :, ,, ; |
| BlockToken | Declaration blocks with delimiters | {...}, [...], (...) |
| ErrorToken | Parsing errors (unmatched delimiters) | } without opening { |
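
Because each token kind is its own record type, a switch expression with type patterns is one way to branch on them. This is a sketch that assumes the type names listed above are public and derive from Token; the input string and the description texts are made up for illustration.

```csharp
using System;
using TinyTokenizer;

var tokenizer = new Tokenizer("key: {value}".AsMemory());
var tokens = tokenizer.Tokenize();

foreach (Token token in tokens)
{
    // Pick a description based on the concrete token type.
    string description = token switch
    {
        BlockToken block => $"block with {block.Children.Length} child token(s)",
        ErrorToken error => $"error: {error.ErrorMessage}",
        SymbolToken => "symbol",
        WhitespaceToken => "whitespace",
        TextToken => "text",
        _ => "unknown token kind",
    };

    Console.WriteLine($"{token.Type}: {description}");
}
```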

BlockToken Properties

var tokenizer = new Tokenizer("{inner content}".AsMemory());
var tokens = tokenizer.Tokenize();
var block = (BlockToken)tokens[0];

block.FullContent;      // "{inner content}" (includes delimiters)
block.InnerContent;     // "inner content" (excludes delimiters)
block.Children;         // ImmutableArray<Token> of parsed inner tokens
block.OpeningDelimiter; // '{'
block.ClosingDelimiter; // '}'
block.Type;             // TokenType.BraceBlock

Configuration

Customize the tokenizer with TokenizerOptions:

// Default symbols: / : , ; = + - * < > ! & | . @ # ? % ^ ~ \
var options = TokenizerOptions.Default;

// Add custom symbols
options = TokenizerOptions.Default.WithAdditionalSymbols('$', '_');

// Remove symbols (they become part of text tokens)
options = TokenizerOptions.Default.WithoutSymbols('/');

// Replace entire symbol set
options = TokenizerOptions.Default.WithSymbols(':', ',', ';');

// Use with tokenizer (source is any string you want to tokenize)
var tokenizer = new Tokenizer(source.AsMemory(), options);
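
To see the effect of an option, the same input can be tokenized twice and the results compared. The sketch below assumes, as the comment on WithoutSymbols says, that a removed symbol becomes part of the surrounding text token; the input "a/b" is only an illustration.

```csharp
using System;
using TinyTokenizer;

const string source = "a/b";

// Default options: '/' is in the default symbol set listed above.
var defaultTokens = new Tokenizer(source.AsMemory()).Tokenize();

// Same input with '/' removed from the symbol set.
var options = TokenizerOptions.Default.WithoutSymbols('/');
var customTokens = new Tokenizer(source.AsMemory(), options).Tokenize();

Console.WriteLine($"default: {defaultTokens.Length} token(s)");
Console.WriteLine($"without '/': {customTokens.Length} token(s)");
```

If the behavior matches the description, the second run should yield fewer tokens, because '/' no longer splits the text.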

Nested Blocks

Declaration blocks are parsed recursively:

var tokenizer = new Tokenizer("{outer [inner (deepest)]}".AsMemory());
var tokens = tokenizer.Tokenize();

var braceBlock = (BlockToken)tokens[0];           // {outer [inner (deepest)]}
var bracketBlock = (BlockToken)braceBlock.Children[2];  // [inner (deepest)]
var parenBlock = (BlockToken)bracketBlock.Children[2];  // (deepest)
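
Indexing into Children works when the shape of the input is known in advance. For arbitrary input, a recursive walk is more robust. The sketch below computes the deepest block nesting level; MaxDepth is a hypothetical helper, not part of the library.

```csharp
using System;
using System.Collections.Immutable;
using TinyTokenizer;

var tokenizer = new Tokenizer("{outer [inner (deepest)]}".AsMemory());
var tokens = tokenizer.Tokenize();
Console.WriteLine($"Maximum nesting depth: {MaxDepth(tokens)}");

// Hypothetical helper (not part of TinyTokenizer): returns 0 when there are
// no blocks, otherwise 1 plus the deepest nesting found inside any block.
static int MaxDepth(ImmutableArray<Token> tokens)
{
    int depth = 0;
    foreach (Token token in tokens)
    {
        if (token is BlockToken block)
        {
            depth = Math.Max(depth, 1 + MaxDepth(block.Children));
        }
    }
    return depth;
}
```

For the input above, the structure described in this section implies three levels: braces, brackets, and parentheses.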

Error Handling

The tokenizer produces ErrorToken for malformed input and continues parsing:

var tokenizer = new Tokenizer("}hello{".AsMemory());
var tokens = tokenizer.Tokenize();

// tokens contains:
// - ErrorToken("}", "Unexpected closing delimiter '}'", position: 0)
// - TextToken("hello")
// - ErrorToken("{", "Unclosed block starting with '{'", position: 6)

// Check for errors
if (tokens.HasErrors())
{
    foreach (var error in tokens.GetErrors())
    {
        Console.WriteLine($"Error at {error.Position}: {error.ErrorMessage}");
    }
}

Utility Extensions

Extensions on ImmutableArray<Token> for common operations:

// Check if any errors exist (including nested)
bool hasErrors = tokens.HasErrors();

// Get all errors (including nested)
IEnumerable<ErrorToken> errors = tokens.GetErrors();

// Get all tokens of a specific type (including nested)
IEnumerable<TextToken> textTokens = tokens.OfTokenType<TextToken>();
IEnumerable<BlockToken> blocks = tokens.OfTokenType<BlockToken>();
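
Since these extensions return IEnumerable<T>, they compose with LINQ. The following sketch collects the text of every TextToken and counts errors; it assumes Content.ToString() yields the token's text, which is how ReadOnlyMemory<char> behaves, and the input string is made up to include one unmatched delimiter.

```csharp
using System;
using System.Linq;
using TinyTokenizer;

var tokenizer = new Tokenizer("sum(a, b) }".AsMemory());
var tokens = tokenizer.Tokenize();

// Collect the text of every TextToken, including those nested inside blocks.
string[] words = tokens
    .OfTokenType<TextToken>()
    .Select(token => token.Content.ToString())
    .ToArray();
Console.WriteLine(string.Join(", ", words));

// Count errors, including nested ones.
Console.WriteLine($"{tokens.GetErrors().Count()} error(s) found");
```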

API Reference

Tokenizer (ref struct)

// Constructor
public Tokenizer(ReadOnlyMemory<char> source, TokenizerOptions? options = null)

// Tokenize the source
public ImmutableArray<Token> Tokenize()

Token (abstract record)

public abstract record Token(ReadOnlyMemory<char> Content, TokenType Type)
{
    public ReadOnlySpan<char> ContentSpan { get; }
}

TokenType (enum)

public enum TokenType
{
    BraceBlock,       // { }
    BracketBlock,     // [ ]
    ParenthesisBlock, // ( )
    Symbol,           // configurable characters
    Text,             // plain text
    Whitespace,       // spaces, tabs, newlines
    Error             // parsing errors
}

Requirements

  • .NET 8.0 or later

License

MIT

Target frameworks

  • Included in package: net8.0
  • Computed as compatible: net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows, net9.0, net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows

Dependencies

  • net8.0: No dependencies.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last Updated
0.10.0 149 1/7/2026 0.10.0 is deprecated.
0.9.0 123 1/5/2026
0.8.0 131 1/4/2026
0.7.0 127 1/3/2026
0.6.8 122 1/2/2026
0.6.7 123 1/2/2026
0.6.6 121 1/2/2026
0.6.5 124 1/1/2026
0.6.4 124 1/1/2026
0.6.3 120 1/1/2026
0.6.2 126 1/1/2026
0.6.1 119 12/31/2025
0.6.0 125 12/31/2025
0.5.1 122 12/31/2025
0.5.0 126 12/30/2025
0.4.1 116 12/29/2025
0.4.0 113 12/29/2025
0.3.0 123 12/27/2025
0.2.0 189 12/26/2025
0.1.0 200 12/25/2025