TinyTokenizer 0.5.0
TinyTokenizer
A high-performance, zero-allocation tokenizer library for .NET 8+ with SIMD-optimized character matching.
Features
- High performance: Zero-allocation parsing with SIMD-optimized SearchValues<char>
- TinyAst: Red-green syntax tree with fluent queries, editing, and undo/redo
- Schema system: Unified configuration for tokenization + semantic node definitions
- Semantic nodes: Pattern-based AST matching (function calls, property access, etc.)
- TreeWalker: DOM-style filtered tree traversal
- Async streaming: Tokenize from Stream and PipeReader with IAsyncEnumerable
- Full tokenization: Blocks, strings, numbers, comments, operators, tagged identifiers
- Error recovery: Gracefully handles malformed input and continues parsing
Installation
dotnet add package TinyTokenizer
Or add a reference to the TinyTokenizer project in your solution.
Quick Start
using TinyTokenizer;
// Simple tokenization
var tokens = "func(a, b)".TokenizeToTokens();
// tokens contains:
// - IdentToken("func")
// - BlockToken("(a, b)") with children:
//   - IdentToken("a")
//   - SymbolToken(",")
//   - WhitespaceToken(" ")
//   - IdentToken("b")
// With options
var options = TokenizerOptions.Default
    .WithCommentStyles(CommentStyle.CStyleSingleLine, CommentStyle.CStyleMultiLine);
var tokensWithComments = "x = 42; // comment".TokenizeToTokens(options);
TinyAst — Syntax Tree API
TinyTokenizer includes a syntax tree for efficient AST manipulation with fluent queries and undo/redo support.
Quick Start
using TinyTokenizer.Ast;
// Parse source into a syntax tree
var tree = SyntaxTree.Parse("function foo() { return 1; }");
// Query nodes
var idents = tree.Leaves.Where(l => l.Kind == NodeKind.Ident);
// Fluent mutations with undo support
tree.CreateEditor()
    .Replace(Query.Ident("foo"), "bar") // Concise named query
    .Insert(Query.BraceBlock.First().InnerStart(), "console.log('enter');")
    .Commit();
// Undo/redo
tree.Undo();
tree.Redo();
Querying with NodeQuery
The Query static class provides a fluent CSS-like selector API:
using TinyTokenizer.Ast;
// Named queries (with specific text) - NEW concise API!
Query.Ident("main") // Identifier with text "main"
Query.Symbol(".") // Dot symbol
Query.Operator("=>") // Arrow operator
Query.Numeric("42") // Number literal "42"
// Any-kind queries (match any of that kind)
Query.AnyIdent // All identifiers
Query.AnyNumeric // All numbers
Query.AnyString // All strings
Query.AnyOperator // All operators
Query.AnySymbol // All symbols
Query.AnyTaggedIdent // All tagged identifiers
// Blocks
Query.BraceBlock // All { } blocks
Query.BracketBlock // All [ ] blocks
Query.ParenBlock // All ( ) blocks
Query.AnyBlock // Any block type
// Filters on any-kind queries
Query.AnyIdent.WithText("foo") // Exact match (same as Query.Ident("foo"))
Query.AnyIdent.WithTextContaining("test") // Contains
Query.AnyIdent.WithTextStartingWith("_") // Prefix
Query.AnyIdent.Where(n => n.Width > 5) // Custom predicate
// Pseudo-selectors
Query.AnyIdent.First() // First match only
Query.AnyIdent.Last() // Last match only
Query.AnyIdent.Nth(2) // Third match (0-indexed)
// Composition
Query.AnyIdent | Query.AnyNumeric // Union (OR)
Query.AnyIdent & Query.Leaf // Intersection (AND)
// Sequence combinators
Query.Sequence(Query.AnyIdent, Query.ParenBlock) // Match ident then paren block
Query.AnyIdent.Then(Query.ParenBlock) // Fluent chaining
// Repetition combinators
Query.AnyIdent.Optional() // Match 0 or 1
Query.AnyIdent.ZeroOrMore() // Match 0+
Query.AnyIdent.OneOrMore() // Match 1+
Query.AnyIdent.Exactly(3) // Match exactly 3
Query.AnyIdent.Repeat(2, 5) // Match 2 to 5
Query.Any.Until(Query.Newline) // Repeat until terminator (not consumed)
// Lookahead assertions
Query.AnyIdent.FollowedBy(Query.ParenBlock) // Positive lookahead
Query.AnyIdent.NotFollowedBy(Query.ParenBlock) // Negative lookahead
Insertion Positions
Queries resolve to insertion points with position modifiers:
// Relative to matched node
Query.AnyIdent.First().Before() // Insert before first ident
Query.AnyIdent.First().After() // Insert after first ident
// Inside blocks
Query.BraceBlock.First().InnerStart() // After opening {
Query.BraceBlock.First().InnerEnd() // Before closing }
Named Node Queries (INamedNode)
Syntax nodes that implement INamedNode can be queried by name:
// Find functions by name
Query.Syntax<GlslFunctionSyntax>().Named("main")
Query.Syntax<GlslDirectiveSyntax>().Named("version")
// Use with insertion positions
tree.CreateEditor()
    .Insert(Query.Syntax<MyFunctionSyntax>().Named("foo").Before(), "// comment\n")
    .Commit();
Block Container Queries (IBlockContainerNode)
Syntax nodes that implement IBlockContainerNode expose named blocks for injection:
// Insert into a named block of a syntax node
var mainQuery = Query.Syntax<GlslFunctionSyntax>().Named("main");
tree.CreateEditor()
    .Insert(mainQuery.InnerStart("body"), "\n // entry") // Start of body block
    .Insert(mainQuery.InnerEnd("body"), "\n // exit") // End of body block
    .Insert(mainQuery.InnerStart("params"), "int x") // Start of params block
    .Commit();
SyntaxEditor
The SyntaxEditor provides batched mutations with atomic commit:
var tree = SyntaxTree.Parse("a + b");
var editor = tree.CreateEditor();
// Queue multiple edits
editor
    .Replace(Query.Ident("a"), "x")
    .Replace(Query.Ident("b"), "y")
    .Insert(Query.AnyOperator.First().Before(), "(")
    .Insert(Query.AnyOperator.First().After(), ")");
// Apply atomically (supports undo)
editor.Commit();
// Or discard the queued edits instead of committing
// editor.Rollback();
Common SyntaxEditor Patterns
Insert before a function/block:
var tree = SyntaxTree.Parse("function {body}");
tree.CreateEditor()
    .Insert(Query.BraceBlock.First().Before(), "/* decorator */ ")
    .Commit();
// Result: "function /* decorator */ {body}"
Insert at the start of a function body (after opening brace):
var tree = SyntaxTree.Parse("function {existing}");
tree.CreateEditor()
    .Insert(Query.BraceBlock.First().InnerStart(), "console.log('enter'); ")
    .Commit();
// Result: "function {console.log('enter'); existing}"
Insert at the end of a function body (before closing brace):
var tree = SyntaxTree.Parse("function {existing}");
tree.CreateEditor()
    .Insert(Query.BraceBlock.First().InnerEnd(), " return result;")
    .Commit();
// Result: "function {existing return result;}"
Insert after a function/block:
var tree = SyntaxTree.Parse("function {body}");
tree.CreateEditor()
    .Insert(Query.BraceBlock.First().After(), " // end of function")
    .Commit();
// Result: "function {body} // end of function"
Multiple insertions in one commit:
var tree = SyntaxTree.Parse("fn {body}");
tree.CreateEditor()
    .Insert(Query.BraceBlock.First().Before(), "/* before */ ")
    .Insert(Query.BraceBlock.First().InnerStart(), "start(); ")
    .Insert(Query.BraceBlock.First().InnerEnd(), " end();")
    .Insert(Query.BraceBlock.First().After(), " /* after */")
    .Commit();
// Result: "fn /* before */ {start(); body end();} /* after */"
Replace multiple occurrences:
var tree = SyntaxTree.Parse("a + b + a");
tree.CreateEditor()
    .Replace(Query.Ident("a"), "x") // Replaces ALL 'a' with 'x'
    .Commit();
// Result: "x + b + x"
Remove nodes:
var tree = SyntaxTree.Parse("keep remove keep");
tree.CreateEditor()
    .Remove(Query.Ident("remove"))
    .Commit();
// Result: "keep keep"
The editor supports Insert, Remove, and Replace operations. All changes can be undone with tree.Undo() and redone with tree.Redo().
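The batch-then-commit model with undo and redo can be pictured with a small self-contained sketch. It is not TinyTokenizer's implementation: the Document class, its Commit signature, and the use of plain strings in place of a syntax tree are all assumptions made for illustration. It only shows why recording one history entry per commit lets a single Undo() revert every edit in that batch.

```csharp
using System;
using System.Collections.Generic;

var doc = new Document("a + b");

// Queue two edits and commit them as a single undoable step.
doc.Commit(new Func<string, string>[]
{
    text => text.Replace("a", "x"),
    text => text.Replace("b", "y"),
});
Console.WriteLine(doc.Text); // x + y

doc.Undo();
Console.WriteLine(doc.Text); // a + b

doc.Redo();
Console.WriteLine(doc.Text); // x + y

// Hypothetical stand-in for a tree with history; not a TinyTokenizer type.
sealed class Document
{
    private readonly Stack<string> _undo = new();
    private readonly Stack<string> _redo = new();

    public Document(string text) => Text = text;

    public string Text { get; private set; }

    // Applies every edit in order, then records one history entry for the batch.
    public void Commit(IEnumerable<Func<string, string>> edits)
    {
        string next = Text;
        foreach (var edit in edits)
            next = edit(next);

        _undo.Push(Text);
        _redo.Clear(); // a new commit invalidates the redo history
        Text = next;
    }

    public bool Undo() => Move(_undo, _redo);

    public bool Redo() => Move(_redo, _undo);

    // Restores the latest snapshot from one stack and saves the current text on the other.
    private bool Move(Stack<string> from, Stack<string> to)
    {
        if (from.Count == 0) return false;
        to.Push(Text);
        Text = from.Pop();
        return true;
    }
}
```

Keeping whole snapshots is the simplest way to get this behavior; an immutable tree can make such snapshots cheap because unchanged subtrees are shared between versions.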
Schema — Unified Configuration
The Schema class provides unified configuration for both tokenization and semantic node definitions.
Quick Start
using TinyTokenizer.Ast;
// Create a schema with tokenization settings and semantic definitions
var schema = Schema.Create()
    .WithOperators(CommonOperators.JavaScript)
    .WithCommentStyles(CommentStyle.CStyleSingleLine, CommentStyle.CStyleMultiLine)
    .Define(BuiltInDefinitions.FunctionName)
    .Define(BuiltInDefinitions.ArrayAccess)
    .Define(BuiltInDefinitions.PropertyAccess)
    .Define(BuiltInDefinitions.MethodCall)
    .Build();
// Parse with schema
var tree = SyntaxTree.Parse("obj.method(x)", schema);
// Match semantic nodes using the attached schema
var methods = tree.Match<MethodCallNode>().ToList();
Built-in Schema
// Schema.Default includes:
// - CommonOperators.Universal
// - C-style single and multi-line comments
// - FunctionName, ArrayAccess, PropertyAccess, MethodCall definitions
var tree = SyntaxTree.Parse(source, Schema.Default);
Converting from TokenizerOptions
// Create schema from existing TokenizerOptions
var options = TokenizerOptions.Default
    .WithOperators(CommonOperators.CFamily)
    .WithCommentStyles(CommentStyle.CStyleSingleLine);
var schema = Schema.FromOptions(options);
TreeWalker — DOM-Style Traversal
The TreeWalker provides filtered tree traversal similar to the W3C DOM TreeWalker specification.
Basic Usage
var tree = SyntaxTree.Parse("foo { bar(x) }");
// Create walker from tree or node
var walker = tree.CreateTreeWalker();
// or: var walker = new TreeWalker(tree.Root);
// Enumerate all descendants
foreach (var node in walker.DescendantsAndSelf())
{
    Console.WriteLine($"{node.Kind} at {node.Position}");
}
Filtered Traversal
// Filter by node type using NodeFilter flags
var leafWalker = new TreeWalker(tree.Root, NodeFilter.Leaves);
foreach (var leaf in leafWalker.DescendantsAndSelf())
{
    // Only leaf nodes (identifiers, operators, etc.)
}
var blockWalker = new TreeWalker(tree.Root, NodeFilter.Blocks);
foreach (var block in blockWalker.DescendantsAndSelf())
{
    // Only block nodes ({ }, [ ], ( ))
}
Custom Filter Functions
// Use FilterResult for fine-grained control
var walker = new TreeWalker(
    tree.Root,
    NodeFilter.All,
    node => node.Kind == NodeKind.Ident
        ? FilterResult.Accept // Include this node
        : FilterResult.Skip); // Skip node, but check children
var idents = walker.DescendantsAndSelf().ToList();
The walker also provides cursor-based navigation (NextNode, ParentNode, FirstChild, etc.) and enumeration methods (Descendants, Ancestors, FollowingSiblings).
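The difference between accepting and skipping a node is easiest to see on a toy tree. The sketch below is self-contained and does not use TinyTokenizer: the Node class, the two-value FilterResult enum, and the Walk helper are hypothetical stand-ins, named after the README's terms only for readability. It shows that a skipped node is left out of the results while its children are still visited.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy tree for "foo { bar }": a root holding an identifier and a brace block.
var root = new Node("Root", "",
    new Node("Ident", "foo"),
    new Node("BraceBlock", "{}",
        new Node("Ident", "bar")));

// Accept identifiers; skip everything else but keep descending into children.
Func<Node, FilterResult> identsOnly = n =>
    n.Kind == "Ident" ? FilterResult.Accept : FilterResult.Skip;

var found = Walk(root, identsOnly).Select(n => n.Text).ToList();
Console.WriteLine(string.Join(", ", found)); // foo, bar

// Depth-first walk: yield accepted nodes, always visit the children of skipped ones.
static IEnumerable<Node> Walk(Node node, Func<Node, FilterResult> filter)
{
    if (filter(node) == FilterResult.Accept)
        yield return node;

    foreach (var child in node.Children)
        foreach (var match in Walk(child, filter))
            yield return match;
}

// Hypothetical stand-ins for illustration; not TinyTokenizer types.
enum FilterResult { Accept, Skip }

sealed class Node
{
    public Node(string kind, string text, params Node[] children)
    {
        Kind = kind;
        Text = text;
        Children = children;
    }

    public string Kind { get; }
    public string Text { get; }
    public IReadOnlyList<Node> Children { get; }
}
```

Here "bar" is found even though its parent block is skipped, which is the behavior the "Skip node, but check children" comment describes.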
Semantic Nodes — AST Pattern Matching
Semantic nodes provide a way to match structural patterns in the AST and create typed wrapper objects.
Quick Start
using TinyTokenizer.Ast;
// Parse with schema
var tree = SyntaxTree.Parse("foo(x) + bar.baz", Schema.Default);
// Find all function names (identifiers followed by parentheses)
var funcNames = tree.Match<FunctionNameNode>().ToList();
// funcNames[0].Name == "foo"
// Find all property accesses
var props = tree.Match<PropertyAccessNode>().ToList();
// props[0].Object == "bar", props[0].Property == "baz"
Built-in Semantic Nodes
| Type | Query Pattern | Example |
|---|---|---|
| FunctionNameNode | Query.AnyIdent.FollowedBy(Query.ParenBlock) | foo in foo(x) |
| ArrayAccessNode | Query.Sequence(Query.AnyIdent, Query.BracketBlock) | arr[0] |
| PropertyAccessNode | Query.Sequence(Query.AnyIdent, Query.AnySymbol, Query.AnyIdent) | obj.prop |
| MethodCallNode | Query.Sequence(Query.AnyIdent, Query.AnySymbol, Query.AnyIdent, Query.ParenBlock) | obj.method(x) |
Custom Semantic Nodes
Define your own semantic node types:
// 1. Define the node class
public sealed class LambdaNode : SemanticNode
{
    public LambdaNode(NodeMatch match, NodeKind kind) : base(match, kind) { }
    public RedBlock Parameters => Part<RedBlock>(0);
    public RedBlock Body => Part<RedBlock>(2);
}
// 2. Create a definition with pattern using Query combinators
var lambdaDef = Semantic.Define<LambdaNode>("Lambda")
    .Match(Query.Sequence(Query.ParenBlock, Query.Operator("=>"), Query.BraceBlock))
    .Create((match, kind) => new LambdaNode(match, kind))
    .WithPriority(15)
    .Build();
// 3. Add to schema
var schema = Schema.Create()
    .WithOperators(CommonOperators.JavaScript)
    .Define(lambdaDef)
    .Build();
// 4. Match
var tree = SyntaxTree.Parse("(x) => { return x; }", schema);
var lambdas = tree.Match<LambdaNode>().ToList();
Query Combinators Reference
| Combinator | Description | Example |
|---|---|---|
| Query.Ident("x") | Specific identifier | Query.Ident("main") |
| Query.Symbol(".") | Specific symbol | Query.Symbol(".") |
| Query.Operator("=>") | Specific operator | Query.Operator("=>") |
| Query.Numeric("42") | Specific number | Query.Numeric("3.14") |
| Query.AnyIdent | Any identifier | Query.AnyIdent |
| Query.AnySymbol | Any symbol | Query.AnySymbol |
| Query.AnyOperator | Any operator | Query.AnyOperator |
| Query.AnyNumeric | Any number literal | Query.AnyNumeric |
| Query.AnyString | Any string literal | Query.AnyString |
| Query.AnyTaggedIdent | Any tagged identifier | Query.AnyTaggedIdent |
| Query.ParenBlock | ( ) block | Query.ParenBlock |
| Query.BraceBlock | { } block | Query.BraceBlock |
| Query.BracketBlock | [ ] block | Query.BracketBlock |
| Query.Any | Any single node | Query.Any |
| Query.Newline | Node preceded by newline | Query.Newline |
| Query.Sequence(...) | Match A then B then C | Query.Sequence(Query.AnyIdent, Query.ParenBlock) |
| a \| b | Match A or B (union) | Query.AnyIdent \| Query.AnyNumeric |
| .Optional() | Match zero or one | Query.AnyOperator.Optional() |
| .ZeroOrMore() | Match zero or more | Query.AnyIdent.ZeroOrMore() |
| .OneOrMore() | Match one or more | Query.AnyIdent.OneOrMore() |
| .Exactly(n) | Match exactly n | Query.AnyIdent.Exactly(3) |
| .Repeat(min, max) | Match min to max | Query.AnyIdent.Repeat(2, 5) |
| .Until(terminator) | Repeat until terminator | Query.Any.Until(Query.Newline) |
| .FollowedBy(q) | Positive lookahead | Query.AnyIdent.FollowedBy(Query.ParenBlock) |
| .NotFollowedBy(q) | Negative lookahead | Query.AnyIdent.NotFollowedBy(Query.ParenBlock) |
| .Then(q) | Fluent sequence | Query.AnyIdent.Then(Query.ParenBlock) |
Async Tokenization
// From Stream
await using var stream = File.OpenRead("source.txt");
var tokens = await stream.TokenizeAsync();
// Streaming with IAsyncEnumerable
await foreach (var token in stream.TokenizeStreamingAsync())
{
    Console.WriteLine(token);
}
Also supports PipeReader and custom encoding options.
Error Handling
The tokenizer produces ErrorToken for malformed input and continues parsing:
var tree = SyntaxTree.Parse("}hello{");
// Query for error nodes
var errors = tree.Root.Children
    .Where(n => n.Kind == NodeKind.Error)
    .Cast<RedLeaf>();
foreach (var error in errors)
{
    Console.WriteLine($"Error at {error.Position}: {error.Text}");
}
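The recover-and-continue idea can be sketched without the library. The scanner below is a hypothetical toy, not TinyTokenizer's tokenizer: the Scan function, its token kind names, and the choice to handle only { } pairs are assumptions for illustration. It reports an unmatched closer or a never-closed opener as an Error token and keeps scanning instead of throwing.

```csharp
using System;
using System.Collections.Generic;

foreach (var (kind, text) in Scan("}hello{"))
    Console.WriteLine($"{kind}: {text}");
// Error: }
// Ident: hello
// Error: {

// Hypothetical toy scanner: bad braces become Error tokens and scanning continues.
static List<(string Kind, string Text)> Scan(string source)
{
    var tokens = new List<(string Kind, string Text)>();
    var openers = new Stack<int>(); // token indices of '{' still waiting for '}'
    int i = 0;
    while (i < source.Length)
    {
        char c = source[i];
        if (char.IsLetter(c))
        {
            int start = i;
            while (i < source.Length && char.IsLetter(source[i])) i++;
            tokens.Add(("Ident", source[start..i]));
            continue;
        }

        if (c == '{')
        {
            openers.Push(tokens.Count);
            tokens.Add(("Open", "{"));
        }
        else if (c == '}' && openers.Count > 0)
        {
            openers.Pop();
            tokens.Add(("Close", "}"));
        }
        else if (c == '}')
        {
            tokens.Add(("Error", "}")); // closer with nothing to close
        }
        else
        {
            tokens.Add(("Symbol", c.ToString()));
        }
        i++;
    }

    // Openers that never found a closer are errors too.
    foreach (int index in openers)
        tokens[index] = ("Error", "{");

    return tokens;
}
```

Because errors are ordinary tokens, the caller still receives the valid identifier between the two bad braces.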
Benchmarks
Performance comparison of the optimized SearchValues<char> implementation vs the baseline ImmutableHashSet<char>:
| Input Size | Baseline | Optimized | Speedup |
|---|---|---|---|
| Small (~50 chars) | 377 ns | 245 ns | 1.54x |
| Medium (~1KB) | 6,866 ns | 3,020 ns | 2.27x |
| Large (~100KB) | 1,907 μs | 781 μs | 2.44x |
| JSON (~10KB) | 130 μs | 87 μs | 1.51x |
| Whitespace-heavy | 9,808 ns | 3,661 ns | 2.68x |
Run benchmarks yourself:
dotnet run -c Release --project TinyTokenizer.Benchmarks -- --filter "*"
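To make the comparison concrete, here is a minimal sketch of the two lookup styles using only .NET 8 base-library APIs. It is not taken from TinyTokenizer's source: the delimiter set and the helper name IndexOfDelimiterBaseline are assumptions chosen for illustration. The baseline tests one character at a time against an ImmutableHashSet<char>, while the optimized path hands the whole span to IndexOfAny with a prebuilt SearchValues<char>, which the runtime can search with vectorized code where the hardware supports it.

```csharp
using System;
using System.Buffers;
using System.Collections.Immutable;

// Hypothetical delimiter set; the library's real character classes are not shown here.
const string Delimiters = " \t\r\n(){}[],;";

// Baseline: one hash lookup per character.
ImmutableHashSet<char> baseline = ImmutableHashSet.CreateRange(Delimiters);

// Optimized: build the search set once, then reuse it for every scan.
SearchValues<char> optimized = SearchValues.Create(Delimiters);

ReadOnlySpan<char> input = "identifier(arg)";
int slow = IndexOfDelimiterBaseline(input, baseline);
int fast = input.IndexOfAny(optimized);
Console.WriteLine($"{slow} {fast}"); // 10 10

// Scans left to right and returns the index of the first delimiter, or -1.
static int IndexOfDelimiterBaseline(ReadOnlySpan<char> text, ImmutableHashSet<char> set)
{
    for (int i = 0; i < text.Length; i++)
    {
        if (set.Contains(text[i])) return i;
    }
    return -1;
}
```

Both calls return the same index; only the cost per scanned character differs, which is why the gap in the table tends to widen on longer inputs.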
API Reference
Core Types
// Parse source into syntax tree (recommended)
var tree = SyntaxTree.Parse(source, Schema.Default);
// Low-level tokenization (if needed)
var tokens = source.TokenizeToTokens(options);
// Async streaming from files
await foreach (var token in stream.TokenizeStreamingAsync()) { }
Schema Configuration
// Built-in operator sets
CommonOperators.Universal // Basic: ==, !=, &&, ||, etc.
CommonOperators.CFamily // C/C++: ++, --, ->, ::, etc.
CommonOperators.JavaScript // JS: ===, =>, ?., ??, etc.
// Built-in comment styles
CommentStyle.CStyleSingleLine // //
CommentStyle.CStyleMultiLine // /* */
CommentStyle.HashSingleLine // #
// Create custom schema
var schema = Schema.Create()
    .WithOperators(CommonOperators.JavaScript)
    .WithCommentStyles(CommentStyle.CStyleSingleLine, CommentStyle.CStyleMultiLine)
    .WithTagPrefixes('#', '@')
    .Define(BuiltInDefinitions.FunctionName)
    .Build();
NodeKind Values
| Kind | Description | Example |
|---|---|---|
| Ident | Identifiers | foo, myVar |
| Whitespace | Spaces, tabs, newlines | (space), \n |
| Symbol | Single characters | ,, ;, : |
| Operator | Multi-char operators | ==, !=, => |
| Numeric | Numbers | 123, 3.14 |
| String | Quoted strings | "hello" |
| TaggedIdent | Prefixed identifiers | #define, @attr |
| BraceBlock | Curly braces | { } |
| BracketBlock | Square brackets | [ ] |
| ParenBlock | Parentheses | ( ) |
| Error | Parse errors | unmatched } |
Requirements
- .NET 8.0 or later
License
MIT
Dependencies (net8.0):
- Microsoft.Extensions.Logging.Abstractions (>= 10.0.1)
- System.IO.Pipelines (>= 8.0.0)
| Version | Downloads | Last Updated |
|---|---|---|
| 0.10.0 | 149 | 1/7/2026 |
| 0.9.0 | 123 | 1/5/2026 |
| 0.8.0 | 130 | 1/4/2026 |
| 0.7.0 | 127 | 1/3/2026 |
| 0.6.8 | 122 | 1/2/2026 |
| 0.6.7 | 123 | 1/2/2026 |
| 0.6.6 | 121 | 1/2/2026 |
| 0.6.5 | 124 | 1/1/2026 |
| 0.6.4 | 124 | 1/1/2026 |
| 0.6.3 | 120 | 1/1/2026 |
| 0.6.2 | 126 | 1/1/2026 |
| 0.6.1 | 119 | 12/31/2025 |
| 0.6.0 | 125 | 12/31/2025 |
| 0.5.1 | 122 | 12/31/2025 |
| 0.5.0 | 126 | 12/30/2025 |
| 0.4.1 | 116 | 12/29/2025 |
| 0.4.0 | 113 | 12/29/2025 |
| 0.3.0 | 123 | 12/27/2025 |
| 0.2.0 | 189 | 12/26/2025 |
| 0.1.0 | 200 | 12/25/2025 |