TinyTokenizer 0.6.5
A newer version of this package is available. See the version list below for details.
dotnet add package TinyTokenizer --version 0.6.5
NuGet\Install-Package TinyTokenizer -Version 0.6.5
<PackageReference Include="TinyTokenizer" Version="0.6.5" />
<PackageVersion Include="TinyTokenizer" Version="0.6.5" />
<PackageReference Include="TinyTokenizer" />
paket add TinyTokenizer --version 0.6.5
#r "nuget: TinyTokenizer, 0.6.5"
#:package TinyTokenizer@0.6.5
#addin nuget:?package=TinyTokenizer&version=0.6.5
#tool nuget:?package=TinyTokenizer&version=0.6.5
TinyTokenizer
A high-performance, zero-allocation tokenizer library for .NET 8+ with SIMD-optimized character matching.
Features
- High performance: zero-allocation parsing with SIMD-optimized SearchValues<char>
- TinyAst: red-green syntax tree with fluent queries, editing, and undo/redo
- Schema system: unified configuration for tokenization and syntax node definitions
- Syntax nodes: pattern-based AST matching (function calls, property access, etc.)
- TreeWalker: DOM-style filtered tree traversal
- Async streaming: tokenize from Stream and PipeReader with IAsyncEnumerable
- Full tokenization: blocks, strings, numbers, comments, operators, tagged identifiers
- Error recovery: gracefully handles malformed input and continues parsing
Installation
dotnet add package TinyTokenizer
Or add a reference to the TinyTokenizer project in your solution.
Quick Start
using TinyTokenizer;
// Simple tokenization
var tokens = "func(a, b)".TokenizeToTokens();
// tokens contains:
// - IdentToken("func")
// - BlockToken("(a, b)") with children:
// - IdentToken("a")
// - SymbolToken(",")
// - WhitespaceToken(" ")
// - IdentToken("b")
// With options
var options = TokenizerOptions.Default
.WithCommentStyles(CommentStyle.CStyleSingleLine, CommentStyle.CStyleMultiLine);
var commentTokens = "x = 42; // comment".TokenizeToTokens(options);
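To make the nested output above concrete, here is a minimal, self-contained sketch of how a tokenizer can fold bracketed text into block tokens with children. It is not TinyTokenizer's implementation: the (Kind, Text, Depth) tuple and the Scan helper are hypothetical, only identifiers, whitespace, single-character symbols and ( ) blocks are recognized, and instead of child lists each token records its nesting depth.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, not TinyTokenizer's code. Each token records its nesting
// depth; a block's children are the tokens that follow it at depth + 1.
List<(string Kind, string Text, int Depth)> Scan(string src)
{
    var tokens = new List<(string Kind, string Text, int Depth)>();
    var openBlocks = new Stack<(int TokenIndex, int Start)>();
    int i = 0;
    while (i < src.Length)
    {
        int start = i;
        char c = src[i];
        if (char.IsLetter(c) || c == '_')
        {
            while (i < src.Length && (char.IsLetterOrDigit(src[i]) || src[i] == '_')) i++;
            tokens.Add(("Ident", src[start..i], openBlocks.Count));
        }
        else if (char.IsWhiteSpace(c))
        {
            while (i < src.Length && char.IsWhiteSpace(src[i])) i++;
            tokens.Add(("Whitespace", src[start..i], openBlocks.Count));
        }
        else if (c == '(')
        {
            // Placeholder: the block's full text is known only at the matching ')'.
            tokens.Add(("Block", "", openBlocks.Count));
            openBlocks.Push((tokens.Count - 1, start));
            i++;
        }
        else if (c == ')' && openBlocks.Count > 0)
        {
            i++;
            var (index, blockStart) = openBlocks.Pop();
            tokens[index] = ("Block", src[blockStart..i], openBlocks.Count);
        }
        else
        {
            i++; // any other character (including a stray ')') is a one-char symbol
            tokens.Add(("Symbol", src[start..i], openBlocks.Count));
        }
    }
    // Unclosed blocks simply extend to the end of the input.
    while (openBlocks.Count > 0)
    {
        var (index, blockStart) = openBlocks.Pop();
        tokens[index] = ("Block", src[blockStart..], openBlocks.Count);
    }
    return tokens;
}

// Print with two spaces of indentation per nesting level.
foreach (var (kind, text, depth) in Scan("func(a, b)"))
    Console.WriteLine($"{new string(' ', depth * 2)}{kind}(\"{text}\")");
```

For "func(a, b)" this prints the same shape as the comment above: the identifier func, then the block "(a, b)", then a, ",", " " and b indented one level beneath it.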
TinyAst — Syntax Tree API
TinyTokenizer includes a syntax tree for efficient AST manipulation with fluent queries and undo/redo support.
Quick Start
using TinyTokenizer.Ast;
// Parse source into a syntax tree
var tree = SyntaxTree.Parse("function foo() { return 1; }");
// Query nodes
var idents = tree.Leaves.Where(l => l.Kind == NodeKind.Ident);
// Fluent mutations with undo support
tree.CreateEditor()
.Replace(Query.Ident("foo"), "bar") // Concise named query
.Insert(Query.BraceBlock.First().InnerStart(), "console.log('enter');")
.Commit();
// Undo/redo
tree.Undo();
tree.Redo();
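In the usual red-green design, green nodes are immutable, so an edit produces a new root while older roots stay valid; undo/redo can then be as simple as keeping references to earlier roots. The sketch below illustrates that idea with two stacks, using plain strings as stand-ins for immutable roots. The Commit/Undo/Redo helpers are hypothetical and say nothing about how TinyAst actually stores its history.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: an immutable "root" (a string here) plus two history stacks.
var undo = new Stack<string>();
var redo = new Stack<string>();
string root = "function foo() { return 1; }";

void Commit(string newRoot)
{
    undo.Push(root);   // remember the previous root
    redo.Clear();      // a new edit invalidates any redo history
    root = newRoot;
}

bool Undo()
{
    if (undo.Count == 0) return false;
    redo.Push(root);
    root = undo.Pop();
    return true;
}

bool Redo()
{
    if (redo.Count == 0) return false;
    undo.Push(root);
    root = redo.Pop();
    return true;
}

Commit(root.Replace("foo", "bar"));
Undo();
Console.WriteLine(root); // function foo() { return 1; }
Redo();
Console.WriteLine(root); // function bar() { return 1; }
```

Because nothing is mutated in place, each history entry is just a reference; no diff needs to be recorded or replayed.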
Querying with NodeQuery
The Query static class provides a fluent CSS-like selector API:
using TinyTokenizer.Ast;
// Named queries (match a specific text)
Query.Ident("main") // Identifier with text "main"
Query.Symbol(".") // Dot symbol
Query.Operator("=>") // Arrow operator
Query.Numeric("42") // Number literal "42"
// Any-kind queries (match any of that kind)
Query.AnyIdent // All identifiers
Query.AnyNumeric // All numbers
Query.AnyString // All strings
Query.AnyOperator // All operators
Query.AnySymbol // All symbols
Query.AnyTaggedIdent // All tagged identifiers
// Blocks
Query.BraceBlock // All { } blocks
Query.BracketBlock // All [ ] blocks
Query.ParenBlock // All ( ) blocks
Query.AnyBlock // Any block type
// Filters (use .WithText* when you need to filter an any-kind query)
Query.AnyIdent.WithText("foo") // Exact text; equivalent to Query.Ident("foo")
Query.AnyIdent.WithTextContaining("test") // Contains substring
Query.AnyIdent.WithTextStartingWith("_") // Starts with prefix
Query.AnyIdent.Where(n => n.Width > 5) // Custom predicate
// Pseudo-selectors
Query.AnyIdent.First() // First match only
Query.AnyIdent.Last() // Last match only
Query.AnyIdent.Nth(2) // Third match (0-indexed)
// Composition
Query.AnyIdent | Query.AnyNumeric // Union (OR)
Query.AnyIdent & Query.Leaf // Intersection (AND)
// Sequence combinators
Query.Sequence(Query.AnyIdent, Query.ParenBlock) // Match ident then paren block
Query.AnyIdent.Then(Query.ParenBlock) // Fluent chaining
// Repetition combinators
Query.AnyIdent.Optional() // Match 0 or 1
Query.AnyIdent.ZeroOrMore() // Match 0+
Query.AnyIdent.OneOrMore() // Match 1+
Query.AnyIdent.Exactly(3) // Match exactly 3
Query.AnyIdent.Repeat(2, 5) // Match 2 to 5
Query.Any.Until(Query.Newline) // Repeat until terminator (not consumed)
// Lookahead assertions
Query.AnyIdent.FollowedBy(Query.ParenBlock) // Positive lookahead
Query.AnyIdent.NotFollowedBy(Query.ParenBlock) // Negative lookahead
// Exact node reference (when you have a specific RedNode)
Query.Exact(myRedNode) // Match this specific node instance
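The sequence, repetition and lookahead combinators follow the familiar parser-combinator model: a query tries to match at a position and reports how many nodes it consumed, and a lookahead checks a match without consuming it. Here is a minimal sketch of that model over a flat array of token kind names. The Matcher delegate shape (position in, consumed count or -1 out) and all helper names are hypothetical; TinyTokenizer's real Query types work on tree nodes and are richer.

```csharp
using System;
using Matcher = System.Func<string[], int, int>;

// Hypothetical model: a matcher takes (tokens, position) and returns how many
// tokens it consumed, or -1 if it does not match. Tokens are just kind names.
Matcher Kind(string kind) => (t, p) => p < t.Length && t[p] == kind ? 1 : -1;

// A then B then C: every part must match, each starting where the last ended.
Matcher Sequence(params Matcher[] parts) => (t, p) =>
{
    int consumed = 0;
    foreach (var part in parts)
    {
        int n = part(t, p + consumed);
        if (n < 0) return -1;
        consumed += n;
    }
    return consumed;
};

// Greedy repetition: match as many times as possible (up to max), then check min.
Matcher Repeat(Matcher m, int min, int max) => (t, p) =>
{
    int count = 0, consumed = 0;
    while (count < max)
    {
        int n = m(t, p + consumed);
        if (n <= 0) break; // stop on failure (or a zero-width match, to avoid looping)
        consumed += n;
        count++;
    }
    return count >= min ? consumed : -1;
};

// Lookahead: 'next' is tested right after 'm' but is never counted as consumed.
Matcher FollowedBy(Matcher m, Matcher next) => (t, p) =>
{
    int n = m(t, p);
    return n >= 0 && next(t, p + n) >= 0 ? n : -1;
};
Matcher NotFollowedBy(Matcher m, Matcher next) => (t, p) =>
{
    int n = m(t, p);
    return n >= 0 && next(t, p + n) < 0 ? n : -1;
};

string[] tokens = { "Ident", "ParenBlock", "Ident", "Ident" };
var call = FollowedBy(Kind("Ident"), Kind("ParenBlock"));
Console.WriteLine(call(tokens, 0));                                             // 1
Console.WriteLine(Sequence(Kind("Ident"), Kind("ParenBlock"))(tokens, 0));      // 2
Console.WriteLine(NotFollowedBy(Kind("Ident"), Kind("ParenBlock"))(tokens, 2)); // 1
Console.WriteLine(Repeat(Kind("Ident"), 1, int.MaxValue)(tokens, 2));           // 2
```

In this model Optional, ZeroOrMore, OneOrMore and Exactly(n) can all be expressed as Repeat with (0, 1), (0, int.MaxValue), (1, int.MaxValue) and (n, n). Note how the lookahead version matches one token while the plain sequence matches two.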
Insertion Positions
Queries resolve to insertion points with position modifiers:
// Relative to matched node
Query.AnyIdent.First().Before() // Insert before first ident
Query.AnyIdent.First().After() // Insert after first ident
// Inside blocks
Query.BraceBlock.First().InnerStart() // After opening {
Query.BraceBlock.First().InnerEnd() // Before closing }
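Each position modifier ultimately resolves to a character offset. Assuming a matched node covers the half-open range [start, end) and a block's delimiters are its first and last characters, the mapping is straightforward. The sketch below shows it with a hypothetical Resolve helper and plain string insertion; it ignores trivia, which the real editor may treat differently.

```csharp
using System;

// Hypothetical mapping from position modifiers to character offsets for a node
// spanning [start, end). For blocks, the delimiters sit at start and end - 1.
int Resolve(string modifier, int start, int end) => modifier switch
{
    "Before" => start,
    "After" => end,
    "InnerStart" => start + 1, // just after the opening delimiter
    "InnerEnd" => end - 1,     // just before the closing delimiter
    _ => throw new ArgumentException($"Unknown modifier: {modifier}")
};

string src = "f {x}";
int open = src.IndexOf('{');       // 2
int close = src.IndexOf('}') + 1;  // 5, so the block spans [2, 5)
Console.WriteLine(src.Insert(Resolve("InnerStart", open, close), "a;"));  // f {a;x}
Console.WriteLine(src.Insert(Resolve("After", open, close), " // end")); // f {x} // end
```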
Named Node Queries (INamedNode)
Syntax nodes that implement INamedNode can be queried by name:
// Find functions by name
Query.Syntax<GlslFunctionSyntax>().Named("main")
Query.Syntax<GlslDirectiveSyntax>().Named("version")
// Use with insertion positions
tree.CreateEditor()
.Insert(Query.Syntax<MyFunctionSyntax>().Named("foo").Before(), "// comment\n")
.Commit();
Block Container Queries (IBlockContainerNode)
Syntax nodes that implement IBlockContainerNode expose named blocks for injection:
// Insert into a named block of a syntax node
var mainQuery = Query.Syntax<GlslFunctionSyntax>().Named("main");
tree.CreateEditor()
.Insert(mainQuery.InnerStart("body"), "\n // entry") // Start of body block
.Insert(mainQuery.InnerEnd("body"), "\n // exit") // End of body block
.Insert(mainQuery.InnerStart("params"), "int x") // Start of params block
.Commit();
SyntaxEditor
The SyntaxEditor provides batched mutations with atomic commit. You can work directly with RedNode references or use query-based selection.
Working with RedNode references (preferred for known nodes):
var tree = SyntaxTree.Parse("a + b");
var idents = tree.Select(Query.AnyIdent).ToList();
tree.CreateEditor()
.Replace(idents[0], "x") // Replace specific node
.Replace(idents[1], "y") // Replace another node
.InsertBefore(idents[0], "(") // Insert before node
.InsertAfter(idents[1], ")") // Insert after node
.Commit();
// Result: "(x + y)"
Working with queries (for pattern matching):
var tree = SyntaxTree.Parse("a + b");
tree.CreateEditor()
.Replace(Query.Ident("a"), "x") // Replace by query
.Replace(Query.Ident("b"), "y")
.Insert(Query.AnyOperator.First().Before(), "(")
.Insert(Query.AnyOperator.First().After(), ")")
.Commit();
// Result: "(x + y)"
Batch operations on multiple nodes:
var tree = SyntaxTree.Parse("a b c");
var idents = tree.Select(Query.AnyIdent);
tree.CreateEditor()
.Replace(idents, "X") // Replace all at once
.Commit();
// Result: "X X X"
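One common way to implement a batch like this is to record every edit against offsets in the original text and apply them all at commit time, from the highest offset down, so that no edit shifts the offsets of edits still to come. The sketch below shows that technique on plain strings; the (Start, Length, Text) edit shape and the ApplyBatch helper are hypothetical, and TinyTokenizer may apply edits to the tree quite differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical sketch: each edit replaces Length characters at Start, where Start
// is an offset into the ORIGINAL text; Length 0 is a pure insertion.
string ApplyBatch(string source, IEnumerable<(int Start, int Length, string Text)> edits)
{
    // Right-to-left order: highest Start first; at equal Start, replacements before
    // insertions; equal-offset insertions in reverse queue order so they end up in
    // the order they were queued.
    var ordered = edits
        .Select((edit, index) => (edit, index))
        .OrderByDescending(x => x.edit.Start)
        .ThenByDescending(x => x.edit.Length)
        .ThenByDescending(x => x.index);

    // Work on a copy: 'source' is never modified, so a failing edit
    // (e.g. an out-of-range offset) leaves nothing half-applied.
    var sb = new StringBuilder(source);
    foreach (var (edit, _) in ordered)
        sb.Remove(edit.Start, edit.Length).Insert(edit.Start, edit.Text);
    return sb.ToString();
}

var edits = new List<(int Start, int Length, string Text)>
{
    (0, 1, "x"), // Replace "a"
    (4, 1, "y"), // Replace "b"
    (0, 0, "("), // InsertBefore "a"
    (5, 0, ")"), // InsertAfter "b"
};
Console.WriteLine(ApplyBatch("a + b", edits)); // (x + y)
```

This mirrors the RedNode example earlier in this section. The sketch does not detect overlapping replacements; a real editor would need to reject or merge them.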
Common SyntaxEditor Patterns
Insert around a node (using RedNode directly):
var tree = SyntaxTree.Parse("function {body}");
var block = tree.Select(Query.BraceBlock.First()).Single();
tree.CreateEditor()
.InsertBefore(block, "/* decorator */ ")
.InsertAfter(block, " // end")
.Commit();
// Result: "function /* decorator */ {body} // end"
Insert inside blocks (using queries for position):
var tree = SyntaxTree.Parse("function {existing}");
tree.CreateEditor()
.Insert(Query.BraceBlock.First().InnerStart(), "console.log('enter'); ")
.Insert(Query.BraceBlock.First().InnerEnd(), " return result;")
.Commit();
// Result: "function {console.log('enter'); existing return result;}"
Replace with transformation:
var tree = SyntaxTree.Parse("hello world");
var idents = tree.Select(Query.AnyIdent);
tree.CreateEditor()
.Replace(idents, n => ((RedLeaf)n).Text.ToUpper())
.Commit();
// Result: "HELLO WORLD"
Edit content (preserves trivia automatically):
The Edit methods transform node content while automatically preserving surrounding whitespace and comments. Unlike Replace, the transformer receives only the content string (without trivia):
var tree = SyntaxTree.Parse(" hello world ");
// Edit: transformer receives "hello" and "world" (no trivia)
tree.CreateEditor()
.Edit(Query.AnyIdent, content => content.ToUpper())
.Commit();
// Result: " HELLO WORLD " (whitespace preserved)
// Works with any transformation
tree.CreateEditor()
.Edit(Query.AnyNumeric, content => (int.Parse(content) * 2).ToString())
.Commit();
// Query-based or node-based
var node = tree.Select(Query.Ident("HELLO")).Single(); // after the first edit the tree reads " HELLO WORLD "
tree.CreateEditor()
.Edit(node, content => $"[{content}]")
.Commit();
Edit vs Replace:
- Replace(query, node => ...): the transformer receives the full RedNode, so you handle trivia yourself.
- Edit(query, content => ...): the transformer receives only the content string, and trivia is preserved automatically.
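The difference comes down to who handles the trivia. A minimal sketch of an Edit-style helper, assuming trivia is just the leading and trailing whitespace around a node's text (real trivia can also include comments): split off the trivia, transform only the content, and reattach. Both helper names are hypothetical.

```csharp
using System;

// Hypothetical Edit-style helper: leading/trailing whitespace stands in for trivia.
string EditPreservingTrivia(string nodeText, Func<string, string> transform)
{
    int lead = nodeText.Length - nodeText.TrimStart().Length;
    if (lead == nodeText.Length) return nodeText; // whitespace only: nothing to edit
    int trail = nodeText.Length - nodeText.TrimEnd().Length;
    string content = nodeText[lead..^trail];      // what the transformer sees
    return nodeText[..lead] + transform(content) + nodeText[^trail..];
}

// Replace-style: the transformer gets the whole text and must handle trivia itself.
string ReplaceWholeNode(string nodeText, Func<string, string> transform) => transform(nodeText);

string leaf = " hello ";
string edited = EditPreservingTrivia(leaf, s => "<" + s + ">");
string replaced = ReplaceWholeNode(leaf, s => "<" + s + ">");
Console.WriteLine($"[{edited}]");   // [ <hello> ]
Console.WriteLine($"[{replaced}]"); // [< hello >]
```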
Replace with another node:
var tree = SyntaxTree.Parse("old");
var oldNode = tree.Select(Query.Ident("old")).Single();
// From another tree
var sourceTree = SyntaxTree.Parse("new");
var newNode = sourceTree.Select(Query.Ident("new")).Single();
tree.CreateEditor()
.Replace(oldNode, newNode) // Copy node from another tree
.Commit();
// Result: "new"
Remove specific nodes:
var tree = SyntaxTree.Parse("keep remove keep");
var toRemove = tree.Select(Query.Ident("remove")).Single();
tree.CreateEditor()
.Remove(toRemove) // Remove specific node
.Commit();
// Result: "keep keep"
Batch operations with multiple queries:
var tree = SyntaxTree.Parse("foo bar baz");
var queries = new[] { Query.Ident("foo"), Query.Ident("baz") };
tree.CreateEditor()
.Replace(queries, "X") // Replace all matches
.Commit();
// Result: "X bar X"
Use Query.Exact for precise targeting:
var tree = SyntaxTree.Parse("a b c");
var bNode = tree.Select(Query.Ident("b")).Single();
// Use Query.Exact when you need a query but have a specific node
tree.CreateEditor()
.Replace(Query.Exact(bNode), "X") // Same as .Replace(bNode, "X")
.Commit();
// Result: "a X c"
The editor supports Insert, InsertBefore, InsertAfter, Remove, Replace, and Edit operations. All changes can be undone with tree.Undo() and redone with tree.Redo().
Schema — Unified Configuration
The Schema class provides unified configuration for both tokenization and syntax node definitions.
Quick Start
using TinyTokenizer.Ast;
// Create a schema with tokenization settings and syntax definitions
var schema = Schema.Create()
.WithOperators(CommonOperators.JavaScript)
.WithCommentStyles(CommentStyle.CStyleSingleLine, CommentStyle.CStyleMultiLine)
.Define(BuiltInDefinitions.FunctionName)
.Define(BuiltInDefinitions.ArrayAccess)
.Define(BuiltInDefinitions.PropertyAccess)
.Define(BuiltInDefinitions.MethodCall)
.Build();
// Parse with schema
var tree = SyntaxTree.Parse("obj.method(x)", schema);
// Match syntax nodes using the attached schema
var methods = tree.Match<MethodCallSyntax>().ToList();
Built-in Schema
// Schema.Default includes:
// - CommonOperators.Universal
// - C-style single and multi-line comments
// - FunctionName, ArrayAccess, PropertyAccess, MethodCall definitions
var tree = SyntaxTree.Parse(source, Schema.Default);
Converting from TokenizerOptions
// Create schema from existing TokenizerOptions
var options = TokenizerOptions.Default
.WithOperators(CommonOperators.CFamily)
.WithCommentStyles(CommentStyle.CStyleSingleLine);
var schema = Schema.FromOptions(options);
TreeWalker — DOM-Style Traversal
The TreeWalker provides filtered tree traversal similar to the W3C DOM TreeWalker specification.
Basic Usage
var tree = SyntaxTree.Parse("foo { bar(x) }");
// Create walker from tree or node
var walker = tree.CreateTreeWalker();
// or: var walker = new TreeWalker(tree.Root);
// Enumerate all descendants
foreach (var node in walker.DescendantsAndSelf())
{
Console.WriteLine($"{node.Kind} at {node.Position}");
}
Filtered Traversal
// Filter by node type using NodeFilter flags
var leafWalker = new TreeWalker(tree.Root, NodeFilter.Leaves);
foreach (var leaf in leafWalker.DescendantsAndSelf())
{
// Only leaf nodes (identifiers, operators, etc.)
}
var blockWalker = new TreeWalker(tree.Root, NodeFilter.Blocks);
foreach (var block in blockWalker.DescendantsAndSelf())
{
// Only block nodes ({ }, [ ], ( ))
}
Custom Filter Functions
// Use FilterResult for fine-grained control
var walker = new TreeWalker(
tree.Root,
NodeFilter.All,
node => node.Kind == NodeKind.Ident
? FilterResult.Accept // Include this node
: FilterResult.Skip); // Skip node, but check children
var idents = walker.DescendantsAndSelf().ToList();
The walker also provides cursor-based navigation (NextNode, ParentNode, FirstChild, etc.) and enumeration methods (Descendants, Ancestors, FollowingSiblings).
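In the W3C DOM TreeWalker this API is modeled on, a filter can accept a node (yield it), skip it (hide it but still visit its children), or reject it (hide it and its whole subtree). This section shows Accept and Skip; whether TinyTokenizer also offers a reject result is not stated here, so the sketch below models all three with hypothetical string results over a tiny hand-built tree for "foo { bar(x) }".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical tree for "foo { bar(x) }": each key maps a node to its children.
var children = new Dictionary<string, string[]>
{
    ["<root>"] = new[] { "foo", "{ bar(x) }" },
    ["{ bar(x) }"] = new[] { "bar", "(x)" },
    ["(x)"] = new[] { "x" },
};

// Pre-order walk with W3C-style results: "Accept" yields the node, "Skip" hides it
// but still visits its children, "Reject" hides the node and its whole subtree.
IEnumerable<string> Walk(string node, Func<string, string> filter)
{
    string result = filter(node);
    if (result == "Reject") yield break;
    if (result == "Accept") yield return node;
    var kids = children.TryGetValue(node, out var found) ? found : Array.Empty<string>();
    foreach (var child in kids)
        foreach (var descendant in Walk(child, filter))
            yield return descendant;
}

string IdentsOnly(string n) => char.IsLetter(n[0]) ? "Accept" : "Skip";
string NoParens(string n) => n.StartsWith('(') ? "Reject" : IdentsOnly(n);

Console.WriteLine(string.Join(", ", Walk("<root>", IdentsOnly))); // foo, bar, x
Console.WriteLine(string.Join(", ", Walk("<root>", NoParens)));   // foo, bar
```

Skipping the blocks still reaches the identifiers inside them, while rejecting the paren block also hides x.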
Syntax Nodes — AST Pattern Matching
Syntax nodes provide a way to match structural patterns in the AST and create typed wrapper objects.
Quick Start
using TinyTokenizer.Ast;
// Parse with schema
var tree = SyntaxTree.Parse("foo(x) + bar.baz", Schema.Default);
// Find all function calls (identifiers followed by parentheses)
var funcCalls = tree.Match<FunctionCallSyntax>().ToList();
// funcCalls[0].Name == "foo"
// Find all property accesses
var props = tree.Match<PropertyAccessSyntax>().ToList();
// props[0].Object == "bar", props[0].Property == "baz"
Built-in Syntax Nodes
| Type | Query Pattern | Example |
|---|---|---|
| FunctionCallSyntax | Query.AnyIdent.FollowedBy(Query.ParenBlock) | foo(x) |
| ArrayAccessSyntax | Query.Sequence(Query.AnyIdent, Query.BracketBlock) | arr[0] |
| PropertyAccessSyntax | Query.Sequence(Query.AnyIdent, Query.AnySymbol, Query.AnyIdent) | obj.prop |
| MethodCallSyntax | Query.Sequence(Query.AnyIdent, Query.AnySymbol, Query.AnyIdent, Query.ParenBlock) | obj.method(x) |
Custom Syntax Nodes
Define your own syntax node types:
// 1. Define the node class
public sealed class LambdaSyntax : SyntaxNode
{
// Constructor receives an opaque CreationContext - just pass it to base
protected LambdaSyntax(CreationContext context)
: base(context) { }
// Access child nodes by index (determined by the pattern)
public RedBlock Parameters => GetTypedChild<RedBlock>(0);
public RedLeaf Arrow => GetTypedChild<RedLeaf>(1);
public RedBlock Body => GetTypedChild<RedBlock>(2);
}
// 2. Create a definition with pattern using Query combinators
var lambdaDef = Syntax.Define<LambdaSyntax>("Lambda")
.Match(Query.Sequence(Query.ParenBlock, Query.Operator("=>"), Query.BraceBlock))
.WithPriority(15)
.Build();
// 3. Add to schema
var schema = Schema.Create()
.WithOperators(CommonOperators.JavaScript)
.DefineSyntax(lambdaDef)
.Build();
// 4. Match
var tree = SyntaxTree.Parse("(x) => { return x; }", schema);
var lambdas = tree.Match<LambdaSyntax>().ToList();
INamedNode and IBlockContainerNode
Implement these interfaces for enhanced querying capabilities:
// A function syntax node with named access and block containers
public sealed class FunctionSyntax : SyntaxNode, INamedNode, IBlockContainerNode
{
protected FunctionSyntax(CreationContext context)
: base(context) { }
// INamedNode - enables Query.Syntax<T>().Named("foo")
public string Name => GetTypedChild<RedLeaf>(1).Text;
// IBlockContainerNode - enables .InnerStart("body"), .InnerEnd("params")
public IReadOnlyList<string> BlockNames => ["body", "params"];
public RedBlock GetBlock(string? name = null) => name switch
{
null or "body" => GetTypedChild<RedBlock>(3), // { }
"params" => GetTypedChild<RedBlock>(2), // ( )
_ => throw new ArgumentException($"Unknown block: {name}")
};
}
// Usage with fluent queries
var mainQuery = Query.Syntax<FunctionSyntax>().Named("main");
tree.CreateEditor()
.Insert(mainQuery.Before(), "// Entry point\n")
.Insert(mainQuery.InnerStart("body"), "\n console.log('enter');")
.Insert(mainQuery.InnerEnd("body"), "\n console.log('exit');")
.Commit();
Query Combinators Reference
| Combinator | Description | Example |
|---|---|---|
| Query.Ident("x") | Specific identifier | Query.Ident("main") |
| Query.Symbol(".") | Specific symbol | Query.Symbol(".") |
| Query.Operator("=>") | Specific operator | Query.Operator("=>") |
| Query.Numeric("42") | Specific number | Query.Numeric("3.14") |
| Query.AnyIdent | Any identifier | Query.AnyIdent |
| Query.AnySymbol | Any symbol | Query.AnySymbol |
| Query.AnyOperator | Any operator | Query.AnyOperator |
| Query.AnyNumeric | Any number literal | Query.AnyNumeric |
| Query.AnyString | Any string literal | Query.AnyString |
| Query.AnyTaggedIdent | Any tagged identifier | Query.AnyTaggedIdent |
| Query.ParenBlock | ( ) block | Query.ParenBlock |
| Query.BraceBlock | { } block | Query.BraceBlock |
| Query.BracketBlock | [ ] block | Query.BracketBlock |
| Query.Any | Any single node | Query.Any |
| Query.Newline | Node preceded by newline | Query.Newline |
| Query.Sequence(...) | Match A then B then C | Query.Sequence(Query.AnyIdent, Query.ParenBlock) |
| a \| b | Match A or B (union) | Query.AnyIdent \| Query.AnyNumeric |
| .Optional() | Match zero or one | Query.AnyOperator.Optional() |
| .ZeroOrMore() | Match zero or more | Query.AnyIdent.ZeroOrMore() |
| .OneOrMore() | Match one or more | Query.AnyIdent.OneOrMore() |
| .Exactly(n) | Match exactly n | Query.AnyIdent.Exactly(3) |
| .Repeat(min, max) | Match min to max | Query.AnyIdent.Repeat(2, 5) |
| .Until(terminator) | Repeat until terminator | Query.Any.Until(Query.Newline) |
| .FollowedBy(q) | Positive lookahead | Query.AnyIdent.FollowedBy(Query.ParenBlock) |
| .NotFollowedBy(q) | Negative lookahead | Query.AnyIdent.NotFollowedBy(Query.ParenBlock) |
| .Then(q) | Fluent sequence | Query.AnyIdent.Then(Query.ParenBlock) |
| Query.Exact(node) | Exact node reference | Query.Exact(myRedNode) |
Async Tokenization
// From Stream
await using var stream = File.OpenRead("source.txt");
var tokens = await stream.TokenizeAsync();
// Streaming with IAsyncEnumerable (rewind first: TokenizeAsync has already read the stream to its end)
stream.Position = 0;
await foreach (var token in stream.TokenizeStreamingAsync())
{
Console.WriteLine(token);
}
Also supports PipeReader and custom encoding options.
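Streaming tokenization maps naturally onto IAsyncEnumerable: read the stream in chunks, yield each token as soon as it is complete, and carry any partial token over to the next chunk. The sketch below shows that pattern with a hypothetical WordsAsync helper that only splits on whitespace, reading through a real StreamReader; TinyTokenizer's own streaming API handles far more token kinds and does its own buffering.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical streaming tokenizer: yields whitespace-separated words as soon as
// each one is complete. A word split across two reads is carried in 'pending'.
async IAsyncEnumerable<string> WordsAsync(Stream stream, int chunkSize = 4)
{
    using var reader = new StreamReader(stream, Encoding.UTF8,
        detectEncodingFromByteOrderMarks: true, bufferSize: 1024, leaveOpen: true);
    var chunk = new char[chunkSize];
    var pending = new StringBuilder();
    int read;
    while ((read = await reader.ReadAsync(chunk, 0, chunk.Length)) > 0)
    {
        for (int i = 0; i < read; i++)
        {
            if (!char.IsWhiteSpace(chunk[i]))
            {
                pending.Append(chunk[i]);
            }
            else if (pending.Length > 0)
            {
                yield return pending.ToString(); // word finished by whitespace
                pending.Clear();
            }
        }
    }
    if (pending.Length > 0)
        yield return pending.ToString(); // final word at end of stream
}

using var input = new MemoryStream(Encoding.UTF8.GetBytes("alpha beta  gamma"));
await foreach (var word in WordsAsync(input))
    Console.WriteLine(word);
```

The tiny default chunk size is deliberate: it forces words such as "alpha" to straddle two reads, which exercises the carry-over path.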
Error Handling
The tokenizer produces an ErrorToken for malformed input and continues parsing; in a syntax tree these appear as nodes of kind NodeKind.Error:
var tree = SyntaxTree.Parse("}hello{");
// Query for error nodes
var errors = tree.Root.Children
.Where(n => n.Kind == NodeKind.Error)
.Cast<RedLeaf>();
foreach (var error in errors)
{
Console.WriteLine($"Error at {error.Position}: {error.Text}");
}
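A common recovery strategy for unbalanced delimiters is a stack of open brackets: a closer with no matching opener is reported on the spot, any openers still on the stack at end of input are reported as unclosed, and scanning simply continues in between. The sketch below illustrates that strategy on its own; it is not TinyTokenizer's recovery code, and the (Position, Message) result shape is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Report unbalanced (), [], {} without stopping at the first problem.
List<(int Position, string Message)> FindBracketErrors(string src)
{
    var errors = new List<(int Position, string Message)>();
    var open = new Stack<(char Ch, int Position)>();
    for (int i = 0; i < src.Length; i++)
    {
        char c = src[i];
        if (c is '(' or '[' or '{')
        {
            open.Push((c, i));
        }
        else if (c is ')' or ']' or '}')
        {
            char expected = c == ')' ? '(' : c == ']' ? '[' : '{';
            if (open.Count > 0 && open.Peek().Ch == expected)
                open.Pop();
            else
                errors.Add((i, $"unmatched '{c}'")); // record it and keep scanning
        }
    }
    // Anything still open at end of input was never closed.
    foreach (var (ch, pos) in open)
        errors.Add((pos, $"unclosed '{ch}'"));
    errors.Sort((a, b) => a.Position.CompareTo(b.Position));
    return errors;
}

foreach (var (pos, msg) in FindBracketErrors("}hello{"))
    Console.WriteLine($"Error at {pos}: {msg}");
// Error at 0: unmatched '}'
// Error at 6: unclosed '{'
```

In this simple policy a mismatched closer (for example the ']' in "(]") is reported as unmatched and the opener stays on the stack.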
Benchmarks
Performance comparison of the optimized SearchValues<char> implementation vs the baseline ImmutableHashSet<char>:
| Input Size | Baseline | Optimized | Speedup |
|---|---|---|---|
| Small (~50 chars) | 377 ns | 245 ns | 1.54x |
| Medium (~1KB) | 6,866 ns | 3,020 ns | 2.27x |
| Large (~100KB) | 1,907 μs | 781 μs | 2.44x |
| JSON (~10KB) | 130 μs | 87 μs | 1.51x |
| Whitespace-heavy | 9,808 ns | 3,661 ns | 2.68x |
Run benchmarks yourself:
dotnet run -c Release --project TinyTokenizer.Benchmarks -- --filter "*"
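The speedup comes from replacing one set lookup per character with SearchValues<char> (System.Buffers, new in .NET 8), whose span search methods such as IndexOfAnyExcept can use vectorized (SIMD) code paths. The sketch below contrasts the two approaches on a simple task, finding the length of a leading run of identifier characters; the character set and helper names are illustrative assumptions, not TinyTokenizer's actual tables.

```csharp
using System;
using System.Buffers;
using System.Collections.Immutable;

const string IdentChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_";

// Baseline: one hash-set lookup per character.
var identSet = ImmutableHashSet.CreateRange(IdentChars);
int IdentLengthBaseline(ReadOnlySpan<char> text)
{
    int i = 0;
    while (i < text.Length && identSet.Contains(text[i])) i++;
    return i;
}

// Optimized: SearchValues precomputes a lookup structure once, and
// IndexOfAnyExcept finds the first character NOT in the set in one call.
SearchValues<char> identValues = SearchValues.Create(IdentChars);
int IdentLengthOptimized(ReadOnlySpan<char> text)
{
    int end = text.IndexOfAnyExcept(identValues);
    return end < 0 ? text.Length : end; // -1 means every character matched
}

Console.WriteLine(IdentLengthBaseline("my_var42 = 1"));  // 8
Console.WriteLine(IdentLengthOptimized("my_var42 = 1")); // 8
```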
API Reference
Core Types
// Parse source into syntax tree (recommended)
var tree = SyntaxTree.Parse(source, Schema.Default);
// Low-level tokenization (if needed)
var tokens = source.TokenizeToTokens(options);
// Async streaming from files
await foreach (var token in stream.TokenizeStreamingAsync()) { }
Schema Configuration
// Built-in operator sets
CommonOperators.Universal // Basic: ==, !=, &&, ||, etc.
CommonOperators.CFamily // C/C++: ++, --, ->, ::, etc.
CommonOperators.JavaScript // JS: ===, =>, ?., ??, etc.
// Built-in comment styles
CommentStyle.CStyleSingleLine // //
CommentStyle.CStyleMultiLine // /* */
CommentStyle.HashSingleLine // #
// Create custom schema
var schema = Schema.Create()
.WithOperators(CommonOperators.JavaScript)
.WithCommentStyles(CommentStyle.CStyleSingleLine, CommentStyle.CStyleMultiLine)
.WithTagPrefixes('#', '@')
.Define(BuiltInDefinitions.FunctionName)
.Build();
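Operator sets contain overlapping entries (for example =, ==, === and =>), so at each position the tokenizer has to choose one. The usual rule is maximal munch: prefer the longest operator that matches. A minimal sketch, assuming a plain string array as the operator table; the real CommonOperators contents and types may differ, and SplitOperators is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical operator table (not the real CommonOperators contents).
string[] operators = { "=", "==", "===", "=>", "?", "?.", "??", "!", "!=", "!==" };
// Longest first, so the first candidate that matches is the longest one.
string[] longestFirst = operators.OrderByDescending(op => op.Length).ToArray();

// Maximal munch: at each position take the longest operator that matches;
// any other character becomes a one-character part.
List<string> SplitOperators(string text)
{
    var parts = new List<string>();
    int i = 0;
    while (i < text.Length)
    {
        string match = text[i].ToString();
        foreach (var op in longestFirst)
        {
            if (text.AsSpan(i).StartsWith(op.AsSpan(), StringComparison.Ordinal))
            {
                match = op;
                break;
            }
        }
        parts.Add(match);
        i += match.Length;
    }
    return parts;
}

Console.WriteLine(string.Join(" ", SplitOperators("a===b"))); // a === b
Console.WriteLine(string.Join(" ", SplitOperators("x??=y"))); // x ?? = y
```

Because "??=" is not in this table, the second example splits into "??" and "="; adding "??=" to the set would make it match as a single operator.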
NodeKind Values
| Kind | Description | Example |
|---|---|---|
| Ident | Identifiers | foo, myVar |
| Whitespace | Spaces, tabs, newlines | (space), \n |
| Symbol | Single characters | , ; : |
| Operator | Multi-char operators | ==, !=, => |
| Numeric | Numbers | 123, 3.14 |
| String | Quoted strings | "hello" |
| TaggedIdent | Prefixed identifiers | #define, @attr |
| BraceBlock | Curly braces | { } |
| BracketBlock | Square brackets | [ ] |
| ParenBlock | Parentheses | ( ) |
| Error | Parse errors | unmatched } |
Requirements
- .NET 8.0 or later
License
MIT
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
Dependencies (net8.0)
- System.IO.Pipelines (>= 8.0.0)
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated | |
|---|---|---|---|
| 0.10.0 | 149 | 1/7/2026 | |
| 0.9.0 | 123 | 1/5/2026 | |
| 0.8.0 | 130 | 1/4/2026 | |
| 0.7.0 | 127 | 1/3/2026 | |
| 0.6.8 | 122 | 1/2/2026 | |
| 0.6.7 | 123 | 1/2/2026 | |
| 0.6.6 | 121 | 1/2/2026 | |
| 0.6.5 | 124 | 1/1/2026 | |
| 0.6.4 | 124 | 1/1/2026 | |
| 0.6.3 | 120 | 1/1/2026 | |
| 0.6.2 | 126 | 1/1/2026 | |
| 0.6.1 | 119 | 12/31/2025 | |
| 0.6.0 | 125 | 12/31/2025 | |
| 0.5.1 | 122 | 12/31/2025 | |
| 0.5.0 | 126 | 12/30/2025 | |
| 0.4.1 | 116 | 12/29/2025 | |
| 0.4.0 | 113 | 12/29/2025 | |
| 0.3.0 | 123 | 12/27/2025 | |
| 0.2.0 | 189 | 12/26/2025 | |
| 0.1.0 | 200 | 12/25/2025 | |