Ara3D.Parakeet 1.2.0

There is a newer version of this package available.
See the version list below for details.
dotnet add package Ara3D.Parakeet --version 1.2.0                
NuGet\Install-Package Ara3D.Parakeet -Version 1.2.0                
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="Ara3D.Parakeet" Version="1.2.0" />                
For projects that support PackageReference, copy this XML node into the project file to reference the package.
paket add Ara3D.Parakeet --version 1.2.0                
#r "nuget: Ara3D.Parakeet, 1.2.0"                
#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.
// Install Ara3D.Parakeet as a Cake Addin
#addin nuget:?package=Ara3D.Parakeet&version=1.2.0

// Install Ara3D.Parakeet as a Cake Tool
#tool nuget:?package=Ara3D.Parakeet&version=1.2.0                

Parakeet

Parakeet is a simple parsing library written in C#. Parakeet is the parsing library being used by the Plato programming language project to Parse both Plato and C# source code.

Parakeet1

Image by Hugo Wai

Overview

Parakeet is a recursive descent (RD) parsing library based on the parsing expression grammar (PEG) formalization introduced by Bryan Ford. Parakeet parsers are defined directly in C# using operator overloading. Parakeet combines both lexical analysis (aka tokenization) and syntactic analysis in a single pass.

For an example of how to define grammars in Parakeet see the C# Grammar.

More Details and Features

Parakeet was designed primarily for the challenge of parsing programming languages. It can be used in different contexts of course.

Parakeet supports:

  • Parsing error recovery
  • Run-time detection of stuck parsers
  • Line number and column number reporting
  • Immutable data structures
  • Operator overloading
  • Fluent API syntax (aka method chaining)
  • Automated creation of untyped parse trees
  • Code generation for converting untyped parse trees into a strongly typed concrete syntax tree (CST).

Steps

  1. Define a grammar in code: a class with a set of properties that map to rules
  2. Convert the input text into a parser input object
  3. Choose the starting rule of the grammar
  4. Call the match function of the starting rule
  5. Examine the resulting ParserState object
  6. If the result is null the parser failed to match, and failed to recover
    • Consider adding OnError to your grammar
  7. Convert

Primary Classes

  • ParserInput - Wraps a string with convenience functions to retrieve row/column, and potentially the source file name. Can be implicitly converted from a string
  • ParserState - Represents a position in the input and a pointer to the most recently created parse node.
  • ParserRange - Contains two ParserState objects, one representing the beginning of a range of input and the other the end.
  • ParserNode - A named node in parse node list.
  • ParserTree - A tree structure created from a linked list of ParserNode objects
  • ParserCache - Stores parser errors, and successful parse results to accelerated future lookups.
  • Rule - Base class of a parser, provides a match function that accepts a ParserState and a ParserCache.
  • Grammar - Base class of a collection of parsing rules, usually defined as properties.
  • CstNode - Base class of typed parse trees generated from Grammar objects.
  • CstClassBuilder - Static class with functions for generating AstNode classes and factory functions from ParserTree objects.
  • ParserError - An error created when a Sequence fails to match a child rule after an OnError rule.
  • ParserException - This represents an internal parser error which usually results from a grammar mistake.

Grammar

A Grammar is the base class for collections of parsing rules. Usually each parse rule in a grammar is defined as a computed property that either returns a NodeRule or a TokenRule. By using the functions Node and Token to wrap rule definitions, names are automatically assigned to each rule.

The GrammarExtensions.cs file contains a number of helper functions for outputting the definitions of grammars or to simplify parsing rules.

When defining rules it is important that any cyclical reference from a rule to itself uses at least one RecursiveRule in the relationship chain. This prevents stack overflow errors from occuring.

Rules

A Parakeet parser is defined by a class deriving from Rule. Some rules are defined by combining rules. Those combining rules are called "combinators".

Ever rule has a single function:

public ParserState Match(ParserState state, ParserCache cache)

The Match will return null if the Rule failed to match, or a ParserState object if successful.

Fluent Syntax for Rules

Rules can be combined using a fluent syntax (aka method chaining).

  • rule.At()new At(rule)
  • rule.NotAt()new NotAt(rule)
  • rule1.Then(rule2)new Sequence(rule1, rule2)
  • rule1.ThenNot(rule2)new Sequence(rule, rule2.NotAt())
  • rule.Optional()new Optional(rule)
  • rule1.Or(rule2)new Choice(rule1, rule2)
  • rule1.Except(rule2)new Sequence(rule2.NotAt(), rule1)
  • rule.ZeroOrMore()new ZeroOrMore(rule)
  • rule.OneOrMore()new Sequence(rule, rule.ZeroOrMoree)
  • char1.To(char2)new CharRangeRule(char1, char2)

Overloaded Operators for Rules

Rules can be combined using the following overloaded operators.

  • rule1 + rule2new SequenceRule(rule1, rule2)
  • rule1 | rule2new ChoiceRule(rule1, rule2)
  • !rulenew NotAt(rule)

Implicit Casts

The following implicit casts are defined for Rules:

  • Rule(string s)new StringMatchRule(s)
  • Rule(char c)new CharMatchRule(c)
  • Rule(char[] cs)new CharSetRule(cs)
  • Rule(string[] xs)new Choice(xs.Select(x => (Rule)x))

Primitive Rules

  • StringMatchRule - Matches a string
  • AnyCharRule - Matches any character, but fails at end of the file
  • CharRangeRule - Matches any character within a range
  • CharSetRule - Matches any character within a set
  • CharMatchRule - Matches a specific character

Rule Combinators

Rule combinators combine zero or more rules.

  • ZeroOrMore - Tries to match a child rule as many times as possible, returning the original ParserState if not successful at least once.
  • Optional - Tries to match a child rule exactly once, returning the original ParserState if the child rule fails.
  • Sequence - Matches a sequence of rules one by one, returning null if not successful.
  • Choice - Matches a collection of rules one by one, until one succeeeds, or null if not successful.
  • RecursiveRule - Matches a child rule defined by a lambda, thus allowing Rule definitions to have cycles.
  • TokenRule - A rule that just matches it child without creating a node. Used for defining grammars.
  • NodeRule - Creates a new ParserNode and adds it to the ParserState. May also eat whitespace (true by default).

Assertion Rules

Several rules never advance the parser state:

  • At - Returns the original parser state if the child rule succeeds, or null otherwise
  • NotAt - Returns the original parser state if the child rule fails, or null otherwise
  • EndOfInput - Returns the parser state if at the end of input, or null otherwise

Error Handling Rules

  • OnError - Used only within Sequence rules. Contains a child rule called the recovery rule. Will always return the ParserState when matched. If a sequence encounters an OnError node and one of the subsequent child rules fails, the parser will then try to match the OnError recovery rule to advance to a place where it is likely to be able to continue parsing successfully (e.g. the end of a statement).

Parse Trees

Parse trees generated from Parakeet are untyped. A parse node is created whenever a NodeRule rule successfully matches its child rule against the input. After parsing the list of parse nodes is converted into a tree structure.

Typed Parse Tree (CST)

A set of classes representing a strongly typed parse tree can be created automatically from a Parakeet grammar. This is called the Concrete Syntax Tree.

History

Parakeet evolved from the Jigsaw parser and applies lessons learned when writing the Myna parsing library in TypeScript as well as my first parsing library [YARD](https://www.codeproject.com/Articles/9121/Parsing-XML-in-C-using-the-YARD-Parser Parakeet is designed to be as fast as possible while retaining a clean and elegant grammar description.

C# and F# Parsing Libraries

Parser Generator Tools

References

FAQ

Q: Isn't a parse tree and concrete syntax tree (CST) the same thing?

Yes. Parakeet uses the term parse tree to refer to untyped parse tree, and CST to refer to the typed parse tree.

Q: What is the difference between a CST and an AST?

The CST is generated from parsing the input text as is. An AST is the result of transforming the CST into a form that has the same semantic meaning but is presumably simpler and easier.

Q: Why isn't Parakeet a code generator from the beginning

I find it easier to learn, understand, use, and debug libraries that aren't generated. Writing an extension to Parakeet that generates parser code would not be very hard.

Q: So why is the CST code generated?

Having a strongly typed CST is very beneficial when writing analysis and transformation tools especially for non-trivial grammars like those of programming languages. I don't know of any other way in C# to generate strongly typed libraries other than generating code.

Q: Isn't parsing library X faster?

Maybe. I designed Parakeet to be fast enough for my needs, but I prioritized making it robust and (hopefully) easy to use. See the related work section for other parsing libraries to consider.

Q: Can you provide some benchmarks? Or implement grammar X?

I'm kind of busy getting work done. If you are willing to fund this project, then we should talk: email me at cdiggins@gmail.com.

Product Compatible and additional computed target framework versions.
.NET net5.0 was computed.  net5.0-windows was computed.  net6.0 was computed.  net6.0-android was computed.  net6.0-ios was computed.  net6.0-maccatalyst was computed.  net6.0-macos was computed.  net6.0-tvos was computed.  net6.0-windows was computed.  net7.0 was computed.  net7.0-android was computed.  net7.0-ios was computed.  net7.0-maccatalyst was computed.  net7.0-macos was computed.  net7.0-tvos was computed.  net7.0-windows was computed.  net8.0 was computed.  net8.0-android was computed.  net8.0-browser was computed.  net8.0-ios was computed.  net8.0-maccatalyst was computed.  net8.0-macos was computed.  net8.0-tvos was computed.  net8.0-windows was computed. 
.NET Core netcoreapp2.0 was computed.  netcoreapp2.1 was computed.  netcoreapp2.2 was computed.  netcoreapp3.0 was computed.  netcoreapp3.1 was computed. 
.NET Standard netstandard2.0 is compatible.  netstandard2.1 was computed. 
.NET Framework net461 was computed.  net462 was computed.  net463 was computed.  net47 was computed.  net471 was computed.  net472 was computed.  net48 was computed.  net481 was computed. 
MonoAndroid monoandroid was computed. 
MonoMac monomac was computed. 
MonoTouch monotouch was computed. 
Tizen tizen40 was computed.  tizen60 was computed. 
Xamarin.iOS xamarinios was computed. 
Xamarin.Mac xamarinmac was computed. 
Xamarin.TVOS xamarintvos was computed. 
Xamarin.WatchOS xamarinwatchos was computed. 
Compatible target framework(s)
Included target framework(s) (in package)
Learn more about Target Frameworks and .NET Standard.

NuGet packages (2)

Showing the top 2 NuGet packages that depend on Ara3D.Parakeet:

Package Downloads
Ara3D.Parakeet.Grammars

A library of Parakeet grammars.

Ara3D.Parakeet.Cst

A library of strongly typed CST nodes and factories for select Parakeet grammars. Mostly auto-generated.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last updated
1.4.5 204 10/9/2024
1.4.4 238 8/25/2024
1.4.3 353 3/19/2024
1.4.1 217 3/12/2024
1.3.2 114 3/5/2024
1.3.0 126 3/3/2024
1.2.1 124 2/21/2024
1.2.0 124 2/21/2024