OpenLanguage 1.0.1
OpenLanguage is a C# library providing lexers, parsers, and other processing tools for Office Open XML (OOXML) DSLs.
Features
WordprocessingML
- Field Instructions: Parse field instructions into strongly-typed Abstract Syntax Trees (AST)
- Grammar-Based: Uses GPLEX/GPPG for robust parsing of field instructions, expressions, and merge fields
- Comprehensive Field Types: Support for all field instructions - both standard ECMA variations and those specified in the ISO docs for legacy compatibility
- AST Manipulation: Programmatically access and modify parsed structures
- Field Reconstruction: Convert ASTs back to valid field instruction strings with round-trip fidelity
SpreadsheetML
- Formula Parsing: Parse SpreadsheetML formulas into Abstract Syntax Trees (AST)
- Grammar-Based: Uses GPLEX lexer (.lex) and GPPG yacc parser (.y) for concise and efficient grammar specification and parsing logic
- AST Manipulation: Access and modify parsed formula structures programmatically
- Formula Reconstruction: Convert modified ASTs back to valid Excel formula strings
- Reference Support: Handle A1, R1C1, structured (table), and external references
Installation
Install via the .NET CLI:
dotnet add package OpenLanguage
Or via Package Manager Console:
Install-Package OpenLanguage
Quick Start
Formula Parsing
using OpenLanguage.SpreadsheetML.Formula;
// Parse an Excel formula
Ast.Node formula = FormulaParser.Parse("=SUM(A1:A10) * 2");
// Access the AST
Console.WriteLine($"Reconstructed: {formula.ToString()}");
// Try parsing with error handling
Ast.Node? maybeFormula = FormulaParser.TryParse("=INVALID_SYNTAX(");
if (maybeFormula == null)
{
    Console.WriteLine("Parse failed - invalid syntax");
}
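A minimal sketch of a round-trip check, assuming only the calls shown above: FormulaParser.TryParse returning null on a syntax error, and ToString() on the parsed node. The sample inputs and the ordinal string comparison are illustrative choices, not part of the library. This README does not say whether the reconstructed text matches the input character for character (whitespace, for example), so the sketch reports the comparison rather than assuming its result.

```csharp
using System;
using OpenLanguage.SpreadsheetML.Formula;

// Illustrative inputs: one well-formed formula and one with a syntax error.
string[] candidates = { "=SUM(A1:A10) * 2", "=INVALID_SYNTAX(" };

foreach (string text in candidates)
{
    // TryParse returns null instead of throwing when the input is invalid.
    Ast.Node? node = FormulaParser.TryParse(text);
    if (node is null)
    {
        Console.WriteLine($"{text} -> parse failed");
        continue;
    }

    // ToString() reconstructs formula text from the AST.
    string rebuilt = node.ToString() ?? string.Empty;
    bool identical = string.Equals(text, rebuilt, StringComparison.Ordinal);
    Console.WriteLine($"{text} -> {rebuilt} (identical to input: {identical})");
}
```

Running a loop like this over a project's own formulas is one way to see how reconstruction behaves before relying on it to write modified formulas back to a workbook.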
Field Instruction Processing
using OpenLanguage.WordprocessingML.FieldInstruction;
using OpenLanguage.WordprocessingML.FieldInstruction.Ast;
using OpenLanguage.WordprocessingML.Ast;
// Parse a field instruction into a strongly-typed AST node
var ast = FieldInstructionParser.Parse("MERGEFIELD FirstName \\* Upper");
// Check the type and use specific properties
if (ast is MergeFieldFieldInstruction mergeField)
{
    Console.WriteLine($"Field Name: {mergeField.FieldName}");
    if (mergeField.GeneralFormat?.Argument is StringLiteralNode format)
    {
        Console.WriteLine($"General Format: {format.Value}");
    }
}
// Reconstruct field instruction
Console.WriteLine($"Field instruction: {ast.ToString()}");
Building from Source
Prerequisites
- .NET 8.0+ SDK
- CMake 3.20+ (for build system)
- cpp (C preprocessor for .y/.lex file processing)
Build Commands
The project uses a CMake-based build system with multiple targets:
# Configure build
cmake -B build
# Process .y/.lex files and generate code
cmake --build build --target process
# Build the solution
cmake --build build --target build
# Run tests
cmake --build build --target test
# Format code
cmake --build build --target format
# Generate documentation
cmake --build build --target doc
# Package for NuGet
cmake --build build --target pack
# Install git hooks
cmake --build build --target install-hooks
# Clean all build artifacts
cmake --build build --target clean-all
Alternative: Direct dotnet commands
# Restore dependencies
dotnet restore
# Build solution
dotnet build --configuration Release
# Run tests
dotnet test --configuration Release
# Format code
dotnet csharpier format .
# Pack for NuGet
dotnet pack --configuration Release
Project Structure
OpenLanguage/
├── OpenLanguage/                    # Main library
│   ├── SpreadsheetML/
│   │   └── Formula/                 # SpreadsheetML formula processing
│   │       ├── Lang/
│   │       │   ├── Lex/             # Lexical analysis (.lex files)
│   │       │   └── Parse/           # Grammar parsing (.y files)
│   │       └── FormulaParser.cs     # Main formula API and parser implementation
│   └── WordprocessingML/
│       ├── FieldInstruction/        # WordprocessingML field instructions
│       ├── MergeField/              # Mail merge functionality
│       └── Expression/              # Expression evaluation
├── OpenLanguage.Test/               # Unit tests
├── docs/                            # Documentation for docfx
├── docfx/                           # Docfx configuration
└── CMakeLists.txt                   # Build system configuration
Grammar Files
The project uses POSIX yacc/lex style grammar files for robust parsing:
- Formula Grammar: SpreadsheetML/Formula/Lang/Parse/formula.y
- Formula Lexer: SpreadsheetML/Formula/Lang/Lex/formula.lex
- Function Definitions: SpreadsheetML/Formula/Lang/Lex/function/*.lex
These files are processed during build to generate C# parser code.
Documentation
For detailed documentation, please visit the project documentation site.
The source for the documentation is in the docs/ and docfx/ directories.
Development
Code Style
This project uses CSharpier for code formatting:
# Format entire solution
dotnet csharpier format .
# Check formatting
dotnet csharpier check .
Git Hooks
Install git hooks to ensure code quality:
cmake --build build --target install-hooks
This installs a pre-commit hook that:
- Runs code formatting
- Executes all tests
- Prevents commits if tests fail
Testing
The project uses xUnit for testing:
# Run all tests
dotnet test
# Run tests with coverage
dotnet test --collect:"XPlat Code Coverage"
# Run specific test project
dotnet test OpenLanguage.Test/
Performance
OpenLanguage is built with performance as a primary concern:
- Native AOT Ready: Full compatibility with .NET Native AOT
- Optimized Grammar: SpreadsheetML formulas are parsed by an LALR yacc parser with a highly optimized, minimal lex grammar, compared against the ABNF grammar specification at each step of implementation.
Compatibility
- Multi-framework: Supports both .NET 8.0 and .NET 9.0
- Native AOT: Full support for ahead-of-time compilation
TODO
WordprocessingML
Field Instruction
Evaluation
- Implement per-class evaluation of strongly typed field instructions.
- Implement execution of parsed merge field.
- Ensure evaluation is implemented with respect to configured decimalSymbol
Misc
- Allow configuration option for decimalSymbol used for parsing floating point numbers
- Complete exhaustive enumeration of CountryRegion enumerations.
LevelText
- Define and implement paragraph numbering level text placeholder syntax grammar
- Implement AST parsing of paragraph numbering level text
- Implement placeholder value evaluator
- Add API for final, evaluated numbering level text construction
SpreadsheetML
Test coverage is quite comprehensive, as are the grammar and parser rule specifications. Nothing should remain to complete here for parsing, parsing dependencies, or the AST. However, optimization leaves a bit to be desired, and evaluation is unimplemented.
Optimization
- Runtime memory consumption of FormulaParser and AST node classes, as well as size of generated parser code, jumped by ~10x on adding the builtin_function_call_head_raw rule. Investigate the cause and resolve.
- Many Shift, Reduce, Shift/Reduce, and Reduce/Reduce conflicts are automatically resolved on code generation of parser. Investigate the cause and resolve.
Evaluation
- Implement evaluation of formula, with a per-AST node class method called Evaluate, on the abstract ExpressionNode class and overridden by derived classes.
  - Implement all builtin SpreadsheetML functions.
  - Implement cell, sheet, table, and range reference resolution for read and write of underlying value.
  - Implement arithmetic expression evaluation.
  - Implement function reference resolution and call evaluation, whether for user-defined functions, _xlpm.-prefixed function references, or LAMBDA functions.
  - Bonus: implement a shared SpreadsheetContext to abstract common data reading and writing operations
    - Bonus if completed: in SpreadsheetContext, use generic underlying data representation which is derived from a common Spreadsheet class, allowing any underlying matrix-like data representation to be manipulated by formulas.
Example Usage
- Write a toy formula interpreter when evaluation is implemented.
- Bonus: Real-time display and update of spreadsheet cell values in a TUI with matrix display and formula input prompt, as well as vim-style selection of corresponding cells. Simultaneously update the underlying OPC package on modifying values.
PowerBI
- Define and implement grammar for Power Query
- Implement Power Query AST and parser
VBA
- Define and implement grammar for VBA
- Implement VBA AST and parser
Misc
Numbering Format
- Define and implement a numbering format grammar
- Implement numbering format AST representation
- Implement numbering format parser
- Implement numbering format applicator from AST class
See also
- ST_NumberFormat OOXML WML XSD Type
- Similar JS implementer specific to SML
- Another similar JS implementer specific to SML
- Standard format codes
- XSLT 1.0 Number Format Syntax. Very similar to SML's, seems to be identical to WML's except for possible differing defaults/keyword-based codes
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Ensure tests pass (cmake --build build --target test)
- Ensure code is formatted (cmake --build build --target format)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the GNU General Public License v2.0 - see the LICENSE file for details.
Acknowledgments
- Built with YaccLexTools for code generation from GPPG and GPLEX grammar specification and logic
- Uses DocumentFormat.OpenXml for Office Open XML handling
Support
- net8.0
  - DocumentFormat.OpenXml (>= 3.3.0)
  - System.Data.Odbc (>= 8.0.1)
  - System.Linq.Expressions (>= 4.3.0)
  - YaccLexTools.Gplex (>= 1.2.3.1)
  - YaccLexTools.Gppg (>= 1.5.3.1)
- net9.0
  - DocumentFormat.OpenXml (>= 3.3.0)
  - System.Data.Odbc (>= 9.0.9)
  - System.Linq.Expressions (>= 4.3.0)
  - YaccLexTools.Gplex (>= 1.2.3.1)
  - YaccLexTools.Gppg (>= 1.5.3.1)