Nemesis.TextParsers.CodeGen 2.9.1

There is a newer version of this package available.
See the version list below for details.
```shell
dotnet add package Nemesis.TextParsers.CodeGen --version 2.9.1
```

```powershell
NuGet\Install-Package Nemesis.TextParsers.CodeGen -Version 2.9.1
```

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

```xml
<PackageReference Include="Nemesis.TextParsers.CodeGen" Version="2.9.1" />
```

For projects that support PackageReference, copy this XML node into the project file to reference the package.

```shell
paket add Nemesis.TextParsers.CodeGen --version 2.9.1
```

```fsharp
#r "nuget: Nemesis.TextParsers.CodeGen, 2.9.1"
```

The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

```
// Install Nemesis.TextParsers.CodeGen as a Cake Addin
#addin nuget:?package=Nemesis.TextParsers.CodeGen&version=2.9.1

// Install Nemesis.TextParsers.CodeGen as a Cake Tool
#tool nuget:?package=Nemesis.TextParsers.CodeGen&version=2.9.1
```

Nemesis.TextParsers


Benefits and Features

TL;DR: are you looking for a performant, non-allocating serializer from structured objects to flat, human-editable strings? Look no further. The benchmarks below show the potential gains from using Nemesis.TextParsers:

| Method | Count | Mean | Ratio | Allocated |
|---|---:|---:|---:|---:|
| TextJson | 10 | 121.02 us | 1.00 | 35200 B |
| TextJsonBytes | 10 | 120.79 us | 1.00 | 30400 B |
| TextJsonNet | 10 | 137.28 us | 1.13 | 288000 B |
| TextParsers | 10 | 49.02 us | 0.41 | 6400 B |
| TextJson | 100 | 846.06 us | 1.00 | 195200 B |
| TextJsonBytes | 100 | 845.84 us | 1.00 | 163200 B |
| TextJsonNet | 100 | 943.71 us | 1.12 | 636800 B |
| TextParsers | 100 | 463.33 us | 0.55 | 42400 B |
| TextJson | 1000 | 8,142.13 us | 1.00 | 1639200 B |
| TextJsonBytes | 1000 | 8,155.41 us | 1.00 | 1247200 B |
| TextJsonNet | 1000 | 8,708.12 us | 1.07 | 3880800 B |
| TextParsers | 1000 | 4,384.00 us | 0.54 | 402400 B |

More comprehensive examples are available in the project repository.

When faced with the task of parsing various items from strings, we often opt for TypeConverter. We tend to create methods like:

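```csharp
using System.ComponentModel;

// Looks up the registered TypeConverter for T and parses culture-independently
public static T FromString<T>(string text) =>
    (T)TypeDescriptor.GetConverter(typeof(T))
        .ConvertFromInvariantString(text);
```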

or even create similar constructs to stay in line with object-oriented design:

```csharp
using System;
using System.ComponentModel;
using System.Globalization;

// Advertises conversion from and to string on top of whatever the base converter supports
public abstract class TextTypeConverter : TypeConverter
{
    public sealed override bool CanConvertFrom(ITypeDescriptorContext context, Type sourceType) =>
        sourceType == typeof(string) || base.CanConvertFrom(context, sourceType);

    public sealed override bool CanConvertTo(ITypeDescriptorContext context, Type destinationType) =>
        destinationType == typeof(string) || base.CanConvertTo(context, destinationType);
}

// Strongly typed base: derived converters only implement ParseString and FormatToString
public abstract class BaseTextConverter<TValue> : TextTypeConverter
{
    public sealed override object ConvertFrom(ITypeDescriptorContext context, CultureInfo culture, object value) =>
        value is string text ? ParseString(text) : default;

    public abstract TValue ParseString(string text);

    public sealed override object ConvertTo(ITypeDescriptorContext context, CultureInfo culture, object value, Type destinationType) =>
        destinationType == typeof(string) ?
            FormatToString((TValue)value) :
            base.ConvertTo(context, culture, value, destinationType);

    public abstract string FormatToString(TValue value);
}
```

What is wrong with that? Well, nothing... except for performance and, possibly, support for generics.
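
To make the generics remark concrete, here is a minimal sketch that uses only `System.ComponentModel` (no Nemesis.TextParsers types). It reuses the `FromString<T>` approach shown above as a local function; `TryFromString` is a hypothetical helper added for illustration. A simple type such as `int` converts fine, while for a generic collection such as `List<int>` the converter returned by `TypeDescriptor.GetConverter` does not accept strings, so the base `TypeConverter.ConvertFrom` ends up throwing `NotSupportedException`.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;

// Same idea as FromString<T> above, written as a local function so the snippet runs on its own
static T FromString<T>(string text) =>
    (T)TypeDescriptor.GetConverter(typeof(T)).ConvertFromInvariantString(text);

// Hypothetical helper: reports whether the TypeConverter route can handle T at all
static bool TryFromString<T>(string text, out T value)
{
    try
    {
        value = FromString<T>(text);
        return true;
    }
    catch (NotSupportedException)
    {
        // base TypeConverter.ConvertFrom throws this when it cannot convert from the source type
        value = default;
        return false;
    }
}

bool intOk = TryFromString("42", out int number);           // Int32Converter parses this
bool listOk = TryFromString("1|2|3", out List<int> list);   // the default converter for List<int> rejects strings
Console.WriteLine($"int: {intOk} ({number}), List<int>: {listOk} ({list?.Count})");
```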

TypeConverter was designed around 2002, when processing power tended to double every now and then, and (in my opinion) it was better suited to GUI-like editors, where performance is usually not an issue. But imagine a service application such as an exchange trading suite that has to perform many operations per second: in such cases the processor has more important things to do than parsing strings.
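
A crude way to observe the overhead is to parse the same strings once through `TypeConverter` and once through strongly typed `int.Parse`. This is an illustrative sketch with `Stopwatch`, not a rigorous benchmark (for trustworthy numbers use a harness such as BenchmarkDotNet), so no timings are claimed here. The converter is looked up once outside the loop, which already favours `TypeConverter`; the remaining difference comes from the object-based API (every result is boxed) and the converter plumbing.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;
using System.Globalization;

// Illustrative only: a single Stopwatch pass is not a rigorous benchmark
string[] inputs = new string[100_000];
for (int i = 0; i < inputs.Length; i++)
    inputs[i] = (i * 7).ToString(CultureInfo.InvariantCulture);

// Looked up once, outside the loop, which already favours TypeConverter
TypeConverter converter = TypeDescriptor.GetConverter(typeof(int));

var stopwatch = Stopwatch.StartNew();
long viaConverter = 0;
foreach (var text in inputs)
    viaConverter += (int)converter.ConvertFromInvariantString(text); // result comes back as a boxed object
TimeSpan converterTime = stopwatch.Elapsed;

stopwatch.Restart();
long viaParse = 0;
foreach (var text in inputs)
    viaParse += int.Parse(text, NumberStyles.Integer, CultureInfo.InvariantCulture); // strongly typed, no boxing
TimeSpan parseTime = stopwatch.Elapsed;

Console.WriteLine($"TypeConverter: {converterTime.TotalMilliseconds:F1} ms, int.Parse: {parseTime.TotalMilliseconds:F1} ms");
```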

Features

  1. as concise as possible: both JSON and XML exist, but they are not well suited to being written by hand by human support staff
  2. works on various architectures supporting .NET Core and .NET Standard, and is culture independent
  3. support for basic system types (C#-like type names):
    • string
    • bool
    • byte/sbyte, short/ushort, int/uint, long/ulong
    • float/double
    • decimal
    • BigInteger
    • TimeSpan, DateTime/DateTimeOffset
    • Guid, Uri
  4. supports pattern-based parsing/formatting via ToString/FromText methods placed inside the type or in a static/instance factory
  5. supports compound types:
    • KeyValuePair<,> and ValueTuple of any arity
    • Enums (with underlying number types; code gen and reflection based)
    • Nullables
    • Dictionaries (built-in i.e. SortedDictionary/SortedList and custom ones)
    • Arrays (including jagged arrays)
    • Standard collections and collection contracts (List vs IList vs IEnumerable)
    • User defined collections
    • everything mentioned above, combined, with inner elements properly escaped in the final string, e.g. SortedDictionary<char?, IList<float[][]>>
  6. ability to fall back to TypeConverter if no parsing/formatting strategy was found
  7. parsing is fast while allocating as little memory as possible. The following benchmark illustrates this by parsing a 1000-element array:

| Method | Mean | Ratio | Gen 0 | Gen 1 | Allocated | Remarks |
|---|---:|---:|---:|---:|---:|---|
| RegEx parsing | 4,528.99 us | 44.98 | 492.1875 | - | 2089896 B | Regular expression with escaping support |
| StringSplitTest_KnownType | 93.41 us | 0.92 | 9.5215 | 0.1221 | 40032 B | `string.Split(..).Select(text=>int.Parse(text))` |
| StringSplitTest_DynamicType | 474.73 us | 4.69 | 24.4141 | - | 104032 B | `string.Split` + `TypeDescriptor.GetConverter` |
| SpanSplitTest_NoAlloc | 101.00 us | 1.00 | - | - | - | `"1\|2\|3".AsSpan().Tokenize()` |
| SpanSplitTest_Alloc | 101.38 us | 1.00 | 0.8545 | - | 4024 B | `"1\|2\|3".AsSpan().Tokenize(); var array = new int[1000];` |

  8. provides basic building blocks for parser callers to create their own transformers/factories:
    • LeanCollection, which can store 1, 2, 3 or more elements
    • SpanSplit: a string.Split equivalent that accepts a faster representation of a string, ReadOnlySpan<char>, and supports both standard and custom escaping sequences
    • access to every implemented parser/formatter
  9. basic LINQ support:

```csharp
var avg = "1|2|3".AsSpan()
    .Tokenize('|', '\\', true)   // '|' separates elements, '\' starts an escape sequence
    .Parse('\\', '∅', '|')       // '∅' marks a null element
    .Average(DoubleTransformer.Instance);
```

  10. basic support for GUI editors for compound types like collections/dictionaries: CollectionMeta, DictionaryMeta
  11. lean/frugal implementation of StringBuilder: ValueSequenceBuilder

```csharp
Span<char> initialBuffer = stackalloc char[32];
// a 'using var' local is read-only and cannot be passed by ref, so Dispose is called explicitly
var accumulator = new ValueSequenceBuilder<char>(initialBuffer);
using (var enumerator = coll.GetEnumerator())
    while (enumerator.MoveNext())
        FormatElement(formatter, enumerator.Current, ref accumulator);
// drop the trailing delimiter written after the last element
var text = accumulator.AsSpanTo(accumulator.Length > 0 ? accumulator.Length - 1 : 0).ToString();
accumulator.Dispose();
return text;
```

  12. use of C# 9.0 source generators (including incremental generators) to provide several transformers for common cases where the parsing logic is straightforward
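
The allocation-free splitting idea behind SpanSplit (and the `SpanSplitTest_NoAlloc` row in the benchmark above) can be illustrated with plain BCL span APIs. This is a minimal sketch, not the library's `Tokenize` implementation: the hypothetical `SumDelimited` walks a `ReadOnlySpan<char>` with `IndexOf` and `Slice` and parses each token in place, so no substrings or arrays are created. Unlike the library's tokenizer, it ignores escaping and assumes a well-formed, non-empty list of integers.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: sums a delimited list of integers without allocating substrings
static long SumDelimited(ReadOnlySpan<char> input, char delimiter)
{
    long sum = 0;
    while (true)
    {
        int index = input.IndexOf(delimiter);
        // the current token is everything up to the next delimiter (or the rest of the input)
        ReadOnlySpan<char> token = index < 0 ? input : input.Slice(0, index);
        sum += int.Parse(token, NumberStyles.Integer, CultureInfo.InvariantCulture);
        if (index < 0)
            return sum;
        // advance past the delimiter; Slice only moves a view over the same memory
        input = input.Slice(index + 1);
    }
}

Console.WriteLine(SumDelimited("1|2|3".AsSpan(), '|')); // prints 6
```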

Todo / road map

  • ability to format into a buffer, i.e. the TryFormat pattern
  • support for ILookup<,>, IGrouping<,>
  • support for native parsing/formatting of F# types (map, collections, records...)
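
The TryFormat item refers to the convention already used by BCL types such as `int.TryFormat`: write into a caller-supplied `Span<char>` and return `false` instead of allocating when the buffer is too small. Below is a minimal sketch under that assumption, with hypothetical `TryFormatJoined` and `FormatJoined` helpers that are not part of Nemesis.TextParsers: the caller first tries a small stack buffer and then falls back to progressively larger arrays rented from `ArrayPool<char>.Shared`.

```csharp
using System;
using System.Buffers;
using System.Globalization;

// Hypothetical helper following the TryFormat convention: writes "v1|v2|..." into destination
// and returns false (with charsWritten = 0) when the result does not fit
static bool TryFormatJoined(ReadOnlySpan<int> values, char delimiter, Span<char> destination, out int charsWritten)
{
    int position = 0;
    charsWritten = 0;
    for (int i = 0; i < values.Length; i++)
    {
        if (i > 0)
        {
            if (position >= destination.Length) return false;
            destination[position++] = delimiter;
        }
        // int.TryFormat writes invariant digits and reports how many chars it used
        if (!values[i].TryFormat(destination.Slice(position), out int written, default, CultureInfo.InvariantCulture))
            return false;
        position += written;
    }
    charsWritten = position;
    return true;
}

// Caller: stack buffer first, then pooled arrays of doubling size until the text fits
static string FormatJoined(ReadOnlySpan<int> values, char delimiter)
{
    Span<char> buffer = stackalloc char[16];
    char[]? rented = null;
    try
    {
        while (true)
        {
            if (TryFormatJoined(values, delimiter, buffer, out int length))
                return buffer.Slice(0, length).ToString();

            int newSize = buffer.Length * 2;
            if (rented is not null) ArrayPool<char>.Shared.Return(rented);
            rented = ArrayPool<char>.Shared.Rent(newSize);
            buffer = rented;
        }
    }
    finally
    {
        if (rented is not null) ArrayPool<char>.Shared.Return(rented);
    }
}

int[] numbers = { 1, 22, 333, 4444, 55555, 666666 };
Console.WriteLine(FormatJoined(numbers, '|')); // prints 1|22|333|4444|55555|666666
```

Apart from arrays the pool may create on first use, only the final `ToString()` allocates; the rented array is handed back in the `finally` block.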

Funding

Open source software is free to use, but creating and maintaining it is a laborious effort. Should you wish to support us in our noble endeavour, please consider donating via Liberapay.

If you just want to say thanks, you can buy me a ☕ or ⭐ any of my repositories.


There are no supported framework assets in this package.


This package has no dependencies.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

| Version | Downloads | Last updated |
|---|---:|---|
| 2.9.15 | 120 | 8/9/2024 |
| 2.9.6 | 112 | 8/8/2024 |
| 2.9.2 | 220 | 1/3/2024 |
| 2.9.1 | 150 | 1/1/2024 |
| 2.8.2 | 144 | 12/19/2023 |
| 2.7.2 | 168 | 7/16/2023 |
| 2.7.1 | 150 | 7/14/2023 |
| 2.7.0 | 150 | 7/14/2023 |
| 2.6.3 | 450 | 5/30/2022 |
| 2.6.2 | 355 | 3/1/2021 |
| 2.6.1 | 339 | 2/25/2021 |
| 2.6.0 | 341 | 2/25/2021 |
| 0.0.0-alpha.0.335 | 61 | 1/1/2024 |

# Release 2.9.1 - Source code generator for enum types

## What's Changed
* Implement source code generator for enum by @MichalBrylka in https://github.com/nemesissoft/Nemesis.TextParsers/pull/16

**Full Changelog**: https://github.com/nemesissoft/Nemesis.TextParsers/compare/v2.7.2...2.9.1

## Code generator for enum types
With this feature, it is enough to annotate an enum with two attributes:
```csharp
[Auto.AutoEnumTransformer(
   //1. optionally pass parser settings
   CaseInsensitive = true, AllowParsingNumerics = true,
   //2. TransformerClassName can be left blank. In that case the name of enum is used with "Transformer" suffix
   TransformerClassName = "MonthCodeGenTransformer",
   //3. optionally pass namespace to generate the transformer class within. If not provided the namespace of the enum will be used
   TransformerClassNamespace = "ABC"
)]
//4. decorate enum with TransformerAttribute that points to automatically generated transformer
[Transformer(typeof(ABC.MonthCodeGenTransformer))]
public enum Month : byte
{
   None = 0,
   January = 1, February = 2, March = 3,
   April = 4, May = 5, June = 6,
   July = 7, August = 8, September = 9,
   October = 10, November = 11, December = 12
}
```
This in turn generates the following parser using best practices (some lines are omitted for brevity):

<details>
<summary>Source code for generated parser</summary>

```csharp
public sealed class MonthCodeGenTransformer : TransformerBase<Nemesis.TextParsers.CodeGen.Sample.Month>
{
   public override string Format(Nemesis.TextParsers.CodeGen.Sample.Month element) => element switch
   {
       Nemesis.TextParsers.CodeGen.Sample.Month.None => nameof(Nemesis.TextParsers.CodeGen.Sample.Month.None),
       Nemesis.TextParsers.CodeGen.Sample.Month.January => nameof(Nemesis.TextParsers.CodeGen.Sample.Month.January),
       
       // ...

       Nemesis.TextParsers.CodeGen.Sample.Month.December => nameof(Nemesis.TextParsers.CodeGen.Sample.Month.December),
       _ => element.ToString("G"),
   };

   protected override Nemesis.TextParsers.CodeGen.Sample.Month ParseCore(in ReadOnlySpan<char> input) =>
       input.IsWhiteSpace() ? default : (Nemesis.TextParsers.CodeGen.Sample.Month)ParseElement(input);

   private static byte ParseElement(ReadOnlySpan<char> input)
   {
       if (input.IsEmpty || input.IsWhiteSpace()) return default;
       input = input.Trim();
       if (IsNumeric(input) && byte.TryParse(input
#if NETFRAMEWORK
   .ToString() //legacy frameworks do not support parsing from ReadOnlySpan<char>
#endif
           , out var number))
           return number;
       else
           return ParseName(input);


       static bool IsNumeric(ReadOnlySpan<char> input) =>
           input.Length > 0 && input[0] is var first &&
           (char.IsDigit(first) || first is '-' or '+');    
   }

   private static byte ParseName(ReadOnlySpan<char> input)
   {    
       if (IsEqual(input, nameof(Nemesis.TextParsers.CodeGen.Sample.Month.None)))
           return (byte)Nemesis.TextParsers.CodeGen.Sample.Month.None;            

       else if (IsEqual(input, nameof(Nemesis.TextParsers.CodeGen.Sample.Month.January)))
           return (byte)Nemesis.TextParsers.CodeGen.Sample.Month.January;            

       else if (IsEqual(input, nameof(Nemesis.TextParsers.CodeGen.Sample.Month.February)))
           return (byte)Nemesis.TextParsers.CodeGen.Sample.Month.February;            

       // ...         

       else if (IsEqual(input, nameof(Nemesis.TextParsers.CodeGen.Sample.Month.December)))
           return (byte)Nemesis.TextParsers.CodeGen.Sample.Month.December;            

       else throw new FormatException(@$"Enum of type 'Nemesis.TextParsers.CodeGen.Sample.Month' cannot be parsed from '{input.ToString()}'.
Valid values are: [None or January or February or March or April or May or June or July or August or September or October or November or December] or number within byte range.
Ignore case option on.");        

       static bool IsEqual(ReadOnlySpan<char> input, string label) =>
           MemoryExtensions.Equals(input, label.AsSpan(), StringComparison.OrdinalIgnoreCase);
   }
}
```

</details>
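
The generated code above boils down to two ideas: formatting via a `switch` over the known members, and parsing numbers first, then names via case-insensitive `ReadOnlySpan<char>` comparisons, so neither reflection nor substring allocation is needed. The following is a hand-written sketch of the same technique applied to the BCL's `DayOfWeek` enum; it is not output of the generator, and `FormatDay`/`ParseDay` are hypothetical names chosen for illustration.

```csharp
using System;
using System.Globalization;

// Format: a switch over the known members, with the numeric form for undefined values
static string FormatDay(DayOfWeek day) => day switch
{
    DayOfWeek.Sunday => nameof(DayOfWeek.Sunday),
    DayOfWeek.Monday => nameof(DayOfWeek.Monday),
    DayOfWeek.Tuesday => nameof(DayOfWeek.Tuesday),
    DayOfWeek.Wednesday => nameof(DayOfWeek.Wednesday),
    DayOfWeek.Thursday => nameof(DayOfWeek.Thursday),
    DayOfWeek.Friday => nameof(DayOfWeek.Friday),
    DayOfWeek.Saturday => nameof(DayOfWeek.Saturday),
    _ => day.ToString("G"),
};

static DayOfWeek ParseDay(ReadOnlySpan<char> input)
{
    input = input.Trim();
    if (input.IsEmpty) return default; // blank input maps to the default value, as in the generated ParseCore

    // numeric form: only attempted when the text starts like a number
    if ((char.IsDigit(input[0]) || input[0] is '-' or '+') &&
        int.TryParse(input, NumberStyles.Integer, CultureInfo.InvariantCulture, out int number))
        return (DayOfWeek)number;

    // name form: compare the span against each member name without allocating
    for (var day = DayOfWeek.Sunday; day <= DayOfWeek.Saturday; day++)
        if (MemoryExtensions.Equals(input, FormatDay(day).AsSpan(), StringComparison.OrdinalIgnoreCase))
            return day;

    throw new FormatException($"'{input.ToString()}' is not a valid DayOfWeek name or number.");
}

Console.WriteLine(FormatDay(ParseDay("  friday "))); // Friday
Console.WriteLine(FormatDay(ParseDay("3")));         // Wednesday
```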