LexiDiff.Snowball 0.2.0

There is a newer version of this package available.
See the version list below for details.
dotnet add package LexiDiff.Snowball --version 0.2.0
                    


LexiDiff

Token-aware text diffs that favour readability over compactness.

| Pure Levenshtein | LexiDiff |
|------------------|----------|
| Alice was <del>beginning to get</del><ins>getting</ins> very tired <ins>t</ins>o<del>f</del> sit<del>ting</del> by her sister on the bank, <del>and of having</del><ins>with</ins> nothing to do. | Alice was <del>beginning to </del>get<ins>ting</ins> very tired <del>of</del><ins>to</ins> sit<del>ting</del> by her sister on the bank, <del>and of having</del><ins>with</ins> nothing to do. |

Produce readable diffs that never split randomly inside words, optionally promote changes to sentence or paragraph granularity, and render as unified diff or inline HTML.

  • ICU word segmentation + Snowball stemming (multi-language)
  • Token-aware diff (Diff Match Patch), no mid-token splits
  • Optional promotion to Sentence or Paragraph
  • Output as Delete-Add-Replace sequences, Unified Diff (line-level hunks) or Inline HTML

Install

Requirements: .NET Framework 4.8+ / .NET 8 / .NET 9

dotnet add package LexiDiff

Quick Start

using System;
using System.Globalization;
using LexiDiff;

LexiDiffResult result = Lexi.Compare(
    "Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do.",
    "Alice was getting very tired to sit by her sister on the bank, with nothing to do.");

foreach (var span in result.Spans) {
    switch (span.Op) {
        case LexOp.Insert: Console.Write($"<ins>{span.Text}</ins>"); break;
        case LexOp.Equal:  Console.Write(span.Text);                 break;
        case LexOp.Delete: Console.Write($"<del>{span.Text}</del>"); break;
    }
}

The LexiDiffResult contains a list of operations (delete, insert, equal) which, when applied to the original string, produce the second:

Alice was <del>beginning to </del>get<ins>ting</ins> very tired <del>of</del><ins>to</ins> sit<del>ting</del> by her sister on the bank, <del>and of having</del><ins>with</ins> nothing to do.

Notice that the diff is both word-aware and stemming-aware:

  • get<u>ting</u> is allowed, because the split falls on the stem/suffix boundary (stemming-aware), but
  • <u>t</u>o[f], which a pure Levenshtein diff would prefer, is not allowed because it splits inside a word; the change is reported as the whole-word replacement [of]<u>to</u> instead
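
The "apply the operations to get the second string" property can be checked directly. The following is a minimal sketch that relies only on the members shown in the Quick Start (Lexi.Compare, result.Spans, span.Op, span.Text and the three LexOp values); it assumes no other operation kinds appear in Spans.

```csharp
using System;
using System.Text;
using LexiDiff;

string original = "Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do.";
string revised  = "Alice was getting very tired to sit by her sister on the bank, with nothing to do.";

LexiDiffResult result = Lexi.Compare(original, revised);

var oldSide = new StringBuilder();
var newSide = new StringBuilder();

foreach (var span in result.Spans) {
    switch (span.Op) {
        // Equal text belongs to both sides.
        case LexOp.Equal:  oldSide.Append(span.Text); newSide.Append(span.Text); break;
        // Deleted text exists only in the original.
        case LexOp.Delete: oldSide.Append(span.Text); break;
        // Inserted text exists only in the revision.
        case LexOp.Insert: newSide.Append(span.Text); break;
    }
}

Console.WriteLine(oldSide.ToString() == original);
Console.WriteLine(newSide.ToString() == revised);
```

If either comparison prints False, the span list contains something this sketch does not account for, which is worth reporting upstream.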

Granularity Promotion (Sentence / Paragraph)

You can “promote” any in-sentence edits to a whole-sentence replacement (or paragraph-level), which is often what reviewers want to see.

// a and b hold the original and revised text (strings)
// Sentence-level promotion (locale-aware via ICU)
var sentenceDiff = Lexi.Compare(
    a, b,
    new LexiDiff.LexOptions {
        PromoteTo = LexiDiff.LexGranularity.Sentence,
        SentenceCulture = CultureInfo.GetCultureInfo("en-US")
    });

Console.WriteLine(sentenceDiff.ToUnifiedDiff("a.txt", "b.txt"));

// Paragraph-level promotion
var paraDiff = LexiDiff.LexDiff.CompareParagraphs(a, b);

Sentence boundaries use ICU’s Unicode Text Segmentation (UAX #29) with locale tailoring. Paragraphs are split on newlines (a blank line is its own paragraph).
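
To make the paragraph rule concrete, here is a small standalone sketch of what "split on newlines, blank line is its own paragraph" means. It is not the library's code: the helper name SplitParagraphs and the CRLF normalisation are assumptions made for illustration.

```csharp
using System;

static class ParagraphSketch
{
    // Hypothetical helper: one paragraph per line, so an empty line
    // between two blocks of text shows up as an empty paragraph.
    public static string[] SplitParagraphs(string text) =>
        text.Replace("\r\n", "\n").Split('\n');
}
```

With this rule, "First.\n\nSecond." yields three paragraphs, the middle one being the empty string, so inserting or removing a blank line is itself a visible paragraph-level change.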


Why token-aware?

Traditional diffs split anywhere in the character stream. LexiDiff:

  • Segments words with ICU, so punctuation/whitespace tokens are preserved.
  • Stems with Snowball, so variants like Running → Runner align on Run.
  • Diffs on tokens, so we never split inside a stem/suffix.
  • Guarantees perfect reconstruction: every token is emitted either as a Whole token or as a (Stem + Suffix) pair where stem + suffix == original.

This makes deltas cleaner and more meaningful for reviewers.
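
The idea of diffing on tokens rather than characters can be sketched without the library. The example below is a simplified stand-in, not LexiDiff's implementation: it tokenizes with a regular expression instead of ICU, uses a plain longest-common-subsequence table instead of Diff Match Patch, and does no stemming. All names in it are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class TokenDiffSketch
{
    // A token is a run of word characters or a single non-word character
    // (space, punctuation), so concatenating the tokens restores the input.
    public static List<string> Tokenize(string text) =>
        Regex.Matches(text, @"\w+|\W").Cast<Match>().Select(m => m.Value).ToList();

    // Returns ('=', token), ('-', token) or ('+', token) for each token.
    public static List<(char Op, string Text)> Diff(string a, string b)
    {
        var x = Tokenize(a);
        var y = Tokenize(b);

        // lcs[i, j] = length of the longest common subsequence of x[i..] and y[j..]
        var lcs = new int[x.Count + 1, y.Count + 1];
        for (int i = x.Count - 1; i >= 0; i--)
            for (int j = y.Count - 1; j >= 0; j--)
                lcs[i, j] = x[i] == y[j]
                    ? lcs[i + 1, j + 1] + 1
                    : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);

        // Walk the table; ties prefer deleting before inserting.
        var ops = new List<(char Op, string Text)>();
        int p = 0, q = 0;
        while (p < x.Count && q < y.Count)
        {
            if (x[p] == y[q]) { ops.Add(('=', x[p])); p++; q++; }
            else if (lcs[p + 1, q] >= lcs[p, q + 1]) { ops.Add(('-', x[p])); p++; }
            else { ops.Add(('+', y[q])); q++; }
        }
        while (p < x.Count) ops.Add(('-', x[p++]));
        while (q < y.Count) ops.Add(('+', y[q++]));
        return ops;
    }
}
```

Because the unit of comparison is a whole token, TokenDiffSketch.Diff("tired of sitting", "tired to sitting") can only report "of" deleted and "to" inserted; it has no way to keep the shared "o" and split the word, which is the behaviour a character-level diff would produce.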


Known Limitations

  • Unified diff is line-level. Inline word/suffix highlighting is available via ToInlineHtml, not in unified output.
  • Snowball’s stemming is heuristic; some languages/words may not split (by design). We preserve the original text regardless.
  • Sentence boundaries may need RBBI tailoring; a light post-filter for abbreviations (Me., Dr., art.) is easy to add if needed.
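
The "some words may not split, but the original text is preserved" behaviour follows from the stem + suffix == original rule. One way to picture it, as a hedged sketch rather than the library's actual logic: accept a stemmer's output only when it is a proper prefix of the token, and fall back to the whole token otherwise. The Split helper and the sample stems below are hypothetical.

```csharp
using System;

static class StemSplitSketch
{
    // Splits a token into (stem, suffix) such that stem + suffix == token.
    // `stem` is whatever a stemmer returned for this token; if it is not a
    // proper prefix of the token, the token is kept whole (empty suffix).
    public static (string Stem, string Suffix) Split(string token, string stem)
    {
        bool usable = stem.Length > 0
            && stem.Length < token.Length
            && token.StartsWith(stem, StringComparison.OrdinalIgnoreCase);

        if (!usable) return (token, "");

        // Slice the original token so its casing is preserved.
        return (token.Substring(0, stem.Length), token.Substring(stem.Length));
    }
}
```

For example, Split("getting", "get") gives ("get", "ting"), while a stem that rewrites letters, such as "happili" for "happily", is not a prefix and leaves the token whole. In both cases concatenating the two parts returns the original token.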

License

MIT (project code). Snowball stemmers are BSD-style; ICU4N follows ICU/Unicode licenses. Review their licenses if redistributing.

Compatible target frameworks: net8.0, net9.0 and net48 (.NET Framework 4.8). The platform-specific variants of net8.0 and net9.0 (android, browser, ios, maccatalyst, macos, tvos, windows), all net10.0 targets, and net481 are computed as compatible.
  • .NETFramework 4.8

    • No dependencies.
  • net8.0

    • No dependencies.
  • net9.0

    • No dependencies.

NuGet packages (1)

Showing the top 1 NuGet packages that depend on LexiDiff.Snowball:

Package Downloads
LexiDiff

Token-aware diff engine with Snowball stemming, ICU tokenization, and unified diff rendering.


| Version | Downloads | Last Updated |
|---------|-----------|--------------|
| 0.2.1 | 128 | 9/22/2025 |
| 0.2.0 | 159 | 9/22/2025 |
| 0.1.2 | 134 | 9/22/2025 |
| 0.1.1 | 143 | 9/21/2025 |