Raffinert.FuzzySharp
4.0.0
Terms of use
By using this project or its source code, for any purpose and in any shape or form, you grant your implicit agreement to all the following statements:
- You condemn Russia and its military aggression against Ukraine
- You recognize that Russia is an occupant that unlawfully invaded a sovereign state
- You support Ukraine's territorial integrity, including its claims over temporarily occupied territories of Crimea and Donbas
- You reject false narratives perpetuated by Russian state propaganda
To learn more about the war and how you can help, click here. Glory to Ukraine! 🇺🇦
Raffinert.FuzzySharp
A fast C# .NET fuzzy string matching implementation of SeatGeek's well-known Python FuzzyWuzzy algorithm.
A nitrous-boosted, bit-parallel accelerated version of the original FuzzySharp.
Benchmark comparison of naive DP Levenshtein distance calculation (baseline), FuzzySharp, Fastenshtein and Quickenshtein:
Random words of 3 to 1024 random chars (LevenshteinLarge.cs):
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|
| NaiveDp | 231.563 ms | 57.5403 ms | 3.1540 ms | 1.00 | 0.02 | 43500.0000 | 34500.0000 | 275312920 B | 1.000 |
| FuzzySharp | 141.820 ms | 4.0905 ms | 0.2242 ms | 0.61 | 0.01 | - | - | 1545732 B | 0.006 |
| Fastenshtein | 123.356 ms | 13.0959 ms | 0.7178 ms | 0.53 | 0.01 | - | - | 34028 B | 0.000 |
| Quickenshtein | 12.918 ms | 12.8046 ms | 0.7019 ms | 0.06 | 0.00 | - | - | 12 B | 0.000 |
| Raffinert.FuzzySharp | 4.970 ms | 0.3311 ms | 0.0181 ms | 0.02 | 0.00 | - | - | 3051 B | 0.000 |
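For orientation, the NaiveDp baseline in the table refers to the textbook dynamic-programming edit distance. The sketch below is a minimal illustration of that technique, not the code used in LevenshteinLarge.cs; the class and method names are hypothetical, and it keeps only two rolling rows, which the benchmarked baseline may not do.

```csharp
using System;

// Hypothetical illustration: textbook Levenshtein distance, O(n*m) time.
public static class NaiveLevenshtein
{
    public static int Distance(string a, string b)
    {
        if (a == null) throw new ArgumentNullException(nameof(a));
        if (b == null) throw new ArgumentNullException(nameof(b));

        // previous[j] holds the distance between a[0..i-1] and b[0..j].
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int substitution = previous[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
                int deletion = previous[j] + 1;
                int insertion = current[j - 1] + 1;
                current[j] = Math.Min(substitution, Math.Min(deletion, insertion));
            }
            // Swap rows so the freshly computed row becomes "previous".
            (previous, current) = (current, previous);
        }
        return previous[b.Length];
    }
}
```

Every cell is computed individually here, which is the work that bit-parallel implementations batch into machine-word operations.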
Installation
Install-Package Raffinert.FuzzySharp
or
dotnet add package Raffinert.FuzzySharp
Usage
Simple Ratios
Fuzz.Ratio("mysmilarstring", "myawfullysimilarstirng");
// 72
Fuzz.Ratio("mysmilarstring", "mysimilarstring");
// 97
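The two scores above are consistent with the normalized indel similarity used across the FuzzyWuzzy family: 100 * (len1 + len2 - indel distance) / (len1 + len2), where the indel distance counts only insertions and deletions. For "mysmilarstring" and "mysimilarstring" that is one insertion over 29 characters, so 100 * 28 / 29 rounds to 97. The sketch below assumes that definition; RatioSketch and its methods are hypothetical names, and the library's bit-parallel implementation and exact rounding may differ.

```csharp
using System;

// Hypothetical illustration of a FuzzyWuzzy-style simple ratio.
public static class RatioSketch
{
    // Edit distance that allows insertions and deletions only.
    public static int IndelDistance(string a, string b)
    {
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                d[i, j] = a[i - 1] == b[j - 1]
                    ? d[i - 1, j - 1]                          // characters match: no cost
                    : 1 + Math.Min(d[i - 1, j], d[i, j - 1]);  // delete or insert
            }
        }
        return d[a.Length, b.Length];
    }

    // Similarity on a 0..100 scale.
    public static int Ratio(string a, string b)
    {
        int total = a.Length + b.Length;
        if (total == 0) return 100;
        return (int)Math.Round(100.0 * (total - IndelDistance(a, b)) / total);
    }
}
```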
Partial Ratio
Fuzz.PartialRatio("similar", "somewhresimlrbetweenthisstring");
// 71
Token Sort Ratio
Fuzz.TokenSortRatio("order words out of", " words out of order");
// 100
Fuzz.PartialTokenSortRatio("order words out of", " words out of order");
// 100
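Token sort scoring makes word order irrelevant. In FuzzyWuzzy-style implementations this is done by splitting each string into tokens, sorting them, rejoining them, and then scoring the normalized strings with the simple ratio. The sketch below shows only that normalization step, assuming whitespace tokenization and ordinal sorting; TokenSortSketch is a hypothetical name and the library's tokenizer may differ.

```csharp
using System;

// Hypothetical illustration of the normalization behind a token sort ratio.
public static class TokenSortSketch
{
    public static string SortTokens(string input)
    {
        // A null separator array splits on any whitespace.
        string[] tokens = input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        Array.Sort(tokens, StringComparer.Ordinal);
        return string.Join(" ", tokens);
    }
}
```

Both inputs from the example above normalize to the same string, "of order out words", which is why their score is 100.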
Token Set Ratio
Fuzz.TokenSetRatio("fuzzy was a bear", "fuzzy fuzzy fuzzy bear");
// 100
Fuzz.PartialTokenSetRatio("fuzzy was a bear", "fuzzy fuzzy fuzzy bear");
// 100
Token Initialism Ratio
Fuzz.TokenInitialismRatio("NASA", "National Aeronautics and Space Administration");
// 89
Fuzz.TokenInitialismRatio("NASA", "National Aeronautics Space Administration");
// 100
Fuzz.TokenInitialismRatio("NASA", "National Aeronautics Space Administration, Kennedy Space Center, Cape Canaveral, Florida 32899");
// 53
Fuzz.PartialTokenInitialismRatio("NASA", "National Aeronautics Space Administration, Kennedy Space Center, Cape Canaveral, Florida 32899");
// 100
Token Abbreviation Ratio
Fuzz.TokenAbbreviationRatio("bl 420", "Baseline section 420", PreprocessMode.Full);
// 40
Fuzz.PartialTokenAbbreviationRatio("bl 420", "Baseline section 420", PreprocessMode.Full);
// 67
Weighted Ratio
Fuzz.WeightedRatio("The quick brown fox jimps ofver the small lazy dog", "the quick brown fox jumps over the small lazy dog");
// 95
Process Extraction
Find the best match(es) from a collection of choices.
Process.ExtractOne("cowboys", new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" });
// (string: Dallas Cowboys, score: 90, index: 3)
Process.ExtractTop("goolge", new[] { "google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl" }, limit: 3);
// [(string: google, score: 83, index: 0), (string: googleplus, score: 75, index: 5), (string: plexoogl, score: 43, index: 7)]
Process.ExtractAll("goolge", new[] { "google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl" });
// [(string: google, score: 83, index: 0), (string: bing, score: 22, index: 1), ...]
// With score cutoff
Process.ExtractAll("goolge", new[] { "google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl" }, cutoff: 40);
// [(string: google, score: 83, index: 0), (string: googleplus, score: 75, index: 5), (string: plexoogl, score: 43, index: 7)]
Process.ExtractSorted("goolge", new[] { "google", "bing", "facebook", "linkedin", "twitter", "googleplus", "bingnews", "plexoogl" });
// [(string: google, score: 83, index: 0), (string: googleplus, score: 75, index: 5), (string: plexoogl, score: 43, index: 7), ...]
Extraction uses WeightedRatio and Full preprocessing by default. Override these in the method parameters to use different scorers and processing:
Process.ExtractOne("cowboys", new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" }, s => s, ScorerCache.Get<DefaultRatioScorer>());
// (string: Dallas Cowboys, score: 57, index: 3)
Generic Type Extraction
Extraction can operate on objects of any type. Use the processor parameter to reduce the object to the string it should be compared on:
var events = new[]
{
new[] { "chicago cubs vs new york mets", "CitiField", "2011-05-11", "8pm" },
new[] { "new york yankees vs boston red sox", "Fenway Park", "2011-05-11", "8pm" },
new[] { "atlanta braves vs pittsburgh pirates", "PNC Park", "2011-05-11", "8pm" },
};
var query = new[] { "new york mets vs chicago cubs", "CitiField", "2017-03-19", "8pm" };
var best = Process.ExtractOne(query, events, strings => strings[0]);
// (value: { "chicago cubs vs new york mets", "CitiField", "2011-05-11", "8pm" }, score: 95, index: 0)
Fluent Pipeline API
The Process.Configure() fluent builder creates reusable, immutable pipelines with preconfigured scoring, caching, and parallel execution.
Basic Pipeline
Equivalent to the static Process methods, but reusable across multiple queries:
var pipeline = Process.Configure().Build();
var result1 = pipeline.ExtractOne("cowboys", new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" });
var result2 = pipeline.ExtractOne("chicago cubs", baseballStrings);
Custom Scorer
var pipeline = Process.Configure()
.WithScorer(ScorerCache.Get<DefaultRatioScorer>())
.Build();
var result = pipeline.ExtractOne("cowboys", new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" });
Parallel Execution
Enable multi-threaded processing for large choice sets:
var pipeline = Process.Configure()
.Parallel()
.Build();
var results = pipeline.ExtractAll("goolge", largeChoicesList);
With ParallelOptions for fine-grained control:
var pipeline = Process.Configure()
.Parallel(new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount })
.Build();
Cached Execution
Automatic caching creates a CachedWeightedRatioScorer per extraction call, pre-initializing internal data structures for the query string:
var pipeline = Process.Configure()
.Cached()
.Build();
var result = pipeline.ExtractOne("cowboys", new[] { "Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys" });
Cached + Parallel
Combine caching and parallelism. Builder methods are order independent -- .Cached().Parallel() and .Parallel().Cached() produce identical results:
var pipeline = Process.Configure()
.Cached()
.Parallel(new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount })
.Build();
var results = pipeline.ExtractAll("goolge", largeChoicesList);
External Cached Scorer (Across-Run Caching)
For maximum performance when running the same query against different choice sets, provide an externally managed ICachedRatioScorer. The scorer pre-initializes once and is reused across all extraction calls:
using var scorer = new CachedWeightedRatioScorer("new york mets at atlanta braves");
var pipeline = Process.Configure()
.Cached(scorer)
.Parallel()
.Build();
var results1 = pipeline.ExtractAll(choiceSet1);
var results2 = pipeline.ExtractAll(choiceSet2);
Note: External cached scorers implement IDisposable. Use a using declaration to ensure proper cleanup.
CancellationToken Support
Pass a CancellationToken via ParallelOptions to cancel long-running parallel extractions:
var cts = new CancellationTokenSource();
var pipeline = Process.Configure()
.Cached()
.Parallel(new ParallelOptions { CancellationToken = cts.Token })
.Build();
// Throws OperationCanceledException if cancelled
var results = pipeline.ExtractAll(query, largeChoicesList).ToList();
Using Different Scorers
Non-Cached Scorers (IRatioScorer)
Stateless scorers for use with Process static methods and the WithScorer() builder method:
var ratio = ScorerCache.Get<DefaultRatioScorer>();
var partialRatio = ScorerCache.Get<PartialRatioScorer>();
var tokenSet = ScorerCache.Get<TokenSetScorer>();
var partialTokenSet = ScorerCache.Get<PartialTokenSetScorer>();
var tokenSort = ScorerCache.Get<TokenSortScorer>();
var partialTokenSort = ScorerCache.Get<PartialTokenSortScorer>();
var tokenAbbreviation = ScorerCache.Get<TokenAbbreviationScorer>();
var partialTokenAbbrev = ScorerCache.Get<PartialTokenAbbreviationScorer>();
var weighted = ScorerCache.Get<WeightedRatioScorer>();
Cached Scorers (ICachedRatioScorer)
Pre-initialize with a query string for repeated comparisons. These implement IDisposable:
using var scorer = new CachedWeightedRatioScorer("search query");
int score = scorer.Score("candidate string");
Available cached scorers:
- CachedWeightedRatioScorer -- weighted combination (default for .Cached())
- CachedDefaultRatioScorer -- simple Levenshtein ratio
- CachedTokenSortScorer -- token sort ratio
- CachedTokenSetScorer -- token set ratio
- CachedPartialTokenSetScorer -- partial token set ratio
- CachedTokenDifferenceScorer -- token difference ratio
Levenshtein Distance API
Low-level access to the bit-parallel Levenshtein distance implementation:
// Edit distance
int distance = Levenshtein.Distance("kitten", "sitting");
// 3
// Normalized similarity (1.0 = identical, 0.0 = completely different)
double similarity = Levenshtein.NormalizedSimilarity("kitten", "sitting");
// Edit operations to transform one string into another
EditOp[] ops = Levenshtein.GetEditOps("kitten", "sitting");
// [Replace(0->0), Equal, Equal, Equal, Insert(4->4), Replace(5->6)]
Instance Distance Classes
The Levenshtein, Indel, and LongestCommonSubsequence classes also offer an instance API for one-to-many comparisons. The constructor pre-computes a bit-parallel pattern match vector from the source string, which is then reused across all subsequent calls. This avoids rebuilding the internal data structure on every comparison, giving a significant speedup when comparing one source against many targets.
All three implement IDisposable -- use using to return pooled arrays.
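To make the idea of a reusable pattern match vector concrete, here is a minimal sketch of the well-known Myers/Hyyrö bit-parallel Levenshtein scheme. It is not the library's implementation: BitParallelLevenshteinSketch is a hypothetical class, it is limited to sources of at most 64 characters (one machine word), and it uses a dictionary instead of pooled arrays. The constructor builds one bitmask per distinct source character, and each DistanceFrom call reuses those masks.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration; handles sources of at most 64 characters.
public sealed class BitParallelLevenshteinSketch
{
    // For each character, bit i is set when source[i] equals that character.
    private readonly Dictionary<char, ulong> _patternMask = new Dictionary<char, ulong>();
    private readonly int _length;

    public BitParallelLevenshteinSketch(string source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (source.Length > 64)
            throw new ArgumentException("This sketch supports at most 64 characters.", nameof(source));

        _length = source.Length;
        for (int i = 0; i < source.Length; i++)
        {
            _patternMask.TryGetValue(source[i], out ulong bits);
            _patternMask[source[i]] = bits | (1UL << i);
        }
    }

    public int DistanceFrom(string target)
    {
        if (target == null) throw new ArgumentNullException(nameof(target));
        if (_length == 0) return target.Length;

        ulong vp = ulong.MaxValue;          // vertical +1 deltas
        ulong vn = 0;                       // vertical -1 deltas
        ulong last = 1UL << (_length - 1);  // bit of the last source character
        int distance = _length;

        foreach (char c in target)
        {
            _patternMask.TryGetValue(c, out ulong pm); // 0 when c is not in the source
            ulong d0 = unchecked((((pm & vp) + vp) ^ vp) | pm | vn);
            ulong hp = vn | ~(d0 | vp);     // horizontal +1 deltas
            ulong hn = d0 & vp;             // horizontal -1 deltas

            if ((hp & last) != 0) distance++;
            if ((hn & last) != 0) distance--;

            hp = (hp << 1) | 1;
            hn <<= 1;
            vp = hn | ~(d0 | hp);
            vn = hp & d0;
        }
        return distance;
    }
}
```

Each target character costs a fixed handful of word operations regardless of the source length, and the masks built in the constructor are shared by every call, which is where the one-to-many speedup comes from.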
Levenshtein Instance
using var lev = new Levenshtein("chicago cubs vs new york mets");
int d1 = lev.DistanceFrom("new york mets vs chicago cubs");
int d2 = lev.DistanceFrom("atlanta braves vs pittsburgh pirates");
Indel Instance
Indel distance counts only insertions and deletions (no replacements). NormalizedSimilarityWith returns a value between 0.0 (completely different) and 1.0 (identical):
using var indel = new Indel("chicago cubs");
int distance = indel.DistanceFrom("chicago white sox");
double similarity = indel.NormalizedSimilarityWith("chicago white sox");
A generic variant IndelT<T> is available for comparing sequences of any IEquatable<T>:
using var indel = new IndelT<string>(new[] { "hello", "world" });
int distance = indel.DistanceFrom(new[] { "hello", "there" });
double similarity = indel.NormalizedSimilarityWith(new[] { "hello", "there" });
LongestCommonSubsequence Instance
LCS distance is defined as max(len1, len2) - LCS_length:
using var lcs = new LongestCommonSubsequence("chicago cubs");
int distance = lcs.DistanceFrom("chicago white sox");
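The definition above can be checked with a plain dynamic-programming sketch. This is only an illustration of the formula max(len1, len2) - LCS_length; LcsSketch is a hypothetical name and the library's class uses a bit-parallel algorithm instead.

```csharp
using System;

// Hypothetical illustration of LCS length and LCS distance.
public static class LcsSketch
{
    public static int Length(string a, string b)
    {
        // lcs[i, j] is the LCS length of a[0..i-1] and b[0..j-1].
        var lcs = new int[a.Length + 1, b.Length + 1];
        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                lcs[i, j] = a[i - 1] == b[j - 1]
                    ? lcs[i - 1, j - 1] + 1
                    : Math.Max(lcs[i - 1, j], lcs[i, j - 1]);
            }
        }
        return lcs[a.Length, b.Length];
    }

    public static int Distance(string a, string b)
    {
        return Math.Max(a.Length, b.Length) - Length(a, b);
    }
}
```

For "kitten" and "sitting" the longest common subsequence is "ittn" (length 4), so the distance is 7 - 4 = 3.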
PreprocessMode
By default, Fuzz methods compare strings as-is. Pass PreprocessMode.Full to normalize whitespace, lowercase, and strip non-alphanumeric characters before comparing:
Fuzz.Ratio("new york mets", "NEW YORK METS");
// < 100 (case sensitive)
Fuzz.Ratio("new york mets", "NEW YORK METS", PreprocessMode.Full);
// 100 (case insensitive after preprocessing)
Process extraction methods use PreprocessMode.Full by default. Pass a custom processor function to override this behavior.
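As a rough picture of what full preprocessing does, the sketch below lowercases the input, treats every non-alphanumeric character as a separator, and collapses separators into single spaces. This is an assumption modeled on the description above; FullProcessSketch is a hypothetical name, and the library may handle separators or culture-specific casing differently.

```csharp
using System;
using System.Text;

// Hypothetical illustration of "full" preprocessing before comparison.
public static class FullProcessSketch
{
    public static string Normalize(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        var sb = new StringBuilder(input.Length);
        bool pendingSpace = false;
        foreach (char c in input)
        {
            if (char.IsLetterOrDigit(c))
            {
                // Emit one space between words, never at the start.
                if (pendingSpace && sb.Length > 0) sb.Append(' ');
                sb.Append(char.ToLowerInvariant(c));
                pendingSpace = false;
            }
            else
            {
                pendingSpace = true; // whitespace or punctuation
            }
        }
        return sb.ToString();
    }
}
```

Under this sketch, "NEW YORK METS" and "new york mets" normalize to the same string, which matches the score of 100 shown above.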
Credits
- Adam Cohen (seatgeek/fuzzywuzzy)
- Antti Haapala (python-Levenshtein)
- David Necas (python-Levenshtein)
- Jacob Bayer (original FuzzySharp library)
- Max Bachmann (RapidFuzz)
- Mikko Ohtamaa (python-Levenshtein)
- Panayiotis (Java implementation I heavily borrowed from)
Support
Support the project through GitHub Sponsors or via PayPal.
See CHANGELOG.md for release history.
| Product | Compatible and additional computed target framework versions |
|---|---|
| .NET | net5.0 was computed. net5.0-windows was computed. net6.0 is compatible. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 is compatible. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
| .NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 is compatible. |
| .NET Standard | netstandard2.0 is compatible. netstandard2.1 is compatible. |
| .NET Framework | net45 is compatible. net451 was computed. net452 was computed. net46 is compatible. net461 was computed. net462 is compatible. net463 was computed. net47 was computed. net471 was computed. net472 is compatible. net48 is compatible. net481 was computed. |
| MonoAndroid | monoandroid was computed. |
| MonoMac | monomac was computed. |
| MonoTouch | monotouch was computed. |
| Tizen | tizen40 was computed. tizen60 was computed. |
| Xamarin.iOS | xamarinios was computed. |
| Xamarin.Mac | xamarinmac was computed. |
| Xamarin.TVOS | xamarintvos was computed. |
| Xamarin.WatchOS | xamarinwatchos was computed. |
- .NETCoreApp 3.1
  - IndexRange (>= 1.0.3)
- .NETFramework 4.5
  - IndexRange (>= 1.0.3)
  - System.Memory (>= 4.5.5)
- .NETFramework 4.6
  - IndexRange (>= 1.0.3)
  - System.Memory (>= 4.5.5)
- .NETFramework 4.6.2
  - IndexRange (>= 1.0.3)
  - System.Memory (>= 4.5.5)
- .NETFramework 4.7.2
  - IndexRange (>= 1.0.3)
  - System.Memory (>= 4.5.5)
- .NETFramework 4.8
  - IndexRange (>= 1.0.3)
  - System.Memory (>= 4.5.5)
- .NETStandard 2.0
  - IndexRange (>= 1.0.3)
  - System.Memory (>= 4.5.5)
- .NETStandard 2.1
  - No dependencies.
- net10.0
  - No dependencies.
- net6.0
  - No dependencies.
- net8.0
  - No dependencies.
- net9.0
  - No dependencies.
NuGet packages (1)
The top NuGet package that depends on Raffinert.FuzzySharp:
- DSharpPlus.Commands: An all in one package for managing commands.
GitHub repositories (2)
The top 2 popular GitHub repositories that depend on Raffinert.FuzzySharp:
- DSharpPlus/DSharpPlus: A .NET library for making bots using the Discord API.
- iPromKnight/zilean