UniCluster.Net 1.1.0

dotnet add package UniCluster.Net --version 1.1.0
                    
NuGet\Install-Package UniCluster.Net -Version 1.1.0
                    
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="UniCluster.Net" Version="1.1.0" />
                    
For projects that support PackageReference, copy this XML node into the project file to reference the package.
<PackageVersion Include="UniCluster.Net" Version="1.1.0" />
                    
Directory.Packages.props
<PackageReference Include="UniCluster.Net" />
                    
Project file
For projects that support Central Package Management (CPM), copy this XML node into the solution Directory.Packages.props file to version the package.
paket add UniCluster.Net --version 1.1.0
                    
#r "nuget: UniCluster.Net, 1.1.0"
                    
#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.
#:package UniCluster.Net@1.1.0
                    
#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.
#addin nuget:?package=UniCluster.Net&version=1.1.0
                    
Install as a Cake Addin
#tool nuget:?package=UniCluster.Net&version=1.1.0
                    
Install as a Cake Tool

UniCluster.Net

NuGet Version NuGet Downloads License .NET

High-performance .NET library for optimal 1D K-means clustering using dynamic programming. Guarantees globally optimal solutions for one-dimensional data with deterministic results and predictable O(k·n) time complexity, achieving 39-951x performance improvements over ML.NET K-means.

Key Features

  • Guaranteed Global Optimum: Uses dynamic programming to find the mathematically optimal clustering solution
  • Deterministic Results: Always produces the same result for the same input data
  • High Performance: Optimized O(k·n) implementation, 39-951x faster than ML.NET K-means
  • Simple API: Easy-to-use interface for 1D clustering scenarios
  • Well-Tested: Comprehensive test suite with extensive validation

Installation

dotnet add package UniCluster.Net

Quick Start

using UniCluster.Net;

var kmeans = new OptimalKMeans1D();
var data = new double[] { 1.0, 2.0, 8.0, 9.0, 10.0 };
var result = kmeans.Fit(data, k: 2);

Console.WriteLine($"Total Cost (WCSS): {result.TotalCost:F2}");
Console.WriteLine($"Number of Clusters: {result.ClusterCount}");

foreach (var cluster in result.Clusters)
{
    Console.WriteLine($"Cluster: [{string.Join(", ", cluster.Points)}]");
    Console.WriteLine($"Centroid: {cluster.Centroid:F2}");
}

Expected Output

Total Cost (WCSS): 2.50
Number of Clusters: 2

Cluster 1: [1, 2]
Centroid: 1.50
  
Cluster 2: [8, 9, 10]  
Centroid: 9.00

The algorithm automatically identifies the optimal clustering that minimizes within-cluster sum of squares (WCSS), grouping similar values together while maximizing separation between clusters.

Comparison with ML.NET K-means

Aspect UniCluster.Net ML.NET K-means
Solution Quality Globally optimal (guaranteed) Locally optimal (variable)
Performance 39-951x faster Baseline
Consistency Deterministic, same every time Random initialization, varies
Space Complexity O(k·n) O(n + k) iterative
Time Complexity O(k·n) guaranteed O(i·k·n) where i = iterations
Dimensions 1D only Multi-dimensional

When to Use UniCluster.Net

✅ Ideal Use Cases

  • 1D data clustering where global optimum is required
  • Performance-critical applications needing fast, consistent results
  • Reproducible analysis requiring deterministic outcomes
  • Small to medium k values (typically k ≤ 20)
  • Applications requiring mathematical guarantees

⚠️ Consider Alternatives When

  • Working with high-dimensional data (use traditional K-means)
  • Need very large k values (memory usage scales with k·n)
  • Data preprocessing/feature engineering is the bottleneck
  • Working with streaming or constantly changing data

Algorithm Details

The algorithm minimizes the within-cluster sum of squares (WCSS):

WCSS = Σᵢ Σⱼ (xᵢⱼ - cᵢ)²

where xᵢⱼ is the j-th point in cluster i, and cᵢ is the centroid of cluster i.

How it works (optimal 1D K‑means via Dynamic Programming):

Goal: split sorted data into k contiguous groups to minimize total squared error.

  1. Precompute prefix sums so that looking up the cost of clustering any segment [a..b] into one cluster is an O(1) operation.
  2. Fill the first column: DP(i,1) = cost(1..i).
  3. For k = 2..K and i = k..n, choose a split j < i that minimizes: DP(j, k−1) + cost(j+1..i). This is the (previously computed) cost of clustering j elements in k-1 clusters, in addition to clustering the rest of the set (j+1..i) into 1 more cluster cluster.
  4. Record j to later backtrack and reconstruct the clusters.

Example x = [1,2,3,10,11,12], k = 2:

For i = 6, we try j = 1..5. The best is j = 3:

  • Left: DP(3,1) = cost(1..3) = 2.0
  • Right: cost(4..6) = 2.0
  • Total = 4.0 → clusters [1,2,3] and [10,11,12]

Note: You don’t need to compute any of this yourself—Fit(data, k) does it deterministically.

How the Monotonicity property allows this to run in O(k·n) time rather than O(k·n²):

  • Naive DP evaluates, for each state (i, k), all split positions j ∈ [k−1, i−1], which is O(k·n²) overall.
  • However in 1D Clustering, the optimal split index j* that minimizes DP(j, k−1) + cost(j+1..i) is non-decreasing as i grows.
  • Therefore, in UniCluster.Net we carry j* forward and only advance it when it reduces cost. Each j is visited at most once per k, making the total time complexity of constructing the DP table O(k·n).

Benchmarking

Performance Results

Benchmarks conducted on Windows 11, 13th Gen Intel Core i7-13620H @ 2.40GHz, .NET 8.0:

Dataset UniCluster.Net ML.NET Single Run ML.NET Best-of-N Improvement (Single)
Small (30 points, k=3) 840 ns 798,276 ns 8,309,976 ns (10 runs) 951x faster
Medium (300 points, k=5) 11,840 ns 1,169,301 ns 11,197,798 ns (10 runs) 99x faster
Large (1000 points, k=8) 61,293 ns 2,406,017 ns 11,902,864 ns (5 runs) 39x faster

Run Benchmarks Yourself

cd UniCluster.Net.Benchmarks
dotnet run -c Release

The benchmark suite includes:

  • Small datasets (30 points, k=3)
  • Medium datasets (300 points, k=5)
  • Large datasets (1000 points, k=8)
  • Comparison with ML.NET single-run and best-of-N approaches

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

License

This project is licensed under the MIT License - see the License.md file for details.

Acknowledgments

  • Algorithm based on the research by Grønlund et al. in "Fast Exact k-Means, k-Medians and Bregman Divergence Clustering in 1D" (arXiv:1701.07204, 2017)
  • Benchmarked against Microsoft ML.NET for validation
  • Inspired by the need for deterministic, high-performance clustering
Product Compatible and additional computed target framework versions.
.NET net8.0 is compatible.  net8.0-android was computed.  net8.0-browser was computed.  net8.0-ios was computed.  net8.0-maccatalyst was computed.  net8.0-macos was computed.  net8.0-tvos was computed.  net8.0-windows was computed.  net9.0 was computed.  net9.0-android was computed.  net9.0-browser was computed.  net9.0-ios was computed.  net9.0-maccatalyst was computed.  net9.0-macos was computed.  net9.0-tvos was computed.  net9.0-windows was computed.  net10.0 was computed.  net10.0-android was computed.  net10.0-browser was computed.  net10.0-ios was computed.  net10.0-maccatalyst was computed.  net10.0-macos was computed.  net10.0-tvos was computed.  net10.0-windows was computed. 
Compatible target framework(s)
Included target framework(s) (in package)
Learn more about Target Frameworks and .NET Standard.
  • net8.0

    • No dependencies.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last Updated
1.1.0 62 9/6/2025
1.0.0 76 9/5/2025

Initial release of UniCluster.Net - Optimal 1D K-means clustering library with guaranteed global optimum and 12-16x performance improvement over traditional implementations.