AiDotNet.Tensors 0.9.3

.NET CLI:
dotnet add package AiDotNet.Tensors --version 0.9.3

Package Manager (run in the Visual Studio Package Manager Console):
NuGet\Install-Package AiDotNet.Tensors -Version 0.9.3

PackageReference (copy this XML node into the project file):
<PackageReference Include="AiDotNet.Tensors" Version="0.9.3" />

Central Package Management (CPM) - put the version in the solution's Directory.Packages.props file:
<PackageVersion Include="AiDotNet.Tensors" Version="0.9.3" />
and reference the package from the project file:
<PackageReference Include="AiDotNet.Tensors" />

Paket:
paket add AiDotNet.Tensors --version 0.9.3

F# Interactive / Polyglot Notebooks (copy into the interactive tool or script source):
#r "nuget: AiDotNet.Tensors, 0.9.3"

C# file-based apps (.NET 10 preview 4 and later; place before any lines of code):
#:package AiDotNet.Tensors@0.9.3

Cake Addin:
#addin nuget:?package=AiDotNet.Tensors&version=0.9.3

Cake Tool:
#tool nuget:?package=AiDotNet.Tensors&version=0.9.3

AiDotNet.Tensors

NuGet Build License

The fastest .NET tensor library: it outperforms MathNet.Numerics, NumSharp, and TensorPrimitives, and matches TorchSharp's CPU backend, in pure managed code with hand-tuned AVX2/FMA SIMD kernels and JIT-compiled machine code.

Features

  • Zero Allocations: In-place operations with ArrayPool<T> and Span<T> for hot paths
  • Hand-Tuned SIMD: Custom AVX2/FMA kernels with 4x loop unrolling, not just Vector<T> wrappers
  • JIT-Compiled Kernels: Runtime x86-64 machine code generation for size-specialized operations
  • BLIS-Style GEMM: Tiled matrix multiply with FMA micro-kernel, cache-aware panel packing
  • GPU Acceleration: Optional CUDA, HIP/ROCm, and OpenCL support via separate packages
  • Multi-Target: Supports .NET 10.0 and .NET Framework 4.7.1
  • Generic Math: Works with any numeric type via INumericOperations<T> interface
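The zero-allocation bullet above can be sketched as follows. This is a hypothetical usage sketch: the `AddInPlace` name is taken from the in-place operations listed in the benchmark tables, but the exact signature should be checked against the library's API reference.

```csharp
using AiDotNet.Tensors.LinearAlgebra;

var a = new Vector<double>(new[] { 1.0, 2.0, 3.0, 4.0 });
var b = new Vector<double>(new[] { 5.0, 6.0, 7.0, 8.0 });

// In-place add: the result is written into 'a' itself,
// so no intermediate vector is allocated on the hot path.
a.AddInPlace(b);
```

The same pattern (mutating the destination rather than returning a fresh vector) is what the "zero allocations" claim refers to, and it is what the in-place rows in the TensorPrimitives comparison below measure.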

Installation

# Core package (CPU SIMD acceleration)
dotnet add package AiDotNet.Tensors

# Optional: OpenBLAS for optimized CPU BLAS operations
dotnet add package AiDotNet.Native.OpenBLAS

# Optional: CLBlast for OpenCL GPU acceleration (AMD/Intel/NVIDIA)
dotnet add package AiDotNet.Native.CLBlast

# Optional: CUDA for NVIDIA GPU acceleration (requires NVIDIA GPU)
dotnet add package AiDotNet.Native.CUDA

Quick Start

using AiDotNet.Tensors.LinearAlgebra;

// Create vectors
var v1 = new Vector<double>(new[] { 1.0, 2.0, 3.0, 4.0 });
var v2 = new Vector<double>(new[] { 5.0, 6.0, 7.0, 8.0 });

// SIMD-accelerated operations
var sum = v1 + v2;
var dot = v1.Dot(v2);

// Create matrices
var m1 = new Matrix<double>(3, 3);
var m2 = Matrix<double>.Identity(3);

// Matrix operations
var product = m1 * m2;
var transpose = m1.Transpose();

CPU Benchmarks

All benchmarks were run on an AMD Ryzen 9 3950X under .NET 10.0 using BenchmarkDotNet. This CPU does not support AVX-512.

vs TorchSharp CPU (Tensor Operations, float)

Head-to-head against TorchSharp's libtorch C++ backend on identical data sizes.

| Operation | AiDotNet | TorchSharp | Speedup | Result |
| --- | --- | --- | --- | --- |
| MatMul 256x256 | 95 us | 125 us | 1.3x faster | WIN |
| MatMul 512x512 | 427 us | 533 us | 1.2x faster | WIN |
| Mean 1M | 194 us | 224 us | 1.2x faster | WIN |
| Add 100K | 30 us | 30 us | tied | TIED |
| Multiply 100K | 42 us | 42 us | tied | TIED |
| Sum 1M | 200 us | 183 us | 0.9x | Close |
| Sigmoid 1M | 222 us | 196 us | 0.9x | Close |
| Add 1M | 209 us | 182 us | 0.9x | Close |
| ReLU 1M | 196 us | 169 us | 0.9x | Close |

AiDotNet wins or matches TorchSharp CPU on the majority of operations using pure managed C# with hand-tuned SIMD; no native C++ dependencies are required.

vs MathNet.Numerics (Linear Algebra, double, N=1000)

| Operation | AiDotNet | MathNet | Speedup |
| --- | --- | --- | --- |
| Matrix Multiply 1000x1000 | 8.3 ms | 49.2 ms | 6x faster |
| Matrix Add | 1.87 ms | 2.50 ms | 1.3x faster |
| Matrix Subtract | 2.08 ms | 2.47 ms | 1.2x faster |
| Matrix Scalar Multiply | 1.66 ms | 2.14 ms | 1.3x faster |
| Transpose | 2.85 ms | 3.68 ms | 1.3x faster |
| Dot Product | 97 ns | 817 ns | 8.4x faster |
| L2 Norm | 92 ns | 11,552 ns | 125x faster |

vs NumSharp (N=1000)

| Operation | AiDotNet | NumSharp | Speedup |
| --- | --- | --- | --- |
| Matrix Multiply 1000x1000 | 8.3 ms | 26.5 s | 3,200x faster |
| Matrix Add | 1.87 ms | 1.98 ms | 1.1x faster |
| Transpose | 2.85 ms | 13.7 ms | 4.8x faster |
| Vector Add | 1.47 us | 54.5 us | 37x faster |

vs System.Numerics.Tensors.TensorPrimitives (N=1000)

In-place operations (zero allocation) compared to raw TensorPrimitives calls.

| Operation | AiDotNet | TensorPrimitives | Speedup |
| --- | --- | --- | --- |
| Dot Product | 97 ns | 185 ns | 1.9x faster |
| L2 Norm | 92 ns | 187 ns | 2.0x faster |
| Vector AddInPlace | 154 ns | 117 ns | 0.8x |
| Vector SubtractInPlace | 116 ns | 118 ns | tied |
| Vector ScalarMulInPlace | 105 ns | 75 ns | 0.7x |
| Vector Add to Span | 116 ns | 119 ns | tied |
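For reference, the TensorPrimitives baseline in the table above consists of span-based, allocation-free calls from the System.Numerics.Tensors package (this snippet shows the baseline API, not AiDotNet's):

```csharp
using System.Numerics.Tensors;

float[] x = { 1f, 2f, 3f, 4f };
float[] y = { 5f, 6f, 7f, 8f };
float[] dest = new float[4];

// Element-wise add into a caller-provided destination span.
TensorPrimitives.Add(x, y, dest);

// Reductions: dot product and Euclidean (L2) norm.
float dot = TensorPrimitives.Dot(x, y);   // 1*5 + 2*6 + 3*7 + 4*8 = 70
float norm = TensorPrimitives.Norm(x);    // sqrt(1 + 4 + 9 + 16)
```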

Small Matrix Multiply (double)

| Size | AiDotNet | MathNet | NumSharp |
| --- | --- | --- | --- |
| 4x4 | 172 ns | 165 ns | 2,198 ns |
| 16x16 | 2.1 us | 2.9 us | 107.5 us |
| 32x32 | 10.5 us | 36.2 us | 774.8 us |

Against MathNet, AiDotNet is 1.4x faster at 16x16 and 3.4x faster at 32x32.

SIMD Instruction Support

The library automatically detects and uses the best available SIMD instructions:

| Instruction Set | Vector Width | Runtime Support |
| --- | --- | --- |
| AVX-512 | 512-bit (16 floats) | .NET 8+ |
| AVX2 + FMA | 256-bit (8 floats) | .NET 6+ |
| AVX | 256-bit (8 floats) | .NET 6+ |
| SSE4.2 | 128-bit (4 floats) | .NET 6+ |
| ARM NEON | 128-bit (4 floats) | .NET 6+ |
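The same CPU features can also be queried directly from the .NET runtime's own intrinsics API, independent of this library:

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;
using System.Runtime.Intrinsics.Arm;

// Standard .NET feature flags; each is true only when the current
// CPU and runtime support that instruction set.
Console.WriteLine($"AVX2:     {Avx2.IsSupported}");
Console.WriteLine($"FMA:      {Fma.IsSupported}");
Console.WriteLine($"AVX-512F: {Avx512F.IsSupported}");   // exposed in .NET 8+
Console.WriteLine($"SSE4.2:   {Sse42.IsSupported}");
Console.WriteLine($"ARM NEON: {AdvSimd.IsSupported}");
Console.WriteLine($"Vector256 accelerated: {Vector256.IsHardwareAccelerated}");
```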

Check Available Acceleration

using AiDotNet.Tensors.Engines;

var caps = PlatformDetector.Capabilities;

// SIMD capabilities
Console.WriteLine($"AVX2: {caps.HasAVX2}");
Console.WriteLine($"AVX-512: {caps.HasAVX512F}");

// GPU support
Console.WriteLine($"CUDA: {caps.HasCudaSupport}");
Console.WriteLine($"OpenCL: {caps.HasOpenCLSupport}");

// Native library availability
Console.WriteLine($"OpenBLAS: {caps.HasOpenBlas}");
Console.WriteLine($"CLBlast: {caps.HasClBlast}");

// Or get a full status summary
Console.WriteLine(NativeLibraryDetector.GetStatusSummary());

Optional Acceleration Packages

AiDotNet.Native.OpenBLAS

Provides optimized CPU BLAS operations using OpenBLAS:

dotnet add package AiDotNet.Native.OpenBLAS

Performance: Accelerated BLAS operations for matrix multiply and decompositions.

AiDotNet.Native.CLBlast

Provides GPU acceleration via OpenCL (works on AMD, Intel, and NVIDIA GPUs):

dotnet add package AiDotNet.Native.CLBlast

Performance: 10x+ faster for large matrix operations on GPU.

AiDotNet.Native.CUDA

Provides GPU acceleration via NVIDIA CUDA (NVIDIA GPUs only):

dotnet add package AiDotNet.Native.CUDA

Performance: 30,000+ GFLOPS for matrix operations on modern NVIDIA GPUs.

Requirements:

  • NVIDIA GPU (GeForce, Quadro, or Tesla)
  • NVIDIA display driver 525.60+ (includes CUDA driver)

Usage with helpful error messages:

using AiDotNet.Tensors.Engines.DirectGpu.CUDA;

// Recommended: throws beginner-friendly exception if CUDA unavailable
using var cuda = CudaBackend.CreateOrThrow();

// Or check availability first
if (CudaBackend.IsCudaAvailable)
{
    using var backend = new CudaBackend();
    // Use CUDA acceleration
}

If CUDA is not available, you'll get detailed troubleshooting steps explaining exactly what's missing and how to fix it.

Requirements

  • .NET 10.0 or .NET Framework 4.7.1+
  • Windows x64, Linux x64, or macOS x64/arm64

License

Apache 2.0 - See LICENSE for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Compatible and computed target frameworks:

.NET: net10.0 (compatible); net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows (computed)
.NET Framework: net471 (compatible); net472, net48, net481 (computed)

NuGet packages (1)

One NuGet package depends on AiDotNet.Tensors:

AiDotNet: a preview library that will eventually showcase the latest AI breakthroughs and bring them to the .NET community.

GitHub repositories

This package is not used by any popular GitHub repositories.

| Version | Downloads | Last Updated |
| --- | --- | --- |
| 0.9.3 | 282 | 3/10/2026 |
| 0.9.2 | 79 | 3/9/2026 |
| 0.9.1 | 236 | 3/5/2026 |
| 0.9.0 | 77 | 3/5/2026 |
| 0.8.0 | 448 | 2/28/2026 |
| 0.7.0 | 1,576 | 2/10/2026 |
| 0.6.0 | 96 | 2/10/2026 |
| 0.5.0 | 777 | 1/30/2026 |
| 0.4.0 | 553 | 1/21/2026 |
| 0.3.0 | 102 | 1/21/2026 |
| 0.2.0 | 107 | 1/21/2026 |
| 0.1.0 | 172 | 1/21/2026 |