DotCompute.Generators 0.6.2


DotCompute.Generators

Source generators for the DotCompute framework that enable compile-time code generation for high-performance compute kernels.

Overview

The DotCompute.Generators project provides Roslyn-based source generators that automatically generate optimized backend-specific implementations for compute kernels marked with the [Kernel] or [RingKernel] attributes.

Features

1. KernelSourceGenerator

  • Incremental source generator using IIncrementalGenerator for optimal performance
  • Detects methods marked with [Kernel] and [RingKernel] attributes
  • Generates backend-specific implementations (CPU, CUDA, Metal, OpenCL)
  • Creates a kernel registry for runtime dispatch
  • Supports SIMD vectorization, parallel execution, and persistent kernels
  • Generates message queue infrastructure for Ring Kernels

2. KernelCompilationAnalyzer

  • Compile-time diagnostics for kernel methods
  • Validates parameter types and vector sizes
  • Detects performance issues (nested loops, allocations in loops)
  • Warns when a kernel is not declared in an unsafe context (DC0004)
  • Provides actionable error messages

3. Backend Code Generators

  • CpuCodeGenerator: Generates optimized CPU implementations with:
    • Scalar fallback implementation
    • Platform-agnostic SIMD using Vector<T>
    • AVX2 optimizations for x86/x64
    • AVX-512 optimizations for processors that support it
    • Parallel execution with task partitioning
    • Automatic hardware capability detection
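The layering listed above (scalar fallback, portable SIMD, partitioned parallel execution, capability check) can be pictured with a hand-written sketch. This is not the code the generator emits: the class and method names are hypothetical, the AVX2 and AVX-512 paths are left out, and only standard .NET APIs (Vector<T>, Parallel.For) are used.

```csharp
using System;
using System.Numerics;
using System.Threading.Tasks;

// Illustrative stand-in for a generated CPU kernel; all names are hypothetical.
public static class AddVectorsCpuSketch
{
    // Scalar fallback: correct on any hardware.
    public static void AddScalar(ReadOnlySpan<float> a, ReadOnlySpan<float> b, Span<float> result)
    {
        for (int i = 0; i < result.Length; i++)
            result[i] = a[i] + b[i];
    }

    // Portable SIMD: Vector<float>.Count lanes per iteration, then a scalar tail.
    public static void AddSimd(ReadOnlySpan<float> a, ReadOnlySpan<float> b, Span<float> result)
    {
        int width = Vector<float>.Count;
        int i = 0;
        for (; i <= result.Length - width; i += width)
        {
            var sum = new Vector<float>(a.Slice(i, width)) + new Vector<float>(b.Slice(i, width));
            sum.CopyTo(result.Slice(i, width));
        }
        // Remaining elements that do not fill a whole vector.
        AddScalar(a.Slice(i), b.Slice(i), result.Slice(i));
    }

    // Capability check: pick the SIMD path only when the runtime accelerates Vector<T>.
    public static void Add(ReadOnlySpan<float> a, ReadOnlySpan<float> b, Span<float> result)
    {
        if (a.Length != result.Length || b.Length != result.Length)
            throw new ArgumentException("All buffers must have the same length.");

        if (Vector.IsHardwareAccelerated)
            AddSimd(a, b, result);
        else
            AddScalar(a, b, result);
    }

    // Parallel execution: split the range into fixed-size chunks, one work item per chunk.
    public static void AddParallel(float[] a, float[] b, float[] result, int chunkSize = 4096)
    {
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (a.Length != result.Length || b.Length != result.Length)
            throw new ArgumentException("All buffers must have the same length.");

        int chunks = (result.Length + chunkSize - 1) / chunkSize;
        Parallel.For(0, chunks, c =>
        {
            int start = c * chunkSize;
            int length = Math.Min(chunkSize, result.Length - start);
            Add(a.AsSpan(start, length), b.AsSpan(start, length), result.AsSpan(start, length));
        });
    }
}
```

The chunk size of 4096 is an arbitrary choice for the sketch; a real implementation would likely tune it per kernel and per machine.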

Installation

dotnet add package DotCompute.Generators --version 0.6.2

Usage

1. Add the Generator to Your Project

<ItemGroup>
  <ProjectReference Include="..\..\src\DotCompute.Generators\DotCompute.Generators.csproj" 
                    OutputItemType="Analyzer" 
                    ReferenceOutputAssembly="false" />
</ItemGroup>

2. Mark Methods with Kernel or RingKernel Attribute

using System;                          // Span<T>
using DotCompute.Generators.Kernel;

// Standard kernel for one-shot execution
public static unsafe class VectorMath
{
    [Kernel(
        Backends = KernelBackends.CPU | KernelBackends.CUDA,
        VectorSize = 8,
        IsParallel = true,
        Optimizations = OptimizationHints.AggressiveInlining | OptimizationHints.Vectorize)]
    public static void AddVectors(float* a, float* b, float* result, int length)
    {
        for (int i = 0; i < length; i++)
        {
            result[i] = a[i] + b[i];
        }
    }
}

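// Ring kernel for persistent GPU-resident computation.
// VertexMessage is not defined in this snippet; it is assumed to be a message
// type that exposes TargetVertex and Rank.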
public static class GraphAlgorithms
{
    [RingKernel(
        KernelId = "pagerank-vertex",
        Domain = RingKernelDomain.GraphAnalytics,
        Mode = RingKernelMode.Persistent,
        Capacity = 16384, // must be a power of 2
        Backends = KernelBackends.CUDA | KernelBackends.OpenCL)]
    public static void PageRankVertex(
        IMessageQueue<VertexMessage> incoming,
        IMessageQueue<VertexMessage> outgoing,
        Span<float> pageRank)
    {
        int vertexId = Kernel.ThreadId.X;

        while (incoming.TryDequeue(out var msg))
        {
            if (msg.TargetVertex == vertexId)
                pageRank[vertexId] += msg.Rank;
        }

        // Send to neighbors...
    }
}

3. Generated Code

The source generator will create:

  1. Kernel Registry (KernelRegistry.g.cs):

    • Catalog of all kernels with metadata
    • Runtime lookup capabilities
    • Backend support information
  2. CPU Implementation (AddVectors_CPU.g.cs):

    • Multiple implementations (scalar, SIMD, AVX2, AVX-512)
    • Automatic hardware detection and dispatch
    • Parallel execution support
  3. Kernel Invoker (VectorMathInvoker.g.cs):

    • Dynamic dispatch based on backend
    • Parameter validation
    • Type-safe invocation
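The generated files themselves are not reproduced here. As a rough mental model only, a registry with backend metadata plus an invoker that validates parameters before dispatching could look like the sketch below. Every type and member name in it is hypothetical and was chosen for illustration; the real KernelRegistry.g.cs and VectorMathInvoker.g.cs may be structured quite differently.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical types for illustration; not the generated API.
[Flags]
public enum BackendSketch { None = 0, Cpu = 1, Cuda = 2, Metal = 4, OpenCl = 8 }

public sealed class KernelEntrySketch
{
    public KernelEntrySketch(string name, BackendSketch backends, Action<float[], float[], float[]> cpu)
    {
        Name = name;
        Backends = backends;
        Cpu = cpu;
    }

    public string Name { get; }
    public BackendSketch Backends { get; }
    public Action<float[], float[], float[]> Cpu { get; }
}

public static class KernelRegistrySketch
{
    // Catalog of kernels keyed by name.
    private static readonly Dictionary<string, KernelEntrySketch> Entries =
        new Dictionary<string, KernelEntrySketch>(StringComparer.Ordinal);

    public static void Register(KernelEntrySketch entry) => Entries[entry.Name] = entry;

    // Backend support lookup.
    public static bool Supports(string name, BackendSketch backend) =>
        Entries.TryGetValue(name, out var entry) && (entry.Backends & backend) == backend;

    // Invoker: look up the kernel, validate parameters, then dispatch to the CPU entry point.
    public static void InvokeCpu(string name, float[] a, float[] b, float[] result)
    {
        if (!Entries.TryGetValue(name, out var entry))
            throw new KeyNotFoundException($"Unknown kernel '{name}'.");
        if ((entry.Backends & BackendSketch.Cpu) == 0)
            throw new NotSupportedException($"Kernel '{name}' has no CPU implementation.");
        if (a.Length != result.Length || b.Length != result.Length)
            throw new ArgumentException("All buffers must have the same length.");

        entry.Cpu(a, b, result);
    }
}
```

Generating the registry at compile time rather than discovering kernels by reflection is what lets lookups like these stay cheap at runtime.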

Kernel Attribute Options

Standard Kernel Attributes

Backends

  • CPU: CPU backend with SIMD support
  • CUDA: NVIDIA GPU backend
  • Metal: Apple GPU backend
  • OpenCL: Cross-platform GPU backend
  • All: All available backends

Optimization Hints

  • AggressiveInlining: Force method inlining
  • LoopUnrolling: Unroll loops for better performance
  • Vectorize: Enable SIMD vectorization
  • Prefetch: Add memory prefetch hints
  • FastMath: Use fast math operations (may reduce accuracy)

Memory Access Patterns

  • Sequential: Linear memory access
  • Strided: Fixed-stride memory access
  • Random: Random memory access
  • Coalesced: GPU-optimized coalesced access
  • Tiled: Tiled/blocked memory access
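To make the pattern names concrete, here is a small plain C# sketch of sequential, strided, and tiled traversal over a row-major array. It only illustrates the access orders; it does not use the DotCompute attributes, and the class and method names are hypothetical.

```csharp
using System;

// Illustrative only: three traversal orders over a flat, row-major float array.
public static class AccessPatternSketch
{
    // Sequential: visit consecutive elements.
    public static float SumSequential(float[] data)
    {
        float sum = 0f;
        for (int i = 0; i < data.Length; i++)
            sum += data[i];
        return sum;
    }

    // Strided: visit every stride-th element, starting at index 0.
    public static float SumStrided(float[] data, int stride)
    {
        if (stride <= 0)
            throw new ArgumentOutOfRangeException(nameof(stride));

        float sum = 0f;
        for (int i = 0; i < data.Length; i += stride)
            sum += data[i];
        return sum;
    }

    // Tiled: transpose a rows x cols matrix one tile x tile block at a time,
    // so reads and writes stay within a small working set.
    public static void TransposeTiled(float[] src, float[] dst, int rows, int cols, int tile = 16)
    {
        if (tile <= 0)
            throw new ArgumentOutOfRangeException(nameof(tile));
        if (src.Length != rows * cols || dst.Length != rows * cols)
            throw new ArgumentException("Buffers must hold rows * cols elements.");

        for (int r0 = 0; r0 < rows; r0 += tile)
        {
            for (int c0 = 0; c0 < cols; c0 += tile)
            {
                int rMax = Math.Min(r0 + tile, rows);
                int cMax = Math.Min(c0 + tile, cols);
                for (int r = r0; r < rMax; r++)
                    for (int c = c0; c < cMax; c++)
                        dst[c * rows + r] = src[r * cols + c];
            }
        }
    }
}
```

Coalesced access is a GPU notion (neighbouring threads touching neighbouring addresses) and has no direct single-threaded equivalent, so it is not shown.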

RingKernel Attribute Options

Ring Kernels enable persistent GPU computation with message passing capabilities:

Execution Modes

  • Persistent: Kernel stays active continuously, ideal for streaming workloads
  • EventDriven: Kernel launches on-demand when messages arrive, conserves resources

Message Passing Strategies

  • SharedMemory: Lock-free queues in GPU shared memory (fastest for single-GPU)
  • AtomicQueue: Lock-free queues in global memory with atomics (scalable)
  • P2P: Direct GPU-to-GPU memory transfers (CUDA only, requires NVLink)
  • NCCL: Multi-GPU collectives (CUDA only, optimal for distributed workloads)
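All of these strategies rest on fixed-size queues. The following single-threaded C# sketch shows the basic ring-buffer idea with a power-of-two capacity, where the slot index is computed with a bit mask instead of a modulo. It is a teaching aid under stated assumptions: the class name is hypothetical, it is not lock-free, and it says nothing about how DotCompute lays out its GPU queues.

```csharp
using System;

// Illustrative single-threaded ring buffer; a GPU or multi-threaded queue
// would need atomic operations on the head and tail counters.
public sealed class RingQueueSketch<T>
{
    private readonly T[] _slots;
    private readonly int _mask;
    private int _head; // total number of items dequeued so far
    private int _tail; // total number of items enqueued so far

    public RingQueueSketch(int capacity)
    {
        if (capacity <= 0 || (capacity & (capacity - 1)) != 0)
            throw new ArgumentException("Capacity must be a power of 2.", nameof(capacity));

        _slots = new T[capacity];
        _mask = capacity - 1; // valid only because capacity is a power of 2
    }

    public int Count => _tail - _head;

    public bool TryEnqueue(T item)
    {
        if (_tail - _head == _slots.Length)
            return false; // queue is full

        _slots[_tail & _mask] = item;
        _tail++;
        return true;
    }

    public bool TryDequeue(out T item)
    {
        if (_head == _tail)
        {
            item = default!;
            return false; // queue is empty
        }

        item = _slots[_head & _mask];
        _head++;
        return true;
    }
}
```

Masking is one common reason queue sizes are restricted to powers of two; whether that is the motivation here is an assumption.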

Application Domains

  • General: No domain-specific optimizations
  • GraphAnalytics: Optimized for irregular memory access patterns (graph algorithms)
  • SpatialSimulation: Optimized for regular access with halo exchange (physics, fluids)
  • ActorModel: Optimized for message-heavy workloads with dynamic distribution

Configuration Options

  • KernelId: Unique identifier for the kernel (required)
  • Capacity: Maximum concurrent work items (default: 1024, must be power of 2)
  • InputQueueSize: Size of incoming message queue (default: 256, must be power of 2)
  • OutputQueueSize: Size of outgoing message queue (default: 256, must be power of 2)
  • GridDimensions: Number of thread blocks per dimension (auto-calculated if null)
  • BlockDimensions: Threads per block per dimension (auto-selected if null)
  • UseSharedMemory: Enable shared memory for thread-block coordination
  • SharedMemorySize: Shared memory size in bytes per block
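Because Capacity and both queue sizes must be powers of 2, it can help to check or round a desired value before putting it in the attribute. The helper below is a hypothetical convenience, not part of the package; it uses System.Numerics.BitOperations, which requires .NET 6 or later.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for choosing values that satisfy the power-of-2 rule.
public static class QueueSizeSketch
{
    // True for 1, 2, 4, 8, ...; false for zero, negatives, and everything else.
    public static bool IsValid(int size) => size > 0 && BitOperations.IsPow2(size);

    // Smallest power of 2 that is greater than or equal to the requested size.
    public static int RoundUp(int requested)
    {
        if (requested <= 0 || requested > (1 << 30))
            throw new ArgumentOutOfRangeException(nameof(requested));

        return (int)BitOperations.RoundUpToPowerOf2((uint)requested);
    }
}
```

For example, a desired capacity of 10000 is not valid as written; rounding up gives 16384, while 256 and 1024 are returned unchanged. Attribute arguments must be compile-time constants, so the rounded value has to be written into the attribute by hand.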

Analyzer Diagnostics

ID      Severity  Description
DC0001  Error     Unsupported type in kernel
DC0002  Error     Kernel method missing buffer parameter
DC0003  Error     Invalid vector size (must be 4, 8, or 16)
DC0004  Warning   Unsafe code context required
DC0005  Warning   Potential performance issue
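Warnings such as DC0005 can be silenced for a specific method with the standard C# #pragma warning directive, which works for analyzer diagnostic IDs as well as compiler warnings. The sketch below assumes the nested loops would be reported as DC0005 and that the nesting is intentional; the class is hypothetical and the [Kernel] attribute is omitted so the snippet compiles without the package.

```csharp
// Illustrative only: a row-major n x n matrix multiply with intentionally nested loops.
public static class MatrixKernelsSketch
{
#pragma warning disable DC0005 // nested loops are deliberate here (assumed diagnostic ID)
    public static void Multiply(float[] a, float[] b, float[] c, int n)
    {
        for (int i = 0; i < n; i++)
        {
            for (int j = 0; j < n; j++)
            {
                float sum = 0f;
                for (int k = 0; k < n; k++)
                    sum += a[i * n + k] * b[k * n + j];
                c[i * n + j] = sum;
            }
        }
    }
#pragma warning restore DC0005
}
```

Restoring the warning right after the method keeps the suppression local, so other kernels in the same file are still analyzed. Error-level diagnostics (DC0001 to DC0003) should be fixed rather than suppressed.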

Architecture

DotCompute.Generators/
├── Kernel/
│   ├── KernelSourceGenerator.cs    # Main generator
│   ├── KernelAttribute.cs          # Attribute definitions
│   ├── KernelCompilationAnalyzer.cs # Compile-time analysis
│   └── AcceleratorType.cs          # Backend enum
├── Backend/
│   └── CpuCodeGenerator.cs         # CPU code generation
├── Models/
│   ├── KernelParameter.cs          # Parameter model
│   └── VectorizationInfo.cs        # Vectorization analysis model
├── Configuration/
│   └── GeneratorConfiguration.cs   # Generator configuration
└── Utils/
    ├── SourceGeneratorHelpers.cs   # Legacy facade (deprecated)
    ├── CodeFormatter.cs             # Code formatting utilities
    ├── ParameterValidator.cs       # Parameter validation
    ├── LoopOptimizer.cs            # Loop optimization
    ├── VectorizationAnalyzer.cs    # Vectorization analysis
    ├── MethodBodyExtractor.cs      # Method body extraction
    └── SimdTypeMapper.cs           # SIMD type mapping

Future Enhancements

  1. CUDA Code Generation

    • PTX generation for NVIDIA GPUs
    • Shared memory optimization
    • Warp-level primitives
  2. Metal Shader Generation

    • Metal Shading Language generation
    • Compute pipeline setup
    • Resource binding
  3. OpenCL Kernel Generation

    • OpenCL C kernel generation
    • Work-group optimization
    • Memory coalescing
  4. Advanced Optimizations

    • Auto-vectorization analysis
    • Loop fusion and tiling
    • Memory layout optimization
    • Cache-aware algorithms
  5. Debugging Support

    • Source maps for generated code
    • Performance counters injection
    • Validation code generation

Development Notes

  • The generator targets .NET Standard 2.0, which Roslyn requires for source generators and analyzers so they can load in both the dotnet CLI and Visual Studio
  • Uses incremental generation for optimal IDE performance
  • Follows Roslyn best practices for analyzers
  • Includes comprehensive unit tests (see tests project)

Frameworks

  • .NET Standard: netstandard2.0 is compatible; netstandard2.1 was computed.
  • Computed for: .NET (net5.0 through net10.0, including platform-specific targets), .NET Core (netcoreapp2.0 through netcoreapp3.1), .NET Framework (net461 through net481), MonoAndroid, MonoMac, MonoTouch, Tizen (tizen40, tizen60), Xamarin.iOS, Xamarin.Mac, Xamarin.TVOS, and Xamarin.WatchOS.

Dependencies

  • .NETStandard 2.0: No dependencies.


NuGet packages (2)

Showing the top 2 NuGet packages that depend on DotCompute.Generators:

Package Downloads
DotCompute.Backends.CUDA

Production-ready NVIDIA CUDA GPU backend for DotCompute. Provides GPU acceleration (21-92x speedup) through CUDA with NVRTC compilation, P2P transfers, Ring Kernels with NCCL support, and unified memory. Requires CUDA 12.0+ and Compute Capability 5.0+ NVIDIA GPU. Benchmarked on RTX 2000 Ada (CC 8.9).

Orleans.GpuBridge.Backends.DotCompute

DotCompute backend provider for Orleans.GpuBridge.Core - Enables GPU acceleration via CUDA, OpenCL, Metal, and CPU with attribute-based kernel definition.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version    Downloads  Last Updated
0.6.2      140        2/9/2026
0.5.3      313        2/2/2026
0.5.2      667        12/8/2025
0.5.1      627        11/28/2025
0.5.0      230        11/27/2025
0.4.2-rc2  298        11/11/2025
0.4.1-rc2  199        11/6/2025