DotCompute.Generators
0.6.2
Source generators for the DotCompute framework that enable compile-time code generation for high-performance compute kernels.
Overview
The DotCompute.Generators project provides Roslyn-based source generators that automatically generate optimized backend-specific implementations for compute kernels marked with the [Kernel] or [RingKernel] attributes.
Features
1. KernelSourceGenerator
- Incremental source generator using IIncrementalGenerator for optimal performance
- Detects methods marked with [Kernel] and [RingKernel] attributes
- Generates backend-specific implementations (CPU, CUDA, Metal, OpenCL)
- Creates a kernel registry for runtime dispatch
- Supports SIMD vectorization, parallel execution, and persistent kernels
- Generates message queue infrastructure for Ring Kernels
2. KernelCompilationAnalyzer
- Compile-time diagnostics for kernel methods
- Validates parameter types and vector sizes
- Detects performance issues (nested loops, allocations in loops)
- Ensures unsafe context for optimal performance
- Provides actionable error messages
3. Backend Code Generators
- CpuCodeGenerator: Generates optimized CPU implementations with:
  - Scalar fallback implementation
  - Platform-agnostic SIMD using Vector<T>
  - AVX2 optimizations for x86/x64
  - AVX-512 optimizations for latest processors
  - Parallel execution with task partitioning
  - Automatic hardware capability detection
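To make the SIMD-plus-scalar-fallback idea concrete, here is a minimal hand-written sketch of an element-wise add that uses Vector<T> when the runtime reports hardware acceleration and a scalar loop otherwise. It is not the output of CpuCodeGenerator; the class name VectorAddSketch and its method signature are assumptions made for illustration, and only standard System.Numerics APIs are used.

```csharp
using System;
using System.Numerics;

// Hypothetical illustration; not code emitted by DotCompute.Generators.
public static class VectorAddSketch
{
    // Adds a and b element-wise into result.
    public static void Add(ReadOnlySpan<float> a, ReadOnlySpan<float> b, Span<float> result)
    {
        if (a.Length != b.Length || a.Length != result.Length)
            throw new ArgumentException("All spans must have the same length.");

        int i = 0;

        // Vector<float>.Count is the number of lanes the JIT can process at once.
        // Without hardware acceleration the scalar loop below handles everything.
        if (Vector.IsHardwareAccelerated)
        {
            int width = Vector<float>.Count;
            for (; i <= a.Length - width; i += width)
            {
                var va = new Vector<float>(a.Slice(i, width));
                var vb = new Vector<float>(b.Slice(i, width));
                (va + vb).CopyTo(result.Slice(i, width));
            }
        }

        // Scalar fallback: processes the remaining tail elements.
        for (; i < a.Length; i++)
            result[i] = a[i] + b[i];
    }
}
```

Calling VectorAddSketch.Add(a, b, result) with three float arrays of equal length fills result with the pairwise sums, whatever the SIMD width of the machine.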
Installation
dotnet add package DotCompute.Generators --version 0.6.2
Usage
1. Add the Generator to Your Project
<ItemGroup>
  <ProjectReference Include="..\..\src\DotCompute.Generators\DotCompute.Generators.csproj"
                    OutputItemType="Analyzer"
                    ReferenceOutputAssembly="false" />
</ItemGroup>
2. Mark Methods with Kernel or RingKernel Attribute
using System; // Span<T>
using DotCompute.Generators.Kernel;

// Standard kernel for one-shot execution
public static unsafe class VectorMath
{
    [Kernel(
        Backends = KernelBackends.CPU | KernelBackends.CUDA,
        VectorSize = 8,
        IsParallel = true,
        Optimizations = OptimizationHints.AggressiveInlining | OptimizationHints.Vectorize)]
    public static void AddVectors(float* a, float* b, float* result, int length)
    {
        for (int i = 0; i < length; i++)
        {
            result[i] = a[i] + b[i];
        }
    }
}

// Ring kernel for persistent GPU-resident computation
public static class GraphAlgorithms
{
    [RingKernel(
        KernelId = "pagerank-vertex",
        Domain = RingKernelDomain.GraphAnalytics,
        Mode = RingKernelMode.Persistent,
        Capacity = 16384, // must be a power of 2 (see Configuration Options)
        Backends = KernelBackends.CUDA | KernelBackends.OpenCL)]
    public static void PageRankVertex(
        IMessageQueue<VertexMessage> incoming,
        IMessageQueue<VertexMessage> outgoing,
        Span<float> pageRank)
    {
        int vertexId = Kernel.ThreadId.X;

        // Drain incoming messages and accumulate rank for this vertex
        while (incoming.TryDequeue(out var msg))
        {
            if (msg.TargetVertex == vertexId)
                pageRank[vertexId] += msg.Rank;
        }

        // Send to neighbors...
    }
}
3. Generated Code
The source generator will create:
Kernel Registry (KernelRegistry.g.cs):
- Catalog of all kernels with metadata
- Runtime lookup capabilities
- Backend support information
CPU Implementation (AddVectors_CPU.g.cs):
- Multiple implementations (scalar, SIMD, AVX2, AVX-512)
- Automatic hardware detection and dispatch
- Parallel execution support
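As a rough picture of what parallel execution with range partitioning means on the CPU, the sketch below splits the index space into contiguous chunks and lets the thread pool process each chunk. It is a hand-written illustration built on the standard Partitioner and Parallel APIs; the class name ParallelAddSketch is an assumption, and the code the generator actually emits may be organised differently.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical illustration; not code emitted by DotCompute.Generators.
public static class ParallelAddSketch
{
    public static void Add(float[] a, float[] b, float[] result)
    {
        if (a.Length != b.Length || a.Length != result.Length)
            throw new ArgumentException("All arrays must have the same length.");

        // Partitioner.Create rejects an empty range, so return early.
        if (a.Length == 0)
            return;

        // Split [0, length) into contiguous index ranges.
        var ranges = Partitioner.Create(0, a.Length);

        // Each worker receives whole ranges, so the inner loop stays a tight
        // sequential loop over adjacent memory.
        Parallel.ForEach(ranges, range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
                result[i] = a[i] + b[i];
        });
    }
}
```

Handing out ranges rather than single indices keeps scheduling overhead low and leaves each range free to use a vectorized inner loop.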
Kernel Invoker (VectorMathInvoker.g.cs):
- Dynamic dispatch based on backend
- Parameter validation
- Type-safe invocation
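The registry and invoker roles described above can be pictured with a small stand-alone sketch: a catalog keyed by kernel name, a flags enum recording which backends each kernel supports, and an invoke method that validates its arguments before dispatching. Every type and member name here (BackendsSketch, KernelEntrySketch, KernelRegistrySketch) is hypothetical; the real generated KernelRegistry and invoker classes are not shown in this README and may look quite different.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for illustration only.
[Flags]
public enum BackendsSketch { None = 0, Cpu = 1, Cuda = 2, Metal = 4, OpenCL = 8 }

public sealed record KernelEntrySketch(
    string Name,
    BackendsSketch Backends,
    Action<float[], float[], float[]> CpuImpl);

public static class KernelRegistrySketch
{
    private static readonly Dictionary<string, KernelEntrySketch> Entries = new();

    // Catalog: one entry per kernel, keyed by name.
    public static void Register(KernelEntrySketch entry) => Entries[entry.Name] = entry;

    public static void Invoke(string name, BackendsSketch backend,
                              float[] a, float[] b, float[] result)
    {
        // Runtime lookup.
        if (!Entries.TryGetValue(name, out var entry))
            throw new KeyNotFoundException($"Unknown kernel '{name}'.");

        // Backend support check against the kernel's declared flags.
        if ((entry.Backends & backend) == 0)
            throw new NotSupportedException($"Kernel '{name}' does not support {backend}.");

        // Parameter validation before dispatch.
        if (a.Length != b.Length || a.Length != result.Length)
            throw new ArgumentException("All buffers must have the same length.");

        // Only the CPU path exists in this sketch.
        if (backend != BackendsSketch.Cpu)
            throw new NotSupportedException("Only the CPU path is implemented in this sketch.");

        entry.CpuImpl(a, b, result);
    }
}
```

Registering an entry with a CPU delegate and then calling Invoke with BackendsSketch.Cpu runs that delegate; requesting a backend the entry did not declare throws NotSupportedException.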
Kernel Attribute Options
Standard Kernel Attributes
Backends
- CPU: CPU backend with SIMD support
- CUDA: NVIDIA GPU backend
- Metal: Apple GPU backend
- OpenCL: Cross-platform GPU backend
- All: All available backends
Optimization Hints
- AggressiveInlining: Force method inlining
- LoopUnrolling: Unroll loops for better performance
- Vectorize: Enable SIMD vectorization
- Prefetch: Add memory prefetch hints
- FastMath: Use fast math operations (may reduce accuracy)
Memory Access Patterns
- Sequential: Linear memory access
- Strided: Fixed-stride memory access
- Random: Random memory access
- Coalesced: GPU-optimized coalesced access
- Tiled: Tiled/blocked memory access
RingKernel Attribute Options
Ring Kernels enable persistent GPU computation with message passing capabilities:
Execution Modes
- Persistent: Kernel stays active continuously, ideal for streaming workloads
- EventDriven: Kernel launches on-demand when messages arrive, conserves resources
Message Passing Strategies
- SharedMemory: Lock-free queues in GPU shared memory (fastest for single-GPU)
- AtomicQueue: Lock-free queues in global memory with atomics (scalable)
- P2P: Direct GPU-to-GPU memory transfers (CUDA only, requires NVLink)
- NCCL: Multi-GPU collectives (CUDA only, optimal for distributed workloads)
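For readers unfamiliar with lock-free ring queues, the following CPU-side analogy shows the basic mechanism: a fixed-size slot array, monotonically increasing head and tail counters, and a power-of-two capacity so that the slot index is a cheap bit mask. It is a single-producer, single-consumer sketch written for this README, not the GPU queue that DotCompute generates; the class name SpscRingQueueSketch is an assumption, and a multi-producer GPU queue would need atomic compare-and-swap operations instead of plain volatile writes.

```csharp
using System;
using System.Threading;

// Hypothetical CPU analogy: safe for exactly one producer thread and one consumer thread.
public sealed class SpscRingQueueSketch<T>
{
    private readonly T[] _slots;
    private readonly int _mask;
    private long _head; // next slot to read; written only by the consumer
    private long _tail; // next slot to write; written only by the producer

    public SpscRingQueueSketch(int capacity)
    {
        if (capacity <= 0 || (capacity & (capacity - 1)) != 0)
            throw new ArgumentException("Capacity must be a positive power of 2.", nameof(capacity));
        _slots = new T[capacity];
        _mask = capacity - 1; // index = counter & mask, valid only for powers of 2
    }

    public bool TryEnqueue(T item)
    {
        long tail = _tail;
        long head = Volatile.Read(ref _head);
        if (tail - head == _slots.Length)
            return false; // queue is full

        _slots[(int)(tail & _mask)] = item;
        Volatile.Write(ref _tail, tail + 1); // publish only after the slot is written
        return true;
    }

    public bool TryDequeue(out T item)
    {
        long head = _head;
        long tail = Volatile.Read(ref _tail);
        if (head == tail)
        {
            item = default!;
            return false; // queue is empty
        }

        item = _slots[(int)(head & _mask)];
        Volatile.Write(ref _head, head + 1); // release the slot to the producer
        return true;
    }
}
```

The TryDequeue(out var msg) shape mirrors the call used in the PageRankVertex example, which is the only part of the queue interface this README shows.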
Application Domains
- General: No domain-specific optimizations
- GraphAnalytics: Optimized for irregular memory access patterns (graph algorithms)
- SpatialSimulation: Optimized for regular access with halo exchange (physics, fluids)
- ActorModel: Optimized for message-heavy workloads with dynamic distribution
Configuration Options
- KernelId: Unique identifier for the kernel (required)
- Capacity: Maximum concurrent work items (default: 1024, must be power of 2)
- InputQueueSize: Size of incoming message queue (default: 256, must be power of 2)
- OutputQueueSize: Size of outgoing message queue (default: 256, must be power of 2)
- GridDimensions: Number of thread blocks per dimension (auto-calculated if null)
- BlockDimensions: Threads per block per dimension (auto-selected if null)
- UseSharedMemory: Enable shared memory for thread-block coordination
- SharedMemorySize: Shared memory size in bytes per block
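Because Capacity, InputQueueSize, and OutputQueueSize must be powers of 2, it can help to check or round a requested size before putting it in the attribute. The helper below is a small sketch using System.Numerics.BitOperations (available from .NET 6); the class name QueueSizeSketch is an assumption, and nothing here is part of the DotCompute API.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for choosing attribute values; not part of DotCompute.
public static class QueueSizeSketch
{
    // True when size is a positive power of 2.
    public static bool IsValid(int size) => BitOperations.IsPow2(size);

    // Smallest power of 2 that is >= size.
    public static int RoundUp(int size)
    {
        if (size <= 0 || size > (1 << 30))
            throw new ArgumentOutOfRangeException(nameof(size));
        return (int)BitOperations.RoundUpToPowerOf2((uint)size);
    }
}
```

For example, a requested capacity of 10000 is not valid under this rule, and RoundUp(10000) yields 16384, while RoundUp(1024) leaves the default unchanged.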
Analyzer Diagnostics
| ID | Severity | Description |
|---|---|---|
| DC0001 | Error | Unsupported type in kernel |
| DC0002 | Error | Kernel method missing buffer parameter |
| DC0003 | Error | Invalid vector size (must be 4, 8, or 16) |
| DC0004 | Warning | Unsafe code context required |
| DC0005 | Warning | Potential performance issue |
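As a simplified picture of the kind of rule behind DC0003, the sketch below checks a vector size against the allowed values from the table. The real KernelCompilationAnalyzer is a Roslyn analyzer that inspects attribute syntax and reports diagnostics through the compiler; this plain function, including the name VectorSizeCheckSketch, is only an assumed stand-in for the rule itself.

```csharp
using System;

// Hypothetical stand-in for the DC0003 rule; the real check runs inside a Roslyn analyzer.
public static class VectorSizeCheckSketch
{
    // Returns true when the size is allowed; otherwise sets diagnosticId to "DC0003".
    public static bool IsValid(int vectorSize, out string diagnosticId)
    {
        bool ok = vectorSize is 4 or 8 or 16;
        diagnosticId = ok ? string.Empty : "DC0003";
        return ok;
    }
}
```

With this rule, the VectorSize = 8 used in the AddVectors example passes, while a value such as 12 would be flagged.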
Architecture
DotCompute.Generators/
├── Kernel/
│ ├── KernelSourceGenerator.cs # Main generator
│ ├── KernelAttribute.cs # Attribute definitions
│ ├── KernelCompilationAnalyzer.cs # Compile-time analysis
│ └── AcceleratorType.cs # Backend enum
├── Backend/
│ └── CpuCodeGenerator.cs # CPU code generation
├── Models/
│ ├── KernelParameter.cs # Parameter model
│ └── VectorizationInfo.cs # Vectorization analysis model
├── Configuration/
│ └── GeneratorConfiguration.cs # Generator configuration
└── Utils/
├── SourceGeneratorHelpers.cs # Legacy facade (deprecated)
├── CodeFormatter.cs # Code formatting utilities
├── ParameterValidator.cs # Parameter validation
├── LoopOptimizer.cs # Loop optimization
├── VectorizationAnalyzer.cs # Vectorization analysis
├── MethodBodyExtractor.cs # Method body extraction
└── SimdTypeMapper.cs # SIMD type mapping
Future Enhancements
CUDA Code Generation
- PTX generation for NVIDIA GPUs
- Shared memory optimization
- Warp-level primitives
Metal Shader Generation
- Metal Shading Language generation
- Compute pipeline setup
- Resource binding
OpenCL Kernel Generation
- OpenCL C kernel generation
- Work-group optimization
- Memory coalescing
Advanced Optimizations
- Auto-vectorization analysis
- Loop fusion and tiling
- Memory layout optimization
- Cache-aware algorithms
Debugging Support
- Source maps for generated code
- Performance counters injection
- Validation code generation
Development Notes
- The generator targets .NET Standard 2.0 for compatibility
- Uses incremental generation for optimal IDE performance
- Follows Roslyn best practices for analyzers
- Includes comprehensive unit tests (see tests project)
Documentation & Resources
Comprehensive documentation is available for DotCompute:
Architecture Documentation
- Source Generators - Compile-time code generation (12 diagnostic rules, 5 automated fixes)
- System Overview - Generator integration in architecture
Developer Guides
- Getting Started - Using [Kernel] attributes
- Kernel Development - Writing kernels with attributes
- Native AOT Guide - Native AOT compatibility
Reference
- Diagnostic Rules (DC001-DC012) - Complete analyzer reference with automated fixes
Examples
- Basic Vector Operations - [Kernel] attribute usage examples
API Documentation
- API Reference - Complete API documentation
Support
- Documentation: Comprehensive Guides
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- .NETStandard 2.0: No dependencies.
NuGet packages (2)
Showing the top 2 NuGet packages that depend on DotCompute.Generators:
| Package | Description |
|---|---|
| DotCompute.Backends.CUDA | Production-ready NVIDIA CUDA GPU backend for DotCompute. Provides GPU acceleration (21-92x speedup) through CUDA with NVRTC compilation, P2P transfers, Ring Kernels with NCCL support, and unified memory. Requires CUDA 12.0+ and Compute Capability 5.0+ NVIDIA GPU. Benchmarked on RTX 2000 Ada (CC 8.9). |
| Orleans.GpuBridge.Backends.DotCompute | DotCompute backend provider for Orleans.GpuBridge.Core. Enables GPU acceleration via CUDA, OpenCL, Metal, and CPU with attribute-based kernel definition. |