DotCompute.Backends.OpenCL
Cross-platform OpenCL compute backend for .NET 9+ with GPU and accelerator support.
Status: ⚠️ EXPERIMENTAL
EXPERIMENTAL: This backend is functional for cross-platform GPU acceleration but has not been extensively production-tested across all vendor implementations. It works well for development and testing across NVIDIA, AMD, and Intel GPUs. Production use recommended only after validation on your target hardware.
The OpenCL backend provides cross-platform GPU acceleration:
- OpenCL Runtime Integration: P/Invoke bindings to OpenCL C API
- Device Management: Platform and device enumeration
- Context Management: OpenCL context creation and lifecycle
- Memory Management: Device memory allocation and transfers
- Kernel Compilation: Runtime kernel compilation from OpenCL C
- Ring Kernel Support: Persistent kernels with message passing
- Plugin Architecture: Integrated with DotCompute plugin system
- Cross-Vendor Support: NVIDIA, AMD, Intel, ARM Mali, Qualcomm Adreno
Key Components
OpenCL Accelerator
OpenCLAccelerator
Main accelerator implementation providing:
- Device initialization and management
- Kernel compilation and execution
- Memory allocation and synchronization
- OpenCL context lifecycle management
- Error handling and diagnostics
Device Management
OpenCLDeviceManager
Manages OpenCL devices:
- Platform enumeration
- Device discovery and selection
- Capability detection
- Device information queries
- Multi-device support
OpenCLDeviceInfo
Device information structure:
- Device name and vendor
- OpenCL version and driver version
- Memory sizes (global, local, constant)
- Compute capabilities (work group size, compute units)
- Image support and dimensions
- Device type (GPU, CPU, Accelerator)
OpenCLPlatformInfo
Platform information:
- Platform name and vendor
- OpenCL version support
- Available extensions
- Device count
Context and Execution
OpenCLContext
OpenCL context wrapper:
- Context creation from devices
- Command queue management
- Resource lifecycle
- Error handling
- Synchronization primitives
Memory Management
OpenCLMemoryManager
Unified memory manager for OpenCL:
- Device memory allocation
- Host-device memory transfers
- Buffer management
- Memory pooling support
- Synchronous and asynchronous operations
OpenCLMemoryBuffer
Buffer implementation:
- Device buffer allocation
- Read/write operations
- Zero-copy mapping when supported
- Rectangular buffer support
- Sub-buffer creation
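Rectangular operations address elements through row and slice pitches, following the layout used by clEnqueueReadBufferRect/clEnqueueWriteBufferRect. The helper below is illustrative (not part of the DotCompute API) and computes the byte offset of an element in a pitched 3D buffer:

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of element (x, y, z) in a pitched 3D buffer, matching the
 * addressing defined for clEnqueueReadBufferRect / clEnqueueWriteBufferRect:
 * offset = z * slice_pitch + y * row_pitch + x */
static size_t rect_offset(size_t x, size_t y, size_t z,
                          size_t row_pitch, size_t slice_pitch)
{
    return z * slice_pitch + y * row_pitch + x;
}
```

A row_pitch larger than the row's data width lets rows be padded for alignment without changing the addressing rule.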
Kernel Management
OpenCLCompiledKernel
Compiled kernel representation:
- Kernel compilation from OpenCL C source
- Argument binding
- Execution with work dimensions
- Local memory specification
- Synchronous and asynchronous execution
Ring Kernel Runtime
OpenCLRingKernelRuntime
Persistent kernel runtime for OpenCL:
- Launch and activation control
- Message queue management with atomic operations
- Status monitoring and metrics collection
- Deactivation and termination support
- Compatible with all OpenCL 1.2+ devices
OpenCLRingKernelCompiler
Ring kernel compilation for OpenCL:
- Generates OpenCL C code for persistent kernels
- Message queue implementation with atomics
- Control block for kernel lifecycle management
- Lock-free communication patterns
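The generated queue code itself is not shown here; as an illustration of the index arithmetic such a lock-free queue relies on, below is a minimal single-producer/single-consumer ring buffer in plain C. The plain head/tail increments stand in for the OpenCL atomic operations (atomic_inc and friends) the generated kernel would use, and the power-of-two capacity lets a mask replace a modulo:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CAPACITY 8u  /* must be a power of two */

typedef struct {
    uint32_t head;            /* next slot to dequeue */
    uint32_t tail;            /* next slot to enqueue */
    int      slots[CAPACITY];
} RingQueue;

static bool ring_enqueue(RingQueue *q, int value)
{
    if (q->tail - q->head == CAPACITY) return false;   /* full */
    q->slots[q->tail & (CAPACITY - 1)] = value;
    q->tail++;                                         /* atomic on device */
    return true;
}

static bool ring_dequeue(RingQueue *q, int *out)
{
    if (q->tail == q->head) return false;              /* empty */
    *out = q->slots[q->head & (CAPACITY - 1)];
    q->head++;                                         /* atomic on device */
    return true;
}
```

Because head and tail only ever grow, the full/empty tests work even when the 32-bit counters wrap.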
Factory
OpenCLAcceleratorFactory
Factory for creating OpenCL accelerators:
- Automatic device selection
- Configuration-based creation
- Workload profile matching
- Performance profile tuning
Native Interop
OpenCLRuntime
P/Invoke bindings to OpenCL C API:
- Platform and device functions
- Context and queue functions
- Memory object functions
- Kernel functions
- Event and synchronization functions
OpenCLTypes
Native type definitions:
- Platform and device IDs
- Context and queue handles
- Memory object handles
- Kernel handles
- Error codes and status types
OpenCLException
Exception type for OpenCL errors:
- Error code mapping
- Human-readable error messages
- Stack trace preservation
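The error-code mapping is essentially a table from OpenCL status codes to names. A small C sketch covering a handful of standard codes from CL/cl.h (OpenCLException covers the full set):

```c
#include <assert.h>
#include <string.h>

/* Map common OpenCL status codes (values from CL/cl.h) to readable names. */
static const char *cl_error_name(int code)
{
    switch (code) {
    case 0:   return "CL_SUCCESS";
    case -1:  return "CL_DEVICE_NOT_FOUND";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    default:  return "UNKNOWN_CL_ERROR";
    }
}
```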
Installation
dotnet add package DotCompute.Backends.OpenCL --version 0.6.2
Usage
Basic Setup
using DotCompute.Backends.OpenCL;
using Microsoft.Extensions.Logging;
using var loggerFactory = LoggerFactory.Create(builder => builder.AddConsole());
var logger = loggerFactory.CreateLogger<OpenCLAccelerator>();
// Create accelerator
var accelerator = new OpenCLAccelerator(logger, loggerFactory);
// Initialize with default device (first GPU or CPU)
await accelerator.InitializeAsync();
Console.WriteLine($"Using: {accelerator.Name}");
Console.WriteLine($"Global Memory: {accelerator.Info.TotalMemory / (1024*1024)} MB");
Service Registration
using Microsoft.Extensions.DependencyInjection;
var services = new ServiceCollection();
// Register OpenCL backend
services.AddSingleton<IAccelerator, OpenCLAccelerator>();
// OR use plugin registration
services.AddDotComputeBackend("DotCompute.Backends.OpenCL");
Device Selection
using DotCompute.Backends.OpenCL.DeviceManagement;
var deviceManager = new OpenCLDeviceManager(logger);
// Enumerate all devices
var devices = await deviceManager.EnumerateDevicesAsync();
foreach (var device in devices)
{
Console.WriteLine($"Device: {device.Name}");
Console.WriteLine($" Type: {device.DeviceType}");
Console.WriteLine($" Compute Units: {device.MaxComputeUnits}");
Console.WriteLine($" Global Memory: {device.GlobalMemorySize / (1024*1024)} MB");
Console.WriteLine($" Local Memory: {device.LocalMemorySize / 1024} KB");
}
// Select specific device
var selectedDevice = devices.FirstOrDefault(d => d.DeviceType == DeviceType.GPU);
if (selectedDevice != null)
{
await accelerator.InitializeAsync(selectedDevice);
}
Kernel Compilation and Execution
using DotCompute.Abstractions.Kernels;
// Define OpenCL kernel
var kernelDef = new KernelDefinition
{
Name = "VectorAdd",
Source = @"
__kernel void vector_add(
__global const float* a,
__global const float* b,
__global float* result,
const int length)
{
int gid = get_global_id(0);
if (gid < length) {
result[gid] = a[gid] + b[gid];
}
}
",
EntryPoint = "vector_add"
};
// Compile kernel
var compiledKernel = await accelerator.CompileKernelAsync(kernelDef);
// Allocate device memory
var length = 1_000_000;
var bufferA = await accelerator.Memory.AllocateAsync<float>(length);
var bufferB = await accelerator.Memory.AllocateAsync<float>(length);
var bufferResult = await accelerator.Memory.AllocateAsync<float>(length);
// Copy data to device
var dataA = Enumerable.Range(0, length).Select(i => (float)i).ToArray();
var dataB = Enumerable.Range(0, length).Select(i => (float)(i * 2)).ToArray();
await bufferA.CopyFromAsync(dataA);
await bufferB.CopyFromAsync(dataB);
// Set kernel arguments and execute
var launchParams = new KernelLaunchParameters
{
// Round global size up to a multiple of the local size (required before
// OpenCL 2.0); the kernel's bounds check handles the excess work-items
GlobalWorkSize = new[] { (uint)((length + 255) / 256 * 256) },
LocalWorkSize = new[] { 256u }
};
await compiledKernel.ExecuteAsync(new object[]
{
bufferA,
bufferB,
bufferResult,
length
}, launchParams);
// Read results back
var results = new float[length];
await bufferResult.CopyToAsync(results);
// Cleanup
await bufferA.DisposeAsync();
await bufferB.DisposeAsync();
await bufferResult.DisposeAsync();
Memory Operations
// Allocate buffer
var buffer = await accelerator.Memory.AllocateAsync<float>(10_000);
// Write to device
var hostData = new float[10_000];
await buffer.CopyFromAsync(hostData);
// Read from device
var resultData = new float[10_000];
await buffer.CopyToAsync(resultData);
// Map memory for zero-copy access (if supported)
if (accelerator.DeviceInfo?.SupportsHostMemoryMapping == true)
{
var mappedPtr = await buffer.MapAsync(MapMode.ReadWrite);
// Access memory directly...
await buffer.UnmapAsync(mappedPtr);
}
Using Factory
using DotCompute.Backends.OpenCL.Factory;
var factory = new OpenCLAcceleratorFactory(configuration, logger);
// Create accelerator with performance profile
var accelerator = await factory.CreateAsync(new WorkloadProfile
{
WorkloadType = WorkloadType.Compute,
DataSize = DataSize.Large,
MemoryIntensive = true
});
Architecture
Component Hierarchy
OpenCLAccelerator (IAccelerator)
├── OpenCLContext (Context management)
├── OpenCLDeviceManager (Device discovery)
├── OpenCLMemoryManager (Memory operations)
└── OpenCLCompiledKernel (Kernel execution)
Native Layer:
├── OpenCLRuntime (P/Invoke bindings)
├── OpenCLTypes (Native type definitions)
└── OpenCLException (Error handling)
Initialization Flow
- Platform Enumeration: Detect all OpenCL platforms
- Device Discovery: Find devices on each platform
- Device Selection: Choose appropriate device
- Context Creation: Create OpenCL context for device
- Queue Creation: Create command queue for execution
- Memory Manager: Initialize memory management
- Ready: Accelerator ready for kernel execution
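The default selection in step 3 ("first GPU or CPU", as the Basic Setup section puts it) can be modeled as a preference ranking over device types. The toy C model below assumes a GPU > Accelerator > CPU ordering; the enum is hypothetical, while OpenCL itself reports types via CL_DEVICE_TYPE_GPU / _ACCELERATOR / _CPU:

```c
#include <assert.h>

/* Hypothetical ranking: prefer GPUs, then dedicated accelerators, then CPUs. */
enum DevType { DEV_CPU = 0, DEV_ACCELERATOR = 1, DEV_GPU = 2 };

static enum DevType pick_device(const enum DevType *devices, int count)
{
    enum DevType best = devices[0];
    for (int i = 1; i < count; i++)
        if (devices[i] > best) best = devices[i];
    return best;
}
```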
Kernel Execution Flow
- Kernel Compilation: Compile OpenCL C to device binary
- Argument Binding: Bind buffers and scalar arguments
- Work Sizing: Calculate global and local work sizes
- Enqueue: Enqueue kernel for execution
- Synchronize: Wait for completion (if synchronous)
- Result Retrieval: Copy results back to host
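Step 3 (work sizing) has one OpenCL-specific wrinkle: before OpenCL 2.0, the global work size must be a multiple of the local work size. A launcher therefore rounds the element count up and relies on an in-kernel bounds check, as the vector_add kernel above does with its `if (gid < length)` guard. In C:

```c
#include <assert.h>
#include <stddef.h>

/* Round the element count up to the next multiple of the local work size. */
static size_t round_up_global(size_t elements, size_t local_size)
{
    return (elements + local_size - 1) / local_size * local_size;
}
```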
Supported Platforms
Operating Systems
- Windows: 10, 11, Server 2019+
- Linux: Most distributions with OpenCL runtime
- macOS: 10.13+ (deprecated by Apple, prefer Metal backend)
OpenCL Versions
- OpenCL 1.2: Minimum supported version
- OpenCL 2.0: Full feature support
- OpenCL 2.1/2.2/3.0: Enhanced features when available
Device Types
GPU Devices
- NVIDIA: GeForce, Quadro, Tesla (via NVIDIA OpenCL runtime)
- AMD: Radeon, FirePro, Instinct (via AMD OpenCL or ROCm)
- Intel: Iris, Arc Graphics (via Intel OpenCL runtime)
- ARM Mali: Mobile and embedded GPUs
- Qualcomm Adreno: Mobile GPUs
CPU Devices
- Intel: via Intel OpenCL CPU runtime
- AMD: via AMD OpenCL CPU runtime
- ARM: via ARM Compute Library
Accelerator Devices
- FPGA: Intel/Xilinx FPGA with OpenCL support
- DSP: Specialized signal processing accelerators
System Requirements
Minimum
- .NET 9.0 or later
- OpenCL 1.2 compatible device
- OpenCL runtime installed
Recommended
- OpenCL 2.0+ compatible device
- 4GB+ device memory
- Latest vendor drivers
Installing OpenCL Runtime
Windows
- NVIDIA: Install CUDA Toolkit or NVIDIA drivers
- AMD: Install AMD Radeon Software
- Intel: Install Intel Graphics drivers
Linux
- NVIDIA: Install CUDA Toolkit or nvidia-opencl-icd
- AMD: Install ROCm or amdgpu-pro drivers
- Intel: Install intel-opencl-icd or beignet
# Ubuntu/Debian
sudo apt-get install ocl-icd-opencl-dev nvidia-opencl-icd
# Fedora/RHEL
sudo dnf install ocl-icd-devel pocl
# Verify installation
clinfo
macOS
OpenCL is deprecated on macOS. Use Metal backend for macOS devices.
Configuration
Environment Variables
# Enable OpenCL debugging
export DOTCOMPUTE_OPENCL_DEBUG=1
# Select specific platform
export DOTCOMPUTE_OPENCL_PLATFORM=0
# Select specific device
export DOTCOMPUTE_OPENCL_DEVICE=0
# Force CPU device (for debugging)
export DOTCOMPUTE_OPENCL_FORCE_CPU=1
Configuration Options
var options = new OpenCLOptions
{
PreferredDeviceType = DeviceType.GPU,
EnableProfiling = true,
EnableOutOfOrderExecution = false,
BuildOptions = "-cl-fast-relaxed-math -cl-mad-enable",
CacheKernels = true
};
Current Limitations
- Image Processing: Limited support for image objects
- Shared Virtual Memory: SVM support not implemented
- Device Partitioning: Sub-device creation not supported
- Pipes: OpenCL 2.0 pipes not implemented
- P2P Message Passing: Ring kernel P2P strategy not available on OpenCL (use SharedMemory or AtomicQueue)
Troubleshooting
Device Not Found
- Verify Runtime: Run clinfo to list OpenCL platforms and devices
- Check Drivers: Ensure the latest GPU drivers are installed
- Permissions: On Linux, ensure the user is in the video group
- Platform Selection: Try a different OpenCL platform
Compilation Failures
- Kernel Syntax: Validate OpenCL C syntax
- Build Options: Check build options compatibility
- Extensions: Verify required extensions available
- Device Capabilities: Check device limits (work group size, etc.)
Performance Issues
- Work Group Size: Optimize local work size for device
- Memory Access: Ensure coalesced memory access patterns
- Transfer Overhead: Minimize host-device transfers
- Kernel Complexity: Profile kernel execution time
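For the first point, a common heuristic is to pick the largest multiple of the device's preferred work-group size multiple (CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, typically the warp or wavefront width) that does not exceed CL_DEVICE_MAX_WORK_GROUP_SIZE. A sketch of that heuristic (the helper name is illustrative):

```c
#include <assert.h>
#include <stddef.h>

/* Largest multiple of preferred_multiple that fits in max_wg; falls back
 * to max_wg itself when even one multiple does not fit. */
static size_t pick_local_size(size_t preferred_multiple, size_t max_wg)
{
    size_t local = max_wg / preferred_multiple * preferred_multiple;
    return local ? local : max_wg;
}
```

Treat the result as a starting point and benchmark; the optimum varies per kernel and device.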
Debug Tools
// Enable detailed logging
var logger = LoggerFactory.Create(builder =>
builder.AddConsole().SetMinimumLevel(LogLevel.Trace));
// Get device capabilities
var info = accelerator.DeviceInfo;
Console.WriteLine($"Max Work Group Size: {info.MaxWorkGroupSize}");
Console.WriteLine($"Max Compute Units: {info.MaxComputeUnits}");
Console.WriteLine($"Extensions: {string.Join(", ", info.Extensions)}");
// Profile kernel execution
var sw = Stopwatch.StartNew();
await kernel.ExecuteAsync(args, launchParams);
await accelerator.SynchronizeAsync();
sw.Stop();
Console.WriteLine($"Kernel time: {sw.ElapsedMilliseconds}ms");
Advanced Features
Multi-Device Execution
// Create accelerator for each device
var accelerators = new List<OpenCLAccelerator>();
foreach (var device in devices)
{
var acc = new OpenCLAccelerator(logger, loggerFactory);
await acc.InitializeAsync(device);
accelerators.Add(acc);
}
// Distribute work across devices
// Distribute work across devices
var tasks = accelerators.Select(async acc =>
{
var kernel = await acc.CompileKernelAsync(kernelDef);
await kernel.ExecuteAsync(args, launchParams);
});
await Task.WhenAll(tasks);
Custom Build Options
var options = new CompilationOptions
{
OptimizationLevel = OptimizationLevel.O3,
CustomOptions = new[]
{
"-cl-mad-enable", // Mad operations
"-cl-fast-relaxed-math", // Fast math
"-cl-finite-math-only", // No INF/NaN
"-cl-unsafe-math-optimizations"
}
};
var kernel = await accelerator.CompileKernelAsync(definition, options);
Ring Kernels with OpenCL
using DotCompute.Abstractions.RingKernels;
// Define persistent ring kernel for graph processing
[RingKernel(
KernelId = "graph-process",
Domain = RingKernelDomain.GraphAnalytics,
Mode = RingKernelMode.Persistent,
Capacity = 8192,
Backends = KernelBackends.OpenCL)]
public static void ProcessGraphVertex(
IMessageQueue<GraphMessage> incoming,
IMessageQueue<GraphMessage> outgoing,
Span<float> values)
{
int vertexId = Kernel.ThreadId.X;
// Process messages with OpenCL atomic operations
while (incoming.TryDequeue(out var msg))
{
if (msg.TargetVertex == vertexId)
values[vertexId] += msg.Value;
}
// Send updates to neighbors
outgoing.Enqueue(new GraphMessage { TargetVertex = ..., Value = ... });
}
// Launch ring kernel on OpenCL device
var runtime = orchestrator.GetRingKernelRuntime();
await runtime.LaunchAsync("graph-process", gridSize: 1024, blockSize: 256);
await runtime.ActivateAsync("graph-process");
// Send messages
await runtime.SendMessageAsync("graph-process", new GraphMessage { ... });
// Monitor performance
var metrics = await runtime.GetMetricsAsync("graph-process");
Console.WriteLine($"Throughput: {metrics.ThroughputMsgsPerSec:F2} msgs/sec");
Console.WriteLine($"GPU Utilization: {metrics.GpuUtilizationPercent:F1}%");
Dependencies
- DotCompute.Core: Core runtime components
- DotCompute.Abstractions: Interface definitions
- DotCompute.Plugins: Plugin system integration
- System.Runtime.InteropServices: P/Invoke support
- Polly: Resilience and retry policies
- Microsoft.Extensions.Logging: Logging infrastructure
Future Enhancements
- OpenCL 2.0+ Features: SVM, pipes, device-side enqueue
- Image Support: Comprehensive image object operations
- SPIR-V: Support for SPIR-V kernels
- Sub-Devices: Device partitioning support
- Interoperability: OpenGL/DirectX interop
- Performance: Further optimization and tuning
Documentation & Resources
Comprehensive documentation is available for DotCompute:
Architecture Documentation
- Backend Integration - Plugin system and accelerator implementations
- System Overview - Cross-platform architecture
Developer Guides
- Getting Started - Installation and setup
- Backend Selection - Choosing backends for cross-platform support
- Kernel Development - Writing OpenCL kernels
- Troubleshooting - Common OpenCL issues
API Documentation
- API Reference - Complete API documentation
Support
- Documentation: Comprehensive Guides
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Contributing
Contributions are welcome, particularly in:
- Testing on diverse OpenCL implementations
- Platform-specific optimizations
- Additional OpenCL feature support
- Performance benchmarking
- Documentation improvements
See CONTRIBUTING.md for guidelines.
License
MIT License - Copyright (c) 2025 Michael Ivertowski
Compatible Frameworks
- .NET: net9.0 is compatible; net10.0 and the platform-specific net9.0/net10.0 target frameworks (android, browser, ios, maccatalyst, macos, tvos, windows) were computed as compatible.
Package Dependencies (net9.0)
- DotCompute.Abstractions (>= 0.6.2)
- DotCompute.Core (>= 0.6.2)
- DotCompute.Memory (>= 0.6.2)
- DotCompute.Plugins (>= 0.6.2)
- Microsoft.Extensions.Configuration.Abstractions (>= 10.0.2)
- Microsoft.Extensions.Configuration.Binder (>= 10.0.2)
- Microsoft.Extensions.DependencyInjection.Abstractions (>= 10.0.2)
- Microsoft.Extensions.Hosting.Abstractions (>= 10.0.2)
- Microsoft.Extensions.Logging.Abstractions (>= 10.0.2)
- Microsoft.NET.ILLink.Tasks (>= 9.0.12)
- Polly (>= 8.6.5)
- Polly.Extensions (>= 8.6.5)
- System.Runtime.InteropServices (>= 4.3.0)