DotCompute.Backends.OpenCL 0.6.2

dotnet add package DotCompute.Backends.OpenCL --version 0.6.2

DotCompute.Backends.OpenCL

Cross-platform OpenCL compute backend for .NET 9+ with GPU and accelerator support.

Status: ⚠️ EXPERIMENTAL

EXPERIMENTAL: This backend is functional for cross-platform GPU acceleration but has not been extensively production-tested across all vendor implementations. It works well for development and testing across NVIDIA, AMD, and Intel GPUs. Production use recommended only after validation on your target hardware.

The OpenCL backend provides cross-platform GPU acceleration:

  • OpenCL Runtime Integration: P/Invoke bindings to OpenCL C API
  • Device Management: Platform and device enumeration
  • Context Management: OpenCL context creation and lifecycle
  • Memory Management: Device memory allocation and transfers
  • Kernel Compilation: Runtime kernel compilation from OpenCL C
  • Ring Kernel Support: Persistent kernels with message passing
  • Plugin Architecture: Integrated with DotCompute plugin system
  • Cross-Vendor Support: NVIDIA, AMD, Intel, ARM Mali, Qualcomm Adreno

Key Components

OpenCL Accelerator

OpenCLAccelerator

Main accelerator implementation providing:

  • Device initialization and management
  • Kernel compilation and execution
  • Memory allocation and synchronization
  • OpenCL context lifecycle management
  • Error handling and diagnostics

Device Management

OpenCLDeviceManager

Manages OpenCL devices:

  • Platform enumeration
  • Device discovery and selection
  • Capability detection
  • Device information queries
  • Multi-device support

OpenCLDeviceInfo

Device information structure:

  • Device name and vendor
  • OpenCL version and driver version
  • Memory sizes (global, local, constant)
  • Compute capabilities (work group size, compute units)
  • Image support and dimensions
  • Device type (GPU, CPU, Accelerator)

OpenCLPlatformInfo

Platform information:

  • Platform name and vendor
  • OpenCL version support
  • Available extensions
  • Device count

Context and Execution

OpenCLContext

OpenCL context wrapper:

  • Context creation from devices
  • Command queue management
  • Resource lifecycle
  • Error handling
  • Synchronization primitives

Memory Management

OpenCLMemoryManager

Unified memory manager for OpenCL:

  • Device memory allocation
  • Host-device memory transfers
  • Buffer management
  • Memory pooling support
  • Synchronous and asynchronous operations

OpenCLMemoryBuffer

Buffer implementation:

  • Device buffer allocation
  • Read/write operations
  • Zero-copy mapping when supported
  • Rectangular buffer support
  • Sub-buffer creation
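
Sub-buffer creation carries a device-specific alignment constraint: per the OpenCL specification, a sub-buffer's origin must be aligned to the device's CL_DEVICE_MEM_BASE_ADDR_ALIGN value, which is reported in bits. A minimal helper for rounding an offset up to that boundary (a hypothetical utility for illustration, not part of the package API):

```csharp
using System;

static class SubBufferAlign
{
    // CL_DEVICE_MEM_BASE_ADDR_ALIGN is reported in *bits*; sub-buffer origins
    // must fall on that boundary in *bytes*. Round an offset up to the next
    // valid origin.
    public static long AlignOffset(long offsetBytes, int baseAddrAlignBits)
    {
        long alignBytes = baseAddrAlignBits / 8;
        return ((offsetBytes + alignBytes - 1) / alignBytes) * alignBytes;
    }

    static void Main()
    {
        // e.g. a device reporting 1024-bit alignment => 128-byte boundaries
        Console.WriteLine(AlignOffset(1000, 1024)); // 1024
    }
}
```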

Kernel Management

OpenCLCompiledKernel

Compiled kernel representation:

  • Kernel compilation from OpenCL C source
  • Argument binding
  • Execution with work dimensions
  • Local memory specification
  • Synchronous and asynchronous execution

Ring Kernel Runtime

OpenCLRingKernelRuntime

Persistent kernel runtime for OpenCL:

  • Launch and activation control
  • Message queue management with atomic operations
  • Status monitoring and metrics collection
  • Deactivation and termination support
  • Compatible with all OpenCL 1.2+ devices

OpenCLRingKernelCompiler

Ring kernel compilation for OpenCL:

  • Generates OpenCL C code for persistent kernels
  • Message queue implementation with atomics
  • Control block for kernel lifecycle management
  • Lock-free communication patterns
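
The message queues follow the classic power-of-two ring-buffer pattern: head and tail counters grow monotonically and a bitmask maps them to slots. The sketch below illustrates only the index arithmetic in plain C# — on the device the increments are OpenCL atomics, and this is not the code the compiler actually generates:

```csharp
using System;

// Simplified ring-queue index arithmetic: monotonically increasing head/tail
// counters masked into a power-of-two slot array. Single-threaded for
// illustration; the generated OpenCL C uses atomic increments instead.
class RingQueue<T>
{
    private readonly T[] _slots;
    private readonly int _mask;
    private int _head, _tail;

    public RingQueue(int capacityPowerOfTwo)
    {
        _slots = new T[capacityPowerOfTwo];
        _mask = capacityPowerOfTwo - 1;
    }

    public bool TryEnqueue(T item)
    {
        if (_tail - _head == _slots.Length) return false; // full
        _slots[_tail++ & _mask] = item;
        return true;
    }

    public bool TryDequeue(out T item)
    {
        if (_head == _tail) { item = default!; return false; } // empty
        item = _slots[_head++ & _mask];
        return true;
    }
}

class RingQueueDemo
{
    static void Main()
    {
        var q = new RingQueue<int>(4);
        q.TryEnqueue(1);
        q.TryEnqueue(2);
        q.TryDequeue(out var first);
        Console.WriteLine(first); // 1
    }
}
```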

Factory

OpenCLAcceleratorFactory

Factory for creating OpenCL accelerators:

  • Automatic device selection
  • Configuration-based creation
  • Workload profile matching
  • Performance profile tuning

Native Interop

OpenCLRuntime

P/Invoke bindings to OpenCL C API:

  • Platform and device functions
  • Context and queue functions
  • Memory object functions
  • Kernel functions
  • Event and synchronization functions

OpenCLTypes

Native type definitions:

  • Platform and device IDs
  • Context and queue handles
  • Memory object handles
  • Kernel handles
  • Error codes and status types

OpenCLException

Exception type for OpenCL errors:

  • Error code mapping
  • Human-readable error messages
  • Stack trace preservation
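
Error code mapping follows the numeric codes defined in the OpenCL specification's cl.h header. A few common ones, shown as a hypothetical lookup table (the package's actual mapping may differ in wording):

```csharp
using System;
using System.Collections.Generic;

static class ClErrors
{
    // Numeric error codes from the OpenCL specification (cl.h).
    private static readonly Dictionary<int, string> Names = new()
    {
        [0]   = "CL_SUCCESS",
        [-1]  = "CL_DEVICE_NOT_FOUND",
        [-4]  = "CL_MEM_OBJECT_ALLOCATION_FAILURE",
        [-5]  = "CL_OUT_OF_RESOURCES",
        [-11] = "CL_BUILD_PROGRAM_FAILURE",
        [-52] = "CL_INVALID_KERNEL_ARGS",
        [-54] = "CL_INVALID_WORK_GROUP_SIZE",
    };

    public static string Describe(int code) =>
        Names.TryGetValue(code, out var name) ? name : $"Unknown OpenCL error {code}";

    static void Main()
    {
        Console.WriteLine(Describe(-11)); // CL_BUILD_PROGRAM_FAILURE
    }
}
```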

Installation

dotnet add package DotCompute.Backends.OpenCL --version 0.6.2

Usage

Basic Setup

using DotCompute.Backends.OpenCL;
using Microsoft.Extensions.Logging;

var loggerFactory = LoggerFactory.Create(builder => builder.AddConsole());
var logger = loggerFactory.CreateLogger<OpenCLAccelerator>();

// Create accelerator
var accelerator = new OpenCLAccelerator(logger, loggerFactory);

// Initialize with default device (first GPU or CPU)
await accelerator.InitializeAsync();

Console.WriteLine($"Using: {accelerator.Name}");
Console.WriteLine($"Global Memory: {accelerator.Info.TotalMemory / (1024*1024)} MB");

Service Registration

using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

// Register OpenCL backend
services.AddSingleton<IAccelerator, OpenCLAccelerator>();

// OR use plugin registration
services.AddDotComputeBackend("DotCompute.Backends.OpenCL");

Device Selection

using DotCompute.Backends.OpenCL.DeviceManagement;

var deviceManager = new OpenCLDeviceManager(logger);

// Enumerate all devices
var devices = await deviceManager.EnumerateDevicesAsync();

foreach (var device in devices)
{
    Console.WriteLine($"Device: {device.Name}");
    Console.WriteLine($"  Type: {device.DeviceType}");
    Console.WriteLine($"  Compute Units: {device.MaxComputeUnits}");
    Console.WriteLine($"  Global Memory: {device.GlobalMemorySize / (1024*1024)} MB");
    Console.WriteLine($"  Local Memory: {device.LocalMemorySize / 1024} KB");
}

// Select specific device
var selectedDevice = devices.FirstOrDefault(d => d.DeviceType == DeviceType.GPU);
if (selectedDevice != null)
{
    await accelerator.InitializeAsync(selectedDevice);
}

Kernel Compilation and Execution

using DotCompute.Abstractions.Kernels;

// Define OpenCL kernel
var kernelDef = new KernelDefinition
{
    Name = "VectorAdd",
    Source = @"
        __kernel void vector_add(
            __global const float* a,
            __global const float* b,
            __global float* result,
            const int length)
        {
            int gid = get_global_id(0);
            if (gid < length) {
                result[gid] = a[gid] + b[gid];
            }
        }
    ",
    EntryPoint = "vector_add"
};

// Compile kernel
var compiledKernel = await accelerator.CompileKernelAsync(kernelDef);

// Allocate device memory
var length = 1_000_000;
var bufferA = await accelerator.Memory.AllocateAsync<float>(length);
var bufferB = await accelerator.Memory.AllocateAsync<float>(length);
var bufferResult = await accelerator.Memory.AllocateAsync<float>(length);

// Copy data to device
var dataA = Enumerable.Range(0, length).Select(i => (float)i).ToArray();
var dataB = Enumerable.Range(0, length).Select(i => (float)(i * 2)).ToArray();

await bufferA.CopyFromAsync(dataA);
await bufferB.CopyFromAsync(dataB);

// Set kernel arguments and execute
var launchParams = new KernelLaunchParameters
{
    // Round the global size up to a multiple of the local size; the kernel's
    // bounds check (gid < length) guards the extra work-items.
    GlobalWorkSize = new[] { (uint)((length + 255) / 256 * 256) },
    LocalWorkSize = new[] { 256u }
};

await compiledKernel.ExecuteAsync(new object[]
{
    bufferA,
    bufferB,
    bufferResult,
    length
}, launchParams);

// Read results back
var results = new float[length];
await bufferResult.CopyToAsync(results);

// Cleanup
await bufferA.DisposeAsync();
await bufferB.DisposeAsync();
await bufferResult.DisposeAsync();

Memory Operations

// Allocate buffer
var buffer = await accelerator.Memory.AllocateAsync<float>(10_000);

// Write to device
var hostData = new float[10_000];
await buffer.CopyFromAsync(hostData);

// Read from device
var resultData = new float[10_000];
await buffer.CopyToAsync(resultData);

// Map memory for zero-copy access (if supported)
if (accelerator.DeviceInfo?.SupportsHostMemoryMapping == true)
{
    var mappedPtr = await buffer.MapAsync(MapMode.ReadWrite);
    // Access memory directly...
    await buffer.UnmapAsync(mappedPtr);
}

Using Factory

using DotCompute.Backends.OpenCL.Factory;

var factory = new OpenCLAcceleratorFactory(configuration, logger);

// Create accelerator with performance profile
var accelerator = await factory.CreateAsync(new WorkloadProfile
{
    WorkloadType = WorkloadType.Compute,
    DataSize = DataSize.Large,
    MemoryIntensive = true
});

Architecture

Component Hierarchy

OpenCLAccelerator (IAccelerator)
    ├── OpenCLContext (Context management)
    ├── OpenCLDeviceManager (Device discovery)
    ├── OpenCLMemoryManager (Memory operations)
    └── OpenCLCompiledKernel (Kernel execution)

Native Layer:
    ├── OpenCLRuntime (P/Invoke bindings)
    ├── OpenCLTypes (Native type definitions)
    └── OpenCLException (Error handling)

Initialization Flow

  1. Platform Enumeration: Detect all OpenCL platforms
  2. Device Discovery: Find devices on each platform
  3. Device Selection: Choose appropriate device
  4. Context Creation: Create OpenCL context for device
  5. Queue Creation: Create command queue for execution
  6. Memory Manager: Initialize memory management
  7. Ready: Accelerator ready for kernel execution

Kernel Execution Flow

  1. Kernel Compilation: Compile OpenCL C to device binary
  2. Argument Binding: Bind buffers and scalar arguments
  3. Work Sizing: Calculate global and local work sizes
  4. Enqueue: Enqueue kernel for execution
  5. Synchronize: Wait for completion (if synchronous)
  6. Result Retrieval: Copy results back to host
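
Step 3 (work sizing) typically rounds the global size up to a multiple of the local size, since pre-2.0 OpenCL requires the global work size to be evenly divisible by the local work size; the kernel's bounds check then filters the padding work-items. A minimal helper:

```csharp
using System;

static class WorkSize
{
    // Round the global work size up to the nearest multiple of the local work
    // size so every work-group is fully populated; excess work-items are
    // filtered by the kernel's `if (gid < length)` guard.
    public static uint RoundUpGlobal(uint length, uint localSize)
        => ((length + localSize - 1) / localSize) * localSize;

    static void Main()
    {
        Console.WriteLine(RoundUpGlobal(1_000_000, 256)); // 1000192
    }
}
```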

Supported Platforms

Operating Systems

  • Windows: 10, 11, Server 2019+
  • Linux: Most distributions with OpenCL runtime
  • macOS: 10.13+ (deprecated by Apple, prefer Metal backend)

OpenCL Versions

  • OpenCL 1.2: Minimum supported version
  • OpenCL 2.0: Full feature support
  • OpenCL 2.1/2.2/3.0: Enhanced features when available
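
Version detection generally parses the platform or device version string, which the specification mandates in the form "OpenCL <major>.<minor> <vendor-specific information>". A small parser (a hypothetical helper, not the package's API):

```csharp
using System;

static class ClVersionString
{
    // CL_PLATFORM_VERSION / CL_DEVICE_VERSION are mandated to start with
    // "OpenCL <major>.<minor>", followed by vendor-specific text.
    public static (int Major, int Minor) Parse(string version)
    {
        var numeric = version.Split(' ')[1].Split('.');
        return (int.Parse(numeric[0]), int.Parse(numeric[1]));
    }

    static void Main()
    {
        Console.WriteLine(Parse("OpenCL 3.0 CUDA 12.4.99")); // (3, 0)
    }
}
```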

Device Types

GPU Devices
  • NVIDIA: GeForce, Quadro, Tesla (via NVIDIA OpenCL runtime)
  • AMD: Radeon, FirePro, Instinct (via AMD OpenCL or ROCm)
  • Intel: Iris, Arc Graphics (via Intel OpenCL runtime)
  • ARM Mali: Mobile and embedded GPUs
  • Qualcomm Adreno: Mobile GPUs

CPU Devices
  • Intel: via Intel OpenCL CPU runtime
  • AMD: via AMD OpenCL CPU runtime
  • ARM: via ARM Compute Library

Accelerator Devices
  • FPGA: Intel/Xilinx FPGA with OpenCL support
  • DSP: Specialized signal processing accelerators

System Requirements

Minimum

  • .NET 9.0 or later
  • OpenCL 1.2 compatible device
  • OpenCL runtime installed

Recommended

  • OpenCL 2.0+ compatible device
  • 4GB+ device memory
  • Latest vendor drivers

Installing OpenCL Runtime

Windows
  • NVIDIA: Install CUDA Toolkit or NVIDIA drivers
  • AMD: Install AMD Radeon Software
  • Intel: Install Intel Graphics drivers

Linux
  • NVIDIA: Install CUDA Toolkit or nvidia-opencl-icd
  • AMD: Install ROCm or amdgpu-pro drivers
  • Intel: Install intel-opencl-icd or beignet

# Ubuntu/Debian
sudo apt-get install ocl-icd-opencl-dev nvidia-opencl-icd

# Fedora/RHEL
sudo dnf install ocl-icd-devel pocl

# Verify installation
clinfo

macOS

OpenCL is deprecated on macOS. Use Metal backend for macOS devices.

Configuration

Environment Variables

# Enable OpenCL debugging
export DOTCOMPUTE_OPENCL_DEBUG=1

# Select specific platform
export DOTCOMPUTE_OPENCL_PLATFORM=0

# Select specific device
export DOTCOMPUTE_OPENCL_DEVICE=0

# Force CPU device (for debugging)
export DOTCOMPUTE_OPENCL_FORCE_CPU=1
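
If your own bootstrap code needs to honor the same variables (for example, to log which platform or device index was forced), reading them is straightforward. The variable names above are the only assumption here; how the backend itself consumes them is internal:

```csharp
using System;

static class OpenCLEnv
{
    // Parse an integer environment variable, falling back to a default when
    // the variable is unset or malformed.
    public static int GetIndex(string name, int fallback) =>
        int.TryParse(Environment.GetEnvironmentVariable(name), out var i) ? i : fallback;

    static void Main()
    {
        int platform = GetIndex("DOTCOMPUTE_OPENCL_PLATFORM", 0);
        int device   = GetIndex("DOTCOMPUTE_OPENCL_DEVICE", 0);
        Console.WriteLine($"platform={platform} device={device}");
    }
}
```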

Configuration Options

var options = new OpenCLOptions
{
    PreferredDeviceType = DeviceType.GPU,
    EnableProfiling = true,
    EnableOutOfOrderExecution = false,
    BuildOptions = "-cl-fast-relaxed-math -cl-mad-enable",
    CacheKernels = true
};

Current Limitations

  1. Image Processing: Limited support for image objects
  2. Shared Virtual Memory: SVM support not implemented
  3. Device Partitioning: Sub-device creation not supported
  4. Pipes: OpenCL 2.0 pipes not implemented
  5. P2P Message Passing: Ring kernel P2P strategy not available on OpenCL (use SharedMemory or AtomicQueue)

Troubleshooting

Device Not Found

  1. Verify Runtime: Run clinfo to list OpenCL platforms and devices
  2. Check Drivers: Ensure latest GPU drivers installed
  3. Permissions: On Linux, ensure user in video group
  4. Platform Selection: Try different OpenCL platforms

Compilation Failures

  1. Kernel Syntax: Validate OpenCL C syntax
  2. Build Options: Check build options compatibility
  3. Extensions: Verify required extensions available
  4. Device Capabilities: Check device limits (work group size, etc.)

Performance Issues

  1. Work Group Size: Optimize local work size for device
  2. Memory Access: Ensure coalesced memory access patterns
  3. Transfer Overhead: Minimize host-device transfers
  4. Kernel Complexity: Profile kernel execution time

Debug Tools

// Enable detailed logging
var logger = LoggerFactory.Create(builder =>
    builder.AddConsole().SetMinimumLevel(LogLevel.Trace));

// Get device capabilities
var info = accelerator.DeviceInfo;
Console.WriteLine($"Max Work Group Size: {info.MaxWorkGroupSize}");
Console.WriteLine($"Max Compute Units: {info.MaxComputeUnits}");
Console.WriteLine($"Extensions: {string.Join(", ", info.Extensions)}");

// Profile kernel execution
var sw = Stopwatch.StartNew();
await kernel.ExecuteAsync(args, launchParams);
await accelerator.SynchronizeAsync();
sw.Stop();
Console.WriteLine($"Kernel time: {sw.ElapsedMilliseconds}ms");
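
For memory-bound kernels like vector add, converting the measured time into effective bandwidth makes results comparable across devices: vector add moves two input buffers plus one output buffer per launch. A small helper for the arithmetic:

```csharp
using System;

static class Bandwidth
{
    // Effective bandwidth in GB/s for a kernel that moves `bytesMoved` bytes
    // (reads + writes) in `elapsedMs` milliseconds.
    public static double GbPerSec(long bytesMoved, double elapsedMs)
        => bytesMoved / (elapsedMs / 1000.0) / 1e9;

    static void Main()
    {
        // Vector add over 1M floats: 2 reads + 1 write = 12 MB moved.
        long bytes = 3L * 1_000_000 * sizeof(float);
        Console.WriteLine($"{GbPerSec(bytes, 0.1):F1} GB/s"); // 120.0 GB/s
    }
}
```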

Advanced Features

Multi-Device Execution

// Create accelerator for each device
var accelerators = new List<OpenCLAccelerator>();
foreach (var device in devices)
{
    var acc = new OpenCLAccelerator(logger, loggerFactory);
    await acc.InitializeAsync(device);
    accelerators.Add(acc);
}

// Distribute work across devices
var tasks = accelerators.Select(async acc =>
{
    var kernel = await acc.CompileKernelAsync(kernelDef);
    await kernel.ExecuteAsync(args, launchParams);
});

await Task.WhenAll(tasks);

Custom Build Options

var options = new CompilationOptions
{
    OptimizationLevel = OptimizationLevel.O3,
    CustomOptions = new[]
    {
        "-cl-mad-enable",           // Mad operations
        "-cl-fast-relaxed-math",    // Fast math
        "-cl-finite-math-only",     // No INF/NaN
        "-cl-unsafe-math-optimizations"
    }
};

var kernel = await accelerator.CompileKernelAsync(definition, options);

Ring Kernels with OpenCL

using DotCompute.Abstractions.RingKernels;

// Define persistent ring kernel for graph processing
[RingKernel(
    KernelId = "graph-process",
    Domain = RingKernelDomain.GraphAnalytics,
    Mode = RingKernelMode.Persistent,
    Capacity = 8192,
    Backends = KernelBackends.OpenCL)]
public static void ProcessGraphVertex(
    IMessageQueue<GraphMessage> incoming,
    IMessageQueue<GraphMessage> outgoing,
    Span<float> values)
{
    int vertexId = Kernel.ThreadId.X;

    // Process messages with OpenCL atomic operations
    while (incoming.TryDequeue(out var msg))
    {
        if (msg.TargetVertex == vertexId)
            values[vertexId] += msg.Value;
    }

    // Send updates to neighbors
    outgoing.Enqueue(new GraphMessage { TargetVertex = ..., Value = ... });
}

// Launch ring kernel on OpenCL device
var runtime = orchestrator.GetRingKernelRuntime();
await runtime.LaunchAsync("graph-process", gridSize: 1024, blockSize: 256);
await runtime.ActivateAsync("graph-process");

// Send messages
await runtime.SendMessageAsync("graph-process", new GraphMessage { ... });

// Monitor performance
var metrics = await runtime.GetMetricsAsync("graph-process");
Console.WriteLine($"Throughput: {metrics.ThroughputMsgsPerSec:F2} msgs/sec");
Console.WriteLine($"GPU Utilization: {metrics.GpuUtilizationPercent:F1}%");

Dependencies

  • DotCompute.Core: Core runtime components
  • DotCompute.Abstractions: Interface definitions
  • DotCompute.Plugins: Plugin system integration
  • System.Runtime.InteropServices: P/Invoke support
  • Polly: Resilience and retry policies
  • Microsoft.Extensions.Logging: Logging infrastructure

Future Enhancements

  1. OpenCL 2.0+ Features: SVM, pipes, device-side enqueue
  2. Image Support: Comprehensive image object operations
  3. SPIR-V: Support for SPIR-V kernels
  4. Sub-Devices: Device partitioning support
  5. Interoperability: OpenGL/DirectX interop
  6. Performance: Further optimization and tuning

Documentation & Resources

Comprehensive documentation is available for DotCompute:

Architecture Documentation

Developer Guides

API Documentation

Support

Contributing

Contributions are welcome, particularly in:

  • Testing on diverse OpenCL implementations
  • Platform-specific optimizations
  • Additional OpenCL feature support
  • Performance benchmarking
  • Documentation improvements

See CONTRIBUTING.md for guidelines.

References

License

MIT License - Copyright (c) 2025 Michael Ivertowski

Compatible and additional computed target framework versions

.NET net9.0 is compatible. Computed frameworks: net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows.

NuGet packages (2)

Showing the top 2 NuGet packages that depend on DotCompute.Backends.OpenCL:

Package Downloads
DotCompute.Runtime

Runtime services and dependency injection integration for DotCompute. Provides kernel execution orchestration, automatic kernel discovery, service registration, and DI container integration with Microsoft.Extensions.DependencyInjection. Production-ready with comprehensive service lifetime management.

DotCompute.Linq

GPU-accelerated LINQ extensions for DotCompute. Transparent GPU execution for LINQ queries with automatic kernel generation, fusion optimization, and Reactive Extensions support.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last Updated
0.6.2 122 2/9/2026
0.5.3 250 2/2/2026
0.5.2 610 12/8/2025
0.5.1 448 11/28/2025
0.5.0 226 11/27/2025
0.4.2-rc2 389 11/11/2025
0.4.1-rc2 333 11/6/2025