Orleans.GpuBridge.Runtime 0.3.0

Orleans.GpuBridge.Runtime

Overview

Orleans.GpuBridge.Runtime is the core runtime for GPU acceleration in Orleans applications. It connects GPU compute resources to Orleans grains through a bridge abstraction layer that provides automatic device discovery, load balancing across devices, and fallback to the CPU when no GPU is available.

Key Features

🚀 Core Capabilities

Automatic GPU Device Discovery: Detects CUDA, OpenCL, DirectCompute, and Metal devices (Vulkan support is planned; see GPU Support)
Intelligent Device Selection: Smart device scoring based on memory, queue depth, and performance
CPU Fallback Support: Seamless fallback to CPU when GPU resources are unavailable
Memory Pool Management: Efficient memory pooling for both GPU and CPU operations
Kernel Catalog System: Centralized kernel registration and execution management

🔧 Advanced Features

Load Balancing: Automatic work distribution across multiple GPU devices
Health Monitoring: Real-time device health monitoring and recovery
Persistent Kernel Hosting: Long-running kernel hosts for improved performance
Queue Depth Awareness: Queue-aware placement strategies for optimal resource utilization
Production Hardening: Comprehensive error handling, logging, and diagnostics

🛡️ Reliability & Monitoring

Resilience Patterns: Built-in retry policies and circuit breakers
Telemetry Collection: Comprehensive metrics and performance tracking
Device Health Monitoring: Continuous monitoring with automatic recovery
Resource Cleanup: Proper disposal and resource management

Installation

NuGet Package

dotnet add package Orleans.GpuBridge.Runtime

Manual Installation

git clone https://github.com/your-repo/Orleans.GpuBridge.Core.git
cd Orleans.GpuBridge.Core/src/Orleans.GpuBridge.Runtime
dotnet build

Quick Start

Basic Configuration

using Orleans.GpuBridge.Runtime.Extensions;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = Host.CreateDefaultBuilder()
    .ConfigureServices(services =>
    {
        // Basic GPU Bridge setup
        services.AddGpuBridge(options =>
        {
            options.PreferGpu = true;
            options.FallbackToCpu = true;
        });
    });

var host = builder.Build();
await host.RunAsync();

Full Configuration with Ring Kernels

using Orleans.GpuBridge.Runtime.Extensions;
using Orleans.GpuBridge.Backends.DotCompute.Extensions;

services.AddGpuBridge(options =>
{
    options.PreferGpu = true;
    options.FallbackToCpu = true;
    options.MaxConcurrentKernels = 100;
})
.AddDotGpuBackend() // Add DotCompute backend
.Services
.AddRingKernelSupport(options =>
{
    options.DefaultGridSize = 1;
    options.DefaultBlockSize = 256;
    options.DefaultQueueCapacity = 256;
    options.EnableKernelCaching = true;
    options.DeviceIndex = 0;
})
.AddK2KSupport()                    // Enable kernel-to-kernel messaging
.AddDotComputeRingKernelBridge();   // GPU-accelerated ring kernel bridge

Configuration

GpuBridgeOptions

services.AddGpuBridge(options =>
{
    // GPU preferences
    options.PreferGpu = true;
    options.FallbackToCpu = true;
    options.MaxRetries = 3;

    // Performance tuning
    options.DefaultMicroBatch = 8192;
    options.MaxConcurrentKernels = 100;
    options.MemoryPoolSizeMB = 1024;
    options.BatchSize = 1024;

    // Device management
    options.MaxDevices = 4;
    options.EnableGpuDirectStorage = false;

    // Backend configuration
    options.DefaultBackend = "DotCompute";
    options.EnableProviderDiscovery = true;

    // Telemetry
    options.EnableProfiling = false;
    options.Telemetry = new TelemetryOptions
    {
        EnableMetrics = true,
        EnableTracing = true,
        SamplingRate = 0.1
    };
});
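The README does not describe how the runtime applies DefaultMicroBatch or BatchSize internally. As a rough illustration of the general idea only, the sketch below splits a large input into fixed-size slices without copying. MicroBatcher and Split are hypothetical names, not part of Orleans.GpuBridge.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Orleans.GpuBridge): illustrates micro-batching.
public static class MicroBatcher
{
    // Splits input into consecutive slices of at most batchSize elements.
    // Each slice is a view over the original array, so no data is copied.
    public static IEnumerable<ArraySegment<T>> Split<T>(T[] input, int batchSize)
    {
        // Validate eagerly; the iterator below only runs when enumerated.
        ArgumentNullException.ThrowIfNull(input);
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));
        return Iterate(input, batchSize);

        static IEnumerable<ArraySegment<T>> Iterate(T[] source, int size)
        {
            for (int offset = 0; offset < source.Length; offset += size)
            {
                int count = Math.Min(size, source.Length - offset);
                yield return new ArraySegment<T>(source, offset, count);
            }
        }
    }
}
```

With a micro-batch size of 8192, a 20,000-element array yields three slices of 8192, 8192, and 3616 elements.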

RingKernelOptions

services.AddRingKernelSupport(options =>
{
    // Kernel launch configuration
    options.DefaultGridSize = 1;      // Single block for single-actor
    options.DefaultBlockSize = 256;   // Optimal for most GPUs

    // Message queue
    options.DefaultQueueCapacity = 256; // Must be power of 2

    // Compilation
    options.EnableKernelCaching = true; // Cache compiled kernels
    options.DeviceIndex = 0;            // First GPU
});
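The comment above says DefaultQueueCapacity must be a power of 2 but gives no reason. A common motivation in ring buffers is that wrap-around can then be done with a bit mask instead of a modulo; whether that is this library's reason is an assumption. The sketch below shows one way to check or round a requested capacity before passing it in. QueueCapacity and its methods are hypothetical helpers.

```csharp
using System;
using System.Numerics;

// Hypothetical helper (not part of Orleans.GpuBridge).
public static class QueueCapacity
{
    // True when capacity is a positive power of two (1, 2, 4, ..., 2^30).
    public static bool IsValid(int capacity) =>
        capacity > 0 && BitOperations.IsPow2(capacity);

    // Rounds a requested capacity up to the next power of two.
    public static int Normalize(int requested)
    {
        if (requested <= 0 || requested > (1 << 30))
            throw new ArgumentOutOfRangeException(nameof(requested));
        return (int)BitOperations.RoundUpToPowerOf2((uint)requested);
    }

    // With a power-of-two capacity, the slot for a sequence number
    // is a single AND with (capacity - 1).
    public static int SlotIndex(long sequence, int capacity) =>
        (int)(sequence & (capacity - 1));
}
```

For example, QueueCapacity.Normalize(200) returns 256, and QueueCapacity.SlotIndex(257, 256) returns 1.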

Key Components

IGpuBridge

The primary interface for GPU bridge operations:

public interface IGpuBridge
{
    ValueTask<GpuBridgeInfo> GetInfoAsync(CancellationToken ct = default);
    ValueTask<IGpuKernel<TIn, TOut>> GetKernelAsync<TIn, TOut>(
        KernelId kernelId, CancellationToken ct = default);
    ValueTask<IReadOnlyList<GpuDevice>> GetDevicesAsync(CancellationToken ct = default);
    ValueTask<object> ExecuteKernelAsync(string kernelId, object input, CancellationToken ct = default);
}

Usage Example:

var gpuBridge = serviceProvider.GetRequiredService<IGpuBridge>();

// Get bridge info
var info = await gpuBridge.GetInfoAsync();
Console.WriteLine($"Backend: {info.BackendName}, Devices: {info.DeviceCount}");

// Get kernel and execute
var kernel = await gpuBridge.GetKernelAsync<float[], float[]>(new KernelId("vector-add"));
var result = await kernel.ExecuteAsync(inputData);

IRingKernelBridge

For GPU-native actors using persistent ring kernels:

public interface IRingKernelBridge
{
    bool IsAvailable { get; }

    ValueTask<GpuStateHandle<TState>> AllocateStateAsync<TState>(
        long actorId, TState initialState, CancellationToken ct = default)
        where TState : unmanaged;

    ValueTask<TResponse> SendMessageAsync<TState, TRequest, TResponse>(
        GpuStateHandle<TState> stateHandle, TRequest request, CancellationToken ct = default)
        where TState : unmanaged where TRequest : unmanaged where TResponse : unmanaged;

    ValueTask<TState> GetStateAsync<TState>(GpuStateHandle<TState> handle, CancellationToken ct = default)
        where TState : unmanaged;

    ValueTask ReleaseAsync<TState>(GpuStateHandle<TState> handle, CancellationToken ct = default)
        where TState : unmanaged;
}

Usage Example:

var bridge = serviceProvider.GetRequiredService<IRingKernelBridge>();

// Allocate GPU state for actor
var stateHandle = await bridge.AllocateStateAsync(actorId, new CounterState { Value = 0 });

// Send message through GPU ring kernel
var response = await bridge.SendMessageAsync<CounterState, IncrementMessage, int>(
    stateHandle, new IncrementMessage { Amount = 5 });

// Get current state
var state = await bridge.GetStateAsync(stateHandle);

// Release when done
await bridge.ReleaseAsync(stateHandle);

DeviceBroker

Manages GPU device discovery and selection:

var deviceBroker = serviceProvider.GetRequiredService<DeviceBroker>();
await deviceBroker.InitializeAsync(cancellationToken);

var devices = deviceBroker.GetDevices();
foreach (var device in devices)
{
    Console.WriteLine($"Device {device.Index}: {device.Name}");
    Console.WriteLine($"  Memory: {device.TotalMemoryBytes / (1024 * 1024)}MB");
    Console.WriteLine($"  Compute Units: {device.ComputeUnits}");
}
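The feature list mentions device scoring based on memory, queue depth, and performance, but the README does not give the formula. The sketch below is one plausible way to combine those signals, under assumptions: DeviceSnapshot, DeviceScorer, and the 0.4/0.4/0.2 weights are all hypothetical and are not DeviceBroker's actual logic.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical snapshot of a device; not the library's GpuDevice type.
public sealed record DeviceSnapshot(
    int Index, long FreeMemoryBytes, long TotalMemoryBytes, int QueueDepth, int ComputeUnits);

public static class DeviceScorer
{
    // Returns a score in [0, 1]; higher is better. The weights are illustrative only.
    public static double Score(DeviceSnapshot d, int maxComputeUnits)
    {
        double memory = d.TotalMemoryBytes > 0
            ? (double)d.FreeMemoryBytes / d.TotalMemoryBytes   // fraction of memory free
            : 0.0;
        double queue = 1.0 / (1 + d.QueueDepth);               // 1.0 when idle, falls as work queues up
        double compute = maxComputeUnits > 0
            ? (double)d.ComputeUnits / maxComputeUnits         // relative to the largest device
            : 0.0;
        return 0.4 * memory + 0.4 * queue + 0.2 * compute;
    }

    // Picks the highest-scoring device, or null when the collection is empty.
    public static DeviceSnapshot? SelectBest(IReadOnlyCollection<DeviceSnapshot> devices)
    {
        if (devices.Count == 0) return null;
        int maxUnits = devices.Max(d => d.ComputeUnits);
        return devices.MaxBy(d => Score(d, maxUnits));
    }
}
```

Under these weights, an idle device with more free memory outranks a larger device that already has a deep queue.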

Backend Providers

// DotCompute backend (recommended - cross-platform)
services.AddGpuBridge()
    .AddDotGpuBackend();

// With custom configuration
services.AddGpuBridge()
    .AddDotGpuBackend(config =>
    {
        config.OptimizationLevel = OptimizationLevel.O3;
        config.MemorySettings.InitialPoolSize = 1024 * 1024 * 1024; // 1 GB
    });

// With factory for runtime configuration
services.AddGpuBridge()
    .AddDotGpuBackend(sp =>
    {
        var env = sp.GetRequiredService<IHostEnvironment>();
        var config = new DotGpuBackendConfiguration();
        config.EnableDebugMode = env.IsDevelopment();
        return new DotComputeBackendProvider(config, sp.GetRequiredService<ILoggerFactory>());
    });

Performance Optimization

Deployment Models

Model	Latency	Throughput	Use Case
GPU-Offload (GpuGrainBase)	10-100μs	15K msg/s	Batch processing, infrequent GPU use
GPU-Native (RingKernelGrainBase)	100-500ns	2M msg/s	High-frequency messaging

K2K Routing Strategies

// Kernel-to-Kernel messaging strategies
public enum K2KRoutingStrategy
{
    Direct,      // Point-to-point messaging (100-500ns)
    Broadcast,   // One-to-many messaging
    Ring,        // Circular topology for consensus
    HashRouted   // Consistent hashing for load distribution
}
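The HashRouted strategy is described as consistent hashing for load distribution. The README does not show how the library implements it, so the sketch below is a generic consistent-hash ring for illustration. ConsistentHashRing, the virtual-node count, and the choice of FNV-1a are assumptions, not the library's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Generic consistent-hash ring (illustrative; not the Orleans.GpuBridge implementation).
public sealed class ConsistentHashRing
{
    private readonly uint[] _points;    // sorted hash positions on the ring
    private readonly string[] _owners;  // target that owns each position

    public ConsistentHashRing(IEnumerable<string> targets, int virtualNodes = 64)
    {
        if (virtualNodes <= 0) throw new ArgumentOutOfRangeException(nameof(virtualNodes));

        var ring = new SortedDictionary<uint, string>();
        foreach (var target in targets)
            for (int v = 0; v < virtualNodes; v++)
                ring[Fnv1a($"{target}#{v}")] = target;  // on a hash collision, the last writer wins

        if (ring.Count == 0)
            throw new ArgumentException("At least one target is required.", nameof(targets));

        _points = new uint[ring.Count];
        _owners = new string[ring.Count];
        ring.Keys.CopyTo(_points, 0);
        ring.Values.CopyTo(_owners, 0);
    }

    // Returns the target owning the first ring position at or after the key's hash.
    public string Route(string key)
    {
        int i = Array.BinarySearch(_points, Fnv1a(key));
        if (i < 0) i = ~i;               // index of the first position greater than the hash
        if (i == _points.Length) i = 0;  // wrap around to the start of the ring
        return _owners[i];
    }

    // 32-bit FNV-1a: stable across processes, unlike string.GetHashCode().
    private static uint Fnv1a(string text)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (byte b in Encoding.UTF8.GetBytes(text))
            {
                hash ^= b;
                hash *= 16777619;
            }
            return hash;
        }
    }
}
```

The same key always routes to the same target. When a target is removed and the ring is rebuilt, keys owned by the remaining targets generally keep their owner, which is the property that makes this scheme attractive for load distribution.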

Grain Base Classes

// For GPU-offload model (batch processing)
public class MyComputeGrain : GpuGrainBase<MyState>, IMyComputeGrain
{
    public async ValueTask<float[]> ProcessAsync(float[] input)
    {
        return await InvokeKernelAsync<float[], float[]>("my-kernel", input);
    }
}

// For GPU-native model (high-frequency messaging)
public class MyHighFreqActor : RingKernelGrainBase<MyState, MyMessage>
{
    protected override string KernelId => "my-actor-kernel";

    public async ValueTask<int> ProcessAsync(MyMessage msg)
    {
        return await InvokeKernelAsync<MyMessage, int>(msg);
    }
}

Memory Management Tips

Use Ring Kernels for High-Frequency: For >1000 msg/s, use RingKernelGrainBase
Batch Operations: For batch processing, use GpuGrainBase with large batches
Enable Memory Pooling: Set MemoryPoolSizeMB appropriately
Monitor Telemetry: Enable profiling during development

Error Handling and Diagnostics

Error Handling Patterns

try
{
    var result = await catalog.ExecuteAsync<float[], float[]>("my_kernel", input);
    return result;
}
catch (GpuDeviceException ex)
{
    logger.LogWarning(ex, "GPU operation failed, falling back to CPU");
    // Automatic CPU fallback is handled by the runtime
    throw;
}
catch (KernelNotFoundException ex)
{
    logger.LogError(ex, "Kernel {KernelId} not found", "my_kernel");
    throw;
}
catch (GpuMemoryException ex)
{
    logger.LogError(ex, "GPU memory allocation failed");
    // Consider reducing batch size or clearing cache
    throw;
}
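The handlers above rethrow after logging. Where a caller wants to retry a transient failure itself, a small exponential-backoff wrapper is one option. The sketch below is generic and hypothetical: Retry.WithBackoffAsync is not a library API, and it is independent of the runtime's built-in retry policies and of MaxRetries, whose internals this README does not describe.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical caller-side helper (not part of Orleans.GpuBridge).
public static class Retry
{
    // Runs operation, retrying up to maxRetries times when isTransient(ex) is true.
    // The delay doubles after each failed attempt.
    public static async Task<T> WithBackoffAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        Func<Exception, bool> isTransient,
        int maxRetries = 3,
        TimeSpan? baseDelay = null,
        CancellationToken ct = default)
    {
        TimeSpan delay = baseDelay ?? TimeSpan.FromMilliseconds(50);

        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return await operation(ct).ConfigureAwait(false);
            }
            // Non-transient errors, or errors after the last retry, propagate to the caller.
            catch (Exception ex) when (attempt < maxRetries && isTransient(ex))
            {
                await Task.Delay(delay, ct).ConfigureAwait(false);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}
```

A caller could pass a predicate such as ex => ex is GpuDeviceException to retry device errors while letting KernelNotFoundException fail immediately, since retrying a missing kernel is unlikely to help.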

Diagnostic Information

// Get device diagnostics
var diagnostics = serviceProvider.GetRequiredService<GpuDiagnostics>();
var deviceStats = await diagnostics.GetDeviceStatisticsAsync();

foreach (var stat in deviceStats)
{
    Console.WriteLine($"Device {stat.Index}: {stat.Name}");
    Console.WriteLine($"  Memory: {stat.UsedMemory:N0}/{stat.TotalMemory:N0} bytes");
    Console.WriteLine($"  Queue Depth: {stat.QueueDepth}");
    Console.WriteLine($"  Error Rate: {stat.ErrorRate:P2}");
}

// Get memory pool statistics
var memoryStats = memoryPool.GetStats();
Console.WriteLine($"Pool: {memoryStats.InUse:N0}/{memoryStats.TotalAllocated:N0} bytes");
Console.WriteLine($"Efficiency: {memoryStats.PooledItems} items pooled");

Logging Configuration

services.AddLogging(builder =>
{
    builder.AddConsole();
    builder.SetMinimumLevel(LogLevel.Information);
    
    // Specific GPU Bridge logging
    builder.AddFilter("Orleans.GpuBridge", LogLevel.Debug);
    builder.AddFilter("Orleans.GpuBridge.Runtime.DeviceBroker", LogLevel.Trace);
});

Dependencies

Required Packages

Microsoft.Orleans.Core (≥ 9.2.1) - Orleans framework
Microsoft.Extensions.DependencyInjection (≥ 10.0.1) - Dependency injection
Microsoft.Extensions.Hosting (≥ 10.0.1) - Application hosting
Microsoft.Extensions.Logging (≥ 8.0.0) - Logging infrastructure
Microsoft.Extensions.Options (≥ 8.0.0) - Configuration options

Optional Packages

Orleans.GpuBridge.Abstractions - Core abstractions (auto-included)
Orleans.GpuBridge.BridgeFX - High-level pipeline API
Orleans.GpuBridge.Grains - GPU-enabled Orleans grains

GPU Backend Dependencies

CUDA Toolkit (≥ 12.0) - For CUDA backend support
OpenCL Runtime - For OpenCL backend support
DirectX - For DirectCompute backend support (Windows only)

System Requirements

Minimum Requirements

.NET 9.0 or later
4GB RAM (8GB+ recommended)
GPU with 1GB+ VRAM (optional, CPU fallback available)

Recommended Requirements

.NET 9.0 or later
16GB+ RAM
Modern GPU with 4GB+ VRAM
NVMe SSD for optimal data transfer

Supported Platforms

Windows 10/11 (x64)
Linux (Ubuntu 20.04+, CentOS 8+)
macOS (Intel and Apple Silicon)

GPU Support

Backend	Windows	Linux	macOS	Notes
CUDA	✅	✅	❌	NVIDIA GPUs only
OpenCL	✅	✅	✅	Most GPU vendors
DirectCompute	✅	❌	❌	DirectX 11+ required
Metal	❌	❌	✅	Apple Silicon/Intel
Vulkan	🚧	🚧	🚧	Coming soon

Examples

Basic Kernel Implementation

public class VectorAddKernel : IGpuKernel<VectorAddInput, float[]>
{
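    public Task<float[]> ExecuteAsync(VectorAddInput input, CancellationToken ct = default)
    {
        // CPU fallback implementation: element-wise sum of two equal-length vectors
        if (input.A.Length != input.B.Length)
            throw new ArgumentException("A and B must have the same length.", nameof(input));

        var result = new float[input.A.Length];
        for (int i = 0; i < input.A.Length; i++)
        {
            result[i] = input.A[i] + input.B[i];
        }

        // The work is synchronous, so return a completed task instead of marking the method async
        return Task.FromResult(result);
    }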
    
    public void Dispose() { }
}

public record VectorAddInput(float[] A, float[] B);

Orleans Grain Integration

[GpuAccelerated]
public class ComputeGrain : Grain, IComputeGrain
{
    private readonly KernelCatalog _kernelCatalog;
    
    public ComputeGrain(KernelCatalog kernelCatalog)
    {
        _kernelCatalog = kernelCatalog;
    }
    
    public async Task<float[]> ProcessDataAsync(float[] input)
    {
        return await _kernelCatalog.ExecuteAsync<float[], float[]>(
            "vector_process", input);
    }
}

Advanced Pipeline

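public class ImageProcessingService
{
    private readonly IGrainFactory _grainFactory;

    // Inject the grain factory; without this constructor the field is never assigned
    public ImageProcessingService(IGrainFactory grainFactory)
    {
        _grainFactory = grainFactory;
    }
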
    public async Task<ProcessedImage[]> ProcessBatchAsync(Image[] images)
    {
        var results = await GpuPipeline<Image, ProcessedImage>
            .For(_grainFactory, "image_pipeline")
            .WithBatchSize(8)
            .WithMemoryStrategy(MemoryStrategy.Pooled)
            .WithRetryPolicy(new ExponentialBackoffRetryPolicy())
            .ExecuteAsync(images);
            
        return results;
    }
}

Troubleshooting

Common Issues

1. No GPU devices detected

Solution: Ensure GPU drivers are installed and hardware is compatible
Check: nvidia-smi (NVIDIA), clinfo (OpenCL), dxdiag (DirectX)

2. GPU memory allocation failures

Solution: Reduce batch sizes, increase system memory, or enable memory pooling
Configuration: options.MemoryStrategy = MemoryStrategy.Pooled

3. Kernel execution timeouts

Solution: Increase timeout values or optimize kernel performance
Configuration: options.KernelTimeout = TimeSpan.FromMinutes(10)

4. High error rates

Solution: Check GPU stability, update drivers, or enable CPU fallback
Monitoring: Use GpuDiagnostics to track device health

Debug Configuration

services.AddGpuBridge(options =>
{
    options.EnableDiagnostics = true;
    options.LogLevel = LogLevel.Trace;
    options.EnableHealthMonitoring = true;
})
.Services // Return to the IServiceCollection, as in the Full Configuration example
.AddLogging(builder => builder
    .AddConsole()
    .SetMinimumLevel(LogLevel.Debug));

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

Apache License 2.0

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

For more information, visit the Orleans.GpuBridge.Core Documentation or check out the Examples directory.

Product	Compatible and additional computed target framework versions.
.NET	net9.0 is compatible. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.


net9.0
- DotCompute.Abstractions (>= 0.5.3)
- DotCompute.Backends.CUDA (>= 0.5.3)
- Microsoft.CodeAnalysis.Workspaces.Common (>= 5.0.0)
- Microsoft.DotNet.ILCompiler (>= 10.0.0)
- Microsoft.Extensions.DependencyInjection (>= 10.0.1)
- Microsoft.Extensions.Hosting (>= 10.0.1)
- Microsoft.NET.ILLink.Tasks (>= 9.0.11)
- Microsoft.Orleans.Core (>= 9.2.1)
- Microsoft.Orleans.Server (>= 9.2.1)
- OpenTelemetry.Api (>= 1.14.0)
- Orleans.GpuBridge.Abstractions (>= 0.3.0)
- System.Management (>= 10.0.1)
- System.Threading.Channels (>= 10.0.1)

NuGet packages (3)

Showing the top 3 NuGet packages that depend on Orleans.GpuBridge.Runtime:

Package	Description	Downloads
Orleans.GpuBridge.Grains	Orleans grain implementations for GPU Bridge - GPU-accelerated batch, stream, and resident grains	1.2K
Orleans.GpuBridge.Backends.DotCompute	DotCompute backend provider for Orleans.GpuBridge.Core - Enables GPU acceleration via CUDA, OpenCL, Metal, and CPU with attribute-based kernel definition.	1.1K
Orleans.GpuBridge.HealthChecks	Package Description	1.1K


Version	Downloads	Last Updated
0.3.0	165	2/9/2026
0.2.1	502	12/8/2025
0.2.0	227	12/5/2025
0.1.0	407	11/30/2025