DotCompute.Core 0.4.2-rc2

This is a prerelease version of DotCompute.Core.
There is a newer version of this package available.
See the version list below for details.
.NET CLI:
dotnet add package DotCompute.Core --version 0.4.2-rc2

Package Manager Console (Visual Studio; uses the NuGet module's Install-Package):
NuGet\Install-Package DotCompute.Core -Version 0.4.2-rc2

PackageReference (copy this XML node into the project file):
<PackageReference Include="DotCompute.Core" Version="0.4.2-rc2" />

Central Package Management (copy into the solution Directory.Packages.props, then reference without a version in the project file):
<PackageVersion Include="DotCompute.Core" Version="0.4.2-rc2" />
<PackageReference Include="DotCompute.Core" />

Paket:
paket add DotCompute.Core --version 0.4.2-rc2

F# Interactive / Polyglot Notebooks:
#r "nuget: DotCompute.Core, 0.4.2-rc2"

C# file-based apps (.NET 10 preview 4 and later; place before any lines of code):
#:package DotCompute.Core@0.4.2-rc2

Cake Addin:
#addin nuget:?package=DotCompute.Core&version=0.4.2-rc2&prerelease

Cake Tool:
#tool nuget:?package=DotCompute.Core&version=0.4.2-rc2&prerelease

DotCompute.Core

Core runtime and orchestration engine for the DotCompute compute acceleration framework.

Status: ✅ Production Ready (v0.3.0-rc1)

The Core runtime provides comprehensive infrastructure for compute acceleration:

  • Kernel Execution Management: Complete kernel compilation and execution pipeline
  • Accelerator Discovery: Automatic detection and lifecycle management
  • Service Orchestration: Dependency injection integration
  • Performance Monitoring: OpenTelemetry-based telemetry infrastructure
  • Debugging Services: Cross-backend validation and profiling
  • Optimization Engine: ML-powered backend selection and workload optimization
  • Pipeline System: Execution pipeline management with optimization
  • Recovery System: Fault tolerance and error recovery
  • Native AOT: Full Native AOT compatibility

Key Components

Compute Orchestration

IComputeOrchestrator

Universal kernel execution interface providing:

  • Backend-agnostic kernel execution
  • Automatic accelerator selection
  • Type-safe parameter binding
  • Asynchronous execution model
  • Error handling and recovery
Kernel Execution Service

Runtime kernel orchestration:

  • Generated kernel discovery
  • Automatic kernel registration
  • Execution context management
  • Performance profiling
  • Result materialization

Kernel Management

Kernel Definition System
  • KernelDefinition: Metadata and source code representation
  • KernelSource: Source code abstraction with language detection
  • Kernel Compilation: Multi-backend compilation pipeline
  • Kernel Validation: Static analysis and validation
  • Kernel Caching: Compiled kernel caching with TTL
Compiled Kernel Execution
  • ICompiledKernel: Compiled kernel interface
  • Parameter Binding: Type-safe argument binding
  • Launch Configuration: Grid/block dimension specification
  • Memory Management Integration: Automatic buffer handling
  • Synchronization: Explicit and implicit synchronization
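
The flow from definition to launch can be sketched as follows. This is a hypothetical sketch: KernelDefinition, KernelSource, and ICompiledKernel appear in this README, but the constructor shapes, KernelArguments, LaunchConfiguration, and the CompileKernelAsync/ExecuteAsync calls are assumptions and may differ from the actual API.

```csharp
using DotCompute.Core;

// Define a kernel from source text (language detection per KernelSource)
var definition = new KernelDefinition(
    name: "VectorAdd",
    source: new KernelSource(kernelSourceText));

// Compile for a given accelerator; compiled kernels are cached with TTL
ICompiledKernel kernel = await accelerator.CompileKernelAsync(definition);

// Launch with explicit grid/block dimensions, then synchronize
await kernel.ExecuteAsync(
    new KernelArguments { ["a"] = bufA, ["b"] = bufB, ["out"] = bufOut },
    new LaunchConfiguration { GridSize = 4096, BlockSize = 256 });

await accelerator.SynchronizeAsync();
```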

Accelerator Management

Accelerator Discovery

Automatic backend detection:

  • Platform capability detection
  • Hardware enumeration
  • Driver version checking
  • Compute capability validation
  • Multi-GPU/device support
Accelerator Lifecycle
  • Initialization: Device context creation
  • Resource Management: Memory and compute resources
  • Synchronization: Cross-device synchronization
  • Cleanup: Proper resource disposal
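
A rough sketch of this lifecycle, using the IAcceleratorDiscoveryService from the Troubleshooting section; the members on the accelerator itself (async disposal, SynchronizeAsync) are assumptions:

```csharp
using Microsoft.Extensions.DependencyInjection;

var discovery = host.Services.GetRequiredService<IAcceleratorDiscoveryService>();
var accelerators = await discovery.DiscoverAsync();

// Initialization: create a device context and guarantee disposal (cleanup phase)
await using var accelerator = accelerators.First();

// ... enqueue kernel work against the accelerator ...

// Cross-device synchronization point before teardown (member name assumed)
await accelerator.SynchronizeAsync();
```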

Debugging and Validation

Cross-Backend Debugging

Production-ready debugging system with 8 methods:

  • CompareBackends: CPU vs GPU result validation
  • ValidateKernelExecution: Output correctness verification
  • AnalyzePerformance: Performance profiling
  • InspectMemory: Memory pattern analysis
  • TestDeterminism: Reproducibility testing
  • FindOptimalConfig: Parameter tuning
  • SimulateFailures: Fault injection testing
  • GenerateDiagnostics: Comprehensive diagnostics
Debug Profiles
  • Development: Extensive validation and logging
  • Testing: Balance of validation and performance
  • Production: Minimal overhead with targeted checks
  • Custom: User-defined debugging strategies
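
The eight methods above live on IKernelDebugService (shown in the Usage section). A hedged sketch of calling two of them; the Async suffixes, parameter shapes, and result properties are assumptions:

```csharp
using DotCompute.Core.Debugging;

var debug = host.Services.GetRequiredService<IKernelDebugService>();

// CompareBackends: run the kernel on CPU and GPU and diff the outputs
var comparison = await debug.CompareBackendsAsync("VectorAdd", parameters);
Console.WriteLine($"Backends agree: {comparison.ResultsMatch}");

// TestDeterminism: repeat execution and check for reproducible output
var determinism = await debug.TestDeterminismAsync("VectorAdd", parameters, runs: 10);
Console.WriteLine($"Deterministic: {determinism.IsDeterministic}");
```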

Optimization Engine

Adaptive Backend Selection

ML-powered backend selection with 4 optimization levels:

  • Conservative: Stable, proven configurations
  • Balanced: Performance vs reliability tradeoff
  • Aggressive: Maximum performance optimization
  • ML-Optimized: Machine learning-based selection
Workload Analysis
  • Workload Characteristics: Pattern recognition
  • Performance Prediction: Execution time estimation
  • Resource Utilization: Memory and compute requirements
  • Backend Affinity: Optimal backend recommendation
Optimization Strategies
  • Hardware Profiling: Capability-based selection
  • Cost-Based: Execution cost minimization
  • ML-Based: Learning from execution patterns
  • Hybrid: Combination of multiple strategies

Pipeline System

Execution Pipelines

Comprehensive pipeline management:

  • Pipeline Definition: DAG-based execution graphs
  • Stage Composition: Kernel chains and parallel stages
  • Pipeline Optimization: Automatic graph optimization
  • Memory Management: Inter-stage buffer management
  • Error Handling: Pipeline-wide error recovery
Pipeline Profiling
  • Metrics Collection: Per-stage performance metrics
  • Bottleneck Analysis: Critical path identification
  • Resource Tracking: Memory and compute utilization
  • Optimization Recommendations: Auto-tuning suggestions

Telemetry and Observability

OpenTelemetry Integration

Production-grade observability:

  • Distributed Tracing: Execution flow tracking
  • Metrics Collection: Performance counters
  • Logging Integration: Structured logging
  • Prometheus Export: Metrics endpoint
  • OTLP Support: OpenTelemetry Protocol
Performance Metrics
  • Kernel Execution Time: Per-kernel timing
  • Memory Transfer Bandwidth: Host-device transfer rates
  • Throughput: Operations per second
  • Resource Utilization: CPU/GPU usage
  • Queue Depth: Command queue metrics

Recovery and Fault Tolerance

Fault Detection
  • Compilation Failures: Kernel compilation errors
  • Execution Failures: Runtime kernel failures
  • Memory Errors: Out-of-memory, allocation failures
  • Device Errors: GPU hangs, driver issues
Recovery Strategies
  • Retry with Backoff: Automatic retry logic
  • Fallback Backends: CPU fallback for GPU failures
  • Parameter Adjustment: Reduce resource requirements
  • Graceful Degradation: Continue with reduced capability

Security Features

Kernel Validation
  • Source Code Scanning: Detect unsafe patterns
  • Resource Limits: Prevent resource exhaustion
  • Sandboxing: Isolated execution environments
  • Access Control: Permission-based execution
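
These protections would typically be switched on where the runtime is configured. The option names below are illustrative assumptions, not confirmed API:

```csharp
builder.Services.AddDotComputeRuntime(options =>
{
    // Hypothetical security settings; property names are assumptions
    options.EnableKernelValidation = true;            // source code scanning
    options.EnableSandboxing = true;                  // isolated execution
    options.MaxMemoryPerKernel = 512L * 1024 * 1024;  // resource limit: 512 MB
});
```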

Installation

dotnet add package DotCompute.Core --version 0.4.2-rc2

Usage

Basic Service Configuration

using DotCompute.Core;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

// Configure services
var builder = Host.CreateApplicationBuilder(args);

// Add DotCompute runtime
builder.Services.AddDotComputeRuntime(options =>
{
    options.EnableTelemetry = true;
    options.DefaultAccelerator = AcceleratorType.Auto;
    options.EnableDebugValidation = false; // Production
});

var host = builder.Build();

// Get orchestrator service
var orchestrator = host.Services.GetRequiredService<IComputeOrchestrator>();

Kernel Execution

using DotCompute.Core;

// Execute kernel using orchestrator
var result = await orchestrator.ExecuteKernelAsync<float[], float[]>(
    "VectorAdd",
    new { a = dataA, b = dataB, length = 1_000_000 }
);

// Result is automatically materialized
Console.WriteLine($"First result: {result[0]}");

Debug-Enabled Orchestration

using DotCompute.Core.Debugging;

// Enable debugging for development
builder.Services.AddProductionDebugging(options =>
{
    options.Profile = DebugProfile.Development;
    options.EnableCrossBackendValidation = true;
    options.ValidateAllExecutions = true;
    options.CollectPerformanceMetrics = true;
});

// Orchestrator automatically validates all executions
var result = await orchestrator.ExecuteKernelAsync<float[], float[]>(
    "MyKernel",
    parameters
);

// Debug service provides detailed diagnostics
var debugService = host.Services.GetRequiredService<IKernelDebugService>();
var diagnostics = await debugService.GenerateDiagnosticsAsync("MyKernel", parameters);
Console.WriteLine(diagnostics.Summary);

Performance Optimization

using DotCompute.Core.Optimization;

// Enable ML-based optimization
builder.Services.AddProductionOptimization(options =>
{
    options.OptimizationStrategy = OptimizationStrategy.Aggressive;
    options.EnableMachineLearning = true;
    options.EnableAdaptiveSelection = true;
});

// Orchestrator automatically selects optimal backend
var result = await orchestrator.ExecuteKernelAsync<float[], float[]>(
    "ComplexKernel",
    largeDataset
);

// Get optimization insights
var optimizer = host.Services.GetRequiredService<IAdaptiveBackendSelector>();
var recommendation = await optimizer.SelectBackendAsync(
    new WorkloadCharacteristics
    {
        DataSize = largeDataset.Length,
        ComputeIntensity = ComputeIntensity.High,
        MemoryIntensive = true
    }
);

Console.WriteLine($"Recommended: {recommendation.Backend}");
Console.WriteLine($"Confidence: {recommendation.Confidence:P2}");

Pipeline Execution

using DotCompute.Core.Pipelines;

// Build execution pipeline
var pipeline = await pipelineBuilder
    .AddStage("Load", loadKernel, new { path = inputPath })
    .AddStage("Transform", transformKernel)
    .AddStage("Reduce", reduceKernel)
    .AddStage("Save", saveKernel, new { path = outputPath })
    .WithOptimization()
    .WithProfiling()
    .BuildAsync();

// Execute pipeline
var result = await pipeline.ExecuteAsync();

// Get profiling results
var profiler = host.Services.GetRequiredService<IPipelineProfiler>();
var metrics = await profiler.GetMetricsAsync(pipeline.Id);

foreach (var stage in metrics.Stages)
{
    Console.WriteLine($"{stage.Name}: {stage.Duration.TotalMilliseconds}ms");
}

Telemetry Configuration

using DotCompute.Core.Telemetry;
using OpenTelemetry.Metrics;
using OpenTelemetry.Trace;

builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics =>
    {
        metrics.AddDotComputeInstrumentation();
        metrics.AddPrometheusExporter();
    })
    .WithTracing(tracing =>
    {
        tracing.AddDotComputeInstrumentation();
        tracing.AddOtlpExporter();
    });

// Metrics automatically collected:
// - dotcompute.kernel.executions (counter)
// - dotcompute.kernel.duration (histogram)
// - dotcompute.memory.allocated (counter)
// - dotcompute.memory.transferred (counter)

Recovery Configuration

using DotCompute.Core.Recovery;

builder.Services.Configure<RecoveryOptions>(options =>
{
    options.EnableAutoRecovery = true;
    options.MaxRetries = 3;
    options.RetryDelay = TimeSpan.FromMilliseconds(100);
    options.FallbackToCPU = true;
    options.CollectFailureStatistics = true;
});

// Automatic recovery on failures
try
{
    var result = await orchestrator.ExecuteKernelAsync(kernelName, parameters);
}
catch (ComputeException ex)
{
    // After exhausting retries and fallbacks
    Console.WriteLine($"Execution failed: {ex.Message}");
    Console.WriteLine($"Retries attempted: {ex.RetryCount}");
}

Architecture

Service Layer

Application
    ↓
IComputeOrchestrator (High-level API)
    ↓
KernelExecutionService (Orchestration)
    ↓
├── Accelerator Management (Device selection)
├── Kernel Discovery (Generated kernels)
├── Kernel Compilation (Backend-specific)
├── Memory Management (Buffer allocation)
├── Execution Pipeline (Kernel launch)
├── Telemetry Collection (Metrics)
└── Error Recovery (Fault handling)

Component Integration

DotCompute.Core
├── Abstractions Layer (Interfaces)
├── Compute Engine (Execution)
├── Memory Management (Buffers)
├── Debugging Services (Validation)
├── Optimization Engine (Backend selection)
├── Pipeline System (Workflow)
├── Telemetry System (Observability)
├── Recovery System (Fault tolerance)
└── Security System (Validation)

Configuration Options

Runtime Options

public class DotComputeRuntimeOptions
{
    public AcceleratorType DefaultAccelerator { get; set; } = AcceleratorType.Auto;
    public bool EnableTelemetry { get; set; } = true;
    public bool EnableDebugValidation { get; set; } = false;
    public bool EnableAutoOptimization { get; set; } = true;
    public bool EnableRecovery { get; set; } = true;
    public LogLevel MinimumLogLevel { get; set; } = LogLevel.Information;
}

Debug Options

public class DebugOptions
{
    public DebugProfile Profile { get; set; } = DebugProfile.Production;
    public bool EnableCrossBackendValidation { get; set; } = false;
    public bool ValidateAllExecutions { get; set; } = false;
    public bool CollectPerformanceMetrics { get; set; } = true;
    public double ToleranceThreshold { get; set; } = 1e-5;
}

Optimization Options

public class OptimizationOptions
{
    public OptimizationStrategy Strategy { get; set; } = OptimizationStrategy.Balanced;
    public bool EnableMachineLearning { get; set; } = false;
    public bool EnableAdaptiveSelection { get; set; } = true;
    public bool CacheDecisions { get; set; } = true;
    public TimeSpan CacheDuration { get; set; } = TimeSpan.FromMinutes(30);
}

System Requirements

  • .NET 9.0 or later
  • Native AOT compatible runtime (optional)
  • 2GB+ RAM (4GB+ recommended)
  • OpenTelemetry compatible monitoring (optional)

Performance Characteristics

Overhead

  • Orchestration overhead: < 50μs per kernel
  • Debugging overhead (Development): 2-5x execution time
  • Debugging overhead (Production): < 5% execution time
  • Telemetry overhead: < 1% execution time

Scalability

  • Concurrent kernel executions: Unlimited (backend-limited)
  • Pipeline depth: No practical limit
  • Telemetry throughput: 10K+ events/second

Troubleshooting

Accelerator Not Found

Check accelerator availability:

var service = host.Services.GetRequiredService<IAcceleratorDiscoveryService>();
var accelerators = await service.DiscoverAsync();

if (!accelerators.Any())
{
    Console.WriteLine("No accelerators found. Ensure device drivers are installed.");
}

Kernel Compilation Failures

Enable detailed logging:

builder.Logging.SetMinimumLevel(LogLevel.Trace);

Performance Issues

Use profiling:

builder.Services.AddProductionDebugging(options =>
{
    options.CollectPerformanceMetrics = true;
});

Advanced Topics

Custom Accelerator Implementation

Implement IAccelerator and register:

builder.Services.AddSingleton<IAccelerator, CustomAccelerator>();
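
A skeleton of what such an accelerator might look like; IAccelerator's members are not documented in this README, so every signature below is an assumption:

```csharp
using DotCompute.Core;

public sealed class CustomAccelerator : IAccelerator
{
    public string Name => "Custom";

    // Translate a kernel definition into the custom device's executable form
    public ValueTask<ICompiledKernel> CompileKernelAsync(KernelDefinition definition)
        => throw new NotImplementedException();

    // Block until all queued work on the device has completed
    public ValueTask SynchronizeAsync() => ValueTask.CompletedTask;

    public ValueTask DisposeAsync() => ValueTask.CompletedTask;
}
```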

Custom Optimization Strategy

Implement IOptimizationStrategy:

public class CustomStrategy : IOptimizationStrategy
{
    public Task<OptimizationDecision> DecideAsync(WorkloadCharacteristics workload)
    {
        // Custom selection logic; wrap the synchronous decision in a Task
        var decision = new OptimizationDecision();
        return Task.FromResult(decision);
    }
}

Pipeline Optimization

Custom pipeline optimizer:

public class CustomPipelineOptimizer : IPipelineOptimizer
{
    public Task<PipelineExecutionPlan> OptimizeAsync(IPipeline pipeline)
    {
        // Custom optimization logic; produce an execution plan for the pipeline
        var plan = new PipelineExecutionPlan();
        return Task.FromResult(plan);
    }
}

Dependencies

  • DotCompute.Abstractions: Core abstractions
  • Microsoft.Extensions.DependencyInjection: DI infrastructure
  • Microsoft.Extensions.Logging: Logging infrastructure
  • OpenTelemetry: Observability infrastructure
  • System.Diagnostics.DiagnosticSource: Metrics and tracing
  • prometheus-net: Prometheus metrics export

Documentation & Resources

Comprehensive documentation is available for DotCompute:

  • Architecture Documentation
  • Developer Guides
  • Examples
  • API Documentation
  • Support

Contributing

Contributions are welcome in:

  • Additional optimization strategies
  • Performance improvements
  • Platform-specific enhancements
  • Documentation and examples

See CONTRIBUTING.md for guidelines.

License

MIT License - Copyright (c) 2025 Michael Ivertowski

Compatible and additional computed target framework versions:

.NET: net9.0 is compatible. net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, and net10.0-windows were computed.

NuGet packages (10)

Showing the top 5 NuGet packages that depend on DotCompute.Core:

Package Downloads
DotCompute.Plugins

Plugin system for DotCompute compute acceleration framework. Provides hot-reload capability, plugin discovery, circuit breaker patterns, and fault tolerance infrastructure.

DotCompute.Memory

Unified memory management for DotCompute. Provides zero-copy buffers, memory pooling, and cross-device memory transfers.

DotCompute.Backends.CUDA

Production-ready NVIDIA CUDA GPU backend for DotCompute. Provides GPU acceleration (21-92x speedup) through CUDA with NVRTC compilation, P2P transfers, Ring Kernels with NCCL support, and unified memory. Requires CUDA 12.0+ and Compute Capability 5.0+ NVIDIA GPU. Benchmarked on RTX 2000 Ada (CC 8.9).

DotCompute.Backends.CPU

Production-ready CPU compute backend for DotCompute. Provides SIMD vectorization (3.7x faster) using AVX2/AVX512/NEON instructions, multi-threaded kernel execution, and Ring Kernel simulation. Benchmarked: Vector Add (100K elements) 2.14ms → 0.58ms. Native AOT compatible with sub-10ms startup.

DotCompute.Backends.OpenCL

Production-ready OpenCL backend for DotCompute. Cross-platform GPU acceleration for NVIDIA, AMD, Intel, ARM Mali, and Qualcomm Adreno GPUs. Supports OpenCL 1.2+, Ring Kernels with atomic message queues, runtime kernel compilation, and multi-device workload distribution. Works with nvidia-opencl-icd, ROCm, intel-opencl-icd, and vendor drivers.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last Updated
0.6.2 166 2/9/2026
0.5.3 351 2/2/2026
0.5.2 703 12/8/2025
0.5.1 664 11/28/2025
0.5.0 264 11/27/2025
0.4.2-rc2 435 11/11/2025
0.4.1-rc2 397 11/6/2025