DotCompute.Memory
0.6.2
dotnet add package DotCompute.Memory --version 0.6.2
DotCompute.Memory
High-performance unified memory management system for DotCompute with zero-copy operations and cross-device memory transfers.
Status: ✅ Production Ready
The Memory module provides comprehensive memory management capabilities:
- Memory Pooling: 90% allocation reduction through object pooling
- Zero-Copy Operations: Span<T> and Memory<T> throughout
- Cross-Device Transfers: Unified abstraction for CPU/GPU memory
- Thread Safety: Lock-free operations and concurrent access
- Native AOT: Full compatibility with Native AOT compilation
Key Features
Unified Memory Management
- UnifiedMemoryManager: Central memory management authority for DotCompute
- Cross-Backend Support: Works with CPU, CUDA, Metal, OpenCL backends
- Automatic Cleanup: Background defragmentation and resource reclamation
- Statistics and Monitoring: Comprehensive metrics for memory usage
Buffer Abstractions
OptimizedUnifiedBuffer<T>
Performance-optimized unified buffer with:
- Object pooling for frequent allocations (90% reduction target)
- Lazy initialization for expensive operations
- Zero-copy operations using Span<T> and Memory<T>
- Async-first design with optimized synchronization
- Memory prefetching for improved cache performance
- NUMA-aware memory allocation
Buffer States
Buffers track their state across devices:
- HostOnly: Data exists only in system memory
- DeviceOnly: Data exists only in device memory
- Synchronized: Data is consistent across host and device
- HostDirty: Host copy modified, needs sync to device
- DeviceDirty: Device copy modified, needs sync to host
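As an illustration, the transitions above might be observed like this. This is a sketch only: the `State` property and `BufferState` enum names are assumptions inferred from the state list, while `AllocateAsync`, `AsSpan`, and `SynchronizeAsync` appear in the usage examples later in this README.

```csharp
// Hypothetical walk through the buffer state machine described above.
// The State property and BufferState enum names are assumptions.
var buffer = await memoryManager.AllocateAsync<float>(1024);
// Freshly allocated and touched only on the host:
//   buffer.State == BufferState.HostOnly
buffer.AsSpan()[0] = 42.0f;
// Host copy modified, device copy stale:
//   buffer.State == BufferState.HostDirty
await buffer.SynchronizeAsync();
// Host and device copies now agree:
//   buffer.State == BufferState.Synchronized
await kernel.ExecuteAsync(buffer);
// Kernel wrote to device memory, host copy stale:
//   buffer.State == BufferState.DeviceDirty
```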
Memory Pooling
HighPerformanceObjectPool<T>
High-performance object pool optimized for compute workloads:
- Lock-free operations using ConcurrentStack
- Automatic pool size management
- Thread-local storage for hot paths
- Performance metrics and monitoring
- Configurable eviction policies
- NUMA-aware allocation when available
MemoryPool
Specialized memory pool for buffer management:
- Buffer size categorization for optimal reuse
- Automatic cleanup of unused buffers
- Configurable retention policies
- Cross-backend buffer pooling
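A minimal sketch of how such a buffer pool might be used directly. The `RentAsync`/`ReturnAsync` method names are assumptions; in normal use the pooling is transparent, since `UnifiedMemoryManager.AllocateAsync` routes through the pool.

```csharp
// Hypothetical direct use of the buffer pool (method names assumed).
var pool = new MemoryPool(accelerator, logger);

// Rent a buffer; the pool rounds the request up to its size bucket,
// so a later request of similar size can reuse the same allocation.
var buffer = await pool.RentAsync<float>(4096);

// ... use the buffer in kernels or transfers ...

// Returning the buffer makes it eligible for reuse instead of freeing it.
await pool.ReturnAsync(buffer);
```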
Advanced Transfer Engine
AdvancedMemoryTransferEngine
Sophisticated memory transfer system:
- Concurrent Transfers: Parallel host-device data movement
- Transfer Pipelining: Overlap compute and transfer operations
- Adaptive Batching: Automatic batch size optimization
- Transfer Statistics: Detailed performance metrics
- Error Recovery: Automatic retry with exponential backoff
Transfer Options
- Asynchronous Transfers: Non-blocking memory operations
- Pinned Memory: Use page-locked memory for faster transfers
- Streaming Transfers: Support for chunked data movement
- Priority Scheduling: Prioritize critical transfers
Zero-Copy Operations
ZeroCopyOperations
Optimized operations that avoid unnecessary data copies:
- Direct Span<T> manipulation
- Memory-mapped operations
- Pointer-based transformations
- SIMD-accelerated operations when available
UnsafeMemoryOperations
Low-level unsafe memory operations:
- Pointer arithmetic utilities
- Unmanaged memory marshalling
- Platform-specific optimizations
- Bounds-checked unsafe operations
Memory Utilities
MemoryAllocator
Centralized memory allocation with:
- Aligned memory allocation
- Platform-specific allocators
- Huge page support (when available)
- Memory tracking and diagnostics
ArrayPoolWrapper<T>
Wrapper around ArrayPool<T> with:
- Automatic size normalization
- Return tracking to prevent double-free
- Usage statistics
- Integration with DotCompute pooling
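A short sketch of the wrapper in use, assuming it mirrors the `Rent`/`Return` surface of `System.Buffers.ArrayPool<T>` (that API shape is an assumption, not confirmed by this README):

```csharp
// Hypothetical usage, assuming an ArrayPool<T>-style Rent/Return surface.
var wrapper = new ArrayPoolWrapper<byte>(logger);

// Size normalization means the returned array may be larger than requested.
byte[] array = wrapper.Rent(10_000);
try
{
    // ... fill and use array ...
}
finally
{
    // Return tracking guards against double-free and enables usage statistics.
    wrapper.Return(array);
}
```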
Performance Metrics
MemoryStatistics
Comprehensive memory usage statistics:
- Total bytes allocated/freed
- Active allocation count
- Peak memory usage
- Allocation/deallocation rates
- Pool efficiency metrics
BufferPerformanceMetrics
Per-buffer performance tracking:
- Transfer count and total time
- Average transfer bandwidth
- Last access timestamp
- Cache hit/miss ratio
TransferStatistics
Memory transfer performance data:
- Bytes transferred (host-to-device, device-to-host)
- Transfer count and average time
- Bandwidth utilization
- Transfer efficiency metrics
Installation
```bash
dotnet add package DotCompute.Memory --version 0.6.2
```
Usage
Basic Memory Management
```csharp
using DotCompute.Memory;
using DotCompute.Abstractions;

// Create memory manager (typically done by accelerator)
var memoryManager = new UnifiedMemoryManager(accelerator, logger);

// Allocate a buffer
var buffer = await memoryManager.AllocateAsync<float>(1_000_000);

// Write data to buffer
var data = new float[1_000_000];
await buffer.CopyFromAsync(data);

// Use buffer in kernel execution
await kernel.ExecuteAsync(buffer);

// Read results back
var results = new float[1_000_000];
await buffer.CopyToAsync(results);

// Dispose when done (returns to pool if eligible)
await buffer.DisposeAsync();
```
CPU-Only Memory Manager
```csharp
using DotCompute.Memory;

// Create CPU-only memory manager
var memoryManager = new UnifiedMemoryManager(logger);

// Allocate CPU buffers
var buffer = await memoryManager.AllocateAsync<double>(10_000);

// All operations work on CPU memory
await buffer.CopyFromAsync(cpuData);
```
Zero-Copy Operations with Span<T>
```csharp
using DotCompute.Memory;

// Allocate buffer
var buffer = await memoryManager.AllocateAsync<float>(1000);

// Get host span for zero-copy access
var span = buffer.AsSpan();

// Direct manipulation without copying
for (int i = 0; i < span.Length; i++)
{
    span[i] = i * 2.0f;
}

// Mark buffer as dirty to sync to device
await buffer.SynchronizeAsync();
```
Memory Pooling
```csharp
using DotCompute.Memory;

// High-performance object pool
var pool = new HighPerformanceObjectPool<MyObject>(
    createFunc: () => new MyObject(),
    resetAction: obj => obj.Reset(),
    validateFunc: obj => obj.IsValid(),
    config: new PoolConfiguration
    {
        MaxPoolSize = 1000,
        MinPoolSize = 10,
        EvictionPolicy = EvictionPolicy.LRU
    },
    logger: logger
);

// Get object from pool
var obj = pool.Get();

// Use object
obj.DoWork();

// Return to pool for reuse
pool.Return(obj);

// Get pool statistics
var stats = pool.GetStatistics();
Console.WriteLine($"Pool hits: {stats.PoolHitRate:P2}");
Console.WriteLine($"Allocation reduction: {stats.AllocationReduction:P2}");
```
Advanced Transfer Engine
```csharp
using DotCompute.Memory;
using DotCompute.Memory.Types;

var transferEngine = new AdvancedMemoryTransferEngine(accelerator, logger);

// Concurrent transfers with options
var options = new ConcurrentTransferOptions
{
    MaxConcurrency = 4,
    UsePinnedMemory = true,
    EnablePipelining = true,
    ChunkSize = 1024 * 1024 // 1MB chunks
};

var transfers = new[]
{
    (source: data1, destination: deviceBuffer1),
    (source: data2, destination: deviceBuffer2),
    (source: data3, destination: deviceBuffer3)
};

var result = await transferEngine.ExecuteConcurrentTransfersAsync(transfers, options);
Console.WriteLine($"Total transferred: {result.TotalBytesTransferred:N0} bytes");
Console.WriteLine($"Transfer rate: {result.AverageBandwidth:F2} MB/s");
Console.WriteLine($"Failed transfers: {result.FailedTransfers}");
```
Memory Statistics and Monitoring
```csharp
using DotCompute.Memory;

// Get memory statistics
var stats = memoryManager.Statistics;
Console.WriteLine($"Total allocated: {stats.TotalBytesAllocated:N0} bytes");
Console.WriteLine($"Active allocations: {stats.ActiveAllocationCount}");
Console.WriteLine($"Peak usage: {stats.PeakMemoryUsage:N0} bytes");
Console.WriteLine($"Pool efficiency: {stats.PoolEfficiency:P2}");

// Get detailed buffer metrics
var bufferMetrics = buffer.GetPerformanceMetrics();
Console.WriteLine($"Transfer count: {bufferMetrics.TransferCount}");
Console.WriteLine($"Avg bandwidth: {bufferMetrics.AverageBandwidth:F2} MB/s");
Console.WriteLine($"Last accessed: {bufferMetrics.LastAccessTime}");
```
Buffer Slicing
```csharp
using DotCompute.Memory;

// Create buffer
var buffer = await memoryManager.AllocateAsync<int>(1000);

// Create slice (view) without copying data
var slice = buffer.Slice(100, 200); // Elements 100-299

// Operate on slice
await kernel.ExecuteAsync(slice);

// Slices share underlying memory with parent buffer
```
NUMA-Aware Allocation
```csharp
using DotCompute.Memory;

// Allocator with NUMA awareness
var allocator = new MemoryAllocator(logger)
{
    UseNumaAllocation = true,
    PreferredNumaNode = 0
};

// Allocate on specific NUMA node for optimal performance
var buffer = allocator.AllocateAligned<float>(1_000_000, alignment: 64);
```
Architecture
Memory Hierarchy
```text
UnifiedMemoryManager (Top-level coordinator)
├── MemoryPool (Buffer pooling and reuse)
├── AdvancedMemoryTransferEngine (Transfer orchestration)
├── MemoryAllocator (Low-level allocation)
└── Buffer Registry (Active buffer tracking)

Buffers:
├── OptimizedUnifiedBuffer<T> (General-purpose unified buffer)
├── UnifiedBufferView (Non-owning view of buffer data)
└── UnifiedBufferSlice (Slice/window into buffer)
```
Buffer Lifecycle
- Allocation: Request buffer from memory manager
- Initialization: Initialize host or device memory
- Data Transfer: Move data between host and device
- Execution: Use in kernel operations
- Synchronization: Ensure consistency across devices
- Disposal: Return to pool or free memory
Pooling Strategy
The memory system uses tiered pooling:
- Thread-Local Pools: Fast path for frequent allocations
- Global Pool: Shared across threads with lock-free access
- Size-Based Buckets: Categorize buffers by size for optimal reuse
- Automatic Eviction: Remove unused buffers based on LRU policy
Target metrics:
- 90%+ reduction in allocation calls
- Sub-microsecond pool access latency
- 95%+ pool hit rate for common sizes
Transfer Optimization
Memory transfers are optimized through:
- Pinned Memory: Page-locked memory for DMA transfers
- Asynchronous Transfers: Non-blocking operations
- Transfer Pipelining: Overlap compute and data movement
- Batching: Combine small transfers into larger operations
- Concurrent Streams: Parallel transfers when possible
Performance Benchmarks
Tested on RTX 2000 Ada with 16GB RAM:
| Operation | Standard | Optimized | Improvement |
|---|---|---|---|
| Buffer Allocation | 45μs | 4μs | 11.2x |
| Host-Device Copy (10MB) | 2.8ms | 2.4ms | 1.2x |
| Device-Host Copy (10MB) | 2.9ms | 2.5ms | 1.2x |
| Buffer Pool Hit | N/A | 120ns | N/A |
| Zero-Copy Access | 450ns | 45ns | 10x |
Memory usage:
- Standard allocation: every request hits the allocator (baseline)
- Pooled allocation: roughly 10% of requests hit the allocator (90% reduction)
System Requirements
- .NET 9.0 or later
- 4GB+ RAM recommended (8GB+ for large datasets)
- Native AOT compatible
Optional Features
- NUMA Support: Requires a NUMA-aware OS (Windows Server, Linux with NUMA)
- Huge Pages: Requires OS configuration (Linux: hugetlbfs; Windows: Large Pages privilege)
- GPU Acceleration: Requires a compatible accelerator (CUDA, Metal, OpenCL)
Configuration
Memory Manager Options
Configure memory manager behavior:
```csharp
var options = new MemoryManagerOptions
{
    EnablePooling = true,
    MaxPoolSize = 1024 * 1024 * 1024, // 1GB
    MinPoolSize = 64 * 1024 * 1024,   // 64MB
    EnableAutoDefragmentation = true,
    DefragmentationInterval = TimeSpan.FromMinutes(5),
    EnableStatistics = true
};

var memoryManager = new UnifiedMemoryManager(accelerator, options, logger);
```
Pool Configuration
```csharp
var poolConfig = new PoolConfiguration
{
    MaxPoolSize = 1000,
    MinPoolSize = 10,
    MaxObjectSize = 100 * 1024 * 1024, // 100MB
    EvictionPolicy = EvictionPolicy.LRU,
    MaintenanceInterval = TimeSpan.FromMinutes(1),
    EnableMetrics = true
};
```
Troubleshooting
Out of Memory Errors
- Check Statistics: Review memoryManager.Statistics for usage patterns
- Enable Cleanup: Ensure automatic cleanup is running
- Manual Cleanup: Call memoryManager.TrimExcessAsync()
- Adjust Pool Size: Reduce MaxPoolSize if memory constrained
Performance Issues
- Pool Efficiency: Check the PoolEfficiency metric (target: 90%+)
- Transfer Bandwidth: Monitor transfer statistics for bottlenecks
- Alignment: Ensure buffers are properly aligned (64-byte for SIMD)
- Pinned Memory: Enable pinned memory for large transfers
Memory Leaks
- Dispose Buffers: Always dispose buffers explicitly or use using
- Registry Check: Review the _activeBuffers count in the memory manager
- Weak References: Check _bufferRegistry for leaked references
- Enable Diagnostics: Use UnifiedBufferDiagnostics for leak detection
Advanced Topics
Custom Buffer Types
Implement IUnifiedMemoryBuffer<T> for custom buffer behavior:
```csharp
public class CustomBuffer<T> : IUnifiedMemoryBuffer<T> where T : unmanaged
{
    public int Length { get; }
    public long SizeInBytes { get; }

    public async ValueTask CopyFromAsync(ReadOnlyMemory<T> source)
    {
        // Custom implementation
    }

    // Implement other interface members...
}
```
Integration with External Libraries
```csharp
// Wrap external library buffers
var externalPointer = ExternalLibrary.AllocateBuffer(size);
var wrappedBuffer = MemoryAllocator.WrapUnmanagedPointer<float>(
    externalPointer,
    size,
    ownsMemory: false
);
```
Cross-Process Memory Sharing
```csharp
// Create shared memory buffer
var sharedBuffer = await memoryManager.AllocateSharedAsync<float>(
    size,
    name: "SharedComputeBuffer"
);

// Other process can open the same buffer
var remoteBuffer = await memoryManager.OpenSharedAsync<float>(
    "SharedComputeBuffer"
);
```
Dependencies
- DotCompute.Abstractions: Core abstractions
- DotCompute.Core: Core runtime components
- System.IO.Pipelines: High-performance I/O
- Microsoft.Toolkit.HighPerformance: Performance utilities
- System.Runtime.InteropServices: Platform invoke support
Design Principles
- Zero-Copy First: Minimize data copying through Span<T> and Memory<T>
- Pool Everything: Reduce allocations through aggressive pooling
- Async by Default: Non-blocking operations for scalability
- Monitor Everything: Comprehensive statistics for diagnostics
- Fail Fast: Immediate validation and error reporting
- Thread-Safe: All operations safe for concurrent access
Documentation & Resources
Comprehensive documentation is available for DotCompute:
Architecture Documentation
- Memory Management Architecture - Unified buffer abstraction and pooling (90% allocation reduction)
- Backend Integration - P2P memory transfer architecture
Developer Guides
- Memory Management Guide - Memory pooling best practices and zero-copy techniques
- Performance Tuning - Memory optimization strategies (11.2x speedup)
- Multi-GPU Programming - P2P transfers (12 GB/s measured)
Examples
- Basic Vector Operations - Buffer management examples
- Image Processing - Memory-efficient image operations
API Documentation
- API Reference - Complete API documentation
- UnifiedBuffer Documentation - Buffer API reference
Support
- Documentation: Comprehensive Guides
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Contributing
Contributions are welcome, particularly in:
- Platform-specific memory optimizations
- Additional pooling strategies
- Transfer optimization techniques
- Memory usage profiling tools
See CONTRIBUTING.md for guidelines.
License
MIT License - Copyright (c) 2025 Michael Ivertowski
| Product | Compatible and additional computed target framework versions |
|---|---|
| .NET | net9.0 is compatible. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
net9.0
- DotCompute.Abstractions (>= 0.6.2)
- DotCompute.Core (>= 0.6.2)
- Microsoft.Extensions.Configuration.Binder (>= 10.0.2)
- Microsoft.Extensions.Hosting.Abstractions (>= 10.0.2)
- Microsoft.Extensions.Logging (>= 10.0.2)
- Microsoft.NET.ILLink.Tasks (>= 9.0.12)
- Microsoft.Toolkit.HighPerformance (>= 7.1.2)
- System.IO.Pipelines (>= 10.0.2)
- System.Runtime.InteropServices (>= 4.3.0)
NuGet packages (4)
Showing the top 4 NuGet packages that depend on DotCompute.Memory:
| Package | Description |
|---|---|
| DotCompute.Backends.CPU | Production-ready CPU compute backend for DotCompute. Provides SIMD vectorization (3.7x faster) using AVX2/AVX512/NEON instructions, multi-threaded kernel execution, and Ring Kernel simulation. Benchmarked: Vector Add (100K elements) 2.14ms → 0.58ms. Native AOT compatible with sub-10ms startup. |
| DotCompute.Backends.OpenCL | Production-ready OpenCL backend for DotCompute. Cross-platform GPU acceleration for NVIDIA, AMD, Intel, ARM Mali, and Qualcomm Adreno GPUs. Supports OpenCL 1.2+, Ring Kernels with atomic message queues, runtime kernel compilation, and multi-device workload distribution. Works with nvidia-opencl-icd, ROCm, intel-opencl-icd, and vendor drivers. |
| DotCompute.Linq | GPU-accelerated LINQ extensions for DotCompute. Transparent GPU execution for LINQ queries with automatic kernel generation, fusion optimization, and Reactive Extensions support. |
| DotCompute.Algorithms | GPU-accelerated algorithms for DotCompute. Includes FFT, AutoDiff, sparse matrix operations, signal processing, and cryptographic primitives. |