Class GpuAccelerationConfig
Configuration for GPU-accelerated training and inference.
public class GpuAccelerationConfig
- Inheritance
object → GpuAccelerationConfig
Remarks
Phase B: GPU Acceleration Configuration
This configuration controls when and how GPU acceleration is used during training and inference. The default settings work well for most desktop GPUs - just call ConfigureGpuAcceleration() without parameters for automatic GPU detection and sensible defaults.
For Beginners: GPU acceleration makes training 10-100x faster for large models by using your graphics card for parallel computation. This config lets you:
- Enable or disable GPU acceleration
- Choose which GPU to use (if you have multiple)
- Control when to use GPU vs CPU based on operation size
- Enable debug logging to see what's running where
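A minimal sketch of constructing this configuration. The property names and enum members come from the reference below; note that the remarks only document the parameterless ConfigureGpuAcceleration(), so how a customized instance is handed to the trainer is an assumption to verify against the library's API:

```csharp
// Default: automatic GPU detection and sensible defaults.
var config = new GpuAccelerationConfig();

// Or customize a few knobs first.
var tuned = new GpuAccelerationConfig
{
    DeviceType = GpuDeviceType.Auto,  // pick the best available backend
    DeviceIndex = 0,                  // first GPU
    EnableForInference = true,        // accelerate prediction as well as training
    VerboseLogging = false
};
```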
Constructors
GpuAccelerationConfig()
Creates a configuration with default GPU settings.
public GpuAccelerationConfig()
Properties
CacheCompiledGraphs
Gets or sets whether to cache compiled execution graphs (default: true).
public bool CacheCompiledGraphs { get; set; }
Property Value
- bool
Remarks
For Beginners: When enabled, compiled execution graphs are cached and reused for repeated execution patterns (like training epochs).
Eliminates graph compilation overhead after the first execution. Uses some memory for the cache. Only disable for debugging.
DeviceIndex
Gets or sets the GPU device index to use if multiple GPUs are available (default: 0).
public int DeviceIndex { get; set; }
Property Value
- int
Remarks
For Beginners: If you have multiple GPUs, specify which one to use:
- 0: First GPU (default)
- 1: Second GPU
- etc.
The system will enumerate available GPUs and select the one at this index. If the specified index doesn't exist, falls back to the first available GPU.
DeviceType
Gets or sets the GPU device type to use (default: Auto).
public GpuDeviceType DeviceType { get; set; }
Property Value
- GpuDeviceType
Remarks
For Beginners: Specifies which type of GPU to use:
- **Auto**: Automatically select the best available (CUDA → OpenCL → HIP → CPU)
- **CUDA**: Force NVIDIA CUDA (fails if not available)
- **OpenCL**: Force OpenCL (AMD/Intel/NVIDIA GPUs)
- **CPU**: Force CPU execution (disable GPU)
Leave as Auto unless you have specific requirements.
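For instance, combining DeviceType with DeviceIndex to pin execution to a specific device (a sketch using the enum members listed above):

```csharp
// Run on the second GPU, forcing the OpenCL backend.
var config = new GpuAccelerationConfig
{
    DeviceType = GpuDeviceType.OpenCL,
    DeviceIndex = 1  // per the remarks, falls back to the first GPU if index 1 doesn't exist
};
```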
EnableAutoFusion
Gets or sets whether to enable automatic kernel fusion (default: true).
public bool EnableAutoFusion { get; set; }
Property Value
- bool
Remarks
For Beginners: Kernel fusion combines multiple small operations into a single GPU kernel. For example, GEMM + Bias + ReLU becomes one fused operation. This reduces kernel launch overhead and memory bandwidth usage.
Provides 1.5-2x speedup for common patterns. Only disable for debugging.
EnableComputeTransferOverlap
Gets or sets whether to enable compute/transfer overlap (default: true).
public bool EnableComputeTransferOverlap { get; set; }
Property Value
- bool
Remarks
For Beginners: When enabled, the system uses separate GPU streams for compute operations and data transfers. This allows GPU compute to continue while data is being uploaded or downloaded.
Provides 1.5-2x speedup for workloads with mixed compute and data movement. Requires GPU with multi-stream support.
EnableForInference
Gets or sets whether to enable GPU acceleration for inference (prediction) as well as training (default: true).
public bool EnableForInference { get; set; }
Property Value
- bool
Remarks
For Beginners: GPU can accelerate both training AND inference. Set to false if you only want GPU during training but CPU during inference, for example when deploying to CPU-only servers.
EnableGpuPersistence
Gets or sets whether to enable GPU persistence for neural network weights (default: true).
public bool EnableGpuPersistence { get; set; }
Property Value
- bool
Remarks
Phase B: Persistent GPU Tensors (US-GPU-030)
When enabled, neural network weights and biases stay on GPU memory between operations, eliminating per-operation CPU-GPU memory transfers. This provides massive speedups (up to 100x) for training and inference.
For Beginners: This keeps your model's weights on the GPU permanently instead of copying them back and forth for each operation. This is the single most important optimization for GPU performance.
Only disable if:
- You're running out of GPU memory
- You need weights on CPU for other purposes between operations
- You're debugging GPU-related issues
Memory Impact: Weights stay in GPU memory until the model is disposed. For large models (e.g., 100M parameters at 4 bytes each = 400MB), this GPU memory is allocated and held for the model's lifetime.
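The memory estimate above can be checked with quick arithmetic (assuming float32 weights, i.e. 4 bytes per parameter):

```csharp
long parameters = 100_000_000;       // 100M-parameter model
long bytesPerParam = sizeof(float);  // 4 bytes for float32
long weightBytes = parameters * bytesPerParam;

// 400,000,000 bytes = 400 MB (decimal), held for the model's lifetime.
Console.WriteLine($"{weightBytes / 1_000_000} MB");
```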
EnableGraphCompilation
Gets or sets whether to enable execution graph compilation (default: true).
public bool EnableGraphCompilation { get; set; }
Property Value
- bool
Remarks
For Beginners: When enabled, the system records operations and compiles them into an optimized execution graph. This enables:
- Operation fusion (combine GEMM + Bias + ReLU into one kernel)
- Stream scheduling (overlap compute and data transfer)
- Memory planning (reuse buffers efficiently)
Provides 1.5-3x speedup. Only disable for debugging.
EnablePrefetch
Gets or sets whether to enable data prefetching (default: true).
public bool EnablePrefetch { get; set; }
Property Value
- bool
Remarks
For Beginners: Prefetching uploads data to GPU before it's needed, hiding transfer latency. For example, while the GPU computes layer N, prefetch uploads data for layer N+1.
Provides 1.2-1.5x speedup for workloads with predictable data access. Uses slightly more GPU memory for prefetch buffers.
EnableProfiling
Gets or sets whether to enable GPU profiling (default: false).
public bool EnableProfiling { get; set; }
Property Value
- bool
Remarks
For Beginners: When enabled, detailed timing information is collected for all GPU operations. This helps identify performance bottlenecks.
Adds overhead (5-10%), so only enable when investigating performance issues. Profile data can be accessed via the GpuExecutionContext.
ExecutionMode
Gets or sets the GPU execution mode (default: Auto).
public GpuExecutionModeConfig ExecutionMode { get; set; }
Property Value
- GpuExecutionModeConfig
Remarks
Phase 2-3: Async Pipelining and Graph Execution
For Beginners: Controls how GPU operations are scheduled:
- **Auto**: Automatically select the best mode (recommended for most users)
- **Eager**: Immediate execution, easiest debugging
- **Deferred**: Batch and optimize operations for maximum performance (10-50x faster)
- **ScopedDeferred**: Balanced; batch within explicit scopes
Start with Auto. If you need predictable step-by-step execution for debugging, use Eager. For maximum performance with large models, use Deferred.
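The guidance above as a sketch (the enum members are those listed in the remarks; GpuExecutionModeConfig is the property's type per the signature):

```csharp
var config = new GpuAccelerationConfig();

// Debugging: predictable, step-by-step execution.
config.ExecutionMode = GpuExecutionModeConfig.Eager;

// Production with large models: batch and optimize for throughput.
config.ExecutionMode = GpuExecutionModeConfig.Deferred;
```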
MaxComputeStreams
Gets or sets the maximum number of compute streams (default: 3).
public int MaxComputeStreams { get; set; }
Property Value
- int
Remarks
For Beginners: More streams allow more operations to run in parallel. Modern GPUs can execute multiple independent operations simultaneously using streams.
Values:
- 1: Single stream, no parallelism
- 2-3: Moderate parallelism (recommended for most GPUs)
- 4+: High parallelism (for high-end GPUs like A100/H100)
More streams use more GPU resources. Start with 3 and increase if profiling shows idle GPU time.
MaxGpuMemoryUsage
Gets or sets the maximum GPU memory usage fraction (default: 0.8).
public double MaxGpuMemoryUsage { get; set; }
Property Value
- double
Remarks
For Beginners: Controls how much GPU memory the system will use. Value is a fraction from 0.0 to 1.0 (e.g., 0.8 = 80% of GPU memory).
When this limit is approached, the system will:
- Evict least-recently-used tensors to CPU
- Block new allocations until memory is freed
Lower values leave headroom for other applications using the GPU. Higher values maximize memory for large models.
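For example, to leave roughly half the GPU free for other applications (a sketch; behavior for values outside 0.0-1.0 is not specified here):

```csharp
var config = new GpuAccelerationConfig
{
    // Use at most ~50% of GPU memory; near the limit, least-recently-used
    // tensors are evicted to CPU and new allocations block until memory frees.
    MaxGpuMemoryUsage = 0.5
};
```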
MinGpuElements
Gets or sets the minimum number of elements to use GPU (default: 4096).
public int MinGpuElements { get; set; }
Property Value
- int
Remarks
For Beginners: Small operations have overhead that makes GPU slower than CPU. This threshold determines when to use GPU vs CPU.
Values:
- 1024-2048: Aggressive GPU usage (high-end GPUs with low latency)
- 4096: Balanced (recommended default)
- 10000+: Conservative (older GPUs, PCIe bandwidth limited)
If you see many small operations running on GPU, increase this value.
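A sketch of tuning this threshold for an older, PCIe-bandwidth-limited GPU:

```csharp
var config = new GpuAccelerationConfig
{
    // Conservative: only operations with >= 10,000 elements go to GPU.
    // For scale, the default of 4096 is the element count of a 64x64 matrix.
    MinGpuElements = 10_000
};
```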
TransferStreams
Gets or sets the number of transfer streams (default: 2).
public int TransferStreams { get; set; }
Property Value
- int
Remarks
For Beginners: Transfer streams are used for data movement between CPU and GPU. Having dedicated transfer streams allows data transfers to overlap with computation.
Values:
- 1: Single transfer stream (no transfer parallelism)
- 2: Separate H2D and D2H streams (recommended)
- 3+: Additional transfer parallelism (for high-bandwidth systems)
This can also be configured via AIDOTNET_GPU_TRANSFER_STREAMS environment variable.
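Both routes can be sketched as follows. Environment.SetEnvironmentVariable is standard .NET and the variable name is taken from the remarks above; which setting wins when both are present is not specified here and is an assumption worth verifying:

```csharp
// In code:
var config = new GpuAccelerationConfig { TransferStreams = 3 };

// Or via environment variable, set before GPU support initializes:
Environment.SetEnvironmentVariable("AIDOTNET_GPU_TRANSFER_STREAMS", "3");
```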
UsageLevel
Gets or sets the GPU usage level (default: Default).
public GpuUsageLevel UsageLevel { get; set; }
Property Value
- GpuUsageLevel
Remarks
For Beginners: Controls how aggressively the GPU is used:
- **Default**: Balanced for typical GPUs (recommended)
- **Conservative**: Only use GPU for very large operations (older GPUs)
- **Aggressive**: Use GPU more often (high-end GPUs like RTX 4090/A100)
- **AlwaysGpu**: Force all operations to GPU
- **AlwaysCpu**: Force all operations to CPU
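Two of those levels in a sketch (enum members as named in the remarks):

```csharp
// High-end GPU: offload more operations.
var fast = new GpuAccelerationConfig { UsageLevel = GpuUsageLevel.Aggressive };

// Force CPU everywhere, e.g. to rule out the GPU path when debugging.
var cpuOnly = new GpuAccelerationConfig { UsageLevel = GpuUsageLevel.AlwaysCpu };
```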
VerboseLogging
Gets or sets whether to enable verbose logging of GPU operations (default: false).
public bool VerboseLogging { get; set; }
Property Value
- bool
Remarks
For Beginners: When true, logs information about:
- GPU initialization and device selection
- Which operations run on GPU vs CPU
- Memory transfers and sizes
- Performance metrics
Useful for debugging and optimization, but can produce a lot of output.
Methods
ToString()
Gets a string representation of this configuration.
public override string ToString()
Returns
- string
A string describing the configuration settings.