Class GpuAccelerationConfig
Configuration for GPU-accelerated training and inference.
public class GpuAccelerationConfig
- Inheritance
object → GpuAccelerationConfig
Remarks
Phase B: GPU Acceleration Configuration
This configuration controls when and how GPU acceleration is used during training and inference. The default settings work well for most desktop GPUs - just call ConfigureGpuAcceleration() without parameters for automatic GPU detection and sensible defaults.
For Beginners: GPU acceleration makes training 10-100x faster for large models by using your graphics card for parallel computation. This config lets you:
- Enable or disable GPU acceleration
- Choose which GPU to use (if you have multiple)
- Control when to use GPU vs CPU based on operation size
- Enable debug logging to see what's running where
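A minimal sketch of constructing this configuration. The property names and enum members come from the reference below; note that the remarks only document the parameterless ConfigureGpuAcceleration(), so how a customized instance is handed to the trainer is an assumption to verify against the library's API:

```csharp
// Default: automatic GPU detection and sensible defaults.
var config = new GpuAccelerationConfig();

// Or customize a few knobs first.
var tuned = new GpuAccelerationConfig
{
    DeviceType = GpuDeviceType.Auto,  // pick the best available backend
    DeviceIndex = 0,                  // first GPU
    EnableForInference = true,        // accelerate prediction as well as training
    VerboseLogging = false
};
```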
Constructors
GpuAccelerationConfig()
Creates a configuration with default GPU settings.
public GpuAccelerationConfig()
Properties
CacheCompiledGraphs
Gets or sets whether to cache compiled execution graphs (default: true).
public bool CacheCompiledGraphs { get; set; }
Property Value
- bool
Remarks
For Beginners: When enabled, compiled execution graphs are cached and reused for repeated execution patterns (like training epochs).
Eliminates graph compilation overhead after the first execution. Uses some memory for the cache. Only disable for debugging.
DeviceIndex
Gets or sets the GPU device index to use if multiple GPUs are available (default: 0).
public int DeviceIndex { get; set; }
Property Value
- int
Remarks
For Beginners: If you have multiple GPUs, specify which one to use:
- 0: First GPU (default)
- 1: Second GPU
- etc.
The system will enumerate available GPUs and select the one at this index. If the specified index doesn't exist, falls back to the first available GPU.
DeviceType
Gets or sets the GPU device type to use (default: Auto).
public GpuDeviceType DeviceType { get; set; }
Property Value
- GpuDeviceType
Remarks
For Beginners: Specifies which type of GPU to use:
- **Auto**: Automatically select the best available (CUDA → OpenCL → HIP → CPU)
- **CUDA**: Force NVIDIA CUDA (fails if not available)
- **OpenCL**: Force OpenCL (AMD/Intel/NVIDIA GPUs)
- **CPU**: Force CPU execution (disable GPU)
Leave as Auto unless you have specific requirements.
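For instance, combining DeviceType with DeviceIndex to pin execution to a specific device (a sketch using the enum members listed above):

```csharp
// Run on the second GPU, forcing the OpenCL backend.
var config = new GpuAccelerationConfig
{
    DeviceType = GpuDeviceType.OpenCL,
    DeviceIndex = 1  // per the remarks, falls back to the first GPU if index 1 doesn't exist
};
```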
EnableAutoFusion
Gets or sets whether to enable automatic kernel fusion (default: true).
public bool EnableAutoFusion { get; set; }
Property Value
- bool
Remarks
For Beginners: Kernel fusion combines multiple small operations into a single GPU kernel. For example, GEMM + Bias + ReLU becomes one fused operation. This reduces kernel launch overhead and memory bandwidth usage.
Provides 1.5-2x speedup for common patterns. Only disable for debugging.
EnableComputeTransferOverlap
Gets or sets whether to enable compute/transfer overlap (default: true).
public bool EnableComputeTransferOverlap { get; set; }
Property Value
- bool
Remarks
For Beginners: When enabled, the system uses separate GPU streams for compute operations and data transfers. This allows GPU compute to continue while data is being uploaded or downloaded.
Provides 1.5-2x speedup for workloads with mixed compute and data movement. Requires GPU with multi-stream support.
EnableForInference
Gets or sets whether to enable GPU acceleration for inference (prediction) as well as training (default: true).
public bool EnableForInference { get; set; }
Property Value
- bool
Remarks
For Beginners: GPU can accelerate both training AND inference. Set to false if you only want GPU during training but CPU during inference, for example when deploying to CPU-only servers.
EnableGpuPersistence
Gets or sets whether to enable GPU persistence for neural network weights (default: true).
public bool EnableGpuPersistence { get; set; }
Property Value
- bool
Remarks
Phase B: Persistent GPU Tensors (US-GPU-030)
When enabled, neural network weights and biases stay on GPU memory between operations, eliminating per-operation CPU-GPU memory transfers. This provides massive speedups (up to 100x) for training and inference.
For Beginners: This keeps your model's weights on the GPU permanently instead of copying them back and forth for each operation. This is the single most important optimization for GPU performance.
Only disable if:
- You're running out of GPU memory
- You need weights on CPU for other purposes between operations
- You're debugging GPU-related issues
Memory Impact: Weights stay in GPU memory until the model is disposed. For large models (e.g., 100M parameters at 4 bytes each = 400MB), this GPU memory is allocated and held for the model's lifetime.
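The memory estimate above can be checked with quick arithmetic (assuming float32 weights, i.e. 4 bytes per parameter):

```csharp
long parameters = 100_000_000;       // 100M-parameter model
long bytesPerParam = sizeof(float);  // 4 bytes for float32
long weightBytes = parameters * bytesPerParam;

// 400,000,000 bytes = 400 MB (decimal), held for the model's lifetime.
Console.WriteLine($"{weightBytes / 1_000_000} MB");
```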
EnableGraphCompilation
Gets or sets whether to enable execution graph compilation (default: true).
public bool EnableGraphCompilation { get; set; }
Property Value
- bool
Remarks
For Beginners: When enabled, the system records operations and compiles them into an optimized execution graph. This enables:
- Operation fusion (combine GEMM + Bias + ReLU into one kernel)
- Stream scheduling (overlap compute and data transfer)
- Memory planning (reuse buffers efficiently)
Provides 1.5-3x speedup. Only disable for debugging.
EnablePrefetch
Gets or sets whether to enable data prefetching (default: true).
public bool EnablePrefetch { get; set; }
Property Value
- bool
Remarks
For Beginners: Prefetching uploads data to GPU before it's needed, hiding transfer latency. For example, while the GPU computes layer N, prefetch uploads data for layer N+1.
Provides 1.2-1.5x speedup for workloads with predictable data access. Uses slightly more GPU memory for prefetch buffers.
EnableProfiling
Gets or sets whether to enable GPU profiling (default: false).
public bool EnableProfiling { get; set; }
Property Value
- bool
Remarks
For Beginners: When enabled, detailed timing information is collected for all GPU operations. This helps identify performance bottlenecks.
Adds overhead (5-10%), so only enable when investigating performance issues. Profile data can be accessed via the GpuExecutionContext.
ExecutionMode
Gets or sets the GPU execution mode (default: Auto).
public GpuExecutionModeConfig ExecutionMode { get; set; }
Property Value
- GpuExecutionModeConfig
Remarks
Phase 2-3: Async Pipelining and Graph Execution
For Beginners: Controls how GPU operations are scheduled:
- **Auto**: Automatically select the best mode (recommended for most users)
- **Eager**: Immediate execution, easiest debugging
- **Deferred**: Batch and optimize operations for maximum performance (10-50x faster)
- **ScopedDeferred**: Balanced; batch within explicit scopes
Start with Auto. If you need predictable step-by-step execution for debugging, use Eager. For maximum performance with large models, use Deferred.
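The guidance above as a sketch (the enum members are those listed in the remarks; GpuExecutionModeConfig is the property's type per the signature):

```csharp
var config = new GpuAccelerationConfig();

// Debugging: predictable, step-by-step execution.
config.ExecutionMode = GpuExecutionModeConfig.Eager;

// Production with large models: batch and optimize for throughput.
config.ExecutionMode = GpuExecutionModeConfig.Deferred;
```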
MaxComputeStreams
Gets or sets the maximum number of compute streams (default: 3).
public int MaxComputeStreams { get; set; }
Property Value
- int
Remarks
For Beginners: More streams allow more operations to run in parallel. Modern GPUs can execute multiple independent operations simultaneously using streams.
Values:
- 1: Single stream, no parallelism
- 2-3: Moderate parallelism (recommended for most GPUs)
- 4+: High parallelism (for high-end GPUs like A100/H100)
More streams use more GPU resources. Start with 3 and increase if profiling shows idle GPU time.
MaxGpuMemoryUsage
Gets or sets the maximum GPU memory usage fraction (default: 0.8).
public double MaxGpuMemoryUsage { get; set; }
Property Value
- double
Remarks
For Beginners: Controls how much GPU memory the system will use. Value is a fraction from 0.0 to 1.0 (e.g., 0.8 = 80% of GPU memory).
When this limit is approached, the system will:
- Evict least-recently-used tensors to CPU
- Block new allocations until memory is freed
Lower values leave headroom for other applications using the GPU. Higher values maximize memory for large models.
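For example, to leave roughly half the GPU free for other applications (a sketch; behavior for values outside 0.0-1.0 is not specified here):

```csharp
var config = new GpuAccelerationConfig
{
    // Use at most ~50% of GPU memory; near the limit, least-recently-used
    // tensors are evicted to CPU and new allocations block until memory frees.
    MaxGpuMemoryUsage = 0.5
};
```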
MinGpuElements
Gets or sets the minimum number of elements to use GPU (default: 4096).
public int MinGpuElements { get; set; }
Property Value
- int
Remarks
For Beginners: Small operations have overhead that makes GPU slower than CPU. This threshold determines when to use GPU vs CPU.
Values:
- 1024-2048: Aggressive GPU usage (high-end GPUs with low latency)
- 4096: Balanced (recommended default)
- 10000+: Conservative (older GPUs, PCIe bandwidth limited)
If you see many small operations running on GPU, increase this value.
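A sketch of tuning this threshold for an older, PCIe-bandwidth-limited GPU:

```csharp
var config = new GpuAccelerationConfig
{
    // Conservative: only operations with >= 10,000 elements go to GPU.
    // For scale, the default of 4096 is the element count of a 64x64 matrix.
    MinGpuElements = 10_000
};
```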
TransferStreams
Gets or sets the number of transfer streams (default: 2).
public int TransferStreams { get; set; }
Property Value
- int
Remarks
For Beginners: Transfer streams are used for data movement between CPU and GPU. Having dedicated transfer streams allows data transfers to overlap with computation.
Values:
- 1: Single transfer stream (no transfer parallelism)
- 2: Separate H2D and D2H streams (recommended)
- 3+: Additional transfer parallelism (for high-bandwidth systems)
This can also be configured via AIDOTNET_GPU_TRANSFER_STREAMS environment variable.
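Both routes can be sketched as follows. Environment.SetEnvironmentVariable is standard .NET and the variable name is taken from the remarks above; which setting wins when both are present is not specified here and is an assumption worth verifying:

```csharp
// In code:
var config = new GpuAccelerationConfig { TransferStreams = 3 };

// Or via environment variable, set before GPU support initializes:
Environment.SetEnvironmentVariable("AIDOTNET_GPU_TRANSFER_STREAMS", "3");
```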
UsageLevel
Gets or sets the GPU usage level (default: Default).
public GpuUsageLevel UsageLevel { get; set; }
Property Value
- GpuUsageLevel
Remarks
For Beginners: Controls how aggressively the GPU is used:
- **Default**: Balanced for typical GPUs (recommended)
- **Conservative**: Only use GPU for very large operations (older GPUs)
- **Aggressive**: Use GPU more often (high-end GPUs like RTX 4090/A100)
- **AlwaysGpu**: Force all operations to GPU
- **AlwaysCpu**: Force all operations to CPU
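Two of those levels in a sketch (enum members as named in the remarks):

```csharp
// High-end GPU: offload more operations.
var fast = new GpuAccelerationConfig { UsageLevel = GpuUsageLevel.Aggressive };

// Force CPU everywhere, e.g. to rule out the GPU path when debugging.
var cpuOnly = new GpuAccelerationConfig { UsageLevel = GpuUsageLevel.AlwaysCpu };
```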
VerboseLogging
Gets or sets whether to enable verbose logging of GPU operations (default: false).
public bool VerboseLogging { get; set; }
Property Value
- bool
Remarks
For Beginners: When true, logs information about:
- GPU initialization and device selection
- Which operations run on GPU vs CPU
- Memory transfers and sizes
- Performance metrics
Useful for debugging and optimization, but can produce a lot of output.
Methods
ToString()
Gets a string representation of this configuration.
public override string ToString()
Returns
- string
A string describing the configuration settings.