Class TrainingMemoryConfig
Configuration for training memory management, including gradient checkpointing, activation pooling, and model sharding.
public class TrainingMemoryConfig
Inheritance: object → TrainingMemoryConfig
Remarks
For Beginners: Training neural networks requires a lot of memory:
- Model weights: The parameters being trained
- Activations: Intermediate results saved for backpropagation
- Gradients: Computed during backward pass
This configuration helps reduce memory usage through:
- Gradient Checkpointing: Trade compute for memory by recomputing activations
- Activation Pooling: Reuse memory buffers to reduce garbage collection
- Model Sharding: Split large models across multiple GPUs
Example usage with AiModelBuilder:
// Using a preset for common scenarios
var builder = new AiModelBuilder<double, double[], double>()
    .ConfigureMemoryManagement(TrainingMemoryConfig.ForTransformers());

// Or with custom settings
var builder = new AiModelBuilder<double, double[], double>()
    .ConfigureMemoryManagement(new TrainingMemoryConfig
    {
        UseGradientCheckpointing = true,
        CheckpointEveryNLayers = 2,
        UseActivationPooling = true,
        MaxPoolMemoryMB = 2048
    });
Properties
CheckpointAttentionLayers
Gets or sets whether to checkpoint attention layers specifically.
public bool CheckpointAttentionLayers { get; set; }
Property Value
Remarks
Attention layers are memory-intensive (O(n^2) for sequence length n). Checkpointing them can significantly reduce memory for transformer models.
CheckpointEveryNLayers
Gets or sets how often to create checkpoints (every N layers).
public int CheckpointEveryNLayers { get; set; }
Property Value
Remarks
Lower values = more memory savings but more recomputation. Higher values = less recomputation but more memory usage. Typical values: 1-4 for transformers, 2-3 for CNNs.
CheckpointResidualBlocks
Gets or sets whether to checkpoint residual blocks.
public bool CheckpointResidualBlocks { get; set; }
Property Value
DefaultInitializationStrategy
Gets or sets the default initialization strategy type.
public InitializationStrategyType DefaultInitializationStrategy { get; set; }
Property Value
Remarks
For Beginners: This controls how layer weights are initialized:
- Eager: Initialize immediately (traditional)
- Lazy: Defer until first use (faster construction)
- Zero: All zeros (for testing only)
- FromFile: Load from a file (transfer learning)
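As an illustrative sketch, a configuration that defers weight allocation could look like the following (the enum member name Lazy is taken from the list above; treat the exact spelling as an assumption):

```csharp
// Sketch: defer weight initialization until first use,
// which speeds up network construction.
// Assumes InitializationStrategyType.Lazy matches the enum described above.
var config = new TrainingMemoryConfig
{
    DefaultInitializationStrategy = InitializationStrategyType.Lazy
};
```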
MaxPoolMemoryMB
Gets or sets the maximum memory for the activation pool in megabytes.
public long MaxPoolMemoryMB { get; set; }
Property Value
MemoryWarningThreshold
Gets or sets the warning threshold for memory usage (as fraction of max).
public double MemoryWarningThreshold { get; set; }
Property Value
MicroBatchCount
Gets or sets the number of micro-batches for pipeline parallelism.
public int MicroBatchCount { get; set; }
Property Value
NumDevices
Gets or sets the number of devices for model sharding.
public int NumDevices { get; set; }
Property Value
Remarks
Set to 1 for single-device training. Values > 1 enable model parallelism.
ShardingStrategy
Gets or sets the sharding strategy.
public ShardingStrategy ShardingStrategy { get; set; }
Property Value
TensorPoolOptions
Gets or sets the tensor pooling options.
public PoolingOptions? TensorPoolOptions { get; set; }
Property Value
Remarks
Configure advanced pooling behavior including:
- Maximum pool size in MB
- Maximum elements per buffer to pool
- Whether to use weak references (allows GC under memory pressure)
TrackMemoryStatistics
Gets or sets whether to track detailed memory statistics.
public bool TrackMemoryStatistics { get; set; }
Property Value
UseActivationPooling
Gets or sets whether to use activation pooling.
public bool UseActivationPooling { get; set; }
Property Value
Remarks
For Beginners: Activation pooling reuses tensor memory instead of allocating new memory for each operation. This reduces garbage collection pressure and can significantly improve training speed for large models.
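A minimal sketch of enabling activation pooling with a capped pool size, using only the properties documented on this page:

```csharp
// Sketch: enable activation pooling with a 1 GB cap on reused buffers.
// Tensor memory is recycled instead of reallocated, reducing GC pressure.
var config = new TrainingMemoryConfig
{
    UseActivationPooling = true,
    MaxPoolMemoryMB = 1024  // the pool stops growing past this limit
};
```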
UseGradientCheckpointing
Gets or sets whether to use gradient checkpointing.
public bool UseGradientCheckpointing { get; set; }
Property Value
Remarks
For Beginners: Gradient checkpointing saves memory by not storing all intermediate activations. Instead, it saves checkpoints and recomputes activations during the backward pass. This trades ~30% extra compute time for ~40-50% memory savings.
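The trade-off above can be sketched as a configuration that checkpoints every second layer and additionally targets attention layers (useful for transformers, per the CheckpointAttentionLayers remarks):

```csharp
// Sketch: checkpoint every 2nd layer plus attention layers,
// trading recomputation during the backward pass for lower peak memory.
var config = new TrainingMemoryConfig
{
    UseGradientCheckpointing = true,
    CheckpointEveryNLayers = 2,
    CheckpointAttentionLayers = true
};
```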
UsePipelineParallelism
Gets or sets whether to use pipeline parallelism.
public bool UsePipelineParallelism { get; set; }
Property Value
Remarks
Pipeline parallelism overlaps computation across shards for better GPU utilization.
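Combining the sharding and pipelining properties, a multi-device pipeline setup might look like this (the specific device and micro-batch counts are illustrative):

```csharp
// Sketch: pipeline parallelism across 4 devices, splitting each
// batch into 8 micro-batches so that all shards stay busy.
var config = new TrainingMemoryConfig
{
    NumDevices = 4,
    UsePipelineParallelism = true,
    MicroBatchCount = 8
};
```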
UseSharedTensorPool
Gets or sets whether to use a shared global tensor pool.
public bool UseSharedTensorPool { get; set; }
Property Value
Remarks
When true, uses TensorPoolManager.Shared for all pooling operations. When false, each training session creates its own pool.
UseTensorPooling
Gets or sets whether tensor pooling is enabled.
public bool UseTensorPooling { get; set; }
Property Value
Remarks
For Beginners: Tensor pooling reduces memory allocations by reusing tensor buffers. When enabled, tensors are borrowed from and returned to a pool instead of being allocated and garbage collected each time.
WeightsFileFormat
Gets or sets the format of the weights file.
public WeightFileFormat WeightsFileFormat { get; set; }
Property Value
WeightsFilePath
Gets or sets the path to a weights file for FromFile initialization.
public string? WeightsFilePath { get; set; }
Property Value
Remarks
Only used when DefaultInitializationStrategy is set to FromFile.
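Putting the three FromFile-related properties together, a sketch of loading pre-trained weights (the file path is illustrative, not a real asset):

```csharp
// Sketch: initialize weights from a pre-trained file.
// The path below is a placeholder; WeightFileFormat.Auto is the
// auto-detect format mentioned in ForTransferLearning's defaults.
var config = new TrainingMemoryConfig
{
    DefaultInitializationStrategy = InitializationStrategyType.FromFile,
    WeightsFilePath = "pretrained/model.bin",
    WeightsFileFormat = WeightFileFormat.Auto
};
```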
Methods
AggressivePooling(int)
Creates a configuration with aggressive memory pooling.
public static TrainingMemoryConfig AggressivePooling(int maxPoolSizeMB = 512)
Parameters
maxPoolSizeMB (int): Maximum size of the tensor pool in MB.
Returns
Remarks
Maximizes tensor reuse to minimize garbage collection. Best for training loops where the same tensor shapes are used repeatedly.
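A one-line usage sketch (the 1024 MB pool size is an illustrative choice, not a recommendation):

```csharp
// Sketch: 1 GB pool for a training loop with stable tensor shapes.
var config = TrainingMemoryConfig.AggressivePooling(maxPoolSizeMB: 1024);
```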
Default()
Creates a default configuration with minimal memory optimization.
public static TrainingMemoryConfig Default()
Returns
FastConstruction()
Creates a configuration for fast model construction (good for testing).
public static TrainingMemoryConfig FastConstruction()
Returns
Remarks
Uses lazy initialization to defer weight allocation, making network construction fast. Useful when you just want to inspect the network architecture or run tests.
ForConvNets()
Creates a configuration optimized for convolutional networks.
public static TrainingMemoryConfig ForConvNets()
Returns
ForTransferLearning(string, WeightFileFormat)
Creates a configuration for transfer learning from a pre-trained model.
public static TrainingMemoryConfig ForTransferLearning(string weightsPath, WeightFileFormat format = WeightFileFormat.Auto)
Parameters
weightsPath (string): Path to the pre-trained weights file.
format (WeightFileFormat): Format of the weights file. Default is auto-detect.
Returns
Remarks
For Beginners: Transfer learning lets you start with a model that was already trained on a similar task. This is much faster than training from scratch and often produces better results, especially with limited data.
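A usage sketch combining this preset with the AiModelBuilder pattern shown at the top of this page (the weights path is a placeholder):

```csharp
// Sketch: start from pre-trained weights; the format parameter
// is omitted, so it defaults to WeightFileFormat.Auto.
var config = TrainingMemoryConfig.ForTransferLearning("pretrained/model.bin");
var builder = new AiModelBuilder<double, double[], double>()
    .ConfigureMemoryManagement(config);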
ForTransformers()
Creates a configuration optimized for transformer models.
public static TrainingMemoryConfig ForTransformers()
Returns
MemoryEfficient()
Creates a memory-efficient configuration for large models.
public static TrainingMemoryConfig MemoryEfficient()
Returns
Remarks
Enables gradient checkpointing and activation pooling for maximum memory savings. Best for training models that don't fit in GPU memory otherwise.
MultiGpu(int)
Creates a configuration for multi-GPU training.
public static TrainingMemoryConfig MultiGpu(int numDevices)
Parameters
numDevices (int): Number of GPUs to use.
Returns
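A usage sketch; the device count is illustrative, and how the preset picks a ShardingStrategy is an assumption about its internals:

```csharp
// Sketch: shard the model across 4 GPUs.
// Comparable to setting NumDevices = 4 on a custom config,
// with remaining sharding settings chosen by the preset.
var config = TrainingMemoryConfig.MultiGpu(4);
```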
SpeedOptimized()
Creates a speed-optimized configuration (less memory optimization).
public static TrainingMemoryConfig SpeedOptimized()
Returns
Remarks
Disables gradient checkpointing for maximum speed. Best when you have enough GPU memory for your model.
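As a quick contrast, this preset and MemoryEfficient() sit at opposite ends of the memory/speed trade-off described in this page's remarks:

```csharp
// Sketch of the two ends of the trade-off:
var lowMemory = TrainingMemoryConfig.MemoryEfficient(); // checkpointing + pooling on
var fast = TrainingMemoryConfig.SpeedOptimized();       // checkpointing off for speed
```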