Class TrainingMemoryConfig

Namespace
AiDotNet.Training.Memory
Assembly
AiDotNet.dll

Configuration for training memory management including gradient checkpointing, activation pooling, and model sharding.

public class TrainingMemoryConfig
Inheritance
TrainingMemoryConfig

Remarks

For Beginners: Training neural networks requires a lot of memory for three things:

  1. Model weights: The parameters being trained
  2. Activations: Intermediate results saved for backpropagation
  3. Gradients: Computed during backward pass

This configuration helps reduce memory usage through:

  • Gradient Checkpointing: Trade compute for memory by recomputing activations
  • Activation Pooling: Reuse memory buffers to reduce garbage collection
  • Model Sharding: Split large models across multiple GPUs

Example usage with AiModelBuilder:

// Using a preset for common scenarios
var builder = new AiModelBuilder<double, double[], double>()
    .ConfigureMemoryManagement(TrainingMemoryConfig.ForTransformers());

// Or with custom settings
var builder = new AiModelBuilder<double, double[], double>()
    .ConfigureMemoryManagement(new TrainingMemoryConfig
    {
        UseGradientCheckpointing = true,
        CheckpointEveryNLayers = 2,
        UseActivationPooling = true,
        MaxPoolMemoryMB = 2048
    });

Properties

CheckpointAttentionLayers

Gets or sets whether to checkpoint attention layers specifically.

public bool CheckpointAttentionLayers { get; set; }

Property Value

bool

Remarks

Attention layers are memory-intensive (O(n^2) for sequence length n). Checkpointing them can significantly reduce memory for transformer models.
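To make the quadratic growth concrete, the following sketch estimates the size of the attention score matrices for a single layer. It is self-contained and does not use AiDotNet; the helper name and the sizes (8 heads, batch of 4, 4-byte floats) are illustrative assumptions, and a real implementation may keep more or fewer tensors alive than this.

```csharp
using System;

static class AttentionMemoryEstimate
{
    // Bytes for the attention score matrices of one layer:
    // one n x n matrix per head, per sample in the batch.
    public static long ScoreMatrixBytes(int sequenceLength, int heads, int batchSize, int bytesPerElement)
    {
        return (long)sequenceLength * sequenceLength * heads * batchSize * bytesPerElement;
    }

    static void Main()
    {
        // Hypothetical sizes chosen only to show the trend.
        foreach (int n in new[] { 512, 1024, 2048 })
        {
            long bytes = ScoreMatrixBytes(n, heads: 8, batchSize: 4, bytesPerElement: 4);
            Console.WriteLine($"n={n}: {bytes / (1024.0 * 1024.0):F0} MiB per layer");
        }
        // Prints:
        // n=512: 32 MiB per layer
        // n=1024: 128 MiB per layer
        // n=2048: 512 MiB per layer
    }
}
```

Doubling the sequence length quadruples the estimate, which is why checkpointing these activations can pay off for long sequences.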

CheckpointEveryNLayers

Gets or sets how often to create checkpoints (every N layers).

public int CheckpointEveryNLayers { get; set; }

Property Value

int

Remarks

Lower values save more memory but require more recomputation; higher values reduce recomputation but use more memory. Typical values are 1-4 for transformers and 2-3 for CNNs.

CheckpointResidualBlocks

Gets or sets whether to checkpoint residual blocks.

public bool CheckpointResidualBlocks { get; set; }

Property Value

bool

DefaultInitializationStrategy

Gets or sets the default initialization strategy type.

public InitializationStrategyType DefaultInitializationStrategy { get; set; }

Property Value

InitializationStrategyType

Remarks

For Beginners: This controls how layer weights are initialized:

  • Eager: Initialize immediately (traditional)
  • Lazy: Defer until first use (faster construction)
  • Zero: All zeros (for testing only)
  • FromFile: Load from a file (transfer learning)

MaxPoolMemoryMB

Gets or sets the maximum memory for the activation pool in megabytes.

public long MaxPoolMemoryMB { get; set; }

Property Value

long

MemoryWarningThreshold

Gets or sets the warning threshold for memory usage, as a fraction of the maximum.

public double MemoryWarningThreshold { get; set; }

Property Value

double

MicroBatchCount

Gets or sets the number of micro-batches for pipeline parallelism.

public int MicroBatchCount { get; set; }

Property Value

int

NumDevices

Gets or sets the number of devices for model sharding.

public int NumDevices { get; set; }

Property Value

int

Remarks

Set to 1 for single-device training. Values > 1 enable model parallelism.

ShardingStrategy

Gets or sets the sharding strategy.

public ShardingStrategy ShardingStrategy { get; set; }

Property Value

ShardingStrategy

TensorPoolOptions

Gets or sets the tensor pooling options.

public PoolingOptions? TensorPoolOptions { get; set; }

Property Value

PoolingOptions

Remarks

Configure advanced pooling behavior including:

  • Maximum pool size in MB
  • Maximum elements per buffer to pool
  • Whether to use weak references (allows GC under memory pressure)

TrackMemoryStatistics

Gets or sets whether to track detailed memory statistics.

public bool TrackMemoryStatistics { get; set; }

Property Value

bool

UseActivationPooling

Gets or sets whether to use activation pooling.

public bool UseActivationPooling { get; set; }

Property Value

bool

Remarks

For Beginners: Activation pooling reuses tensor memory instead of allocating new memory for each operation. This reduces garbage collection pressure and can significantly improve training speed for large models.

UseGradientCheckpointing

Gets or sets whether to use gradient checkpointing.

public bool UseGradientCheckpointing { get; set; }

Property Value

bool

Remarks

For Beginners: Gradient checkpointing saves memory by not storing all intermediate activations. Instead, it saves checkpoints and recomputes activations during the backward pass. This trades ~30% extra compute time for ~40-50% memory savings.
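A minimal sketch of how the checkpointing-related properties on this page might be set together. It uses only properties documented here; the specific values are illustrative, and how the library combines these flags internally is not described on this page.

```csharp
using AiDotNet.Training.Memory;

// Enable gradient checkpointing and tune where checkpoints are placed.
var config = new TrainingMemoryConfig
{
    UseGradientCheckpointing = true,
    CheckpointEveryNLayers = 2,        // within the 1-4 range suggested for transformers
    CheckpointAttentionLayers = true,  // attention activations grow as O(n^2)
    CheckpointResidualBlocks = false   // illustrative choice, not a recommendation
};
```

The resulting object can then be passed to ConfigureMemoryManagement, as in the class-level example.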

UsePipelineParallelism

Gets or sets whether to use pipeline parallelism.

public bool UsePipelineParallelism { get; set; }

Property Value

bool

Remarks

Pipeline parallelism overlaps computation across shards for better GPU utilization.
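The sharding and pipeline properties could be combined roughly as follows. This is a sketch using only the properties documented on this page; the device and micro-batch counts are illustrative assumptions, and ShardingStrategy is left at its default because its members are not listed here.

```csharp
using AiDotNet.Training.Memory;

// Shard the model across several devices and pipeline micro-batches through them.
var config = new TrainingMemoryConfig
{
    NumDevices = 4,                 // values > 1 enable model parallelism
    UsePipelineParallelism = true,  // overlap computation across shards
    MicroBatchCount = 8             // illustrative; tune for your batch size
};
```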

UseSharedTensorPool

Gets or sets whether to use a shared global tensor pool.

public bool UseSharedTensorPool { get; set; }

Property Value

bool

Remarks

When true, uses TensorPoolManager.Shared for all pooling operations. When false, each training session creates its own pool.

UseTensorPooling

Gets or sets whether tensor pooling is enabled.

public bool UseTensorPooling { get; set; }

Property Value

bool

Remarks

For Beginners: Tensor pooling reduces memory allocations by reusing tensor buffers. When enabled, tensors are borrowed from and returned to a pool instead of being allocated and garbage collected each time.
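The borrow-and-return pattern described above can be illustrated with the standard .NET ArrayPool<T> class. This is a general sketch of buffer pooling, not AiDotNet's tensor pool; the class and method names are hypothetical.

```csharp
using System;
using System.Buffers;

static class PoolingSketch
{
    // Computes the sum of squares using a scratch buffer borrowed from a pool
    // instead of allocating a new array on every call.
    public static float SumOfSquares(float[] input)
    {
        // Rent may return an array longer than requested; use only input.Length items.
        float[] scratch = ArrayPool<float>.Shared.Rent(input.Length);
        try
        {
            for (int i = 0; i < input.Length; i++)
            {
                scratch[i] = input[i] * input[i];
            }

            float sum = 0f;
            for (int i = 0; i < input.Length; i++)
            {
                sum += scratch[i];
            }
            return sum;
        }
        finally
        {
            // Hand the buffer back so later calls can reuse it.
            ArrayPool<float>.Shared.Return(scratch);
        }
    }

    static void Main()
    {
        Console.WriteLine(SumOfSquares(new float[] { 1f, 2f, 3f })); // prints 14
    }
}
```

In a training loop that repeatedly needs buffers of the same size, this kind of reuse avoids a fresh allocation per step, which is the effect UseTensorPooling aims for.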

WeightsFileFormat

Gets or sets the format of the weights file.

public WeightFileFormat WeightsFileFormat { get; set; }

Property Value

WeightFileFormat

WeightsFilePath

Gets or sets the path to a weights file for FromFile initialization.

public string? WeightsFilePath { get; set; }

Property Value

string

Remarks

Only used when DefaultInitializationStrategy is set to FromFile.
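A sketch of configuring FromFile initialization by hand. The enum member name InitializationStrategyType.FromFile is assumed from the value names listed under DefaultInitializationStrategy, the file path is hypothetical, and the namespaces of the two enum types are not given on this page, so an additional using directive may be needed.

```csharp
using AiDotNet.Training.Memory;

// Load initial weights from a file instead of initializing them from scratch.
var config = new TrainingMemoryConfig
{
    // Assumed member name, based on the "FromFile" option described above.
    DefaultInitializationStrategy = InitializationStrategyType.FromFile,
    WeightsFilePath = "weights/pretrained.bin",  // hypothetical path
    WeightsFileFormat = WeightFileFormat.Auto    // auto-detect the format
};
```

The ForTransferLearning(string, WeightFileFormat) method offers a preset for the same scenario.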

Methods

AggressivePooling(int)

Creates a configuration with aggressive memory pooling.

public static TrainingMemoryConfig AggressivePooling(int maxPoolSizeMB = 512)

Parameters

maxPoolSizeMB int

Maximum size of the tensor pool in MB.

Returns

TrainingMemoryConfig

Remarks

Maximizes tensor reuse to minimize garbage collection. Best for training loops where the same tensor shapes are used repeatedly.

Default()

Creates a default configuration with minimal memory optimization.

public static TrainingMemoryConfig Default()

Returns

TrainingMemoryConfig

FastConstruction()

Creates a configuration for fast model construction (good for testing).

public static TrainingMemoryConfig FastConstruction()

Returns

TrainingMemoryConfig

Remarks

Uses lazy initialization to defer weight allocation, making network construction fast. Useful when you just want to inspect the network architecture or run tests.

ForConvNets()

Creates a configuration optimized for convolutional networks.

public static TrainingMemoryConfig ForConvNets()

Returns

TrainingMemoryConfig

ForTransferLearning(string, WeightFileFormat)

Creates a configuration for transfer learning from a pre-trained model.

public static TrainingMemoryConfig ForTransferLearning(string weightsPath, WeightFileFormat format = WeightFileFormat.Auto)

Parameters

weightsPath string

Path to the pre-trained weights file.

format WeightFileFormat

Format of the weights file. Default is auto-detect.

Returns

TrainingMemoryConfig

Remarks

For Beginners: Transfer learning lets you start with a model that was already trained on a similar task. This is much faster than training from scratch and often produces better results, especially with limited data.

ForTransformers()

Creates a configuration optimized for transformer models.

public static TrainingMemoryConfig ForTransformers()

Returns

TrainingMemoryConfig

MemoryEfficient()

Creates a memory-efficient configuration for large models.

public static TrainingMemoryConfig MemoryEfficient()

Returns

TrainingMemoryConfig

Remarks

Enables gradient checkpointing and activation pooling for maximum memory savings. Best for training models that don't fit in GPU memory otherwise.

MultiGpu(int)

Creates a configuration for multi-GPU training.

public static TrainingMemoryConfig MultiGpu(int numDevices)

Parameters

numDevices int

Number of GPUs to use.

Returns

TrainingMemoryConfig

SpeedOptimized()

Creates a speed-optimized configuration (less memory optimization).

public static TrainingMemoryConfig SpeedOptimized()

Returns

TrainingMemoryConfig

Remarks

Disables gradient checkpointing for maximum speed. Best when you have enough GPU memory for your model.
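The factory methods above can serve as starting points that are then adjusted through the public property setters. The sketch below uses only signatures listed on this page; the argument and threshold values are illustrative assumptions.

```csharp
using AiDotNet.Training.Memory;

// Start from a preset, then override individual properties.
TrainingMemoryConfig config = TrainingMemoryConfig.MemoryEfficient();
config.TrackMemoryStatistics = true;
config.MemoryWarningThreshold = 0.8; // illustrative fraction

// Presets that take arguments.
TrainingMemoryConfig pooled = TrainingMemoryConfig.AggressivePooling(maxPoolSizeMB: 1024);
TrainingMemoryConfig multiGpu = TrainingMemoryConfig.MultiGpu(numDevices: 2);
```

Which property values each preset sets is not listed on this page, so inspect the returned object before relying on a particular default.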