Interface IShardedOptimizer<T, TInput, TOutput>
Namespace: AiDotNet.DistributedTraining
Assembly: AiDotNet.dll
Defines the contract for optimizers that support distributed training with parameter sharding.
public interface IShardedOptimizer<T, TInput, TOutput> : IOptimizer<T, TInput, TOutput>, IModelSerializer
Type Parameters
T: The numeric type for operations.
TInput: The input type for the model.
TOutput: The output type for the model.
Remarks
For Beginners: A sharded optimizer is like having a team of coaches working together. Each coach (process) is responsible for updating a portion of the player's (model's) skills. After each round of practice, the coaches share and combine their improvements to ensure everyone stays in sync.
This makes it possible to optimize very large models whose parameters and optimizer state do not fit on a single GPU.
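The following sketch shows how a single distributed update step might use the members documented on this page. The ShardedStepSketch class and RunStep method are illustrative names, and the gradient computation and parameter update are left as comments because the base IOptimizer members are not shown here.
using System;
using AiDotNet.DistributedTraining;

public static class ShardedStepSketch
{
    public static void RunStep<T, TInput, TOutput>(
        IShardedOptimizer<T, TInput, TOutput> optimizer)
    {
        // Each process owns roughly 1/WorldSize of the optimizer state.
        Console.WriteLine(
            $"Process {optimizer.Rank} of {optimizer.WorldSize} updating its shard.");

        // ... compute gradients and apply the update through the wrapped optimizer ...

        // Bring optimizer state (e.g. momentum buffers) back in sync across processes.
        optimizer.SynchronizeOptimizerState();
    }
}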
Properties
Rank
Gets the rank of this process in the distributed group.
int Rank { get; }
Property Value
- int
Remarks
For Beginners: Each process has a unique ID (its rank), ranging from 0 to WorldSize - 1. This tells you which process you are. Rank 0 is typically the "coordinator" process.
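A minimal sketch of coordinator-only logic, assuming an IShardedOptimizer instance is supplied by the caller; the CoordinatorSketch helper and the log message are illustrative.
using System;
using AiDotNet.DistributedTraining;

public static class CoordinatorSketch
{
    public static void ReportIfCoordinator<T, TInput, TOutput>(
        IShardedOptimizer<T, TInput, TOutput> optimizer)
    {
        // Only rank 0 performs coordinator-only work such as logging or reporting.
        if (optimizer.Rank == 0)
        {
            Console.WriteLine("Rank 0: acting as the coordinator for this run.");
        }
    }
}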
ShardingConfiguration
Gets the sharding configuration for this optimizer.
IShardingConfiguration<T> ShardingConfiguration { get; }
Property Value
- IShardingConfiguration<T>
WorldSize
Gets the total number of processes in the distributed group.
int WorldSize { get; }
Property Value
- int
Remarks
For Beginners: This is how many processes are working together to optimize the model. For example, if you have 4 GPUs, WorldSize would be 4.
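A minimal sketch of how Rank and WorldSize might be combined to lay out an even parameter split; the ShardLayoutSketch helper and the totalParameters value are illustrative, and remainder handling is omitted for brevity.
using System;
using AiDotNet.DistributedTraining;

public static class ShardLayoutSketch
{
    public static void PrintShard<T, TInput, TOutput>(
        IShardedOptimizer<T, TInput, TOutput> optimizer)
    {
        // Illustrative parameter count; not part of the interface.
        int totalParameters = 1_000_000;

        // Each of the WorldSize processes owns one contiguous slice.
        int shardSize = totalParameters / optimizer.WorldSize;
        int start = optimizer.Rank * shardSize;

        Console.WriteLine(
            $"Rank {optimizer.Rank} owns parameters [{start}, {start + shardSize}).");
    }
}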
WrappedOptimizer
Gets the underlying wrapped optimizer.
IOptimizer<T, TInput, TOutput> WrappedOptimizer { get; }
Property Value
- IOptimizer<T, TInput, TOutput>
Remarks
For Beginners: This is the original optimizer (like Adam, SGD, etc.) that we're adding distributed training capabilities to. Think of it as the "core brain" that we're helping to work across multiple processes.
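A brief sketch of reading the wrapped optimizer; the DescribeInner helper is an illustrative name, and var is used because the IOptimizer namespace is not shown on this page.
using System;
using AiDotNet.DistributedTraining;

public static class WrappedOptimizerSketch
{
    public static void DescribeInner<T, TInput, TOutput>(
        IShardedOptimizer<T, TInput, TOutput> optimizer)
    {
        // The sharded optimizer delegates the actual update rule (Adam, SGD, ...)
        // to the wrapped optimizer; here we only inspect its runtime type.
        var inner = optimizer.WrappedOptimizer;
        Console.WriteLine($"Distributed sharding is layered on top of: {inner.GetType().Name}");
    }
}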
Methods
SynchronizeOptimizerState()
Synchronizes optimizer state (like momentum buffers) across all processes.
void SynchronizeOptimizerState()
Remarks
For Beginners: Some optimizers (like Adam) keep track of past gradients to make smarter updates. This method makes sure all processes have the same optimizer state, so they stay coordinated. It's like making sure all team members are reading from the same playbook.
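A minimal sketch of where this call might sit in an epoch loop, assuming the gradient and parameter-update work happens elsewhere (the base IOptimizer members are not documented on this page); the SyncSketch helper is an illustrative name.
using AiDotNet.DistributedTraining;

public static class SyncSketch
{
    public static void Train<T, TInput, TOutput>(
        IShardedOptimizer<T, TInput, TOutput> optimizer, int epochs)
    {
        for (int epoch = 0; epoch < epochs; epoch++)
        {
            // ... each process computes gradients and updates its parameter shard ...

            // Re-align optimizer state (momentum, variance buffers) so every
            // process continues from the same playbook.
            optimizer.SynchronizeOptimizerState();
        }
    }
}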