Interface IShardedOptimizer<T, TInput, TOutput>
Namespace: AiDotNet.DistributedTraining
Assembly: AiDotNet.dll
Defines the contract for optimizers that support distributed training with parameter sharding.
public interface IShardedOptimizer<T, TInput, TOutput> : IOptimizer<T, TInput, TOutput>, IModelSerializer
Type Parameters
T: The numeric type for operations.
TInput: The input type for the model.
TOutput: The output type for the model.
Remarks
For Beginners: A sharded optimizer is like having a team of coaches working together. Each coach (process) is responsible for updating a portion of the player's (model's) skills. After each round of practice, the coaches share and combine their improvements to ensure everyone stays in sync.
This makes it possible to optimize very large models whose parameters and optimizer state do not fit on a single GPU.
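The following sketch shows how a single distributed update step might use the members documented on this page. The ShardedStepSketch class and RunStep method are illustrative names, and the gradient computation and parameter update are left as comments because the base IOptimizer members are not shown here.
using System;
using AiDotNet.DistributedTraining;

public static class ShardedStepSketch
{
    public static void RunStep<T, TInput, TOutput>(
        IShardedOptimizer<T, TInput, TOutput> optimizer)
    {
        // Each process owns roughly 1/WorldSize of the optimizer state.
        Console.WriteLine(
            $"Process {optimizer.Rank} of {optimizer.WorldSize} updating its shard.");

        // ... compute gradients and apply the update through the wrapped optimizer ...

        // Bring optimizer state (e.g. momentum buffers) back in sync across processes.
        optimizer.SynchronizeOptimizerState();
    }
}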
Properties
Rank
Gets the rank of this process in the distributed group.
int Rank { get; }
Property Value
- int
Remarks
For Beginners: Each process has a unique ID (its rank), ranging from 0 to WorldSize - 1. This tells you which process you are. Rank 0 is typically the "coordinator" process.
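A minimal sketch of coordinator-only logic, assuming an IShardedOptimizer instance is supplied by the caller; the CoordinatorSketch helper and the log message are illustrative.
using System;
using AiDotNet.DistributedTraining;

public static class CoordinatorSketch
{
    public static void ReportIfCoordinator<T, TInput, TOutput>(
        IShardedOptimizer<T, TInput, TOutput> optimizer)
    {
        // Only rank 0 performs coordinator-only work such as logging or reporting.
        if (optimizer.Rank == 0)
        {
            Console.WriteLine("Rank 0: acting as the coordinator for this run.");
        }
    }
}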
ShardingConfiguration
Gets the sharding configuration for this optimizer.
IShardingConfiguration<T> ShardingConfiguration { get; }
Property Value
- IShardingConfiguration<T>
WorldSize
Gets the total number of processes in the distributed group.
int WorldSize { get; }
Property Value
- int
Remarks
For Beginners: This is how many processes are working together to optimize the model. For example, if you have 4 GPUs, WorldSize would be 4.
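A minimal sketch of how Rank and WorldSize might be combined to lay out an even parameter split; the ShardLayoutSketch helper and the totalParameters value are illustrative, and remainder handling is omitted for brevity.
using System;
using AiDotNet.DistributedTraining;

public static class ShardLayoutSketch
{
    public static void PrintShard<T, TInput, TOutput>(
        IShardedOptimizer<T, TInput, TOutput> optimizer)
    {
        // Illustrative parameter count; not part of the interface.
        int totalParameters = 1_000_000;

        // Each of the WorldSize processes owns one contiguous slice.
        int shardSize = totalParameters / optimizer.WorldSize;
        int start = optimizer.Rank * shardSize;

        Console.WriteLine(
            $"Rank {optimizer.Rank} owns parameters [{start}, {start + shardSize}).");
    }
}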
WrappedOptimizer
Gets the underlying wrapped optimizer.
IOptimizer<T, TInput, TOutput> WrappedOptimizer { get; }
Property Value
- IOptimizer<T, TInput, TOutput>
Remarks
For Beginners: This is the original optimizer (like Adam, SGD, etc.) that we're adding distributed training capabilities to. Think of it as the "core brain" that we're helping to work across multiple processes.
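A brief sketch of reading the wrapped optimizer; the DescribeInner helper is an illustrative name, and var is used because the IOptimizer namespace is not shown on this page.
using System;
using AiDotNet.DistributedTraining;

public static class WrappedOptimizerSketch
{
    public static void DescribeInner<T, TInput, TOutput>(
        IShardedOptimizer<T, TInput, TOutput> optimizer)
    {
        // The sharded optimizer delegates the actual update rule (Adam, SGD, ...)
        // to the wrapped optimizer; here we only inspect its runtime type.
        var inner = optimizer.WrappedOptimizer;
        Console.WriteLine($"Distributed sharding is layered on top of: {inner.GetType().Name}");
    }
}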
Methods
SynchronizeOptimizerState()
Synchronizes optimizer state (like momentum buffers) across all processes.
void SynchronizeOptimizerState()
Remarks
For Beginners: Some optimizers (like Adam) keep track of past gradients to make smarter updates. This method makes sure all processes have the same optimizer state, so they stay coordinated. It's like making sure all team members are reading from the same playbook.
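A minimal sketch of where this call might sit in an epoch loop, assuming the gradient and parameter-update work happens elsewhere (the base IOptimizer members are not documented on this page); the SyncSketch helper is an illustrative name.
using AiDotNet.DistributedTraining;

public static class SyncSketch
{
    public static void Train<T, TInput, TOutput>(
        IShardedOptimizer<T, TInput, TOutput> optimizer, int epochs)
    {
        for (int epoch = 0; epoch < epochs; epoch++)
        {
            // ... each process computes gradients and updates its parameter shard ...

            // Re-align optimizer state (momentum, variance buffers) so every
            // process continues from the same playbook.
            optimizer.SynchronizeOptimizerState();
        }
    }
}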