Interface IShardingConfiguration<T>

Namespace
AiDotNet.DistributedTraining
Assembly
AiDotNet.dll

Configuration for parameter sharding in distributed training.

public interface IShardingConfiguration<T>

Type Parameters

T

The numeric type used for parameters and gradients (for example, float or double).

Remarks

For Beginners: This configuration tells the sharding system how to divide up parameters and how to handle communication. Think of it as the "rules" for how the team collaborates.
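As an illustration, the properties documented below could be satisfied by a minimal implementation such as the following sketch. The class name, constructor, and the empty ICommunicationBackend&lt;T&gt; stand-in are hypothetical (in real code both interfaces come from AiDotNet.dll, namespace AiDotNet.DistributedTraining); the hard-coded values mirror the defaults documented for each property.

```csharp
// Stand-ins so this sketch compiles on its own; in real code these
// interfaces come from AiDotNet.dll (AiDotNet.DistributedTraining).
public interface ICommunicationBackend<T> { }  // members elided here

public interface IShardingConfiguration<T>
{
    bool AutoSyncGradients { get; }
    ICommunicationBackend<T> CommunicationBackend { get; }
    bool EnableGradientCompression { get; }
    T LearningRate { get; }
    int MinimumParameterGroupSize { get; }
}

// Hypothetical implementation exposing the documented defaults.
public sealed class DefaultShardingConfiguration<T> : IShardingConfiguration<T>
{
    public DefaultShardingConfiguration(ICommunicationBackend<T> backend, T learningRate)
    {
        CommunicationBackend = backend;
        LearningRate = learningRate;
    }

    public bool AutoSyncGradients => true;          // documented default: true
    public ICommunicationBackend<T> CommunicationBackend { get; }
    public bool EnableGradientCompression => false; // documented default: false
    public T LearningRate { get; }                  // documented default: 0.01
    public int MinimumParameterGroupSize => 1024;   // documented default: 1024
}
```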

Properties

AutoSyncGradients

Gets whether to automatically synchronize gradients after the backward pass.

bool AutoSyncGradients { get; }

Property Value

bool

Remarks

For Beginners: When true, gradients are automatically shared across all processes after each training step. This is usually what you want for standard training. You might set it to false if you want manual control over synchronization.

Default: true

CommunicationBackend

Gets the communication backend to use for distributed operations.

ICommunicationBackend<T> CommunicationBackend { get; }

Property Value

ICommunicationBackend<T>

Remarks

For Beginners: This is the "communication system" that processes use to talk to each other. It could be an in-memory backend for testing or an MPI backend for real distributed training across multiple machines.

EnableGradientCompression

Gets whether to enable gradient compression to reduce communication costs.

bool EnableGradientCompression { get; }

Property Value

bool

Remarks

For Beginners: Gradient compression reduces the size of data that needs to be sent between processes. It's like zipping a file before sending it - faster to send, but requires a tiny bit of extra work to compress/decompress. This can significantly speed up training on slower networks.

Default: false

LearningRate

Gets the learning rate for gradient application during training.

T LearningRate { get; }

Property Value

T

Remarks

For Beginners: The learning rate controls how much to update model parameters based on computed gradients. A typical default is 0.01. Lower values mean slower but more stable learning; higher values mean faster but potentially unstable learning.

Default: 0.01
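The update this learning rate scales is ordinary gradient descent. A minimal standalone sketch using plain double arithmetic (not AiDotNet API):

```csharp
using System;

// parameter <- parameter - learningRate * gradient
double learningRate = 0.01; // the documented default
double parameter = 0.5;
double gradient = 2.0;

parameter -= learningRate * gradient; // 0.5 - 0.01 * 2.0 = 0.48
Console.WriteLine(parameter);
```

With a larger learning rate the same gradient would move the parameter further in one step, which is faster but can overshoot; a smaller rate moves it less, which is the slower-but-stabler trade-off described above.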

MinimumParameterGroupSize

Gets the minimum parameter group size for sharding.

int MinimumParameterGroupSize { get; }

Property Value

int

Remarks

Parameters smaller than this might be grouped together to reduce communication overhead.

For Beginners: Sending many tiny messages is inefficient. This setting groups small parameters together into larger chunks before communicating them. Think of it like sending one big box instead of 100 tiny envelopes.

Default: 1024
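The grouping described above can be sketched as a simple bucketing routine. The function name and signature are illustrative, not part of AiDotNet: small parameter groups accumulate into a shared bucket until the bucket reaches the minimum size, while groups already at or above the minimum travel alone.

```csharp
using System;
using System.Collections.Generic;

// Demo: five parameter-group sizes, minimum group size 1024 (the documented default).
var buckets = BucketParameterSizes(new[] { 2048, 100, 300, 700, 50 }, 1024);
foreach (var bucket in buckets)
    Console.WriteLine(string.Join(" + ", bucket));

// Hypothetical bucketing routine (illustrative, not AiDotNet API).
static List<List<int>> BucketParameterSizes(int[] sizes, int minGroupSize)
{
    var buckets = new List<List<int>>();
    var current = new List<int>();
    int currentTotal = 0;

    foreach (int size in sizes)
    {
        if (size >= minGroupSize)
        {
            // Large enough to be communicated on its own.
            buckets.Add(new List<int> { size });
            continue;
        }

        // Accumulate small groups until the bucket is big enough.
        current.Add(size);
        currentTotal += size;
        if (currentTotal >= minGroupSize)
        {
            buckets.Add(current);
            current = new List<int>();
            currentTotal = 0;
        }
    }

    if (current.Count > 0)
        buckets.Add(current); // leftover small groups still ship together
    return buckets;
}
```

In this demo the 2048-element group ships alone, the 100 + 300 + 700 groups merge into one 1100-element bucket, and the trailing 50-element group forms a final leftover bucket: three messages instead of five.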