Class ShardingConfiguration<T>
- Namespace
- AiDotNet.DistributedTraining
- Assembly
- AiDotNet.dll
Default implementation of sharding configuration for distributed training.
public class ShardingConfiguration<T> : IShardingConfiguration<T>
Type Parameters
T
The numeric type
- Inheritance
- object
- ShardingConfiguration<T>
- Implements
- IShardingConfiguration<T>
Remarks
For Beginners: This class holds all the settings that control how distributed training works. You can create an instance with default settings or customize it for your needs.
Example:
var config = new ShardingConfiguration<double>(backend)
{
AutoSyncGradients = true, // Automatically sync after each step
MinimumParameterGroupSize = 1024, // Group small parameters together
EnableGradientCompression = false // No compression for now
};
Constructors
ShardingConfiguration(ICommunicationBackend<T>, double)
Creates a new sharding configuration with the specified communication backend.
public ShardingConfiguration(ICommunicationBackend<T> communicationBackend, double learningRate = 0.01)
Parameters
communicationBackend (ICommunicationBackend<T>): The communication backend to use.
learningRate (double): Learning rate for gradient application. Defaults to 0.01.
Remarks
For Beginners: This creates the configuration object that tells the system how to handle distributed training. You must provide a communication backend (the system that allows processes to talk to each other).
Exceptions
- ArgumentNullException
Thrown if communicationBackend is null.
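Example (a minimal sketch; backend stands in for an existing ICommunicationBackend<double> instance and is not a specific class named here):
// Create a configuration with a custom learning rate.
var config = new ShardingConfiguration<double>(backend, learningRate: 0.05);

// Passing a null backend throws ArgumentNullException:
// var invalid = new ShardingConfiguration<double>(null);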
Properties
AutoSyncGradients
Gets or sets whether to automatically synchronize gradients after the backward pass.
public bool AutoSyncGradients { get; set; }
Property Value
- bool
Remarks
For Beginners: When true, gradients are automatically shared across all processes after each training step. This is usually what you want for standard training. You might set it to false if you want manual control over synchronization.
Default: true
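Example (a sketch of disabling auto-sync; how synchronization is then triggered manually depends on the training loop or coordinator you pair this configuration with, which is not part of this class):
var config = ShardingConfiguration<double>.CreateDefault(backend);
config.AutoSyncGradients = false; // gradients are no longer synced after each step

// With auto-sync off, the caller must invoke whatever synchronization
// entry point the surrounding training code exposes.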
CommunicationBackend
Gets the communication backend to use for distributed operations.
public ICommunicationBackend<T> CommunicationBackend { get; }
Property Value
- ICommunicationBackend<T>
Remarks
For Beginners: This is the "communication system" that processes use to talk to each other. It could be an in-memory backend for testing or an MPI backend for real distributed training across multiple machines.
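Because the property is get-only, the backend is fixed when the configuration is constructed. Example (backend is assumed to be an existing ICommunicationBackend<double>, e.g. an in-memory backend for tests or an MPI backend for real distributed runs):
var config = new ShardingConfiguration<double>(backend);

// The backend can be read back but not replaced after construction.
ICommunicationBackend<double> comms = config.CommunicationBackend;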
EnableGradientCompression
Gets or sets whether to enable gradient compression to reduce communication costs.
public bool EnableGradientCompression { get; set; }
Property Value
- bool
Remarks
For Beginners: Gradient compression reduces the size of data that needs to be sent between processes. It's like zipping a file before sending it - faster to send, but requires a tiny bit of extra work to compress/decompress. This can significantly speed up training on slower networks.
Default: false
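Example (a sketch for a bandwidth-constrained setup; backend is an assumed ICommunicationBackend<double>):
var config = new ShardingConfiguration<double>(backend)
{
    EnableGradientCompression = true // trade a little compute for less network traffic
};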
LearningRate
Gets or sets the learning rate for gradient application during training.
public T LearningRate { get; set; }
Property Value
- T
Remarks
For Beginners: The learning rate controls how much to update model parameters based on computed gradients. A typical default is 0.01. Lower values mean slower but more stable learning; higher values mean faster but potentially unstable learning.
Default: 0.01
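Note that the constructor accepts the learning rate as a double, while the property is exposed as the numeric type T. Example with T = double (backend is an assumed ICommunicationBackend<double>):
var config = new ShardingConfiguration<double>(backend, learningRate: 0.01);
config.LearningRate = 0.001; // lower the rate later, e.g. as part of a decay schedule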
MinimumParameterGroupSize
Gets or sets the minimum parameter group size for sharding.
public int MinimumParameterGroupSize { get; set; }
Property Value
- int
Remarks
Parameters smaller than this might be grouped together to reduce communication overhead.
For Beginners: Sending many tiny messages is inefficient. This setting groups small parameters together into larger chunks before communicating them. Think of it like sending one big box instead of 100 tiny envelopes.
Default: 1024
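Example (a sketch that raises the group size; 4096 is an illustrative value, not a library recommendation):
var config = ShardingConfiguration<double>.CreateDefault(backend);
config.MinimumParameterGroupSize = 4096; // bundle small parameters into larger messages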
Methods
CreateDefault(ICommunicationBackend<T>)
Creates a new sharding configuration with default settings and the specified backend.
public static ShardingConfiguration<T> CreateDefault(ICommunicationBackend<T> communicationBackend)
Parameters
communicationBackend (ICommunicationBackend<T>): The communication backend to use.
Returns
- ShardingConfiguration<T>
A new configuration with default settings
Remarks
For Beginners: This is a convenient way to create a configuration with sensible defaults:
- AutoSyncGradients = true (automatically sync gradients)
- MinimumParameterGroupSize = 1024 (group small parameters)
- EnableGradientCompression = false (no compression for simplicity)
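Example (backend is an assumed ICommunicationBackend<double>; individual settings can still be overridden after creation):
var config = ShardingConfiguration<double>.CreateDefault(backend);
config.EnableGradientCompression = true; // override a single default if needed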
CreateForHighBandwidth(ICommunicationBackend<T>)
Creates a configuration optimized for high-bandwidth networks (like NVLink between GPUs).
public static ShardingConfiguration<T> CreateForHighBandwidth(ICommunicationBackend<T> communicationBackend)
Parameters
communicationBackend (ICommunicationBackend<T>): The communication backend to use.
Returns
- ShardingConfiguration<T>
A configuration optimized for high-bandwidth scenarios
Remarks
For Beginners: Use this when your GPUs or machines are connected with very fast networks. It disables compression (not needed with fast networks) and uses smaller parameter groups (communication is fast enough to handle many messages).
CreateForLowBandwidth(ICommunicationBackend<T>)
Creates a configuration optimized for low-bandwidth networks (like machines connected over ethernet).
public static ShardingConfiguration<T> CreateForLowBandwidth(ICommunicationBackend<T> communicationBackend)
Parameters
communicationBackend (ICommunicationBackend<T>): The communication backend to use.
Returns
- ShardingConfiguration<T>
A configuration optimized for low-bandwidth scenarios
Remarks
For Beginners: Use this when your machines are connected over slower networks like regular ethernet. It enables compression to reduce the amount of data sent and uses larger parameter groups to minimize the number of messages.
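Example contrasting this preset with CreateForHighBandwidth (the fastNetwork flag is a placeholder for however your application detects or configures the interconnect):
bool fastNetwork = false; // placeholder: NVLink-class links vs. a regular ethernet cluster

var config = fastNetwork
    ? ShardingConfiguration<double>.CreateForHighBandwidth(backend)
    : ShardingConfiguration<double>.CreateForLowBandwidth(backend);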