Table of Contents

Namespace AiDotNet.DistributedTraining

Classes

AsyncSGDOptimizer<T, TInput, TOutput>

Implements Asynchronous SGD optimizer - allows asynchronous parameter updates without strict synchronization barriers.

CommunicationBackendBase<T>

Provides base implementation for distributed communication backends.

CommunicationManager

Central manager for distributed communication operations.

DDPModel<T, TInput, TOutput>

Implements DDP (Distributed Data Parallel) model wrapper for distributed training.

DDPOptimizer<T, TInput, TOutput>

Implements true DDP (Distributed Data Parallel) optimizer - industry-standard gradient averaging across ranks (see the conceptual sketch after this class list).

DistributedExtensions

Provides extension methods for easily enabling distributed training on models and optimizers.

ElasticOptimizer<T, TInput, TOutput>

Implements Elastic optimizer - supports dynamic worker addition/removal during training.

FSDPModel<T, TInput, TOutput>

Implements FSDP (Fully Sharded Data Parallel) model wrapper that shards parameters across multiple processes.

FSDPOptimizer<T, TInput, TOutput>

Implements FSDP (Fully Sharded Data Parallel) optimizer wrapper that coordinates optimization across multiple processes.

GlooCommunicationBackend<T>

Gloo-based communication backend for CPU-based collective operations.

GradientCompressionOptimizer<T, TInput, TOutput>

Implements gradient compression optimizer - reduces communication volume by compressing gradients before they are exchanged between ranks.

HybridShardedModel<T, TInput, TOutput>

Implements 3D Parallelism (Hybrid Sharded) model - combines data, tensor, and pipeline parallelism.

HybridShardedOptimizer<T, TInput, TOutput>

Implements 3D Parallelism optimizer - coordinates across data, tensor, and pipeline dimensions.

InMemoryCommunicationBackend<T>

Provides an in-memory implementation of distributed communication for testing and single-machine scenarios.

LocalSGDOptimizer<T, TInput, TOutput>

Implements Local SGD distributed training optimizer - periodically averages parameters across ranks after local optimization steps.

MPICommunicationBackend<T>

MPI.NET-based communication backend for production distributed training.

NCCLCommunicationBackend<T>

NVIDIA NCCL-based communication backend for GPU-to-GPU communication.

ParameterAnalyzer<T>

Analyzes model parameters and creates optimized groupings for distributed communication.

ParameterAnalyzer<T>.ParameterGroup

Represents a group of parameters that should be communicated together.

PipelineParallelModel<T, TInput, TOutput>

Implements Pipeline Parallel model wrapper - splits model into stages across ranks.

PipelineParallelOptimizer<T, TInput, TOutput>

Implements Pipeline Parallel optimizer - coordinates optimization across pipeline stages.

ShardedModelBase<T, TInput, TOutput>

Provides base implementation for distributed models with parameter sharding.

ShardedOptimizerBase<T, TInput, TOutput>

Provides base implementation for distributed optimizers with parameter sharding.

ShardingConfiguration<T>

Default implementation of sharding configuration for distributed training.

TensorParallelModel<T, TInput, TOutput>

Implements Tensor Parallel model wrapper - splits individual layers across ranks (Megatron-LM style).

TensorParallelOptimizer<T, TInput, TOutput>

Implements Tensor Parallel optimizer - coordinates updates for tensor-parallel layers.

ZeRO1Model<T, TInput, TOutput>

Implements ZeRO Stage 1 model wrapper - shards optimizer states only.

ZeRO1Optimizer<T, TInput, TOutput>

Implements ZeRO Stage 1 optimizer - shards optimizer states only.

ZeRO2Model<T, TInput, TOutput>

Implements ZeRO Stage 2 model wrapper - shards optimizer states and gradients.

ZeRO2Optimizer<T, TInput, TOutput>

Implements ZeRO Stage 2 optimizer - shards gradients and optimizer states across ranks.

ZeRO3Model<T, TInput, TOutput>

Implements ZeRO Stage 3 model wrapper - full sharding of parameters, gradients, and optimizer states.

ZeRO3Optimizer<T, TInput, TOutput>

Implements ZeRO Stage 3 optimizer - full sharding equivalent to FSDP.
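
Several of the classes above (DDPOptimizer and the ZeRO model/optimizer wrappers) center on one step: averaging gradients across all ranks after each backward pass so that every replica applies the same update. The following is a minimal conceptual sketch of that all-reduce-average step with ranks simulated in memory; it does not use the AiDotNet API, and every name in it is illustrative.

```csharp
// Conceptual sketch only - not AiDotNet API. Simulates DDP-style gradient
// averaging: each "rank" holds a locally computed gradient, and an
// all-reduce-average leaves every rank with the same averaged gradient.
using System;

class DdpAveragingSketch
{
    static void Main()
    {
        // Local gradients computed independently on two simulated ranks.
        double[][] localGradients =
        {
            new[] { 0.5, -1.0, 0.25 },
            new[] { 1.5,  1.0, 0.75 },
        };

        int worldSize = localGradients.Length;
        int length = localGradients[0].Length;

        // All-reduce (sum) followed by division by the world size.
        var averaged = new double[length];
        foreach (var gradient in localGradients)
            for (int i = 0; i < length; i++)
                averaged[i] += gradient[i];
        for (int i = 0; i < length; i++)
            averaged[i] /= worldSize;

        // Every rank now applies the same update, so replicated
        // parameters stay identical across ranks.
        Console.WriteLine(string.Join(", ", averaged)); // 1, 0, 0.5
    }
}
```

ZeRO stages 1-3 and FSDP refine this picture by additionally sharding optimizer states, gradients, and eventually parameters across ranks instead of replicating them on every rank.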

Interfaces

ICommunicationBackend<T>

Defines the contract for distributed communication backends.

IShardedModel<T, TInput, TOutput>

Defines the contract for models that support distributed training with parameter sharding.

IShardedOptimizer<T, TInput, TOutput>

Defines the contract for optimizers that support distributed training with parameter sharding.

IShardingConfiguration<T>

Defines the contract for configuring parameter sharding in distributed training.
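
Parameter sharding, which IShardingConfiguration<T> configures and the sharded model/optimizer types above consume, amounts to assigning each rank ownership of a slice of the flattened parameter vector. Below is a minimal sketch of one common scheme (equal contiguous shards with a ceiling-divided shard size); it is an assumption for illustration, not the library's sharding logic or API.

```csharp
// Illustrative sketch only - not AiDotNet API. Splits a flattened parameter
// vector into contiguous, near-equal shards, one per rank, which is the kind
// of assignment a sharding configuration describes.
using System;

class ShardingSketch
{
    // Returns the slice of the parameter vector owned by `rank`.
    static (int Offset, int Count) ShardFor(int totalParameters, int rank, int worldSize)
    {
        int shardSize = (totalParameters + worldSize - 1) / worldSize; // ceiling division
        int offset = Math.Min(rank * shardSize, totalParameters);
        int count = Math.Min(shardSize, totalParameters - offset);
        return (offset, count);
    }

    static void Main()
    {
        const int totalParameters = 10;
        const int worldSize = 4;

        for (int rank = 0; rank < worldSize; rank++)
        {
            var (offset, count) = ShardFor(totalParameters, rank, worldSize);
            Console.WriteLine($"rank {rank}: parameters [{offset}, {offset + count})");
        }
        // rank 0: [0, 3)  rank 1: [3, 6)  rank 2: [6, 9)  rank 3: [9, 10)
    }
}
```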

Enums

ReductionOperation

Defines the supported reduction operations for collective communication.
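
This table of contents does not list the enum's members; Sum, Average, Max, and Min in the sketch below are typical reduction operations and are assumed stand-ins for whatever ReductionOperation actually defines. The sketch shows what applying such an operation element-wise across per-rank buffers (as in an all-reduce) produces; it is illustrative only and does not use the AiDotNet API.

```csharp
// Illustrative sketch only - not AiDotNet API. Applies an element-wise
// reduction across the buffers held by each rank, as a collective operation
// such as all-reduce would. The enum members here are assumed examples.
using System;
using System.Linq;

enum Reduction { Sum, Average, Max, Min }

class ReductionSketch
{
    static double[] Reduce(double[][] perRankBuffers, Reduction op)
    {
        int length = perRankBuffers[0].Length;
        var result = new double[length];
        for (int i = 0; i < length; i++)
        {
            // Gather element i from every rank's buffer, then reduce.
            var column = perRankBuffers.Select(buffer => buffer[i]);
            result[i] = op switch
            {
                Reduction.Sum => column.Sum(),
                Reduction.Average => column.Average(),
                Reduction.Max => column.Max(),
                Reduction.Min => column.Min(),
                _ => throw new ArgumentOutOfRangeException(nameof(op)),
            };
        }
        return result;
    }

    static void Main()
    {
        double[][] perRank = { new[] { 1.0, 4.0 }, new[] { 3.0, 2.0 } };
        Console.WriteLine(string.Join(", ", Reduce(perRank, Reduction.Sum)));     // 4, 6
        Console.WriteLine(string.Join(", ", Reduce(perRank, Reduction.Average))); // 2, 3
        Console.WriteLine(string.Join(", ", Reduce(perRank, Reduction.Max)));     // 3, 4
    }
}
```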