Class ZeRO3Model<T, TInput, TOutput>

Namespace: AiDotNet.DistributedTraining

Assembly: AiDotNet.dll

Implements a ZeRO Stage 3 model wrapper: full sharding of parameters, gradients, and optimizer states.

public class ZeRO3Model<T, TInput, TOutput> : FSDPModel<T, TInput, TOutput>, IShardedModel<T, TInput, TOutput>, IFullModel<T, TInput, TOutput>, IModel<TInput, TOutput, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, TInput, TOutput>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, TInput, TOutput>>, IGradientComputable<T, TInput, TOutput>, IJitCompilable<T>

Type Parameters

T: The numeric type of the model's parameters (for example, double)
TInput: The input type for the model
TOutput: The output type for the model

Inheritance: object

ShardedModelBase<T, TInput, TOutput>

FSDPModel<T, TInput, TOutput>

ZeRO3Model<T, TInput, TOutput>

Implements: IShardedModel<T, TInput, TOutput>

IFullModel<T, TInput, TOutput>

IModel<TInput, TOutput, ModelMetadata<T>>

IModelSerializer

ICheckpointableModel

IParameterizable<T, TInput, TOutput>

IFeatureAware

IFeatureImportance<T>

ICloneable<IFullModel<T, TInput, TOutput>>

IGradientComputable<T, TInput, TOutput>

IJitCompilable<T>

Inherited Members: FSDPModel<T, TInput, TOutput>.Train(TInput, TOutput)

FSDPModel<T, TInput, TOutput>.Predict(TInput)

FSDPModel<T, TInput, TOutput>.SynchronizeGradients()

FSDPModel<T, TInput, TOutput>.GetModelMetadata()

FSDPModel<T, TInput, TOutput>.WithParameters(Vector<T>)

FSDPModel<T, TInput, TOutput>.Serialize()

FSDPModel<T, TInput, TOutput>.Deserialize(byte[])

FSDPModel<T, TInput, TOutput>.SaveModel(string)

FSDPModel<T, TInput, TOutput>.LoadModel(string)

FSDPModel<T, TInput, TOutput>.Clone()

FSDPModel<T, TInput, TOutput>.GetFeatureImportance()

FSDPModel<T, TInput, TOutput>.DeepCopy()

FSDPModel<T, TInput, TOutput>.GetActiveFeatureIndices()

FSDPModel<T, TInput, TOutput>.SetActiveFeatureIndices(IEnumerable<int>)

FSDPModel<T, TInput, TOutput>.IsFeatureUsed(int)

ShardedModelBase<T, TInput, TOutput>.NumOps

ShardedModelBase<T, TInput, TOutput>.Config

ShardedModelBase<T, TInput, TOutput>.LocalShard

ShardedModelBase<T, TInput, TOutput>.CachedFullParameters

ShardedModelBase<T, TInput, TOutput>.ShardStartIndex

ShardedModelBase<T, TInput, TOutput>.ShardSize

ShardedModelBase<T, TInput, TOutput>.WrappedModel

ShardedModelBase<T, TInput, TOutput>.WrappedModelInternal

ShardedModelBase<T, TInput, TOutput>.Rank

ShardedModelBase<T, TInput, TOutput>.WorldSize

ShardedModelBase<T, TInput, TOutput>.LocalParameterShard

ShardedModelBase<T, TInput, TOutput>.ShardingConfiguration

ShardedModelBase<T, TInput, TOutput>.ParameterCount

ShardedModelBase<T, TInput, TOutput>.OnBeforeInitializeSharding()

ShardedModelBase<T, TInput, TOutput>.InitializeSharding()

ShardedModelBase<T, TInput, TOutput>.GatherFullParameters()

ShardedModelBase<T, TInput, TOutput>.SynchronizeGradients()

ShardedModelBase<T, TInput, TOutput>.InvalidateCache()

ShardedModelBase<T, TInput, TOutput>.UpdateLocalShardFromFull(Vector<T>)

ShardedModelBase<T, TInput, TOutput>.Train(TInput, TOutput)

ShardedModelBase<T, TInput, TOutput>.Predict(TInput)

ShardedModelBase<T, TInput, TOutput>.GetModelMetadata()

ShardedModelBase<T, TInput, TOutput>.GetParameters()

ShardedModelBase<T, TInput, TOutput>.SetParameters(Vector<T>)

ShardedModelBase<T, TInput, TOutput>.WithParameters(Vector<T>)

ShardedModelBase<T, TInput, TOutput>.Serialize()

ShardedModelBase<T, TInput, TOutput>.Deserialize(byte[])

ShardedModelBase<T, TInput, TOutput>.SaveModel(string)

ShardedModelBase<T, TInput, TOutput>.LoadModel(string)

ShardedModelBase<T, TInput, TOutput>.Clone()

ShardedModelBase<T, TInput, TOutput>.DeepCopy()

ShardedModelBase<T, TInput, TOutput>.GetFeatureImportance()

ShardedModelBase<T, TInput, TOutput>.GetActiveFeatureIndices()

ShardedModelBase<T, TInput, TOutput>.SetActiveFeatureIndices(IEnumerable<int>)

ShardedModelBase<T, TInput, TOutput>.IsFeatureUsed(int)

ShardedModelBase<T, TInput, TOutput>.DefaultLossFunction

ShardedModelBase<T, TInput, TOutput>.ComputeGradients(TInput, TOutput, ILossFunction<T>)

ShardedModelBase<T, TInput, TOutput>.ApplyGradients(Vector<T>, T)

ShardedModelBase<T, TInput, TOutput>.SupportsJitCompilation

ShardedModelBase<T, TInput, TOutput>.ExportComputationGraph(List<ComputationNode<T>>)

ShardedModelBase<T, TInput, TOutput>.SaveState(Stream)

ShardedModelBase<T, TInput, TOutput>.LoadState(Stream)

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Extension Methods: DistributedExtensions.AsDistributedForHighBandwidth<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributedForLowBandwidth<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributed<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributed<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, IShardingConfiguration<T>)

Remarks

Strategy Overview: ZeRO Stage 3 is the full implementation of the ZeRO optimization: it shards parameters, gradients, and optimizer states across all processes. This is equivalent to PyTorch's FSDP (Fully Sharded Data Parallel). Parameters are gathered just in time for each forward and backward pass and released immediately afterwards, which keeps per-process memory use as low as possible.

For Beginners: ZeRO-3 is the same strategy as FSDP, and it saves the most memory of the ZeRO stages. Everything is sharded: parameters, gradients, and optimizer states. Each process holds only a small piece of the model; the pieces are gathered only when they are needed and are released immediately afterwards.

This class is essentially an alias/wrapper for FSDPModel to maintain ZeRO naming consistency.
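The sharding and just-in-time gathering described above can be sketched in a single process. The following is a minimal illustration only: `ShardingSketch`, `GetShardRange`, and `AllGather` are hypothetical names that are not part of AiDotNet, and the even split with the remainder given to the lowest ranks is an assumption, not necessarily how `ShardStartIndex` and `ShardSize` are computed by the library.

```csharp
using System;
using System.Linq;

// Hypothetical helper, NOT part of AiDotNet: shows one way to split a flat
// parameter vector into contiguous shards, one per rank.
public static class ShardingSketch
{
    // Returns the start index and size of the shard owned by `rank`.
    // Assumption: any remainder is spread over the lowest-numbered ranks.
    public static (int Start, int Size) GetShardRange(int parameterCount, int worldSize, int rank)
    {
        if (parameterCount < 0) throw new ArgumentOutOfRangeException(nameof(parameterCount));
        if (worldSize <= 0) throw new ArgumentOutOfRangeException(nameof(worldSize));
        if (rank < 0 || rank >= worldSize) throw new ArgumentOutOfRangeException(nameof(rank));

        int baseSize = parameterCount / worldSize;
        int remainder = parameterCount % worldSize;
        int size = baseSize + (rank < remainder ? 1 : 0);
        int start = rank * baseSize + Math.Min(rank, remainder);
        return (start, size);
    }

    // Simulates an AllGather in one process: concatenates the shards in rank order.
    public static double[] AllGather(double[][] shards) =>
        shards.SelectMany(shard => shard).ToArray();

    public static void Main()
    {
        double[] full = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        const int worldSize = 4;

        // Each simulated rank keeps only its own slice of the parameters.
        double[][] shards = Enumerable.Range(0, worldSize)
            .Select(rank =>
            {
                var (start, size) = GetShardRange(full.Length, worldSize, rank);
                return full.Skip(start).Take(size).ToArray();
            })
            .ToArray();

        // Before a forward or backward pass the full vector is rebuilt,
        // used, and then discarded again.
        double[] gathered = AllGather(shards);

        Console.WriteLine(string.Join(", ", shards.Select(s => s.Length))); // prints "3, 3, 2, 2"
        Console.WriteLine(gathered.SequenceEqual(full));                    // prints "True"
    }
}
```

In a real distributed run each rank would hold only one of these shards, and the gather would go through the communication backend rather than an in-memory concatenation.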

Use Cases:
- Same as FSDP: training very large models
- When you prefer ZeRO terminology over FSDP
- Maximum memory efficiency

Trade-offs:
- Same as FSDP
- Memory: Excellent, everything is sharded
- Communication: Higher, an AllGather for each forward/backward pass
- Complexity: Moderate
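To make the memory side of this trade-off concrete, here is a rough back-of-the-envelope sketch. It is not an AiDotNet API: `ZeroMemorySketch` and its methods are hypothetical, and the model is deliberately simplified. It assumes one gradient value and a fixed number of optimizer state values per parameter (two, as for an Adam-style optimizer), and it ignores activations and the temporary buffer that holds gathered parameters.

```csharp
using System;

// Hypothetical estimator, NOT part of AiDotNet: compares how many values each
// process stores with and without full (Stage 3) sharding.
public static class ZeroMemorySketch
{
    // Without sharding, every process stores all parameters, all gradients,
    // and all optimizer state values.
    public static long ReplicatedElementsPerProcess(long parameterCount, int optimizerStatesPerParameter) =>
        parameterCount * (2L + optimizerStatesPerParameter);

    // With full sharding, every process stores roughly 1/worldSize of each.
    public static long ShardedElementsPerProcess(long parameterCount, int optimizerStatesPerParameter, int worldSize)
    {
        if (worldSize <= 0) throw new ArgumentOutOfRangeException(nameof(worldSize));

        // Ceiling division: the largest shard any single process has to hold.
        long shardSize = (parameterCount + worldSize - 1) / worldSize;
        return shardSize * (2L + optimizerStatesPerParameter);
    }

    public static void Main()
    {
        const long parameterCount = 1_000_000_000; // illustrative model size
        const int optimizerStates = 2;             // assumption: Adam-style moments
        const int worldSize = 4;
        const int bytesPerElement = sizeof(double);

        long replicated = ReplicatedElementsPerProcess(parameterCount, optimizerStates);
        long sharded = ShardedElementsPerProcess(parameterCount, optimizerStates, worldSize);

        Console.WriteLine($"Replicated: {replicated * bytesPerElement / 1e9} GB per process");
        Console.WriteLine($"Sharded:    {sharded * bytesPerElement / 1e9} GB per process");
    }
}
```

Under these assumptions, a one-billion-parameter model stored as double drops from about 32 GB to about 8 GB of persistent state per process with four processes. The price is the extra AllGather traffic noted in the trade-offs.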

Example:

var model = new NeuralNetworkModel<double>(...);
var backend = new InMemoryCommunicationBackend<double>(rank: 0, worldSize: 4);
var config = new ShardingConfiguration<double>(backend);
// ZeRO-3 and FSDP are equivalent
var zero3Model = new ZeRO3Model<double, Tensor<double>, Tensor<double>>(model, config);
// Or equivalently:
// var fsdpModel = new FSDPModel<double, Tensor<double>, Tensor<double>>(model, config);

Constructors

ZeRO3Model(IFullModel<T, TInput, TOutput>, IShardingConfiguration<T>)

Creates a new ZeRO-3 model wrapping an existing model.

public ZeRO3Model(IFullModel<T, TInput, TOutput> wrappedModel, IShardingConfiguration<T> config)

Parameters

wrappedModel IFullModel<T, TInput, TOutput>: The model to wrap with ZeRO-3 capabilities
config IShardingConfiguration<T>: Configuration for sharding and communication

Remarks

For Beginners: ZeRO-3 is the same as FSDP, just different terminology. Use whichever name you prefer. This constructor delegates to FSDPModel for all functionality.

Exceptions

ArgumentNullException: Thrown if wrappedModel or config is null

Methods

Clone()

Creates a shallow copy of this object.

public override IFullModel<T, TInput, TOutput> Clone()

Returns

IFullModel<T, TInput, TOutput>

GetModelMetadata()

Retrieves metadata and performance metrics about the trained model.

public override ModelMetadata<T> GetModelMetadata()

Returns

ModelMetadata<T>: An object containing metadata and performance metrics about the trained model.

Remarks

This method provides information about the model's structure, parameters, and performance metrics.

For Beginners: Model metadata is like a report card for your machine learning model.

Just as a report card shows how well a student is performing in different subjects, model metadata shows how well your model is performing and provides details about its structure.

This information typically includes:

Accuracy measures: How well do the model's predictions match actual values?
Error metrics: How far off are the model's predictions on average?
Model parameters: What patterns did the model learn from the data?
Training information: How long did training take? How many iterations were needed?

For example, in a house price prediction model, metadata might include:

Average prediction error (e.g., off by $15,000 on average)
How strongly each feature (bedrooms, location) influences the prediction
How well the model fits the training data

This information helps you understand your model's strengths and weaknesses, and decide if it's ready to use or needs more training.

WithParameters(Vector<T>)

Creates a new instance with the specified parameters.

public override IFullModel<T, TInput, TOutput> WithParameters(Vector<T> parameters)

Parameters

parameters Vector<T>: The parameter vector to use for the new instance

Returns

IFullModel<T, TInput, TOutput>: A new model instance with the specified parameters
