
Class AsyncSGDOptimizer<T, TInput, TOutput>

Namespace
AiDotNet.DistributedTraining
Assembly
AiDotNet.dll

Implements an asynchronous SGD optimizer - allows asynchronous parameter updates without strict synchronization barriers.

public class AsyncSGDOptimizer<T, TInput, TOutput> : ShardedOptimizerBase<T, TInput, TOutput>, IShardedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer

Type Parameters

T

The numeric type

TInput

The input type for the model

TOutput

The output type for the model

Inheritance
ShardedOptimizerBase<T, TInput, TOutput>
AsyncSGDOptimizer<T, TInput, TOutput>
Implements
IShardedOptimizer<T, TInput, TOutput>
IOptimizer<T, TInput, TOutput>
IModelSerializer

Remarks

Strategy Overview: Asynchronous SGD (and variants like Hogwild!) removes synchronization barriers between workers. Each process updates parameters independently without waiting for others, using a parameter server or shared memory. This eliminates idle time from synchronization but introduces stale gradients - workers may compute gradients on slightly outdated parameters.

When done correctly (sparse gradients, low contention), async SGD can achieve near-linear speedup without much accuracy loss. However, it's more sensitive to hyperparameters and can be unstable for dense updates.

For Beginners: Async SGD is like a team working independently without meetings. Each person:

  1. Reads the current parameters
  2. Computes gradients on their data
  3. Updates the parameters immediately (no waiting!)

  • Pro: No time wasted waiting for slow workers
  • Con: Updates might conflict or use slightly stale information

This works well when updates are sparse (touching different parameters) but can be unstable when all workers update the same parameters frequently.
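As a rough sketch of that per-worker pattern (the parameterServer object, ComputeGradients helper, and localBatch variable below are purely illustrative and are not part of this framework):

// Conceptual async worker loop - illustrative only, not an AiDotNet API.
while (!converged)
{
    var parameters = parameterServer.ReadParameters();        // 1. Read the current (possibly stale) parameters
    var gradients = ComputeGradients(parameters, localBatch); // 2. Compute gradients on this worker's data
    parameterServer.ApplyUpdate(gradients);                   // 3. Push the update immediately - no barrier
}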

Use Cases:

  • Sparse models (embeddings, recommendation systems)
  • Scenarios with stragglers (some workers slower than others)
  • When synchronization overhead is very high
  • Research and experimentation

Trade-offs:

  • Memory: Requires a parameter server or shared memory
  • Communication: Asynchronous - total volume can be higher
  • Complexity: High - requires parameter server infrastructure
  • Convergence: Can be slower or less stable than synchronous SGD
  • Best for: Sparse updates, heterogeneous workers, straggler tolerance
  • Limitation: Harder to tune; may require staleness-aware algorithms

Implementation Note: This framework provides async SGD infrastructure. Full production implementation requires parameter server setup or shared memory coordination. This implementation demonstrates the async update pattern.

Example:

var optimizer = new AdamOptimizer<double, Tensor<double>, Tensor<double>>(model, options);
var backend = new InMemoryCommunicationBackend<double>(rank: 0, worldSize: 4);
var config = new ShardingConfiguration<double>(backend);

var asyncOptimizer = new AsyncSGDOptimizer<double, Tensor<double>, Tensor<double>>(
    optimizer,
    config,
    allowStaleness: 2); // Allow up to 2 stale gradient steps
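Continuing the example, a single optimization pass can then be run on this worker's data (the inputData variable here is hypothetical and stands for whatever training data and settings you have prepared):

// Run the optimization on this rank's local data; each process does this independently.
var result = asyncOptimizer.Optimize(inputData);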

Constructors

AsyncSGDOptimizer(IOptimizer<T, TInput, TOutput>, IShardingConfiguration<T>, int)

Creates an async SGD optimizer.

public AsyncSGDOptimizer(IOptimizer<T, TInput, TOutput> wrappedOptimizer, IShardingConfiguration<T> config, int allowStaleness = 0)

Parameters

wrappedOptimizer IOptimizer<T, TInput, TOutput>

The optimizer to wrap with async capabilities

config IShardingConfiguration<T>

Configuration for sharding and communication

allowStaleness int

Maximum allowed staleness in gradient steps (default: 0 = fully synchronous)

Methods

Deserialize(byte[])

Loads a previously serialized model from binary data.

public override void Deserialize(byte[] data)

Parameters

data byte[]

The byte array containing the serialized model data.

Remarks

This method takes binary data created by the Serialize method and uses it to restore a model to its previous state.

For Beginners: This is like opening a saved file to continue your work.

When you call this method:

  • You provide the binary data (bytes) that was previously created by Serialize
  • The model rebuilds itself using this data
  • After deserializing, the model is exactly as it was when serialized
  • It's ready to make predictions without needing to be trained again

For example:

  • You download a pre-trained model file for detecting spam emails
  • You deserialize this file into your application
  • Immediately, your application can detect spam without any training
  • The model has all the knowledge that was built into it by its original creator

This is particularly useful when:

  • You want to use a model that took days to train
  • You need to deploy the same model across multiple devices
  • You're creating an application that non-technical users will use

Think of it like installing the brain of a trained expert directly into your application.
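A minimal sketch of restoring a previously saved state, using the asyncOptimizer from the class-level example (the file path is hypothetical, and the bytes must have been produced by Serialize on a compatible optimizer):

// Read previously serialized bytes from disk (System.IO) and restore the optimizer's state.
byte[] data = File.ReadAllBytes("async-sgd-optimizer.bin");
asyncOptimizer.Deserialize(data);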

Optimize(OptimizationInputData<T, TInput, TOutput>)

Performs the optimization process to find the best parameters for a model.

public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)

Parameters

inputData OptimizationInputData<T, TInput, TOutput>

The data needed for optimization, including the objective function, initial parameters, and any constraints.

Returns

OptimizationResult<T, TInput, TOutput>

The result of the optimization process, including the optimized parameters and performance metrics.

Remarks

This method takes input data and attempts to find the optimal parameters that minimize or maximize the objective function.

For Beginners: This is where the actual "learning" happens. The optimizer looks at your data and tries different parameter values to find the ones that make your model perform best.

The process typically involves:

  1. Evaluating how well the current parameters perform
  2. Calculating how to change the parameters to improve performance
  3. Updating the parameters
  4. Repeating until the model performs well enough or reaches a maximum number of attempts
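A minimal sketch of that process from the caller's side (the inputData variable is hypothetical and stands for the prepared optimization input; the iterations above happen inside the call):

// Perform the full optimization process and receive the optimized parameters and metrics.
var result = asyncOptimizer.Optimize(inputData);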

Serialize()

Converts the current state of a machine learning model into a binary format.

public override byte[] Serialize()

Returns

byte[]

A byte array containing the serialized model data.

Remarks

This method captures all the essential information about a trained model and converts it into a sequence of bytes that can be stored or transmitted.

For Beginners: This is like exporting your work to a file.

When you call this method:

  • The model's current state (all its learned patterns and parameters) is captured
  • This information is converted into a compact binary format (bytes)
  • You can then save these bytes to a file, database, or send them over a network

For example:

  • After training a model to recognize cats vs. dogs in images
  • You can serialize the model to save all its learned knowledge
  • Later, you can use this saved data to recreate the model exactly as it was
  • The recreated model will make the same predictions as the original

Think of it like taking a snapshot of your model's brain at a specific moment in time.
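A minimal sketch of saving the optimizer's state to disk, again using the asyncOptimizer from the class-level example (the file path is hypothetical):

// Capture the optimizer's current state and write the bytes to a file (System.IO).
byte[] data = asyncOptimizer.Serialize();
File.WriteAllBytes("async-sgd-optimizer.bin", data);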

ShouldSync(int)

Checks if a barrier should be used (for periodic synchronization).

public bool ShouldSync(int iteration)

Parameters

iteration int

Current iteration number

Returns

bool

True if the process should synchronize at this iteration
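One hypothetical way to use this check inside a user-managed training loop, pairing it with SynchronizeOptimizerState (the loop structure and iteration count are illustrative; whether you drive iterations yourself depends on how your training is set up):

// Illustrative periodic-synchronization check inside a user-managed loop.
for (int iteration = 0; iteration < 1000; iteration++)
{
    // ... perform this worker's asynchronous update for the iteration ...

    if (asyncOptimizer.ShouldSync(iteration))
    {
        // Re-align optimizer state (e.g., momentum buffers) across all processes.
        asyncOptimizer.SynchronizeOptimizerState();
    }
}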

SynchronizeOptimizerState()

Synchronizes optimizer state (like momentum buffers) across all processes.

public override void SynchronizeOptimizerState()

Remarks

For Beginners: Some optimizers (like Adam) keep track of past gradients to make smarter updates. This method makes sure all processes have the same optimizer state, so they stay coordinated. It's like making sure all team members are reading from the same playbook.