Class StochasticGradientDescentOptimizer<T, TInput, TOutput>

Namespace
AiDotNet.Optimizers
Assembly
AiDotNet.dll

Represents a Stochastic Gradient Descent (SGD) optimizer for machine learning models.

public class StochasticGradientDescentOptimizer<T, TInput, TOutput> : GradientBasedOptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer

Type Parameters

T

The numeric type used for calculations, typically float or double.

TInput

The type of the input data (for example, a feature matrix).

TOutput

The type of the output data (for example, a target vector).

Inheritance
OptimizerBase<T, TInput, TOutput>
GradientBasedOptimizerBase<T, TInput, TOutput>
StochasticGradientDescentOptimizer<T, TInput, TOutput>
Implements
IGradientBasedOptimizer<T, TInput, TOutput>
IOptimizer<T, TInput, TOutput>
IModelSerializer

Remarks

The StochasticGradientDescentOptimizer is a gradient-based optimization algorithm that iteratively adjusts model parameters to minimize the loss function. It uses a stochastic approach, updating parameters based on a subset of the training data in each iteration.

For Beginners: Think of this optimizer as a hiker trying to find the lowest point in a hilly landscape:

  • The hiker (optimizer) takes steps downhill to find the lowest point (best model parameters)
  • Instead of looking at the entire landscape at once, the hiker looks at small patches (subsets of data)
  • The hiker adjusts their step size (learning rate) as they go
  • This approach helps the hiker find a good low point quickly, even in a complex landscape

This method is efficient for large datasets and can often find good solutions quickly.
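
A minimal usage sketch follows. It assumes Matrix<double> and Vector<double> for TInput and TOutput, an already-constructed model, and a BestSolution property on the result; these specifics are not confirmed by this page, so treat them as placeholders.

using AiDotNet.Optimizers;

// `model` is an existing IFullModel<double, Matrix<double>, Vector<double>>;
// `inputData` is an OptimizationInputData built from your training set.
var optimizer = new StochasticGradientDescentOptimizer<double, Matrix<double>, Vector<double>>(model);

// Run SGD and keep the best solution found during the search.
var result = optimizer.Optimize(inputData);
var best = result.BestSolution; // hypothetical property name (check OptimizationResult)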

Constructors

StochasticGradientDescentOptimizer(IFullModel<T, TInput, TOutput>, StochasticGradientDescentOptimizerOptions<T, TInput, TOutput>?, IEngine?)

Initializes a new instance of the StochasticGradientDescentOptimizer class.

public StochasticGradientDescentOptimizer(IFullModel<T, TInput, TOutput> model, StochasticGradientDescentOptimizerOptions<T, TInput, TOutput>? options = null, IEngine? engine = null)

Parameters

model IFullModel<T, TInput, TOutput>

The model whose parameters will be optimized.

options StochasticGradientDescentOptimizerOptions<T, TInput, TOutput>

Options specific to the SGD optimizer. If null, default options are used.

engine IEngine

An optional engine used to perform the calculations.

Remarks

This constructor sets up the SGD optimizer with the specified model, options, and engine. If no options are provided, default options are used.

For Beginners: This is like setting up your hiker with their gear before the hike:

  • You can give the hiker special instructions (options) for how to search
  • You can provide a compute engine to handle the calculations
  • If you don't provide instructions, the hiker will use a standard set

This setup ensures the optimizer is ready to start finding the best solution.
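
A construction sketch; the option property names are not listed on this page, so they are left as a comment inside the initializer:

using AiDotNet.Optimizers;

// Property names for the initializer are not documented here;
// see StochasticGradientDescentOptimizerOptions for the actual settings.
var options = new StochasticGradientDescentOptimizerOptions<double, Matrix<double>, Vector<double>>
{
    // e.g. learning-rate and iteration settings go here
};

// Omitting the engine argument uses the default (engine is optional).
var optimizer = new StochasticGradientDescentOptimizer<double, Matrix<double>, Vector<double>>(model, options);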

Methods

Deserialize(byte[])

Deserializes a byte array to restore the state of the StochasticGradientDescentOptimizer.

public override void Deserialize(byte[] data)

Parameters

data byte[]

The byte array containing the serialized optimizer state.

Remarks

This method restores the state of the optimizer from a byte array, including its base class data and SGD-specific options. It uses a BinaryReader to read the serialized data and reconstruct the optimizer's state.

For Beginners: This is like unpacking the hiker's backpack after a journey:

  • It reads the saved snapshot of the hiker's journey
  • It restores both general hiking info and SGD-specific details
  • If there's a problem reading the SGD-specific details, it reports an error

This allows you to continue from a previously saved state of the optimizer.

Exceptions

InvalidOperationException

Thrown when deserialization of optimizer options fails.
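
A restore sketch, assuming the bytes were produced earlier by Serialize() and written to a file (the path is only an example):

using System;
using System.IO;

byte[] savedState = File.ReadAllBytes("sgd-optimizer.bin");

try
{
    // Restore both the base optimizer state and the SGD-specific options.
    optimizer.Deserialize(savedState);
}
catch (InvalidOperationException ex)
{
    // Raised when the SGD-specific options cannot be read back.
    Console.WriteLine($"Failed to restore optimizer state: {ex.Message}");
}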

GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)

Generates a unique cache key for gradient calculations.

protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)

Parameters

model IFullModel<T, TInput, TOutput>

The model for which the gradient is being calculated.

X TInput

The input data matrix.

y TOutput

The target vector.

Returns

string

A string representing the unique cache key.

Remarks

This method creates a unique identifier for caching gradient calculations. It combines the base cache key with SGD-specific parameters to ensure that cached gradients are only reused when all relevant parameters are identical.

For Beginners: This is like creating a unique label for each calculation the hiker does:

  • It starts with a basic label (baseKey) that describes the general calculation
  • It adds SGD-specific information like the current step size (learning rate) and how many steps the hiker is allowed to take (max iterations)
  • This unique label helps the hiker remember and quickly recall previous calculations instead of redoing them unnecessarily

This improves efficiency by avoiding redundant calculations.
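
The exact key format is an implementation detail of the library; the sketch below only illustrates the idea of combining a base key with the SGD settings that influence the gradient:

// Illustration only; this is not the library's actual key format.
string ComposeCacheKey(string baseKey, double learningRate, int maxIterations)
{
    // Two gradient computations share a cache entry only when the data,
    // the model, and these SGD settings all match.
    return $"{baseKey}_SGD_lr:{learningRate}_iter:{maxIterations}";
}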

GetOptions()

Gets the current options for this optimizer.

public override OptimizationAlgorithmOptions<T, TInput, TOutput> GetOptions()

Returns

OptimizationAlgorithmOptions<T, TInput, TOutput>

The current StochasticGradientDescentOptimizerOptions.

Remarks

This method returns the current configuration options of the SGD optimizer.

For Beginners: This is like asking the hiker what their current instructions are:

  • You can see how the hiker is currently set up to search
  • This includes things like how big their steps are, how many steps they're allowed to take, etc.

This is useful for understanding or checking the current setup of the optimizer.
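
Because the declared return type is the base options class, cast back to the SGD-specific type to reach SGD-only settings; a sketch assuming Matrix<double> and Vector<double> type arguments:

var current = optimizer.GetOptions();
if (current is StochasticGradientDescentOptimizerOptions<double, Matrix<double>, Vector<double>> sgdOptions)
{
    // Inspect the SGD-specific configuration here.
}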

Optimize(OptimizationInputData<T, TInput, TOutput>)

Performs the optimization process to find the best solution for the given input data.

public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)

Parameters

inputData OptimizationInputData<T, TInput, TOutput>

The input data to optimize against.

Returns

OptimizationResult<T, TInput, TOutput>

An optimization result containing the best solution found and associated metrics.

Remarks

This method implements the main SGD algorithm. It iteratively updates the model parameters based on the calculated gradient, applying momentum and adaptive learning rates if configured. The process continues until either the maximum number of iterations is reached or early stopping criteria are met.

For Beginners: This is the main journey of our hiker:

  1. Start at a random point on the hill (initialize random solution)
  2. For each epoch (pass through the data):
    • Process data in batches (the default BatchSize of 1 gives true stochastic updates)
    • For each batch:
      • Look around to decide which way is downhill (calculate gradient)
      • Apply momentum if configured
      • Take a step in that direction (update solution)
    • Check if this is the lowest point found so far (evaluate and update best solution)
    • Adjust step size if needed (update adaptive parameters)
    • Decide whether to stop early if no progress is being made
  3. Return the lowest point found during the entire journey

This process helps find a good solution efficiently, even in complex landscapes.

DataLoader Integration: This optimizer uses the DataLoader batching infrastructure, which supports:

  • Custom samplers (weighted, stratified, curriculum, importance, active learning)
  • Reproducible shuffling via RandomSeed
  • An option to drop incomplete final batches
  • True stochastic behavior with BatchSize=1 (the default)

Set these options via GradientBasedOptimizerOptions.DataSampler, ShuffleData, DropLastBatch, and RandomSeed.
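
A configuration sketch using the option names from the note above; they are defined on GradientBasedOptimizerOptions and assumed to be inherited by the SGD options type:

var options = new StochasticGradientDescentOptimizerOptions<double, Matrix<double>, Vector<double>>
{
    BatchSize = 1,        // the default: true stochastic updates
    ShuffleData = true,   // shuffling is reproducible when RandomSeed is set
    RandomSeed = 42,
    DropLastBatch = false // keep the final, possibly smaller, batch
};

var optimizer = new StochasticGradientDescentOptimizer<double, Matrix<double>, Vector<double>>(model, options);
var result = optimizer.Optimize(inputData);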

Serialize()

Serializes the current state of the StochasticGradientDescentOptimizer to a byte array.

public override byte[] Serialize()

Returns

byte[]

A byte array representing the serialized state of the optimizer.

Remarks

This method saves the current state of the optimizer, including its base class data and SGD-specific options, into a byte array.

For Beginners: This is like taking a snapshot of the hiker's journey:

  • It saves all the current settings and progress
  • This saved data can be used later to continue from where you left off
  • It includes both general hiking info and SGD-specific details

This is useful for saving progress or sharing the optimizer's current state.
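
A persistence sketch that pairs with the Deserialize example above (the file path is only an example):

using System.IO;

// Snapshot the optimizer's full state and write it to disk.
byte[] state = optimizer.Serialize();
File.WriteAllBytes("sgd-optimizer.bin", state);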

UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput>)

Updates the optimizer's options with the provided options.

protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)

Parameters

options OptimizationAlgorithmOptions<T, TInput, TOutput>

The options to apply to this optimizer.

Remarks

This method ensures that only StochasticGradientDescentOptimizerOptions can be applied to this optimizer.

For Beginners: This is like giving the hiker new instructions mid-journey:

  • You can only give instructions specific to this type of hike (SGD)
  • If you try to give the wrong type of instructions, it will cause an error

This ensures that the optimizer always has the correct type of settings.

Exceptions

ArgumentException

Thrown when the options are not of the expected type.
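
The method is protected, so it is not called directly; this sketch only illustrates the type guard described above, not the library's actual code:

protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)
{
    if (options is not StochasticGradientDescentOptimizerOptions<T, TInput, TOutput> sgdOptions)
    {
        // Only SGD-specific options are accepted.
        throw new ArgumentException("Expected StochasticGradientDescentOptimizerOptions.", nameof(options));
    }
    // ...apply sgdOptions here...
}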

UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)

Updates parameters on the GPU using vanilla SGD.

public override void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)

Parameters

parameters IGpuBuffer

The GPU buffer holding the parameter values to update in place.

gradients IGpuBuffer

The GPU buffer holding the gradients for the current step.

parameterCount int

The number of parameter values to update.

backend IDirectGpuBackend

The GPU backend that executes the update.
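
The vanilla SGD update itself is simple; below is a CPU-equivalent sketch of what this method performs on the GPU buffers, assuming the learning rate comes from the optimizer's current settings:

// Equivalent update written for plain arrays instead of GPU buffers.
void VanillaSgdUpdate(float[] parameters, float[] gradients, int parameterCount, float learningRate)
{
    for (int i = 0; i < parameterCount; i++)
    {
        parameters[i] -= learningRate * gradients[i];
    }
}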

UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)

Updates the current solution based on the calculated gradient.

protected override IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)

Parameters

currentSolution IFullModel<T, TInput, TOutput>

The current solution to update.

gradient Vector<T>

The calculated gradient.

Returns

IFullModel<T, TInput, TOutput>

A new IFullModel<T, TInput, TOutput> representing the updated solution.

Remarks

This method applies the gradient descent update rule, subtracting the gradient multiplied by the learning rate from the current solution's coefficients.

For Beginners: This is like the hiker taking a step:

  • The direction to step is given by the gradient
  • The size of the step is controlled by the learning rate
  • The hiker moves from their current position in this direction and distance

This small step helps the hiker gradually move towards the lowest point.
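
In symbols, with learning rate \eta and loss gradient \nabla L, the rule applied to the coefficient vector \theta is:

\theta_{\text{new}} = \theta_{\text{current}} - \eta \, \nabla L(\theta_{\text{current}})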