Class StochasticGradientDescentOptimizer<T, TInput, TOutput>

Namespace
AiDotNet.Optimizers
Assembly
AiDotNet.dll

Represents a Stochastic Gradient Descent (SGD) optimizer for machine learning models.

public class StochasticGradientDescentOptimizer<T, TInput, TOutput> : GradientBasedOptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer

Type Parameters

T

The numeric type used for calculations, typically float or double.

TInput

The type of the input data (for example, a feature matrix).

TOutput

The type of the output data (for example, a target vector).

Inheritance
OptimizerBase<T, TInput, TOutput>
GradientBasedOptimizerBase<T, TInput, TOutput>
StochasticGradientDescentOptimizer<T, TInput, TOutput>
Implements
IGradientBasedOptimizer<T, TInput, TOutput>
IOptimizer<T, TInput, TOutput>
IModelSerializer

Remarks

The StochasticGradientDescentOptimizer is a gradient-based optimization algorithm that iteratively adjusts model parameters to minimize the loss function. It uses a stochastic approach, updating parameters based on a subset of the training data in each iteration.

For Beginners: Think of this optimizer as a hiker trying to find the lowest point in a hilly landscape:

  • The hiker (optimizer) takes steps downhill to find the lowest point (best model parameters)
  • Instead of looking at the entire landscape at once, the hiker looks at small patches (subsets of data)
  • The hiker adjusts their step size (learning rate) as they go
  • This approach helps the hiker find a good low point quickly, even in a complex landscape

This method is efficient for large datasets and can often find good solutions quickly.
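
A minimal usage sketch follows. It assumes Matrix<double> and Vector<double> for TInput and TOutput, an already-constructed model, and a BestSolution property on the result; these specifics are not confirmed by this page, so treat them as placeholders.

using AiDotNet.Optimizers;

// `model` is an existing IFullModel<double, Matrix<double>, Vector<double>>;
// `inputData` is an OptimizationInputData built from your training set.
var optimizer = new StochasticGradientDescentOptimizer<double, Matrix<double>, Vector<double>>(model);

// Run SGD and keep the best solution found during the search.
var result = optimizer.Optimize(inputData);
var best = result.BestSolution; // hypothetical property name (check OptimizationResult)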

Constructors

StochasticGradientDescentOptimizer(IFullModel<T, TInput, TOutput>, StochasticGradientDescentOptimizerOptions<T, TInput, TOutput>?, IEngine?)

Initializes a new instance of the StochasticGradientDescentOptimizer class.

public StochasticGradientDescentOptimizer(IFullModel<T, TInput, TOutput> model, StochasticGradientDescentOptimizerOptions<T, TInput, TOutput>? options = null, IEngine? engine = null)

Parameters

model IFullModel<T, TInput, TOutput>

The model whose parameters will be optimized.

options StochasticGradientDescentOptimizerOptions<T, TInput, TOutput>

Options specific to the SGD optimizer. If null, default options are used.

engine IEngine

An optional engine used to perform the calculations.

Remarks

This constructor sets up the SGD optimizer with the specified model, options, and engine. If no options are provided, default options are used.

For Beginners: This is like setting up your hiker with their gear before the hike:

  • You can give the hiker special instructions (options) for how to search
  • You can provide a compute engine to handle the calculations
  • If you don't provide instructions, the hiker will use a standard set

This setup ensures the optimizer is ready to start finding the best solution.
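
A construction sketch; the option property names are not listed on this page, so they are left as a comment inside the initializer:

using AiDotNet.Optimizers;

// Property names for the initializer are not documented here;
// see StochasticGradientDescentOptimizerOptions for the actual settings.
var options = new StochasticGradientDescentOptimizerOptions<double, Matrix<double>, Vector<double>>
{
    // e.g. learning-rate and iteration settings go here
};

// Omitting the engine argument uses the default (engine is optional).
var optimizer = new StochasticGradientDescentOptimizer<double, Matrix<double>, Vector<double>>(model, options);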

Methods

Deserialize(byte[])

Deserializes a byte array to restore the state of the StochasticGradientDescentOptimizer.

public override void Deserialize(byte[] data)

Parameters

data byte[]

The byte array containing the serialized optimizer state.

Remarks

This method restores the state of the optimizer from a byte array, including its base class data and SGD-specific options. It uses a BinaryReader to read the serialized data and reconstruct the optimizer's state.

For Beginners: This is like unpacking the hiker's backpack after a journey:

  • It reads the saved snapshot of the hiker's journey
  • It restores both general hiking info and SGD-specific details
  • If there's a problem reading the SGD-specific details, it reports an error

This allows you to continue from a previously saved state of the optimizer.

Exceptions

InvalidOperationException

Thrown when deserialization of optimizer options fails.
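
A restore sketch, assuming the bytes were produced earlier by Serialize() and written to a file (the path is only an example):

using System;
using System.IO;

byte[] savedState = File.ReadAllBytes("sgd-optimizer.bin");

try
{
    // Restore both the base optimizer state and the SGD-specific options.
    optimizer.Deserialize(savedState);
}
catch (InvalidOperationException ex)
{
    // Raised when the SGD-specific options cannot be read back.
    Console.WriteLine($"Failed to restore optimizer state: {ex.Message}");
}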

GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)

Generates a unique cache key for gradient calculations.

protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)

Parameters

model IFullModel<T, TInput, TOutput>

The model for which the gradient is being calculated.

X TInput

The input data matrix.

y TOutput

The target vector.

Returns

string

A string representing the unique cache key.

Remarks

This method creates a unique identifier for caching gradient calculations. It combines the base cache key with SGD-specific parameters to ensure that cached gradients are only reused when all relevant parameters are identical.

For Beginners: This is like creating a unique label for each calculation the hiker does:

  • It starts with a basic label (baseKey) that describes the general calculation
  • It adds SGD-specific information like the current step size (learning rate) and how many steps the hiker is allowed to take (max iterations)
  • This unique label helps the hiker remember and quickly recall previous calculations instead of redoing them unnecessarily

This improves efficiency by avoiding redundant calculations.
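
The exact key format is an implementation detail of the library; the sketch below only illustrates the idea of combining a base key with the SGD settings that influence the gradient:

// Illustration only; this is not the library's actual key format.
string ComposeCacheKey(string baseKey, double learningRate, int maxIterations)
{
    // Two gradient computations share a cache entry only when the data,
    // the model, and these SGD settings all match.
    return $"{baseKey}_SGD_lr:{learningRate}_iter:{maxIterations}";
}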

GetOptions()

Gets the current options for this optimizer.

public override OptimizationAlgorithmOptions<T, TInput, TOutput> GetOptions()

Returns

OptimizationAlgorithmOptions<T, TInput, TOutput>

The current StochasticGradientDescentOptimizerOptions.

Remarks

This method returns the current configuration options of the SGD optimizer.

For Beginners: This is like asking the hiker what their current instructions are:

  • You can see how the hiker is currently set up to search
  • This includes things like how big their steps are, how many steps they're allowed to take, etc.

This is useful for understanding or checking the current setup of the optimizer.
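
Because the declared return type is the base options class, cast back to the SGD-specific type to reach SGD-only settings; a sketch assuming Matrix<double> and Vector<double> type arguments:

var current = optimizer.GetOptions();
if (current is StochasticGradientDescentOptimizerOptions<double, Matrix<double>, Vector<double>> sgdOptions)
{
    // Inspect the SGD-specific configuration here.
}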

Optimize(OptimizationInputData<T, TInput, TOutput>)

Performs the optimization process to find the best solution for the given input data.

public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)

Parameters

inputData OptimizationInputData<T, TInput, TOutput>

The input data to optimize against.

Returns

OptimizationResult<T, TInput, TOutput>

An optimization result containing the best solution found and associated metrics.

Remarks

This method implements the main SGD algorithm. It iteratively updates the model parameters based on the calculated gradient, applying momentum and adaptive learning rates if configured. The process continues until either the maximum number of iterations is reached or early stopping criteria are met.

For Beginners: This is the main journey of our hiker:

  1. Start at a random point on the hill (initialize random solution)
  2. For each epoch (pass through the data):
    • Process data in batches (the default BatchSize of 1 gives true stochastic updates)
    • For each batch:
      • Look around to decide which way is downhill (calculate gradient)
      • Apply momentum if configured
      • Take a step in that direction (update solution)
    • Check if this is the lowest point found so far (evaluate and update best solution)
    • Adjust step size if needed (update adaptive parameters)
    • Decide whether to stop early if no progress is being made
  3. Return the lowest point found during the entire journey

This process helps find a good solution efficiently, even in complex landscapes.

DataLoader Integration: This optimizer uses the DataLoader batching infrastructure, which supports:

  • Custom samplers (weighted, stratified, curriculum, importance, active learning)
  • Reproducible shuffling via RandomSeed
  • An option to drop incomplete final batches
  • True stochastic behavior with BatchSize=1 (the default)

Set these options via GradientBasedOptimizerOptions.DataSampler, ShuffleData, DropLastBatch, and RandomSeed.
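
A configuration sketch using the option names from the note above; they are defined on GradientBasedOptimizerOptions and assumed to be inherited by the SGD options type:

var options = new StochasticGradientDescentOptimizerOptions<double, Matrix<double>, Vector<double>>
{
    BatchSize = 1,        // the default: true stochastic updates
    ShuffleData = true,   // shuffling is reproducible when RandomSeed is set
    RandomSeed = 42,
    DropLastBatch = false // keep the final, possibly smaller, batch
};

var optimizer = new StochasticGradientDescentOptimizer<double, Matrix<double>, Vector<double>>(model, options);
var result = optimizer.Optimize(inputData);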

Serialize()

Serializes the current state of the StochasticGradientDescentOptimizer to a byte array.

public override byte[] Serialize()

Returns

byte[]

A byte array representing the serialized state of the optimizer.

Remarks

This method saves the current state of the optimizer, including its base class data and SGD-specific options, into a byte array.

For Beginners: This is like taking a snapshot of the hiker's journey:

  • It saves all the current settings and progress
  • This saved data can be used later to continue from where you left off
  • It includes both general hiking info and SGD-specific details

This is useful for saving progress or sharing the optimizer's current state.
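
A persistence sketch that pairs with the Deserialize example above (the file path is only an example):

using System.IO;

// Snapshot the optimizer's full state and write it to disk.
byte[] state = optimizer.Serialize();
File.WriteAllBytes("sgd-optimizer.bin", state);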

UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput>)

Updates the optimizer's options with the provided options.

protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)

Parameters

options OptimizationAlgorithmOptions<T, TInput, TOutput>

The options to apply to this optimizer.

Remarks

This method ensures that only StochasticGradientDescentOptimizerOptions can be applied to this optimizer.

For Beginners: This is like giving the hiker new instructions mid-journey:

  • You can only give instructions specific to this type of hike (SGD)
  • If you try to give the wrong type of instructions, it will cause an error

This ensures that the optimizer always has the correct type of settings.

Exceptions

ArgumentException

Thrown when the options are not of the expected type.
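
The method is protected, so it is not called directly; this sketch only illustrates the type guard described above, not the library's actual code:

protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)
{
    if (options is not StochasticGradientDescentOptimizerOptions<T, TInput, TOutput> sgdOptions)
    {
        // Only SGD-specific options are accepted.
        throw new ArgumentException("Expected StochasticGradientDescentOptimizerOptions.", nameof(options));
    }
    // ...apply sgdOptions here...
}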

UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)

Updates parameters on the GPU using vanilla SGD.

public override void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)

Parameters

parameters IGpuBuffer

The GPU buffer holding the parameter values to update in place.

gradients IGpuBuffer

The GPU buffer holding the gradients for the current step.

parameterCount int

The number of parameter values to update.

backend IDirectGpuBackend

The GPU backend that executes the update.
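
The vanilla SGD update itself is simple; below is a CPU-equivalent sketch of what this method performs on the GPU buffers, assuming the learning rate comes from the optimizer's current settings:

// Equivalent update written for plain arrays instead of GPU buffers.
void VanillaSgdUpdate(float[] parameters, float[] gradients, int parameterCount, float learningRate)
{
    for (int i = 0; i < parameterCount; i++)
    {
        parameters[i] -= learningRate * gradients[i];
    }
}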

UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)

Updates the current solution based on the calculated gradient.

protected override IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)

Parameters

currentSolution IFullModel<T, TInput, TOutput>

The current solution to update.

gradient Vector<T>

The calculated gradient.

Returns

IFullModel<T, TInput, TOutput>

A new IFullModel<T, TInput, TOutput> representing the updated solution.

Remarks

This method applies the gradient descent update rule, subtracting the gradient multiplied by the learning rate from the current solution's coefficients.

For Beginners: This is like the hiker taking a step:

  • The direction to step is given by the gradient
  • The size of the step is controlled by the learning rate
  • The hiker moves from their current position in this direction and distance

This small step helps the hiker gradually move towards the lowest point.
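
In symbols, with learning rate \eta and loss gradient \nabla L, the rule applied to the coefficient vector \theta is:

\theta_{\text{new}} = \theta_{\text{current}} - \eta \, \nabla L(\theta_{\text{current}})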