Class AMSGradOptimizerOptions<T, TInput, TOutput>
Configuration options for the AMSGrad optimization algorithm, which is an improved variant of the Adam optimizer that addresses potential convergence issues by maintaining the maximum of past squared gradients.
public class AMSGradOptimizerOptions<T, TInput, TOutput> : GradientBasedOptimizerOptions<T, TInput, TOutput>
Type Parameters
- T
- TInput
- TOutput
Inheritance
OptimizationAlgorithmOptions&lt;T, TInput, TOutput&gt; → GradientBasedOptimizerOptions&lt;T, TInput, TOutput&gt; → AMSGradOptimizerOptions&lt;T, TInput, TOutput&gt;
Remarks
AMSGrad is an adaptive learning rate optimization algorithm that combines the benefits of AdaGrad and RMSProp while ensuring convergence by using a non-decreasing learning rate adjustment. It's particularly effective for deep learning models and non-convex optimization problems.
For Beginners: AMSGrad is like a smart running coach that adjusts your training pace based on your past performance. It remembers the hardest stretches of your training so far and never forgets them, refusing to let you speed up recklessly just because the last few steps felt easy. This helps your AI model learn more efficiently by giving more attention to important patterns and less to noise in the data. Unlike plain Adam, AMSGrad guarantees that the effective step size for each parameter never increases over time, which fixes a known convergence problem and helps the model settle into better solutions.
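Concretely, the standard AMSGrad update from Reddi et al. (2018) ties the properties documented below together. Writing g_t for the current gradient, α for InitialLearningRate, β₁ for Beta1, β₂ for Beta2, and ε for Epsilon:

$$
\begin{aligned}
m_t &= \beta_1\, m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2\, v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{v}_t &= \max(\hat{v}_{t-1},\, v_t) \\
\theta_{t+1} &= \theta_t - \frac{\alpha\, m_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
$$

The max in the third line is what separates AMSGrad from Adam: the denominator can never shrink, so the per-parameter step size can only hold steady or decrease. (Whether this implementation also applies Adam-style bias correction is not documented on this page.)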
Properties
BatchSize
Gets or sets the batch size for mini-batch gradient descent.
public int BatchSize { get; set; }
Property Value
- int
A positive integer, defaulting to 32.
Remarks
For Beginners: The batch size controls how many examples the optimizer looks at before making an update to the model. The default of 32 is a good balance for AMSGrad.
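As a usage sketch, the options are plain settable properties, so object-initializer syntax works. The generic type arguments below are hypothetical placeholders, since the concrete types this library expects for T, TInput, and TOutput are not shown on this page:

```csharp
// The type arguments (double, double[,], double[]) are illustrative
// placeholders; substitute whatever the surrounding library expects.
var options = new AMSGradOptimizerOptions<double, double[,], double[]>
{
    BatchSize = 64,              // larger batches: smoother gradients, fewer updates per epoch
    InitialLearningRate = 0.001, // the documented default, shown explicitly
    Beta1 = 0.9,
    Beta2 = 0.999,
    Epsilon = 1e-8
};
```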
Beta1
Gets or sets the exponential decay rate for the first moment estimates (momentum).
public double Beta1 { get; set; }
Property Value
- double
The first moment decay rate, defaulting to 0.9.
Remarks
Beta1 controls how much the algorithm relies on the gradient from the current iteration versus gradients from previous iterations. Values closer to 1.0 give more weight to past gradients, creating a stronger momentum effect.
For Beginners: Beta1 is like momentum when you're running - it determines how much your previous direction influences your current one. The default value of 0.9 means the algorithm considers about 90% of its previous direction and 10% of the new information when deciding which way to go. This helps the model move smoothly past small bumps in the learning landscape rather than zigzagging.
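A small self-contained sketch of the moving average that Beta1 controls (independent of this library's internals): with Beta1 = 0.9, a single sign flip in the gradient only dents the accumulated direction instead of reversing it, which is the smoothing effect described above.

```csharp
double beta1 = 0.9;
double m = 0.0; // first moment: running average of gradients

// A gradient sequence that flips sign once.
double[] gradients = { 1.0, 1.0, 1.0, -1.0, 1.0 };

foreach (double g in gradients)
{
    m = beta1 * m + (1 - beta1) * g; // exponential moving average
    Console.WriteLine($"g = {g,4:F1}  ->  m = {m:F4}");
}
// m stays positive throughout (0.10, 0.19, 0.27, 0.14, 0.23):
// the lone -1.0 gradient nudges the direction but does not reverse it.
```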
Beta2
Gets or sets the exponential decay rate for the second moment estimates (adaptive learning rates).
public double Beta2 { get; set; }
Property Value
- double
The second moment decay rate, defaulting to 0.999.
Remarks
Beta2 controls how quickly the algorithm adapts the learning rate for each parameter based on historical gradient magnitudes. Values closer to 1.0 result in slower adaptation but more stable learning rates.
For Beginners: Beta2 determines how quickly the algorithm adjusts to the difficulty of different parts of the learning process. The default value of 0.999 means it takes a long-term view, considering almost all past experience when deciding how to adjust the learning rate for each parameter. This creates stability and prevents overreacting to temporary difficulties in the learning process.
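A standard rule of thumb for exponential moving averages (not specific to this library) makes the contrast with Beta1 concrete: a decay rate β averages over roughly the last 1/(1 − β) steps.

$$
\text{window} \approx \frac{1}{1 - \beta}:
\qquad \beta_1 = 0.9 \;\Rightarrow\; \approx 10 \text{ steps},
\qquad \beta_2 = 0.999 \;\Rightarrow\; \approx 1000 \text{ steps}
$$

So the adaptive learning rates react about two orders of magnitude more slowly than the momentum term, which is exactly the long-term stability the remark above describes.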
Epsilon
Gets or sets a small constant added to denominators to improve numerical stability.
public double Epsilon { get; set; }
Property Value
- double
The epsilon value, defaulting to 1e-8 (0.00000001).
Remarks
Epsilon prevents division by zero and caps the amplification that occurs when the accumulated squared gradients are very small, which could otherwise cause excessive parameter updates.
For Beginners: Epsilon is like a safety net that prevents mathematical errors when calculations get very small. It's a tiny value (0.00000001) that gets added to certain calculations to make sure the algorithm doesn't try to divide by zero or make other mathematical mistakes when working with very small numbers. You typically don't need to change this value unless you're experiencing numerical stability issues.
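A minimal sketch of what this guard does, using the denominator form from the update rule at the top of this page; the variable names here are illustrative, not the library's.

```csharp
double epsilon = 1e-8;
double vHat = 0.0;           // max of past squared gradients; zero on the very first step
double m = 0.001;            // a small first-moment value
double learningRate = 0.001;

// Without epsilon the denominator is zero and the step blows up to Infinity.
double unsafeStep = learningRate * m / Math.Sqrt(vHat);             // Infinity
double safeStep   = learningRate * m / (Math.Sqrt(vHat) + epsilon); // finite

Console.WriteLine($"without epsilon: {unsafeStep}");
Console.WriteLine($"with epsilon:    {safeStep}");
```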
InitialLearningRate
Gets or sets the initial step size used for parameter updates during optimization.
public override double InitialLearningRate { get; set; }
Property Value
- double
The learning rate, defaulting to 0.001.
Remarks
The learning rate controls how large each optimization step should be. Higher values can lead to faster convergence but may cause overshooting or instability, while lower values provide more stable but slower learning.
For Beginners: Think of the learning rate as how big of steps your AI takes when learning. A small value (like the default 0.001) means taking small, cautious steps - the model learns slowly but steadily. A larger value means taking bigger steps - learning might be faster, but the model might step too far and miss the best solution. The default value is generally a good starting point for most problems.
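To see the trade-off in isolation, here is a plain gradient-descent loop (no AMSGrad machinery) on the one-dimensional bowl f(x) = x²: a modest step size converges, while an oversized one overshoots further on every step.

```csharp
Console.WriteLine(Minimize(0.1, 50)); // shrinks toward 0: converges
Console.WriteLine(Minimize(1.1, 50)); // |x| grows every step: diverges

static double Minimize(double learningRate, int steps)
{
    double x = 1.0;
    for (int i = 0; i < steps; i++)
    {
        double gradient = 2 * x;      // derivative of f(x) = x^2
        x -= learningRate * gradient; // plain gradient step
    }
    return x;
}
```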
LearningRateDecreaseFactor
Gets or sets the factor by which to decrease the learning rate when the loss is increasing or oscillating.
public double LearningRateDecreaseFactor { get; set; }
Property Value
- double
The learning rate decrease factor, defaulting to 0.95.
Remarks
When the optimization encounters difficulties or starts to diverge, the learning rate can be decreased by this factor to stabilize the process.
For Beginners: This is like slowing down when the path becomes tricky or unclear. The default value of 0.95 means the algorithm will reduce the learning rate by 5% when progress becomes difficult. This helps the model navigate challenging parts of the learning landscape without overshooting or getting stuck.
LearningRateIncreaseFactor
Gets or sets the factor by which to increase the learning rate when the loss is consistently decreasing.
public double LearningRateIncreaseFactor { get; set; }
Property Value
- double
The learning rate increase factor, defaulting to 1.05.
Remarks
When the optimization is making consistent progress, the learning rate can be increased by this factor to speed up convergence.
For Beginners: This is like speeding up when you're on a straight, clear path. The default value of 1.05 means the algorithm will increase the learning rate by 5% when things are going well. This helps the model learn faster during periods when the path to the solution is clear and direct.
MaxLearningRate
Gets or sets the maximum allowed learning rate during adaptive adjustments.
public double MaxLearningRate { get; set; }
Property Value
- double
The maximum learning rate, defaulting to 0.1.
Remarks
This prevents the learning rate from becoming too large during adaptive adjustments, which could cause the optimization to become unstable or diverge.
For Beginners: This sets a ceiling for how fast the learning can go. Even when progress is smooth and the algorithm wants to speed up, it won't increase the step size above this value (0.1 by default). This prevents your model from taking such large steps that it overshoots the optimal solution or becomes unstable.
MinLearningRate
Gets or sets the minimum allowed learning rate during adaptive adjustments.
public double MinLearningRate { get; set; }
Property Value
- double
The minimum learning rate, defaulting to 1e-5 (0.00001).
Remarks
This prevents the learning rate from becoming too small during adaptive adjustments, which could cause the optimization to stall.
For Beginners: This sets a floor for how slow the learning can go. Even if the algorithm wants to be extra cautious, it won't reduce the step size below this value (0.00001 by default). This ensures that your model keeps making meaningful progress and doesn't get stuck taking infinitesimally small steps.
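Together, LearningRateIncreaseFactor, LearningRateDecreaseFactor, MaxLearningRate, and MinLearningRate describe a multiplicative, clamped schedule. How the library decides when to apply each factor is not documented on this page, so the trigger below (comparing the current loss to the previous one) is an assumption used only to show how the four numbers interact.

```csharp
double learningRate = 0.001;   // InitialLearningRate
const double IncreaseFactor = 1.05;
const double DecreaseFactor = 0.95;
const double MinRate = 1e-5;   // MinLearningRate
const double MaxRate = 0.1;    // MaxLearningRate

double previousLoss = double.MaxValue;

// Hypothetical loss values from successive iterations.
double[] losses = { 0.90, 0.80, 0.75, 0.85, 0.70 };

foreach (double loss in losses)
{
    // Assumed trigger: speed up while improving, slow down otherwise.
    learningRate *= loss < previousLoss ? IncreaseFactor : DecreaseFactor;

    // Keep the rate inside [MinLearningRate, MaxLearningRate].
    learningRate = Math.Clamp(learningRate, MinRate, MaxRate);

    Console.WriteLine($"loss = {loss:F2}  ->  learning rate = {learningRate:E3}");
    previousLoss = loss;
}
```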