Class AdamOptimizerOptions<T, TInput, TOutput>

Namespace
AiDotNet.Models.Options
Assembly
AiDotNet.dll

Configuration options for the Adam optimization algorithm, which combines the benefits of AdaGrad and RMSProp.

public class AdamOptimizerOptions<T, TInput, TOutput> : GradientBasedOptimizerOptions<T, TInput, TOutput>

Type Parameters

T
The numeric type used for calculations within the optimizer, typically float or double.

TInput
The type of the input data for the model being optimized.

TOutput
The type of the output data produced by the model being optimized.

Inheritance
OptimizationAlgorithmOptions<T, TInput, TOutput>
GradientBasedOptimizerOptions<T, TInput, TOutput>
AdamOptimizerOptions<T, TInput, TOutput>

Remarks

Adam (Adaptive Moment Estimation) is a popular optimization algorithm that computes adaptive learning rates for each parameter. It stores both an exponentially decaying average of past gradients (first moment) and past squared gradients (second moment).

For Beginners: Adam is like a smart learning assistant that remembers both the direction (momentum) and the size of previous steps. It automatically adjusts how big each step should be for each parameter, making it easier to train models without having to manually tune the learning rate. Adam is often a good default choice for many machine learning problems.
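
To make the two moments concrete, here is a minimal sketch of one Adam update for a single parameter, following the standard formulation (Kingma and Ba, 2015). It is illustrative only; the class and method names are made up, and the library's internal implementation may differ in details.

using System;

class AdamSketch
{
    // Toy objective f(x) = x^2, whose gradient is 2x.
    static double Gradient(double x) => 2.0 * x;

    static void Main()
    {
        double beta1 = 0.9, beta2 = 0.999;   // Beta1, Beta2 defaults
        double lr = 0.001, eps = 1e-8;       // InitialLearningRate, Epsilon defaults
        double m = 0.0, v = 0.0;             // first and second moment estimates
        double theta = 1.0;                  // the parameter being optimized

        for (int t = 1; t <= 1000; t++)
        {
            double g = Gradient(theta);
            m = beta1 * m + (1 - beta1) * g;      // decaying average of gradients
            v = beta2 * v + (1 - beta2) * g * g;  // decaying average of squared gradients
            double mHat = m / (1 - Math.Pow(beta1, t)); // bias correction
            double vHat = v / (1 - Math.Pow(beta2, t));
            theta -= lr * mHat / (Math.Sqrt(vHat) + eps);
        }

        Console.WriteLine($"theta after training: {theta}");
    }
}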

Properties

BatchSize

Gets or sets the batch size for mini-batch gradient descent.

public int BatchSize { get; set; }

Property Value

int

A positive integer, defaulting to 32.

Remarks

The batch size determines how many samples are processed before updating the model parameters. Larger batch sizes provide more stable gradient estimates but use more memory.

For Beginners: The batch size controls how many examples the optimizer looks at before making an update to the model:

  • BatchSize = 1: Update after each sample (true stochastic)
  • BatchSize = 32: Update after every 32 samples (typical mini-batch)
  • BatchSize = [entire dataset]: Batch gradient descent

The default of 32 is a good balance between speed and stability for Adam.
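
As a rough illustration of how the batch size drives the update schedule (the loop structure here is hypothetical, not the library's actual training loop):

int batchSize = 32;
int sampleCount = 1000;
// 1000 samples with BatchSize = 32 gives 32 parameter updates per pass
// over the data: 31 full batches of 32, plus one final batch of 8.
for (int start = 0; start < sampleCount; start += batchSize)
{
    int count = Math.Min(batchSize, sampleCount - start);
    // Average the gradient over samples [start, start + count)
    // and apply one Adam update with that averaged gradient.
}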

Beta1

Gets or sets the exponential decay rate for the first moment estimates.

public double Beta1 { get; set; }

Property Value

double

The beta1 value, defaulting to 0.9.

Remarks

For Beginners: Beta1 controls how much the algorithm remembers about the direction it was moving in previous steps. A value of 0.9 means it gives 90% importance to past directions and 10% to the new direction. Think of it like steering a boat - this parameter determines how much you consider your previous steering direction versus the new direction you want to go. Higher values (closer to 1) make for smoother but potentially slower learning.
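
In the standard Adam formulation, this corresponds to the first-moment update

m_t = Beta1 * m_(t-1) + (1 - Beta1) * g_t

so with the default Beta1 = 0.9, each step keeps 90% of the running direction and blends in 10% of the newest gradient g_t.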

Beta2

Gets or sets the exponential decay rate for the second moment estimates.

public double Beta2 { get; set; }

Property Value

double

The beta2 value, defaulting to 0.999.

Remarks

For Beginners: Beta2 controls how much the algorithm remembers about the size of previous steps for each parameter. A value of 0.999 means it gives 99.9% importance to past step sizes and only 0.1% to new information. This helps stabilize learning by preventing wild changes in step size. Think of it like remembering how bumpy the road has been for each wheel of your car, allowing you to adjust the suspension accordingly for a smoother ride.
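
The corresponding second-moment update in the standard formulation is

v_t = Beta2 * v_(t-1) + (1 - Beta2) * g_t^2

so with the default Beta2 = 0.999, each new squared gradient contributes only 0.1% to the running average, which is what keeps the per-parameter step sizes stable.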

Epsilon

Gets or sets a small constant added to denominators to prevent division by zero.

public double Epsilon { get; set; }

Property Value

double

The epsilon value, defaulting to 1e-8 (0.00000001).

Remarks

For Beginners: Epsilon is a tiny safety value that prevents the algorithm from crashing when it would otherwise divide by zero. It's like having training wheels that only activate when needed. You typically don't need to change this unless you're experiencing numerical stability issues.
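
In the standard update rule, Epsilon sits in the denominator of the parameter step:

theta_t = theta_(t-1) - LearningRate * m_hat / (sqrt(v_hat) + Epsilon)

When the second-moment estimate v_hat is near zero, Epsilon keeps the division well-defined.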

InitialLearningRate

Gets or sets the initial learning rate for the Adam optimizer.

public override double InitialLearningRate { get; set; }

Property Value

double

The learning rate, defaulting to 0.001.

Remarks

For Beginners: The learning rate controls how big each step is during training. A value of 0.001 means taking small, careful steps. If this value is too large, the model might miss the optimal solution by stepping too far. If it's too small, training will take a very long time. The default of 0.001 works well for most problems, which is why Adam is popular - it doesn't require much tuning of this value.
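
Since the learning rate is usually the first option worth tuning, here is a minimal configuration sketch. The type arguments (double scalars with array inputs and outputs) are illustrative; only the property names are taken from this page.

var options = new AdamOptimizerOptions<double, double[], double[]>
{
    InitialLearningRate = 0.001, // the default; lower it if training diverges
    BatchSize = 64,              // larger batches give smoother gradient estimates
    Beta1 = 0.9,
    Beta2 = 0.999
};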

MaxBeta1

Gets or sets the maximum allowed value for Beta1.

public double MaxBeta1 { get; set; }

Property Value

double

The maximum Beta1 value, defaulting to 0.999.

Remarks

For Beginners: This prevents Beta1 from becoming too large, which would make the algorithm rely too heavily on past directions and adapt too slowly to new information. Even if Beta1 keeps increasing, it won't go above this value. A maximum of 0.999 ensures the algorithm always incorporates at least some new directional information.

MaxBeta2

Gets or sets the maximum allowed value for Beta2.

public double MaxBeta2 { get; set; }

Property Value

double

The maximum Beta2 value, defaulting to 0.9999.

Remarks

For Beginners: This prevents Beta2 from becoming too large, which would make the algorithm rely too heavily on past step sizes and adapt too slowly. Even if Beta2 keeps increasing, it won't go above this value. A maximum of 0.9999 ensures the algorithm always incorporates at least some new information about step sizes, allowing it to adapt to changing conditions during training.

MinBeta1

Gets or sets the minimum allowed value for Beta1.

public double MinBeta1 { get; set; }

Property Value

double

The minimum Beta1 value, defaulting to 0.8.

Remarks

For Beginners: This prevents Beta1 from becoming too small, which would make the algorithm ignore past directions too much. Even if Beta1 keeps decreasing, it won't go below this value. A minimum of 0.8 ensures the algorithm always considers at least some of its previous momentum, preventing it from changing direction too abruptly.

MinBeta2

Gets or sets the minimum allowed value for Beta2.

public double MinBeta2 { get; set; }

Property Value

double

The minimum Beta2 value, defaulting to 0.8.

Remarks

For Beginners: This prevents Beta2 from becoming too small, which would make the algorithm ignore past step sizes too much. Even if Beta2 keeps decreasing, it won't go below this value. A minimum of 0.8 ensures the algorithm always considers at least some of its previous step size information, maintaining some stability in the learning process.
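
Taken together, the four bounds define the interval each beta may occupy. A sketch of the clamping they imply (illustrative; proposedBeta1 and proposedBeta2 are hypothetical names, and the library's internal adjustment logic is not documented here):

// Beta1 stays within [MinBeta1, MaxBeta1] = [0.8, 0.999] by default,
// Beta2 within [MinBeta2, MaxBeta2] = [0.8, 0.9999].
double beta1 = Math.Clamp(proposedBeta1, options.MinBeta1, options.MaxBeta1);
double beta2 = Math.Clamp(proposedBeta2, options.MinBeta2, options.MaxBeta2);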

UseAdaptiveBetas

Gets or sets whether to automatically adjust the Beta parameters during training.

public bool UseAdaptiveBetas { get; set; }

Property Value

bool

True to use adaptive betas (default), false otherwise.

Remarks

For Beginners: When enabled, the algorithm will automatically adjust how much it relies on past information based on how well it's performing. If the model is improving, it will trust its memory more. If performance worsens, it will pay more attention to new information. This helps the algorithm adapt to different phases of learning, like slowing down when approaching the destination and speeding up when far away.
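
The exact adjustment rule is internal to the library; the sketch below only illustrates the idea described above, with made-up scaling factors:

if (currentLoss < previousLoss)
{
    // Improving: lean more on accumulated history, up to the Max bounds.
    beta1 = Math.Min(beta1 * 1.01, options.MaxBeta1);
    beta2 = Math.Min(beta2 * 1.001, options.MaxBeta2);
}
else
{
    // Worsening: weight fresh gradient information more, down to the Min bounds.
    beta1 = Math.Max(beta1 * 0.99, options.MinBeta1);
    beta2 = Math.Max(beta2 * 0.999, options.MinBeta2);
}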