Class GradientBasedOptimizerOptions<T, TInput, TOutput>
Configuration options for gradient-based optimization algorithms.
public class GradientBasedOptimizerOptions<T, TInput, TOutput> : OptimizationAlgorithmOptions<T, TInput, TOutput>
Type Parameters
- T
- TInput
- TOutput
- Inheritance
- OptimizationAlgorithmOptions<T, TInput, TOutput>
- GradientBasedOptimizerOptions<T, TInput, TOutput>
Remarks
Gradient-based optimizers are algorithms that find the minimum or maximum of a function by following the direction of steepest descent or ascent (the gradient).
For Beginners: Imagine you're in a hilly landscape and want to find the lowest point. Gradient-based optimization is like always walking downhill in the steepest direction until you can't go any lower. The "gradient" is simply the direction of the steepest slope at your current position.
These algorithms are fundamental to training many machine learning models, including neural networks, linear regression, and logistic regression.
This class inherits from OptimizationAlgorithmOptions, which means it includes all the base configuration options for optimization algorithms plus any additional options specific to gradient-based methods.
Properties
DataSampler
Gets or sets the optional data sampler for advanced sampling strategies during batch creation.
public IDataSampler? DataSampler { get; set; }
Property Value
- IDataSampler?
Remarks
A data sampler controls how training examples are selected and ordered during batch creation. This enables advanced sampling strategies like:
- Weighted sampling for class imbalance
- Stratified sampling to maintain class proportions
- Curriculum learning to start with easy examples
- Importance sampling to focus on high-loss examples
- Active learning to prioritize uncertain examples
For Beginners: Think of this as choosing which examples to show the model and in what order. If you have more examples of cats than dogs, weighted sampling can help the model see dogs more often. Curriculum learning shows easy examples first, like learning to walk before running.
Example:
// Balanced sampling for imbalanced classes
options.DataSampler = Samplers.Balanced(labels, numClasses: 2);
// Curriculum learning (easy to hard)
options.DataSampler = Samplers.Curriculum(difficulties);
DropLastBatch
Gets or sets whether to drop the last incomplete batch.
public bool DropLastBatch { get; set; }
Property Value
- bool
Remarks
When the training data size is not evenly divisible by the batch size, the last batch will be smaller. Setting this to true discards that incomplete batch.
For Beginners: If you have 100 examples and a batch size of 32, you'll have 3 full batches (96 examples) and 1 partial batch (4 examples). Setting DropLastBatch=true discards that partial batch, which can help with training stability.
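Example (a minimal sketch; options is assumed to be a GradientBasedOptimizerOptions instance configured elsewhere):
// With 100 examples and a batch size of 32, the last batch holds only 4 examples.
// Dropping it keeps every batch the same size, which can stabilize batch statistics.
options.DropLastBatch = true;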
EnableGradientClipping
Gets or sets whether gradient clipping is enabled.
public bool EnableGradientClipping { get; set; }
Property Value
- bool
Remarks
Gradient clipping helps prevent exploding gradients during training by limiting the magnitude of gradients. This is particularly important for deep networks and recurrent neural networks.
For Beginners: Sometimes during training, gradients can become extremely large, causing the model to take huge steps that destabilize learning. Gradient clipping is like putting a speed limit on these updates to keep training stable.
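Example (a minimal sketch; options is assumed to be a GradientBasedOptimizerOptions instance configured elsewhere):
// Enable norm-based clipping with a typical threshold
options.EnableGradientClipping = true;
options.GradientClippingMethod = GradientClippingMethod.ByNorm;
options.MaxGradientNorm = 1.0;
// e.g. a gradient (3, 4) has L2 norm 5; with MaxGradientNorm = 1.0 it is
// rescaled by 1/5 to (0.6, 0.8), preserving its direction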
GradientCache
Gets or sets the gradient cache to use for storing and retrieving computed gradients.
public IGradientCache<T> GradientCache { get; set; }
Property Value
- IGradientCache<T>
Remarks
The gradient cache helps avoid redundant gradient calculations by storing previously computed gradients. This can significantly improve performance, especially when the same model is evaluated multiple times.
For Beginners: Think of this as a memory that stores calculations you've already done. Instead of recalculating the same gradient multiple times, the optimizer can look it up in this cache, saving computational resources.
GradientClippingMethod
Gets or sets the gradient clipping method to use.
public GradientClippingMethod GradientClippingMethod { get; set; }
Property Value
- GradientClippingMethod
Remarks
Two main methods are available:
- ByNorm: Scales the entire gradient vector if its norm exceeds a threshold (recommended)
- ByValue: Clips each gradient element independently to a range
For Beginners: ByNorm is generally preferred because it preserves the direction of the gradient while only reducing its magnitude. ByValue is simpler but can change the gradient direction.
LearningRateScheduler
Gets or sets the learning rate scheduler to use during training.
public ILearningRateScheduler? LearningRateScheduler { get; set; }
Property Value
- ILearningRateScheduler?
Remarks
Learning rate schedulers dynamically adjust the learning rate during training, which can significantly improve convergence and final model performance.
For Beginners: A learning rate scheduler automatically adjusts how fast your model learns during training. Common strategies include:
- Starting with a higher learning rate and gradually decreasing it
- Using warmup to slowly increase the learning rate at the start
- Cycling between high and low learning rates
Set this to null (default) to use a constant learning rate.
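Example (a sketch; the scheduler type name is hypothetical, so substitute whichever ILearningRateScheduler implementations your library provides):
// Constant learning rate (the default)
options.LearningRateScheduler = null;
// Or a decaying schedule (hypothetical implementation name)
options.LearningRateScheduler = new CosineAnnealingScheduler(totalEpochs: 100);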
LossFunction
Gets or sets the loss function to use for evaluating model performance.
public ILossFunction<T> LossFunction { get; set; }
Property Value
- ILossFunction<T>
Remarks
The loss function measures how well the model's predictions match the actual target values. Different loss functions are appropriate for different types of problems (e.g., regression vs. classification).
For Beginners: The loss function is like a scorecard that tells you how well your model is doing. A higher loss means worse performance, so the optimizer tries to find model parameters that minimize this loss.
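Example (a sketch; the loss type names are hypothetical, so substitute the ILossFunction<T> implementations your library provides):
// Regression problems typically minimize squared error
options.LossFunction = new MeanSquaredErrorLoss<double>();
// Classification problems typically minimize cross-entropy
options.LossFunction = new CrossEntropyLoss<double>();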
MaxGradientNorm
Gets or sets the maximum gradient norm for norm-based clipping.
public double MaxGradientNorm { get; set; }
Property Value
- double
Remarks
When using ByNorm, gradients are scaled down if their L2 norm exceeds this value. A typical value is 1.0, but this may need to be tuned for your model.
For Beginners: This is the "speed limit" for the total gradient magnitude. If the gradient vector is longer than this value, it gets scaled down proportionally.
MaxGradientValue
Gets or sets the maximum gradient value for value-based clipping.
public double MaxGradientValue { get; set; }
Property Value
- double
Remarks
When using ByValue, each gradient element is clipped to the range [-MaxGradientValue, MaxGradientValue].
For Beginners: This is the "speed limit" for each individual gradient component. Any gradient value larger than this gets capped at this value.
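Example (a minimal sketch; options is assumed to be a GradientBasedOptimizerOptions instance configured elsewhere):
options.GradientClippingMethod = GradientClippingMethod.ByValue;
options.MaxGradientValue = 0.5;
// Each gradient element is clipped to [-0.5, 0.5]:
// 1.7 becomes 0.5 and -2.3 becomes -0.5. Note that, unlike ByNorm,
// this can change the overall gradient direction.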
RandomSeed
Gets or sets the random seed for reproducibility.
public int? RandomSeed { get; set; }
Property Value
- int?
Remarks
Setting a seed ensures the same random sequence is generated for shuffling and sampling, making experiments reproducible.
For Beginners: Like a recipe, a seed lets you recreate the exact same training run. This is useful for debugging and comparing different model configurations.
Regularization
Gets or sets the regularization method to use for preventing overfitting.
public IRegularization<T, TInput, TOutput> Regularization { get; set; }
Property Value
- IRegularization<T, TInput, TOutput>
Remarks
Regularization adds a penalty for complexity to the loss function, which helps prevent the model from overfitting to the training data. Common regularization methods include L1 (Lasso) and L2 (Ridge).
For Beginners: Regularization is like adding a rule that says "keep it simple." It prevents your model from becoming too complex and fitting the training data too perfectly, which can actually hurt performance on new, unseen data.
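Example (a sketch; the regularization type name and its constructor are hypothetical, so use the IRegularization implementations your library provides):
// L2 (Ridge) penalizes large weights, encouraging simpler models
options.Regularization = new L2Regularization<double, Matrix<double>, Vector<double>>(strength: 0.01);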
SchedulerStepMode
Gets or sets when the learning rate scheduler should be stepped.
public SchedulerStepMode SchedulerStepMode { get; set; }
Property Value
- SchedulerStepMode
Remarks
- StepPerBatch: Update the learning rate after each mini-batch
- StepPerEpoch: Update the learning rate after each epoch (default)
- WarmupThenEpoch: Step per batch during warmup, then per epoch
For Beginners: Most schedulers work best with per-epoch stepping. Use per-batch stepping for warmup schedulers or cyclical learning rates.
ShuffleData
Gets or sets whether to shuffle data at the beginning of each epoch.
public bool ShuffleData { get; set; }
Property Value
- bool
Remarks
Shuffling the training data at each epoch helps prevent the model from learning the order of training examples rather than the underlying patterns. This is ignored if a custom DataSampler is provided.
For Beginners: Like shuffling a deck of cards before each deal, this ensures the model sees examples in different orders, which helps it learn better patterns.
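Example (a minimal sketch; options is assumed to be a GradientBasedOptimizerOptions instance configured elsewhere):
// Shuffle each epoch, but seed the generator so runs are reproducible
options.ShuffleData = true;
options.RandomSeed = 42;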