Class GradientBasedOptimizerOptions<T, TInput, TOutput>
Configuration options for gradient-based optimization algorithms.
public class GradientBasedOptimizerOptions<T, TInput, TOutput> : OptimizationAlgorithmOptions<T, TInput, TOutput>
Type Parameters
- T
- TInput
- TOutput
- Inheritance
- OptimizationAlgorithmOptions<T, TInput, TOutput>
- GradientBasedOptimizerOptions<T, TInput, TOutput>
Remarks
Gradient-based optimizers are algorithms that find the minimum or maximum of a function by following the direction of steepest descent or ascent (the gradient).
For Beginners: Imagine you're in a hilly landscape and want to find the lowest point. Gradient-based optimization is like always walking downhill in the steepest direction until you can't go any lower. The "gradient" is simply the direction of the steepest slope at your current position.
These algorithms are fundamental to training many machine learning models, including neural networks, linear regression, and logistic regression.
This class inherits from OptimizationAlgorithmOptions, which means it includes all the base configuration options for optimization algorithms plus any additional options specific to gradient-based methods.
Properties
DataSampler
Gets or sets the optional data sampler for advanced sampling strategies during batch creation.
public IDataSampler? DataSampler { get; set; }
Property Value
- IDataSampler?
Remarks
A data sampler controls how training examples are selected and ordered during batch creation. This enables advanced sampling strategies like:
- Weighted sampling for class imbalance
- Stratified sampling to maintain class proportions
- Curriculum learning to start with easy examples
- Importance sampling to focus on high-loss examples
- Active learning to prioritize uncertain examples
For Beginners: Think of this as choosing which examples to show the model and in what order. If you have more examples of cats than dogs, weighted sampling can help the model see dogs more often. Curriculum learning shows easy examples first, like learning to walk before running.
Example:
// Balanced sampling for imbalanced classes
options.DataSampler = Samplers.Balanced(labels, numClasses: 2);
// Curriculum learning (easy to hard)
options.DataSampler = Samplers.Curriculum(difficulties);
DropLastBatch
Gets or sets whether to drop the last incomplete batch.
public bool DropLastBatch { get; set; }
Property Value
- bool
Remarks
When the training data size is not evenly divisible by the batch size, the last batch will be smaller. Setting this to true discards that incomplete batch.
For Beginners: If you have 100 examples and a batch size of 32, you'll have 3 full batches (96 examples) and 1 partial batch (4 examples). Setting DropLastBatch=true discards that partial batch, which can help with training stability.
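Example (a minimal sketch; options is assumed to be a GradientBasedOptimizerOptions instance configured elsewhere):
// With 100 examples and a batch size of 32, the last batch holds only 4 examples.
// Dropping it keeps every batch the same size, which can stabilize batch statistics.
options.DropLastBatch = true;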
EnableGradientClipping
Gets or sets whether gradient clipping is enabled.
public bool EnableGradientClipping { get; set; }
Property Value
- bool
Remarks
Gradient clipping helps prevent exploding gradients during training by limiting the magnitude of gradients. This is particularly important for deep networks and recurrent neural networks.
For Beginners: Sometimes during training, gradients can become extremely large, causing the model to take huge steps that destabilize learning. Gradient clipping is like putting a speed limit on these updates to keep training stable.
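Example (a minimal sketch; options is assumed to be a GradientBasedOptimizerOptions instance configured elsewhere):
// Enable norm-based clipping with a typical threshold
options.EnableGradientClipping = true;
options.GradientClippingMethod = GradientClippingMethod.ByNorm;
options.MaxGradientNorm = 1.0;
// e.g. a gradient (3, 4) has L2 norm 5; with MaxGradientNorm = 1.0 it is
// rescaled by 1/5 to (0.6, 0.8), preserving its direction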
GradientCache
Gets or sets the gradient cache to use for storing and retrieving computed gradients.
public IGradientCache<T> GradientCache { get; set; }
Property Value
- IGradientCache<T>
Remarks
The gradient cache helps avoid redundant gradient calculations by storing previously computed gradients. This can significantly improve performance, especially when the same model is evaluated multiple times.
For Beginners: Think of this as a memory that stores calculations you've already done. Instead of recalculating the same gradient multiple times, the optimizer can look it up in this cache, saving computational resources.
GradientClippingMethod
Gets or sets the gradient clipping method to use.
public GradientClippingMethod GradientClippingMethod { get; set; }
Property Value
- GradientClippingMethod
Remarks
Two main methods are available:
- ByNorm: Scales the entire gradient vector if its norm exceeds a threshold (recommended)
- ByValue: Clips each gradient element independently to a range
For Beginners: ByNorm is generally preferred because it preserves the direction of the gradient while only reducing its magnitude. ByValue is simpler but can change the gradient direction.
LearningRateScheduler
Gets or sets the learning rate scheduler to use during training.
public ILearningRateScheduler? LearningRateScheduler { get; set; }
Property Value
- ILearningRateScheduler?
Remarks
Learning rate schedulers dynamically adjust the learning rate during training, which can significantly improve convergence and final model performance.
For Beginners: A learning rate scheduler automatically adjusts how fast your model learns during training. Common strategies include:
- Starting with a higher learning rate and gradually decreasing it
- Using warmup to slowly increase the learning rate at the start
- Cycling between high and low learning rates
Set this to null (default) to use a constant learning rate.
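Example (a sketch; the scheduler type name is hypothetical, so substitute whichever ILearningRateScheduler implementations your library provides):
// Constant learning rate (the default)
options.LearningRateScheduler = null;
// Or a decaying schedule (hypothetical implementation name)
options.LearningRateScheduler = new CosineAnnealingScheduler(totalEpochs: 100);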
LossFunction
Gets or sets the loss function to use for evaluating model performance.
public ILossFunction<T> LossFunction { get; set; }
Property Value
- ILossFunction<T>
Remarks
The loss function measures how well the model's predictions match the actual target values. Different loss functions are appropriate for different types of problems (e.g., regression vs. classification).
For Beginners: The loss function is like a scorecard that tells you how well your model is doing. A higher loss means worse performance, so the optimizer tries to find model parameters that minimize this loss.
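Example (a sketch; the loss type names are hypothetical, so substitute the ILossFunction<T> implementations your library provides):
// Regression problems typically minimize squared error
options.LossFunction = new MeanSquaredErrorLoss<double>();
// Classification problems typically minimize cross-entropy
options.LossFunction = new CrossEntropyLoss<double>();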
MaxGradientNorm
Gets or sets the maximum gradient norm for norm-based clipping.
public double MaxGradientNorm { get; set; }
Property Value
- double
Remarks
When using ByNorm, gradients are scaled down if their L2 norm exceeds this value. A typical value is 1.0, but this may need to be tuned for your model.
For Beginners: This is the "speed limit" for the total gradient magnitude. If the gradient vector is longer than this value, it gets scaled down proportionally.
MaxGradientValue
Gets or sets the maximum gradient value for value-based clipping.
public double MaxGradientValue { get; set; }
Property Value
- double
Remarks
When using ByValue, each gradient element is clipped to the range [-MaxGradientValue, MaxGradientValue].
For Beginners: This is the "speed limit" for each individual gradient component. Any gradient value larger than this gets capped at this value.
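Example (a minimal sketch; options is assumed to be a GradientBasedOptimizerOptions instance configured elsewhere):
options.GradientClippingMethod = GradientClippingMethod.ByValue;
options.MaxGradientValue = 0.5;
// Each gradient element is clipped to [-0.5, 0.5]:
// 1.7 becomes 0.5 and -2.3 becomes -0.5. Note that, unlike ByNorm,
// this can change the overall gradient direction.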
RandomSeed
Gets or sets the random seed for reproducibility.
public int? RandomSeed { get; set; }
Property Value
- int?
Remarks
Setting a seed ensures the same random sequence is generated for shuffling and sampling, making experiments reproducible.
For Beginners: Like a recipe, a seed lets you recreate the exact same training run. This is useful for debugging and comparing different model configurations.
Regularization
Gets or sets the regularization method to use for preventing overfitting.
public IRegularization<T, TInput, TOutput> Regularization { get; set; }
Property Value
- IRegularization<T, TInput, TOutput>
Remarks
Regularization adds a penalty for complexity to the loss function, which helps prevent the model from overfitting to the training data. Common regularization methods include L1 (Lasso) and L2 (Ridge).
For Beginners: Regularization is like adding a rule that says "keep it simple." It prevents your model from becoming too complex and fitting the training data too perfectly, which can actually hurt performance on new, unseen data.
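Example (a sketch; the regularization type name and its constructor are hypothetical, so use the IRegularization implementations your library provides):
// L2 (Ridge) penalizes large weights, encouraging simpler models
options.Regularization = new L2Regularization<double, Matrix<double>, Vector<double>>(strength: 0.01);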
SchedulerStepMode
Gets or sets when the learning rate scheduler should be stepped.
public SchedulerStepMode SchedulerStepMode { get; set; }
Property Value
- SchedulerStepMode
Remarks
- StepPerBatch: Update the learning rate after each mini-batch
- StepPerEpoch: Update the learning rate after each epoch (default)
- WarmupThenEpoch: Step per batch during warmup, then per epoch
For Beginners: Most schedulers work best with per-epoch stepping. Use per-batch stepping for warmup schedulers or cyclical learning rates.
ShuffleData
Gets or sets whether to shuffle data at the beginning of each epoch.
public bool ShuffleData { get; set; }
Property Value
- bool
Remarks
Shuffling the training data at each epoch helps prevent the model from learning the order of training examples rather than the underlying patterns. This is ignored if a custom DataSampler is provided.
For Beginners: Like shuffling a deck of cards before each deal, this ensures the model sees examples in different orders, which helps it learn better patterns.
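Example (a minimal sketch; options is assumed to be a GradientBasedOptimizerOptions instance configured elsewhere):
// Shuffle each epoch, but seed the generator so runs are reproducible
options.ShuffleData = true;
options.RandomSeed = 42;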