Class LBFGSOptimizerOptions<T, TInput, TOutput>

Namespace: AiDotNet.Models.Options

Assembly: AiDotNet.dll

Configuration options for the Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimizer, which is an efficient optimization algorithm for training machine learning models.

public class LBFGSOptimizerOptions<T, TInput, TOutput> : GradientBasedOptimizerOptions<T, TInput, TOutput>

Type Parameters

T
TInput
TOutput

Inheritance: object

ModelOptions

OptimizationAlgorithmOptions<T, TInput, TOutput>

GradientBasedOptimizerOptions<T, TInput, TOutput>

LBFGSOptimizerOptions<T, TInput, TOutput>

Inherited Members: GradientBasedOptimizerOptions<T, TInput, TOutput>.GradientCache

GradientBasedOptimizerOptions<T, TInput, TOutput>.LossFunction

GradientBasedOptimizerOptions<T, TInput, TOutput>.Regularization

GradientBasedOptimizerOptions<T, TInput, TOutput>.DataSampler

GradientBasedOptimizerOptions<T, TInput, TOutput>.ShuffleData

GradientBasedOptimizerOptions<T, TInput, TOutput>.DropLastBatch

GradientBasedOptimizerOptions<T, TInput, TOutput>.RandomSeed

GradientBasedOptimizerOptions<T, TInput, TOutput>.EnableGradientClipping

GradientBasedOptimizerOptions<T, TInput, TOutput>.GradientClippingMethod

GradientBasedOptimizerOptions<T, TInput, TOutput>.MaxGradientNorm

GradientBasedOptimizerOptions<T, TInput, TOutput>.MaxGradientValue

GradientBasedOptimizerOptions<T, TInput, TOutput>.LearningRateScheduler

GradientBasedOptimizerOptions<T, TInput, TOutput>.SchedulerStepMode

OptimizationAlgorithmOptions<T, TInput, TOutput>.MaxIterations

OptimizationAlgorithmOptions<T, TInput, TOutput>.UseEarlyStopping

OptimizationAlgorithmOptions<T, TInput, TOutput>.EarlyStoppingPatience

OptimizationAlgorithmOptions<T, TInput, TOutput>.BadFitPatience

OptimizationAlgorithmOptions<T, TInput, TOutput>.MinimumFeatures

OptimizationAlgorithmOptions<T, TInput, TOutput>.MaximumFeatures

OptimizationAlgorithmOptions<T, TInput, TOutput>.UseExpressionTrees

OptimizationAlgorithmOptions<T, TInput, TOutput>.InitialLearningRate

OptimizationAlgorithmOptions<T, TInput, TOutput>.UseAdaptiveLearningRate

OptimizationAlgorithmOptions<T, TInput, TOutput>.LearningRateDecay

OptimizationAlgorithmOptions<T, TInput, TOutput>.MinLearningRate

OptimizationAlgorithmOptions<T, TInput, TOutput>.MaxLearningRate

OptimizationAlgorithmOptions<T, TInput, TOutput>.UseAdaptiveMomentum

OptimizationAlgorithmOptions<T, TInput, TOutput>.InitialMomentum

OptimizationAlgorithmOptions<T, TInput, TOutput>.MomentumIncreaseFactor

OptimizationAlgorithmOptions<T, TInput, TOutput>.MomentumDecreaseFactor

OptimizationAlgorithmOptions<T, TInput, TOutput>.MinMomentum

OptimizationAlgorithmOptions<T, TInput, TOutput>.MaxMomentum

OptimizationAlgorithmOptions<T, TInput, TOutput>.ExplorationRate

OptimizationAlgorithmOptions<T, TInput, TOutput>.MinExplorationRate

OptimizationAlgorithmOptions<T, TInput, TOutput>.MaxExplorationRate

OptimizationAlgorithmOptions<T, TInput, TOutput>.Tolerance

OptimizationAlgorithmOptions<T, TInput, TOutput>.OptimizationMode

OptimizationAlgorithmOptions<T, TInput, TOutput>.ParameterAdjustmentScale

OptimizationAlgorithmOptions<T, TInput, TOutput>.SignFlipProbability

OptimizationAlgorithmOptions<T, TInput, TOutput>.FeatureSelectionProbability

OptimizationAlgorithmOptions<T, TInput, TOutput>.ParameterAdjustmentProbability

OptimizationAlgorithmOptions<T, TInput, TOutput>.PredictionOptions

OptimizationAlgorithmOptions<T, TInput, TOutput>.ModelStatsOptions

OptimizationAlgorithmOptions<T, TInput, TOutput>.ModelEvaluator

OptimizationAlgorithmOptions<T, TInput, TOutput>.FitDetector

OptimizationAlgorithmOptions<T, TInput, TOutput>.FitnessCalculator

OptimizationAlgorithmOptions<T, TInput, TOutput>.ModelCache

OptimizationAlgorithmOptions<T, TInput, TOutput>.CreateDefaults(OptimizerType)

ModelOptions.Seed

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Remarks

L-BFGS is a quasi-Newton optimization method that approximates the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm using limited memory. It is particularly effective for models with many parameters because it never stores the full Hessian matrix; instead it keeps only a short history of recent updates. This makes it more memory-efficient than full BFGS while still providing good convergence properties.

For Beginners: L-BFGS is an advanced optimization algorithm that helps train machine learning models more efficiently than simpler methods like gradient descent.

Think of training a machine learning model as finding the lowest point in a hilly landscape, where the lowest point represents the best model parameters. While basic algorithms like gradient descent simply follow the steepest downhill path, L-BFGS is smarter:

It remembers information about previous steps to make better decisions about where to go next
It can take larger steps when appropriate, potentially finding the lowest point faster
It requires less memory than some other advanced methods, making it practical for larger models

L-BFGS is particularly useful when:

You have many parameters to optimize (complex models)
You need faster convergence than gradient descent provides
You have limited memory resources compared to what full second-order methods would require

This class lets you configure how L-BFGS behaves during training, including how much history it remembers and how it adjusts its learning rate.
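As a starting point, the properties documented on this page can be set with an object initializer. This is a minimal sketch, not a confirmed usage pattern: it assumes the AiDotNet package is referenced and that the class exposes a public parameterless constructor (not shown in this reference). The helper name `CreateLbfgsOptions` is hypothetical, and every value shown is simply the documented default.

```csharp
using AiDotNet.Models.Options;

// Hypothetical helper, generic so that no concrete T, TInput, or TOutput is assumed.
static LBFGSOptimizerOptions<T, TInput, TOutput> CreateLbfgsOptions<T, TInput, TOutput>()
{
    // Assumes a public parameterless constructor (not shown on this page).
    return new LBFGSOptimizerOptions<T, TInput, TOutput>
    {
        MemorySize = 10,                    // number of past iterations remembered
        InitialLearningRate = 1.0,          // starting step size
        LearningRateIncreaseFactor = 1.05,  // growth factor when progress is good
        LearningRateDecreaseFactor = 0.95,  // shrink factor when progress stalls
        MinLearningRate = 1e-6,             // lower bound on the step size
        MaxLearningRate = 10.0,             // upper bound on the step size
        MaxIterations = 1000,               // hard stop
        BatchSize = -1                      // full batch
    };
}
```

Setting every property explicitly is not required; it is done here only to show the defaults side by side.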

Properties

BatchSize

Gets or sets the batch size for gradient computation.

public int BatchSize { get; set; }

Property Value

int: The batch size, defaulting to -1 (full batch).

Remarks

For Beginners: The batch size controls how many examples are used to calculate gradients. L-BFGS traditionally uses full-batch gradients (batch size -1) because the history of gradient and position differences it maintains is only meaningful when successive gradients are computed on the same data. Mini-batches introduce sampling noise into those differences, which disrupts the two-loop recursion.

InitialLearningRate

Gets or sets the initial learning rate for the L-BFGS algorithm, which controls the initial step size during optimization.

public override double InitialLearningRate { get; set; }

Property Value

double: The initial learning rate, defaulting to 1.0.

Remarks

The initial learning rate determines the step size at the beginning of the optimization process. Unlike standard gradient descent, L-BFGS can adjust this rate during optimization based on the curvature information it approximates. This property overrides the base class implementation to provide a more suitable default for L-BFGS.

For Beginners: The initial learning rate determines how big your first steps are when searching for the best model parameters.

Think of it like adjusting your initial step size when walking downhill:

Too small (e.g., 0.01): You'll move very cautiously but might take a long time to reach the bottom
Too large (e.g., 10.0): You might move quickly but risk overshooting the lowest point

L-BFGS is special because it can adjust this step size automatically as it goes, but the initial value still matters. The default of 1.0 is generally a good starting point for L-BFGS, which is higher than typical values for simpler algorithms like gradient descent (which might use 0.01-0.1).

Note: This property replaces the base class property so that it can supply a different default value that's more appropriate for L-BFGS.

LearningRateDecreaseFactor

Gets or sets the factor by which the learning rate is decreased when the algorithm encounters difficulties or needs to take more careful steps.

public double LearningRateDecreaseFactor { get; set; }

Property Value

double: The learning rate decrease factor, defaulting to 0.95.

Remarks

During optimization, if the algorithm encounters challenges such as increasing error or difficult terrain in the optimization landscape, it may decrease the learning rate to take more careful steps. This parameter controls how quickly the learning rate is reduced, with values less than 1.0 representing the multiplicative factor applied to the current learning rate.

For Beginners: This setting controls how quickly the algorithm reduces its step size when it encounters difficulties.

When L-BFGS isn't making good progress or finds itself in a tricky part of the optimization landscape, it might decide to take smaller, more careful steps. This parameter determines how much it decreases the step size each time:

With the default value of 0.95, the step size decreases by 5% each time
A value of 0.9 would decrease the step size by 10% each time
A value of 0.99 would decrease the step size by just 1% each time

Lower values make the algorithm more cautious when it encounters problems, quickly reducing step size to navigate difficult areas. The default value of 0.95 provides a moderate decrease that works well in most situations, allowing the algorithm to adapt without becoming too timid.

This parameter works together with LearningRateIncreaseFactor to help the algorithm adapt its step size throughout the optimization process.
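The compounding effect of repeated decreases can be written out. This assumes the factor is applied multiplicatively n times in a row, as the remarks above describe; the symbols η₀ (starting rate) and γ (decrease factor) are introduced here only for illustration.

```latex
\eta_n = \gamma^{\,n}\,\eta_0,
\qquad
\gamma = 0.95:\ \ \eta_{10} = 0.95^{10}\,\eta_0 \approx 0.599\,\eta_0,
\qquad
\gamma = 0.9:\ \ \eta_{10} = 0.9^{10}\,\eta_0 \approx 0.349\,\eta_0
```

So ten consecutive decreases at the default factor shrink the step size by roughly 40%, while a factor of 0.9 shrinks it by roughly 65% over the same ten steps.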

LearningRateIncreaseFactor

Gets or sets the factor by which the learning rate is increased when the algorithm determines that larger steps would be beneficial.

public double LearningRateIncreaseFactor { get; set; }

Property Value

double: The learning rate increase factor, defaulting to 1.05.

Remarks

During optimization, if the algorithm determines that progress is being made consistently, it may increase the learning rate to accelerate convergence. This parameter controls how aggressively the learning rate is increased, with values greater than 1.0 representing the multiplicative factor applied to the current learning rate.

For Beginners: This setting controls how quickly the algorithm increases its step size when it's making good progress.

When L-BFGS is moving in a promising direction and making consistent progress, it might decide to increase its step size to get to the solution faster. This parameter determines how much it increases the step size each time:

With the default value of 1.05, the step size increases by 5% each time
A value of 1.1 would increase the step size by 10% each time
A value of 1.01 would increase the step size by just 1% each time

Higher values make the algorithm more aggressive in speeding up when things are going well, but might also make it more likely to overshoot. The default value of 1.05 provides a moderate increase that works well in most situations.
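The interplay of the increase factor, the decrease factor, and the learning rate bounds can be sketched as follows. This is an illustration under stated assumptions, not AiDotNet's implementation: this page does not document the exact rule the optimizer uses to decide when to grow or shrink the rate, so the `improved` flag and the function name `AdaptLearningRate` are hypothetical. The default arguments mirror the documented defaults.

```csharp
using System;

// Illustrative only: "improved" stands in for whatever progress signal the optimizer uses.
static double AdaptLearningRate(
    double current, bool improved,
    double increaseFactor = 1.05, double decreaseFactor = 0.95,
    double minRate = 1e-6, double maxRate = 10.0)
{
    // Grow the rate after progress, shrink it otherwise.
    double proposed = improved ? current * increaseFactor : current * decreaseFactor;
    // Clamp to the configured bounds (MinLearningRate / MaxLearningRate).
    return Math.Clamp(proposed, minRate, maxRate);
}

double rate = 1.0;                                 // default InitialLearningRate
rate = AdaptLearningRate(rate, improved: true);    // grows by 5%
rate = AdaptLearningRate(rate, improved: false);   // then shrinks by 5%
Console.WriteLine(rate);                           // slightly below 1.0
```

Note that one increase followed by one decrease does not return to the starting value, because 1.05 × 0.95 = 0.9975.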

MaxIterations

Gets or sets the maximum number of iterations the L-BFGS algorithm will perform before stopping.

public int MaxIterations { get; set; }

Property Value

int: The maximum number of iterations, defaulting to 1000.

Remarks

This parameter sets an upper limit on the number of optimization steps the algorithm will take. Even if other stopping criteria (such as convergence thresholds) have not been met, the algorithm will terminate after this many iterations. This property overrides the base class implementation to provide a more suitable default for L-BFGS.

For Beginners: This setting determines the maximum number of steps the algorithm will take before giving up, even if it hasn't found the optimal solution yet.

Think of this as a safety limit to prevent the algorithm from running forever. The L-BFGS algorithm will stop when either:

It finds a solution that's good enough (based on other stopping criteria), or
It reaches this maximum number of iterations

The default value of 1000 is typically sufficient for many problems. Consider:

Increasing it (e.g., to 2000 or 5000) for complex problems where the algorithm might need more steps to converge to a good solution
Decreasing it (e.g., to 500 or less) if you need faster results and can accept a less optimal solution, or if you're just testing the algorithm

L-BFGS typically converges faster than simpler methods like gradient descent, so it often needs fewer iterations to reach a good solution. However, the exact number needed depends greatly on your specific problem.

Note: This property uses the "new" keyword to hide the base class property and supply a different default value that's more appropriate for L-BFGS.
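The two stopping conditions described above (a convergence criterion or the iteration cap, whichever comes first) can be sketched with a deliberately simple loop. For brevity this uses plain gradient descent on a one-dimensional function rather than L-BFGS. The function name `Minimize` and its parameters are hypothetical, and the gradient-magnitude test is only one possible convergence criterion.

```csharp
using System;

// Stops when |gradient| < tolerance or after maxIterations steps, whichever comes first.
static (double X, int Iterations) Minimize(
    Func<double, double> gradient, double start, double step,
    int maxIterations, double tolerance)
{
    double x = start;
    for (int i = 0; i < maxIterations; i++)
    {
        double g = gradient(x);
        if (Math.Abs(g) < tolerance) return (x, i);   // converged early
        x -= step * g;                                // plain gradient step
    }
    return (x, maxIterations);                        // iteration cap reached
}

// f(v) = v^2 has gradient 2v; with step 0.25 each update halves the current point.
var capped = Minimize(v => 2.0 * v, start: 4.0, step: 0.25, maxIterations: 5, tolerance: 1e-6);
var converged = Minimize(v => 2.0 * v, start: 4.0, step: 0.25, maxIterations: 1000, tolerance: 1e-6);
Console.WriteLine($"{capped.Iterations} vs {converged.Iterations}");
```

With a cap of 5 the loop returns a rough answer (0.125) after exactly 5 steps; with the default cap of 1000 it stops well before the limit because the tolerance is met first.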

MaxLearningRate

Gets or sets the maximum learning rate allowed during optimization, preventing the learning rate from becoming too large.

public double MaxLearningRate { get; set; }

Property Value

double: The maximum learning rate, defaulting to 10.0.

Remarks

This parameter sets an upper bound on the learning rate during optimization. If the adaptive learning rate mechanism attempts to increase the learning rate above this value, it will be clamped to this maximum. This helps prevent the algorithm from taking steps that are too large, which could cause instability or divergence. This property overrides the base class implementation to provide a more suitable default for L-BFGS.

For Beginners: This setting prevents the algorithm from taking steps that are too large, which could cause it to miss the optimal solution.

As L-BFGS adjusts its step size during training, it might sometimes decide to take very large steps. This parameter sets a maximum size - if the algorithm wants to take an even larger step, it will use this maximum value instead.

The default value of 10.0 allows the algorithm to take fairly large steps when appropriate, but not so large that it risks completely overshooting good solutions.

You typically won't need to change this unless:

You notice the algorithm is making wild jumps and not converging well (might want to decrease it)
You're working with a function that has very flat regions that require large steps (might want to increase it)

Note: This property uses the "new" keyword to hide the base class property and supply a different default value that's more appropriate for L-BFGS.

MemorySize

Gets or sets the memory size, which determines how many previous iterations' information the L-BFGS algorithm stores to approximate the inverse Hessian matrix.

public int MemorySize { get; set; }

Property Value

int: The memory size, defaulting to 10.

Remarks

The memory size parameter controls how many previous iterations' position and gradient differences are stored to approximate the inverse Hessian matrix. A larger memory size can lead to better approximations but requires more memory and computational resources per iteration.

For Beginners: This setting controls how much "history" the L-BFGS algorithm remembers when deciding where to go next.

Imagine you're hiking down a mountain to find the lowest point:

With a small memory size (like 3-5), you only remember your most recent few steps to decide where to go next
With a larger memory size (like 10-20), you remember more of your journey, which might help you make better decisions, but requires more mental effort

The default value of 10 works well for many problems. Consider:

Increasing it (15-20) if you have plenty of memory and the algorithm seems to be converging slowly
Decreasing it (5-8) if you're working with very large models and memory is a concern

Generally, values between 3 and 20 are common, with diminishing returns as you increase beyond that.
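To make the role of the stored history concrete, here is a sketch of the textbook L-BFGS two-loop recursion, which turns the current gradient and the last few difference pairs into a search direction. It is not taken from AiDotNet's source: the names `TwoLoopDirection`, `Remember`, and `Dot` are hypothetical, plain `double[]` arrays stand in for the library's vector types, and the initial scaling uses the common choice based on the newest pair.

```csharp
using System;
using System.Collections.Generic;

// s[i] = position difference, y[i] = gradient difference, ordered oldest to newest.
static double[] TwoLoopDirection(double[] gradient, List<double[]> s, List<double[]> y)
{
    int m = s.Count;
    var q = (double[])gradient.Clone();
    var alpha = new double[m];
    var rho = new double[m];

    // First loop: newest pair to oldest.
    for (int i = m - 1; i >= 0; i--)
    {
        rho[i] = 1.0 / Dot(y[i], s[i]);
        alpha[i] = rho[i] * Dot(s[i], q);
        for (int j = 0; j < q.Length; j++) q[j] -= alpha[i] * y[i][j];
    }

    // Initial inverse-Hessian scaling from the newest pair (identity if no history).
    double gamma = m > 0 ? Dot(s[m - 1], y[m - 1]) / Dot(y[m - 1], y[m - 1]) : 1.0;
    for (int j = 0; j < q.Length; j++) q[j] *= gamma;

    // Second loop: oldest pair to newest.
    for (int i = 0; i < m; i++)
    {
        double beta = rho[i] * Dot(y[i], q);
        for (int j = 0; j < q.Length; j++) q[j] += (alpha[i] - beta) * s[i][j];
    }

    // q approximates (inverse Hessian) * gradient; the descent direction is its negation.
    for (int j = 0; j < q.Length; j++) q[j] = -q[j];
    return q;
}

static double Dot(double[] a, double[] b)
{
    double sum = 0.0;
    for (int i = 0; i < a.Length; i++) sum += a[i] * b[i];
    return sum;
}

// Keeps only the most recent memorySize pairs, which is the role MemorySize plays.
static void Remember(List<double[]> s, List<double[]> y, double[] sNew, double[] yNew, int memorySize)
{
    s.Add(sNew);
    y.Add(yNew);
    if (s.Count > memorySize) { s.RemoveAt(0); y.RemoveAt(0); }
}

// f(x) = x^2 has gradient 2x: a position change of 1 changes the gradient by 2.
var sHistory = new List<double[]>();
var yHistory = new List<double[]>();
Remember(sHistory, yHistory, new[] { 1.0 }, new[] { 2.0 }, memorySize: 10);
double[] direction = TwoLoopDirection(new[] { 6.0 }, sHistory, yHistory);
Console.WriteLine(direction[0]);   // -3, the exact Newton step from x = 3
```

Each stored pair costs two vectors the size of the parameter vector and adds work to both loops, which is why larger memory sizes increase both memory use and per-iteration time.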

MinLearningRate

Gets or sets the minimum learning rate allowed during optimization, preventing the learning rate from becoming too small.

public double MinLearningRate { get; set; }

Property Value

double: The minimum learning rate, defaulting to 1e-6 (0.000001).

Remarks

This parameter sets a lower bound on the learning rate during optimization. If the adaptive learning rate mechanism attempts to reduce the learning rate below this value, it will be clamped to this minimum. This helps prevent the algorithm from taking steps that are too small to make meaningful progress. This property overrides the base class implementation to provide a more suitable default for L-BFGS.

For Beginners: This setting prevents the algorithm from taking steps that are too tiny to make meaningful progress.

As L-BFGS adjusts its step size during training, it might sometimes decide to take very small steps. This parameter sets a minimum size - if the algorithm wants to take an even smaller step, it will use this minimum value instead.

The default value of 0.000001 (written as 1e-6 in scientific notation) is very small, allowing the algorithm to take tiny steps when appropriate, but not so small that they become ineffective.

You typically won't need to change this unless:

You notice the algorithm is making extremely slow progress in the later stages of training (might want to increase it)
You're working with a function that requires extremely precise optimization (might want to decrease it)

Note: This property uses the "new" keyword to hide the base class property and supply a different default value that's more appropriate for L-BFGS.
