Class ProximalGradientDescentOptimizerOptions<T, TInput, TOutput>
Configuration options for the Proximal Gradient Descent optimizer, an advanced optimization algorithm that combines traditional gradient descent with proximal operators to handle regularization effectively.
public class ProximalGradientDescentOptimizerOptions<T, TInput, TOutput> : GradientBasedOptimizerOptions<T, TInput, TOutput>
Type Parameters
- T
- TInput
- TOutput
Inheritance
OptimizationAlgorithmOptions<T, TInput, TOutput> → GradientBasedOptimizerOptions<T, TInput, TOutput> → ProximalGradientDescentOptimizerOptions<T, TInput, TOutput>
Remarks
Proximal Gradient Descent is an extension of standard gradient descent that is particularly effective for solving optimization problems with regularization terms. It alternates between standard gradient steps on the smooth part of the objective function and proximal operations on the non-smooth regularization terms. This approach is especially valuable for problems involving L1 regularization (which promotes sparsity) or other complex regularization schemes that are difficult to optimize with standard gradient methods. The proximal approach helps maintain desirable properties of the regularization while ensuring stable convergence. It is widely used in machine learning for training models where specific structural properties (like sparsity, group structure, or low rank) are desired in the solution.
For Beginners: Proximal Gradient Descent is a specialized optimization method that helps train machine learning models with regularization.
Imagine you're trying to find the lowest point in a hilly landscape while also staying within certain boundaries:
- Regular gradient descent is like always walking directly downhill
- But walking straight downhill can lead you to solutions that are overly complex and "overfit" to your training data
- Regularization adds "penalty zones" to discourage overly complex solutions
- Proximal gradient descent helps navigate these penalty zones effectively
What this optimizer does:
- Takes a step in the direction that reduces prediction error (like regular gradient descent)
- Then takes a "proximal step" that handles the regularization penalties separately
- By splitting the process this way, it can find solutions that balance accuracy and simplicity
Think of it like training a dog:
- The gradient step teaches the dog to complete a task correctly
- The proximal step ensures the dog doesn't develop bad habits along the way
- Together, they produce well-behaved, effective results
This approach is particularly useful when you want your model to:
- Use only a subset of available features (sparsity)
- Group related features together
- Avoid extreme parameter values
This class lets you configure how this specialized optimization process works.
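As a quick orientation, here is a minimal configuration sketch using the defaults documented below. The type arguments are placeholders, and how the options object is handed to an optimizer is an assumption about the surrounding library:

// Hypothetical configuration sketch; type arguments are placeholders.
var options = new ProximalGradientDescentOptimizerOptions<double, double[,], double[]>
{
    MaxIterations = 1000,              // outer optimization steps (default)
    InnerIterations = 10,              // proximal refinement steps per outer step (default)
    BatchSize = 32,                    // examples per mini-batch update (default)
    ProximalStepSize = 0.1,            // step size of the regularization update (default)
    RegularizationStrength = 0.01,     // weight of the regularization penalty (default)
    LearningRateIncreaseFactor = 1.05, // speed up by 5% after good steps (default)
    LearningRateDecreaseFactor = 0.95  // slow down by 5% after bad steps (default)
};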
Properties
BatchSize
Gets or sets the batch size for mini-batch gradient descent.
public int BatchSize { get; set; }
Property Value
- int
A positive integer, defaulting to 32.
Remarks
For Beginners: The batch size controls how many examples the optimizer looks at before making an update to the model. The default of 32 is a good balance for proximal gradient descent: large enough for reasonably stable gradient estimates, yet small enough to keep updates frequent and memory use modest.
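As a rough illustration of what this means for one pass over the data (the sample count here is made up for the example):

// Illustrative arithmetic only: one epoch over 1,000 training examples.
int batchSize = 32;   // the BatchSize property
int sampleCount = 1000;
int updatesPerEpoch = (sampleCount + batchSize - 1) / batchSize; // 32 updates
// Each update computes the gradient from up to 32 examples (the last batch
// is smaller), so the parameters change 32 times per pass over the data.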
InnerIterations
Gets or sets the number of inner iterations for each main optimization iteration.
public int InnerIterations { get; set; }
Property Value
- int
The number of inner iterations, defaulting to 10.
Remarks
This parameter specifies the number of proximal gradient steps to perform within each outer iteration of the optimization algorithm. In proximal methods, it's common to have an inner loop that refines the proximal update before proceeding to the next main iteration. More inner iterations can lead to more accurate proximal updates at the cost of increased computation time. The appropriate value depends on the complexity of the regularization term and the desired accuracy of the proximal mapping. For simple regularization schemes like L1 or L2, fewer inner iterations may be sufficient, while more complex regularization might benefit from additional inner refinement.
For Beginners: This setting controls how many mini-steps the algorithm takes for each main optimization step.
The default value of 10 means:
- For each main iteration, the algorithm performs 10 smaller refinement steps
- These refinement steps help ensure the regularization is properly applied
Think of it like polishing a surface:
- The main algorithm makes large changes to get the general shape right
- Then these inner iterations carefully refine and polish the result
- More inner iterations mean more careful polishing
You might want more inner iterations (like 20 or 50):
- When using complex regularization that requires careful handling
- When high precision is important in your final model
- When you notice the optimization isn't converging well with fewer iterations
- When you have the computational resources to spare
You might want fewer inner iterations (like 5 or 3):
- When using simple regularization schemes like basic L1 or L2
- When computational efficiency is a priority
- When you're in early experimental phases and need quick results
- When you find that additional inner iterations don't improve results
More inner iterations typically mean better quality results but longer training times.
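To make the outer/inner structure concrete, here is a runnable toy in the same spirit: a one-dimensional problem where the inner loop repeatedly applies an L1 soft-threshold as its proximal refinement. This is a simplified sketch, not the library's implementation:

using System;

// Toy: minimize 0.5*(w - 3)^2 + lambda*|w| with proximal gradient descent.
class ProxGdSketch
{
    static void Main()
    {
        double w = 0.0, learningRate = 0.1, lambda = 0.01, proximalStepSize = 0.1;
        int maxIterations = 1000, innerIterations = 10;

        for (int outer = 0; outer < maxIterations; outer++)
        {
            w -= learningRate * (w - 3.0); // gradient step on the smooth part

            // Inner loop: refine the regularized update by repeated soft-thresholding.
            for (int inner = 0; inner < innerIterations; inner++)
                w = Math.Sign(w) * Math.Max(Math.Abs(w) - proximalStepSize * lambda, 0);
        }
        Console.WriteLine(w); // about 2.9: the unregularized optimum 3 minus the L1 shrinkage
    }
}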
LearningRateDecreaseFactor
Gets or sets the multiplicative factor for decreasing the learning rate when progress stalls or reverses.
public double LearningRateDecreaseFactor { get; set; }
Property Value
- double
The learning rate decrease factor, defaulting to 0.95.
Remarks
This parameter controls how quickly the learning rate decreases when the optimization encounters difficulties or appears to be overshooting. In adaptive optimization schemes, the learning rate is typically decreased when an iteration fails to improve the objective function or when other indicators suggest the current step size is too large. A factor of 0.95 means the learning rate decreases by 5% when conditions warrant a decrease. Lower values (further from 1.0) lead to more aggressive deceleration when problems are encountered, which can help stabilize the optimization but may slow convergence. The optimal value depends on the optimization landscape and the balance needed between convergence speed and stability.
For Beginners: This setting controls how quickly the algorithm slows down when it encounters problems.
The default value of 0.95 means:
- When the algorithm takes a step that doesn't improve the solution
- It will reduce its step size by 5% to be more careful
- This helps prevent overshooting or bouncing around without progress
Think of it like navigating a tricky path:
- When you encounter obstacles or start to lose your way, you slow down
- This value determines how much you slow down when things get difficult
- A value of 0.95 represents a moderate slowdown (5% decrease)
You might want a lower value (like 0.8 or 0.9):
- When you want more drastic slowdowns after bad steps
- When the optimization landscape has sharp valleys or discontinuities
- When stability is much more important than speed
- When you've observed the algorithm overshooting repeatedly
You might want a higher value (like 0.98 or 0.99):
- When you want only minor slowdowns after bad steps
- When you're concerned about convergence becoming too slow
- When occasional bad steps are expected and shouldn't trigger major adjustments
- When the optimization landscape is relatively smooth
This parameter works in tandem with LearningRateIncreaseFactor to adaptively adjust the optimization speed based on progress.
LearningRateIncreaseFactor
Gets or sets the multiplicative factor for increasing the learning rate during adaptive optimization.
public double LearningRateIncreaseFactor { get; set; }
Property Value
- double
The learning rate increase factor, defaulting to 1.05.
Remarks
This parameter controls how aggressively the learning rate increases when the optimization is making good progress. In adaptive optimization schemes, the learning rate may be increased when successive iterations show consistent improvement. A factor of 1.05 means the learning rate increases by 5% when conditions warrant an increase. Higher values lead to more aggressive acceleration when the optimization is progressing well, potentially speeding up convergence. However, too large a value can cause instability by increasing the learning rate too quickly. The optimal value depends on the optimization landscape and the balance needed between convergence speed and stability.
For Beginners: This setting controls how much the algorithm speeds up when it's making good progress.
The default value of 1.05 means:
- When the algorithm is successfully reducing the error
- It will increase its step size by 5% to move faster
- This allows it to accelerate when it's on a promising path
Think of it like adjusting your walking speed:
- When you're confident you're heading in the right direction, you walk a bit faster
- This value determines how much faster you go when things are working well
- A value of 1.05 represents a cautious acceleration (5% increase)
You might want a higher value (like 1.1 or 1.2):
- When you want more aggressive acceleration
- When the optimization landscape is smooth and well-behaved
- When faster convergence is a priority and some instability is acceptable
- When you've noticed that optimization progress is consistently positive
You might want a lower value (like 1.02 or 1.01):
- When you've observed instability in the optimization process
- When working with complex, ill-conditioned problems
- When very stable, predictable convergence is more important than speed
- When small parameter changes can dramatically affect model performance
This parameter works in tandem with LearningRateDecreaseFactor to adaptively adjust the optimization speed based on progress.
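The two factors together implement a simple multiplicative rule. A minimal sketch of that rule, where lossImproved stands in for whatever progress signal the optimizer uses (this is schematic, not the library's actual logic):

// Schematic adaptive step-size rule.
double learningRate = 0.01;
double increaseFactor = 1.05; // LearningRateIncreaseFactor
double decreaseFactor = 0.95; // LearningRateDecreaseFactor
bool lossImproved = true;     // e.g., current loss < previous loss
learningRate *= lossImproved
    ? increaseFactor   // good step: speed up by 5%
    : decreaseFactor;  // bad step: slow down by 5%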
MaxIterations
Gets or sets the maximum number of iterations for the optimization process.
public int MaxIterations { get; set; }
Property Value
- int
The maximum number of iterations, defaulting to 1000.
Remarks
This parameter determines the maximum number of outer iterations the optimization algorithm will perform. It serves as a hard limit to prevent excessive computation time in cases where convergence is slow or not achieved. Each iteration involves a gradient step on the smooth part of the objective function followed by a proximal operation on the regularization term, potentially with multiple inner iterations. The appropriate value depends on the complexity of the optimization problem, the desired precision, and the available computational resources. Note that this property hides (shadows) the MaxIterations property inherited from the base GradientBasedOptimizerOptions class, potentially allowing for different default values or validation logic specific to proximal gradient descent.
For Beginners: This setting controls the maximum number of major steps the algorithm will take before stopping.
The default value of 1000 means:
- The algorithm will take at most 1000 main optimization steps
- It will stop earlier if it reaches convergence (finds a good solution)
- But it won't continue beyond 1000 steps even if not fully converged
Think of it like setting a maximum travel time for a journey:
- Ideally, you reach your destination before the time limit
- But if the journey is taking too long, you stop when you hit the limit
- This prevents the algorithm from running indefinitely on difficult problems
You might want more iterations (like 5000 or 10000):
- For complex problems that need more time to converge
- When you prioritize finding the best possible solution over speed
- When early experiments show that 1000 iterations is insufficient
- When you have the computational resources to spare
You might want fewer iterations (like 500 or 100):
- When quick approximate solutions are preferred over perfect ones
- For simpler problems that converge quickly
- When running many experimental models where time is limited
- When you find that the model converges well before reaching the limit
Note: This setting hides (shadows) the MaxIterations property from the parent class so it can provide a default specifically calibrated for proximal gradient descent.
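A sketch of the stopping behavior this cap produces; the convergence test shown is a generic example, and TakeStep is a hypothetical helper, not the library's actual criterion or API:

// Schematic stopping logic: stop at convergence or at the iteration cap.
int maxIterations = 1000; // MaxIterations
double tolerance = 1e-6;
for (int i = 0; i < maxIterations; i++)
{
    double parameterChange = TakeStep(); // hypothetical: one outer step, returns |Δw|
    if (parameterChange < tolerance)
        break; // converged before hitting the cap
}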
ProximalStepSize
Gets or sets the step size for the proximal operator component of the algorithm.
public double ProximalStepSize { get; set; }
Property Value
- double
The proximal step size, defaulting to 0.1.
Remarks
This parameter controls the size of the step taken during the proximal update phase of the algorithm. The proximal step specifically handles the regularization term, separate from the gradient step that addresses the smooth part of the objective function. A larger proximal step size makes the algorithm more aggressive in enforcing regularization constraints, while a smaller value makes it more conservative. The optimal value depends on the regularization type and strength, as well as the overall optimization landscape. In many cases, this parameter interacts with the RegularizationStrength and may need to be tuned accordingly. Too large a value can cause instability, while too small a value can slow convergence.
For Beginners: This setting controls how aggressively the algorithm enforces regularization in each iteration.
The default value of 0.1 means:
- The algorithm takes moderate-sized steps when applying regularization
- This provides a balance between rapid convergence and stability
Think of the proximal step as the "correction" after the main step:
- First, the algorithm takes a step to reduce prediction errors
- Then, it takes a proximal step to enforce simplicity (regularization)
- This setting controls the size of that second, corrective step
You might want a larger value (like 0.3 or 0.5):
- When you want regularization effects to be applied more quickly
- When the regularization term is well-behaved and unlikely to cause instability
- When faster convergence is a priority
You might want a smaller value (like 0.05 or 0.01):
- When you notice instability in the optimization process
- When using strong regularization that could cause large parameter changes
- When you prefer more gradual, stable convergence
- When working with particularly complex or ill-conditioned problems
This parameter often needs to be adjusted in coordination with RegularizationStrength to achieve optimal results.
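For L1 regularization, the proximal step is soft-thresholding, and ProximalStepSize multiplies RegularizationStrength to set the shrinkage threshold. A self-contained sketch under that assumption (the library may implement its proximal operators differently):

using System;

class SoftThresholdSketch
{
    // Proximal operator of t * lambda * |w|: shrink each weight toward zero.
    static double SoftThreshold(double w, double threshold) =>
        Math.Sign(w) * Math.Max(Math.Abs(w) - threshold, 0.0);

    static void Main()
    {
        double proximalStepSize = 0.1;        // ProximalStepSize
        double regularizationStrength = 0.01; // RegularizationStrength
        double threshold = proximalStepSize * regularizationStrength; // 0.001

        double[] weights = { 0.5, -0.0005, 0.002, -0.3 };
        for (int i = 0; i < weights.Length; i++)
            weights[i] = SoftThreshold(weights[i], threshold);

        // Small weights (|w| <= 0.001) are snapped to exactly zero: sparsity.
        Console.WriteLine(string.Join(", ", weights)); // 0.499, 0, 0.001, -0.299
    }
}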
RegularizationStrength
Gets or sets the strength of the regularization term in the objective function.
public double RegularizationStrength { get; set; }
Property Value
- double
The regularization strength, defaulting to 0.01.
Remarks
This parameter controls the weight of the regularization term relative to the main loss function. Higher values increase the influence of regularization, promoting simpler models (e.g., sparser weights with L1 regularization or smaller weights with L2 regularization). Lower values reduce the regularization effect, allowing the model to focus more on minimizing the loss function. The optimal value depends on the specific problem, data characteristics, and the type of regularization being used. This parameter is often one of the most important hyperparameters to tune, as it directly controls the trade-off between fitting the training data and maintaining model simplicity.
For Beginners: This setting controls how strongly the regularization penalties affect your model.
The default value of 0.01 means:
- The regularization has a moderate influence on the model
- There's a balance between minimizing errors and keeping the model simple
Think of regularization like a budget constraint:
- Your model wants to "spend" parameter values to fit the data perfectly
- Regularization sets a "budget" that limits this spending
- Higher RegularizationStrength means a tighter budget (simpler model)
- Lower RegularizationStrength means a looser budget (potentially more complex model)
You might want a higher value (like 0.1 or 1.0):
- When you suspect your model is overfitting
- When you have limited training data
- When you want to encourage sparse solutions (with L1 regularization)
- When you want smaller parameter values overall (with L2 regularization)
You might want a lower value (like 0.001 or 0.0001):
- When you have abundant training data
- When underfitting is more of a concern than overfitting
- When you want the model to focus more on minimizing training error
- When you have complex patterns that require more expressive models
Finding the right regularization strength often requires experimentation with different values.
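Continuing the L1 sketch from the ProximalStepSize section above, this illustrative fragment shows how different strengths scale the shrinkage threshold (assuming the same soft-thresholding proximal step):

// Illustrative only: the strength scales how aggressively weights are zeroed.
double proximalStepSize = 0.1;
foreach (double strength in new[] { 0.0001, 0.01, 1.0 })
{
    double threshold = proximalStepSize * strength;
    Console.WriteLine($"strength={strength}: weights with |w| <= {threshold} are zeroed per step");
}
// strength=0.0001 barely shrinks anything; strength=1.0 zeroes everything below 0.1.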