Class ProximalGradientDescentOptimizer<T, TInput, TOutput>
- Namespace
- AiDotNet.Optimizers
- Assembly
- AiDotNet.dll
Implements a Proximal Gradient Descent optimization algorithm which combines gradient descent with regularization.
public class ProximalGradientDescentOptimizer<T, TInput, TOutput> : GradientBasedOptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer
Type Parameters
- T: The numeric type used for calculations, typically float or double.
- TInput: The type of the input data.
- TOutput: The type of the output data.
- Inheritance
- OptimizerBase<T, TInput, TOutput>
- GradientBasedOptimizerBase<T, TInput, TOutput>
- ProximalGradientDescentOptimizer<T, TInput, TOutput>
- Implements
- IGradientBasedOptimizer<T, TInput, TOutput>
- IOptimizer<T, TInput, TOutput>
- IModelSerializer
Remarks
Proximal Gradient Descent (PGD) is an extension of standard gradient descent that handles regularization more efficiently. The algorithm alternates between performing a gradient descent step to minimize the loss function and applying a proximal operator to enforce regularization. This approach is particularly effective for problems where regularization is important to prevent overfitting or to enforce specific properties in the solution.
For Beginners: Proximal Gradient Descent is like walking downhill while staying within certain boundaries.
Imagine you're hiking down a mountain to find the lowest point:
- Standard gradient descent is like always walking directly downhill
- Proximal gradient descent adds boundaries or "guardrails" to your path
- These guardrails keep you from wandering into areas that might look good but are actually not helpful
- For example, the guardrails might prevent solutions that are too complex and would overfit the data
This approach helps find solutions that not only fit the data well but also have desirable properties like simplicity or stability.
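The core update can be sketched in a few lines. The following is a conceptual illustration of one PGD step with L1 (lasso) regularization, where the proximal operator is soft-thresholding; it is only a sketch of the technique, not the internal implementation of this class:

// Conceptual sketch of one proximal gradient descent step with L1 regularization.
// Uses System.Math; this is not the actual implementation of ProximalGradientDescentOptimizer.
static double[] PgdStep(double[] weights, double[] gradient, double learningRate, double l1Strength)
{
    var next = new double[weights.Length];
    for (int i = 0; i < weights.Length; i++)
    {
        // 1. Ordinary gradient descent step.
        double stepped = weights[i] - learningRate * gradient[i];

        // 2. Proximal operator for L1: soft-thresholding shrinks values toward zero
        //    and sets small ones to exactly zero, which keeps the solution simple.
        double threshold = learningRate * l1Strength;
        next[i] = Math.Sign(stepped) * Math.Max(Math.Abs(stepped) - threshold, 0.0);
    }
    return next;
}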
Constructors
ProximalGradientDescentOptimizer(IFullModel<T, TInput, TOutput>, ProximalGradientDescentOptimizerOptions<T, TInput, TOutput>?, IEngine?)
Initializes a new instance of the ProximalGradientDescentOptimizer<T, TInput, TOutput> class with the specified options and components.
public ProximalGradientDescentOptimizer(IFullModel<T, TInput, TOutput> model, ProximalGradientDescentOptimizerOptions<T, TInput, TOutput>? options = null, IEngine? engine = null)
Parameters
- model (IFullModel<T, TInput, TOutput>): The model to optimize.
- options (ProximalGradientDescentOptimizerOptions<T, TInput, TOutput>): The proximal gradient descent optimization options, or null to use default options.
- engine (IEngine): The computation engine (CPU or GPU) for vectorized operations.
Remarks
This constructor creates a new proximal gradient descent optimizer with the specified options and components. If any parameter is null, a default implementation is used. The constructor initializes the options, regularization strategy, and adaptive parameters.
For Beginners: This is the starting point for creating a new optimizer.
Think of it like setting up equipment for a mountain hike:
- You can provide custom settings (options) or use the default ones
- You can provide a computation engine (for CPU or GPU work) or use the default one
- You can specify how to enforce boundaries (regularization) or use no boundaries
- It gets everything ready so you can start the optimization process
The options control things like how fast to move, when to stop, and how to adapt during the journey.
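For example, a minimal construction might look like the sketch below. The model variable, the concrete type arguments (double, Matrix<double>, Vector<double>), and the commented option values are illustrative assumptions; consult ProximalGradientDescentOptimizerOptions for the settings it actually exposes:

// Hypothetical sketch: 'myModel' is any IFullModel<double, Matrix<double>, Vector<double>> implementation.
var options = new ProximalGradientDescentOptimizerOptions<double, Matrix<double>, Vector<double>>();
// Configure option properties here, e.g. learning rate, tolerance, and iteration limits.

var optimizer = new ProximalGradientDescentOptimizer<double, Matrix<double>, Vector<double>>(
    myModel,        // the model to optimize
    options,        // or null to use default options
    engine: null);  // null uses the default computation engine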
Methods
Deserialize(byte[])
Reconstructs the proximal gradient descent optimizer from a serialized byte array.
public override void Deserialize(byte[] data)
Parameters
- data (byte[]): The byte array containing the serialized optimizer.
Remarks
This method overrides the base implementation to handle PGD-specific information during deserialization. It first deserializes the base class data, then reconstructs the PGD options and iteration count.
For Beginners: This method restores the optimizer from a previously saved state.
It's like restoring from a snapshot:
- First, it loads all the general optimizer information
- Then, it loads the PGD-specific settings and state
- It reconstructs the optimizer to the exact state it was in when saved
This allows you to:
- Continue working with an optimizer you previously saved
- Use an optimizer that someone else created and shared
- Revert to a backup if needed
Exceptions
- InvalidOperationException
Thrown when the options cannot be deserialized.
GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)
Generates a unique key for caching gradients based on the model, input data, and optimizer state.
protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)
Parameters
- model (IFullModel<T, TInput, TOutput>): The model for which the gradient is calculated.
- X (TInput): The input features matrix.
- y (TOutput): The target values vector.
Returns
- string
A string key that uniquely identifies this gradient calculation.
Remarks
This method overrides the base implementation to include PGD-specific information in the cache key. It extends the base key with information about the learning rate, regularization type, tolerance, and current iteration. This ensures that gradients are properly cached and retrieved even as the optimizer's state changes.
For Beginners: This method creates a unique identification tag for each gradient calculation.
Think of it like a file naming system:
- It includes information about the model and data being used
- It adds details specific to the PGD optimizer's current state
- This unique tag helps the optimizer avoid redundant calculations
- If the same gradient is needed again, it can be retrieved from cache instead of recalculated
This caching mechanism improves efficiency by avoiding duplicate work.
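The exact key format is an implementation detail, but conceptually the override extends the base key with the optimizer's current state, roughly like the sketch below (the field names are assumptions, not the actual private members):

// Hypothetical sketch; _learningRate, _options, and _iteration are assumed names.
protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)
{
    string baseKey = base.GenerateGradientCacheKey(model, X, y);
    // Append PGD-specific state so a cached gradient is never reused after that state changes.
    return $"{baseKey}_PGD_lr:{_learningRate}_reg:{_options.RegularizationType}_tol:{_options.Tolerance}_it:{_iteration}";
}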
GetOptions()
Gets the current options for this optimizer.
public override OptimizationAlgorithmOptions<T, TInput, TOutput> GetOptions()
Returns
- OptimizationAlgorithmOptions<T, TInput, TOutput>
The current proximal gradient descent optimization options.
Remarks
This method overrides the base implementation to return the PGD-specific options.
For Beginners: This method returns the current settings of the optimizer.
It's like checking what game settings are currently active:
- You can see the current learning rate settings
- You can see the current tolerance and iteration limits
- You can see all the other parameters that control the optimizer
This is useful for understanding how the optimizer is currently configured or for making a copy of the settings to modify and apply later.
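For example, to inspect the active settings (the cast is only needed when you want the PGD-specific members, and assumes the returned instance is the PGD options type, as the remarks above indicate):

// Read the active settings; the static return type is the base options class.
OptimizationAlgorithmOptions<double, Matrix<double>, Vector<double>> current = optimizer.GetOptions();

// Cast when the PGD-specific members are needed.
var pgdOptions = (ProximalGradientDescentOptimizerOptions<double, Matrix<double>, Vector<double>>)current;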
InitializeAdaptiveParameters()
Initializes the adaptive parameters used by the Proximal Gradient Descent algorithm.
protected override void InitializeAdaptiveParameters()
Remarks
This method overrides the base implementation to initialize PGD-specific adaptive parameters. It sets the initial learning rate from the options and resets the iteration counter to zero. The learning rate controls how large each step is during optimization.
For Beginners: This method prepares the optimizer for a fresh start.
It's like a hiker preparing for a new journey:
- Setting their initial step size (learning rate) to a comfortable starting value
- Resetting their step counter to zero
- Getting ready to begin searching for the lowest point
These initial settings help the algorithm start with balanced movements that can be adjusted as it learns more about the landscape.
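Conceptually the override amounts to something like the sketch below; the member names are assumptions rather than the actual private fields or option names:

// Hypothetical sketch of the reset performed by this override.
protected override void InitializeAdaptiveParameters()
{
    base.InitializeAdaptiveParameters();
    _learningRate = _options.InitialLearningRate; // assumed option name: start with the configured step size
    _iteration = 0;                               // fresh run: no steps taken yet
}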
Optimize(OptimizationInputData<T, TInput, TOutput>)
Performs the proximal gradient descent optimization to find the best solution for the given input data.
public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)
Parameters
- inputData (OptimizationInputData<T, TInput, TOutput>): The input data to optimize against.
Returns
- OptimizationResult<T, TInput, TOutput>
An optimization result containing the best solution found and associated metrics.
Remarks
This method implements the main PGD algorithm. It starts from a random solution and iteratively improves it by calculating the gradient, taking a step in the negative gradient direction, and then applying regularization. The process continues until either the maximum number of iterations is reached, early stopping criteria are met, or the improvement falls below the specified tolerance.
For Beginners: This is the main search process where the algorithm looks for the best solution.
The process works like this:
- Start at a random position on the "hill"
- For each iteration:
- Figure out which direction is most downhill (calculate gradient)
- Take a step in that direction (update solution)
- Apply the guardrails to keep the solution well-behaved (apply regularization)
- Check if the new position is better than the best found so far
- Adjust the step size based on progress
- Stop when enough iterations are done, when no more improvement is happening, or when the improvement is very small
This approach efficiently finds solutions that both fit the data well and satisfy the regularization constraints.
DataLoader Integration: This method uses the DataLoader API for efficient batch processing. It creates a batcher using CreateBatcher(OptimizationInputData<T, TInput, TOutput>, int) and notifies the sampler of epoch starts using NotifyEpochStart(int).
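A usage sketch is shown below. The members of OptimizationInputData and OptimizationResult are not documented on this page, so they are only indicated in comments; check those types for the actual properties:

// Hypothetical sketch: populate the input data with your training (and validation) sets.
var inputData = new OptimizationInputData<double, Matrix<double>, Vector<double>>();
// Assign the feature matrix and target vector properties of inputData here.

OptimizationResult<double, Matrix<double>, Vector<double>> result = optimizer.Optimize(inputData);
// The result exposes the best solution found and associated metrics.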
ReverseUpdate(Vector<T>, Vector<T>)
Reverses a Proximal Gradient Descent update to recover original parameters.
public override Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)
Parameters
- updatedParameters (Vector<T>): The parameters after the PGD update.
- appliedGradients (Vector<T>): The gradients that were applied.
Returns
- Vector<T>
The original parameters before the update.
Remarks
PGD applies vanilla gradient descent followed by a proximal operator (regularization). The reverse update undoes the gradient step. Note: The regularization cannot be perfectly reversed since the proximal operator is generally not invertible.
For Beginners: This calculates where parameters were before a PGD update. PGD takes a gradient step then applies regularization. We can reverse the gradient step but the regularization effect remains, since regularization is one-way (like rounding numbers).
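In plain terms, the relationship between the forward and reverse steps is approximately:

updated parameters  = original parameters - applied step   (forward gradient step, where the applied step is the learning-rate-scaled gradient)
original parameters ≈ updated parameters + applied step    (what ReverseUpdate recovers)

The approximation sign reflects that the proximal/regularization part of the forward update is not undone.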
Serialize()
Serializes the proximal gradient descent optimizer to a byte array for storage or transmission.
public override byte[] Serialize()
Returns
- byte[]
A byte array containing the serialized optimizer.
Remarks
This method overrides the base implementation to include PGD-specific information in the serialization. It first serializes the base class data, then adds the PGD options and iteration count.
For Beginners: This method saves the current state of the optimizer so it can be restored later.
It's like taking a snapshot of the optimizer:
- First, it saves all the general optimizer information
- Then, it saves the PGD-specific settings and state
- It packages everything into a format that can be saved to a file or sent over a network
This allows you to:
- Save a trained optimizer to use later
- Share an optimizer with others
- Create a backup before making changes
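A typical save and restore roundtrip looks like the sketch below (using System.IO; the file name, the optimizer variable, and the model passed to the new instance are illustrative):

// Save the optimizer's state to disk.
byte[] snapshot = optimizer.Serialize();
File.WriteAllBytes("pgd-optimizer.bin", snapshot);

// Later, or on another machine: create a new instance and restore the saved state.
var restored = new ProximalGradientDescentOptimizer<double, Matrix<double>, Vector<double>>(myModel);
restored.Deserialize(File.ReadAllBytes("pgd-optimizer.bin"));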
UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput>, OptimizationStepData<T, TInput, TOutput>)
Updates adaptive parameters based on optimization progress.
protected override void UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput> currentStepData, OptimizationStepData<T, TInput, TOutput> previousStepData)
Parameters
- currentStepData (OptimizationStepData<T, TInput, TOutput>): The data from the current optimization step.
- previousStepData (OptimizationStepData<T, TInput, TOutput>): The data from the previous optimization step.
Remarks
This method overrides the base implementation to update PGD-specific adaptive parameters in addition to the base adaptive parameters. It adjusts the learning rate based on whether the algorithm is making progress. If there is improvement, the learning rate increases; otherwise, it decreases. The learning rate is kept within specified minimum and maximum limits.
For Beginners: This method adjusts how the algorithm searches based on its progress.
It's like a hiker changing their approach:
- If they're finding better spots, they might take bigger steps to progress more quickly
- If they're not finding improvements, they might take smaller steps to search more carefully
- The step size always stays between minimum and maximum values to avoid extremes
These adaptive adjustments help the algorithm be more efficient by being bold when things are going well and cautious when progress is difficult.
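Conceptually the rule described above behaves like the sketch below; the factor and bound names are assumptions rather than the actual option names:

// Hypothetical sketch of the adaptive learning-rate rule.
bool improved = currentStepImprovedOnPrevious;                  // derived from current vs. previous step data
_learningRate *= improved ? increaseFactor : decreaseFactor;    // bolder when improving, more careful otherwise
_learningRate = Math.Clamp(_learningRate, minLearningRate, maxLearningRate);  // keep within configured limits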
UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput>)
Updates the optimizer's options with the provided options.
protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)
Parameters
- options (OptimizationAlgorithmOptions<T, TInput, TOutput>): The options to apply to this optimizer.
Remarks
This method overrides the base implementation to update the PGD-specific options. It checks that the provided options are of the correct type (ProximalGradientDescentOptimizerOptions) and throws an exception if they are not.
For Beginners: This method updates the settings that control how the optimizer works.
It's like changing the game settings:
- You provide a set of options to use
- The method checks that these are the right kind of options for a PGD optimizer
- If they are, it applies these new settings
- If not, it lets you know there's a problem
This ensures that only appropriate settings are used with this specific optimizer.
Exceptions
- ArgumentException
Thrown when the options are not of the expected type.
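The type check described above follows the usual C# pattern, roughly as in this sketch (the backing field name is assumed, and this is not the verbatim implementation):

if (options is ProximalGradientDescentOptimizerOptions<T, TInput, TOutput> pgdOptions)
{
    _options = pgdOptions;   // assumed backing field
}
else
{
    throw new ArgumentException("Options must be of type ProximalGradientDescentOptimizerOptions.", nameof(options));
}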
UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)
Updates parameters using GPU-accelerated proximal gradient descent.
public override void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)
Parameters
- parameters (IGpuBuffer): The GPU buffer containing the model parameters to update.
- gradients (IGpuBuffer): The GPU buffer containing the computed gradients.
- parameterCount (int): The number of parameters to update.
- backend (IDirectGpuBackend): The direct GPU backend used for the update.
Remarks
Proximal gradient descent requires applying proximal operators which vary by regularizer type. GPU implementation is not yet available due to the variety of proximal operators (soft-thresholding for L1, projection for constraints, etc.).
UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)
Updates the solution by applying a gradient step followed by regularization.
protected override IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)
Parameters
- currentSolution (IFullModel<T, TInput, TOutput>): The current solution to update.
- gradient (Vector<T>): The gradient vector indicating the direction of steepest ascent.
Returns
- IFullModel<T, TInput, TOutput>
A new solution after applying the gradient step and regularization.
Remarks
This method performs a two-step update: first, it applies a gradient descent step by moving in the negative gradient direction; then, it applies regularization to enforce desired properties in the solution. The step size is determined by the current learning rate.
For Beginners: This method takes one step down the hill while respecting the guardrails.
The process has two parts:
1. Take a step downhill:
- Look at the gradient to see which way is most downhill
- Move in that direction by an amount controlled by the learning rate
2. Apply the guardrails:
- The regularization takes the solution after the gradient step
- It adjusts the solution to make it satisfy the desired properties
- For example, it might reduce any extremely large values
This combination of steps helps find solutions that both minimize the error and have good properties.
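In code terms, the two parts amount to something like the sketch below; the parameter accessors, vector operations, and regularization call are assumptions about the surrounding API rather than the verbatim implementation:

// Hypothetical sketch of the two-step update.
Vector<T> parameters = currentSolution.GetParameters();                      // assumed accessor

// 1. Gradient descent step: move against the gradient, scaled by the current learning rate.
Vector<T> stepped = parameters.Subtract(gradient.Multiply(_learningRate));   // assumed vector operations

// 2. Proximal / regularization step: adjust the result to satisfy the desired properties.
Vector<T> regularized = _regularization.Regularize(stepped);                  // assumed regularization call

return currentSolution.WithParameters(regularized);                           // assumed way to build the updated model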