Class ProximalGradientDescentOptimizer<T, TInput, TOutput>

Namespace
AiDotNet.Optimizers
Assembly
AiDotNet.dll

Implements a Proximal Gradient Descent optimization algorithm which combines gradient descent with regularization.

public class ProximalGradientDescentOptimizer<T, TInput, TOutput> : GradientBasedOptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer

Type Parameters

T

The numeric type used for calculations, typically float or double.

TInput

The type of input data the model consumes (for example, a matrix of features).

TOutput

The type of output the model produces (for example, a vector of target values).

Inheritance
OptimizerBase<T, TInput, TOutput>
GradientBasedOptimizerBase<T, TInput, TOutput>
ProximalGradientDescentOptimizer<T, TInput, TOutput>
Implements
IGradientBasedOptimizer<T, TInput, TOutput>
IOptimizer<T, TInput, TOutput>
IModelSerializer

Remarks

Proximal Gradient Descent (PGD) is an extension of standard gradient descent that handles regularization more efficiently. The algorithm alternates between performing a gradient descent step to minimize the loss function and applying a proximal operator to enforce regularization. This approach is particularly effective for problems where regularization is important to prevent overfitting or to enforce specific properties in the solution.

For Beginners: Proximal Gradient Descent is like walking downhill while staying within certain boundaries.

Imagine you're hiking down a mountain to find the lowest point:

  • Standard gradient descent is like always walking directly downhill
  • Proximal gradient descent adds boundaries or "guardrails" to your path
  • These guardrails keep you from wandering into areas that might look good but are actually not helpful
  • For example, the guardrails might prevent solutions that are too complex and would overfit the data

This approach helps find solutions that not only fit the data well but also have desirable properties like simplicity or stability.
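
In symbols, each iteration applies the standard proximal gradient update

θ_next = prox(θ − η · ∇L(θ))

where ∇L(θ) is the gradient of the loss, η is the learning rate, and prox is the proximal operator associated with the regularizer (the "guardrails"). The inner expression is an ordinary gradient descent step; the proximal operator then pulls the result back toward solutions the regularizer allows.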

Constructors

ProximalGradientDescentOptimizer(IFullModel<T, TInput, TOutput>, ProximalGradientDescentOptimizerOptions<T, TInput, TOutput>?, IEngine?)

Initializes a new instance of the ProximalGradientDescentOptimizer<T, TInput, TOutput> class with the specified options and components.

public ProximalGradientDescentOptimizer(IFullModel<T, TInput, TOutput> model, ProximalGradientDescentOptimizerOptions<T, TInput, TOutput>? options = null, IEngine? engine = null)

Parameters

model IFullModel<T, TInput, TOutput>

The model to optimize.

options ProximalGradientDescentOptimizerOptions<T, TInput, TOutput>

The proximal gradient descent optimization options, or null to use default options.

engine IEngine

The computation engine (CPU or GPU) for vectorized operations, or null to use the default engine.

Remarks

This constructor creates a new proximal gradient descent optimizer with the specified options and components. If options or engine is null, a default is used in its place. The constructor initializes the options, regularization strategy, and adaptive parameters.

For Beginners: This is the starting point for creating a new optimizer.

Think of it like setting up equipment for a mountain hike:

  • You start with the model you want to optimize
  • You can provide custom settings (options) or use the default ones; the options also control how the boundaries (regularization) are enforced
  • You can provide a computation engine for CPU or GPU work, or let it use the default one
  • It gets everything ready so you can start the optimization process

The options control things like how fast to move, when to stop, and how to adapt during the journey.
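
As a quick usage sketch (the concrete type arguments and the myModel instance are illustrative placeholders, not prescribed by the library):

var optimizer = new ProximalGradientDescentOptimizer<double, Matrix<double>, Vector<double>>(
    model: myModel,   // required: the model whose parameters will be optimized
    options: null,    // null falls back to default ProximalGradientDescentOptimizerOptions
    engine: null);    // null falls back to the default computation engine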

Methods

Deserialize(byte[])

Reconstructs the proximal gradient descent optimizer from a serialized byte array.

public override void Deserialize(byte[] data)

Parameters

data byte[]

The byte array containing the serialized optimizer.

Remarks

This method overrides the base implementation to handle PGD-specific information during deserialization. It first deserializes the base class data, then reconstructs the PGD options and iteration count.

For Beginners: This method restores the optimizer from a previously saved state.

It's like restoring from a snapshot:

  • First, it loads all the general optimizer information
  • Then, it loads the PGD-specific settings and state
  • It reconstructs the optimizer to the exact state it was in when saved

This allows you to:

  • Continue working with an optimizer you previously saved
  • Use an optimizer that someone else created and shared
  • Revert to a backup if needed
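
A brief usage sketch (the file name is illustrative, and optimizer is assumed to be an existing instance):

byte[] saved = File.ReadAllBytes("pgd-optimizer.bin");   // requires System.IO
optimizer.Deserialize(saved);                            // restores options, state, and iteration count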

Exceptions

InvalidOperationException

Thrown when the options cannot be deserialized.

GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)

Generates a unique key for caching gradients based on the model, input data, and optimizer state.

protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)

Parameters

model IFullModel<T, TInput, TOutput>

The model for which the gradient is calculated.

X TInput

The input features matrix.

y TOutput

The target values vector.

Returns

string

A string key that uniquely identifies this gradient calculation.

Remarks

This method overrides the base implementation to include PGD-specific information in the cache key. It extends the base key with information about the learning rate, regularization type, tolerance, and current iteration. This ensures that gradients are properly cached and retrieved even as the optimizer's state changes.

For Beginners: This method creates a unique identification tag for each gradient calculation.

Think of it like a file naming system:

  • It includes information about the model and data being used
  • It adds details specific to the PGD optimizer's current state
  • This unique tag helps the optimizer avoid redundant calculations
  • If the same gradient is needed again, it can be retrieved from cache instead of recalculated

This caching mechanism improves efficiency by avoiding duplicate work.
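
As an illustration only, the remarks above suggest a key composed roughly along these lines (the actual format is an internal implementation detail):

static string BuildKey(string baseKey, double learningRate, string regularizationType, double tolerance, int iteration)
    => $"{baseKey}|lr={learningRate}|reg={regularizationType}|tol={tolerance}|iter={iteration}";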

GetOptions()

Gets the current options for this optimizer.

public override OptimizationAlgorithmOptions<T, TInput, TOutput> GetOptions()

Returns

OptimizationAlgorithmOptions<T, TInput, TOutput>

The current proximal gradient descent optimization options.

Remarks

This method overrides the base implementation to return the PGD-specific options.

For Beginners: This method returns the current settings of the optimizer.

It's like checking what game settings are currently active:

  • You can see the current learning rate settings
  • You can see the current tolerance and iteration limits
  • You can see all the other parameters that control the optimizer

This is useful for understanding how the optimizer is currently configured or for making a copy of the settings to modify and apply later.
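
A brief usage sketch (the cast back to the PGD-specific options type is an assumption about how you would reach PGD-only settings):

var activeOptions = optimizer.GetOptions();
var pgdOptions = activeOptions as ProximalGradientDescentOptimizerOptions<double, Matrix<double>, Vector<double>>;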

InitializeAdaptiveParameters()

Initializes the adaptive parameters used by the Proximal Gradient Descent algorithm.

protected override void InitializeAdaptiveParameters()

Remarks

This method overrides the base implementation to initialize PGD-specific adaptive parameters. It sets the initial learning rate from the options and resets the iteration counter to zero. The learning rate controls how large each step is during optimization.

For Beginners: This method prepares the optimizer for a fresh start.

It's like a hiker preparing for a new journey:

  • Setting their initial step size (learning rate) to a comfortable starting value
  • Resetting their step counter to zero
  • Getting ready to begin searching for the lowest point

These initial settings help the algorithm start with balanced movements that can be adjusted as it learns more about the landscape.

Optimize(OptimizationInputData<T, TInput, TOutput>)

Performs the proximal gradient descent optimization to find the best solution for the given input data.

public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)

Parameters

inputData OptimizationInputData<T, TInput, TOutput>

The input data to optimize against.

Returns

OptimizationResult<T, TInput, TOutput>

An optimization result containing the best solution found and associated metrics.

Remarks

This method implements the main PGD algorithm. It starts from a random solution and iteratively improves it by calculating the gradient, taking a step in the negative gradient direction, and then applying regularization. The process continues until either the maximum number of iterations is reached, early stopping criteria are met, or the improvement falls below the specified tolerance.

For Beginners: This is the main search process where the algorithm looks for the best solution.

The process works like this:

  1. Start at a random position on the "hill"
  2. For each iteration:
    • Figure out which direction is most downhill (calculate gradient)
    • Take a step in that direction (update solution)
    • Apply the guardrails to keep the solution well-behaved (apply regularization)
    • Check if the new position is better than the best found so far
    • Adjust the step size based on progress
  3. Stop when the maximum number of iterations is reached, when early stopping is triggered, or when the improvement falls below the tolerance

This approach efficiently finds solutions that both fit the data well and satisfy the regularization constraints.

DataLoader Integration: This method uses the DataLoader API for efficient batch processing. It creates a batcher using CreateBatcher(OptimizationInputData<T, TInput, TOutput>, int) and notifies the sampler of epoch starts using NotifyEpochStart(int).
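
A brief usage sketch, assuming inputData is an OptimizationInputData instance already populated with your training features and targets:

OptimizationResult<double, Matrix<double>, Vector<double>> result = optimizer.Optimize(inputData);
// result carries the best solution found and its associated metrics.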

ReverseUpdate(Vector<T>, Vector<T>)

Reverses a Proximal Gradient Descent update to recover original parameters.

public override Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)

Parameters

updatedParameters Vector<T>

The parameters after the PGD update.

appliedGradients Vector<T>

The gradients that were applied.

Returns

Vector<T>

The original parameters before the update.

Remarks

PGD applies vanilla gradient descent followed by a proximal operator (regularization). The reverse update undoes the gradient step. Note: The regularization cannot be perfectly reversed since the proximal operator is generally not invertible.

For Beginners: This calculates where parameters were before a PGD update. PGD takes a gradient step then applies regularization. We can reverse the gradient step but the regularization effect remains, since regularization is one-way (like rounding numbers).
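
Conceptually, undoing the gradient step means adding back what was subtracted. This standalone sketch assumes the applied gradients are the raw gradients and that the learning rate used for the step is known; if the stored gradients were already scaled by the learning rate, the addition needs no extra factor:

static double[] ReverseGradientStep(double[] updatedParameters, double[] appliedGradients, double learningRate)
{
    var original = new double[updatedParameters.Length];
    for (int i = 0; i < updatedParameters.Length; i++)
        original[i] = updatedParameters[i] + learningRate * appliedGradients[i]; // add back the subtracted step
    return original; // the regularization (proximal) effect remains baked in
}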

Serialize()

Serializes the proximal gradient descent optimizer to a byte array for storage or transmission.

public override byte[] Serialize()

Returns

byte[]

A byte array containing the serialized optimizer.

Remarks

This method overrides the base implementation to include PGD-specific information in the serialization. It first serializes the base class data, then adds the PGD options and iteration count.

For Beginners: This method saves the current state of the optimizer so it can be restored later.

It's like taking a snapshot of the optimizer:

  • First, it saves all the general optimizer information
  • Then, it saves the PGD-specific settings and state
  • It packages everything into a format that can be saved to a file or sent over a network

This allows you to:

  • Save a trained optimizer to use later
  • Share an optimizer with others
  • Create a backup before making changes
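
A brief usage sketch (the file name is illustrative; this pairs with the Deserialize example above):

byte[] snapshot = optimizer.Serialize();
File.WriteAllBytes("pgd-optimizer.bin", snapshot);   // requires System.IO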

UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput>, OptimizationStepData<T, TInput, TOutput>)

Updates adaptive parameters based on optimization progress.

protected override void UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput> currentStepData, OptimizationStepData<T, TInput, TOutput> previousStepData)

Parameters

currentStepData OptimizationStepData<T, TInput, TOutput>

The data from the current optimization step.

previousStepData OptimizationStepData<T, TInput, TOutput>

The data from the previous optimization step.

Remarks

This method overrides the base implementation to update PGD-specific adaptive parameters in addition to the base adaptive parameters. It adjusts the learning rate based on whether the algorithm is making progress. If there is improvement, the learning rate increases; otherwise, it decreases. The learning rate is kept within specified minimum and maximum limits.

For Beginners: This method adjusts how the algorithm searches based on its progress.

It's like a hiker changing their approach:

  • If they're finding better spots, they might take bigger steps to progress more quickly
  • If they're not finding improvements, they might take smaller steps to search more carefully
  • The step size always stays between minimum and maximum values to avoid extremes

These adaptive adjustments help the algorithm be more efficient by being bold when things are going well and cautious when progress is difficult.
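
A conceptual sketch of the adjustment described above; the growth and shrink factors are illustrative assumptions, not the library's actual values:

using System;

static double AdaptLearningRate(double learningRate, bool improved, double minRate, double maxRate)
{
    learningRate *= improved ? 1.05 : 0.5;             // bolder on progress, more cautious otherwise
    return Math.Clamp(learningRate, minRate, maxRate); // keep the step size within configured limits
}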

UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput>)

Updates the optimizer's options with the provided options.

protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)

Parameters

options OptimizationAlgorithmOptions<T, TInput, TOutput>

The options to apply to this optimizer.

Remarks

This method overrides the base implementation to update the PGD-specific options. It checks that the provided options are of the correct type (ProximalGradientDescentOptimizerOptions) and throws an exception if they are not.

For Beginners: This method updates the settings that control how the optimizer works.

It's like changing the game settings:

  • You provide a set of options to use
  • The method checks that these are the right kind of options for a PGD optimizer
  • If they are, it applies these new settings
  • If not, it lets you know there's a problem

This ensures that only appropriate settings are used with this specific optimizer.
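
Conceptually, the guard works along these lines (shown as it might look inside the optimizer class; not the actual implementation):

if (options is not ProximalGradientDescentOptimizerOptions<T, TInput, TOutput> pgdOptions)
    throw new ArgumentException("Expected ProximalGradientDescentOptimizerOptions.", nameof(options));
// ...otherwise apply pgdOptions to this optimizer...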

Exceptions

ArgumentException

Thrown when the options are not of the expected type.

UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)

Updates parameters using GPU-accelerated proximal gradient descent.

public override void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)

Parameters

parameters IGpuBuffer

The GPU buffer containing the parameters to update.

gradients IGpuBuffer

The GPU buffer containing the gradients to apply.

parameterCount int

The number of parameters in the buffers.

backend IDirectGpuBackend

The direct GPU backend used to execute the update.

Remarks

Proximal gradient descent requires applying proximal operators which vary by regularizer type. GPU implementation is not yet available due to the variety of proximal operators (soft-thresholding for L1, projection for constraints, etc.).

UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)

Updates the solution by applying a gradient step followed by regularization.

protected override IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)

Parameters

currentSolution IFullModel<T, TInput, TOutput>

The current solution to update.

gradient Vector<T>

The gradient vector indicating the direction of steepest ascent.

Returns

IFullModel<T, TInput, TOutput>

A new solution after applying the gradient step and regularization.

Remarks

This method performs a two-step update: first, it applies a gradient descent step by moving in the negative gradient direction; then, it applies regularization to enforce desired properties in the solution. The step size is determined by the current learning rate.

For Beginners: This method takes one step down the hill while respecting the guardrails.

The process has two parts:

  1. Take a step downhill:

    • Look at the gradient to see which way is most downhill
    • Move in that direction by an amount controlled by the learning rate
  2. Apply the guardrails:

    • The regularization takes the solution after the gradient step
    • It adjusts the solution to make it satisfy the desired properties
    • For example, it might reduce any extremely large values

This combination of steps helps find solutions that both minimize the error and have good properties.
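
A self-contained sketch of this two-step idea, using L1 regularization as the example so the proximal operator is soft-thresholding; the names and the choice of regularizer are illustrative and not tied to AiDotNet's internals:

using System;

static double[] PgdStep(double[] parameters, double[] gradient, double learningRate, double lambda)
{
    var next = new double[parameters.Length];
    for (int i = 0; i < parameters.Length; i++)
    {
        // Step 1: ordinary gradient descent step (move against the gradient).
        double z = parameters[i] - learningRate * gradient[i];

        // Step 2: proximal operator for L1 regularization (soft-thresholding),
        // which shrinks each value toward zero and zeroes out small ones.
        double threshold = learningRate * lambda;
        next[i] = Math.Sign(z) * Math.Max(Math.Abs(z) - threshold, 0.0);
    }
    return next;
}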