Class AdaDeltaOptimizer<T, TInput, TOutput>

Namespace
AiDotNet.Optimizers
Assembly
AiDotNet.dll

Implements the AdaDelta optimization algorithm for training neural networks and other machine learning models.

public class AdaDeltaOptimizer<T, TInput, TOutput> : GradientBasedOptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer

Type Parameters

T

The numeric type used for calculations, typically float or double.

TInput

The type of input data the model consumes.

TOutput

The type of output the model produces.
Inheritance
OptimizerBase<T, TInput, TOutput>
GradientBasedOptimizerBase<T, TInput, TOutput>
AdaDeltaOptimizer<T, TInput, TOutput>
Implements
IGradientBasedOptimizer<T, TInput, TOutput>
IOptimizer<T, TInput, TOutput>
IModelSerializer

Remarks

AdaDelta is an adaptive learning rate method that dynamically adjusts the learning rate for each parameter based on a moving window of gradient updates. This optimizer addresses some of the drawbacks of AdaGrad, particularly its aggressive, monotonically decreasing learning rate.

For Beginners: AdaDelta is like a smart assistant that helps your model learn more efficiently.

Imagine you're learning a new skill:

  • Sometimes you need to practice more on difficult parts (bigger learning steps)
  • Other times you need to be more careful with easier parts (smaller learning steps)

AdaDelta does this automatically for each part of your model, helping it learn better and faster. It remembers recent changes and uses this information to decide how big the next learning step should be.

Constructors

AdaDeltaOptimizer(IFullModel<T, TInput, TOutput>, AdaDeltaOptimizerOptions<T, TInput, TOutput>?, IEngine?)

Initializes a new instance of the AdaDeltaOptimizer<T, TInput, TOutput> class.

public AdaDeltaOptimizer(IFullModel<T, TInput, TOutput> model, AdaDeltaOptimizerOptions<T, TInput, TOutput>? options = null, IEngine? engine = null)

Parameters

model IFullModel<T, TInput, TOutput>

The model to optimize.

options AdaDeltaOptimizerOptions<T, TInput, TOutput>

The options for configuring the AdaDelta optimizer.

engine IEngine

The engine used to execute computations. If null, a default engine is used.

Remarks

This constructor sets up the AdaDelta optimizer with the specified options and components. If no options are provided, default AdaDelta options are used.

For Beginners: This is like setting up your learning assistant (the optimizer) with specific instructions.

You can customize how it works through the constructor's parameters:

  • model: The model the optimizer will train
  • options: Special settings for AdaDelta (like how much it remembers from past steps)
  • engine: The engine that carries out the underlying computations

If you don't provide options or an engine, the optimizer will use default settings.
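
For example, here is a minimal construction sketch. The concrete type arguments and the option property names (Rho, Epsilon) are illustrative assumptions based on the parameters mentioned elsewhere on this page; check AdaDeltaOptimizerOptions for the actual members.

// 'model' is assumed to be an IFullModel<double, Matrix<double>, Vector<double>>
// that you have already created.
var options = new AdaDeltaOptimizerOptions<double, Matrix<double>, Vector<double>>
{
    Rho = 0.95,      // how much the optimizer remembers from past steps (assumed name)
    Epsilon = 1e-6   // small constant keeping divisions numerically stable (assumed name)
};

var optimizer = new AdaDeltaOptimizer<double, Matrix<double>, Vector<double>>(model, options);

// Or rely entirely on defaults:
var defaultOptimizer = new AdaDeltaOptimizer<double, Matrix<double>, Vector<double>>(model);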

Properties

SupportsGpuUpdate

Gets whether this optimizer supports GPU-accelerated parameter updates.

public override bool SupportsGpuUpdate { get; }

Property Value

bool

Remarks

For Beginners: This tells you whether the optimizer can run its parameter updates directly on the GPU. AdaDeltaOptimizer provides its own GPU kernel (see UpdateParametersGpu), so it overrides this property; the base class returns false because it has no specific GPU kernel.

Methods

Deserialize(byte[])

Deserializes the AdaDelta optimizer from a byte array.

public override void Deserialize(byte[] data)

Parameters

data byte[]

The byte array containing the serialized optimizer data.

Remarks

This method reconstructs the optimizer's state from a byte array, including its base class state and options.

For Beginners: This is like unpacking the optimizer from its compact form.

Continuing the suitcase analogy from Serialize():

  1. You check how much basic stuff was packed
  2. You unpack the basic stuff (base class data)
  3. You unpack and set up your special AdaDelta stuff (options)

If there's a problem unpacking the special stuff, it will let you know with an error message.

Exceptions

InvalidOperationException

Thrown when deserialization of optimizer options fails.

DisposeGpuState()

Disposes GPU-allocated optimizer state.

public override void DisposeGpuState()

Remarks

For Beginners: This frees the GPU memory that holds the optimizer's state. The base implementation disposes a single state buffer; AdaDelta overrides it because it maintains more than one state buffer (accumulated squared gradients and accumulated squared updates).

GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)

Generates a unique key for caching gradients.

protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)

Parameters

model IFullModel<T, TInput, TOutput>

The symbolic model.

X TInput

The input data matrix.

y TOutput

The target values vector.

Returns

string

A string representing the unique gradient cache key.

Remarks

This method creates a unique identifier for caching gradients based on the model, input data, and specific AdaDelta parameters.

For Beginners: This is like creating a special label for each set of calculations.

Imagine you're organizing your homework:

  • You start with a basic label (from the base class)
  • Then you add specific information about this AdaDelta optimizer (rho and epsilon values)

This helps the optimizer quickly find and reuse calculations it has done before, which can make the learning process faster.
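
A hypothetical sketch of what such a key composition could look like (the actual key format and the field names, such as _options.Rho and _options.Epsilon, are internal to the library and assumed here for illustration):

protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)
{
    // Start from the base key, then append AdaDelta-specific parameters so that
    // differently configured optimizers never reuse each other's cached gradients.
    string baseKey = base.GenerateGradientCacheKey(model, X, y);
    return $"{baseKey}_AdaDelta_rho:{_options.Rho}_eps:{_options.Epsilon}";
}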

GetOptions()

Gets the current optimizer options.

public override OptimizationAlgorithmOptions<T, TInput, TOutput> GetOptions()

Returns

OptimizationAlgorithmOptions<T, TInput, TOutput>

The current AdaDeltaOptimizerOptions.

Remarks

This method returns the current options used by the AdaDelta optimizer.

For Beginners: This is like checking the current settings of your learning assistant.

You can use this to see how the optimizer is currently configured, which can be helpful if you want to understand its behavior or make changes.

InitializeAdaptiveParameters()

Initializes the adaptive parameters for the AdaDelta optimizer.

protected override void InitializeAdaptiveParameters()

Remarks

This method sets up the initial learning rate based on the options provided. It's called during the optimizer's initialization.

For Beginners: This is like setting the starting point for how big the learning steps will be.

The initial learning rate is like deciding how big your first step will be when starting to learn something new. This method sets that initial step size based on the options you provided when creating the optimizer.
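
A minimal sketch of what this initialization might look like (the member names CurrentLearningRate, NumOps, and InitialLearningRate are assumptions for illustration, not the library's verified API):

protected override void InitializeAdaptiveParameters()
{
    base.InitializeAdaptiveParameters();
    // Seed the adaptive step size from the configured starting value.
    CurrentLearningRate = NumOps.FromDouble(_options.InitialLearningRate);
}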

InitializeGpuState(int, IDirectGpuBackend)

Initializes optimizer state on the GPU for a given parameter count.

public override void InitializeGpuState(int parameterCount, IDirectGpuBackend backend)

Parameters

parameterCount int

Number of parameters to initialize state for.

backend IDirectGpuBackend

The GPU backend to use for memory allocation.

Remarks

For Beginners: This allocates GPU memory for the optimizer's per-parameter state before training on the GPU. The base implementation does nothing; AdaDelta overrides it because it maintains state for every parameter (accumulated squared gradients and accumulated squared updates).

Optimize(OptimizationInputData<T, TInput, TOutput>)

Performs the optimization process using the AdaDelta algorithm.

public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)

Parameters

inputData OptimizationInputData<T, TInput, TOutput>

The input data for optimization, including training data.

Returns

OptimizationResult<T, TInput, TOutput>

The result of the optimization process.

Remarks

This method implements the main optimization loop of AdaDelta. It iteratively updates the model parameters using the AdaDelta update rule, evaluates the new solution, and checks for convergence or early stopping conditions.

For Beginners: This is the main learning process of the optimizer.

Here's what happens:

  1. It starts with a random guess for the best solution
  2. In each step (iteration):
    • It calculates how to improve the current solution
    • It updates the solution using the AdaDelta method
    • It checks if the new solution is better than the previous best
    • It decides whether to stop early if the solution is good enough
  3. It repeats this process until it reaches the maximum number of steps or finds a good enough solution

This is like practicing a skill over and over, getting a little better each time, until you're satisfied with your performance.

DataLoader Integration: This method uses the DataLoader API for efficient batch processing. It creates a batcher using CreateBatcher(OptimizationInputData<T, TInput, TOutput>, int) and notifies the sampler of epoch starts using NotifyEpochStart(int).
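
A hedged usage sketch (the property names on OptimizationInputData and OptimizationResult, such as XTrain, YTrain, and BestFitness, are illustrative guesses; consult those types for the actual members):

// 'trainingInputs' and 'trainingTargets' hold your prepared training data.
var inputData = new OptimizationInputData<double, Matrix<double>, Vector<double>>
{
    XTrain = trainingInputs,
    YTrain = trainingTargets
};

OptimizationResult<double, Matrix<double>, Vector<double>> result = optimizer.Optimize(inputData);

// Inspect the outcome of the optimization run.
Console.WriteLine($"Best fitness: {result.BestFitness}");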

ReverseUpdate(Vector<T>, Vector<T>)

Reverses an AdaDelta gradient update to recover original parameters.

public override Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)

Parameters

updatedParameters Vector<T>

Parameters after AdaDelta update

appliedGradients Vector<T>

The gradients that were applied

Returns

Vector<T>

Original parameters before the update

Remarks

AdaDelta's reverse update requires both accumulated squared gradients and accumulated squared updates from the forward pass. This method must be called immediately after UpdateParameters while both states are fresh. It recalculates the adaptive update that was applied based on the accumulated statistics.

For Beginners: This calculates where parameters were before an AdaDelta update. AdaDelta uses two pieces of memory: one for gradient history and one for update history. To reverse an update, we need both memories to reconstruct what step was taken. It's like rewinding a dance where each move depends on previous moves and the music (gradients).
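
A short usage sketch illustrating the "call immediately after UpdateParameters" requirement described above:

// Apply an AdaDelta step, then undo it while the optimizer's accumulated
// statistics still describe that exact step.
Vector<double> updated = optimizer.UpdateParameters(parameters, gradient);
Vector<double> restored = optimizer.ReverseUpdate(updated, gradient);

// 'restored' should match the original 'parameters' up to floating-point error.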

Serialize()

Serializes the AdaDelta optimizer to a byte array.

public override byte[] Serialize()

Returns

byte[]

A byte array representing the serialized optimizer.

Remarks

This method converts the optimizer's state, including its base class state and options, into a byte array that can be stored or transmitted.

For Beginners: This is like packing up the optimizer into a compact form.

Imagine you're packing a suitcase:

  1. You pack the basic stuff (base class data)
  2. You write down how much basic stuff you packed
  3. You pack your special AdaDelta stuff (options)

This packed form can be saved or sent somewhere else, and later unpacked to recreate the optimizer exactly as it was.
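
A round-trip sketch showing how Serialize() and Deserialize(byte[]) work together (the file path is arbitrary, and 'model' is assumed to be the same model the original optimizer was built with):

// Pack the optimizer's state into bytes and persist it.
byte[] packed = optimizer.Serialize();
File.WriteAllBytes("adadelta-state.bin", packed);

// Later: create a fresh optimizer for the same model and restore the saved state.
var restored = new AdaDeltaOptimizer<double, Matrix<double>, Vector<double>>(model);
restored.Deserialize(File.ReadAllBytes("adadelta-state.bin"));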

UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput>, OptimizationStepData<T, TInput, TOutput>)

Updates the adaptive parameters of the AdaDelta optimizer.

protected override void UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput> currentStepData, OptimizationStepData<T, TInput, TOutput> previousStepData)

Parameters

currentStepData OptimizationStepData<T, TInput, TOutput>

The optimization step data for the current iteration.

previousStepData OptimizationStepData<T, TInput, TOutput>

The optimization step data for the previous iteration.

Remarks

This method updates the adaptive parameters of the AdaDelta optimizer, specifically the rho value if adaptive rho is enabled in the options.

For Beginners: This method adjusts how the optimizer learns over time.

If adaptive rho is turned on:

  • If the current solution is better than the previous one, it slightly increases rho
  • If the current solution is worse, it slightly decreases rho

Rho controls how much the optimizer remembers from past steps. Adjusting it helps the optimizer adapt to the current state of learning, potentially making it more efficient.
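
A hypothetical sketch of the adjustment described above (the option and field names, such as UseAdaptiveRho and _rho, and the exact scaling factors are assumptions for illustration, not the library's actual code):

if (_options.UseAdaptiveRho)
{
    if (FitnessImproved(currentStepData, previousStepData))
        _rho = Math.Min(_rho * 1.05, 0.999); // remember more of the past
    else
        _rho = Math.Max(_rho * 0.95, 0.5);   // react faster to recent gradients
}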

UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput>)

Updates the optimizer options.

protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)

Parameters

options OptimizationAlgorithmOptions<T, TInput, TOutput>

The new options to set.

Remarks

This method updates the optimizer's options with new settings. It ensures that only AdaDeltaOptimizerOptions are used with this optimizer.

For Beginners: This is like changing the settings on your learning assistant.

You can use this to adjust how the optimizer works, but you need to make sure you're using the right type of settings (AdaDeltaOptimizerOptions). If you try to use the wrong type of settings, it will give you an error message.
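
A sketch of the type check this method performs (the field name _options is assumed for illustration):

protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)
{
    if (options is not AdaDeltaOptimizerOptions<T, TInput, TOutput> adaDeltaOptions)
        throw new ArgumentException("Options must be of type AdaDeltaOptimizerOptions.", nameof(options));

    _options = adaDeltaOptions;
}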

Exceptions

ArgumentException

Thrown when the provided options are not of type AdaDeltaOptimizerOptions.

UpdateParameters(Vector<T>, Vector<T>)

Updates a vector of parameters using the AdaDelta optimization algorithm.

public override Vector<T> UpdateParameters(Vector<T> parameters, Vector<T> gradient)

Parameters

parameters Vector<T>

The current parameter vector to be updated.

gradient Vector<T>

The gradient vector corresponding to the parameters.

Returns

Vector<T>

The updated parameter vector.

Remarks

This method implements the AdaDelta update rule by maintaining exponential moving averages of both squared gradients and squared updates. This allows AdaDelta to adapt the learning rate without requiring an explicit learning rate parameter.

For Beginners: AdaDelta automatically adjusts learning rates by remembering both how gradients have changed (squared gradients) and how parameters have been updated (squared updates). This makes it largely learning-rate-free, adapting automatically to the scale of the problem.
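
The update rule itself, written out as a self-contained sketch on plain arrays (this mirrors the published AdaDelta algorithm, not the library's internal implementation):

static double[] AdaDeltaStep(
    double[] parameters, double[] gradient,
    double[] accumSqGrad, double[] accumSqUpdate,
    double rho = 0.95, double epsilon = 1e-6)
{
    var updated = new double[parameters.Length];
    for (int i = 0; i < parameters.Length; i++)
    {
        double g = gradient[i];

        // Decay the running average of squared gradients:
        // E[g^2] = rho * E[g^2] + (1 - rho) * g^2
        accumSqGrad[i] = rho * accumSqGrad[i] + (1 - rho) * g * g;

        // The step scales by the RMS of past updates over the RMS of gradients,
        // which is why no explicit learning rate is required.
        double step = -Math.Sqrt(accumSqUpdate[i] + epsilon)
                     / Math.Sqrt(accumSqGrad[i] + epsilon) * g;

        // Decay the running average of squared updates:
        // E[dx^2] = rho * E[dx^2] + (1 - rho) * step^2
        accumSqUpdate[i] = rho * accumSqUpdate[i] + (1 - rho) * step * step;

        updated[i] = parameters[i] + step;
    }
    return updated;
}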

UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)

Updates parameters on GPU using the AdaDelta optimization algorithm.

public override void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)

Parameters

parameters IGpuBuffer

The current parameter tensor on GPU.

gradients IGpuBuffer

The gradient tensor on GPU.

parameterCount int

The number of parameters to update.

backend IDirectGpuBackend

The GPU backend used to execute the update.

Remarks

This method performs GPU-resident AdaDelta updates without CPU synchronization. All tensors remain on GPU throughout the update process.

UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)

Updates the current solution using the AdaDelta update rule.

protected override IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)

Parameters

currentSolution IFullModel<T, TInput, TOutput>

The current solution (model parameters).

gradient Vector<T>

The computed gradient for the current solution.

Returns

IFullModel<T, TInput, TOutput>

A new solution with updated parameters.

Remarks

This method applies the AdaDelta update rule to each parameter of the current solution. It uses accumulated squared gradients and updates to compute adaptive learning rates for each parameter.

For Beginners: This is where the actual learning happens for each part of the model.

For each parameter in the model:

  1. It remembers how much this parameter has changed recently (accumulated squared gradients)
  2. It calculates how much to change the parameter this time (update)
  3. It remembers how big these changes have been (accumulated squared updates)
  4. It applies the change to the parameter

This process helps the model learn more efficiently by taking larger steps for parameters that need more change and smaller steps for those that need less.