Class AdaDeltaOptimizer<T, TInput, TOutput>
Namespace: AiDotNet.Optimizers
Assembly: AiDotNet.dll
Implements the AdaDelta optimization algorithm for training neural networks and other machine learning models.
public class AdaDeltaOptimizer<T, TInput, TOutput> : GradientBasedOptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer
Type Parameters
T: The numeric type used for calculations, typically float or double.
TInput: The type of the input data (for example, a feature matrix).
TOutput: The type of the output data (for example, a vector of target values).
- Inheritance
- OptimizerBase<T, TInput, TOutput> → GradientBasedOptimizerBase<T, TInput, TOutput> → AdaDeltaOptimizer<T, TInput, TOutput>
- Implements
- IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer
Remarks
AdaDelta is an adaptive learning rate method that dynamically adjusts the learning rate for each parameter based on a moving window of gradient updates. This optimizer addresses some of the drawbacks of AdaGrad, particularly its aggressive, monotonically decreasing learning rate.
For Beginners: AdaDelta is like a smart assistant that helps your model learn more efficiently.
Imagine you're learning a new skill:
- Sometimes you need to practice more on difficult parts (bigger learning steps)
- Other times you need to be more careful with easier parts (smaller learning steps)
AdaDelta does this automatically for each part of your model, helping it learn better and faster. It remembers recent changes and uses this information to decide how big the next learning step should be.
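For reference, the standard AdaDelta update rule (Zeiler, 2012) that this optimizer is modeled on maintains two decaying averages and computes each step as follows (the library's implementation may differ in minor details):

E[g^2]_t = \rho E[g^2]_{t-1} + (1 - \rho) g_t^2
\Delta x_t = - \frac{\sqrt{E[\Delta x^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}} g_t
E[\Delta x^2]_t = \rho E[\Delta x^2]_{t-1} + (1 - \rho) (\Delta x_t)^2
x_{t+1} = x_t + \Delta x_t

Here \rho (rho) is the decay rate of the moving averages and \epsilon (epsilon) is a small constant for numerical stability.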
Constructors
AdaDeltaOptimizer(IFullModel<T, TInput, TOutput>, AdaDeltaOptimizerOptions<T, TInput, TOutput>?, IEngine?)
Initializes a new instance of the AdaDeltaOptimizer<T, TInput, TOutput> class.
public AdaDeltaOptimizer(IFullModel<T, TInput, TOutput> model, AdaDeltaOptimizerOptions<T, TInput, TOutput>? options = null, IEngine? engine = null)
Parameters
model (IFullModel<T, TInput, TOutput>): The model to optimize.
options (AdaDeltaOptimizerOptions<T, TInput, TOutput>): The options for configuring the AdaDelta optimizer. If null, default options are used.
engine (IEngine): The computation engine to use for running updates; optional.
Remarks
This constructor sets up the AdaDelta optimizer with the specified options and components. If no options are provided, default AdaDelta options are used.
For Beginners: This is like setting up your learning assistant (the optimizer) with specific instructions.
You can customize how it works with the pieces you pass in (a construction sketch is shown below):
- model: the model whose parameters the optimizer will adjust
- options: special settings for AdaDelta (like how much it remembers from past steps)
- engine: the computation engine that carries out the calculations
If you don't provide options or an engine, the optimizer will use default settings.
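Example
A minimal construction sketch. The concrete type arguments (double, Matrix<double>, Vector<double>) and the parameterless AdaDeltaOptimizerOptions constructor are assumptions for illustration; myModel stands for any IFullModel implementation you already have.
// Simplest form: default AdaDelta options and engine.
var optimizer = new AdaDeltaOptimizer<double, Matrix<double>, Vector<double>>(myModel);

// With explicit options (hypothetical parameterless constructor; configure as your options class allows).
var options = new AdaDeltaOptimizerOptions<double, Matrix<double>, Vector<double>>();
var customized = new AdaDeltaOptimizer<double, Matrix<double>, Vector<double>>(myModel, options);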
Properties
SupportsGpuUpdate
Gets whether this optimizer supports GPU-accelerated parameter updates.
public override bool SupportsGpuUpdate { get; }
Property Value
- bool
Remarks
For Beginners: Override this in derived classes that have GPU kernel implementations. The base class returns false since it has no specific GPU kernel.
Methods
Deserialize(byte[])
Deserializes the AdaDelta optimizer from a byte array.
public override void Deserialize(byte[] data)
Parameters
data (byte[]): The byte array containing the serialized optimizer data.
Remarks
This method reconstructs the optimizer's state from a byte array, including its base class state and options.
For Beginners: This is like unpacking the optimizer from its compact form.
Using the suitcase analogy from Serialize():
- You check how much basic stuff was packed
- You unpack the basic stuff (base class data)
- You unpack and set up your special AdaDelta stuff (options)
If there's a problem unpacking the special stuff, it will let you know with an error message.
Exceptions
- InvalidOperationException
Thrown when deserialization of optimizer options fails.
DisposeGpuState()
Disposes GPU-allocated optimizer state.
public override void DisposeGpuState()
Remarks
For Beginners: The base implementation disposes _gpuState if set. Derived classes with multiple state buffers should override.
GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)
Generates a unique key for caching gradients.
protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)
Parameters
model (IFullModel<T, TInput, TOutput>): The symbolic model.
X (TInput): The input data matrix.
y (TOutput): The target values vector.
Returns
- string
A string representing the unique gradient cache key.
Remarks
This method creates a unique identifier for caching gradients based on the model, input data, and specific AdaDelta parameters.
For Beginners: This is like creating a special label for each set of calculations.
Imagine you're organizing your homework:
- You start with a basic label (from the base class)
- Then you add specific information about this AdaDelta optimizer (rho and epsilon values)
This helps the optimizer quickly find and reuse calculations it has done before, which can make the learning process faster.
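Example
A minimal sketch of what this override conceptually does. The field name _options and the property names Rho and Epsilon are assumptions for illustration:
protected override string GenerateGradientCacheKey(
    IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)
{
    // Start from the base key, then mix in the AdaDelta-specific settings so that
    // runs with different rho/epsilon values never reuse each other's cached gradients.
    string baseKey = base.GenerateGradientCacheKey(model, X, y);
    return $"{baseKey}_AdaDelta_rho={_options.Rho}_eps={_options.Epsilon}";
}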
GetOptions()
Gets the current optimizer options.
public override OptimizationAlgorithmOptions<T, TInput, TOutput> GetOptions()
Returns
- OptimizationAlgorithmOptions<T, TInput, TOutput>
The current AdaDeltaOptimizerOptions.
Remarks
This method returns the current options used by the AdaDelta optimizer.
For Beginners: This is like checking the current settings of your learning assistant.
You can use this to see how the optimizer is currently configured, which can be helpful if you want to understand its behavior or make changes.
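Example
A small sketch of inspecting the settings. The downcast target's type arguments and the Rho/Epsilon property names are assumptions; check AdaDeltaOptimizerOptions for the actual members:
var current = optimizer.GetOptions();
if (current is AdaDeltaOptimizerOptions<double, Matrix<double>, Vector<double>> adaDelta)
{
    // Hypothetical property names used for illustration.
    Console.WriteLine($"rho = {adaDelta.Rho}, epsilon = {adaDelta.Epsilon}");
}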
InitializeAdaptiveParameters()
Initializes the adaptive parameters for the AdaDelta optimizer.
protected override void InitializeAdaptiveParameters()
Remarks
This method sets up the initial learning rate based on the options provided. It's called during the optimizer's initialization.
For Beginners: This is like setting the starting point for how big the learning steps will be.
The initial learning rate is like deciding how big your first step will be when starting to learn something new. This method sets that initial step size based on the options you provided when creating the optimizer.
InitializeGpuState(int, IDirectGpuBackend)
Initializes optimizer state on the GPU for a given parameter count.
public override void InitializeGpuState(int parameterCount, IDirectGpuBackend backend)
Parameters
parameterCount (int): Number of parameters to initialize state for.
backend (IDirectGpuBackend): The GPU backend to use for memory allocation.
Remarks
For Beginners: The base implementation does nothing. Derived classes that maintain optimizer state (like momentum or adaptive learning rates) override this.
Optimize(OptimizationInputData<T, TInput, TOutput>)
Performs the optimization process using the AdaDelta algorithm.
public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)
Parameters
inputData (OptimizationInputData<T, TInput, TOutput>): The input data for optimization, including training data.
Returns
- OptimizationResult<T, TInput, TOutput>
The result of the optimization process.
Remarks
This method implements the main optimization loop of AdaDelta. It iteratively updates the model parameters using the AdaDelta update rule, evaluates the new solution, and checks for convergence or early stopping conditions.
For Beginners: This is the main learning process of the optimizer.
Here's what happens:
- It starts with a random guess for the best solution
- In each step (iteration):
  - It calculates how to improve the current solution
  - It updates the solution using the AdaDelta method
  - It checks if the new solution is better than the previous best
  - It decides whether to stop early if the solution is good enough
- It repeats this process until it reaches the maximum number of steps or finds a good enough solution
This is like practicing a skill over and over, getting a little better each time, until you're satisfied with your performance.
DataLoader Integration: This method uses the DataLoader API for efficient batch processing. It creates a batcher using CreateBatcher(OptimizationInputData<T, TInput, TOutput>, int) and notifies the sampler of epoch starts using NotifyEpochStart(int).
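Example
A minimal usage sketch. The XTrain/YTrain property names on OptimizationInputData are assumptions for illustration:
// Bundle the training data for the optimizer (hypothetical property names).
var inputData = new OptimizationInputData<double, Matrix<double>, Vector<double>>
{
    XTrain = trainingFeatures,
    YTrain = trainingTargets
};

// Run the full AdaDelta optimization loop and inspect the outcome.
var result = optimizer.Optimize(inputData);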
ReverseUpdate(Vector<T>, Vector<T>)
Reverses an AdaDelta gradient update to recover original parameters.
public override Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)
Parameters
updatedParameters (Vector<T>): Parameters after the AdaDelta update.
appliedGradients (Vector<T>): The gradients that were applied.
Returns
- Vector<T>
Original parameters before the update
Remarks
AdaDelta's reverse update requires both accumulated squared gradients and accumulated squared updates from the forward pass. This method must be called immediately after UpdateParameters while both states are fresh. It recalculates the adaptive update that was applied based on the accumulated statistics.
For Beginners: This calculates where parameters were before an AdaDelta update. AdaDelta uses two pieces of memory: one for gradient history and one for update history. To reverse an update, we need both memories to reconstruct what step was taken. It's like rewinding a dance where each move depends on previous moves and the music (gradients).
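Example
A sketch of the intended call pattern; ReverseUpdate must directly follow the UpdateParameters call it undoes:
// Apply one AdaDelta step...
Vector<double> updated = optimizer.UpdateParameters(parameters, gradient);

// ...then immediately reverse it while the optimizer's accumulated state is still fresh.
Vector<double> restored = optimizer.ReverseUpdate(updated, gradient);
// restored should match the original parameters up to floating-point error.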
Serialize()
Serializes the AdaDelta optimizer to a byte array.
public override byte[] Serialize()
Returns
- byte[]
A byte array representing the serialized optimizer.
Remarks
This method converts the optimizer's state, including its base class state and options, into a byte array that can be stored or transmitted.
For Beginners: This is like packing up the optimizer into a compact form.
Imagine you're packing a suitcase:
- You pack the basic stuff (base class data)
- You write down how much basic stuff you packed
- You pack your special AdaDelta stuff (options)
This packed form can be saved or sent somewhere else, and later unpacked to recreate the optimizer exactly as it was.
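Example
A minimal save/restore sketch using Serialize() together with Deserialize(byte[]):
// Pack the optimizer's state and options into bytes and save them.
byte[] data = optimizer.Serialize();
System.IO.File.WriteAllBytes("adadelta-optimizer.bin", data);

// Later: load the bytes and restore the state into a compatible optimizer instance.
byte[] loaded = System.IO.File.ReadAllBytes("adadelta-optimizer.bin");
optimizer.Deserialize(loaded);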
UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput>, OptimizationStepData<T, TInput, TOutput>)
Updates the adaptive parameters of the AdaDelta optimizer.
protected override void UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput> currentStepData, OptimizationStepData<T, TInput, TOutput> previousStepData)
Parameters
currentStepData (OptimizationStepData<T, TInput, TOutput>): The optimization step data for the current iteration.
previousStepData (OptimizationStepData<T, TInput, TOutput>): The optimization step data for the previous iteration.
Remarks
This method updates the adaptive parameters of the AdaDelta optimizer, specifically the rho value if adaptive rho is enabled in the options.
For Beginners: This method adjusts how the optimizer learns over time.
If adaptive rho is turned on:
- If the current solution is better than the previous one, it slightly increases rho
- If the current solution is worse, it slightly decreases rho
Rho controls how much the optimizer remembers from past steps. Adjusting it helps the optimizer adapt to the current state of learning, potentially making it more efficient.
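Example
A rough sketch of the kind of adjustment described above. The member names (UseAdaptiveRho, Rho, FitnessScore) and the adjustment factors are assumptions, not the library's actual values:
// Conceptual illustration only; assumes a higher fitness score means a better solution.
if (_options.UseAdaptiveRho)
{
    bool improved = currentStepData.FitnessScore > previousStepData.FitnessScore;
    double rho = _options.Rho;
    _options.Rho = improved
        ? Math.Min(rho * 1.05, 0.999)  // remember more of the past
        : Math.Max(rho * 0.95, 0.5);   // react faster to new gradients
}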
UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput>)
Updates the optimizer options.
protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)
Parameters
options (OptimizationAlgorithmOptions<T, TInput, TOutput>): The new options to set.
Remarks
This method updates the optimizer's options with new settings. It ensures that only AdaDeltaOptimizerOptions are used with this optimizer.
For Beginners: This is like changing the settings on your learning assistant.
You can use this to adjust how the optimizer works, but you need to make sure you're using the right type of settings (AdaDeltaOptimizerOptions). If you try to use the wrong type of settings, it will give you an error message.
Exceptions
- ArgumentException
Thrown when the provided options are not of type AdaDeltaOptimizerOptions.
UpdateParameters(Vector<T>, Vector<T>)
Updates a vector of parameters using the AdaDelta optimization algorithm.
public override Vector<T> UpdateParameters(Vector<T> parameters, Vector<T> gradient)
Parameters
parameters (Vector<T>): The current parameter vector to be updated.
gradient (Vector<T>): The gradient vector corresponding to the parameters.
Returns
- Vector<T>
The updated parameter vector.
Remarks
This method implements the AdaDelta update rule by maintaining exponential moving averages of both squared gradients and squared updates. This allows AdaDelta to adapt the learning rate without requiring an explicit learning rate parameter.
For Beginners: AdaDelta automatically adjusts learning rates by remembering both how gradients have changed (squared gradients) and how parameters have been updated (squared updates). This makes it largely learning-rate-free, adapting automatically to the scale of the problem.
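Example
To make the rule concrete, here is a self-contained sketch of one AdaDelta step over plain double arrays. It illustrates the standard algorithm, not the library's exact internals:
// accumGrad and accumUpdate start at zero and persist across calls.
static void AdaDeltaStep(double[] parameters, double[] gradient,
                         double[] accumGrad, double[] accumUpdate,
                         double rho = 0.95, double epsilon = 1e-6)
{
    for (int i = 0; i < parameters.Length; i++)
    {
        // Decaying average of squared gradients.
        accumGrad[i] = rho * accumGrad[i] + (1 - rho) * gradient[i] * gradient[i];

        // Adaptive step size: RMS of past updates divided by RMS of gradients.
        double update = -Math.Sqrt(accumUpdate[i] + epsilon)
                        / Math.Sqrt(accumGrad[i] + epsilon) * gradient[i];

        // Decaying average of squared updates, then apply the step.
        accumUpdate[i] = rho * accumUpdate[i] + (1 - rho) * update * update;
        parameters[i] += update;
    }
}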
UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)
Updates parameters on GPU using the AdaDelta optimization algorithm.
public override void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)
Parameters
parameters (IGpuBuffer): The current parameter tensor on GPU.
gradients (IGpuBuffer): The gradient tensor on GPU.
parameterCount (int): The number of parameters to update.
backend (IDirectGpuBackend): The GPU backend to use for the update.
Remarks
This method performs GPU-resident AdaDelta updates without CPU synchronization. All tensors remain on GPU throughout the update process.
UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)
Updates the current solution using the AdaDelta update rule.
protected override IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)
Parameters
currentSolution (IFullModel<T, TInput, TOutput>): The current solution (model parameters).
gradient (Vector<T>): The computed gradient for the current solution.
Returns
- IFullModel<T, TInput, TOutput>
A new solution with updated parameters.
Remarks
This method applies the AdaDelta update rule to each parameter of the current solution. It uses accumulated squared gradients and updates to compute adaptive learning rates for each parameter.
For Beginners: This is where the actual learning happens for each part of the model.
For each parameter in the model:
- It remembers how much this parameter has changed recently (accumulated squared gradients)
- It calculates how much to change the parameter this time (update)
- It remembers how big these changes have been (accumulated squared updates)
- It applies the change to the parameter
This process helps the model learn more efficiently by adjusting bigger for parameters that need more change and smaller for those that need less change.
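Example
Conceptually, this override delegates the vector math to UpdateParameters(Vector<T>, Vector<T>) and wraps the result in a new model. A sketch; the IFullModel member names GetParameters and WithParameters are assumptions about the interface:
protected override IFullModel<T, TInput, TOutput> UpdateSolution(
    IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)
{
    // Hypothetical IFullModel members; the real interface may differ.
    Vector<T> parameters = currentSolution.GetParameters();
    Vector<T> updated = UpdateParameters(parameters, gradient);
    return currentSolution.WithParameters(updated);
}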