Class AdaMaxOptimizer<T, TInput, TOutput>
Namespace: AiDotNet.Optimizers
Assembly: AiDotNet.dll
Represents an AdaMax optimizer, an extension of Adam that uses the infinity norm.
public class AdaMaxOptimizer<T, TInput, TOutput> : GradientBasedOptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer
Type Parameters
- T: The numeric type used for calculations, typically float or double.
- TInput: The type of the input data (for example, a feature matrix).
- TOutput: The type of the output data (for example, a target vector).
- Inheritance
OptimizerBase<T, TInput, TOutput> → GradientBasedOptimizerBase<T, TInput, TOutput> → AdaMaxOptimizer<T, TInput, TOutput>
- Implements
IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer
Remarks
AdaMax is an adaptive learning rate optimization algorithm that extends the Adam optimizer. It uses the infinity norm to update parameters, which can make it more robust in certain scenarios.
For Beginners: AdaMax is like a smart learning assistant that adjusts its learning speed for each piece of information it's trying to learn. It's particularly good at handling different scales of information without getting confused.
Key features:
- Adapts the learning rate for each parameter
- Uses the maximum (infinity norm) of past gradients, which can be more stable
- Good for problems where the gradients can be sparse or have different scales
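A minimal quick-start sketch is shown below. The concrete type arguments (double, Matrix<double>, Vector<double>) and the myModel variable are placeholders for illustration; substitute whatever model and data types your code actually uses.
// Sketch: create an AdaMax optimizer for an existing model, using default options.
// `myModel` is assumed to be an IFullModel<double, Matrix<double>, Vector<double>> built elsewhere.
var optimizer = new AdaMaxOptimizer<double, Matrix<double>, Vector<double>>(myModel);
// The optimizer can now run a full optimization (see Optimize below)
// or apply individual parameter updates (see UpdateParameters below).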
Constructors
AdaMaxOptimizer(IFullModel<T, TInput, TOutput>, AdaMaxOptimizerOptions<T, TInput, TOutput>?, IEngine?)
Initializes a new instance of the AdaMaxOptimizer class.
public AdaMaxOptimizer(IFullModel<T, TInput, TOutput> model, AdaMaxOptimizerOptions<T, TInput, TOutput>? options = null, IEngine? engine = null)
Parameters
- model (IFullModel<T, TInput, TOutput>): The model to optimize.
- options (AdaMaxOptimizerOptions<T, TInput, TOutput>): The options for configuring the AdaMax optimizer.
- engine (IEngine)
Remarks
This constructor sets up the AdaMax optimizer with the specified options and components. If no options are provided, it uses default AdaMaxOptimizerOptions.
For Beginners: This is like setting up your smart learning assistant with specific instructions.
You can customize:
- How fast it learns (learning rate)
- How it remembers past information (beta parameters)
- How long it should try to learn (max iterations)
- And many other aspects of its learning process
If you don't provide custom settings, it will use default settings that work well in many situations.
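As a sketch, configuring the optimizer might look like the following. The option property names in the comments (LearningRate, Beta1, Beta2, MaxIterations) are assumptions based on the settings described above, not verified members of AdaMaxOptimizerOptions; check that type's documentation for the actual names, and note that the parameterless options constructor is also assumed.
// Illustrative sketch: configure the optimizer before constructing it.
var options = new AdaMaxOptimizerOptions<double, Matrix<double>, Vector<double>>
{
    // LearningRate = 0.002,   // assumed property name: how fast it learns
    // Beta1 = 0.9,            // assumed property name: how strongly past gradients are remembered
    // Beta2 = 0.999,          // assumed property name: how strongly past gradient magnitudes are remembered
    // MaxIterations = 1000    // assumed property name: how long it should try to learn
};
var optimizer = new AdaMaxOptimizer<double, Matrix<double>, Vector<double>>(myModel, options);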
Methods
Deserialize(byte[])
Restores the optimizer's state from a byte array created by the Serialize method.
public override void Deserialize(byte[] data)
Parameters
- data (byte[]): The byte array containing the serialized optimizer state.
Remarks
This method reconstructs the optimizer's state, including its options and internal counters, from a binary format created by the Serialize method.
For Beginners: This method is like rebuilding your learning assistant's brain from a saved picture.
Imagine you have a robot helper that you previously "photographed" (serialized):
- You give it the "photograph" (byte array)
- It reads the photograph piece by piece:
- First, it rebuilds its basic knowledge (base data)
- Then, it sets up its specific AdaMax settings (options)
- Finally, it remembers how long it has been learning (time step)
- If anything goes wrong while reading the settings, it lets you know
After this process, your robot helper is back to exactly the same state it was in when you took the "photograph". This is useful for:
- Continuing a learning session that was paused
- Setting up multiple identical helpers
- Recovering from a backup if something goes wrong
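For example, restoring a previously saved state from disk (the file path is only an example):
// Sketch: rebuild the optimizer from bytes produced earlier by Serialize().
byte[] savedState = System.IO.File.ReadAllBytes("adamax-optimizer.bin");
optimizer.Deserialize(savedState);  // restores the options and internal counters, including the time step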
Exceptions
- InvalidOperationException
Thrown when deserialization of optimizer options fails.
DisposeGpuState()
Disposes GPU-allocated optimizer state.
public override void DisposeGpuState()
Remarks
For Beginners: The base implementation disposes _gpuState if set. Derived classes with multiple state buffers should override.
GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)
Generates a unique key for caching gradients specific to the AdaMax optimizer.
protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)
Parameters
- model (IFullModel<T, TInput, TOutput>): The current model being optimized.
- X (TInput): The input data matrix.
- y (TOutput): The target values vector.
Returns
- string
A string that uniquely identifies the gradient for the given model, data, and optimizer state.
Remarks
This method creates a unique identifier for caching gradients. It extends the base gradient cache key with AdaMax-specific parameters to ensure that cached gradients are only reused when all relevant conditions are identical.
For Beginners: This method creates a special label for storing and retrieving calculated gradients.
Imagine you're solving a math problem:
- The "base key" is like writing down the problem you're solving
- Adding "AdaMax" tells us we're using this specific method to solve it
- Including Beta1, Beta2, and t (the time step) is like noting which specific tools you're using and what stage you're at
This helps us quickly find the right answer if we've solved a very similar problem before, saving time and effort.
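As a purely hypothetical illustration of the idea (not the library's actual key format), the key conceptually combines the base key with the AdaMax-specific state:
// Hypothetical illustration only - the real key format is an internal library detail.
string baseKey = "model42_datasetA";   // whatever the base class produced for this model and data
double beta1 = 0.9, beta2 = 0.999;     // the optimizer's current beta settings
int t = 17;                            // the current time step
string cacheKey = $"{baseKey}_AdaMax_{beta1}_{beta2}_{t}";
// A cached gradient is reused only when every part of this key matches.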
GetOptions()
Gets the current options of the AdaMax optimizer.
public override OptimizationAlgorithmOptions<T, TInput, TOutput> GetOptions()
Returns
- OptimizationAlgorithmOptions<T, TInput, TOutput>
The current AdaMaxOptimizerOptions.
Remarks
This method returns the current configuration options of the AdaMax optimizer.
For Beginners: This method lets you see the current settings of your learning assistant.
It's like checking the current settings on your study robot:
- You can see how fast it's set to work (learning rate)
- How much it remembers from past lessons (beta parameters)
- How long it's supposed to study for (max iterations)
This is useful if you want to know exactly how your optimizer is currently configured.
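For example (the cast assumes the same type arguments the optimizer was constructed with; the return is declared as the base options type but is documented to be the AdaMax-specific options):
var currentOptions = (AdaMaxOptimizerOptions<double, Matrix<double>, Vector<double>>)optimizer.GetOptions();
// Inspect or log the settings here (learning rate, beta parameters, max iterations, ...).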
InitializeAdaptiveParameters()
Initializes the adaptive parameters for the AdaMax optimizer.
protected override void InitializeAdaptiveParameters()
Remarks
This method sets up the initial state of the optimizer, including the learning rate and time step.
For Beginners: This is like resetting your learning assistant to its starting point.
It does two main things:
- Sets the initial learning speed (learning rate) based on the options you provided
- Resets the time step to 0, which is like starting a new learning session
This method is called when you first create the optimizer and can be called again if you want to restart the learning process.
InitializeGpuState(int, IDirectGpuBackend)
Initializes optimizer state on the GPU for a given parameter count.
public override void InitializeGpuState(int parameterCount, IDirectGpuBackend backend)
Parameters
- parameterCount (int): Number of parameters to initialize state for.
- backend (IDirectGpuBackend): The GPU backend to use for memory allocation.
Remarks
For Beginners: The base implementation does nothing. Derived classes that maintain optimizer state (like momentum or adaptive learning rates) override this.
Optimize(OptimizationInputData<T, TInput, TOutput>)
Performs the optimization process using the AdaMax algorithm.
public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)
Parameters
- inputData (OptimizationInputData<T, TInput, TOutput>): The input data for optimization, including training data and targets.
Returns
- OptimizationResult<T, TInput, TOutput>
The result of the optimization process, including the best solution found.
Remarks
This method implements the core optimization loop of the AdaMax algorithm. It iteratively improves the solution by calculating gradients, updating parameters, and evaluating the current solution.
For Beginners: This method is like a smart learning process that tries to find the best answer.
Here's what it does:
- Starts with a random guess (solution)
- Repeatedly tries to improve the guess:
- Calculates how to change the guess to make it better (gradient)
- Updates the guess based on this information
- Checks if the new guess is the best one so far
- Stops when it has tried a certain number of times or when the improvement becomes very small
It's like playing a game where you're trying to find a hidden treasure, and after each step, you get a hint about which direction to go next.
DataLoader Integration: This method uses the DataLoader API for efficient batch processing. It creates a batcher using CreateBatcher(OptimizationInputData<T, TInput, TOutput>, int) and notifies the sampler of epoch starts using NotifyEpochStart(int).
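A sketch of a typical call; how the inputData object is populated depends on the members of OptimizationInputData<T, TInput, TOutput>, which are documented on that type and not assumed here:
// `inputData` is an OptimizationInputData<double, Matrix<double>, Vector<double>>
// populated with your training data and targets.
OptimizationResult<double, Matrix<double>, Vector<double>> result = optimizer.Optimize(inputData);
// `result` describes the optimization run, including the best solution found.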
ReverseUpdate(Vector<T>, Vector<T>)
Reverses an AdaMax gradient update to recover original parameters.
public override Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)
Parameters
- updatedParameters (Vector<T>): The parameters after the AdaMax update.
- appliedGradients (Vector<T>): The gradients that were applied.
Returns
- Vector<T>
The original parameters before the update.
Remarks
AdaMax's reverse update requires the optimizer's internal state (_m, _u, _t) from the forward pass. This method must be called immediately after UpdateParameters while the state is fresh. It recalculates the bias-corrected learning rate and the infinity-norm-scaled update.
For Beginners: This calculates where parameters were before an AdaMax update. AdaMax uses the maximum gradient magnitude to scale updates, so we need to remember those maximum values (_u) and the momentum (_m) to reverse the step accurately.
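A sketch of the intended call pattern, assuming parameters and gradient are Vector<double> values you already have:
// Forward step: apply one AdaMax update.
Vector<double> updated = optimizer.UpdateParameters(parameters, gradient);

// Reverse it immediately, while the internal _m, _u and _t state is still fresh.
Vector<double> original = optimizer.ReverseUpdate(updated, gradient);
// `original` should match the pre-update parameter values (up to floating-point precision).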
Serialize()
Converts the current state of the optimizer into a byte array for storage or transmission.
public override byte[] Serialize()
Returns
- byte[]
A byte array representing the serialized state of the optimizer.
Remarks
This method saves the current state of the optimizer, including its options and internal counters, into a compact binary format.
For Beginners: This method is like taking a snapshot of your learning assistant's brain.
Imagine you could:
- Take a picture of everything your study robot knows and how it's set up
- Turn that picture into a long string of numbers
- Save those numbers so you can perfectly recreate the robot's state later
This is useful for:
- Saving your progress so you can continue later
- Sharing your optimizer's exact state with others
- Creating backups in case something goes wrong
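For example, saving the state to disk (the file name is only an example):
// Sketch: snapshot the optimizer's current state and store it on disk.
byte[] state = optimizer.Serialize();
System.IO.File.WriteAllBytes("adamax-optimizer.bin", state);
// Later, Deserialize(state) restores the optimizer to exactly this state.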
UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput>, OptimizationStepData<T, TInput, TOutput>)
Updates the adaptive parameters of the optimizer based on the current and previous optimization steps.
protected override void UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput> currentStepData, OptimizationStepData<T, TInput, TOutput> previousStepData)
Parameters
- currentStepData (OptimizationStepData<T, TInput, TOutput>): Data from the current optimization step.
- previousStepData (OptimizationStepData<T, TInput, TOutput>): Data from the previous optimization step.
Remarks
This method adjusts the learning rate based on the performance of the current solution compared to the previous one. If adaptive learning rate is enabled, it increases or decreases the learning rate accordingly.
For Beginners: This method adjusts how big steps we take in our learning process.
It's like learning to ride a bike:
- If you're doing better (not falling as much), you might try to pedal a bit faster (increase learning rate)
- If you're struggling more, you might slow down a bit (decrease learning rate)
- There's a limit to how fast or slow you can go (min and max learning rates)
This helps the optimizer to learn efficiently: not too slow, but also not so fast that it becomes unstable.
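The exact adjustment logic and constants are internal to the library, but the idea described above can be sketched roughly like this (illustrative values and factors only, not the library's actual code):
double learningRate = 0.002;                              // current learning rate
double minRate = 1e-6, maxRate = 0.1;                     // assumed lower and upper bounds
double currentFitness = 0.42, previousFitness = 0.50;     // example loss values (lower is better)

bool improved = currentFitness < previousFitness;
learningRate = improved
    ? Math.Min(learningRate * 1.05, maxRate)              // doing better: speed up slightly
    : Math.Max(learningRate * 0.95, minRate);             // struggling: slow down slightly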
UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput>)
Updates the optimizer options with new AdaMax-specific options.
protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)
Parameters
- options (OptimizationAlgorithmOptions<T, TInput, TOutput>): The new options to set.
Remarks
This method updates the optimizer's configuration with new options. It ensures that only valid AdaMax-specific options are applied.
For Beginners: This method is like updating the settings on your learning assistant.
Imagine you have a robot helper for studying:
- You can give it new instructions on how to help you (new options)
- But you need to make sure you're giving it the right kind of instructions (AdaMax-specific)
- If you try to give it instructions for a different type of helper, it will let you know there's a mistake
This ensures that your optimizer always has the correct and up-to-date settings to work with.
Exceptions
- ArgumentException
Thrown when the provided options are not of type AdaMaxOptimizerOptions.
UpdateParameters(Vector<T>, Vector<T>)
Updates a vector of parameters using the AdaMax optimization algorithm.
public override Vector<T> UpdateParameters(Vector<T> parameters, Vector<T> gradient)
Parameters
- parameters (Vector<T>): The current parameter vector to be updated.
- gradient (Vector<T>): The gradient vector corresponding to the parameters.
Returns
- Vector<T>
The updated parameter vector.
Remarks
AdaMax is a variant of Adam based on the infinity norm, which can be more stable than Adam for some problems. It adapts the learning rate using the maximum absolute value of gradients.
For Beginners: AdaMax adjusts step sizes by tracking the largest gradient magnitude seen so far for each parameter. This makes it robust to large, occasional gradient spikes.
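The per-parameter arithmetic follows the standard AdaMax rule from the Adam paper. The scalar sketch below illustrates one step for a single parameter; the library applies the same idea element-wise across the parameter vector, and the constants shown are common defaults rather than necessarily the library's:
// Illustrative scalar version of one AdaMax step (not the library's internal code).
double beta1 = 0.9, beta2 = 0.999, learningRate = 0.002;
double m = 0.0, u = 0.0;   // first-moment and infinity-norm accumulators
int t = 0;                 // time step

double theta = 1.5;        // one parameter value
double g = -0.3;           // its gradient

t++;
m = beta1 * m + (1 - beta1) * g;                                // exponentially decayed gradient momentum
u = Math.Max(beta2 * u, Math.Abs(g));                           // largest (decayed) gradient magnitude seen so far
double step = learningRate / (1 - Math.Pow(beta1, t)) * m / u;  // bias-corrected, infinity-norm-scaled step
theta -= step;                                                  // updated parameter
// Real implementations typically add a tiny epsilon to `u` to avoid division by zero.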
UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)
Updates parameters on the GPU using optimizer-specific GPU kernels.
public override void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)
Parameters
- parameters (IGpuBuffer): GPU buffer containing parameters to update (modified in-place).
- gradients (IGpuBuffer): GPU buffer containing gradients.
- parameterCount (int): Number of parameters.
- backend (IDirectGpuBackend): The GPU backend to use for execution.
Remarks
For Beginners: The base implementation throws since there's no generic GPU kernel. Derived classes that support GPU updates override this method.
UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)
Updates the current solution using the AdaMax update rule.
protected override IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)
Parameters
- currentSolution (IFullModel<T, TInput, TOutput>): The current solution being optimized.
- gradient (Vector<T>): The calculated gradient for the current solution.
Returns
- IFullModel<T, TInput, TOutput>
A new solution with updated parameters.
Remarks
This method applies the AdaMax update rule to adjust the parameters of the current solution. It uses moment estimates and the infinity norm to adapt the learning rate for each parameter.
For Beginners: This method fine-tunes our current guess to make it better.
Imagine you're adjusting the volume and bass on a stereo:
- The current solution is like the current settings
- The gradient tells us how to adjust each knob
- We don't just follow the gradient directly; we use some clever math (AdaMax rules) to decide how much to turn each knob
- This clever math helps us avoid overreacting to any single piece of information
The result is a new, slightly improved set of stereo settings (or in our case, a better solution).