Class AdaMaxOptimizer<T, TInput, TOutput>

Namespace
AiDotNet.Optimizers
Assembly
AiDotNet.dll

Represents an AdaMax optimizer, an extension of Adam that uses the infinity norm.

public class AdaMaxOptimizer<T, TInput, TOutput> : GradientBasedOptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer

Type Parameters

T

The numeric type used for calculations, typically float or double.

TInput

The type of the input data used to train and evaluate the model.

TOutput

The type of the output data produced by the model.

Inheritance
OptimizerBase<T, TInput, TOutput>
GradientBasedOptimizerBase<T, TInput, TOutput>
AdaMaxOptimizer<T, TInput, TOutput>
Implements
IGradientBasedOptimizer<T, TInput, TOutput>
IOptimizer<T, TInput, TOutput>
IModelSerializer

Remarks

AdaMax is an adaptive learning rate optimization algorithm that extends the Adam optimizer. It uses the infinity norm to update parameters, which can make it more robust in certain scenarios.

For Beginners: AdaMax is like a smart learning assistant that adjusts its learning speed for each piece of information it's trying to learn. It's particularly good at handling different scales of information without getting confused.

Key features:

  • Adapts the learning rate for each parameter
  • Uses the maximum (infinity norm) of past gradients, which can be more stable
  • Good for problems where the gradients can be sparse or have different scales
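
To make the "infinity norm" idea concrete, here is a minimal sketch of the per-parameter update in plain C#. It only illustrates the math described above; it is not the library's internal implementation, and the default values for learningRate, beta1, beta2, and epsilon are common conventions rather than guaranteed library defaults.

using System;

// Illustrative AdaMax step over double-precision arrays (not library code).
// m: running average of gradients (first moment)
// u: running infinity norm (largest recent gradient magnitude) per parameter
// t: the 1-based time step of this update
static void AdaMaxStep(double[] parameters, double[] gradient, double[] m, double[] u, int t,
                       double learningRate = 0.002, double beta1 = 0.9, double beta2 = 0.999,
                       double epsilon = 1e-8)
{
    double stepSize = learningRate / (1 - Math.Pow(beta1, t)); // bias-correct the first moment
    for (int i = 0; i < parameters.Length; i++)
    {
        m[i] = beta1 * m[i] + (1 - beta1) * gradient[i];       // momentum-style average of gradients
        u[i] = Math.Max(beta2 * u[i], Math.Abs(gradient[i]));  // infinity norm instead of squared gradients
        parameters[i] -= stepSize * m[i] / (u[i] + epsilon);   // epsilon guards against division by zero
    }
}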

Constructors

AdaMaxOptimizer(IFullModel<T, TInput, TOutput>, AdaMaxOptimizerOptions<T, TInput, TOutput>?, IEngine?)

Initializes a new instance of the AdaMaxOptimizer class.

public AdaMaxOptimizer(IFullModel<T, TInput, TOutput> model, AdaMaxOptimizerOptions<T, TInput, TOutput>? options = null, IEngine? engine = null)

Parameters

model IFullModel<T, TInput, TOutput>

The model to optimize.

options AdaMaxOptimizerOptions<T, TInput, TOutput>

The options for configuring the AdaMax optimizer. If not provided, default options are used.

engine IEngine

The engine to use for computations. Optional.

Remarks

This constructor sets up the AdaMax optimizer with the specified options and components. If no options are provided, it uses default AdaMaxOptimizerOptions.

For Beginners: This is like setting up your smart learning assistant with specific instructions.

You can customize:

  • How fast it learns (learning rate)
  • How it remembers past information (beta parameters)
  • How long it should try to learn (max iterations)
  • And many other aspects of its learning process

If you don't provide custom settings, it will use default settings that work well in many situations.
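
A typical setup might look like the sketch below. The option property names (LearningRate, Beta1, Beta2, MaxIterations) and the Matrix<double>/Vector<double> type arguments are assumptions used for illustration; check AdaMaxOptimizerOptions and your model's input/output types for the exact names.

// 'model' is an existing IFullModel<double, Matrix<double>, Vector<double>> to optimize.
var options = new AdaMaxOptimizerOptions<double, Matrix<double>, Vector<double>>
{
    LearningRate = 0.002,   // property names here are illustrative assumptions
    Beta1 = 0.9,
    Beta2 = 0.999,
    MaxIterations = 1000
};
var optimizer = new AdaMaxOptimizer<double, Matrix<double>, Vector<double>>(model, options);

// Or rely on the defaults:
var defaultOptimizer = new AdaMaxOptimizer<double, Matrix<double>, Vector<double>>(model);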

Methods

Deserialize(byte[])

Restores the optimizer's state from a byte array created by the Serialize method.

public override void Deserialize(byte[] data)

Parameters

data byte[]

The byte array containing the serialized optimizer state.

Remarks

This method reconstructs the optimizer's state, including its options and internal counters, from a binary format created by the Serialize method.

For Beginners: This method is like rebuilding your learning assistant's brain from a saved picture.

Imagine you have a robot helper that you previously "photographed" (serialized):

  1. You give it the "photograph" (byte array)
  2. It reads the photograph piece by piece:
    • First, it rebuilds its basic knowledge (base data)
    • Then, it sets up its specific AdaMax settings (options)
    • Finally, it remembers how long it has been learning (time step)
  3. If anything goes wrong while reading the settings, it lets you know

After this process, your robot helper is back to exactly the same state it was in when you took the "photograph". This is useful for:

  • Continuing a learning session that was paused
  • Setting up multiple identical helpers
  • Recovering from a backup if something goes wrong
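
In code, restoring a state is a simple round trip with Serialize. In the sketch below, optimizer and restoredOptimizer are assumed to be already constructed AdaMaxOptimizer instances with matching type arguments.

byte[] snapshot = optimizer.Serialize();   // take the "photograph" of the current state
// ... later, possibly in another process or after loading the bytes from disk ...
restoredOptimizer.Deserialize(snapshot);   // rebuild exactly the same optimizer state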

Exceptions

InvalidOperationException

Thrown when deserialization of optimizer options fails.

DisposeGpuState()

Disposes GPU-allocated optimizer state.

public override void DisposeGpuState()

Remarks

For Beginners: The base implementation disposes _gpuState if set. Derived classes with multiple state buffers should override.

GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)

Generates a unique key for caching gradients specific to the AdaMax optimizer.

protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)

Parameters

model IFullModel<T, TInput, TOutput>

The current model being optimized.

X TInput

The input data matrix.

y TOutput

The target values vector.

Returns

string

A string that uniquely identifies the gradient for the given model, data, and optimizer state.

Remarks

This method creates a unique identifier for caching gradients. It extends the base gradient cache key with AdaMax-specific parameters to ensure that cached gradients are only reused when all relevant conditions are identical.

For Beginners: This method creates a special label for storing and retrieving calculated gradients.

Imagine you're solving a math problem:

  • The "base key" is like writing down the problem you're solving
  • Adding "AdaMax" tells us we're using this specific method to solve it
  • Including Beta1, Beta2, and t (the time step) is like noting which specific tools you're using and how far along you are

This helps us quickly find the right answer if we've solved a very similar problem before, saving time and effort.
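
Conceptually, the key is just the base key with the AdaMax-specific state appended, roughly as in this sketch (the exact format is an internal detail of the library and may differ):

// Illustrative only; not the library's actual key format.
string baseKey = base.GenerateGradientCacheKey(model, X, y);
string cacheKey = $"{baseKey}_AdaMax_{beta1}_{beta2}_{t}";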

GetOptions()

Gets the current options of the AdaMax optimizer.

public override OptimizationAlgorithmOptions<T, TInput, TOutput> GetOptions()

Returns

OptimizationAlgorithmOptions<T, TInput, TOutput>

The current AdaMaxOptimizerOptions.

Remarks

This method returns the current configuration options of the AdaMax optimizer.

For Beginners: This method lets you see the current settings of your learning assistant.

It's like checking the current settings on your study robot:

  • You can see how fast it's set to work (learning rate)
  • How much it remembers from past lessons (beta parameters)
  • How long it's supposed to study for (max iterations)

This is useful if you want to know exactly how your optimizer is currently configured.
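
Because the declared return type is the base OptimizationAlgorithmOptions, cast the result to read AdaMax-specific settings, as in this sketch (the LearningRate property name is an assumption):

var options = optimizer.GetOptions();
if (options is AdaMaxOptimizerOptions<double, Matrix<double>, Vector<double>> adaMaxOptions)
{
    // Property name is illustrative; consult AdaMaxOptimizerOptions for the actual members.
    Console.WriteLine($"Learning rate: {adaMaxOptions.LearningRate}");
}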

InitializeAdaptiveParameters()

Initializes the adaptive parameters for the AdaMax optimizer.

protected override void InitializeAdaptiveParameters()

Remarks

This method sets up the initial state of the optimizer, including the learning rate and time step.

For Beginners: This is like resetting your learning assistant to its starting point.

It does two main things:

  1. Sets the initial learning speed (learning rate) based on the options you provided
  2. Resets the time step to 0, which is like starting a new learning session

This method is called when you first create the optimizer and can be called again if you want to restart the learning process.

InitializeGpuState(int, IDirectGpuBackend)

Initializes optimizer state on the GPU for a given parameter count.

public override void InitializeGpuState(int parameterCount, IDirectGpuBackend backend)

Parameters

parameterCount int

Number of parameters to initialize state for.

backend IDirectGpuBackend

The GPU backend to use for memory allocation.

Remarks

For Beginners: The base implementation does nothing. Derived classes that maintain optimizer state (like momentum or adaptive learning rates) override this.

Optimize(OptimizationInputData<T, TInput, TOutput>)

Performs the optimization process using the AdaMax algorithm.

public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)

Parameters

inputData OptimizationInputData<T, TInput, TOutput>

The input data for optimization, including training data and targets.

Returns

OptimizationResult<T, TInput, TOutput>

The result of the optimization process, including the best solution found.

Remarks

This method implements the core optimization loop of the AdaMax algorithm. It iteratively improves the solution by calculating gradients, updating parameters, and evaluating the current solution.

For Beginners: This method is like a smart learning process that tries to find the best answer.

Here's what it does:

  1. Starts with a random guess (solution)
  2. Repeatedly tries to improve the guess:
    • Calculates how to change the guess to make it better (gradient)
    • Updates the guess based on this information
    • Checks if the new guess is the best one so far
  3. Stops when it has tried a certain number of times or when the improvement becomes very small

It's like playing a game where you're trying to find a hidden treasure, and after each step, you get a hint about which direction to go next.

DataLoader Integration: This method uses the DataLoader API for efficient batch processing. It creates a batcher using CreateBatcher(OptimizationInputData<T, TInput, TOutput>, int) and notifies the sampler of epoch starts using NotifyEpochStart(int).
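
A typical call might look like the following sketch. The OptimizationInputData property names (XTrain, YTrain) and the result's BestSolution property are assumptions for illustration; check the actual types for the exact member names.

var inputData = new OptimizationInputData<double, Matrix<double>, Vector<double>>
{
    XTrain = trainingInputs,    // hypothetical property names used for illustration
    YTrain = trainingTargets
};

var result = optimizer.Optimize(inputData);
var bestModel = result.BestSolution;   // also an assumed property name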

ReverseUpdate(Vector<T>, Vector<T>)

Reverses an AdaMax gradient update to recover original parameters.

public override Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)

Parameters

updatedParameters Vector<T>

Parameters after AdaMax update

appliedGradients Vector<T>

The gradients that were applied

Returns

Vector<T>

Original parameters before the update

Remarks

AdaMax's reverse update requires the optimizer's internal state (_m, _u, _t) from the forward pass. This method must be called immediately after UpdateParameters while the state is fresh. It recalculates the bias-corrected learning rate and the infinity-norm-scaled update.

For Beginners: This calculates where parameters were before an AdaMax update. AdaMax uses the maximum gradient magnitude to scale updates, so we need to remember those maximum values (_u) and the momentum (_m) to reverse the step accurately.
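
In code, the reverse step must directly follow the forward step so that the internal state still corresponds to the update being undone:

Vector<double> updated = optimizer.UpdateParameters(parameters, gradient);
// Reverse immediately, while _m, _u, and _t still reflect this step.
Vector<double> original = optimizer.ReverseUpdate(updated, gradient);
// 'original' should now match 'parameters' up to floating-point rounding.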

Serialize()

Converts the current state of the optimizer into a byte array for storage or transmission.

public override byte[] Serialize()

Returns

byte[]

A byte array representing the serialized state of the optimizer.

Remarks

This method saves the current state of the optimizer, including its options and internal counters, into a compact binary format.

For Beginners: This method is like taking a snapshot of your learning assistant's brain.

Imagine you could:

  • Take a picture of everything your study robot knows and how it's set up
  • Turn that picture into a long string of numbers
  • Save those numbers so you can perfectly recreate the robot's state later

This is useful for:

  • Saving your progress so you can continue later
  • Sharing your optimizer's exact state with others
  • Creating backups in case something goes wrong
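
For example, a snapshot can be written to disk and loaded again in a later session (a minimal sketch; the file name is arbitrary and System.IO is assumed):

byte[] state = optimizer.Serialize();
File.WriteAllBytes("adamax-state.bin", state);   // save progress

// In a later session:
optimizer.Deserialize(File.ReadAllBytes("adamax-state.bin"));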

UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput>, OptimizationStepData<T, TInput, TOutput>)

Updates the adaptive parameters of the optimizer based on the current and previous optimization steps.

protected override void UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput> currentStepData, OptimizationStepData<T, TInput, TOutput> previousStepData)

Parameters

currentStepData OptimizationStepData<T, TInput, TOutput>

Data from the current optimization step.

previousStepData OptimizationStepData<T, TInput, TOutput>

Data from the previous optimization step.

Remarks

This method adjusts the learning rate based on the performance of the current solution compared to the previous one. If adaptive learning rate is enabled, it increases or decreases the learning rate accordingly.

For Beginners: This method adjusts how big steps we take in our learning process.

It's like learning to ride a bike:

  • If you're doing better (not falling as much), you might try to pedal a bit faster (increase learning rate)
  • If you're struggling more, you might slow down a bit (decrease learning rate)
  • There's a limit to how fast or slow you can go (min and max learning rates)

This helps the optimizer to learn efficiently: not too slow, but also not so fast that it becomes unstable.
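
The adjustment is roughly the following sketch (illustrative pseudologic; the actual factor and bound values come from the optimizer's adaptive learning rate options):

// Illustrative only; names are placeholders for the corresponding option values.
bool improved = currentStepIsBetterThanPrevious;   // however "better" is measured
learningRate *= improved ? increaseFactor : decreaseFactor;
learningRate = Math.Clamp(learningRate, minLearningRate, maxLearningRate);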

UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput>)

Updates the optimizer options with new AdaMax-specific options.

protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)

Parameters

options OptimizationAlgorithmOptions<T, TInput, TOutput>

The new options to set.

Remarks

This method updates the optimizer's configuration with new options. It ensures that only valid AdaMax-specific options are applied.

For Beginners: This method is like updating the settings on your learning assistant.

Imagine you have a robot helper for studying:

  • You can give it new instructions on how to help you (new options)
  • But you need to make sure you're giving it the right kind of instructions (AdaMax-specific)
  • If you try to give it instructions for a different type of helper, it will let you know there's a mistake

This ensures that your optimizer always has the correct and up-to-date settings to work with.

Exceptions

ArgumentException

Thrown when the provided options are not of type AdaMaxOptimizerOptions.
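
The validation typically follows this pattern (a sketch of the documented behavior, not the exact implementation; _options is an illustrative field name):

protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)
{
    if (options is AdaMaxOptimizerOptions<T, TInput, TOutput> adaMaxOptions)
    {
        _options = adaMaxOptions;   // accept only AdaMax-specific options
    }
    else
    {
        throw new ArgumentException("Options must be of type AdaMaxOptimizerOptions.", nameof(options));
    }
}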

UpdateParameters(Vector<T>, Vector<T>)

Updates a vector of parameters using the AdaMax optimization algorithm.

public override Vector<T> UpdateParameters(Vector<T> parameters, Vector<T> gradient)

Parameters

parameters Vector<T>

The current parameter vector to be updated.

gradient Vector<T>

The gradient vector corresponding to the parameters.

Returns

Vector<T>

The updated parameter vector.

Remarks

AdaMax is a variant of Adam based on the infinity norm, which can be more stable than Adam for some problems. It adapts the learning rate using the maximum absolute value of gradients.

For Beginners: AdaMax adjusts step sizes by tracking the largest gradient magnitude seen so far for each parameter. This makes it robust to large, occasional gradient spikes.

UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)

Updates parameters on the GPU using optimizer-specific GPU kernels.

public override void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)

Parameters

parameters IGpuBuffer

GPU buffer containing parameters to update (modified in-place).

gradients IGpuBuffer

GPU buffer containing gradients.

parameterCount int

Number of parameters.

backend IDirectGpuBackend

The GPU backend to use for execution.

Remarks

For Beginners: The base implementation throws since there's no generic GPU kernel. Derived classes that support GPU updates override this method.

UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)

Updates the current solution using the AdaMax update rule.

protected override IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)

Parameters

currentSolution IFullModel<T, TInput, TOutput>

The current solution being optimized.

gradient Vector<T>

The calculated gradient for the current solution.

Returns

IFullModel<T, TInput, TOutput>

A new solution with updated parameters.

Remarks

This method applies the AdaMax update rule to adjust the parameters of the current solution. It uses moment estimates and the infinity norm to adapt the learning rate for each parameter.

For Beginners: This method fine-tunes our current guess to make it better.

Imagine you're adjusting the volume and bass on a stereo:

  • The current solution is like the current settings
  • The gradient tells us how to adjust each knob
  • We don't just follow the gradient directly; we use some clever math (AdaMax rules) to decide how much to turn each knob
  • This clever math helps us avoid overreacting to any single piece of information

The result is a new, slightly improved set of stereo settings (or in our case, a better solution).