Class AdagradOptimizer<T, TInput, TOutput>

Namespace
AiDotNet.Optimizers
Assembly
AiDotNet.dll

Represents an Adagrad (Adaptive Gradient) optimizer for gradient-based optimization.

public class AdagradOptimizer<T, TInput, TOutput> : GradientBasedOptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer

Type Parameters

T

The numeric type used for calculations, typically float or double.

TInput

The type of the input data consumed by the model, such as a feature matrix.

TOutput

The type of the output data produced by the model, such as a target vector.
Inheritance
OptimizerBase<T, TInput, TOutput>
GradientBasedOptimizerBase<T, TInput, TOutput>
AdagradOptimizer<T, TInput, TOutput>
Implements
IGradientBasedOptimizer<T, TInput, TOutput>
IOptimizer<T, TInput, TOutput>
IModelSerializer

Remarks

The Adagrad optimizer adapts the learning rate for each parameter based on the historical gradients. It performs larger updates for infrequent parameters and smaller updates for frequent ones.

For Beginners: Adagrad is like a smart learning assistant that adjusts how much it learns for each piece of information based on how often it has seen similar information before.

  • It learns more from new or rare information
  • It learns less from common or frequently seen information
  • This helps it focus on the most important parts of what it's learning

This can be especially useful when some parts of your data are more important or occur less frequently.
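For example, under the update rule used by this class, the effective step size for a parameter is learning_rate / (sqrt(accumulatedSquaredGradient) + epsilon). With a base learning rate of 0.1, a parameter whose squared gradients have accumulated to 100 is updated at an effective rate of roughly 0.1 / 10 = 0.01, while a rarely-updated parameter whose accumulation is only 0.01 is updated at roughly 0.1 / 0.1 = 1.0.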

Constructors

AdagradOptimizer(IFullModel<T, TInput, TOutput>, AdagradOptimizerOptions<T, TInput, TOutput>?)

Initializes a new instance of the AdagradOptimizer class.

public AdagradOptimizer(IFullModel<T, TInput, TOutput> model, AdagradOptimizerOptions<T, TInput, TOutput>? options = null)

Parameters

model IFullModel<T, TInput, TOutput>

The model to optimize.

options AdagradOptimizerOptions<T, TInput, TOutput>

The options for configuring the Adagrad optimizer. If null, default options are used.

Remarks

This constructor sets up the Adagrad optimizer with the specified options and components. If no options are provided, it uses default AdagradOptimizerOptions.

For Beginners: This is like setting up your learning assistant with specific instructions.

You can customize how the assistant learns by passing an options object (for example, with a different starting learning rate or number of iterations), as in the sketch below. If you don't provide options, it uses sensible default settings.
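A minimal construction sketch. It assumes you already have an IFullModel<double, Matrix<double>, Vector<double>> instance named model, and that AdagradOptimizerOptions exposes settings such as InitialLearningRate and Epsilon (the exact property names may differ):

// Use the default Adagrad settings.
var optimizer = new AdagradOptimizer<double, Matrix<double>, Vector<double>>(model);

// Or configure the optimizer explicitly (the property names are illustrative assumptions).
var options = new AdagradOptimizerOptions<double, Matrix<double>, Vector<double>>
{
    InitialLearningRate = 0.01,
    Epsilon = 1e-8
};
var tunedOptimizer = new AdagradOptimizer<double, Matrix<double>, Vector<double>>(model, options);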

Properties

SupportsGpuUpdate

Gets whether this optimizer supports GPU-accelerated parameter updates.

public override bool SupportsGpuUpdate { get; }

Property Value

bool

Methods

Deserialize(byte[])

Deserializes the Adagrad optimizer from a byte array.

public override void Deserialize(byte[] data)

Parameters

data byte[]

The byte array containing the serialized optimizer state.

Remarks

This method reconstructs the state of the Adagrad optimizer from a byte array, including its base class state and specific options. It's used to restore a previously serialized optimizer state.

For Beginners: This is like recreating your learning assistant from a saved snapshot.

The process:

  1. Reads the basic information (for the parent class)
  2. Recreates the parent class state
  3. Reads and recreates the specific Adagrad settings

This allows you to continue using the optimizer from exactly where you left off, with all its learned information and settings intact.

Exceptions

InvalidOperationException

Thrown when deserialization of optimizer options fails.

DisposeGpuState()

Disposes GPU-allocated optimizer state.

public override void DisposeGpuState()

GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)

Generates a unique key for caching gradients based on the model, input data, and Adagrad-specific parameters.

protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)

Parameters

model IFullModel<T, TInput, TOutput>

The symbolic model.

X TInput

The input feature matrix.

y TOutput

The target vector.

Returns

string

A string representing the unique gradient cache key.

Remarks

This method creates a unique identifier for caching gradients. It combines the base cache key with Adagrad-specific parameters to ensure that cached gradients are only reused when all relevant factors are identical.

For Beginners: This is like creating a unique label for each set of calculations.

The label includes:

  • Information about the model and data (from the base class)
  • Specific settings of the Adagrad optimizer (initial learning rate and epsilon)

This helps the optimizer quickly find and reuse previous calculations when the same situation occurs again, which can save time and computational resources.
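A rough sketch of how such a key might be composed; the exact format is an implementation detail of AiDotNet, and the option property names below are assumptions:

// Hypothetical illustration: append Adagrad-specific settings to the base cache key
// so cached gradients are only reused when every relevant factor matches.
string baseKey = base.GenerateGradientCacheKey(model, X, y);
return $"{baseKey}_Adagrad_{_options.InitialLearningRate}_{_options.Epsilon}";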

GetOptions()

Retrieves the current options of the Adagrad optimizer.

public override OptimizationAlgorithmOptions<T, TInput, TOutput> GetOptions()

Returns

OptimizationAlgorithmOptions<T, TInput, TOutput>

The current AdagradOptimizerOptions, typed as the base OptimizationAlgorithmOptions<T, TInput, TOutput>.

Remarks

This method returns the current configuration options of the Adagrad optimizer.

For Beginners: This is like asking your learning assistant for its current instructions.

It allows you to check:

  • What learning rate the optimizer is using
  • How many iterations it will run
  • Other specific settings for the Adagrad method

This can be useful for understanding how the optimizer is currently set up or for saving its configuration.
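For example, the returned options can be cast back to the Adagrad-specific type to inspect them (the properties read below are illustrative assumptions):

var options = (AdagradOptimizerOptions<double, Matrix<double>, Vector<double>>)optimizer.GetOptions();
Console.WriteLine(options.InitialLearningRate); // hypothetical property
Console.WriteLine(options.MaxIterations);       // hypothetical property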

InitializeAdaptiveParameters()

Initializes the adaptive parameters for the Adagrad optimizer.

protected override void InitializeAdaptiveParameters()

Remarks

This method sets up the initial learning rate for the optimizer based on the options.

For Beginners: This is like setting the initial speed at which your assistant learns.

The learning rate determines how big the steps are when the optimizer is trying to find the best solution. A good initial learning rate helps the optimizer start its learning process effectively.
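Conceptually, this amounts to a one-line sketch along these lines (the member names are assumptions):

// Seed the working learning rate from the configured starting value.
CurrentLearningRate = _options.InitialLearningRate;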

InitializeGpuState(int, IDirectGpuBackend)

Initializes Adagrad optimizer state on the GPU.

public override void InitializeGpuState(int parameterCount, IDirectGpuBackend backend)

Parameters

parameterCount int

The number of parameters for which GPU state is allocated.

backend IDirectGpuBackend

The GPU backend used to allocate and manage the optimizer state.

Optimize(OptimizationInputData<T, TInput, TOutput>)

Performs the optimization process using the Adagrad algorithm.

public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)

Parameters

inputData OptimizationInputData<T, TInput, TOutput>

The input data for optimization.

Returns

OptimizationResult<T, TInput, TOutput>

The result of the optimization process.

Remarks

This method implements the main optimization loop of the Adagrad algorithm. It iteratively updates the solution based on calculated gradients and accumulated squared gradients.

For Beginners: This is the main learning process of the Adagrad optimizer.

Here's what happens in each iteration:

  1. Calculate how to improve the current solution (gradient)
  2. Update the memory of past improvements (accumulated squared gradients)
  3. Create a new, hopefully better solution
  4. Check if this new solution is the best so far
  5. Adjust how the optimizer learns (adaptive parameters)
  6. Check if we should stop early (if the solution is good enough)

This process repeats until we reach the maximum number of iterations or find a good enough solution.

DataLoader Integration: This method uses the DataLoader API for efficient batch processing. It creates a batcher using CreateBatcher(OptimizationInputData<T, TInput, TOutput>, int) and notifies the sampler of epoch starts using NotifyEpochStart(int).
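A minimal usage sketch, assuming inputData is an OptimizationInputData<double, Matrix<double>, Vector<double>> that you have already filled with training data:

// Run the full Adagrad optimization loop.
var result = optimizer.Optimize(inputData);

// The result describes the best solution found; the member name below is an assumption.
var bestModel = result.BestSolution;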

ReverseUpdate(Vector<T>, Vector<T>)

Reverses an Adagrad gradient update to recover original parameters.

public override Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)

Parameters

updatedParameters Vector<T>

Parameters after gradient application

appliedGradients Vector<T>

The gradients that were applied

Returns

Vector<T>

Original parameters before the gradient update

Remarks

For Adagrad, the forward update is:

  1. _accumulatedSquaredGradients[i] += gradient[i]^2
  2. adaptiveLearningRate = learning_rate / (sqrt(_accumulatedSquaredGradients[i]) + epsilon)
  3. params_new = params_old - adaptiveLearningRate * gradient

To reverse: params_old = params_new + adaptiveLearningRate * gradient

This requires access to the accumulated squared gradients to recalculate the adaptive learning rate.

For Beginners: This is like undoing a learning step. Given where the optimizer ended up (updated parameters) and its memory of past improvements (accumulated squared gradients), we can calculate the exact step that was taken and figure out where it started from.

Exceptions

ArgumentNullException

Thrown when updatedParameters or appliedGradients is null.

ArgumentException

Thrown when the parameter and gradient vectors have different lengths.
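A sketch of the round trip, assuming parameters and gradient are Vector<double> values you already have and that the optimizer still holds the accumulated squared gradients from the forward step:

// Forward step: apply the Adagrad update.
var updated = optimizer.UpdateParameters(parameters, gradient);

// Reverse step: recover the parameters as they were before the update.
var recovered = optimizer.ReverseUpdate(updated, gradient);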

Serialize()

Serializes the Adagrad optimizer to a byte array.

public override byte[] Serialize()

Returns

byte[]

A byte array representing the serialized state of the optimizer.

Remarks

This method saves the current state of the Adagrad optimizer, including its base class state and specific options, into a byte array. This allows the optimizer's state to be stored or transmitted.

For Beginners: This is like taking a snapshot of your learning assistant's current state.

The process:

  1. Saves the basic information (from the parent class)
  2. Saves the specific Adagrad settings
  3. Combines all this information into a single package (byte array)

This snapshot can be used later to recreate the exact same state of the optimizer, which is useful for saving progress or sharing the optimizer's configuration.
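A save-and-restore sketch, assuming model is the same kind of IFullModel instance used to create the original optimizer:

// Take a snapshot of the optimizer's current state.
byte[] snapshot = optimizer.Serialize();

// Later, restore that state into a fresh optimizer.
var restored = new AdagradOptimizer<double, Matrix<double>, Vector<double>>(model);
restored.Deserialize(snapshot);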

UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput>, OptimizationStepData<T, TInput, TOutput>)

Updates the adaptive parameters of the Adagrad optimizer.

protected override void UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput> currentStepData, OptimizationStepData<T, TInput, TOutput> previousStepData)

Parameters

currentStepData OptimizationStepData<T, TInput, TOutput>

The optimization step data for the current iteration.

previousStepData OptimizationStepData<T, TInput, TOutput>

The optimization step data for the previous iteration.

Remarks

This method updates the learning rate if adaptive learning rate is enabled in the options. It increases or decreases the learning rate based on whether the current solution is better than the previous one.

For Beginners: This is like adjusting how fast the optimizer learns based on its recent progress.

If adaptive learning rate is turned on:

  • If the current solution is better, slightly increase the learning rate
  • If the current solution is worse, slightly decrease the learning rate
  • Keep the learning rate within specified limits

This helps the optimizer adapt its learning speed based on how well it's doing, potentially making the learning process more efficient.
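A conceptual sketch of that rule, written with double for clarity; the option and step-data member names are assumptions, not the library's actual API:

// Hypothetical illustration of the adaptive learning-rate adjustment.
if (_options.UseAdaptiveLearningRate)
{
    bool improved = currentStepData.FitnessScore > previousStepData.FitnessScore; // assumes higher is better
    CurrentLearningRate *= improved ? _options.LearningRateIncreaseFactor : _options.LearningRateDecreaseFactor;
    CurrentLearningRate = Math.Clamp(CurrentLearningRate, _options.MinLearningRate, _options.MaxLearningRate);
}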

UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput>)

Updates the options for the Adagrad optimizer.

protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)

Parameters

options OptimizationAlgorithmOptions<T, TInput, TOutput>

The new options to be set.

Remarks

This method updates the optimizer's configuration with new options. It ensures that only AdagradOptimizerOptions are used to configure this optimizer.

For Beginners: This is like updating the instructions for your learning assistant.

  • It checks if the new instructions are the right type for this specific assistant (Adagrad)
  • If they are, it updates the assistant's settings
  • If they're not, it reports an error

This helps prevent accidentally using the wrong type of settings, which could cause problems.

Exceptions

ArgumentException

Thrown when the provided options are not of type AdagradOptimizerOptions.

UpdateParameters(Vector<T>, Vector<T>)

Updates a vector of parameters using the Adagrad optimization algorithm.

public override Vector<T> UpdateParameters(Vector<T> parameters, Vector<T> gradient)

Parameters

parameters Vector<T>

The current parameter vector to be updated.

gradient Vector<T>

The gradient vector corresponding to the parameters.

Returns

Vector<T>

The updated parameter vector.

Remarks

This method implements the Adagrad update rule by accumulating squared gradients for each parameter and using them to adapt the learning rate individually. Parameters with larger accumulated gradients receive smaller learning rates, and vice versa.

For Beginners: Adagrad adjusts the learning rate for each parameter based on how much it has changed in the past. Parameters that have received many large updates get smaller future updates, while rarely-updated parameters get larger updates. This helps focus learning on less frequent features.
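The same rule written out with plain arrays for clarity; this illustrates the math rather than the library's internal code:

// One Adagrad step over a small parameter vector, using double[] for illustration.
double learningRate = 0.01;
double epsilon = 1e-8;
double[] parameters = { 0.5, -1.2, 3.0 };
double[] gradient = { 0.1, -0.4, 0.02 };
double[] accumulated = new double[parameters.Length]; // the real optimizer carries this across calls

for (int i = 0; i < parameters.Length; i++)
{
    accumulated[i] += gradient[i] * gradient[i];                            // remember how much this parameter has moved
    double adaptiveRate = learningRate / (Math.Sqrt(accumulated[i]) + epsilon);
    parameters[i] -= adaptiveRate * gradient[i];                            // larger accumulation means a smaller step
}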

UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)

Updates parameters on the GPU using the Adagrad kernel.

public override void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)

Parameters

parameters IGpuBuffer

The GPU buffer containing the parameters to update.

gradients IGpuBuffer

The GPU buffer containing the gradients to apply.

parameterCount int

The number of parameters to update.

backend IDirectGpuBackend

The GPU backend that executes the Adagrad update kernel.

UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)

Updates the current solution using the Adagrad update rule.

protected override IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)

Parameters

currentSolution IFullModel<T, TInput, TOutput>

The current solution model.

gradient Vector<T>

The calculated gradient.

Returns

IFullModel<T, TInput, TOutput>

A new solution model after applying the Adagrad update.

Remarks

This method applies the Adagrad update rule to each coefficient of the current solution. It uses the accumulated squared gradients to adapt the learning rate for each parameter.

For Beginners: This is like taking a step towards a better solution.

For each part of the solution:

  1. Calculate a custom learning rate based on past improvements
  2. Use this rate to decide how big a step to take
  3. Take the step by updating that part of the solution

This adaptive approach allows the optimizer to take larger steps for less frequently updated parts and smaller steps for more frequently updated parts.