Class ModifiedGradientDescentOptimizer<T>
- Namespace
- AiDotNet.Optimizers
- Assembly
- AiDotNet.dll
Modified Gradient Descent optimizer for the Hope architecture, based on Equations 27-29 of the "Nested Learning" paper.
Traditional GD: W_{t+1} = W_t - η * ∇_y L(W_t; x_t) ⊗ x_t
Modified GD: W_{t+1} = W_t * (I - x_t * x_t^T) - η * ∇_y L(W_t; x_t) ⊗ x_t
This formulation uses an L2 regression objective instead of dot-product similarity, which better captures data dependencies in token space.
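For intuition, here is a minimal standalone sketch of the modified update with plain arrays; the helper name and array-based representation are illustrative, not the library's implementation.

```csharp
// Illustrative sketch of Equations 27-29 with plain arrays; not AiDotNet's code.
// W: m x n parameter matrix, x: length-n input, grad: length-m output gradient.
static double[,] ModifiedGdStep(double[,] W, double[] x, double[] grad, double eta)
{
    int m = W.GetLength(0), n = W.GetLength(1);
    var next = new double[m, n];
    for (int i = 0; i < m; i++)
    {
        // (W x)_i, so W * (I - x x^T) can be applied as W - (W x) x^T.
        double wx = 0.0;
        for (int k = 0; k < n; k++) wx += W[i, k] * x[k];

        // next[i, j] = W[i, j] - (W x)_i * x_j - eta * grad_i * x_j
        for (int j = 0; j < n; j++)
            next[i, j] = W[i, j] - wx * x[j] - eta * grad[i] * x[j];
    }
    return next;
}
```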
public class ModifiedGradientDescentOptimizer<T>
Type Parameters
T
The numeric type.
- Inheritance
- object → ModifiedGradientDescentOptimizer<T>
Constructors
ModifiedGradientDescentOptimizer(T, IEngine?)
Creates a modified gradient descent optimizer.
public ModifiedGradientDescentOptimizer(T learningRate, IEngine? engine = null)
Parameters
learningRate T
The learning rate η.
engine IEngine
The computation engine (CPU or GPU) for vectorized operations.
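A minimal construction sketch, assuming T = double; omitting engine leaves it at its null default.

```csharp
// Construct with T = double; engine omitted (defaults to null per the signature).
var optimizer = new ModifiedGradientDescentOptimizer<double>(learningRate: 0.01);
```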
Properties
LearningRate
Gets the learning rate.
public T LearningRate { get; }
Property Value
- T
Methods
UpdateMatrix(Matrix<T>, Vector<T>, Vector<T>)
Updates parameters using modified gradient descent (Equations 27-29).
The update is derived from the local L2 regression objective
min_W ||W * x_t - ∇_y L(W_t; x_t)||²
and results in: W_{t+1} = W_t * (I - x_t * x_t^T) - η * ∇_y L(W_t; x_t) ⊗ x_t
public Matrix<T> UpdateMatrix(Matrix<T> currentParameters, Vector<T> input, Vector<T> outputGradient)
Parameters
currentParameters Matrix<T>
The current parameter matrix W_t.
input Vector<T>
The input vector x_t.
outputGradient Vector<T>
The output gradient ∇_y L(W_t; x_t).
Returns
- Matrix<T>
Updated parameters W_{t+1}
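A hedged call sketch follows, reusing the optimizer built above. It assumes Matrix<double> and Vector<double> can be constructed from plain arrays, which is an assumption about the AiDotNet API rather than a documented fact.

```csharp
// Assumed constructors Matrix<double>(double[,]) and Vector<double>(double[]);
// verify against the actual AiDotNet types before use.
var W = new Matrix<double>(new double[,] { { 0.1, 0.2 }, { 0.3, 0.4 } }); // W_t (2 x 2)
var x = new Vector<double>(new[] { 1.0, 0.0 });                           // x_t
var g = new Vector<double>(new[] { 0.5, -0.5 });                          // ∇_y L(W_t; x_t)

Matrix<double> next = optimizer.UpdateMatrix(W, x, g); // W_{t+1} per Equations 27-29
```

Note that W_t * (I - x_t * x_t^T) equals W_t - (W_t x_t) x_t^T, so the step can be computed in O(mn) without forming the n×n projector.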
UpdateVector(Vector<T>, Vector<T>, Vector<T>)
Updates a parameter vector using modified gradient descent.
For a vector parameter w, the matrix operation W * (I - x x^T) reduces to: w_new = w * (I - x x^T) = w - x * (x^T w) = w - dot(w, x) * x
Full update: w_{t+1} = w_t - dot(w_t, x_t) * x_t - η * ∇_y L(w_t; x_t)
public Vector<T> UpdateVector(Vector<T> currentParameters, Vector<T> input, Vector<T> outputGradient)
Parameters
currentParameters Vector<T>
The current parameter vector w_t.
input Vector<T>
The input vector x_t.
outputGradient Vector<T>
The output gradient ∇_y L(w_t; x_t).
Returns
- Vector<T>
Updated parameters w_{t+1}
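For reference, the vector rule above as a standalone sketch (illustrative helper, not the library's code):

```csharp
// Illustrative sketch of w_{t+1} = w_t - dot(w_t, x_t) * x_t - eta * grad;
// not AiDotNet's implementation.
static double[] ModifiedGdVectorStep(double[] w, double[] x, double[] grad, double eta)
{
    double wx = 0.0; // dot(w, x)
    for (int i = 0; i < w.Length; i++) wx += w[i] * x[i];

    var next = new double[w.Length];
    for (int i = 0; i < w.Length; i++)
        next[i] = w[i] - wx * x[i] - eta * grad[i];
    return next;
}
```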