Class ModifiedGradientDescentOptimizer<T>

Namespace
AiDotNet.Optimizers
Assembly
AiDotNet.dll

Modified Gradient Descent optimizer for the Hope architecture, based on Equations 27-29 from the "Nested Learning" paper.

Traditional GD: W_{t+1} = W_t - η * ∇L(W_t; x_t) ⊗ x_t

Modified GD: W_{t+1} = W_t * (I - x_t x_t^T) - η * ∇L(W_t; x_t) ⊗ x_t

This formulation uses an L2 regression objective instead of dot-product similarity, which better captures data dependencies in the token space.
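To make the update concrete, the following is a minimal sketch of the modified step on plain double arrays. It is illustrative only and not part of the AiDotNet API; the method name ModifiedStep is hypothetical.

// Computes W_{t+1} = W * (I - x x^T) - eta * g ⊗ x on raw arrays.
// w: rows x cols parameter matrix, x: input (length cols),
// g: output gradient (length rows), eta: learning rate.
static double[,] ModifiedStep(double[,] w, double[] x, double[] g, double eta)
{
    int rows = w.GetLength(0), cols = w.GetLength(1);
    var next = new double[rows, cols];
    for (int i = 0; i < rows; i++)
    {
        // (W x)_i, needed because (W x x^T)[i, j] = (W x)_i * x[j].
        double wx = 0.0;
        for (int j = 0; j < cols; j++) wx += w[i, j] * x[j];

        for (int j = 0; j < cols; j++)
        {
            // Rank-one decay term -(W x)_i * x[j], then the outer-product gradient step.
            next[i, j] = w[i, j] - wx * x[j] - eta * g[i] * x[j];
        }
    }
    return next;
}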

public class ModifiedGradientDescentOptimizer<T>

Type Parameters

T

The numeric type

Inheritance
ModifiedGradientDescentOptimizer<T>

Constructors

ModifiedGradientDescentOptimizer(T, IEngine?)

Creates a modified gradient descent optimizer.

public ModifiedGradientDescentOptimizer(T learningRate, IEngine? engine = null)

Parameters

learningRate T

Learning rate η

engine IEngine

The computation engine (CPU or GPU) for vectorized operations.
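A minimal usage sketch; the numeric type and learning-rate value here are illustrative, and omitting engine leaves it at its default of null:

// Construct with learning rate η = 0.01 on the default engine.
var optimizer = new ModifiedGradientDescentOptimizer<double>(0.01);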

Properties

LearningRate

Gets the learning rate.

public T LearningRate { get; }

Property Value

T
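Because the property is get-only, the learning rate is fixed at construction. Reusing the optimizer instance from the constructor example above:

Console.WriteLine(optimizer.LearningRate); // prints 0.01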

Methods

UpdateMatrix(Matrix<T>, Vector<T>, Vector<T>)

Updates parameters using modified gradient descent (Equations 27-29).

The update minimizes the local L2 objective

min_W ||W * x_t - ∇_y L(W_t; x_t)||²

which results in:

W_{t+1} = W_t * (I - x_t x_t^T) - η * ∇_y L(W_t; x_t) ⊗ x_t

public Matrix<T> UpdateMatrix(Matrix<T> currentParameters, Vector<T> input, Vector<T> outputGradient)

Parameters

currentParameters Matrix<T>

Current parameter matrix W_t

input Vector<T>

Input vector x_t

outputGradient Vector<T>

Gradient ∇_y L(W_t; x_t)

Returns

Matrix<T>

Updated parameters W_{t+1}
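A hedged call sketch; the array-based Matrix<T> and Vector<T> constructors shown below are assumptions and may differ from the actual AiDotNet API:

var optimizer = new ModifiedGradientDescentOptimizer<double>(0.01);
var w = new Matrix<double>(new double[,] { { 0.1, 0.2 }, { 0.3, 0.4 } }); // W_t
var x = new Vector<double>(new double[] { 1.0, 0.0 });                    // x_t
var g = new Vector<double>(new double[] { 0.5, -0.5 });                   // ∇_y L(W_t; x_t)

Matrix<double> wNext = optimizer.UpdateMatrix(w, x, g); // W_{t+1}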

UpdateVector(Vector<T>, Vector<T>, Vector<T>)

Updates a parameter vector using modified gradient descent.

For a vector parameter w, the matrix operation w * (I - x x^T) becomes: w_new = w * (I - x x^T) = w - x * (x^T w) = w - x * dot(w, x)

Full update: w_{t+1} = w_t - x_t * dot(w_t, x_t) - η * gradient

public Vector<T> UpdateVector(Vector<T> currentParameters, Vector<T> input, Vector<T> outputGradient)

Parameters

currentParameters Vector<T>

Current parameter vector w_t

input Vector<T>

Input vector x_t

outputGradient Vector<T>

Output gradient ∇_y L(w_t; x_t)

Returns

Vector<T>

Updated parameters w_{t+1}
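As a cross-check of the formula above, here is a minimal sketch of the same update on plain double arrays; ModifiedVectorStep is a hypothetical helper, not an AiDotNet method:

// Computes w_{t+1} = w - x * dot(w, x) - eta * g on raw arrays.
static double[] ModifiedVectorStep(double[] w, double[] x, double[] g, double eta)
{
    // dot(w, x): the scalar projection used by the rank-one decay term.
    double wx = 0.0;
    for (int j = 0; j < w.Length; j++) wx += w[j] * x[j];

    var next = new double[w.Length];
    for (int j = 0; j < w.Length; j++)
    {
        next[j] = w[j] - x[j] * wx - eta * g[j];
    }
    return next;
}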