Class GradientTape<T>

Namespace
AiDotNet.Autodiff
Assembly
AiDotNet.dll

Records operations for automatic differentiation (autodiff).

public class GradientTape<T> : IDisposable

Type Parameters

T

The numeric type used for calculations.

Inheritance
object → GradientTape<T>

Implements
IDisposable

Remarks

GradientTape implements automatic differentiation using the tape-based approach popularized by TensorFlow. It records operations performed within its context and builds a computation graph. When Gradient() is called, it performs reverse-mode automatic differentiation (backpropagation) to compute gradients.

This implementation follows industry-standard patterns:

  • Opt-in recording via using statement (like TensorFlow's GradientTape)
  • Memory-efficient: tapes can be disposed after gradient computation
  • Supports watching specific tensors/variables
  • Thread-safe via a ThreadStatic tape stack

For Beginners: This automatically tracks calculations so gradients can be computed.

Think of it as a recording device:

  • You start recording by creating a GradientTape (use using statement)
  • All mathematical operations within the scope are recorded
  • When you're done, you can play it backwards to get gradients
  • This is how neural networks learn - by computing gradients automatically

Example usage:

using (var tape = new GradientTape<double>())
{
    tape.Watch(parameters);                          // track gradients for these nodes
    var loss = ComputeLoss(parameters);              // operations here are recorded
    var gradients = tape.Gradient(loss, parameters); // reverse-mode backpropagation
    // Use gradients to update parameters
}

Constructors

GradientTape(bool)

Initializes a new instance of the GradientTape<T> class.

public GradientTape(bool persistent = false)

Parameters

persistent bool

Whether the tape should persist after its first gradient computation, allowing Gradient to be called multiple times. Defaults to false.

Remarks

Creates a new gradient tape and pushes it onto the thread-local tape stack, making it the active tape for this thread. All operations performed within the scope of this tape will be recorded for automatic differentiation.

Graph caching is automatically enabled for persistent tapes to optimize performance when computing gradients multiple times.

For Beginners: This creates a new recording session.

When you create a tape:

  • It starts recording all operations
  • Use a 'using' statement to ensure cleanup: using (var tape = new GradientTape<T>())
  • Operations inside the using block are tracked
  • When the block ends, recording stops and resources are cleaned up

Properties

Current

Gets the currently active tape for this thread, or null if no tape is active.

public static GradientTape<T>? Current { get; }

Property Value

GradientTape<T>

The active GradientTape, or null if none exists.
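
A minimal sketch of how operation code might consult the active tape; resultNode is a hypothetical node produced by an operation:

var tape = GradientTape<double>.Current;  // active tape for this thread, or null
if (tape != null && tape.IsRecording)
{
    tape.RecordOperation(resultNode);     // resultNode: hypothetical ComputationNode<double>
}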

IsRecording

Gets a value indicating whether this tape is actively recording.

public bool IsRecording { get; }

Property Value

bool

True if the tape is recording operations; false otherwise.

Persistent

Gets or sets a value indicating whether the tape persists after its first gradient computation.

public bool Persistent { get; set; }

Property Value

bool

True if the tape can compute gradients multiple times; false (the default) if it is single-use.

Remarks

By default (false), tapes are single-use for memory efficiency. Set to true if you need to compute gradients multiple times from the same tape.

For Beginners: Controls whether you can reuse this tape.

  • False (default): Can only compute gradients once, then tape is used up (more efficient)
  • True: Can compute gradients multiple times (uses more memory)

Most of the time, false is fine since you create a new tape for each training step.
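
For example, a sketch of reusing a persistent tape; parameters and ComputeLoss are placeholders:

using (var tape = new GradientTape<double>(persistent: true))
{
    tape.Watch(parameters);
    var loss = ComputeLoss(parameters);
    var grads1 = tape.Gradient(loss, parameters); // first use
    var grads2 = tape.Gradient(loss, parameters); // only allowed because persistent: true
}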

Methods

Dispose()

Disposes the gradient tape, stopping recording and popping it from the tape stack.

public void Dispose()

Remarks

This method is automatically called when exiting a using block. It stops recording, pops the tape from the thread-local stack, and cleans up resources if the tape is not persistent.

For Beginners: This cleans up the tape when you're done.

When you use 'using' statement:

using (var tape = new GradientTape<T>())
{
    // your code
} // Dispose is automatically called here

Dispose:

  • Stops recording
  • Removes the tape from the active stack
  • Frees up memory if not persistent

Gradient(ComputationNode<T>, IEnumerable<ComputationNode<T>>?, bool)

Computes the gradient of a target node with respect to watched variables.

public Dictionary<ComputationNode<T>, Tensor<T>> Gradient(ComputationNode<T> target, IEnumerable<ComputationNode<T>>? sources = null, bool createGraph = false)

Parameters

target ComputationNode<T>

The target node (typically the loss).

sources IEnumerable<ComputationNode<T>>

The source nodes to compute gradients for (if null, uses all watched nodes).

createGraph bool

If true, the gradient computation itself will be recorded, enabling higher-order derivatives.

Returns

Dictionary<ComputationNode<T>, Tensor<T>>

A dictionary mapping each source node to its gradient.

Remarks

This method performs reverse-mode automatic differentiation (backpropagation) to compute gradients. It builds the computation graph, performs topological sorting, and executes the backward pass.

After calling this method, the tape is marked as used. If Persistent is false, calling Gradient again will throw an exception.

Higher-Order Gradients: When createGraph=true, the gradient computation itself is recorded to an active tape. This enables computing gradients of gradients (second derivatives, Hessians, etc.).

For Beginners: This computes how the output changes with respect to inputs.

The process:

  1. You give it a target (like the loss you want to minimize)
  2. It computes how much each watched variable affects that target
  3. Returns gradients showing which direction to adjust each variable

These gradients are what you use to update neural network weights during training.
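
A minimal sketch of consuming the returned dictionary; parameters and ComputeLoss are placeholders:

using (var tape = new GradientTape<double>())
{
    tape.Watch(parameters);
    var loss = ComputeLoss(parameters);
    var grads = tape.Gradient(loss, parameters);
    foreach (var kvp in grads)
    {
        // kvp.Key is a watched node; kvp.Value is the Tensor<double>
        // holding d(loss)/d(node). Apply your optimizer update here.
    }
}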

Higher-Order Derivatives (createGraph=true): Sometimes you need the gradient of a gradient (like computing curvature). When createGraph=true, the gradient calculation itself is tracked, so you can differentiate it again. This is used in:

  • Second-order optimization methods
  • Physics simulations
  • Some adversarial training techniques

Example:

// First-order gradient (normal)
var gradients = tape.Gradient(loss, parameters);

// Second-order gradient (gradient of gradient)
using (var tape1 = new GradientTape<T>())
{
    tape1.Watch(parameters);
    ComputationNode<T> firstGradNode;
    using (var tape2 = new GradientTape<T>())
    {
        tape2.Watch(parameters);
        var loss = ComputeLoss(parameters);
        // createGraph: true records the backward pass so the outer tape can differentiate it
        var firstGrad = tape2.Gradient(loss, parameters, createGraph: true);
        // Gradient returns Dictionary<ComputationNode<T>, Tensor<T>>; the outer target
        // must itself be a node in the recorded graph, e.g. a scalar built from the
        // first gradient (ReduceToNode is a hypothetical helper)
        firstGradNode = ReduceToNode(firstGrad);
    }
    var secondGrad = tape1.Gradient(firstGradNode, parameters);
}

RecordOperation(ComputationNode<T>)

Records a computation node in the tape.

public void RecordOperation(ComputationNode<T> node)

Parameters

node ComputationNode<T>

The computation node to record.

Remarks

This method is called automatically by operations that support autodiff. It adds the node to the tape's operation list so it can be included in gradient computation.

For Beginners: This adds an operation to the recording.

You usually don't call this directly; each mathematical operation that supports autodiff records itself on the active tape automatically.

Reset()

Resets the tape, clearing all recorded operations and watched variables.

public void Reset()

Remarks

This method clears all state from the tape, allowing it to be reused. It's useful when you want to reuse a persistent tape for a new computation.

For Beginners: This clears the tape to start fresh.

After reset:

  • All recorded operations are forgotten
  • Watched variables are cleared
  • The tape can be used for a new calculation
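
A sketch of resetting a persistent tape between computations; parameters and ComputeLoss are placeholders:

using (var tape = new GradientTape<double>(persistent: true))
{
    tape.Watch(parameters);
    var grads1 = tape.Gradient(ComputeLoss(parameters), parameters);

    tape.Reset();               // forget recorded operations and watched nodes
    tape.Watch(parameters);     // re-watch after the reset
    var grads2 = tape.Gradient(ComputeLoss(parameters), parameters);
}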

ResumeRecording()

Resumes recording operations on this tape.

public void ResumeRecording()

Remarks

This re-enables recording after it was stopped with StopRecording().

StopRecording()

Stops recording operations on this tape.

public void StopRecording()

Remarks

After calling this method, operations will no longer be recorded on this tape. This can be useful for inference or when you want to temporarily disable recording.

For Beginners: This pauses the recording.

Use this when:

  • You want to do calculations without tracking them
  • Running inference (not training)
  • Computing metrics that don't need gradients
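
A sketch of pausing recording around untracked work; EvaluateMetric is a placeholder:

using (var tape = new GradientTape<double>())
{
    tape.Watch(parameters);
    var loss = ComputeLoss(parameters);      // recorded

    tape.StopRecording();
    var metric = EvaluateMetric(parameters); // not recorded; no gradient flows here
    tape.ResumeRecording();

    var grads = tape.Gradient(loss, parameters);
}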

Watch(ComputationNode<T>)

Watches a computation node so its gradient will be computed.

public void Watch(ComputationNode<T> node)

Parameters

node ComputationNode<T>

The computation node to watch.

Remarks

Only watched nodes will have their gradients computed during backpropagation. Typically, you watch model parameters or other variables you want to optimize.

For Beginners: This marks a value to track gradients for.

Watch variables you want to:

  • Train (like neural network weights)
  • Optimize
  • Compute gradients for

Think of it like saying "I care about how the output changes when THIS value changes."

Watch(IEnumerable<ComputationNode<T>>)

Watches multiple computation nodes.

public void Watch(IEnumerable<ComputationNode<T>> nodes)

Parameters

nodes IEnumerable<ComputationNode<T>>

The computation nodes to watch.
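
A sketch of watching several nodes at once; weights and biases are hypothetical nodes:

var parameters = new List<ComputationNode<double>> { weights, biases };
using (var tape = new GradientTape<double>())
{
    tape.Watch(parameters);     // watches every node in the collection
    var loss = ComputeLoss(parameters);
    var grads = tape.Gradient(loss, parameters);
}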