Class GradientTape<T>
Records operations for automatic differentiation (autodiff).
public class GradientTape<T> : IDisposable
Type Parameters
T: The numeric type used for calculations.
- Inheritance: object → GradientTape<T>
- Implements: IDisposable
Remarks
GradientTape implements automatic differentiation using the tape-based approach popularized by TensorFlow. It records operations performed within its context and builds a computation graph. When Gradient() is called, it performs reverse-mode automatic differentiation (backpropagation) to compute gradients.
This implementation follows industry-standard patterns:
- Opt-in recording via using statement (like TensorFlow's GradientTape)
- Memory-efficient, as tapes can be disposed after gradient computation
- Supports watching specific tensors/variables
- Thread-safe with ThreadStatic tape stack
For Beginners: This automatically tracks calculations so gradients can be computed.
Think of it as a recording device:
- You start recording by creating a GradientTape (use using statement)
- All mathematical operations within the scope are recorded
- When you're done, you can play it backwards to get gradients
- This is how neural networks learn - by computing gradients automatically
Example usage:
```csharp
using (var tape = new GradientTape<double>())
{
    tape.Watch(parameters);
    var loss = ComputeLoss(parameters);
    var gradients = tape.Gradient(loss, parameters);
    // Use gradients to update parameters
}
```
Constructors
GradientTape(bool)
Initializes a new instance of the GradientTape<T> class.
public GradientTape(bool persistent = false)
Parameters
persistent (bool): Whether the tape should persist after first use.
Remarks
Creates a new gradient tape and pushes it onto the thread-local tape stack, making it the active tape for this thread. All operations performed within the scope of this tape will be recorded for automatic differentiation.
Graph caching is automatically enabled for persistent tapes to optimize performance when computing gradients multiple times.
For Beginners: This creates a new recording session.
When you create a tape:
- It starts recording all operations
- Use a using statement to ensure cleanup: using (var tape = new GradientTape<T>())
- Operations inside the using block are tracked
- When the block ends, recording stops and resources are cleaned up
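Based on the stack semantics described in these remarks (push on construction, pop on dispose), a minimal sketch of how nesting behaves; the innermost tape is the active one until it is disposed:
```csharp
using (var outer = new GradientTape<double>())
{
    // outer is pushed onto the thread-local stack and becomes the active tape
    using (var inner = new GradientTape<double>())
    {
        // inner is now on top of the stack, so operations record to it
        Console.WriteLine(ReferenceEquals(GradientTape<double>.Current, inner)); // True
    }
    // Disposing inner popped it from the stack; outer is active again
    Console.WriteLine(ReferenceEquals(GradientTape<double>.Current, outer)); // True
}
```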
Properties
Current
Gets the currently active tape for this thread, or null if no tape is active.
public static GradientTape<T>? Current { get; }
Property Value
- GradientTape<T>
The active GradientTape, or null if none exists.
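For example, code that defines its own differentiable operation might consult Current before recording. This is a sketch; resultNode is an illustrative placeholder for a node the operation has just built:
```csharp
// Record only when a tape is active on this thread and still recording.
var tape = GradientTape<double>.Current;
if (tape != null && tape.IsRecording)
{
    tape.RecordOperation(resultNode); // resultNode: hypothetical freshly built node
}
```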
IsRecording
Gets a value indicating whether this tape is actively recording.
public bool IsRecording { get; }
Property Value
- bool
True if the tape is recording operations; false otherwise.
Persistent
Gets or sets a value indicating whether gradients should persist after first use.
public bool Persistent { get; set; }
Property Value
- bool
True if the tape can compute gradients multiple times; false (the default) if it is single-use.
Remarks
By default (false), tapes are single-use for memory efficiency. Set to true if you need to compute gradients multiple times from the same tape.
For Beginners: Controls whether you can reuse this tape.
- False (default): Can only compute gradients once, then the tape is used up (more efficient)
- True: Can compute gradients multiple times (uses more memory)
Most of the time, false is fine since you create a new tape for each training step.
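A sketch of the difference, reusing the parameters and ComputeLoss placeholders from the class-level example:
```csharp
using (var tape = new GradientTape<double>(persistent: true))
{
    tape.Watch(parameters);
    var loss = ComputeLoss(parameters);

    var grads1 = tape.Gradient(loss, parameters); // first use
    var grads2 = tape.Gradient(loss, parameters); // allowed: tape is persistent

    // With persistent = false (the default), the second call would throw,
    // because a single-use tape is consumed by its first Gradient call.
}
```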
Methods
Dispose()
Disposes the gradient tape, stopping recording and popping it from the tape stack.
public void Dispose()
Remarks
This method is automatically called when exiting a using block. It stops recording, pops the tape from the thread-local stack, and cleans up resources if the tape is not persistent.
For Beginners: This cleans up the tape when you're done.
When you use a using statement:
```csharp
using (var tape = new GradientTape<T>())
{
    // your code
} // Dispose is automatically called here
```
Dispose:
- Stops recording
- Removes the tape from the active stack
- Frees up memory if not persistent
Gradient(ComputationNode<T>, IEnumerable<ComputationNode<T>>?, bool)
Computes the gradient of a target node with respect to watched variables.
public Dictionary<ComputationNode<T>, Tensor<T>> Gradient(ComputationNode<T> target, IEnumerable<ComputationNode<T>>? sources = null, bool createGraph = false)
Parameters
target (ComputationNode<T>): The target node (typically the loss).
sources (IEnumerable<ComputationNode<T>>?): The source nodes to compute gradients for (if null, uses all watched nodes).
createGraph (bool): If true, the gradient computation itself will be recorded, enabling higher-order derivatives.
Returns
- Dictionary<ComputationNode<T>, Tensor<T>>
A dictionary mapping each source node to its gradient.
Remarks
This method performs reverse-mode automatic differentiation (backpropagation) to compute gradients. It builds the computation graph, performs topological sorting, and executes the backward pass.
After calling this method, the tape is marked as used. If Persistent is false, calling Gradient again will throw an exception.
**Higher-Order Gradients**: When createGraph=true, the gradient computation itself is recorded to an active tape. This enables computing gradients of gradients (second derivatives, Hessians, etc.).
For Beginners: This computes how the output changes with respect to inputs.
The process:
- You give it a target (like the loss you want to minimize)
- It computes how much each watched variable affects that target
- Returns gradients showing which direction to adjust each variable
These gradients are what you use to update neural network weights during training.
Higher-Order Derivatives (createGraph=true): Sometimes you need the gradient of a gradient (like computing curvature). When createGraph=true, the gradient calculation itself is tracked, so you can differentiate it again. This is used in:
- Second-order optimization methods
- Physics simulations
- Some adversarial training techniques
Example:
```csharp
// First-order gradient (normal)
var gradients = tape.Gradient(loss, parameters);

// Second-order gradient (gradient of gradient)
using (var tape1 = new GradientTape<T>())
{
    tape1.Watch(parameters);
    Dictionary<ComputationNode<T>, Tensor<T>> firstGrad;
    using (var tape2 = new GradientTape<T>())
    {
        tape2.Watch(parameters);
        var loss = ComputeLoss(parameters);
        firstGrad = tape2.Gradient(loss, parameters, createGraph: true);
    }
    // firstGrad maps each parameter to its gradient. To differentiate again,
    // reduce those gradients to a scalar node recorded on tape1 (for example,
    // a squared gradient norm) and use that node as the new target.
    // gradNormNode below is an illustrative placeholder for that node.
    var secondGrad = tape1.Gradient(gradNormNode, parameters);
}
```
RecordOperation(ComputationNode<T>)
Records a computation node in the tape.
public void RecordOperation(ComputationNode<T> node)
Parameters
node (ComputationNode<T>): The computation node to record.
Remarks
This method is called automatically by operations that support autodiff. It adds the node to the tape's operation list so it can be included in gradient computation.
For Beginners: This adds an operation to the recording.
Usually you don't call this directly - operations call it automatically. Each mathematical operation records itself on the active tape.
Reset()
Resets the tape, clearing all recorded operations and watched variables.
public void Reset()
Remarks
This method clears all state from the tape, allowing it to be reused. It's useful when you want to reuse a persistent tape for a new computation.
For Beginners: This clears the tape to start fresh.
After reset:
- All recorded operations are forgotten
- Watched variables are cleared
- The tape can be used for a new calculation
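A sketch of reusing one persistent tape across training steps instead of allocating a new tape each time; batches and the two-argument ComputeLoss are illustrative placeholders:
```csharp
using (var tape = new GradientTape<double>(persistent: true))
{
    foreach (var batch in batches)
    {
        tape.Reset();            // forget the previous step's operations
        tape.Watch(parameters);  // re-watch, since Reset cleared watched nodes
        var loss = ComputeLoss(parameters, batch);
        var grads = tape.Gradient(loss, parameters);
        // ... apply grads to parameters ...
    }
}
```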
ResumeRecording()
Resumes recording operations on this tape.
public void ResumeRecording()
Remarks
This re-enables recording after it was stopped with StopRecording().
StopRecording()
Stops recording operations on this tape.
public void StopRecording()
Remarks
After calling this method, operations will no longer be recorded on this tape. This can be useful for inference or when you want to temporarily disable recording.
For Beginners: This pauses the recording.
Use this when:
- You want to do calculations without tracking them
- Running inference (not training)
- Computing metrics that don't need gradients
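A sketch of pausing the tape around work that should not enter the graph; ComputeAccuracy is an illustrative placeholder:
```csharp
using (var tape = new GradientTape<double>())
{
    tape.Watch(parameters);
    var loss = ComputeLoss(parameters);

    tape.StopRecording();
    var metric = ComputeAccuracy(parameters); // not recorded; no gradient cost
    tape.ResumeRecording();

    var grads = tape.Gradient(loss, parameters);
}
```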
Watch(ComputationNode<T>)
Watches a computation node so its gradient will be computed.
public void Watch(ComputationNode<T> node)
Parameters
node (ComputationNode<T>): The computation node to watch.
Remarks
Only watched nodes will have their gradients computed during backpropagation. Typically, you watch model parameters or other variables you want to optimize.
For Beginners: This marks a value to track gradients for.
Watch variables you want to:
- Train (like neural network weights)
- Optimize
- Compute gradients for
Think of it like saying "I care about how the output changes when THIS value changes."
Watch(IEnumerable<ComputationNode<T>>)
Watches multiple computation nodes.
public void Watch(IEnumerable<ComputationNode<T>> nodes)
Parameters
nodes (IEnumerable<ComputationNode<T>>): The computation nodes to watch.
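This overload is a convenience for watching a whole parameter set at once. A sketch, where model.Parameters stands in for any collection of ComputationNode<double>:
```csharp
using (var tape = new GradientTape<double>())
{
    tape.Watch(model.Parameters); // watch every parameter node in one call
    var loss = ComputeLoss(model);
    var grads = tape.Gradient(loss, model.Parameters);
}
```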