Class NeuralNetworkBase<T>

Namespace: AiDotNet.NeuralNetworks

Assembly: AiDotNet.dll

Base class for all neural network implementations in AiDotNet.

public abstract class NeuralNetworkBase<T> : INeuralNetworkModel<T>, INeuralNetwork<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable

Type Parameters

T: The numeric type used for calculations (e.g., float, double).

Inheritance: object

NeuralNetworkBase<T>

Implements: INeuralNetworkModel<T>

INeuralNetwork<T>

IFullModel<T, Tensor<T>, Tensor<T>>

IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>

IModelSerializer

ICheckpointableModel

IParameterizable<T, Tensor<T>, Tensor<T>>

IFeatureAware

IFeatureImportance<T>

ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>

IGradientComputable<T, Tensor<T>, Tensor<T>>

IJitCompilable<T>

IInterpretableModel<T>

IInputGradientComputable<T>

IDisposable

Derived: AudioNeuralNetworkBase<T>

DocumentNeuralNetworkBase<T>

ACGAN<T>

AttentionNetwork<T>

AudioVisualCorrespondenceNetwork<T>

AudioVisualEventLocalizationNetwork<T>

Autoencoder<T>

BigGAN<T>

Blip2NeuralNetwork<T>

BlipNeuralNetwork<T>

CapsuleNetwork<T>

ClipNeuralNetwork<T>

ConvolutionalNeuralNetwork<T>

CycleGAN<T>

DeepBeliefNetwork<T>

DeepBoltzmannMachine<T>

DeepQNetwork<T>

DenseNetNetwork<T>

DifferentiableNeuralComputer<T>

EchoStateNetwork<T>

EfficientNetNetwork<T>

ExtremeLearningMachine<T>

FastText<T>

FeedForwardNeuralNetwork<T>

FlamingoNeuralNetwork<T>

GRUNeuralNetwork<T>

GenerativeAdversarialNetwork<T>

GloVe<T>

Gpt4VisionNeuralNetwork<T>

GraphAttentionNetwork<T>

GraphGenerationModel<T>

GraphIsomorphismNetwork<T>

GraphNeuralNetwork<T>

GraphSAGENetwork<T>

HTMNetwork<T>

HopeNetwork<T>

HopfieldNetwork<T>

HyperbolicNeuralNetwork<T>

ImageBindNeuralNetwork<T>

InfoGAN<T>

LLaVANeuralNetwork<T>

LSTMNeuralNetwork<T>

LiquidStateMachine<T>

MemoryNetwork<T>

MeshCNN<T>

MixtureOfExpertsNeuralNetwork<T>

MobileNetV2Network<T>

MobileNetV3Network<T>

NEAT<T>

NeuralNetwork<T>

NeuralTuringMachine<T>

OccupancyNeuralNetwork<T>

OctonionNeuralNetwork<T>

Pix2Pix<T>

ProgressiveGAN<T>

QuantumNeuralNetwork<T>

RadialBasisFunctionNetwork<T>

RecurrentNeuralNetwork<T>

ResNetNetwork<T>

ResidualNeuralNetwork<T>

RestrictedBoltzmannMachine<T>

SAGAN<T>

SelfOrganizingMap<T>

SiameseNetwork<T>

SiameseNeuralNetwork<T>

SparseNeuralNetwork<T>

SpikingNeuralNetwork<T>

SpiralNet<T>

StyleGAN<T>

GraphClassificationModel<T>

LinkPredictionModel<T>

NodeClassificationModel<T>

TransformerEmbeddingNetwork<T>

Transformer<T>

UNet3D<T>

UnifiedMultimodalNetwork<T>

VGGNetwork<T>

VariationalAutoencoder<T>

VideoCLIPNeuralNetwork<T>

VisionTransformer<T>

VoxelCNN<T>

WGANGP<T>

WGAN<T>

Word2Vec<T>

GaussianSplatting<T>

InstantNGP<T>

NeRF<T>

DeepOperatorNetwork<T>

FourierNeuralOperator<T>

GraphNeuralOperator<T>

DeepRitzMethod<T>

InverseProblemPINN<T>

MultiScalePINN<T>

PhysicsInformedNeuralNetwork<T>

VariationalPINN<T>

HamiltonianNeuralNetwork<T>

LagrangianNeuralNetwork<T>

UniversalDifferentialEquation<T>

DGCNN<T>

PointNetPlusPlus<T>

PointNet<T>

CodeModelBase<T>

NeuralProgramSynthesizer<T>

SlowFast<T>

TimeSformer<T>

VideoMAE<T>

FastDVDNet<T>

DepthAnythingV2<T>

MiDaS<T>

BasicVSRPlusPlus<T>

EDVR<T>

FILM<T>

FLAVR<T>

RIFE<T>

AnimateDiff<T>

CogVideo<T>

OpenSora<T>

StableVideoDiffusion<T>

E2FGVI<T>

ProPainter<T>

RVM<T>

FlowFormer<T>

GMFlow<T>

RAFT<T>

RealESRGAN<T>

VRT<T>

Cutie<T>

SAM2<T>

XMem<T>

DIFRINT<T>

ByteTrack<T>

InternVideo2<T>

VideoCLIP<T>

Inherited Members: object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Extension Methods: DistributedExtensions.AsDistributedForHighBandwidth<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributedForLowBandwidth<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributed<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributed<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, IShardingConfiguration<T>)

Remarks

For Beginners: A neural network is a computing system inspired by the human brain. It consists of interconnected "layers" of artificial neurons that process information and learn patterns from data. This class provides the foundation for building different types of neural networks.

Constructors

NeuralNetworkBase(NeuralNetworkArchitecture<T>, ILossFunction<T>, double)

Creates a new neural network with the specified architecture.

protected NeuralNetworkBase(NeuralNetworkArchitecture<T> architecture, ILossFunction<T> lossFunction, double maxGradNorm = 1)

Parameters

architecture NeuralNetworkArchitecture<T>: The architecture defining the structure of the network.
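lossFunction ILossFunction<T>: The loss function used to calculate error during training.
maxGradNorm double: The maximum allowed norm for gradients during training (default: 1).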

Fields

Architecture

The architecture definition for this neural network.

public readonly NeuralNetworkArchitecture<T> Architecture

Field Value

NeuralNetworkArchitecture<T>

Remarks

For Beginners: The architecture defines the structure of your neural network - how many layers it has, how many neurons are in each layer, and how they're connected. Think of it as the blueprint for your network.

LastLoss

The last calculated loss value during training.

protected T? LastLoss

Field Value

T

Remarks

For Beginners: The loss value tells you how well your neural network is performing. A lower loss means better performance. This field stores the most recent loss value calculated during training, which you can use to track progress.

LossFunction

The loss function used to calculate error during training.

protected ILossFunction<T> LossFunction

Field Value

ILossFunction<T>

Remarks

For Beginners: The loss function measures how wrong the network's predictions are. Different types of problems need different loss functions:

Classification problems often use Cross Entropy Loss
Regression problems often use Mean Squared Error
Ranking problems might use Hinge Loss

This is like having different ways to score different games - you wouldn't use the same scoring system for basketball and golf.
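As a rough illustration of how two of these losses differ, the sketch below computes mean squared error and cross-entropy on plain arrays. The names LossSketch, MeanSquaredError, and CrossEntropy are hypothetical; they are not AiDotNet's ILossFunction<T> implementations.

```csharp
using System;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class LossSketch
{
    // Mean squared error: the average squared difference (typical for regression).
    public static double MeanSquaredError(double[] predicted, double[] actual)
    {
        if (predicted.Length != actual.Length)
            throw new ArgumentException("Arrays must have the same length.");

        double sum = 0.0;
        for (int i = 0; i < predicted.Length; i++)
        {
            double diff = predicted[i] - actual[i];
            sum += diff * diff;
        }
        return sum / predicted.Length;
    }

    // Cross-entropy between predicted class probabilities and a one-hot target
    // (typical for classification). The epsilon guards against log(0).
    public static double CrossEntropy(double[] probabilities, double[] oneHotTarget, double epsilon = 1e-12)
    {
        if (probabilities.Length != oneHotTarget.Length)
            throw new ArgumentException("Arrays must have the same length.");

        double loss = 0.0;
        for (int i = 0; i < probabilities.Length; i++)
        {
            loss -= oneHotTarget[i] * Math.Log(Math.Max(probabilities[i], epsilon));
        }
        return loss;
    }
}
```

Both functions return a single non-negative number that shrinks as predictions approach the targets, which is the property training relies on.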

MaxGradNorm

The maximum allowed norm for gradients during training.

protected T MaxGradNorm

Field Value

T

NumOps

Mathematical operations for the numeric type T.

protected readonly INumericOperations<T> NumOps

Field Value

INumericOperations<T>

_baseModel

Base model instance for interpretability delegation.

protected IFullModel<T, Tensor<T>, Tensor<T>>? _baseModel

Field Value

IFullModel<T, Tensor<T>, Tensor<T>>

Remarks

Typed as IFullModel<T, TInput, TOutput> to maintain type safety while supporting the interpretability infrastructure. This field stores models that implement the full model interface, which includes training, prediction, serialization, and parameterization capabilities.

_enabledMethods

Set of interpretation methods that are enabled for this neural network model. Controls which interpretability features (SHAP, LIME, etc.) are available.

protected readonly HashSet<InterpretationMethod> _enabledMethods

Field Value

HashSet<InterpretationMethod>

_fairnessMetrics

List of fairness metrics to evaluate for this model.

protected readonly List<FairnessMetric> _fairnessMetrics

Field Value

List<FairnessMetric>

_layerInputs

Stores the input values for each layer during forward pass.

protected Dictionary<int, Tensor<T>> _layerInputs

Field Value

Dictionary<int, Tensor<T>>

Remarks

For Beginners: When data flows through the network, we need to remember what values went into each layer. This is necessary for the learning process (backpropagation).

_layerOutputs

Stores the output values from each layer during forward pass.

protected Dictionary<int, Tensor<T>> _layerOutputs

Field Value

Dictionary<int, Tensor<T>>

Remarks

For Beginners: Similar to layer inputs, we also need to remember what values came out of each layer during the learning process.

_memoryManager

Memory manager for gradient checkpointing and activation pooling.

protected TrainingMemoryManager<T>? _memoryManager

Field Value

TrainingMemoryManager<T>

Remarks

For Beginners: The memory manager helps train larger models by reducing memory usage. It implements gradient checkpointing (trading compute for memory) and activation pooling (reusing tensor allocations to reduce garbage collection).

_mixedPrecisionContext

Mixed-precision training context (null if mixed-precision is disabled).

protected MixedPrecisionContext? _mixedPrecisionContext

Field Value

MixedPrecisionContext

Remarks

For Beginners: Mixed-precision training uses both 16-bit (FP16) and 32-bit (FP32) floating-point numbers to speed up training while maintaining accuracy. When enabled, this context manages the conversion between different precisions and handles loss scaling to prevent numerical issues.

_sensitiveFeatures

Indices of features considered sensitive for fairness analysis.

protected Vector<int> _sensitiveFeatures

Field Value

Vector<int>

Properties

CanTrainOnGpu

Gets whether GPU-resident training can be used right now.

public virtual bool CanTrainOnGpu { get; }

Property Value

bool

Remarks

This combines layer support with GPU engine availability.

For Beginners: Check this before calling TrainBatchGpu(). If false, use the standard TrainBatch() method instead.

DefaultLossFunction

Gets the default loss function for this network.

public virtual ILossFunction<T> DefaultLossFunction { get; }

Property Value

ILossFunction<T>

Remarks

For Beginners: A loss function measures how wrong the network's predictions are. This is used during training to guide learning.

Engine

Gets the global execution engine for vector operations.

protected IEngine Engine { get; }

Property Value

IEngine

GpuEngine

Gets the GPU tensor engine when available, or null if not using GPU.

protected DirectGpuTensorEngine? GpuEngine { get; }

Property Value

DirectGpuTensorEngine

IsGradientCheckpointingEnabled

Gets whether gradient checkpointing is enabled.

public bool IsGradientCheckpointingEnabled { get; }

Property Value

bool

IsMemoryManagementEnabled

Gets whether memory management (gradient checkpointing/pooling) is enabled.

public bool IsMemoryManagementEnabled { get; }

Property Value

bool

IsMixedPrecisionEnabled

Gets whether mixed-precision training is enabled.

public bool IsMixedPrecisionEnabled { get; }

Property Value

bool

Remarks

For Beginners: This property tells you if the network is using mixed-precision training. Mixed-precision can provide 2-3x faster training on modern GPUs with Tensor Cores.

IsTrainingMode

Indicates whether the network is currently in training mode.

public bool IsTrainingMode { get; }

Property Value

bool

Remarks

For Beginners: Neural networks behave differently during training versus when they're making predictions. In training mode, the network keeps track of additional information needed for learning.

LayerCount

Gets the number of layers in this neural network.

public int LayerCount { get; }

Property Value

int

Remarks

For Beginners: This tells you how many processing stages (layers) your network has. More layers generally mean the network can learn more complex patterns.

Layers

Gets the collection of layers that make up this neural network (read-only access).

public List<ILayer<T>> Layers { get; }

Property Value

List<ILayer<T>>

Remarks

For Beginners: Layers are the building blocks of neural networks. Each layer contains neurons that process information and pass it to the next layer. A typical network has an input layer (receives data), hidden layers (process data), and an output layer (produces results).

Important: Do not directly modify this collection (e.g., Layers.Add()). Use AddLayerToCollection() or RemoveLayerFromCollection() instead to ensure proper cache invalidation.

ParameterCount

Gets the total number of parameters in the model.

public virtual int ParameterCount { get; }

Property Value

int

Remarks

For Beginners: This tells you how many adjustable values (weights and biases) your neural network has. More complex networks typically have more parameters and can learn more complex patterns, but also require more data to train effectively. This is part of the IFullModel interface for consistency with other model types.

Performance: This property uses caching to avoid recomputing the sum on every access. The cache is invalidated when layers are modified.
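The caching behavior described above can be sketched as follows. This is a minimal illustration, not the library's code: CachedParameterCounter, AddLayer, and RecomputeCalls are hypothetical names, and each layer is reduced to its parameter count.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a network that caches its total parameter count.
public class CachedParameterCounter
{
    private readonly List<int> _layerParameterCounts = new List<int>();
    private int? _cachedCount; // null means the cache must be recomputed

    // Exposed only so the caching effect can be observed in this sketch.
    public int RecomputeCalls { get; private set; }

    public int ParameterCount
    {
        get
        {
            if (_cachedCount == null)
            {
                int total = 0;
                foreach (int count in _layerParameterCounts)
                {
                    total += count;
                }
                _cachedCount = total;
                RecomputeCalls++;
            }
            return _cachedCount.Value;
        }
    }

    // Adding a layer invalidates the cache, mirroring the role that
    // AddLayerToCollection() plays in this class.
    public void AddLayer(int parameterCount)
    {
        _layerParameterCounts.Add(parameterCount);
        _cachedCount = null;
    }
}
```

Repeated reads between modifications return the stored value; only the first read after a change pays for the summation.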

Random

Gets the thread-safe random number generator for initialization.

protected static Random Random { get; }

Property Value

Random

Remarks

Uses the centralized RandomHelper which is thread-safe and avoids creating multiple instances per thread.

SupportsGpuTraining

Gets whether all layers in the network support GPU-resident training.

public virtual bool SupportsGpuTraining { get; }

Property Value

bool

Remarks

GPU-resident training keeps all data on GPU during the entire training loop:

Forward pass runs on GPU
Loss computation on GPU
Backward pass on GPU
Parameter updates on GPU

For Beginners: When this returns true, training can be much faster because data doesn't need to be copied back and forth between CPU and GPU each step.

SupportsJitCompilation

Gets whether this model currently supports JIT compilation.

public virtual bool SupportsJitCompilation { get; }

Property Value

bool: True if the model can be JIT compiled, false otherwise.

Remarks

Some models may not support JIT compilation due to:

Dynamic graph structure (changes based on input)
Lack of computation graph representation
Use of operations not yet supported by the JIT compiler

For Beginners: This tells you whether this specific model can benefit from JIT compilation.

Models return false if they:

Use layer-based architecture without graph export (e.g., current neural networks)
Have control flow that changes based on input data
Use operations the JIT compiler doesn't understand yet

In these cases, the model will still work normally, just without JIT acceleration.

SupportsTraining

Indicates whether this network supports training (learning from data).

public virtual bool SupportsTraining { get; }

Property Value

bool

Remarks

For Beginners: Not all neural networks can learn. Some are designed only for making predictions with pre-set parameters. This property tells you if the network can learn from data.

Methods

AddBatchNormalizationLayer(int, double, double)

Adds a batch normalization layer to the neural network.

public virtual void AddBatchNormalizationLayer(int featureSize, double epsilon = 1E-05, double momentum = 0.9)

Parameters

featureSize int: The number of features to normalize.
epsilon double: A small constant for numerical stability (default: 1e-5).
momentum double: The momentum for running statistics (default: 0.9).
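
To show where epsilon and momentum typically enter, here is a minimal sketch of batch normalization for a single feature. It assumes the convention running = momentum * running + (1 - momentum) * batch; this page does not state which convention AiDotNet uses, and BatchNormSketch and its methods are hypothetical names.

```csharp
using System;

// Hypothetical single-feature batch normalization; not AiDotNet's layer.
public static class BatchNormSketch
{
    // Normalizes a batch to zero mean and unit variance.
    // Epsilon keeps the division stable when the variance is close to zero.
    public static double[] Normalize(double[] batch, double epsilon = 1e-5)
    {
        if (batch.Length == 0)
            throw new ArgumentException("Batch must not be empty.");

        double mean = 0.0;
        foreach (double value in batch) mean += value;
        mean /= batch.Length;

        double variance = 0.0;
        foreach (double value in batch) variance += (value - mean) * (value - mean);
        variance /= batch.Length;

        double scale = 1.0 / Math.Sqrt(variance + epsilon);
        var normalized = new double[batch.Length];
        for (int i = 0; i < batch.Length; i++)
        {
            normalized[i] = (batch[i] - mean) * scale;
        }
        return normalized;
    }

    // Exponential moving average of a batch statistic (assumed convention).
    public static double UpdateRunning(double running, double batchStatistic, double momentum = 0.9)
    {
        return momentum * running + (1.0 - momentum) * batchStatistic;
    }
}
```

Under this convention, a momentum of 0.9 means each new batch contributes 10% to the running statistic.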

AddConvolutionalLayer(int, int, int, ActivationFunction)

Adds a convolutional layer to the neural network.

public virtual void AddConvolutionalLayer(int filters, int kernelSize, int stride, ActivationFunction activation)

Parameters

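filters int: The number of convolution filters (output channels).
kernelSize int: The size of the convolution kernel.
stride int: The step size when moving the kernel across the input.
activation ActivationFunction: The activation function to use.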

AddDropoutLayer(double)

Adds a dropout layer to the neural network.

public virtual void AddDropoutLayer(double dropoutRate)

Parameters

dropoutRate double

AddLSTMLayer(int, bool)

Adds an LSTM layer to the neural network.

public virtual void AddLSTMLayer(int units, bool returnSequences = false)

Parameters

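units int: The number of units in the LSTM layer.
returnSequences bool: Whether to return the output for every time step rather than only the last one (default: false).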

AddLayer(LayerType, int, ActivationFunction)

Adds a layer to the neural network.

public virtual void AddLayer(LayerType layerType, int units, ActivationFunction activation)

Parameters

layerType LayerType: The type of layer to add.
units int: The number of units/neurons in the layer.
activation ActivationFunction: The activation function to use.

AddLayerToCollection(ILayer<T>)

Adds a layer to the internal layers collection and invalidates the parameter count cache.

protected void AddLayerToCollection(ILayer<T> layer)

Parameters

layer ILayer<T>: The layer to add

Remarks

This method ensures that the parameter count cache is properly invalidated when layers are added. Derived classes should use this method instead of directly accessing Layers.Add().

AddPoolingLayer(int[], PoolingType, int, int?)

Adds a pooling layer to the neural network.

public virtual void AddPoolingLayer(int[] inputShape, PoolingType poolingType, int poolSize, int? strides = null)

Parameters

inputShape int[]: The input shape (channels, height, width).
poolingType PoolingType: The type of pooling operation.
poolSize int: The size of the pooling window.
strides int?: The step size when moving the pooling window (default: same as poolSize).
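
The relationship between inputShape, poolSize, and strides can be sketched with the usual output-size formula, floor((size - poolSize) / stride) + 1. This sketch assumes no padding, which this page does not confirm for AiDotNet's pooling layer; PoolingShapeSketch is a hypothetical name.

```csharp
using System;

// Hypothetical shape calculator; assumes pooling without padding.
public static class PoolingShapeSketch
{
    // inputShape is (channels, height, width). A null stride falls back to poolSize.
    public static int[] OutputShape(int[] inputShape, int poolSize, int? strides = null)
    {
        if (inputShape.Length != 3)
            throw new ArgumentException("Expected (channels, height, width).");

        int stride = strides ?? poolSize;
        if (poolSize < 1 || stride < 1)
            throw new ArgumentException("poolSize and strides must be positive.");
        if (poolSize > inputShape[1] || poolSize > inputShape[2])
            throw new ArgumentException("Pooling window is larger than the input.");

        // Integer division floors here because both operands are non-negative.
        int height = (inputShape[1] - poolSize) / stride + 1;
        int width = (inputShape[2] - poolSize) / stride + 1;
        return new[] { inputShape[0], height, width };
    }
}
```

With the default stride, a 2x2 window halves the height and width and leaves the channel count unchanged.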

ApplyGradients(Vector<T>, T)

Applies a flattened gradient vector to update the network's parameters.

public virtual void ApplyGradients(Vector<T> gradients, T learningRate)

Parameters

gradients Vector<T>: The concatenated gradients for all parameters.
learningRate T: The learning rate to scale updates.

Remarks

This method slices the provided gradient vector per layer, updates each layer's parameters, and writes them back.

For Beginners: The learning rate controls how big each update step is. Smaller values are safer but slower.
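
The per-layer slicing described above can be sketched on plain arrays. This is an assumption-laden illustration: it uses a plain gradient-descent step (parameter minus learning rate times gradient), and ApplyGradientsSketch is a hypothetical name rather than the library's implementation.

```csharp
using System;

// Hypothetical illustration of slicing one flat gradient vector across layers.
public static class ApplyGradientsSketch
{
    // Each inner array holds one layer's parameters and is updated in place.
    public static void Apply(double[][] layerParameters, double[] flatGradients, double learningRate)
    {
        int total = 0;
        foreach (double[] parameters in layerParameters) total += parameters.Length;
        if (flatGradients.Length != total)
            throw new ArgumentException("Gradient length must equal the total parameter count.");

        int offset = 0; // start of the current layer's slice in the flat vector
        foreach (double[] parameters in layerParameters)
        {
            for (int i = 0; i < parameters.Length; i++)
            {
                parameters[i] -= learningRate * flatGradients[offset + i];
            }
            offset += parameters.Length;
        }
    }
}
```

The length check matters because a mismatch would silently shift every later layer's slice.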

AreLayersCompatible(ILayer<T>, ILayer<T>)

Checks if two consecutive layers can be connected in a neural network.

protected virtual bool AreLayersCompatible(ILayer<T> prevLayer, ILayer<T> currentLayer)

Parameters

prevLayer ILayer<T>: The preceding layer.
currentLayer ILayer<T>: The current layer to check compatibility with.

Returns

bool: True if the layers can be connected; otherwise, false.

Remarks

For Beginners: Neural networks work by connecting layers in sequence. For two layers to connect properly, the output of one layer must match what the next layer expects as input. This is like making sure puzzle pieces fit together. This method checks if two layers can be properly connected.

For example, if a layer outputs 100 values, the next layer should expect 100 values as input. Some layer combinations also have special rules - like needing a "Flatten" layer between image processing layers and regular dense layers.
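
A minimal sketch of the shape rule in the example above, using integer arrays as shapes. LayerCompatibilitySketch, ShapesMatch, and Flatten are hypothetical names; the actual method may apply additional layer-specific rules.

```csharp
using System;
using System.Linq;

// Hypothetical shape check; not AiDotNet's AreLayersCompatible implementation.
public static class LayerCompatibilitySketch
{
    // Two layers connect when the previous output shape equals the next input shape.
    public static bool ShapesMatch(int[] previousOutputShape, int[] currentInputShape)
    {
        return previousOutputShape.SequenceEqual(currentInputShape);
    }

    // Collapses a multi-dimensional shape into a single dimension,
    // which is the job a "Flatten" layer performs between layers.
    public static int[] Flatten(int[] shape)
    {
        int total = 1;
        foreach (int dimension in shape) total *= dimension;
        return new[] { total };
    }
}
```

An (8, 4, 4) image-style output does not match a dense layer expecting 128 values until it has been flattened.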

Backpropagate(Tensor<T>)

Performs backpropagation to compute gradients for network parameters.

public virtual Tensor<T> Backpropagate(Tensor<T> outputGradients)

Parameters

outputGradients Tensor<T>: The gradients of the loss with respect to the network outputs.

Returns

Tensor<T>: The gradients of the loss with respect to the network inputs.

Remarks

For Beginners: Backpropagation is how neural networks learn. After making a prediction, the network calculates how wrong it was (the error). Then it works backward through the layers to figure out how each parameter contributed to that error. This method handles that backward flow of information.

The "gradients" are numbers that tell us how to adjust each parameter to reduce the error.

API Change Note: The signature changed from Vector<T> to Tensor<T> to support multi-dimensional gradients. This is a breaking change. If you need backward compatibility, consider adding an overload that accepts Vector<T> and converts it internally to Tensor<T>.

Exceptions

InvalidOperationException: Thrown when the network is not in training mode or doesn't support training.

BackpropagateGpu(IGpuTensor<T>)

Performs backpropagation through all layers entirely on GPU.

public virtual IGpuTensor<T> BackpropagateGpu(IGpuTensor<T> outputGradients)

Parameters

outputGradients IGpuTensor<T>: The GPU-resident gradient of loss with respect to network output.

Returns

IGpuTensor<T>: The GPU-resident gradient with respect to network input.

Remarks

This method backpropagates through all layers on GPU:

Each layer computes input gradients and stores weight gradients on GPU
No data is transferred to CPU during backpropagation
After calling this, call UpdateParametersGpu() to apply the gradients

For Beginners: Like Backpropagate() but everything stays on GPU. The weight gradients are computed and stored on GPU, ready for the update step.

Exceptions

InvalidOperationException: Thrown when the network doesn't support GPU training.

BackpropagateGpuDeferred(IGpuTensor<T>, GpuExecutionOptions?)

Performs backpropagation through all layers with deferred GPU execution.

public virtual IGpuTensor<T> BackpropagateGpuDeferred(IGpuTensor<T> outputGradients, GpuExecutionOptions? options = null)

Parameters

outputGradients IGpuTensor<T>: The GPU-resident gradient of loss with respect to network output.
options GpuExecutionOptions: Optional GPU execution options.

Returns

IGpuTensor<T>: The GPU-resident gradient with respect to network input.

Remarks

Uses deferred execution to batch all backward pass operations into a single GPU command buffer. This reduces CPU-GPU synchronization overhead and improves performance.

BackpropagateWithRecompute(Tensor<T>)

Performs backpropagation with activation recomputation for non-checkpointed layers.

protected virtual Tensor<T> BackpropagateWithRecompute(Tensor<T> outputGradients)

Parameters

outputGradients Tensor<T>: Gradients from the loss function.

Returns

Tensor<T>: Gradients with respect to input.

Remarks

For Beginners: When gradient checkpointing is enabled, we don't store all layer activations during forward pass (to save memory). During backprop, we need those activations, so we recompute them from the nearest checkpoint.
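
The recompute-from-checkpoint idea can be sketched on a chain of scalar layers y = tanh(w * x). This is an illustration under stated assumptions, not the library's algorithm: CheckpointSketch and its methods are hypothetical, and a checkpoint is simply taken every `interval` layers.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical scalar network used to illustrate gradient checkpointing.
public static class CheckpointSketch
{
    private static double Forward(double w, double x) => Math.Tanh(w * x);

    // d/dx tanh(w * x) = w * (1 - tanh(w * x)^2), scaled by the upstream gradient.
    private static double InputGradient(double w, double x, double upstream)
    {
        double y = Math.Tanh(w * x);
        return upstream * w * (1.0 - y * y);
    }

    // Reference version: stores every layer input during the forward pass.
    public static double GradientStoringAll(double[] weights, double input)
    {
        var inputs = new double[weights.Length];
        double x = input;
        for (int i = 0; i < weights.Length; i++)
        {
            inputs[i] = x;
            x = Forward(weights[i], x);
        }
        double gradient = 1.0;
        for (int i = weights.Length - 1; i >= 0; i--)
        {
            gradient = InputGradient(weights[i], inputs[i], gradient);
        }
        return gradient;
    }

    // Checkpointed version: stores inputs only at layers 0, interval, 2 * interval, ...
    // and recomputes the inputs in between during the backward pass.
    public static double GradientWithCheckpoints(double[] weights, double input, int interval)
    {
        if (interval < 1) throw new ArgumentException("interval must be at least 1.");
        if (weights.Length == 0) return 1.0;

        var checkpoints = new Dictionary<int, double>();
        double x = input;
        for (int i = 0; i < weights.Length; i++)
        {
            if (i % interval == 0) checkpoints[i] = x;
            x = Forward(weights[i], x);
        }

        double gradient = 1.0;
        int lastStart = ((weights.Length - 1) / interval) * interval;
        for (int start = lastStart; start >= 0; start -= interval)
        {
            int end = Math.Min(start + interval, weights.Length);

            // Recompute this segment's layer inputs from its checkpoint.
            var segmentInputs = new double[end - start];
            double value = checkpoints[start];
            for (int i = start; i < end; i++)
            {
                segmentInputs[i - start] = value;
                value = Forward(weights[i], value);
            }

            // Backpropagate through the segment in reverse order.
            for (int i = end - 1; i >= start; i--)
            {
                gradient = InputGradient(weights[i], segmentInputs[i - start], gradient);
            }
        }
        return gradient;
    }
}
```

Both versions produce the same gradient; the checkpointed one holds fewer stored activations at the cost of repeating part of the forward computation.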

BackwardWithInputGradient(Tensor<T>)

Computes gradients of the network output with respect to the network input using backpropagation.

public virtual Tensor<T> BackwardWithInputGradient(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>: The gradient signal from the output (typically all ones for gradient computation).

Returns

Tensor<T>: A tensor containing the gradients with respect to the network input.

Remarks

This method performs a backward pass through all layers to compute how the output changes with respect to the input. Unlike the standard backward pass which computes gradients for parameters, this method computes gradients for the input itself.

This is essential for techniques like:

Gradient-based input optimization
Saliency maps and input attribution
WGAN-GP gradient penalty computation
Adversarial example generation

For Beginners: This calculates how sensitive the output is to changes in the input.

Normally, backpropagation adjusts the network's internal parameters (weights and biases). This method instead computes how the output would change if we modified the input data.

Use cases:

Understanding which input features matter most (interpretability)
Generating adversarial examples (security research)
Computing gradient penalties for training stability (WGAN-GP)

The process:

Assumes a forward pass has already been run (outputs are cached)
Starts with a gradient signal at the output (how much we "care" about each output)
Propagates this gradient backwards through each layer
Returns the gradient with respect to the original input
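
The process above can be sketched for a stack of purely linear layers y = W x, where the input gradient of each layer is the transposed weight matrix applied to the incoming gradient. InputGradientSketch is a hypothetical name; real layers with nonlinear activations would also need the cached forward-pass values, which this simplified sketch omits.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical linear-only network; not AiDotNet's implementation.
public static class InputGradientSketch
{
    // For y = W x, the gradient with respect to x is W^T * outputGradient.
    public static double[] BackwardInput(double[,] weights, double[] outputGradient)
    {
        int rows = weights.GetLength(0); // number of outputs
        int cols = weights.GetLength(1); // number of inputs
        if (outputGradient.Length != rows)
            throw new ArgumentException("Gradient length must match the layer's output size.");

        var inputGradient = new double[cols];
        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                inputGradient[c] += weights[r, c] * outputGradient[r];
            }
        }
        return inputGradient;
    }

    // Propagates the output gradient backwards through every layer, last to first.
    public static double[] InputGradient(IReadOnlyList<double[,]> layers, double[] outputGradient)
    {
        double[] gradient = outputGradient;
        for (int i = layers.Count - 1; i >= 0; i--)
        {
            gradient = BackwardInput(layers[i], gradient);
        }
        return gradient;
    }
}
```

Passing an output gradient of all ones, as the parameter description suggests, yields the sensitivity of the summed outputs to each input element.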

BeginGpuExecution(GpuExecutionOptions?)

Begins a GPU execution context for managing GPU-resident tensor lifecycle.

public virtual GpuExecutionContext BeginGpuExecution(GpuExecutionOptions? options = null)

Parameters

options GpuExecutionOptions: Optional GPU execution options.

Returns

GpuExecutionContext: A GPU execution context that should be disposed when done.

Remarks

For Beginners: This creates a scope for GPU operations where tensors stay on the GPU and are only downloaded when explicitly needed. This avoids redundant CPU-GPU transfers during batch inference or training.

// Example: Batch inference with GPU context
using (var ctx = network.BeginGpuExecution())
{
    foreach (var batch in batches)
    {
        var result = network.ForwardWithGpuContext(batch);
        // Results are GPU-resident until ToTensor() is called
        predictions.Add(result.ToTensor());
    }
} // All GPU tensors are cleaned up here

Exceptions

InvalidOperationException: Thrown when no GPU backend is available.

CanUseGpuResidentPath()

Checks if all layers in the network support GPU execution. Used to determine if the GPU-resident optimization path can be used.

protected virtual bool CanUseGpuResidentPath()

Returns

bool: True if all layers can execute on GPU; false otherwise.

Remarks

For Beginners: This method checks if every layer in your network can run on the GPU. If even one layer needs the CPU, we can't use the fast GPU-only path.

ClearLayers()

Clears all layers from the internal layers collection and invalidates the parameter count cache.

protected void ClearLayers()

Remarks

This method ensures that the parameter count cache is properly invalidated when layers are cleared. Derived classes should use this method instead of directly accessing Layers.Clear().

ClipGradient(Tensor<T>)

Clips the gradient tensor if its norm exceeds the maximum allowed gradient norm.

protected Tensor<T> ClipGradient(Tensor<T> gradient)

Parameters

gradient Tensor<T>: The gradient tensor to be clipped.

Returns

Tensor<T>: The clipped gradient tensor.

Remarks

This method is a convenience wrapper that clips a gradient tensor using the default MaxGradNorm.

For Beginners: This is a safety mechanism to prevent the "exploding gradient" problem. It ensures gradients don't become too large during training, which helps keep the learning process stable.

ClipGradient(Vector<T>)

Clips the gradient vector if its norm exceeds the maximum allowed gradient norm.

protected Vector<T> ClipGradient(Vector<T> gradient)

Parameters

gradient Vector<T>: The gradient vector to be clipped.

Returns

Vector<T>: The clipped gradient vector.

Remarks

This method converts the vector to a tensor, applies gradient clipping, and converts back to a vector.

For Beginners: This is another safety mechanism to prevent the "exploding gradient" problem, but specifically for vector inputs. It works just like the tensor version but handles vector data.

ClipGradients(List<Tensor<T>>)

Applies gradient clipping to prevent exploding gradients.

protected void ClipGradients(List<Tensor<T>> gradients)

Parameters

gradients List<Tensor<T>>: A list of tensors containing the gradients to be clipped.

Remarks

This method calculates the total norm of all gradients and scales them down if the norm exceeds the maximum allowed gradient norm (MaxGradNorm).

For Beginners: Think of this as a safety mechanism. Sometimes, the network might try to make very large adjustments, which can make learning unstable. This method checks if the adjustments are too big, and if they are, it scales them down to a safe level. It's like having a speed limiter on a car to prevent it from going too fast and losing control.
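
A minimal sketch of clipping by total norm, assuming the L2 norm taken across all gradient arrays together. GradientClippingSketch and ClipByGlobalNorm are hypothetical names, and plain double arrays stand in for Tensor<T>.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of gradient clipping by total (global) norm.
public static class GradientClippingSketch
{
    // Scales all gradients in place when their combined L2 norm exceeds maxGradNorm.
    // Returns the norm measured before clipping.
    public static double ClipByGlobalNorm(List<double[]> gradients, double maxGradNorm)
    {
        double sumOfSquares = 0.0;
        foreach (double[] gradient in gradients)
        {
            foreach (double value in gradient) sumOfSquares += value * value;
        }
        double totalNorm = Math.Sqrt(sumOfSquares);

        if (totalNorm > maxGradNorm)
        {
            // One shared factor preserves the direction of the overall update.
            double scale = maxGradNorm / totalNorm;
            foreach (double[] gradient in gradients)
            {
                for (int i = 0; i < gradient.Length; i++) gradient[i] *= scale;
            }
        }
        return totalNorm;
    }
}
```

Gradients whose total norm is already within the limit are left untouched.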

Clone()

Creates a clone of the neural network.

public virtual IFullModel<T, Tensor<T>, Tensor<T>> Clone()

Returns

IFullModel<T, Tensor<T>, Tensor<T>>: A new instance that is a clone of this neural network.

Remarks

For most neural networks, Clone and DeepCopy perform the same function - creating a complete independent copy of the network. Some specialized networks might implement this differently.

For Beginners: This creates an identical copy of your neural network.

In most cases, this works the same as DeepCopy and creates a completely independent duplicate of your network. The duplicate will have the same structure and the same learned parameters, but changes to one won't affect the other.

ComputeGradients(Tensor<T>, Tensor<T>, ILossFunction<T>?)

Computes a flattened gradient vector for all trainable parameters in the network.

public virtual Vector<T> ComputeGradients(Tensor<T> input, Tensor<T> target, ILossFunction<T>? lossFunction = null)

Parameters

input Tensor<T>: The input tensor.
target Tensor<T>: The target tensor.
lossFunction ILossFunction<T>: Optional override loss function (defaults to DefaultLossFunction).

Returns

Vector<T>: A vector containing the concatenated gradients for all layer parameters.

Remarks

This method performs a forward pass, computes the loss derivative, backpropagates gradients, and then concatenates the parameter gradients across all layers into a single vector.

For Beginners: Gradients are the "direction to change weights" so the model makes fewer mistakes.
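
The four steps in the remarks (forward pass, loss derivative, backpropagation, concatenation) can be sketched for a single linear unit with a squared-error loss. ComputeGradientsSketch is a hypothetical name, and the choice of loss is an assumption made for the example.

```csharp
using System;

// Hypothetical single-unit model y = w . x + b with loss L = (y - target)^2.
public static class ComputeGradientsSketch
{
    // Returns the flattened gradient [dL/dw_0, ..., dL/dw_(n-1), dL/db].
    public static double[] Compute(double[] weights, double bias, double[] input, double target)
    {
        if (weights.Length != input.Length)
            throw new ArgumentException("Weights and input must have the same length.");

        // 1. Forward pass.
        double output = bias;
        for (int i = 0; i < weights.Length; i++) output += weights[i] * input[i];

        // 2. Derivative of the loss with respect to the output.
        double outputGradient = 2.0 * (output - target);

        // 3 and 4. Backpropagate to each parameter and concatenate into one vector.
        var gradients = new double[weights.Length + 1];
        for (int i = 0; i < weights.Length; i++) gradients[i] = outputGradient * input[i];
        gradients[weights.Length] = outputGradient;
        return gradients;
    }
}
```

The resulting vector has one entry per trainable parameter, in a fixed order that the update step must also follow.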

ComputeInputGradient(Tensor<T>, Tensor<T>)

Computes the gradient of the model output with respect to the input using tensor format.

public virtual Tensor<T> ComputeInputGradient(Tensor<T> input, Tensor<T> outputGradient)

Parameters

input Tensor<T>: The input tensor for which to compute gradients.
outputGradient Tensor<T>: The gradient tensor with respect to the output.

Returns

Tensor<T>: The gradient tensor with respect to the input.

ComputeInputGradient(Vector<T>, Vector<T>)

Computes the gradient of the model output with respect to the input.

public virtual Vector<T> ComputeInputGradient(Vector<T> input, Vector<T> outputGradient)

Parameters

input Vector<T>: The input for which to compute gradients.
outputGradient Vector<T>: The gradient with respect to the output (typically from a loss function).

Returns

Vector<T>: The gradient with respect to the input.

Remarks

This method performs backpropagation through the model to compute input gradients. The outputGradient represents how much we "care" about each output dimension, typically derived from a loss function.

For Adversarial Attacks: Set outputGradient to emphasize the target class (for targeted attacks) or the true class (for untargeted attacks), then use the returned input gradient to perturb the input in a direction that maximizes misclassification.
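
One common way to use an input gradient for this purpose is a sign step in the style of the fast gradient sign method (FGSM). The sketch below is an illustration of that technique, not something this page prescribes; AdversarialSketch, SignStep, and epsilon are hypothetical names.

```csharp
using System;

// Hypothetical FGSM-style perturbation built from an already computed input gradient.
public static class AdversarialSketch
{
    // Moves each input element by epsilon in the direction of the gradient's sign.
    public static double[] SignStep(double[] input, double[] inputGradient, double epsilon)
    {
        if (input.Length != inputGradient.Length)
            throw new ArgumentException("Input and gradient must have the same length.");

        var perturbed = new double[input.Length];
        for (int i = 0; i < input.Length; i++)
        {
            // Math.Sign returns -1, 0, or 1, so zero-gradient elements are unchanged.
            perturbed[i] = input[i] + epsilon * Math.Sign(inputGradient[i]);
        }
        return perturbed;
    }
}
```

Here inputGradient stands for the value returned by ComputeInputGradient, and epsilon bounds how far any single element may move.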

ConfigureFairness(Vector<int>, params FairnessMetric[])

Configures fairness evaluation settings.

public virtual void ConfigureFairness(Vector<int> sensitiveFeatures, params FairnessMetric[] fairnessMetrics)

Parameters

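sensitiveFeatures Vector<int>: Indices of features considered sensitive for fairness analysis.
fairnessMetrics FairnessMetric[]: The fairness metrics to evaluate for this model.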

ConvertLayerToGraph(ILayer<T>, ComputationNode<T>)

Converts a single layer to computation graph nodes by delegating to the layer's ExportComputationGraph method.

protected virtual ComputationNode<T> ConvertLayerToGraph(ILayer<T> layer, ComputationNode<T> input)

Parameters

layer ILayer<T>: The layer to convert.
input ComputationNode<T>: The input node to the layer.

Returns

ComputationNode<T>: The output node from the layer.

Remarks

This method follows the Open/Closed Principle by delegating to each layer's own ExportComputationGraph implementation. New layers can be added without modifying this base class.

Exceptions

NotSupportedException: Thrown when the layer does not support JIT compilation.

CreateNewInstance()

Creates a new instance of the same type as this neural network.

protected abstract IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()

Returns

IFullModel<T, Tensor<T>, Tensor<T>>: A new instance of the same neural network type.

Remarks

For Beginners: This creates a blank version of the same type of neural network.

It's used internally by methods like DeepCopy and Clone to create the right type of network before copying the data into it.

DeepCopy()

Creates a deep copy of the neural network.

public virtual IFullModel<T, Tensor<T>, Tensor<T>> DeepCopy()

Returns

IFullModel<T, Tensor<T>, Tensor<T>>: A new instance that is a deep copy of this neural network.

Remarks

This method creates a complete independent copy of the network, including all layers and their parameters. It uses serialization and deserialization to ensure a true deep copy.

For Beginners: This creates a completely independent duplicate of your neural network.

Think of it like creating an exact clone of your network where:

The copy has the same structure (layers, connections)
The copy has the same learned parameters (weights, biases)
Changes to one network don't affect the other

This is useful when you want to:

Experiment with modifications without risking your original network
Create multiple variations of a model
Save a snapshot of your model at a particular point in training
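The serialize-then-deserialize approach mentioned in the remarks can be sketched as follows. This is an illustration only: a flat double[] stands in for the network's parameters, and SerializeWeights, DeserializeWeights, and DeepCopyWeights are hypothetical helpers rather than AiDotNet members.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for a network's parameters: a flat weight array.
byte[] SerializeWeights(double[] weights)
{
    using var stream = new MemoryStream();
    using (var writer = new BinaryWriter(stream))
    {
        writer.Write(weights.Length);               // length prefix
        foreach (double w in weights) writer.Write(w);
    }
    return stream.ToArray();                        // valid even after the stream is closed
}

double[] DeserializeWeights(byte[] data)
{
    using var reader = new BinaryReader(new MemoryStream(data));
    int count = reader.ReadInt32();
    var weights = new double[count];
    for (int i = 0; i < count; i++) weights[i] = reader.ReadDouble();
    return weights;
}

// A deep copy is a round trip through the serialized form.
double[] DeepCopyWeights(double[] weights) => DeserializeWeights(SerializeWeights(weights));

double[] source = { 0.1, -0.4, 2.5 };
double[] copy = DeepCopyWeights(source);
copy[0] = 99.0;                                     // only the copy changes
Console.WriteLine(source[0] == 0.1);                // True
```

Because the copy is rebuilt from bytes, it shares no references with the original, which is what makes later changes independent.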

Deserialize(byte[])

Deserializes the neural network from a byte array.

public virtual void Deserialize(byte[] data)

Parameters

data byte[]: The byte array containing the serialized neural network data.

DeserializeNetworkSpecificData(BinaryReader)

Deserializes network-specific data that was not covered by the general deserialization process.

protected abstract void DeserializeNetworkSpecificData(BinaryReader reader)

Parameters

reader BinaryReader: The BinaryReader to read the data from.

Remarks

This method is called at the end of the general deserialization process to allow derived classes to read any additional data specific to their implementation.

For Beginners: Continuing the suitcase analogy, this is like unpacking that special compartment. After the main deserialization method has unpacked the common items (layers, parameters), this method allows each specific type of neural network to unpack its own unique items that were stored during serialization.

DisableMemoryManagement()

Disables memory management and releases associated resources.

public virtual void DisableMemoryManagement()

Dispose()

Disposes resources used by the neural network.

public void Dispose()

Remarks

Ensures that the mixed-precision context is properly disposed if it was enabled.

Dispose(bool)

Protected Dispose pattern implementation.

protected virtual void Dispose(bool disposing)

Parameters

disposing bool: True if called from Dispose(), false if called from finalizer.

DownloadWeightsFromGpu()

Downloads all layer weights from GPU back to CPU.

public virtual void DownloadWeightsFromGpu()

Remarks

Call this after GPU training to sync updated weights back to the CPU for:

Model saving/checkpointing
CPU inference
Weight inspection

For Beginners: During GPU training, weights are updated on GPU. The CPU copy becomes stale. Call this to get the latest values back to CPU.

EnableMemoryManagement(TrainingMemoryConfig?)

Enables memory management with the specified configuration.

public virtual void EnableMemoryManagement(TrainingMemoryConfig? config = null)

Parameters

config TrainingMemoryConfig: Memory management configuration. If null, uses default settings.

Remarks

For Beginners: Memory management helps train larger models by:

Gradient Checkpointing: Instead of storing all layer activations (which uses lots of memory), only store some "checkpoints". During backpropagation, recompute the missing activations from the checkpoints. This trades compute time for memory (typically 40-50% memory savings).
Activation Pooling: Reuse tensor memory instead of allocating new tensors each time. This reduces garbage collection overhead and memory fragmentation.

Example:

// Enable memory-efficient training
network.EnableMemoryManagement(TrainingMemoryConfig.MemoryEfficient());

// Or with custom settings
network.EnableMemoryManagement(new TrainingMemoryConfig
{
    UseGradientCheckpointing = true,
    CheckpointEveryNLayers = 2,
    UseActivationPooling = true
});

EnableMethod(params InterpretationMethod[])

Enables specific interpretation methods.

public virtual void EnableMethod(params InterpretationMethod[] methods)

Parameters

methods InterpretationMethod[]

EnsureArchitectureInitialized()

Ensures the architecture is initialized before training begins.

protected void EnsureArchitectureInitialized()

ExportComputationGraph(List<ComputationNode<T>>)

Exports the model's computation graph for JIT compilation.

public virtual ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>: List to populate with input computation nodes (parameters).

Returns

ComputationNode<T>: The output computation node representing the model's prediction.

Remarks

This method should construct a computation graph representing the model's forward pass. The graph should use placeholder input nodes that will be filled with actual data during execution.

For Beginners: This method creates a "recipe" of your model's calculations that the JIT compiler can optimize.

The method should:

Create placeholder nodes for inputs (features, parameters)
Build the computation graph using TensorOperations
Add all input nodes to the inputNodes list (in order)
Return the final output node

Example for a simple linear model (y = Wx + b):

public ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
{
    // Create placeholder inputs
    var x = TensorOperations<T>.Variable(new Tensor<T>(InputShape), "x");
    var W = TensorOperations<T>.Variable(Weights, "W");
    var b = TensorOperations<T>.Variable(Bias, "b");

    // Add inputs in order
    inputNodes.Add(x);
    inputNodes.Add(W);
    inputNodes.Add(b);

    // Build graph: y = Wx + b
    var matmul = TensorOperations<T>.MatMul(x, W);
    var output = TensorOperations<T>.Add(matmul, b);

    return output;
}

The JIT compiler will then:

Optimize the graph (fuse operations, eliminate dead code)
Compile it to fast native code
Cache the compiled version for reuse

ExtractSingleExample(Tensor<T>, int)

Extracts a single example from a batch tensor and formats it as a tensor with shape [1, features].

protected Tensor<T> ExtractSingleExample(Tensor<T> batchTensor, int index)

Parameters

batchTensor Tensor<T>: The batch tensor to extract from.
index int: The index of the example to extract.

Returns

Tensor<T>: A tensor containing a single example with shape [1, features].

ForwardDeferred(Tensor<T>)

Performs a forward pass using deferred execution for optimized GPU performance. Operations are recorded and batched into an execution graph that runs with a single sync point.

public virtual Tensor<T> ForwardDeferred(Tensor<T> input)

Parameters

input Tensor<T>: The input tensor to process.

Returns

Tensor<T>: The output tensor from the network.

Remarks

This method uses deferred execution to batch all GPU operations and execute them as an optimized graph. This provides significant performance improvements over eager execution by:

Avoiding synchronization between layers
Enabling kernel fusion optimizations
Minimizing CPU-GPU data transfers

Execution Flow:

BeginDeferredScope()
  Layer1.ForwardGpu() → Record GPU op (no sync)
  Layer2.ForwardGpu() → Record GPU op (no sync)
  Layer3.ForwardGpu() → Record GPU op (no sync)
EndDeferredScope() → Execute all → Single sync → Download final result

For Beginners: Think of this like batch cooking vs cooking one dish at a time. Instead of starting and finishing each layer separately, we plan out all the operations and then execute them together more efficiently.
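The record-then-execute flow above can be imitated on the CPU with a queue of delegates. This is only a conceptual sketch: Record, ExecuteAll, and syncPoints are hypothetical names, no GPU is involved, and the real engine's scope API is not shown here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical recorder: operations are queued, then run together at the end.
var recorded = new List<Func<double[], double[]>>();
int syncPoints = 0;

void Record(Func<double[], double[]> operation) => recorded.Add(operation);

double[] ExecuteAll(double[] input)
{
    double[] current = input;
    foreach (var operation in recorded) current = operation(current);
    syncPoints++;      // one synchronization for the whole batch of operations
    recorded.Clear();
    return current;
}

// Nothing runs while the "layers" are being recorded.
Record(values => Array.ConvertAll(values, v => v * 2.0));         // "layer 1"
Record(values => Array.ConvertAll(values, v => Math.Max(0.0, v))); // "layer 2"
Record(values => Array.ConvertAll(values, v => v + 1.0));         // "layer 3"

double[] output = ExecuteAll(new[] { -1.0, 0.5 });
string joined = string.Join(", ", output);
Console.WriteLine(joined + " after " + syncPoints + " sync point(s)");
```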

Exceptions

InvalidOperationException: Thrown if the engine doesn't support deferred execution.

ForwardDeferredAsync(Tensor<T>, CancellationToken)

Performs an asynchronous forward pass using deferred execution for optimized GPU performance.

public virtual Task<Tensor<T>> ForwardDeferredAsync(Tensor<T> input, CancellationToken cancellationToken = default)

Parameters

input Tensor<T>: The input tensor to process.
cancellationToken CancellationToken: Cancellation token to cancel the operation.

Returns

Task<Tensor<T>>: A task representing the async operation with the output tensor.

Remarks

This is the async version of ForwardDeferred(Tensor<T>). The GPU execution runs asynchronously, allowing the CPU to do other work while waiting.

ForwardGpu(IGpuTensor<T>)

Performs a forward pass through the network entirely on GPU.

public virtual IGpuTensor<T> ForwardGpu(IGpuTensor<T> input)

Parameters

input IGpuTensor<T>: The GPU-resident input tensor.

Returns

IGpuTensor<T>: The GPU-resident output tensor.

Remarks

This method passes data through all layers on GPU without CPU round-trips. The output remains on GPU and can be used directly for loss computation.

For Beginners: Like ForwardWithMemory() but everything stays on the GPU. This is much faster for training because there's no copying between CPU and GPU.

Exceptions

InvalidOperationException: Thrown when the network doesn't support GPU execution.

ForwardGpu(Tensor<T>)

Performs a GPU-resident forward pass, keeping intermediate results on GPU. Only downloads the final result to CPU when the returned tensor is accessed.

public virtual IGpuTensor<T> ForwardGpu(Tensor<T> input)

Parameters

input Tensor<T>: The input tensor to process.

Returns

IGpuTensor<T>: GPU-resident output tensor. The data is only downloaded when IGpuTensor<T>.ToTensor() is called.

Remarks

For Beginners: This method is like the regular forward pass, but keeps all intermediate calculations on the GPU instead of moving data back and forth between CPU and GPU. This can be 10-50x faster for multi-layer networks!

Performance Tip: Use this method for inference when you have multiple layers that all support GPU execution. The data stays on the GPU until you call ToTensor() on the result.

// Example: GPU-resident inference
using var gpuResult = network.ForwardGpu(input);
var output = gpuResult.ToTensor(); // Only downloads here

Exceptions

InvalidOperationException: Thrown when no GPU backend is available or the engine is not a DirectGpuTensorEngine.

ForwardWithCheckpointing(Tensor<T>)

Performs forward pass with gradient checkpointing to reduce memory usage.

protected virtual Tensor<T> ForwardWithCheckpointing(Tensor<T> input)

Parameters

input Tensor<T>: Input tensor.

Returns

Tensor<T>: Output tensor.

Remarks

For Beginners: Gradient checkpointing trades compute for memory:

Instead of storing ALL layer activations (high memory), only store SOME checkpoints
During backprop, recompute the missing activations from the nearest checkpoint
Typical memory savings: 40-50% with only ~20% extra compute time
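The store-some, recompute-the-rest idea can be sketched with toy scalar layers. This is an assumption-laden illustration, not the library's implementation: the layers array, ForwardWithCheckpoints, RecomputeActivation, and the everyN parameter are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy "layers": each maps a scalar activation to the next one.
Func<double, double>[] layers =
{
    v => v * 2.0,
    v => v + 1.0,
    v => v * v,
    v => v - 3.0,
};

// Forward pass that keeps only the input and every N-th layer output.
// Keys are the number of layers applied so far.
Dictionary<int, double> ForwardWithCheckpoints(double input, int everyN, out double output)
{
    var saved = new Dictionary<int, double> { [0] = input };
    double current = input;
    for (int i = 0; i < layers.Length; i++)
    {
        current = layers[i](current);
        if ((i + 1) % everyN == 0) saved[i + 1] = current;
    }
    output = current;
    return saved;
}

// Recompute the activation after `layerCount` layers from the nearest earlier checkpoint.
double RecomputeActivation(Dictionary<int, double> saved, int layerCount, int everyN)
{
    int start = layerCount - layerCount % everyN;
    double current = saved[start];
    for (int i = start; i < layerCount; i++) current = layers[i](current);
    return current;
}

var stored = ForwardWithCheckpoints(1.5, 2, out double result);
double recomputed = RecomputeActivation(stored, 3, 2);
Console.WriteLine($"output={result}, stored={stored.Count}, recomputed={recomputed}");
```

With four layers and a checkpoint every two layers, only three values are kept instead of five. The activation after the third layer is rebuilt by rerunning one layer from the nearest checkpoint.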

ForwardWithFeatures(Tensor<T>, int[]?)

Performs a forward pass and returns intermediate layer activations for feature extraction.

public virtual (Tensor<T> output, Dictionary<int, Tensor<T>> features) ForwardWithFeatures(Tensor<T> input, int[]? layerIndices = null)

Parameters

input Tensor<T>: The input tensor to process.
layerIndices int[]: Optional array of layer indices to extract features from. If null, returns all layer outputs.

Returns

(Tensor<T> output, Dictionary<int, Tensor<T>> features): A tuple containing the final output tensor and a dictionary of intermediate features indexed by layer number.

Remarks

This method performs a forward pass through the network while capturing intermediate layer activations. This is useful for feature extraction, transfer learning, style transfer, and advanced training techniques like feature matching in GANs.

For Beginners: This method lets you see what's happening inside the network at each layer.

Think of it like watching a factory assembly line:

Normally, you only see the final product (output)
This method lets you inspect the product at each station (layer)
You can choose specific stations to inspect (layerIndices)

This is useful for:

Understanding what features the network has learned
Using intermediate representations for other tasks (transfer learning)
Debugging network behavior
Advanced training techniques like feature matching in GANs

Example:

layerIndices = new[] { -2, -1 } means "last two layers" (negative indices count from end)
layerIndices = null means "all layers"
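The index convention in the example above can be made concrete with a small resolver. ResolveLayerIndices is a hypothetical helper written for illustration; how the library itself validates or orders the indices is not specified here.

```csharp
#nullable enable
using System;
using System.Linq;

// Hypothetical helper: map possibly negative indices onto [0, layerCount).
int[] ResolveLayerIndices(int[]? requested, int layerCount)
{
    // null means "every layer".
    if (requested is null) return Enumerable.Range(0, layerCount).ToArray();

    return requested.Select(index =>
    {
        // Negative indices count from the end: -1 is the last layer.
        int resolved = index < 0 ? layerCount + index : index;
        if (resolved < 0 || resolved >= layerCount)
            throw new ArgumentOutOfRangeException(nameof(requested), $"Layer index {index} is out of range.");
        return resolved;
    }).ToArray();
}

Console.WriteLine(string.Join(", ", ResolveLayerIndices(new[] { -2, -1 }, 5))); // 3, 4
Console.WriteLine(string.Join(", ", ResolveLayerIndices(null, 3)));             // 0, 1, 2
```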

Industry Standard: This pattern is common in modern ML frameworks:

PyTorch: register_forward_hook(), or a forward_features() method (a convention in model libraries such as timm)
TensorFlow/Keras: Model(inputs=..., outputs=[layer1.output, layer2.output])
This implementation follows the TensorFlow-style approach

ForwardWithGpuContext(IGpuTensor<T>)

Performs a GPU-resident forward pass within a GPU execution context with GPU-resident input.

public virtual IGpuTensor<T> ForwardWithGpuContext(IGpuTensor<T> input)

Parameters

input IGpuTensor<T>: GPU-resident input tensor.

Returns

IGpuTensor<T>: GPU-resident output tensor managed by the current context.

Exceptions

InvalidOperationException: Thrown when no GPU context is active.

ForwardWithGpuContext(Tensor<T>)

Performs a GPU-resident forward pass within a GPU execution context. Uses the current thread's GpuExecutionContext for tensor management.

public virtual IGpuTensor<T> ForwardWithGpuContext(Tensor<T> input)

Parameters

input Tensor<T>: The input tensor to process.

Returns

IGpuTensor<T>: GPU-resident output tensor managed by the current context.

Remarks

For Beginners: This method is like ForwardGpu but uses the GPU execution context to track all tensor allocations. The context handles memory management automatically, preventing memory leaks and enabling memory pressure monitoring.

Exceptions

InvalidOperationException: Thrown when no GPU context is active.

ForwardWithMemory(Tensor<T>)

Performs a forward pass through the network while storing intermediate values for backpropagation.

public virtual Tensor<T> ForwardWithMemory(Tensor<T> input)

Parameters

input Tensor<T>: The input data to the network.

Returns

Tensor<T>: The output of the network.

Remarks

For Beginners: This method passes data through the network from input to output, but also remembers all the intermediate values. This is necessary for the learning process, as the network needs to know these values when figuring out how to improve.

API Change Note: The signature changed from Vector<T> to Tensor<T> to support multi-dimensional inputs. This is a breaking change. For backward compatibility, consider adding an overload that accepts Vector<T> and converts it internally to Tensor<T>.

Exceptions

InvalidOperationException: Thrown when the network doesn't support training.

GenerateTextExplanationAsync(Tensor<T>, Tensor<T>)

Generates a text explanation for a prediction.

public virtual Task<string> GenerateTextExplanationAsync(Tensor<T> input, Tensor<T> prediction)

Parameters

input Tensor<T>
prediction Tensor<T>

Returns

Task<string>

GetActiveFeatureIndices()

Gets the indices of input features that are actively used by the network.

public virtual IEnumerable<int> GetActiveFeatureIndices()

Returns

IEnumerable<int>: A collection of indices representing the active features.

Remarks

This method determines which input features have the most influence on the network's output by analyzing the weights of the first layer. Features with larger absolute weights are considered more active or important.

For Beginners: This helps you understand which parts of your input data the network considers most important for making predictions.

For example, if your inputs are:

Age (index 0)
Income (index 1)
Education level (index 2)

And this method returns [0, 2], it means the network relies heavily on age and education level, but not so much on income when making its predictions.

This can help you:

Understand what your model is paying attention to
Potentially simplify your model by removing unused features
Gain insights about the problem you're solving

GetAnchorExplanationAsync(Tensor<T>, T)

Gets anchor explanation for a given input.

public virtual Task<AnchorExplanation<T>> GetAnchorExplanationAsync(Tensor<T> input, T threshold)

Parameters

input Tensor<T>
threshold T

Returns

Task<AnchorExplanation<T>>

GetArchitecture()

Gets the architectural structure of the neural network.

public virtual NeuralNetworkArchitecture<T> GetArchitecture()

Returns

NeuralNetworkArchitecture<T>

GetCounterfactualAsync(Tensor<T>, Tensor<T>, int)

Gets counterfactual explanation for a given input and desired output.

public virtual Task<CounterfactualExplanation<T>> GetCounterfactualAsync(Tensor<T> input, Tensor<T> desiredOutput, int maxChanges = 5)

Parameters

input Tensor<T>
desiredOutput Tensor<T>
maxChanges int

Returns

Task<CounterfactualExplanation<T>>

GetFeatureImportance()

Gets the feature importance scores for the model.

public virtual Dictionary<string, T> GetFeatureImportance()

Returns

Dictionary<string, T>: A dictionary mapping feature names to their importance scores.

Remarks

This method calculates the importance of each input feature by analyzing the weights in the first layer of the neural network. Features with larger absolute weights are considered more important to the model's predictions.

For Beginners: This tells you which parts of your input data are most important for the neural network's decisions.

For example, if you're predicting house prices with features like size, location, and age, this method might tell you that "location" has an importance of 0.8, "size" has 0.6, and "age" has 0.2 - meaning the network relies heavily on location and size, but less on age.

This is useful for:

Understanding what your model pays attention to
Explaining model decisions to others
Identifying which features matter most
Simplifying your model by removing unimportant features
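One way to turn first-layer weights into importance scores is to average the absolute weight attached to each input. The sketch below assumes that aggregation. The library may aggregate or normalize differently, and ScoreFeatures and the weight layout are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical scoring: mean absolute first-layer weight per input feature.
// weights[n][f] is the weight from input feature f to neuron n.
Dictionary<string, double> ScoreFeatures(double[][] weights, string[] featureNames)
{
    var scores = new Dictionary<string, double>();
    for (int f = 0; f < featureNames.Length; f++)
    {
        scores[featureNames[f]] = weights.Average(row => Math.Abs(row[f]));
    }
    return scores;
}

double[][] firstLayer =
{
    new[] { 0.9, -0.5, 0.1 },   // neuron 0
    new[] { -0.7, 0.7, -0.3 },  // neuron 1
};
var importance = ScoreFeatures(firstLayer, new[] { "location", "size", "age" });

// Print features from most to least important.
foreach (var pair in importance.OrderByDescending(p => p.Value))
    Console.WriteLine($"{pair.Key}: {pair.Value:F2}");
```

With these made-up weights the scores come out at roughly 0.8, 0.6, and 0.2, matching the house-price example above.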

GetFeatureInteractionAsync(int, int)

Gets feature interaction effects between two features.

public virtual Task<T> GetFeatureInteractionAsync(int feature1Index, int feature2Index)

Parameters

feature1Index int
feature2Index int

Returns

Task<T>

GetGlobalFeatureImportanceAsync()

Gets the global feature importance across all predictions.

public virtual Task<Dictionary<int, T>> GetGlobalFeatureImportanceAsync()

Returns

Task<Dictionary<int, T>>

GetGpuMemoryStats()

Gets GPU memory statistics if running within a GPU execution context.

public virtual GpuMemoryStats? GetGpuMemoryStats()

Returns

GpuMemoryStats: Memory statistics, or null if no context is active.

GetGradients()

Gets the gradients from all layers in the neural network.

public virtual Vector<T> GetGradients()

Returns

Vector<T>: A vector containing all gradients from all layers concatenated together.

Remarks

This method collects the gradients from every layer in the network and combines them into a single vector. This is useful for optimization algorithms that need access to all gradients at once.

For Beginners: During training, each layer calculates how its parameters should change (the gradients). This method gathers all those gradients from every layer and puts them into one long list.

Think of it like:

Each layer has notes about how to improve (gradients)
This method collects all those notes into one document
The optimizer can then use this document to update the entire network

This is essential for the learning process, as it tells the optimizer how to adjust all the network's parameters to improve performance.
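The "collect all notes into one document" step amounts to concatenating per-layer arrays in layer order. The following sketch uses plain arrays in place of Vector<T>, and ConcatenateGradients is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical flattening: copy each layer's gradients into one vector, in layer order.
double[] ConcatenateGradients(IReadOnlyList<double[]> perLayer)
{
    int total = 0;
    foreach (var g in perLayer) total += g.Length;

    var flat = new double[total];
    int offset = 0;
    foreach (var g in perLayer)
    {
        Array.Copy(g, 0, flat, offset, g.Length);
        offset += g.Length;
    }
    return flat;
}

var layerGradients = new List<double[]>
{
    new[] { 0.1, -0.2 },      // layer 0
    new[] { 0.05 },           // layer 1
    new[] { -0.3, 0.4, 0.0 }, // layer 2
};
double[] all = ConcatenateGradients(layerGradients);
Console.WriteLine(all.Length); // 6
```

Keeping a fixed layer order matters: an optimizer can only map position i in the flat vector back to a parameter if parameters are flattened in the same order.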

GetInputShape()

Gets the input shape expected by the neural network.

public virtual int[] GetInputShape()

Returns

int[]: An array representing the dimensions of the input.

Remarks

This method returns the shape of input data that the network expects. For example, if the network expects images of size 28x28 pixels, this might return [28, 28]. If it expects a vector of 100 features, it would return [100].

For Beginners: This tells you what size and shape of data the network needs as input. Think of it like knowing what size batteries a device needs - you need to provide the right dimensions of data for the network to work properly.

GetLastLoss()

Gets the loss value from the most recent training iteration.

public virtual T GetLastLoss()

Returns

T: The loss value from the last training iteration, or zero if no training has occurred.

Remarks

This method returns the error/loss value calculated during the most recent call to the Train method. It's useful for monitoring the training progress and implementing early stopping.

For Beginners: This tells you how well your network is learning.

The loss value is a measure of how far off your network's predictions are from the correct answers.

A high loss means the network is making big mistakes
A low loss means the network is getting closer to the right answers

By tracking this value over time, you can:

See if your network is improving
Decide when to stop training (when the loss stops decreasing)
Compare different network designs to see which learns better

Think of it like a golf score: the lower the score, the better your network is performing.
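Deciding when to stop training from a series of loss values can be sketched with a simple patience rule. The rule, the FindStoppingEpoch helper, and the recorded values are illustrative assumptions. In practice the values would be read after each training iteration.

```csharp
using System;

// Hypothetical early-stopping rule: stop once the loss has not improved for `patience` epochs.
int FindStoppingEpoch(double[] losses, int patience)
{
    double best = double.MaxValue;
    int epochsSinceBest = 0;
    for (int epoch = 0; epoch < losses.Length; epoch++)
    {
        if (losses[epoch] < best)
        {
            best = losses[epoch];
            epochsSinceBest = 0;
        }
        else if (++epochsSinceBest >= patience)
        {
            return epoch; // no improvement for `patience` consecutive epochs
        }
    }
    return losses.Length - 1; // never triggered: train to the end
}

// Stand-in for loss values recorded after each training iteration.
double[] recordedLosses = { 0.90, 0.60, 0.45, 0.46, 0.47, 0.44 };
Console.WriteLine(FindStoppingEpoch(recordedLosses, 2)); // 4
```

The example also shows the trade-off of a small patience value: training stops at epoch 4 even though epoch 5 would have produced a new best loss.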

GetLayerActivations(Tensor<T>)

Gets the activations (outputs) from each layer for a given input.

public virtual Dictionary<int, Tensor<T>> GetLayerActivations(Tensor<T> input)

Parameters

input Tensor<T>: The input tensor to process.

Returns

Dictionary<int, Tensor<T>>: A dictionary mapping layer index to layer activation tensors.

Remarks

This method processes the input through the network and captures the output of each layer. This is useful for visualizing what each layer is detecting, debugging the network, or implementing techniques like feature extraction.

For Beginners: This shows you what each layer in your neural network "sees" or produces when given an input. It's like following a signal through a circuit and measuring the output at each component. This helps you understand what patterns each layer is detecting.

For example, in an image recognition network:

Early layers might detect edges and simple shapes
Middle layers might detect parts of objects (like eyes or wheels)
Later layers might detect whole objects

This method lets you see all of these intermediate representations.

GetLimeExplanationAsync(Tensor<T>, int)

Gets LIME explanation for a specific input.

public virtual Task<LimeExplanation<T>> GetLimeExplanationAsync(Tensor<T> input, int numFeatures = 10)

Parameters

input Tensor<T>
numFeatures int

Returns

Task<LimeExplanation<T>>

GetLocalFeatureImportanceAsync(Tensor<T>)

Gets the local feature importance for a specific input.

public virtual Task<Dictionary<int, T>> GetLocalFeatureImportanceAsync(Tensor<T> input)

Parameters

input Tensor<T>

Returns

Task<Dictionary<int, T>>

GetMemoryEstimate(int, int)

Gets memory usage statistics if memory management is enabled.

public MemorySavingsEstimate? GetMemoryEstimate(int batchSize = 32, int sequenceLength = 512)

Parameters

batchSize int
sequenceLength int

Returns

MemorySavingsEstimate: Memory savings estimate, or null if memory management is disabled.

GetModelMetadata()

Gets the metadata for this neural network model.

public abstract ModelMetadata<T> GetModelMetadata()

Returns

ModelMetadata<T>: A ModelMetadata<T> object containing information about the model.

GetModelSpecificInterpretabilityAsync()

Gets model-specific interpretability information.

public virtual Task<Dictionary<string, object>> GetModelSpecificInterpretabilityAsync()

Returns

Task<Dictionary<string, object>>

GetNamedLayerActivations(Tensor<T>)

Gets the intermediate activations from each layer when processing the given input with named keys.

public virtual Dictionary<string, Tensor<T>> GetNamedLayerActivations(Tensor<T> input)

Parameters

input Tensor<T>

Returns

Dictionary<string, Tensor<T>>

GetParameterCount()

Gets the total number of parameters in the model.

public int GetParameterCount()

Returns

int: The total number of parameters in the neural network.

Remarks

This method returns the total count of all trainable parameters across all layers in the neural network. It uses the cached ParameterCount property for efficiency.

For Beginners: This tells you how many adjustable values (weights and biases) your neural network has. More parameters mean the network can learn more complex patterns, but also requires more training data and computational resources.
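The caching behavior mentioned in the remarks, together with the need to invalidate the cache when layers change, can be sketched like this. The variable and function names are hypothetical, and each list entry simply stands for one layer's parameter count.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical cache: each entry is one layer's parameter count (weights + biases).
var layerParameterCounts = new List<int> { 784 * 128 + 128, 128 * 10 + 10 };
int? cachedTotal = null;
int recomputations = 0;

int GetTotal()
{
    if (cachedTotal is null)
    {
        cachedTotal = layerParameterCounts.Sum(); // only computed on a cache miss
        recomputations++;
    }
    return cachedTotal.Value;
}

void AddLayer(int parameterCount)
{
    layerParameterCounts.Add(parameterCount);
    cachedTotal = null; // invalidate so the next read recomputes
}

Console.WriteLine(GetTotal());     // 101770
GetTotal();                        // served from the cache
AddLayer(10 * 2 + 2);
Console.WriteLine(GetTotal());     // 101792
Console.WriteLine(recomputations); // 2
```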

GetParameterGradients()

Retrieves the gradients for all trainable parameters in the network.

public virtual Vector<T> GetParameterGradients()

Returns

Vector<T>: A vector containing all parameter gradients.

Remarks

For Beginners: When a neural network learns, it needs to know how to adjust each of its internal values (parameters). These adjustments are called "gradients" - they tell the network which direction and how much to change each parameter. This method collects all those adjustment values into a single list.

Think of gradients as a recipe for improvement: "increase this weight by 0.01, decrease that one by 0.03," etc.

GetParameters()

Gets all trainable parameters of the network as a single vector.

public virtual Vector<T> GetParameters()

Returns

Vector<T>: A vector containing all parameters of the network.

Remarks

For Beginners: Neural networks learn by adjusting their "parameters" (also called weights and biases). This method collects all those adjustable values into a single list so they can be updated during training.

GetPartialDependenceAsync(Vector<int>, int)

Gets partial dependence data for specified features.

public virtual Task<PartialDependenceData<T>> GetPartialDependenceAsync(Vector<int> featureIndices, int gridResolution = 20)

Parameters

featureIndices Vector<int>
gridResolution int

Returns

Task<PartialDependenceData<T>>

GetShapValuesAsync(Tensor<T>)

Gets SHAP values for the given inputs.

public virtual Task<Matrix<T>> GetShapValuesAsync(Tensor<T> inputs)

Parameters

inputs Tensor<T>

Returns

Task<Matrix<T>>

InitializeLayers()

Initializes the layers of the neural network based on the architecture.

protected abstract void InitializeLayers()

Remarks

For Beginners: This method sets up all the layers in your neural network according to the architecture you've defined. It's like assembling the parts of your network before you can use it.

InvalidateParameterCountCache()

Invalidates the parameter count cache. Call this method whenever layers are added, removed, or modified.

protected void InvalidateParameterCountCache()

IsFeatureUsed(int)

Determines if a specific input feature is actively used by the network.

public virtual bool IsFeatureUsed(int featureIndex)

Parameters

featureIndex int: The index of the feature to check.

Returns

bool: True if the feature is actively used; otherwise, false.

Remarks

This method checks if a specific input feature has a significant influence on the network's output based on the weights in the first layer. A feature is considered used if its associated weights have non-negligible magnitudes.

For Beginners: This tells you whether a specific piece of your input data matters to the neural network's decisions.

For example, if your inputs include age, income, and education level, this method can tell you whether the network is actually using age (or any other specific feature) when making predictions.

This is useful for:

Understanding what information your model uses
Simplifying your inputs by removing unused features
Debugging models that ignore features you think should be important

IsValidInputLayer(ILayer<T>)

Determines if a layer can serve as a valid input layer for the neural network.

protected virtual bool IsValidInputLayer(ILayer<T> layer)

Parameters

layer ILayer<T>: The layer to check.

Returns

bool: True if the layer can be used as an input layer; otherwise, false.

Remarks

For Beginners: The input layer is the first layer of your neural network. It receives the raw data you want to process (like image pixels or text features). This method checks if a layer is suitable to be the first layer in your network.

IsValidOutputLayer(ILayer<T>)

Determines if a layer can serve as a valid output layer for the neural network.

protected virtual bool IsValidOutputLayer(ILayer<T> layer)

Parameters

layer ILayer<T>: The layer to check.

Returns

bool: True if the layer can be used as an output layer; otherwise, false.

Remarks

For Beginners: The output layer is the last layer of your neural network. It produces the final result (like a prediction or classification). This method checks if a layer is suitable to be the final layer in your network. Different tasks need different types of output layers - for example, image classification might use a Softmax activation, while regression might use a linear activation.

LoadModel(string)

Loads a neural network model from a file.

public virtual void LoadModel(string filePath)

Parameters

filePath string: The path to the file containing the saved model.

Remarks

For Beginners: This method allows you to load a previously saved neural network model from a file on disk. This is the counterpart to SaveModel and uses the Deserialize method to reconstruct the network from the saved data.

Exceptions

ArgumentException: Thrown when the file path is null or empty.
FileNotFoundException: Thrown when the file does not exist.

LoadState(Stream)

Loads the model's state from a stream.

public virtual void LoadState(Stream stream)

Parameters

stream Stream: The stream to read the model state from.

Predict(Tensor<T>)

Makes a prediction using the neural network.

public abstract Tensor<T> Predict(Tensor<T> input)

Parameters

input Tensor<T>: The input data to process.

Returns

Tensor<T>: The network's prediction.

Remarks

For Beginners: This is the main method you'll use to get results from your trained neural network. You provide some input data (like an image or text), and the network processes it through all its layers to produce an output (like a classification or prediction).

RemoveLayerFromCollection(ILayer<T>)

Removes a layer from the internal layers collection and invalidates the parameter count cache.

protected bool RemoveLayerFromCollection(ILayer<T> layer)

Parameters

layer ILayer<T>: The layer to remove.

Returns

bool: True if the layer was successfully removed; otherwise, false.

Remarks

This method ensures that the parameter count cache is properly invalidated when layers are removed. Derived classes should use this method instead of directly accessing Layers.Remove().

ResetState()

Resets the internal state of the different layers, clearing any remembered information.

public virtual void ResetState()

Remarks

This method resets the internal state (hidden state and cell state) of all layers in the network. This is useful when starting to process a new, unrelated sequence or when the network's memory should be cleared before making new predictions.

For Beginners: This clears the neural network's memory to start fresh.

Think of this like:

Wiping the slate clean before starting a new task
Erasing the neural network's "memory" so past inputs don't influence new predictions
Starting fresh when processing a completely new sequence

For example, if you've been using a neural network to analyze one document and now want to analyze a completely different document, you would reset the state first to avoid having the first document influence the analysis of the second one.

SaveModel(string)

Saves the model to a file.

public virtual void SaveModel(string filePath)

Parameters

filePath string: The path where the model should be saved.

Remarks

This method serializes the entire neural network, including all layers and parameters, and saves it to the specified file path.

For Beginners: This saves your trained neural network to a file on your computer.

Think of it like saving a document - you can later load the model back from the file and use it to make predictions without having to retrain it from scratch.

This is useful when:

You've finished training and want to save your model
You want to use the model in a different application
You need to share the model with others
You want to deploy the model to production

SaveState(Stream)

Saves the model's current state to a stream.

public virtual void SaveState(Stream stream)

Parameters

stream Stream: The stream to write the model state to.

Serialize()

Serializes the neural network to a byte array.

public virtual byte[] Serialize()

Returns

byte[]: A byte array representing the serialized neural network.

SerializeNetworkSpecificData(BinaryWriter)

Serializes network-specific data that is not covered by the general serialization process.

protected abstract void SerializeNetworkSpecificData(BinaryWriter writer)

Parameters

writer BinaryWriter: The BinaryWriter to write the data to.

Remarks

This method is called at the end of the general serialization process to allow derived classes to write any additional data specific to their implementation.

For Beginners: Think of this as packing a special compartment in your suitcase. While the main serialization method packs the common items (layers, parameters), this method allows each specific type of neural network to pack its own unique items that other networks might not have.
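The hook pattern described above might look roughly like the following sketch. SketchNetworkBase and SketchRecurrentNetwork are hypothetical stand-ins, and the byte layout (weight count, weights, then subclass data) is an assumption made for illustration; the actual AiDotNet format is not documented here.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for the base class; not the AiDotNet implementation.
public abstract class SketchNetworkBase
{
    protected double[] Weights { get; }

    protected SketchNetworkBase(double[] weights)
    {
        Weights = weights ?? throw new ArgumentNullException(nameof(weights));
    }

    public byte[] Serialize()
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new BinaryWriter(stream))
            {
                // Common part: every network writes its weights.
                writer.Write(Weights.Length);
                foreach (var w in Weights) writer.Write(w);

                // Hook called last so each subclass can append its own data.
                SerializeNetworkSpecificData(writer);
            }
            // ToArray remains valid after the writer has closed the stream.
            return stream.ToArray();
        }
    }

    protected abstract void SerializeNetworkSpecificData(BinaryWriter writer);
}

// Hypothetical subclass with one extra setting that other networks lack.
public sealed class SketchRecurrentNetwork : SketchNetworkBase
{
    public int SequenceLength { get; }

    public SketchRecurrentNetwork(double[] weights, int sequenceLength)
        : base(weights)
    {
        SequenceLength = sequenceLength;
    }

    protected override void SerializeNetworkSpecificData(BinaryWriter writer)
    {
        writer.Write(SequenceLength);
    }
}
```

A matching deserialization hook would read the same fields in the same order, which is why the subclass data is written at a fixed position (the end).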

SetActiveFeatureIndices(IEnumerable<int>)

Sets which input features should be considered active in the neural network.

public virtual void SetActiveFeatureIndices(IEnumerable<int> featureIndices)

Parameters

featureIndices IEnumerable<int>: The indices of features to mark as active.

Remarks

This method explicitly specifies which input features should be considered active in the neural network, overriding the automatic determination based on weights. Any features not included in the provided collection will be considered inactive, regardless of their weights in the network.

For Beginners: This method lets you manually select which parts of your input data the neural network should pay attention to. For example, if your inputs include various measurements or features, you can tell the network to focus only on specific ones that you know are important based on your domain knowledge.

This can be useful for:

Forcing the network to use features you know are important
Ignoring features you know are irrelevant or noisy
Testing how the network performs with different feature subsets
Implementing feature selection techniques

Exceptions

ArgumentNullException: Thrown when featureIndices is null.
ArgumentOutOfRangeException: Thrown when any feature index is negative or exceeds the input dimension.
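The validation implied by these exceptions can be sketched as follows. FeatureIndexSketch and the inputSize parameter are hypothetical, and the sketch assumes zero-based indices, so an index equal to the input dimension is rejected. The library's own boundary check may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of AiDotNet.
public static class FeatureIndexSketch
{
    // Returns the distinct set of valid indices, or throws as documented above.
    public static HashSet<int> ValidateActiveFeatures(
        IEnumerable<int> featureIndices, int inputSize)
    {
        if (featureIndices is null)
            throw new ArgumentNullException(nameof(featureIndices));

        var active = new HashSet<int>();
        foreach (var index in featureIndices)
        {
            // Assumption: valid indices are 0 .. inputSize - 1.
            if (index < 0 || index >= inputSize)
            {
                throw new ArgumentOutOfRangeException(
                    nameof(featureIndices), index,
                    $"Feature index must be in the range [0, {inputSize}).");
            }
            active.Add(index); // duplicates collapse into one entry
        }
        return active;
    }
}
```

Under this sketch, any feature whose index is absent from the returned set would be treated as inactive.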

SetBaseModel<TInput, TOutput>(IFullModel<T, TInput, TOutput>)

Sets the base model for interpretability analysis.

public virtual void SetBaseModel<TInput, TOutput>(IFullModel<T, TInput, TOutput> model)

Parameters

model IFullModel<T, TInput, TOutput>: The model to use for interpretability analysis. Must implement IFullModel.

Type Parameters

TInput: The input type for the model.
TOutput: The output type for the model.

Exceptions

ArgumentNullException: Thrown when model is null.

SetParameters(Vector<T>)

Sets the parameters of the neural network.

public virtual void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>: The parameters to set.

Remarks

This method distributes the parameters to all layers in the network. The parameters should be in the same format as returned by GetParameters.
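One way to picture the distribution step is slicing a flat array into one chunk per layer. The sketch below is hypothetical: ParameterLayoutSketch is not an AiDotNet type, plain double arrays stand in for Vector<T>, and it assumes the parameters are laid out contiguously in layer order, which the source does not confirm.

```csharp
using System;

// Hypothetical helper; not part of AiDotNet.
public static class ParameterLayoutSketch
{
    // Splits a flat parameter array into one chunk per layer.
    // Assumption: layers are stored back to back, in layer order.
    public static double[][] SplitByLayer(double[] parameters, int[] layerParameterCounts)
    {
        if (parameters is null) throw new ArgumentNullException(nameof(parameters));
        if (layerParameterCounts is null) throw new ArgumentNullException(nameof(layerParameterCounts));

        int expected = 0;
        foreach (var count in layerParameterCounts) expected += count;
        if (parameters.Length != expected)
        {
            throw new ArgumentException(
                $"Expected {expected} parameters but received {parameters.Length}.",
                nameof(parameters));
        }

        var perLayer = new double[layerParameterCounts.Length][];
        int offset = 0;
        for (int i = 0; i < layerParameterCounts.Length; i++)
        {
            perLayer[i] = new double[layerParameterCounts[i]];
            Array.Copy(parameters, offset, perLayer[i], 0, layerParameterCounts[i]);
            offset += layerParameterCounts[i];
        }
        return perLayer;
    }
}
```

The length check reflects the requirement that the vector be in the same format as the one returned by GetParameters: a vector of the wrong size cannot be mapped back onto the layers.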

SetTrainingMode(bool)

Sets the neural network to either training or inference mode.

public virtual void SetTrainingMode(bool isTraining)

Parameters

isTraining bool: True to enable training mode; false to enable inference mode.

Remarks

For Beginners: Neural networks behave differently during training versus when making predictions.

When in training mode (isTraining = true):

The network keeps track of intermediate calculations needed for learning
Certain layers like Dropout and BatchNormalization behave differently
The network uses more memory but can learn from its mistakes

When in inference/prediction mode (isTraining = false):

The network only performs forward calculations
It uses less memory and runs faster
It cannot learn or update its parameters

Think of it like the difference between taking a practice test (training mode) where you can check your answers and learn from mistakes, versus taking the actual exam (inference mode) where you just give your best answers based on what you've already learned.
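Dropout is a common example of a layer that behaves differently in the two modes. The sketch below is a hypothetical stand-in (DropoutSketch is not the AiDotNet Dropout layer) and assumes the widely used "inverted dropout" scheme, where kept values are scaled up during training so that inference can pass inputs through unchanged.

```csharp
using System;

// Hypothetical stand-in; not the AiDotNet Dropout layer.
public sealed class DropoutSketch
{
    private readonly double _dropRate;
    private readonly Random _random;

    public bool IsTraining { get; private set; }

    public DropoutSketch(double dropRate, int seed)
    {
        if (dropRate < 0.0 || dropRate >= 1.0)
            throw new ArgumentOutOfRangeException(nameof(dropRate));
        _dropRate = dropRate;
        _random = new Random(seed);
    }

    public void SetTrainingMode(bool isTraining)
    {
        IsTraining = isTraining;
    }

    public double[] Forward(double[] input)
    {
        var output = new double[input.Length];
        for (int i = 0; i < input.Length; i++)
        {
            if (!IsTraining)
            {
                // Inference: pass the value through untouched.
                output[i] = input[i];
                continue;
            }

            // Training: drop with probability _dropRate, rescale survivors.
            bool keep = _random.NextDouble() >= _dropRate;
            output[i] = keep ? input[i] / (1.0 - _dropRate) : 0.0;
        }
        return output;
    }
}
```

In inference mode the output always equals the input. In training mode each element is either zero or the input divided by (1 - dropRate), depending on the random draw.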

Train(Tensor<T>, Tensor<T>)

Trains the neural network on a single input-output pair.

public abstract void Train(Tensor<T> input, Tensor<T> expectedOutput)

Parameters

input Tensor<T>: The input data.
expectedOutput Tensor<T>: The expected output for the given input.

Remarks

This method performs one training step on the neural network using the provided input and expected output. It updates the network's parameters to reduce the error between the network's prediction and the expected output.

For Beginners: This is how your neural network learns. You provide:

An input (what the network should process)
The expected output (what the correct answer should be)

The network then:

Makes a prediction based on the input
Compares its prediction to the expected output
Calculates how wrong it was (the loss)
Adjusts its internal values to do better next time

After training, you can get the loss value using the GetLastLoss() method to see how well the network is learning.
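The four steps above can be sketched with the smallest possible model: a single weight trained by gradient descent on a squared-error loss. OneWeightModel is a hypothetical stand-in, and the loss function and update rule are assumptions for illustration. A concrete network chooses its own loss and optimizer.

```csharp
using System;

// Hypothetical one-parameter model; not an AiDotNet type.
public sealed class OneWeightModel
{
    private readonly double _learningRate;

    public double Weight { get; private set; }
    public double LastLoss { get; private set; }

    public OneWeightModel(double initialWeight, double learningRate)
    {
        Weight = initialWeight;
        _learningRate = learningRate;
    }

    public void Train(double input, double expectedOutput)
    {
        double prediction = Weight * input;          // 1. make a prediction
        double error = prediction - expectedOutput;  // 2. compare with the target
        LastLoss = error * error;                    // 3. squared-error loss
        double gradient = 2.0 * error * input;       //    d(loss) / d(weight)
        Weight -= _learningRate * gradient;          // 4. adjust the weight
    }
}
```

Starting from a weight of 0 with a learning rate of 0.1, training twice on input 1 and expected output 2 records a loss of 4 on the first call and about 2.56 on the second, so the loss shrinks as the weight moves toward 2.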

TrainBatchGpuDeferred(IGpuTensor<T>, IGpuTensor<T>, IGpuOptimizerConfig, GpuExecutionOptions?)

Performs a complete training step (forward + backward + update) on GPU with deferred execution.

public virtual T TrainBatchGpuDeferred(IGpuTensor<T> input, IGpuTensor<T> target, IGpuOptimizerConfig config, GpuExecutionOptions? options = null)

Parameters

input IGpuTensor<T>: The GPU-resident input batch.
target IGpuTensor<T>: The GPU-resident target batch.
config IGpuOptimizerConfig: The GPU optimizer configuration.
options GpuExecutionOptions: Optional GPU execution options for deferred execution.

Returns

T: The loss value for this batch.

Remarks

This method wraps forward, backward, and update in a deferred execution scope, allowing the GPU to optimize the entire training step as a single execution graph. This provides significant performance improvements through:

Kernel fusion
Memory optimization
Stream parallelization
Reduced synchronization overhead

For Beginners: This is the fastest way to train on GPU. Instead of executing each operation immediately, it records all operations and executes them as one optimized graph. Think of it like batch processing - more efficient than doing things one at a time.
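The record-then-execute idea can be sketched on the CPU. DeferredScopeSketch is a hypothetical illustration, not the AiDotNet deferred execution API: it only queues operations and replays them in order, whereas a real GPU backend would also analyze the recorded graph before running it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of deferred execution; not an AiDotNet type.
public sealed class DeferredScopeSketch
{
    private readonly List<Action> _recorded = new List<Action>();

    // Number of operations recorded but not yet executed.
    public int PendingCount => _recorded.Count;

    // Nothing runs here; the operation is only remembered.
    public void Record(Action operation)
    {
        if (operation is null) throw new ArgumentNullException(nameof(operation));
        _recorded.Add(operation);
    }

    // Runs everything in recording order, then clears the queue.
    public void Execute()
    {
        foreach (var operation in _recorded) operation();
        _recorded.Clear();
    }
}
```

Recording forward, backward, and update as three operations leaves them pending until Execute() is called, which is the point at which a real backend gets the chance to treat them as one unit.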

TrainBatchGpuDeferredAsync(IGpuTensor<T>, IGpuTensor<T>, IGpuOptimizerConfig, GpuExecutionOptions?, CancellationToken)

Performs a complete training step (forward + backward + update) on GPU with deferred execution asynchronously.

public virtual Task<T> TrainBatchGpuDeferredAsync(IGpuTensor<T> input, IGpuTensor<T> target, IGpuOptimizerConfig config, GpuExecutionOptions? options = null, CancellationToken cancellationToken = default)

Parameters

input IGpuTensor<T>: The GPU-resident input batch.
target IGpuTensor<T>: The GPU-resident target batch.
config IGpuOptimizerConfig: The GPU optimizer configuration.
options GpuExecutionOptions: Optional GPU execution options for deferred execution.
cancellationToken CancellationToken: A token that can be used to cancel the operation.

Returns

Task<T>: A task whose result is the loss value for this batch.

TryForwardGpuOptimized(Tensor<T>, out Tensor<T>)

Attempts to perform a GPU-resident forward pass with automatic fallback to CPU. Use this in derived class Forward() methods to get GPU optimization with minimal code.

protected bool TryForwardGpuOptimized(Tensor<T> input, out Tensor<T> result)

Parameters

input Tensor<T>: The input tensor to process.
result Tensor<T>: The output tensor if GPU path succeeded.

Returns

bool: True if GPU path was used successfully; false if CPU path should be used.

Remarks

For Derived Classes: Call this at the start of your Forward() method:

public Tensor<T> Forward(Tensor<T> input)
{
    if (TryForwardGpuOptimized(input, out var result))
        return result;

    // CPU fallback path
    ...
}

UpdateParameters(Vector<T>)

Updates the network's parameters with new values.

public abstract void UpdateParameters(Vector<T> parameters)

Parameters

parameters Vector<T>: The new parameter values to set.

Remarks

For Beginners: During training, a neural network's internal values (parameters) get adjusted to improve its performance. This method allows you to update all those values at once by providing a complete set of new parameters.

This is typically used by optimization algorithms that calculate better parameter values based on training data.

UpdateParametersGpu(IGpuOptimizerConfig)

Updates all trainable parameters in the network using the specified optimizer configuration.

public virtual void UpdateParametersGpu(IGpuOptimizerConfig config)

Parameters

config IGpuOptimizerConfig: The GPU optimizer configuration specifying the update algorithm and hyperparameters.

Remarks

This method updates weights and biases directly on GPU using gradients computed by BackpropagateGpu. It supports all GPU optimizer types: SGD, Adam, AdamW, RMSprop, Adagrad, NAG, LARS, LAMB.

For Beginners: After computing gradients with BackpropagateGpu(), call this to actually update the weights. The config determines which optimizer algorithm to use:

SGD: Simple gradient descent with optional momentum
Adam: Adaptive learning rates (most popular)
AdamW: Adam with decoupled weight decay (a common choice for transformers)

Exceptions

InvalidOperationException: Thrown when the network doesn't support GPU training.

UpdateParametersGpu(T, T?, T?)

Updates all trainable parameters in the network using GPU-computed gradients.

[Obsolete("Use UpdateParametersGpu(IGpuOptimizerConfig) instead for full optimizer support.")]
public virtual void UpdateParametersGpu(T learningRate, T? momentum = default, T? weightDecay = default)

Parameters

learningRate T: The learning rate for parameter updates.
momentum T: Optional momentum factor (default 0).
weightDecay T: Optional weight decay / L2 regularization factor (default 0).

Remarks

This method updates weights and biases directly on GPU using gradients computed by BackpropagateGpu. The update uses: w = w - lr * (grad + weightDecay * w) + momentum * velocity

For Beginners: After computing gradients with BackpropagateGpu(), call this to actually update the weights. Everything happens on GPU for maximum speed.
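The update rule above can be worked through on plain arrays. MomentumUpdateSketch is a hypothetical CPU illustration, not the GPU implementation. The source does not say how velocity is maintained, so the sketch assumes velocity holds the change applied to each weight in the previous step.

```csharp
using System;

// Hypothetical CPU illustration of the documented formula; not AiDotNet code.
public static class MomentumUpdateSketch
{
    // Applies w = w - lr * (grad + weightDecay * w) + momentum * velocity
    // to each element, then stores the applied change as the next velocity.
    // Assumption: velocity is the previous step's change to the weight.
    public static void Step(
        double[] weights, double[] gradients, double[] velocity,
        double learningRate, double momentum = 0.0, double weightDecay = 0.0)
    {
        if (weights.Length != gradients.Length || weights.Length != velocity.Length)
            throw new ArgumentException("All arrays must have the same length.");

        for (int i = 0; i < weights.Length; i++)
        {
            double delta =
                -learningRate * (gradients[i] + weightDecay * weights[i])
                + momentum * velocity[i];

            weights[i] += delta;
            velocity[i] = delta;
        }
    }
}
```

For a single weight of 1.0 with gradient 0.5, zero initial velocity, learning rate 0.1, momentum 0.9, and weight decay 0.01, the first step changes the weight by -0.1 * (0.5 + 0.01) = -0.051, giving approximately 0.949.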

Exceptions

InvalidOperationException: Thrown when the network doesn't support GPU training.

UpdateParametersGpuDeferred(IGpuOptimizerConfig, GpuExecutionOptions?)

Updates all trainable parameters with deferred GPU execution.

public virtual void UpdateParametersGpuDeferred(IGpuOptimizerConfig config, GpuExecutionOptions? options = null)

Parameters

config IGpuOptimizerConfig: The GPU optimizer configuration.
options GpuExecutionOptions: Optional GPU execution options.

Remarks

Uses deferred execution to batch all parameter update operations into a single GPU command buffer. This reduces CPU-GPU synchronization overhead and improves training performance.

UploadWeightsToGpu()

Uploads all layer weights to GPU for GPU-resident training.

public virtual void UploadWeightsToGpu()

Remarks

Call this once before starting GPU training to:

Create GPU buffers for all weights and biases
Copy current CPU values to GPU
Create GPU buffers for gradients and optimizer states

For Beginners: This prepares the network for GPU training by copying all learned values to the GPU. After this, training can happen entirely on GPU.

ValidateCustomLayers(List<ILayer<T>>)

Validates that the provided layers form a valid neural network architecture.

protected virtual void ValidateCustomLayers(List<ILayer<T>> layers)

Parameters

layers List<ILayer<T>>: The layers to validate.

Remarks

For Beginners: Not all combinations of layers make a valid neural network. This method checks that the layers can properly connect to each other (like making sure puzzle pieces fit together).

Exceptions

ArgumentException: Thrown when the layer configuration is invalid.

ValidateCustomLayersInternal(List<ILayer<T>>)

protected void ValidateCustomLayersInternal(List<ILayer<T>> layers)

Parameters

layers List<ILayer<T>>

ValidateFairnessAsync(Tensor<T>, int)

Validates fairness metrics for the given inputs.

public virtual Task<FairnessMetrics<T>> ValidateFairnessAsync(Tensor<T> inputs, int sensitiveFeatureIndex)

Parameters

inputs Tensor<T>
sensitiveFeatureIndex int

Returns

Task<FairnessMetrics<T>>

WithParameters(Vector<T>)

Creates a new neural network with the specified parameters.

public virtual IFullModel<T, Tensor<T>, Tensor<T>> WithParameters(Vector<T> parameters)

Parameters

parameters Vector<T>: The parameters to use for the new network.

Returns

IFullModel<T, Tensor<T>, Tensor<T>>: A new neural network with the specified parameters.

Remarks

This method creates a new neural network that is a copy of this one, but with different parameter values. It's useful for creating variations of a model without retraining or for ensemble methods.

For Beginners: Think of this as creating a copy of your neural network but with different internal settings. It's like having the same blueprint for a house but using different materials.

This is useful when you want to:

Try different variations of a trained model
Create an ensemble of similar models with different parameters
Manually adjust model parameters without retraining

The new model will have the same structure but different parameter values.
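The copy-with-new-parameters pattern can be sketched as follows. LinearModelSketch is a hypothetical stand-in with plain double arrays in place of Vector<T>. It illustrates only the contract that the original model stays unchanged, not how AiDotNet clones a network.

```csharp
using System;

// Hypothetical stand-in; not an AiDotNet model.
public sealed class LinearModelSketch
{
    private readonly double[] _parameters;

    public LinearModelSketch(double[] parameters)
    {
        if (parameters is null) throw new ArgumentNullException(nameof(parameters));
        _parameters = (double[])parameters.Clone(); // defensive copy
    }

    public double[] GetParameters()
    {
        return (double[])_parameters.Clone();
    }

    // Dot product of the parameters with the input.
    public double Predict(double[] input)
    {
        if (input.Length != _parameters.Length)
            throw new ArgumentException("Input length must match the parameter count.");
        double sum = 0.0;
        for (int i = 0; i < input.Length; i++) sum += _parameters[i] * input[i];
        return sum;
    }

    // Same structure, different values; this instance is not modified.
    public LinearModelSketch WithParameters(double[] parameters)
    {
        if (parameters is null) throw new ArgumentNullException(nameof(parameters));
        if (parameters.Length != _parameters.Length)
            throw new ArgumentException("Parameter count must match the original model.");
        return new LinearModelSketch(parameters);
    }
}
```

Because the original is left untouched, several variants can be created from one trained model and compared or combined side by side.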

ZeroGradientsGpu()

Zeros all GPU gradient accumulators in preparation for a new batch.

public virtual void ZeroGradientsGpu()

Remarks

Call this at the start of each training batch to clear gradients from the previous batch.

For Beginners: Before processing a new batch, you need to clear the old gradients. Otherwise they accumulate and training goes wrong.
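Why stale gradients matter can be shown with a small CPU sketch. GradientAccumulatorSketch is hypothetical and uses a plain array in place of GPU buffers. It assumes gradients are added into the accumulator on each backward pass, as the accumulation warning above suggests.

```csharp
using System;

// Hypothetical CPU stand-in for GPU gradient buffers; not AiDotNet code.
public sealed class GradientAccumulatorSketch
{
    private readonly double[] _gradients;

    public GradientAccumulatorSketch(int size)
    {
        _gradients = new double[size];
    }

    // A backward pass adds its gradients to whatever is already stored.
    public void Accumulate(double[] batchGradients)
    {
        if (batchGradients.Length != _gradients.Length)
            throw new ArgumentException("Gradient count must match the buffer size.");
        for (int i = 0; i < _gradients.Length; i++) _gradients[i] += batchGradients[i];
    }

    // Analogous in spirit to ZeroGradientsGpu(): start the next batch from zero.
    public void ZeroGradients()
    {
        Array.Clear(_gradients, 0, _gradients.Length);
    }

    public double[] Snapshot()
    {
        return (double[])_gradients.Clone();
    }
}
```

If two batches each contribute a gradient of 1.0 and the buffer is not cleared in between, the stored value becomes 2.0, so the second update would be twice as large as intended. Clearing first keeps it at 1.0.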
