Class NeuralNetworkModel<T>
Represents a neural network model that implements the IFullModel interface.
public class NeuralNetworkModel<T> : IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>
Type Parameters
- T
The numeric type used for calculations, typically float or double.
Remarks
This class wraps a neural network implementation to provide a consistent interface with other model types. It handles training, prediction, serialization, and other operations required by the IFullModel interface, delegating to the underlying neural network. This allows neural networks to be used interchangeably with other model types in optimization and model selection processes.
For Beginners: This is a wrapper that makes neural networks work with the same interface as simpler models.
Neural networks are powerful machine learning models that can:
- Learn complex patterns in data that simpler models might miss
- Process different types of data like images, text, or tabular data
- Automatically extract useful features from raw data
This class allows you to use neural networks anywhere you would use simpler models, making it easy to compare them or use them in the same optimization processes.
Constructors
NeuralNetworkModel(NeuralNetworkArchitecture<T>)
Initializes a new instance of the NeuralNetworkModel class with the specified architecture.
public NeuralNetworkModel(NeuralNetworkArchitecture<T> architecture)
Parameters
- architecture NeuralNetworkArchitecture<T>
The architecture defining the structure of the neural network.
Remarks
This constructor creates a new NeuralNetworkModel instance with the specified architecture. It initializes the underlying neural network based on the architecture provided. The architecture determines the network's structure, including the number and type of layers, the input and output dimensions, and the type of task the network is designed to perform.
For Beginners: This constructor creates a new neural network model with the specified design.
When creating a NeuralNetworkModel:
- You provide an architecture that defines the network's structure
- The constructor creates the actual neural network based on this design
- The model is ready to be trained or to make predictions
The architecture is crucial as it determines what kind of data the network can process and what kind of problems it can solve. Different architectures work better for different types of problems.
Properties
Architecture
Gets the architecture of the neural network.
public NeuralNetworkArchitecture<T> Architecture { get; }
Property Value
- NeuralNetworkArchitecture<T>
A NeuralNetworkArchitecture<T> instance defining the structure of the network.
Remarks
This property provides access to the architecture that defines the structure of the neural network, including its layers, input/output dimensions, and task-specific properties. The architecture serves as a blueprint for the network and contains information about the network's topology and configuration.
For Beginners: This property gives you access to the blueprint of the neural network.
The architecture:
- Defines how many layers the network has
- Specifies how many neurons are in each layer
- Determines what kind of data the network can process
- Configures how the network learns and makes predictions
Think of it like the plans for a building - it defines the structure but doesn't contain the actual building materials.
Complexity
Gets the complexity of the model.
public int Complexity { get; }
Property Value
- int
An integer representing the model's complexity.
Remarks
This property returns a measure of the model's complexity, which is calculated as the total number of trainable parameters (weights and biases) in the neural network. The complexity of a neural network is an important factor in understanding its capacity to learn, its potential for overfitting, and its computational requirements.
For Beginners: This tells you how complex the neural network is.
The complexity:
- Is measured by the total number of adjustable parameters in the network
- Higher complexity means the network can learn more complex patterns
- But higher complexity also means more training data is needed
- And higher complexity increases the risk of overfitting
A simple network might have hundreds of parameters, while deep networks can have millions or billions.
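To make the parameter count concrete, here is a minimal sketch that counts weights and biases for a stack of fully connected layers. It assumes plain dense layers only, and the class and method names are hypothetical; the library's own calculation may differ for other layer types. It uses long to avoid overflow in the illustration, whereas the property above returns int.

```csharp
using System;

// Hypothetical helper for illustration; not part of the library.
public static class ParameterCountSketch
{
    // layerSizes[0] is the input size; each later entry is a layer's neuron count.
    public static long CountDenseParameters(int[] layerSizes)
    {
        if (layerSizes == null || layerSizes.Length < 2)
            throw new ArgumentException("Need an input size and at least one layer.", nameof(layerSizes));

        long total = 0;
        for (int i = 1; i < layerSizes.Length; i++)
        {
            long weights = (long)layerSizes[i - 1] * layerSizes[i]; // one weight per connection
            long biases = layerSizes[i];                            // one bias per neuron
            total += weights + biases;
        }
        return total;
    }
}
```

For a network with 4 inputs, a hidden layer of 8 neurons, and 3 outputs, this gives (4 * 8 + 8) + (8 * 3 + 3) = 67 parameters.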
DefaultLossFunction
Gets the default loss function used by this model for gradient computation.
public ILossFunction<T> DefaultLossFunction { get; }
Property Value
- ILossFunction<T>
Remarks
This loss function is used when calling ComputeGradients(TInput, TOutput, ILossFunction<T>?) without explicitly providing a loss function. It represents the model's primary training objective.
For Beginners: The loss function tells the model "what counts as a mistake". For example:
- For regression (predicting numbers): Mean Squared Error measures how far predictions are from actual values
- For classification (predicting categories): Cross Entropy measures how confident the model is in the right category
This property provides a sensible default so you don't have to specify the loss function every time, but you can still override it if needed for special cases.
Distributed Training: In distributed training, all workers use the same loss function to ensure consistent gradient computation. The default loss function is automatically used when workers compute local gradients.
Exceptions
- InvalidOperationException
Thrown if accessed before the model has been configured with a loss function.
FeatureCount
Gets the number of features used by the model.
public int FeatureCount { get; }
Property Value
- int
An integer representing the number of input features.
Remarks
This property returns the number of features that the model uses, which is determined by the input size of the neural network. For one-dimensional inputs, this is simply the input size. For multi-dimensional inputs, this is the total number of input elements (calculated as InputHeight * InputWidth * InputDepth).
For Beginners: This tells you how many input variables the neural network uses.
The feature count:
- For simple data, it's the number of input values (like age, height, weight)
- For image data, it's the total number of pixels times the number of color channels
- For text data, it might be the vocabulary size or embedding dimension
This helps you understand how much input information the network is considering, and it's important for ensuring your input data has the right dimensions.
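The multi-dimensional case described above can be sketched as a simple product. The helper name is hypothetical; the parameter names mirror the InputHeight, InputWidth, and InputDepth values mentioned in the remarks.

```csharp
using System;

// Hypothetical helper for illustration; not part of the library.
public static class FeatureCountSketch
{
    // Total number of input elements for a height x width x depth input.
    public static int FlattenedFeatureCount(int inputHeight, int inputWidth, int inputDepth)
    {
        if (inputHeight <= 0 || inputWidth <= 0 || inputDepth <= 0)
            throw new ArgumentOutOfRangeException(nameof(inputHeight), "All dimensions must be positive.");

        // checked: fail loudly instead of silently overflowing for very large inputs
        return checked(inputHeight * inputWidth * inputDepth);
    }
}
```

A 28 x 28 grayscale image has 28 * 28 * 1 = 784 features, and a 32 x 32 RGB image has 32 * 32 * 3 = 3072.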
Network
Gets the underlying neural network.
public NeuralNetworkBase<T> Network { get; }
Property Value
- NeuralNetworkBase<T>
A NeuralNetworkBase<T> instance containing the actual neural network.
Remarks
This property provides access to the underlying neural network implementation. The network is responsible for the actual computations, while this class serves as an adapter to the IFullModel interface. This property can be used to access network-specific features not exposed through the IFullModel interface.
For Beginners: This property gives you direct access to the actual neural network.
The network:
- Contains all the layers and connections of the neural network
- Handles the actual calculations and learning
- Stores all the learned weights and parameters
You can use this property to access neural network-specific features that aren't available through the standard model interface.
ParameterCount
Gets the number of parameters in the model.
public virtual int ParameterCount { get; }
Property Value
- int
Remarks
This property returns the total count of trainable parameters in the model. It's useful for understanding model complexity and memory requirements.
SupportsJitCompilation
Gets whether this model currently supports JIT compilation.
public bool SupportsJitCompilation { get; }
Property Value
- bool
True if the model can be JIT compiled, false otherwise.
Remarks
Some models may not support JIT compilation due to:
- Dynamic graph structure (changes based on input)
- Lack of computation graph representation
- Use of operations not yet supported by the JIT compiler
For Beginners: This tells you whether this specific model can benefit from JIT compilation.
Models return false if they:
- Use layer-based architecture without graph export (e.g., current neural networks)
- Have control flow that changes based on input data
- Use operations the JIT compiler doesn't understand yet
In these cases, the model will still work normally, just without JIT acceleration.
Methods
ApplyGradients(Vector<T>, T)
Applies pre-computed gradients to update the model parameters.
public void ApplyGradients(Vector<T> gradients, T learningRate)
Parameters
- gradients Vector<T>
The gradient vector to apply.
- learningRate T
The learning rate for the update.
Remarks
Updates parameters using: θ = θ - learningRate * gradients
For Beginners: After computing gradients (seeing which direction to move), this method actually moves the model in that direction. The learning rate controls how big of a step to take.
Distributed Training: In DDP/ZeRO-2, this applies the synchronized (averaged) gradients after communication across workers. Each worker applies the same averaged gradients to keep parameters consistent.
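The update rule θ = θ - learningRate * gradients can be sketched on plain arrays. This is an illustration of the formula only, using double[] in place of Vector<T>; the class name is hypothetical and the library's internal implementation is not shown here.

```csharp
using System;

// Hypothetical helper for illustration; not part of the library.
public static class GradientStepSketch
{
    // In-place gradient descent step: parameters[i] -= learningRate * gradients[i].
    public static void ApplyGradients(double[] parameters, double[] gradients, double learningRate)
    {
        if (parameters.Length != gradients.Length)
            throw new ArgumentException("Gradient length must match parameter length.", nameof(gradients));

        for (int i = 0; i < parameters.Length; i++)
        {
            parameters[i] -= learningRate * gradients[i];
        }
    }
}
```

With parameters {1.0, -2.0}, gradients {0.5, -1.0}, and a learning rate of 0.1, the parameters become {0.95, -1.9}.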
Clone()
Creates a copy of this model (equivalent to DeepCopy in this implementation).
public IFullModel<T, Tensor<T>, Tensor<T>> Clone()
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
A new instance with the same architecture and parameters.
Remarks
This method creates a copy of the model that shares the same architecture but has its own set of parameters. It is equivalent to DeepCopy for this implementation but is provided for compatibility with the IFullModel interface.
For Beginners: This method creates a copy of the neural network model.
In this implementation, Clone and DeepCopy do the same thing - they both create a completely independent copy of the model with the same architecture and parameters. Both methods are provided for compatibility with the IFullModel interface.
ComputeGradients(Tensor<T>, Tensor<T>, ILossFunction<T>?)
Computes gradients of the loss function with respect to model parameters for the given data, WITHOUT updating the model parameters.
public Vector<T> ComputeGradients(Tensor<T> input, Tensor<T> target, ILossFunction<T>? lossFunction = null)
Parameters
- input Tensor<T>
The input data.
- target Tensor<T>
The target/expected output.
- lossFunction ILossFunction<T>
The loss function to use for gradient computation. If null, uses the model's default loss function.
Returns
- Vector<T>
A vector containing gradients with respect to all model parameters.
Remarks
This method performs a forward pass, computes the loss, and back-propagates to compute gradients, but does NOT update the model's parameters. The parameters remain unchanged after this call.
Distributed Training: In DDP/ZeRO-2, each worker calls this to compute local gradients on its data batch. These gradients are then synchronized (averaged) across workers before applying updates. Because every worker then applies the same averaged gradients, all workers make identical parameter updates despite having seen different data.
For Meta-Learning: After adapting a model on a support set, you can use this method to compute gradients on the query set. These gradients become the meta-gradients for updating the meta-parameters.
For Beginners: Think of this as "dry run" training:
- The model sees what direction it should move (the gradients)
- But it doesn't actually move (parameters stay the same)
- You get to decide what to do with this information (average with others, inspect, modify, etc.)
Exceptions
- InvalidOperationException
If lossFunction is null and the model has no default loss function.
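The synchronization step mentioned in the distributed training remark, averaging each worker's local gradients element by element, can be sketched as follows. It uses plain double[] arrays in place of Vector<T>, and the class name is hypothetical; real distributed training performs this averaging through a communication backend rather than in one process.

```csharp
using System;

// Hypothetical helper for illustration; not part of the library.
public static class GradientAveragingSketch
{
    // Returns the element-wise mean of the gradient vectors computed by each worker.
    public static double[] Average(double[][] workerGradients)
    {
        if (workerGradients == null || workerGradients.Length == 0)
            throw new ArgumentException("At least one gradient vector is required.", nameof(workerGradients));

        int length = workerGradients[0].Length;
        var average = new double[length];

        foreach (double[] gradients in workerGradients)
        {
            if (gradients.Length != length)
                throw new ArgumentException("All gradient vectors must have the same length.", nameof(workerGradients));

            for (int i = 0; i < length; i++)
            {
                average[i] += gradients[i];
            }
        }

        for (int i = 0; i < length; i++)
        {
            average[i] /= workerGradients.Length;
        }
        return average;
    }
}
```

Two workers with gradients {1, 2} and {3, 6} would both apply the averaged vector {2, 4}.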
DeepCopy()
Creates a deep copy of this model.
public IFullModel<T, Tensor<T>, Tensor<T>> DeepCopy()
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
A new instance with the same architecture and parameters.
Remarks
This method creates a deep copy of the neural network model, including both its architecture and learned parameters. The new model is independent of the original, so changes to one will not affect the other. This is useful for creating variations of a model while preserving the original.
For Beginners: This method creates an exact duplicate of the neural network, with the same structure and the same learned weights. This is useful when you need to make changes to a model without affecting the original.
The deep copy:
- Has identical architecture (same layers, neurons, connections)
- Has identical parameters (same weights and biases)
- Is completely independent of the original
This is useful for:
- Creating model variants for experimentation
- Saving a checkpoint before making changes
- Creating ensemble models
- Implementing techniques like dropout ensemble
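What "completely independent" means can be illustrated with a toy class whose only state is a weight array. TinyModelSketch is hypothetical and stands in for a real model; it contrasts a copy that shares the array with one that duplicates it.

```csharp
using System;

// Hypothetical toy class for illustration; not part of the library.
public sealed class TinyModelSketch
{
    public double[] Weights { get; }

    public TinyModelSketch(double[] weights) => Weights = weights;

    // Shares the same array: a change made through either object is visible in both.
    public TinyModelSketch ShallowCopy() => new TinyModelSketch(Weights);

    // Duplicates the array: the two objects no longer affect each other.
    public TinyModelSketch DeepCopy() => new TinyModelSketch((double[])Weights.Clone());
}
```

Changing a weight on the deep copy leaves the original untouched, while changing a weight on the shallow copy also changes the original.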
Deserialize(byte[])
Deserializes the model from a byte array.
public void Deserialize(byte[] data)
Parameters
- data byte[]
The byte array containing the serialized model.
Remarks
This method deserializes the model from a byte array by reading the architecture details and the network parameters. It expects the same format as produced by the Serialize method: the architecture information followed by the network parameters. This allows a model that was previously serialized to be reconstructed.
For Beginners: This method reconstructs a neural network model from a byte array created by Serialize.
When deserializing the model:
- The architecture is read first to recreate the structure
- Then the parameters (weights) are loaded into that structure
- The resulting model is identical to the one that was serialized
This is used when:
- Loading a previously saved model
- Receiving a model from another system
- Resuming training from a checkpoint
After deserialization, the model can be used for predictions or further training just as if it had never been serialized.
ExportComputationGraph(List<ComputationNode<T>>)
Exports the model's computation graph for JIT compilation.
public ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
- inputNodes List<ComputationNode<T>>
List to populate with input computation nodes (parameters).
Returns
- ComputationNode<T>
The output computation node representing the model's prediction.
Remarks
This method should construct a computation graph representing the model's forward pass. The graph should use placeholder input nodes that will be filled with actual data during execution.
For Beginners: This method creates a "recipe" of your model's calculations that the JIT compiler can optimize.
The method should:
- Create placeholder nodes for inputs (features, parameters)
- Build the computation graph using TensorOperations
- Return the final output node
- Add all input nodes to the inputNodes list (in order)
Example for a simple linear model (y = Wx + b):
public ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
{
    // Create placeholder inputs
    var x = TensorOperations<T>.Variable(new Tensor<T>(InputShape), "x");
    var W = TensorOperations<T>.Variable(Weights, "W");
    var b = TensorOperations<T>.Variable(Bias, "b");

    // Add inputs in order
    inputNodes.Add(x);
    inputNodes.Add(W);
    inputNodes.Add(b);

    // Build graph: y = Wx + b
    var matmul = TensorOperations<T>.MatMul(x, W);
    var output = TensorOperations<T>.Add(matmul, b);
    return output;
}
The JIT compiler will then:
- Optimize the graph (fuse operations, eliminate dead code)
- Compile it to fast native code
- Cache the compiled version for reuse
GetActiveFeatureIndices()
Gets the indices of all features used by this model.
public IEnumerable<int> GetActiveFeatureIndices()
Returns
- IEnumerable<int>
A collection of feature indices.
Remarks
This method returns the indices of all features that are used by the model. For neural networks, this typically includes all features from 0 to FeatureCount-1, as neural networks generally use all input features to some extent.
For Beginners: This method returns a list of which input features the model actually uses. For neural networks, this typically includes all available features unless specific feature selection has been applied.
Unlike some simpler models (like linear regression with feature selection) where certain inputs might be completely ignored, neural networks typically process all input features and learn which ones are important during training.
This method returns all feature indices from 0 to (FeatureCount-1).
GetFeatureImportance()
Gets the feature importance scores as a dictionary.
public Dictionary<string, T> GetFeatureImportance()
Returns
- Dictionary<string, T>
A dictionary mapping feature names to their importance scores.
Exceptions
- NotSupportedException
This method is not supported for neural networks. Feature importance in neural networks requires specialized techniques like gradient-based attribution or permutation importance.
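Permutation importance, one of the techniques named above, can be computed outside the model. The following is a minimal sketch under stated assumptions: the predictor is any function from a feature row to a single number, the error measure is mean squared error, and all names are hypothetical. It is not a library API.

```csharp
using System;

// Hypothetical helper for illustration; not part of the library.
public static class PermutationImportanceSketch
{
    // For each feature, shuffles that column and reports how much the error increases.
    public static double[] Compute(Func<double[], double> predict, double[][] rows, double[] targets, Random rng)
    {
        int featureCount = rows[0].Length;
        double baseline = MeanSquaredError(predict, rows, targets);
        var importance = new double[featureCount];

        for (int f = 0; f < featureCount; f++)
        {
            // Work on a copy so the caller's data is left untouched.
            var shuffled = new double[rows.Length][];
            for (int r = 0; r < rows.Length; r++)
            {
                shuffled[r] = (double[])rows[r].Clone();
            }

            // Fisher-Yates shuffle of column f only.
            for (int r = rows.Length - 1; r > 0; r--)
            {
                int swap = rng.Next(r + 1);
                (shuffled[r][f], shuffled[swap][f]) = (shuffled[swap][f], shuffled[r][f]);
            }

            importance[f] = MeanSquaredError(predict, shuffled, targets) - baseline;
        }
        return importance;
    }

    private static double MeanSquaredError(Func<double[], double> predict, double[][] rows, double[] targets)
    {
        double sum = 0.0;
        for (int r = 0; r < rows.Length; r++)
        {
            double error = predict(rows[r]) - targets[r];
            sum += error * error;
        }
        return sum / rows.Length;
    }
}
```

A feature the predictor ignores gets an importance of zero, because shuffling it cannot change any prediction. In practice the shuffle is usually repeated several times and the results averaged.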
GetModelMetadata()
Gets metadata about the model.
public ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
A ModelMetadata object containing information about the model.
Remarks
This method returns metadata about the model, including its type, feature count, complexity, and additional information about the neural network. The metadata includes the model type (Neural Network), the number of features, the complexity (total parameter count), a description, and additional information such as the architecture details, layer counts, and activation functions used. This metadata is useful for model selection, analysis, and visualization.
For Beginners: This method returns detailed information about the neural network model.
The metadata includes:
- Basic properties like model type, feature count, and complexity
- Architecture details like layer counts and types
- Statistics about the model's parameters
This information is useful for:
- Understanding the model's structure
- Comparing different models
- Analyzing the model's capabilities
- Documenting the model for future reference
GetParameters()
Gets all trainable parameters of the neural network as a single vector.
public Vector<T> GetParameters()
Returns
- Vector<T>
A vector containing all trainable parameters.
Remarks
This method returns all trainable parameters of the neural network as a single vector. These parameters include weights and biases from all layers that support training. The vector can be used to save the model's state, apply optimization techniques, or transfer learning between models.
For Beginners: This method collects all the learned weights and biases from the neural network into a single list. This is useful for saving the model, optimizing it, or transferring its knowledge.
The parameters:
- Are the numbers that the neural network has learned during training
- Include weights (how strongly neurons connect to each other)
- Include biases (baseline activation levels for neurons)
A simple network might have hundreds of parameters, while modern deep networks often have millions or billions of parameters.
IsFeatureUsed(int)
Determines whether a specific feature is used by the model.
public bool IsFeatureUsed(int featureIndex)
Parameters
- featureIndex int
The index of the feature to check.
Returns
- bool
Always returns true for neural networks, as they typically use all input features.
Remarks
This method determines whether a specific feature is used by the model. For neural networks, all features are typically used in some capacity, so this method always returns true. Unlike some linear models where features can have zero coefficients and therefore no impact, neural networks generally incorporate all input features, though they may learn to assign different importance to different features during training.
For Beginners: This method checks if a particular input variable affects the model's predictions.
For neural networks:
- This method always returns true
- Neural networks typically use all input features in some way
- The network learns which features are important during training
- Even if a feature isn't useful, the network will learn to assign it less weight
This differs from simpler models like linear regression, where features can be explicitly excluded with zero coefficients.
LoadModel(string)
Loads the model from disk.
public virtual void LoadModel(string filePath)
Parameters
- filePath string
The file path to read the model from.
Remarks
This method reads the model bytes from the provided path and deserializes them into this instance.
For Beginners: This restores a model you saved earlier so you can run predictions right away.
LoadState(Stream)
Loads the model's state (parameters and configuration) from a stream.
public void LoadState(Stream stream)
Parameters
- stream Stream
The stream to read the model state from.
Remarks
This method deserializes model state that was previously saved with SaveState, restoring all parameters and configuration to recreate the saved model state.
For Beginners: This is like loading a saved game.
When you call LoadState:
- All the parameters are read from the stream
- The model is configured to match the saved architecture
- The model becomes identical to when SaveState was called
After loading, the model can make predictions using the restored parameters.
Stream Handling:
- The stream position will be advanced by the number of bytes read
- The stream is not closed (caller must dispose)
- Stream data must match the format written by SaveState
Versioning: Implementations should consider:
- Including format version number in serialized data
- Validating compatibility before deserialization
- Providing migration paths for old formats when possible
Usage:
// Load from file
using var stream = File.OpenRead("model.bin");
model.LoadState(stream);
Important: The stream must contain state data saved by SaveState from a compatible model (same architecture and numeric type).
Exceptions
- ArgumentNullException
Thrown when stream is null.
- ArgumentException
Thrown when stream is not readable or contains invalid data.
- InvalidOperationException
Thrown when deserialization fails or data is incompatible with model architecture.
Predict(Tensor<T>)
Uses the model to make a prediction for the given input.
public Tensor<T> Predict(Tensor<T> input)
Parameters
- input Tensor<T>
The input tensor to make a prediction for.
Returns
- Tensor<T>
The predicted output tensor.
Remarks
This method uses the trained neural network to make a prediction for the given input tensor. It sets the network to prediction mode (not training mode), performs a forward pass through the network, and returns the output as a tensor with the appropriate shape.
For Beginners: This method makes predictions using what the neural network has learned.
When making a prediction:
- The input data is sent through the network
- Each layer processes the data based on its learned weights
- The final layer produces the output (prediction)
Unlike training, no weights are updated during prediction - the network is simply using what it already knows to make its best guess.
SaveModel(string)
Saves the model to disk.
public virtual void SaveModel(string filePath)
Parameters
- filePath string
The file path to write the model to.
Remarks
This method serializes the model and writes the bytes to the provided path, creating the directory if needed.
For Beginners: This is how you save your trained model so you can load it later without training again.
SaveState(Stream)
Saves the model's current state (parameters and configuration) to a stream.
public void SaveState(Stream stream)
Parameters
- stream Stream
The stream to write the model state to.
Remarks
This method serializes all the information needed to recreate the model's current state, including trained parameters, layer configurations, and any internal state variables.
For Beginners: This is like creating a snapshot of your trained model.
When you call SaveState:
- All the learned parameters (weights and biases) are written to the stream
- The model's architecture information is saved
- Any other internal state (like normalization statistics) is preserved
You can later use LoadState to restore the model to this exact state.
Stream Handling:
- The stream position will be advanced by the number of bytes written
- The stream is flushed but not closed (caller must dispose)
- For file-based persistence, wrap in File.Create/FileStream
Usage:
// Save to file
using var stream = File.Create("model.bin");
model.SaveState(stream);
Exceptions
- ArgumentNullException
Thrown when stream is null.
- ArgumentException
Thrown when stream is not writable.
- InvalidOperationException
Thrown when model state cannot be serialized (e.g., uninitialized model).
Serialize()
Serializes the model to a byte array.
public byte[] Serialize()
Returns
- byte[]
A byte array containing the serialized model.
Remarks
This method serializes the model to a byte array by writing the architecture details and the network parameters. The serialization format includes the architecture information followed by the network parameters. This allows the model to be stored or transmitted and later reconstructed using the Deserialize method.
For Beginners: This method converts the neural network model to a byte array that can be saved or transmitted.
When serializing the model:
- Both the architecture (structure) and parameters (weights) are saved
- The data is formatted in a way that can be efficiently stored
- The resulting byte array contains everything needed to reconstruct the model
This is useful for:
- Saving trained models to disk
- Sharing models with others
- Deploying models to production systems
- Creating model checkpoints during long training processes
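The "architecture first, then parameters" ordering can be illustrated with a toy binary layout. This sketch is not the library's actual format: it assumes the architecture is just a list of layer sizes and the parameters are doubles, and all names are hypothetical. It uses only the standard BinaryWriter and BinaryReader classes.

```csharp
using System;
using System.IO;

// Hypothetical toy format for illustration; not the library's serialization format.
public static class SerializationLayoutSketch
{
    public static byte[] Serialize(int[] layerSizes, double[] parameters)
    {
        using var stream = new MemoryStream();
        using (var writer = new BinaryWriter(stream))
        {
            // Architecture section: layer count, then each layer size.
            writer.Write(layerSizes.Length);
            foreach (int size in layerSizes) writer.Write(size);

            // Parameter section: parameter count, then each value.
            writer.Write(parameters.Length);
            foreach (double value in parameters) writer.Write(value);
        }
        // ToArray remains valid after the writer has closed the stream.
        return stream.ToArray();
    }

    public static (int[] LayerSizes, double[] Parameters) Deserialize(byte[] data)
    {
        using var reader = new BinaryReader(new MemoryStream(data));

        // Read back in the same order the data was written.
        var layerSizes = new int[reader.ReadInt32()];
        for (int i = 0; i < layerSizes.Length; i++) layerSizes[i] = reader.ReadInt32();

        var parameters = new double[reader.ReadInt32()];
        for (int i = 0; i < parameters.Length; i++) parameters[i] = reader.ReadDouble();

        return (layerSizes, parameters);
    }
}
```

Reading the architecture before the parameters lets the reader allocate the right structure first and then fill it, which is why the order of the two sections matters.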
SetActiveFeatureIndices(IEnumerable<int>)
Sets the active feature indices for this model.
public void SetActiveFeatureIndices(IEnumerable<int> featureIndices)
Parameters
- featureIndices IEnumerable<int>
The indices of features to activate.
SetLearningRate(T)
Sets the learning rate for training the model.
public NeuralNetworkModel<T> SetLearningRate(T learningRate)
Parameters
- learningRate T
The learning rate to use during training.
Returns
- NeuralNetworkModel<T>
This model instance for method chaining.
Remarks
This method sets the learning rate used during training. The learning rate controls how quickly the model adapts to the training data. A higher learning rate means faster learning but may cause instability, while a lower learning rate means slower but more stable learning.
For Beginners: This lets you control how big each learning step is during training.
The learning rate:
- Controls how quickly the network adjusts its weights
- Smaller values (like 0.001) make training more stable but slower
- Larger values (like 0.1) make training faster but potentially unstable
Finding the right learning rate is often a process of trial and error. This method lets you set it to the value you want to try.
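The trade-off between speed and stability can be seen on a toy problem. This sketch minimizes f(x) = x^2 with plain gradient descent; it does not use the model, and the class name is hypothetical. The specific thresholds are a property of this toy function, not of neural networks in general.

```csharp
using System;

// Hypothetical helper for illustration; not part of the library.
public static class LearningRateSketch
{
    // Runs gradient descent on f(x) = x^2 (gradient 2x), starting from x = 1.
    public static double Descend(double learningRate, int steps)
    {
        double x = 1.0;
        for (int i = 0; i < steps; i++)
        {
            x -= learningRate * (2.0 * x);
        }
        return x;
    }
}
```

Each step multiplies x by (1 - 2 * learningRate). With a rate of 0.1 that factor is 0.8, so x shrinks toward the minimum at 0; with a rate of 1.1 the factor is -1.2, so x overshoots further on every step and diverges.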
SetParameters(Vector<T>)
Sets the parameters for this model.
public void SetParameters(Vector<T> parameters)
Parameters
- parameters Vector<T>
A vector containing the model parameters.
SetTrainingMode(bool)
Sets whether the model is in training mode or prediction mode.
public NeuralNetworkModel<T> SetTrainingMode(bool isTraining)
Parameters
- isTraining bool
True for training mode, false for prediction mode.
Returns
- NeuralNetworkModel<T>
This model instance for method chaining.
Remarks
This method sets whether the model is in training mode or prediction mode. Some components of neural networks behave differently during training versus prediction, such as dropout layers, which randomly disable neurons during training but not during prediction.
For Beginners: This switches the network between learning mode and prediction mode.
The two modes are:
- Training mode: The network is learning and updating its weights
- Prediction mode: The network is using what it learned to make predictions
Some special layers like Dropout and BatchNormalization work differently depending on which mode the network is in. This method lets you switch between them.
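Why the mode matters can be illustrated with a toy dropout function. This sketch uses "inverted dropout", a common formulation in which surviving values are scaled up during training; it is an assumption for illustration and not necessarily how the library's dropout layer is implemented. The class name is hypothetical.

```csharp
using System;

// Hypothetical helper for illustration; not part of the library.
public static class DropoutSketch
{
    public static double[] Forward(double[] input, double dropRate, bool isTraining, Random rng)
    {
        if (dropRate < 0.0 || dropRate >= 1.0)
            throw new ArgumentOutOfRangeException(nameof(dropRate), "Drop rate must be in [0, 1).");

        var output = new double[input.Length];

        if (!isTraining)
        {
            // Prediction mode: pass values through unchanged.
            Array.Copy(input, output, input.Length);
            return output;
        }

        // Training mode: zero each value with probability dropRate and
        // scale the survivors so the expected sum stays the same.
        double scale = 1.0 / (1.0 - dropRate);
        for (int i = 0; i < input.Length; i++)
        {
            output[i] = rng.NextDouble() < dropRate ? 0.0 : input[i] * scale;
        }
        return output;
    }
}
```

The same input produces randomly thinned output in training mode and an exact copy in prediction mode, which is why forgetting to switch modes before predicting gives noisy results.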
Train(Tensor<T>, Tensor<T>)
Trains the model with the provided input and expected output.
public void Train(Tensor<T> input, Tensor<T> expectedOutput)
Parameters
- input Tensor<T>
The input tensor to train with.
- expectedOutput Tensor<T>
The expected output tensor.
Remarks
This method trains the neural network with the provided input and expected output tensors. It sets the network to training mode, performs a forward pass through the network, calculates the error between the predicted output and the expected output, and backpropagates the error to update the network's weights.
For Beginners: This method teaches the neural network using an example.
During training:
- The input data is sent through the network (forward pass)
- The network makes a prediction
- The prediction is compared to the expected output
- The error is calculated
- The network adjusts its weights to reduce the error
This process is repeated with many examples to gradually improve the network's performance. Each example helps the network learn a little more about the patterns in your data.
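The steps above can be sketched for the smallest possible "network": a single weight with no bias, trained with mean squared error. This is a self-contained illustration with hypothetical names; the real method operates on tensors and multiple layers.

```csharp
using System;

// Hypothetical helper for illustration; not part of the library.
public static class TrainingLoopSketch
{
    // Fits prediction = weight * input by repeated gradient descent steps.
    public static double FitSingleWeight(double[] inputs, double[] targets, double learningRate, int epochs)
    {
        if (inputs.Length != targets.Length || inputs.Length == 0)
            throw new ArgumentException("Inputs and targets must be non-empty and the same length.");

        double weight = 0.0;
        for (int epoch = 0; epoch < epochs; epoch++)
        {
            double gradient = 0.0;
            for (int i = 0; i < inputs.Length; i++)
            {
                double prediction = weight * inputs[i];  // forward pass
                double error = prediction - targets[i];  // compare with expected output
                gradient += 2.0 * error * inputs[i];     // derivative of error^2 w.r.t. weight
            }
            gradient /= inputs.Length;                   // mean over all examples

            weight -= learningRate * gradient;           // adjust weight to reduce the error
        }
        return weight;
    }
}
```

Trained on inputs {1, 2, 3} with targets {2, 4, 6}, a learning rate of 0.05, and 200 epochs, the weight converges to approximately 2.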
WithParameters(Vector<T>)
Creates a new model with the same architecture and the provided parameter values.
public IFullModel<T, Tensor<T>, Tensor<T>> WithParameters(Vector<T> parameters)
Parameters
- parameters Vector<T>
The new parameter values to use.
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
A new model that uses the provided parameters; the original model is left unchanged.
Remarks
This method creates a new model with the same architecture as the current model but with the provided parameter values. This allows creating a modified version of the model without altering the original. The new parameters must match the number of parameters in the original model.
For Beginners: This method gives you a copy of the neural network in which all the weights and biases are replaced at once by a list of new values you provide. It's useful when optimizing the model or loading saved weights.
When updating parameters:
- A new model is created with the same structure as this one
- The new model's weights and biases are set to the values you provide
- The original model remains unchanged
This is useful for:
- Loading pre-trained weights
- Testing different parameter values
- Implementing evolutionary algorithms
- Creating ensemble models with different parameter sets