Class DifferentiableNeuralComputer<T>
- Namespace: AiDotNet.NeuralNetworks
- Assembly: AiDotNet.dll
Represents a Differentiable Neural Computer (DNC), a neural network architecture that combines neural networks with external memory resources.
public class DifferentiableNeuralComputer<T> : NeuralNetworkBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable, IAuxiliaryLossLayer<T>, IDiagnosticsProvider
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
- NeuralNetworkBase<T> → DifferentiableNeuralComputer<T>
Remarks
A Differentiable Neural Computer (DNC) is an advanced neural network architecture that augments neural networks with an external memory matrix and mechanisms to read from and write to this memory. DNCs can learn to use their memory to store and retrieve information, enabling them to solve complex, structured problems that require reasoning and algorithm-like behavior. The key components include a controller neural network, a memory matrix, and read/write heads that interact with the memory through differentiable attention mechanisms.
For Beginners: A Differentiable Neural Computer is like a neural network with a notepad.
Imagine a traditional neural network as a person who can make decisions based on what they see, but can only keep information in their head. A DNC is like giving that person a notepad to:
- Write down important information
- Organize notes in a systematic way
- Look back at previously written notes when making decisions
- Learn which information is worth writing down and when to refer back to it
This combination of neural processing with external memory allows the DNC to solve problems that require remembering and reasoning about complex relationships or sequences of information, like navigating a subway map or following a multi-step recipe.
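The attention mechanism that makes this memory access trainable can be sketched in a few lines. The following is an illustrative Python sketch of content-based addressing in general, not AiDotNet's implementation: a read key is compared to every memory row by cosine similarity, a softmax turns the scores into attention weights, and the read vector is the weighted sum of all rows.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (with a small norm floor)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1e-8
    nb = math.sqrt(sum(x * x for x in b)) or 1e-8
    return dot / (na * nb)

def content_read(memory, key, strength=10.0):
    """Differentiable read: softmax over similarities, then a weighted sum."""
    scores = [strength * cosine(row, key) for row in memory]
    m = max(scores)                                # numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]            # attention over memory rows
    word_size = len(memory[0])
    read = [sum(w * row[j] for w, row in zip(weights, memory))
            for j in range(word_size)]
    return weights, read

memory = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]      # 3 locations, word size 2
weights, read = content_read(memory, key=[1.0, 0.0])
```

Because every location contributes at least a little to the result, gradients flow to all memory rows, which is what allows the read/write behavior itself to be learned by backpropagation.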
Constructors
DifferentiableNeuralComputer(NeuralNetworkArchitecture<T>, int, int, int, int, ILossFunction<T>?, IActivationFunction<T>?)
Initializes a new instance of the DifferentiableNeuralComputer<T> class with the specified parameters.
public DifferentiableNeuralComputer(NeuralNetworkArchitecture<T> architecture, int memorySize, int memoryWordSize, int controllerSize, int readHeads, ILossFunction<T>? lossFunction = null, IActivationFunction<T>? activationFunction = null)
Parameters
architecture (NeuralNetworkArchitecture<T>): The neural network architecture configuration.
memorySize (int): The number of memory locations in the memory matrix.
memoryWordSize (int): The size of each memory word or location.
controllerSize (int): The size of the controller network's output.
readHeads (int): The number of read heads that can access the memory simultaneously.
lossFunction (ILossFunction<T>): The loss function to use. If null, a default is chosen based on the task type.
activationFunction (IActivationFunction<T>): The scalar activation function to use. If null, defaults based on task type.
Remarks
This constructor initializes a new Differentiable Neural Computer with the specified architecture and memory parameters. It sets up the memory matrix, usage tracking vectors, read/write weightings, and temporal link matrix. The memory is initialized with small random values, and the usage vector is initialized to indicate that all memory locations are free.
For Beginners: This sets up a new DNC with a specific notepad size and reading capacity.
When creating a new DNC:
- The architecture defines the neural network's structure
- memorySize determines how many pages are in the notepad
- memoryWordSize determines how much information fits on each page
- controllerSize determines how powerful the "brain" of the system is
- readHeads determines how many pages can be read simultaneously
Think of it like configuring a new assistant with specific mental capabilities and a notepad of specific size to help them remember and reason about information.
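The memory structures that the constructor sets up follow directly from these parameters. The shapes can be sketched as below; this is an illustrative Python sketch, and the field names are hypothetical, not AiDotNet's internals.

```python
import random

def init_dnc_state(memory_size, memory_word_size, read_heads, seed=0):
    """Shapes of the DNC memory structures implied by the constructor args."""
    rng = random.Random(seed)
    return {
        # memory matrix: one row per location, initialized with small noise
        "memory": [[rng.uniform(-0.01, 0.01) for _ in range(memory_word_size)]
                   for _ in range(memory_size)],
        # usage vector: all zeros means every location is free
        "usage": [0.0] * memory_size,
        # one weighting over locations per read head, plus one write weighting
        "read_weights": [[0.0] * memory_size for _ in range(read_heads)],
        "write_weights": [0.0] * memory_size,
        # temporal link matrix: records the order in which locations were written
        "link": [[0.0] * memory_size for _ in range(memory_size)],
    }

state = init_dnc_state(memory_size=16, memory_word_size=8, read_heads=4)
```

Note that memorySize and memoryWordSize together set the memory matrix's dimensions, while readHeads only adds extra read weightings; it does not change the memory itself.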
DifferentiableNeuralComputer(NeuralNetworkArchitecture<T>, int, int, int, int, ILossFunction<T>?, IVectorActivationFunction<T>?)
Initializes a new instance of the DifferentiableNeuralComputer<T> class with the specified parameters.
public DifferentiableNeuralComputer(NeuralNetworkArchitecture<T> architecture, int memorySize, int memoryWordSize, int controllerSize, int readHeads, ILossFunction<T>? lossFunction = null, IVectorActivationFunction<T>? vectorActivationFunction = null)
Parameters
architecture (NeuralNetworkArchitecture<T>): The neural network architecture configuration.
memorySize (int): The number of memory locations in the memory matrix.
memoryWordSize (int): The size of each memory word or location.
controllerSize (int): The size of the controller network's output.
readHeads (int): The number of read heads that can access the memory simultaneously.
lossFunction (ILossFunction<T>): The loss function to use. If null, a default is chosen based on the task type.
vectorActivationFunction (IVectorActivationFunction<T>): The vector activation function to use. If null, a default is chosen based on the task type.
Remarks
This constructor initializes a new Differentiable Neural Computer with the specified architecture and memory parameters. It sets up the memory matrix, usage tracking vectors, read/write weightings, and temporal link matrix. The memory is initialized with small random values, and the usage vector is initialized to indicate that all memory locations are free.
For Beginners: This sets up a new DNC with a specific notepad size and reading capacity.
When creating a new DNC:
- The architecture defines the neural network's structure
- memorySize determines how many pages are in the notepad
- memoryWordSize determines how much information fits on each page
- controllerSize determines how powerful the "brain" of the system is
- readHeads determines how many pages can be read simultaneously
Think of it like configuring a new assistant with specific mental capabilities and a notepad of specific size to help them remember and reason about information.
Properties
AuxiliaryLossWeight
Gets or sets the weight for the memory addressing auxiliary loss.
public T AuxiliaryLossWeight { get; set; }
Property Value
- T
Remarks
This weight controls how much memory addressing regularization contributes to the total loss. Typical values range from 0.001 to 0.01.
For Beginners: This controls how much we encourage focused memory access.
Common values:
- 0.005 (default): Balanced addressing regularization
- 0.001-0.003: Light regularization
- 0.008-0.01: Strong regularization
Higher values encourage sharper memory addressing patterns.
UseAuxiliaryLoss
Gets or sets whether auxiliary loss (memory addressing regularization) should be used during training.
public bool UseAuxiliaryLoss { get; set; }
Property Value
- bool
Remarks
Memory addressing regularization prevents soft addressing from becoming too diffuse or collapsing. This encourages the DNC to learn focused, interpretable memory access patterns.
For Beginners: This helps the DNC use memory effectively.
Memory addressing regularization ensures:
- Read/write heads focus on relevant memory locations
- Addressing doesn't spread too thin across all locations
- Memory operations are interpretable and efficient
This is important because:
- Focused addressing improves memory utilization
- Sharp addressing patterns are more interpretable
- Prevents wasting computation on irrelevant memory locations
Methods
ComputeAuxiliaryLoss()
Computes the auxiliary loss for memory addressing regularization.
public T ComputeAuxiliaryLoss()
Returns
- T
The computed memory addressing auxiliary loss.
Remarks
This method computes entropy-based regularization for memory read/write addressing. It encourages focused, sharp addressing patterns while preventing diffuse addressing. Formula: L = -Σ_heads H(addressing), where H is the entropy of the addressing weights.
For Beginners: This calculates how focused the DNC's memory access is.
Memory addressing regularization works by:
- Measuring entropy of read/write addressing weights
- Lower entropy means more focused, sharp addressing
- Higher entropy means diffuse, spread-out addressing
- Adding this entropy term to the loss steers training toward focused access
This helps because:
- Focused addressing is more interpretable
- Sharp addressing improves memory efficiency
- Prevents wasting computation on many irrelevant locations
- Encourages the DNC to learn clear memory access patterns
The auxiliary loss is added to the main task loss during training.
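The entropy computation behind this auxiliary loss can be sketched directly. This is an illustrative Python sketch, not the library's code; sign conventions for the penalty vary between implementations, and here the summed entropy itself is treated as the quantity to penalize. A one-hot (perfectly focused) addressing has entropy near zero, while a uniform (fully diffuse) addressing over N locations has the maximal entropy log(N).

```python
import math

def addressing_entropy(weights, eps=1e-12):
    """Shannon entropy of one head's addressing distribution."""
    return -sum(w * math.log(w + eps) for w in weights if w > 0)

def auxiliary_loss(heads):
    """Sum of addressing entropies across all read/write heads.

    A one-hot addressing contributes ~0; a uniform addressing over N
    locations contributes log(N). Penalizing this sum pushes the heads
    toward sharp, focused addressing.
    """
    return sum(addressing_entropy(h) for h in heads)

sharp = [0.97, 0.01, 0.01, 0.01]      # focused on one location
diffuse = [0.25, 0.25, 0.25, 0.25]    # spread evenly over four locations
```

Comparing `auxiliary_loss([sharp])` with `auxiliary_loss([diffuse])` shows why minimizing the penalty sharpens addressing: the diffuse head contributes a much larger entropy.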
CreateNewInstance()
Creates a new instance of the differentiable neural computer model.
protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
A new instance of the differentiable neural computer model with the same configuration.
Remarks
This method creates a new instance of the differentiable neural computer model with the same configuration as the current instance. It is used internally during serialization/deserialization processes to create a fresh instance that can be populated with the serialized data. The new instance will have the same architecture, memory size, memory word size, controller size, read heads count, and activation function type as the original.
For Beginners: This method creates a copy of the network structure without copying the learned data.
Think of it like creating a blueprint copy of the DNC:
- It copies the same neural network architecture
- It sets up the same memory size (same notepad dimensions)
- It configures the same number of read heads (how many pages can be read at once)
- It uses the same controller size (brain power)
- It keeps the same activation function (how neurons respond to input)
- But it doesn't copy any of the actual memories or learned behaviors
This is primarily used when saving or loading models, creating an empty framework that the saved parameters and memory state can be loaded into later.
DeserializeNetworkSpecificData(BinaryReader)
Deserializes Differentiable Neural Computer-specific data from a binary reader.
protected override void DeserializeNetworkSpecificData(BinaryReader reader)
Parameters
reader (BinaryReader): The BinaryReader to read the data from.
Remarks
This method reads DNC-specific configuration and state data from a binary stream. It retrieves properties such as memory size, memory word size, controller size, read heads count, and the saved state of the memory matrix, usage vector, and other memory tracking structures.
For Beginners: This restores the special configuration and state of your DNC from saved data.
It's like restoring a snapshot of the DNC that includes:
- Its structural configuration (memory size, read heads, etc.)
- The saved contents of memory
- The saved state of all memory tracking systems
- The saved state of all memory connections
This allows you to resume from exactly the same state that was saved, with both the network's learned parameters and its memory contents intact.
GetAuxiliaryLossDiagnostics()
Gets diagnostic information about the memory addressing auxiliary loss.
public Dictionary<string, string> GetAuxiliaryLossDiagnostics()
Returns
- Dictionary<string, string>
A dictionary containing diagnostic information about memory addressing regularization.
Remarks
This method returns detailed diagnostics about memory addressing regularization, including addressing entropy, number of read/write heads, and configuration parameters. This information is useful for monitoring memory access patterns and debugging.
For Beginners: This provides information about how the DNC accesses memory.
The diagnostics include:
- Total addressing entropy loss (how focused memory access is)
- Weight applied to the regularization
- Number of read and write heads
- Whether addressing regularization is enabled
This helps you:
- Monitor if memory addressing is focused or diffuse
- Debug issues with memory access patterns
- Understand the impact of regularization on memory usage
You can use this information to adjust regularization weights for better memory utilization.
GetDiagnostics()
Gets diagnostic information about this component's state and behavior. Overrides GetDiagnostics() to include auxiliary loss diagnostics.
public Dictionary<string, string> GetDiagnostics()
Returns
- Dictionary<string, string>
A dictionary containing diagnostic metrics including both base layer diagnostics and auxiliary loss diagnostics from GetAuxiliaryLossDiagnostics().
GetModelMetadata()
Gets metadata about the Differentiable Neural Computer model.
public override ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
A ModelMetadata<T> object containing information about the model.
Remarks
This method returns metadata about the DNC, including its model type, memory configuration, and additional configuration information. This metadata is useful for model management and for generating reports about the model's structure and configuration.
For Beginners: This provides a summary of your DNC's configuration.
The metadata includes:
- The type of model (Differentiable Neural Computer)
- Details about memory size and word size
- Number of read heads and controller size
- Information about the network architecture
- Serialized data that can be used to save and reload the model
This information is useful for tracking different model configurations and for saving/loading models for later use.
InitializeLayers()
Initializes the layers of the Differentiable Neural Computer based on the architecture.
protected override void InitializeLayers()
Remarks
This method sets up the neural network layers of the DNC. If custom layers are provided in the architecture, those layers are used. Otherwise, default layers are created based on the architecture's specifications and the DNC's memory parameters. The layers typically include a controller network and interface layers for interacting with the memory.
For Beginners: This builds the neural network "brain" of the DNC.
When initializing the layers:
- If you've specified your own custom layers, the network will use those
- If not, the network will create a standard set of layers suitable for a DNC
- These layers include a controller (the main processing network) and interfaces to interact with the memory system
- The network calculates how large the interface needs to be based on the memory size
This is like assembling the thinking and decision-making parts of the system that will work together with the memory to solve problems.
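The interface size mentioned above can be made concrete. The sketch below uses the standard interface layout from the original DNC design (Graves et al., 2016); AiDotNet's exact layout may differ, so treat this as an assumption rather than the library's formula.

```python
def interface_size(memory_word_size, read_heads):
    """Interface vector size in the standard DNC layout.

    Per time step the controller must emit:
      read keys        R * W      write key        W
      read strengths   R          write strength   1
      erase vector     W          write vector     W
      free gates       R          allocation gate  1
      write gate       1          read modes       3 * R
    """
    W, R = memory_word_size, read_heads
    return R * W + 3 * W + 5 * R + 3

size = interface_size(memory_word_size=8, read_heads=4)
```

This shows why the interface layer grows with both the memory word size and the number of read heads: every extra head needs its own key, strength, free gate, and read modes.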
Predict(Tensor<T>)
Makes a prediction using the Differentiable Neural Computer.
public override Tensor<T> Predict(Tensor<T> input)
Parameters
input (Tensor<T>): The input tensor to process.
Returns
- Tensor<T>
The output tensor containing the prediction.
Remarks
This method passes the input data through the DNC to make a prediction. It processes the input through the controller network, calculates memory interactions, and produces an output. For sequential inputs, this method should be called repeatedly, with the DNC maintaining its memory state between calls.
For Beginners: This is how the DNC processes new information and makes predictions.
The prediction process works like this:
- Input data is processed by the controller neural network
- The controller produces signals that determine how to interact with memory
- Based on these signals, information is written to and read from the memory
- The final output combines the controller's processing with information read from memory
Unlike traditional neural networks, the DNC maintains its memory state between predictions, allowing it to build up knowledge over a sequence of inputs.
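The prediction flow in the bullets above can be sketched as one time step. This is a schematic Python illustration, not the library's Predict: the "controller" is a running sum and the write rule just picks the least-used slot, standing in for the real differentiable mechanisms.

```python
def dnc_step(x, state):
    """One schematic DNC time step on a toy 1-D memory.

    Stand-ins for the real mechanisms: the 'controller' is a running sum,
    the 'write' stores the controller signal in the least-used slot, and
    the 'read' returns the value just written.
    """
    ctrl = x + state["last_read"]                     # controller: input + last read
    slot = state["usage"].index(min(state["usage"]))  # pick a free location
    state["memory"][slot] = ctrl                      # write to memory
    state["usage"][slot] += 1.0                       # mark location as used
    state["last_read"] = state["memory"][slot]        # read back
    return ctrl + state["last_read"]                  # output combines both

state = {"memory": [0.0] * 4, "usage": [0.0] * 4, "last_read": 0.0}
outputs = [dnc_step(x, state) for x in [1.0, 2.0, 3.0]]
```

Even in this toy version, notice that each output depends on what earlier steps left in memory; that carried state is what distinguishes a DNC's Predict from a stateless feed-forward pass.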
ProcessSequence(List<Tensor<T>>, bool)
Processes a sequence of inputs through the DNC.
public List<Tensor<T>> ProcessSequence(List<Tensor<T>> inputs, bool resetMemory = true)
Parameters
inputs (List<Tensor<T>>): The sequence of input tensors to process, in order.
resetMemory (bool): Whether to reset the memory state before processing the sequence. Defaults to true.
Returns
- List<Tensor<T>>
A list of output tensors corresponding to each input.
Remarks
This method processes a sequence of inputs through the DNC, maintaining the memory state between inputs. This is particularly useful for tasks that require processing sequences of data, like language modeling, sequence prediction, or graph traversal.
For Beginners: This processes a series of inputs while maintaining memory between them.
It works like:
- Starting with a fresh memory state (or continuing from the current state)
- Processing each input one by one through the network
- Using information stored in memory from previous inputs to help process current ones
- Building up knowledge across the sequence in the external memory
This is ideal for tasks where each input is related to previous ones, like:
- Processing a paragraph of text word by word
- Following a sequence of instructions step by step
- Analyzing a time series of data points
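The sequence loop is essentially the pattern below. This Python sketch is illustrative, not the library method; `step` stands for any per-input function that reads and mutates a shared memory state, and `fresh_state` produces an erased state when resetMemory is requested.

```python
def process_sequence(inputs, step, state, fresh_state, reset_memory=True):
    """Run each input through `step`, carrying memory state across inputs."""
    if reset_memory:
        state = fresh_state()           # start from an erased memory
    outputs = []
    for x in inputs:
        outputs.append(step(x, state))  # step may read/write the state
    return outputs, state

# toy step: remember a running sum in the "memory"
def step(x, state):
    state["sum"] += x
    return state["sum"]

outs, state = process_sequence([1, 2, 3], step, {"sum": 100},
                               fresh_state=lambda: {"sum": 0})
```

With the default resetMemory=True the stale state (here, `{"sum": 100}`) is discarded before the sequence starts; passing resetMemory=False instead continues from wherever the previous sequence left off.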
ResetMemoryState()
Resets the state of the Differentiable Neural Computer.
public void ResetMemoryState()
Remarks
This method resets the DNC's memory state, including the memory matrix, usage tracking vectors, read/write weightings, temporal link matrix, and read vectors. This is useful when starting to process a new, unrelated sequence of inputs, or when initializing the network for a new task.
For Beginners: This is like erasing the notepad and starting fresh.
The reset process:
- Clears the memory matrix
- Resets all memory tracking systems
- Clears all memory connections
- Resets all reading and writing mechanisms
This is useful when:
- Starting a completely new task
- Ensuring that information from a previous task doesn't influence the current one
- Testing the DNC on different problems independently
Note that this doesn't reset the network's learned parameters, just its current memory state.
SerializeNetworkSpecificData(BinaryWriter)
Serializes Differentiable Neural Computer-specific data to a binary writer.
protected override void SerializeNetworkSpecificData(BinaryWriter writer)
Parameters
writer (BinaryWriter): The BinaryWriter to write the data to.
Remarks
This method writes DNC-specific configuration and state data to a binary stream. It includes properties such as memory size, memory word size, controller size, read heads count, and the current state of the memory matrix, usage vector, and other memory tracking structures.
For Beginners: This saves the special configuration and current state of your DNC.
It's like taking a snapshot of the DNC that includes:
- Its structural configuration (memory size, read heads, etc.)
- The current contents of memory
- The current state of all memory tracking systems
- The current state of all memory connections
This allows you to save both the network's learned parameters and its current memory state, so you can resume from exactly the same state later.
Train(Tensor<T>, Tensor<T>)
Trains the Differentiable Neural Computer on a single batch of data.
public override void Train(Tensor<T> input, Tensor<T> expectedOutput)
Parameters
input (Tensor<T>): The input tensor for training.
expectedOutput (Tensor<T>): The expected output tensor for the given input.
Remarks
This method trains the DNC on a single batch of data using backpropagation through time (BPTT). It processes the input through the network, computes the error with respect to the expected output, and updates the network parameters to reduce this error. For sequential data, this method should be called with sequences of inputs and expected outputs.
For Beginners: This is how the DNC learns from examples.
The training process works like this:
- The input is processed through the network (like in prediction)
- The output is compared to the expected output to calculate the error
- This error is propagated backward through the network
- The network's parameters are updated to reduce this error
Unlike traditional neural networks, DNCs must be careful to propagate errors through their memory operations as well as through the neural network components.
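How the addressing regularization enters training can be shown in one line: the auxiliary penalty is scaled and added to the task loss before backpropagation. This is an illustrative Python sketch; the parameter names mirror the UseAuxiliaryLoss and AuxiliaryLossWeight properties documented above, and the 0.005 default comes from this page, not from inspecting the library.

```python
def total_loss(task_loss, aux_loss, use_auxiliary_loss=True,
               auxiliary_loss_weight=0.005):
    """Combine the main task loss with the weighted addressing penalty."""
    if not use_auxiliary_loss:
        return task_loss
    return task_loss + auxiliary_loss_weight * aux_loss

loss = total_loss(task_loss=0.8, aux_loss=2.0)   # 0.8 + 0.005 * 2.0
```

Because the weight is small, the addressing penalty nudges the solution toward focused memory access without competing with the main objective.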
UpdateParameters(Vector<T>)
Updates the parameters of all layers in the Differentiable Neural Computer.
public override void UpdateParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): A vector containing the parameters to update all layers with.
Remarks
This method distributes the provided parameter vector among all the layers in the network. Each layer receives a portion of the parameter vector corresponding to its number of parameters. The method keeps track of the starting index for each layer's parameters in the input vector.
For Beginners: This updates all the internal values of the neural network at once.
When updating parameters:
- The input is a long list of numbers representing all values in the entire network
- The method divides this list into smaller chunks
- Each layer gets its own chunk of values
- The layers use these values to adjust their internal settings
This method is typically used during training or when loading a pre-trained model, allowing all network parameters to be updated at once.
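The chunk-by-chunk distribution described above amounts to slicing one flat vector by each layer's parameter count. The Python sketch below illustrates the pattern; it is not AiDotNet's code.

```python
def distribute_parameters(flat, layer_sizes):
    """Split one flat parameter vector into per-layer chunks, in order."""
    if len(flat) != sum(layer_sizes):
        raise ValueError("parameter vector length must match the network")
    chunks, start = [], 0
    for size in layer_sizes:
        chunks.append(flat[start:start + size])  # this layer's parameters
        start += size                            # advance to the next layer
    return chunks

# a 10-parameter network whose three layers hold 4, 3, and 3 values
chunks = distribute_parameters(list(range(10)), [4, 3, 3])
```

The running start index is what guarantees that each layer receives exactly its own slice, in layer order, with no values shared or skipped.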