Class DifferentiableNeuralComputer<T>

Namespace
AiDotNet.NeuralNetworks
Assembly
AiDotNet.dll

Represents a Differentiable Neural Computer (DNC), a neural network architecture that combines neural networks with external memory resources.

public class DifferentiableNeuralComputer<T> : NeuralNetworkBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable, IAuxiliaryLossLayer<T>, IDiagnosticsProvider

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
object
NeuralNetworkBase<T>
DifferentiableNeuralComputer<T>
Implements
IFullModel<T, Tensor<T>, Tensor<T>>
IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>
IParameterizable<T, Tensor<T>, Tensor<T>>
ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>
IGradientComputable<T, Tensor<T>, Tensor<T>>

Remarks

A Differentiable Neural Computer (DNC) is an advanced neural network architecture that augments neural networks with an external memory matrix and mechanisms to read from and write to this memory. DNCs can learn to use their memory to store and retrieve information, enabling them to solve complex, structured problems that require reasoning and algorithm-like behavior. The key components include a controller neural network, a memory matrix, and read/write heads that interact with the memory through differentiable attention mechanisms.

For Beginners: A Differentiable Neural Computer is like a neural network with a notepad.

Imagine a traditional neural network as a person who can make decisions based on what they see, but can only keep information in their head. A DNC is like giving that person a notepad to:

  • Write down important information
  • Organize notes in a systematic way
  • Look back at previously written notes when making decisions
  • Learn which information is worth writing down and when to refer back to it

This combination of neural processing with external memory allows the DNC to solve problems that require remembering and reasoning about complex relationships or sequences of information, like navigating a subway map or following a multi-step recipe.
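The "differentiable attention" mentioned above is usually realized as content-based addressing: a head emits a key vector, the key is compared against every memory row, and a softmax turns the similarities into soft read weights. The sketch below illustrates the mechanism in Python; it is a language-agnostic illustration with made-up names, not the AiDotNet implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1e-8
    nb = math.sqrt(sum(x * x for x in b)) or 1e-8
    return dot / (na * nb)

def content_weights(memory, key, strength=10.0):
    """Soft addressing: softmax over key/row similarities.
    Higher strength sharpens the distribution."""
    scores = [strength * cosine_similarity(row, key) for row in memory]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def read(memory, weights):
    """Weighted sum of memory rows -> one read vector."""
    word_size = len(memory[0])
    return [sum(w * row[i] for w, row in zip(weights, memory))
            for i in range(word_size)]

memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 locations, word size 2
w = content_weights(memory, key=[1.0, 0.0])      # soft weights over locations
r = read(memory, w)                              # blended "page" contents
```

Because every step (similarity, softmax, weighted sum) is differentiable, gradients flow through the memory access, which is what lets the DNC learn *where* to read and write.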

Constructors

DifferentiableNeuralComputer(NeuralNetworkArchitecture<T>, int, int, int, int, ILossFunction<T>?, IActivationFunction<T>?)

Initializes a new instance of the DifferentiableNeuralComputer<T> class with the specified parameters.

public DifferentiableNeuralComputer(NeuralNetworkArchitecture<T> architecture, int memorySize, int memoryWordSize, int controllerSize, int readHeads, ILossFunction<T>? lossFunction = null, IActivationFunction<T>? activationFunction = null)

Parameters

architecture NeuralNetworkArchitecture<T>

The neural network architecture configuration.

memorySize int

The number of memory locations in the memory matrix.

memoryWordSize int

The size of each memory word or location.

controllerSize int

The size of the controller network's output.

readHeads int

The number of read heads that can access the memory simultaneously.

lossFunction ILossFunction<T>

The loss function used to measure training error. If null, a default loss function is used.

activationFunction IActivationFunction<T>

The scalar activation function to use. If null, defaults based on task type.

Remarks

This constructor initializes a new Differentiable Neural Computer with the specified architecture and memory parameters. It sets up the memory matrix, usage tracking vectors, read/write weightings, and temporal link matrix. The memory is initialized with small random values, and the usage vector is initialized to indicate that all memory locations are free.

For Beginners: This sets up a new DNC with a specific notepad size and reading capacity.

When creating a new DNC:

  • The architecture defines the neural network's structure
  • memorySize determines how many pages are in the notepad
  • memoryWordSize determines how much information fits on each page
  • controllerSize determines how powerful the "brain" of the system is
  • readHeads determines how many pages can be read simultaneously

Think of it like configuring a new assistant with specific mental capabilities and a notepad of specific size to help them remember and reason about information.
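The initialization the remarks describe can be pictured as allocating a memorySize × memoryWordSize matrix of small random values, plus a usage vector of zeros (all locations free) and uniform read/write weightings. A schematic Python sketch, with illustrative names that are not part of the AiDotNet API:

```python
import random

def init_dnc_state(memory_size, word_size, read_heads, seed=0):
    """Allocate a fresh DNC memory state.

    memory_size: number of locations (notepad pages)
    word_size:   numbers stored per location (page capacity)
    read_heads:  how many locations can be read at once
    """
    rng = random.Random(seed)
    return {
        # Small random values: early reads are near-zero but not degenerate.
        "memory": [[rng.uniform(-0.01, 0.01) for _ in range(word_size)]
                   for _ in range(memory_size)],
        # Usage 0.0 everywhere: every location is free to be written.
        "usage": [0.0] * memory_size,
        # One read weighting per head, plus a write weighting, all uniform.
        "read_weights": [[1.0 / memory_size] * memory_size
                         for _ in range(read_heads)],
        "write_weights": [1.0 / memory_size] * memory_size,
        # Temporal link matrix: records the order in which locations were written.
        "links": [[0.0] * memory_size for _ in range(memory_size)],
    }

state = init_dnc_state(memory_size=8, word_size=4, read_heads=2)
```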

DifferentiableNeuralComputer(NeuralNetworkArchitecture<T>, int, int, int, int, ILossFunction<T>?, IVectorActivationFunction<T>?)

Initializes a new instance of the DifferentiableNeuralComputer<T> class with the specified parameters.

public DifferentiableNeuralComputer(NeuralNetworkArchitecture<T> architecture, int memorySize, int memoryWordSize, int controllerSize, int readHeads, ILossFunction<T>? lossFunction = null, IVectorActivationFunction<T>? vectorActivationFunction = null)

Parameters

architecture NeuralNetworkArchitecture<T>

The neural network architecture configuration.

memorySize int

The number of memory locations in the memory matrix.

memoryWordSize int

The size of each memory word or location.

controllerSize int

The size of the controller network's output.

readHeads int

The number of read heads that can access the memory simultaneously.

lossFunction ILossFunction<T>

The loss function used to measure training error. If null, a default loss function is used.

vectorActivationFunction IVectorActivationFunction<T>

The vector activation function to use. If null, defaults based on task type.
Remarks

This constructor initializes a new Differentiable Neural Computer with the specified architecture and memory parameters. It sets up the memory matrix, usage tracking vectors, read/write weightings, and temporal link matrix. The memory is initialized with small random values, and the usage vector is initialized to indicate that all memory locations are free.

For Beginners: This sets up a new DNC with a specific notepad size and reading capacity.

When creating a new DNC:

  • The architecture defines the neural network's structure
  • memorySize determines how many pages are in the notepad
  • memoryWordSize determines how much information fits on each page
  • controllerSize determines how powerful the "brain" of the system is
  • readHeads determines how many pages can be read simultaneously

Think of it like configuring a new assistant with specific mental capabilities and a notepad of specific size to help them remember and reason about information.

Properties

AuxiliaryLossWeight

Gets or sets the weight for the memory addressing auxiliary loss.

public T AuxiliaryLossWeight { get; set; }

Property Value

T

Remarks

This weight controls how much memory addressing regularization contributes to the total loss. Typical values range from 0.001 to 0.01.

For Beginners: This controls how much we encourage focused memory access.

Common values:

  • 0.005 (default): Balanced addressing regularization
  • 0.001-0.003: Light regularization
  • 0.008-0.01: Strong regularization

Higher values encourage sharper memory addressing patterns.

UseAuxiliaryLoss

Gets or sets whether auxiliary loss (memory addressing regularization) should be used during training.

public bool UseAuxiliaryLoss { get; set; }

Property Value

bool

Remarks

Memory addressing regularization prevents soft addressing from becoming too diffuse or collapsing. This encourages the DNC to learn focused, interpretable memory access patterns.

For Beginners: This helps the DNC use memory effectively.

Memory addressing regularization ensures:

  • Read/write heads focus on relevant memory locations
  • Addressing doesn't spread too thin across all locations
  • Memory operations are interpretable and efficient

This is important because:

  • Focused addressing improves memory utilization
  • Sharp addressing patterns are more interpretable
  • Prevents wasting computation on irrelevant memory locations

Methods

ComputeAuxiliaryLoss()

Computes the auxiliary loss for memory addressing regularization.

public T ComputeAuxiliaryLoss()

Returns

T

The computed memory addressing auxiliary loss.

Remarks

This method computes entropy-based regularization for the memory read/write addressing. It encourages focused, sharp addressing patterns by penalizing diffuse addressing. Formula: L = Σ_heads H(addressing), where H is the entropy of a head's addressing weights; a perfectly focused (one-hot) head contributes zero loss.

For Beginners: This calculates how focused the DNC's memory access is.

Memory addressing regularization works by:

  1. Measuring entropy of read/write addressing weights
  2. Lower entropy means more focused, sharp addressing
  3. Higher entropy means diffuse, spread-out addressing
  4. The entropy is added to the training loss as a penalty, pushing the network toward focused access

This helps because:

  • Focused addressing is more interpretable
  • Sharp addressing improves memory efficiency
  • Prevents wasting computation on many irrelevant locations
  • Encourages the DNC to learn clear memory access patterns

The auxiliary loss is added to the main task loss during training.
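The entropy calculation behind this regularizer can be sketched in a few lines. This is a language-agnostic illustration, not the AiDotNet implementation; here the addressing entropy itself is the penalty, so a focused head contributes almost nothing to the loss while a diffuse head contributes log(N):

```python
import math

def addressing_entropy(weights, eps=1e-8):
    """Shannon entropy of one head's addressing distribution.
    ~0 for a one-hot (perfectly focused) head; log(N) when uniform."""
    return -sum(w * math.log(w + eps) for w in weights)

def auxiliary_loss(heads, weight=0.005):
    """Sum addressing entropy over all read/write heads, scaled by
    the regularization weight (0.005 is the documented default)."""
    return weight * sum(addressing_entropy(h) for h in heads)

focused = [1.0, 0.0, 0.0, 0.0]       # sharp: near-zero entropy
diffuse = [0.25, 0.25, 0.25, 0.25]   # uniform: entropy = log(4)

loss_focused = auxiliary_loss([focused])
loss_diffuse = auxiliary_loss([diffuse])
```

Minimizing the combined task loss plus this penalty trades off task performance against addressing sharpness, which is why the weight is kept small (0.001 to 0.01).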

CreateNewInstance()

Creates a new instance of the differentiable neural computer model.

protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()

Returns

IFullModel<T, Tensor<T>, Tensor<T>>

A new instance of the differentiable neural computer model with the same configuration.

Remarks

This method creates a new instance of the differentiable neural computer model with the same configuration as the current instance. It is used internally during serialization/deserialization processes to create a fresh instance that can be populated with the serialized data. The new instance will have the same architecture, memory size, memory word size, controller size, read heads count, and activation function type as the original.

For Beginners: This method creates a copy of the network structure without copying the learned data.

Think of it like creating a blueprint copy of the DNC:

  • It copies the same neural network architecture
  • It sets up the same memory size (same notepad dimensions)
  • It configures the same number of read heads (how many pages can be read at once)
  • It uses the same controller size (brain power)
  • It keeps the same activation function (how neurons respond to input)
  • But it doesn't copy any of the actual memories or learned behaviors

This is primarily used when saving or loading models, creating an empty framework that the saved parameters and memory state can be loaded into later.

DeserializeNetworkSpecificData(BinaryReader)

Deserializes Differentiable Neural Computer-specific data from a binary reader.

protected override void DeserializeNetworkSpecificData(BinaryReader reader)

Parameters

reader BinaryReader

The BinaryReader to read the data from.

Remarks

This method reads DNC-specific configuration and state data from a binary stream. It retrieves properties such as memory size, memory word size, controller size, read heads count, and the saved state of the memory matrix, usage vector, and other memory tracking structures.

For Beginners: This restores the special configuration and state of your DNC from saved data.

It's like restoring a snapshot of the DNC that includes:

  • Its structural configuration (memory size, read heads, etc.)
  • The saved contents of memory
  • The saved state of all memory tracking systems
  • The saved state of all memory connections

This allows you to resume from exactly the same state that was saved, with both the network's learned parameters and its memory contents intact.

GetAuxiliaryLossDiagnostics()

Gets diagnostic information about the memory addressing auxiliary loss.

public Dictionary<string, string> GetAuxiliaryLossDiagnostics()

Returns

Dictionary<string, string>

A dictionary containing diagnostic information about memory addressing regularization.

Remarks

This method returns detailed diagnostics about memory addressing regularization, including addressing entropy, number of read/write heads, and configuration parameters. This information is useful for monitoring memory access patterns and debugging.

For Beginners: This provides information about how the DNC accesses memory.

The diagnostics include:

  • Total addressing entropy loss (how focused memory access is)
  • Weight applied to the regularization
  • Number of read and write heads
  • Whether addressing regularization is enabled

This helps you:

  • Monitor if memory addressing is focused or diffuse
  • Debug issues with memory access patterns
  • Understand the impact of regularization on memory usage

You can use this information to adjust regularization weights for better memory utilization.

GetDiagnostics()

Gets diagnostic information about this component's state and behavior. Overrides GetDiagnostics() to include auxiliary loss diagnostics.

public Dictionary<string, string> GetDiagnostics()

Returns

Dictionary<string, string>

A dictionary containing diagnostic metrics including both base layer diagnostics and auxiliary loss diagnostics from GetAuxiliaryLossDiagnostics().

GetModelMetadata()

Gets metadata about the Differentiable Neural Computer model.

public override ModelMetadata<T> GetModelMetadata()

Returns

ModelMetadata<T>

A ModelMetadata<T> object containing information about the model.

Remarks

This method returns metadata about the DNC, including its model type, memory configuration, and additional configuration information. This metadata is useful for model management and for generating reports about the model's structure and configuration.

For Beginners: This provides a summary of your DNC's configuration.

The metadata includes:

  • The type of model (Differentiable Neural Computer)
  • Details about memory size and word size
  • Number of read heads and controller size
  • Information about the network architecture
  • Serialized data that can be used to save and reload the model

This information is useful for tracking different model configurations and for saving/loading models for later use.

InitializeLayers()

Initializes the layers of the Differentiable Neural Computer based on the architecture.

protected override void InitializeLayers()

Remarks

This method sets up the neural network layers of the DNC. If custom layers are provided in the architecture, those layers are used. Otherwise, default layers are created based on the architecture's specifications and the DNC's memory parameters. The layers typically include a controller network and interface layers for interacting with the memory.

For Beginners: This builds the neural network "brain" of the DNC.

When initializing the layers:

  • If you've specified your own custom layers, the network will use those
  • If not, the network will create a standard set of layers suitable for a DNC
  • These layers include a controller (the main processing network) and interfaces to interact with the memory system
  • The network calculates how large the interface needs to be based on the memory size

This is like assembling the thinking and decision-making parts of the system that will work together with the memory to solve problems.

Predict(Tensor<T>)

Makes a prediction using the Differentiable Neural Computer.

public override Tensor<T> Predict(Tensor<T> input)

Parameters

input Tensor<T>

The input tensor to process.

Returns

Tensor<T>

The output tensor containing the prediction.

Remarks

This method passes the input data through the DNC to make a prediction. It processes the input through the controller network, calculates memory interactions, and produces an output. For sequential inputs, this method should be called repeatedly, with the DNC maintaining its memory state between calls.

For Beginners: This is how the DNC processes new information and makes predictions.

The prediction process works like this:

  1. Input data is processed by the controller neural network
  2. The controller produces signals that determine how to interact with memory
  3. Based on these signals, information is written to and read from the memory
  4. The final output combines the controller's processing with information read from memory

Unlike traditional neural networks, the DNC maintains its memory state between predictions, allowing it to build up knowledge over a sequence of inputs.
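The four steps above can be condensed into one function: the controller output both drives the memory interaction and is combined with what was read to form the final output. This is an illustrative Python sketch of the data flow, not the AiDotNet internals; the three callables stand in for learned layers.

```python
def dnc_step(x, state, controller, interface, output_layer):
    """One DNC prediction step; `state` carries memory between calls."""
    h = controller(x)                            # 1. controller processes input
    read_w, write_w, write_vec = interface(h)    # 2. signals for memory access
    # 3. write, then read, using soft (differentiable) weightings
    for i, row in enumerate(state["memory"]):
        state["memory"][i] = [m + write_w[i] * v
                              for m, v in zip(row, write_vec)]
    reads = [sum(w * row[j] for w, row in zip(read_w, state["memory"]))
             for j in range(len(state["memory"][0]))]
    # 4. final output combines controller state with what was read
    return output_layer(h + reads)

# Tiny demonstration: write [5, 7] into location 1, read location 0.
state = {"memory": [[0.0, 0.0], [0.0, 0.0]]}
out = dnc_step(
    [1.0],
    state,
    controller=lambda x: x,                              # identity "controller"
    interface=lambda h: ([1.0, 0.0], [0.0, 1.0], [5.0, 7.0]),
    output_layer=lambda v: sum(v),
)
```

Because `state` is mutated rather than rebuilt, a later call whose read weighting targets location 1 will see the `[5.0, 7.0]` written here, which is the memory persistence the remarks describe.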

ProcessSequence(List<Tensor<T>>, bool)

Processes a sequence of inputs through the DNC.

public List<Tensor<T>> ProcessSequence(List<Tensor<T>> inputs, bool resetMemory = true)

Parameters

inputs List<Tensor<T>>

A list of input tensors representing a sequence.

resetMemory bool

Whether to reset the memory state before processing the sequence. Defaults to true; pass false to continue from the current memory state.
Returns

List<Tensor<T>>

A list of output tensors corresponding to each input.

Remarks

This method processes a sequence of inputs through the DNC, maintaining the memory state between inputs. This is particularly useful for tasks that require processing sequences of data, like language modeling, sequence prediction, or graph traversal.

For Beginners: This processes a series of inputs while maintaining memory between them.

It works like:

  1. Starting with a fresh memory state (or continuing from the current state)
  2. Processing each input one by one through the network
  3. Using information stored in memory from previous inputs to help process current ones
  4. Building up knowledge across the sequence in the external memory

This is ideal for tasks where each input is related to previous ones, like:

  • Processing a paragraph of text word by word
  • Following a sequence of instructions step by step
  • Analyzing a time series of data points
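A sequence loop like the one described above amounts to threading one memory state through every step, with resetMemory deciding whether to start from a blank state. Illustrative Python, not the AiDotNet API; the toy `step` uses a running sum as a stand-in for the memory matrix:

```python
def process_sequence(inputs, step, state, fresh_state, reset_memory=True):
    """Run `step` over each input, threading memory state through.
    step(x, state) -> (output, new_state); fresh_state() builds a blank state."""
    if reset_memory:
        state = fresh_state()          # erase the notepad before the sequence
    outputs = []
    for x in inputs:
        y, state = step(x, state)      # memory persists between inputs
        outputs.append(y)
    return outputs, state

# Toy step: "memory" is a running sum; output is input plus remembered total.
step = lambda x, s: (x + s, s + x)
outs, final = process_sequence([1, 2, 3], step, state=100,
                               fresh_state=lambda: 0)
```

With the default reset, the stale `state=100` is discarded and the outputs build only on the current sequence; passing `reset_memory=False` instead carries the old total into every output.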

ResetMemoryState()

Resets the state of the Differentiable Neural Computer.

public void ResetMemoryState()

Remarks

This method resets the DNC's memory state, including the memory matrix, usage tracking vectors, read/write weightings, temporal link matrix, and read vectors. This is useful when starting to process a new, unrelated sequence of inputs, or when initializing the network for a new task.

For Beginners: This is like erasing the notepad and starting fresh.

The reset process:

  1. Clears the memory matrix
  2. Resets all memory tracking systems
  3. Clears all memory connections
  4. Resets all reading and writing mechanisms

This is useful when:

  • Starting a completely new task
  • Ensuring that information from a previous task doesn't influence the current one
  • Testing the DNC on different problems independently

Note that this doesn't reset the network's learned parameters, just its current memory state.

SerializeNetworkSpecificData(BinaryWriter)

Serializes Differentiable Neural Computer-specific data to a binary writer.

protected override void SerializeNetworkSpecificData(BinaryWriter writer)

Parameters

writer BinaryWriter

The BinaryWriter to write the data to.

Remarks

This method writes DNC-specific configuration and state data to a binary stream. It includes properties such as memory size, memory word size, controller size, read heads count, and the current state of the memory matrix, usage vector, and other memory tracking structures.

For Beginners: This saves the special configuration and current state of your DNC.

It's like taking a snapshot of the DNC that includes:

  • Its structural configuration (memory size, read heads, etc.)
  • The current contents of memory
  • The current state of all memory tracking systems
  • The current state of all memory connections

This allows you to save both the network's learned parameters and its current memory state, so you can resume from exactly the same state later.

Train(Tensor<T>, Tensor<T>)

Trains the Differentiable Neural Computer on a single batch of data.

public override void Train(Tensor<T> input, Tensor<T> expectedOutput)

Parameters

input Tensor<T>

The input tensor for training.

expectedOutput Tensor<T>

The expected output tensor for the given input.

Remarks

This method trains the DNC on a single batch of data using backpropagation through time (BPTT). It processes the input through the network, computes the error with respect to the expected output, and updates the network parameters to reduce this error. For sequential data, this method should be called with sequences of inputs and expected outputs.

For Beginners: This is how the DNC learns from examples.

The training process works like this:

  1. The input is processed through the network (like in prediction)
  2. The output is compared to the expected output to calculate the error
  3. This error is propagated backward through the network
  4. The network's parameters are updated to reduce this error

Unlike traditional neural networks, DNCs must be careful to propagate errors through their memory operations as well as through the neural network components.

UpdateParameters(Vector<T>)

Updates the parameters of all layers in the Differentiable Neural Computer.

public override void UpdateParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

A vector containing the parameters to update all layers with.

Remarks

This method distributes the provided parameter vector among all the layers in the network. Each layer receives a portion of the parameter vector corresponding to its number of parameters. The method keeps track of the starting index for each layer's parameters in the input vector.

For Beginners: This updates all the internal values of the neural network at once.

When updating parameters:

  • The input is a long list of numbers representing all values in the entire network
  • The method divides this list into smaller chunks
  • Each layer gets its own chunk of values
  • The layers use these values to adjust their internal settings

This method is typically used during training or when loading a pre-trained model, allowing all network parameters to be updated at once.
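The chunking described above amounts to walking the flat vector once and handing each layer a contiguous slice sized to its parameter count. A schematic Python sketch (the layer objects are hypothetical stand-ins, not the AiDotNet layer API):

```python
def update_parameters(layers, params):
    """Distribute a flat parameter vector across layers, in order.
    Each entry of `layers` is (parameter_count, setter)."""
    index = 0
    for count, setter in layers:
        setter(params[index:index + count])  # this layer's contiguous chunk
        index += count                       # advance to the next layer's start
    if index != len(params):
        raise ValueError(f"expected {index} parameters, got {len(params)}")

# Two toy "layers" that just record the chunk they were given.
received = {}
layers = [
    (2, lambda chunk: received.setdefault("dense1", chunk)),
    (3, lambda chunk: received.setdefault("dense2", chunk)),
]
update_parameters(layers, [0.1, 0.2, 0.3, 0.4, 0.5])
```

The final length check is worth keeping in any real implementation: a vector that is too short or too long almost always means the parameter layout no longer matches the network it was saved from.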