Class NeuralTuringMachine<T>
Namespace: AiDotNet.NeuralNetworks
Assembly: AiDotNet.dll
Represents a Neural Turing Machine, which is a neural network architecture that combines a neural network with external memory.
public class NeuralTuringMachine<T> : NeuralNetworkBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable, IAuxiliaryLossLayer<T>, IDiagnosticsProvider
Type Parameters
T: The numeric type used for calculations, typically float or double.
Inheritance: NeuralNetworkBase<T> → NeuralTuringMachine<T>
Remarks
A Neural Turing Machine (NTM) extends traditional neural networks by adding an external memory component that the network can read from and write to. This allows the network to store and retrieve information over long sequences, making it particularly effective for tasks requiring complex memory operations.
For Beginners: A Neural Turing Machine is like a neural network with a "notebook" that it can write to and read from.
Think of it like a student solving a math problem:
- The student (neural network) can process information directly
- But for complex problems, the student needs to write down intermediate steps in a notebook (external memory)
- The student can later refer back to these notes when needed
This memory capability helps the network:
- Remember information over long periods
- Store and retrieve specific pieces of data
- Learn more complex patterns that require step-by-step reasoning
For example, a standard neural network might struggle to add two long numbers, but an NTM can learn to write down partial results and carry digits, similar to how humans solve addition problems.
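The read and write mechanics described above can be sketched in a few lines of NumPy. This is a conceptual illustration of content-based addressing and blended reads/writes, not the AiDotNet API; all names below are hypothetical:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def content_addressing(memory, key, beta):
    """Weight each memory row by cosine similarity to the key, sharpened by beta."""
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    return softmax(beta * sims)

def read(memory, weights):
    """Read a blended vector: a weighted sum of memory rows."""
    return weights @ memory

def write(memory, weights, erase, add):
    """Erase, then add, each scaled per row by the addressing weights."""
    memory = memory * (1 - np.outer(weights, erase))
    return memory + np.outer(weights, add)

memory = np.zeros((4, 3))            # memorySize=4 rows, memoryVectorSize=3 columns
add_vec = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, 0.0, 0.0, 0.0])   # addressing focused entirely on row 0
memory = write(memory, w, erase=np.zeros(3), add=add_vec)

key = np.array([1.0, 2.0, 3.0])      # later, look up the row most similar to this key
w_read = content_addressing(memory, key, beta=10.0)
print(read(memory, w_read))          # approximately recovers add_vec
```

Because addressing is a differentiable weighted blend over rows rather than a hard index, the whole read/write pipeline can be trained end to end with gradient descent.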
Constructors
NeuralTuringMachine(NeuralNetworkArchitecture<T>, int, int, int, ILossFunction<T>?, IActivationFunction<T>?, IActivationFunction<T>?, IActivationFunction<T>?)
Initializes a new instance of the NeuralTuringMachine<T> class.
public NeuralTuringMachine(NeuralNetworkArchitecture<T> architecture, int memorySize, int memoryVectorSize, int controllerSize, ILossFunction<T>? lossFunction = null, IActivationFunction<T>? contentAddressingActivation = null, IActivationFunction<T>? gateActivation = null, IActivationFunction<T>? outputActivation = null)
Parameters
architecture (NeuralNetworkArchitecture<T>): The neural network architecture to use for the NTM.
memorySize (int): The number of memory locations (rows in the memory matrix).
memoryVectorSize (int): The size of each memory vector (columns in the memory matrix).
controllerSize (int): The size of the controller network that manages memory operations.
lossFunction (ILossFunction<T>?): The loss function to use for training.
contentAddressingActivation (IActivationFunction<T>?): The activation function to apply to content-based addressing. If null, softmax will be used.
gateActivation (IActivationFunction<T>?): The activation function to apply to interpolation gates. If null, sigmoid will be used.
outputActivation (IActivationFunction<T>?): The activation function to apply to the final output. If null, a default based on task type will be used.
NeuralTuringMachine(NeuralNetworkArchitecture<T>, int, int, int, ILossFunction<T>?, IVectorActivationFunction<T>?, IVectorActivationFunction<T>?, IVectorActivationFunction<T>?)
Initializes a new instance of the NeuralTuringMachine<T> class.
public NeuralTuringMachine(NeuralNetworkArchitecture<T> architecture, int memorySize, int memoryVectorSize, int controllerSize, ILossFunction<T>? lossFunction = null, IVectorActivationFunction<T>? contentAddressingActivation = null, IVectorActivationFunction<T>? gateActivation = null, IVectorActivationFunction<T>? outputActivation = null)
Parameters
architecture (NeuralNetworkArchitecture<T>): The neural network architecture to use for the NTM.
memorySize (int): The number of memory locations (rows in the memory matrix).
memoryVectorSize (int): The size of each memory vector (columns in the memory matrix).
controllerSize (int): The size of the controller network that manages memory operations.
lossFunction (ILossFunction<T>?): The loss function to use for training.
contentAddressingActivation (IVectorActivationFunction<T>?): The activation function to apply to content-based addressing. If null, softmax will be used.
gateActivation (IVectorActivationFunction<T>?): The activation function to apply to interpolation gates. If null, sigmoid will be used.
outputActivation (IVectorActivationFunction<T>?): The activation function to apply to the final output. If null, a default based on task type will be used.
Properties
AuxiliaryLossWeight
Gets or sets the weight for the memory usage auxiliary loss.
public T AuxiliaryLossWeight { get; set; }
Property Value: T
Remarks
This weight controls how much memory usage regularization contributes to the total loss. Typical values range from 0.001 to 0.01.
For Beginners: This controls how much we encourage focused memory access.
Common values:
- 0.005 (default): Balanced memory regularization
- 0.001-0.003: Light regularization
- 0.008-0.01: Strong regularization
Higher values encourage sharper, more focused memory usage.
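How this weight enters training can be sketched as follows (a hypothetical helper for illustration, not the library's internal code): the auxiliary term is simply scaled by the weight and added to the main task loss.

```python
def total_loss(task_loss, aux_loss, aux_weight=0.005, use_aux=True):
    """Combine the main task loss with the weighted memory-usage penalty.

    aux_weight=0.005 mirrors the documented default; use_aux mirrors
    the UseAuxiliaryLoss toggle.
    """
    if not use_aux:
        return task_loss
    return task_loss + aux_weight * aux_loss

print(total_loss(1.0, 2.0))                 # task loss plus a small penalty
print(total_loss(1.0, 2.0, use_aux=False))  # regularization disabled
```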
ContentAddressingActivation
The activation function to apply to content-based addressing similarity scores.
public IActivationFunction<T>? ContentAddressingActivation { get; }
Property Value: IActivationFunction<T>?
ContentAddressingVectorActivation
The activation function to apply to content-based addressing similarity scores.
public IVectorActivationFunction<T>? ContentAddressingVectorActivation { get; }
Property Value: IVectorActivationFunction<T>?
GateActivation
The activation function to apply to interpolation gates.
public IActivationFunction<T>? GateActivation { get; }
Property Value: IActivationFunction<T>?
GateVectorActivation
The activation function to apply to interpolation gates.
public IVectorActivationFunction<T>? GateVectorActivation { get; }
Property Value: IVectorActivationFunction<T>?
OutputActivation
The activation function to apply to the final output.
public IActivationFunction<T>? OutputActivation { get; }
Property Value: IActivationFunction<T>?
OutputVectorActivation
The activation function to apply to the final output.
public IVectorActivationFunction<T>? OutputVectorActivation { get; }
Property Value: IVectorActivationFunction<T>?
UseAuxiliaryLoss
Gets or sets whether auxiliary loss (memory usage regularization) should be used during training.
public bool UseAuxiliaryLoss { get; set; }
Property Value: bool
Remarks
Memory usage regularization prevents memory addressing from becoming too diffuse or collapsing. This encourages the NTM to learn focused, interpretable memory access patterns.
For Beginners: This helps the NTM use its memory notebook effectively.
Memory usage regularization ensures:
- Read/write operations focus on relevant memory locations
- Memory access doesn't spread too thin
- Memory operations are interpretable and efficient
This is like encouraging a student to:
- Write clearly in specific sections of the notebook
- Not scribble all over every page
- Use the notebook in an organized, focused way
Methods
ComputeAuxiliaryLoss()
Computes the auxiliary loss for memory usage regularization.
public T ComputeAuxiliaryLoss()
Returns
- T
The computed memory usage auxiliary loss.
Remarks
This method computes entropy-based regularization for memory read/write addressing. It encourages focused, sharp memory access patterns while preventing diffuse addressing. Formula: L = Σ H(w), where H(w) = -Σᵢ wᵢ log wᵢ is the Shannon entropy of an addressing weight vector; minimizing this penalty sharpens the addressing distributions.
For Beginners: This calculates how focused the NTM's memory usage is.
Memory usage regularization works by:
- Measuring entropy of read/write addressing weights
- Lower entropy means more focused, organized memory usage
- Higher entropy means scattered, disorganized access
- We minimize this entropy penalty to encourage focused access
This helps because:
- Focused memory access is more interpretable
- Sharp addressing improves efficiency
- Prevents wasting computation on irrelevant locations
- Encourages the NTM to use memory like an organized notebook
The auxiliary loss is added to the main task loss during training.
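The computation described above can be sketched as an entropy penalty over the addressing weights (a conceptual NumPy illustration, not the library's exact implementation): focused, near-one-hot weights give a loss near zero, while uniform weights give the maximum loss.

```python
import numpy as np

def addressing_entropy(weights, eps=1e-12):
    """Shannon entropy of each addressing distribution (rows sum to 1)."""
    w = np.clip(weights, eps, 1.0)  # avoid log(0)
    return -np.sum(w * np.log(w), axis=-1)

def memory_usage_loss(read_weights, write_weights):
    """Sum of entropies over all read/write heads; lower means sharper addressing."""
    return addressing_entropy(read_weights).sum() + addressing_entropy(write_weights).sum()

focused = np.array([[1.0, 0.0, 0.0, 0.0]])      # one-hot: entropy near 0
diffuse = np.array([[0.25, 0.25, 0.25, 0.25]])  # uniform: entropy = ln(4)

print(memory_usage_loss(focused, focused))  # near 0: well-organized access
print(memory_usage_loss(diffuse, diffuse))  # 2*ln(4): scattered access
```

During training this scalar would be scaled by the auxiliary loss weight and added to the task loss, so gradient descent pushes the addressing weights toward sharper distributions.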
CreateNewInstance()
Creates a new instance of the Neural Turing Machine model.
protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
A new instance of the Neural Turing Machine model with the same configuration.
DeserializeNetworkSpecificData(BinaryReader)
Deserializes NTM-specific data from a binary reader.
protected override void DeserializeNetworkSpecificData(BinaryReader reader)
Parameters
reader (BinaryReader): The binary reader to read from.
GetAuxiliaryLossDiagnostics()
Gets diagnostic information about the memory usage auxiliary loss.
public Dictionary<string, string> GetAuxiliaryLossDiagnostics()
Returns
- Dictionary<string, string>
A dictionary containing diagnostic information about memory usage regularization.
Remarks
This method returns detailed diagnostics about memory usage regularization, including addressing entropy, memory configuration, and regularization parameters. This information is useful for monitoring memory access patterns and debugging.
For Beginners: This provides information about how the NTM uses its memory.
The diagnostics include:
- Total memory usage loss (how focused memory access is)
- Weight applied to the regularization
- Memory size (number of memory locations)
- Memory vector size (size of each location)
- Whether memory usage regularization is enabled
This helps you:
- Monitor if memory addressing is focused or scattered
- Debug issues with memory access patterns
- Understand the impact of regularization on memory efficiency
You can use this information to adjust regularization weights for better memory utilization.
GetDiagnostics()
Gets diagnostic information about this component's state and behavior. Overrides GetDiagnostics() to include auxiliary loss diagnostics.
public Dictionary<string, string> GetDiagnostics()
Returns
- Dictionary<string, string>
A dictionary containing diagnostic metrics including both base layer diagnostics and auxiliary loss diagnostics from GetAuxiliaryLossDiagnostics().
GetModelMetadata()
Gets metadata about the Neural Turing Machine model.
public override ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
A ModelMetadata<T> object containing information about the NTM.
InitializeLayers()
Initializes the neural network layers based on the provided architecture.
protected override void InitializeLayers()
Predict(Tensor<T>)
Performs a forward pass through the Neural Turing Machine.
public override Tensor<T> Predict(Tensor<T> input)
Parameters
input (Tensor<T>): The input tensor to process.
Returns
- Tensor<T>
The output tensor after processing.
ResetState()
Resets the internal state of the neural network.
public override void ResetState()
Remarks
For Beginners: This clears the memory matrix and attention weights, essentially making the network "forget" the information it accumulated while processing the current sequence (its trained parameters are unaffected). It's useful when starting to process a new sequence that should not be influenced by previous sequences.
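Conceptually, resetting state wipes the memory matrix and re-initializes the addressing weights while leaving trained parameters alone. A minimal sketch (hypothetical names, not the AiDotNet implementation):

```python
import numpy as np

class NtmState:
    """Minimal stand-in for the NTM's per-sequence state."""

    def __init__(self, memory_size=4, vector_size=3):
        self.shape = (memory_size, vector_size)
        self.reset_state()

    def reset_state(self):
        # Wipe memory and reset addressing to a uniform distribution,
        # so a new sequence starts with no trace of the previous one.
        self.memory = np.zeros(self.shape)
        self.read_weights = np.full(self.shape[0], 1.0 / self.shape[0])

s = NtmState()
s.memory[0] = [1.0, 2.0, 3.0]  # processing a sequence writes to memory
s.reset_state()                # start fresh before the next sequence
```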
SerializeNetworkSpecificData(BinaryWriter)
Serializes NTM-specific data to a binary writer.
protected override void SerializeNetworkSpecificData(BinaryWriter writer)
Parameters
writer (BinaryWriter): The binary writer to write to.
SetTrainingMode(bool)
Sets the layer to training or evaluation mode.
public override void SetTrainingMode(bool isTraining)
Parameters
isTraining (bool): True to set the layer to training mode, false for evaluation mode.
Train(Tensor<T>, Tensor<T>)
Trains the Neural Turing Machine on a single batch of input-output pairs.
public override void Train(Tensor<T> input, Tensor<T> expectedOutput)
Parameters
input (Tensor<T>): The input tensor for training.
expectedOutput (Tensor<T>): The expected output tensor.
UpdateParameters(Vector<T>)
Updates the parameters of the neural network layers.
public override void UpdateParameters(Vector<T> parameters)
Parameters
parametersVector<T>The vector of parameter updates to apply.