Class RestrictedBoltzmannMachine<T>
- Namespace
- AiDotNet.NeuralNetworks
- Assembly
- AiDotNet.dll
Represents a Restricted Boltzmann Machine, which is a type of neural network that learns probability distributions over its inputs.
public class RestrictedBoltzmannMachine<T> : NeuralNetworkBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
- object → NeuralNetworkBase<T> → RestrictedBoltzmannMachine<T>
Remarks
A Restricted Boltzmann Machine (RBM) is a two-layer neural network that learns to reconstruct its input data. Unlike feedforward networks, RBMs are generative models that learn the probability distribution of the training data. They consist of a visible layer (representing the input data) and a hidden layer (representing features), with connections between layers but no connections within a layer (hence "restricted"). RBMs are trained using an algorithm called Contrastive Divergence, which involves both forward and backward passes between layers.
For Beginners: A Restricted Boltzmann Machine is like a two-way translator between data and features.
Think of it like this:
- The visible layer is like words in English
- The hidden layer is like words in French
- The network learns how to translate back and forth between the languages
When you train an RBM:
- It learns to recognize patterns in your data (translate English to French)
- It also learns to recreate the original data from those patterns (translate French back to English)
For example, if you train an RBM on images of faces:
- The visible layer represents the pixel values of the images
- The hidden layer might learn to recognize features like "has a mustache" or "is smiling"
- Once trained, you could activate certain hidden units to generate new face images with specific features
RBMs can be used for dimensionality reduction, feature learning, pattern completion, and even generating new data samples similar to the training data.
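A rough end-to-end sketch of that workflow is shown below (hedged: the NeuralNetworkArchitecture<double> configuration and the imageBatch tensor are placeholders that depend on your data):

// Build an RBM with 784 visible units (e.g., 28×28 images) and 100 hidden features.
var architecture = new NeuralNetworkArchitecture<double>(/* configured for your data */);
var rbm = new RestrictedBoltzmannMachine<double>(
    architecture, visibleSize: 784, hiddenSize: 100,
    learningRate: 0.01, cdSteps: 1);

// Unsupervised training: the second argument is ignored for RBMs.
for (int epoch = 0; epoch < 10; epoch++)
    rbm.Train(imageBatch, imageBatch);

// Compress inputs into 100 learned features, or sample new data.
Tensor<double> features = rbm.ExtractFeatures(imageBatch);
Tensor<double> samples = rbm.GenerateSamples(numSamples: 5);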
Constructors
RestrictedBoltzmannMachine(NeuralNetworkArchitecture<T>, int, int, double, int, IActivationFunction<T>?, ILossFunction<T>?)
Initializes a new instance of the RestrictedBoltzmannMachine<T> class with the specified architecture, sizes, and scalar activation function.
public RestrictedBoltzmannMachine(NeuralNetworkArchitecture<T> architecture, int visibleSize, int hiddenSize, double learningRate = 0.01, int cdSteps = 1, IActivationFunction<T>? scalarActivation = null, ILossFunction<T>? lossFunction = null)
Parameters
architecture (NeuralNetworkArchitecture<T>): The neural network architecture to use for the RBM.
visibleSize (int): The number of neurons in the visible layer.
hiddenSize (int): The number of neurons in the hidden layer.
learningRate (double): The learning rate for weight updates. Defaults to 0.01.
cdSteps (int): The number of Contrastive Divergence steps per update. Defaults to 1.
scalarActivation (IActivationFunction<T>?): The scalar activation function to use. If null, a default activation is used.
lossFunction (ILossFunction<T>?): The loss function to use. If null, a default loss function is used.
Remarks
This constructor creates a new Restricted Boltzmann Machine with the specified visible and hidden layer sizes, using the provided scalar activation function. It initializes weights to small random values and biases to zero, which is a common starting point for training RBMs.
For Beginners: This sets up the RBM with specific dimensions and an activation function that works on one neuron at a time.
When creating a new RBM this way:
- You specify how many visible neurons (input values) you have
- You specify how many hidden neurons (feature detectors) you want
- You can optionally provide a specific activation function
The constructor sets up:
- A weights matrix connecting all visible neurons to all hidden neurons
- Bias values for all neurons (initially set to zero)
- The specified scalar activation function
This prepares the RBM for training, but it won't actually learn anything until you train it with data.
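For instance (a sketch; SigmoidActivation<double> is a hypothetical IActivationFunction<T> implementation, standing in for whatever your project provides):

var rbm = new RestrictedBoltzmannMachine<double>(
    architecture,        // an existing NeuralNetworkArchitecture<double>
    visibleSize: 784,
    hiddenSize: 100,
    learningRate: 0.01,
    cdSteps: 1,
    scalarActivation: new SigmoidActivation<double>());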
RestrictedBoltzmannMachine(NeuralNetworkArchitecture<T>, int, int, double, int, IVectorActivationFunction<T>?, ILossFunction<T>?)
Initializes a new instance of the RestrictedBoltzmannMachine<T> class with the specified architecture, sizes, and vector activation function.
public RestrictedBoltzmannMachine(NeuralNetworkArchitecture<T> architecture, int visibleSize, int hiddenSize, double learningRate = 0.01, int cdSteps = 1, IVectorActivationFunction<T>? vectorActivation = null, ILossFunction<T>? lossFunction = null)
Parameters
architecture (NeuralNetworkArchitecture<T>): The neural network architecture to use for the RBM.
visibleSize (int): The number of neurons in the visible layer.
hiddenSize (int): The number of neurons in the hidden layer.
learningRate (double): The learning rate for weight updates. Defaults to 0.01.
cdSteps (int): The number of Contrastive Divergence steps per update. Defaults to 1.
vectorActivation (IVectorActivationFunction<T>?): The vector activation function to use. If null, a default activation is used.
lossFunction (ILossFunction<T>?): The loss function to use. If null, a default loss function is used.
Remarks
This constructor creates a new Restricted Boltzmann Machine with the specified visible and hidden layer sizes, using the provided vector activation function. It initializes weights to small random values and biases to zero, which is a common starting point for training RBMs. The vector activation function operates on entire layers at once, which may be more efficient for certain implementations.
For Beginners: This sets up the RBM with specific dimensions and an activation function that works on many neurons at once.
When creating a new RBM this way:
- You specify how many visible neurons (input values) you have
- You specify how many hidden neurons (feature detectors) you want
- You can optionally provide a specific vector activation function
The constructor sets up:
- A weights matrix connecting all visible neurons to all hidden neurons
- Bias values for all neurons (initially set to zero)
- The specified vector activation function
The main difference from the previous constructor is that this one uses an activation function that can process all neurons in a layer simultaneously, which can be more efficient.
Properties
HiddenSize
Gets the number of neurons in the hidden layer.
public int HiddenSize { get; }
Property Value
- int
Remarks
The hidden size determines the capacity of the RBM to learn patterns and features from the input data. A larger hidden size allows the RBM to learn more complex representations but may require more data and time to train effectively.
For Beginners: This is how many pattern detectors or features the RBM can learn.
Choosing the right hidden size is important:
- Too small: The RBM won't be able to capture all important patterns in your data
- Too large: The RBM might "memorize" the training data instead of learning general patterns
For example, if analyzing face images:
- HiddenSize = 10 might only let the RBM learn very basic features
- HiddenSize = 100 might allow it to learn more subtle patterns like facial expressions
Think of it as the number of "concepts" the network can understand about your data.
ParameterCount
Gets the total number of parameters (weights and biases) in the RBM.
public override int ParameterCount { get; }
Property Value
- int
Remarks
The parameter count includes:
- Weights matrix: HiddenSize × VisibleSize parameters
- Visible biases: VisibleSize parameters
- Hidden biases: HiddenSize parameters
For Beginners: This tells you the total number of learnable values in the RBM. More parameters means the RBM can learn more complex patterns, but also requires more data and computation.
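For example, with 784 visible units (28×28 images) and 100 hidden units, the formula above gives the following count (sizes chosen only for illustration):

// Weights + visible biases + hidden biases, per the formula above.
int visibleSize = 784;
int hiddenSize = 100;
int parameterCount = hiddenSize * visibleSize + visibleSize + hiddenSize; // 79,284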
VisibleSize
Gets the number of neurons in the visible layer.
public int VisibleSize { get; }
Property Value
- int
Remarks
The visible size determines the dimensionality of the input data that the RBM can process. It should match the number of features in the input data (e.g., the number of pixels in an image).
For Beginners: This is how many input values the RBM can accept.
For example:
- If processing 28×28 pixel images, VisibleSize would be 784 (28×28)
- If processing customer data with 15 attributes, VisibleSize would be 15
Think of it as the number of "sensors" the network has to observe the input data.
Methods
ComputeReconstructionError(Tensor<T>)
Computes the reconstruction error of the RBM for the given input data, measuring how closely the RBM's reconstruction matches the original input.
public T ComputeReconstructionError(Tensor<T> input)
Parameters
input (Tensor<T>): The input data tensor.
Returns
- T
The reconstruction error for the input.
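A common use is monitoring training progress (a sketch, assuming rbm is a RestrictedBoltzmannMachine<double> and batch is a training tensor):

// Lower reconstruction error generally means the RBM models the data better.
double errorBefore = rbm.ComputeReconstructionError(batch);
rbm.Train(batch, batch); // the second argument is ignored for RBMs
double errorAfter = rbm.ComputeReconstructionError(batch);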
CreateNewInstance()
Creates a new instance of the same type as this neural network.
protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
A new instance of the same neural network type.
Remarks
For Beginners: This creates a blank version of the same type of neural network.
It's used internally by methods like DeepCopy and Clone to create the right type of network before copying the data into it.
DeserializeNetworkSpecificData(BinaryReader)
Deserializes RBM-specific data from a binary reader.
protected override void DeserializeNetworkSpecificData(BinaryReader reader)
Parameters
reader (BinaryReader): The binary reader to read from.
Remarks
This method loads RBM-specific data from the binary stream, including the weights, biases, and configuration parameters like learning rate and CD steps. It restores the RBM to the exact state it was in when serialized.
For Beginners: This method loads all the RBM's saved knowledge from a file.
The deserialization process loads:
- All weights between visible and hidden neurons
- All bias values for both layers
- Configuration settings like learning rate
This allows you to restore a previously trained RBM exactly as it was, without needing to retrain it from scratch.
ExtractFeatures(Tensor<T>, bool)
Extracts features from input data using the trained RBM.
public Tensor<T> ExtractFeatures(Tensor<T> input, bool binarize = false)
Parameters
input (Tensor<T>): The input data tensor.
binarize (bool): Whether to binarize the hidden activations. Defaults to false.
Returns
- Tensor<T>
The hidden layer features as a tensor.
Remarks
This method transforms input data into features learned by the RBM's hidden layer. It can be used for feature extraction, dimensionality reduction, or as a pre-processing step before using the data with another algorithm.
For Beginners: This method converts raw data into abstract features.
When extracting features:
- The input data is passed to the visible layer
- The hidden layer activations represent learned features
- These features can capture important patterns in the data
You can choose to get:
- Probability values (binarize=false) showing how strongly each feature is detected
- Binary values (binarize=true) indicating whether each feature is present or not
This is useful for:
- Reducing data dimensionality (e.g., compressing 784 pixels to 100 features)
- Extracting meaningful patterns for other algorithms to use
- Pre-processing data for classification or other tasks
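For example (a sketch, assuming a trained RestrictedBoltzmannMachine<double> named rbm):

// Soft feature strengths in [0, 1]:
Tensor<double> strengths = rbm.ExtractFeatures(input, binarize: false);
// Hard on/off feature indicators:
Tensor<double> indicators = rbm.ExtractFeatures(input, binarize: true);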
GenerateSamples(int, int)
Generates samples from the RBM by starting with a random visible state and performing Gibbs sampling.
public Tensor<T> GenerateSamples(int numSamples, int numSteps = 1000)
Parameters
numSamples (int): The number of samples to generate.
numSteps (int): The number of Gibbs sampling steps to perform. Defaults to 1000.
Returns
- Tensor<T>
Tensor containing the generated samples.
Remarks
This method generates new data samples that follow the distribution learned by the RBM. It starts with random visible units, then repeatedly samples the hidden and visible layers in a process called Gibbs sampling to get samples from the model's learned distribution.
For Beginners: This method creates new data samples based on patterns the RBM has learned.
The generation process works like this:
1. Start with random values for the visible layer
2. Compute hidden layer activations based on these visible values
3. Reconstruct a new visible layer from the hidden activations
4. Repeat steps 2-3 multiple times (Gibbs sampling)
5. Return the final visible layer as a generated sample
This allows the RBM to "dream up" new data that resembles the training data. For example, if trained on face images, it might generate new faces that don't exist but look realistic.
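For example (a sketch, assuming a trained RestrictedBoltzmannMachine<double> named rbm):

// Generate 10 samples, running 1,000 Gibbs steps so the chain has
// time to mix toward the learned distribution.
Tensor<double> samples = rbm.GenerateSamples(numSamples: 10, numSteps: 1000);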
GetHiddenLayerActivation(Tensor<T>)
Calculates the activation probabilities of the hidden layer given the visible layer.
public Tensor<T> GetHiddenLayerActivation(Tensor<T> visibleLayer)
Parameters
visibleLayer (Tensor<T>): The visible layer tensor.
Returns
- Tensor<T>
A tensor containing the activation probabilities of the hidden layer.
Remarks
This method computes the activation probabilities of each hidden unit given the state of the visible layer. It calculates the weighted sum of visible unit values for each hidden unit, adds the hidden bias, and applies the activation function to obtain the probability of activation.
For Beginners: This method finds which patterns or features are present in the input data.
When calculating hidden layer activations:
- Each hidden neuron receives input from all visible neurons
- The inputs are weighted by the connection strengths
- The hidden neuron's bias is added
- An activation function converts this sum to a probability
This is like asking each feature detector: "Based on what you see in the input data, how confident are you that your specific pattern is present?"
The result is a set of probabilities for each hidden neuron, indicating how strongly each feature is detected in the current input.
Exceptions
- InvalidOperationException
Thrown when no activation function is specified.
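Conceptually, each hidden probability is the activation function applied to that unit's bias plus a weighted sum of the visible values. A minimal sketch on plain arrays (illustrative only; the real method operates on Tensor<T> and the RBM's internal weights, and sigmoid stands in for whatever activation is configured):

// p(h[j] = 1 | v) = activate(hiddenBias[j] + sum_i weights[j, i] * v[i])
static double[] HiddenProbabilities(double[] visible, double[,] weights, double[] hiddenBias)
{
    var probs = new double[hiddenBias.Length];
    for (int j = 0; j < probs.Length; j++)
    {
        double sum = hiddenBias[j];
        for (int i = 0; i < visible.Length; i++)
            sum += weights[j, i] * visible[i];
        probs[j] = 1.0 / (1.0 + System.Math.Exp(-sum)); // sigmoid (assumed)
    }
    return probs;
}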
GetModelMetadata()
Gets metadata about the RBM model.
public override ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
A ModelMetadata<T> object containing information about the RBM.
Remarks
This method returns comprehensive metadata about the RBM, including its architecture, layer sizes, and other relevant parameters. This information is useful for model management, tracking experiments, and reporting.
For Beginners: This provides detailed information about your RBM.
The metadata includes:
- The sizes of visible and hidden layers
- Information about the activation functions used
- The total number of parameters (weights and biases)
- Other configuration details
This information is useful for documentation, comparing different RBM configurations, and understanding the structure of your model at a glance.
GetParameters()
Gets all parameters of the RBM as a single vector. This method is not typically used in RBMs and throws a NotImplementedException.
public override Vector<T> GetParameters()
Returns
- Vector<T>
Remarks
RBMs typically use specialized training algorithms like Contrastive Divergence rather than the generic parameter update approach used by other neural networks. This method throws a NotImplementedException to indicate that RBMs should be trained using the Train method instead.
For Beginners: This method is not used in RBMs because they train differently.
While standard neural networks update their parameters based on error gradients:
- RBMs use a different approach called Contrastive Divergence
- They compare "reality" (input data) with "imagination" (reconstructions)
- They directly adjust weights based on this comparison
Instead of using this method, you should use the Train method to train an RBM.
Exceptions
- NotImplementedException
Always thrown as this method is not implemented for RBMs.
GetVisibleLayerActivation(Tensor<T>)
Calculates the activation probabilities of the visible layer given the hidden layer.
public Tensor<T> GetVisibleLayerActivation(Tensor<T> hiddenLayer)
Parameters
hiddenLayer (Tensor<T>): The hidden layer tensor.
Returns
- Tensor<T>
A tensor containing the activation probabilities of the visible layer.
Remarks
This method computes the activation probabilities of each visible unit given the state of the hidden layer. It calculates the weighted sum of hidden unit values for each visible unit, adds the visible bias, and applies the activation function to obtain the probability of activation.
For Beginners: This method reconstructs the input data based on detected patterns.
When calculating visible layer activations:
- Each visible neuron receives input from all hidden neurons
- The inputs are weighted by the connection strengths
- The visible neuron's bias is added
- An activation function converts this sum to a probability
This is like asking each input neuron: "Based on the patterns the network detected, what's the probability that you should be active?"
The result is a reconstruction of the input data based on the patterns detected, which might not be identical to the original input.
Exceptions
- InvalidOperationException
Thrown when no activation function is specified.
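Together with GetHiddenLayerActivation, this enables a one-step reconstruction round trip (a sketch, assuming a trained rbm and an input tensor):

// Encode the input into hidden probabilities, then decode back.
Tensor<double> hidden = rbm.GetHiddenLayerActivation(input);
Tensor<double> reconstruction = rbm.GetVisibleLayerActivation(hidden);
// reconstruction approximates input; the gap between them is the
// reconstruction error that training drives down.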
InitializeLayers()
Initializes the neural network layers. In an RBM, this method is typically empty as RBMs use direct weight and bias parameters rather than standard neural network layers.
protected override void InitializeLayers()
Remarks
RBMs differ from feedforward neural networks in that they don't use a layer-based computation model. Instead, they directly manipulate weights and biases for the visible and hidden units. Therefore, this method is typically empty or performs specialized initialization for RBMs.
For Beginners: RBMs work differently from standard neural networks.
While standard neural networks process data through sequential layers:
- RBMs work by going back and forth between just two layers
- They don't use the same layer concept as feedforward networks
- They operate directly on the weights and biases connecting the visible and hidden layers
That's why this method is empty - the RBM initializes its weights and biases directly rather than creating a sequence of layers like a standard neural network.
Predict(Tensor<T>)
Makes predictions using the RBM by computing hidden layer activations.
public override Tensor<T> Predict(Tensor<T> input)
Parameters
input (Tensor<T>): The input tensor to process.
Returns
- Tensor<T>
The hidden layer activations as a tensor.
Remarks
This method performs a forward pass through the RBM, mapping the input data to its corresponding hidden representation. For RBMs, "prediction" typically means extracting features or transforming the input data to a different representation.
For Beginners: This method extracts patterns or features from the input data.
Unlike standard neural networks that might predict a class or value:
- RBMs transform input data into a representation of detected patterns
- The output tells you which features or patterns were found in the input
- This can be used for feature extraction or dimensionality reduction
For example, if your RBM has learned to recognize features in face images, this method would tell you which of those features (like "has glasses" or "is smiling") are present in a new face image you provide.
SerializeNetworkSpecificData(BinaryWriter)
Serializes the RBM-specific data to a binary writer.
protected override void SerializeNetworkSpecificData(BinaryWriter writer)
Parameters
writer (BinaryWriter): The binary writer to write to.
Remarks
This method saves RBM-specific data to the binary stream, including the weights, biases, and configuration parameters like learning rate and CD steps.
For Beginners: This method saves all the RBM's learned knowledge to a file.
The serialization process saves:
- All weights between visible and hidden neurons
- All bias values for both layers
- Configuration settings like learning rate
This allows you to save a trained RBM and reload it later without having to retrain it from scratch, which can be time-consuming.
SetParameters(Vector<T>)
Sets the parameters of the neural network.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): The parameters to set.
Remarks
This method distributes the parameters to all layers in the network. The parameters should be in the same format as returned by GetParameters.
SetTrainingParameters(T, int)
Sets the training parameters for the RBM.
public void SetTrainingParameters(T learningRate, int cdSteps = 1)
Parameters
learningRate (T): The learning rate for weight updates.
cdSteps (int): The number of Contrastive Divergence steps. Defaults to 1.
Remarks
This method configures the learning rate and the number of Contrastive Divergence steps used during training. The learning rate controls how quickly the RBM updates its weights, while the CD steps control how many Gibbs sampling steps are performed in each update.
For Beginners: This method lets you adjust how the RBM learns.
You can configure:
- Learning rate: How big each learning step is (typical values: 0.001 to 0.1)
- CD steps: How many back-and-forth cycles to run during training (often 1, sometimes more)
These parameters affect learning quality and speed:
- Higher learning rates learn faster but may be less stable
- More CD steps give more accurate updates but take longer
Finding the right balance for your specific data is important for effective training.
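For example (assuming a RestrictedBoltzmannMachine<double> named rbm):

// Use a smaller learning rate and 5 CD steps for more accurate
// (but slower) updates.
rbm.SetTrainingParameters(learningRate: 0.005, cdSteps: 5);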
Train(Tensor<T>, Tensor<T>)
Trains the RBM using Contrastive Divergence.
public override void Train(Tensor<T> input, Tensor<T> expectedOutput)
Parameters
input (Tensor<T>): The input data tensor.
expectedOutput (Tensor<T>): Not used for RBMs, as they are unsupervised models.
Remarks
This method implements Contrastive Divergence (CD) training for the RBM. It compares the correlation between visible and hidden units when driven by the data to the correlation when driven by the model's own reconstructions, and updates the weights and biases accordingly.
For Beginners: This method teaches the RBM to recognize patterns in your data.
The training process works like this:
1. Start with real data (the visible layer)
2. Compute which patterns (hidden layer) are activated by this data
3. Reconstruct an approximation of the data from these patterns
4. See what patterns this reconstruction would activate
5. Update the weights based on the difference between steps 2 and 4
The goal is for the RBM to generate reconstructions that are statistically similar to the real data, which means it has learned the underlying patterns.
Note that unlike supervised learning, RBMs don't use expected outputs - they learn the structure of the input data on their own.
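The weight update behind steps 2-5 is the Contrastive Divergence rule. A minimal sketch of one CD-1 update on plain arrays (illustrative only; the real method operates on Tensor<T> and the RBM's internal state):

// deltaW[j, i] = lr * (hData[j] * vData[i] - hRecon[j] * vRecon[i])
static void ContrastiveDivergenceStep(
    double[] vData, double[] hData,   // activations driven by the data
    double[] vRecon, double[] hRecon, // activations driven by the reconstruction
    double[,] weights, double lr)
{
    for (int j = 0; j < hData.Length; j++)
        for (int i = 0; i < vData.Length; i++)
            weights[j, i] += lr * (hData[j] * vData[i] - hRecon[j] * vRecon[i]);
}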
UpdateParameters(Vector<T>)
Updates the network's parameters with new values.
public override void UpdateParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): The new parameter values to set.
Remarks
For Beginners: During training, a neural network's internal values (parameters) get adjusted to improve its performance. This method allows you to update all those values at once by providing a complete set of new parameters.
This is typically used by optimization algorithms that calculate better parameter values based on training data.