Class MultilayerPerceptronOptions<T, TInput, TOutput>
Configuration options for Multilayer Perceptron (MLP), a type of feedforward artificial neural network that consists of multiple layers of neurons.
public class MultilayerPerceptronOptions<T, TInput, TOutput> : NonLinearRegressionOptions
Type Parameters
- T
- TInput
- TOutput
Inheritance
NonLinearRegressionOptions → MultilayerPerceptronOptions<T, TInput, TOutput>
Remarks
The Multilayer Perceptron is a versatile neural network architecture capable of learning complex non-linear relationships between inputs and outputs. It consists of an input layer, one or more hidden layers, and an output layer. Each neuron in a layer is connected to all neurons in the next layer, forming a fully connected network. The MLP learns through a process called backpropagation, where the network parameters are adjusted to minimize a loss function using gradient-based optimization techniques. This class provides comprehensive configuration options for the network architecture, training process, activation functions, and optimization strategy.
For Beginners: A Multilayer Perceptron (MLP) is a basic type of neural network that can learn to recognize patterns and make predictions from data.
Think of an MLP like a system of interconnected filters that work together:
- The input layer receives your data (like the temperature, humidity, and pressure for weather prediction)
- The hidden layers process this information through a series of transformations
- The output layer provides the prediction (like "chance of rain: 70%")
As the network trains, it gradually adjusts thousands of internal settings (weights) to get better at making accurate predictions. This process is similar to how a child learns to recognize animals: at first they make many mistakes, but with each example, they get better at identifying the patterns that distinguish a cat from a dog.
This class lets you configure every aspect of your neural network: how many layers it has, how it learns, how quickly it adapts, and much more. The default settings provide a good starting point, but you may need to adjust them based on your specific problem.
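For example, a minimal configuration sketch might look like the following. The concrete type arguments and property values here are illustrative assumptions, not requirements of the library; substitute whatever input and output types your pipeline uses.
using System.Collections.Generic;

// Hypothetical setup: double precision, matrix inputs, vector outputs.
var options = new MultilayerPerceptronOptions<double, double[,], double[]>
{
    LayerSizes = new List<int> { 4, 32, 16, 1 }, // 4 features -> 1 prediction
    LearningRate = 0.001,                        // the conservative default
    BatchSize = 32,                              // mini-batch training
    MaxEpochs = 500,
    Verbose = true                               // print progress while experimenting
};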
Properties
BatchSize
Gets or sets the number of training examples used in each parameter update step.
public int BatchSize { get; set; }
Property Value
- int
The batch size, defaulting to 32.
Remarks
The batch size determines how many training examples are processed before the model parameters are updated. When set to 1, this becomes stochastic gradient descent (updating after each example). When set to the size of the training set, this becomes batch gradient descent (updating after seeing all examples). Mini-batch training (values between these extremes) is often the most efficient approach, balancing the stability of batch updates with the speed of stochastic updates. The optimal batch size depends on the specific problem, hardware constraints, and the size of the training dataset.
For Beginners: This setting controls how many examples the network looks at before making each adjustment to its internal settings.
Imagine learning to cook a new dish:
- BatchSize = 1: You taste and adjust seasoning after each ingredient (frequent but potentially erratic adjustments)
- BatchSize = 32: You add 32 ingredients, then taste and adjust (more stable but less frequent adjustments)
- BatchSize = [entire recipe]: You only taste and adjust after completing the whole recipe (very stable but only one chance to adjust)
The default value of 32 works well for many problems because:
- It's large enough to provide somewhat stable gradient estimates
- It's small enough to allow for frequent updates
- It often fits well in memory for parallel processing
You might want to increase this value if:
- Training seems unstable (weights jumping around too much)
- You have plenty of memory and computational resources
- Your dataset is very noisy
You might want to decrease this value if:
- You have limited memory available
- Training seems to be progressing too slowly
- You want the model to adapt more quickly
Common batch sizes are powers of 2 (16, 32, 64, 128, 256) because they often optimize performance on modern hardware.
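As a quick sanity check on what the batch size means in practice, the number of parameter updates per epoch is roughly the dataset size divided by the batch size:
// Updates per epoch = ceil(datasetSize / batchSize).
int datasetSize = 10_000;
int batchSize = 32;
int updatesPerEpoch = (datasetSize + batchSize - 1) / batchSize; // 313 updates
// batchSize = 1      -> 10,000 updates per epoch (stochastic)
// batchSize = 10_000 -> 1 update per epoch (full batch)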
HiddenActivation
Gets or sets the activation function used in the hidden layers of the network.
public IActivationFunction<T>? HiddenActivation { get; set; }
Property Value
- IActivationFunction<T>
The hidden layer activation function, defaulting to ReLU.
Remarks
Activation functions introduce non-linearity into the neural network, allowing it to learn complex patterns. This parameter sets the activation function used for all neurons in the hidden layers. The Rectified Linear Unit (ReLU) function is a popular choice for hidden layers as it helps mitigate the vanishing gradient problem and generally allows for faster training. Other common choices include sigmoid, tanh, and leaky ReLU, each with different properties that may be more suitable for specific types of problems.
For Beginners: This setting determines the mathematical function that each "neuron" in the hidden layers uses to process its input.
Activation functions are like decision rules for neurons:
- They determine how a neuron responds to different input values
- They introduce non-linearity, which allows the network to learn complex patterns
The default ReLU (Rectified Linear Unit) function:
- Outputs 0 for negative inputs
- Outputs the input value unchanged for positive inputs
- Is computationally efficient and helps networks learn faster
You might want to change this to:
- Sigmoid: If outputs need to be between 0 and 1
- Tanh: If outputs need to be between -1 and 1
- Leaky ReLU: If you're experiencing "dead neurons" (neurons that stop learning)
For most problems, ReLU works well and is a good default choice.
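For reference, the activation functions discussed above are simple scalar formulas. Here is a sketch of each, using the standard mathematical definitions rather than any particular IActivationFunction<T> implementation:
using System;

static double ReLU(double x) => Math.Max(0.0, x);              // 0 for negatives, identity for positives
static double LeakyReLU(double x) => x >= 0 ? x : 0.01 * x;    // small negative slope avoids "dead neurons"
static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x)); // squashes any input into (0, 1)
static double Tanh(double x) => Math.Tanh(x);                  // squashes any input into (-1, 1)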
HiddenVectorActivation
Gets or sets the vector-based activation function used in the hidden layers of the network.
public IVectorActivationFunction<T>? HiddenVectorActivation { get; set; }
Property Value
- IVectorActivationFunction<T>
The hidden layer vector activation function, defaulting to ReLU.
Remarks
This property provides a vector-optimized implementation of the activation function for hidden layers. When set, it will be used instead of the scalar HiddenActivation property for more efficient computation on entire vectors of data. The default implementation uses ReLU activation, which is well-suited for most neural network hidden layers.
For Beginners: This is a more efficient version of the hidden layer activation function that works on entire groups of neurons at once.
It serves the same purpose as the regular hidden activation function, but:
- It can process multiple neurons simultaneously
- It's optimized for performance on modern hardware
- It's particularly helpful for large networks
You typically don't need to change this unless you're implementing custom activation functions or optimizing for specific hardware.
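To make the scalar/vector distinction concrete, here is a sketch (not the library's actual implementation) of the same ReLU applied once per neuron versus in a single whole-array call that an implementation could optimize internally:
using System;

// Scalar path: the activation is invoked once per neuron.
double[] ApplyScalar(double[] v, Func<double, double> f)
{
    var result = new double[v.Length];
    for (int i = 0; i < v.Length; i++) result[i] = f(v[i]);
    return result;
}

// Vector path: one call transforms the whole layer, leaving room for
// SIMD or other hardware-level optimizations inside the implementation.
double[] VectorReLU(double[] v)
{
    var result = new double[v.Length];
    for (int i = 0; i < v.Length; i++) result[i] = Math.Max(0.0, v[i]);
    return result;
}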
LayerSizes
Gets or sets the sizes of each layer in the neural network, including input, hidden, and output layers.
public List<int> LayerSizes { get; set; }
Property Value
- List<int>
A list of integers representing the number of neurons in each layer, defaulting to [1, 10, 1].
Remarks
This parameter defines the architecture of the neural network by specifying how many neurons are in each layer. The first element represents the input layer size (number of features), the last element represents the output layer size (number of target variables), and all elements in between represent the sizes of hidden layers. The default value creates a network with 1 input feature, 1 hidden layer with 10 neurons, and 1 output variable. The depth and width of the network should be chosen based on the complexity of the problem and the amount of available training data.
For Beginners: This setting determines the structure of your neural network - how many "neurons" are in each layer and how many layers you have.
Imagine building a factory assembly line:
- The first number is how many inputs your data has (like 4 if you have height, weight, age, and blood pressure)
- The middle numbers represent your "hidden layers" (the internal processing stages)
- The last number is how many outputs you want (like 1 for a yes/no prediction, or 3 for classifying into three categories)
The default value [1, 10, 1] means:
- 1 input feature (very simple data)
- 1 hidden layer with 10 neurons (moderate processing capacity)
- 1 output value (single prediction or measurement)
You should change this based on your specific data:
- The first number should match the number of features in your input data
- The last number should match how many values you're trying to predict
- The middle numbers control the network's learning capacity:
- More/larger hidden layers = more learning capacity but requires more data and time
- Fewer/smaller hidden layers = learns faster but might miss complex patterns
For complex problems, you might use something like [50, 100, 50, 10, 3], which has 50 inputs, 3 hidden layers (with 100, 50, and 10 neurons), and 3 outputs.
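For instance, continuing the hypothetical options object from earlier, a dataset with 4 features and a single target could use two hidden layers (the hidden sizes here are chosen purely for illustration):
// 4 inputs -> hidden layers of 64 and 32 neurons -> 1 output.
options.LayerSizes = new List<int> { 4, 64, 32, 1 };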
LearningRate
Gets or sets the learning rate that controls the step size in each update of the model parameters.
public double LearningRate { get; set; }
Property Value
- double
The learning rate, defaulting to 0.001.
Remarks
The learning rate is a critical hyperparameter that determines how large of a step to take in the direction of the negative gradient during optimization. A higher learning rate allows for faster learning but risks overshooting the optimal solution or causing instability. A lower learning rate provides more stable updates but may require more iterations to converge and risks getting stuck in local minima. Note that the actual learning rate used in training may be further modified by the chosen optimizer, which may implement adaptive learning rate strategies.
For Beginners: This setting controls how big of an adjustment the network makes to its internal settings during each update.
Think of it like turning a dial to tune a radio:
- A high learning rate (like 0.1) means making big turns of the dial
- A low learning rate (like 0.0001) means making tiny, precise turns
The default value of 0.001 is relatively conservative, which helps prevent:
- Overshooting the optimal settings
- Unstable behavior during training
You might want to increase this value if:
- Training is progressing very slowly
- You have a tight compute budget and need faster results
- You're in early exploration phases
You might want to decrease this value if:
- Training is unstable (loss is fluctuating wildly)
- You're fine-tuning an already well-trained model
- You want more precise final results
Note that this setting interacts with your choice of optimizer. Some optimizers (like Adam) adaptively adjust the effective learning rate, making the training less sensitive to this initial value.
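The learning rate's role is easiest to see in the plain gradient-descent update rule, sketched below. Adaptive optimizers such as Adam modify this rule, but the step-size intuition is the same:
// w <- w - learningRate * gradient, applied to every parameter.
void GradientStep(double[] weights, double[] gradients, double learningRate)
{
    for (int i = 0; i < weights.Length; i++)
        weights[i] -= learningRate * gradients[i];
}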
LossFunction
Gets or sets the loss function used to calculate the error between predictions and targets.
public ILossFunction<T>? LossFunction { get; set; }
Property Value
- ILossFunction<T>
The loss function, defaulting to Mean Squared Error.
Remarks
The loss function quantifies how far the network's predictions are from the true values, providing the optimization target during training. Mean Squared Error (MSE) is commonly used for regression problems, calculating the average of the squared differences between predictions and targets. For classification problems, cross-entropy loss would be more appropriate. The choice of loss function should align with the problem type and the output activation function.
For Beginners: This setting defines how the network measures its prediction errors during training.
Think of the loss function as a scorekeeper:
- It calculates how far off the network's predictions are from the correct answers
- The network tries to minimize this score during training
- Different types of problems need different ways of keeping score
The default Mean Squared Error (MSE):
- Calculates the average of the squared differences between predictions and actual values
- Works well for regression problems (predicting continuous values)
- Heavily penalizes large errors
You might want to change this to:
- Mean Absolute Error: If you want to treat all errors equally, regardless of direction
- Binary Cross-Entropy: For binary classification problems
- Categorical Cross-Entropy: For multi-class classification problems
The loss function should match your problem type and output activation function. For example:
- Regression → MSE + Linear output activation
- Binary classification → Binary Cross-Entropy + Sigmoid output activation
- Multi-class classification → Categorical Cross-Entropy + Softmax output activation
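Written out, the default Mean Squared Error is just the average of the squared differences, as in this sketch:
// MSE = (1/n) * sum((prediction - target)^2); squaring penalizes large errors.
double MeanSquaredError(double[] predictions, double[] targets)
{
    double sum = 0.0;
    for (int i = 0; i < predictions.Length; i++)
    {
        double diff = predictions[i] - targets[i];
        sum += diff * diff;
    }
    return sum / predictions.Length;
}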
MaxEpochs
Gets or sets the maximum number of complete passes through the training dataset.
public int MaxEpochs { get; set; }
Property Value
- int
The maximum number of epochs, defaulting to 1000.
Remarks
An epoch represents one complete pass through the entire training dataset. This parameter sets the maximum number of epochs the training process will perform. The actual training might terminate earlier if other stopping criteria are met, such as reaching a target error threshold or detecting overfitting through validation. More epochs allow the model more opportunities to learn from the training data but increase the risk of overfitting and computational cost.
For Beginners: This setting determines how many times the neural network will process your entire dataset during training.
Think of it like practicing for a music recital:
- Each "epoch" is like practicing the entire piece from start to finish
- More practice sessions generally lead to better performance
- But too much practice might lead to memorization rather than understanding
The default value of 1000 means the algorithm will go through your entire dataset up to 1000 times.
You might want to increase this value if:
- Your network is complex and learning slowly
- You have a large dataset with lots of variation
- You're using techniques to prevent overfitting
You might want to decrease this value if:
- Your network seems to be memorizing the training data
- Training is taking too long
- You're doing initial experimentation
In practice, neural networks are often trained with early stopping mechanisms that monitor performance on validation data and stop training when improvement plateaus, regardless of whether this maximum has been reached.
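A sketch of such an early-stopping loop appears below; TrainOneEpoch and ValidationLoss are hypothetical placeholders for whatever your training pipeline exposes:
double bestLoss = double.MaxValue;
int epochsWithoutImprovement = 0;
const int patience = 20; // epochs to wait for improvement before giving up

for (int epoch = 0; epoch < options.MaxEpochs; epoch++)
{
    TrainOneEpoch();                // placeholder: one pass over the data
    double loss = ValidationLoss(); // placeholder: error on held-out data
    if (loss < bestLoss)
    {
        bestLoss = loss;
        epochsWithoutImprovement = 0;
    }
    else if (++epochsWithoutImprovement >= patience)
    {
        break; // stopped well before MaxEpochs
    }
}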
Optimizer
Gets or sets the optimization algorithm used to update the model parameters during training.
public IOptimizer<T, TInput, TOutput> Optimizer { get; set; }
Property Value
- IOptimizer<T, TInput, TOutput>
OutputActivation
Gets or sets the activation function used in the output layer of the network.
public IActivationFunction<T>? OutputActivation { get; set; }
Property Value
- IActivationFunction<T>
The output layer activation function, defaulting to Linear (Identity).
Remarks
The output activation function determines the range and type of values that the neural network can produce. The linear activation function (also called identity) is appropriate for regression problems where the output can be any real number. For classification problems, other functions like sigmoid (for binary classification) or softmax (for multi-class classification) would be more appropriate. The choice of output activation should match the nature of the target variable and the loss function.
For Beginners: This setting determines the mathematical function that the final layer uses to produce the network's output.
The output activation function shapes your predictions:
- Linear (the default): Can output any number, positive or negative
- Sigmoid: Outputs values between 0 and 1, good for probabilities
- Softmax: Outputs probabilities that sum to 1, good for multi-class problems
The default Linear function is appropriate for:
- Regression problems (predicting continuous values like price, temperature, etc.)
- Cases where you need unbounded outputs
You should change this to:
- Sigmoid: For binary classification (yes/no, spam/not spam)
- Softmax: For multi-class classification (cat/dog/bird)
- Tanh: For outputs that should be between -1 and 1
Choosing the right output activation is important - it should match both the type of problem you're solving and the loss function you're using.
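For reference, here are the standard formulas behind the classification-oriented choices, written out independently of any particular implementation in this library:
using System;
using System.Linq;

// Sigmoid squashes a single score into a (0, 1) probability.
double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

// Softmax turns a vector of scores into probabilities that sum to 1.
double[] Softmax(double[] logits)
{
    double max = logits.Max(); // subtract the max for numerical stability
    double[] exps = logits.Select(z => Math.Exp(z - max)).ToArray();
    double sum = exps.Sum();
    return exps.Select(e => e / sum).ToArray();
}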
OutputVectorActivation
Gets or sets the vector-based activation function used in the output layer of the network.
public IVectorActivationFunction<T>? OutputVectorActivation { get; set; }
Property Value
- IVectorActivationFunction<T>
The output layer vector activation function, defaulting to Linear (Identity).
Remarks
This property provides a vector-optimized implementation of the activation function for the output layer. When set, it will be used instead of the scalar OutputActivation property for more efficient computation on entire vectors of data. The default implementation uses the identity (linear) activation, which is appropriate for regression problems.
For Beginners: This is a more efficient version of the output layer activation function that works on entire groups of neurons at once.
It serves the same purpose as the regular output activation function, but:
- It can process multiple output neurons simultaneously
- It's optimized for performance on modern hardware
- It's particularly helpful for networks with multiple outputs
For regression problems, the default linear activation is usually appropriate. For classification, you might want to use sigmoid or softmax vector activations.
Verbose
Gets or sets whether to display detailed progress information during training.
public bool Verbose { get; set; }
Property Value
- bool
Flag indicating whether to display progress, defaulting to false.
Remarks
When set to true, the training process will output detailed information about its progress, such as the current epoch, loss value, and potentially other metrics. This can be useful for monitoring the training process and diagnosing issues, but may slow down training slightly and generate a large amount of output for long training runs or large datasets. By default, this verbose output is disabled.
For Beginners: This setting determines whether the training process will show you detailed progress updates as it runs.
Think of it like tracking a package:
- When Verbose = false: You only know when the package is delivered
- When Verbose = true: You get updates at each step of the delivery process
The default value of false means training will run silently without progress updates.
You might want to set this to true if:
- You want to monitor how quickly the model is learning
- You're debugging training issues
- You want to know when to stop training early
You might want to keep it false if:
- You're running many experiments and don't need the extra output
- You're running training in a production environment
- You're using other methods to monitor progress (like logging metrics to a file)
Enabling verbose output is especially helpful when you're new to neural networks or when you're trying to debug an underperforming model.
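Putting the pieces together, a hypothetical binary-classification configuration might look like this. The type arguments, values, and the commented-out activation and loss class names are illustrative assumptions, not names prescribed by the library:
using System.Collections.Generic;

var options = new MultilayerPerceptronOptions<double, double[,], double[]>
{
    LayerSizes = new List<int> { 20, 64, 1 }, // 20 features -> 1 probability
    BatchSize = 64,
    LearningRate = 0.0005,
    MaxEpochs = 2000,
    Verbose = true, // watch the loss while tuning
    // OutputActivation = new SigmoidActivation<double>(),  // hypothetical type name
    // LossFunction = new BinaryCrossEntropyLoss<double>()  // hypothetical type name
};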