Class NeuralNetworkArchitecture<T>

Namespace
AiDotNet.NeuralNetworks
Assembly
AiDotNet.dll

Defines the structure and configuration of a neural network, including its layers, input/output dimensions, and task-specific properties.

public class NeuralNetworkArchitecture<T>

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
object
NeuralNetworkArchitecture<T>

Remarks

The NeuralNetworkArchitecture class serves as a blueprint for constructing neural networks with specific configurations. It handles the validation of input dimensions, layer compatibility, and provides methods for retrieving information about the network's structure. This architecture can be used to create various types of neural networks with different input dimensionalities and layer arrangements.

For Beginners: Think of NeuralNetworkArchitecture as the blueprint for building a neural network.

Just like an architect's blueprint for a building specifies:

  • How many floors the building will have
  • The size and purpose of each room
  • How rooms connect to each other

The NeuralNetworkArchitecture defines:

  • What kind of data your network will process (like images or text)
  • How many layers your network will have
  • How many neurons are in each layer
  • How the layers connect to process your data

Before you can build a neural network, you need this blueprint to ensure all the parts will fit together correctly. It helps prevent errors like trying to feed image data into a network designed for text, or having layers that don't match up in size.
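
Examples

A minimal sketch of defining a blueprint for a 28×28 grayscale digit classifier. The constructor and property names below are taken from this page; the NeuralNetworkTaskType.Classification member name is an assumption.

using System;
using AiDotNet.NeuralNetworks;

// Blueprint for a network that sorts 28×28 grayscale images into 10 digit classes.
var architecture = new NeuralNetworkArchitecture<double>(
    inputType: InputType.TwoDimensional,             // grid data, like a grayscale image
    taskType: NeuralNetworkTaskType.Classification,  // assumed member name
    complexity: NetworkComplexity.Medium,
    inputHeight: 28,
    inputWidth: 28,
    outputSize: 10);

Console.WriteLine(architecture.CalculatedInputSize); // 784 (28 × 28)
Console.WriteLine(architecture.InputDimension);      // 2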

Constructors

NeuralNetworkArchitecture(InputType, NeuralNetworkTaskType, NetworkComplexity, int, int, int, int, int, List<ILayer<T>>?, bool, int, int)

Initializes a new instance of the NeuralNetworkArchitecture<T> class with the specified parameters.

[JsonConstructor]
public NeuralNetworkArchitecture(InputType inputType, NeuralNetworkTaskType taskType, NetworkComplexity complexity = NetworkComplexity.Medium, int inputSize = 0, int inputHeight = 0, int inputWidth = 0, int inputDepth = 1, int outputSize = 0, List<ILayer<T>>? layers = null, bool shouldReturnFullSequence = false, int imageEmbeddingDim = 0, int textEmbeddingDim = 0)

Parameters

inputType InputType

The type of input data (one-dimensional, two-dimensional, or three-dimensional).

taskType NeuralNetworkTaskType

The type of task the neural network will perform (classification, regression, etc.).

complexity NetworkComplexity

The complexity level of the neural network. Default is Medium.

inputSize int

The size of the input vector (for one-dimensional input). Default is 0.

inputHeight int

The height of the input (for two/three-dimensional input). Default is 0.

inputWidth int

The width of the input (for two/three-dimensional input). Default is 0.

inputDepth int

The depth of the input (for three-dimensional input). Default is 1.

outputSize int

The size of the output vector. Default is 0.

layers List<ILayer<T>>

Optional predefined layers for the neural network. Default is null.

shouldReturnFullSequence bool

Whether the network should return the full sequence or just the final output. Default is false.

imageEmbeddingDim int

The dimensionality of image embeddings for multimodal networks. Default is 0 (not multimodal).

textEmbeddingDim int

The dimensionality of text embeddings for multimodal networks. Default is 0 (not multimodal).

Remarks

This constructor initializes a neural network architecture with the specified parameters and validates that the input dimensions are consistent and appropriate for the selected input type. It also checks that any provided layers are compatible with the input and output dimensions.

For Beginners: This creates the blueprint for your neural network with your chosen settings.

When creating a neural network architecture, you specify:

  1. What kind of input data you're using:

    • One-dimensional for lists of values
    • Two-dimensional for grid data like grayscale images
    • Three-dimensional for volumetric data like color images
  2. What task you're solving:

    • Classification (sorting into categories)
    • Regression (predicting numerical values)
    • Other specialized tasks
  3. Other settings like:

    • Complexity (how powerful the network should be)
    • Input dimensions (size, height, width, depth)
    • Output size (how many values to predict)
    • Optional custom layers

The constructor checks that all your settings make sense together. For example, it will catch errors like trying to use both InputSize=100 and InputHeight=10, InputWidth=20 (which would imply InputSize=200).

Exceptions

ArgumentException

Thrown when the input dimensions are invalid or inconsistent.
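
Examples

A sketch of the dimension validation described above; NeuralNetworkTaskType.Regression is an assumed member name.

using System;
using AiDotNet.NeuralNetworks;

try
{
    // InputSize = 100 conflicts with InputHeight × InputWidth = 10 × 20 = 200.
    var invalid = new NeuralNetworkArchitecture<float>(
        inputType: InputType.TwoDimensional,
        taskType: NeuralNetworkTaskType.Regression, // assumed member name
        inputSize: 100,
        inputHeight: 10,
        inputWidth: 20,
        outputSize: 1);
}
catch (ArgumentException ex)
{
    Console.WriteLine(ex.Message); // reports the inconsistent input dimensions
}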

NeuralNetworkArchitecture(int, int, NetworkComplexity)

Initializes a new instance of the NeuralNetworkArchitecture<T> class with the specified number of input features and output size.

public NeuralNetworkArchitecture(int inputFeatures, int outputSize, NetworkComplexity complexity = NetworkComplexity.Medium)

Parameters

inputFeatures int

The number of input features.

outputSize int

The size of the output vector.

complexity NetworkComplexity

The complexity level of the neural network. Default is Medium.
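
Examples

A minimal usage sketch for flat feature-vector data, such as predicting a single value from tabular inputs:

using AiDotNet.NeuralNetworks;

// 20 input features, 1 predicted value (e.g. a price), default Medium complexity.
var regressionArch = new NeuralNetworkArchitecture<double>(
    inputFeatures: 20,
    outputSize: 1,
    complexity: NetworkComplexity.Medium);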

NeuralNetworkArchitecture(int, int, bool, NetworkComplexity)

Initializes a new instance of the NeuralNetworkArchitecture<T> class for a classification task with the specified number of input features and classes.

public NeuralNetworkArchitecture(int inputFeatures, int numClasses, bool isMultiClass = true, NetworkComplexity complexity = NetworkComplexity.Medium)

Parameters

inputFeatures int

The number of input features.

numClasses int

The number of classes to predict.

isMultiClass bool

Whether the task is multi-class classification. Default is true.

complexity NetworkComplexity

The complexity level of the neural network. Default is Medium.
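
Examples

A minimal usage sketch for a classification network defined by a feature count and a class count:

using AiDotNet.NeuralNetworks;

// 20 input features sorted into 3 categories (multi-class classification).
var classificationArch = new NeuralNetworkArchitecture<double>(
    inputFeatures: 20,
    numClasses: 3,
    isMultiClass: true);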

Properties

CalculatedInputSize

Gets the calculated total size of the input based on dimensions.

public int CalculatedInputSize { get; }

Property Value

int

Remarks

This computed property calculates the total number of input elements based on the input dimensions. For one-dimensional inputs, it returns InputSize. For multi-dimensional inputs, it calculates the product of the dimensions (InputHeight * InputWidth * InputDepth).

For Beginners: This calculates the total number of input values based on your dimensions.

It's automatically calculated depending on your input type:

  • For 1D data: Just returns your InputSize
  • For 2D data: Calculates InputHeight × InputWidth
  • For 3D data: Calculates InputHeight × InputWidth × InputDepth

For example, a 28×28 image has 784 total pixels, so CalculatedInputSize would be 784.

This helps ensure all your dimension settings are consistent with each other.
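
Examples

A short sketch for a 32×32 color image; NeuralNetworkTaskType.Classification is an assumed member name.

using System;
using AiDotNet.NeuralNetworks;

var colorImageArch = new NeuralNetworkArchitecture<double>(
    inputType: InputType.ThreeDimensional,
    taskType: NeuralNetworkTaskType.Classification, // assumed member name
    inputHeight: 32,
    inputWidth: 32,
    inputDepth: 3,
    outputSize: 10);

Console.WriteLine(colorImageArch.CalculatedInputSize); // 3072 (32 × 32 × 3)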

Complexity

Gets the complexity level of the neural network.

public NetworkComplexity Complexity { get; }

Property Value

NetworkComplexity

Remarks

This property defines the general complexity of the network architecture, which affects the default number of layers and neurons when automatically generating the network structure. Options typically include Simple, Medium, and Complex.

For Beginners: This controls how powerful and complex your network will be.

Think of this like choosing a building size:

  • Simple: A small network with few layers and neurons (fast but less powerful)
  • Medium: A balanced network (good for many common tasks)
  • Complex: A large network with many layers and neurons (powerful but slower to train)

Simpler networks train faster and need less data, but may not learn very complex patterns. Complex networks can learn sophisticated patterns but need more data and computing power.

When starting out, Medium complexity is often a good choice.

ImageEmbeddingDim

Gets the dimensionality of image embeddings for multimodal networks.

public int ImageEmbeddingDim { get; }

Property Value

int

Remarks

This property specifies the output dimension of the image encoder in multimodal architectures like CLIP. Common values are 768 (ViT-B/32) or 1024 (ViT-L/14). This dimension represents the size of the feature vector produced by the vision transformer or CNN backbone.

For Beginners: This is the size of the "description" vector that represents an image.

When a multimodal model (like CLIP) processes an image:

  • The image encoder converts the image into a vector of numbers
  • This vector captures the "meaning" or "content" of the image
  • ImageEmbeddingDim specifies how many numbers are in this vector

For example:

  • ImageEmbeddingDim = 768 means images become 768-dimensional vectors
  • Larger dimensions can capture more detail but need more computation

This is only used for multimodal networks that process both images and text. For standard image classifiers, use InputHeight/InputWidth instead.

InputDepth

Gets the depth dimension for 3D inputs.

public int InputDepth { get; }

Property Value

int

Remarks

For three-dimensional inputs, this property specifies the depth of the input. For example, for color image data, this would typically be 3 (for RGB channels).

For Beginners: For 3D data (like color images), this is the number of channels or layers.

For example:

  • For a color RGB image: InputDepth = 3 (red, green, blue channels)
  • For a grayscale image: InputDepth = 1

This is only used when working with three-dimensional data. For simpler data types, it's usually set to 1 and doesn't affect the network.

InputDimension

Gets the dimensionality of the input (1, 2, or 3).

public int InputDimension { get; }

Property Value

int

Remarks

This computed property returns the number of dimensions in the input data based on the InputType. It returns 1 for OneDimensional, 2 for TwoDimensional, and 3 for ThreeDimensional.

For Beginners: This tells you how many dimensions your input data has.

It's calculated automatically based on your InputType:

  • 1: For simple lists of values (like customer attributes)
  • 2: For grid data (like grayscale images)
  • 3: For volumetric data (like color images)

This information helps the network properly process the structure of your data.

InputHeight

Gets the height dimension for 2D or 3D inputs.

public int InputHeight { get; }

Property Value

int

Remarks

For two-dimensional or three-dimensional inputs, this property specifies the height of the input. For example, for image data, this would be the height in pixels.

For Beginners: For grid-like data (like images), this is the number of rows.

For example:

  • For a 28×28 image: InputHeight = 28

This is only used when working with multi-dimensional data like images. For simple lists of values, you'd use InputSize instead.

InputSize

Gets the size of the input vector.

public int InputSize { get; }

Property Value

int

Remarks

For one-dimensional inputs, this specifies the number of input features. For multi-dimensional inputs, this represents the total number of input elements (calculated as InputHeight * InputWidth * InputDepth).

For Beginners: This is the total number of input values your network receives.

For example:

  • For a list of 10 customer attributes: InputSize = 10
  • For a 28×28 grayscale image: InputSize = 784 (28×28)
  • For a 32×32 color image: InputSize = 3072 (32×32×3 color channels)

This tells the network how many "inputs" to expect. Think of it like how many separate pieces of information your network will consider at once.

InputType

Gets the type of input the neural network is designed to handle.

public InputType InputType { get; }

Property Value

InputType

Remarks

This property defines the dimensionality of the input data that the neural network is designed to process. Options include OneDimensional (for vector data), TwoDimensional (for matrix data like images), and ThreeDimensional (for volumetric data like color images or video).

For Beginners: This specifies what shape of data your network will process.

Neural networks can handle different types of data:

  • OneDimensional: Simple lists of values (like a customer's age, income, etc.)
  • TwoDimensional: Grid-like data (like a grayscale image)
  • ThreeDimensional: Cube-like data (like a color image with red, green, blue channels)

This is important because the network needs to know how to interpret the input. For example, in a color image, pixels that are next to each other horizontally, vertically, or across color channels have different kinds of relationships.

InputWidth

Gets the width dimension for 2D or 3D inputs.

public int InputWidth { get; }

Property Value

int

Remarks

For two-dimensional or three-dimensional inputs, this property specifies the width of the input. For example, for image data, this would be the width in pixels.

For Beginners: For grid-like data (like images), this is the number of columns.

For example:

  • For a 28×28 image: InputWidth = 28

This is only used when working with multi-dimensional data like images. For simple lists of values, you'd use InputSize instead.

IsInitialized

Gets a value indicating whether the architecture has been initialized.

public bool IsInitialized { get; }

Property Value

bool

Remarks

This property tracks whether the neural network architecture has been properly initialized with all necessary data and configurations. An uninitialized architecture may need to load cached data or perform other initialization steps before the network can be used.

For Beginners: This tells you if the network architecture is ready to use.

Think of this like a checklist before starting:

  • false: The architecture is created but not fully set up yet
  • true: Everything is ready and the network can be used

This is useful because sometimes a network needs to load previously saved data or perform setup steps before it can start training or making predictions.

Layers

Gets the optional list of predefined layers for the neural network.

public List<ILayer<T>>? Layers { get; }

Property Value

List<ILayer<T>>

Remarks

This property allows you to explicitly define the layers that will make up the neural network. If not provided, default layers will be created based on other architecture parameters when the neural network is initialized.

For Beginners: This is where you can specify exact layers for your network.

Think of this as customizing the rooms in your building:

  • Instead of using standard room designs, you specify exactly what you want
  • You control the exact size, type, and connections of each layer
  • This gives you precise control over how your network processes data

If you leave this empty (null), the system will create standard layers based on your other settings, like creating standard rooms based on the overall building design.

OutputSize

Gets the size of the output vector.

public int OutputSize { get; }

Property Value

int

Remarks

This property specifies the dimensionality of the network's output. It represents the number of output neurons in the final layer. For classification tasks, this typically equals the number of classes. For regression tasks, this equals the number of values to predict.

For Beginners: This is how many values your network will output.

For example:

  • For classifying 10 digits (0-9): OutputSize = 10
  • For predicting a single value (like house price): OutputSize = 1
  • For predicting multiple values (like x,y coordinates): OutputSize = 2

Think of this as how many answers your network gives at once.

ShouldReturnFullSequence

Gets a value indicating whether the network should return the full sequence or just the final output.

public bool ShouldReturnFullSequence { get; }

Property Value

bool

true if the full sequence should be returned; otherwise, false.

TaskType

Gets the type of task the neural network is designed to perform.

public NeuralNetworkTaskType TaskType { get; }

Property Value

NeuralNetworkTaskType

Remarks

This property specifies the type of task that the neural network is intended to solve, such as classification (assigning inputs to discrete categories) or regression (predicting continuous values). The task type affects the default network configuration, particularly the output layer and activation functions.

For Beginners: This defines what kind of problem your network is solving.

Common task types include:

  • Classification: Sorting inputs into categories (like "dog" or "cat" for images)
  • Regression: Predicting a number value (like house prices)
  • Sequence Generation: Creating sequences (like text or music)

The task type helps determine how your network should be structured and trained. For example, a classification network typically ends with a Softmax activation to output probabilities for each category, while a regression network might end with a linear activation to output any numerical value.

TextEmbeddingDim

Gets the dimensionality of text embeddings for multimodal networks.

public int TextEmbeddingDim { get; }

Property Value

int

Remarks

This property specifies the output dimension of the text encoder in multimodal architectures like CLIP. Common values are 512 or 768. This dimension represents the size of the feature vector produced by the text transformer.

For Beginners: This is the size of the "description" vector that represents text.

When a multimodal model (like CLIP) processes text:

  • The text encoder converts words/sentences into a vector of numbers
  • This vector captures the "meaning" or "semantics" of the text
  • TextEmbeddingDim specifies how many numbers are in this vector

For example:

  • TextEmbeddingDim = 512 means text becomes 512-dimensional vectors
  • This vector can then be compared with image embeddings

For CLIP-style models, text and image embeddings are projected to the same space, allowing direct comparison between text descriptions and images.
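
Examples

A sketch of a CLIP-style configuration. The task type and image resolution here are illustrative assumptions, and how the embedding dimensions are consumed depends on the multimodal network built from this architecture.

using AiDotNet.NeuralNetworks;

// Images are encoded into 768-dimensional vectors, text into 512-dimensional vectors.
var multimodalArch = new NeuralNetworkArchitecture<float>(
    inputType: InputType.ThreeDimensional,           // color images on the vision side
    taskType: NeuralNetworkTaskType.Classification,  // assumed member name
    inputHeight: 224,
    inputWidth: 224,
    inputDepth: 3,
    outputSize: 512,
    imageEmbeddingDim: 768,
    textEmbeddingDim: 512);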

UseAutodiff

Gets or sets a value indicating whether all layers in this architecture should use automatic differentiation by default.

public bool UseAutodiff { get; set; }

Property Value

bool

true if layers should use autodiff by default; false for manual backward implementations. Default is false.

Remarks

This property sets the default autodiff mode for all layers created with this architecture. Individual layers can still override this setting via their UseAutodiff property. Manual backward passes are typically faster but require explicit gradient code, while autodiff is more flexible for custom or experimental layers.

For Beginners: This controls how gradient computation works for all layers in the network.

When building a network from this architecture:

  • false (default): All layers use fast, hand-optimized gradient code
  • true: All layers use automatic differentiation for gradients

Most users should leave this as false (default) for best performance. Set to true only for:

  • Research and experimentation with novel architectures
  • Networks with many custom layers that have complex gradients
  • When you want to verify gradient correctness during development

Note: Individual layers can override this setting. This just sets the default.
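
Examples

A small sketch of switching a whole architecture to automatic differentiation while prototyping:

using AiDotNet.NeuralNetworks;

var experimentalArch = new NeuralNetworkArchitecture<double>(
    inputFeatures: 16,
    outputSize: 1,
    complexity: NetworkComplexity.Medium);

// Every layer created from this architecture now defaults to autodiff gradients;
// individual layers can still override this via their own UseAutodiff property.
experimentalArch.UseAutodiff = true;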

Methods

CalculateOutputSize()

Calculates the total size of the output.

public int CalculateOutputSize()

Returns

int

The total number of output elements.

Remarks

This method calculates the total number of elements in the output by multiplying all dimensions of the output shape. For example, if the output shape is [10, 10], the total output size would be 100.

For Beginners: This calculates the total number of values in your network's output.

It multiplies all the dimensions of your output shape:

  • For a shape [10] (like 10 classes): Total is 10
  • For a shape [5, 5] (like a 5×5 grid): Total is 25

Most common networks have simple outputs:

  • Classification: Equal to the number of categories
  • Regression: Equal to the number of values being predicted

But some specialized networks might output matrices or tensors, in which case this method helps calculate the total number of output values.

GetHiddenLayerSizes()

Gets the sizes of the hidden layers in the neural network.

public int[] GetHiddenLayerSizes()

Returns

int[]

An array containing the size of each hidden layer.

Remarks

This method calculates and returns the number of neurons in each hidden layer of the neural network. A hidden layer is any layer between the input and output layers. If no layers are defined or if there are fewer than 3 layers (meaning no hidden layers), an empty array is returned.

For Beginners: This method tells you how many neurons are in each hidden layer.

Hidden layers are the middle layers in your network:

  • They sit between the input layer (which receives your data)
  • And the output layer (which produces the final prediction)
  • They're where most of the pattern recognition happens

For example, if your network has structure [784, 128, 64, 10]:

  • Input layer: 784 neurons
  • First hidden layer: 128 neurons
  • Second hidden layer: 64 neurons
  • Output layer: 10 neurons

This method would return [128, 64], the sizes of just the hidden layers.

This is useful for understanding or visualizing your network's structure.

GetInputShape()

Gets the shape of the input as an array of dimensions.

public int[] GetInputShape()

Returns

int[]

An array representing the input shape.

Remarks

This method returns the shape of the input data as an array of dimensions. The format depends on the InputType:

  • For OneDimensional: [InputSize]
  • For TwoDimensional: [InputHeight, InputWidth]
  • For ThreeDimensional: [InputDepth, InputHeight, InputWidth]

For Beginners: This tells you the exact shape of your input data.

Different types of data have different shapes:

  • 1D data: Returns [size] - like [10] for 10 features
  • 2D data: Returns [height, width] - like [28, 28] for a 28×28 image
  • 3D data: Returns [depth, height, width] - like [3, 32, 32] for a color image

This shape information is important when:

  • Preparing your data for the network
  • Designing compatible layers
  • Debugging shape-related errors

Many neural network errors happen because data shapes don't match up correctly, so this method helps ensure your network is properly configured.
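
Examples

A short sketch showing the shape returned for a color-image architecture; NeuralNetworkTaskType.Classification is an assumed member name.

using System;
using AiDotNet.NeuralNetworks;

var imageArch = new NeuralNetworkArchitecture<double>(
    inputType: InputType.ThreeDimensional,
    taskType: NeuralNetworkTaskType.Classification, // assumed member name
    inputHeight: 32,
    inputWidth: 32,
    inputDepth: 3,
    outputSize: 10);

int[] shape = imageArch.GetInputShape();
Console.WriteLine(string.Join(", ", shape)); // 3, 32, 32 (depth, height, width)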

GetLayerSizes()

Gets the size of each layer in the neural network.

public int[] GetLayerSizes()

Returns

int[]

An array containing the size of each layer, including input and output layers.

Remarks

This method returns an array containing the number of neurons in each layer of the neural network, starting with the input layer and ending with the output layer. The size of each layer is calculated as the product of all dimensions in the layer's output shape.

For Beginners: This gives you the number of neurons in each layer of your network.

It returns a list of sizes for all layers, including:

  • The input layer (first value)
  • All hidden layers (middle values)
  • The output layer (last value)

For example, a network for classifying 28×28 images into 10 categories might return: [784, 128, 64, 10]

  • 784: Input layer (28×28 pixels)
  • 128: First hidden layer
  • 64: Second hidden layer
  • 10: Output layer (10 categories)

This is useful for visualizing your network structure or debugging to make sure your layers are the sizes you expect.
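
Examples

A sketch of inspecting layer sizes. The exact values depend on which layers are defined; with no layers, GetHiddenLayerSizes returns an empty array as noted above.

using System;
using AiDotNet.NeuralNetworks;

var digitArch = new NeuralNetworkArchitecture<double>(
    inputFeatures: 784,
    outputSize: 10,
    complexity: NetworkComplexity.Medium);

// For a fully defined [784, 128, 64, 10] network you would expect:
//   GetLayerSizes()       -> [784, 128, 64, 10]
//   GetHiddenLayerSizes() -> [128, 64]
int[] layerSizes = digitArch.GetLayerSizes();
int[] hiddenSizes = digitArch.GetHiddenLayerSizes(); // empty until hidden layers exist
Console.WriteLine(string.Join(" -> ", layerSizes));
Console.WriteLine(string.Join(", ", hiddenSizes));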

GetOutputShape()

Gets the shape of the output as an array of dimensions.

public int[] GetOutputShape()

Returns

int[]

An array representing the output shape.

Remarks

This method returns the shape of the output from the neural network. If layers are defined, it returns the output shape of the final layer. If no layers are defined, it returns the same shape as the input, since the output would be unchanged.

For Beginners: This tells you the shape of your network's output.

For most common networks:

  • Classification networks: Returns [number_of_classes]
  • Regression networks: Returns [number_of_values_to_predict]

For example:

  • A network classifying digits 0-9 would have output shape [10]
  • A network predicting x,y coordinates would have output shape [2]

If you haven't defined any layers yet, this returns the same shape as your input (since with no layers, input flows straight to output).

This helps you understand what shape of data to expect from your network.
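
Examples

A short sketch; note that with no layers defined the output shape falls back to the input shape, as described above.

using System;
using AiDotNet.NeuralNetworks;

var digitArch = new NeuralNetworkArchitecture<double>(
    inputFeatures: 784,
    outputSize: 10,
    complexity: NetworkComplexity.Medium);

int[] outputShape = digitArch.GetOutputShape();     // [10] once layers exist; otherwise the input shape
int totalOutputs = digitArch.CalculateOutputSize(); // product of the output shape's dimensions
Console.WriteLine($"{string.Join("x", outputShape)} -> {totalOutputs} output values");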

InitializeFromCachedData()

Initializes the architecture from cached data.

public void InitializeFromCachedData()

Remarks

This method initializes the neural network architecture using previously cached or saved data. It marks the architecture as initialized once the process is complete. This is useful when loading a pre-trained network or resuming training from a checkpoint.

For Beginners: This method prepares the architecture to use saved information.

Think of this like:

  • Loading a saved game - you want to continue from where you left off
  • Restoring a workspace - bringing back your previous setup
  • Rehydrating freeze-dried food - adding back what was removed to make it usable again

When you train a neural network, you might save its state and come back to it later. This method helps restore that saved state so the network can continue working.

After calling this method, IsInitialized will be set to true, indicating the architecture is ready for use.
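
Examples

A minimal sketch; whether IsInitialized starts out false for a freshly constructed architecture is an assumption.

using System;
using AiDotNet.NeuralNetworks;

var arch = new NeuralNetworkArchitecture<double>(
    inputFeatures: 784,
    outputSize: 10,
    complexity: NetworkComplexity.Medium);

Console.WriteLine(arch.IsInitialized); // may be false until setup completes (assumed)
arch.InitializeFromCachedData();
Console.WriteLine(arch.IsInitialized); // true - the architecture is ready to use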