Table of Contents

Namespace AiDotNet.NeuralNetworks

Classes

ACGAN<T>

Represents an Auxiliary Classifier Generative Adversarial Network (AC-GAN), which extends conditional GANs by having the discriminator also predict the class label of the input.

AttentionNetwork<T>

Represents a neural network that utilizes attention mechanisms for sequence processing.

AudioVisualCorrespondenceNetwork<T>

Audio-visual correspondence learning network for cross-modal understanding.

AudioVisualEventLocalizationNetwork<T>

Neural network for audio-visual event localization: identifying when and where events occur in a video, with precise temporal boundaries, by jointly analyzing the audio and visual streams.

Autoencoder<T>

Represents an autoencoder neural network that can compress data into a lower-dimensional representation and reconstruct it.

BGE<T>

BGE (BAAI General Embedding) neural network implementation. A state-of-the-art retrieval model known for its high accuracy across diverse benchmarks.

BigGAN<T>

BigGAN implementation for large-scale high-fidelity image generation.

For Beginners: BigGAN is a state-of-the-art GAN architecture that generates extremely high-quality images by scaling up training in several ways:

  1. Using very large batch sizes (256-2048 images at once)
  2. Increasing model capacity (more parameters and feature maps)
  3. Using class information to generate specific types of images

Think of it like training an artist:

  • Small batch = showing the artist 1-2 examples at a time
  • BigGAN batch = showing 256+ examples at once for better learning
  • Class conditioning = telling the artist exactly what to draw ("draw a cat" vs "draw something")

Key innovations:

  1. Large Batch Training: Uses batch sizes of 256-2048 (vs typical 32-128)
  2. Spectral Normalization: Stabilizes training for both G and D
  3. Self-Attention: Helps model long-range dependencies in images
  4. Class Conditioning: Uses class embeddings for controlled generation
  5. Truncation Trick: Trades diversity for quality at generation time (sketched below)
  6. Orthogonal Initialization: Better weight initialization
  7. Skip Connections: Direct paths in generator architecture

Based on "Large Scale GAN Training for High Fidelity Natural Image Synthesis" by Brock et al. (2019)

Blip2NeuralNetwork<T>

BLIP-2 (Bootstrapped Language-Image Pre-training 2) neural network for vision-language tasks.

BlipNeuralNetwork<T>

BLIP (Bootstrapped Language-Image Pre-training) neural network for vision-language tasks.

CapsuleNetwork<T>

Represents a Capsule Network, a type of neural network that preserves spatial relationships between features.

ClipModelConfig

Configuration for a CLIP model variant.

ClipModelLoader

Loads CLIP models from HuggingFace Hub or local directories.

ClipNeuralNetwork<T>

CLIP (Contrastive Language-Image Pre-training) neural network that encodes both text and images into a shared embedding space, enabling cross-modal similarity and zero-shot classification.

ColBERT<T>

ColBERT (Contextualized Late Interaction over BERT) neural network implementation. Uses token-level representations for high-precision document retrieval.

ConditionalGAN<T>

Represents a Conditional Generative Adversarial Network (cGAN), which generates data conditioned on additional information such as class labels, attributes, or other contextual data.

Connection<T>

Represents a connection between two nodes in a neural network, particularly used in evolving neural networks.

ConvolutionalNeuralNetwork<T>

Represents a Convolutional Neural Network (CNN) that processes multi-dimensional data.

CycleGAN<T>

Represents a CycleGAN for unpaired image-to-image translation.

DCGAN<T>

Represents a Deep Convolutional Generative Adversarial Network (DCGAN), an architecture that uses convolutional and transposed convolutional layers with specific design guidelines for stable training.

DeepBeliefNetwork<T>

Represents a Deep Belief Network, a generative graphical model composed of multiple layers of Restricted Boltzmann Machines.

DeepBoltzmannMachine<T>

Represents a Deep Boltzmann Machine (DBM), a hierarchical generative model consisting of multiple layers of stochastic neurons.

DeepQNetwork<T>

Represents a Deep Q-Network (DQN), a reinforcement learning algorithm that combines Q-learning with deep neural networks.

DenseNetNetwork<T>

Implements the DenseNet (Densely Connected Convolutional Network) architecture.

DifferentiableNeuralComputer<T>

Represents a Differentiable Neural Computer (DNC), a neural network architecture that combines neural networks with external memory resources.

EchoStateNetwork<T>

Represents an Echo State Network (ESN), a type of recurrent neural network with a sparsely connected hidden layer called a reservoir.

EfficientNetNetwork<T>

Implements the EfficientNet architecture with compound scaling.

ExtremeLearningMachine<T>

Represents an Extreme Learning Machine (ELM), a feedforward neural network whose hidden-layer weights are set randomly and left untrained; only the output weights are learned, typically via a direct analytical solve.

FastText<T>

FastText neural network implementation, an extension of Word2Vec that considers subword information.

FeedForwardNeuralNetwork<T>

Represents a Feed-Forward Neural Network (FFNN), in which data flows in a single direction from input to output.

FlamingoNeuralNetwork<T>

Flamingo neural network for in-context visual learning and few-shot tasks.

GRUNeuralNetwork<T>

Represents a Gated Recurrent Unit (GRU) Neural Network for processing sequential data.

GenerativeAdversarialNetwork<T>

Represents a Generative Adversarial Network (GAN), a deep learning architecture that consists of two neural networks (a generator and a discriminator) competing against each other in a zero-sum game.

Genome<T>

Represents a genome in a neuroevolutionary algorithm, containing a collection of connections between nodes.

GloVe<T>

GloVe (Global Vectors for Word Representation) neural network implementation.

Gpt4VisionNeuralNetwork<T>

GPT-4V-style neural network that combines vision understanding with large language model capabilities.

GraphAttentionNetwork<T>

Represents a Graph Attention Network (GAT) that uses attention mechanisms to process graph-structured data.

GraphGenerationModel<T>

Represents a Graph Generation Model using Variational Autoencoder (VAE) architecture.

GraphIsomorphismNetwork<T>

Represents a Graph Isomorphism Network (GIN) for powerful graph representation learning.

GraphNeuralNetwork<T>

Represents a Graph Neural Network that can process data represented as graphs.

GraphSAGENetwork<T>

Represents a GraphSAGE (Graph Sample and Aggregate) Network for inductive learning on graphs.

HTMNetwork<T>

Represents a Hierarchical Temporal Memory (HTM) network, a biologically-inspired sequence learning algorithm.

HopeNetwork<T>

Hope architecture: a self-modifying recurrent neural network variant of Titans with unbounded levels of in-context learning, and the core innovation of Google's Nested Learning paradigm.

HopfieldNetwork<T>

Represents a Hopfield Network, a recurrent neural network designed for pattern storage and retrieval.

HyperbolicNeuralNetwork<T>

Represents a Hyperbolic Neural Network for learning hierarchical representations in Poincaré ball space.

ImageBindNeuralNetwork<T>

ImageBind neural network for binding multiple modalities (6+) into a shared embedding space.

InfoGAN<T>

Represents an Information Maximizing Generative Adversarial Network (InfoGAN), which learns disentangled representations in an unsupervised manner by maximizing mutual information between latent codes and generated observations.

InstructorEmbedding<T>

Instructor/E5 (Instruction-Tuned) embedding model implementation. Uses task-specific instructions to adapt embeddings for different use cases.

LLaVANeuralNetwork<T>

LLaVA (Large Language and Vision Assistant) neural network for visual instruction following.

LSTMNeuralNetwork<T>

Represents a Long Short-Term Memory (LSTM) Neural Network, which is specialized for processing sequential data like text, time series, or audio.

LiquidStateMachine<T>

Represents a Liquid State Machine (LSM), a type of reservoir computing neural network.

MatryoshkaEmbedding<T>

Matryoshka Representation Learning (MRL) neural network implementation. Learns nested embeddings where smaller prefixes of the full vector are valid representations.
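
The nesting property can be exercised directly at query time: slice a prefix of the full embedding and re-normalize it. A minimal C# sketch follows; TruncateEmbedding is an illustrative helper, not part of the AiDotNet API.

    using System;

    // Illustrative sketch of using a Matryoshka (MRL) embedding prefix.
    static class MatryoshkaPrefixSketch
    {
        // Keep the first `dim` components and re-normalize to unit length,
        // yielding a smaller embedding that is still a valid representation.
        static double[] TruncateEmbedding(double[] full, int dim)
        {
            var prefix = new double[dim];
            Array.Copy(full, prefix, dim);

            double norm = 0;
            foreach (double v in prefix) norm += v * v;
            norm = Math.Sqrt(norm);

            for (int i = 0; i < dim; i++) prefix[i] /= norm;
            return prefix;
        }

        static void Main()
        {
            // Toy 8-dimensional embedding truncated to its first 4 dimensions.
            var full = new[] { 0.4, 0.3, -0.2, 0.1, 0.05, -0.05, 0.02, 0.01 };
            double[] small = TruncateEmbedding(full, dim: 4);
            Console.WriteLine(string.Join(", ", small));
        }
    }

Because training nests the representations, the truncated vector can serve cheaper storage and retrieval at a modest accuracy cost.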

MemoryNetwork<T>

Represents a Memory Network, a neural network architecture designed with explicit memory components for improved reasoning and question answering capabilities.

MeshCNN<T>

Implements the MeshCNN architecture for processing 3D triangle meshes.

MixtureOfExpertsNeuralNetwork<T>

Represents a Mixture-of-Experts (MoE) neural network that routes inputs through multiple specialist networks.

MobileNetV2Network<T>

Implements the MobileNetV2 architecture for efficient mobile inference.

MobileNetV3Network<T>

Implements the MobileNetV3 architecture for efficient mobile inference.

NEAT<T>

Represents a NeuroEvolution of Augmenting Topologies (NEAT) algorithm implementation, which evolves neural networks through genetic algorithms.

NeuralNetworkArchitecture<T>

Defines the structure and configuration of a neural network, including its layers, input/output dimensions, and task-specific properties.

NeuralNetworkBase<T>

Base class for all neural network implementations in AiDotNet.

NeuralNetwork<T>

A neural network implementation that processes data through multiple layers to make predictions.

NeuralTuringMachine<T>

Represents a Neural Turing Machine, which is a neural network architecture that combines a neural network with external memory.

OccupancyNeuralNetwork<T>

Represents a Neural Network specialized for occupancy detection and prediction in spaces.

OctonionNeuralNetwork<T>

Represents an Octonion-valued Neural Network for processing data in 8-dimensional hypercomplex space.

Pix2Pix<T>

Represents a Pix2Pix GAN for paired image-to-image translation tasks.

ProgressiveGAN<T>

Production-ready Progressive GAN (ProGAN) implementation that generates high-resolution images by progressively growing the generator and discriminator during training.

For Beginners: Progressive GAN is a technique for training GANs that can generate very high-resolution images (e.g., 1024x1024 pixels). Instead of trying to generate high-resolution images from the start, it begins by generating small images (e.g., 4x4) and progressively adds new layers to both the generator and discriminator to increase the resolution (4x4 → 8x8 → 16x16 → 32x32 → 64x64 → 128x128 → 256x256 → 512x512 → 1024x1024).

Key innovations:

  1. Progressive Growing: Start with low resolution and gradually add layers
  2. Smooth Fade-in: New layers are faded in smoothly using a blending parameter (alpha), as sketched below
  3. Minibatch Standard Deviation: Helps prevent mode collapse by adding diversity
  4. Equalized Learning Rate: Normalizes weights at runtime for better training dynamics
  5. Pixel Normalization: Normalizes feature vectors in the generator to keep signal magnitudes from escalating

Based on "Progressive Growing of GANs for Improved Quality, Stability, and Variation" by Karras et al. (2018)

QuantumNeuralNetwork<T>

Represents a Quantum Neural Network, which combines quantum computing principles with neural network architecture.

RadialBasisFunctionNetwork<T>

Represents a Radial Basis Function Network, which is a type of neural network that uses radial basis functions as activation functions.

RecurrentNeuralNetwork<T>

Represents a Recurrent Neural Network, which is a type of neural network designed to process sequential data by maintaining an internal state.

ResNetNetwork<T>

Represents a ResNet (Residual Network) neural network architecture for image classification.

ResidualNeuralNetwork<T>

Represents a Residual Neural Network, which is a type of neural network that uses skip connections to address the vanishing gradient problem in deep networks.

RestrictedBoltzmannMachine<T>

Represents a Restricted Boltzmann Machine, which is a type of neural network that learns probability distributions over its inputs.

SAGAN<T>

Self-Attention GAN (SAGAN) implementation that uses self-attention mechanisms to model long-range dependencies in generated images.

For Beginners: Traditional CNNs in GANs only look at nearby pixels (local receptive fields). This works well for textures and local patterns, but struggles with global structure and long-range relationships (like making sure both eyes of a face look similar, or ensuring consistent geometric patterns).

Self-Attention solves this by letting each pixel "attend to" all other pixels, similar to how Transformers work in NLP. Think of it as:

  • CNN: "I can only see my immediate neighbors"
  • Self-Attention: "I can see the entire image and decide what's important"

Example: When generating a dog's face:

  • CNN: Might make one ear pointy and one floppy (inconsistent)
  • SAGAN: Notices both ears and makes them match (consistent)

Key innovations:

  1. Self-Attention Layers: Allow modeling of long-range dependencies (sketched below)
  2. Spectral Normalization: Stabilizes training for both G and D
  3. Hinge Loss: More stable than standard GAN loss
  4. Two Time-Scale Update Rule (TTUR): Different learning rates for G and D
  5. Conditional Batch Normalization: For class-conditional generation

Based on "Self-Attention Generative Adversarial Networks" by Zhang et al. (2019)

SGPT<T>

SGPT (Sentence GPT) neural network implementation using decoder-only transformer architectures.

SPLADE<T>

SPLADE (Sparse Lexical and Expansion Model) neural network implementation. Maps text to a high-dimensional sparse vector in the vocabulary space.

SelfOrganizingMap<T>

Represents a Self-Organizing Map, which is an unsupervised neural network that produces a low-dimensional representation of input data.

SiameseNetwork<T>

Implements a Siamese Neural Network for comparing pairs of inputs and determining their similarity.

SiameseNeuralNetwork<T>

Siamese Neural Network implementation for dual-encoder comparison and similarity learning.

SimCSE<T>

SimCSE (Simple Contrastive Learning of Sentence Embeddings) neural network implementation.

SparseNeuralNetwork<T>

Represents a Sparse Neural Network with efficient sparse weight matrices.

SpikingNeuralNetwork<T>

Represents a Spiking Neural Network, which is a type of neural network that more closely models biological neurons with temporal dynamics.

SpiralNet<T>

Implements the SpiralNet++ architecture for mesh-based deep learning.

StyleGAN<T>

Represents a StyleGAN (Style-Based Generator Architecture for GANs) that generates high-quality images with fine-grained control over image style at different levels.

SuperNet<T>

SuperNet implementation for gradient-based neural architecture search (DARTS). Performs differentiable architecture search by maintaining architecture parameters (alpha) and network weights simultaneously.
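
The core DARTS relaxation is small enough to sketch: the discrete choice among candidate operations on an edge is replaced by a softmax-weighted sum over all of them, which makes the architecture parameters (alpha) differentiable. A toy scalar version with illustrative names, not AiDotNet's API:

    using System;
    using System.Linq;

    // Illustrative sketch of the DARTS mixed operation; not AiDotNet's API.
    static class DartsMixedOpSketch
    {
        // Softmax-weighted sum of candidate operations: gradients flow into
        // both the network weights and the architecture parameters (alpha).
        static double MixedOp(Func<double, double>[] candidateOps, double[] alpha, double x)
        {
            double max = alpha.Max();
            double[] exp = alpha.Select(a => Math.Exp(a - max)).ToArray();
            double sum = exp.Sum();

            double result = 0;
            for (int i = 0; i < candidateOps.Length; i++)
                result += (exp[i] / sum) * candidateOps[i](x);
            return result;
        }

        static void Main()
        {
            // Three toy candidate operations on a scalar "feature".
            var ops = new Func<double, double>[]
            {
                v => v,                // identity / skip connection
                v => Math.Max(0.0, v), // ReLU
                v => 0.0,              // "zero" op (prunes this edge)
            };

            // The operation with the largest alpha dominates the mixture.
            var alpha = new[] { 2.0, 0.5, -1.0 };
            Console.WriteLine(MixedOp(ops, alpha, x: -0.7));
        }
    }

When the search finishes, the operation with the largest alpha on each edge is kept and the softmax mixture is discarded.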

TransformerArchitecture<T>

Defines the architecture configuration for a Transformer neural network.

TransformerEmbeddingNetwork<T>

A customizable Transformer-based embedding network. This serves as the high-performance foundation for modern sentence and document encoders.

Transformer<T>

Represents a Transformer neural network architecture, which is particularly effective for sequence-based tasks like natural language processing.

UNet3D<T>

Represents a 3D U-Net neural network for volumetric semantic segmentation.

UnifiedMultimodalNetwork<T>

Unified multimodal network that handles text, images, audio, and video in a single architecture with cross-modal attention and any-to-any generation.

VGGNetwork<T>

Represents a VGG (Visual Geometry Group) neural network architecture for image classification.

VariationalAutoencoder<T>

Represents a Variational Autoencoder (VAE) neural network architecture, which is used for generating new data similar to the training data and learning compressed representations.

VideoCLIPNeuralNetwork<T>

VideoCLIP neural network for video-text alignment and temporal understanding.

VisionTransformer<T>

Implements the Vision Transformer (ViT) architecture for image classification tasks.

VoxelCNN<T>

Represents a Voxel-based 3D Convolutional Neural Network for processing volumetric data.

WGANGP<T>

Represents a Wasserstein GAN with Gradient Penalty (WGAN-GP), an improved version of WGAN that uses gradient penalty instead of weight clipping to enforce the Lipschitz constraint.
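
The penalty is a soft constraint that pushes the norm of the critic's gradient toward 1 at points interpolated between real and generated samples: GP = lambda * (||grad D(x_hat)||_2 - 1)^2, where x_hat = eps * x_real + (1 - eps) * x_fake and eps ~ U(0, 1). A minimal C# sketch, assuming the critic's gradient at x_hat comes from an autodiff engine; the helper names are illustrative, not AiDotNet's API.

    using System;

    // Illustrative sketch of the WGAN-GP penalty term; not AiDotNet's API.
    static class WganGpSketch
    {
        // Random point on the line between a real and a generated sample;
        // the Lipschitz constraint is enforced at these interpolates.
        static double[] Interpolate(double[] real, double[] fake, double epsilon)
        {
            var xHat = new double[real.Length];
            for (int i = 0; i < real.Length; i++)
                xHat[i] = epsilon * real[i] + (1.0 - epsilon) * fake[i];
            return xHat;
        }

        // lambda * (||grad D(xHat)||_2 - 1)^2; the gradient itself must come
        // from automatic differentiation of the critic.
        static double GradientPenalty(double[] criticGradientAtXHat, double lambda = 10.0)
        {
            double normSq = 0;
            foreach (double g in criticGradientAtXHat) normSq += g * g;
            double norm = Math.Sqrt(normSq);
            return lambda * (norm - 1.0) * (norm - 1.0);
        }

        static void Main()
        {
            var rng = new Random(0);
            double[] xHat = Interpolate(
                real: new[] { 1.0, 0.2 }, fake: new[] { 0.1, 0.9 }, epsilon: rng.NextDouble());
            Console.WriteLine($"x_hat = ({xHat[0]:F2}, {xHat[1]:F2})");

            // Pretend autodiff returned this gradient of the critic at x_hat.
            var grad = new[] { 0.6, 0.9 };
            Console.WriteLine($"Penalty: {GradientPenalty(grad):F4}");
        }
    }

The original paper sets lambda = 10, which is the default used here.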

WGAN<T>

Represents a Wasserstein Generative Adversarial Network (WGAN), which uses the Wasserstein distance (Earth Mover's distance) to measure the difference between the generated and real data distributions.

Word2Vec<T>

Word2Vec neural network implementation supporting both Skip-Gram and CBOW architectures.

Enums

TransformerEmbeddingNetwork<T>.PoolingStrategy

Defines the available pooling strategies for creating a single sentence embedding.