Enum SSLMethodType
Specifies the type of self-supervised learning method to use for representation learning.
public enum SSLMethodType
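The full declaration, collecting the numeric values documented under Fields below:

```csharp
public enum SSLMethodType
{
    SimCLR = 0,
    MoCo = 1,
    MoCoV2 = 2,
    MoCoV3 = 3,
    BYOL = 4,
    SimSiam = 5,
    BarlowTwins = 6,
    DINO = 7,
    iBOT = 8,
    MAE = 9
}
```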
Fields
BYOL = 4
BYOL: Bootstrap Your Own Latent (Grill et al., 2020). Non-contrastive method using a momentum encoder without negative samples.
Best for: Avoiding negative sample mining, asymmetric networks.
Key Parameters: Momentum (0.99-0.999), predictor MLP.
Pros: No negative samples needed, robust to batch size.
Cons: Requires careful design to prevent collapse.
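The momentum update at the heart of BYOL is a plain exponential moving average of the online encoder's weights. A minimal sketch, assuming weights are exposed as flat arrays; MomentumUpdate is illustrative, not part of this library's API:

```csharp
// Illustrative only: BYOL-style EMA update of the target encoder,
// using a momentum in the 0.99-0.999 range noted above.
static void MomentumUpdate(float[] targetWeights, float[] onlineWeights, float momentum = 0.999f)
{
    for (int i = 0; i < targetWeights.Length; i++)
    {
        // theta_target <- m * theta_target + (1 - m) * theta_online
        targetWeights[i] = momentum * targetWeights[i]
                         + (1f - momentum) * onlineWeights[i];
    }
}
```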
BarlowTwins = 6
Barlow Twins: Self-Supervised Learning via Redundancy Reduction (Zbontar et al., 2021). Reduces redundancy in embeddings by driving the cross-correlation matrix of the two views toward the identity.
Best for: Interpretable approach, avoiding collapse naturally.
Key Parameters: Lambda (redundancy reduction weight), projection dimension.
Pros: Interpretable loss, naturally avoids collapse, no negative samples.
Cons: Requires careful scaling of loss terms.
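The loss is easy to state once the cross-correlation matrix of the two embedding views has been computed. A minimal sketch; BarlowTwinsLoss is illustrative, not this library's API, and the default lambda of 5e-3 is the value used in the paper:

```csharp
// Illustrative only: Barlow Twins loss over the d x d cross-correlation matrix C.
// Diagonal entries are pulled toward 1 (invariance); off-diagonal entries are
// pushed toward 0 (redundancy reduction), weighted by lambda.
static float BarlowTwinsLoss(float[,] c, float lambda = 0.005f)
{
    int d = c.GetLength(0);
    float onDiag = 0f, offDiag = 0f;
    for (int i = 0; i < d; i++)
        for (int j = 0; j < d; j++)
        {
            if (i == j) onDiag += (1f - c[i, j]) * (1f - c[i, j]);
            else offDiag += c[i, j] * c[i, j];
        }
    return onDiag + lambda * offDiag;
}
```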
DINO = 7
DINO: Emerging Properties in Self-Supervised Vision Transformers (Caron et al., 2021). Self-distillation with no labels using centering and sharpening.
Best for: Vision Transformers, emergent attention properties.
Key Parameters: Teacher temperature (0.04-0.07), centering momentum.
Pros: Emergent attention maps, strong ViT performance.
Cons: Primarily designed for Vision Transformers.
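Centering and sharpening are applied to the teacher's outputs before they are used as distillation targets; the center itself is maintained as an EMA of batch means using the centering momentum. A minimal sketch; TeacherTargets is illustrative, not this library's API:

```csharp
// Illustrative only: DINO teacher targets = softmax((logits - center) / teacherTemp).
// Subtracting the running center and using a low temperature (0.04-0.07) together
// prevent collapse to a uniform or one-hot distribution.
static float[] TeacherTargets(float[] logits, float[] center, float teacherTemp = 0.04f)
{
    var p = new float[logits.Length];
    float max = float.MinValue;
    for (int i = 0; i < logits.Length; i++)
    {
        p[i] = (logits[i] - center[i]) / teacherTemp;   // centering + sharpening
        if (p[i] > max) max = p[i];
    }
    float sum = 0f;
    for (int i = 0; i < p.Length; i++) { p[i] = MathF.Exp(p[i] - max); sum += p[i]; }
    for (int i = 0; i < p.Length; i++) p[i] /= sum;     // softmax
    return p;
}
```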
MAE = 9
MAE: Masked Autoencoders Are Scalable Vision Learners (He et al., 2022). Generative approach that reconstructs masked image patches.
Best for: Efficient pretraining, generative understanding.
Key Parameters: Mask ratio (0.75), decoder depth.
Pros: Efficient (only encode visible patches), scalable.
Cons: May require fine-tuning for best downstream performance.
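The efficiency comes from encoding only the visible patches. A minimal sketch of the random masking step; SampleVisiblePatches is illustrative, not this library's API:

```csharp
// Illustrative only: with a mask ratio of 0.75, keep a random 25% of patch
// indices; only these are passed through the encoder.
static int[] SampleVisiblePatches(int numPatches, double maskRatio = 0.75, int seed = 0)
{
    var rng = new Random(seed);
    var indices = new int[numPatches];
    for (int i = 0; i < numPatches; i++) indices[i] = i;
    for (int i = numPatches - 1; i > 0; i--)            // Fisher-Yates shuffle
    {
        int j = rng.Next(i + 1);
        (indices[i], indices[j]) = (indices[j], indices[i]);
    }
    int keep = (int)Math.Round(numPatches * (1.0 - maskRatio));
    var visible = new int[keep];
    Array.Copy(indices, visible, keep);
    return visible;
}
```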
MoCo = 1
MoCo: Momentum Contrast for Unsupervised Visual Representation Learning (He et al., 2020). Uses a momentum encoder and memory queue for efficient contrastive learning.
Best for: Limited GPU memory, consistent negative samples.
Key Parameters: Queue size (65536), momentum (0.999), temperature (0.07).
Pros: Memory efficient, consistent negative samples, good performance.
Cons: More complex than SimCLR, requires momentum encoder.
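The memory queue is what decouples the number of negatives from the batch size. A minimal sketch; MoCoQueue is illustrative, not this library's API:

```csharp
// Illustrative only: fixed-size FIFO of key embeddings produced by the
// momentum encoder. New keys overwrite the oldest entries, so the queue
// (65536 in the paper) always holds recent, consistent negatives.
sealed class MoCoQueue
{
    private readonly float[][] _keys;
    private int _ptr;

    public MoCoQueue(int queueSize, int dim)
    {
        _keys = new float[queueSize][];
        for (int i = 0; i < queueSize; i++) _keys[i] = new float[dim];
    }

    public void Enqueue(float[][] batchKeys)
    {
        foreach (var key in batchKeys)
        {
            key.CopyTo(_keys[_ptr], 0);
            _ptr = (_ptr + 1) % _keys.Length;   // overwrite the oldest slot
        }
    }

    public float[][] Negatives => _keys;
}
```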
MoCoV2 = 2
MoCo v2: Improved Baselines with Momentum Contrastive Learning (Chen et al., 2020). Adds an MLP projection head and stronger augmentations to MoCo.
Best for: Better performance than MoCo v1 with similar efficiency.
Key Parameters: Same as MoCo plus MLP projection head.
Pros: Combines MoCo efficiency with SimCLR improvements.
Cons: Slightly more complex than MoCo v1.
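The MLP projection head is a small two-layer network applied to the encoder output before the contrastive loss. A minimal sketch with biases omitted; ProjectionHead and MatVec are illustrative, not this library's API:

```csharp
// Illustrative only: the linear -> ReLU -> linear projection head that
// MoCo v2 adds on top of the backbone features (biases omitted for brevity).
static float[] ProjectionHead(float[] features, float[,] w1, float[,] w2)
{
    var hidden = MatVec(w1, features);
    for (int i = 0; i < hidden.Length; i++) hidden[i] = Math.Max(0f, hidden[i]); // ReLU
    return MatVec(w2, hidden);
}

static float[] MatVec(float[,] w, float[] x)
{
    var y = new float[w.GetLength(0)];
    for (int i = 0; i < y.Length; i++)
        for (int j = 0; j < x.Length; j++)
            y[i] += w[i, j] * x[j];
    return y;
}
```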
MoCoV3 = 3
MoCo v3: An Empirical Study of Training Self-Supervised Vision Transformers (Chen et al., 2021). Adapted for Vision Transformers without memory queue.
Best for: Vision Transformers (ViT), modern architectures.
Key Parameters: Momentum (0.99-0.999), symmetric loss.
Pros: Optimized for ViT, simpler than MoCo v1/v2.
Cons: Best suited for transformer architectures.
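With the queue removed, the loss is symmetrized over the two augmented views: each view acts once as query and once as key. A minimal sketch; SymmetrizedLoss is illustrative, not this library's API, and the contrastive term is passed in as a delegate:

```csharp
// Illustrative only: MoCo v3 symmetric loss, ctr(q1, k2) + ctr(q2, k1),
// where q* come from the online encoder and k* from the momentum encoder.
static float SymmetrizedLoss(
    float[][] q1, float[][] k1, float[][] q2, float[][] k2,
    Func<float[][], float[][], float> contrastiveLoss)
    => contrastiveLoss(q1, k2) + contrastiveLoss(q2, k1);
```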
SimCLR = 0
SimCLR: A Simple Framework for Contrastive Learning of Visual Representations (Chen et al., 2020). Uses large-batch contrastive learning with strong augmentations.
Best for: Simple setup, strong performance, research baselines.
Key Parameters: Temperature (0.1-0.5), batch size (256-8192), projection dimension (128).
Pros: Simple architecture, strong performance, well-understood.
Cons: Requires large batch sizes for best performance.
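The core of SimCLR is the temperature-scaled contrastive (NT-Xent / InfoNCE) objective. A minimal per-anchor sketch, assuming L2-normalized embeddings so the dot product equals cosine similarity; InfoNceLoss and Dot are illustrative, not this library's API:

```csharp
// Illustrative only: InfoNCE for one anchor, its positive, and a set of
// negatives, scaled by the temperature (0.1-0.5 above).
static float InfoNceLoss(float[] anchor, float[] positive, float[][] negatives, float temperature = 0.1f)
{
    float pos = MathF.Exp(Dot(anchor, positive) / temperature);
    float denom = pos;
    foreach (var neg in negatives)
        denom += MathF.Exp(Dot(anchor, neg) / temperature);
    return -MathF.Log(pos / denom);
}

static float Dot(float[] a, float[] b)
{
    float s = 0f;
    for (int i = 0; i < a.Length; i++) s += a[i] * b[i];
    return s;
}
```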
SimSiam = 5
SimSiam: Exploring Simple Siamese Representation Learning (Chen & He, 2021). Non-contrastive Siamese method that prevents collapse with a stop-gradient and a predictor MLP, using neither negative samples nor a momentum encoder.
Best for: Simple setups without a momentum encoder, small batch sizes.
Key Parameters: Predictor MLP, stop-gradient on the target branch.
Pros: No negative samples, no momentum encoder, works with small batches.
Cons: Relies on the stop-gradient and predictor to prevent collapse.
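The objective is a symmetrized negative cosine similarity in which the target branch is detached (stop-gradient). A minimal sketch; SimSiamLoss and CosineSimilarity are illustrative, not this library's API:

```csharp
// Illustrative only: p1/p2 are predictor outputs, z1/z2 are the (detached)
// projections of the other view. Treating z as a constant is the stop-gradient.
static float SimSiamLoss(float[] p1, float[] z1, float[] p2, float[] z2)
    => -0.5f * (CosineSimilarity(p1, z2) + CosineSimilarity(p2, z1));

static float CosineSimilarity(float[] a, float[] b)
{
    float dot = 0f, na = 0f, nb = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(na) * MathF.Sqrt(nb) + 1e-12f);
}
```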
iBOT = 8
iBOT: Image BERT Pre-Training with Online Tokenizer (Zhou et al., 2022). Combines masked image modeling with self-distillation.
Best for: Combining generative and discriminative approaches.
Key Parameters: Mask ratio (0.4), patch tokenizer.
Pros: Best of both worlds (DINO + MAE-like objectives).
Cons: More complex than pure DINO or MAE.
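Alongside a DINO-style [CLS] distillation term, iBOT distills teacher patch-token distributions into the student at the masked positions only. A minimal sketch of that masked-patch term; MaskedPatchDistillation is illustrative, not this library's API:

```csharp
// Illustrative only: cross-entropy between teacher token distributions and
// student log-probabilities, averaged over masked patch positions.
static float MaskedPatchDistillation(float[][] teacherProbs, float[][] studentLogProbs, bool[] masked)
{
    float sum = 0f;
    int count = 0;
    for (int i = 0; i < masked.Length; i++)
    {
        if (!masked[i]) continue;                        // only masked patches contribute
        float ce = 0f;
        for (int k = 0; k < teacherProbs[i].Length; k++)
            ce -= teacherProbs[i][k] * studentLogProbs[i][k];
        sum += ce;
        count++;
    }
    return count > 0 ? sum / count : 0f;
}
```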
Remarks
For Beginners: Self-supervised learning (SSL) methods learn useful representations from unlabeled data by creating "pretext tasks" - artificial tasks that force the model to learn meaningful features. Different methods use different strategies to achieve this.
Choosing a Method:
- Use SimCLR for simplicity and good performance (no memory bank needed)
- Use MoCo variants when GPU memory is limited, since the queue supplies many negatives without large batches
- Use BYOL or SimSiam to avoid negative sample mining
- Use BarlowTwins for interpretable redundancy-reduction approach
- Use DINO for Vision Transformers with self-distillation
- Use MAE for generative masked autoencoding approach
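
A minimal selection sketch based on the guidance above. Only SSLMethodType is part of this library; the ChooseMethod helper and its flags are hypothetical:

```csharp
// Hypothetical helper: map two common constraints to a reasonable default method.
static SSLMethodType ChooseMethod(bool usesVisionTransformer, bool smallBatches)
{
    if (usesVisionTransformer) return SSLMethodType.DINO;  // or MoCoV3, MAE, iBOT
    if (smallBatches) return SSLMethodType.BYOL;            // robust to smaller batch sizes
    return SSLMethodType.SimCLR;                            // simple, strong baseline
}
```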