Enum SSLMethodType

Namespace: AiDotNet.Enums
Assembly: AiDotNet.dll

Specifies the type of self-supervised learning method to use for representation learning.

public enum SSLMethodType
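
The members documented under Fields below correspond to the following declaration, reconstructed here from the documented names and numeric values:

public enum SSLMethodType
{
    SimCLR = 0,
    MoCo = 1,
    MoCoV2 = 2,
    MoCoV3 = 3,
    BYOL = 4,
    SimSiam = 5,
    BarlowTwins = 6,
    DINO = 7,
    iBOT = 8,
    MAE = 9
}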

Fields

BYOL = 4

BYOL: Bootstrap Your Own Latent (Grill et al., 2020). Non-contrastive method using momentum encoder without negative samples.

Best for: Avoiding negative sample mining, asymmetric networks.

Key Parameters: Momentum (0.99-0.999), predictor MLP.

Pros: No negative samples needed, robust to batch size.

Cons: Requires careful design to prevent collapse.
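
To make the momentum parameter concrete, the following minimal sketch shows the exponential-moving-average update a BYOL-style target (momentum) encoder applies to its weights; the flat parameter arrays and helper name are illustrative assumptions, not AiDotNet API.

public static class MomentumUpdateSketch
{
    // theta_target <- m * theta_target + (1 - m) * theta_online, with m typically 0.99-0.999.
    public static void Update(double[] targetWeights, double[] onlineWeights, double momentum = 0.999)
    {
        for (int i = 0; i < targetWeights.Length; i++)
        {
            targetWeights[i] = momentum * targetWeights[i] + (1.0 - momentum) * onlineWeights[i];
        }
    }
}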

BarlowTwins = 6

Barlow Twins: Self-Supervised Learning via Redundancy Reduction (Zbontar et al., 2021). Reduces redundancy in embeddings by driving the cross-correlation matrix between the two views' embeddings toward the identity matrix.

Best for: Interpretable approach, avoiding collapse naturally.

Key Parameters: Lambda (redundancy reduction weight), projection dimension.

Pros: Interpretable loss, naturally avoids collapse, no negative samples.

Cons: Requires careful scaling of loss terms.
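
The sketch below illustrates the redundancy-reduction loss on two batches of embeddings that are assumed to be standardized (zero mean, unit variance per dimension); the class name, plain arrays, and default lambda are illustrative, not AiDotNet API.

public static class BarlowTwinsLossSketch
{
    // z1, z2: [batch][dim] embeddings of two augmented views, standardized per dimension.
    // lambda weights the off-diagonal (redundancy-reduction) term; 5e-3 is a commonly used value.
    public static double Loss(double[][] z1, double[][] z2, double lambda = 5e-3)
    {
        int n = z1.Length, d = z1[0].Length;
        double loss = 0.0;
        for (int i = 0; i < d; i++)
        {
            for (int j = 0; j < d; j++)
            {
                // Cross-correlation between dimension i of view 1 and dimension j of view 2.
                double c = 0.0;
                for (int b = 0; b < n; b++) c += z1[b][i] * z2[b][j];
                c /= n;

                if (i == j) loss += (1.0 - c) * (1.0 - c); // invariance term: drive diagonal toward 1
                else loss += lambda * c * c;               // redundancy term: drive off-diagonal toward 0
            }
        }
        return loss;
    }
}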

DINO = 7

DINO: Emerging Properties in Self-Supervised Vision Transformers (Caron et al., 2021). Self-distillation with no labels using centering and sharpening.

Best for: Vision Transformers, emergent attention properties.

Key Parameters: Teacher temperature (0.04-0.07), centering momentum.

Pros: Emergent attention maps, strong ViT performance.

Cons: Primarily designed for Vision Transformers.
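
Centering and sharpening of the teacher output can be sketched as follows; the teacher temperature follows the range above, the center-momentum default is the value used in the DINO paper, and the helper names and plain arrays are illustrative, not AiDotNet API.

using System;

public static class DinoTeacherSketch
{
    // Center the teacher logits with a running mean, then sharpen with a low temperature via softmax.
    public static double[] TeacherProbabilities(double[] teacherLogits, double[] center, double teacherTemp = 0.04)
    {
        var scaled = new double[teacherLogits.Length];
        double max = double.MinValue;
        for (int i = 0; i < scaled.Length; i++)
        {
            scaled[i] = (teacherLogits[i] - center[i]) / teacherTemp; // centering + sharpening
            if (scaled[i] > max) max = scaled[i];
        }
        double sum = 0.0;
        for (int i = 0; i < scaled.Length; i++) { scaled[i] = Math.Exp(scaled[i] - max); sum += scaled[i]; }
        for (int i = 0; i < scaled.Length; i++) scaled[i] /= sum;     // numerically stable softmax
        return scaled;
    }

    // Update the center as an exponential moving average of the batch-mean teacher logits.
    public static void UpdateCenter(double[] center, double[] batchMeanLogits, double centerMomentum = 0.9)
    {
        for (int i = 0; i < center.Length; i++)
            center[i] = centerMomentum * center[i] + (1.0 - centerMomentum) * batchMeanLogits[i];
    }
}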

MAE = 9

MAE: Masked Autoencoders Are Scalable Vision Learners (He et al., 2022). Generative approach that reconstructs masked image patches.

Best for: Efficient pretraining, generative understanding.

Key Parameters: Mask ratio (0.75), decoder depth.

Pros: Efficient (only encode visible patches), scalable.

Cons: May require fine-tuning for best downstream performance.
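
A minimal sketch of random patch masking with the documented mask ratio: the encoder sees only the returned visible indices, and the remaining patches are reconstructed by the decoder. The names and uniform-random masking strategy are illustrative, not AiDotNet API.

using System;
using System.Linq;

public static class MaeMaskingSketch
{
    // Randomly choose which patch indices remain visible; the rest are masked out.
    public static int[] SampleVisiblePatches(int numPatches, double maskRatio = 0.75, int seed = 0)
    {
        int numVisible = (int)Math.Round(numPatches * (1.0 - maskRatio));
        var rng = new Random(seed);
        return Enumerable.Range(0, numPatches)
                         .OrderBy(_ => rng.Next()) // shuffle all patch indices
                         .Take(numVisible)         // keep only the visible fraction
                         .ToArray();
    }
}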

MoCo = 1

MoCo: Momentum Contrast for Unsupervised Visual Representation Learning (He et al., 2020). Uses a momentum encoder and memory queue for efficient contrastive learning.

Best for: Limited GPU memory, consistent negative samples.

Key Parameters: Queue size (65536), momentum (0.999), temperature (0.07).

Pros: Memory efficient, consistent negative samples, good performance.

Cons: More complex than SimCLR, requires momentum encoder.
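
The memory queue can be sketched as a fixed-capacity FIFO of keys produced by the momentum encoder: new keys are enqueued each batch and the oldest are dropped, giving a large, consistent pool of negatives. The class and member names are illustrative, not AiDotNet API.

using System.Collections.Generic;

public sealed class NegativeKeyQueueSketch
{
    private readonly Queue<double[]> _keys = new Queue<double[]>();
    private readonly int _capacity;

    // capacity corresponds to the documented queue size (e.g. 65536).
    public NegativeKeyQueueSketch(int capacity = 65536)
    {
        _capacity = capacity;
    }

    // Add the current batch's momentum-encoder keys and evict the oldest ones.
    public void Enqueue(IEnumerable<double[]> batchKeys)
    {
        foreach (var key in batchKeys)
        {
            _keys.Enqueue(key);
            if (_keys.Count > _capacity) _keys.Dequeue();
        }
    }

    // Current pool of negatives used by the contrastive loss.
    public IReadOnlyCollection<double[]> Negatives => _keys;
}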

MoCoV2 = 2

MoCo v2: Improved Baselines with Momentum Contrastive Learning (Chen et al., 2020). Adds MLP projection head and stronger augmentations to MoCo.

Best for: Better performance than MoCo v1 with similar efficiency.

Key Parameters: Same as MoCo plus MLP projection head.

Pros: Combines MoCo efficiency with SimCLR improvements.

Cons: Slightly more complex than MoCo v1.
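
The added MLP projection head is a small two-layer network applied to the encoder output before the contrastive loss; a minimal sketch follows (biases omitted, jagged weight matrices assumed for illustration, not AiDotNet API).

using System;

public static class ProjectionHeadSketch
{
    // Two-layer MLP projection head: Linear -> ReLU -> Linear.
    // w1 is [hidden][in], w2 is [out][hidden]; biases are omitted for brevity.
    public static double[] Project(double[] features, double[][] w1, double[][] w2)
    {
        var hidden = MatVec(w1, features);
        for (int i = 0; i < hidden.Length; i++) hidden[i] = Math.Max(0.0, hidden[i]); // ReLU
        return MatVec(w2, hidden);
    }

    private static double[] MatVec(double[][] w, double[] x)
    {
        var y = new double[w.Length];
        for (int i = 0; i < w.Length; i++)
            for (int j = 0; j < x.Length; j++)
                y[i] += w[i][j] * x[j];
        return y;
    }
}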

MoCoV3 = 3

MoCo v3: An Empirical Study of Training Self-Supervised Vision Transformers (Chen et al., 2021). Adapted for Vision Transformers without memory queue.

Best for: Vision Transformers (ViT), modern architectures.

Key Parameters: Momentum (0.99-0.999), symmetric loss.

Pros: Optimized for ViT, simpler than MoCo v1/v2.

Cons: Best suited for transformer architectures.
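
The symmetric loss simply averages the contrastive loss over both view orderings; a small sketch using a delegate to stand in for any per-pair contrastive loss (illustrative only, not AiDotNet API).

using System;

public static class SymmetricLossSketch
{
    // Each view's queries are scored against the other view's momentum-encoder keys,
    // and the two directions are averaged.
    public static double Compute(
        Func<double[], double[], double> contrastiveLoss,
        double[] query1, double[] key2,
        double[] query2, double[] key1)
    {
        return 0.5 * (contrastiveLoss(query1, key2) + contrastiveLoss(query2, key1));
    }
}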

SimCLR = 0

SimCLR: A Simple Framework for Contrastive Learning of Visual Representations (Chen et al., 2020). Uses large batch contrastive learning with strong augmentations.

Best for: Simple setup, strong performance, research baselines.

Key Parameters: Temperature (0.1-0.5), batch size (256-8192), projection dimension (128).

Pros: Simple architecture, strong performance, well-understood.

Cons: Requires large batch sizes for best performance.
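
The temperature parameter scales cosine similarities inside the NT-Xent contrastive loss; the sketch below computes the loss for one anchor against its positive (the other augmented view) and the in-batch negatives, using plain arrays for illustration rather than AiDotNet API.

using System;
using System.Collections.Generic;

public static class NtXentSketch
{
    // NT-Xent loss for a single anchor: pull the positive close, push negatives away.
    // temperature is typically in the 0.1-0.5 range documented above.
    public static double Loss(double[] anchor, double[] positive, IList<double[]> negatives, double temperature = 0.5)
    {
        double pos = Math.Exp(Cosine(anchor, positive) / temperature);
        double denom = pos;
        foreach (var neg in negatives)
            denom += Math.Exp(Cosine(anchor, neg) / temperature);
        return -Math.Log(pos / denom);
    }

    private static double Cosine(double[] a, double[] b)
    {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (int i = 0; i < a.Length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
        return dot / (Math.Sqrt(na) * Math.Sqrt(nb) + 1e-12);
    }
}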

SimSiam = 5

SimSiam: Exploring Simple Siamese Representation Learning (Chen and He, 2021). Non-contrastive method that uses a stop-gradient and a predictor head instead of negative samples or a momentum encoder.

Best for: Simple non-contrastive setups, smaller batch sizes.

Key Parameters: Predictor MLP, stop-gradient.

Pros: No negative samples or momentum encoder needed, works with modest batch sizes.

Cons: Relies on the stop-gradient and predictor to prevent collapse.

iBOT = 8

iBOT: Image BERT Pre-Training with Online Tokenizer (Zhou et al., 2022). Combines masked image modeling with self-distillation.

Best for: Combining generative and discriminative approaches.

Key Parameters: Mask ratio (0.4), patch tokenizer.

Pros: Best of both worlds (DINO + MAE-like objectives).

Cons: More complex than pure DINO or MAE.

Remarks

For Beginners: Self-supervised learning (SSL) methods learn useful representations from unlabeled data by creating "pretext tasks" - artificial tasks that force the model to learn meaningful features. Different methods use different strategies to achieve this.

Choosing a Method:

  • Use SimCLR for simplicity and good performance (no memory bank needed)
  • Use MoCo variants when large batch sizes are impractical or GPU memory is limited
  • Use BYOL or SimSiam to avoid negative sample mining
  • Use BarlowTwins for an interpretable redundancy-reduction approach
  • Use DINO for Vision Transformers with self-distillation
  • Use MAE for a generative masked-autoencoding approach
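
As a usage sketch, a caller might branch on the enum when picking defaults; the helper below is hypothetical and only illustrates consuming SSLMethodType, it is not part of the AiDotNet API.

using AiDotNet.Enums;

public static class SslMethodSelectionSketch
{
    // Hypothetical helper mirroring the guidance above; adjust to your own constraints.
    public static SSLMethodType Recommend(bool usingVisionTransformer, bool avoidNegatives, bool limitedGpuMemory)
    {
        if (usingVisionTransformer) return SSLMethodType.DINO;  // strong ViT performance, emergent attention
        if (avoidNegatives) return SSLMethodType.BYOL;          // no negative samples needed
        if (limitedGpuMemory) return SSLMethodType.MoCoV2;      // queue-based negatives, smaller batches
        return SSLMethodType.SimCLR;                            // simple, strong baseline
    }
}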