
Class DINO<T>

Namespace
AiDotNet.SelfSupervisedLearning
Assembly
AiDotNet.dll

DINO (Self-Distillation with No Labels): a self-supervised learning method for Vision Transformers.

public class DINO<T> : TeacherStudentSSL<T>, ISSLMethod<T>

Type Parameters

T

The numeric type used for computations.

Inheritance
TeacherStudentSSL<T> → DINO<T>
Implements
ISSLMethod<T>

Remarks

For Beginners: DINO is a self-supervised method designed with Vision Transformers (ViT) in mind. It learns by having a student network predict the output of a teacher network, where the teacher's weights are an exponential moving average (EMA) of the student's weights.
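The EMA relationship between teacher and student can be sketched as follows. This is a minimal illustration over flat `double[]` parameter arrays; `EmaSketch` and `UpdateTeacher` are hypothetical names, not AiDotNet's `IMomentumEncoder<T>` API. The momentum value is left to the caller; the DINO paper uses values close to 1 (starting at 0.996).

```csharp
using System;

// Hypothetical helper (NOT part of AiDotNet): the EMA rule a momentum
// encoder applies after each student update, shown on flat parameter arrays.
public static class EmaSketch
{
    // teacher <- momentum * teacher + (1 - momentum) * student, element-wise.
    public static void UpdateTeacher(double[] teacher, double[] student, double momentum)
    {
        if (teacher.Length != student.Length)
            throw new ArgumentException("Teacher and student must have the same number of parameters.");
        if (momentum < 0.0 || momentum > 1.0)
            throw new ArgumentOutOfRangeException(nameof(momentum), "Momentum must lie in [0, 1].");

        for (int i = 0; i < teacher.Length; i++)
            teacher[i] = momentum * teacher[i] + (1.0 - momentum) * student[i];
    }
}
```

Only the student receives gradients; the teacher changes solely through this averaging, so it evolves slowly and provides stable targets.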

Key innovations:

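  • Self-distillation: the student learns to match the teacher's soft output distribution; no labels are involved.
  • Centering and sharpening: applied to the teacher's outputs, they prevent collapse without requiring negative samples.
  • Multi-crop training: the teacher sees only large global crops, while the student also sees small local crops, which add views cheaply and encourage local-to-global correspondence.
  • Emergent properties: the learned ViT features, in particular the last-layer self-attention maps, segment objects without supervision.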
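The multi-crop scheme determines which (teacher view, student view) pairs contribute to the loss. The sketch below assumes the views are ordered with global crops first; `MultiCropSketch` and `LossPairs` are hypothetical names, not AiDotNet API, and the crop counts in the test are arbitrary illustrations.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (NOT AiDotNet API): enumerates the (teacher view, student view)
// pairs used by the DINO loss under multi-crop training.
// Views 0..globalCount-1 are global crops; the remaining views are local crops.
public static class MultiCropSketch
{
    public static List<(int Teacher, int Student)> LossPairs(int globalCount, int localCount)
    {
        if (globalCount < 1)
            throw new ArgumentOutOfRangeException(nameof(globalCount), "Need at least one global crop.");
        if (localCount < 0)
            throw new ArgumentOutOfRangeException(nameof(localCount), "Local crop count cannot be negative.");

        var pairs = new List<(int Teacher, int Student)>();
        int totalViews = globalCount + localCount;

        // The teacher only processes global crops.
        for (int t = 0; t < globalCount; t++)
        {
            // The student processes every crop, global and local.
            for (int s = 0; s < totalViews; s++)
            {
                // A view is never matched against itself.
                if (s == t) continue;
                pairs.Add((t, s));
            }
        }
        return pairs;
    }
}
```

With G global and L local crops this yields G × (G + L) − G pairs, so adding local crops grows the training signal without running the teacher on any extra views.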

Architecture:

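Global views → Teacher → softmax((z_t - c) / τ_t) → P_t
All views → Student → softmax(z_s / τ_s) → P_s
Loss: H(P_t, P_s) = -Σ_k P_t[k] · log P_s[k], averaged over all (teacher view, student view) pairs except those where both networks see the same view

Here c is an exponential moving average of the teacher's outputs (centering), and choosing τ_t < τ_s sharpens the teacher's distribution.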
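The per-pair loss with centering and sharpening, together with the center update, can be sketched on plain `double[]` logits. Following the original paper, the center is subtracted before dividing by the temperature, and the center is updated from the raw (uncentered) teacher logits. `DinoLossSketch` and its methods are hypothetical names, not AiDotNet's implementation; the paper uses τ_s = 0.1, a teacher temperature between 0.04 and 0.07, and a center momentum of 0.9.

```csharp
using System;

// Hypothetical sketch (NOT AiDotNet's implementation) of the DINO per-pair loss
// and the center update, on plain double[] logits.
public static class DinoLossSketch
{
    // Numerically stable softmax of (logits - center) / temperature.
    public static double[] Softmax(double[] logits, double[] center, double temperature)
    {
        var p = new double[logits.Length];
        double max = double.NegativeInfinity;
        for (int k = 0; k < logits.Length; k++)
        {
            p[k] = (logits[k] - center[k]) / temperature;
            max = Math.Max(max, p[k]);
        }
        double sum = 0.0;
        for (int k = 0; k < p.Length; k++)
        {
            p[k] = Math.Exp(p[k] - max); // shift by max to avoid overflow
            sum += p[k];
        }
        for (int k = 0; k < p.Length; k++) p[k] /= sum;
        return p;
    }

    // H(P_t, P_s) = -sum_k P_t[k] * log P_s[k] for one (teacher view, student view) pair.
    public static double PairLoss(double[] teacherLogits, double[] studentLogits,
                                  double[] center, double tauTeacher, double tauStudent)
    {
        double[] pT = Softmax(teacherLogits, center, tauTeacher);                      // centered + sharpened
        double[] pS = Softmax(studentLogits, new double[studentLogits.Length], tauStudent); // no centering
        double loss = 0.0;
        for (int k = 0; k < pT.Length; k++)
            loss -= pT[k] * Math.Log(Math.Max(pS[k], 1e-12)); // clamp guards against log(0)
        return loss;
    }

    // c <- momentum * c + (1 - momentum) * (batch mean of the raw teacher logits).
    public static void UpdateCenter(double[] center, double[][] teacherBatch, double momentum)
    {
        for (int k = 0; k < center.Length; k++)
        {
            double mean = 0.0;
            foreach (var row in teacherBatch) mean += row[k];
            mean /= teacherBatch.Length;
            center[k] = momentum * center[k] + (1.0 - momentum) * mean;
        }
    }
}
```

Centering alone pushes the teacher toward a uniform output, while sharpening alone pushes it toward a one-hot output; using both keeps the two collapse modes in balance.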

Reference: Caron et al., "Emerging Properties in Self-Supervised Vision Transformers" (ICCV 2021)

Constructors

DINO(INeuralNetwork<T>, IMomentumEncoder<T>, IProjectorHead<T>, IProjectorHead<T>, int, SSLConfig?)

Initializes a new instance of the DINO class.

public DINO(INeuralNetwork<T> studentEncoder, IMomentumEncoder<T> teacherEncoder, IProjectorHead<T> studentProjector, IProjectorHead<T> teacherProjector, int outputDim = 65536, SSLConfig? config = null)

Parameters

studentEncoder INeuralNetwork<T>

The student encoder (ViT recommended).

teacherEncoder IMomentumEncoder<T>

The teacher encoder (momentum-updated copy).

studentProjector IProjectorHead<T>

Projection head for the student.

teacherProjector IProjectorHead<T>

Projection head for the teacher.

outputDim int

Output dimension of the projection heads, i.e. the size of the softmax distribution (default: 65536).

config SSLConfig

Optional SSL configuration.

Properties

Category

Gets the category of this SSL method.

public override SSLMethodCategory Category { get; }

Property Value

SSLMethodCategory

Remarks

Categories include Contrastive, NonContrastive, Generative, and SelfDistillation.

Name

Gets the name of this SSL method.

public override string Name { get; }

Property Value

string

Remarks

Examples: "SimCLR", "MoCo v2", "BYOL", "DINO", "MAE"

Methods

Create(INeuralNetwork<T>, Func<INeuralNetwork<T>, INeuralNetwork<T>>, int, int, int, int)

Creates a DINO instance with default configuration.

public static DINO<T> Create(INeuralNetwork<T> encoder, Func<INeuralNetwork<T>, INeuralNetwork<T>> createEncoderCopy, int encoderOutputDim, int projectionDim = 256, int hiddenDim = 2048, int outputDim = 65536)

Parameters

encoder INeuralNetwork<T>

The backbone encoder (ViT recommended).

createEncoderCopy Func<INeuralNetwork<T>, INeuralNetwork<T>>

Function that creates a copy of the encoder to serve as the teacher.

encoderOutputDim int

Output dimension of the encoder.

projectionDim int

Dimension of the projection space (default: 256).

hiddenDim int

Hidden dimension of the projector MLP (default: 2048).

outputDim int

Output dimension of the projection head, i.e. the size of the softmax distribution (default: 65536).

Returns

DINO<T>

A configured DINO instance.
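To see how `encoderOutputDim`, `hiddenDim`, `projectionDim`, and `outputDim` chain together, the sketch below counts head parameters. It ASSUMES the head follows the layout from the DINO paper (a three-layer MLP ending at `projectionDim`, then a bias-free linear layer to `outputDim`, ignoring any weight-normalization scale); this layout is not confirmed for AiDotNet's projector, and `HeadSizeSketch` is a hypothetical name.

```csharp
using System;

// Hypothetical sketch: parameter counts for a DINO-style projection head,
// ASSUMING the paper's layout (not confirmed for AiDotNet):
// Linear(in, hidden) -> Linear(hidden, hidden) -> Linear(hidden, projection),
// followed by a bias-free Linear(projection, output) producing the softmax logits.
public static class HeadSizeSketch
{
    // Weights plus optional bias for a fully connected layer.
    static long Linear(long inDim, long outDim, bool bias) =>
        inDim * outDim + (bias ? outDim : 0);

    public static (long Mlp, long LastLayer) Count(
        int encoderOutputDim, int projectionDim = 256, int hiddenDim = 2048, int outputDim = 65536)
    {
        long mlp = Linear(encoderOutputDim, hiddenDim, true)
                 + Linear(hiddenDim, hiddenDim, true)
                 + Linear(hiddenDim, projectionDim, true);
        long last = Linear(projectionDim, outputDim, false);
        return (mlp, last);
    }
}
```

With `encoderOutputDim = 384` (an illustrative ViT-S-sized embedding) and the defaults above, the MLP holds 5,509,376 parameters while the final 256 × 65536 layer holds 16,777,216, so under this assumption `outputDim` dominates the head's size.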

TrainStepCore(Tensor<T>, SSLAugmentationContext<T>?)

Executes the DINO-specific logic for one training step on the given batch.

protected override SSLStepResult<T> TrainStepCore(Tensor<T> batch, SSLAugmentationContext<T>? augmentationContext)

Parameters

batch Tensor<T>

The input batch tensor.

augmentationContext SSLAugmentationContext<T>

Optional augmentation context.

Returns

SSLStepResult<T>

The result of the training step.