Class TrainingStage<T, TInput, TOutput>

Namespace
AiDotNet.Models.Options
Assembly
AiDotNet.dll

Represents a single stage in a training pipeline with comprehensive configuration options.

public class TrainingStage<T, TInput, TOutput>

Type Parameters

T

The numeric data type used for calculations.

TInput

The input data type for the model.

TOutput

The output data type for the model.

Inheritance
TrainingStage<T, TInput, TOutput>
Remarks

A training stage encapsulates all configuration needed for one step in a multi-stage training pipeline. Each stage can have its own:

  • Training method (SFT, DPO, RLHF, etc.)
  • Optimizer and learning rate
  • Dataset and validation data
  • Layer freezing and LoRA configuration
  • Scheduler and warmup settings
  • Early stopping and checkpointing

For Beginners: Think of each stage as a chapter in a training book. Each chapter teaches the model something different, and you can configure exactly how that teaching happens.
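
As a rough illustration of the per-stage configuration described above, the sketch below builds two stages with object initializers. It uses only properties documented on this page. The type arguments (double, double[], double[]), the stage names, and the availability of a public parameterless constructor are assumptions made for the example, not something this page confirms.

```csharp
using System;
using AiDotNet.Models.Options;

// Assumption: TrainingStage exposes a public parameterless constructor and
// accepts these illustrative type arguments.
var sftStage = new TrainingStage<double, double[], double[]>
{
    Name = "sft-warmup",                      // hypothetical stage name
    Description = "Supervised fine-tuning on instruction data",
    Epochs = 3,
    BatchSize = 8,
    GradientAccumulationSteps = 4,            // effective batch size 8 * 4 = 32
    LearningRate = 2e-5,
    WarmupRatio = 0.1,
    UseLoRA = true,
    LoRARank = 16,
    LoRAAlpha = 32,
    LoRADropout = 0.05,
};

var evalStage = new TrainingStage<double, double[], double[]>
{
    Name = "final-eval",                      // hypothetical stage name
    IsEvaluationOnly = true,
    // Run only when a previous stage produced a result (null for the first stage).
    RunCondition = previous => previous != null,
};

Console.WriteLine($"{sftStage.Name}: lr={sftStage.LearningRate}, LoRA rank={sftStage.LoRARank}");
Console.WriteLine($"{evalStage.Name}: evaluation only = {evalStage.IsEvaluationOnly}");
```

How these stages are handed to a pipeline is outside the scope of this page, so the sketch stops at construction.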

Properties

AdamBeta1

Gets or sets the Adam beta1 parameter (momentum).

public double AdamBeta1 { get; set; }

Property Value

double

Remarks

Default is 0.9, standard for Adam/AdamW.

AdamBeta2

Gets or sets the Adam beta2 parameter (decay rate of the squared-gradient average, as in RMSprop).

public double AdamBeta2 { get; set; }

Property Value

double

Remarks

Default is 0.999, standard for Adam/AdamW.

AdamEpsilon

Gets or sets the Adam epsilon for numerical stability.

public double AdamEpsilon { get; set; }

Property Value

double

Remarks

Default is 1e-8, standard for Adam/AdamW.

BatchSize

Gets or sets the batch size for this stage.

public int BatchSize { get; set; }

Property Value

int

Remarks

Overrides the value in Options if set. Default is 8.

BestCheckpointMetric

Gets or sets the metric to use for determining the best checkpoint.

public CheckpointMetricType BestCheckpointMetric { get; set; }

Property Value

CheckpointMetricType

Remarks

Default is Loss (lower is better).

BestCheckpointMetricMaximize

Gets or sets whether higher is better for the best checkpoint metric.

public bool BestCheckpointMetricMaximize { get; set; }

Property Value

bool

Remarks

Default is false (lower is better, appropriate for Loss). Set to true for metrics like Accuracy, F1, BLEU, etc.
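
The interaction between the metric value and this flag can be sketched as a single comparison. IsBetter is a hypothetical helper written for illustration; it is not part of the documented API, and the library's own comparison may handle ties or missing values differently.

```csharp
using System;

// Hypothetical helper: decides whether a new metric value beats the best one so far.
// maximize corresponds to BestCheckpointMetricMaximize.
static bool IsBetter(double candidate, double best, bool maximize) =>
    maximize ? candidate > best : candidate < best;

// Loss dropped from 0.50 to 0.42 (lower is better).
Console.WriteLine(IsBetter(0.42, 0.50, maximize: false));
// Accuracy rose from 0.88 to 0.91 (higher is better).
Console.WriteLine(IsBetter(0.91, 0.88, maximize: true));
```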

Callbacks

Gets or sets stage-specific callbacks.

public StageCallbacks<T, TInput, TOutput>? Callbacks { get; set; }

Property Value

StageCallbacks<T, TInput, TOutput>

CheckpointSaveEpochs

Gets or sets the checkpoint save interval (in epochs).

public int CheckpointSaveEpochs { get; set; }

Property Value

int

Remarks

Default is 1 (save every epoch). Set to 0 to use step-based saving only.

CheckpointSaveSteps

Gets or sets the checkpoint save interval (in steps).

public int CheckpointSaveSteps { get; set; }

Property Value

int

Remarks

Default is 500 steps. Set to 0 to only save at epoch boundaries.

ConstitutionalPrinciples

Gets or sets the constitutional principles for CAI stages.

public string[] ConstitutionalPrinciples { get; set; }

Property Value

string[]

Remarks

Default includes standard HHH (Helpful, Harmless, Honest) principles.

ContrastiveMargin

Gets or sets the margin for contrastive preference methods.

public double ContrastiveMargin { get; set; }

Property Value

double

Remarks

Default is 0.0. Used by contrastive methods like CPO.

CritiqueModel

Gets or sets the model to use for generating critiques.

public IFullModel<T, TInput, TOutput>? CritiqueModel { get; set; }

Property Value

IFullModel<T, TInput, TOutput>

Remarks

If null, uses the model being trained for self-critique.

CritiqueRevisionRounds

Gets or sets the number of critique-revision rounds.

public int CritiqueRevisionRounds { get; set; }

Property Value

int

Remarks

Default is 2. More rounds = more refined responses but slower training.

CustomEvaluationFunction

Gets or sets the custom evaluation function for this stage.

public Func<IFullModel<T, TInput, TOutput>, FineTuningData<T, TInput, TOutput>?, Task<Dictionary<string, double>>>? CustomEvaluationFunction { get; set; }

Property Value

Func<IFullModel<T, TInput, TOutput>, FineTuningData<T, TInput, TOutput>, Task<Dictionary<string, double>>>

CustomLossFunction

Gets or sets the custom loss function for this stage.

public ILossFunction<T>? CustomLossFunction { get; set; }

Property Value

ILossFunction<T>

CustomMetricName

Gets or sets the custom metric name used when BestCheckpointMetric is Custom.

public string CustomMetricName { get; set; }

Property Value

string

Remarks

Only used when BestCheckpointMetric is set to Custom.

CustomTrainingFunction

Gets or sets the custom training function for custom stages.

public Func<IFullModel<T, TInput, TOutput>, FineTuningData<T, TInput, TOutput>, CancellationToken, Task<IFullModel<T, TInput, TOutput>>>? CustomTrainingFunction { get; set; }

Property Value

Func<IFullModel<T, TInput, TOutput>, FineTuningData<T, TInput, TOutput>, CancellationToken, Task<IFullModel<T, TInput, TOutput>>>

DPOBeta

Gets or sets the beta parameter for DPO/IPO loss.

public double DPOBeta { get; set; }

Property Value

double

Remarks

Default is 0.1. Controls the strength of the KL constraint. Typical range: 0.01-0.5.
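
To show where beta enters, here is a minimal sketch of the standard DPO sigmoid loss for a single preference pair. DpoLoss and its parameter names are hypothetical and the formula is the textbook one; this page does not confirm that the library computes it in exactly this form.

```csharp
using System;

// Sketch of the standard DPO sigmoid loss for one (chosen, rejected) pair.
// Inputs are sequence log-probabilities under the policy and the reference model.
static double DpoLoss(double policyChosen, double policyRejected,
                      double refChosen, double refRejected, double beta)
{
    // How much more the policy prefers the chosen response than the reference does.
    double margin = (policyChosen - refChosen) - (policyRejected - refRejected);
    double z = beta * margin;
    // -log(sigmoid(z)) = log(1 + exp(-z)), written in a numerically stable form.
    return Math.Max(-z, 0.0) + Math.Log(1.0 + Math.Exp(-Math.Abs(z)));
}

// Same log-probabilities, two beta values.
Console.WriteLine(DpoLoss(-1.0, -3.0, -2.0, -2.0, beta: 0.1));
Console.WriteLine(DpoLoss(-1.0, -3.0, -2.0, -2.0, beta: 0.5));
```

With a positive margin, a larger beta scales the margin up before the sigmoid, so the same pair yields a smaller loss.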

DataMixingRatios

Gets or sets the data mixing ratios used when combining multiple datasets.

public Dictionary<string, double>? DataMixingRatios { get; set; }

Property Value

Dictionary<string, double>

Remarks

Keys are dataset names/identifiers, values are sampling weights. For example: { "instruction": 0.5, "conversation": 0.3, "safety": 0.2 }
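
One way to read sampling weights is as probabilities after normalization. The sketch below maps a uniform random draw to a dataset name on that basis. PickDataset is a hypothetical helper, and an array of tuples is used instead of a dictionary only to keep the iteration order explicit; the library's actual sampling strategy is not described on this page.

```csharp
using System;
using System.Linq;

// Hypothetical helper: maps a uniform draw u in [0, 1) to a dataset name,
// with probability proportional to each weight.
static string PickDataset((string Name, double Weight)[] ratios, double u)
{
    double total = ratios.Sum(r => r.Weight);
    double cumulative = 0.0;
    foreach (var (name, weight) in ratios)
    {
        cumulative += weight / total;
        if (u < cumulative) return name;
    }
    // Guard against floating-point round-off in the cumulative sum.
    return ratios[ratios.Length - 1].Name;
}

(string Name, double Weight)[] mix =
{
    ("instruction", 0.5),
    ("conversation", 0.3),
    ("safety", 0.2),
};

var rng = new Random(42);
for (int i = 0; i < 5; i++)
{
    Console.WriteLine(PickDataset(mix, rng.NextDouble()));
}
```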

DataPreprocessor

Gets or sets a custom data preprocessing function for this stage.

public Func<FineTuningData<T, TInput, TOutput>, FineTuningData<T, TInput, TOutput>>? DataPreprocessor { get; set; }

Property Value

Func<FineTuningData<T, TInput, TOutput>, FineTuningData<T, TInput, TOutput>>

DataShuffleSeed

Gets or sets the random seed for data shuffling (for reproducibility).

public int? DataShuffleSeed { get; set; }

Property Value

int?

Description

Gets or sets a description of what this stage accomplishes.

public string? Description { get; set; }

Property Value

string

DistillationAlpha

Gets or sets the alpha for balancing hard vs soft targets.

public double DistillationAlpha { get; set; }

Property Value

double

Remarks

Default is 0.5. The combined loss is alpha * soft_loss + (1 - alpha) * hard_loss.

DistillationLayerMapping

Gets or sets the layer mapping for intermediate distillation.

public Dictionary<int, int> DistillationLayerMapping { get; set; }

Property Value

Dictionary<int, int>

Remarks

Maps teacher layers to student layers: { teacherLayerIdx: studentLayerIdx }. Default is empty (auto-map if UseIntermediateDistillation is true).

DistillationTemperature

Gets or sets the distillation temperature.

public double DistillationTemperature { get; set; }

Property Value

double

Remarks

Default is 2.0. Higher values produce softer probability distributions.
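
The softening effect can be seen with a temperature-scaled softmax, the usual way temperature is applied in knowledge distillation. SoftmaxWithTemperature and the sample logits are illustrative assumptions, not part of the documented API.

```csharp
using System;
using System.Linq;

// Softmax over logits divided by the temperature.
static double[] SoftmaxWithTemperature(double[] logits, double temperature)
{
    double[] scaled = logits.Select(z => z / temperature).ToArray();
    double max = scaled.Max(); // subtract the max for numerical stability
    double[] exps = scaled.Select(z => Math.Exp(z - max)).ToArray();
    double sum = exps.Sum();
    return exps.Select(e => e / sum).ToArray();
}

double[] sampleLogits = { 4.0, 2.0, 0.0 };
double[] sharp = SoftmaxWithTemperature(sampleLogits, 1.0);
double[] soft = SoftmaxWithTemperature(sampleLogits, 2.0);

Console.WriteLine($"T=1: largest probability = {sharp.Max():F3}");
Console.WriteLine($"T=2: largest probability = {soft.Max():F3}");
```

At the higher temperature the largest probability shrinks and the smaller ones grow, which exposes more of the teacher's relative preferences to the student.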

DistributedStrategy

Gets or sets the distributed training strategy for this stage.

public DistributedStrategy DistributedStrategy { get; set; }

Property Value

DistributedStrategy

Remarks

Default is None (single device). Use DataParallel or FSDP for multi-GPU.

EarlyStopping

Gets or sets early stopping configuration specific to this stage.

public EarlyStoppingConfig EarlyStopping { get; set; }

Property Value

EarlyStoppingConfig

Remarks

Default configuration with patience=5 and monitoring loss.

Enabled

Gets or sets whether this stage is enabled (skipped if false).

public bool Enabled { get; set; }

Property Value

bool

EntropyCoefficient

Gets or sets the entropy bonus coefficient for exploration.

public double EntropyCoefficient { get; set; }

Property Value

double

Remarks

Default is 0.01. Encourages exploration by penalizing deterministic policies.

Epochs

Gets or sets the number of epochs for this stage.

public int Epochs { get; set; }

Property Value

int

Remarks

Overrides the value in Options if set. Default is 3 epochs.

EvaluationSteps

Gets or sets the evaluation interval (in steps).

public int EvaluationSteps { get; set; }

Property Value

int

Remarks

Default is 100 steps. Run validation every N steps.

FineTuningMethod

Gets or sets the fine-tuning method to use in this stage.

public FineTuningMethodType FineTuningMethod { get; set; }

Property Value

FineTuningMethodType

FreezeBaseModel

Gets or sets whether to freeze the base model during this stage.

public bool FreezeBaseModel { get; set; }

Property Value

bool

Remarks

Default is false (train all layers). Set to true for LoRA or when using adapters.

FrozenLayers

Gets or sets layer names/patterns to freeze during this stage.

public string[] FrozenLayers { get; set; }

Property Value

string[]

Remarks

Supports patterns like "encoder.*", "layer.0-5.*", "embedding". Default is empty (no specific layers frozen unless FreezeBaseModel is true).

GAELambda

Gets or sets the GAE lambda for advantage estimation.

public double GAELambda { get; set; }

Property Value

double

Remarks

Default is 0.95. Controls bias-variance tradeoff in advantage estimation.

GRPOGroupSize

Gets or sets the group size for GRPO.

public int GRPOGroupSize { get; set; }

Property Value

int

Remarks

Default is 4. Number of responses to generate per prompt for group ranking.

GRPOUseRelativeRewards

Gets or sets whether to use relative rewards in GRPO.

public bool GRPOUseRelativeRewards { get; set; }

Property Value

bool

Remarks

Default is true. Normalize rewards within each group for stable training.
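
A common formulation of within-group normalization subtracts the group mean and divides by the group standard deviation. The sketch below assumes that formulation. NormalizeWithinGroup and the epsilon term are hypothetical, and the library may normalize differently.

```csharp
using System;
using System.Linq;

// Hypothetical helper: turns raw rewards for one prompt's group of responses
// into zero-mean, unit-scale relative rewards.
static double[] NormalizeWithinGroup(double[] rewards, double epsilon = 1e-8)
{
    double mean = rewards.Average();
    double variance = rewards.Select(r => (r - mean) * (r - mean)).Average();
    double std = Math.Sqrt(variance);
    // epsilon avoids division by zero when all rewards in the group are equal.
    return rewards.Select(r => (r - mean) / (std + epsilon)).ToArray();
}

// One prompt with four sampled responses (matching the default GRPOGroupSize of 4).
double[] groupRewards = { 1.0, 2.0, 3.0, 4.0 };
double[] relative = NormalizeWithinGroup(groupRewards);
Console.WriteLine(string.Join(", ", relative.Select(a => a.ToString("F3"))));
```

Responses that score above the group average get positive values and those below get negative values, regardless of the absolute reward scale.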

GradientAccumulationSteps

Gets or sets the number of gradient accumulation steps.

public int GradientAccumulationSteps { get; set; }

Property Value

int

Remarks

Allows larger effective batch sizes with limited memory. Effective batch size = BatchSize * GradientAccumulationSteps.
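
As a small worked example of that formula, the sketch below computes the effective batch size and the resulting number of optimizer updates per epoch. Both helpers are hypothetical, and counting a final partial accumulation window as one update is an assumption.

```csharp
using System;

// Effective batch size as stated in the remark above.
static int EffectiveBatchSize(int batchSize, int gradientAccumulationSteps) =>
    batchSize * gradientAccumulationSteps;

// Hypothetical helper: optimizer updates per epoch, counting a final partial window.
static int UpdatesPerEpoch(int sampleCount, int batchSize, int gradientAccumulationSteps) =>
    (int)Math.Ceiling((double)sampleCount / (batchSize * gradientAccumulationSteps));

Console.WriteLine(EffectiveBatchSize(8, 4));      // 8 * 4
Console.WriteLine(UpdatesPerEpoch(1000, 8, 4));   // ceil(1000 / 32)
```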

GradualUnfreezingInterval

Gets or sets the epoch interval for gradual unfreezing.

public int GradualUnfreezingInterval { get; set; }

Property Value

int

Remarks

Default is 1 epoch. Every N epochs, unfreeze one more layer group.

InitialLossScale

Gets or sets the initial loss scale for mixed precision.

public double InitialLossScale { get; set; }

Property Value

double

Remarks

Default is 65536.0. Starting scale for dynamic loss scaling.

IsEvaluationOnly

Gets or sets whether this stage is evaluation-only (no training).

public bool IsEvaluationOnly { get; set; }

Property Value

bool

KLPenaltyCoefficient

Gets or sets the KL penalty coefficient for RLHF.

public double KLPenaltyCoefficient { get; set; }

Property Value

double

Remarks

Default is 0.01. Prevents the policy from diverging too far from the reference.

LearningRate

Gets or sets the learning rate for this stage.

public double LearningRate { get; set; }

Property Value

double

Remarks

Default is 2e-5, a common choice for fine-tuning pre-trained models.

LoRAAlpha

Gets or sets the LoRA alpha scaling factor.

public double LoRAAlpha { get; set; }

Property Value

double

Remarks

Default is 32 (2x rank). Effective scaling = alpha / rank.

LoRADropout

Gets or sets the LoRA dropout rate.

public double LoRADropout { get; set; }

Property Value

double

Remarks

Default is 0.05 (5%). Light dropout helps regularization.

LoRARank

Gets or sets the LoRA rank (dimension of low-rank matrices).

public int LoRARank { get; set; }

Property Value

int

Remarks

Default is 16. Common values: 4, 8, 16, 32, 64. Higher = more capacity but more parameters.
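
The relationship between LoRAAlpha and LoRARank, and the parameter cost of a higher rank, can be sketched with two one-line helpers. Both are hypothetical. The parameter count assumes the usual LoRA factorization of a dOut x dIn weight into a dOut x rank matrix and a rank x dIn matrix, and the 4096 x 4096 projection size is an illustrative choice.

```csharp
using System;

// Effective scaling = alpha / rank, as stated in the LoRAAlpha remarks.
static double LoRAScaling(double alpha, int rank) => alpha / rank;

// Trainable parameters added to one dOut x dIn weight matrix:
// a (dOut x rank) matrix plus a (rank x dIn) matrix.
static long LoRAParameterCount(int dIn, int dOut, int rank) =>
    (long)rank * (dIn + dOut);

Console.WriteLine(LoRAScaling(32, 16));
foreach (int rank in new[] { 4, 8, 16, 32, 64 })
{
    Console.WriteLine($"rank {rank}: {LoRAParameterCount(4096, 4096, rank)} parameters per matrix");
}
```

The parameter count grows linearly with rank, which is the trade-off the remark above refers to.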

LoRATargetModules

Gets or sets which modules to apply LoRA to.

public string[] LoRATargetModules { get; set; }

Property Value

string[]

Remarks

Default targets query and value projections. Common patterns: ["q_proj", "v_proj"], ["query", "key", "value", "output"].

LogGradientNorms

Gets or sets whether to log gradient norms.

public bool LogGradientNorms { get; set; }

Property Value

bool

Remarks

Default is false. Enable to debug gradient issues.

LogLearningRate

Gets or sets whether to log the learning rate.

public bool LogLearningRate { get; set; }

Property Value

bool

Remarks

Default is true. Useful for verifying scheduler behavior.

LoggingSteps

Gets or sets the logging interval (in steps).

public int LoggingSteps { get; set; }

Property Value

int

Remarks

Default is 10 steps. Log training metrics every N steps.

MaxCheckpointsToKeep

Gets or sets the maximum number of checkpoints to keep.

public int MaxCheckpointsToKeep { get; set; }

Property Value

int

Remarks

Default is 3. Older checkpoints are deleted to save disk space.

MaxDuration

Gets or sets the maximum duration for this stage.

public TimeSpan MaxDuration { get; set; }

Property Value

TimeSpan

Remarks

Default is 24 hours. Set to TimeSpan.MaxValue for no limit.

MaxGradientNorm

Gets or sets the maximum gradient norm for gradient clipping.

public double MaxGradientNorm { get; set; }

Property Value

double

Remarks

Default is 1.0. Set to 0 to disable gradient clipping.
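
Gradient clipping by global norm is usually implemented as a rescaling of the whole gradient vector. The sketch below follows that convention and treats 0 as "disabled", as the remark above states. ClipByGlobalNorm is a hypothetical helper, and using the L2 norm over a flat array is an assumption about how the norm is measured.

```csharp
using System;
using System.Linq;

// Hypothetical helper: rescales gradients so their L2 norm does not exceed maxNorm.
static double[] ClipByGlobalNorm(double[] gradients, double maxNorm)
{
    if (maxNorm <= 0) return gradients;        // 0 disables clipping
    double norm = Math.Sqrt(gradients.Sum(g => g * g));
    if (norm <= maxNorm) return gradients;     // already within the limit
    double scale = maxNorm / norm;
    return gradients.Select(g => g * scale).ToArray();
}

double[] rawGradients = { 3.0, 4.0 };          // L2 norm is 5
double[] clippedGradients = ClipByGlobalNorm(rawGradients, 1.0);
Console.WriteLine(string.Join(", ", clippedGradients));
```

Rescaling keeps the gradient direction intact and only shortens its length.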

MaxSteps

Gets or sets the maximum number of steps for this stage.

public int MaxSteps { get; set; }

Property Value

int

Remarks

Default is 0 (no step limit, use Epochs instead).

MaxTrainingSamples

Gets or sets the maximum number of training samples to use.

public int? MaxTrainingSamples { get; set; }

Property Value

int?

Remarks

Useful for limiting data in curriculum learning or quick experiments.

MergeLoRAAfterTraining

Gets or sets whether to merge LoRA weights into the base model after training.

public bool MergeLoRAAfterTraining { get; set; }

Property Value

bool

Remarks

Default is false. Set to true to produce a merged model for deployment.

Metadata

Gets or sets custom metadata for this stage.

public Dictionary<string, object> Metadata { get; set; }

Property Value

Dictionary<string, object>

Remarks

Empty by default. Use to store custom key-value pairs.

MetricsToTrack

Gets or sets the metrics to track during this stage.

public string[] MetricsToTrack { get; set; }

Property Value

string[]

Remarks

Default includes loss and perplexity.

MinLearningRate

Gets or sets the minimum learning rate (for schedulers with decay).

public double MinLearningRate { get; set; }

Property Value

double

Remarks

Default is 0 (learning rate can decay to zero).

MixedPrecisionDType

Gets or sets the mixed precision data type.

public MixedPrecisionType MixedPrecisionDType { get; set; }

Property Value

MixedPrecisionType

Remarks

Default is FP16 for broad compatibility. BF16 is generally preferable on Ampere and newer GPUs.

Name

Gets or sets the name of this stage for logging and identification.

public string Name { get; set; }

Property Value

string

NumCycles

Gets or sets the number of cycles for cosine scheduler with restarts.

public int NumCycles { get; set; }

Property Value

int

Remarks

Default is 1 (no restarts, single decay to min learning rate).

OptimizerOverride

Gets or sets the optimizer type override for this stage.

public OptimizerType OptimizerOverride { get; set; }

Property Value

OptimizerType

Remarks

Default is AdamW, the standard choice for fine-tuning.

Options

Gets or sets the fine-tuning options for this stage.

public FineTuningOptions<T>? Options { get; set; }

Property Value

FineTuningOptions<T>

PPOClipRange

Gets or sets the PPO clip range.

public double PPOClipRange { get; set; }

Property Value

double

Remarks

Default is 0.2. Limits how much the policy can change per update.
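
The limit on policy change comes from the clipped surrogate objective used by PPO. A minimal per-sample sketch follows. ClippedSurrogate is a hypothetical helper implementing the standard formula, in which ratio is the new-policy probability divided by the old-policy probability; this page does not confirm the library's exact implementation.

```csharp
using System;

// Standard PPO clipped surrogate for a single sample (to be maximized).
static double ClippedSurrogate(double ratio, double advantage, double clipRange)
{
    double clipped = Math.Clamp(ratio, 1.0 - clipRange, 1.0 + clipRange);
    // Taking the minimum removes any incentive to push the ratio outside the clip range.
    return Math.Min(ratio * advantage, clipped * advantage);
}

// Positive advantage: gains are capped once the ratio exceeds 1 + clipRange.
Console.WriteLine(ClippedSurrogate(1.5, 1.0, 0.2));
// Negative advantage: the penalty is not softened when the ratio falls below 1 - clipRange.
Console.WriteLine(ClippedSurrogate(0.5, -1.0, 0.2));
```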

PPOEpochsPerBatch

Gets or sets the number of PPO epochs per batch.

public int PPOEpochsPerBatch { get; set; }

Property Value

int

Remarks

Default is 4. Number of times to reuse collected experiences.

PreferenceLabelSmoothing

Gets or sets the label smoothing factor for preference learning.

public double PreferenceLabelSmoothing { get; set; }

Property Value

double

Remarks

Default is 0.0 (no smoothing). Values like 0.1 can help with noisy preferences.

PreferenceLossType

Gets or sets the loss type for preference optimization.

public PreferenceLossType PreferenceLossType { get; set; }

Property Value

PreferenceLossType

Remarks

Default is Sigmoid (standard DPO).

QLoRABits

Gets or sets the quantization bits for QLoRA.

public int QLoRABits { get; set; }

Property Value

int

Remarks

Default is 4 bits (most memory efficient). Use 8 for higher precision.

RandomSeed

Gets or sets the random seed for this stage.

public int RandomSeed { get; set; }

Property Value

int

Remarks

Default is 42. Set to different values for different runs.

ReferenceModelUpdateInterval

Gets or sets the interval (in steps) for updating the reference model.

public int ReferenceModelUpdateInterval { get; set; }

Property Value

int

Remarks

Default is 100 steps. Only used when UpdateReferenceModel is true.

RejectionSamplingMinReward

Gets or sets the minimum reward threshold for rejection sampling.

public double RejectionSamplingMinReward { get; set; }

Property Value

double

Remarks

Default is 0.0. Only keep responses with reward above this threshold.

RejectionSamplingN

Gets or sets the number of samples to generate for rejection sampling.

public int RejectionSamplingN { get; set; }

Property Value

int

Remarks

Default is 10. Generate N responses and select the best ones.

RejectionSamplingTopK

Gets or sets the top-K samples to keep from rejection sampling.

public int RejectionSamplingTopK { get; set; }

Property Value

int

Remarks

Default is 1. Keep only the best response per prompt.

RewardModel

Gets or sets the reward model to use for RLHF stages.

public IFullModel<T, TInput, TOutput>? RewardModel { get; set; }

Property Value

IFullModel<T, TInput, TOutput>

Remarks

Required for PPO/RLHF. Can be null if using a reward-free method like DPO.

RolloutSamples

Gets or sets the number of rollout samples per update.

public int RolloutSamples { get; set; }

Property Value

int

Remarks

Default is 2048. Number of environment steps to collect before each PPO update.

RunCondition

Gets or sets the condition that must be met to run this stage.

public Func<TrainingStageResult<T, TInput, TOutput>?, bool>? RunCondition { get; set; }

Property Value

Func<TrainingStageResult<T, TInput, TOutput>, bool>

Remarks

If the condition returns false, the stage is skipped. Receives the result of the previous stage (null for first stage).

SaveCheckpointAfter

Gets or sets whether to save a checkpoint after this stage.

public bool SaveCheckpointAfter { get; set; }

Property Value

bool

Remarks

Default is true, so a checkpoint is saved after every stage and training can be recovered from the last completed stage.

SaveOnlyBest

Gets or sets whether to save only the best checkpoint based on validation metrics.

public bool SaveOnlyBest { get; set; }

Property Value

bool

Remarks

Default is false. When true, only the checkpoint with the best metric value is kept.

SchedulerPower

Gets or sets the power for polynomial decay scheduler.

public double SchedulerPower { get; set; }

Property Value

double

Remarks

Default is 1.0 (linear decay). Higher values = faster initial decay.
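
The effect of the power can be seen in a common form of polynomial decay, sketched below. PolynomialDecay is a hypothetical helper, and the formula (decaying from the base rate to MinLearningRate over totalSteps) is an assumption about how the scheduler combines these settings.

```csharp
using System;

// Common polynomial decay: lr = (base - min) * (1 - step / total)^power + min.
static double PolynomialDecay(double baseLr, double minLr, int step, int totalSteps, double power)
{
    double progress = Math.Min((double)step / totalSteps, 1.0);
    return (baseLr - minLr) * Math.Pow(1.0 - progress, power) + minLr;
}

// Halfway through training with the default learning rate of 2e-5.
Console.WriteLine(PolynomialDecay(2e-5, 0.0, 50, 100, power: 1.0)); // linear
Console.WriteLine(PolynomialDecay(2e-5, 0.0, 50, 100, power: 2.0)); // quadratic
```

With power 2.0 the rate at the midpoint is already a quarter of the base rate, compared with half under linear decay.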

SchedulerType

Gets or sets the learning rate scheduler type.

public LearningRateSchedulerType SchedulerType { get; set; }

Property Value

LearningRateSchedulerType

Remarks

Default is CosineAnnealing, which works well for most fine-tuning scenarios.

SelfPlayIterations

Gets or sets the number of self-play iterations.

public int SelfPlayIterations { get; set; }

Property Value

int

Remarks

Default is 3. Number of self-play rounds per training cycle.

SelfPlayResponsesPerPrompt

Gets or sets the number of responses to generate per prompt in self-play.

public int SelfPlayResponsesPerPrompt { get; set; }

Property Value

int

Remarks

Default is 4. More responses = better coverage but slower training.

SelfPlayTemperature

Gets or sets the generation temperature for self-play responses.

public double SelfPlayTemperature { get; set; }

Property Value

double

Remarks

Default is 0.7. Higher values = more diverse responses, lower = more focused.

ShareReferenceModel

Gets or sets whether to share the reference model with the training model.

public bool ShareReferenceModel { get; set; }

Property Value

bool

Remarks

Default is true (memory efficient). If false, loads a separate copy.

ShuffleData

Gets or sets whether to shuffle the training data each epoch.

public bool ShuffleData { get; set; }

Property Value

bool

StageType

Gets or sets the type of training stage.

public TrainingStageType StageType { get; set; }

Property Value

TrainingStageType

SyncBatchNorm

Gets or sets whether to sync batch normalization across devices.

public bool SyncBatchNorm { get; set; }

Property Value

bool

Remarks

Default is false. Enable for better accuracy in distributed training.

Tags

Gets or sets tags for categorizing this stage.

public string[] Tags { get; set; }

Property Value

string[]

Remarks

Empty by default. Useful for filtering and organizing stages.

TeacherModel

Gets or sets the teacher model for distillation stages.

public IFullModel<T, TInput, TOutput>? TeacherModel { get; set; }

Property Value

IFullModel<T, TInput, TOutput>

Remarks

Required for knowledge distillation. The larger model to distill from.

TrainableLayers

Gets or sets layer names/patterns to unfreeze (train) during this stage.

public string[] TrainableLayers { get; set; }

Property Value

string[]

Remarks

If FreezeBaseModel is true, only these layers will be trained. Supports patterns like "classifier", "lm_head", "layer.10-11.*". Default is empty (all unfrozen layers are trainable).

TrainingData

Gets or sets the training data for this stage.

public FineTuningData<T, TInput, TOutput>? TrainingData { get; set; }

Property Value

FineTuningData<T, TInput, TOutput>

UnfreezeTopNLayers

Gets or sets the number of layers to unfreeze from the top.

public int UnfreezeTopNLayers { get; set; }

Property Value

int

Remarks

Default is 0 (use FrozenLayers/TrainableLayers patterns instead). Common approach: freeze most layers, train only top N layers.

UpdateReferenceModel

Gets or sets whether to update the reference model periodically.

public bool UpdateReferenceModel { get; set; }

Property Value

bool

Remarks

Default is false (frozen reference model).

UseDeterministicAlgorithms

Gets or sets whether to use deterministic algorithms (may be slower).

public bool UseDeterministicAlgorithms { get; set; }

Property Value

bool

Remarks

Default is false. Enable for exact reproducibility at the cost of speed.

UseDynamicLossScaling

Gets or sets whether to use dynamic loss scaling for mixed precision.

public bool UseDynamicLossScaling { get; set; }

Property Value

bool

Remarks

Default is true. Required for FP16, optional for BF16.

UseGradientCheckpointing

Gets or sets whether to use gradient checkpointing to save memory.

public bool UseGradientCheckpointing { get; set; }

Property Value

bool

UseGradualUnfreezing

Gets or sets whether to gradually unfreeze layers during training.

public bool UseGradualUnfreezing { get; set; }

Property Value

bool

Remarks

Default is false. When true, layers are unfrozen progressively during training.

UseIntermediateDistillation

Gets or sets whether to use intermediate layer distillation.

public bool UseIntermediateDistillation { get; set; }

Property Value

bool

Remarks

Default is false. When true, also distills intermediate representations.

UseLoRA

Gets or sets whether to use LoRA (Low-Rank Adaptation) for this stage.

public bool UseLoRA { get; set; }

Property Value

bool

Remarks

Default is false (full fine-tuning). Set to true for parameter-efficient training.

UseMixedPrecision

Gets or sets whether to use mixed precision training (FP16/BF16).

public bool UseMixedPrecision { get; set; }

Property Value

bool

Remarks

Default is false (FP32). Enable for faster training with lower memory.

UseQLoRA

Gets or sets whether to use QLoRA (quantized LoRA) for memory efficiency.

public bool UseQLoRA { get; set; }

Property Value

bool

Remarks

Default is false. Set to true for 4-bit or 8-bit quantized training.

UseReferenceModel

Gets or sets whether to use a reference model for preference methods.

public bool UseReferenceModel { get; set; }

Property Value

bool

Remarks

Default is true (use a reference model for the KL constraint). Required for DPO, IPO, and similar methods; not required for SimPO or ORPO.

ValidationData

Gets or sets the validation data for this stage.

public FineTuningData<T, TInput, TOutput>? ValidationData { get; set; }

Property Value

FineTuningData<T, TInput, TOutput>

ValueFunctionCoefficient

Gets or sets the value function coefficient for PPO.

public double ValueFunctionCoefficient { get; set; }

Property Value

double

Remarks

Default is 0.5. Weight of value loss relative to policy loss.

WarmupRatio

Gets or sets the warmup ratio (fraction of total steps for warmup).

public double WarmupRatio { get; set; }

Property Value

double

Remarks

Default is 0.1 (10% of training for warmup). If WarmupSteps is set, this is ignored.

WarmupSteps

Gets or sets the number of warmup steps.

public int WarmupSteps { get; set; }

Property Value

int

Remarks

Default is 0. Set this or WarmupRatio, not both.
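
The precedence between WarmupSteps and WarmupRatio can be sketched as follows. Both helpers are hypothetical. Treating "set" as "greater than zero", rounding the ratio-based count to the nearest step, and using a linear ramp during warmup are all assumptions made for illustration.

```csharp
using System;

// Hypothetical helper: an explicit WarmupSteps value wins; otherwise derive from the ratio.
static int ResolveWarmupSteps(int warmupSteps, double warmupRatio, int totalSteps) =>
    warmupSteps > 0 ? warmupSteps : (int)Math.Round(warmupRatio * totalSteps);

// Hypothetical helper: linear ramp from a small value up to baseLr over the warmup period.
static double WarmupLearningRate(double baseLr, int step, int warmup) =>
    warmup > 0 && step < warmup ? baseLr * (step + 1) / warmup : baseLr;

int resolved = ResolveWarmupSteps(warmupSteps: 0, warmupRatio: 0.1, totalSteps: 1000);
Console.WriteLine($"warmup steps: {resolved}");
Console.WriteLine($"lr at step 0: {WarmupLearningRate(2e-5, 0, resolved)}");
Console.WriteLine($"lr after warmup: {WarmupLearningRate(2e-5, resolved, resolved)}");
```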

WeightDecay

Gets or sets the weight decay (L2 regularization) coefficient.

public double WeightDecay { get; set; }

Property Value

double

Remarks

Default is 0.01, standard for AdamW.