Class TrainingStage<T, TInput, TOutput>
Represents a single stage in a training pipeline with comprehensive configuration options.
public class TrainingStage<T, TInput, TOutput>
Type Parameters
T: The numeric data type used for calculations.
TInput: The input data type for the model.
TOutput: The output data type for the model.
Inheritance
- object
- TrainingStage<T, TInput, TOutput>
Remarks
A training stage encapsulates all configuration needed for one step in a multi-stage training pipeline. Each stage can have its own:
- Training method (SFT, DPO, RLHF, etc.)
- Optimizer and learning rate
- Dataset and validation data
- Layer freezing and LoRA configuration
- Scheduler and warmup settings
- Early stopping and checkpointing
For Beginners: Think of each stage as a chapter in a training book. Each chapter teaches the model something different, and you can configure exactly how that teaching happens.
Properties
AdamBeta1
Gets or sets the Adam beta1 parameter (momentum).
public double AdamBeta1 { get; set; }
Property Value
- double
Remarks
Default is 0.9, standard for Adam/AdamW.
AdamBeta2
Gets or sets the Adam beta2 parameter (decay rate for the second-moment estimate, similar to RMSprop).
public double AdamBeta2 { get; set; }
Property Value
- double
Remarks
Default is 0.999, standard for Adam/AdamW.
AdamEpsilon
Gets or sets the Adam epsilon for numerical stability.
public double AdamEpsilon { get; set; }
Property Value
- double
Remarks
Default is 1e-8, standard for Adam/AdamW.
BatchSize
Gets or sets the batch size for this stage.
public int BatchSize { get; set; }
Property Value
- int
Remarks
Overrides the value in Options if set. Default is 8.
BestCheckpointMetric
Gets or sets the metric to use for determining the best checkpoint.
public CheckpointMetricType BestCheckpointMetric { get; set; }
Property Value
- CheckpointMetricType
Remarks
Default is Loss (lower is better).
BestCheckpointMetricMaximize
Gets or sets whether higher is better for the best checkpoint metric.
public bool BestCheckpointMetricMaximize { get; set; }
Property Value
- bool
Remarks
Default is false (lower is better, appropriate for Loss). Set to true for metrics like Accuracy, F1, BLEU, etc.
Callbacks
Gets or sets stage-specific callbacks.
public StageCallbacks<T, TInput, TOutput>? Callbacks { get; set; }
Property Value
- StageCallbacks<T, TInput, TOutput>
CheckpointSaveEpochs
Gets or sets the checkpoint save interval (in epochs).
public int CheckpointSaveEpochs { get; set; }
Property Value
- int
Remarks
Default is 1 (save every epoch). Set to 0 to use step-based saving only.
CheckpointSaveSteps
Gets or sets the checkpoint save interval (in steps).
public int CheckpointSaveSteps { get; set; }
Property Value
- int
Remarks
Default is 500 steps. Set to 0 to only save at epoch boundaries.
ConstitutionalPrinciples
Gets or sets the constitutional principles for CAI stages.
public string[] ConstitutionalPrinciples { get; set; }
Property Value
- string[]
Remarks
Default includes standard HHH (Helpful, Harmless, Honest) principles.
ContrastiveMargin
Gets or sets the margin for contrastive preference methods.
public double ContrastiveMargin { get; set; }
Property Value
- double
Remarks
Default is 0.0. Used by contrastive methods like CPO.
CritiqueModel
Gets or sets the model to use for generating critiques.
public IFullModel<T, TInput, TOutput>? CritiqueModel { get; set; }
Property Value
- IFullModel<T, TInput, TOutput>
Remarks
If null, uses the model being trained for self-critique.
CritiqueRevisionRounds
Gets or sets the number of critique-revision rounds.
public int CritiqueRevisionRounds { get; set; }
Property Value
- int
Remarks
Default is 2. More rounds produce more refined responses but slow down training.
CustomEvaluationFunction
Gets or sets the custom evaluation function for this stage.
public Func<IFullModel<T, TInput, TOutput>, FineTuningData<T, TInput, TOutput>?, Task<Dictionary<string, double>>>? CustomEvaluationFunction { get; set; }
Property Value
- Func<IFullModel<T, TInput, TOutput>, FineTuningData<T, TInput, TOutput>, Task<Dictionary<string, double>>>
CustomLossFunction
Gets or sets the custom loss function for this stage.
public ILossFunction<T>? CustomLossFunction { get; set; }
Property Value
- ILossFunction<T>
CustomMetricName
Gets or sets the custom metric name used when BestCheckpointMetric is Custom.
public string CustomMetricName { get; set; }
Property Value
- string
Remarks
Only used when BestCheckpointMetric is set to Custom.
CustomTrainingFunction
Gets or sets the custom training function for custom stages.
public Func<IFullModel<T, TInput, TOutput>, FineTuningData<T, TInput, TOutput>, CancellationToken, Task<IFullModel<T, TInput, TOutput>>>? CustomTrainingFunction { get; set; }
Property Value
- Func<IFullModel<T, TInput, TOutput>, FineTuningData<T, TInput, TOutput>, CancellationToken, Task<IFullModel<T, TInput, TOutput>>>
DPOBeta
Gets or sets the beta parameter for DPO/IPO loss.
public double DPOBeta { get; set; }
Property Value
- double
Remarks
Default is 0.1. Controls the strength of the KL constraint. Typical range: 0.01-0.5.
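For orientation, the standard DPO objective in which beta appears can be written as below. This is the textbook form, not a statement of how this library computes the loss. The symbols are notation introduced here: \pi_\theta is the policy being trained, \pi_{\mathrm{ref}} is the reference model, and y_w and y_l are the preferred and rejected responses for prompt x.

```latex
\mathcal{L}_{\mathrm{DPO}}
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[
      \log \sigma\!\left(
        \beta \left(
          \log\frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
          - \log\frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
        \right)
      \right)
    \right]
```

A larger beta keeps the trained policy closer to the reference model; a smaller beta lets it move further to satisfy the preferences.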
DataMixingRatios
Gets or sets the data mixing ratios used when combining multiple datasets.
public Dictionary<string, double>? DataMixingRatios { get; set; }
Property Value
- Dictionary<string, double>
Remarks
Keys are dataset names/identifiers, values are sampling weights. For example: { "instruction": 0.5, "conversation": 0.3, "safety": 0.2 }
DataPreprocessor
Gets or sets custom data preprocessing for this stage.
public Func<FineTuningData<T, TInput, TOutput>, FineTuningData<T, TInput, TOutput>>? DataPreprocessor { get; set; }
Property Value
- Func<FineTuningData<T, TInput, TOutput>, FineTuningData<T, TInput, TOutput>>
DataShuffleSeed
Gets or sets the random seed for data shuffling (for reproducibility).
public int? DataShuffleSeed { get; set; }
Property Value
- int?
Description
Gets or sets a description of what this stage accomplishes.
public string? Description { get; set; }
Property Value
- string
DistillationAlpha
Gets or sets the alpha for balancing hard vs soft targets.
public double DistillationAlpha { get; set; }
Property Value
- double
Remarks
Default is 0.5. The combined loss is alpha * soft_loss + (1 - alpha) * hard_loss.
DistillationLayerMapping
Gets or sets the layer mapping for intermediate distillation.
public Dictionary<int, int> DistillationLayerMapping { get; set; }
Property Value
- Dictionary<int, int>
Remarks
Maps teacher layers to student layers: { teacherLayerIdx: studentLayerIdx }. Default is empty (auto-map if UseIntermediateDistillation is true).
DistillationTemperature
Gets or sets the distillation temperature.
public double DistillationTemperature { get; set; }
Property Value
- double
Remarks
Default is 2.0. Higher values produce softer probability distributions.
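The softening effect of temperature can be illustrated with a small standalone sketch. It is not taken from this library: the class and method names (DistillationSketch, SoftmaxWithTemperature) are hypothetical, and it only shows the usual temperature-scaled softmax applied to a vector of logits.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of the documented API.
public static class DistillationSketch
{
    // Softmax over logits divided by a temperature.
    // Higher temperatures flatten the resulting distribution.
    public static double[] SoftmaxWithTemperature(double[] logits, double temperature)
    {
        if (temperature <= 0)
            throw new ArgumentOutOfRangeException(nameof(temperature));

        double[] scaled = logits.Select(z => z / temperature).ToArray();
        double max = scaled.Max(); // subtract the max for numerical stability
        double[] exps = scaled.Select(z => Math.Exp(z - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => e / sum).ToArray();
    }
}
```

With logits { 2, 1, 0 }, the probability of the top class is lower at temperature 2.0 than at temperature 1.0, which is the "softer" distribution the remark refers to.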
DistributedStrategy
Gets or sets the distributed training strategy for this stage.
public DistributedStrategy DistributedStrategy { get; set; }
Property Value
- DistributedStrategy
Remarks
Default is None (single device). Use DataParallel or FSDP for multi-GPU.
EarlyStopping
Gets or sets early stopping configuration specific to this stage.
public EarlyStoppingConfig EarlyStopping { get; set; }
Property Value
- EarlyStoppingConfig
Remarks
Defaults to a configuration with patience = 5 that monitors loss.
Enabled
Gets or sets whether this stage is enabled (skipped if false).
public bool Enabled { get; set; }
Property Value
- bool
EntropyCoefficient
Gets or sets the entropy bonus coefficient for exploration.
public double EntropyCoefficient { get; set; }
Property Value
- double
Remarks
Default is 0.01. Encourages exploration by penalizing deterministic policies.
Epochs
Gets or sets the number of epochs for this stage.
public int Epochs { get; set; }
Property Value
- int
Remarks
Overrides the value in Options if set. Default is 3 epochs.
EvaluationSteps
Gets or sets the evaluation interval (in steps).
public int EvaluationSteps { get; set; }
Property Value
- int
Remarks
Default is 100 steps. Run validation every N steps.
FineTuningMethod
Gets or sets the fine-tuning method to use in this stage.
public FineTuningMethodType FineTuningMethod { get; set; }
Property Value
- FineTuningMethodType
FreezeBaseModel
Gets or sets whether to freeze the base model during this stage.
public bool FreezeBaseModel { get; set; }
Property Value
- bool
Remarks
Default is false (train all layers). Set to true for LoRA or when using adapters.
FrozenLayers
Gets or sets layer names/patterns to freeze during this stage.
public string[] FrozenLayers { get; set; }
Property Value
- string[]
Remarks
Supports patterns like "encoder.*", "layer.0-5.*", "embedding". Default is empty (no specific layers frozen unless FreezeBaseModel is true).
GAELambda
Gets or sets the GAE lambda for advantage estimation.
public double GAELambda { get; set; }
Property Value
- double
Remarks
Default is 0.95. Controls bias-variance tradeoff in advantage estimation.
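To show where lambda enters, here is a minimal standalone sketch of generalized advantage estimation over one trajectory. It is an illustration under stated assumptions, not this library's implementation: the class GaeSketch, the discount factor gamma, and the bootstrapValue parameter are all introduced here and do not appear in the documented properties.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the documented API.
public static class GaeSketch
{
    // rewards[t] and values[t] cover steps t = 0..n-1.
    // bootstrapValue is the value estimate after the last step (0 if the episode ended).
    public static double[] ComputeAdvantages(
        double[] rewards, double[] values, double bootstrapValue,
        double gamma, double lambda)
    {
        if (rewards.Length != values.Length)
            throw new ArgumentException("rewards and values must have the same length.");

        int n = rewards.Length;
        var advantages = new double[n];
        double running = 0.0;

        // Walk backwards so each step can reuse the accumulated future advantage.
        for (int t = n - 1; t >= 0; t--)
        {
            double nextValue = t == n - 1 ? bootstrapValue : values[t + 1];
            double delta = rewards[t] + gamma * nextValue - values[t]; // TD residual
            running = delta + gamma * lambda * running;
            advantages[t] = running;
        }
        return advantages;
    }
}
```

With lambda = 0 each advantage is just the one-step TD residual (lower variance, more bias); with lambda = 1 the residuals are summed over the rest of the trajectory (less bias, more variance).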
GRPOGroupSize
Gets or sets the group size for GRPO.
public int GRPOGroupSize { get; set; }
Property Value
- int
Remarks
Default is 4. Number of responses to generate per prompt for group ranking.
GRPOUseRelativeRewards
Gets or sets whether to use relative rewards in GRPO.
public bool GRPOUseRelativeRewards { get; set; }
Property Value
- bool
Remarks
Default is true. Normalize rewards within each group for stable training.
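One common way to normalize rewards within a group is to subtract the group mean and divide by the group standard deviation. The sketch below assumes that scheme; the source does not state the exact formula used, and the class GrpoSketch and the epsilon parameter are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of the documented API.
public static class GrpoSketch
{
    // Converts raw rewards for one prompt's group of responses into
    // zero-mean, unit-scale relative rewards.
    public static double[] NormalizeGroupRewards(double[] rewards, double epsilon = 1e-8)
    {
        if (rewards.Length == 0)
            return Array.Empty<double>();

        double mean = rewards.Average();
        double variance = rewards.Select(r => (r - mean) * (r - mean)).Average();
        double std = Math.Sqrt(variance);

        // epsilon guards against division by zero when all rewards are equal.
        return rewards.Select(r => (r - mean) / (std + epsilon)).ToArray();
    }
}
```

For a group of four responses, responses scoring above the group mean receive positive relative rewards and those below receive negative ones, regardless of the absolute reward scale.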
GradientAccumulationSteps
Gets or sets the gradient accumulation steps.
public int GradientAccumulationSteps { get; set; }
Property Value
- int
Remarks
Allows larger effective batch sizes with limited memory. Effective batch size = BatchSize * GradientAccumulationSteps.
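As a worked example, take the default BatchSize of 8 and a hypothetical GradientAccumulationSteps of 4 (the value 4 is chosen here for illustration only):

```latex
\text{effective batch size}
  = \text{BatchSize} \times \text{GradientAccumulationSteps}
  = 8 \times 4 = 32
```

Only 8 samples need to fit in memory at a time, while each optimizer update reflects gradients from 32 samples.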
GradualUnfreezingInterval
Gets or sets the epoch interval for gradual unfreezing.
public int GradualUnfreezingInterval { get; set; }
Property Value
- int
Remarks
Default is 1 epoch. Every N epochs, unfreeze one more layer group.
InitialLossScale
Gets or sets the initial loss scale for mixed precision.
public double InitialLossScale { get; set; }
Property Value
- double
Remarks
Default is 65536.0. Starting scale for dynamic loss scaling.
IsEvaluationOnly
Gets or sets whether this stage is evaluation-only (no training).
public bool IsEvaluationOnly { get; set; }
Property Value
- bool
KLPenaltyCoefficient
Gets or sets the KL penalty coefficient for RLHF.
public double KLPenaltyCoefficient { get; set; }
Property Value
- double
Remarks
Default is 0.01. Prevents the policy from diverging too far from the reference.
LearningRate
Gets or sets the learning rate for this stage.
public double LearningRate { get; set; }
Property Value
- double
Remarks
Default is 2e-5, a common choice for fine-tuning pre-trained models.
LoRAAlpha
Gets or sets the LoRA alpha scaling factor.
public double LoRAAlpha { get; set; }
Property Value
- double
Remarks
Default is 32 (2x rank). Effective scaling = alpha / rank.
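In the standard LoRA formulation, the scaling factor multiplies the low-rank update added to the frozen weight. The notation below is introduced here for illustration: W_0 is the frozen base weight, B and A are the trainable low-rank matrices, \alpha is LoRAAlpha, and r is LoRARank. The second expression plugs in the documented defaults (alpha 32, rank 16).

```latex
W' = W_0 + \frac{\alpha}{r}\, B A,
\qquad
\frac{\alpha}{r} = \frac{32}{16} = 2
```

Because the scaling depends on the ratio, changing the rank without adjusting alpha also changes the strength of the update.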
LoRADropout
Gets or sets the LoRA dropout rate.
public double LoRADropout { get; set; }
Property Value
- double
Remarks
Default is 0.05 (5%). Light dropout helps regularization.
LoRARank
Gets or sets the LoRA rank (dimension of low-rank matrices).
public int LoRARank { get; set; }
Property Value
- int
Remarks
Default is 16. Common values: 4, 8, 16, 32, 64. Higher = more capacity but more parameters.
LoRATargetModules
Gets or sets which modules to apply LoRA to.
public string[] LoRATargetModules { get; set; }
Property Value
- string[]
Remarks
Default targets query and value projections. Common patterns: ["q_proj", "v_proj"], ["query", "key", "value", "output"].
LogGradientNorms
Gets or sets whether to log gradient norms.
public bool LogGradientNorms { get; set; }
Property Value
- bool
Remarks
Default is false. Enable to debug gradient issues.
LogLearningRate
Gets or sets whether to log learning rate.
public bool LogLearningRate { get; set; }
Property Value
- bool
Remarks
Default is true. Useful for verifying scheduler behavior.
LoggingSteps
Gets or sets the logging interval (in steps).
public int LoggingSteps { get; set; }
Property Value
- int
Remarks
Default is 10 steps. Log training metrics every N steps.
MaxCheckpointsToKeep
Gets or sets the maximum number of checkpoints to keep.
public int MaxCheckpointsToKeep { get; set; }
Property Value
- int
Remarks
Default is 3. Older checkpoints are deleted to save disk space.
MaxDuration
Gets or sets the maximum duration for this stage.
public TimeSpan MaxDuration { get; set; }
Property Value
- TimeSpan
Remarks
Default is 24 hours. Set to TimeSpan.MaxValue for no limit.
MaxGradientNorm
Gets or sets the maximum gradient norm for gradient clipping.
public double MaxGradientNorm { get; set; }
Property Value
- double
Remarks
Default is 1.0. Set to 0 to disable gradient clipping.
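Clipping by global norm is the usual technique behind a setting like this one. The sketch below assumes that technique and treats 0 as "disabled", matching the remark. The class GradientClippingSketch is hypothetical, and the source does not confirm that the library clips by global L2 norm.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of the documented API.
public static class GradientClippingSketch
{
    // Scales the gradients in place so their global L2 norm does not exceed maxNorm.
    // Returns the norm measured before clipping.
    public static double ClipByGlobalNorm(double[] gradients, double maxNorm)
    {
        double norm = Math.Sqrt(gradients.Sum(g => g * g));

        // A non-positive maxNorm disables clipping; small gradients are left untouched.
        if (maxNorm <= 0 || norm <= maxNorm)
            return norm;

        double scale = maxNorm / norm;
        for (int i = 0; i < gradients.Length; i++)
            gradients[i] *= scale;

        return norm;
    }
}
```

A gradient of { 3, 4 } has norm 5; with maxNorm = 1.0 it is rescaled to { 0.6, 0.8 }, which keeps its direction but limits the step size.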
MaxSteps
Gets or sets the maximum number of steps for this stage.
public int MaxSteps { get; set; }
Property Value
- int
Remarks
Default is 0 (no step limit, use Epochs instead).
MaxTrainingSamples
Gets or sets the maximum number of training samples to use.
public int? MaxTrainingSamples { get; set; }
Property Value
- int?
Remarks
Useful for limiting data in curriculum learning or quick experiments.
MergeLoRAAfterTraining
Gets or sets whether to merge LoRA weights into the base model after training.
public bool MergeLoRAAfterTraining { get; set; }
Property Value
- bool
Remarks
Default is false. Set to true to produce a merged model for deployment.
Metadata
Gets or sets custom metadata for this stage.
public Dictionary<string, object> Metadata { get; set; }
Property Value
- Dictionary<string, object>
Remarks
Empty by default. Use to store custom key-value pairs.
MetricsToTrack
Gets or sets the metrics to track during this stage.
public string[] MetricsToTrack { get; set; }
Property Value
- string[]
Remarks
Default includes loss and perplexity.
MinLearningRate
Gets or sets the minimum learning rate (for schedulers with decay).
public double MinLearningRate { get; set; }
Property Value
- double
Remarks
Default is 0 (learning rate can decay to zero).
MixedPrecisionDType
Gets or sets the mixed precision data type.
public MixedPrecisionType MixedPrecisionDType { get; set; }
Property Value
- MixedPrecisionType
Remarks
Default is FP16 for broad compatibility. BF16 is generally preferable on NVIDIA Ampere and newer GPUs.
Name
Gets or sets the name of this stage for logging and identification.
public string Name { get; set; }
Property Value
- string
NumCycles
Gets or sets the number of cycles for cosine scheduler with restarts.
public int NumCycles { get; set; }
Property Value
- int
Remarks
Default is 1 (no restarts, single decay to min learning rate).
OptimizerOverride
Gets or sets the optimizer type override for this stage.
public OptimizerType OptimizerOverride { get; set; }
Property Value
- OptimizerType
Remarks
Default is AdamW, the standard choice for fine-tuning.
Options
Gets or sets the fine-tuning options for this stage.
public FineTuningOptions<T>? Options { get; set; }
Property Value
- FineTuningOptions<T>
PPOClipRange
Gets or sets the PPO clip range.
public double PPOClipRange { get; set; }
Property Value
- double
Remarks
Default is 0.2. Limits how much the policy can change per update.
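The way a clip range limits policy change can be seen in the standard PPO clipped surrogate objective for a single sample. This is a sketch of the textbook objective, not this library's code. The class PpoSketch and its parameter names are hypothetical: ratio is the new-to-old policy probability ratio and advantage is the estimated advantage.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the documented API.
public static class PpoSketch
{
    // Per-sample clipped surrogate objective (to be maximized).
    public static double ClippedSurrogate(double ratio, double advantage, double clipRange)
    {
        double clipped = Math.Clamp(ratio, 1.0 - clipRange, 1.0 + clipRange);

        // Taking the minimum removes any incentive to push the ratio
        // outside [1 - clipRange, 1 + clipRange].
        return Math.Min(ratio * advantage, clipped * advantage);
    }
}
```

With the default clip range of 0.2, a ratio of 1.5 on a positive advantage of 1.0 is credited as 1.2 rather than 1.5, so moving the policy further in one update earns nothing extra.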
PPOEpochsPerBatch
Gets or sets the number of PPO epochs per batch.
public int PPOEpochsPerBatch { get; set; }
Property Value
- int
Remarks
Default is 4. Number of times to reuse collected experiences.
PreferenceLabelSmoothing
Gets or sets the label smoothing factor for preference learning.
public double PreferenceLabelSmoothing { get; set; }
Property Value
- double
Remarks
Default is 0.0 (no smoothing). Values like 0.1 can help with noisy preferences.
PreferenceLossType
Gets or sets the loss type for preference optimization.
public PreferenceLossType PreferenceLossType { get; set; }
Property Value
- PreferenceLossType
Remarks
Default is Sigmoid (standard DPO).
QLoRABits
Gets or sets the quantization bits for QLoRA.
public int QLoRABits { get; set; }
Property Value
- int
Remarks
Default is 4 bits (most memory efficient). Use 8 for higher precision.
RandomSeed
Gets or sets the random seed for this stage.
public int RandomSeed { get; set; }
Property Value
- int
Remarks
Default is 42. Set to different values for different runs.
ReferenceModelUpdateInterval
Gets or sets the interval (in steps) for updating the reference model.
public int ReferenceModelUpdateInterval { get; set; }
Property Value
- int
Remarks
Default is 100 steps. Only used when UpdateReferenceModel is true.
RejectionSamplingMinReward
Gets or sets the minimum reward threshold for rejection sampling.
public double RejectionSamplingMinReward { get; set; }
Property Value
- double
Remarks
Default is 0.0. Only keep responses with reward above this threshold.
RejectionSamplingN
Gets or sets the number of samples to generate for rejection sampling.
public int RejectionSamplingN { get; set; }
Property Value
- int
Remarks
Default is 10. Generate N responses and select the best ones.
RejectionSamplingTopK
Gets or sets the top-K samples to keep from rejection sampling.
public int RejectionSamplingTopK { get; set; }
Property Value
- int
Remarks
Default is 1. Keep only the best response per prompt.
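Taken together, the three rejection-sampling settings describe a filter-then-rank step: generate N candidates, drop those at or below the minimum reward, and keep the top K. The sketch below shows one way this could look. The class RejectionSamplingSketch and the tuple shape are hypothetical, and the strict "greater than" comparison is an assumption based on the wording "above this threshold".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of the documented API.
public static class RejectionSamplingSketch
{
    // Keeps the topK highest-reward candidates whose reward exceeds minReward.
    public static List<(string Response, double Reward)> SelectBest(
        IEnumerable<(string Response, double Reward)> candidates,
        double minReward,
        int topK)
    {
        return candidates
            .Where(c => c.Reward > minReward)   // threshold filter
            .OrderByDescending(c => c.Reward)   // best first
            .Take(topK)                         // keep at most topK
            .ToList();
    }
}
```

If no candidate clears the threshold, the result is empty, so a caller would need to decide whether to skip that prompt or regenerate.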
RewardModel
Gets or sets the reward model to use for RLHF stages.
public IFullModel<T, TInput, TOutput>? RewardModel { get; set; }
Property Value
- IFullModel<T, TInput, TOutput>
Remarks
Required for PPO/RLHF. Can be null if using a reward-free method like DPO.
RolloutSamples
Gets or sets the number of rollout samples per update.
public int RolloutSamples { get; set; }
Property Value
- int
Remarks
Default is 2048. Number of environment steps to collect before each PPO update.
RunCondition
Gets or sets a condition that must be met to run this stage.
public Func<TrainingStageResult<T, TInput, TOutput>?, bool>? RunCondition { get; set; }
Property Value
- Func<TrainingStageResult<T, TInput, TOutput>, bool>
Remarks
If the condition returns false, the stage is skipped. Receives the result of the previous stage (null for first stage).
SaveCheckpointAfter
Gets or sets whether to save a checkpoint after this stage.
public bool SaveCheckpointAfter { get; set; }
Property Value
Remarks
Default is true. Always save after each stage for recovery.
SaveOnlyBest
Gets or sets whether to save only the best checkpoint based on validation metrics.
public bool SaveOnlyBest { get; set; }
Property Value
- bool
Remarks
Default is false. When true, only keeps the checkpoint with best metric.
SchedulerPower
Gets or sets the power for polynomial decay scheduler.
public double SchedulerPower { get; set; }
Property Value
- double
Remarks
Default is 1.0 (linear decay). Higher values = faster initial decay.
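A common form of polynomial decay is shown below. It is given as an assumption about the general technique, since the source does not state the exact formula this library uses. The notation is introduced here: \eta_0 corresponds to LearningRate, \eta_{\min} to MinLearningRate, p to SchedulerPower, t is the current decay step, and T is the total number of decay steps.

```latex
\eta(t) = \eta_{\min} + (\eta_0 - \eta_{\min})\left(1 - \frac{t}{T}\right)^{p},
\qquad 0 \le t \le T
```

With p = 1 the learning rate falls linearly from \eta_0 to \eta_{\min}; with p > 1 the slope at t = 0 is steeper, which is the faster initial decay mentioned in the remark.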
SchedulerType
Gets or sets the learning rate scheduler type.
public LearningRateSchedulerType SchedulerType { get; set; }
Property Value
- LearningRateSchedulerType
Remarks
Default is CosineAnnealing, which works well for most fine-tuning scenarios.
SelfPlayIterations
Gets or sets the number of self-play iterations.
public int SelfPlayIterations { get; set; }
Property Value
- int
Remarks
Default is 3. Number of self-play rounds per training cycle.
SelfPlayResponsesPerPrompt
Gets or sets the number of responses to generate per prompt in self-play.
public int SelfPlayResponsesPerPrompt { get; set; }
Property Value
- int
Remarks
Default is 4. More responses = better coverage but slower training.
SelfPlayTemperature
Gets or sets the generation temperature for self-play responses.
public double SelfPlayTemperature { get; set; }
Property Value
- double
Remarks
Default is 0.7. Higher values = more diverse responses, lower = more focused.
ShareReferenceModel
Gets or sets whether to share the reference model with the training model.
public bool ShareReferenceModel { get; set; }
Property Value
- bool
Remarks
Default is true (memory efficient). If false, loads a separate copy.
ShuffleData
Gets or sets whether to shuffle the training data each epoch.
public bool ShuffleData { get; set; }
Property Value
- bool
StageType
Gets or sets the type of training stage.
public TrainingStageType StageType { get; set; }
Property Value
- TrainingStageType
SyncBatchNorm
Gets or sets whether to sync batch normalization across devices.
public bool SyncBatchNorm { get; set; }
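Property Value
- bool
Remarks
Default is false. Enable to synchronize batch-normalization statistics across devices, which can improve accuracy in distributed training when per-device batches are small.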
Tags
Gets or sets tags for categorizing this stage.
public string[] Tags { get; set; }
Property Value
- string[]
Remarks
Empty by default. Useful for filtering and organizing stages.
TeacherModel
Gets or sets the teacher model for distillation stages.
public IFullModel<T, TInput, TOutput>? TeacherModel { get; set; }
Property Value
- IFullModel<T, TInput, TOutput>
Remarks
Required for knowledge distillation. The larger model to distill from.
TrainableLayers
Gets or sets layer names/patterns to unfreeze (train) during this stage.
public string[] TrainableLayers { get; set; }
Property Value
- string[]
Remarks
If FreezeBaseModel is true, only these layers will be trained. Supports patterns like "classifier", "lm_head", "layer.10-11.*". Default is empty (all unfrozen layers are trainable).
TrainingData
Gets or sets the training data for this stage.
public FineTuningData<T, TInput, TOutput>? TrainingData { get; set; }
Property Value
- FineTuningData<T, TInput, TOutput>
UnfreezeTopNLayers
Gets or sets the number of layers to unfreeze from the top.
public int UnfreezeTopNLayers { get; set; }
Property Value
- int
Remarks
Default is 0 (use FrozenLayers/TrainableLayers patterns instead). Common approach: freeze most layers, train only top N layers.
UpdateReferenceModel
Gets or sets whether to update the reference model periodically.
public bool UpdateReferenceModel { get; set; }
Property Value
- bool
Remarks
Default is false (frozen reference model).
UseDeterministicAlgorithms
Gets or sets whether to use deterministic algorithms (may be slower).
public bool UseDeterministicAlgorithms { get; set; }
Property Value
- bool
Remarks
Default is false. Enable for exact reproducibility at the cost of speed.
UseDynamicLossScaling
Gets or sets whether to use dynamic loss scaling for mixed precision.
public bool UseDynamicLossScaling { get; set; }
Property Value
- bool
Remarks
Default is true. Required for FP16, optional for BF16.
UseGradientCheckpointing
Gets or sets whether to use gradient checkpointing to save memory.
public bool UseGradientCheckpointing { get; set; }
Property Value
- bool
UseGradualUnfreezing
Gets or sets whether to gradually unfreeze layers during training.
public bool UseGradualUnfreezing { get; set; }
Property Value
- bool
Remarks
Default is false. When true, layers are unfrozen progressively during training.
UseIntermediateDistillation
Gets or sets whether to use intermediate layer distillation.
public bool UseIntermediateDistillation { get; set; }
Property Value
- bool
Remarks
Default is false. When true, also distills intermediate representations.
UseLoRA
Gets or sets whether to use LoRA (Low-Rank Adaptation) for this stage.
public bool UseLoRA { get; set; }
Property Value
- bool
Remarks
Default is false (full fine-tuning). Set to true for parameter-efficient training.
UseMixedPrecision
Gets or sets whether to use mixed precision training (FP16/BF16).
public bool UseMixedPrecision { get; set; }
Property Value
- bool
Remarks
Default is false (FP32). Enable for faster training with lower memory.
UseQLoRA
Gets or sets whether to use QLoRA (quantized LoRA) for memory efficiency.
public bool UseQLoRA { get; set; }
Property Value
- bool
Remarks
Default is false. Set to true for 4-bit or 8-bit quantized training.
UseReferenceModel
Gets or sets whether to use a reference model for preference methods.
public bool UseReferenceModel { get; set; }
Property Value
- bool
Remarks
Required for DPO, IPO, etc. Not required for SimPO, ORPO. Default is true (use reference model for KL constraint).
ValidationData
Gets or sets the validation data for this stage.
public FineTuningData<T, TInput, TOutput>? ValidationData { get; set; }
Property Value
- FineTuningData<T, TInput, TOutput>
ValueFunctionCoefficient
Gets or sets the value function coefficient for PPO.
public double ValueFunctionCoefficient { get; set; }
Property Value
- double
Remarks
Default is 0.5. Weight of value loss relative to policy loss.
WarmupRatio
Gets or sets the warmup ratio (fraction of total steps for warmup).
public double WarmupRatio { get; set; }
Property Value
- double
Remarks
Default is 0.1 (10% of training for warmup). If WarmupSteps is set, this is ignored.
WarmupSteps
Gets or sets the number of warmup steps.
public int WarmupSteps { get; set; }
Property Value
- int
Remarks
Default is 0. Set this or WarmupRatio, not both.
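The precedence between WarmupSteps and WarmupRatio described in the remarks can be sketched as follows. This is a standalone illustration with a hypothetical class name (WarmupSketch). Treating 0 as "not set", rounding to the nearest step, and capping at the total step count are assumptions, since the source does not specify them.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the documented API.
public static class WarmupSketch
{
    // Returns the number of warmup steps to use for a run of totalSteps.
    public static int ResolveWarmupSteps(int warmupSteps, double warmupRatio, int totalSteps)
    {
        // An explicit step count takes precedence over the ratio.
        if (warmupSteps > 0)
            return Math.Min(warmupSteps, totalSteps);

        // Otherwise derive the count from the ratio (rounding is an assumption).
        return (int)Math.Round(warmupRatio * totalSteps);
    }
}
```

With the defaults (WarmupSteps 0, WarmupRatio 0.1) and a 1000-step run, this gives 100 warmup steps; setting WarmupSteps to 200 overrides the ratio.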
WeightDecay
Gets or sets the weight decay (L2 regularization) coefficient.
public double WeightDecay { get; set; }
Property Value
- double
Remarks
Default is 0.01, standard for AdamW.