Class FineTuningOptions<T>
Configuration options for fine-tuning methods.
public class FineTuningOptions<T>
Type Parameters
T: The numeric data type used for calculations.
- Inheritance
- object
- FineTuningOptions<T>
Remarks
This class provides a comprehensive set of options that cover all fine-tuning method categories. Each method type uses the relevant subset of options.
For Beginners: These settings control how the fine-tuning process works. Most settings have sensible defaults based on research papers, so you can start with the defaults and adjust as needed.
Properties
BatchSize
Gets or sets the batch size for training.
public int BatchSize { get; set; }
Property Value
- int
Beta
Gets or sets the beta parameter for DPO-family methods.
public double Beta { get; set; }
Property Value
- double
Remarks
Controls the strength of the KL penalty relative to the reference model. A higher beta keeps the fine-tuned model closer to the reference model.
Default: 0.1 (DPO paper recommendation)
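For orientation, the role of beta can be read off the standard DPO objective from the literature. The source does not show how the library computes its loss, so the exact form below is an assumption; here y_w is the preferred response, y_l the rejected one, and sigma the logistic function.

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]
```

In this formulation beta scales both log-ratio terms, which is why larger values penalize departures from the reference model more strongly.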
CheckpointSteps
Gets or sets the checkpoint frequency in steps.
public int CheckpointSteps { get; set; }
Property Value
- int
CompileModel
Gets or sets whether to compile the model for faster training.
public bool CompileModel { get; set; }
Property Value
- bool
ConstitutionalPrinciples
Gets or sets the constitutional principles for CAI methods.
public string[] ConstitutionalPrinciples { get; set; }
Property Value
- string[]
Remarks
These principles guide the model's self-critique and revision process.
CritiqueIterations
Gets or sets the number of critique-revision iterations.
public int CritiqueIterations { get; set; }
Property Value
- int
DistillationAlpha
Gets or sets the alpha weight between hard and soft labels.
public double DistillationAlpha { get; set; }
Property Value
- double
Remarks
Alpha = 1.0 uses only soft labels; Alpha = 0.0 uses only hard labels.
Default: 0.5 (balanced)
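A common way to combine the two label types, consistent with the endpoints described above, is a convex combination of a soft-label term and a hard-label term. This is a sketch of the usual knowledge-distillation loss, not a confirmed description of the library's implementation; the T^2 scaling follows the convention from the original distillation literature and is an assumption here.

```latex
\mathcal{L}
  = \alpha \, T^{2} \, \mathrm{KL}\!\left( p^{(T)}_{\mathrm{teacher}} \,\middle\|\, p^{(T)}_{\mathrm{student}} \right)
  + (1 - \alpha) \, \mathrm{CE}\!\left( y,\; p_{\mathrm{student}} \right)
```

Here alpha corresponds to DistillationAlpha, T to DistillationTemperature, and y to the ground-truth (hard) label.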
DistillationTemperature
Gets or sets the distillation temperature.
public double DistillationTemperature { get; set; }
Property Value
- double
Remarks
Higher temperature produces softer probability distributions.
Default: 2.0
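The softening effect can be seen in the temperature-scaled softmax. The logits in the worked example are made up for illustration.

```latex
p_i^{(T)} = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}
\qquad
\begin{aligned}
z = (2, 1, 0),\; T = 1 &: \quad p \approx (0.665,\; 0.245,\; 0.090) \\
z = (2, 1, 0),\; T = 2 &: \quad p \approx (0.506,\; 0.307,\; 0.186)
\end{aligned}
```

At T = 2 the probability mass is spread more evenly across classes, so the student sees more of the teacher's relative preferences among non-top classes.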
EntropyCoefficient
Gets or sets the entropy coefficient for exploration.
public double EntropyCoefficient { get; set; }
Property Value
- double
Epochs
Gets or sets the number of training epochs.
public int Epochs { get; set; }
Property Value
- int
GAELambda
Gets or sets the GAE lambda for advantage estimation.
public double GAELambda { get; set; }
Property Value
- double
Remarks
Default: 0.95 (standard for PPO)
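For reference, generalized advantage estimation is usually defined as follows, with gamma the reward discount factor (presumably the Gamma property, which is an assumption) and lambda the value set here. Whether the library truncates the sum or handles episode boundaries differently is not stated in the source.

```latex
\delta_t = r_t + \gamma \, V(s_{t+1}) - V(s_t),
\qquad
\hat{A}_t = \sum_{l=0}^{\infty} (\gamma \lambda)^{l} \, \delta_{t+l}
```

Lambda close to 1 gives lower-bias, higher-variance advantage estimates; lambda close to 0 relies more heavily on the value function.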
GRPOGroupSize
Gets or sets the group size for GRPO sampling.
public int GRPOGroupSize { get; set; }
Property Value
- int
Remarks
Number of responses to generate per prompt for group comparison.
Default: 8 (DeepSeek recommendation)
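The group comparison is typically a normalization of each response's reward against the other responses sampled for the same prompt. The formula below is the commonly published GRPO advantage, given as a sketch; the source does not confirm that the library uses exactly this normalization. G stands for GRPOGroupSize and r_i for the reward of the i-th sampled response.

```latex
\hat{A}_i
  = \frac{ r_i - \operatorname{mean}(r_1, \dots, r_G) }
         { \operatorname{std}(r_1, \dots, r_G) },
\qquad i = 1, \dots, G
```

Because the baseline comes from the group itself, no separate value model is needed, and larger groups give a more stable estimate at the cost of more generation per prompt.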
GRPOTemperature
Gets or sets the GRPO temperature for sampling.
public double GRPOTemperature { get; set; }
Property Value
- double
Gamma
Gets or sets the discount factor for rewards.
public double Gamma { get; set; }
Property Value
- double
GradientAccumulationSteps
Gets or sets the gradient accumulation steps.
public int GradientAccumulationSteps { get; set; }
Property Value
- int
Remarks
Accumulates gradients over several steps before each optimizer update, so the effective batch size can exceed what fits in memory at once.
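As a quick check of the arithmetic, assuming a single device and that one optimizer update is applied per accumulation cycle (the numbers are illustrative, not library defaults):

```latex
B_{\mathrm{eff}} = \text{BatchSize} \times \text{GradientAccumulationSteps},
\qquad \text{e.g. } 4 \times 8 = 32
```

Only 4 samples are held in memory at a time, while each update reflects gradients from 32 samples.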
KLCoefficient
Gets or sets the KL coefficient for RL-based methods.
public double KLCoefficient { get; set; }
Property Value
- double
Remarks
Controls the KL penalty that keeps the model from diverging too far from the reference model.
Default: 0.02
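In RLHF-style training, a penalty of this kind is often folded into the reward that the policy optimizes. The form below is the usual textbook version, shown as an assumption about how such a coefficient is applied; c_KL stands for KLCoefficient and r for the reward model score.

```latex
R(x, y) = r(x, y) - c_{\mathrm{KL}} \, \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
```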
KTODesirableWeight
Gets or sets the desirable weight for KTO.
public double KTODesirableWeight { get; set; }
Property Value
- double
Remarks
KTO uses prospect theory with separate weights for desirable and undesirable examples.
Default: 1.0
KTOUndesirableWeight
Gets or sets the undesirable weight for KTO.
public double KTOUndesirableWeight { get; set; }
Property Value
- double
Remarks
Typically set higher than the desirable weight to emphasize avoiding bad outputs (loss aversion).
Default: 1.0
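The two weights enter the published KTO objective as shown below. This is a sketch of the formulation from the literature, and treating lambda_D and lambda_U as KTODesirableWeight and KTOUndesirableWeight is an assumption; z_0 is a reference point estimated from the KL divergence between the policy and the reference model.

```latex
r_\theta(x, y) = \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
v(x, y) =
\begin{cases}
\lambda_D \, \sigma\!\big( \beta \, ( r_\theta(x, y) - z_0 ) \big) & \text{if } y \text{ is desirable} \\
\lambda_U \, \sigma\!\big( \beta \, ( z_0 - r_\theta(x, y) ) \big) & \text{if } y \text{ is undesirable}
\end{cases}
\qquad
\mathcal{L}_{\mathrm{KTO}} = \mathbb{E}_{(x, y)} \big[ \lambda_y - v(x, y) \big]
```

Raising lambda_U relative to lambda_D makes errors on undesirable examples cost more, which is the loss-aversion effect mentioned above.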
LabelSmoothing
Gets or sets the label smoothing factor for preference learning.
public double LabelSmoothing { get; set; }
Property Value
- double
Remarks
Default: 0.0 (no smoothing). Small positive values such as 0.1 can help when preference labels are noisy.
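One common way to apply smoothing to a pairwise preference loss is the "conservative" variant below, which treats each label as flipped with probability epsilon. This is offered as a sketch under that assumption; the source does not state which smoothing scheme the library uses. Here epsilon is LabelSmoothing and u is the scaled log-ratio margin between the preferred and rejected responses.

```latex
u = \beta \left[
      \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right],
\qquad
\mathcal{L} = -(1 - \varepsilon) \log \sigma(u) \;-\; \varepsilon \log \sigma(-u)
```

With epsilon = 0 this reduces to the unsmoothed loss.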
LearningRate
Gets or sets the learning rate for fine-tuning.
public double LearningRate { get; set; }
Property Value
- double
Remarks
Default: 1e-5 (suitable for most preference methods)
LoRAConfig
Gets or sets the LoRA configuration when UseLoRA is true.
public LoRAConfiguration? LoRAConfig { get; set; }
Property Value
- LoRAConfiguration?
LoggingSteps
Gets or sets the logging frequency in steps.
public int LoggingSteps { get; set; }
Property Value
- int
MaxCheckpoints
Gets or sets the maximum number of checkpoints to keep.
public int MaxCheckpoints { get; set; }
Property Value
- int
MaxGradientNorm
Gets or sets the maximum gradient norm for clipping.
public double MaxGradientNorm { get; set; }
Property Value
- double
MaxSequenceLength
Gets or sets the maximum sequence length.
public int MaxSequenceLength { get; set; }
Property Value
- int
MethodType
Gets or sets the fine-tuning method type.
public FineTuningMethodType MethodType { get; set; }
Property Value
- FineTuningMethodType
ORPOLambda
Gets or sets the lambda parameter for ORPO odds ratio loss.
public double ORPOLambda { get; set; }
Property Value
- double
Remarks
Controls the weight of the odds ratio loss relative to the SFT loss.
Default: 0.1 (ORPO paper recommendation)
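The weighting described above matches the structure of the published ORPO objective, reproduced here as a sketch; whether the library follows it term for term is an assumption. P_theta(y | x) denotes the model's likelihood of response y, and lambda corresponds to ORPOLambda.

```latex
\mathcal{L}_{\mathrm{ORPO}} = \mathcal{L}_{\mathrm{SFT}} + \lambda \, \mathcal{L}_{\mathrm{OR}},
\qquad
\mathcal{L}_{\mathrm{OR}} = -\log \sigma\!\left(
  \log \frac{\operatorname{odds}_\theta(y_w \mid x)}{\operatorname{odds}_\theta(y_l \mid x)}
\right),
\qquad
\operatorname{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
```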
PPOClipRange
Gets or sets the PPO clip range.
public double PPOClipRange { get; set; }
Property Value
- double
Remarks
Standard PPO clipping parameter for policy updates.
Default: 0.2
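For reference, the clipped surrogate objective from the PPO literature is shown below, with epsilon playing the role of PPOClipRange. It is included to show what the parameter bounds, not as a statement of the library's exact loss.

```latex
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
\qquad
L^{\mathrm{CLIP}}(\theta)
  = \mathbb{E}_t \left[ \min\!\Big(
      r_t(\theta) \, \hat{A}_t,\;
      \operatorname{clip}\big( r_t(\theta),\, 1 - \varepsilon,\, 1 + \varepsilon \big) \, \hat{A}_t
    \Big) \right]
```

With epsilon = 0.2, the probability ratio stops contributing additional gradient once it leaves the interval [0.8, 1.2] in the direction that would increase the objective.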
PPOEpochsPerBatch
Gets or sets the number of PPO epochs per batch.
public int PPOEpochsPerBatch { get; set; }
Property Value
- int
RandomSeed
Gets or sets the random seed for reproducibility.
public int? RandomSeed { get; set; }
Property Value
- int?
RankingMargin
Gets or sets the margin for ranking loss.
public double RankingMargin { get; set; }
Property Value
- double
RankingTemperature
Gets or sets the temperature for ranking softmax.
public double RankingTemperature { get; set; }
Property Value
- double
SPINIterations
Gets or sets the number of self-play iterations.
public int SPINIterations { get; set; }
Property Value
- int
SimPOGamma
Gets or sets the gamma parameter (target reward margin) for SimPO.
public double SimPOGamma { get; set; }
Property Value
- double
Remarks
SimPO scores a response by its average (length-normalized) log probability rather than the sum. Gamma sets the target reward margin between the chosen and rejected responses.
Default: 0.5 (SimPO paper recommendation)
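Both ideas appear in the published SimPO loss, shown here as a sketch; the source does not confirm that the library's loss matches it exactly, and mapping beta to the Beta property is an assumption. |y| denotes the response length in tokens.

```latex
\mathcal{L}_{\mathrm{SimPO}}(\theta)
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}
    \left[ \log \sigma\!\left(
      \frac{\beta}{|y_w|} \log \pi_\theta(y_w \mid x)
      - \frac{\beta}{|y_l|} \log \pi_\theta(y_l \mid x)
      - \gamma
    \right) \right]
```

Dividing by the response length is the length normalization, and gamma is the margin by which the chosen response's normalized score is pushed to exceed the rejected one. No reference model appears in this objective.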
UseLoRA
Gets or sets whether to use LoRA for parameter-efficient fine-tuning.
public bool UseLoRA { get; set; }
Property Value
- bool
UseMixedPrecision
Gets or sets whether to use mixed precision training.
public bool UseMixedPrecision { get; set; }
Property Value
- bool
ValueCoefficient
Gets or sets the value function coefficient for PPO.
public double ValueCoefficient { get; set; }
Property Value
- double
WarmupRatio
Gets or sets the warmup ratio for learning rate scheduling.
public double WarmupRatio { get; set; }
Property Value
- double
WeightDecay
Gets or sets the weight decay for regularization.
public double WeightDecay { get; set; }