
Class FineTuningOptions<T>

Namespace
AiDotNet.Models.Options
Assembly
AiDotNet.dll

Configuration options for fine-tuning methods.

public class FineTuningOptions<T>

Type Parameters

T

The numeric data type used for calculations.

Inheritance
FineTuningOptions<T>

Remarks

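This class collects the options for every supported fine-tuning method category in one place. Each method reads only the subset of options that applies to it. For example, PPO-based methods use PPOClipRange, GAELambda, and ValueCoefficient, while DPO-family methods use Beta.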

For Beginners: These settings control how the fine-tuning process works. Most settings have sensible defaults based on research papers, so you can start with the defaults and adjust as needed.
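As a starting point, the fragment below overrides a few properties with an object initializer. It assumes the class can be created through its implicit parameterless constructor. Values marked as documented defaults come from the Remarks on this page; all other values are illustrative, not library defaults. MethodType and LoRAConfig are left unset because the members of FineTuningMethodType and LoRAConfiguration are not listed here.

```csharp
using AiDotNet.Models.Options;

var options = new FineTuningOptions<double>
{
    // Documented defaults, repeated here for visibility.
    Beta = 0.1,
    LearningRate = 1e-5,
    LabelSmoothing = 0.0,

    // Illustrative values (assumptions, not library defaults).
    BatchSize = 4,
    GradientAccumulationSteps = 8,   // with standard accumulation: 4 x 8 = 32 examples per optimizer step
    Epochs = 1,
    WarmupRatio = 0.1,
    MaxGradientNorm = 1.0,
    RandomSeed = 42,                 // fixed seed for reproducible runs
};
```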

Properties

BatchSize

Gets or sets the batch size for training.

public int BatchSize { get; set; }

Property Value

int

Beta

Gets or sets the beta parameter for DPO-family methods.

public double Beta { get; set; }

Property Value

double

Remarks

Controls the strength of the implicit KL penalty toward the reference model. A higher beta keeps the fine-tuned model closer to the reference model.

Default: 0.1 (DPO paper recommendation)
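To make the role of beta concrete, here is a minimal sketch of the standard DPO loss for one preference pair, computed from summed log-probabilities of the chosen and rejected responses under the policy and the frozen reference model. The helper name and signature are hypothetical, not part of AiDotNet.

```csharp
using System;

// Hypothetical helper (not an AiDotNet API): DPO loss for one preference pair.
// Inputs are summed log-probabilities of each response under the policy and the reference model.
static double DpoLoss(double policyChosen, double policyRejected,
                      double refChosen, double refRejected, double beta)
{
    // Implicit reward margin: how much more the policy prefers the chosen response
    // than the reference model does.
    double margin = (policyChosen - refChosen) - (policyRejected - refRejected);
    // -log(sigmoid(beta * margin)) written as log(1 + exp(-beta * margin)).
    return Math.Log(1.0 + Math.Exp(-beta * margin));
}

// The policy prefers the chosen response by 2 nats more than the reference does.
double loss = DpoLoss(policyChosen: -10.0, policyRejected: -14.0,
                      refChosen: -11.0, refRejected: -13.0, beta: 0.1);
Console.WriteLine(loss); // approximately 0.598
```

With a zero margin the loss is log 2 regardless of beta; beta controls how quickly it falls as the margin grows.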

CheckpointSteps

Gets or sets the checkpoint frequency in steps.

public int CheckpointSteps { get; set; }

Property Value

int

CompileModel

Gets or sets whether to compile the model for faster training.

public bool CompileModel { get; set; }

Property Value

bool

ConstitutionalPrinciples

Gets or sets the constitutional principles for CAI methods.

public string[] ConstitutionalPrinciples { get; set; }

Property Value

string[]

Remarks

These principles guide the model's self-critique and revision process.

CritiqueIterations

Gets or sets the number of critique-revision iterations.

public int CritiqueIterations { get; set; }

Property Value

int
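The critique-revision process can be sketched as a loop that runs CritiqueIterations times: the current response is critiqued against a principle, then revised using that critique. The delegates below are hypothetical stand-ins for model calls, and this sketch cycles through ConstitutionalPrinciples in order as one possible selection strategy. It illustrates the general Constitutional AI pattern, not AiDotNet's implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch (not an AiDotNet API) of a critique-revision loop.
// critique(response, principle) returns feedback; revise(response, feedback) returns a new response.
static string CritiqueAndRevise(string initialResponse, string[] principles, int iterations,
                                Func<string, string, string> critique,
                                Func<string, string, string> revise)
{
    if (principles.Length == 0 || iterations <= 0) return initialResponse;
    string response = initialResponse;
    for (int i = 0; i < iterations; i++)
    {
        // Cycle through the principles in order.
        string principle = principles[i % principles.Length];
        string feedback = critique(response, principle);
        response = revise(response, feedback);
    }
    return response;
}

var usedPrinciples = new List<string>();
string finalResponse = CritiqueAndRevise(
    "draft",
    new[] { "Be harmless.", "Be honest." },
    iterations: 3,
    critique: (resp, principle) => { usedPrinciples.Add(principle); return $"critique of '{resp}' under '{principle}'"; },
    revise: (resp, feedback) => resp + "+rev");

Console.WriteLine(finalResponse); // draft+rev+rev+rev
```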

DistillationAlpha

Gets or sets the weight that balances the soft-label (teacher) loss against the hard-label (ground-truth) loss.

public double DistillationAlpha { get; set; }

Property Value

double

Remarks

Alpha = 1.0 uses only soft labels; Alpha = 0.0 uses only hard labels.

Default: 0.5 (balanced)

DistillationTemperature

Gets or sets the distillation temperature.

public double DistillationTemperature { get; set; }

Property Value

double

Remarks

A higher temperature produces softer (flatter) probability distributions.

Default: 2.0
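DistillationTemperature and DistillationAlpha work together. A minimal sketch, assuming the common Hinton-style formulation: teacher and student logits are divided by the temperature before the softmax, the soft loss is the cross-entropy between the softened distributions, the hard loss is ordinary cross-entropy against the true label, and alpha blends the two. Many implementations also scale the soft term by T squared; that factor is omitted here. The helper names are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not an AiDotNet API): temperature-scaled softmax.
static double[] Softmax(double[] logits, double temperature)
{
    // Divide by the temperature, subtract the max for numerical stability, then normalize.
    double[] scaled = logits.Select(z => z / temperature).ToArray();
    double max = scaled.Max();
    double[] exps = scaled.Select(z => Math.Exp(z - max)).ToArray();
    double sum = exps.Sum();
    return exps.Select(e => e / sum).ToArray();
}

// Hypothetical helper (not an AiDotNet API): alpha-blended distillation loss.
static double DistillationLoss(double[] studentLogits, double[] teacherLogits, int trueLabel,
                               double temperature, double alpha)
{
    double[] teacherSoft = Softmax(teacherLogits, temperature);
    double[] studentSoft = Softmax(studentLogits, temperature);
    // Soft loss: cross-entropy between softened teacher and student distributions.
    double softLoss = -teacherSoft.Zip(studentSoft, (t, s) => t * Math.Log(s)).Sum();
    // Hard loss: ordinary cross-entropy (temperature 1) against the ground-truth label.
    double hardLoss = -Math.Log(Softmax(studentLogits, 1.0)[trueLabel]);
    // alpha = 1 uses only the soft loss; alpha = 0 uses only the hard loss.
    return alpha * softLoss + (1.0 - alpha) * hardLoss;
}

double[] teacher = { 4.0, 1.0, 0.0 };
double[] sharp = Softmax(teacher, 1.0);   // approximately 0.936, 0.047, 0.017
double[] soft = Softmax(teacher, 2.0);    // approximately 0.736, 0.164, 0.100
double blended = DistillationLoss(new[] { 2.0, 1.0, 0.5 }, teacher, trueLabel: 0, temperature: 2.0, alpha: 0.5);
Console.WriteLine(blended);
```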

EntropyCoefficient

Gets or sets the coefficient of the entropy bonus, which encourages exploration.

public double EntropyCoefficient { get; set; }

Property Value

double

Epochs

Gets or sets the number of training epochs.

public int Epochs { get; set; }

Property Value

int

GAELambda

Gets or sets the GAE lambda for advantage estimation.

public double GAELambda { get; set; }

Property Value

double

Remarks

Default: 0.95 (standard for PPO)
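Generalized Advantage Estimation combines one-step temporal-difference errors using the discount factor Gamma and this lambda. The sketch below computes advantages for one trajectory; it assumes the value after the last step is supplied as lastValue (0 for a terminal state). The helper name is hypothetical.

```csharp
using System;

// Hypothetical helper (not an AiDotNet API): GAE advantages for one trajectory.
static double[] ComputeGae(double[] rewards, double[] values, double lastValue,
                           double gamma, double lambda)
{
    var advantages = new double[rewards.Length];
    double running = 0.0;
    // Walk backwards so each step reuses the advantage of the step after it.
    for (int t = rewards.Length - 1; t >= 0; t--)
    {
        double nextValue = t + 1 < rewards.Length ? values[t + 1] : lastValue;
        double delta = rewards[t] + gamma * nextValue - values[t];   // TD error
        running = delta + gamma * lambda * running;
        advantages[t] = running;
    }
    return advantages;
}

// With lambda = 1 and all values 0, advantages reduce to discounted returns.
// Expected advantages: [1.75, 1.5, 1.0]
double[] adv = ComputeGae(new[] { 1.0, 1.0, 1.0 }, new double[3], lastValue: 0.0, gamma: 0.5, lambda: 1.0);
```

Setting lambda to 0 reduces each advantage to its one-step TD error; values near 1 trade lower bias for higher variance.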

GRPOGroupSize

Gets or sets the group size for GRPO sampling.

public int GRPOGroupSize { get; set; }

Property Value

int

Remarks

Number of responses to generate per prompt for group comparison.

Default: 8 (DeepSeek recommendation)
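GRPO replaces a learned value function with a group baseline: each response's reward is normalized by the mean and standard deviation of the rewards for the same prompt. A minimal sketch, assuming the population standard deviation and a small epsilon to avoid division by zero; the helper name is hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not an AiDotNet API): group-relative advantages for one prompt.
static double[] GroupAdvantages(double[] groupRewards, double epsilon = 1e-8)
{
    double mean = groupRewards.Average();
    double std = Math.Sqrt(groupRewards.Select(r => (r - mean) * (r - mean)).Average());
    // Better-than-average responses get positive advantages, worse ones negative.
    return groupRewards.Select(r => (r - mean) / (std + epsilon)).ToArray();
}

// Rewards for a group of 4 responses to the same prompt (illustrative values).
double[] advantages = GroupAdvantages(new[] { 1.0, 0.0, 0.0, 1.0 });
// mean = 0.5 and std = 0.5, so advantages are approximately [1, -1, -1, 1]
```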

GRPOTemperature

Gets or sets the GRPO temperature for sampling.

public double GRPOTemperature { get; set; }

Property Value

double

Gamma

Gets or sets the discount factor for rewards.

public double Gamma { get; set; }

Property Value

double

GradientAccumulationSteps

Gets or sets the gradient accumulation steps.

public int GradientAccumulationSteps { get; set; }

Property Value

int

Remarks

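Gradients are accumulated over this many batches before each optimizer step, so the effective batch size is BatchSize × GradientAccumulationSteps. This allows effective batch sizes larger than memory would otherwise permit.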
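The idea can be sketched with plain arrays: gradients from several micro-batches are summed, averaged, and applied in a single update once every accumulationSteps batches. The helper and the toy SGD update are hypothetical; real trainers also handle loss scaling, clipping, and a final partial group.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch (not an AiDotNet API): accumulate gradients over micro-batches,
// then apply one averaged SGD step. Returns the number of optimizer steps taken.
static int TrainWithAccumulation(double[] weights, IEnumerable<double[]> microBatchGradients,
                                 int accumulationSteps, double learningRate)
{
    var accumulated = new double[weights.Length];
    int seen = 0, optimizerSteps = 0;
    foreach (double[] grad in microBatchGradients)
    {
        for (int i = 0; i < weights.Length; i++) accumulated[i] += grad[i];
        seen++;
        if (seen % accumulationSteps == 0)
        {
            // Averaging makes the step match one batch accumulationSteps times larger,
            // assuming each micro-batch gradient is a per-batch mean.
            for (int i = 0; i < weights.Length; i++)
            {
                weights[i] -= learningRate * accumulated[i] / accumulationSteps;
                accumulated[i] = 0.0;
            }
            optimizerSteps++;
        }
    }
    return optimizerSteps;
}

var w = new[] { 1.0 };
var grads = new List<double[]> { new[] { 2.0 }, new[] { 4.0 }, new[] { 6.0 }, new[] { 8.0 } };
int steps = TrainWithAccumulation(w, grads, accumulationSteps: 2, learningRate: 0.1);
// Two optimizer steps: w = 1 - 0.1 * 3 - 0.1 * 7, which is approximately 0
```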

KLCoefficient

Gets or sets the KL coefficient for RL-based methods.

public double KLCoefficient { get; set; }

Property Value

double

Remarks

Scales the KL penalty that keeps the model from diverging too far from the reference model.

Default: 0.02
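In RLHF-style training the KL penalty is commonly folded into the reward. A minimal sketch, assuming the per-sample estimate log π(y|x) − log π_ref(y|x) of the KL divergence; the helper name is hypothetical.

```csharp
using System;

// Hypothetical helper (not an AiDotNet API): reward shaped by a KL penalty.
// Both log-probabilities refer to the same sampled response.
static double PenalizedReward(double reward, double policyLogProb, double referenceLogProb,
                              double klCoefficient)
{
    // Sample-based KL estimate: positive when the policy assigns more probability than the reference.
    double klEstimate = policyLogProb - referenceLogProb;
    return reward - klCoefficient * klEstimate;
}

// The policy assigns this response 5 nats more log-probability than the reference does.
double shaped = PenalizedReward(reward: 1.0, policyLogProb: -20.0, referenceLogProb: -25.0, klCoefficient: 0.02);
// shaped = 1.0 - 0.02 * 5.0 = 0.9
```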

KTODesirableWeight

Gets or sets the desirable weight for KTO.

public double KTODesirableWeight { get; set; }

Property Value

double

Remarks

KTO is based on prospect theory and weights desirable and undesirable examples separately.

Default: 1.0

KTOUndesirableWeight

Gets or sets the undesirable weight for KTO.

public double KTOUndesirableWeight { get; set; }

Property Value

double

Remarks

Setting this higher than KTODesirableWeight emphasizes avoiding bad outputs (loss aversion).

Default: 1.0
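The two KTO weights enter the per-example loss as shown below. This sketch assumes the formulation from the KTO paper: the implied reward is the policy-versus-reference log-ratio, compared against a reference point z0 (a clamped estimate of the KL divergence between policy and reference). How AiDotNet estimates z0 is not documented on this page, so it is passed in directly; the helper names are hypothetical.

```csharp
using System;

static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

// Hypothetical helper (not an AiDotNet API): KTO loss for one example.
// logRatio = log pi(y|x) - log pi_ref(y|x); z0 = reference point.
static double KtoLoss(double logRatio, double z0, bool desirable, double beta,
                      double desirableWeight, double undesirableWeight)
{
    return desirable
        // Desirable outputs lower the loss by rising above the reference point.
        ? desirableWeight * (1.0 - Sigmoid(beta * (logRatio - z0)))
        // Undesirable outputs lower the loss by falling below it.
        : undesirableWeight * (1.0 - Sigmoid(beta * (z0 - logRatio)));
}

double good = KtoLoss(logRatio: 0.0, z0: 0.0, desirable: true, beta: 0.1,
                      desirableWeight: 1.0, undesirableWeight: 1.0);
double bad = KtoLoss(0.0, 0.0, false, 0.1, 1.0, 1.5);
// At logRatio == z0 both sigmoids are 0.5, so good = 0.5 and bad = 1.5 * 0.5 = 0.75
```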

LabelSmoothing

Gets or sets the label smoothing factor for preference learning.

public double LabelSmoothing { get; set; }

Property Value

double

Remarks

Default: 0.0 (no smoothing). Values like 0.1 can help with noisy preferences.
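One common way to apply label smoothing to DPO (sometimes called conservative DPO) treats each preference label as flipped with probability ε and mixes the losses for both orderings. A minimal sketch of that variant, with a hypothetical helper name; whether AiDotNet uses exactly this form is an assumption.

```csharp
using System;

// Hypothetical helper (not an AiDotNet API): DPO loss with label smoothing epsilon.
// margin = (policyChosen - refChosen) - (policyRejected - refRejected), as in standard DPO.
static double SmoothedDpoLoss(double margin, double beta, double epsilon)
{
    // -log(sigmoid(x)) = log(1 + exp(-x))
    static double NegLogSigmoid(double x) => Math.Log(1.0 + Math.Exp(-x));
    // Trust the label with weight (1 - epsilon) and the flipped label with weight epsilon.
    return (1.0 - epsilon) * NegLogSigmoid(beta * margin) + epsilon * NegLogSigmoid(-beta * margin);
}

double plain = SmoothedDpoLoss(margin: 2.0, beta: 0.1, epsilon: 0.0);
double smoothed = SmoothedDpoLoss(margin: 2.0, beta: 0.1, epsilon: 0.1);
```

With ε > 0 the loss is minimized at a finite margin instead of rewarding ever-larger margins, which limits overconfidence on mislabeled pairs.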

LearningRate

Gets or sets the learning rate for fine-tuning.

public double LearningRate { get; set; }

Property Value

double

Remarks

Default: 1e-5 (suitable for most preference methods)

LoRAConfig

Gets or sets the LoRA configuration, used when UseLoRA is true.

public LoRAConfiguration? LoRAConfig { get; set; }

Property Value

LoRAConfiguration

LoggingSteps

Gets or sets the logging frequency in steps.

public int LoggingSteps { get; set; }

Property Value

int

MaxCheckpoints

Gets or sets the maximum number of checkpoints to keep.

public int MaxCheckpoints { get; set; }

Property Value

int

MaxGradientNorm

Gets or sets the maximum gradient norm for clipping.

public double MaxGradientNorm { get; set; }

Property Value

double
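Clipping by global norm rescales the whole gradient when its L2 norm exceeds the limit, preserving its direction. A minimal sketch assuming global-norm clipping (rather than clipping each value independently); the helper name is hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not an AiDotNet API): in-place global-norm gradient clipping.
// Returns the norm measured before clipping.
static double ClipByGlobalNorm(double[] gradient, double maxNorm)
{
    double norm = Math.Sqrt(gradient.Sum(g => g * g));
    if (norm > maxNorm)
    {
        double scale = maxNorm / norm;
        for (int i = 0; i < gradient.Length; i++) gradient[i] *= scale;
    }
    return norm;
}

double[] grad = { 3.0, 4.0 };   // L2 norm 5
double normBefore = ClipByGlobalNorm(grad, maxNorm: 1.0);
// grad is now [0.6, 0.8]: same direction, norm 1
```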

MaxSequenceLength

Gets or sets the maximum sequence length.

public int MaxSequenceLength { get; set; }

Property Value

int

MethodType

Gets or sets the fine-tuning method type.

public FineTuningMethodType MethodType { get; set; }

Property Value

FineTuningMethodType

ORPOLambda

Gets or sets the lambda parameter for ORPO odds ratio loss.

public double ORPOLambda { get; set; }

Property Value

double

Remarks

Controls the weight of the odds ratio loss relative to the SFT loss.

Default: 0.1 (ORPO paper recommendation)
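A minimal sketch of how lambda combines the two ORPO terms for one preference pair, assuming odds are computed from length-normalized (average per-token) log-probabilities: odds(y) = p / (1 − p), and the odds-ratio loss is −log σ(log odds(chosen) − log odds(rejected)). The helper name is hypothetical.

```csharp
using System;

// Hypothetical helper (not an AiDotNet API): ORPO loss for one preference pair.
// avgChosen / avgRejected are average per-token log-probabilities under the policy (must be < 0).
static double OrpoLoss(double avgChosen, double avgRejected, double lambda)
{
    // log odds(y) = log(p / (1 - p)) with p = exp(average log-probability)
    static double LogOdds(double avgLogProb) => avgLogProb - Math.Log(1.0 - Math.Exp(avgLogProb));

    double sftLoss = -avgChosen;   // negative log-likelihood of the chosen response
    double logOddsRatio = LogOdds(avgChosen) - LogOdds(avgRejected);
    double oddsRatioLoss = Math.Log(1.0 + Math.Exp(-logOddsRatio));   // -log(sigmoid(logOddsRatio))
    return sftLoss + lambda * oddsRatioLoss;
}

// Equal probabilities (p = 0.5) give an odds ratio of 1, so the loss is log 2 * (1 + lambda).
double orpo = OrpoLoss(Math.Log(0.5), Math.Log(0.5), lambda: 0.1);
```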

PPOClipRange

Gets or sets the PPO clip range.

public double PPOClipRange { get; set; }

Property Value

double

Remarks

The standard PPO clipping parameter ε: during policy updates, the probability ratio between the new and old policy is clipped to [1 − ε, 1 + ε].

Default: 0.2
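The clipped surrogate objective for a single sample can be sketched as follows. Trainers maximize this value (equivalently, minimize its negative); the helper name is hypothetical.

```csharp
using System;

// Hypothetical helper (not an AiDotNet API): PPO clipped surrogate for one sample (to be maximized).
static double ClippedSurrogate(double newLogProb, double oldLogProb, double advantage, double clipRange)
{
    double ratio = Math.Exp(newLogProb - oldLogProb);   // pi_new / pi_old
    double clippedRatio = Math.Clamp(ratio, 1.0 - clipRange, 1.0 + clipRange);
    // Taking the minimum removes any incentive to push the ratio outside [1 - eps, 1 + eps].
    return Math.Min(ratio * advantage, clippedRatio * advantage);
}

// The new policy makes this action 1.5 times more likely than the old one.
double up = ClippedSurrogate(Math.Log(1.5), 0.0, advantage: 1.0, clipRange: 0.2);     // capped at 1.2
double down = ClippedSurrogate(Math.Log(1.5), 0.0, advantage: -1.0, clipRange: 0.2);  // stays at -1.5
```

For a positive advantage the gain is capped once the ratio exceeds 1 + ε; for a negative advantage the unclipped, more pessimistic value is kept.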

PPOEpochsPerBatch

Gets or sets the number of PPO epochs per batch.

public int PPOEpochsPerBatch { get; set; }

Property Value

int

RandomSeed

Gets or sets the random seed for reproducibility.

public int? RandomSeed { get; set; }

Property Value

int?

RankingMargin

Gets or sets the margin for ranking loss.

public double RankingMargin { get; set; }

Property Value

double

RankingTemperature

Gets or sets the temperature for ranking softmax.

public double RankingTemperature { get; set; }

Property Value

double

SPINIterations

Gets or sets the number of self-play iterations.

public int SPINIterations { get; set; }

Property Value

int

SimPOGamma

Gets or sets the gamma parameter for SimPO length normalization.

public double SimPOGamma { get; set; }

Property Value

double

Remarks

SimPO uses the average log probability of a response instead of the sum, and gamma sets the target reward margin between the chosen and rejected responses.

Default: 0.5 (SimPO paper recommendation)
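A minimal sketch of the SimPO loss for one preference pair. The reward scale beta is passed in explicitly; whether AiDotNet reuses the Beta property for it is not stated on this page, and the helper name is hypothetical.

```csharp
using System;

// Hypothetical helper (not an AiDotNet API): SimPO loss for one preference pair.
// Sums are summed token log-probabilities under the policy; no reference model is needed.
static double SimPoLoss(double chosenLogProbSum, int chosenLength,
                        double rejectedLogProbSum, int rejectedLength,
                        double beta, double gamma)
{
    // Length-normalized implicit rewards: beta times the average log-probability.
    double chosenReward = beta * chosenLogProbSum / chosenLength;
    double rejectedReward = beta * rejectedLogProbSum / rejectedLength;
    // The chosen reward must exceed the rejected reward by the target margin gamma.
    double x = chosenReward - rejectedReward - gamma;
    return Math.Log(1.0 + Math.Exp(-x));   // -log(sigmoid(x))
}

// Average log-probabilities: chosen -2.0 (10 tokens), rejected -3.0 (20 tokens).
double simpo = SimPoLoss(-20.0, 10, -60.0, 20, beta: 1.0, gamma: 0.5);
```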

UseLoRA

Gets or sets whether to use LoRA for parameter-efficient fine-tuning.

public bool UseLoRA { get; set; }

Property Value

bool

UseMixedPrecision

Gets or sets whether to use mixed precision training.

public bool UseMixedPrecision { get; set; }

Property Value

bool

ValueCoefficient

Gets or sets the value function coefficient for PPO.

public double ValueCoefficient { get; set; }

Property Value

double
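In the usual PPO formulation, ValueCoefficient and EntropyCoefficient combine three terms into a single loss to minimize. A sketch assuming that standard form with a squared-error value loss; the helper name and the example values are hypothetical.

```csharp
using System;

// Hypothetical helper (not an AiDotNet API): combined PPO loss to minimize.
static double PpoTotalLoss(double clippedSurrogate, double predictedValue, double returnTarget,
                           double entropy, double valueCoefficient, double entropyCoefficient)
{
    double policyLoss = -clippedSurrogate;                          // maximize the surrogate
    double valueLoss = Math.Pow(predictedValue - returnTarget, 2);  // squared value error
    // Subtracting the entropy term rewards keeping the policy stochastic (exploration).
    return policyLoss + valueCoefficient * valueLoss - entropyCoefficient * entropy;
}

double total = PpoTotalLoss(clippedSurrogate: 1.2, predictedValue: 0.5, returnTarget: 1.5,
                            entropy: 2.0, valueCoefficient: 0.5, entropyCoefficient: 0.01);
// -1.2 + 0.5 * 1.0 - 0.01 * 2.0 = -0.72
```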

WarmupRatio

Gets or sets the warmup ratio for learning rate scheduling.

public double WarmupRatio { get; set; }

Property Value

double
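A common reading of a warmup ratio: the first WarmupRatio × total steps ramp the learning rate linearly from near zero up to LearningRate. What happens after warmup (constant, linear, or cosine decay) is not documented here, so this sketch holds the rate constant; the helper name is hypothetical.

```csharp
using System;

// Hypothetical helper (not an AiDotNet API): linear warmup, then a constant rate.
static double LearningRateAt(int step, int totalSteps, double baseLearningRate, double warmupRatio)
{
    int warmupSteps = (int)Math.Ceiling(warmupRatio * totalSteps);
    if (warmupSteps > 0 && step < warmupSteps)
    {
        // Ramp linearly so the rate reaches baseLearningRate on the last warmup step.
        return baseLearningRate * (step + 1) / warmupSteps;
    }
    return baseLearningRate;
}

// 1,000 total steps with WarmupRatio = 0.1 gives 100 warmup steps.
double firstRate = LearningRateAt(0, 1000, 1e-5, 0.1);     // 1e-7
double middleRate = LearningRateAt(49, 1000, 1e-5, 0.1);   // 5e-6
double laterRate = LearningRateAt(500, 1000, 1e-5, 0.1);   // 1e-5
```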

WeightDecay

Gets or sets the weight decay for regularization.

public double WeightDecay { get; set; }

Property Value

double