Interface IFineTuning<T, TInput, TOutput>
Namespace: AiDotNet.Interfaces
Assembly: AiDotNet.dll
Defines the contract for fine-tuning methods that adapt pre-trained models to specific tasks or preferences.
public interface IFineTuning<T, TInput, TOutput> : IModelSerializer
Type Parameters
T: The numeric data type used for calculations (e.g., float, double).
TInput: The input data type for the model.
TOutput: The output data type for the model.
Remarks
Fine-tuning encompasses a wide range of techniques for adapting models, from supervised fine-tuning (SFT) to advanced preference optimization methods like DPO, RLHF, and their variants.
For Beginners: Fine-tuning is like specialized training for an AI that already knows the basics. Just like a doctor goes through general education before specializing, AI models first learn general knowledge (pre-training) and then learn specific skills or behaviors (fine-tuning).
Categories of Fine-Tuning Methods:
- Supervised Fine-Tuning (SFT): train on labeled input-output pairs
- Reinforcement Learning: RLHF, PPO, GRPO - learn from reward signals
- Direct Preference Optimization: DPO, IPO, KTO, SimPO - learn from preference pairs
- Constitutional Methods: CAI, RLAIF - learn from AI-generated feedback guided by principles
- Self-Play Methods: SPIN - the model learns from itself
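Example
A minimal sketch of how calling code might inspect an implementation of this interface before training. It relies only on the members declared on this page; the helper class name and using directives are illustrative assumptions, and the concrete implementation class is supplied by the caller.
using System;
using AiDotNet.Interfaces;

public static class FineTuningInspector
{
    // Print the metadata every IFineTuning implementation exposes so callers can
    // decide which data and auxiliary models the method will need.
    public static void Describe<T, TInput, TOutput>(IFineTuning<T, TInput, TOutput> method)
    {
        Console.WriteLine($"Method:                {method.MethodName}");
        Console.WriteLine($"Category:              {method.Category}");
        Console.WriteLine($"Needs reference model: {method.RequiresReferenceModel}");
        Console.WriteLine($"Needs reward model:    {method.RequiresRewardModel}");
        Console.WriteLine($"Supports PEFT:         {method.SupportsPEFT}");
    }
}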
Properties
Category
Gets the category of this fine-tuning method.
FineTuningCategory Category { get; }
Property Value
FineTuningCategory
MethodName
Gets the name of this fine-tuning method.
string MethodName { get; }
Property Value
string
Remarks
Examples: "DPO", "RLHF", "SimPO", "ORPO", "SFT", "Constitutional AI"
RequiresReferenceModel
Gets whether this method requires a reference model.
bool RequiresReferenceModel { get; }
Property Value
bool
Remarks
Most preference methods (DPO, IPO) require a reference model for KL regularization. Reference-free methods (SimPO, ORPO) do not require one, making them more memory efficient.
RequiresRewardModel
Gets whether this method requires a reward model.
bool RequiresRewardModel { get; }
Property Value
bool
Remarks
RL-based methods (RLHF, PPO, GRPO) require a reward model. Direct preference methods (DPO, IPO, KTO) do not require one.
SupportsPEFT
Gets whether this method supports parameter-efficient fine-tuning (PEFT).
bool SupportsPEFT { get; }
Property Value
bool
Remarks
When true, this method can be combined with LoRA, QLoRA, or other PEFT techniques to reduce memory requirements during fine-tuning.
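Example
A hedged sketch of a preflight check that uses these three capability flags before training starts. The referenceModel and rewardModel parameters and the helper name are illustrative assumptions and not part of the IFineTuning contract.
using System;
using AiDotNet.Interfaces;

public static class FineTuningPreflight
{
    // Confirm that the auxiliary resources the method declares it needs were actually supplied.
    public static void Validate<T, TInput, TOutput>(
        IFineTuning<T, TInput, TOutput> method,
        object referenceModel,
        object rewardModel)
    {
        if (method.RequiresReferenceModel && referenceModel is null)
            throw new InvalidOperationException(
                $"{method.MethodName} needs a frozen reference model for KL regularization.");

        if (method.RequiresRewardModel && rewardModel is null)
            throw new InvalidOperationException(
                $"{method.MethodName} needs a reward model to score generated outputs.");

        if (!method.SupportsPEFT)
            Console.WriteLine($"{method.MethodName} updates all weights; expect higher memory use.");
    }
}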
Methods
EvaluateAsync(IFullModel<T, TInput, TOutput>, FineTuningData<T, TInput, TOutput>, CancellationToken)
Evaluates the fine-tuning quality of a model.
Task<FineTuningMetrics<T>> EvaluateAsync(IFullModel<T, TInput, TOutput> model, FineTuningData<T, TInput, TOutput> evaluationData, CancellationToken cancellationToken = default)
Parameters
model (IFullModel<T, TInput, TOutput>): The fine-tuned model to evaluate.
evaluationData (FineTuningData<T, TInput, TOutput>): Evaluation dataset.
cancellationToken (CancellationToken): Token for cancellation.
Returns
- Task<FineTuningMetrics<T>>
Metrics describing the fine-tuning quality.
Remarks
Different fine-tuning methods have different evaluation metrics:
- Preference methods: win rate against the reference, preference accuracy
- RL methods: reward scores, KL divergence from the base model
- Safety methods: harmlessness scores, refusal rates
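Example
A sketch of calling EvaluateAsync with an automatic timeout. The helper name and the 30-minute limit are illustrative; the members of FineTuningMetrics<T> are not documented on this page, so the metrics object is simply returned to the caller, and using directives for FineTuningData<T, TInput, TOutput> and FineTuningMetrics<T> are assumed to resolve from AiDotNet.
using System;
using System.Threading;
using System.Threading.Tasks;
using AiDotNet.Interfaces;

public static class FineTuningEvaluation
{
    // Evaluate a fine-tuned model on held-out data, cancelling automatically after 30 minutes.
    public static async Task<FineTuningMetrics<T>> EvaluateWithTimeoutAsync<T, TInput, TOutput>(
        IFineTuning<T, TInput, TOutput> method,
        IFullModel<T, TInput, TOutput> tunedModel,
        FineTuningData<T, TInput, TOutput> heldOutData)
    {
        using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(30));
        return await method.EvaluateAsync(tunedModel, heldOutData, cts.Token);
    }
}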
FineTuneAsync(IFullModel<T, TInput, TOutput>, FineTuningData<T, TInput, TOutput>, CancellationToken)
Fine-tunes a model using the configured method and provided training data.
Task<IFullModel<T, TInput, TOutput>> FineTuneAsync(IFullModel<T, TInput, TOutput> baseModel, FineTuningData<T, TInput, TOutput> trainingData, CancellationToken cancellationToken = default)
Parameters
baseModel (IFullModel<T, TInput, TOutput>): The pre-trained model to fine-tune.
trainingData (FineTuningData<T, TInput, TOutput>): The training data appropriate for this fine-tuning method.
cancellationToken (CancellationToken): Token for cancellation.
Returns
- Task<IFullModel<T, TInput, TOutput>>
The fine-tuned model.
Remarks
This method applies the fine-tuning algorithm to adapt the base model. The specific behavior depends on the method category:
- SFT: uses labeled examples from trainingData
- Preference-based (DPO, etc.): uses preference pairs from trainingData
- RL-based (RLHF, PPO): uses a reward model and the training data
For Beginners: This is where the actual training happens. You give it a model and training data, and it returns an improved model that's better at the specific task.
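Example
An end-to-end sketch: fine-tune a base model, then evaluate the result on held-out data, with cancellation wired through from the caller. The runner class and method names are illustrative assumptions; only the IFineTuning members shown on this page are relied on.
using System;
using System.Threading;
using System.Threading.Tasks;
using AiDotNet.Interfaces;

public static class FineTuningRunner
{
    // Fine-tune, then evaluate, reusing the same cancellation token for both steps.
    public static async Task<IFullModel<T, TInput, TOutput>> RunAsync<T, TInput, TOutput>(
        IFineTuning<T, TInput, TOutput> method,
        IFullModel<T, TInput, TOutput> baseModel,
        FineTuningData<T, TInput, TOutput> trainingData,
        FineTuningData<T, TInput, TOutput> evaluationData,
        CancellationToken cancellationToken = default)
    {
        Console.WriteLine($"Fine-tuning with {method.MethodName} ({method.Category})...");

        IFullModel<T, TInput, TOutput> tuned =
            await method.FineTuneAsync(baseModel, trainingData, cancellationToken);

        FineTuningMetrics<T> metrics =
            await method.EvaluateAsync(tuned, evaluationData, cancellationToken);

        Console.WriteLine($"Evaluation finished: {metrics}");
        return tuned;
    }
}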
GetOptions()
Gets the configuration options for this fine-tuning method.
FineTuningOptions<T> GetOptions()
Returns
- FineTuningOptions<T>
The configuration options for this fine-tuning method.
Reset()
Resets the fine-tuning method state.
void Reset()
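Example
A sketch of reusing a single IFineTuning instance across several runs: read the options before each run for logging, fine-tune, then call Reset() so the next run starts from a clean internal state. The loop structure and variable names are illustrative assumptions, and FineTuningOptions<T> members are not listed on this page, so only the options object is surfaced.
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using AiDotNet.Interfaces;

public static class FineTuningReuse
{
    public static async Task RunManyAsync<T, TInput, TOutput>(
        IFineTuning<T, TInput, TOutput> method,
        IFullModel<T, TInput, TOutput> baseModel,
        IReadOnlyList<FineTuningData<T, TInput, TOutput>> datasets)
    {
        foreach (var data in datasets)
        {
            FineTuningOptions<T> options = method.GetOptions();
            Console.WriteLine($"Starting {method.MethodName} run with options {options}");

            // Persist or evaluate the tuned model as needed; it is discarded here for brevity.
            IFullModel<T, TInput, TOutput> tuned = await method.FineTuneAsync(baseModel, data);

            // Clear accumulated state so the next dataset starts fresh.
            method.Reset();
        }
    }
}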