Class FineTuningData<T, TInput, TOutput>
Container for fine-tuning training and evaluation data.
public class FineTuningData<T, TInput, TOutput>
Type Parameters
T: The numeric data type.
TInput: The input data type.
TOutput: The output data type.
- Inheritance
- object
- FineTuningData<T, TInput, TOutput>
Remarks
This class holds data for various fine-tuning methods. Different methods use different subsets of the data:
- SFT: Uses Inputs and Outputs
- Preference methods: Use Inputs, ChosenOutputs, and RejectedOutputs
- RL methods: Use Inputs and Rewards
- Ranking methods: Use Inputs and RankedOutputs
For Beginners: Think of this as a container that holds all the training examples the model needs to learn from. Different training methods need different kinds of examples.
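As a rough illustration of the list above, the sketch below fills in only the arrays that SFT and pairwise preference methods are documented to read. `FineTuningDataSketch` is a hypothetical stand-in that mirrors four of the documented properties so the example compiles on its own; it is not the library class, and the sample strings are invented.

```csharp
using System;

// Hypothetical stand-in mirroring a few documented properties.
// This is NOT the library's FineTuningData<T, TInput, TOutput>.
public class FineTuningDataSketch<TInput, TOutput>
{
    public TInput[] Inputs { get; set; } = Array.Empty<TInput>();
    public TOutput[] Outputs { get; set; } = Array.Empty<TOutput>();
    public TOutput[] ChosenOutputs { get; set; } = Array.Empty<TOutput>();
    public TOutput[] RejectedOutputs { get; set; } = Array.Empty<TOutput>();
}

public static class FineTuningDataExamples
{
    // SFT: every input is paired with exactly one target output.
    public static FineTuningDataSketch<string, string> BuildSft()
    {
        return new FineTuningDataSketch<string, string>
        {
            Inputs = new[] { "What is 2 + 2?", "Name a primary color." },
            Outputs = new[] { "4", "Red" }
        };
    }

    // Pairwise preference: every input has one chosen and one rejected output.
    public static FineTuningDataSketch<string, string> BuildPreference()
    {
        return new FineTuningDataSketch<string, string>
        {
            Inputs = new[] { "Explain recursion briefly." },
            ChosenOutputs = new[] { "A function that calls itself on a smaller input." },
            RejectedOutputs = new[] { "It is a thing in programming." }
        };
    }
}
```

In both cases the arrays are parallel: element `i` of each output array belongs to element `i` of `Inputs`.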
Properties
Advantages
Gets or sets advantages for PPO-style methods.
public double[] Advantages { get; set; }
Property Value
- double[]
ChosenOutputs
Gets or sets the chosen (preferred) outputs for preference learning.
public TOutput[] ChosenOutputs { get; set; }
Property Value
- TOutput[]
Remarks
Used by DPO, IPO, KTO, SimPO, CPO, and other preference methods.
Count
Gets the number of samples in the dataset.
public int Count { get; }
Property Value
- int
CritiqueRevisions
Gets or sets critique-revision pairs for constitutional methods.
public (TOutput Original, string Critique, TOutput Revised)[] CritiqueRevisions { get; set; }
Property Value
- (TOutput Original, string Critique, TOutput Revised)[]
Remarks
Each tuple contains (original response, critique, revised response).
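The named-tuple array shape can be unfamiliar, so here is a small sketch with `TOutput` taken to be `string`. The class name, helper method, and sample text are illustrative assumptions, not part of the library.

```csharp
using System;

public static class CritiqueRevisionExample
{
    // Builds one (original, critique, revised) tuple using the documented element names.
    public static (string Original, string Critique, string Revised)[] Build()
    {
        return new[]
        {
            (Original: "The capital of Australia is Sydney.",
             Critique: "Sydney is the largest city, not the capital.",
             Revised: "The capital of Australia is Canberra.")
        };
    }

    // Collects the revised response from each tuple.
    public static string[] RevisedOutputs((string Original, string Critique, string Revised)[] items)
    {
        var revised = new string[items.Length];
        for (int i = 0; i < items.Length; i++)
        {
            revised[i] = items[i].Revised;
        }
        return revised;
    }
}
```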
DesirabilityLabels
Gets or sets binary labels indicating if outputs are desirable.
public bool[] DesirabilityLabels { get; set; }
Property Value
- bool[]
Remarks
Used by KTO, which does not require pairwise data. true = desirable, false = undesirable.
HasDistillationData
Gets whether this data is suitable for distillation.
public bool HasDistillationData { get; }
Property Value
- bool
HasPairwisePreferenceData
Gets whether this data is suitable for pairwise preference methods.
public bool HasPairwisePreferenceData { get; }
Property Value
- bool
HasRLData
Gets whether this data is suitable for RL methods.
public bool HasRLData { get; }
Property Value
- bool
HasRankingData
Gets whether this data is suitable for ranking methods.
public bool HasRankingData { get; }
Property Value
- bool
HasSFTData
Gets whether this data is suitable for SFT.
public bool HasSFTData { get; }
Property Value
- bool
HasUnpairedPreferenceData
Gets whether this data is suitable for KTO (unpaired preferences).
public bool HasUnpairedPreferenceData { get; }
Property Value
- bool
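This page does not say how the Has* flags are computed. One plausible reading, shown here purely as an assumption, is that a flag is true when the arrays a method needs are present and hold one entry per input. The class and method names below are hypothetical and the library's actual checks may differ.

```csharp
using System;

// Assumed logic only: the real properties may apply different rules.
public static class DataSuitabilitySketch
{
    // True when both arrays are non-null, inputs is non-empty,
    // and items has exactly one entry per input.
    public static bool Aligned<TInput, TItem>(TInput[] inputs, TItem[] items)
    {
        return inputs != null
            && items != null
            && inputs.Length > 0
            && items.Length == inputs.Length;
    }

    // SFT is documented to use Inputs and Outputs.
    public static bool HasSft<TInput, TOutput>(TInput[] inputs, TOutput[] outputs)
    {
        return Aligned(inputs, outputs);
    }

    // Pairwise preference methods are documented to use Inputs, ChosenOutputs, RejectedOutputs.
    public static bool HasPairwisePreference<TInput, TOutput>(
        TInput[] inputs, TOutput[] chosen, TOutput[] rejected)
    {
        return Aligned(inputs, chosen) && Aligned(inputs, rejected);
    }

    // RL methods are documented to use Inputs and Rewards.
    public static bool HasRl<TInput>(TInput[] inputs, double[] rewards)
    {
        return Aligned(inputs, rewards);
    }
}
```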
Inputs
Gets or sets the input data samples.
public TInput[] Inputs { get; set; }
Property Value
- TInput[]
Remarks
Required for all fine-tuning methods.
Outputs
Gets or sets the target outputs for supervised fine-tuning.
public TOutput[] Outputs { get; set; }
Property Value
- TOutput[]
Remarks
Used by SFT methods. Each output corresponds to an input.
RankedOutputs
Gets or sets ranked outputs for ranking-based methods.
public TOutput[][] RankedOutputs { get; set; }
Property Value
- TOutput[][]
Remarks
Used by RSO, RRHF, SLiC-HF, PRO. Each inner array contains outputs ranked from best to worst.
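To make the jagged-array layout concrete, the sketch below builds a `string[][]` in which each inner array is ordered best to worst, then reads off the first and last entry per input. The class, the `BestAndWorst` helper, and the sample strings are illustrative assumptions; the helper also assumes every inner array is non-empty.

```csharp
using System;

public static class RankedOutputsExample
{
    // One inner array per input, ordered from best to worst.
    public static string[][] Build()
    {
        return new[]
        {
            new[] { "best answer", "okay answer", "poor answer" },
            new[] { "clear summary", "vague summary" }
        };
    }

    // For each input, pairs the top-ranked output with the bottom-ranked one.
    // Assumes each inner array has at least one element.
    public static (TOutput Best, TOutput Worst)[] BestAndWorst<TOutput>(TOutput[][] ranked)
    {
        var pairs = new (TOutput Best, TOutput Worst)[ranked.Length];
        for (int i = 0; i < ranked.Length; i++)
        {
            TOutput[] ranking = ranked[i];
            pairs[i] = (ranking[0], ranking[ranking.Length - 1]);
        }
        return pairs;
    }
}
```

Inner arrays may differ in length, which is why the property is a jagged array rather than a rectangular one.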
RejectedOutputs
Gets or sets the rejected outputs for preference learning.
public TOutput[] RejectedOutputs { get; set; }
Property Value
- TOutput[]
Remarks
Used by DPO, IPO, SimPO, CPO, and other pairwise preference methods.
Rewards
Gets or sets reward values for RL-based methods.
public double[] Rewards { get; set; }
Property Value
- double[]
Remarks
Used by RLHF, PPO, GRPO, REINFORCE.
SampleIds
Gets or sets optional sample identifiers for tracking.
public string[] SampleIds { get; set; }
Property Value
- string[]
SampleWeights
Gets or sets optional sample weights for weighted training.
public double[] SampleWeights { get; set; }
Property Value
- double[]
TeacherConfidences
Gets or sets teacher model confidence scores.
public double[] TeacherConfidences { get; set; }
Property Value
- double[]
TeacherOutputs
Gets or sets teacher model logits/outputs for distillation.
public TOutput[] TeacherOutputs { get; set; }
Property Value
- TOutput[]
Values
Gets or sets value estimates for critic-based methods.
public double[] Values { get; set; }
Property Value
- double[]
Methods
Split(double, int?)
Splits the data into training and validation sets.
public (FineTuningData<T, TInput, TOutput> Train, FineTuningData<T, TInput, TOutput> Validation) Split(double validationRatio = 0.1, int? seed = null)
Parameters
validationRatio (double): The ratio of data to use for validation (0.0 to 1.0).
seed (int?): Optional random seed for reproducibility.
Returns
- (FineTuningData<T, TInput, TOutput> Train, FineTuningData<T, TInput, TOutput> Validation)
A tuple of (training data, validation data).
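The page does not describe how Split chooses samples. A common approach, sketched here as an assumption rather than the library's actual behavior, is to shuffle the sample indices with a seeded random number generator and take the first `validationRatio` share for validation. `SplitSketch` and `SplitIndices` are hypothetical names, and rounding the validation count to the nearest integer is also an assumption.

```csharp
using System;
using System.Linq;

public static class SplitSketch
{
    // Returns (training indices, validation indices) for `count` samples.
    public static (int[] Train, int[] Validation) SplitIndices(
        int count, double validationRatio = 0.1, int? seed = null)
    {
        if (validationRatio < 0.0 || validationRatio > 1.0)
        {
            throw new ArgumentOutOfRangeException(nameof(validationRatio));
        }

        // A fixed seed makes the shuffle repeatable.
        Random rng = seed.HasValue ? new Random(seed.Value) : new Random();
        int[] indices = Enumerable.Range(0, count).ToArray();

        // Fisher-Yates shuffle, done in place.
        for (int i = count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            int tmp = indices[i];
            indices[i] = indices[j];
            indices[j] = tmp;
        }

        int validationCount = (int)Math.Round(count * validationRatio);
        int[] validation = indices.Take(validationCount).ToArray();
        int[] train = indices.Skip(validationCount).ToArray();
        return (train, validation);
    }
}
```

The resulting index arrays could then be passed to a method such as Subset to build the two datasets.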
Subset(int[])
Creates a subset of the data for the given indices.
public FineTuningData<T, TInput, TOutput> Subset(int[] indices)
Parameters
indices (int[]): The indices to include in the subset.
Returns
- FineTuningData<T, TInput, TOutput>
A new FineTuningData containing only the specified samples.
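Because the properties are parallel arrays, taking a subset presumably means applying the same index selection to each populated array. The helper below sketches that idea for a single array. `SubsetSketch.Pick` is a hypothetical name, and treating a null or empty source as "not populated" is an assumption about how optional arrays might be handled.

```csharp
using System;

public static class SubsetSketch
{
    // Selects source[indices[0]], source[indices[1]], ... in the order given.
    // A null or empty source is treated as an unpopulated optional array.
    public static TItem[] Pick<TItem>(TItem[] source, int[] indices)
    {
        if (source == null || source.Length == 0)
        {
            return Array.Empty<TItem>();
        }

        var result = new TItem[indices.Length];
        for (int i = 0; i < indices.Length; i++)
        {
            result[i] = source[indices[i]];
        }
        return result;
    }
}
```

Applying `Pick` with the same `indices` to Inputs, Outputs, Rewards, and so on keeps element `i` of every array referring to the same sample.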