Class TrainingPipelineConfiguration<T, TInput, TOutput>
Configuration for a multi-step training pipeline with customizable stages.
public class TrainingPipelineConfiguration<T, TInput, TOutput>
Type Parameters
T: The numeric data type used for calculations.
TInput: The input data type for the model.
TOutput: The output data type for the model.
- Inheritance
object → TrainingPipelineConfiguration<T, TInput, TOutput>
Remarks
A training pipeline defines a sequence of training stages that are executed in order. Each stage can have its own training method, optimizer, learning rate, dataset, and evaluation criteria. This enables advanced training workflows like:
- InstructGPT: SFT → Reward Model → PPO
- Llama 2: SFT → Rejection Sampling → DPO
- Anthropic Claude: SFT → Constitutional AI → RLHF
- DeepSeek: SFT → GRPO
- Curriculum: Easy → Medium → Hard stages
For Beginners: Think of this as a recipe with multiple cooking steps. Just like you might marinate, then sear, then bake - training can have multiple phases where each phase teaches the model something different.
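For example, the fluent stage helpers documented below compose an InstructGPT-style pipeline. This is a minimal sketch: the generic arguments (double, string, string), the pipeline name, and the property values are illustrative assumptions, not requirements.

// Sketch: compose SFT -> Reward Model -> PPO with the fluent helpers.
var pipeline = new TrainingPipelineConfiguration<double, string, string>
{
    Name = "instruct-style",          // illustrative name
    DefaultLearningRate = 2e-5,       // the documented default
    SaveIntermediateCheckpoints = true
}
.AddSFTStage()          // Stage 1: supervised fine-tuning
.AddRewardModelStage()  // Stage 2: reward model training
.AddPPOStage();         // Stage 3: PPO-based RL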
Properties
CheckpointDirectory
Gets or sets the directory for intermediate checkpoints.
public string CheckpointDirectory { get; set; }
Property Value
- string
Remarks
Default is "./checkpoints". Relative to working directory.
DefaultBatchSize
Gets or sets the default batch size for all stages.
public int DefaultBatchSize { get; set; }
Property Value
- int
Remarks
Default is 8. Adjust based on available GPU memory.
DefaultLearningRate
Gets or sets the default learning rate for all stages.
public double DefaultLearningRate { get; set; }
Property Value
- double
Remarks
Default is 2e-5, standard for fine-tuning pre-trained models.
DefaultOptimizer
Gets or sets the default optimizer type for all stages.
public OptimizerType DefaultOptimizer { get; set; }
Property Value
- OptimizerType
Remarks
Default is AdamW. Individual stages can override this with OptimizerOverride.
Description
Gets or sets the description of this pipeline.
public string? Description { get; set; }
Property Value
- string
EnableAutoSelection
Gets or sets whether to use automatic pipeline selection when no stages are defined.
public bool EnableAutoSelection { get; set; }
Property Value
- bool
Remarks
When true and Stages is empty, the system analyzes available data and automatically constructs an appropriate training pipeline.
EnableExperimentTracking
Gets or sets whether to log to WandB or similar experiment trackers.
public bool EnableExperimentTracking { get; set; }
Property Value
- bool
Remarks
Default is false. Enable to use experiment tracking services.
EvaluateAfterEachStage
Gets or sets whether to run evaluation after each stage.
public bool EvaluateAfterEachStage { get; set; }
Property Value
- bool
Remarks
Default is true. Helps track progress between stages.
EvaluationMetrics
Gets or sets the evaluation metrics to track.
public string[] EvaluationMetrics { get; set; }
Property Value
- string[]
Remarks
Default includes loss and perplexity.
ExperimentName
Gets or sets the experiment name for tracking.
public string ExperimentName { get; set; }
Property Value
- string
Remarks
Default is empty; when empty, the name is auto-generated from the pipeline name.
GlobalEarlyStopping
Gets or sets the global early stopping configuration applied across stages.
public GlobalEarlyStoppingConfig GlobalEarlyStopping { get; set; }
Property Value
- GlobalEarlyStoppingConfig
Remarks
The default configuration uses patience=5 and monitors loss.
GlobalEvaluationData
Gets or sets the evaluation data to use across all stages.
public FineTuningData<T, TInput, TOutput>? GlobalEvaluationData { get; set; }
Property Value
- FineTuningData<T, TInput, TOutput>
Remarks
Default is null. Set this to use the same validation data across all stages.
GlobalSeed
Gets or sets the global random seed for reproducibility across all stages.
public int GlobalSeed { get; set; }
Property Value
- int
Remarks
Default is 42. Use for reproducible training runs.
InterStageCallbacks
Gets or sets callback actions to execute between stages.
public List<Action<TrainingStageResult<T, TInput, TOutput>>>? InterStageCallbacks { get; set; }
Property Value
- List<Action<TrainingStageResult<T, TInput, TOutput>>>
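For example, a callback can log progress between stages. A minimal sketch, assuming config is a TrainingPipelineConfiguration<double, string, string>; only members documented here are used, and the result is printed via its ToString():

config.InterStageCallbacks = new List<Action<TrainingStageResult<double, string, string>>>
{
    result => Console.WriteLine($"Stage complete: {result}")
};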
LogDirectory
Gets or sets the logging directory.
public string LogDirectory { get; set; }
Property Value
- string
Remarks
Default is "./logs". Relative to working directory.
MaxCheckpointsToKeep
Gets or sets the maximum number of checkpoints to keep.
public int MaxCheckpointsToKeep { get; set; }
Property Value
- int
Remarks
Default is 3. Older checkpoints are deleted to save disk space.
Metadata
Gets or sets custom metadata for the pipeline.
public Dictionary<string, object> Metadata { get; set; }
Property Value
- Dictionary<string, object>
Remarks
Empty by default. Use to store custom key-value pairs.
MixedPrecisionDType
Gets or sets the mixed precision data type.
public MixedPrecisionType MixedPrecisionDType { get; set; }
Property Value
- MixedPrecisionType
Remarks
Default is FP16 for broad GPU compatibility. Use BF16 on Ampere+ GPUs.
Name
Gets or sets the name of this pipeline for identification.
public string Name { get; set; }
Property Value
- string
OnPipelineComplete
Gets or sets callback actions to execute when the pipeline completes.
public Action<List<TrainingStageResult<T, TInput, TOutput>>>? OnPipelineComplete { get; set; }
Property Value
- Action<List<TrainingStageResult<T, TInput, TOutput>>>
OnPipelineError
Gets or sets callback actions to execute on pipeline failure.
public Action<Exception, TrainingStageResult<T, TInput, TOutput>?>? OnPipelineError { get; set; }
Property Value
- Action<Exception, TrainingStageResult<T, TInput, TOutput>?>
OnPipelineStart
Gets or sets callback actions to execute before the pipeline starts.
public Action<TrainingPipelineConfiguration<T, TInput, TOutput>>? OnPipelineStart { get; set; }
Property Value
- Action<TrainingPipelineConfiguration<T, TInput, TOutput>>
ResumeCheckpointPath
Gets or sets the specific checkpoint path to resume from.
public string ResumeCheckpointPath { get; set; }
Property Value
- string
Remarks
Default is empty; when empty, the latest checkpoint is used if ResumeFromCheckpoint is true.
ResumeFromCheckpoint
Gets or sets whether to resume from the latest checkpoint.
public bool ResumeFromCheckpoint { get; set; }
Property Value
- bool
Remarks
Default is false. Set to true to continue interrupted training.
SaveIntermediateCheckpoints
Gets or sets whether to save checkpoints between stages.
public bool SaveIntermediateCheckpoints { get; set; }
Property Value
- bool
Remarks
Default is true. Enables recovery from failures.
Stages
Gets or sets the ordered list of training stages in the pipeline.
public List<TrainingStage<T, TInput, TOutput>> Stages { get; set; }
Property Value
- List<TrainingStage<T, TInput, TOutput>>
Remarks
Stages are executed sequentially. The output model from each stage becomes the input model for the next stage.
Tags
Gets or sets tags for categorizing the pipeline.
public string[] Tags { get; set; }
Property Value
- string[]
Remarks
Empty by default. Useful for filtering and organizing pipelines.
UseMixedPrecision
Gets or sets whether to use mixed precision training globally.
public bool UseMixedPrecision { get; set; }
Property Value
- bool
Remarks
Default is false. Enable for faster training with lower memory.
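For example, to enable BF16 mixed precision globally. A sketch assuming the MixedPrecisionType enum exposes members named FP16 and BF16, as the remarks above suggest:

config.UseMixedPrecision = true;
config.MixedPrecisionDType = MixedPrecisionType.BF16; // BF16 on Ampere+ GPUs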
VerboseLogging
Gets or sets whether to enable verbose logging.
public bool VerboseLogging { get; set; }
Property Value
- bool
Remarks
Default is false. Enable for detailed debug output.
Methods
AddAdapterMergingStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds an adapter merging stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddAdapterMergingStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddAgenticStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds an agentic behavior training stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddAgenticStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddCPOStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds a Contrastive Preference Optimization (CPO) stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddCPOStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddChainOfThoughtStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds a chain-of-thought training stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddChainOfThoughtStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddCheckpointStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds a checkpoint stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddCheckpointStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddCodeFineTuningStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds a code fine-tuning stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddCodeFineTuningStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddConstitutionalAIStage(string[]?, Action<TrainingStage<T, TInput, TOutput>>?)
Adds a Constitutional AI stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddConstitutionalAIStage(string[]? principles = null, Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
principles (string[]): The constitutional principles to use.
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
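For example (the principle strings are illustrative):

config.AddConstitutionalAIStage(new[]
{
    "Choose the response that is most helpful and honest.",
    "Avoid responses that could cause harm."
});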
AddCustomStage(string, Func<IFullModel<T, TInput, TOutput>, FineTuningData<T, TInput, TOutput>, CancellationToken, Task<IFullModel<T, TInput, TOutput>>>, Action<TrainingStage<T, TInput, TOutput>>?)
Adds a custom training stage with user-defined logic.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddCustomStage(string name, Func<IFullModel<T, TInput, TOutput>, FineTuningData<T, TInput, TOutput>, CancellationToken, Task<IFullModel<T, TInput, TOutput>>> trainFunc, Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
name (string): Name of the custom stage.
trainFunc (Func<IFullModel<T, TInput, TOutput>, FineTuningData<T, TInput, TOutput>, CancellationToken, Task<IFullModel<T, TInput, TOutput>>>): The custom training function.
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
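For example, a pass-through custom stage built from the documented delegate signature; the body is a placeholder for real training logic:

config.AddCustomStage(
    "my-custom-stage",
    async (model, data, cancellationToken) =>
    {
        // Custom training logic goes here; this sketch returns the model unchanged.
        await Task.CompletedTask;
        return model;
    });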
AddDPOStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds a Direct Preference Optimization (DPO) stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddDPOStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddDistillationStage(IFullModel<T, TInput, TOutput>?, Action<TrainingStage<T, TInput, TOutput>>?)
Adds a knowledge distillation stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddDistillationStage(IFullModel<T, TInput, TOutput>? teacherModel = null, Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
teacherModel (IFullModel<T, TInput, TOutput>): The teacher model to distill from.
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddEvaluationStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds an evaluation-only stage (no training, just metrics).
public TrainingPipelineConfiguration<T, TInput, TOutput> AddEvaluationStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddGRPOStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds a GRPO (Group Relative Policy Optimization) stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddGRPOStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddHarmlessnessStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds a harmlessness training stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddHarmlessnessStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddHelpfulnessStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds a helpfulness training stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddHelpfulnessStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddIPOStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds an Identity Preference Optimization (IPO) stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddIPOStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddInstructionTuningStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds an instruction tuning stage (specialized SFT).
public TrainingPipelineConfiguration<T, TInput, TOutput> AddInstructionTuningStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddKTOStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds a Kahneman-Tversky Optimization (KTO) stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddKTOStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddLoRAStage(int, Action<TrainingStage<T, TInput, TOutput>>?)
Adds a LoRA adapter training stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddLoRAStage(int rank = 16, Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
rank (int): LoRA rank (dimension of the low-rank matrices).
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddMathReasoningStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds a math reasoning fine-tuning stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddMathReasoningStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddMultiTurnConversationStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds a multi-turn conversation training stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddMultiTurnConversationStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddORPOStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds an Odds Ratio Preference Optimization (ORPO) stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddORPOStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddPPOStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds a PPO stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddPPOStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddPROStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds a Preference Ranking Optimization (PRO) stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddPROStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddPreferenceStage(FineTuningMethodType, Action<TrainingStage<T, TInput, TOutput>>?)
Adds a generic preference optimization stage with configurable method.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddPreferenceStage(FineTuningMethodType method = FineTuningMethodType.DPO, Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
method (FineTuningMethodType): The preference optimization method to use.
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddProcessRewardModelStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds a process reward model (PRM) training stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddProcessRewardModelStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddQLoRAStage(int, int, Action<TrainingStage<T, TInput, TOutput>>?)
Adds a QLoRA (quantized LoRA) stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddQLoRAStage(int rank = 16, int bits = 4, Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
rank (int): LoRA rank.
bits (int): Quantization bits (4 or 8).
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
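For example, parameter-efficient stages can be added with explicit settings; the rank and bit values here are illustrative deviations from the defaults (rank 16, 4 bits):

config.AddLoRAStage(rank: 32);
config.AddQLoRAStage(rank: 8, bits: 4);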
AddRLAIFStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds an RLAIF (RL from AI Feedback) stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddRLAIFStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddRLHFStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds an RLHF (PPO-based) stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddRLHFStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddRLStage(FineTuningMethodType, Action<TrainingStage<T, TInput, TOutput>>?)
Adds a generic reinforcement learning stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddRLStage(FineTuningMethodType method = FineTuningMethodType.RLHF, Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
method (FineTuningMethodType): The RL method to use.
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddRRHFStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds a Rank Responses to align Human Feedback (RRHF) stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddRRHFStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddRejectionSamplingStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds a Rejection Sampling Optimization (RSO) stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddRejectionSamplingStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddRewardModelStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds a reward model training stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddRewardModelStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddRobustDPOStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds a Robust DPO (R-DPO) stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddRobustDPOStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddSFTStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds a supervised fine-tuning (SFT) stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddSFTStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddSLiCStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds a Sequence Likelihood Calibration (SLiC-HF) stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddSLiCStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddSPINStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds a Self-Play Fine-Tuning (SPIN) stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddSPINStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddSafetyAlignmentStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds a safety alignment stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddSafetyAlignmentStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddSelfRewardingStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds a self-rewarding stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddSelfRewardingStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddSimPOStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds a Simple Preference Optimization (SimPO) stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddSimPOStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddStage(TrainingStage<T, TInput, TOutput>)
Adds a training stage to the pipeline.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddStage(TrainingStage<T, TInput, TOutput> stage)
Parameters
stage (TrainingStage<T, TInput, TOutput>): The stage to add.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddSyntheticDataStage(IFullModel<T, TInput, TOutput>?, Action<TrainingStage<T, TInput, TOutput>>?)
Adds a synthetic data training stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddSyntheticDataStage(IFullModel<T, TInput, TOutput>? teacherModel = null, Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
teacherModel (IFullModel<T, TInput, TOutput>): The teacher model used to generate synthetic data.
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AddToolUseStage(Action<TrainingStage<T, TInput, TOutput>>?)
Adds a tool use training stage.
public TrainingPipelineConfiguration<T, TInput, TOutput> AddToolUseStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)
Parameters
configure (Action<TrainingStage<T, TInput, TOutput>>): Optional configuration action for the stage.
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
This configuration for method chaining.
AgentTraining(FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?)
Creates an agent/tool-use training pipeline.
public static TrainingPipelineConfiguration<T, TInput, TOutput> AgentTraining(FineTuningData<T, TInput, TOutput>? sftData = null, FineTuningData<T, TInput, TOutput>? toolData = null, FineTuningData<T, TInput, TOutput>? agenticData = null)
Parameters
sftData (FineTuningData<T, TInput, TOutput>)
toolData (FineTuningData<T, TInput, TOutput>)
agenticData (FineTuningData<T, TInput, TOutput>)
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
AnthropicClaude(FineTuningData<T, TInput, TOutput>?, string[]?)
Creates an Anthropic Claude-style pipeline (SFT → Constitutional AI → RLHF).
public static TrainingPipelineConfiguration<T, TInput, TOutput> AnthropicClaude(FineTuningData<T, TInput, TOutput>? sftData = null, string[]? principles = null)
Parameters
sftData (FineTuningData<T, TInput, TOutput>)
principles (string[])
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
Remarks
Anthropic's Constitutional AI approach for Claude models.
Auto(FineTuningData<T, TInput, TOutput>)
Automatically selects an appropriate pipeline based on available data.
public static TrainingPipelineConfiguration<T, TInput, TOutput> Auto(FineTuningData<T, TInput, TOutput> availableData)
Parameters
availableData (FineTuningData<T, TInput, TOutput>)
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
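For example, assuming availableData is an already-loaded FineTuningData<double, string, string> instance:

var pipeline = TrainingPipelineConfiguration<double, string, string>.Auto(availableData);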
CodeModel(FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?)
Creates a code model training pipeline.
public static TrainingPipelineConfiguration<T, TInput, TOutput> CodeModel(FineTuningData<T, TInput, TOutput>? sftData = null, FineTuningData<T, TInput, TOutput>? codeExecutionData = null)
Parameters
sftData (FineTuningData<T, TInput, TOutput>)
codeExecutionData (FineTuningData<T, TInput, TOutput>)
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
ConstitutionalAI(FineTuningData<T, TInput, TOutput>?, string[]?)
Creates a Constitutional AI pipeline (SFT → CAI critique/revision → preference).
public static TrainingPipelineConfiguration<T, TInput, TOutput> ConstitutionalAI(FineTuningData<T, TInput, TOutput>? sftData = null, string[]? principles = null)
Parameters
sftData (FineTuningData<T, TInput, TOutput>)
principles (string[])
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
CurriculumLearning(params (string Name, FineTuningData<T, TInput, TOutput> Data)[])
Creates a curriculum learning pipeline with progressively harder stages.
public static TrainingPipelineConfiguration<T, TInput, TOutput> CurriculumLearning(params (string Name, FineTuningData<T, TInput, TOutput> Data)[] curriculumStages)
Parameters
curriculumStages ((string Name, FineTuningData<T, TInput, TOutput> Data)[])
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
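For example, with three difficulty tiers; the stage names and the easyData/mediumData/hardData datasets are illustrative:

var pipeline = TrainingPipelineConfiguration<double, string, string>.CurriculumLearning(
    ("Easy", easyData),
    ("Medium", mediumData),
    ("Hard", hardData));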
DeepSeek(FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?)
Creates a DeepSeek-style pipeline (SFT → GRPO).
public static TrainingPipelineConfiguration<T, TInput, TOutput> DeepSeek(FineTuningData<T, TInput, TOutput>? sftData = null, FineTuningData<T, TInput, TOutput>? grpoData = null)
Parameters
sftData (FineTuningData<T, TInput, TOutput>)
grpoData (FineTuningData<T, TInput, TOutput>)
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
Remarks
DeepSeek's efficient training approach using GRPO instead of PPO.
FullRLHF(FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?)
Creates a full RLHF pipeline (SFT → Reward Model → PPO).
public static TrainingPipelineConfiguration<T, TInput, TOutput> FullRLHF(FineTuningData<T, TInput, TOutput>? sftData = null, FineTuningData<T, TInput, TOutput>? rlData = null)
Parameters
sftData (FineTuningData<T, TInput, TOutput>)
rlData (FineTuningData<T, TInput, TOutput>)
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
InstructGPT(FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?)
Creates an OpenAI InstructGPT-style pipeline (SFT → Reward Model → PPO).
public static TrainingPipelineConfiguration<T, TInput, TOutput> InstructGPT(FineTuningData<T, TInput, TOutput>? sftData = null, FineTuningData<T, TInput, TOutput>? preferenceData = null, FineTuningData<T, TInput, TOutput>? rlData = null)
Parameters
sftData (FineTuningData<T, TInput, TOutput>)
preferenceData (FineTuningData<T, TInput, TOutput>)
rlData (FineTuningData<T, TInput, TOutput>)
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
Remarks
The original ChatGPT training pipeline from the InstructGPT paper.
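For example, assuming the three datasets are loaded elsewhere:

var pipeline = TrainingPipelineConfiguration<double, string, string>
    .InstructGPT(sftData, preferenceData, rlData);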
IterativeRefinement(int, FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?)
Creates an iterative refinement pipeline that runs multiple DPO rounds.
public static TrainingPipelineConfiguration<T, TInput, TOutput> IterativeRefinement(int iterations, FineTuningData<T, TInput, TOutput>? sftData = null, FineTuningData<T, TInput, TOutput>? preferenceData = null)
Parameters
iterations (int)
sftData (FineTuningData<T, TInput, TOutput>)
preferenceData (FineTuningData<T, TInput, TOutput>)
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
IterativeSPIN(int, FineTuningData<T, TInput, TOutput>?)
Creates an iterative SPIN pipeline.
public static TrainingPipelineConfiguration<T, TInput, TOutput> IterativeSPIN(int iterations = 3, FineTuningData<T, TInput, TOutput>? sftData = null)
Parameters
iterations (int)
sftData (FineTuningData<T, TInput, TOutput>)
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
KnowledgeDistillation(IFullModel<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?)
Creates a distillation pipeline (teacher → student).
public static TrainingPipelineConfiguration<T, TInput, TOutput> KnowledgeDistillation(IFullModel<T, TInput, TOutput>? teacherModel = null, FineTuningData<T, TInput, TOutput>? data = null)
Parameters
teacherModel (IFullModel<T, TInput, TOutput>)
data (FineTuningData<T, TInput, TOutput>)
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
Llama2(FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?)
Creates a Meta Llama 2-style pipeline (SFT → Rejection Sampling → DPO).
public static TrainingPipelineConfiguration<T, TInput, TOutput> Llama2(FineTuningData<T, TInput, TOutput>? sftData = null, FineTuningData<T, TInput, TOutput>? preferenceData = null)
Parameters
sftData (FineTuningData<T, TInput, TOutput>)
preferenceData (FineTuningData<T, TInput, TOutput>)
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
Remarks
Meta's approach for Llama 2 Chat models.
LoRAFineTuning(int, FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?)
Creates a memory-efficient LoRA fine-tuning pipeline.
public static TrainingPipelineConfiguration<T, TInput, TOutput> LoRAFineTuning(int rank = 16, FineTuningData<T, TInput, TOutput>? sftData = null, FineTuningData<T, TInput, TOutput>? preferenceData = null)
Parameters
rank (int)
sftData (FineTuningData<T, TInput, TOutput>)
preferenceData (FineTuningData<T, TInput, TOutput>)
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
MathReasoning(FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?)
Creates a math reasoning model pipeline.
public static TrainingPipelineConfiguration<T, TInput, TOutput> MathReasoning(FineTuningData<T, TInput, TOutput>? sftData = null, FineTuningData<T, TInput, TOutput>? cotData = null)
Parameters
sftData (FineTuningData<T, TInput, TOutput>)
cotData (FineTuningData<T, TInput, TOutput>)
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
ORPOAlignment(FineTuningData<T, TInput, TOutput>?)
Creates a reference-free alignment pipeline using ORPO.
public static TrainingPipelineConfiguration<T, TInput, TOutput> ORPOAlignment(FineTuningData<T, TInput, TOutput>? data = null)
Parameters
data (FineTuningData<T, TInput, TOutput>)
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
Remarks
ORPO combines SFT and preference learning, requiring no reference model. More memory-efficient than DPO.
QLoRAFineTuning(int, int, FineTuningData<T, TInput, TOutput>?)
Creates a QLoRA fine-tuning pipeline for maximum memory efficiency.
public static TrainingPipelineConfiguration<T, TInput, TOutput> QLoRAFineTuning(int rank = 16, int bits = 4, FineTuningData<T, TInput, TOutput>? data = null)
Parameters
rank (int)
bits (int)
data (FineTuningData<T, TInput, TOutput>)
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
SafetyFocused(FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?, string[]?)
Creates a safety-focused training pipeline.
public static TrainingPipelineConfiguration<T, TInput, TOutput> SafetyFocused(FineTuningData<T, TInput, TOutput>? sftData = null, FineTuningData<T, TInput, TOutput>? safetyData = null, string[]? principles = null)
Parameters
sftData (FineTuningData<T, TInput, TOutput>)
safetyData (FineTuningData<T, TInput, TOutput>)
principles (string[])
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
SimPOAlignment(FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?)
Creates a SimPO alignment pipeline (reference-free, simple).
public static TrainingPipelineConfiguration<T, TInput, TOutput> SimPOAlignment(FineTuningData<T, TInput, TOutput>? sftData = null, FineTuningData<T, TInput, TOutput>? preferenceData = null)
Parameters
sftData (FineTuningData<T, TInput, TOutput>)
preferenceData (FineTuningData<T, TInput, TOutput>)
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
StandardAlignment(FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?)
Creates a standard SFT → DPO pipeline (most common alignment workflow).
public static TrainingPipelineConfiguration<T, TInput, TOutput> StandardAlignment(FineTuningData<T, TInput, TOutput>? sftData = null, FineTuningData<T, TInput, TOutput>? preferenceData = null)
Parameters
sftData (FineTuningData<T, TInput, TOutput>)
preferenceData (FineTuningData<T, TInput, TOutput>)
Returns
- TrainingPipelineConfiguration<T, TInput, TOutput>
Validate()
Validates the pipeline configuration.
public List<string> Validate()
Returns
- List<string>
A list of validation error messages; empty if the configuration is valid.
ValidateOrThrow()
Throws an exception if the pipeline is invalid.
public void ValidateOrThrow()
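For example, a typical guard before running a pipeline (a sketch; pipeline is any configured TrainingPipelineConfiguration instance):

var errors = pipeline.Validate();
if (errors.Count > 0)
    Console.WriteLine(string.Join(Environment.NewLine, errors));

pipeline.ValidateOrThrow(); // throws instead of returning the errors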