Class TrainingPipelineConfiguration<T, TInput, TOutput>

Namespace
AiDotNet.Models.Options
Assembly
AiDotNet.dll

Configuration for a multi-step training pipeline with customizable stages.

public class TrainingPipelineConfiguration<T, TInput, TOutput>

Type Parameters

T

The numeric data type used for calculations.

TInput

The input data type for the model.

TOutput

The output data type for the model.

Inheritance
object → TrainingPipelineConfiguration<T, TInput, TOutput>
Inherited Members
object.Equals(object?), object.GetHashCode(), object.GetType(), object.MemberwiseClone(), object.ToString()

Remarks

A training pipeline defines a sequence of training stages that are executed in order. Each stage can have its own training method, optimizer, learning rate, dataset, and evaluation criteria. This enables advanced training workflows like:

  • InstructGPT: SFT → Reward Model → PPO
  • Llama 2: SFT → Rejection Sampling → DPO
  • Anthropic Claude: SFT → Constitutional AI → RLHF
  • DeepSeek: SFT → GRPO
  • Curriculum: Easy → Medium → Hard stages

For Beginners: Think of this as a recipe with multiple cooking steps. Just as you might marinate, then sear, then bake, training can have multiple phases, each teaching the model something different.
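For illustration, a pipeline can be composed by chaining the Add*Stage methods described below. This is a minimal sketch: the type arguments <double, string, string> are illustrative placeholders, and your actual T/TInput/TOutput depend on the model being trained.

```csharp
using AiDotNet.Models.Options;

// InstructGPT-style recipe: SFT -> Reward Model -> PPO.
// <double, string, string> are illustrative type arguments.
var pipeline = new TrainingPipelineConfiguration<double, string, string>
    {
        Name = "instruct-style",
        DefaultLearningRate = 2e-5,
        DefaultBatchSize = 8
    }
    .AddSFTStage()
    .AddRewardModelStage()
    .AddPPOStage();
```

Each Add*Stage call returns the configuration itself, which is what makes this fluent chaining possible.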

Properties

CheckpointDirectory

Gets or sets the directory for intermediate checkpoints.

public string CheckpointDirectory { get; set; }

Property Value

string

Remarks

Default is "./checkpoints", relative to the working directory.

DefaultBatchSize

Gets or sets the default batch size for all stages.

public int DefaultBatchSize { get; set; }

Property Value

int

Remarks

Default is 8. Adjust based on available GPU memory.

DefaultLearningRate

Gets or sets the default learning rate for all stages.

public double DefaultLearningRate { get; set; }

Property Value

double

Remarks

Default is 2e-5, standard for fine-tuning pre-trained models.

DefaultOptimizer

Gets or sets the default optimizer type for all stages.

public OptimizerType DefaultOptimizer { get; set; }

Property Value

OptimizerType

Remarks

Default is AdamW. Individual stages can override this with OptimizerOverride.

Description

Gets or sets the description of this pipeline.

public string? Description { get; set; }

Property Value

string

EnableAutoSelection

Gets or sets whether to use automatic pipeline selection when no stages are defined.

public bool EnableAutoSelection { get; set; }

Property Value

bool

Remarks

When true and Stages is empty, the system analyzes available data and automatically constructs an appropriate training pipeline.

EnableExperimentTracking

Gets or sets whether to log to WandB or similar experiment trackers.

public bool EnableExperimentTracking { get; set; }

Property Value

bool

Remarks

Default is false. Enable to use experiment tracking services.

EvaluateAfterEachStage

Gets or sets whether to run evaluation after each stage.

public bool EvaluateAfterEachStage { get; set; }

Property Value

bool

Remarks

Default is true. Helps track progress between stages.

EvaluationMetrics

Gets or sets the evaluation metrics to track.

public string[] EvaluationMetrics { get; set; }

Property Value

string[]

Remarks

Default includes loss and perplexity.

ExperimentName

Gets or sets the experiment name for tracking.

public string ExperimentName { get; set; }

Property Value

string

Remarks

Default is empty (auto-generated from pipeline name).

GlobalEarlyStopping

Gets or sets the global early stopping configuration applied across stages.

public GlobalEarlyStoppingConfig GlobalEarlyStopping { get; set; }

Property Value

GlobalEarlyStoppingConfig

Remarks

The default configuration uses patience=5 and monitors loss.

GlobalEvaluationData

Gets or sets the evaluation data to use across all stages.

public FineTuningData<T, TInput, TOutput>? GlobalEvaluationData { get; set; }

Property Value

FineTuningData<T, TInput, TOutput>

Remarks

Default is null. Set this to use the same validation data across all stages.

GlobalSeed

Gets or sets the global random seed for reproducibility across all stages.

public int GlobalSeed { get; set; }

Property Value

int

Remarks

Default is 42. Use for reproducible training runs.

InterStageCallbacks

Gets or sets callback actions to execute between stages.

public List<Action<TrainingStageResult<T, TInput, TOutput>>>? InterStageCallbacks { get; set; }

Property Value

List<Action<TrainingStageResult<T, TInput, TOutput>>>

LogDirectory

Gets or sets the logging directory.

public string LogDirectory { get; set; }

Property Value

string

Remarks

Default is "./logs", relative to the working directory.

MaxCheckpointsToKeep

Gets or sets the maximum number of checkpoints to keep.

public int MaxCheckpointsToKeep { get; set; }

Property Value

int

Remarks

Default is 3. Older checkpoints are deleted to save disk space.

Metadata

Gets or sets custom metadata for the pipeline.

public Dictionary<string, object> Metadata { get; set; }

Property Value

Dictionary<string, object>

Remarks

Empty by default. Use to store custom key-value pairs.

MixedPrecisionDType

Gets or sets the mixed precision data type.

public MixedPrecisionType MixedPrecisionDType { get; set; }

Property Value

MixedPrecisionType

Remarks

Default is FP16 for broad GPU compatibility. Use BF16 on Ampere+ GPUs.
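A minimal sketch of enabling mixed precision, assuming the FP16/BF16 enum members mentioned above (type arguments are placeholders):

```csharp
var config = new TrainingPipelineConfiguration<double, string, string>();

// Enable mixed precision globally; BF16 requires Ampere-class or newer GPUs.
config.UseMixedPrecision = true;
config.MixedPrecisionDType = MixedPrecisionType.BF16;
```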

Name

Gets or sets the name of this pipeline for identification.

public string Name { get; set; }

Property Value

string

OnPipelineComplete

Gets or sets callback actions to execute when the pipeline completes.

public Action<List<TrainingStageResult<T, TInput, TOutput>>>? OnPipelineComplete { get; set; }

Property Value

Action<List<TrainingStageResult<T, TInput, TOutput>>>

OnPipelineError

Gets or sets callback actions to execute on pipeline failure.

public Action<Exception, TrainingStageResult<T, TInput, TOutput>?>? OnPipelineError { get; set; }

Property Value

Action<Exception, TrainingStageResult<T, TInput, TOutput>>

OnPipelineStart

Gets or sets callback actions to execute before the pipeline starts.

public Action<TrainingPipelineConfiguration<T, TInput, TOutput>>? OnPipelineStart { get; set; }

Property Value

Action<TrainingPipelineConfiguration<T, TInput, TOutput>>
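The three pipeline-level callbacks can be wired together to observe the run's lifecycle. A sketch (type arguments are placeholders; only the delegate shapes documented above are assumed):

```csharp
using System;

var config = new TrainingPipelineConfiguration<double, string, string>();

// Observe the pipeline lifecycle without touching stage internals.
config.OnPipelineStart = cfg => Console.WriteLine($"Starting '{cfg.Name}'...");
config.OnPipelineComplete = results =>
    Console.WriteLine($"Finished {results.Count} stage(s).");
config.OnPipelineError = (ex, lastResult) =>
    Console.Error.WriteLine($"Pipeline failed: {ex.Message}");
```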

ResumeCheckpointPath

Gets or sets the specific checkpoint path to resume from.

public string ResumeCheckpointPath { get; set; }

Property Value

string

Remarks

Default is empty (use latest if ResumeFromCheckpoint is true).

ResumeFromCheckpoint

Gets or sets whether to resume from the latest checkpoint.

public bool ResumeFromCheckpoint { get; set; }

Property Value

bool

Remarks

Default is false. Set to true to continue interrupted training.
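The checkpointing properties above work together. A sketch of a fault-tolerant configuration (type arguments are placeholders):

```csharp
var config = new TrainingPipelineConfiguration<double, string, string>
{
    SaveIntermediateCheckpoints = true,
    CheckpointDirectory = "./checkpoints",
    MaxCheckpointsToKeep = 3,
    ResumeFromCheckpoint = true // resumes from the latest checkpoint if present
};
```

Set ResumeCheckpointPath instead when a specific checkpoint, rather than the latest, should be restored.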

SaveIntermediateCheckpoints

Gets or sets whether to save checkpoints between stages.

public bool SaveIntermediateCheckpoints { get; set; }

Property Value

bool

Remarks

Default is true. Enables recovery from failures.

Stages

Gets or sets the ordered list of training stages in the pipeline.

public List<TrainingStage<T, TInput, TOutput>> Stages { get; set; }

Property Value

List<TrainingStage<T, TInput, TOutput>>

Remarks

Stages are executed sequentially. The output model from each stage becomes the input model for the next stage.

Tags

Gets or sets tags for categorizing the pipeline.

public string[] Tags { get; set; }

Property Value

string[]

Remarks

Empty by default. Useful for filtering and organizing pipelines.

UseMixedPrecision

Gets or sets whether to use mixed precision training globally.

public bool UseMixedPrecision { get; set; }

Property Value

bool

Remarks

Default is false. Enable for faster training with lower memory.

VerboseLogging

Gets or sets whether to enable verbose logging.

public bool VerboseLogging { get; set; }

Property Value

bool

Remarks

Default is false. Enable for detailed debug output.

Methods

AddAdapterMergingStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds an adapter merging stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddAdapterMergingStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddAgenticStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds an agentic behavior training stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddAgenticStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddCPOStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds a Contrastive Preference Optimization (CPO) stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddCPOStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddChainOfThoughtStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds a chain-of-thought training stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddChainOfThoughtStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddCheckpointStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds a checkpoint stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddCheckpointStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddCodeFineTuningStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds a code fine-tuning stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddCodeFineTuningStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddConstitutionalAIStage(string[]?, Action<TrainingStage<T, TInput, TOutput>>?)

Adds a Constitutional AI stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddConstitutionalAIStage(string[]? principles = null, Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

principles string[]

The constitutional principles to use.

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddCustomStage(string, Func<IFullModel<T, TInput, TOutput>, FineTuningData<T, TInput, TOutput>, CancellationToken, Task<IFullModel<T, TInput, TOutput>>>, Action<TrainingStage<T, TInput, TOutput>>?)

Adds a custom training stage with user-defined logic.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddCustomStage(string name, Func<IFullModel<T, TInput, TOutput>, FineTuningData<T, TInput, TOutput>, CancellationToken, Task<IFullModel<T, TInput, TOutput>>> trainFunc, Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

name string

Name of the custom stage.

trainFunc Func<IFullModel<T, TInput, TOutput>, FineTuningData<T, TInput, TOutput>, CancellationToken, Task<IFullModel<T, TInput, TOutput>>>

The custom training function.

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.
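A sketch of a custom stage. The lambda signature follows the trainFunc delegate documented above; the stage name and body are illustrative:

```csharp
using System.Threading.Tasks;

var config = new TrainingPipelineConfiguration<double, string, string>();

config.AddCustomStage(
    "my-custom-stage",
    async (model, data, cancellationToken) =>
    {
        // User-defined training logic goes here; return the model
        // that the next stage should receive.
        await Task.CompletedTask;
        return model;
    });
```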

AddDPOStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds a Direct Preference Optimization (DPO) stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddDPOStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddDistillationStage(IFullModel<T, TInput, TOutput>?, Action<TrainingStage<T, TInput, TOutput>>?)

Adds a knowledge distillation stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddDistillationStage(IFullModel<T, TInput, TOutput>? teacherModel = null, Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

teacherModel IFullModel<T, TInput, TOutput>

The teacher model to distill from.

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddEvaluationStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds an evaluation-only stage (no training, just metrics).

public TrainingPipelineConfiguration<T, TInput, TOutput> AddEvaluationStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddGRPOStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds a GRPO (Group Relative Policy Optimization) stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddGRPOStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddHarmlessnessStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds a harmlessness training stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddHarmlessnessStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddHelpfulnessStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds a helpfulness training stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddHelpfulnessStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddIPOStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds an Identity Preference Optimization (IPO) stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddIPOStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddInstructionTuningStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds an instruction tuning stage (specialized SFT).

public TrainingPipelineConfiguration<T, TInput, TOutput> AddInstructionTuningStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddKTOStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds a Kahneman-Tversky Optimization (KTO) stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddKTOStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddLoRAStage(int, Action<TrainingStage<T, TInput, TOutput>>?)

Adds a LoRA adapter training stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddLoRAStage(int rank = 16, Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

rank int

LoRA rank (dimension of low-rank matrices).

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddMathReasoningStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds a math reasoning fine-tuning stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddMathReasoningStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddMultiTurnConversationStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds a multi-turn conversation training stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddMultiTurnConversationStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddORPOStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds an Odds Ratio Preference Optimization (ORPO) stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddORPOStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddPPOStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds a PPO stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddPPOStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddPROStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds a Preference Ranking Optimization (PRO) stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddPROStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddPreferenceStage(FineTuningMethodType, Action<TrainingStage<T, TInput, TOutput>>?)

Adds a generic preference optimization stage with configurable method.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddPreferenceStage(FineTuningMethodType method = FineTuningMethodType.DPO, Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

method FineTuningMethodType

The preference optimization method to use.

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.
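A sketch of the generic preference stage (type arguments are placeholders; DPO is the documented default for the method parameter):

```csharp
var config = new TrainingPipelineConfiguration<double, string, string>();

// Equivalent to config.AddPreferenceStage(): DPO is the default method.
config.AddPreferenceStage(FineTuningMethodType.DPO);
```

Use this overload when the preference method is chosen at runtime; otherwise the dedicated AddDPOStage, AddIPOStage, etc. read more clearly.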

AddProcessRewardModelStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds a process reward model (PRM) training stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddProcessRewardModelStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddQLoRAStage(int, int, Action<TrainingStage<T, TInput, TOutput>>?)

Adds a QLoRA (quantized LoRA) stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddQLoRAStage(int rank = 16, int bits = 4, Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

rank int

LoRA rank.

bits int

Quantization bits (4 or 8).

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.
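A sketch using the documented defaults (type arguments are placeholders):

```csharp
var config = new TrainingPipelineConfiguration<double, string, string>();

// QLoRA: rank-16 adapters trained over a 4-bit quantized base model.
config.AddQLoRAStage(rank: 16, bits: 4);
```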

AddRLAIFStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds an RLAIF (RL from AI Feedback) stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddRLAIFStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddRLHFStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds an RLHF (PPO-based) stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddRLHFStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddRLStage(FineTuningMethodType, Action<TrainingStage<T, TInput, TOutput>>?)

Adds a generic reinforcement learning stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddRLStage(FineTuningMethodType method = FineTuningMethodType.RLHF, Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

method FineTuningMethodType

The RL method to use.

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddRRHFStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds a Rank Responses to align Human Feedback (RRHF) stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddRRHFStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddRejectionSamplingStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds a Rejection Sampling Optimization (RSO) stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddRejectionSamplingStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddRewardModelStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds a reward model training stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddRewardModelStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddRobustDPOStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds a Robust DPO (R-DPO) stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddRobustDPOStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddSFTStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds a supervised fine-tuning (SFT) stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddSFTStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddSLiCStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds a Sequence Likelihood Calibration (SLiC-HF) stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddSLiCStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddSPINStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds a Self-Play Fine-Tuning (SPIN) stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddSPINStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddSafetyAlignmentStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds a safety alignment stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddSafetyAlignmentStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddSelfRewardingStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds a self-rewarding stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddSelfRewardingStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddSimPOStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds a Simple Preference Optimization (SimPO) stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddSimPOStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddStage(TrainingStage<T, TInput, TOutput>)

Adds a training stage to the pipeline.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddStage(TrainingStage<T, TInput, TOutput> stage)

Parameters

stage TrainingStage<T, TInput, TOutput>

The stage to add.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddSyntheticDataStage(IFullModel<T, TInput, TOutput>?, Action<TrainingStage<T, TInput, TOutput>>?)

Adds a synthetic data training stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddSyntheticDataStage(IFullModel<T, TInput, TOutput>? teacherModel = null, Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

teacherModel IFullModel<T, TInput, TOutput>

The teacher model to generate synthetic data.

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AddToolUseStage(Action<TrainingStage<T, TInput, TOutput>>?)

Adds a tool use training stage.

public TrainingPipelineConfiguration<T, TInput, TOutput> AddToolUseStage(Action<TrainingStage<T, TInput, TOutput>>? configure = null)

Parameters

configure Action<TrainingStage<T, TInput, TOutput>>

Optional configuration action for the stage.

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

This configuration for method chaining.

AgentTraining(FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?)

Creates an agent/tool-use training pipeline.

public static TrainingPipelineConfiguration<T, TInput, TOutput> AgentTraining(FineTuningData<T, TInput, TOutput>? sftData = null, FineTuningData<T, TInput, TOutput>? toolData = null, FineTuningData<T, TInput, TOutput>? agenticData = null)

Parameters

sftData FineTuningData<T, TInput, TOutput>
toolData FineTuningData<T, TInput, TOutput>
agenticData FineTuningData<T, TInput, TOutput>

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

AnthropicClaude(FineTuningData<T, TInput, TOutput>?, string[]?)

Creates an Anthropic Claude-style pipeline (SFT → Constitutional AI → RLHF).

public static TrainingPipelineConfiguration<T, TInput, TOutput> AnthropicClaude(FineTuningData<T, TInput, TOutput>? sftData = null, string[]? principles = null)

Parameters

sftData FineTuningData<T, TInput, TOutput>
principles string[]

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

Remarks

Anthropic's Constitutional AI approach for Claude models.

Auto(FineTuningData<T, TInput, TOutput>)

Automatically selects an appropriate pipeline based on available data.

public static TrainingPipelineConfiguration<T, TInput, TOutput> Auto(FineTuningData<T, TInput, TOutput> availableData)

Parameters

availableData FineTuningData<T, TInput, TOutput>

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>
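A sketch of automatic selection. `myData` is a placeholder for whatever FineTuningData instance you have assembled; how it is constructed is not shown here:

```csharp
// myData stands in for your assembled fine-tuning dataset.
FineTuningData<double, string, string> myData = /* your data */ null!;
var pipeline = TrainingPipelineConfiguration<double, string, string>.Auto(myData);
```

This mirrors the EnableAutoSelection property: the data is analyzed and a suitable sequence of stages is constructed for you.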

CodeModel(FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?)

Creates a code model training pipeline.

public static TrainingPipelineConfiguration<T, TInput, TOutput> CodeModel(FineTuningData<T, TInput, TOutput>? sftData = null, FineTuningData<T, TInput, TOutput>? codeExecutionData = null)

Parameters

sftData FineTuningData<T, TInput, TOutput>
codeExecutionData FineTuningData<T, TInput, TOutput>

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

ConstitutionalAI(FineTuningData<T, TInput, TOutput>?, string[]?)

Creates a Constitutional AI pipeline (SFT → CAI critique/revision → preference).

public static TrainingPipelineConfiguration<T, TInput, TOutput> ConstitutionalAI(FineTuningData<T, TInput, TOutput>? sftData = null, string[]? principles = null)

Parameters

sftData FineTuningData<T, TInput, TOutput>
principles string[]

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

CurriculumLearning(params (string Name, FineTuningData<T, TInput, TOutput> Data)[])

Creates a curriculum learning pipeline with progressively harder stages.

public static TrainingPipelineConfiguration<T, TInput, TOutput> CurriculumLearning(params (string Name, FineTuningData<T, TInput, TOutput> Data)[] curriculumStages)

Parameters

curriculumStages (string Name, FineTuningData<T, TInput, TOutput> Data)[]

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>
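A sketch of the params-tuple call. easyData, mediumData, and hardData are placeholders for datasets ordered from easiest to hardest:

```csharp
// Stages run in order, so list them from easiest to hardest.
var pipeline = TrainingPipelineConfiguration<double, string, string>.CurriculumLearning(
    ("easy", easyData),
    ("medium", mediumData),
    ("hard", hardData));
```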

DeepSeek(FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?)

Creates a DeepSeek-style pipeline (SFT → GRPO).

public static TrainingPipelineConfiguration<T, TInput, TOutput> DeepSeek(FineTuningData<T, TInput, TOutput>? sftData = null, FineTuningData<T, TInput, TOutput>? grpoData = null)

Parameters

sftData FineTuningData<T, TInput, TOutput>
grpoData FineTuningData<T, TInput, TOutput>

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

Remarks

DeepSeek's efficient training approach using GRPO instead of PPO.

FullRLHF(FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?)

Creates a full RLHF pipeline (SFT → Reward Model → PPO).

public static TrainingPipelineConfiguration<T, TInput, TOutput> FullRLHF(FineTuningData<T, TInput, TOutput>? sftData = null, FineTuningData<T, TInput, TOutput>? rlData = null)

Parameters

sftData FineTuningData<T, TInput, TOutput>
rlData FineTuningData<T, TInput, TOutput>

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

InstructGPT(FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?)

Creates an OpenAI InstructGPT-style pipeline (SFT → Reward Model → PPO).

public static TrainingPipelineConfiguration<T, TInput, TOutput> InstructGPT(FineTuningData<T, TInput, TOutput>? sftData = null, FineTuningData<T, TInput, TOutput>? preferenceData = null, FineTuningData<T, TInput, TOutput>? rlData = null)

Parameters

sftData FineTuningData<T, TInput, TOutput>
preferenceData FineTuningData<T, TInput, TOutput>
rlData FineTuningData<T, TInput, TOutput>

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

Remarks

The original ChatGPT training pipeline from the InstructGPT paper.
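A minimal usage sketch; the type arguments and dataset variables are illustrative assumptions:

```csharp
// Illustrative only: the three datasets map to the three stages —
// SFT data, preference comparisons for the reward model, and data
// for the PPO stage — and are assumed to be pre-built.
var pipeline = TrainingPipelineConfiguration<double, Tensor<double>, Tensor<double>>
    .InstructGPT(sftData, preferenceData, rlData);
```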

IterativeRefinement(int, FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?)

Creates an iterative refinement pipeline that runs multiple DPO rounds.

public static TrainingPipelineConfiguration<T, TInput, TOutput> IterativeRefinement(int iterations, FineTuningData<T, TInput, TOutput>? sftData = null, FineTuningData<T, TInput, TOutput>? preferenceData = null)

Parameters

iterations int

The number of DPO refinement rounds to run.
sftData FineTuningData<T, TInput, TOutput>
preferenceData FineTuningData<T, TInput, TOutput>

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>
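A minimal usage sketch; the type arguments and dataset variables are illustrative assumptions:

```csharp
// Illustrative only: run three DPO rounds after the initial SFT stage.
var pipeline = TrainingPipelineConfiguration<double, Tensor<double>, Tensor<double>>
    .IterativeRefinement(3, sftData, preferenceData);
```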

IterativeSPIN(int, FineTuningData<T, TInput, TOutput>?)

Creates an iterative SPIN (Self-Play Fine-Tuning) pipeline, in which the model is repeatedly trained against its own previous generations.

public static TrainingPipelineConfiguration<T, TInput, TOutput> IterativeSPIN(int iterations = 3, FineTuningData<T, TInput, TOutput>? sftData = null)

Parameters

iterations int
sftData FineTuningData<T, TInput, TOutput>

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

KnowledgeDistillation(IFullModel<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?)

Creates a distillation pipeline (teacher → student).

public static TrainingPipelineConfiguration<T, TInput, TOutput> KnowledgeDistillation(IFullModel<T, TInput, TOutput>? teacherModel = null, FineTuningData<T, TInput, TOutput>? data = null)

Parameters

teacherModel IFullModel<T, TInput, TOutput>
data FineTuningData<T, TInput, TOutput>

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>
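A minimal usage sketch; `teacher` and `distillationData` are illustrative placeholder variables:

```csharp
// Illustrative only: teacher is assumed to implement
// IFullModel<double, Tensor<double>, Tensor<double>>.
var pipeline = TrainingPipelineConfiguration<double, Tensor<double>, Tensor<double>>
    .KnowledgeDistillation(teacher, distillationData);
```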

Llama2(FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?)

Creates a Meta Llama 2-style pipeline (SFT → Rejection Sampling → DPO).

public static TrainingPipelineConfiguration<T, TInput, TOutput> Llama2(FineTuningData<T, TInput, TOutput>? sftData = null, FineTuningData<T, TInput, TOutput>? preferenceData = null)

Parameters

sftData FineTuningData<T, TInput, TOutput>
preferenceData FineTuningData<T, TInput, TOutput>

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

Remarks

Meta's approach for Llama 2 Chat models.

LoRAFineTuning(int, FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?)

Creates a memory-efficient LoRA fine-tuning pipeline.

public static TrainingPipelineConfiguration<T, TInput, TOutput> LoRAFineTuning(int rank = 16, FineTuningData<T, TInput, TOutput>? sftData = null, FineTuningData<T, TInput, TOutput>? preferenceData = null)

Parameters

rank int

The LoRA adapter rank (default 16). Lower ranks use less memory.
sftData FineTuningData<T, TInput, TOutput>
preferenceData FineTuningData<T, TInput, TOutput>

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>
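A minimal usage sketch; the type arguments and dataset variables are illustrative assumptions:

```csharp
// Illustrative only: a rank-8 adapter instead of the default 16
// trades some adapter capacity for lower memory use.
var pipeline = TrainingPipelineConfiguration<double, Tensor<double>, Tensor<double>>
    .LoRAFineTuning(rank: 8, sftData, preferenceData);
```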

MathReasoning(FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?)

Creates a math reasoning pipeline (SFT → chain-of-thought fine-tuning).

public static TrainingPipelineConfiguration<T, TInput, TOutput> MathReasoning(FineTuningData<T, TInput, TOutput>? sftData = null, FineTuningData<T, TInput, TOutput>? cotData = null)

Parameters

sftData FineTuningData<T, TInput, TOutput>
cotData FineTuningData<T, TInput, TOutput>

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

ORPOAlignment(FineTuningData<T, TInput, TOutput>?)

Creates a reference-free alignment pipeline using ORPO.

public static TrainingPipelineConfiguration<T, TInput, TOutput> ORPOAlignment(FineTuningData<T, TInput, TOutput>? data = null)

Parameters

data FineTuningData<T, TInput, TOutput>

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

Remarks

ORPO (Odds Ratio Preference Optimization) combines SFT and preference learning into a single stage and requires no reference model, making it more memory-efficient than DPO.
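A minimal usage sketch; the type arguments and the `data` variable are illustrative assumptions:

```csharp
// Illustrative only: ORPO takes a single dataset because it folds
// SFT and preference learning into one reference-free objective.
var pipeline = TrainingPipelineConfiguration<double, Tensor<double>, Tensor<double>>
    .ORPOAlignment(data);
```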

QLoRAFineTuning(int, int, FineTuningData<T, TInput, TOutput>?)

Creates a QLoRA fine-tuning pipeline for maximum memory efficiency.

public static TrainingPipelineConfiguration<T, TInput, TOutput> QLoRAFineTuning(int rank = 16, int bits = 4, FineTuningData<T, TInput, TOutput>? data = null)

Parameters

rank int

The LoRA adapter rank (default 16).

bits int

The quantization bit width for the frozen base weights (default 4).
data FineTuningData<T, TInput, TOutput>

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>
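A minimal usage sketch; the type arguments and the `data` variable are illustrative assumptions:

```csharp
// Illustrative only: 4-bit quantized base weights with a rank-16
// adapter — the documented defaults, written out explicitly.
var pipeline = TrainingPipelineConfiguration<double, Tensor<double>, Tensor<double>>
    .QLoRAFineTuning(rank: 16, bits: 4, data);
```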

SafetyFocused(FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?, string[]?)

Creates a safety-focused training pipeline.

public static TrainingPipelineConfiguration<T, TInput, TOutput> SafetyFocused(FineTuningData<T, TInput, TOutput>? sftData = null, FineTuningData<T, TInput, TOutput>? safetyData = null, string[]? principles = null)

Parameters

sftData FineTuningData<T, TInput, TOutput>
safetyData FineTuningData<T, TInput, TOutput>
principles string[]

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

SimPOAlignment(FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?)

Creates a SimPO (Simple Preference Optimization) alignment pipeline: reference-free, with a simpler objective than DPO.

public static TrainingPipelineConfiguration<T, TInput, TOutput> SimPOAlignment(FineTuningData<T, TInput, TOutput>? sftData = null, FineTuningData<T, TInput, TOutput>? preferenceData = null)

Parameters

sftData FineTuningData<T, TInput, TOutput>
preferenceData FineTuningData<T, TInput, TOutput>

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>

StandardAlignment(FineTuningData<T, TInput, TOutput>?, FineTuningData<T, TInput, TOutput>?)

Creates a standard SFT → DPO pipeline (most common alignment workflow).

public static TrainingPipelineConfiguration<T, TInput, TOutput> StandardAlignment(FineTuningData<T, TInput, TOutput>? sftData = null, FineTuningData<T, TInput, TOutput>? preferenceData = null)

Parameters

sftData FineTuningData<T, TInput, TOutput>
preferenceData FineTuningData<T, TInput, TOutput>

Returns

TrainingPipelineConfiguration<T, TInput, TOutput>
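A minimal usage sketch; the type arguments and dataset variables are illustrative assumptions:

```csharp
// Illustrative only: the most common alignment workflow,
// supervised fine-tuning followed by DPO.
var pipeline = TrainingPipelineConfiguration<double, Tensor<double>, Tensor<double>>
    .StandardAlignment(sftData, preferenceData);
```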

Validate()

Validates the pipeline configuration.

public List<string> Validate()

Returns

List<string>

A list of validation errors, empty if valid.

ValidateOrThrow()

Validates the pipeline configuration and throws an exception if it is invalid.

public void ValidateOrThrow()
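A minimal usage sketch showing both validation styles; the pipeline construction and dataset variables are illustrative assumptions:

```csharp
// Illustrative only: inspect validation errors before launching
// a long training run, or fail fast with ValidateOrThrow.
var pipeline = TrainingPipelineConfiguration<double, Tensor<double>, Tensor<double>>
    .StandardAlignment(sftData, preferenceData);

List<string> errors = pipeline.Validate();
foreach (var error in errors)
    Console.WriteLine(error);

pipeline.ValidateOrThrow(); // throws if any errors remain
```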