Enum PreferenceLossType
Types of loss functions for preference optimization methods.
public enum PreferenceLossType
Fields
Conservative = 4
Conservative DPO loss (cDPO). Adds a conservative constraint to prevent over-optimization. Useful when concerned about reward hacking.
Hinge = 1
Hinge loss for preference optimization. Computes max(0, margin - (log_p_chosen - log_p_rejected)). Provides more aggressive margin-based learning but can be unstable.
IPO = 2
Identity Preference Optimization (IPO) loss. Squared loss on log probability ratios. Addresses overfitting issues in DPO with noisy preferences.
KTO = 7
Kahneman-Tversky Optimization loss. Based on prospect theory; handles unpaired preference data and treats gains (good responses) and losses (bad responses) asymmetrically.
OddsRatio = 5
Odds ratio preference loss (used in ORPO). Uses odds ratios instead of log probability differences. Combines SFT and preference optimization in a single objective.
Robust = 3
Robust DPO loss with outlier handling. Modified sigmoid loss that's more robust to noisy preference labels. Useful when preference data quality is uncertain.
Sigmoid = 0
Standard sigmoid loss used in DPO. The original DPO loss: -log(sigmoid(beta * (log_p_chosen - log_p_rejected))). Works well in most cases and is the recommended default (see the loss sketch after this list).
Simple = 6
Simple preference optimization loss (used in SimPO). Reference-free loss based on length-normalized log probabilities. More memory efficient as it doesn't require a reference model.
Remarks
Different preference optimization methods use different loss formulations. Each has tradeoffs in terms of stability, sample efficiency, and robustness.
For Beginners: This controls how the model learns from preference data. Start with Sigmoid (standard DPO) and only switch if you have specific needs.
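As a minimal usage sketch, assuming a hypothetical trainer options type that exposes this enum (the real configuration class and its property names may differ):

// Hypothetical wiring: PreferenceTrainerOptions is an assumed type defined here for
// illustration only; the enum values are the ones documented above.
public sealed class PreferenceTrainerOptions
{
    public PreferenceLossType LossType { get; set; } = PreferenceLossType.Sigmoid;
    public double Beta { get; set; } = 0.1; // beta from the sigmoid/DPO formula (assumed knob)
}

public static class PreferenceLossExample
{
    public static PreferenceTrainerOptions Configure(bool noisyLabels, bool unpairedData)
    {
        var options = new PreferenceTrainerOptions();
        if (unpairedData)
        {
            // KTO handles single responses labeled good or bad rather than chosen/rejected pairs.
            options.LossType = PreferenceLossType.KTO;
        }
        else if (noisyLabels)
        {
            // Robust DPO tolerates noisy preference labels better than the standard sigmoid loss.
            options.LossType = PreferenceLossType.Robust;
        }
        // Otherwise keep the default (Sigmoid, standard DPO), the recommended starting point.
        return options;
    }
}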