Enum PreferenceLossType

Namespace: AiDotNet.Enums
Assembly: AiDotNet.dll

Types of loss functions for preference optimization methods.

public enum PreferenceLossType

Fields

Conservative = 4

Conservative DPO loss (cDPO).

Adds a conservative constraint to prevent over-optimization. Useful when concerned about reward hacking.
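
A minimal C# sketch of one common cDPO formulation (label smoothing with rate epsilon); the exact constraint AiDotNet applies may differ, and all names and defaults here are illustrative:

using System;

double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

// Label smoothing caps how confidently the model may separate chosen from
// rejected, which acts as the conservative constraint.
double ConservativeDpoLoss(double logPChosen, double logPRejected,
                           double beta = 0.1, double epsilon = 0.1)
{
    double delta = beta * (logPChosen - logPRejected);
    return -(1 - epsilon) * Math.Log(Sigmoid(delta))
           - epsilon * Math.Log(Sigmoid(-delta));
}

Console.WriteLine(ConservativeDpoLoss(-1.0, -2.0));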

Hinge = 1

Hinge loss for preference optimization.

Computes max(0, margin - (log_p_chosen - log_p_rejected)). Provides more aggressive margin-based learning but can be unstable.
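
A direct C# transcription of the formula above (the margin and the log-probability inputs are illustrative):

using System;

double HingeLoss(double logPChosen, double logPRejected, double margin = 1.0)
    => Math.Max(0.0, margin - (logPChosen - logPRejected));

// Zero loss once the chosen response beats the rejected one by the full margin.
Console.WriteLine(HingeLoss(-1.0, -2.0)); // gap of 1.0 -> loss 0.0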

IPO = 2

Identity Preference Optimization (IPO) loss.

Squared loss that regresses the log probability ratio gap toward a fixed target. Addresses DPO's tendency to overfit when preferences are noisy.
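
A minimal C# sketch of the published IPO objective (squared regression of the reference-adjusted log-ratio gap toward 1/(2*beta)); AiDotNet's exact parameterization may differ:

using System;

double IpoLoss(double logPChosen, double logPRejected,
               double refLogPChosen, double refLogPRejected, double beta = 0.1)
{
    // Log-ratio gap between policy and reference model, chosen minus rejected.
    double gap = (logPChosen - refLogPChosen) - (logPRejected - refLogPRejected);
    double diff = gap - 1.0 / (2.0 * beta);
    return diff * diff; // squared loss instead of DPO's sigmoid
}

Console.WriteLine(IpoLoss(-1.0, -2.5, -1.2, -2.0));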

KTO = 7

Kahneman-Tversky Optimization loss.

Based on prospect theory, this loss handles unpaired preference data and treats gains (good responses) and losses (bad responses) asymmetrically.
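
A heavily simplified C# sketch of the idea: a sigmoid value function around a reference point, with separate weights for gains and losses. The published KTO loss has more moving parts, and every name and default here is illustrative, not AiDotNet's implementation:

using System;

double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

// Each example is unpaired: it is either desirable or undesirable on its own.
double KtoLoss(double logRatio, bool isDesirable, double referencePoint,
               double beta = 0.1, double lambdaGain = 1.0, double lambdaLoss = 1.0)
    => isDesirable
        ? lambdaGain * (1.0 - Sigmoid(beta * (logRatio - referencePoint)))
        : lambdaLoss * (1.0 - Sigmoid(beta * (referencePoint - logRatio)));

Console.WriteLine(KtoLoss(0.5, isDesirable: true, referencePoint: 0.0));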

OddsRatio = 5

Odds ratio preference loss (used in ORPO).

Uses log odds ratios instead of log probability differences, combining SFT and preference optimization in a single objective.
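
A minimal C# sketch of an ORPO-style objective: log odds computed from length-normalized sequence probabilities, plus the usual SFT term on the chosen response. The names, the lambda weighting, and the defaults are illustrative:

using System;

double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

// avgLogP is a length-normalized sequence log-probability, so p stays in (0, 1).
double LogOdds(double avgLogP)
{
    double p = Math.Exp(avgLogP);
    return Math.Log(p / (1.0 - p));
}

double OrpoLoss(double avgLogPChosen, double avgLogPRejected, double lambda = 0.1)
{
    double ratioTerm = -Math.Log(Sigmoid(LogOdds(avgLogPChosen) - LogOdds(avgLogPRejected)));
    double sftTerm = -avgLogPChosen; // SFT: negative log-likelihood of the chosen response
    return sftTerm + lambda * ratioTerm;
}

Console.WriteLine(OrpoLoss(-0.5, -1.5));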

Robust = 3

Robust DPO loss with outlier handling.

Modified sigmoid loss that's more robust to noisy preference labels. Useful when preference data quality is uncertain.
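
A minimal C# sketch of one published robust-DPO variant, which debiases the sigmoid loss under an assumed label-flip rate; AiDotNet's exact formulation may differ:

using System;

double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

double RobustDpoLoss(double logPChosen, double logPRejected,
                     double beta = 0.1, double flipRate = 0.1)
{
    double delta = beta * (logPChosen - logPRejected);
    double asLabeled = -Math.Log(Sigmoid(delta));  // loss if the label is correct
    double flipped = -Math.Log(Sigmoid(-delta));   // loss if the label was flipped
    // Unbiased estimate of the clean loss given a flipRate chance of noise.
    return ((1 - flipRate) * asLabeled - flipRate * flipped) / (1 - 2 * flipRate);
}

Console.WriteLine(RobustDpoLoss(-1.0, -2.0));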

Sigmoid = 0

Standard sigmoid loss used in DPO.

The original DPO loss: -log(sigmoid(beta * (log_p_chosen - log_p_rejected))). Works well in most cases and is the recommended default.
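
A direct C# transcription of the formula above; in full DPO the two inputs are log-ratios against a reference model, and the beta default is illustrative:

using System;

double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

double DpoLoss(double logPChosen, double logPRejected, double beta = 0.1)
    => -Math.Log(Sigmoid(beta * (logPChosen - logPRejected)));

// Loss shrinks as the model assigns higher probability to the chosen response.
Console.WriteLine(DpoLoss(-1.0, -2.0));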

Simple = 6

Simple preference optimization loss (used in SimPO).

Reference-free loss based on length-normalized log probabilities. More memory-efficient because it does not require a reference model.
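
A minimal C# sketch of a SimPO-style loss: length-normalized log-probabilities, a target margin gamma, and no reference model. The defaults follow the SimPO paper's ballpark but are illustrative:

using System;

double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

double SimpoLoss(double sumLogPChosen, int chosenLength,
                 double sumLogPRejected, int rejectedLength,
                 double beta = 2.0, double gamma = 0.5)
{
    // Length normalization removes the bias toward short responses.
    double chosen = sumLogPChosen / chosenLength;
    double rejected = sumLogPRejected / rejectedLength;
    return -Math.Log(Sigmoid(beta * (chosen - rejected) - gamma));
}

Console.WriteLine(SimpoLoss(-12.0, 20, -30.0, 25));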

Remarks

Different preference optimization methods use different loss formulations, each with its own tradeoffs in stability, sample efficiency, and robustness.

For Beginners: This controls how the model learns from preference data. Start with Sigmoid (standard DPO) and only switch if you have specific needs.
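
For example (the enum and its values come from this page; how the value is wired into a trainer depends on your training setup):

using AiDotNet.Enums;

// Recommended default; switch only for specific needs, e.g. noisy labels ->
// Robust, unpaired data -> KTO, no reference model -> Simple.
var lossType = PreferenceLossType.Sigmoid;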