Enum CheckpointMetricType
Standard metrics for checkpoint selection and early stopping.
public enum CheckpointMetricType
Fields
Accuracy = 1
Accuracy metric (higher is better).
Percentage of correct predictions. Best for classification tasks with balanced classes.
BLEU = 4
BLEU score (higher is better).
Bilingual Evaluation Understudy score. Standard metric for machine translation and text generation.
Custom = 8
Custom metric defined by the user.
Use with the CustomMetricName property to specify which metric to track.
F1Score = 3
F1 score (higher is better).
Harmonic mean of precision and recall. Good for imbalanced classification.
Loss = 0
Training or validation loss (lower is better).
The most common metric for checkpoint selection. Works for all model types and training objectives.
Perplexity = 2
Perplexity (lower is better).
Exponential of cross-entropy loss. Standard metric for language models.
ROUGE = 5
ROUGE score (higher is better).
Recall-Oriented Understudy for Gisting Evaluation. Common for summarization tasks.
RewardScore = 6
Reward model score (higher is better).
Score from a reward model during RLHF. Used for preference optimization evaluation.
WinRate = 7
Win rate against a reference (higher is better).
Percentage of comparisons in which the model's output is preferred over the reference output. Used for evaluating alignment quality.
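The Accuracy, F1Score, and Perplexity descriptions above correspond to the standard textbook definitions, written out here for reference. The symbols are notation assumed for this note, not names from the library: P is precision, R is recall, and L_CE is the mean cross-entropy loss, assumed to be measured in nats (natural logarithm).

```latex
\[
\mathrm{Accuracy} = \frac{\text{number of correct predictions}}{\text{total number of predictions}}
\]
\[
F_1 = \frac{2PR}{P + R}
\]
\[
\mathrm{Perplexity} = \exp\left(\mathcal{L}_{\mathrm{CE}}\right)
\]
```

For example, with P = 0.5 and R = 1.0 the F1 score is 2(0.5)(1.0) / 1.5 ≈ 0.667, which is lower than the arithmetic mean of 0.75. That penalty for an imbalance between precision and recall is why F1 suits imbalanced classification. Because the exponential is monotonic, ranking checkpoints by Perplexity gives the same order as ranking them by cross-entropy loss.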
Remarks
These metrics determine which checkpoint is considered "best" during training.
For Beginners: Use Loss for most cases, or Accuracy for classification. The metric determines when to save checkpoints and when to stop early.
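The following is a minimal sketch of how a metric's direction could drive "best checkpoint" selection and early stopping. It is not the library's implementation. The enum is redeclared locally with the documented values so the snippet compiles on its own, and the helper class CheckpointMetricSketch, its methods, and the customHigherIsBetter and patience parameters are all hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;

// Local copy of the documented enum so this sketch is self-contained.
public enum CheckpointMetricType
{
    Loss = 0,
    Accuracy = 1,
    Perplexity = 2,
    F1Score = 3,
    BLEU = 4,
    ROUGE = 5,
    RewardScore = 6,
    WinRate = 7,
    Custom = 8
}

// Hypothetical helper; none of these members are part of the documented API.
public static class CheckpointMetricSketch
{
    // Loss and Perplexity are documented as "lower is better"; the other
    // standard metrics as "higher is better". Custom has no documented
    // direction, so the caller supplies one (assumption).
    public static bool IsHigherBetter(CheckpointMetricType metric, bool customHigherIsBetter = true)
    {
        switch (metric)
        {
            case CheckpointMetricType.Loss:
            case CheckpointMetricType.Perplexity:
                return false;
            case CheckpointMetricType.Custom:
                return customHigherIsBetter;
            default:
                return true;
        }
    }

    // Returns the index of the best metric value, or -1 for an empty list.
    // Ties keep the earliest checkpoint because only strict improvement counts.
    public static int BestIndex(CheckpointMetricType metric, IReadOnlyList<double> values,
                                bool customHigherIsBetter = true)
    {
        bool higher = IsHigherBetter(metric, customHigherIsBetter);
        int best = -1;
        for (int i = 0; i < values.Count; i++)
        {
            if (best < 0 || (higher ? values[i] > values[best] : values[i] < values[best]))
            {
                best = i;
            }
        }
        return best;
    }

    // Early stopping: true when the last `patience` evaluations brought no
    // improvement over the best value seen so far.
    public static bool ShouldStop(CheckpointMetricType metric, IReadOnlyList<double> values,
                                  int patience, bool customHigherIsBetter = true)
    {
        int best = BestIndex(metric, values, customHigherIsBetter);
        return best >= 0 && values.Count - 1 - best >= patience;
    }
}
```

Under these assumptions, calling BestIndex with CheckpointMetricType.Loss on a history of validation losses returns the position of the smallest value, while the same call with CheckpointMetricType.Accuracy returns the position of the largest. Keeping the direction in one place avoids scattering "less than" and "greater than" comparisons through the training loop.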