Enum CheckpointMetricType

Standard metrics for checkpoint selection and early stopping.

public enum CheckpointMetricType

Fields

Accuracy = 1

Accuracy metric (higher is better).

Percentage of correct predictions. Best for classification tasks with balanced classes.

BLEU = 4

BLEU score (higher is better).

Bilingual Evaluation Understudy score. Standard metric for machine translation and text generation.

Custom = 8

Custom metric defined by the user.

Use with the CustomMetricName property to specify the metric.

F1Score = 3

F1 score (higher is better).

Harmonic mean of precision and recall. Good for imbalanced classification.
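As an illustration of the harmonic-mean definition, the sketch below computes F1 from raw counts. The class and method names are hypothetical and are not part of this library; returning 0 when precision or recall is undefined is an assumed convention.

```csharp
// Hypothetical helper for illustration only; not part of the library.
public static class F1ScoreExample
{
    // Computes F1 = 2 * P * R / (P + R) from confusion-matrix counts.
    public static double Compute(int truePositives, int falsePositives, int falseNegatives)
    {
        double predictedPositives = truePositives + falsePositives;
        double actualPositives = truePositives + falseNegatives;

        // Assumed convention: report 0 when precision or recall is undefined.
        if (predictedPositives == 0 || actualPositives == 0)
        {
            return 0.0;
        }

        double precision = truePositives / predictedPositives;
        double recall = truePositives / actualPositives;

        if (precision + recall == 0)
        {
            return 0.0;
        }

        return 2 * precision * recall / (precision + recall);
    }
}
```

For example, with 8 true positives, 2 false positives, and 8 false negatives, precision is 0.8 and recall is 0.5, so F1 is about 0.615. The score sits closer to the weaker of the two values, which is why it is more informative than accuracy when one class is rare.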

Loss = 0

Training or validation loss (lower is better).

The most common metric for checkpoint selection. Works for all model types and training objectives.

Perplexity = 2

Perplexity (lower is better).

Exponential of cross-entropy loss. Standard metric for language models.
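The relationship between cross-entropy and perplexity can be sketched as follows. This is a minimal example, assuming the cross-entropy is the mean negative natural-log probability per token; the class and method names are hypothetical and not part of this library.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of the library.
public static class PerplexityExample
{
    // tokenLogProbs: natural-log probabilities the model assigned to each target token.
    public static double FromTokenLogProbs(double[] tokenLogProbs)
    {
        if (tokenLogProbs == null || tokenLogProbs.Length == 0)
        {
            throw new ArgumentException("At least one token is required.", nameof(tokenLogProbs));
        }

        // Cross-entropy is the average negative log-probability per token.
        double crossEntropy = -tokenLogProbs.Average();

        // Perplexity is the exponential of that cross-entropy.
        return Math.Exp(crossEntropy);
    }
}
```

A model that assigns probability 0.25 to every target token has a perplexity of 4, as if it were choosing uniformly among four options at each step. Because the exponential is monotonic, ranking checkpoints by perplexity gives the same order as ranking them by the underlying cross-entropy loss.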

ROUGE = 5

ROUGE score (higher is better).

Recall-Oriented Understudy for Gisting Evaluation. Common for summarization tasks.

RewardScore = 6

Reward model score (higher is better).

Score from a reward model during RLHF. Used for preference optimization evaluation.

WinRate = 7

Win rate against reference (higher is better).

Percentage of comparisons in which the model's output is preferred over the reference. Used for evaluating alignment quality.

These metrics determine which checkpoint is considered "best" during training.

For Beginners: Use Loss for most cases, or Accuracy for classification. The metric determines when to save checkpoints and when to stop early.
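The "higher is better" and "lower is better" notes on each field can be turned into a comparison rule for picking the best checkpoint. The sketch below is one way to do that. The helper class, its method names, and the customLowerIsBetter parameter are hypothetical and not part of this library; the enum is repeated only so the example compiles on its own.

```csharp
// Repeated from this page so the sketch is self-contained.
// Omit this declaration when referencing the library itself.
public enum CheckpointMetricType
{
    Loss = 0,
    Accuracy = 1,
    Perplexity = 2,
    F1Score = 3,
    BLEU = 4,
    ROUGE = 5,
    RewardScore = 6,
    WinRate = 7,
    Custom = 8
}

// Hypothetical helper for illustration only; not part of the library.
public static class CheckpointMetricDirection
{
    // Loss and Perplexity are documented as "lower is better"; the other
    // standard metrics are "higher is better". Custom has no documented
    // direction, so the caller supplies one (assumed default: lower is better).
    public static bool LowerIsBetter(CheckpointMetricType metric, bool customLowerIsBetter = true)
    {
        switch (metric)
        {
            case CheckpointMetricType.Loss:
            case CheckpointMetricType.Perplexity:
                return true;
            case CheckpointMetricType.Custom:
                return customLowerIsBetter;
            default:
                return false;
        }
    }

    // Returns true when the candidate value beats the best value seen so far.
    public static bool IsImprovement(
        CheckpointMetricType metric,
        double candidate,
        double best,
        bool customLowerIsBetter = true)
    {
        return LowerIsBetter(metric, customLowerIsBetter)
            ? candidate < best
            : candidate > best;
    }
}
```

With this rule, a validation loss that drops from 0.50 to 0.42 counts as an improvement, while an accuracy that drops from 0.90 to 0.88 does not. An early-stopping loop would typically count consecutive evaluations for which IsImprovement returns false and stop once that count exceeds a chosen patience value.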