Interface ITrainingMonitor<T>

Namespace
AiDotNet.Interfaces
Assembly
AiDotNet.dll

Defines the contract for training monitoring systems that track and visualize model training progress.

public interface ITrainingMonitor<T>

Type Parameters

T

The numeric data type used for calculations (e.g., float, double).

Remarks

A training monitor provides real-time visibility into the training process, tracking metrics, system resources, and training state to help identify issues and optimize performance.

For Beginners: Think of a training monitor as a dashboard for your model training. Just like a car dashboard shows speed, fuel, and engine temperature, a training monitor shows:

  • Training metrics (loss, accuracy)
  • System resources (CPU, GPU, memory usage)
  • Training speed (iterations per second)
  • Progress and estimated time remaining

Why training monitoring matters:

  • Catch problems early (model not learning, overfitting, resource issues)
  • Understand training dynamics and patterns
  • Optimize resource usage
  • Track progress on long-running training jobs
  • Enable remote monitoring of training
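
As an illustration of how the members documented below might fit together over one run, here is a minimal sketch. The call order is an assumption rather than a documented requirement, and `RecordingMonitor` is a hypothetical stand-in (not an AiDotNet type) that copies five of the documented signatures with `T` fixed to `double`, so the example runs without the library.

```csharp
using System;
using System.Collections.Generic;

// Drive one short, fake training run through the monitor.
var monitor = new RecordingMonitor();
string sessionId = monitor.StartSession("demo-run");

for (int epoch = 1; epoch <= 2; epoch++)
{
    monitor.OnEpochStart(sessionId, epoch);
    DateTime started = DateTime.UtcNow;

    // Placeholder numbers; a real loop would compute these from the model.
    var metrics = new Dictionary<string, double>
    {
        ["train_loss"] = 1.0 / epoch,
        ["val_accuracy"] = 0.5 + 0.1 * epoch,
    };
    monitor.LogMetrics(sessionId, metrics, step: epoch * 100);

    monitor.OnEpochEnd(sessionId, epoch, metrics, DateTime.UtcNow - started);
}

monitor.EndSession(sessionId);

foreach (string call in monitor.Calls)
    Console.WriteLine(call);

// Hypothetical stand-in, not an AiDotNet type. It copies five documented
// signatures (T fixed to double) and records each call as a line of text.
class RecordingMonitor
{
    public List<string> Calls { get; } = new();

    public string StartSession(string sessionName, Dictionary<string, object>? metadata = null)
    {
        Calls.Add($"StartSession({sessionName})");
        return Guid.NewGuid().ToString("N"); // the ID format is an assumption
    }

    public void OnEpochStart(string sessionId, int epochNumber) =>
        Calls.Add($"OnEpochStart({epochNumber})");

    public void LogMetrics(string sessionId, Dictionary<string, double> metrics, int step) =>
        Calls.Add($"LogMetrics(step {step}, {metrics.Count} metrics)");

    public void OnEpochEnd(string sessionId, int epochNumber,
        Dictionary<string, double> metrics, TimeSpan duration) =>
        Calls.Add($"OnEpochEnd({epochNumber})");

    public void EndSession(string sessionId) =>
        Calls.Add("EndSession");
}
```

With the library referenced, the same loop could take an `ITrainingMonitor<double>` instead of the stand-in; the calls themselves would be unchanged.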

Methods

CheckForIssues(string)

Checks for potential training issues and returns warnings.

List<string> CheckForIssues(string sessionId)

Parameters

sessionId string

The ID of the monitoring session.

Returns

List<string>

List of detected issues and warnings.

Remarks

For Beginners: This automatically detects common problems like:

  • Training loss not decreasing
  • Metrics showing NaN (not a number) values
  • Very high or low learning rates
  • Memory leaks
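
This page does not say how an implementation detects these problems. As a rough sketch of two of them (NaN values and a loss that has stopped decreasing), the checks could look like the following. The helper `FindIssues`, its `patience` parameter, and the message wording are all hypothetical; the input uses the same tuple shape that `GetMetricHistory` returns, with `T` as `double`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical heuristics; the library's actual checks may differ.
static List<string> FindIssues(
    List<(int Step, double Value, DateTime Timestamp)> lossHistory, int patience)
{
    var issues = new List<string>();

    // Check 1: any NaN value in the series.
    if (lossHistory.Any(p => double.IsNaN(p.Value)))
        issues.Add("train_loss contains NaN values");

    // Check 2: the best loss among the last `patience` finite points is
    // no better than the best loss seen before them.
    List<double> finite = lossHistory
        .Where(p => !double.IsNaN(p.Value))
        .Select(p => p.Value)
        .ToList();
    if (finite.Count > patience)
    {
        double earlierBest = finite.Take(finite.Count - patience).Min();
        double recentBest = finite.Skip(finite.Count - patience).Min();
        if (recentBest >= earlierBest)
            issues.Add($"train_loss has not improved in the last {patience} logged points");
    }

    return issues;
}

DateTime now = DateTime.UtcNow;
var history = new List<(int Step, double Value, DateTime Timestamp)>
{
    (1, 0.90, now), (2, 0.70, now), (3, 0.71, now), (4, double.NaN, now), (5, 0.72, now),
};

foreach (string issue in FindIssues(history, patience: 2))
    Console.WriteLine(issue);
```

For this sample series both checks fire: one value is NaN, and the lowest loss (0.70) occurs before the last two finite points.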

CreateVisualization(string, List<string>, string)

Creates a visualization of training metrics.

void CreateVisualization(string sessionId, List<string> metricNames, string outputPath)

Parameters

sessionId string

The ID of the monitoring session.

metricNames List<string>

Names of metrics to visualize.

outputPath string

Path to save the visualization.

EndSession(string)

Ends the monitoring session with the given ID.

void EndSession(string sessionId)

Parameters

sessionId string

The ID of the session to end.

ExportData(string, string, string)

Exports monitoring data to a file.

void ExportData(string sessionId, string filePath, string format = "json")

Parameters

sessionId string

The ID of the monitoring session.

filePath string

Path to save the export.

format string

Export format (CSV, JSON, etc.). Defaults to "json".

GetCurrentMetrics(string)

Gets the current metrics for a session.

Dictionary<string, T> GetCurrentMetrics(string sessionId)

Parameters

sessionId string

The ID of the monitoring session.

Returns

Dictionary<string, T>

Dictionary of current metric values.

GetMetricHistory(string, string)

Gets the history of a specific metric.

List<(int Step, T Value, DateTime Timestamp)> GetMetricHistory(string sessionId, string metricName)

Parameters

sessionId string

The ID of the monitoring session.

metricName string

Name of the metric.

Returns

List<(int Step, T Value, DateTime Timestamp)>

List of (step, value, timestamp) tuples for the metric.
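
Because the tuple elements are named, callers can read `Step`, `Value`, and `Timestamp` directly. The sketch below works on hand-built sample data in the documented return shape (with `T` as `double`), not on a real monitor.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample data in the documented return shape, with T = double.
// A real caller would obtain this from GetMetricHistory(sessionId, "train_loss").
var start = new DateTime(2024, 1, 1, 0, 0, 0, DateTimeKind.Utc);
var history = new List<(int Step, double Value, DateTime Timestamp)>
{
    (100, 0.82, start),
    (200, 0.55, start.AddSeconds(30)),
    (300, 0.61, start.AddSeconds(60)),
};

// Entry with the lowest value (useful for a loss-like metric).
var best = history.OrderBy(p => p.Value).First();

// Wall-clock time between the first and last logged points.
TimeSpan logged = history[history.Count - 1].Timestamp - history[0].Timestamp;

Console.WriteLine($"Lowest value {best.Value} at step {best.Step}");
Console.WriteLine($"Logged span: {logged.TotalSeconds} s");
```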

GetResourceUsage(string)

Gets the current resource usage.

ResourceUsageStats GetResourceUsage(string sessionId)

Parameters

sessionId string

The ID of the monitoring session.

Returns

ResourceUsageStats

Current resource usage statistics.

GetSpeedStats(string)

Gets statistics about training speed.

TrainingSpeedStats GetSpeedStats(string sessionId)

Parameters

sessionId string

The ID of the monitoring session.

Returns

TrainingSpeedStats

Statistics including steps per second and estimated time remaining.
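
The members of `TrainingSpeedStats` are not listed on this page, so the following only illustrates the arithmetic behind the two quantities named above. The input values are made up, and how an implementation actually derives them is an assumption.

```csharp
using System;

// Hypothetical inputs; a monitor could derive them from UpdateProgress calls
// and wall-clock time since the session started.
int currentStep = 250;
int totalSteps = 1000;
TimeSpan elapsed = TimeSpan.FromSeconds(50);

// Throughput so far: 250 steps / 50 s = 5 steps per second.
double stepsPerSecond = currentStep / elapsed.TotalSeconds;

// Remaining work at the same rate: 750 steps / 5 steps per second = 150 s.
int remainingSteps = totalSteps - currentStep;
TimeSpan estimatedRemaining = TimeSpan.FromSeconds(remainingSteps / stepsPerSecond);

Console.WriteLine($"{stepsPerSecond} steps/s, about {estimatedRemaining} remaining");
```

An estimate like this assumes a constant step rate, and it is undefined until at least one step has completed.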

LogMessage(string, LogLevel, string)

Logs a text message or event during training.

void LogMessage(string sessionId, LogLevel level, string message)

Parameters

sessionId string

The ID of the monitoring session.

level LogLevel

Severity level (Info, Warning, Error).

message string

The message to log.

Remarks

For Beginners: This lets you add notes or warnings during training, like "Started learning rate decay" or "Warning: High memory usage".

LogMetric(string, string, T, int, DateTime?)

Records a metric value for the current training step.

void LogMetric(string sessionId, string metricName, T value, int step, DateTime? timestamp = null)

Parameters

sessionId string

The ID of the monitoring session.

metricName string

Name of the metric (e.g., "train_loss", "val_accuracy").

value T

The metric value.

step int

The training step/iteration.

timestamp DateTime?

Optional timestamp for the metric.

Remarks

For Beginners: This logs a measurement from your training, like loss or accuracy. These values are tracked over time so you can see how training progresses.

LogMetrics(string, Dictionary<string, T>, int)

Records multiple metrics at once.

void LogMetrics(string sessionId, Dictionary<string, T> metrics, int step)

Parameters

sessionId string

The ID of the monitoring session.

metrics Dictionary<string, T>

Dictionary of metric names and values.

step int

The training step/iteration.

LogResourceUsage(string, double, double, double?, double?)

Records system resource usage.

void LogResourceUsage(string sessionId, double cpuUsage, double memoryUsage, double? gpuUsage = null, double? gpuMemory = null)

Parameters

sessionId string

The ID of the monitoring session.

cpuUsage double

CPU usage percentage (0-100).

memoryUsage double

Memory usage in MB.

gpuUsage double?

GPU usage percentage (0-100), if applicable.

gpuMemory double?

GPU memory usage in MB, if applicable.

Remarks

For Beginners: This tracks how much of your computer's resources (CPU, memory, GPU) are being used during training.
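
One possible way to obtain values for `cpuUsage` and `memoryUsage` is to sample the current process with `System.Diagnostics.Process`, as sketched below. Measuring only the current process (rather than the whole machine) is an assumption, since this page does not say which is expected. The GPU arguments are left out because .NET has no built-in GPU counters.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

Process process = Process.GetCurrentProcess();

// CPU: processor time consumed over a short wall-clock window,
// normalized by core count to give a 0-100 percentage.
TimeSpan cpuBefore = process.TotalProcessorTime;
Stopwatch wall = Stopwatch.StartNew();
Thread.Sleep(200);
process.Refresh();
TimeSpan cpuAfter = process.TotalProcessorTime;
double wallMs = wall.Elapsed.TotalMilliseconds;

double cpuUsage = (cpuAfter - cpuBefore).TotalMilliseconds
                  / (wallMs * Environment.ProcessorCount) * 100.0;
cpuUsage = Math.Clamp(cpuUsage, 0.0, 100.0);

// Memory: working set of this process, converted from bytes to MB.
double memoryUsage = process.WorkingSet64 / (1024.0 * 1024.0);

Console.WriteLine($"cpuUsage={cpuUsage:F1} memoryUsage={memoryUsage:F1}");

// With a monitor instance, these would be passed as:
// monitor.LogResourceUsage(sessionId, cpuUsage, memoryUsage);
```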

OnEpochEnd(string, int, Dictionary<string, T>, TimeSpan)

Records the end of a training epoch with summary metrics.

void OnEpochEnd(string sessionId, int epochNumber, Dictionary<string, T> metrics, TimeSpan duration)

Parameters

sessionId string

The ID of the monitoring session.

epochNumber int

The number of the epoch that is ending.

metrics Dictionary<string, T>

Summary metrics for the epoch.

duration TimeSpan

Time taken for the epoch.
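
A caller has to produce the `metrics` and `duration` arguments itself. A minimal sketch, assuming the summary metric is the mean of per-batch losses and the duration comes from a `Stopwatch` (both choices are illustrative, not prescribed by the interface):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

Stopwatch epochTimer = Stopwatch.StartNew();

// Placeholder per-batch losses; a real loop would collect these while training.
var batchLosses = new List<double> { 0.9, 0.7, 0.5 };

epochTimer.Stop();

// Summary metrics for the epoch: here, the mean of the per-batch losses.
double meanLoss = batchLosses.Average();
var metrics = new Dictionary<string, double>
{
    ["train_loss"] = meanLoss,
};
TimeSpan duration = epochTimer.Elapsed;

Console.WriteLine($"epoch took {duration.TotalMilliseconds} ms, mean loss {meanLoss}");

// With a monitor instance, these would be passed as:
// monitor.OnEpochEnd(sessionId, 1, metrics, duration);
```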

OnEpochStart(string, int)

Records the start of a new training epoch.

void OnEpochStart(string sessionId, int epochNumber)

Parameters

sessionId string

The ID of the monitoring session.

epochNumber int

The number of the epoch that is starting.

StartSession(string, Dictionary<string, object>?)

Starts monitoring a training session.

string StartSession(string sessionName, Dictionary<string, object>? metadata = null)

Parameters

sessionName string

Name for this training session.

metadata Dictionary<string, object>

Optional metadata about the training.

Returns

string

The unique identifier for the monitoring session.

UpdateProgress(string, int, int, int, int)

Updates the training progress information.

void UpdateProgress(string sessionId, int currentStep, int totalSteps, int currentEpoch, int totalEpochs)

Parameters

sessionId string

The ID of the monitoring session.

currentStep int

Current training step.

totalSteps int

Total number of steps planned.

currentEpoch int

Current epoch number.

totalEpochs int

Total number of epochs planned.