Interface ITrainingMonitor<T>
- Namespace: AiDotNet.Interfaces
- Assembly: AiDotNet.dll
Defines the contract for training monitoring systems that track and visualize model training progress.
public interface ITrainingMonitor<T>
Type Parameters
- T: The numeric data type used for calculations (e.g., float, double).
Remarks
A training monitor provides real-time visibility into the training process, tracking metrics, system resources, and training state to help identify issues and optimize performance.
For Beginners: Think of a training monitor as a dashboard for your model training. Just like a car dashboard shows speed, fuel, and engine temperature, a training monitor shows:
- Training metrics (loss, accuracy)
- System resources (CPU, GPU, memory usage)
- Training speed (iterations per second)
- Progress and estimated time remaining
Why training monitoring matters:
- Catch problems early (model not learning, overfitting, resource issues)
- Understand training dynamics and patterns
- Optimize resource usage
- Track progress on long-running training jobs
- Enable remote monitoring of training
Methods
CheckForIssues(string)
Checks for potential training issues and returns warnings.
List<string> CheckForIssues(string sessionId)
Parameters
- sessionId (string): The ID of the monitoring session.
Returns
- List<string>
A list of warning messages describing any issues detected.
Remarks
For Beginners: This automatically detects common problems like:
- Training loss not decreasing
- Metrics showing NaN (not a number) values
- Very high or low learning rates
- Memory leaks
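To make the first two checks in the list above concrete, here is a minimal sketch of how such detection could work on a series of loss values. `FindLossIssues`, its `window` and `minImprovement` parameters, and the warning texts are hypothetical names chosen for illustration; this page does not describe how an actual implementation of `CheckForIssues` decides what to report.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of AiDotNet: returns warnings for a series
// of recorded loss values, ordered oldest first.
static List<string> FindLossIssues(IReadOnlyList<double> losses, int window = 5, double minImprovement = 1e-4)
{
    var warnings = new List<string>();

    // NaN or infinite values usually mean the optimization has diverged.
    if (losses.Any(v => double.IsNaN(v) || double.IsInfinity(v)))
        warnings.Add("Loss contains NaN or infinite values.");

    // Compare the mean of the latest window with the mean of the window before it.
    if (losses.Count >= 2 * window)
    {
        double previous = losses.Skip(losses.Count - 2 * window).Take(window).Average();
        double recent = losses.Skip(losses.Count - window).Average();
        if (previous - recent < minImprovement)
            warnings.Add($"Loss has not improved over the last {window} recorded values.");
    }

    return warnings;
}

var healthy = new double[] { 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1 };
var stalled = new double[] { 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5 };

Console.WriteLine(FindLossIssues(healthy).Count); // 0 warnings
Console.WriteLine(FindLossIssues(stalled).Count); // 1 warning
```

Comparing window averages rather than single values keeps ordinary step-to-step noise from triggering the plateau warning.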
CreateVisualization(string, List<string>, string)
Creates a visualization of training metrics.
void CreateVisualization(string sessionId, List<string> metricNames, string outputPath)
Parameters
- sessionId (string): The ID of the monitoring session.
- metricNames (List<string>): Names of metrics to visualize.
- outputPath (string): Path to save the visualization.
EndSession(string)
Ends the current monitoring session.
void EndSession(string sessionId)
Parameters
- sessionId (string): The ID of the session to end.
ExportData(string, string, string)
Exports monitoring data to a file.
void ExportData(string sessionId, string filePath, string format = "json")
Parameters
- sessionId (string): The ID of the monitoring session.
- filePath (string): Path to save the export.
- format (string): Export format (CSV, JSON, etc.). Defaults to "json".
GetCurrentMetrics(string)
Gets the current metrics for a session.
Dictionary<string, T> GetCurrentMetrics(string sessionId)
Parameters
- sessionId (string): The ID of the monitoring session.
Returns
- Dictionary<string, T>
Dictionary of current metric values.
GetMetricHistory(string, string)
Gets the history of a specific metric.
List<(int Step, T Value, DateTime Timestamp)> GetMetricHistory(string sessionId, string metricName)
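Parameters
- sessionId (string): The ID of the monitoring session.
- metricName (string): Name of the metric whose history is returned.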
Returns
- List<(int Step, T Value, DateTime Timestamp)>
List of (step, value, timestamp) tuples for the metric.
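The returned tuples can be queried directly with LINQ. The sketch below uses hand-written stand-in data with the same shape that `GetMetricHistory` returns for `T = double`; the values and timestamps are invented for illustration, and no call to a real monitor is made.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in data shaped like the return value of GetMetricHistory for T = double.
var start = new DateTime(2024, 1, 1, 0, 0, 0, DateTimeKind.Utc);
var history = new List<(int Step, double Value, DateTime Timestamp)>
{
    (0, 0.90, start),
    (100, 0.55, start.AddSeconds(30)),
    (200, 0.42, start.AddSeconds(60)),
    (300, 0.47, start.AddSeconds(90)),
};

// Record with the lowest value, which is the best step for a loss metric.
var best = history.OrderBy(h => h.Value).First();

// Wall-clock time between the first and last records.
TimeSpan elapsed = history[history.Count - 1].Timestamp - history[0].Timestamp;

Console.WriteLine($"Best value {best.Value} at step {best.Step}, {elapsed.TotalSeconds} s elapsed");
```

For a metric where higher is better, such as accuracy, `OrderByDescending` would select the best record instead.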
GetResourceUsage(string)
Gets the current resource usage.
ResourceUsageStats GetResourceUsage(string sessionId)
Parameters
- sessionId (string): The ID of the monitoring session.
Returns
- ResourceUsageStats
Current resource usage statistics.
GetSpeedStats(string)
Gets statistics about training speed.
TrainingSpeedStats GetSpeedStats(string sessionId)
Parameters
- sessionId (string): The ID of the monitoring session.
Returns
- TrainingSpeedStats
Statistics including steps per second and estimated time remaining.
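As a rough illustration of how steps per second and remaining time relate, the sketch below derives both from a step count and the elapsed time. `EstimateSpeed` and its tuple result are hypothetical; the members of `TrainingSpeedStats` are not listed on this page, so nothing here should be read as that type's actual layout.

```csharp
using System;

// Hypothetical helper, not part of AiDotNet: derives throughput and the
// remaining time from overall progress so far.
static (double StepsPerSecond, TimeSpan? Remaining) EstimateSpeed(int currentStep, int totalSteps, TimeSpan elapsed)
{
    // Without completed steps or elapsed time there is nothing to extrapolate from.
    if (currentStep <= 0 || elapsed <= TimeSpan.Zero)
        return (0.0, null);

    double stepsPerSecond = currentStep / elapsed.TotalSeconds;
    int remainingSteps = Math.Max(0, totalSteps - currentStep);
    return (stepsPerSecond, TimeSpan.FromSeconds(remainingSteps / stepsPerSecond));
}

var (rate, eta) = EstimateSpeed(currentStep: 250, totalSteps: 1000, elapsed: TimeSpan.FromSeconds(50));
Console.WriteLine($"{rate} steps/s, about {eta?.TotalSeconds} s remaining");
```

With 250 of 1000 steps done in 50 seconds, the rate is 5 steps per second and the remaining 750 steps take about 150 seconds. This assumes a constant rate; a real monitor might instead average over recent steps to react to slowdowns.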
LogMessage(string, LogLevel, string)
Logs a text message or event during training.
void LogMessage(string sessionId, LogLevel level, string message)
Parameters
- sessionId (string): The ID of the monitoring session.
- level (LogLevel): Severity level (Info, Warning, Error).
- message (string): The message to log.
Remarks
For Beginners: This lets you add notes or warnings during training, like "Started learning rate decay" or "Warning: High memory usage".
LogMetric(string, string, T, int, DateTime?)
Records a metric value for the current training step.
void LogMetric(string sessionId, string metricName, T value, int step, DateTime? timestamp = null)
Parameters
- sessionId (string): The ID of the monitoring session.
- metricName (string): Name of the metric (e.g., "train_loss", "val_accuracy").
- value (T): The metric value.
- step (int): The training step/iteration.
- timestamp (DateTime?): Optional timestamp for the metric.
Remarks
For Beginners: This logs a measurement from your training, like loss or accuracy. These values are tracked over time so you can see how training progresses.
LogMetrics(string, Dictionary<string, T>, int)
Records multiple metrics at once.
void LogMetrics(string sessionId, Dictionary<string, T> metrics, int step)
Parameters
- sessionId (string): The ID of the monitoring session.
- metrics (Dictionary<string, T>): Dictionary of metric names and values.
- step (int): The training step/iteration.
LogResourceUsage(string, double, double, double?, double?)
Records system resource usage.
void LogResourceUsage(string sessionId, double cpuUsage, double memoryUsage, double? gpuUsage = null, double? gpuMemory = null)
Parameters
- sessionId (string): The ID of the monitoring session.
- cpuUsage (double): CPU usage percentage (0-100).
- memoryUsage (double): Memory usage in MB.
- gpuUsage (double?): GPU usage percentage (0-100), if applicable.
- gpuMemory (double?): GPU memory usage in MB, if applicable.
Remarks
For Beginners: This tracks how much of your computer's resources (CPU, memory, GPU) are being used during training.
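One possible way to obtain values for `cpuUsage` and `memoryUsage` is to sample the current process with `System.Diagnostics`. The sketch below makes two assumptions that this page does not confirm: that per-process figures (rather than machine-wide ones) are what you want to record, and that the working set is an acceptable measure of memory. `SampleProcessUsage` is a hypothetical helper, not an AiDotNet API.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper, not part of AiDotNet: samples this process's CPU load
// over a short interval and reads its current working set.
static (double CpuPercent, double MemoryMb) SampleProcessUsage(TimeSpan interval)
{
    using var process = Process.GetCurrentProcess();
    TimeSpan cpuBefore = process.TotalProcessorTime;
    var clock = Stopwatch.StartNew();

    Thread.Sleep(interval);

    process.Refresh(); // discard cached process information
    TimeSpan cpuUsed = process.TotalProcessorTime - cpuBefore;
    double wallMs = clock.Elapsed.TotalMilliseconds;

    // Normalize by core count so that 100 means every core was busy.
    double cpuPercent = cpuUsed.TotalMilliseconds / (wallMs * Environment.ProcessorCount) * 100.0;
    double memoryMb = process.WorkingSet64 / (1024.0 * 1024.0);

    return (Math.Clamp(cpuPercent, 0.0, 100.0), memoryMb);
}

var (cpu, memory) = SampleProcessUsage(TimeSpan.FromMilliseconds(200));
Console.WriteLine($"CPU {cpu:F1}%, memory {memory:F1} MB");
```

The two values match the documented units (a 0-100 percentage and megabytes), so they could be passed as `cpuUsage` and `memoryUsage`. GPU figures need a vendor-specific source and are left out here, which is why the corresponding parameters are optional.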
OnEpochEnd(string, int, Dictionary<string, T>, TimeSpan)
Records the end of a training epoch with summary metrics.
void OnEpochEnd(string sessionId, int epochNumber, Dictionary<string, T> metrics, TimeSpan duration)
Parameters
- sessionId (string): The ID of the monitoring session.
- epochNumber (int): The number of the epoch that is ending.
- metrics (Dictionary<string, T>): Summary metrics for the epoch.
- duration (TimeSpan): Time taken for the epoch.
OnEpochStart(string, int)
Records the start of a new training epoch.
void OnEpochStart(string sessionId, int epochNumber)
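Parameters
- sessionId (string): The ID of the monitoring session.
- epochNumber (int): The number of the epoch that is starting.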
StartSession(string, Dictionary<string, object>?)
Starts monitoring a training session.
string StartSession(string sessionName, Dictionary<string, object>? metadata = null)
Parameters
- sessionName (string): Name for this training session.
- metadata (Dictionary<string, object>?): Optional metadata about the training.
Returns
- string
The unique identifier for the monitoring session.
UpdateProgress(string, int, int, int, int)
Updates the training progress information.
void UpdateProgress(string sessionId, int currentStep, int totalSteps, int currentEpoch, int totalEpochs)
Parameters
- sessionId (string): The ID of the monitoring session.
- currentStep (int): The current training step.
- totalSteps (int): The total number of training steps.
- currentEpoch (int): The current epoch.
- totalEpochs (int): The total number of epochs.