
Interface IDiagnosticsProvider

Namespace
AiDotNet.Interfaces
Assembly
AiDotNet.dll

Interface for components that provide diagnostic information for monitoring and debugging.

public interface IDiagnosticsProvider

Remarks

This interface enables neural network components (layers, networks, loss functions, etc.) to provide detailed diagnostic information about their internal state and behavior. This is particularly useful for:

  • Monitoring training progress
  • Debugging model behavior
  • Performance analysis and optimization
  • Understanding model decisions (explainability)

For Beginners: Think of this as a "health report" interface for neural network components.

Just like you might want to check various health metrics for your body (heart rate, blood pressure, etc.), you want to monitor various metrics for your neural network components during training and inference.

Real-world analogy: Imagine you're driving a car. Your dashboard shows:

  • Speed (how fast you're going)
  • RPM (engine revolutions)
  • Fuel level (remaining energy)
  • Temperature (engine heat)

Similarly, a neural network layer might report:

  • Activation statistics (min, max, mean values)
  • Gradient flow (how well training signals propagate)
  • Resource utilization (memory usage, computation time)
  • Layer-specific metrics (attention weights, expert usage, etc.)

This information helps you understand:

  • Is my model training properly?
  • Are there any bottlenecks or issues?
  • Which parts of the model are most active?
  • Is the model behaving as expected?

Industry Best Practices:

  • Consistent Keys: Use standardized key names across similar components
  • Meaningful Values: Provide human-readable string representations
  • Hierarchical Organization: Use prefixes to group related metrics (e.g., "activation.mean", "activation.std")
  • Efficient Computation: Diagnostics should be cheap to compute or cached
  • Optional Depth: Consider providing basic and detailed diagnostic modes
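A minimal sketch of how several of these practices might fit together, assuming a hypothetical `ActivationMonitor` component with a hypothetical `Detailed` switch (neither is part of AiDotNet). The interface is redeclared locally so the snippet compiles on its own. Formatting with the invariant culture is one way to keep values consistent across machines.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Local stand-in that mirrors the interface documented on this page.
public interface IDiagnosticsProvider
{
    Dictionary<string, string> GetDiagnostics();
}

// Hypothetical component used only for illustration.
public sealed class ActivationMonitor : IDiagnosticsProvider
{
    private readonly double[] _values;

    public ActivationMonitor(double[] values) => _values = values;

    // Hypothetical switch for "Optional Depth"; not part of the interface.
    public bool Detailed { get; set; }

    public Dictionary<string, string> GetDiagnostics()
    {
        var diagnostics = new Dictionary<string, string>();
        if (_values.Length == 0)
        {
            return diagnostics;
        }

        // Basic mode: a single cheap statistic.
        double mean = _values.Average();
        diagnostics["activation.mean"] = Format(mean);

        if (Detailed)
        {
            // Detailed mode: extra passes, grouped under the same "activation." prefix.
            double variance = _values.Average(v => (v - mean) * (v - mean));
            diagnostics["activation.std"] = Format(Math.Sqrt(variance));
            diagnostics["activation.min"] = Format(_values.Min());
            diagnostics["activation.max"] = Format(_values.Max());
        }

        return diagnostics;
    }

    // Invariant culture keeps "0.5" from being rendered as "0,5" on some machines.
    private static string Format(double value) =>
        value.ToString("G4", CultureInfo.InvariantCulture);
}
```

With `Detailed` left at its default, only `activation.mean` is reported; setting it to `true` adds the standard deviation, minimum, and maximum under the same prefix.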

Implementation Example:

public class DenseLayer<T> : LayerBase<T>, IDiagnosticsProvider
{
    private Tensor<T>? _lastActivations;

    public Dictionary<string, string> GetDiagnostics()
    {
        var diagnostics = new Dictionary<string, string>();

        if (_lastActivations != null)
        {
            diagnostics["activation.mean"] = ComputeMean(_lastActivations).ToString();
            diagnostics["activation.std"] = ComputeStd(_lastActivations).ToString();
            diagnostics["activation.sparsity"] = ComputeSparsity(_lastActivations).ToString();
        }

        diagnostics["parameter.count"] = ParameterCount.ToString();
        diagnostics["layer.type"] = "Dense";

        return diagnostics;
    }
}

// In monitoring code:
foreach (var layer in network.Layers)
{
    if (layer is IDiagnosticsProvider diagnosticLayer)
    {
        var metrics = diagnosticLayer.GetDiagnostics();
        LogMetrics(metrics);
    }
}

Methods

GetDiagnostics()

Gets diagnostic information about this component's state and behavior.

Dictionary<string, string> GetDiagnostics()

Returns

Dictionary<string, string>

A dictionary containing diagnostic metrics. Keys should be descriptive and use consistent naming conventions (e.g., "activation.mean", "gradient.norm"). Values should be human-readable string representations of the metrics.

Remarks

This method should return diagnostic information that is useful for understanding the component's current state. The specific metrics returned depend on the component type:

  • Layers: Activation statistics, gradient flow, sparsity, etc.
  • Networks: Aggregate metrics, layer-by-layer summaries
  • Loss Functions: Loss components, regularization terms
  • Optimizers: Learning rates, momentum values, update statistics
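As an illustration of the "Networks" case, the sketch below shows one way a container could merge the diagnostics of its children into a single dictionary by prefixing each key with the child's position. `DiagnosticsAggregator`, `StubLayer`, and the `layers.<index>.` and `network.provider_count` key names are assumptions made for this example, not AiDotNet APIs. The interface is redeclared locally so the snippet is self-contained.

```csharp
using System.Collections.Generic;
using System.Globalization;

// Local stand-in that mirrors the interface documented on this page.
public interface IDiagnosticsProvider
{
    Dictionary<string, string> GetDiagnostics();
}

// Hypothetical stub standing in for a real layer.
public sealed class StubLayer : IDiagnosticsProvider
{
    private readonly string _type;

    public StubLayer(string type) => _type = type;

    public Dictionary<string, string> GetDiagnostics() =>
        new Dictionary<string, string> { ["layer.type"] = _type };
}

// Hypothetical helper; not part of AiDotNet.
public static class DiagnosticsAggregator
{
    public static Dictionary<string, string> Collect(IReadOnlyList<object> components)
    {
        var merged = new Dictionary<string, string>();
        int providers = 0;

        for (int i = 0; i < components.Count; i++)
        {
            // Components that do not implement the interface are skipped.
            if (components[i] is IDiagnosticsProvider provider)
            {
                providers++;
                foreach (KeyValuePair<string, string> pair in provider.GetDiagnostics())
                {
                    // Prefix with the position so keys from different children never collide.
                    merged[$"layers.{i}.{pair.Key}"] = pair.Value;
                }
            }
        }

        // One aggregate metric reported by the container itself.
        merged["network.provider_count"] = providers.ToString(CultureInfo.InvariantCulture);
        return merged;
    }
}
```

Prefixing by position keeps per-layer keys unchanged, so the same key names can still be compared across layers.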

For Beginners: This method returns a report card with various metrics.

The returned dictionary is like a set of labeled measurements:

  • Keys: What you're measuring (e.g., "activation.mean", "activation.sparsity")
  • Values: The measurement results as strings (e.g., "0.42", "85% sparse")

Example for a Dense layer:

{
    "activation.mean": "0.342",
    "activation.std": "0.156",
    "activation.min": "-0.82",
    "activation.max": "1.24",
    "activation.sparsity": "0.23",
    "gradient.norm": "0.042",
    "weights.norm": "15.6",
    "layer.output_size": "256"
}

You can use this information to:

  1. Detect training issues: If a layer's activations are all zero, the layer may be dead or receiving bad input
  2. Tune hyperparameters: If gradient norms are very large or very small, consider adjusting the learning rate
  3. Monitor convergence: Track metrics over time to see if training is progressing
  4. Compare experiments: See how different configurations affect internal behavior
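The first two uses can be sketched as a small check over the returned dictionary. `DiagnosticsChecks` is a hypothetical helper, and the thresholds are illustrative assumptions rather than recommended values. It also assumes that "activation.sparsity" is the fraction of zero activations, which this page does not state. The key names follow the Dense layer example above.

```csharp
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper; not part of AiDotNet. Thresholds are illustrative only.
public static class DiagnosticsChecks
{
    public static List<string> FindWarnings(
        IReadOnlyDictionary<string, string> metrics,
        double minGradientNorm = 1e-6,
        double maxGradientNorm = 100.0,
        double maxSparsity = 0.99)
    {
        var warnings = new List<string>();

        // Assumes sparsity is reported as the fraction of zero activations.
        if (TryGet(metrics, "activation.sparsity", out double sparsity) && sparsity > maxSparsity)
        {
            warnings.Add("activations are almost all zero");
        }

        if (TryGet(metrics, "gradient.norm", out double norm))
        {
            if (double.IsNaN(norm) || double.IsInfinity(norm))
            {
                warnings.Add("gradient norm is not finite");
            }
            else if (norm < minGradientNorm)
            {
                warnings.Add("gradient norm is very small (possible vanishing gradients)");
            }
            else if (norm > maxGradientNorm)
            {
                warnings.Add("gradient norm is very large (possible exploding gradients)");
            }
        }

        return warnings;
    }

    // Values are strings, so parse with the invariant culture; missing or
    // unparsable entries are treated as "no information".
    private static bool TryGet(IReadOnlyDictionary<string, string> metrics, string key, out double value)
    {
        value = 0.0;
        return metrics.TryGetValue(key, out var text)
            && double.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out value);
    }
}
```

Because the values are strings, a check like this only works if providers format numbers consistently, which is one reason to prefer plain numeric strings over free-form text.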

Common patterns:

// Log diagnostics periodically during training
if (epoch % 10 == 0)
{
    foreach (var layer in network.Layers)
    {
        if (layer is IDiagnosticsProvider diag)
        {
            Console.WriteLine($"Layer {layer.Name}:");
            foreach (var (key, value) in diag.GetDiagnostics())
            {
                Console.WriteLine($"  {key}: {value}");
            }
        }
    }
}

Performance Note: Diagnostic computation should be efficient. If expensive calculations are needed, consider caching results or computing them only when diagnostics are requested. Diagnostics should not significantly impact training performance.
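One possible way to apply the caching advice is a wrapper that computes diagnostics lazily and reuses the result until the component's state changes. `CachedDiagnostics` and its `Invalidate` method are hypothetical names for this sketch, not part of AiDotNet, and the wrapper is not thread-safe. The interface is redeclared locally so the snippet is self-contained.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Local stand-in that mirrors the interface documented on this page.
public interface IDiagnosticsProvider
{
    Dictionary<string, string> GetDiagnostics();
}

// Hypothetical wrapper; not part of AiDotNet. Not thread-safe.
public sealed class CachedDiagnostics : IDiagnosticsProvider
{
    private readonly Func<Dictionary<string, string>> _compute;
    private Dictionary<string, string>? _cache;

    public CachedDiagnostics(Func<Dictionary<string, string>> compute) => _compute = compute;

    // Call whenever the underlying state changes, e.g. after a forward pass.
    public void Invalidate() => _cache = null;

    public Dictionary<string, string> GetDiagnostics()
    {
        // Compute only on the first request after an invalidation.
        _cache ??= _compute();

        // Return a copy so callers cannot mutate the cached snapshot.
        return new Dictionary<string, string>(_cache);
    }
}
```

With this shape, the training loop pays nothing for diagnostics that are never requested, and repeated requests between state changes reuse the same snapshot.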