This guide demonstrates how to use the Mixture-of-Experts neural network model with AiDotNet’s PredictionModelBuilder.
Mixture-of-Experts (MoE) is a neural network architecture that employs multiple specialist networks (experts) with learned routing. It enables models with extremely high capacity while remaining computationally efficient by activating only a subset of parameters per input.
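To build intuition for what "learned routing" means, here is a minimal conceptual sketch of top-k expert routing. It is an illustration only, not AiDotNet's internal implementation: a gating network scores every expert for a given input, only the TopK highest-scoring experts are evaluated, and their outputs are blended using the renormalized gate scores. It assumes each expert maps a D-dimensional input to a D-dimensional output.
using System;
using System.Linq;
// Conceptual sketch of top-k expert routing (illustration only, not AiDotNet's internals).
float[] RouteTopK(float[] input, Func<float[], float[]>[] experts, float[] gateScores, int topK)
{
    // Indices of the TopK largest gate scores.
    int[] selected = Enumerable.Range(0, gateScores.Length)
        .OrderByDescending(i => gateScores[i])
        .Take(topK)
        .ToArray();
    // Renormalize the selected scores so the combination weights sum to 1.
    float total = selected.Sum(i => gateScores[i]);
    // Weighted sum of only the selected experts' outputs; the other experts never run,
    // which is where the computational savings come from.
    var output = new float[input.Length];
    foreach (int i in selected)
    {
        float weight = gateScores[i] / total;
        float[] expertOutput = experts[i](input);
        for (int d = 0; d < output.Length; d++)
            output[d] += weight * expertOutput[d];
    }
    return output;
}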
MoE follows the exact same pattern as other models in AiDotNet (like ARIMAModel, NBEATSModel, FeedForwardNeuralNetwork, etc.):
using AiDotNet;
using AiDotNet.LinearAlgebra;
using AiDotNet.Models;
using AiDotNet.Models.Options;
using AiDotNet.NeuralNetworks;
// 1. Create configuration options
var options = new MixtureOfExpertsOptions<float>
{
NumExperts = 8, // 8 specialist networks
TopK = 2, // Use top 2 experts per input
InputDim = 128, // Input dimension
OutputDim = 128, // Output dimension
HiddenExpansion = 4, // 4x hidden layer expansion
UseLoadBalancing = true, // Enable load balancing
LoadBalancingWeight = 0.01 // Load balancing loss weight
};
// 2. Create network architecture
var architecture = new NeuralNetworkArchitecture<float>(
inputType: InputType.OneDimensional,
taskType: NeuralNetworkTaskType.MultiClassClassification,
inputSize: 128,
outputSize: 10
);
// 3. Create the model (implements IFullModel automatically)
var model = new MixtureOfExpertsNeuralNetwork<float>(options, architecture);
// 4. Use with PredictionModelBuilder (same as always)
var builder = new PredictionModelBuilder<float, Tensor<float>, Tensor<float>>();
var result = builder
.ConfigureModel(model)
.Build(trainingData, trainingLabels);
// 5. Make predictions (same as always)
var predictions = builder.Predict(testData, result);
using AiDotNet;
using AiDotNet.LinearAlgebra;
using AiDotNet.Models;
using AiDotNet.Models.Options;
using AiDotNet.NeuralNetworks;
// Prepare your data
int numSamples = 1000;
int numFeatures = 784; // e.g., 28x28 images flattened
int numClasses = 10;
var trainingData = new Tensor<float>(new[] { numSamples, numFeatures });
var trainingLabels = new Tensor<float>(new[] { numSamples, numClasses });
// ... fill with actual data ...
// Configure MoE model
var options = new MixtureOfExpertsOptions<float>
{
NumExperts = 8,
TopK = 2,
InputDim = numFeatures,
OutputDim = numFeatures,
HiddenExpansion = 4,
UseLoadBalancing = true,
LoadBalancingWeight = 0.01
};
// Create architecture
var architecture = new NeuralNetworkArchitecture<float>(
inputType: InputType.OneDimensional,
taskType: NeuralNetworkTaskType.MultiClassClassification,
inputSize: numFeatures,
outputSize: numClasses
);
// Create model
var model = new MixtureOfExpertsNeuralNetwork<float>(options, architecture);
// Train with PredictionModelBuilder
var builder = new PredictionModelBuilder<float, Tensor<float>, Tensor<float>>();
var result = builder
.ConfigureModel(model)
.Build(trainingData, trainingLabels);
// Evaluate
Console.WriteLine($"Training Accuracy: {result.TrainingAccuracy:P2}");
Console.WriteLine($"Validation Accuracy: {result.ValidationAccuracy:P2}");
// Make predictions
var testData = new Tensor<float>(new[] { 100, numFeatures });
var predictions = builder.Predict(testData, result);
// Save model
builder.SaveModel(result, "moe_model.bin");
// Load and use later
var loadedModel = builder.LoadModel("moe_model.bin");
var newPredictions = builder.Predict(testData, loadedModel);
The MixtureOfExpertsOptions class provides all MoE-specific configuration:
Controls how many specialist networks the model contains.
options.NumExperts = 8; // Default: 4
Guidelines:
Determines how many experts process each input (sparse routing).
options.TopK = 2; // Default: 2
Guidelines:
TopK = 1: Only the best expert - very fast, for 32+ experts
TopK = 2: Top 2 experts - good balance for 8-32 experts (recommended)
TopK = 4: More experts per input - higher quality but slower
Dimensions for expert networks.
options.InputDim = 128; // Default: 128
options.OutputDim = 128; // Default: 128
Guidelines:
Controls the hidden layer size within each expert (as a multiple of InputDim).
options.HiddenExpansion = 4; // Default: 4 (from Transformer research)
Guidelines:
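As a concrete example (assuming the hidden width is simply InputDim × HiddenExpansion, as described above), with InputDim = 128:
options.HiddenExpansion = 2; // 128 * 2 = 256 hidden units per expert (lighter experts)
options.HiddenExpansion = 4; // 128 * 4 = 512 hidden units per expert (default, from Transformer research)
options.HiddenExpansion = 8; // 128 * 8 = 1024 hidden units per expert (heavier experts)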
Controls auxiliary loss to ensure balanced expert usage.
options.UseLoadBalancing = true; // Default: true
options.LoadBalancingWeight = 0.01; // Default: 0.01
Guidelines:
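For intuition, the auxiliary loss penalizes uneven expert usage. The following is a minimal conceptual sketch of that idea, not AiDotNet's exact formulation (which may differ): it measures how far each expert's share of the routed inputs deviates from the ideal uniform share and scales the penalty by LoadBalancingWeight.
// Conceptual sketch of a load-balancing penalty (illustration only).
float LoadBalancingPenalty(float[] expertUsageFractions, float weight)
{
    float ideal = 1f / expertUsageFractions.Length; // uniform share per expert
    float penalty = 0f;
    foreach (float fraction in expertUsageFractions)
    {
        float deviation = fraction - ideal;
        penalty += deviation * deviation; // squared deviation from uniform usage
    }
    return weight * penalty; // scaled by LoadBalancingWeight and added to the training loss
}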
Controls reproducibility of initialization.
options.RandomSeed = 42; // Default: null (non-deterministic)
Set a specific value for reproducible results (useful for research and debugging).
MoE works for regression too - just change the task type:
// Configure MoE for regression
var options = new MixtureOfExpertsOptions<float>
{
NumExperts = 4,
TopK = 2,
InputDim = 10,
OutputDim = 10
};
var architecture = new NeuralNetworkArchitecture<float>(
inputType: InputType.OneDimensional,
taskType: NeuralNetworkTaskType.Regression, // Regression task
inputSize: 10,
outputSize: 1
);
var model = new MixtureOfExpertsNeuralNetwork<float>(options, architecture);
var result = builder.ConfigureModel(model).Build(trainingData, trainingTargets);
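Prediction then works exactly as in the classification example; with outputSize: 1, each row of the prediction tensor holds a single regression value:
// Predict on new data (20 samples, 10 features); output shape is [20, 1].
var newData = new Tensor<float>(new[] { 20, 10 });
var regressionPredictions = builder.Predict(newData, result);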
For more control, you can provide custom layers in the architecture:
using AiDotNet.NeuralNetworks.Layers;
using AiDotNet.ActivationFunctions;
// Create custom layers including MoE layer
var moeLayer = new MixtureOfExpertsBuilder<float>()
.WithExperts(8)
.WithDimensions(256, 256)
.WithTopK(2)
.WithLoadBalancing(true)
.Build();
var layers = new List<ILayer<float>>
{
new DenseLayer<float>(784, 256, new ReLUActivation<float>()),
moeLayer,
new DenseLayer<float>(256, 10, new SoftmaxActivation<float>())
};
// Pass custom layers to architecture
var architecture = new NeuralNetworkArchitecture<float>(
inputType: InputType.OneDimensional,
taskType: NeuralNetworkTaskType.MultiClassClassification,
inputSize: 784,
outputSize: 10,
layers: layers // Provide custom layers
);
// Create model with custom architecture
var model = new MixtureOfExpertsNeuralNetwork<float>(options, architecture);
Note: When providing custom layers, the options are only used for metadata. The actual MoE layer comes from your custom layers list.
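Training and prediction with the custom-layer model are unchanged; it plugs into PredictionModelBuilder like any other model:
// Train and predict exactly as with the default architecture.
var builder = new PredictionModelBuilder<float, Tensor<float>, Tensor<float>>();
var result = builder
    .ConfigureModel(model)
    .Build(trainingData, trainingLabels);
var predictions = builder.Predict(testData, result);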
MoE follows the exact same pattern as other models:
// ARIMA Model
var arimaOptions = new ARIMAOptions<float> { P = 2, D = 1, Q = 2 };
var arimaModel = new ARIMAModel<float>(arimaOptions);
var arimaResult = builder.ConfigureModel(arimaModel).Build(data, labels);
// MoE Model (same pattern!)
var moeOptions = new MixtureOfExpertsOptions<float> { NumExperts = 8, TopK = 2 };
var moeModel = new MixtureOfExpertsNeuralNetwork<float>(moeOptions, architecture);
var moeResult = builder.ConfigureModel(moeModel).Build(data, labels);
// Feed-Forward Network
var ffnn = new FeedForwardNeuralNetwork<float>(architecture);
var ffnnResult = builder.ConfigureModel(ffnn).Build(data, labels);
// MoE Network (same pattern!)
var moeNetwork = new MixtureOfExpertsNeuralNetwork<float>(options, architecture);
var moeNetworkResult = builder.ConfigureModel(moeNetwork).Build(data, labels);
You can monitor how balanced expert usage is during training:
// After training, get diagnostics
var metadata = model.GetModelMetadata();
Console.WriteLine("\nModel Information:");
Console.WriteLine($"Number of Experts: {metadata.AdditionalInfo["NumExperts"]}");
Console.WriteLine($"TopK: {metadata.AdditionalInfo["TopK"]}");
Console.WriteLine($"Load Balancing Enabled: {metadata.AdditionalInfo["UseLoadBalancing"]}");
// For more detailed diagnostics, access the underlying MoE layer
if (model is MixtureOfExpertsNeuralNetwork<float> moeNet)
{
// Access layer-level diagnostics as needed
// (implementation depends on exposing layer diagnostics)
}
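To dump everything the model reports about itself, you can enumerate AdditionalInfo. This assumes it is exposed as a standard key/value dictionary; adjust to the concrete type in your AiDotNet version:
// Print every metadata entry the model exposes.
// Assumes AdditionalInfo enumerates as key/value pairs.
foreach (var entry in metadata.AdditionalInfo)
{
    Console.WriteLine($"  {entry.Key}: {entry.Value}");
}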
Start Simple: Begin with 4-8 experts and TopK = 2, then scale up if needed
Match Your Dimensions: Set InputDim to your actual feature size
options.InputDim = yourInputSize;
Keep Load Balancing Enabled: It keeps expert usage balanced so no expert goes unused
options.UseLoadBalancing = true;
options.LoadBalancingWeight = 0.01; // Gentle but effective
Monitor Training: Check that training loss decreases and validation accuracy improves
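A quick sanity check, using the result properties shown earlier, is to compare training and validation accuracy; a large gap usually signals overfitting. The 0.10 threshold below is an illustrative rule of thumb, not an AiDotNet constant:
// Illustrative overfitting check after Build().
var gap = result.TrainingAccuracy - result.ValidationAccuracy;
if (gap > 0.10)
{
    Console.WriteLine("Possible overfitting: try fewer experts, a smaller HiddenExpansion, or more data.");
}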
Use MixtureOfExpertsOptions for all configuration:
var options = new MixtureOfExpertsOptions<float>
{
NumExperts = 8,
TopK = 2,
InputDim = 784,
OutputDim = 784
};
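// A larger configuration: 16 experts over 512-dimensional features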
var options = new MixtureOfExpertsOptions<float>
{
NumExperts = 16,
TopK = 2,
InputDim = 512,
OutputDim = 512
};
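// A compact configuration: 4 experts, sized to your feature count (numFeatures)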
var options = new MixtureOfExpertsOptions<float>
{
NumExperts = 4,
TopK = 2,
InputDim = numFeatures,
OutputDim = numFeatures
};
Mixture-of-Experts in AiDotNet follows the standard model pattern:
MixtureOfExpertsOptions<T> configuration object
NeuralNetworkArchitecture<T> defining the task
MixtureOfExpertsNeuralNetwork<T> model
PredictionModelBuilder for training and inference
This is the same pattern as all other models in AiDotNet, making it easy to use and integrate into your workflows.