Class SymbolicRegression<T>
- Namespace
- AiDotNet.Regression
- Assembly
- AiDotNet.dll
Implements symbolic regression, which discovers mathematical expressions that best describe the relationship between input features and target values. Unlike traditional regression methods, symbolic regression can discover both the form of the equation and its parameters.
public class SymbolicRegression<T> : NonLinearRegressionBase<T>, INonLinearRegression<T>, IRegression<T>, IFullModel<T, Matrix<T>, Vector<T>>, IModel<Matrix<T>, Vector<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Matrix<T>, Vector<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Matrix<T>, Vector<T>>>, IGradientComputable<T, Matrix<T>, Vector<T>>, IJitCompilable<T>
Type Parameters
T: The numeric type used for calculations (typically float or double).
- Inheritance
- SymbolicRegression<T>
- Implements
- IRegression<T>
Remarks
Symbolic regression works by:
- Creating a population of mathematical expressions (typically as expression trees)
- Evolving these expressions using genetic programming techniques
- Evaluating expressions based on how well they fit the data
- Selecting the best expressions to create new generations
- Eventually converging on an optimal or near-optimal mathematical model
This approach can discover complex, nonlinear relationships without requiring the user to specify the form of the equation in advance.
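The loop described above can be sketched in a few dozen lines. This is a toy illustration, not AiDotNet's implementation: the `Expr`, `Const`, `Var`, `BinOp`, and `ToyGp` types are hypothetical, the search uses a single input variable, and it relies on mutation and elitist selection only (crossover is left out to keep the sketch short).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy types for illustration only; none of these are AiDotNet APIs.
public abstract class Expr
{
    public abstract double Eval(double x);
}

public sealed class Const : Expr
{
    public double Value { get; }
    public Const(double value) { Value = value; }
    public override double Eval(double x) => Value;
    public override string ToString() => Value.ToString();
}

public sealed class Var : Expr
{
    public override double Eval(double x) => x;
    public override string ToString() => "x";
}

public sealed class BinOp : Expr
{
    public char Op { get; }
    public Expr Left { get; }
    public Expr Right { get; }
    public BinOp(char op, Expr left, Expr right) { Op = op; Left = left; Right = right; }
    public override double Eval(double x)
    {
        double a = Left.Eval(x), b = Right.Eval(x);
        return Op == '+' ? a + b : Op == '-' ? a - b : a * b;
    }
    public override string ToString() => $"({Left} {Op} {Right})";
}

public static class ToyGp
{
    private static readonly char[] Ops = { '+', '-', '*' };

    // Build a random expression tree of at most the given depth.
    private static Expr RandomTree(Random rng, int depth)
    {
        if (depth == 0 || rng.NextDouble() < 0.3)
            return rng.NextDouble() < 0.5 ? (Expr)new Var() : new Const(rng.Next(-3, 4));
        return new BinOp(Ops[rng.Next(Ops.Length)],
            RandomTree(rng, depth - 1), RandomTree(rng, depth - 1));
    }

    // Mutation: walk down a random path and replace the subtree found there.
    private static Expr Mutate(Expr e, Random rng)
    {
        if (e is BinOp b && rng.NextDouble() < 0.7)
            return rng.NextDouble() < 0.5
                ? new BinOp(b.Op, Mutate(b.Left, rng), b.Right)
                : new BinOp(b.Op, b.Left, Mutate(b.Right, rng));
        return RandomTree(rng, 2);
    }

    // Fitness: mean squared error (lower is better); NaN scores rank last.
    public static double Mse(Expr e, double[] xs, double[] ys)
    {
        double sum = 0;
        for (int i = 0; i < xs.Length; i++)
        {
            double d = e.Eval(xs[i]) - ys[i];
            sum += d * d;
        }
        double mse = sum / xs.Length;
        return double.IsNaN(mse) ? double.PositiveInfinity : mse;
    }

    public static (Expr Best, double[] History) Evolve(
        double[] xs, double[] ys, int populationSize, int generations, int seed)
    {
        if (populationSize < 2) throw new ArgumentException("Need at least two candidates.");
        var rng = new Random(seed);
        var population = Enumerable.Range(0, populationSize)
            .Select(_ => RandomTree(rng, 3)).ToList();
        var history = new double[generations];
        for (int g = 0; g < generations; g++)
        {
            // Evaluate and rank every candidate expression.
            var ranked = population.OrderBy(e => Mse(e, xs, ys)).ToList();
            history[g] = Mse(ranked[0], xs, ys);
            // Selection: keep the better half unchanged (elitism),
            // then refill the population with mutated copies of survivors.
            var survivors = ranked.Take(populationSize / 2).ToList();
            population = new List<Expr>(survivors);
            while (population.Count < populationSize)
                population.Add(Mutate(survivors[rng.Next(survivors.Count)], rng));
        }
        return (population.OrderBy(e => Mse(e, xs, ys)).First(), history);
    }
}
```

Because the best candidate of each generation is carried over unchanged, the best error recorded in `History` can never get worse from one generation to the next, which is the property that lets this kind of search converge toward a good expression.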
For Beginners: Symbolic regression is like having an AI mathematician that invents formulas.
Think of it like this:
- Instead of you telling the computer what equation to use (like y = mx + b)
- The computer tries thousands of different formulas (like y = x², y = sin(x), etc.)
- It tests each formula to see how well it predicts your data
- It combines good formulas to make even better ones
- Eventually, it finds a formula that best explains your data
For example, when modeling how a plant grows, instead of assuming it follows a linear or exponential pattern, symbolic regression might discover it follows a pattern like "growth = sunlight² × water / (1 + temperature)".
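To make the idea of a formula stored as an expression tree concrete, the plant-growth formula above can be built node by node with .NET's standard `System.Linq.Expressions` API and compiled into a callable function. This is only an illustration of the concept; the `GrowthFormula` class is hypothetical, and nothing here implies that AiDotNet represents its expressions this way.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical helper for illustration; not part of AiDotNet.
public static class GrowthFormula
{
    // Builds growth = sunlight^2 * water / (1 + temperature) as an expression tree.
    public static Func<double, double, double, double> Build()
    {
        var sunlight = Expression.Parameter(typeof(double), "sunlight");
        var water = Expression.Parameter(typeof(double), "water");
        var temperature = Expression.Parameter(typeof(double), "temperature");

        // Numerator: (sunlight * sunlight) * water
        var numerator = Expression.Multiply(
            Expression.Multiply(sunlight, sunlight), water);

        // Denominator: 1 + temperature
        var denominator = Expression.Add(Expression.Constant(1.0), temperature);

        var body = Expression.Divide(numerator, denominator);

        // Compile the tree into an ordinary delegate.
        return Expression
            .Lambda<Func<double, double, double, double>>(body, sunlight, water, temperature)
            .Compile();
    }
}
```

Calling `GrowthFormula.Build()(3.0, 2.0, 0.5)` evaluates 3² × 2 / (1 + 0.5). A symbolic regression search effectively explores many trees of this shape, swapping operators, variables, and constants.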
Constructors
SymbolicRegression(SymbolicRegressionOptions?, IRegularization<T, Matrix<T>, Vector<T>>?, IFitnessCalculator<T, Matrix<T>, Vector<T>>?, INormalizer<T, Matrix<T>, Vector<T>>?, IFeatureSelector<T, Matrix<T>>?, IFitDetector<T, Matrix<T>, Vector<T>>?, IOutlierRemoval<T, Matrix<T>, Vector<T>>?, IDataPreprocessor<T, Matrix<T>, Vector<T>>?)
Creates a new symbolic regression model.
public SymbolicRegression(SymbolicRegressionOptions? options = null, IRegularization<T, Matrix<T>, Vector<T>>? regularization = null, IFitnessCalculator<T, Matrix<T>, Vector<T>>? fitnessCalculator = null, INormalizer<T, Matrix<T>, Vector<T>>? normalizer = null, IFeatureSelector<T, Matrix<T>>? featureSelector = null, IFitDetector<T, Matrix<T>, Vector<T>>? fitDetector = null, IOutlierRemoval<T, Matrix<T>, Vector<T>>? outlierRemoval = null, IDataPreprocessor<T, Matrix<T>, Vector<T>>? dataPreprocessor = null)
Parameters
options (SymbolicRegressionOptions): Optional configuration settings for the symbolic regression model. These settings control aspects like:
- The population size and number of generations for the genetic algorithm
- The mutation and crossover rates for evolving expressions
- The complexity penalty to prefer simpler expressions
If not provided, default options will be used.
regularization (IRegularization<T, Matrix<T>, Vector<T>>): Optional regularization method to prevent overfitting. If not provided, no additional regularization will be applied.
fitnessCalculator (IFitnessCalculator<T, Matrix<T>, Vector<T>>): Optional calculator for evaluating model fitness. If not provided, R-squared will be used as the fitness metric.
normalizer (INormalizer<T, Matrix<T>, Vector<T>>): Optional component for normalizing input and output data. If not provided, no normalization will be applied.
featureSelector (IFeatureSelector<T, Matrix<T>>): Optional component for selecting relevant features. If not provided, all features will be used.
fitDetector (IFitDetector<T, Matrix<T>, Vector<T>>): Optional component for detecting when a satisfactory model has been found. If not provided, the default fit detector will be used.
outlierRemoval (IOutlierRemoval<T, Matrix<T>, Vector<T>>): Optional component for identifying and removing outliers. If not provided, no outlier removal will be performed.
dataPreprocessor (IDataPreprocessor<T, Matrix<T>, Vector<T>>): Optional component for preprocessing data before model training. If not provided, a default preprocessor will be used with the specified normalizer, feature selector, and outlier removal components.
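For reference, the R-squared metric named as the default fitness measure is conventionally defined as 1 − SS_res / SS_tot. The sketch below assumes that textbook definition and uses plain arrays; the `Fitness` class is hypothetical, and the library's own calculator may differ in details such as edge-case handling.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration; not part of AiDotNet.
public static class Fitness
{
    // R-squared = 1 - (sum of squared residuals) / (total sum of squares).
    public static double RSquared(double[] actual, double[] predicted)
    {
        if (actual.Length == 0 || actual.Length != predicted.Length)
            throw new ArgumentException("Inputs must be non-empty and the same length.");

        double mean = actual.Average();
        double ssRes = 0.0, ssTot = 0.0;
        for (int i = 0; i < actual.Length; i++)
        {
            ssRes += (actual[i] - predicted[i]) * (actual[i] - predicted[i]);
            ssTot += (actual[i] - mean) * (actual[i] - mean);
        }

        // With constant targets the ratio is undefined; this sketch reports NaN.
        return ssTot == 0.0 ? double.NaN : 1.0 - ssRes / ssTot;
    }
}
```

A perfect fit scores 1, and a model that always predicts the mean of the targets scores 0, so higher values indicate a better formula under this metric.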
Remarks
This constructor creates a new symbolic regression model with the specified components. If components are not provided, default implementations are used. It initializes the genetic algorithm optimizer with the configured population size, generations, and rates.
For Beginners: This method sets up your AI formula discoverer.
Think of it like assembling a team of specialists:
- The options define the overall strategy
- The fitness calculator evaluates each formula
- The normalizer, feature selector, and outlier remover prepare your data
- The fit detector knows when to stop searching
- The optimizer manages the evolution of formulas
You can use the default team members (by not specifying them) or bring in your own specialists with different approaches to each task.
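A minimal construction sketch, assuming the AiDotNet package is referenced by the project. It uses only members documented on this page: the all-optional constructor and the `BestModel` and `BestFitness` properties. Training is not shown because the training entry point is not covered in this section, and what the properties hold before training is not stated here, so the code checks for null rather than assuming either way.

```csharp
using System;
using AiDotNet.Regression;

// Every constructor parameter is optional, so the documented defaults apply.
var model = new SymbolicRegression<double>();

// BestModel is declared as nullable, so guard before relying on it.
Console.WriteLine(model.BestModel is null
    ? "No best model is available yet."
    : $"Best fitness so far: {model.BestFitness}");
```

To swap in your own specialist, pass it by parameter name (for example `fitnessCalculator:` or `normalizer:`) and leave the remaining arguments at their defaults.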
Properties
BestFitness
Gets the fitness score of the best model discovered during optimization.
public T BestFitness { get; }
Property Value
- T
BestModel
Gets the best symbolic model discovered during optimization.
public IFullModel<T, Matrix<T>, Vector<T>>? BestModel { get; }
Property Value
- IFullModel<T, Matrix<T>, Vector<T>>
Methods
CreateInstance()
Creates a new instance of the Symbolic Regression model with the same configuration.
protected override IFullModel<T, Matrix<T>, Vector<T>> CreateInstance()
Returns
- IFullModel<T, Matrix<T>, Vector<T>>
A new instance of the Symbolic Regression model.
Remarks
This method creates a deep copy of the current Symbolic Regression model, including its discovered formula, configuration options, fitness calculator, data preprocessors, and optimization components. The new instance is completely independent of the original, allowing modifications without affecting the original model.
For Beginners: This method creates an exact copy of the current symbolic regression model.
Think of it like making a perfect duplicate of your AI mathematician:
- It copies the winning formula that was discovered
- It maintains the same configuration settings (population size, mutation rates, etc.)
- It preserves all the specialty components (fitness calculator, normalizer, etc.)
- It remembers how good the best formula was (the fitness score)
This is useful when you want to:
- Create a backup before making changes
- Create variations of the same model for different purposes
- Share the model while keeping your original intact
Exceptions
- InvalidOperationException
Thrown when the creation fails or required components are null.
GetModelType()
Returns the type identifier for this regression model.
protected override ModelType GetModelType()
Returns
- ModelType
The model type identifier for symbolic regression.
Remarks
This method returns the enum value that identifies this model as a symbolic regression model. This is used for model identification in serialization/deserialization and for logging purposes.
For Beginners: This method simply tells the system what kind of model this is.
It's like a name tag for the model that says "I am a symbolic regression model." This is useful when:
- Saving the model to a file
- Loading a model from a file
- Logging information about the model
You generally won't need to call this method directly in your code.
OptimizeModel(Matrix<T>, Vector<T>)
Optimizes the symbolic regression model using the provided input data and target values.
protected override void OptimizeModel(Matrix<T> x, Vector<T> y)
Parameters
x (Matrix<T>): The input feature matrix, where rows represent observations and columns represent features.
y (Vector<T>): The target values vector containing the actual output values to predict.
Remarks
This method implements the core optimization for symbolic regression. It:
1. Preprocesses the data (normalization, feature selection, outlier removal)
2. Splits the data into training, validation, and test sets
3. Uses the genetic algorithm optimizer to evolve symbolic models
4. Stores the best model found during the optimization process
For Beginners: This method finds the best formula to describe your data.
The process works like this:
- First, it cleans and prepares your data
- Then it divides your data into portions for different purposes:
- Training data: Used to create and improve formulas
- Validation data: Used to check formulas during development
- Test data: Used for a final check of the best formula
- Next, it runs the genetic algorithm to evolve better and better formulas
- Finally, it saves the best formula it found
This is where the magic happens - the AI explores thousands of possible mathematical relationships to find the one that best describes your data.
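The three-way split described above can be sketched as a shuffled partition of row indices. The split ratios used by the library are not stated in this section, so the fractions here are caller-supplied assumptions, and the `DataSplit` class is hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration; not part of AiDotNet.
public static class DataSplit
{
    // Shuffles row indices and partitions them into train, validation, and test sets.
    public static (int[] Train, int[] Validation, int[] Test) SplitIndices(
        int rowCount, double trainFraction, double validationFraction, int seed)
    {
        if (rowCount < 0 || trainFraction < 0 || validationFraction < 0
            || trainFraction + validationFraction > 1.0)
            throw new ArgumentException("Invalid row count or fractions.");

        var indices = Enumerable.Range(0, rowCount).ToArray();
        var rng = new Random(seed);

        // Fisher-Yates shuffle so each subset is a random sample of rows.
        for (int i = indices.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (indices[i], indices[j]) = (indices[j], indices[i]);
        }

        int trainCount = (int)Math.Round(rowCount * trainFraction);
        int validationCount = (int)Math.Round(rowCount * validationFraction);
        // Keep the two counts within the available rows after rounding.
        trainCount = Math.Min(trainCount, rowCount);
        validationCount = Math.Min(validationCount, rowCount - trainCount);

        return (
            indices.Take(trainCount).ToArray(),
            indices.Skip(trainCount).Take(validationCount).ToArray(),
            indices.Skip(trainCount + validationCount).ToArray());
    }
}
```

For example, `DataSplit.SplitIndices(100, 0.7, 0.15, seed: 42)` assigns 70 rows to training, 15 to validation, and the remaining 15 to testing, with every row appearing in exactly one subset.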
Predict(Matrix<T>)
Predicts target values for a matrix of input features.
public override Vector<T> Predict(Matrix<T> X)
Parameters
X (Matrix<T>): The input feature matrix for which to make predictions.
Returns
- Vector<T>
A vector of predicted values, one for each row in the input matrix.
Remarks
This method makes predictions for multiple input samples by:
1. Applying the best symbolic model to each row of the input matrix
2. Returning a vector of predicted values
For Beginners: This method uses your discovered formula to make predictions.
Once the system has found the best formula:
- This method takes new data points
- It plugs each data point into the formula
- It calculates and returns the predicted results
For example, if your formula determined plant growth based on sunlight and water, you could use this method to predict how much a plant would grow with specific amounts of sunlight and water.
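The row-by-row behaviour can be pictured with plain arrays standing in for `Matrix<T>` and `Vector<T>`. The `BatchPrediction` class and the formula passed to it are hypothetical; the sketch shows only the shape of the operation, one output per input row.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration; not part of AiDotNet.
public static class BatchPrediction
{
    // Applies one formula to every row and collects one prediction per row.
    public static double[] PredictAll(double[][] rows, Func<double[], double> formula)
    {
        return rows.Select(row => formula(row)).ToArray();
    }
}
```

With a made-up formula such as `row => row[0] * row[0] * row[1]` (sunlight squared times water), passing two rows yields two predicted growth values in the same order as the rows.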
PredictSingle(Vector<T>)
Predicts a target value for a single input feature vector.
protected override T PredictSingle(Vector<T> input)
Parameters
input (Vector<T>): The input feature vector for which to make a prediction.
Returns
- T
The predicted value for the input vector.
Remarks
This method implements prediction for a single input sample. It:
1. Applies regularization to the input vector
2. Evaluates the best symbolic model with the regularized input
For Beginners: This method predicts a value for a single data point.
Think of it like this:
- It first applies regularization to your input (which helps ensure stable predictions)
- It then plugs the values into your discovered formula
- It calculates and returns the result
This is useful when you want to make a prediction for just one specific case, rather than a whole batch of data.