Class NaiveBayesOptions<T>
Configuration options for Naive Bayes classifiers.
public class NaiveBayesOptions<T> : ClassifierOptions<T>
Type Parameters
T
The data type used for calculations.
- Inheritance
  ClassifierOptions<T> ← NaiveBayesOptions<T>
Remarks
Naive Bayes classifiers are probabilistic classifiers based on Bayes' theorem with the "naive" assumption of conditional independence between features given the class label. Despite this simplifying assumption, Naive Bayes often performs well in practice.
For Beginners: Naive Bayes is one of the simplest and most effective classifiers.
How it works:
- During training, it learns the probability of each class and the probability of each feature value given each class
- During prediction, it uses Bayes' theorem to calculate the probability of each class given the observed features
- It returns the class with the highest probability
The "naive" assumption is that features are independent given the class. For example, in spam detection, the words "free" and "win" might both indicate spam, but the model assumes they contribute independently to that prediction.
Despite this unrealistic assumption, Naive Bayes often works surprisingly well!
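To make the calculation concrete, here is a toy spam example in plain C#. The probabilities are invented for illustration, and this is not this library's API; it only shows the arithmetic Naive Bayes performs.

```csharp
// Toy spam example: score each class, then normalize.
// score(class) = P(class) × P(feature1 | class) × P(feature2 | class) × ...
double pSpam = 0.4, pHam = 0.6;                      // class priors
double pFreeGivenSpam = 0.30, pFreeGivenHam = 0.02;  // P("free" | class)
double pWinGivenSpam = 0.20, pWinGivenHam = 0.01;    // P("win" | class)

// An email containing both "free" and "win":
double spamScore = pSpam * pFreeGivenSpam * pWinGivenSpam; // 0.4 × 0.30 × 0.20 = 0.024
double hamScore  = pHam  * pFreeGivenHam  * pWinGivenHam;  // 0.6 × 0.02 × 0.01 = 0.00012

// Normalize so the two scores sum to 1:
double pSpamGivenEmail = spamScore / (spamScore + hamScore); // ≈ 0.995 → predict spam
```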
Properties
Alpha
Gets or sets the smoothing parameter (alpha) for Laplace/additive smoothing.
public double Alpha { get; set; }
Property Value
- double
A non-negative smoothing value, defaulting to 1.0 (Laplace smoothing). Set to 0.0 for no smoothing (not recommended).
Remarks
Smoothing prevents zero probabilities when a feature value was not observed for a class in the training data. Laplace smoothing (alpha=1.0) is most common, but smaller values (like 0.1 or 0.01) may work better in some cases.
For Beginners: Smoothing solves the "zero probability" problem.
Imagine training a spam classifier on emails. If the word "congratulations" never appeared in your spam training examples, without smoothing the model would say ANY email containing "congratulations" has ZERO probability of being spam. That's too extreme!
Smoothing adds a small count to all feature values so nothing has zero probability:
- Alpha = 1.0: Laplace smoothing (default, works well in most cases)
- Alpha = 0.1-0.5: Lighter smoothing (can improve accuracy with enough data)
- Alpha = 0.0: No smoothing (risky - avoid unless you're sure all features appear)
For most applications, the default of 1.0 is a safe choice.
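To see the formula at work, here is a sketch of standard additive smoothing. The counts and vocabulary size are invented, and the library's internal estimator may arrange the computation differently; only the Alpha property is taken from this page.

```csharp
// Additive smoothing: (count + alpha) / (total + alpha × numValues).
// Suppose "congratulations" appeared 0 times among 100 000 word
// occurrences in spam, over a vocabulary of 5 000 distinct words.
double alpha = 1.0;    // the Alpha property
int count = 0;         // occurrences of the word in the spam class
int total = 100_000;   // total word occurrences in the spam class
int numValues = 5_000; // distinct feature values (vocabulary size)

double smoothed = (count + alpha) / (total + alpha * numValues);
// alpha = 1.0 → 1 / 105 000 ≈ 0.0000095: tiny, but not zero.
// alpha = 0.0 → 0 / 100 000 = 0: any email containing the word could
// never be classified as spam, no matter what other words it contains.
```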
ClassPriors
Gets or sets custom class prior probabilities.
public double[]? ClassPriors { get; set; }
Property Value
- double[]
An array of prior probabilities for each class, or null to estimate from data. Values should sum to 1.0.
Remarks
Custom priors allow you to specify the prior probability for each class explicitly. This is useful when you have domain knowledge about the true class distribution that differs from the training data distribution.
For Beginners: This lets you tell the model how common each class is.
Example: You're building a fraud detector whose training data contains equal numbers of fraudulent and legitimate transactions, but in reality only 1% of transactions are fraudulent.
You can set: ClassPriors = [0.99, 0.01]
This tells the model that even before looking at features:
- A transaction has 99% chance of being legitimate
- A transaction has 1% chance of being fraudulent
Leave this as null (default) to learn priors from your training data.
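For example, a hypothetical configuration for the fraud detector above, assuming T = double and that the classifier orders classes as [legitimate, fraud]:

```csharp
// Hypothetical usage; the class ordering is an assumption for this sketch.
var options = new NaiveBayesOptions<double>
{
    ClassPriors = new[] { 0.99, 0.01 } // must sum to 1.0
};
```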
FitPriors
Gets or sets whether to fit class prior probabilities from the data.
public bool FitPriors { get; set; }
Property Value
- bool
True (default) to learn class priors from training data; false to use uniform priors (all classes equally likely a priori).
Remarks
Class priors represent the probability of each class before seeing any features. When FitPriors is true, these are estimated from the training data frequencies. When false, all classes are assumed equally likely, which can be useful when the training data is not representative of the true class distribution.
For Beginners: This controls how the model handles class imbalance.
Example: In a medical dataset, 95% of samples are healthy, 5% have a disease.
With FitPriors = true (default):
- The model learns that healthy is much more common
- It will be more likely to predict "healthy" by default
- This reflects the real-world probability
With FitPriors = false:
- The model treats healthy and diseased as equally likely a priori
- It focuses purely on the feature evidence
- May be better if you want to catch more disease cases
Use false when your training data doesn't reflect the true distribution, or when you want equal consideration for all classes regardless of frequency.
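A hypothetical configuration for the medical example above, assuming T = double, with comments sketching the effect on the prior term:

```csharp
// FitPriors = true  → P(healthy) = 0.95, P(disease) = 0.05 (from data)
// FitPriors = false → P(healthy) = 0.50, P(disease) = 0.50 (uniform)
var options = new NaiveBayesOptions<double>
{
    FitPriors = false // rely on feature evidence alone
};
```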
MinVariance
Gets or sets the minimum variance for Gaussian Naive Bayes.
public double MinVariance { get; set; }
Property Value
- double
A small positive value, defaulting to 1e-9.
Remarks
For Gaussian Naive Bayes, this sets a floor on the variance of each feature to prevent division by zero and numerical instability when a feature has very low variance (nearly constant values).
For Beginners: This prevents math problems when features barely change.
If a feature is almost constant (say, adult male height in cm recorded as 175 ± 0.001 across every sample), dividing by such a tiny variance can cause calculation problems.
The minimum variance sets a floor to prevent these issues. The default of 1e-9 is usually fine; you shouldn't need to change this unless you're seeing numerical errors or warnings.
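The sketch below shows why the floor matters, using a standalone Gaussian log-likelihood calculation. This is plain C#, not this library's internals, and the numbers are illustrative; only the MinVariance default is taken from this page.

```csharp
using System;

// Gaussian log-likelihood with a variance floor (standalone sketch).
double minVariance = 1e-9;     // the MinVariance property
double sampleVariance = 1e-15; // a nearly constant feature

// Apply the floor before using the variance in the density:
double variance = Math.Max(sampleVariance, minVariance);

double mean = 175.0, x = 175.0005;
double logLikelihood = -0.5 * Math.Log(2 * Math.PI * variance)
                       - (x - mean) * (x - mean) / (2 * variance);
// With the floor (1e-9), this stays finite and well behaved; with the raw
// variance (1e-15), the second term would be about a million times larger,
// so tiny measurement noise would dominate the prediction.
```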