Class AdversarialRobustnessOptions<T>
Unified configuration options for adversarial robustness and AI safety.
public class AdversarialRobustnessOptions<T>
Type Parameters
- T
The numeric data type used for calculations.
Inheritance
- AdversarialRobustnessOptions<T>
Remarks
This options class consolidates all adversarial robustness settings, including:
- Safety filtering (input validation, output filtering, harmful content detection)
- Adversarial attacks (for robustness testing and evaluation)
- Adversarial defenses (training and preprocessing)
- Certified robustness (provable guarantees)
- Content moderation (for LLM applications)
For Beginners: This is your one-stop shop for all AI safety and robustness settings. You can configure how strictly inputs are validated, how the model is protected against attacks, and what guarantees you want about model predictions.
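As a rough illustration of how these groups of settings fit together, the fragment below sets a few of the properties listed on this page with an object initializer. It assumes the class exposes a public parameterless constructor and that its namespace is already imported (neither is shown on this page). The choice of `double` for `T` and most of the values are placeholders, not recommended defaults.

```csharp
// Assumes a `using` directive for the library namespace that declares
// AdversarialRobustnessOptions<T>, and a public parameterless constructor.
var options = new AdversarialRobustnessOptions<double>
{
    // Safety filtering
    EnableSafetyFiltering = true,
    EnableInputValidation = true,
    EnableOutputFiltering = true,
    SafetyThreshold = 0.8,          // documented range is 0-1

    // Robustness testing
    EnableRobustnessTesting = true,
    AttackNormType = "L-infinity",  // one of the documented values
    AttackEpsilon = 0.1,            // illustrative value only
    AttackIterations = 10,          // illustrative value only

    // Reproducibility
    RandomSeed = 42
};
```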
Properties
AdversarialTrainingRatio
Gets or sets the ratio of adversarial examples to include in training.
public double AdversarialTrainingRatio { get; set; }
Property Value
- double
AttackEpsilon
Gets or sets the maximum perturbation budget (epsilon) for attacks.
public double AttackEpsilon { get; set; }
Property Value
- double
AttackIterations
Gets or sets the number of iterations for iterative attacks.
public int AttackIterations { get; set; }
Property Value
- int
AttackMethods
Gets or sets which attack methods to use for testing.
public string[] AttackMethods { get; set; }
Property Value
- string[]
AttackNormType
Gets or sets the norm type for perturbation constraints.
public string AttackNormType { get; set; }
Property Value
- string
"L-infinity", "L2", or "L1"
AttackStepSize
Gets or sets the step size for iterative attacks.
public double AttackStepSize { get; set; }
Property Value
- double
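To show how a perturbation budget, a step size, and an iteration count typically interact, here is a minimal sketch of one projected-gradient-style update under an L-infinity constraint. It assumes the common PGD formulation; this page does not state which update rule the library applies. `PgdSketch`, `Step`, and the gradient array are hypothetical names used only for illustration.

```csharp
using System;

public static class PgdSketch
{
    // One signed-gradient step followed by projection back into the
    // L-infinity ball of radius epsilon around the original input.
    // Hypothetical helper, not part of the library.
    public static double[] Step(
        double[] original, double[] current, double[] gradient,
        double epsilon, double stepSize)
    {
        var next = new double[current.Length];
        for (int i = 0; i < current.Length; i++)
        {
            // Move each component by stepSize in the direction of the gradient sign.
            double moved = current[i] + stepSize * Math.Sign(gradient[i]);

            // Keep the total change from the original within [-epsilon, +epsilon].
            next[i] = Math.Clamp(moved, original[i] - epsilon, original[i] + epsilon);
        }
        return next;
    }
}
```

In this formulation an iterative attack would call `Step` once per iteration, so the step size controls how quickly the perturbation approaches the budget and the projection guarantees it never exceeds it.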
BatchSize
Gets or sets the batch size for robustness operations.
public int BatchSize { get; set; }
Property Value
- int
BlockPromptInjections
Gets or sets whether to detect and block prompt injection attacks.
public bool BlockPromptInjections { get; set; }
Property Value
- bool
CertificationConfidence
Gets or sets the confidence level for certification.
public double CertificationConfidence { get; set; }
Property Value
- double
CertificationMethod
Gets or sets the certification method to use.
public string CertificationMethod { get; set; }
Property Value
- string
"RandomizedSmoothing", "IBP", or "CROWN"
CertificationNoiseSigma
Gets or sets the noise standard deviation for randomized smoothing.
public double CertificationNoiseSigma { get; set; }
Property Value
- double
CertificationNormType
Gets or sets the norm type for certification.
public string CertificationNormType { get; set; }
Property Value
- string
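The norm type determines how the size of a perturbation is measured. The sketch below computes the three norms named under AttackNormType ("L-infinity", "L2", "L1") for a perturbation vector. Whether CertificationNormType accepts the same strings is an assumption, and `PerturbationNorms` is a hypothetical helper, not part of the library.

```csharp
using System;
using System.Linq;

public static class PerturbationNorms
{
    // Largest absolute change to any single component (0 for an empty vector).
    public static double LInfinity(double[] delta) =>
        delta.Length == 0 ? 0.0 : delta.Max(d => Math.Abs(d));

    // Euclidean length of the perturbation.
    public static double L2(double[] delta) =>
        Math.Sqrt(delta.Sum(d => d * d));

    // Sum of absolute changes across all components.
    public static double L1(double[] delta) =>
        delta.Sum(d => Math.Abs(d));
}
```

For the perturbation (3, -4), these return 4, 5, and 7 respectively, which shows why the same epsilon value permits very different perturbations depending on the norm.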
CertificationSamples
Gets or sets the number of samples for randomized smoothing.
public int CertificationSamples { get; set; }
Property Value
- int
DefenseEpsilon
Gets or sets the perturbation budget for adversarial training.
public double DefenseEpsilon { get; set; }
Property Value
- double
EnableAdversarialTraining
Gets or sets whether to enable adversarial training.
public bool EnableAdversarialTraining { get; set; }
Property Value
- bool
Remarks
For Beginners: Adversarial training makes the model more robust by training on both clean and adversarial examples.
EnableCertifiedRobustness
Gets or sets whether to enable certified robustness.
public bool EnableCertifiedRobustness { get; set; }
Property Value
- bool
Remarks
For Beginners: Certified robustness provides provable guarantees that the model's prediction won't change within a certain perturbation radius.
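As one concrete instance of such a guarantee, the standard randomized smoothing analysis gives an L2 radius in closed form. The formula below is the textbook result, shown under the assumption that the "RandomizedSmoothing" certification method follows it; this page does not confirm the exact bound the library computes. Here $\sigma$ plays the role of the noise standard deviation, $\underline{p_A}$ is a lower confidence bound on the top-class probability estimated from noisy samples, and $\Phi^{-1}$ is the inverse standard normal CDF.

```latex
% Smoothed classifier: g(x) = \arg\max_c \; \Pr_{\varepsilon \sim \mathcal{N}(0, \sigma^2 I)}\big[ f(x + \varepsilon) = c \big]
% If the top class c_A satisfies \Pr[f(x + \varepsilon) = c_A] \ge \underline{p_A} > \tfrac{1}{2}, then
g(x + \delta) = c_A \quad \text{for all } \|\delta\|_2 < R,
\qquad R = \sigma \, \Phi^{-1}\!\left(\underline{p_A}\right)
```

Under this reading, a larger noise level or a higher estimated top-class probability yields a larger certified radius, while more samples and the chosen confidence level affect how tight the lower bound $\underline{p_A}$ is.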
EnableContentModeration
Gets or sets whether to enable content moderation for LLM outputs.
public bool EnableContentModeration { get; set; }
Property Value
- bool
EnableFactualityChecking
Gets or sets whether to enable factuality checking.
public bool EnableFactualityChecking { get; set; }
Property Value
- bool
EnableInputValidation
Gets or sets whether to enable input validation.
public bool EnableInputValidation { get; set; }
Property Value
- bool
EnableOutputFiltering
Gets or sets whether to enable output filtering.
public bool EnableOutputFiltering { get; set; }
Property Value
- bool
EnableRedTeaming
Gets or sets whether to enable red teaming during evaluation.
public bool EnableRedTeaming { get; set; }
Property Value
- bool
EnableRobustnessTesting
Gets or sets whether to enable adversarial robustness testing.
public bool EnableRobustnessTesting { get; set; }
Property Value
- bool
Remarks
For Beginners: When enabled, the model is tested against adversarial attacks during evaluation to measure its robustness.
EnableSafetyFiltering
Gets or sets whether safety filtering is enabled.
public bool EnableSafetyFiltering { get; set; }
Property Value
- bool
Remarks
For Beginners: This is the master switch for safety filtering. When enabled, inputs and outputs are validated for safety.
EnsembleSize
Gets or sets the number of models in the ensemble.
public int EnsembleSize { get; set; }
Property Value
- int
FilterPII
Gets or sets whether to filter PII (personally identifiable information).
public bool FilterPII { get; set; }
Property Value
- bool
HallucinationThreshold
Gets or sets the hallucination detection threshold.
public double HallucinationThreshold { get; set; }
Property Value
- double
HarmfulContentCategories
Gets or sets the harmful content categories to check for.
public string[] HarmfulContentCategories { get; set; }
Property Value
- string[]
JailbreakSensitivity
Gets or sets the jailbreak detection sensitivity.
public double JailbreakSensitivity { get; set; }
Property Value
- double
The sensitivity (0-1), defaulting to 0.7.
LogFilteredContent
Gets or sets whether to log filtered content for review.
public bool LogFilteredContent { get; set; }
Property Value
- bool
MaxInputLength
Gets or sets the maximum input length to process.
public int MaxInputLength { get; set; }
Property Value
- int
PIITypes
Gets or sets the types of PII to filter.
public string[] PIITypes { get; set; }
Property Value
- string[]
PreprocessingMethod
Gets or sets the preprocessing method to use.
public string PreprocessingMethod { get; set; }
Property Value
- string
PromptInjectionSensitivity
Gets or sets the prompt injection detection sensitivity.
public double PromptInjectionSensitivity { get; set; }
Property Value
- double
RandomSeed
Gets or sets the random seed for reproducibility.
public int? RandomSeed { get; set; }
Property Value
- int?
RedTeamingCategories
Gets or sets the red teaming categories to test.
public string[] RedTeamingCategories { get; set; }
Property Value
- string[]
RedTeamingPromptCount
Gets or sets the number of red teaming prompts to generate.
public int RedTeamingPromptCount { get; set; }
Property Value
- int
SafetyLogFilePath
Gets or sets the file path for filtered content logging.
public string? SafetyLogFilePath { get; set; }
Property Value
- string?
SafetyThreshold
Gets or sets the safety threshold for content filtering.
public double SafetyThreshold { get; set; }
Property Value
- double
The threshold (0-1), defaulting to 0.8.
TargetClass
Gets or sets the target class for targeted attacks.
public int TargetClass { get; set; }
Property Value
- int
TrainingAttackMethod
Gets or sets the attack method to use during adversarial training.
public string TrainingAttackMethod { get; set; }
Property Value
- string
UseContentClassifier
Gets or sets whether to use a classifier for harmful content detection.
public bool UseContentClassifier { get; set; }
Property Value
- bool
UseEnsembleDefense
Gets or sets whether to use ensemble defenses.
public bool UseEnsembleDefense { get; set; }
Property Value
- bool
UseInputPreprocessing
Gets or sets whether to use input preprocessing for defense.
public bool UseInputPreprocessing { get; set; }
Property Value
- bool
UseRandomStartForAttacks
Gets or sets whether to use random initialization for attacks.
public bool UseRandomStartForAttacks { get; set; }
Property Value
- bool
UseTargetedAttacks
Gets or sets whether to use targeted attacks.
public bool UseTargetedAttacks { get; set; }
Property Value
- bool
UseTightCertificationBounds
Gets or sets whether to use tight bounds computation.
public bool UseTightCertificationBounds { get; set; }
Property Value
- bool
VerboseLogging
Gets or sets whether to enable verbose logging.
public bool VerboseLogging { get; set; }
Property Value
- bool
Methods
AdversarialTrainingFocus()
Creates options for adversarial training focus.
public static AdversarialRobustnessOptions<T> AdversarialTrainingFocus()
Returns
- AdversarialRobustnessOptions<T>
BasicSafety()
Creates options for basic safety filtering only.
public static AdversarialRobustnessOptions<T> BasicSafety()
Returns
- AdversarialRobustnessOptions<T>
ComprehensiveRobustness()
Creates options for comprehensive robustness with certified guarantees.
public static AdversarialRobustnessOptions<T> ComprehensiveRobustness()
Returns
- AdversarialRobustnessOptions<T>
LLMSafety()
Creates options for LLM safety with content moderation.
public static AdversarialRobustnessOptions<T> LLMSafety()
Returns
- AdversarialRobustnessOptions<T>
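The static factory methods above can serve as starting points that are then adjusted through the public setters. The fragment below is a sketch of that pattern; it assumes the library namespace is already imported, and the log file path is a hypothetical placeholder. Which properties each preset sets is not documented on this page, so the overrides shown may or may not change a preset's values.

```csharp
// Assumes a `using` directive for the library namespace that declares
// AdversarialRobustnessOptions<T>.
// Start from a preset, then override individual settings as needed.
var options = AdversarialRobustnessOptions<double>.LLMSafety();
options.FilterPII = true;
options.LogFilteredContent = true;
options.SafetyLogFilePath = "safety-filtered.log"; // hypothetical path
```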