Class AdversarialRobustnessOptions<T>

Namespace
AiDotNet.Models.Options
Assembly
AiDotNet.dll

Unified configuration options for adversarial robustness and AI safety.

public class AdversarialRobustnessOptions<T>

Type Parameters

T

The numeric data type used for calculations.

Inheritance
AdversarialRobustnessOptions<T>

Remarks

This comprehensive options class consolidates all adversarial robustness settings, including:

- Safety filtering (input validation, output filtering, harmful content detection)
- Adversarial attacks (for robustness testing and evaluation)
- Adversarial defenses (training and preprocessing)
- Certified robustness (provable guarantees)
- Content moderation (for LLM applications)

For Beginners: This is your one-stop shop for all AI safety and robustness settings. You can configure how strictly inputs are validated, how the model is protected against attacks, and what guarantees you want about model predictions.
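
As a rough illustration of how these groups of settings might be combined, the sketch below builds an options object with an object initializer. It assumes the class exposes a public parameterless constructor (this page lists no constructors, so that is an assumption) and uses double for T. The chosen values are illustrative, not documented defaults.

```csharp
using AiDotNet.Models.Options;

// Assumption: AdversarialRobustnessOptions<T> has a public parameterless constructor.
var options = new AdversarialRobustnessOptions<double>
{
    // Safety filtering
    EnableSafetyFiltering = true,
    EnableInputValidation = true,
    EnableOutputFiltering = true,

    // Robustness testing during evaluation
    EnableRobustnessTesting = true,

    // Adversarial training and certification are left off in this sketch
    EnableAdversarialTraining = false,
    EnableCertifiedRobustness = false
};
```

Alternatively, the static factory methods listed under Methods return preconfigured instances that can serve as a starting point.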

Properties

AdversarialTrainingRatio

Gets or sets the ratio of adversarial examples to include in training.

public double AdversarialTrainingRatio { get; set; }

Property Value

double

AttackEpsilon

Gets or sets the maximum perturbation budget (epsilon) for attacks.

public double AttackEpsilon { get; set; }

Property Value

double

AttackIterations

Gets or sets the number of iterations for iterative attacks.

public int AttackIterations { get; set; }

Property Value

int

AttackMethods

Gets or sets which attack methods to use for testing.

public string[] AttackMethods { get; set; }

Property Value

string[]

AttackNormType

Gets or sets the norm type for perturbation constraints.

public string AttackNormType { get; set; }

Property Value

string

"L-infinity", "L2", or "L1"

AttackStepSize

Gets or sets the step size for iterative attacks.

public double AttackStepSize { get; set; }

Property Value

double
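
The attack properties (AttackEpsilon, AttackIterations, AttackNormType, AttackStepSize) are usually set together. A minimal sketch, assuming a public parameterless constructor and double for T. The numeric values are illustrative only: a budget of 8/255 is a common choice for images scaled to [0, 1], not a default documented on this page.

```csharp
using AiDotNet.Models.Options;

// Assumption: public parameterless constructor; all numbers below are illustrative.
var options = new AdversarialRobustnessOptions<double>
{
    EnableRobustnessTesting = true,
    AttackNormType = "L-infinity",   // one of the values listed for AttackNormType
    AttackEpsilon = 8.0 / 255.0,     // maximum perturbation budget
    AttackIterations = 10,           // iterations for iterative attacks
    AttackStepSize = 2.0 / 255.0,    // per-iteration step, kept below AttackEpsilon
    UseRandomStartForAttacks = true  // random initialization for attacks
};
```

Keeping the step size smaller than the budget lets an iterative attack take several steps before reaching the edge of the allowed perturbation region.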

BatchSize

Gets or sets the batch size for robustness operations.

public int BatchSize { get; set; }

Property Value

int

BlockPromptInjections

Gets or sets whether to detect and block prompt injection attacks.

public bool BlockPromptInjections { get; set; }

Property Value

bool

CertificationConfidence

Gets or sets the confidence level for certification.

public double CertificationConfidence { get; set; }

Property Value

double

CertificationMethod

Gets or sets the certification method to use.

public string CertificationMethod { get; set; }

Property Value

string

"RandomizedSmoothing", "IBP", or "CROWN"

CertificationNoiseSigma

Gets or sets the noise standard deviation for randomized smoothing.

public double CertificationNoiseSigma { get; set; }

Property Value

double

CertificationNormType

Gets or sets the norm type for certification.

public string CertificationNormType { get; set; }

Property Value

string

CertificationSamples

Gets or sets the number of samples for randomized smoothing.

public int CertificationSamples { get; set; }

Property Value

int
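
The certification properties can be configured along these lines. This is a sketch under assumptions: a public parameterless constructor, double for T, and illustrative values. In particular, it assumes CertificationConfidence is expressed as a probability such as 0.99, which this page does not state.

```csharp
using AiDotNet.Models.Options;

// Assumption: public parameterless constructor; all numbers below are illustrative.
var options = new AdversarialRobustnessOptions<double>
{
    EnableCertifiedRobustness = true,
    CertificationMethod = "RandomizedSmoothing", // one of the values listed for CertificationMethod
    CertificationNoiseSigma = 0.25,              // noise standard deviation
    CertificationSamples = 1000,                 // number of samples for randomized smoothing
    CertificationConfidence = 0.99               // assumed to be a probability in (0, 1)
};
```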

DefenseEpsilon

Gets or sets the perturbation budget for adversarial training.

public double DefenseEpsilon { get; set; }

Property Value

double

EnableAdversarialTraining

Gets or sets whether to enable adversarial training.

public bool EnableAdversarialTraining { get; set; }

Property Value

bool

Remarks

For Beginners: Adversarial training makes the model more robust by training on both clean and adversarial examples.
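
A minimal sketch of turning adversarial training on, assuming a public parameterless constructor and double for T. It also assumes AdversarialTrainingRatio is a fraction between 0 and 1. The page does not document the range, and the values shown are illustrative.

```csharp
using AiDotNet.Models.Options;

// Assumption: public parameterless constructor; values are illustrative.
var options = new AdversarialRobustnessOptions<double>
{
    EnableAdversarialTraining = true,
    AdversarialTrainingRatio = 0.5,  // assumed to be a fraction in [0, 1]
    DefenseEpsilon = 8.0 / 255.0     // perturbation budget used during training
};
```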

EnableCertifiedRobustness

Gets or sets whether to enable certified robustness.

public bool EnableCertifiedRobustness { get; set; }

Property Value

bool

Remarks

For Beginners: Certified robustness provides provable guarantees that the model's prediction won't change within a certain perturbation radius.

EnableContentModeration

Gets or sets whether to enable content moderation for LLM outputs.

public bool EnableContentModeration { get; set; }

Property Value

bool

EnableFactualityChecking

Gets or sets whether to enable factuality checking.

public bool EnableFactualityChecking { get; set; }

Property Value

bool

EnableInputValidation

Gets or sets whether to enable input validation.

public bool EnableInputValidation { get; set; }

Property Value

bool

EnableOutputFiltering

Gets or sets whether to enable output filtering.

public bool EnableOutputFiltering { get; set; }

Property Value

bool

EnableRedTeaming

Gets or sets whether to enable red teaming during evaluation.

public bool EnableRedTeaming { get; set; }

Property Value

bool

EnableRobustnessTesting

Gets or sets whether to enable adversarial robustness testing.

public bool EnableRobustnessTesting { get; set; }

Property Value

bool

Remarks

For Beginners: When enabled, the model is tested against adversarial attacks during evaluation to measure its robustness.

EnableSafetyFiltering

Gets or sets whether safety filtering is enabled.

public bool EnableSafetyFiltering { get; set; }

Property Value

bool

Remarks

For Beginners: This is the master switch for safety filtering. When enabled, inputs and outputs are validated for safety.

EnsembleSize

Gets or sets the number of models in the ensemble.

public int EnsembleSize { get; set; }

Property Value

int

FilterPII

Gets or sets whether to filter PII (personally identifiable information).

public bool FilterPII { get; set; }

Property Value

bool

HallucinationThreshold

Gets or sets the hallucination detection threshold.

public double HallucinationThreshold { get; set; }

Property Value

double

HarmfulContentCategories

Gets or sets the harmful content categories to check for.

public string[] HarmfulContentCategories { get; set; }

Property Value

string[]

JailbreakSensitivity

Gets or sets the jailbreak detection sensitivity.

public double JailbreakSensitivity { get; set; }

Property Value

double

The sensitivity (0-1), defaulting to 0.7.

LogFilteredContent

Gets or sets whether to log filtered content for review.

public bool LogFilteredContent { get; set; }

Property Value

bool

MaxInputLength

Gets or sets the maximum input length to process.

public int MaxInputLength { get; set; }

Property Value

int

PIITypes

Gets or sets the types of PII to filter.

public string[] PIITypes { get; set; }

Property Value

string[]

PreprocessingMethod

Gets or sets the preprocessing method to use.

public string PreprocessingMethod { get; set; }

Property Value

string

PromptInjectionSensitivity

Gets or sets the prompt injection detection sensitivity.

public double PromptInjectionSensitivity { get; set; }

Property Value

double

RandomSeed

Gets or sets the random seed for reproducibility.

public int? RandomSeed { get; set; }

Property Value

int?

RedTeamingCategories

Gets or sets the red teaming categories to test.

public string[] RedTeamingCategories { get; set; }

Property Value

string[]

RedTeamingPromptCount

Gets or sets the number of red teaming prompts to generate.

public int RedTeamingPromptCount { get; set; }

Property Value

int

SafetyLogFilePath

Gets or sets the file path for filtered content logging.

public string? SafetyLogFilePath { get; set; }

Property Value

string?

SafetyThreshold

Gets or sets the safety threshold for content filtering.

public double SafetyThreshold { get; set; }

Property Value

double

The threshold (0-1), defaulting to 0.8.

TargetClass

Gets or sets the target class for targeted attacks.

public int TargetClass { get; set; }

Property Value

int

TrainingAttackMethod

Gets or sets the attack method to use during adversarial training.

public string TrainingAttackMethod { get; set; }

Property Value

string

UseContentClassifier

Gets or sets whether to use a classifier for harmful content detection.

public bool UseContentClassifier { get; set; }

Property Value

bool

UseEnsembleDefense

Gets or sets whether to use ensemble defenses.

public bool UseEnsembleDefense { get; set; }

Property Value

bool

UseInputPreprocessing

Gets or sets whether to use input preprocessing for defense.

public bool UseInputPreprocessing { get; set; }

Property Value

bool

UseRandomStartForAttacks

Gets or sets whether to use random initialization for attacks.

public bool UseRandomStartForAttacks { get; set; }

Property Value

bool

UseTargetedAttacks

Gets or sets whether to use targeted attacks.

public bool UseTargetedAttacks { get; set; }

Property Value

bool

UseTightCertificationBounds

Gets or sets whether to compute tight bounds during certification.

public bool UseTightCertificationBounds { get; set; }

Property Value

bool

VerboseLogging

Gets or sets whether to enable verbose logging.

public bool VerboseLogging { get; set; }

Property Value

bool

Methods

AdversarialTrainingFocus()

Creates options focused on adversarial training.

public static AdversarialRobustnessOptions<T> AdversarialTrainingFocus()

Returns

AdversarialRobustnessOptions<T>

BasicSafety()

Creates options for basic safety filtering only.

public static AdversarialRobustnessOptions<T> BasicSafety()

Returns

AdversarialRobustnessOptions<T>

ComprehensiveRobustness()

Creates options for comprehensive robustness with certified guarantees.

public static AdversarialRobustnessOptions<T> ComprehensiveRobustness()

Returns

AdversarialRobustnessOptions<T>

LLMSafety()

Creates options for LLM safety with content moderation.

public static AdversarialRobustnessOptions<T> LLMSafety()

Returns

AdversarialRobustnessOptions<T>
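
Because each factory method returns an ordinary AdversarialRobustnessOptions<T> instance with settable properties, a preset can be used as a starting point and then adjusted. The sketch below uses double for T. Which values each preset assigns is not stated on this page, so the overrides are set explicitly, and the log file path is hypothetical.

```csharp
using AiDotNet.Models.Options;

// Start from the LLM-oriented preset, then adjust individual settings.
var options = AdversarialRobustnessOptions<double>.LLMSafety();

options.BlockPromptInjections = true;
options.FilterPII = true;
options.LogFilteredContent = true;
options.SafetyLogFilePath = "safety-filter.log"; // hypothetical path
```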