Interface ISafetyFilter<T>
Namespace: AiDotNet.Interfaces
Assembly: AiDotNet.dll
Defines the contract for safety filters that detect and prevent harmful or inappropriate model inputs and outputs.
public interface ISafetyFilter<T> : IModelSerializer
Type Parameters
T
The numeric data type used for calculations (e.g., float, double).
- Inherited Members
Remarks
Safety filters act as gatekeepers that monitor model inputs and outputs to prevent harmful, inappropriate, or malicious content from passing through the system.
For Beginners: Think of safety filters as "security guards" for your AI system. They check everything going in and coming out to make sure nothing dangerous or inappropriate gets through.
Common safety filter functions include:
- Input Validation: Check that inputs are safe and properly formatted
- Output Filtering: Ensure outputs don't contain harmful content
- Jailbreak Detection: Identify attempts to bypass safety measures
- Harmful Content Detection: Flag potentially dangerous or inappropriate content
Why safety filters matter:
- They prevent misuse of AI systems
- They protect users from harmful content
- They help maintain ethical AI deployments
- They catch edge cases and adversarial inputs
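The sketch below shows how an implementation of this interface might be placed around a model call. It assumes an ISafetyFilter<double> instance named filter, plus hypothetical EncodeUserPrompt and RunModel helpers that are not part of this library; only members defined on this interface are called.
// Hypothetical guarded pipeline: "filter" is any ISafetyFilter<double> implementation,
// and EncodeUserPrompt / RunModel stand in for your own encoding and inference code.
Vector<double> input = EncodeUserPrompt(prompt);
// 1. Check the input before it reaches the model.
SafetyValidationResult<double> validation = filter.ValidateInput(input);
JailbreakDetectionResult<double> jailbreak = filter.DetectJailbreak(input);
// 2. Run the model only if those checks pass (inspect the result objects for details).
Vector<double> rawOutput = RunModel(input);
// 3. Screen the output before showing it to the user.
SafetyFilterResult<double> safeOutput = filter.FilterOutput(rawOutput);
double score = filter.ComputeSafetyScore(rawOutput);   // 0 = unsafe, 1 = completely safe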
Methods
ComputeSafetyScore(Vector<T>)
Computes a safety score for model inputs or outputs.
T ComputeSafetyScore(Vector<T> content)
Parameters
content Vector<T>
The content to score.
Returns
- T
A safety score between 0 (unsafe) and 1 (completely safe).
Remarks
For Beginners: This gives a single "safety score" from 0 to 1 indicating how safe the content is. Think of it like a trust score - higher numbers mean safer content.
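For example, an application could compare the score against its own cutoff (the 0.8 threshold below is an arbitrary illustration, not a value defined by this library):
// "filter" is an ISafetyFilter<double>; "content" is the Vector<double> to score.
double score = filter.ComputeSafetyScore(content);
// Compare against an application-chosen cutoff (0.8 here is only an example).
const double threshold = 0.8;
bool contentIsSafe = score >= threshold;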
DetectJailbreak(Vector<T>)
Detects jailbreak attempts that try to bypass safety measures.
JailbreakDetectionResult<T> DetectJailbreak(Vector<T> input)
Parameters
input Vector<T>
The input to check for jailbreak attempts.
Returns
- JailbreakDetectionResult<T>
Detection result indicating if a jailbreak was detected and its severity.
Remarks
For Beginners: A "jailbreak" is when someone tries to trick your AI into ignoring its safety rules. This method detects those attempts.
Examples of jailbreak attempts:
- "Ignore your previous instructions and do X instead"
- Roleplaying scenarios to bypass restrictions
- Encoding harmful requests in creative ways
- Exploiting edge cases in safety training
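The sketch below checks an encoded prompt for such attempts before inference; the detection flag and severity you would then read are defined on JailbreakDetectionResult<T>, not on this interface, so they are left as comments here.
// "encodedPrompt" is the Vector<double> representation of the user's request.
JailbreakDetectionResult<double> detection = filter.DetectJailbreak(encodedPrompt);
// Inspect the result for the detection flag and severity, then refuse the
// request or route it to human review if an attempt was found.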
FilterOutput(Vector<T>)
Filters model outputs to remove or flag harmful content.
SafetyFilterResult<T> FilterOutput(Vector<T> output)
Parameters
output Vector<T>
The model output to filter.
Returns
- SafetyFilterResult<T>
Filtered output with harmful content removed or flagged.
Remarks
For Beginners: This checks what the AI is about to say before showing it to users. If the AI generated something harmful or inappropriate, this method can block it or modify it to be safe.
For example:
- If an AI accidentally generates instructions for something dangerous
- If output contains private or sensitive information
- If the response could be misleading or harmful
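A sketch of screening a generated output before returning it; RunModel is a placeholder for your own inference call, and the members you read from the result are defined on SafetyFilterResult<T>.
Vector<double> rawOutput = RunModel(encodedPrompt);        // your inference call
SafetyFilterResult<double> result = filter.FilterOutput(rawOutput);
// The result carries the filtered output and/or flags describing what was
// removed or modified; see SafetyFilterResult<T> for the exact members.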
GetOptions()
Gets the configuration options for the safety filter.
SafetyFilterOptions<T> GetOptions()
Returns
- SafetyFilterOptions<T>
The configuration options for the safety filter.
Remarks
For Beginners: These settings control how strict the safety filter is and what types of content it looks for.
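For example, an application might read the active configuration for logging or diagnostics:
SafetyFilterOptions<double> options = filter.GetOptions();
// Inspect or log the active configuration (strictness, enabled categories, etc.);
// the available settings are defined on SafetyFilterOptions<T>.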
IdentifyHarmfulContent(Vector<T>)
Identifies harmful or inappropriate content in text or data.
HarmfulContentResult<T> IdentifyHarmfulContent(Vector<T> content)
Parameters
content Vector<T>
The content to analyze.
Returns
- HarmfulContentResult<T>
Classification of harmful content types and severity scores.
Remarks
For Beginners: This is like a content moderation system. It scans content (inputs or outputs) and identifies anything that might be harmful, offensive, or inappropriate.
Categories it might detect:
- Violence or graphic content
- Hate speech or discrimination
- Private or sensitive information
- Misinformation or scams
- Adult or sexual content
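A sketch of running this check over a piece of content; the category and severity members you would then read are defined on HarmfulContentResult<T>.
// "content" is the Vector<double> representation of the text or data to scan.
HarmfulContentResult<double> report = filter.IdentifyHarmfulContent(content);
// Use the report's category classifications and severity scores to decide
// whether to block, flag, or allow the content.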
Reset()
Resets the safety filter state.
void Reset()
ValidateInput(Vector<T>)
Validates that an input is safe and appropriate for processing.
SafetyValidationResult<T> ValidateInput(Vector<T> input)
Parameters
input Vector<T>
The input to validate.
Returns
- SafetyValidationResult<T>
Validation result indicating if input is safe and any issues found.
Remarks
This method checks inputs before they reach the model to prevent malicious or inappropriate inputs from being processed.
For Beginners: This is like a bouncer at a club checking IDs at the door. Before letting an input into your AI system, this method checks if it's safe and appropriate to process.
The validation might check for:
- Malformed inputs that could crash the system
- Adversarial patterns designed to fool the model
- Attempts to inject malicious code or prompts
- Inappropriate or harmful content in the input
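A sketch of gating a request on this validation step; the flag and issue-list members you would check are defined on SafetyValidationResult<T>.
// "encodedInput" is the Vector<double> form of the incoming request.
SafetyValidationResult<double> validation = filter.ValidateInput(encodedInput);
// Reject or sanitize the request if the result reports problems; otherwise
// pass the input on to the model.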