Interface ISafetyFilter<T>
Namespace: AiDotNet.Interfaces
Assembly: AiDotNet.dll
Defines the contract for safety filters that detect and prevent harmful or inappropriate model inputs and outputs.
public interface ISafetyFilter<T> : IModelSerializer
Type Parameters
T
The numeric data type used for calculations (e.g., float, double).
- Inherited Members
Remarks
Safety filters act as gatekeepers that monitor model inputs and outputs to prevent harmful, inappropriate, or malicious content from passing through the system.
For Beginners: Think of safety filters as "security guards" for your AI system. They check everything going in and coming out to make sure nothing dangerous or inappropriate gets through.
Common safety filter functions include:
- Input Validation: Check that inputs are safe and properly formatted
- Output Filtering: Ensure outputs don't contain harmful content
- Jailbreak Detection: Identify attempts to bypass safety measures
- Harmful Content Detection: Flag potentially dangerous or inappropriate content
Why safety filters matter:
- They prevent misuse of AI systems
- They protect users from harmful content
- They help maintain ethical AI deployments
- They catch edge cases and adversarial inputs
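The sketch below shows how an implementation of this interface might be placed around a model call. It assumes an ISafetyFilter<double> instance named filter, plus hypothetical EncodeUserPrompt and RunModel helpers that are not part of this library; only members defined on this interface are called.
// Hypothetical guarded pipeline: "filter" is any ISafetyFilter<double> implementation,
// and EncodeUserPrompt / RunModel stand in for your own encoding and inference code.
Vector<double> input = EncodeUserPrompt(prompt);
// 1. Check the input before it reaches the model.
SafetyValidationResult<double> validation = filter.ValidateInput(input);
JailbreakDetectionResult<double> jailbreak = filter.DetectJailbreak(input);
// 2. Run the model only if those checks pass (inspect the result objects for details).
Vector<double> rawOutput = RunModel(input);
// 3. Screen the output before showing it to the user.
SafetyFilterResult<double> safeOutput = filter.FilterOutput(rawOutput);
double score = filter.ComputeSafetyScore(rawOutput);   // 0 = unsafe, 1 = completely safe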
Methods
ComputeSafetyScore(Vector<T>)
Computes a safety score for model inputs or outputs.
T ComputeSafetyScore(Vector<T> content)
Parameters
content Vector<T>
The content to score.
Returns
- T
A safety score between 0 (unsafe) and 1 (completely safe).
Remarks
For Beginners: This gives a single "safety score" from 0 to 1 indicating how safe the content is. Think of it like a trust score - higher numbers mean safer content.
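For example, an application could compare the score against its own cutoff (the 0.8 threshold below is an arbitrary illustration, not a value defined by this library):
// "filter" is an ISafetyFilter<double>; "content" is the Vector<double> to score.
double score = filter.ComputeSafetyScore(content);
// Compare against an application-chosen cutoff (0.8 here is only an example).
const double threshold = 0.8;
bool contentIsSafe = score >= threshold;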
DetectJailbreak(Vector<T>)
Detects jailbreak attempts that try to bypass safety measures.
JailbreakDetectionResult<T> DetectJailbreak(Vector<T> input)
Parameters
input Vector<T>
The input to check for jailbreak attempts.
Returns
- JailbreakDetectionResult<T>
Detection result indicating if a jailbreak was detected and its severity.
Remarks
For Beginners: A "jailbreak" is when someone tries to trick your AI into ignoring its safety rules. This method detects those attempts.
Examples of jailbreak attempts:
- "Ignore your previous instructions and do X instead"
- Roleplaying scenarios to bypass restrictions
- Encoding harmful requests in creative ways
- Exploiting edge cases in safety training
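The sketch below checks an encoded prompt for such attempts before inference; the detection flag and severity you would then read are defined on JailbreakDetectionResult<T>, not on this interface, so they are left as comments here.
// "encodedPrompt" is the Vector<double> representation of the user's request.
JailbreakDetectionResult<double> detection = filter.DetectJailbreak(encodedPrompt);
// Inspect the result for the detection flag and severity, then refuse the
// request or route it to human review if an attempt was found.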
FilterOutput(Vector<T>)
Filters model outputs to remove or flag harmful content.
SafetyFilterResult<T> FilterOutput(Vector<T> output)
Parameters
output Vector<T>
The model output to filter.
Returns
- SafetyFilterResult<T>
Filtered output with harmful content removed or flagged.
Remarks
For Beginners: This checks what the AI is about to say before showing it to users. If the AI generated something harmful or inappropriate, this method can block it or modify it to be safe.
For example:
- If an AI accidentally generates instructions for something dangerous
- If output contains private or sensitive information
- If the response could be misleading or harmful
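A sketch of screening a generated output before returning it; RunModel is a placeholder for your own inference call, and the members you read from the result are defined on SafetyFilterResult<T>.
Vector<double> rawOutput = RunModel(encodedPrompt);        // your inference call
SafetyFilterResult<double> result = filter.FilterOutput(rawOutput);
// The result carries the filtered output and/or flags describing what was
// removed or modified; see SafetyFilterResult<T> for the exact members.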
GetOptions()
Gets the configuration options for the safety filter.
SafetyFilterOptions<T> GetOptions()
Returns
- SafetyFilterOptions<T>
The configuration options for the safety filter.
Remarks
For Beginners: These settings control how strict the safety filter is and what types of content it looks for.
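For example, an application might read the active configuration for logging or diagnostics:
SafetyFilterOptions<double> options = filter.GetOptions();
// Inspect or log the active configuration (strictness, enabled categories, etc.);
// the available settings are defined on SafetyFilterOptions<T>.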
IdentifyHarmfulContent(Vector<T>)
Identifies harmful or inappropriate content in text or data.
HarmfulContentResult<T> IdentifyHarmfulContent(Vector<T> content)
Parameters
content Vector<T>
The content to analyze.
Returns
- HarmfulContentResult<T>
Classification of harmful content types and severity scores.
Remarks
For Beginners: This is like a content moderation system. It scans content (inputs or outputs) and identifies anything that might be harmful, offensive, or inappropriate.
Categories it might detect:
- Violence or graphic content
- Hate speech or discrimination
- Private or sensitive information
- Misinformation or scams
- Adult or sexual content
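A sketch of running this check over a piece of content; the category and severity members you would then read are defined on HarmfulContentResult<T>.
// "content" is the Vector<double> representation of the text or data to scan.
HarmfulContentResult<double> report = filter.IdentifyHarmfulContent(content);
// Use the report's category classifications and severity scores to decide
// whether to block, flag, or allow the content.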
Reset()
Resets the safety filter state.
void Reset()
ValidateInput(Vector<T>)
Validates that an input is safe and appropriate for processing.
SafetyValidationResult<T> ValidateInput(Vector<T> input)
Parameters
input Vector<T>
The input to validate.
Returns
- SafetyValidationResult<T>
Validation result indicating if input is safe and any issues found.
Remarks
This method checks inputs before they reach the model to prevent malicious or inappropriate inputs from being processed.
For Beginners: This is like a bouncer at a club checking IDs at the door. Before letting an input into your AI system, this method checks if it's safe and appropriate to process.
The validation might check for:
- Malformed inputs that could crash the system
- Adversarial patterns designed to fool the model
- Attempts to inject malicious code or prompts
- Inappropriate or harmful content in the input
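A sketch of gating a request on this validation step; the flag and issue-list members you would check are defined on SafetyValidationResult<T>.
// "encodedInput" is the Vector<double> form of the incoming request.
SafetyValidationResult<double> validation = filter.ValidateInput(encodedInput);
// Reject or sanitize the request if the result reports problems; otherwise
// pass the input on to the model.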