Class SafetyFilter<T>

Namespace
AiDotNet.AdversarialRobustness.Safety
Assembly
AiDotNet.dll

Implements comprehensive safety filtering for AI model inputs and outputs.

public class SafetyFilter<T> : ISafetyFilter<T>, IModelSerializer

Type Parameters

T

The numeric data type used for calculations.

Inheritance
object → SafetyFilter<T>

Implements
ISafetyFilter<T>
IModelSerializer

Remarks

SafetyFilter provides multiple layers of protection including input validation, output filtering, jailbreak detection, and harmful content identification.

For Beginners: Think of SafetyFilter as a comprehensive security system for your AI. It checks everything going in and coming out, looking for anything suspicious, harmful, or inappropriate. It's like having security guards, content moderators, and safety inspectors all working together.
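
For example, the layers can be applied around a single model call. This is a sketch, assuming a configured SafetyFilterOptions<double> named options and input/output vectors named userInput and modelOutput:

var filter = new SafetyFilter<double>(options);

var validation = filter.ValidateInput(userInput);   // layer 1: validate the incoming input
var jailbreak = filter.DetectJailbreak(userInput);  // layer 2: look for jailbreak attempts
// ... run the model on the validated input ...
var filtered = filter.FilterOutput(modelOutput);    // layer 3: filter what the model produced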

Constructors

SafetyFilter(SafetyFilterOptions<T>)

Initializes a new instance of the safety filter.

public SafetyFilter(SafetyFilterOptions<T> options)

Parameters

options SafetyFilterOptions<T>

The safety filter configuration options.
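
For example, a minimal construction sketch with T = double (this assumes SafetyFilterOptions<T> has a parameterless constructor; its individual settings are not documented here):

var options = new SafetyFilterOptions<double>(); // default settings; adjust before passing in
var filter = new SafetyFilter<double>(options);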

Methods

ComputeSafetyScore(Vector<T>)

Computes a safety score for model inputs or outputs.

public T ComputeSafetyScore(Vector<T> content)

Parameters

content Vector<T>

The content to score.

Returns

T

A safety score between 0 (unsafe) and 1 (completely safe).

Remarks

For Beginners: This gives a single "safety score" from 0 to 1 indicating how safe the content is. Think of it like a trust score - higher numbers mean safer content.
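
A short usage sketch, given a SafetyFilter<double> named filter; the Vector<double> constructor shown is an assumption:

var content = new Vector<double>(new[] { 0.12, 0.87, 0.45 }); // assumed Vector<T> constructor
double score = filter.ComputeSafetyScore(content);

// 0 = unsafe, 1 = completely safe; the 0.8 cutoff is an application choice
if (score < 0.8)
{
    Console.WriteLine($"Content flagged for review (score: {score:F2})");
}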

Deserialize(byte[])

Loads a previously serialized model from binary data.

public void Deserialize(byte[] data)

Parameters

data byte[]

The byte array containing the serialized model data.

Remarks

This method takes binary data created by the Serialize method and uses it to restore a model to its previous state.

For Beginners: This is like opening a saved file to continue your work.

When you call this method:

  • You provide the binary data (bytes) that was previously created by Serialize
  • The model rebuilds itself using this data
  • After deserializing, the model is exactly as it was when serialized
  • It's ready to make predictions without needing to be trained again

For example:

  • You download a pre-trained model file for detecting spam emails
  • You deserialize this file into your application
  • Immediately, your application can detect spam without any training
  • The model has all the knowledge that was built into it by its original creator

This is particularly useful when:

  • You want to use a model that took days to train
  • You need to deploy the same model across multiple devices
  • You're creating an application that non-technical users will use

Think of it like installing the brain of a trained expert directly into your application.
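
A usage sketch, assuming the serialized bytes were previously written to disk:

using System.IO;

byte[] data = File.ReadAllBytes("safety-filter.bin");
var filter = new SafetyFilter<double>(new SafetyFilterOptions<double>());
filter.Deserialize(data); // restores the saved state; no retraining needed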

DetectJailbreak(Vector<T>)

Detects jailbreak attempts that try to bypass safety measures.

public JailbreakDetectionResult<T> DetectJailbreak(Vector<T> input)

Parameters

input Vector<T>

The input to check for jailbreak attempts.

Returns

JailbreakDetectionResult<T>

Detection result indicating whether a jailbreak attempt was detected and its severity.

Remarks

For Beginners: A "jailbreak" is when someone tries to trick your AI into ignoring its safety rules. This method detects those attempts.

Examples of jailbreak attempts:

  • "Ignore your previous instructions and do X instead"
  • Roleplaying scenarios to bypass restrictions
  • Encoding harmful requests in creative ways
  • Exploiting edge cases in safety training
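
A usage sketch, given a SafetyFilter<double> named filter and an input vector named encodedInput; IsJailbreakDetected and Severity are assumed member names on JailbreakDetectionResult<T>:

JailbreakDetectionResult<double> result = filter.DetectJailbreak(encodedInput);

// IsJailbreakDetected and Severity are assumed member names; check the result type for the actual API
if (result.IsJailbreakDetected)
{
    Console.WriteLine($"Jailbreak attempt detected (severity: {result.Severity})");
}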

FilterOutput(Vector<T>)

Filters model outputs to remove or flag harmful content.

public SafetyFilterResult<T> FilterOutput(Vector<T> output)

Parameters

output Vector<T>

The model output to filter.

Returns

SafetyFilterResult<T>

Filtered output with harmful content removed or flagged.

Remarks

For Beginners: This checks what the AI is about to say before showing it to users. If the AI generated something harmful or inappropriate, this method can block it or modify it to be safe.

For example:

  • If an AI accidentally generates instructions for something dangerous
  • If output contains private or sensitive information
  • If the response could be misleading or harmful
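
A usage sketch, given a SafetyFilter<double> named filter and a model output vector named modelOutput; FilteredOutput and WasModified are assumed member names on SafetyFilterResult<T>:

SafetyFilterResult<double> result = filter.FilterOutput(modelOutput);

// FilteredOutput and WasModified are assumed member names
Vector<double> safeOutput = result.FilteredOutput;
if (result.WasModified)
{
    Console.WriteLine("Output was modified before delivery to the user.");
}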

GetOptions()

Gets the configuration options for the safety filter.

public SafetyFilterOptions<T> GetOptions()

Returns

SafetyFilterOptions<T>

The configuration options for the safety filter.

Remarks

For Beginners: These settings control how strict the safety filter is and what types of content it looks for.

IdentifyHarmfulContent(Vector<T>)

Identifies harmful or inappropriate content in text or data.

public HarmfulContentResult<T> IdentifyHarmfulContent(Vector<T> content)

Parameters

content Vector<T>

The content to analyze.

Returns

HarmfulContentResult<T>

Classification of harmful content types and severity scores.

Remarks

For Beginners: This is like a content moderation system. It scans content (inputs or outputs) and identifies anything that might be harmful, offensive, or inappropriate.

Categories it might detect:

  • Violence or graphic content
  • Hate speech or discrimination
  • Private or sensitive information
  • Misinformation or scams
  • Adult or sexual content
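
A usage sketch, given a SafetyFilter<double> named filter and a content vector named content; the Severity member shown on HarmfulContentResult<T> is an assumed name:

HarmfulContentResult<double> report = filter.IdentifyHarmfulContent(content);

// Severity is an assumed member name; inspect HarmfulContentResult<T> for the real API
Console.WriteLine($"Harmful-content severity: {report.Severity}");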

LoadModel(string)

Loads the model from a file.

public void LoadModel(string filePath)

Parameters

filePath string

The path to the file containing the saved model.

Remarks

This method provides a convenient way to load a model directly from disk. It combines file I/O operations with deserialization.

For Beginners: This is like clicking "Open" in a document editor. Instead of manually reading from a file and then calling Deserialize(), this method does both steps for you.

Exceptions

FileNotFoundException

Thrown when the specified file does not exist.

IOException

Thrown when an I/O error occurs while reading from the file or when the file contains corrupted or invalid model data.
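
A usage sketch that handles the missing-file case documented above:

using System.IO;

var filter = new SafetyFilter<double>(new SafetyFilterOptions<double>());
try
{
    filter.LoadModel("models/safety-filter.bin");
}
catch (FileNotFoundException)
{
    Console.WriteLine("No saved filter found; continuing with defaults.");
}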

Reset()

Resets the safety filter state.

public void Reset()

SaveModel(string)

Saves the model to a file.

public void SaveModel(string filePath)

Parameters

filePath string

The path where the model should be saved.

Remarks

This method provides a convenient way to save the model directly to disk. It combines serialization with file I/O operations.

For Beginners: This is like clicking "Save As" in a document editor. Instead of manually calling Serialize() and then writing to a file, this method does both steps for you.

Exceptions

IOException

Thrown when an I/O error occurs while writing to the file.

UnauthorizedAccessException

Thrown when the caller does not have the required permission to write to the specified file path.
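
A usage sketch, given a SafetyFilter<double> named filter; per the remarks above, this is roughly equivalent to serializing and writing the bytes yourself:

filter.SaveModel("models/safety-filter.bin");
// roughly: File.WriteAllBytes("models/safety-filter.bin", filter.Serialize());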

Serialize()

Converts the current state of a machine learning model into a binary format.

public byte[] Serialize()

Returns

byte[]

A byte array containing the serialized model data.

Remarks

This method captures all the essential information about a trained model and converts it into a sequence of bytes that can be stored or transmitted.

For Beginners: This is like exporting your work to a file.

When you call this method:

  • The model's current state (all its learned patterns and parameters) is captured
  • This information is converted into a compact binary format (bytes)
  • You can then save these bytes to a file, database, or send them over a network

For example:

  • After training a model to recognize cats vs. dogs in images
  • You can serialize the model to save all its learned knowledge
  • Later, you can use this saved data to recreate the model exactly as it was
  • The recreated model will make the same predictions as the original

Think of it like taking a snapshot of your model's brain at a specific moment in time.
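
A usage sketch writing the serialized bytes to disk, given a SafetyFilter<double> named filter:

using System.IO;

byte[] data = filter.Serialize();
File.WriteAllBytes("safety-filter.bin", data); // or store in a database / send over a network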

ValidateInput(Vector<T>)

Validates that an input is safe and appropriate for processing.

public SafetyValidationResult<T> ValidateInput(Vector<T> input)

Parameters

input Vector<T>

The input to validate.

Returns

SafetyValidationResult<T>

Validation result indicating whether the input is safe, along with any issues found.

Remarks

This method checks inputs before they reach the model to prevent malicious or inappropriate inputs from being processed.

For Beginners: This is like a bouncer at a club checking IDs at the door. Before letting an input into your AI system, this method checks if it's safe and appropriate to process.

The validation might check for:

  1. Malformed inputs that could crash the system
  2. Adversarial patterns designed to fool the model
  3. Attempts to inject malicious code or prompts
  4. Inappropriate or harmful content in the input
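
A usage sketch, given a SafetyFilter<double> named filter and an input vector named userInput; IsSafe and Issues are assumed member names on SafetyValidationResult<T>:

SafetyValidationResult<double> validation = filter.ValidateInput(userInput);

// IsSafe and Issues are assumed member names
if (!validation.IsSafe)
{
    Console.WriteLine($"Input rejected: {string.Join(", ", validation.Issues)}");
}
else
{
    // safe to forward the input to the model
}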