Interface IMultimodalEmbedding<T>
- Namespace: AiDotNet.Interfaces
- Assembly: AiDotNet.dll
Interface for multimodal embedding models that can encode multiple modalities (text, images, audio) into a shared embedding space.
public interface IMultimodalEmbedding<T>
Type Parameters
- T: The numeric type used for computations.
Remarks
Multimodal embedding models like CLIP (Contrastive Language-Image Pre-training) learn to project different types of data into the same vector space, enabling cross-modal similarity search and zero-shot classification.
For Beginners: Imagine you want to search for images using text queries. A multimodal model learns to convert both "a photo of a cat" and an actual cat image into similar vectors, allowing direct comparison between text and images.
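The text-to-image search described above can be sketched with plain arrays. This is a minimal illustration, not the library's implementation: it assumes the embeddings have already been produced (for example by EncodeText and EncodeImage) and copied into double[] arrays, and that they are normalized so the dot product acts as the similarity score. The class and method names (CrossModalSearchSketch, RankBySimilarity) are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of AiDotNet.
public static class CrossModalSearchSketch
{
    // Returns the indices of the image embeddings, ordered from the most
    // to the least similar to the text embedding.
    public static int[] RankBySimilarity(double[] textEmbedding, double[][] imageEmbeddings)
    {
        return imageEmbeddings
            .Select((image, index) => (index, score: Dot(textEmbedding, image)))
            .OrderByDescending(pair => pair.score)
            .Select(pair => pair.index)
            .ToArray();
    }

    // Dot product; equals cosine similarity when both vectors have unit length.
    private static double Dot(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Embeddings must have the same dimension.");

        double sum = 0.0;
        for (int i = 0; i < a.Length; i++)
            sum += a[i] * b[i];
        return sum;
    }
}
```

The first index in the returned array identifies the image whose embedding lies closest to the query text in the shared space.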
Properties
EmbeddingDimension
Gets the dimensionality of the embedding space.
int EmbeddingDimension { get; }
Property Value
- int
ImageSize
Gets the expected side length, in pixels, of the square input image (ImageSize x ImageSize).
int ImageSize { get; }
Property Value
- int
MaxSequenceLength
Gets the maximum sequence length for text input.
int MaxSequenceLength { get; }
Property Value
- int
Methods
ComputeSimilarity(Vector<T>, Vector<T>)
Computes similarity between two embeddings.
T ComputeSimilarity(Vector<T> embedding1, Vector<T> embedding2)
Parameters
- embedding1 (Vector<T>): The first embedding.
- embedding2 (Vector<T>): The second embedding.
Returns
- T
The similarity score. When both embeddings are normalized, this is their cosine similarity.
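For reference, cosine similarity can be written out over plain arrays as below. This is a sketch of the general formula, dot(a, b) / (|a| * |b|), and not necessarily how an implementation of this interface computes it. The class name EmbeddingMathSketch is hypothetical, and returning 0 for a zero-length vector is an assumed convention.

```csharp
using System;

// Hypothetical helper, not part of AiDotNet.
public static class EmbeddingMathSketch
{
    public static double CosineSimilarity(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Embeddings must have the same dimension.");

        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // Assumed convention: similarity with a zero vector is 0.
        if (normA == 0.0 || normB == 0.0)
            return 0.0;

        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

When both inputs already have unit length, the denominator is 1 and the result reduces to the dot product.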
EncodeImage(double[])
Encodes an image into an embedding vector.
Vector<T> EncodeImage(double[] imageData)
Parameters
- imageData (double[]): The preprocessed image data as a flattened array in CHW (channel, height, width) format.
Returns
- Vector<T>
A normalized embedding vector.
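The CHW layout expected by imageData stores all values of the first channel, then all values of the second, and so on. The sketch below converts an interleaved HWC buffer (the layout many image loaders produce) into CHW. It covers the layout only: resizing to ImageSize and any pixel scaling or mean/standard-deviation normalization are model-specific and are not shown. The class and method names (ImageLayoutSketch, HwcToChw) are hypothetical.

```csharp
using System;

// Hypothetical helper, not part of AiDotNet.
public static class ImageLayoutSketch
{
    // Reorders an interleaved HWC buffer into a planar CHW buffer.
    public static double[] HwcToChw(double[] hwc, int height, int width, int channels)
    {
        if (hwc.Length != height * width * channels)
            throw new ArgumentException("Buffer length does not match the given dimensions.");

        var chw = new double[hwc.Length];
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                for (int c = 0; c < channels; c++)
                {
                    int source = (y * width + x) * channels + c; // HWC index
                    int target = c * height * width + y * width + x; // CHW index
                    chw[target] = hwc[source];
                }
            }
        }
        return chw;
    }
}
```

For an RGB image, the resulting array has length 3 x ImageSize x ImageSize once the image has been resized to the expected dimensions.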
EncodeImageBatch(IEnumerable<double[]>)
Encodes multiple images into embedding vectors in a batch.
Matrix<T> EncodeImageBatch(IEnumerable<double[]> imageDataBatch)
Parameters
- imageDataBatch (IEnumerable<double[]>): The preprocessed images as flattened arrays.
Returns
- Matrix<T>
A matrix where each row is an embedding for the corresponding image.
EncodeText(string)
Encodes text into an embedding vector.
Vector<T> EncodeText(string text)
Parameters
- text (string): The text to encode.
Returns
- Vector<T>
A normalized embedding vector.
EncodeTextBatch(IEnumerable<string>)
Encodes multiple texts into embedding vectors in a batch.
Matrix<T> EncodeTextBatch(IEnumerable<string> texts)
Parameters
- texts (IEnumerable<string>): The texts to encode.
Returns
- Matrix<T>
A matrix where each row is an embedding for the corresponding text.
ZeroShotClassify(double[], IEnumerable<string>)
Performs zero-shot classification of an image against text labels.
Dictionary<string, T> ZeroShotClassify(double[] imageData, IEnumerable<string> labels)
Parameters
- imageData (double[]): The preprocessed image data.
- labels (IEnumerable<string>): The candidate class labels.
Returns
- Dictionary<string, T>
A dictionary mapping each label to its probability score.
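One common way to turn per-label similarities into probability scores is a softmax, as CLIP-style models do. The sketch below assumes the image-to-label similarities have already been computed. Whether an implementation of this interface uses a softmax, and with what scaling factor, is not stated here, so the scale parameter and the names ZeroShotSketch and ToProbabilities are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of AiDotNet.
public static class ZeroShotSketch
{
    // Applies a softmax over the scaled similarities, one per label.
    public static Dictionary<string, double> ToProbabilities(
        IReadOnlyList<string> labels,
        IReadOnlyList<double> similarities,
        double scale = 1.0) // assumed scaling factor (logit scale)
    {
        if (labels.Count != similarities.Count)
            throw new ArgumentException("Each label needs exactly one similarity.");

        var result = new Dictionary<string, double>();
        if (labels.Count == 0)
            return result;

        double[] logits = similarities.Select(s => s * scale).ToArray();
        double max = logits.Max(); // subtract the maximum for numerical stability
        double[] exps = logits.Select(l => Math.Exp(l - max)).ToArray();
        double sum = exps.Sum();

        for (int i = 0; i < labels.Count; i++)
            result[labels[i]] = exps[i] / sum;

        return result;
    }
}
```

The returned values sum to 1, and the label with the highest similarity receives the highest probability.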