
Class MultimodalInput<T>

Namespace
AiDotNet.Interfaces
Assembly
AiDotNet.dll

Represents an input item for unified multimodal models.

public class MultimodalInput<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
MultimodalInput<T>

Properties

Metadata

Optional metadata about the input.

public Dictionary<string, object>? Metadata { get; set; }

Property Value

Dictionary<string, object>

Modality

The modality type of this input.

public ModalityType Modality { get; set; }

Property Value

ModalityType

SequenceIndex

Position of this input in the temporal order of a sequence of inputs.

public int SequenceIndex { get; set; }

Property Value

int

TextContent

Optional text content, used when the input's modality is text.

public string? TextContent { get; set; }

Property Value

string

Methods

FromAudio(Vector<T>, int, int)

Creates an audio input from waveform samples.

public static MultimodalInput<T> FromAudio(Vector<T> samples, int sampleRate, int sequenceIndex = 0)

Parameters

samples Vector<T>

Audio samples.

sampleRate int

Sample rate in Hz.

sequenceIndex int

Optional position of this input in the sequence. Defaults to 0.

Returns

MultimodalInput<T>
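As a minimal sketch of how FromAudio might be called, the snippet below generates one second of a 440 Hz sine tone and wraps it as an audio input. The AiDotNet.LinearAlgebra namespace and the Vector<T> constructor that accepts a sequence of values are assumptions not stated on this page and may differ between releases. The tone and sample rate are purely illustrative.

```csharp
using System;
using AiDotNet.Interfaces;
using AiDotNet.LinearAlgebra; // assumed namespace for Vector<T>

const int sampleRate = 16000;   // samples per second (Hz)
const double toneHz = 440.0;    // illustrative test tone

// One second of a mono sine wave.
var samples = new float[sampleRate];
for (int i = 0; i < samples.Length; i++)
{
    samples[i] = (float)Math.Sin(2.0 * Math.PI * toneHz * i / sampleRate);
}

// Assumption: Vector<T> can be constructed from a sequence of values.
var waveform = new Vector<float>(samples);

// sequenceIndex is omitted, so the default of 0 from the signature applies.
MultimodalInput<float> audio = MultimodalInput<float>.FromAudio(waveform, sampleRate);

Console.WriteLine($"Modality: {audio.Modality}, sequence index: {audio.SequenceIndex}");
```

The clip duration is not passed explicitly. Under these assumptions it follows from the sample count divided by sampleRate.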

FromImage(Vector<T>, int, int, int, int)

Creates an image input from pixel data.

public static MultimodalInput<T> FromImage(Vector<T> pixels, int channels, int height, int width, int sequenceIndex = 0)

Parameters

pixels Vector<T>

Pixel values.

channels int

Number of color channels.

height int

Image height in pixels.

width int

Image width in pixels.

sequenceIndex int

Optional position of this input in the sequence. Defaults to 0.

Returns

MultimodalInput<T>
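The following sketch shows one way to prepare the flat pixel buffer that FromImage expects. This page does not specify the element ordering, so the channel-major (channels, height, width) layout used here is a hypothetical choice that mirrors the parameter order. The Vector<T> namespace and constructor are likewise assumptions.

```csharp
using System;
using AiDotNet.Interfaces;
using AiDotNet.LinearAlgebra; // assumed namespace for Vector<T>

const int channels = 3, height = 2, width = 2;

// Flat buffer with one value per channel per pixel.
var pixels = new float[channels * height * width];

// Hypothetical channel-major layout: index = (c * height + y) * width + x.
for (int c = 0; c < channels; c++)
{
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            pixels[(c * height + y) * width + x] = 0.5f; // uniform mid-gray
        }
    }
}

// Assumption: Vector<T> can be constructed from a sequence of values.
var image = MultimodalInput<float>.FromImage(
    new Vector<float>(pixels), channels, height, width);

Console.WriteLine($"Modality: {image.Modality}, values: {pixels.Length}");
```

Whatever layout the consuming model requires, the buffer length should presumably equal channels × height × width so that the shape arguments describe the data exactly.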

FromText(string, int)

Creates a text input for the multimodal model.

public static MultimodalInput<T> FromText(string text, int sequenceIndex = 0)

Parameters

text string

The text content.

sequenceIndex int

Optional position of this input in the sequence. Defaults to 0.

Returns

MultimodalInput<T>
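A short sketch of creating a text input and attaching metadata through the public Metadata setter. The prompt text and the metadata keys ("source", "language") are illustrative only. That FromText stores its argument in TextContent is a reasonable expectation, but this page does not state it explicitly.

```csharp
using System;
using System.Collections.Generic;
using AiDotNet.Interfaces;

// Place the question second in the sequence (index 1), for example after an image at index 0.
MultimodalInput<float> question =
    MultimodalInput<float>.FromText("What is shown in the picture?", sequenceIndex: 1);

// Metadata is nullable and optional; the keys below are hypothetical.
question.Metadata = new Dictionary<string, object>
{
    ["source"] = "user",
    ["language"] = "en"
};

Console.WriteLine($"Text: {question.TextContent}, sequence index: {question.SequenceIndex}");
```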

FromVideo(Vector<T>, int, int, int, int, double, int)

Creates a video input from frame data.

public static MultimodalInput<T> FromVideo(Vector<T> frames, int numFrames, int channels, int height, int width, double frameRate, int sequenceIndex = 0)

Parameters

frames Vector<T>

Frame pixel data.

numFrames int

Number of frames.

channels int

Number of color channels.

height int

Frame height in pixels.

width int

Frame width in pixels.

frameRate double

Frame rate in fps.

sequenceIndex int

Optional position of this input in the sequence. Defaults to 0.

Returns

MultimodalInput<T>
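As a rough sketch, a video input can be built from a single flat buffer that covers all frames. The frame-major ordering implied here (frames, then channels, height, width) is an assumption, as are the Vector<T> namespace and constructor. The clip dimensions are illustrative.

```csharp
using System;
using AiDotNet.Interfaces;
using AiDotNet.LinearAlgebra; // assumed namespace for Vector<T>

const int numFrames = 8, channels = 3, height = 4, width = 4;
const double frameRate = 24.0; // frames per second

// One value per channel per pixel per frame; zero-initialized (all-black frames).
var frames = new float[numFrames * channels * height * width];

// Assumption: Vector<T> can be constructed from a sequence of values.
var clip = MultimodalInput<float>.FromVideo(
    new Vector<float>(frames), numFrames, channels, height, width, frameRate,
    sequenceIndex: 0);

// Duration implied by the arguments: frame count divided by frame rate.
double seconds = numFrames / frameRate;

Console.WriteLine($"Modality: {clip.Modality}, duration: {seconds:F3} s");
```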