Interface IVideoDiffusionModel<T>

Namespace
AiDotNet.Interfaces
Assembly
AiDotNet.dll

Interface for video diffusion models that generate temporal sequences.

public interface IVideoDiffusionModel<T> : IDiffusionModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations.

Remarks

Video diffusion models extend image diffusion to handle the temporal dimension, generating coherent video sequences. They model both spatial (within-frame) and temporal (across-frame) dependencies.

For Beginners: Video diffusion is like image diffusion, but it creates videos instead of single images. The main challenge is making the frames look consistent over time (no flickering or teleporting objects).

How video diffusion works:

  1. The model generates multiple frames at once (typically 14-25 frames)
  2. Special "temporal attention" ensures frames are consistent
  3. The model can be conditioned on a starting image, text, or both

Common approaches:

  • Image-to-Video (SVD): Start from an image, generate motion
  • Text-to-Video (VideoCrafter): Generate video from text description
  • Video-to-Video: Transform existing video with new style/content

Key challenges solved by these models:

  • Temporal consistency (no flickering)
  • Motion coherence (objects move naturally)
  • Long-range dependencies (beginning and end are related)

This interface extends IDiffusionModel<T> with video-specific operations.
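
For example, a minimal dispatch sketch, assuming a concrete implementation of this interface is already available (the Generate helper and the float instantiation are illustrative, not part of this API; Tensor<T> is assumed to come from the AiDotNet assembly):

using System;
using AiDotNet.Interfaces;

// Illustrative helper (not part of this API): pick a generation mode
// based on the capability flags, failing clearly when neither applies.
static Tensor<float> Generate(IVideoDiffusionModel<float> model, Tensor<float>? image, string? prompt)
{
    if (image is not null && model.SupportsImageToVideo)
        return model.GenerateFromImage(image, numFrames: model.DefaultNumFrames, fps: model.DefaultFPS);

    if (prompt is not null && model.SupportsTextToVideo)
        return model.GenerateFromText(prompt);

    throw new NotSupportedException("This model supports neither image-to-video nor text-to-video.");
}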

Properties

DefaultFPS

Gets the default frames per second for generated videos.

int DefaultFPS { get; }

Property Value

int

Remarks

Typical values: 7 FPS for SVD, 8 FPS for AnimateDiff. With a fixed number of frames, lower FPS stretches playback over a longer duration, so motion appears slower but choppier; higher FPS gives smoother playback.

DefaultNumFrames

Gets the default number of frames generated.

int DefaultNumFrames { get; }

Property Value

int

Remarks

Typical values: 14, 16, 25 frames. Limited by GPU memory.

MotionBucketId

Gets the motion bucket ID for controlling motion intensity (SVD-specific).

int MotionBucketId { get; }

Property Value

int

Remarks

Controls the amount of motion in the generated video: lower values produce subtler motion, higher values produce stronger motion. Range: 1-255, default: 127.

SupportsImageToVideo

Gets whether this model supports image-to-video generation.

bool SupportsImageToVideo { get; }

Property Value

bool

SupportsTextToVideo

Gets whether this model supports text-to-video generation.

bool SupportsTextToVideo { get; }

Property Value

bool

SupportsVideoToVideo

Gets whether this model supports video-to-video transformation.

bool SupportsVideoToVideo { get; }

Property Value

bool

Methods

ExtractFrame(Tensor<T>, int)

Extracts a frame from the video tensor.

Tensor<T> ExtractFrame(Tensor<T> video, int frameIndex)

Parameters

video Tensor<T>

The video tensor [batch, numFrames, channels, height, width].

frameIndex int

Index of the frame to extract.

Returns

Tensor<T>

The frame as an image tensor [batch, channels, height, width].
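
A short usage sketch (model and video are assumed to exist from earlier context; the frame index is an arbitrary example):

// Grab the first frame of a generated clip, e.g. as a preview image.
// video shape: [batch, numFrames, channels, height, width]
Tensor<float> firstFrame = model.ExtractFrame(video, 0);
// firstFrame shape: [batch, channels, height, width]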

FramesToVideo(Tensor<T>[])

Concatenates frames into a video tensor.

Tensor<T> FramesToVideo(Tensor<T>[] frames)

Parameters

frames Tensor<T>[]

Array of frame tensors [batch, channels, height, width].

Returns

Tensor<T>

Video tensor [batch, numFrames, channels, height, width].
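
A sketch that combines ExtractFrame and FramesToVideo to reverse a clip (the frame count of 14 is an assumed example value; it must match the clip's actual frame count):

int numFrames = 14; // assumed to match the clip's actual frame count
var frames = new Tensor<float>[numFrames];
for (int i = 0; i < numFrames; i++)
{
    // Read frames back-to-front so the reassembled clip plays in reverse.
    frames[i] = model.ExtractFrame(video, numFrames - 1 - i);
}
Tensor<float> reversedVideo = model.FramesToVideo(frames);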

GenerateFromImage(Tensor<T>, int?, int?, int, int?, double, int?)

Generates a video from a conditioning image.

Tensor<T> GenerateFromImage(Tensor<T> inputImage, int? numFrames = null, int? fps = null, int numInferenceSteps = 25, int? motionBucketId = null, double noiseAugStrength = 0.02, int? seed = null)

Parameters

inputImage Tensor<T>

The conditioning image [batch, channels, height, width].

numFrames int?

Number of frames to generate.

fps int?

Target frames per second.

numInferenceSteps int

Number of denoising steps.

motionBucketId int?

Motion intensity (1-255).

noiseAugStrength double

Noise augmentation for input image.

seed int?

Optional random seed.

Returns

Tensor<T>

Generated video tensor [batch, numFrames, channels, height, width].

Remarks

For Beginners: This animates a still image:

  • Input: A single image (photo, artwork, etc.)
  • Output: A video where the scene comes to life

Tips:

  • motionBucketId controls how much movement happens
  • noiseAugStrength slightly varies the input to encourage motion
  • Higher inference steps = smoother motion but slower
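
A usage sketch (inputImage is assumed to be a preloaded [batch, channels, height, width] tensor; loading it is outside this interface):

// Animate a still image; unspecified arguments fall back to the model defaults.
Tensor<float> video = model.GenerateFromImage(
    inputImage,
    numInferenceSteps: 25,
    motionBucketId: 127,     // mid-range motion (valid range 1-255)
    noiseAugStrength: 0.02,  // small input perturbation to encourage motion
    seed: 42);               // fix the seed for reproducible output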

GenerateFromText(string, string?, int, int, int?, int?, int, double, int?)

Generates a video from a text prompt.

Tensor<T> GenerateFromText(string prompt, string? negativePrompt = null, int width = 512, int height = 512, int? numFrames = null, int? fps = null, int numInferenceSteps = 50, double guidanceScale = 7.5, int? seed = null)

Parameters

prompt string

Text description of the video to generate.

negativePrompt string?

What to avoid in the video.

width int

Video width in pixels.

height int

Video height in pixels.

numFrames int?

Number of frames to generate.

fps int?

Target frames per second.

numInferenceSteps int

Number of denoising steps.

guidanceScale double

Classifier-free guidance scale.

seed int?

Optional random seed.

Returns

Tensor<T>

Generated video tensor [batch, numFrames, channels, height, width].

Remarks

For Beginners: This creates a video from a description:

  • prompt: What you want ("a dog running on a beach")
  • The model generates both the visual content and the motion
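
A usage sketch with explicit arguments (the values shown are the documented defaults; the prompts and seed are arbitrary examples):

Tensor<float> video = model.GenerateFromText(
    prompt: "a dog running on a beach",
    negativePrompt: "blurry, low quality",
    width: 512,
    height: 512,
    numInferenceSteps: 50,
    guidanceScale: 7.5,
    seed: 42);
// video shape: [batch, numFrames, channels, height, width]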

InterpolateFrames(Tensor<T>, int, FrameInterpolationMethod)

Interpolates between frames to increase frame rate.

Tensor<T> InterpolateFrames(Tensor<T> video, int targetFPS, FrameInterpolationMethod interpolationMethod = FrameInterpolationMethod.Diffusion)

Parameters

video Tensor<T>

The input video [batch, numFrames, channels, height, width].

targetFPS int

Target frame rate.

interpolationMethod FrameInterpolationMethod

Method for frame interpolation.

Returns

Tensor<T>

Interpolated video with more frames.

Remarks

For Beginners: This makes videos smoother by adding in-between frames:

  • Input: 7 FPS video (a bit choppy)
  • Output: 30 FPS video (smooth playback)

The AI figures out what the in-between frames should look like.
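
A usage sketch (video is assumed to be a low-FPS clip from earlier context; the interpolation method argument is left at its Diffusion default):

// Upsample a choppy clip to 30 FPS; the model synthesizes the in-between frames.
Tensor<float> smooth = model.InterpolateFrames(video, targetFPS: 30);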

SetMotionBucketId(int)

Sets the motion intensity for generation.

void SetMotionBucketId(int bucketId)

Parameters

bucketId int

Motion bucket ID (1-255).
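
A usage sketch (the clamp mirrors the 1-255 range documented for MotionBucketId; the requested value is an arbitrary example):

int requested = 200; // stronger motion than the default of 127
model.SetMotionBucketId(Math.Clamp(requested, 1, 255));
// Subsequent generations use the new motion intensity.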

VideoToVideo(Tensor<T>, string, string?, double, int, double, int?)

Transforms an existing video.

Tensor<T> VideoToVideo(Tensor<T> inputVideo, string prompt, string? negativePrompt = null, double strength = 0.7, int numInferenceSteps = 50, double guidanceScale = 7.5, int? seed = null)

Parameters

inputVideo Tensor<T>

The input video [batch, numFrames, channels, height, width].

prompt string

Text prompt describing the transformation.

negativePrompt string?

What to avoid.

strength double

Transformation strength (0.0-1.0).

numInferenceSteps int

Number of denoising steps.

guidanceScale double

Classifier-free guidance scale.

seed int?

Optional random seed.

Returns

Tensor<T>

Transformed video tensor.

Remarks

For Beginners: This changes an existing video's style or content:

  • strength=0.3: Minor style changes, motion preserved
  • strength=0.7: Major changes, but timing preserved
  • strength=1.0: Complete regeneration guided by the original
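
A usage sketch (inputVideo is assumed to be an existing [batch, numFrames, channels, height, width] tensor; the prompt and strength are example values):

// Restyle an existing clip while largely preserving its motion.
Tensor<float> stylized = model.VideoToVideo(
    inputVideo,
    prompt: "watercolor painting style",
    strength: 0.7,           // major stylistic change, timing preserved
    numInferenceSteps: 50,
    guidanceScale: 7.5);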