
Class AnimateDiffModel<T>

Namespace
AiDotNet.Diffusion.Models
Assembly
AiDotNet.dll

AnimateDiff model for text-to-video and image-to-video generation.

public class AnimateDiffModel<T> : VideoDiffusionModelBase<T>, ILatentDiffusionModel<T>, IVideoDiffusionModel<T>, IDiffusionModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
VideoDiffusionModelBase<T>
AnimateDiffModel<T>
Implements
ILatentDiffusionModel<T>
IVideoDiffusionModel<T>
IDiffusionModel<T>
IFullModel<T, Tensor<T>, Tensor<T>>
IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>
IModelSerializer
ICheckpointableModel
IParameterizable<T, Tensor<T>, Tensor<T>>
IFeatureAware
IFeatureImportance<T>
ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>
IGradientComputable<T, Tensor<T>, Tensor<T>>
IJitCompilable<T>

Examples

// Create AnimateDiff with default motion modules
var animateDiff = new AnimateDiffModel<float>();

// Text-to-video generation
var video = animateDiff.GenerateFromText(
    prompt: "A beautiful sunset over the ocean, waves gently rolling",
    width: 512,
    height: 512,
    numFrames: 16,
    numInferenceSteps: 25);

// Image-to-video with text guidance
// (LoadImage stands in for your own image-loading code returning a Tensor<float>)
var inputImage = LoadImage("beach.jpg");
var animatedVideo = animateDiff.AnimateImage(
    inputImage,
    prompt: "gentle waves, moving clouds",
    numFrames: 16);

Remarks

AnimateDiff extends Stable Diffusion with motion modules that enable temporal consistency in video generation. Unlike Stable Video Diffusion (SVD), which is trained end-to-end for video, AnimateDiff adds motion modules to an existing text-to-image model, making it highly flexible.

For Beginners: Think of AnimateDiff as "teaching an image generator to make videos."

How it works:

  1. Start with a text-to-image model (like Stable Diffusion)
  2. Add special "motion modules" between the layers
  3. These modules learn how things move in videos
  4. The original image quality is preserved while adding motion

Key advantages:

  • Works with any Stable Diffusion model/checkpoint
  • Can use existing LoRAs, ControlNets, etc.
  • Flexible: text-to-video, image-to-video, or both
  • Lower training requirements than full video models

Example use cases:

  • Generate a short animation from a text prompt
  • Animate a still image with natural motion
  • Create consistent character animations
  • Style transfer for videos using SD checkpoints

Architecture overview:

  • Base: Standard Stable Diffusion U-Net
  • Motion Modules: Temporal attention layers inserted after spatial attention
  • VAE: Standard SD VAE (per-frame encoding/decoding)
  • Optional: LoRA adapters for style customization

Supported modes:

  • Text-to-Video: Generate video from text prompt
  • Image-to-Video: Animate an input image with text guidance
  • Video-to-Video: Style transfer or modify existing video
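Which of these modes a given instance supports can be checked at runtime through the capability properties documented below; a minimal sketch:

var model = new AnimateDiffModel<float>();

// Capability flags (see the Supports* properties below)
Console.WriteLine(model.SupportsTextToVideo);   // primary mode
Console.WriteLine(model.SupportsImageToVideo);  // requires a conditioner
Console.WriteLine(model.SupportsVideoToVideo);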

Constructors

AnimateDiffModel()

Initializes a new instance of AnimateDiffModel with default parameters.

public AnimateDiffModel()

AnimateDiffModel(DiffusionModelOptions<T>?, INoiseScheduler<T>?, UNetNoisePredictor<T>?, StandardVAE<T>?, IConditioningModule<T>?, MotionModuleConfig?, int, int)

Initializes a new instance of AnimateDiffModel with custom parameters.

public AnimateDiffModel(DiffusionModelOptions<T>? options = null, INoiseScheduler<T>? scheduler = null, UNetNoisePredictor<T>? unet = null, StandardVAE<T>? vae = null, IConditioningModule<T>? conditioner = null, MotionModuleConfig? motionConfig = null, int defaultNumFrames = 16, int defaultFPS = 8)

Parameters

options DiffusionModelOptions<T>

Configuration options for the diffusion model.

scheduler INoiseScheduler<T>

Optional custom scheduler.

unet UNetNoisePredictor<T>

Optional custom U-Net noise predictor.

vae StandardVAE<T>

Optional custom VAE.

conditioner IConditioningModule<T>

Optional conditioning module for text guidance.

motionConfig MotionModuleConfig

Optional motion module configuration.

defaultNumFrames int

Default number of frames to generate.

defaultFPS int

Default frames per second.
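For example, the clip defaults can be changed while keeping the default components; the sketch below assumes MotionModuleConfig exposes a parameterless constructor:

// Default scheduler, U-Net, VAE, and conditioner; 24 frames at 12 FPS by default
var model = new AnimateDiffModel<float>(
    defaultNumFrames: 24,
    defaultFPS: 12);

// Supplying an explicit motion module configuration
// (assumes MotionModuleConfig has a parameterless constructor)
var custom = new AnimateDiffModel<float>(
    motionConfig: new MotionModuleConfig(),
    defaultNumFrames: 16,
    defaultFPS: 8);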

Fields

DefaultHeight

Default AnimateDiff height (SD compatible).

public const int DefaultHeight = 512

Field Value

int

DefaultWidth

Default AnimateDiff width (SD compatible).

public const int DefaultWidth = 512

Field Value

int

Properties

Conditioner

Gets the conditioning module.

public override IConditioningModule<T>? Conditioner { get; }

Property Value

IConditioningModule<T>

ContextLength

Gets or sets the context length for temporal attention.

public int ContextLength { get; set; }

Property Value

int

Remarks

Controls how many frames are processed together in the motion modules. Larger values provide better temporal consistency but require more memory.

ContextOverlap

Gets or sets the context overlap for sliding window generation.

public int ContextOverlap { get; set; }

Property Value

int

Remarks

When generating more frames than ContextLength, this controls the overlap between windows to maintain smooth transitions.
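For example, a request for more frames than the context window is processed as overlapping windows; the values below are illustrative, not tuned settings:

var model = new AnimateDiffModel<float>();

// Motion modules attend over 16 frames at a time...
model.ContextLength = 16;
// ...and consecutive windows share 4 frames for smooth transitions
model.ContextOverlap = 4;

// A 32-frame clip is then generated as overlapping 16-frame windows
var longClip = model.GenerateFromText(
    prompt: "clouds drifting over a mountain ridge",
    numFrames: 32);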

LatentChannels

Gets the number of latent channels.

public override int LatentChannels { get; }

Property Value

int

MotionConfig

Gets the motion module configuration.

public MotionModuleConfig MotionConfig { get; }

Property Value

MotionModuleConfig

NoisePredictor

Gets the noise predictor.

public override INoisePredictor<T> NoisePredictor { get; }

Property Value

INoisePredictor<T>

ParameterCount

Gets the total parameter count.

public override int ParameterCount { get; }

Property Value

int

SupportsImageToVideo

Gets whether image-to-video is supported.

public override bool SupportsImageToVideo { get; }

Property Value

bool

Remarks

AnimateDiff supports animating still images when a conditioner is available.

SupportsTextToVideo

Gets whether text-to-video is supported.

public override bool SupportsTextToVideo { get; }

Property Value

bool

Remarks

AnimateDiff's primary mode is text-to-video.

SupportsVideoToVideo

Gets whether video-to-video is supported.

public override bool SupportsVideoToVideo { get; }

Property Value

bool

VAE

Gets the VAE for frame encoding/decoding.

public override IVAEModel<T> VAE { get; }

Property Value

IVAEModel<T>

Methods

Clone()

Clones this AnimateDiff model.

public override IDiffusionModel<T> Clone()

Returns

IDiffusionModel<T>

DecodeVideoLatents(Tensor<T>)

Decodes video latents to frames.

protected override Tensor<T> DecodeVideoLatents(Tensor<T> latents)

Parameters

latents Tensor<T>

Returns

Tensor<T>

DeepCopy()

Creates a deep copy.

public override IFullModel<T, Tensor<T>, Tensor<T>> DeepCopy()

Returns

IFullModel<T, Tensor<T>, Tensor<T>>
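DeepCopy differs from Clone (above) mainly in its return type; a minimal sketch:

var model = new AnimateDiffModel<float>();

// Clone is typed for the diffusion API...
IDiffusionModel<float> clone = model.Clone();

// ...while DeepCopy preserves the full model surface
IFullModel<float, Tensor<float>, Tensor<float>> copy = model.DeepCopy();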

GenerateFromImage(Tensor<T>, int?, int?, int, int?, double, int?)

Generates video from an input image.

public override Tensor<T> GenerateFromImage(Tensor<T> inputImage, int? numFrames = null, int? fps = null, int numInferenceSteps = 25, int? motionBucketId = null, double noiseAugStrength = 0.02, int? seed = null)

Parameters

inputImage Tensor<T>

The input image to animate.

numFrames int?

Number of frames to generate (defaults to the model's default frame count).

fps int?

Frames per second (defaults to the model's default FPS).

numInferenceSteps int

Number of denoising steps.

motionBucketId int?

Optional motion intensity conditioning value.

noiseAugStrength double

Strength of noise augmentation applied to the input image.

seed int?

Optional random seed.

Returns

Tensor<T>

Generated video tensor [batch, numFrames, channels, height, width].
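A sketch of a typical call; constructing the input tensor from a shape array is an assumption about the Tensor<T> API, and in practice the frame would hold real pixel data:

var model = new AnimateDiffModel<float>();

// A blank [1, 3, 512, 512] frame stands in for a real loaded image
// (assumes Tensor<float> can be constructed from a shape array)
var frame = new Tensor<float>(new[] { 1, 3, 512, 512 });

var video = model.GenerateFromImage(
    frame,
    numFrames: 16,
    fps: 8,
    numInferenceSteps: 25,
    noiseAugStrength: 0.02,
    seed: 123);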

GenerateFromText(string, string?, int, int, int?, int?, int, double, int?)

Generates video from text using AnimateDiff.

public override Tensor<T> GenerateFromText(string prompt, string? negativePrompt = null, int width = 512, int height = 512, int? numFrames = null, int? fps = null, int numInferenceSteps = 50, double guidanceScale = 7.5, int? seed = null)

Parameters

prompt string

The text prompt describing the video.

negativePrompt string

Optional negative prompt.

width int

Video width.

height int

Video height.

numFrames int?

Number of frames to generate.

fps int?

Frames per second (for motion module).

numInferenceSteps int

Number of denoising steps.

guidanceScale double

Classifier-free guidance scale.

seed int?

Optional random seed.

Returns

Tensor<T>

Generated video tensor [batch, numFrames, channels, height, width].
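A fully specified call looks like this (values illustrative):

var model = new AnimateDiffModel<float>();

// Returns a tensor shaped [batch, numFrames, channels, height, width]
var video = model.GenerateFromText(
    prompt: "a red fox running through snow, cinematic lighting",
    negativePrompt: "blurry, low quality, watermark",
    width: 512,
    height: 512,
    numFrames: 16,
    fps: 8,
    numInferenceSteps: 50,
    guidanceScale: 7.5,
    seed: 42);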

GetParameters()

Gets all parameters.

public override Vector<T> GetParameters()

Returns

Vector<T>

PredictVideoNoise(Tensor<T>, int, Tensor<T>, Tensor<T>)

Predicts video noise for image-to-video generation.

protected override Tensor<T> PredictVideoNoise(Tensor<T> latents, int timestep, Tensor<T> imageEmbedding, Tensor<T> motionEmbedding)

Parameters

latents Tensor<T>
timestep int
imageEmbedding Tensor<T>
motionEmbedding Tensor<T>

Returns

Tensor<T>

SetParameters(Vector<T>)

Sets all parameters.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>
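Combined with GetParameters, this allows weights to be exported from one instance and restored into another with the same architecture; a minimal sketch:

var source = new AnimateDiffModel<float>();
var target = new AnimateDiffModel<float>();

// Export the flattened parameter vector...
Vector<float> weights = source.GetParameters();

// ...and load it into a second, identically configured instance
target.SetParameters(weights);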