Class OpenSora<T>

Namespace
AiDotNet.Video.Generation
Assembly
AiDotNet.dll

OpenSora - Open-source Sora-like video generation model.

public class OpenSora<T> : NeuralNetworkBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable

Type Parameters

T

The numeric type used for calculations.

Inheritance
OpenSora<T>
Implements
IFullModel<T, Tensor<T>, Tensor<T>>
IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>
IParameterizable<T, Tensor<T>, Tensor<T>>
ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>
IGradientComputable<T, Tensor<T>, Tensor<T>>

Remarks

For Beginners: OpenSora generates videos from text descriptions, much as image generation models such as DALL-E or Stable Diffusion generate images from text.

Key capabilities:

  • Text-to-Video: Generate videos from text descriptions
  • Image-to-Video: Animate still images
  • Video continuation: Extend existing videos
  • Variable length: Generate videos of different durations
  • Multiple aspect ratios: Support various video dimensions

Example prompts:

  • "A cat playing with a ball in a sunny garden"
  • "Time-lapse of a flower blooming"
  • "A spaceship flying through an asteroid field"

Technical Details:

  • Spatiotemporal DiT (Diffusion Transformer) architecture
  • Variable resolution and duration support
  • Efficient 3D attention mechanisms
  • Progressive training strategy

Constructors

OpenSora(NeuralNetworkArchitecture<T>, int, int, int, int, double)

public OpenSora(NeuralNetworkArchitecture<T> architecture, int numFrames = 16, int hiddenDim = 1152, int numLayers = 28, int numInferenceSteps = 50, double guidanceScale = 7.5)

Parameters

architecture NeuralNetworkArchitecture<T>
numFrames int
hiddenDim int
numLayers int
numInferenceSteps int
guidanceScale double
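
The page does not describe how guidanceScale is applied. In diffusion models a parameter with this name usually controls classifier-free guidance, so the following is a minimal sketch under that assumption, not the library's implementation. ApplyGuidance and the float arrays are hypothetical stand-ins for the model's conditional and unconditional predictions.

```csharp
using System;

// Classifier-free guidance (assumed meaning of guidanceScale): move the
// prediction from the unconditional output toward the text-conditioned one.
// A scale of 1.0 returns the conditional prediction unchanged.
static float[] ApplyGuidance(float[] unconditional, float[] conditional, double guidanceScale)
{
    if (unconditional.Length != conditional.Length)
        throw new ArgumentException("Predictions must have the same length.");

    var guided = new float[unconditional.Length];
    for (int i = 0; i < guided.Length; i++)
    {
        guided[i] = (float)(unconditional[i]
            + guidanceScale * (conditional[i] - unconditional[i]));
    }
    return guided;
}

// Hypothetical two-element predictions, combined with the constructor default of 7.5.
float[] uncond = { 0.0f, 1.0f };
float[] cond = { 1.0f, 1.0f };
float[] guided = ApplyGuidance(uncond, cond, 7.5);
Console.WriteLine(string.Join(" ", guided));
```

Under this assumption, larger values follow the prompt more closely, typically at some cost in sample diversity.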

Properties

SupportsTraining

Gets whether training is supported.

public override bool SupportsTraining { get; }

Property Value

bool

Methods

CreateNewInstance()

Creates a new instance of the same type as this neural network.

protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()

Returns

IFullModel<T, Tensor<T>, Tensor<T>>

A new instance of the same neural network type.

Remarks

For Beginners: This creates a blank version of the same type of neural network.

It's used internally by methods like DeepCopy and Clone to create the right type of network before copying the data into it.

DeserializeNetworkSpecificData(BinaryReader)

protected override void DeserializeNetworkSpecificData(BinaryReader reader)

Parameters

reader BinaryReader

Remarks

Restores all model configuration fields and reinitializes layers to match the deserialized state. This ensures the model structure is properly reconstructed after loading from a serialized format.

ExtendVideo(List<Tensor<T>>, Tensor<T>?, int?)

Extends an existing video.

public List<Tensor<T>> ExtendVideo(List<Tensor<T>> existingFrames, Tensor<T>? textEmbedding = null, int? seed = null)

Parameters

existingFrames List<Tensor<T>>
textEmbedding Tensor<T>
seed int?

Returns

List<Tensor<T>>

GenerateCustom(Tensor<T>, int, int, int, int?)

Generates a video with a custom duration (numFrames) and aspect ratio (height, width).

public List<Tensor<T>> GenerateCustom(Tensor<T> textEmbedding, int numFrames, int height, int width, int? seed = null)

Parameters

textEmbedding Tensor<T>
numFrames int
height int
width int
seed int?

Returns

List<Tensor<T>>

GenerateFromImage(Tensor<T>, Tensor<T>?, int?)

Generates a video from an image (image-to-video).

public List<Tensor<T>> GenerateFromImage(Tensor<T> image, Tensor<T>? textEmbedding = null, int? seed = null)

Parameters

image Tensor<T>
textEmbedding Tensor<T>
seed int?

Returns

List<Tensor<T>>

GenerateFromText(Tensor<T>, int?)

Generates a video from the text embedding of a prompt (text-to-video).

public List<Tensor<T>> GenerateFromText(Tensor<T> textEmbedding, int? seed = null)

Parameters

textEmbedding Tensor<T>

Text embedding produced by a text encoder, with shape [B, 768] or similar.

seed int?

Random seed for reproducibility.

Returns

List<Tensor<T>>

Generated video frames.
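
To illustrate what a seed buys you, here is a minimal sketch of seeded noise sampling. It assumes that generation starts from Gaussian noise drawn from a seeded random number generator, which is common for diffusion models but is not confirmed by this page. SampleInitialNoise is a hypothetical helper and is not part of OpenSora<T>.

```csharp
using System;

// Draws standard-normal noise. A fixed seed yields the same values on every
// call; a null seed uses an unseeded generator and varies between runs.
static double[] SampleInitialNoise(int length, int? seed)
{
    Random rng = seed.HasValue ? new Random(seed.Value) : new Random();
    var noise = new double[length];
    for (int i = 0; i < length; i++)
    {
        // Box-Muller transform; 1 - NextDouble() lies in (0, 1], so Log is finite.
        double u1 = 1.0 - rng.NextDouble();
        double u2 = rng.NextDouble();
        noise[i] = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
    }
    return noise;
}

double[] first = SampleInitialNoise(4, seed: 42);
double[] second = SampleInitialNoise(4, seed: 42);

bool identical = true;
for (int i = 0; i < first.Length; i++)
{
    if (first[i] != second[i]) identical = false;
}
Console.WriteLine(identical); // prints True: same seed, same noise
```

If the model works this way, passing the same seed and the same text embedding should reproduce the same frames, while leaving seed as null gives a different video on each call.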

GetModelMetadata()

Gets the metadata for this neural network model.

public override ModelMetadata<T> GetModelMetadata()

Returns

ModelMetadata<T>

A ModelMetadata<T> object containing information about the model.

InitializeLayers()

Initializes the layers of the neural network based on the architecture.

protected override void InitializeLayers()

Remarks

For Beginners: This method sets up all the layers in your neural network according to the architecture you've defined. It's like assembling the parts of your network before you can use it.

Predict(Tensor<T>)

Performs a single denoising prediction step on the input latents.

public override Tensor<T> Predict(Tensor<T> input)

Parameters

input Tensor<T>

Input latent tensor [B, C, H, W].

Returns

Tensor<T>

Predicted denoised output.

SerializeNetworkSpecificData(BinaryWriter)

Serializes network-specific data that is not covered by the general serialization process.

protected override void SerializeNetworkSpecificData(BinaryWriter writer)

Parameters

writer BinaryWriter

The BinaryWriter to write the data to.

Remarks

This method is called at the end of the general serialization process to allow derived classes to write any additional data specific to their implementation.

For Beginners: Think of this as packing a special compartment in your suitcase. While the main serialization method packs the common items (layers, parameters), this method allows each specific type of neural network to pack its own unique items that other networks might not have.

Train(Tensor<T>, Tensor<T>)

Trains the model using the diffusion training objective.

public override void Train(Tensor<T> input, Tensor<T> expectedOutput)

Parameters

input Tensor<T>

Clean input video latents.

expectedOutput Tensor<T>

Target (typically the same as input for diffusion training).
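
The page does not say which diffusion objective Train uses. The sketch below assumes the common noise-prediction (DDPM-style) objective: mix the clean latents with noise, ask the model to predict that noise, and score the prediction with mean squared error. AddNoise, NoisePredictionLoss, and alphaBar are hypothetical names, and the all-zero prediction stands in for a real model's output.

```csharp
using System;

// Forward diffusion: blend clean latents with noise. alphaBar in (0, 1] is
// the fraction of signal variance kept at the sampled timestep.
static double[] AddNoise(double[] clean, double[] noise, double alphaBar)
{
    double signal = Math.Sqrt(alphaBar);
    double sigma = Math.Sqrt(1.0 - alphaBar);
    var noisy = new double[clean.Length];
    for (int i = 0; i < clean.Length; i++)
    {
        noisy[i] = signal * clean[i] + sigma * noise[i];
    }
    return noisy;
}

// Mean squared error between the predicted noise and the noise actually added.
static double NoisePredictionLoss(double[] predictedNoise, double[] trueNoise)
{
    double sum = 0.0;
    for (int i = 0; i < trueNoise.Length; i++)
    {
        double diff = predictedNoise[i] - trueNoise[i];
        sum += diff * diff;
    }
    return sum / trueNoise.Length;
}

double[] cleanLatent = { 0.5, -1.0, 2.0 };
double[] epsilon = { 0.3, -0.2, 1.1 };   // stand-in for samples from N(0, 1)
double[] noisyLatent = AddNoise(cleanLatent, epsilon, alphaBar: 0.9);

// A real model would predict the noise from noisyLatent; zeros are a placeholder.
double[] predicted = new double[noisyLatent.Length];
Console.WriteLine(NoisePredictionLoss(predicted, epsilon));
```

In this formulation the clean latents are the only data the training step needs, which is consistent with expectedOutput typically being the same tensor as input.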

UpdateParameters(Vector<T>)

Updates the network's parameters with new values.

public override void UpdateParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

The new parameter values to set.

Remarks

For Beginners: During training, a neural network's internal values (parameters) get adjusted to improve its performance. This method allows you to update all those values at once by providing a complete set of new parameters.

This is typically used by optimization algorithms that calculate better parameter values based on training data.