Class OpenSora<T>
- Namespace: AiDotNet.Video.Generation
- Assembly: AiDotNet.dll
OpenSora - Open-source Sora-like video generation model.
public class OpenSora<T> : NeuralNetworkBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable
Type Parameters
- T
The numeric type used for calculations.
- Inheritance
- NeuralNetworkBase<T>
- OpenSora<T>
Remarks
For Beginners: OpenSora generates videos from text descriptions, much as image generation models such as DALL-E or Stable Diffusion generate images from text.
Key capabilities:
- Text-to-Video: Generate videos from text descriptions
- Image-to-Video: Animate still images
- Video continuation: Extend existing videos
- Variable length: Generate videos of different durations
- Multiple aspect ratios: Support various video dimensions
Example prompts:
- "A cat playing with a ball in a sunny garden"
- "Time-lapse of a flower blooming"
- "A spaceship flying through an asteroid field"
Technical Details:
- Spatiotemporal DiT (Diffusion Transformer) architecture
- Variable resolution and duration support
- Efficient 3D attention mechanisms
- Progressive training strategy
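To give a feel for why attention efficiency matters in a spatiotemporal transformer, the sketch below counts query-key pairs for full 3D attention versus a factorized spatial-plus-temporal scheme. This is a generic illustration, not a description of how OpenSora<T> is implemented: the `AttentionCost` class, its method names, and the figure of 256 tokens per frame are assumptions made for the example, and this page does not say which attention layout the class uses.

```csharp
using System;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class AttentionCost
{
    // Query-key pairs when every token attends to every token
    // across all frames (full 3D attention).
    public static long FullPairs(long frames, long tokensPerFrame)
    {
        long totalTokens = frames * tokensPerFrame;
        return totalTokens * totalTokens;
    }

    // Query-key pairs when attention is split into a spatial pass
    // (tokens within one frame) and a temporal pass (the same spatial
    // position across all frames).
    public static long FactorizedPairs(long frames, long tokensPerFrame)
    {
        long spatial = frames * tokensPerFrame * tokensPerFrame;
        long temporal = tokensPerFrame * frames * frames;
        return spatial + temporal;
    }
}
```

With 16 frames (the constructor default for numFrames) and an assumed 256 tokens per frame, `FullPairs` returns 16,777,216 while `FactorizedPairs` returns 1,114,112, roughly a 15x reduction.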
Constructors
OpenSora(NeuralNetworkArchitecture<T>, int, int, int, int, double)
public OpenSora(NeuralNetworkArchitecture<T> architecture, int numFrames = 16, int hiddenDim = 1152, int numLayers = 28, int numInferenceSteps = 50, double guidanceScale = 7.5)
Parameters
- architecture NeuralNetworkArchitecture<T>
- numFrames int
- hiddenDim int
- numLayers int
- numInferenceSteps int
- guidanceScale double
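The constructor parameters are not described on this page. In diffusion models generally, a guidance scale such as the default 7.5 is used for classifier-free guidance, which pushes the prediction away from an unconditional estimate and toward the text-conditioned one. The sketch below shows that standard formula on plain arrays. It assumes, without confirmation from this page, that guidanceScale plays this role in OpenSora<T>; `GuidanceSketch` and `ApplyGuidance` are hypothetical names.

```csharp
using System;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class GuidanceSketch
{
    // Standard classifier-free guidance:
    //   guided = unconditional + scale * (conditional - unconditional)
    // A scale of 1 returns the conditional prediction unchanged;
    // larger values follow the text condition more strongly.
    public static double[] ApplyGuidance(
        double[] unconditional, double[] conditional, double guidanceScale)
    {
        if (unconditional.Length != conditional.Length)
        {
            throw new ArgumentException("Predictions must have the same length.");
        }

        var guided = new double[unconditional.Length];
        for (int i = 0; i < guided.Length; i++)
        {
            guided[i] = unconditional[i]
                + guidanceScale * (conditional[i] - unconditional[i]);
        }
        return guided;
    }
}
```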
Properties
SupportsTraining
Gets whether training is supported.
public override bool SupportsTraining { get; }
Property Value
- bool
Methods
CreateNewInstance()
Creates a new instance of the same type as this neural network.
protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
A new instance of the same neural network type.
Remarks
For Beginners: This creates a blank version of the same type of neural network.
It's used internally by methods like DeepCopy and Clone to create the right type of network before copying the data into it.
DeserializeNetworkSpecificData(BinaryReader)
protected override void DeserializeNetworkSpecificData(BinaryReader reader)
Parameters
- reader BinaryReader
Remarks
Restores all model configuration fields and reinitializes layers to match the deserialized state. This ensures the model structure is properly reconstructed after loading from a serialized format.
ExtendVideo(List<Tensor<T>>, Tensor<T>?, int?)
Extends an existing video.
public List<Tensor<T>> ExtendVideo(List<Tensor<T>> existingFrames, Tensor<T>? textEmbedding = null, int? seed = null)
Parameters
- existingFrames List<Tensor<T>>
- textEmbedding Tensor<T>
- seed int?
Returns
- List<Tensor<T>>
GenerateCustom(Tensor<T>, int, int, int, int?)
Generates a video with a custom frame count and frame size, and therefore a custom duration and aspect ratio.
public List<Tensor<T>> GenerateCustom(Tensor<T> textEmbedding, int numFrames, int height, int width, int? seed = null)
Parameters
- textEmbedding Tensor<T>
- numFrames int
- height int
- width int
- seed int?
Returns
- List<Tensor<T>>
GenerateFromImage(Tensor<T>, Tensor<T>?, int?)
Generates a video from an image (image-to-video).
public List<Tensor<T>> GenerateFromImage(Tensor<T> image, Tensor<T>? textEmbedding = null, int? seed = null)
Parameters
- image Tensor<T>
- textEmbedding Tensor<T>
- seed int?
Returns
- List<Tensor<T>>
GenerateFromText(Tensor<T>, int?)
Generates a video from a text prompt, supplied as a precomputed text embedding.
public List<Tensor<T>> GenerateFromText(Tensor<T> textEmbedding, int? seed = null)
Parameters
- textEmbedding Tensor<T>
Text embedding from encoder [B, 768] or similar.
- seed int?
Random seed for reproducibility.
Returns
- List<Tensor<T>>
Generated video frames.
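The optional seed exists so that the same inputs can reproduce the same video. The sketch below illustrates the general pattern with System.Random: a supplied seed yields an identical noise sequence on every call, while a null seed yields a fresh one. How OpenSora<T> consumes its seed internally is not documented here; `SeededNoise` and `Sample` are hypothetical names used only for this example.

```csharp
using System;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class SeededNoise
{
    // Draws standard-normal samples. With a seed, repeated calls
    // return the same values; without one, each call differs.
    public static double[] Sample(int count, int? seed = null)
    {
        var rng = seed.HasValue ? new Random(seed.Value) : new Random();
        var noise = new double[count];
        for (int i = 0; i < count; i++)
        {
            // Box-Muller transform. NextDouble() is in [0, 1), so
            // 1 - NextDouble() is in (0, 1] and the logarithm is finite.
            double u1 = 1.0 - rng.NextDouble();
            double u2 = rng.NextDouble();
            noise[i] = Math.Sqrt(-2.0 * Math.Log(u1))
                * Math.Cos(2.0 * Math.PI * u2);
        }
        return noise;
    }
}
```

Calling `SeededNoise.Sample(8, 42)` twice returns element-wise identical arrays, which is the property a fixed seed is meant to provide for generation.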
GetModelMetadata()
Gets the metadata for this neural network model.
public override ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
A ModelMetadata<T> object containing information about the model.
InitializeLayers()
Initializes the layers of the neural network based on the architecture.
protected override void InitializeLayers()
Remarks
For Beginners: This method sets up all the layers in your neural network according to the architecture you've defined. It's like assembling the parts of your network before you can use it.
Predict(Tensor<T>)
Performs a single denoising prediction step on the input latents.
public override Tensor<T> Predict(Tensor<T> input)
Parameters
- input Tensor<T>
Input latent tensor [B, C, H, W].
Returns
- Tensor<T>
Predicted denoised output.
SerializeNetworkSpecificData(BinaryWriter)
Serializes network-specific data that is not covered by the general serialization process.
protected override void SerializeNetworkSpecificData(BinaryWriter writer)
Parameters
- writer BinaryWriter
The BinaryWriter to write the data to.
Remarks
This method is called at the end of the general serialization process to allow derived classes to write any additional data specific to their implementation.
For Beginners: Think of this as packing a special compartment in your suitcase. While the main serialization method packs the common items (layers, parameters), this method allows each specific type of neural network to pack its own unique items that other networks might not have.
Train(Tensor<T>, Tensor<T>)
Trains the model using the diffusion training objective.
public override void Train(Tensor<T> input, Tensor<T> expectedOutput)
Parameters
- input Tensor<T>
Clean input video latents.
- expectedOutput Tensor<T>
Target (typically the same as input for diffusion training).
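The target is typically the same as the input because, in a diffusion objective, the training signal is derived from the clean data itself: noise is added to the clean latents and the network learns to predict that noise. The sketch below shows a common DDPM-style formulation on plain arrays. It is an assumption that OpenSora<T> uses this exact loss and noising rule; `DiffusionLossSketch`, `AddNoise`, `NoisePredictionLoss`, and the `alphaBar` parameter are hypothetical names for this example.

```csharp
using System;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class DiffusionLossSketch
{
    // Forward diffusion at one timestep:
    //   noisy = sqrt(alphaBar) * clean + sqrt(1 - alphaBar) * noise
    // alphaBar in [0, 1] is the cumulative signal level for that step.
    public static double[] AddNoise(double[] clean, double[] noise, double alphaBar)
    {
        if (clean.Length != noise.Length)
        {
            throw new ArgumentException("Inputs must have the same length.");
        }

        double signalScale = Math.Sqrt(alphaBar);
        double noiseScale = Math.Sqrt(1.0 - alphaBar);
        var noisy = new double[clean.Length];
        for (int i = 0; i < noisy.Length; i++)
        {
            noisy[i] = signalScale * clean[i] + noiseScale * noise[i];
        }
        return noisy;
    }

    // Mean squared error between the predicted noise and the noise
    // that was actually added.
    public static double NoisePredictionLoss(double[] predictedNoise, double[] trueNoise)
    {
        if (predictedNoise.Length != trueNoise.Length)
        {
            throw new ArgumentException("Inputs must have the same length.");
        }

        double sum = 0.0;
        for (int i = 0; i < trueNoise.Length; i++)
        {
            double diff = predictedNoise[i] - trueNoise[i];
            sum += diff * diff;
        }
        return sum / trueNoise.Length;
    }
}
```

In a training step built this way, the clean latents are noised with `AddNoise`, the network predicts the noise from the result, and `NoisePredictionLoss` scores that prediction.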
UpdateParameters(Vector<T>)
Updates the network's parameters with new values.
public override void UpdateParameters(Vector<T> parameters)
Parameters
- parameters Vector<T>
The new parameter values to set.
Remarks
For Beginners: During training, a neural network's internal values (parameters) get adjusted to improve its performance. This method allows you to update all those values at once by providing a complete set of new parameters.
This is typically used by optimization algorithms that calculate better parameter values based on training data.