
Class MVDreamModel<T>

Namespace: AiDotNet.Diffusion.Models
Assembly: AiDotNet.dll

MVDream - Multi-View Diffusion Model for 3D-consistent image generation.

public class MVDreamModel<T> : ThreeDDiffusionModelBase<T>, ILatentDiffusionModel<T>, I3DDiffusionModel<T>, IDiffusionModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
ThreeDDiffusionModelBase<T>
MVDreamModel<T>
Implements
IFullModel<T, Tensor<T>, Tensor<T>>
IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>
IParameterizable<T, Tensor<T>, Tensor<T>>
ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>
IGradientComputable<T, Tensor<T>, Tensor<T>>

Examples

// Create MVDream model
var mvdream = new MVDreamModel<float>();

// Generate 4 views of an object
var views = mvdream.GenerateMultiView(
    prompt: "A cute robot toy",
    numViews: 4,
    numInferenceSteps: 50,
    guidanceScale: 7.5);

// Compute gradients for score distillation (SDS)
// (nerfRenderOutput: views rendered from the NeRF being optimized)
var sdsGradient = mvdream.ComputeScoreDistillationGradients(
    renderedViews: nerfRenderOutput,
    prompt: "A cute robot toy",
    timestep: 500,
    guidanceScale: 100.0);

Remarks

MVDream is a multi-view diffusion model that generates 3D-consistent images from multiple viewpoints simultaneously. It enables high-quality 3D generation by leveraging multi-view supervision during training.

Key capabilities:

  1. Multi-View Generation: Generate multiple consistent views of an object
  2. Text-to-3D: Create 3D content from text descriptions
  3. Image-to-3D: Convert single images to 3D representations
  4. Score Distillation Sampling (SDS): Guide NeRF/3DGS optimization
  5. Novel View Synthesis: Generate unseen viewpoints of objects

For Beginners: MVDream creates multiple images of the same object from different angles, all consistent with each other:

Example: "A red sports car"

  • Front view: shows the car's front grille and headlights
  • Side view: shows the profile with wheels and doors
  • Back view: shows tail lights and rear design
  • All views are consistent (same car, same color, same features)

This consistency is what enables 3D reconstruction:

  • Multiple views + triangulation = 3D model
  • Can be used with Score Distillation for high-quality 3D

Technical specifications:

  • Image resolution: 256x256 per view (default)
  • Number of views: 4 (orthogonal) or 8 (comprehensive)
  • Latent channels: 4 (Stable Diffusion compatible)
  • Context dimension: 1024 (CLIP/T5 embeddings)
  • Camera model: Spherical coordinates (azimuth, elevation, radius)
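The spherical camera parameterization listed above maps naturally to Cartesian camera positions. The helper below is an illustrative sketch of the standard conversion (it is not part of the library; the method name and Y-up convention are assumptions):

```csharp
// Hypothetical helper: converts spherical camera parameters (azimuth and
// elevation in degrees, radius in scene units) to a Cartesian position.
// Assumes a Y-up coordinate system; elevation tilts the camera above the
// horizontal plane.
public static (double X, double Y, double Z) CameraPosition(
    double azimuthDeg, double elevationDeg, double radius = 1.5)
{
    double az = azimuthDeg * Math.PI / 180.0;
    double el = elevationDeg * Math.PI / 180.0;

    double x = radius * Math.Cos(el) * Math.Sin(az);
    double y = radius * Math.Sin(el);
    double z = radius * Math.Cos(el) * Math.Cos(az);
    return (x, y, z);
}
```

With the default radius of 1.5 (DEFAULT_CAMERA_DISTANCE), four views at 90-degree azimuth spacing place the camera on a ring around the object.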

Constructors

MVDreamModel()

Initializes a new MVDream model with default parameters.

public MVDreamModel()

MVDreamModel(DiffusionModelOptions<T>?, INoiseScheduler<T>?, MultiViewUNet<T>?, StandardVAE<T>?, IConditioningModule<T>?, IConditioningModule<T>?, MVDreamConfig?, int?)

Initializes a new MVDream model with custom parameters.

public MVDreamModel(DiffusionModelOptions<T>? options = null, INoiseScheduler<T>? scheduler = null, MultiViewUNet<T>? multiViewUNet = null, StandardVAE<T>? imageVAE = null, IConditioningModule<T>? textConditioner = null, IConditioningModule<T>? imageConditioner = null, MVDreamConfig? config = null, int? seed = null)

Parameters

options DiffusionModelOptions<T>

Configuration options for the diffusion model.

scheduler INoiseScheduler<T>

Optional custom scheduler.

multiViewUNet MultiViewUNet<T>

Optional custom multi-view U-Net.

imageVAE StandardVAE<T>

Optional custom image VAE.

textConditioner IConditioningModule<T>

Optional text conditioning module.

imageConditioner IConditioningModule<T>

Optional image conditioning module.

config MVDreamConfig

Model configuration.

seed int?

Optional random seed.

Fields

DEFAULT_CAMERA_DISTANCE

Default camera distance from object.

public const double DEFAULT_CAMERA_DISTANCE = 1.5

Field Value

double

MVDREAM_BASE_CHANNELS

MVDream base channels for U-Net.

public const int MVDREAM_BASE_CHANNELS = 320

Field Value

int

MVDREAM_CONTEXT_DIM

Context dimension for text/image conditioning.

public const int MVDREAM_CONTEXT_DIM = 1024

Field Value

int

MVDREAM_IMAGE_SIZE

Default image resolution for each view.

public const int MVDREAM_IMAGE_SIZE = 256

Field Value

int

MVDREAM_LATENT_CHANNELS

MVDream latent channels (Stable Diffusion compatible).

public const int MVDREAM_LATENT_CHANNELS = 4

Field Value

int

Properties

CameraEmbedding

Gets the camera embedding module.

public CameraEmbedding<T> CameraEmbedding { get; }

Property Value

CameraEmbedding<T>

Conditioner

Gets the conditioning module (optional, for conditioned generation).

public override IConditioningModule<T>? Conditioner { get; }

Property Value

IConditioningModule<T>

Config

Gets the model configuration.

public MVDreamConfig Config { get; }

Property Value

MVDreamConfig

ImageConditioner

Gets the image conditioner for image-to-3D tasks.

public IConditioningModule<T>? ImageConditioner { get; }

Property Value

IConditioningModule<T>

LatentChannels

Gets the number of latent channels.

public override int LatentChannels { get; }

Property Value

int

Remarks

Typically 4 for Stable Diffusion models.

NoisePredictor

Gets the noise predictor model (U-Net, DiT, etc.).

public override INoisePredictor<T> NoisePredictor { get; }

Property Value

INoisePredictor<T>

ParameterCount

Gets the number of parameters in the model.

public override int ParameterCount { get; }

Property Value

int

Remarks

This property returns the total count of trainable parameters in the model. It's useful for understanding model complexity and memory requirements.

SupportsMesh

Gets whether this model supports mesh generation.

public override bool SupportsMesh { get; }

Property Value

bool

SupportsNovelView

Gets whether this model supports novel view synthesis.

public override bool SupportsNovelView { get; }

Property Value

bool

SupportsPointCloud

Gets whether this model supports point cloud generation.

public override bool SupportsPointCloud { get; }

Property Value

bool

SupportsScoreDistillation

Gets whether this model supports score distillation sampling (SDS).

public override bool SupportsScoreDistillation { get; }

Property Value

bool

Remarks

SDS uses gradients from a 2D diffusion model to optimize a 3D representation. This is the technique behind DreamFusion and similar text-to-3D methods.

SupportsTexture

Gets whether this model supports texture generation.

public override bool SupportsTexture { get; }

Property Value

bool

VAE

Gets the VAE model used for encoding and decoding.

public override IVAEModel<T> VAE { get; }

Property Value

IVAEModel<T>

Methods

Clone()

Creates a deep copy of the model.

public override IDiffusionModel<T> Clone()

Returns

IDiffusionModel<T>

A new instance with the same parameters.

ComputeScoreDistillationGradients(Tensor<T>, string, int, double)

Computes Score Distillation Sampling gradients for 3D optimization.

public override Tensor<T> ComputeScoreDistillationGradients(Tensor<T> renderedViews, string prompt, int timestep, double guidanceScale = 100)

Parameters

renderedViews Tensor<T>

Rendered views from 3D representation.

prompt string

Text description guiding the 3D optimization.

timestep int

Diffusion timestep for noise level.

guidanceScale double

CFG scale (typically 100 for SDS).

Returns

Tensor<T>

Gradient tensor for backpropagation.

Remarks

For Beginners: Score Distillation uses the diffusion model's knowledge to guide 3D optimization:

  1. Render your 3D model from multiple angles
  2. Add noise to the renders
  3. Ask the diffusion model what noise it "sees"
  4. The difference tells you how to improve the 3D model

A high guidance scale (typically 100 for SDS, versus roughly 7.5 for ordinary image generation) strongly pushes the result toward the text description.
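The four steps above correspond to a typical SDS optimization loop. A minimal sketch follows; note that my3DModel, RenderViews, and ApplyGradients are placeholders for whatever 3D representation and optimizer you use, not part of this API:

```csharp
var mvdream = new MVDreamModel<float>();
var rng = new Random(42);

for (int iter = 0; iter < 1000; iter++)
{
    // 1. Render the current 3D representation from several camera angles.
    Tensor<float> renderedViews = my3DModel.RenderViews(numViews: 4); // placeholder renderer

    // 2-4. Noise injection and the noise-prediction comparison happen inside
    // ComputeScoreDistillationGradients; sampling a random timestep varies
    // the noise level across iterations.
    int timestep = rng.Next(20, 980);
    Tensor<float> grad = mvdream.ComputeScoreDistillationGradients(
        renderedViews: renderedViews,
        prompt: "A cute robot toy",
        timestep: timestep,
        guidanceScale: 100.0);

    // Backpropagate the gradient into the 3D parameters (placeholder optimizer).
    my3DModel.ApplyGradients(grad);
}
```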

DeepCopy()

Creates a deep copy of this object.

public override IFullModel<T, Tensor<T>, Tensor<T>> DeepCopy()

Returns

IFullModel<T, Tensor<T>, Tensor<T>>

GenerateMultiView(string, string?, int, int, double, double, int?)

Generates multiple consistent views from a text prompt.

public virtual Tensor<T>[] GenerateMultiView(string prompt, string? negativePrompt = null, int numViews = 4, int numInferenceSteps = 50, double guidanceScale = 7.5, double elevation = 30, int? seed = null)

Parameters

prompt string

Text description of the object.

negativePrompt string

Optional negative prompt.

numViews int

Number of views to generate (4 or 8).

numInferenceSteps int

Number of denoising steps.

guidanceScale double

Classifier-free guidance scale.

elevation double

Camera elevation angle in degrees.

seed int?

Optional random seed.

Returns

Tensor<T>[]

Array of numViews generated view images, each of shape [channels, height, width].

Remarks

For Beginners: This generates multiple pictures of an object from different angles, all looking consistent:

  • numViews=4: Front, right, back, left (90-degree spacing)
  • numViews=8: Adds diagonal views (45-degree spacing)

Higher elevation means the camera looks down at the object from a steeper angle:

  • 0 degrees: eye level
  • 30 degrees: looking down at a 30-degree angle
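For instance, an eight-view turntable at a 30-degree elevation, using the parameter names from the signature above:

```csharp
var mvdream = new MVDreamModel<float>();

// Eight views at 45-degree azimuth spacing, camera tilted 30 degrees above eye level.
Tensor<float>[] views = mvdream.GenerateMultiView(
    prompt: "A red sports car",
    negativePrompt: "blurry, low quality",
    numViews: 8,
    numInferenceSteps: 50,
    guidanceScale: 7.5,
    elevation: 30,
    seed: 42);
```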

GenerateNovelViewsFromImage(Tensor<T>, int, int, double, int?)

Generates novel views from a single input image.

public virtual Tensor<T>[] GenerateNovelViewsFromImage(Tensor<T> inputImage, int numViews = 4, int numInferenceSteps = 50, double guidanceScale = 3, int? seed = null)

Parameters

inputImage Tensor<T>

Input image tensor.

numViews int

Number of views to generate.

numInferenceSteps int

Number of denoising steps.

guidanceScale double

Guidance scale.

seed int?

Optional random seed.

Returns

Tensor<T>[]

Array of generated views.

GetParameters()

Gets the parameters that can be optimized.

public override Vector<T> GetParameters()

Returns

Vector<T>

ImageTo3D(Tensor<T>, int, int, double, int?)

Generates 3D from a single input image.

public override Mesh3D<T> ImageTo3D(Tensor<T> inputImage, int numViews = 4, int numInferenceSteps = 50, double guidanceScale = 3, int? seed = null)

Parameters

inputImage Tensor<T>

Input image tensor [1, channels, height, width].

numViews int

Number of novel views to generate.

numInferenceSteps int

Number of denoising steps.

guidanceScale double

Classifier-free guidance scale.

seed int?

Optional random seed.

Returns

Mesh3D<T>

Generated 3D mesh.
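A short usage sketch for this method; the image-loading helper is a placeholder (how you obtain the [1, channels, height, width] tensor is up to your pipeline):

```csharp
var mvdream = new MVDreamModel<float>();

// inputImage: a single RGB image tensor [1, 3, 256, 256].
Tensor<float> inputImage = LoadImageTensor("robot.png"); // placeholder loader

Mesh3D<float> mesh = mvdream.ImageTo3D(
    inputImage: inputImage,
    numViews: 4,
    numInferenceSteps: 50,
    guidanceScale: 3,
    seed: 123);
```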

SetParameters(Vector<T>)

Sets the model parameters.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

The parameter vector to set.

Remarks

This method allows direct modification of the model's internal parameters, which is useful for optimization algorithms that update parameters iteratively. If the length of parameters does not match ParameterCount, an ArgumentException is thrown.
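A minimal round-trip sketch of an external optimizer updating the model through GetParameters/SetParameters. The Vector<T> indexer and Length property are assumed here, and someGradient is a placeholder for a gradient vector of matching length:

```csharp
var mvdream = new MVDreamModel<float>();

// Read, modify, and write back the full parameter vector, as an external
// optimizer would. The vector length must equal ParameterCount.
Vector<float> parameters = mvdream.GetParameters();

for (int i = 0; i < parameters.Length; i++)      // Length is assumed here;
    parameters[i] -= 0.001f * someGradient[i];   // someGradient is a placeholder

mvdream.SetParameters(parameters);               // throws ArgumentException on length mismatch
```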

Exceptions

ArgumentException

Thrown when the length of parameters does not match ParameterCount.