
Class MVDreamModel<T>

Namespace: AiDotNet.Diffusion.Models
Assembly: AiDotNet.dll

MVDream - Multi-View Diffusion Model for 3D-consistent image generation.

public class MVDreamModel<T> : ThreeDDiffusionModelBase<T>, ILatentDiffusionModel<T>, I3DDiffusionModel<T>, IDiffusionModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
ThreeDDiffusionModelBase<T>
MVDreamModel<T>
Implements
IFullModel<T, Tensor<T>, Tensor<T>>
IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>
IParameterizable<T, Tensor<T>, Tensor<T>>
ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>
IGradientComputable<T, Tensor<T>, Tensor<T>>

Examples

// Create MVDream model
var mvdream = new MVDreamModel<float>();

// Generate 4 views of an object
var views = mvdream.GenerateMultiView(
    prompt: "A cute robot toy",
    numViews: 4,
    numInferenceSteps: 50,
    guidanceScale: 7.5);

// Compute gradients for score distillation (SDS)
// (nerfRenderOutput: views rendered from the NeRF being optimized)
var sdsGradient = mvdream.ComputeScoreDistillationGradients(
    renderedViews: nerfRenderOutput,
    prompt: "A cute robot toy",
    timestep: 500,
    guidanceScale: 100.0);

Remarks

MVDream is a multi-view diffusion model that generates 3D-consistent images from multiple viewpoints simultaneously. It enables high-quality 3D generation by leveraging multi-view supervision during training.

Key capabilities:

  1. Multi-View Generation: Generate multiple consistent views of an object
  2. Text-to-3D: Create 3D content from text descriptions
  3. Image-to-3D: Convert single images to 3D representations
  4. Score Distillation Sampling (SDS): Guide NeRF/3DGS optimization
  5. Novel View Synthesis: Generate unseen viewpoints of objects

For Beginners: MVDream creates multiple images of the same object from different angles, all consistent with each other:

Example: "A red sports car"

  • Front view: shows the car's front grille and headlights
  • Side view: shows the profile with wheels and doors
  • Back view: shows tail lights and rear design
  • All views are consistent (same car, same color, same features)

This consistency is what enables 3D reconstruction:

  • Multiple views + triangulation = 3D model
  • Can be used with Score Distillation for high-quality 3D

Technical specifications:

  • Image resolution: 256x256 per view (default)
  • Number of views: 4 (orthogonal) or 8 (comprehensive)
  • Latent channels: 4 (Stable Diffusion compatible)
  • Context dimension: 1024 (CLIP/T5 embeddings)
  • Camera model: Spherical coordinates (azimuth, elevation, radius)
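The spherical camera parameterization listed above maps naturally to Cartesian camera positions. The helper below is an illustrative sketch of the standard conversion (it is not part of the library; the method name and Y-up convention are assumptions):

```csharp
// Hypothetical helper: converts spherical camera parameters (azimuth and
// elevation in degrees, radius in scene units) to a Cartesian position.
// Assumes a Y-up coordinate system; elevation tilts the camera above the
// horizontal plane.
public static (double X, double Y, double Z) CameraPosition(
    double azimuthDeg, double elevationDeg, double radius = 1.5)
{
    double az = azimuthDeg * Math.PI / 180.0;
    double el = elevationDeg * Math.PI / 180.0;

    double x = radius * Math.Cos(el) * Math.Sin(az);
    double y = radius * Math.Sin(el);
    double z = radius * Math.Cos(el) * Math.Cos(az);
    return (x, y, z);
}
```

With the default radius of 1.5 (DEFAULT_CAMERA_DISTANCE), four views at 90-degree azimuth spacing place the camera on a ring around the object.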

Constructors

MVDreamModel()

Initializes a new MVDream model with default parameters.

public MVDreamModel()

MVDreamModel(DiffusionModelOptions<T>?, INoiseScheduler<T>?, MultiViewUNet<T>?, StandardVAE<T>?, IConditioningModule<T>?, IConditioningModule<T>?, MVDreamConfig?, int?)

Initializes a new MVDream model with custom parameters.

public MVDreamModel(DiffusionModelOptions<T>? options = null, INoiseScheduler<T>? scheduler = null, MultiViewUNet<T>? multiViewUNet = null, StandardVAE<T>? imageVAE = null, IConditioningModule<T>? textConditioner = null, IConditioningModule<T>? imageConditioner = null, MVDreamConfig? config = null, int? seed = null)

Parameters

options DiffusionModelOptions<T>

Configuration options for the diffusion model.

scheduler INoiseScheduler<T>

Optional custom scheduler.

multiViewUNet MultiViewUNet<T>

Optional custom multi-view U-Net.

imageVAE StandardVAE<T>

Optional custom image VAE.

textConditioner IConditioningModule<T>

Optional text conditioning module.

imageConditioner IConditioningModule<T>

Optional image conditioning module.

config MVDreamConfig

Model configuration.

seed int?

Optional random seed.

Fields

DEFAULT_CAMERA_DISTANCE

Default camera distance from object.

public const double DEFAULT_CAMERA_DISTANCE = 1.5

Field Value

double

MVDREAM_BASE_CHANNELS

MVDream base channels for U-Net.

public const int MVDREAM_BASE_CHANNELS = 320

Field Value

int

MVDREAM_CONTEXT_DIM

Context dimension for text/image conditioning.

public const int MVDREAM_CONTEXT_DIM = 1024

Field Value

int

MVDREAM_IMAGE_SIZE

Default image resolution for each view.

public const int MVDREAM_IMAGE_SIZE = 256

Field Value

int

MVDREAM_LATENT_CHANNELS

MVDream latent channels (Stable Diffusion compatible).

public const int MVDREAM_LATENT_CHANNELS = 4

Field Value

int

Properties

CameraEmbedding

Gets the camera embedding module.

public CameraEmbedding<T> CameraEmbedding { get; }

Property Value

CameraEmbedding<T>

Conditioner

Gets the conditioning module (optional, for conditioned generation).

public override IConditioningModule<T>? Conditioner { get; }

Property Value

IConditioningModule<T>

Config

Gets the model configuration.

public MVDreamConfig Config { get; }

Property Value

MVDreamConfig

ImageConditioner

Gets the image conditioner for image-to-3D tasks.

public IConditioningModule<T>? ImageConditioner { get; }

Property Value

IConditioningModule<T>

LatentChannels

Gets the number of latent channels.

public override int LatentChannels { get; }

Property Value

int

Remarks

Typically 4 for Stable Diffusion models.

NoisePredictor

Gets the noise predictor model (U-Net, DiT, etc.).

public override INoisePredictor<T> NoisePredictor { get; }

Property Value

INoisePredictor<T>

ParameterCount

Gets the number of parameters in the model.

public override int ParameterCount { get; }

Property Value

int

Remarks

This property returns the total count of trainable parameters in the model. It's useful for understanding model complexity and memory requirements.

SupportsMesh

Gets whether this model supports mesh generation.

public override bool SupportsMesh { get; }

Property Value

bool

SupportsNovelView

Gets whether this model supports novel view synthesis.

public override bool SupportsNovelView { get; }

Property Value

bool

SupportsPointCloud

Gets whether this model supports point cloud generation.

public override bool SupportsPointCloud { get; }

Property Value

bool

SupportsScoreDistillation

Gets whether this model supports score distillation sampling (SDS).

public override bool SupportsScoreDistillation { get; }

Property Value

bool

Remarks

SDS uses gradients from a 2D diffusion model to optimize a 3D representation. This is the technique behind DreamFusion and similar text-to-3D methods.

SupportsTexture

Gets whether this model supports texture generation.

public override bool SupportsTexture { get; }

Property Value

bool

VAE

Gets the VAE model used for encoding and decoding.

public override IVAEModel<T> VAE { get; }

Property Value

IVAEModel<T>

Methods

Clone()

Creates a deep copy of the model.

public override IDiffusionModel<T> Clone()

Returns

IDiffusionModel<T>

A new instance with the same parameters.

ComputeScoreDistillationGradients(Tensor<T>, string, int, double)

Computes Score Distillation Sampling gradients for 3D optimization.

public override Tensor<T> ComputeScoreDistillationGradients(Tensor<T> renderedViews, string prompt, int timestep, double guidanceScale = 100)

Parameters

renderedViews Tensor<T>

Rendered views from 3D representation.

prompt string

Text description guiding the 3D optimization.

timestep int

Diffusion timestep for noise level.

guidanceScale double

CFG scale (typically 100 for SDS).

Returns

Tensor<T>

Gradient tensor for backpropagation.

Remarks

For Beginners: Score Distillation uses the diffusion model's knowledge to guide 3D optimization:

  1. Render your 3D model from multiple angles
  2. Add noise to the renders
  3. Ask the diffusion model what noise it "sees"
  4. The difference tells you how to improve the 3D model

A high guidance scale (typically 100 for SDS, versus roughly 7.5 for ordinary image generation) strongly pushes the result toward the text description.
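The four steps above correspond to a typical SDS optimization loop. A minimal sketch follows; note that my3DModel, RenderViews, and ApplyGradients are placeholders for whatever 3D representation and optimizer you use, not part of this API:

```csharp
var mvdream = new MVDreamModel<float>();
var rng = new Random(42);

for (int iter = 0; iter < 1000; iter++)
{
    // 1. Render the current 3D representation from several camera angles.
    Tensor<float> renderedViews = my3DModel.RenderViews(numViews: 4); // placeholder renderer

    // 2-4. Noise injection and the noise-prediction comparison happen inside
    // ComputeScoreDistillationGradients; sampling a random timestep varies
    // the noise level across iterations.
    int timestep = rng.Next(20, 980);
    Tensor<float> grad = mvdream.ComputeScoreDistillationGradients(
        renderedViews: renderedViews,
        prompt: "A cute robot toy",
        timestep: timestep,
        guidanceScale: 100.0);

    // Backpropagate the gradient into the 3D parameters (placeholder optimizer).
    my3DModel.ApplyGradients(grad);
}
```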

DeepCopy()

Creates a deep copy of this object.

public override IFullModel<T, Tensor<T>, Tensor<T>> DeepCopy()

Returns

IFullModel<T, Tensor<T>, Tensor<T>>

GenerateMultiView(string, string?, int, int, double, double, int?)

Generates multiple consistent views from a text prompt.

public virtual Tensor<T>[] GenerateMultiView(string prompt, string? negativePrompt = null, int numViews = 4, int numInferenceSteps = 50, double guidanceScale = 7.5, double elevation = 30, int? seed = null)

Parameters

prompt string

Text description of the object.

negativePrompt string

Optional negative prompt.

numViews int

Number of views to generate (4 or 8).

numInferenceSteps int

Number of denoising steps.

guidanceScale double

Classifier-free guidance scale.

elevation double

Camera elevation angle in degrees.

seed int?

Optional random seed.

Returns

Tensor<T>[]

Array of numViews generated view images, each of shape [channels, height, width].

Remarks

For Beginners: This generates multiple pictures of an object from different angles, all looking consistent:

  • numViews=4: Front, right, back, left (90-degree spacing)
  • numViews=8: Adds diagonal views (45-degree spacing)

Higher elevation means the camera looks down at the object from a steeper angle:

  • 0 degrees: eye level
  • 30 degrees: looking down at a 30-degree angle
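For instance, an eight-view turntable at a 30-degree elevation, using the parameter names from the signature above:

```csharp
var mvdream = new MVDreamModel<float>();

// Eight views at 45-degree azimuth spacing, camera tilted 30 degrees above eye level.
Tensor<float>[] views = mvdream.GenerateMultiView(
    prompt: "A red sports car",
    negativePrompt: "blurry, low quality",
    numViews: 8,
    numInferenceSteps: 50,
    guidanceScale: 7.5,
    elevation: 30,
    seed: 42);
```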

GenerateNovelViewsFromImage(Tensor<T>, int, int, double, int?)

Generates novel views from a single input image.

public virtual Tensor<T>[] GenerateNovelViewsFromImage(Tensor<T> inputImage, int numViews = 4, int numInferenceSteps = 50, double guidanceScale = 3, int? seed = null)

Parameters

inputImage Tensor<T>

Input image tensor.

numViews int

Number of views to generate.

numInferenceSteps int

Number of denoising steps.

guidanceScale double

Guidance scale.

seed int?

Optional random seed.

Returns

Tensor<T>[]

Array of generated views.

GetParameters()

Gets the parameters that can be optimized.

public override Vector<T> GetParameters()

Returns

Vector<T>

ImageTo3D(Tensor<T>, int, int, double, int?)

Generates 3D from a single input image.

public override Mesh3D<T> ImageTo3D(Tensor<T> inputImage, int numViews = 4, int numInferenceSteps = 50, double guidanceScale = 3, int? seed = null)

Parameters

inputImage Tensor<T>

Input image tensor [1, channels, height, width].

numViews int

Number of novel views to generate.

numInferenceSteps int

Number of denoising steps.

guidanceScale double

Classifier-free guidance scale.

seed int?

Optional random seed.

Returns

Mesh3D<T>

Generated 3D mesh.
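A short usage sketch for this method; the image-loading helper is a placeholder (how you obtain the [1, channels, height, width] tensor is up to your pipeline):

```csharp
var mvdream = new MVDreamModel<float>();

// inputImage: a single RGB image tensor [1, 3, 256, 256].
Tensor<float> inputImage = LoadImageTensor("robot.png"); // placeholder loader

Mesh3D<float> mesh = mvdream.ImageTo3D(
    inputImage: inputImage,
    numViews: 4,
    numInferenceSteps: 50,
    guidanceScale: 3,
    seed: 123);
```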

SetParameters(Vector<T>)

Sets the model parameters.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

The parameter vector to set.

Remarks

This method allows direct modification of the model's internal parameters, which is useful for optimization algorithms that update parameters iteratively. If the length of parameters does not match ParameterCount, an ArgumentException is thrown.
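A minimal round-trip sketch of an external optimizer updating the model through GetParameters/SetParameters. The Vector<T> indexer and Length property are assumed here, and someGradient is a placeholder for a gradient vector of matching length:

```csharp
var mvdream = new MVDreamModel<float>();

// Read, modify, and write back the full parameter vector, as an external
// optimizer would. The vector length must equal ParameterCount.
Vector<float> parameters = mvdream.GetParameters();

for (int i = 0; i < parameters.Length; i++)      // Length is assumed here;
    parameters[i] -= 0.001f * someGradient[i];   // someGradient is a placeholder

mvdream.SetParameters(parameters);               // throws ArgumentException on length mismatch
```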

Exceptions

ArgumentException

Thrown when the length of parameters does not match ParameterCount.