Class MVDreamModel<T>
MVDream - Multi-View Diffusion Model for 3D-consistent image generation.
public class MVDreamModel<T> : ThreeDDiffusionModelBase<T>, ILatentDiffusionModel<T>, I3DDiffusionModel<T>, IDiffusionModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>
Type Parameters
T: The numeric type used for calculations.
- Inheritance: ThreeDDiffusionModelBase<T> → MVDreamModel<T>
Examples
// Create MVDream model
var mvdream = new MVDreamModel<float>();
// Generate 4 views of an object
var views = mvdream.GenerateMultiView(
    prompt: "A cute robot toy",
    numViews: 4,
    numInferenceSteps: 50,
    guidanceScale: 7.5);
// Compute gradients for Score Distillation Sampling (SDS)
var sdsGradient = mvdream.ComputeScoreDistillationGradients(
    renderedViews: nerfRenderOutput,
    prompt: "A cute robot toy",
    timestep: 500,
    guidanceScale: 100.0);
Remarks
MVDream is a multi-view diffusion model that generates 3D-consistent images from multiple viewpoints simultaneously. It enables high-quality 3D generation by leveraging multi-view supervision during training.
Key capabilities:
- Multi-View Generation: Generate multiple consistent views of an object
- Text-to-3D: Create 3D content from text descriptions
- Image-to-3D: Convert single images to 3D representations
- Score Distillation Sampling (SDS): Guide NeRF/3DGS optimization
- Novel View Synthesis: Generate unseen viewpoints of objects
For Beginners: MVDream creates multiple images of the same object from different angles, all consistent with each other:
Example: "A red sports car"
- Front view: shows the car's front grille and headlights
- Side view: shows the profile with wheels and doors
- Back view: shows tail lights and rear design
- All views are consistent (same car, same color, same features)
This consistency is what enables 3D reconstruction:
- Multiple views + triangulation = 3D model
- Can be used with Score Distillation for high-quality 3D
Technical specifications:
- Image resolution: 256x256 per view (default)
- Number of views: 4 (orthogonal) or 8 (comprehensive)
- Latent channels: 4 (Stable Diffusion compatible)
- Context dimension: 1024 (CLIP/T5 embeddings)
- Camera model: spherical coordinates (azimuth, elevation, radius)
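The camera model can be made concrete with a small standalone helper. The sketch below is the standard spherical-to-Cartesian conversion under an assumed Y-up convention; it is illustrative only and not part of the MVDreamModel API.
// Illustrative helper (not part of the API): convert the spherical camera
// coordinates (azimuth, elevation, radius) into a Cartesian camera position,
// assuming a Y-up world.
static (double X, double Y, double Z) CameraPosition(double azimuthDeg, double elevationDeg, double radius)
{
    double az = azimuthDeg * Math.PI / 180.0;
    double el = elevationDeg * Math.PI / 180.0;
    double x = radius * Math.Cos(el) * Math.Cos(az);
    double y = radius * Math.Sin(el);
    double z = radius * Math.Cos(el) * Math.Sin(az);
    return (x, y, z);
}
// Four orthogonal viewpoints at 30-degree elevation and the default radius of 1.5
for (int i = 0; i < 4; i++)
{
    var camera = CameraPosition(azimuthDeg: i * 90.0, elevationDeg: 30.0, radius: 1.5);
}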
Constructors
MVDreamModel()
Initializes a new MVDream model with default parameters.
public MVDreamModel()
MVDreamModel(DiffusionModelOptions<T>?, INoiseScheduler<T>?, MultiViewUNet<T>?, StandardVAE<T>?, IConditioningModule<T>?, IConditioningModule<T>?, MVDreamConfig?, int?)
Initializes a new MVDream model with custom parameters.
public MVDreamModel(DiffusionModelOptions<T>? options = null, INoiseScheduler<T>? scheduler = null, MultiViewUNet<T>? multiViewUNet = null, StandardVAE<T>? imageVAE = null, IConditioningModule<T>? textConditioner = null, IConditioningModule<T>? imageConditioner = null, MVDreamConfig? config = null, int? seed = null)
Parameters
options (DiffusionModelOptions<T>): Configuration options for the diffusion model.
scheduler (INoiseScheduler<T>): Optional custom scheduler.
multiViewUNet (MultiViewUNet<T>): Optional custom multi-view U-Net.
imageVAE (StandardVAE<T>): Optional custom image VAE.
textConditioner (IConditioningModule<T>): Optional text conditioning module.
imageConditioner (IConditioningModule<T>): Optional image conditioning module.
config (MVDreamConfig): Model configuration.
seed (int?): Optional random seed.
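All parameters are optional, so in the simplest case only a seed needs to be supplied; any component left as null falls back to its default. A minimal sketch:
// Every argument is optional; components left as null use their defaults.
// A fixed seed makes generation reproducible.
var mvdream = new MVDreamModel<float>(seed: 42);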
Fields
DEFAULT_CAMERA_DISTANCE
Default camera distance from object.
public const double DEFAULT_CAMERA_DISTANCE = 1.5
Field Value
- double
MVDREAM_BASE_CHANNELS
MVDream base channels for U-Net.
public const int MVDREAM_BASE_CHANNELS = 320
Field Value
- int
MVDREAM_CONTEXT_DIM
Context dimension for text/image conditioning.
public const int MVDREAM_CONTEXT_DIM = 1024
Field Value
- int
MVDREAM_IMAGE_SIZE
Default image resolution for each view.
public const int MVDREAM_IMAGE_SIZE = 256
Field Value
- int
MVDREAM_LATENT_CHANNELS
MVDream latent channels (Stable Diffusion compatible).
public const int MVDREAM_LATENT_CHANNELS = 4
Field Value
- int
Properties
CameraEmbedding
Gets the camera embedding module.
public CameraEmbedding<T> CameraEmbedding { get; }
Property Value
- CameraEmbedding<T>
Conditioner
Gets the conditioning module (optional, for conditioned generation).
public override IConditioningModule<T>? Conditioner { get; }
Property Value
- IConditioningModule<T>
Config
Gets the model configuration.
public MVDreamConfig Config { get; }
Property Value
- MVDreamConfig
ImageConditioner
Gets the image conditioner for image-to-3D tasks.
public IConditioningModule<T>? ImageConditioner { get; }
Property Value
- IConditioningModule<T>
LatentChannels
Gets the number of latent channels.
public override int LatentChannels { get; }
Property Value
- int
Remarks
Typically 4 for Stable Diffusion models.
NoisePredictor
Gets the noise predictor model (U-Net, DiT, etc.).
public override INoisePredictor<T> NoisePredictor { get; }
Property Value
- INoisePredictor<T>
ParameterCount
Gets the number of parameters in the model.
public override int ParameterCount { get; }
Property Value
- int
Remarks
This property returns the total count of trainable parameters in the model. It's useful for understanding model complexity and memory requirements.
SupportsMesh
Gets whether this model supports mesh generation.
public override bool SupportsMesh { get; }
Property Value
- bool
SupportsNovelView
Gets whether this model supports novel view synthesis.
public override bool SupportsNovelView { get; }
Property Value
- bool
SupportsPointCloud
Gets whether this model supports point cloud generation.
public override bool SupportsPointCloud { get; }
Property Value
- bool
SupportsScoreDistillation
Gets whether this model supports score distillation sampling (SDS).
public override bool SupportsScoreDistillation { get; }
Property Value
- bool
Remarks
SDS uses gradients from a 2D diffusion model to optimize a 3D representation. This is the technique behind DreamFusion and similar text-to-3D methods.
SupportsTexture
Gets whether this model supports texture generation.
public override bool SupportsTexture { get; }
Property Value
- bool
VAE
Gets the VAE model used for encoding and decoding.
public override IVAEModel<T> VAE { get; }
Property Value
- IVAEModel<T>
Methods
Clone()
Creates a deep copy of the model.
public override IDiffusionModel<T> Clone()
Returns
- IDiffusionModel<T>
A new instance with the same parameters.
ComputeScoreDistillationGradients(Tensor<T>, string, int, double)
Computes Score Distillation Sampling gradients for 3D optimization.
public override Tensor<T> ComputeScoreDistillationGradients(Tensor<T> renderedViews, string prompt, int timestep, double guidanceScale = 100)
Parameters
renderedViews (Tensor<T>): Rendered views from the 3D representation.
prompt (string): Text description guiding the 3D content.
timestep (int): Diffusion timestep controlling the noise level.
guidanceScale (double): CFG scale (typically 100 for SDS).
Returns
- Tensor<T>
Gradient tensor for backpropagation.
Remarks
For Beginners: Score Distillation uses the diffusion model's knowledge to guide 3D optimization:
- Render your 3D model from multiple angles
- Add noise to the renders
- Ask the diffusion model what noise it "sees"
- The difference tells you how to improve the 3D model
High guidance scale (100) strongly pushes toward the text description.
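The sketch below shows how these gradients might drive an optimization loop. The nerf object and its RenderViews/ApplyGradients methods are hypothetical placeholders for whichever NeRF or 3D Gaussian Splatting implementation is being optimized; only the ComputeScoreDistillationGradients call reflects this class's API.
// 'nerf' is a hypothetical stand-in for the 3D representation being optimized;
// RenderViews and ApplyGradients are placeholder methods, not part of this library.
var random = new Random();
for (int iter = 0; iter < 1000; iter++)
{
    // 1. Render the current 3D model from several camera poses
    Tensor<float> renderedViews = nerf.RenderViews(numViews: 4);

    // 2. Sample a random diffusion timestep (noise level) for this iteration
    int timestep = random.Next(20, 980);

    // 3. Ask MVDream how the noisy renders should change to match the prompt
    Tensor<float> grad = mvdream.ComputeScoreDistillationGradients(
        renderedViews: renderedViews,
        prompt: "A cute robot toy",
        timestep: timestep,
        guidanceScale: 100.0);

    // 4. Backpropagate the gradient through the renderer to update the 3D model
    nerf.ApplyGradients(grad);
}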
DeepCopy()
Creates a deep copy of this object.
public override IFullModel<T, Tensor<T>, Tensor<T>> DeepCopy()
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
GenerateMultiView(string, string?, int, int, double, double, int?)
Generates multiple consistent views from a text prompt.
public virtual Tensor<T>[] GenerateMultiView(string prompt, string? negativePrompt = null, int numViews = 4, int numInferenceSteps = 50, double guidanceScale = 7.5, double elevation = 30, int? seed = null)
Parameters
prompt (string): Text description of the object.
negativePrompt (string): Optional negative prompt.
numViews (int): Number of views to generate (4 or 8).
numInferenceSteps (int): Number of denoising steps.
guidanceScale (double): Classifier-free guidance scale.
elevation (double): Camera elevation angle in degrees.
seed (int?): Optional random seed.
Returns
- Tensor<T>[]
Array of generated view images [numViews, channels, height, width].
Remarks
For Beginners: This generates multiple pictures of an object from different angles, all looking consistent:
- numViews=4: Front, right, back, left (90-degree spacing)
- numViews=8: Adds diagonal views (45-degree spacing)
Higher elevation = looking more from above
- 0 degrees: eye level
- 30 degrees: looking down at 30-degree angle
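For example, eight views with 45-degree azimuth spacing, seen from slightly above:
var views = mvdream.GenerateMultiView(
    prompt: "A red sports car",
    negativePrompt: "blurry, low quality",
    numViews: 8,               // 45-degree spacing around the object
    numInferenceSteps: 50,
    guidanceScale: 7.5,
    elevation: 15.0,           // camera slightly above eye level
    seed: 123);
// views.Length == 8; each element is one rendered viewpoint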
GenerateNovelViewsFromImage(Tensor<T>, int, int, double, int?)
Generates novel views from a single input image.
public virtual Tensor<T>[] GenerateNovelViewsFromImage(Tensor<T> inputImage, int numViews = 4, int numInferenceSteps = 50, double guidanceScale = 3, int? seed = null)
Parameters
inputImage (Tensor<T>): Input image tensor.
numViews (int): Number of views to generate.
numInferenceSteps (int): Number of denoising steps.
guidanceScale (double): Guidance scale.
seed (int?): Optional random seed.
Returns
- Tensor<T>[]
Array of generated views.
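A short usage sketch, assuming inputImage already holds a preprocessed image tensor loaded elsewhere:
// inputImage is assumed to be loaded and preprocessed elsewhere
Tensor<float>[] novelViews = mvdream.GenerateNovelViewsFromImage(
    inputImage: inputImage,
    numViews: 4,
    numInferenceSteps: 50,
    guidanceScale: 3.0,
    seed: 7);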
GetParameters()
Gets the parameters that can be optimized.
public override Vector<T> GetParameters()
Returns
- Vector<T>
ImageTo3D(Tensor<T>, int, int, double, int?)
Generates 3D from a single input image.
public override Mesh3D<T> ImageTo3D(Tensor<T> inputImage, int numViews = 4, int numInferenceSteps = 50, double guidanceScale = 3, int? seed = null)
Parameters
inputImage (Tensor<T>): Input image tensor [1, channels, height, width].
numViews (int): Number of novel views to generate.
numInferenceSteps (int): Number of denoising steps.
guidanceScale (double): Classifier-free guidance scale.
seed (int?): Optional random seed.
Returns
- Mesh3D<T>
Generated 3D mesh.
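For example, turning a single image into a mesh (again assuming inputImage is a preprocessed [1, channels, height, width] tensor):
Mesh3D<float> mesh = mvdream.ImageTo3D(
    inputImage: inputImage,
    numViews: 4,
    numInferenceSteps: 50,
    guidanceScale: 3.0);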
SetParameters(Vector<T>)
Sets the model parameters.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): The parameter vector to set.
Remarks
This method allows direct modification of the model's internal parameters, which is useful for optimization algorithms that update parameters iteratively. If the length of parameters does not match ParameterCount, an ArgumentException is thrown.
Exceptions
- ArgumentException
Thrown when the length of parameters does not match ParameterCount.
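A minimal sketch of the get/modify/set round trip an optimizer might perform, assuming Vector<T> exposes a Length property and an indexer (adjust to the actual Vector<T> API):
// Read the current parameters, apply a small update, and write them back.
Vector<float> parameters = mvdream.GetParameters();
for (int i = 0; i < parameters.Length; i++)
{
    parameters[i] *= 0.999f;   // e.g. a weight-decay style update
}
mvdream.SetParameters(parameters);   // throws ArgumentException if the length changed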