Class ShapEModel<T>

Namespace
AiDotNet.Diffusion.Models
Assembly
AiDotNet.dll

Shap-E model for text-to-3D and image-to-3D generation with implicit neural representations.

public class ShapEModel<T> : ThreeDDiffusionModelBase<T>, ILatentDiffusionModel<T>, I3DDiffusionModel<T>, IDiffusionModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
ThreeDDiffusionModelBase<T>
ShapEModel<T>
Implements
ILatentDiffusionModel<T>
I3DDiffusionModel<T>
IDiffusionModel<T>
IFullModel<T, Tensor<T>, Tensor<T>>
IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>
IModelSerializer
ICheckpointableModel
IParameterizable<T, Tensor<T>, Tensor<T>>
IFeatureAware
IFeatureImportance<T>
ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>
IGradientComputable<T, Tensor<T>, Tensor<T>>
IJitCompilable<T>

Examples

// Create a Shap-E model
var shapE = new ShapEModel<float>();

// Generate a 3D shape from text
var latent = shapE.GenerateLatent(
    prompt: "A wooden chair",
    numInferenceSteps: 64,
    guidanceScale: 15.0);

// Render from a specific view
var image = shapE.RenderView(latent, cameraPosition: (0, 0, 2), lookAt: (0, 0, 0));

// Export to mesh
var (vertices, faces) = shapE.ExtractMesh(latent, resolution: 64);

// Or use the high-level API
var mesh = shapE.GenerateMesh(
    prompt: "A red sports car",
    resolution: 128);

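// ExportToOBJ is a user-supplied helper, not part of the AiDotNet API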
ExportToOBJ(mesh, "car.obj");

Remarks

Shap-E is OpenAI's model for generating 3D objects as implicit neural representations (NeRFs). Unlike Point-E which generates point clouds, Shap-E generates parameters for a neural network that represents the 3D shape, which can then be rendered from any angle or converted to meshes.

For Beginners: Shap-E creates 3D objects that you can view from any angle:

What is an Implicit Neural Representation (NeRF)?

  • A neural network that knows the 3D shape
  • Input: 3D coordinates (x, y, z)
  • Output: Color and density at that point
  • Can render views from ANY angle without artifacts
Feature     | Point-E        | Shap-E
----------- | -------------- | ------------
Output      | Point cloud    | Neural field
Quality     | Good           | Better
Rendering   | Fast           | Slower
Mesh export | Reconstruction | Direct SDF
Memory      | Lower          | Higher

Example: "A red chair"

  1. Shap-E generates network weights (latent representation)
  2. These weights define a neural network
  3. Query (x,y,z) -> neural network -> color, density
  4. Render from any view or extract mesh via marching cubes

Use cases:

  • High-quality 3D assets
  • Novel view synthesis
  • Direct mesh export with SDF
  • View-consistent 3D models

Technical specifications:

  • Latent dimension: 1024 parameters per shape
  • Output: NeRF weights or SDF (Signed Distance Function)
  • Rendering: Differentiable volumetric rendering
  • Mesh export: Marching cubes on SDF
  • Inference: ~64 steps

Constructors

ShapEModel()

Initializes a new Shap-E model with default parameters.

public ShapEModel()

ShapEModel(DiffusionModelOptions<T>?, INoiseScheduler<T>?, DiTNoisePredictor<T>?, IConditioningModule<T>?, bool, int, int?)

Initializes a new Shap-E model with custom parameters.

public ShapEModel(DiffusionModelOptions<T>? options = null, INoiseScheduler<T>? scheduler = null, DiTNoisePredictor<T>? latentPredictor = null, IConditioningModule<T>? conditioner = null, bool useSDFMode = true, int defaultPointCount = 4096, int? seed = null)

Parameters

options DiffusionModelOptions<T>

Configuration options.

scheduler INoiseScheduler<T>

Optional custom scheduler.

latentPredictor DiTNoisePredictor<T>

Optional custom latent predictor.

conditioner IConditioningModule<T>

Optional conditioning module.

useSDFMode bool

Whether to use SDF mode.

defaultPointCount int

Default point count for point cloud extraction.

seed int?

Optional random seed.
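For example, a minimal construction sketch; the option values are illustrative, not recommended defaults:

// Construct in SDF mode with a larger point budget and a fixed seed
var shapE = new ShapEModel<float>(
    useSDFMode: true,
    defaultPointCount: 8192,
    seed: 42);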

Properties

Conditioner

Gets the conditioning module (optional, for conditioned generation).

public override IConditioningModule<T>? Conditioner { get; }

Property Value

IConditioningModule<T>

LatentChannels

Gets the number of latent channels.

public override int LatentChannels { get; }

Property Value

int

Remarks

Typically 4 for Stable Diffusion models.

LatentDimension

Gets the latent dimension.

public int LatentDimension { get; }

Property Value

int

NoisePredictor

Gets the noise predictor model (U-Net, DiT, etc.).

public override INoisePredictor<T> NoisePredictor { get; }

Property Value

INoisePredictor<T>

ParameterCount

Gets the number of parameters in the model.

public override int ParameterCount { get; }

Property Value

int

Remarks

This property returns the total count of trainable parameters in the model. It's useful for understanding model complexity and memory requirements.

SupportsMesh

Gets whether this model supports mesh generation.

public override bool SupportsMesh { get; }

Property Value

bool

SupportsNovelView

Gets whether this model supports novel view synthesis.

public override bool SupportsNovelView { get; }

Property Value

bool

SupportsPointCloud

Gets whether this model supports point cloud generation.

public override bool SupportsPointCloud { get; }

Property Value

bool

SupportsScoreDistillation

Gets whether this model supports score distillation sampling (SDS).

public override bool SupportsScoreDistillation { get; }

Property Value

bool

Remarks

SDS uses gradients from a 2D diffusion model to optimize a 3D representation. This is the technique behind DreamFusion and similar text-to-3D methods.

SupportsTexture

Gets whether this model supports texture generation.

public override bool SupportsTexture { get; }

Property Value

bool

UseSDFMode

Gets whether this model uses SDF mode.

public bool UseSDFMode { get; }

Property Value

bool

VAE

Gets the VAE model used for encoding and decoding.

public override IVAEModel<T> VAE { get; }

Property Value

IVAEModel<T>

Methods

Clone()

Creates a deep copy of the model.

public override IDiffusionModel<T> Clone()

Returns

IDiffusionModel<T>

A new instance with the same parameters.

DeepCopy()

Creates a deep copy of this object.

public override IFullModel<T, Tensor<T>, Tensor<T>> DeepCopy()

Returns

IFullModel<T, Tensor<T>, Tensor<T>>

ExtractMesh(Tensor<T>, int)

Extracts a mesh from the latent using marching cubes.

public virtual (Tensor<T> Vertices, Tensor<T> Faces) ExtractMesh(Tensor<T> latent, int resolution = 64)

Parameters

latent Tensor<T>

Shape latent representation.

resolution int

Grid resolution for marching cubes.

Returns

(Tensor<T> Vertices, Tensor<T> Faces)

Tuple of vertices [numVerts, 3] and faces [numFaces, 3].

Remarks

For Beginners: This converts the neural representation to a triangle mesh:

Marching cubes algorithm:

  1. Create a 3D grid of points
  2. Evaluate SDF (signed distance) at each point
  3. Find where surface crosses grid cells (SDF = 0)
  4. Generate triangles for those crossings

Higher resolution = more triangles = more detail but slower
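A sketch of the trade-off, assuming a latent already produced by GenerateLatent (the resolutions are illustrative):

// Coarse but fast preview mesh
var (previewVerts, previewFaces) = shapE.ExtractMesh(latent, resolution: 32);

// Detailed final mesh; 128^3 is 8x more grid cells than the default 64^3
var (finalVerts, finalFaces) = shapE.ExtractMesh(latent, resolution: 128);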

GenerateLatent(string, string?, int, double, int?)

Generates a latent representation of a 3D shape from text.

public virtual Tensor<T> GenerateLatent(string prompt, string? negativePrompt = null, int numInferenceSteps = 64, double guidanceScale = 15, int? seed = null)

Parameters

prompt string

Text description of the 3D object.

negativePrompt string

Optional negative prompt.

numInferenceSteps int

Number of denoising steps.

guidanceScale double

Classifier-free guidance scale.

seed int?

Optional random seed.

Returns

Tensor<T>

Latent tensor representing the 3D shape [1, 1, latentDim].

Remarks

For Beginners: The latent is a compressed representation of the 3D shape. It contains the "recipe" for rendering the object from any angle.

After generating a latent, you can:

  • Render views with RenderView()
  • Extract a mesh with ExtractMesh()
  • Get a point cloud with SamplePointCloud()
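For example (prompt and seed are illustrative), generate the latent once and reuse it:

// Run diffusion once to get the latent
var latent = shapE.GenerateLatent(prompt: "A ceramic teapot", seed: 123);

// Reuse the same latent for multiple outputs without re-running diffusion
var frontView = shapE.RenderView(latent, cameraPosition: (0, 0, 2), lookAt: (0, 0, 0));
var (vertices, faces) = shapE.ExtractMesh(latent, resolution: 64);
var pointCloud = shapE.SamplePointCloud(latent, numPoints: 4096);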

GenerateLatentFromImage(Tensor<T>, int, double, int?)

Generates a latent from an image.

public virtual Tensor<T> GenerateLatentFromImage(Tensor<T> image, int numInferenceSteps = 64, double guidanceScale = 3, int? seed = null)

Parameters

image Tensor<T>

Input image [batch, channels, height, width].

numInferenceSteps int

Number of denoising steps.

guidanceScale double

Guidance scale.

seed int?

Optional random seed.

Returns

Tensor<T>

Latent tensor representing the 3D shape.
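A minimal usage sketch; LoadImage is a hypothetical helper standing in for whatever produces a [1, 3, height, width] image tensor:

// LoadImage is a hypothetical helper returning a [1, 3, H, W] tensor
Tensor<float> image = LoadImage("chair.png");

// Image conditioning defaults to a lower guidance scale than text (3 vs 15)
var latent = shapE.GenerateLatentFromImage(image, numInferenceSteps: 64, guidanceScale: 3.0);

// The latent can then be rendered or meshed like a text-generated one
var view = shapE.RenderView(latent, cameraPosition: (0, 0, 2), lookAt: (0, 0, 0));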

GenerateMesh(string, string?, int, int, double, int?)

Generates a mesh directly from a text prompt.

public override Mesh3D<T> GenerateMesh(string prompt, string? negativePrompt = null, int resolution = 64, int numInferenceSteps = 64, double guidanceScale = 15, int? seed = null)

Parameters

prompt string

Text description of the 3D object.

negativePrompt string

Optional negative prompt.

resolution int

Mesh resolution.

numInferenceSteps int

Number of denoising steps.

guidanceScale double

Guidance scale.

seed int?

Optional random seed.

Returns

Mesh3D<T>

Mesh3D containing vertices and faces.

GeneratePointCloud(string, string?, int?, int, double, int?)

Generates a point cloud from a text description.

public override Tensor<T> GeneratePointCloud(string prompt, string? negativePrompt = null, int? numPoints = null, int numInferenceSteps = 64, double guidanceScale = 15, int? seed = null)

Parameters

prompt string

Text description of the desired 3D object.

negativePrompt string

Optional negative prompt describing what to avoid.

numPoints int?

Number of points in the cloud.

numInferenceSteps int

Number of denoising steps.

guidanceScale double

Classifier-free guidance scale.

seed int?

Optional random seed.

Returns

Tensor<T>

Point cloud tensor [batch, numPoints, 3] for XYZ coordinates.

Remarks

For Beginners: This creates a cloud of 3D points that form a shape:

  • prompt: "A chair" → 4096 points arranged in a chair shape
  • The points define the surface of the object
  • Can be converted to a mesh for rendering
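For example (prompt and point count are illustrative):

var cloud = shapE.GeneratePointCloud(
    prompt: "A chair",
    numPoints: 4096);

// cloud has shape [1, 4096, 3]: one XYZ coordinate per point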

GetParameters()

Gets the parameters that can be optimized.

public override Vector<T> GetParameters()

Returns

Vector<T>

RenderView(Tensor<T>, (double x, double y, double z), (double x, double y, double z), int, int)

Renders a view of the shape from a camera position.

public virtual Tensor<T> RenderView(Tensor<T> latent, (double x, double y, double z) cameraPosition, (double x, double y, double z) lookAt, int imageSize = 256, int numSamples = 64)

Parameters

latent Tensor<T>

Shape latent representation.

cameraPosition (double x, double y, double z)

Camera position (x, y, z).

lookAt (double x, double y, double z)

Look-at target (x, y, z).

imageSize int

Output image size.

numSamples int

Number of ray samples for rendering.

Returns

Tensor<T>

Rendered image tensor [1, 3, imageSize, imageSize].

Remarks

For Beginners: This renders what the 3D object looks like from a specific viewpoint:

  • cameraPosition: Where the "camera" is located in 3D space
  • lookAt: What point the camera is looking at

Example:

  • cameraPosition: (0, 0, 2) - camera is 2 units in front
  • lookAt: (0, 0, 0) - looking at the center
  • Result: Front view of the object

Change cameraPosition to render from different angles!
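For instance, the sketch below orbits the camera around the object; the radius, height, and view count are illustrative:

// Render 8 views evenly spaced around the vertical axis
for (int i = 0; i < 8; i++)
{
    double angle = 2 * Math.PI * i / 8;
    var image = shapE.RenderView(
        latent,
        cameraPosition: (2 * Math.Cos(angle), 0.5, 2 * Math.Sin(angle)),
        lookAt: (0, 0, 0));
    // save or display each view here
}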

SamplePointCloud(Tensor<T>, int?, int?)

Samples a point cloud from the shape.

public virtual Tensor<T> SamplePointCloud(Tensor<T> latent, int? numPoints = null, int? seed = null)

Parameters

latent Tensor<T>

Shape latent representation.

numPoints int?

Number of points to sample.

seed int?

Optional random seed.

Returns

Tensor<T>

Point cloud tensor [1, numPoints, 6] with XYZ + RGB.
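For example (point count and seed are illustrative):

var points = shapE.SamplePointCloud(latent, numPoints: 2048, seed: 7);

// points has shape [1, 2048, 6]:
// columns 0-2 hold XYZ position, columns 3-5 hold RGB color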

SetParameters(Vector<T>)

Sets the model parameters.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

The parameter vector to set.

Remarks

This method allows direct modification of the model's internal parameters, which is useful for optimization algorithms that update parameters iteratively. If the length of parameters does not match ParameterCount, an ArgumentException is thrown.

Exceptions

ArgumentException

Thrown when the length of parameters does not match ParameterCount.
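A minimal round-trip sketch:

// Read the current parameters
var parameters = shapE.GetParameters();

// An optimizer would modify `parameters` here; its length must
// still equal shapE.ParameterCount when written back
shapE.SetParameters(parameters);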