Class ShapEModel<T>

Namespace
AiDotNet.Diffusion.Models
Assembly
AiDotNet.dll

Shap-E model for text-to-3D and image-to-3D generation with implicit neural representations.

public class ShapEModel<T> : ThreeDDiffusionModelBase<T>, ILatentDiffusionModel<T>, I3DDiffusionModel<T>, IDiffusionModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
ThreeDDiffusionModelBase<T>
ShapEModel<T>
Implements
ILatentDiffusionModel<T>
I3DDiffusionModel<T>
IDiffusionModel<T>
IFullModel<T, Tensor<T>, Tensor<T>>
IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>
IModelSerializer
ICheckpointableModel
IParameterizable<T, Tensor<T>, Tensor<T>>
IFeatureAware
IFeatureImportance<T>
ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>
IGradientComputable<T, Tensor<T>, Tensor<T>>
IJitCompilable<T>

Examples

// Create a Shap-E model
var shapE = new ShapEModel<float>();

// Generate a 3D shape from text
var latent = shapE.GenerateLatent(
    prompt: "A wooden chair",
    numInferenceSteps: 64,
    guidanceScale: 15.0);

// Render from a specific view
var image = shapE.RenderView(latent, cameraPosition: (0, 0, 2), lookAt: (0, 0, 0));

// Export to mesh
var (vertices, faces) = shapE.ExtractMesh(latent, resolution: 64);

// Or use the high-level API
var mesh = shapE.GenerateMesh(
    prompt: "A red sports car",
    resolution: 128);

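// ExportToOBJ is a user-supplied helper, not part of the AiDotNet API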
ExportToOBJ(mesh, "car.obj");

Remarks

Shap-E is OpenAI's model for generating 3D objects as implicit neural representations (NeRFs). Unlike Point-E which generates point clouds, Shap-E generates parameters for a neural network that represents the 3D shape, which can then be rendered from any angle or converted to meshes.

For Beginners: Shap-E creates 3D objects that you can view from any angle:

What is an Implicit Neural Representation (NeRF)?

  • A neural network that knows the 3D shape
  • Input: 3D coordinates (x, y, z)
  • Output: Color and density at that point
  • Can render views from ANY angle without artifacts
Feature     | Point-E        | Shap-E
----------- | -------------- | ------------
Output      | Point cloud    | Neural field
Quality     | Good           | Better
Rendering   | Fast           | Slower
Mesh export | Reconstruction | Direct SDF
Memory      | Lower          | Higher

Example: "A red chair"

  1. Shap-E generates network weights (latent representation)
  2. These weights define a neural network
  3. Query (x,y,z) -> neural network -> color, density
  4. Render from any view or extract mesh via marching cubes

Use cases:

  • High-quality 3D assets
  • Novel view synthesis
  • Direct mesh export with SDF
  • View-consistent 3D models

Technical specifications:

  • Latent dimension: 1024 parameters per shape
  • Output: NeRF weights or SDF (Signed Distance Function)
  • Rendering: Differentiable volumetric rendering
  • Mesh export: Marching cubes on SDF
  • Inference: ~64 steps

Constructors

ShapEModel()

Initializes a new Shap-E model with default parameters.

public ShapEModel()

ShapEModel(DiffusionModelOptions<T>?, INoiseScheduler<T>?, DiTNoisePredictor<T>?, IConditioningModule<T>?, bool, int, int?)

Initializes a new Shap-E model with custom parameters.

public ShapEModel(DiffusionModelOptions<T>? options = null, INoiseScheduler<T>? scheduler = null, DiTNoisePredictor<T>? latentPredictor = null, IConditioningModule<T>? conditioner = null, bool useSDFMode = true, int defaultPointCount = 4096, int? seed = null)

Parameters

options DiffusionModelOptions<T>

Configuration options.

scheduler INoiseScheduler<T>

Optional custom scheduler.

latentPredictor DiTNoisePredictor<T>

Optional custom latent predictor.

conditioner IConditioningModule<T>

Optional conditioning module.

useSDFMode bool

Whether to use SDF mode.

defaultPointCount int

Default point count for point cloud extraction.

seed int?

Optional random seed.
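For example, a minimal construction sketch; the option values are illustrative, not recommended defaults:

// Construct in SDF mode with a larger point budget and a fixed seed
var shapE = new ShapEModel<float>(
    useSDFMode: true,
    defaultPointCount: 8192,
    seed: 42);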

Properties

Conditioner

Gets the conditioning module (optional, for conditioned generation).

public override IConditioningModule<T>? Conditioner { get; }

Property Value

IConditioningModule<T>

LatentChannels

Gets the number of latent channels.

public override int LatentChannels { get; }

Property Value

int

Remarks

Typically 4 for Stable Diffusion models.

LatentDimension

Gets the latent dimension.

public int LatentDimension { get; }

Property Value

int

NoisePredictor

Gets the noise predictor model (U-Net, DiT, etc.).

public override INoisePredictor<T> NoisePredictor { get; }

Property Value

INoisePredictor<T>

ParameterCount

Gets the number of parameters in the model.

public override int ParameterCount { get; }

Property Value

int

Remarks

This property returns the total count of trainable parameters in the model. It's useful for understanding model complexity and memory requirements.

SupportsMesh

Gets whether this model supports mesh generation.

public override bool SupportsMesh { get; }

Property Value

bool

SupportsNovelView

Gets whether this model supports novel view synthesis.

public override bool SupportsNovelView { get; }

Property Value

bool

SupportsPointCloud

Gets whether this model supports point cloud generation.

public override bool SupportsPointCloud { get; }

Property Value

bool

SupportsScoreDistillation

Gets whether this model supports score distillation sampling (SDS).

public override bool SupportsScoreDistillation { get; }

Property Value

bool

Remarks

SDS uses gradients from a 2D diffusion model to optimize a 3D representation. This is the technique behind DreamFusion and similar text-to-3D methods.

SupportsTexture

Gets whether this model supports texture generation.

public override bool SupportsTexture { get; }

Property Value

bool

UseSDFMode

Gets whether this model uses SDF mode.

public bool UseSDFMode { get; }

Property Value

bool

VAE

Gets the VAE model used for encoding and decoding.

public override IVAEModel<T> VAE { get; }

Property Value

IVAEModel<T>

Methods

Clone()

Creates a deep copy of the model.

public override IDiffusionModel<T> Clone()

Returns

IDiffusionModel<T>

A new instance with the same parameters.

DeepCopy()

Creates a deep copy of this object.

public override IFullModel<T, Tensor<T>, Tensor<T>> DeepCopy()

Returns

IFullModel<T, Tensor<T>, Tensor<T>>

ExtractMesh(Tensor<T>, int)

Extracts a mesh from the latent using marching cubes.

public virtual (Tensor<T> Vertices, Tensor<T> Faces) ExtractMesh(Tensor<T> latent, int resolution = 64)

Parameters

latent Tensor<T>

Shape latent representation.

resolution int

Grid resolution for marching cubes.

Returns

(Tensor<T> Vertices, Tensor<T> Faces)

Tuple of vertices [numVerts, 3] and faces [numFaces, 3].

Remarks

For Beginners: This converts the neural representation to a triangle mesh:

Marching cubes algorithm:

  1. Create a 3D grid of points
  2. Evaluate SDF (signed distance) at each point
  3. Find where surface crosses grid cells (SDF = 0)
  4. Generate triangles for those crossings

Higher resolution = more triangles = more detail but slower
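A sketch of the trade-off, assuming a latent already produced by GenerateLatent (the resolutions are illustrative):

// Coarse but fast preview mesh
var (previewVerts, previewFaces) = shapE.ExtractMesh(latent, resolution: 32);

// Detailed final mesh; 128^3 is 8x more grid cells than the default 64^3
var (finalVerts, finalFaces) = shapE.ExtractMesh(latent, resolution: 128);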

GenerateLatent(string, string?, int, double, int?)

Generates a latent representation of a 3D shape from text.

public virtual Tensor<T> GenerateLatent(string prompt, string? negativePrompt = null, int numInferenceSteps = 64, double guidanceScale = 15, int? seed = null)

Parameters

prompt string

Text description of the 3D object.

negativePrompt string

Optional negative prompt.

numInferenceSteps int

Number of denoising steps.

guidanceScale double

Classifier-free guidance scale.

seed int?

Optional random seed.

Returns

Tensor<T>

Latent tensor representing the 3D shape [1, 1, latentDim].

Remarks

For Beginners: The latent is a compressed representation of the 3D shape. It contains the "recipe" for rendering the object from any angle.

After generating a latent, you can:

  • Render views with RenderView()
  • Extract a mesh with ExtractMesh()
  • Get a point cloud with SamplePointCloud()
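For example (prompt and seed are illustrative), generate the latent once and reuse it:

// Run diffusion once to get the latent
var latent = shapE.GenerateLatent(prompt: "A ceramic teapot", seed: 123);

// Reuse the same latent for multiple outputs without re-running diffusion
var frontView = shapE.RenderView(latent, cameraPosition: (0, 0, 2), lookAt: (0, 0, 0));
var (vertices, faces) = shapE.ExtractMesh(latent, resolution: 64);
var pointCloud = shapE.SamplePointCloud(latent, numPoints: 4096);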

GenerateLatentFromImage(Tensor<T>, int, double, int?)

Generates a latent from an image.

public virtual Tensor<T> GenerateLatentFromImage(Tensor<T> image, int numInferenceSteps = 64, double guidanceScale = 3, int? seed = null)

Parameters

image Tensor<T>

Input image [batch, channels, height, width].

numInferenceSteps int

Number of denoising steps.

guidanceScale double

Guidance scale.

seed int?

Optional random seed.

Returns

Tensor<T>

Latent tensor representing the 3D shape.
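A minimal usage sketch; LoadImage is a hypothetical helper standing in for whatever produces a [1, 3, height, width] image tensor:

// LoadImage is a hypothetical helper returning a [1, 3, H, W] tensor
Tensor<float> image = LoadImage("chair.png");

// Image conditioning defaults to a lower guidance scale than text (3 vs 15)
var latent = shapE.GenerateLatentFromImage(image, numInferenceSteps: 64, guidanceScale: 3.0);

// The latent can then be rendered or meshed like a text-generated one
var view = shapE.RenderView(latent, cameraPosition: (0, 0, 2), lookAt: (0, 0, 0));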

GenerateMesh(string, string?, int, int, double, int?)

Generates a mesh directly from a text prompt.

public override Mesh3D<T> GenerateMesh(string prompt, string? negativePrompt = null, int resolution = 64, int numInferenceSteps = 64, double guidanceScale = 15, int? seed = null)

Parameters

prompt string

Text description of the 3D object.

negativePrompt string

Optional negative prompt.

resolution int

Mesh resolution.

numInferenceSteps int

Number of denoising steps.

guidanceScale double

Guidance scale.

seed int?

Optional random seed.

Returns

Mesh3D<T>

Mesh3D containing vertices and faces.

GeneratePointCloud(string, string?, int?, int, double, int?)

Generates a point cloud from a text description.

public override Tensor<T> GeneratePointCloud(string prompt, string? negativePrompt = null, int? numPoints = null, int numInferenceSteps = 64, double guidanceScale = 15, int? seed = null)

Parameters

prompt string

Text description of the desired 3D object.

negativePrompt string

Optional negative prompt describing what to avoid.

numPoints int?

Number of points in the cloud.

numInferenceSteps int

Number of denoising steps.

guidanceScale double

Classifier-free guidance scale.

seed int?

Optional random seed.

Returns

Tensor<T>

Point cloud tensor [batch, numPoints, 3] for XYZ coordinates.

Remarks

For Beginners: This creates a cloud of 3D points that form a shape:

  • prompt: "A chair" → 4096 points arranged in a chair shape
  • The points define the surface of the object
  • Can be converted to a mesh for rendering
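For example (prompt and point count are illustrative):

var cloud = shapE.GeneratePointCloud(
    prompt: "A chair",
    numPoints: 4096);

// cloud has shape [1, 4096, 3]: one XYZ coordinate per point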

GetParameters()

Gets the parameters that can be optimized.

public override Vector<T> GetParameters()

Returns

Vector<T>

RenderView(Tensor<T>, (double x, double y, double z), (double x, double y, double z), int, int)

Renders a view of the shape from a camera position.

public virtual Tensor<T> RenderView(Tensor<T> latent, (double x, double y, double z) cameraPosition, (double x, double y, double z) lookAt, int imageSize = 256, int numSamples = 64)

Parameters

latent Tensor<T>

Shape latent representation.

cameraPosition (double x, double y, double z)

Camera position (x, y, z).

lookAt (double x, double y, double z)

Look-at target (x, y, z).

imageSize int

Output image size.

numSamples int

Number of ray samples for rendering.

Returns

Tensor<T>

Rendered image tensor [1, 3, imageSize, imageSize].

Remarks

For Beginners: This renders what the 3D object looks like from a specific viewpoint:

  • cameraPosition: Where the "camera" is located in 3D space
  • lookAt: What point the camera is looking at

Example:

  • cameraPosition: (0, 0, 2) - camera is 2 units in front
  • lookAt: (0, 0, 0) - looking at the center
  • Result: Front view of the object

Change cameraPosition to render from different angles!
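For instance, the sketch below orbits the camera around the object; the radius, height, and view count are illustrative:

// Render 8 views evenly spaced around the vertical axis
for (int i = 0; i < 8; i++)
{
    double angle = 2 * Math.PI * i / 8;
    var image = shapE.RenderView(
        latent,
        cameraPosition: (2 * Math.Cos(angle), 0.5, 2 * Math.Sin(angle)),
        lookAt: (0, 0, 0));
    // save or display each view here
}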

SamplePointCloud(Tensor<T>, int?, int?)

Samples a point cloud from the shape.

public virtual Tensor<T> SamplePointCloud(Tensor<T> latent, int? numPoints = null, int? seed = null)

Parameters

latent Tensor<T>

Shape latent representation.

numPoints int?

Number of points to sample.

seed int?

Optional random seed.

Returns

Tensor<T>

Point cloud tensor [1, numPoints, 6] with XYZ + RGB.
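For example (point count and seed are illustrative):

var points = shapE.SamplePointCloud(latent, numPoints: 2048, seed: 7);

// points has shape [1, 2048, 6]:
// columns 0-2 hold XYZ position, columns 3-5 hold RGB color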

SetParameters(Vector<T>)

Sets the model parameters.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

The parameter vector to set.

Remarks

This method allows direct modification of the model's internal parameters, which is useful for optimization algorithms that update parameters iteratively. If the length of parameters does not match ParameterCount, an ArgumentException is thrown.

Exceptions

ArgumentException

Thrown when the length of parameters does not match ParameterCount.
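A minimal round-trip sketch:

// Read the current parameters
var parameters = shapE.GetParameters();

// An optimizer would modify `parameters` here; its length must
// still equal shapE.ParameterCount when written back
shapE.SetParameters(parameters);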