Class ShapEModel<T>
Shap-E model for text-to-3D and image-to-3D generation with implicit neural representations.
public class ShapEModel<T> : ThreeDDiffusionModelBase<T>, ILatentDiffusionModel<T>, I3DDiffusionModel<T>, IDiffusionModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>
Type Parameters
T: The numeric type used for calculations.
Examples
// Create a Shap-E model
var shapE = new ShapEModel<float>();

// Generate a 3D shape from text
var latent = shapE.GenerateLatent(
    prompt: "A wooden chair",
    numInferenceSteps: 64,
    guidanceScale: 15.0);

// Render from a specific view
var image = shapE.RenderView(latent, cameraPosition: (0, 0, 2), lookAt: (0, 0, 0));

// Export to mesh
var (vertices, faces) = shapE.ExtractMesh(latent, resolution: 64);

// Or use the high-level API
var mesh = shapE.GenerateMesh(
    prompt: "A red sports car",
    resolution: 128);
ExportToOBJ(mesh, "car.obj");
Remarks
Shap-E is OpenAI's model for generating 3D objects as implicit neural representations (NeRFs). Unlike Point-E which generates point clouds, Shap-E generates parameters for a neural network that represents the 3D shape, which can then be rendered from any angle or converted to meshes.
For Beginners: Shap-E creates 3D objects that you can view from any angle:
What is an Implicit Neural Representation (NeRF)?
- A neural network that knows the 3D shape
- Input: 3D coordinates (x, y, z)
- Output: Color and density at that point
- Can render views from ANY angle without artifacts
| Feature | Point-E | Shap-E |
|---|---|---|
| Output | Point cloud | Neural field |
| Quality | Good | Better |
| Rendering | Fast | Slower |
| Mesh export | Reconstruction | Direct SDF |
| Memory | Lower | Higher |
Example: "A red chair"
- Shap-E generates network weights (latent representation)
- These weights define a neural network
- Query (x,y,z) -> neural network -> color, density
- Render from any view or extract mesh via marching cubes
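The query step above can be sketched with a toy implicit field. This is a minimal illustration only, not Shap-E's actual network: a hard-coded sphere SDF stands in for the neural network that the generated latent would define.

```csharp
// Toy implicit representation: a sphere of radius 0.5 at the origin.
// In Shap-E this function is a small neural network whose weights come
// from the generated latent; here it is hard-coded for illustration.
static (double R, double G, double B, double Density) QueryField(
    double x, double y, double z)
{
    // Signed distance to the sphere surface (negative inside).
    double sdf = Math.Sqrt(x * x + y * y + z * z) - 0.5;

    // Convert distance to a soft density: high inside, near zero outside.
    double density = 1.0 / (1.0 + Math.Exp(sdf * 50.0));

    // A constant red color for every point of this toy shape.
    return (1.0, 0.0, 0.0, density);
}

// Query a point just inside the surface: density comes out close to 1.
var (r, g, b, d) = QueryField(0.0, 0.0, 0.45);
```

A volumetric renderer repeats this query along each camera ray and accumulates color weighted by density; marching cubes repeats it on a 3D grid.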
Use cases:
- High-quality 3D assets
- Novel view synthesis
- Direct mesh export with SDF
- View-consistent 3D models
Technical specifications:
- Latent dimension: 1024 parameters per shape
- Output: NeRF weights or SDF (Signed Distance Function)
- Rendering: Differentiable volumetric rendering
- Mesh export: Marching cubes on SDF
- Inference: ~64 steps
Constructors
ShapEModel()
Initializes a new Shap-E model with default parameters.
public ShapEModel()
ShapEModel(DiffusionModelOptions<T>?, INoiseScheduler<T>?, DiTNoisePredictor<T>?, IConditioningModule<T>?, bool, int, int?)
Initializes a new Shap-E model with custom parameters.
public ShapEModel(DiffusionModelOptions<T>? options = null, INoiseScheduler<T>? scheduler = null, DiTNoisePredictor<T>? latentPredictor = null, IConditioningModule<T>? conditioner = null, bool useSDFMode = true, int defaultPointCount = 4096, int? seed = null)
Parameters
options (DiffusionModelOptions<T>): Configuration options.
scheduler (INoiseScheduler<T>): Optional custom scheduler.
latentPredictor (DiTNoisePredictor<T>): Optional custom latent predictor.
conditioner (IConditioningModule<T>): Optional conditioning module.
useSDFMode (bool): Whether to use SDF mode.
defaultPointCount (int): Default point count for point cloud extraction.
seed (int?): Optional random seed.
Properties
Conditioner
Gets the conditioning module (optional, for conditioned generation).
public override IConditioningModule<T>? Conditioner { get; }
Property Value
- IConditioningModule<T>?
LatentChannels
Gets the number of latent channels.
public override int LatentChannels { get; }
Property Value
- int
Remarks
Typically 4 for Stable Diffusion models.
LatentDimension
Gets the latent dimension.
public int LatentDimension { get; }
Property Value
- int
NoisePredictor
Gets the noise predictor model (U-Net, DiT, etc.).
public override INoisePredictor<T> NoisePredictor { get; }
Property Value
- INoisePredictor<T>
ParameterCount
Gets the number of parameters in the model.
public override int ParameterCount { get; }
Property Value
- int
Remarks
This property returns the total count of trainable parameters in the model. It's useful for understanding model complexity and memory requirements.
SupportsMesh
Gets whether this model supports mesh generation.
public override bool SupportsMesh { get; }
Property Value
- bool
SupportsNovelView
Gets whether this model supports novel view synthesis.
public override bool SupportsNovelView { get; }
Property Value
- bool
SupportsPointCloud
Gets whether this model supports point cloud generation.
public override bool SupportsPointCloud { get; }
Property Value
- bool
SupportsScoreDistillation
Gets whether this model supports score distillation sampling (SDS).
public override bool SupportsScoreDistillation { get; }
Property Value
- bool
Remarks
SDS uses gradients from a 2D diffusion model to optimize a 3D representation. This is the technique behind DreamFusion and similar text-to-3D methods.
SupportsTexture
Gets whether this model supports texture generation.
public override bool SupportsTexture { get; }
Property Value
- bool
UseSDFMode
Gets whether this model uses SDF mode.
public bool UseSDFMode { get; }
Property Value
- bool
VAE
Gets the VAE model used for encoding and decoding.
public override IVAEModel<T> VAE { get; }
Property Value
- IVAEModel<T>
Methods
Clone()
Creates a deep copy of the model.
public override IDiffusionModel<T> Clone()
Returns
- IDiffusionModel<T>
A new instance with the same parameters.
DeepCopy()
Creates a deep copy of this object.
public override IFullModel<T, Tensor<T>, Tensor<T>> DeepCopy()
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
ExtractMesh(Tensor<T>, int)
Extracts a mesh from the latent using marching cubes.
public virtual (Tensor<T> Vertices, Tensor<T> Faces) ExtractMesh(Tensor<T> latent, int resolution = 64)
Parameters
latent (Tensor<T>): Shape latent representation.
resolution (int): Grid resolution for marching cubes.
Returns
- (Tensor<T> Vertices, Tensor<T> Faces)
Vertex positions and triangle face indices.
Remarks
For Beginners: This converts the neural representation to a triangle mesh:
Marching cubes algorithm:
- Create a 3D grid of points
- Evaluate SDF (signed distance) at each point
- Find where surface crosses grid cells (SDF = 0)
- Generate triangles for those crossings
Higher resolution = more triangles = more detail but slower
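The grid-classification stage of marching cubes can be sketched as below. This is a simplified illustration, not the library's implementation: it uses a hard-coded sphere SDF in place of Shap-E's neural SDF, and it only counts the cells that straddle the surface; the edge/triangle lookup tables that emit actual triangles are omitted.

```csharp
// Evaluate an SDF on a regular grid over [-1, 1]^3 and find cells whose
// corner signs differ -- those are the cells the surface passes through.
static int CountSurfaceCells(int resolution)
{
    // Toy SDF: a sphere of radius 0.5 (negative inside, positive outside).
    Func<double, double, double, double> sdf =
        (x, y, z) => Math.Sqrt(x * x + y * y + z * z) - 0.5;

    double step = 2.0 / resolution;
    int surfaceCells = 0;

    for (int i = 0; i < resolution; i++)
    for (int j = 0; j < resolution; j++)
    for (int k = 0; k < resolution; k++)
    {
        // Check the SDF sign at the 8 corners of this cell.
        bool anyInside = false, anyOutside = false;
        for (int c = 0; c < 8; c++)
        {
            double x = -1.0 + (i + (c & 1)) * step;
            double y = -1.0 + (j + ((c >> 1) & 1)) * step;
            double z = -1.0 + (k + ((c >> 2) & 1)) * step;
            if (sdf(x, y, z) < 0) anyInside = true; else anyOutside = true;
        }

        // A mixed-sign cell straddles the SDF = 0 surface and would
        // emit triangles in the full algorithm.
        if (anyInside && anyOutside) surfaceCells++;
    }
    return surfaceCells;
}
```

Doubling the resolution roughly quadruples the number of surface cells (the surface is 2D), which is why higher resolution yields more triangles and longer extraction times.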
GenerateLatent(string, string?, int, double, int?)
Generates a latent representation of a 3D shape from text.
public virtual Tensor<T> GenerateLatent(string prompt, string? negativePrompt = null, int numInferenceSteps = 64, double guidanceScale = 15, int? seed = null)
Parameters
prompt (string): Text description of the 3D object.
negativePrompt (string): Optional negative prompt.
numInferenceSteps (int): Number of denoising steps.
guidanceScale (double): Classifier-free guidance scale.
seed (int?): Optional random seed.
Returns
- Tensor<T>
Latent tensor representing the 3D shape [1, 1, latentDim].
Remarks
For Beginners: The latent is a compressed representation of the 3D shape. It contains the "recipe" for rendering the object from any angle.
After generating a latent, you can:
- Render views with RenderView()
- Extract a mesh with ExtractMesh()
- Get a point cloud with SamplePointCloud()
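All three follow-up calls reuse the same latent, so the expensive diffusion step runs only once. A sketch against the API documented on this page (prompt, camera, and resolution values are illustrative):

```csharp
var shapE = new ShapEModel<float>();

// One diffusion run produces the latent...
var latent = shapE.GenerateLatent(
    prompt: "A ceramic mug",
    numInferenceSteps: 64,
    guidanceScale: 15.0,
    seed: 42);

// ...which then serves three downstream uses without re-running diffusion.

// 1. Render a view from the front.
var front = shapE.RenderView(latent, cameraPosition: (0, 0, 2), lookAt: (0, 0, 0));

// 2. Extract a triangle mesh.
var (vertices, faces) = shapE.ExtractMesh(latent, resolution: 64);

// 3. Sample a colored point cloud [1, numPoints, 6].
var points = shapE.SamplePointCloud(latent, numPoints: 4096);
```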
GenerateLatentFromImage(Tensor<T>, int, double, int?)
Generates a latent from an image.
public virtual Tensor<T> GenerateLatentFromImage(Tensor<T> image, int numInferenceSteps = 64, double guidanceScale = 3, int? seed = null)
Parameters
image (Tensor<T>): Input image [batch, channels, height, width].
numInferenceSteps (int): Number of denoising steps.
guidanceScale (double): Guidance scale.
seed (int?): Optional random seed.
Returns
- Tensor<T>
Latent tensor representing the 3D shape.
GenerateMesh(string, string?, int, int, double, int?)
Generates a mesh directly from a text prompt.
public override Mesh3D<T> GenerateMesh(string prompt, string? negativePrompt = null, int resolution = 64, int numInferenceSteps = 64, double guidanceScale = 15, int? seed = null)
Parameters
prompt (string): Text description of the 3D object.
negativePrompt (string): Optional negative prompt.
resolution (int): Mesh resolution.
numInferenceSteps (int): Number of denoising steps.
guidanceScale (double): Guidance scale.
seed (int?): Optional random seed.
Returns
- Mesh3D<T>
Mesh3D containing vertices and faces.
GeneratePointCloud(string, string?, int?, int, double, int?)
Generates a point cloud from a text description.
public override Tensor<T> GeneratePointCloud(string prompt, string? negativePrompt = null, int? numPoints = null, int numInferenceSteps = 64, double guidanceScale = 15, int? seed = null)
Parameters
prompt (string): Text description of the desired 3D object.
negativePrompt (string): What to avoid.
numPoints (int?): Number of points in the cloud.
numInferenceSteps (int): Number of denoising steps.
guidanceScale (double): Classifier-free guidance scale.
seed (int?): Optional random seed.
Returns
- Tensor<T>
Point cloud tensor [batch, numPoints, 3] for XYZ coordinates.
Remarks
For Beginners: This creates a cloud of 3D points that form a shape:
- prompt: "A chair" → 4096 points arranged in a chair shape
- The points define the surface of the object
- Can be converted to a mesh for rendering
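Reading coordinates back out of the returned [batch, numPoints, 3] tensor might look like the sketch below. The multi-dimensional indexer syntax on Tensor<T> is an assumption, not confirmed by this page; check the Tensor<T> documentation for the actual access API.

```csharp
// Assumes Tensor<T> exposes an indexer of the form cloud[batch, point, axis].
var shapE = new ShapEModel<float>();
var cloud = shapE.GeneratePointCloud(prompt: "A chair", numPoints: 4096);

for (int i = 0; i < 4096; i++)
{
    float x = cloud[0, i, 0];
    float y = cloud[0, i, 1];
    float z = cloud[0, i, 2];
    // ... feed (x, y, z) into a renderer or surface-reconstruction step
}
```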
GetParameters()
Gets the parameters that can be optimized.
public override Vector<T> GetParameters()
Returns
- Vector<T>
RenderView(Tensor<T>, (double x, double y, double z), (double x, double y, double z), int, int)
Renders a view of the shape from a camera position.
public virtual Tensor<T> RenderView(Tensor<T> latent, (double x, double y, double z) cameraPosition, (double x, double y, double z) lookAt, int imageSize = 256, int numSamples = 64)
Parameters
latent (Tensor<T>): Shape latent representation.
cameraPosition ((double X, double Y, double Z)): Camera position (x, y, z).
lookAt ((double X, double Y, double Z)): Look-at target (x, y, z).
imageSize (int): Output image size.
numSamples (int): Number of ray samples for rendering.
Returns
- Tensor<T>
Rendered image tensor [1, 3, imageSize, imageSize].
Remarks
For Beginners: This renders what the 3D object looks like from a specific viewpoint:
- cameraPosition: Where the "camera" is located in 3D space
- lookAt: What point the camera is looking at
Example:
- cameraPosition: (0, 0, 2) - camera is 2 units in front
- lookAt: (0, 0, 0) - looking at the center
- Result: Front view of the object
Change cameraPosition to render from different angles!
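For a turntable-style set of views, move the camera around a circle while keeping lookAt fixed at the center. A sketch using the RenderView signature above (radius and frame count are illustrative):

```csharp
// Render 8 views orbiting the object at radius 2, looking at the origin.
var shapE = new ShapEModel<float>();
var latent = shapE.GenerateLatent(prompt: "A wooden chair");

var frames = new List<Tensor<float>>();
for (int i = 0; i < 8; i++)
{
    // Evenly spaced angles around the vertical axis.
    double angle = 2 * Math.PI * i / 8;
    var camera = (x: 2 * Math.Sin(angle), y: 0.0, z: 2 * Math.Cos(angle));

    frames.Add(shapE.RenderView(latent, camera, lookAt: (0, 0, 0)));
}
```

Because the latent is a view-independent representation, every frame is consistent with the others by construction.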
SamplePointCloud(Tensor<T>, int?, int?)
Samples a point cloud from the shape.
public virtual Tensor<T> SamplePointCloud(Tensor<T> latent, int? numPoints = null, int? seed = null)
Parameters
latent (Tensor<T>): Shape latent representation.
numPoints (int?): Number of points to sample.
seed (int?): Optional random seed.
Returns
- Tensor<T>
Point cloud tensor [1, numPoints, 6] with XYZ + RGB.
SetParameters(Vector<T>)
Sets the model parameters.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): The parameter vector to set.
Remarks
This method allows direct modification of the model's internal parameters, which is useful for optimization algorithms that update parameters iteratively.
If the length of parameters does not match ParameterCount, an ArgumentException is thrown.
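A typical optimizer loop reads the parameter vector, updates it, and writes it back. A sketch against the API on this page; the Length member on Vector<T> is assumed:

```csharp
var shapE = new ShapEModel<float>();

// Read the current parameter vector.
Vector<float> parameters = shapE.GetParameters();

// The vector length must equal ParameterCount, or SetParameters throws
// an ArgumentException. (Assumes Vector<T> exposes a Length member.)
if (parameters.Length != shapE.ParameterCount)
    throw new InvalidOperationException("Unexpected parameter count.");

// ... an optimizer would update `parameters` here ...

// Write the (possibly updated) vector back into the model.
shapE.SetParameters(parameters);
```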
Exceptions
- ArgumentException
Thrown when the length of parameters does not match ParameterCount.