Table of Contents

Namespace AiDotNet.Diffusion.Models

Classes

AnimateDiffModel<T>

AnimateDiff model for text-to-video and image-to-video generation.

AudioLDM2Model<T>

AudioLDM 2 - Enhanced Audio Latent Diffusion Model with dual text encoders.

AudioLDMModel<T>

Audio Latent Diffusion Model (AudioLDM) for text-to-audio generation.

CameraEmbedding<T>

Camera position embedding for view conditioning.

CameraPose

Camera pose for rendering.

CameraPoseEncoder<T>

Encodes camera pose (polar, azimuth, radius) into embeddings.
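A (polar, azimuth, radius) pose is a point in spherical coordinates; encoders like this typically convert it to a Cartesian camera position and apply a sinusoidal embedding. A minimal sketch of that idea (function name, conventions, and embedding size are illustrative assumptions, not the AiDotNet API):

```python
import numpy as np

def camera_pose_features(polar, azimuth, radius, dim=8):
    """Toy sketch: map a (polar, azimuth, radius) camera pose to a
    feature vector. Assumes a y-up convention; not the library's API."""
    # Spherical -> Cartesian camera position.
    x = radius * np.sin(polar) * np.cos(azimuth)
    y = radius * np.cos(polar)
    z = radius * np.sin(polar) * np.sin(azimuth)
    pos = np.array([x, y, z])
    # NeRF-style sinusoidal encoding of each coordinate.
    freqs = 2.0 ** np.arange(dim // 2)
    enc = np.concatenate([np.sin(pos[:, None] * freqs),
                          np.cos(pos[:, None] * freqs)], axis=1)
    return enc.ravel()

feat = camera_pose_features(np.pi / 2, 0.0, 1.0)  # 3 coords x 8 features
```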

ConsistencyModel<T>

Consistency Model for single-step or few-step image generation.

ControlNetEncoder<T>

ControlNet encoder that processes control signals.

ControlNetModel<T>

ControlNet model for adding spatial conditioning to diffusion models.

DallE3Model<T>

DALL-E 3 style text-to-image model with advanced prompt understanding and high-fidelity image generation.

DiffWaveModel<T>

DiffWave model for high-quality audio waveform synthesis using diffusion.

DiffWaveNetwork<T>

DiffWave neural network with dilated convolutions.

DiffWaveResidualBlock<T>

Residual block for DiffWave with dilated convolution.
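DiffWave stacks residual blocks whose dilation doubles each layer and resets every few layers, so the receptive field grows exponentially per cycle. A quick sketch of that arithmetic (the layer count, cycle length, and kernel size below are typical DiffWave numbers, not necessarily the AiDotNet defaults):

```python
def receptive_field(n_layers=30, cycle=10, kernel_size=3):
    """Receptive field of a stack of dilated convolutions whose
    dilation doubles each layer and resets every `cycle` layers."""
    rf = 1
    for layer in range(n_layers):
        dilation = 2 ** (layer % cycle)
        rf += (kernel_size - 1) * dilation
    return rf

# 30 layers in 3 cycles of dilations 1..512 cover thousands of samples.
print(receptive_field())  # -> 6139
```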

DreamFusionConfig

Configuration for DreamFusion model.

DreamFusionModel<T>

DreamFusion model for text-to-3D generation via Score Distillation Sampling (SDS). Uses a 2D diffusion prior to optimize a 3D neural radiance field representation. Based on "DreamFusion: Text-to-3D using 2D Diffusion" (Poole et al., 2022).
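In Score Distillation Sampling, the gradient pushed into the 3D parameters is w(t) * (eps_hat - eps): the difference between the diffusion prior's noise prediction for a noised render and the noise actually added. A toy sketch of one such update loop, where the "render" is just a flat parameter grid and the prior is a hypothetical stand-in (everything here is illustrative, not the library's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def sds_grad(noise, eps_hat, w=1.0):
    """SDS gradient (Poole et al., 2022): w(t) * (eps_hat - eps).
    A real setup backpropagates this through the renderer into the
    NeRF parameters; here the render IS the parameters."""
    return w * (eps_hat - noise)

params = rng.normal(size=(4, 4))  # stand-in for a differentiable render
for step in range(100):
    noise = rng.normal(size=params.shape)
    # Hypothetical prior that nudges predictions toward zero-mean renders.
    eps_hat = noise + 0.1 * params
    params -= 0.5 * sds_grad(noise, eps_hat)
# params shrink toward the prior's preference (zero) over the iterations.
```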

DreamMesh<T>

Simple mesh representation for DreamFusion.

IPAdapterModel<T>

IP-Adapter model for image-based prompt conditioning in diffusion models.

ImageEncoder<T>

Image encoder for extracting features from reference images.

ImageProjector<T>

Projects image features to text embedding space.

MVDreamConfig

Configuration for MVDream model.

MVDreamModel<T>

MVDream - Multi-View Diffusion Model for 3D-consistent image generation.

MelodyEncoder<T>

Melody encoder for extracting melodic features from audio.

MotionModuleConfig

Configuration for AnimateDiff motion modules.

MultiViewAttention<T>

Multi-view attention module for cross-view consistency.

MultiViewUNet<T>

Multi-view aware U-Net for MVDream.

MusicGenModel<T>

MusicGen - Diffusion-based music generation model with advanced musical controls.

NeRFNetwork<T>

Neural Radiance Field network for 3D representation.

NeRFResult<T>

Result from DreamFusion generation.

PixArtModel<T>

PixArt-α model for efficient high-quality text-to-image generation using DiT architecture.

PixArtOptions<T>

Options for PixArt-α model configuration.

PointEModel<T>

Point-E model for text-to-3D point cloud generation.

PointEModel<T>.PointCounts

Standard Point-E point counts.

RhythmEncoder<T>

Rhythm encoder for extracting beat/rhythm features from audio.

RiffusionModel<T>

Riffusion model for music generation via spectrogram diffusion.

SDXLModel<T>

Stable Diffusion XL (SDXL) model for high-resolution image generation.

SDXLRefiner<T>

SDXL Refiner model for enhancing generated images.

ShapEModel<T>

Shap-E model for text-to-3D and image-to-3D generation with implicit neural representations.

SpectrogramConfig

Configuration for spectrogram generation.

StableVideoDiffusion<T>

Stable Video Diffusion (SVD) model for image-to-video generation.

VideoCrafterModel<T>

VideoCrafter model for high-quality text-to-video and image-to-video generation.

Zero123Model<T>

Zero-1-to-3 model for novel view synthesis from a single image.

Structs

DreamVector3<T>

3D vector type for DreamFusion.

Enums

AudioLDM2Variant

AudioLDM 2 model variant.

ControlType

Types of control signals supported by ControlNet.

MusicGenSize

MusicGen model size variants.