Table of Contents

Namespace AiDotNet.Diffusion.Models

Classes

AnimateDiffModel<T>

AnimateDiff model for text-to-video and image-to-video generation.

AudioLDM2Model<T>

AudioLDM 2 - Enhanced Audio Latent Diffusion Model with dual text encoders.

AudioLDMModel<T>

Audio Latent Diffusion Model (AudioLDM) for text-to-audio generation.

CameraEmbedding<T>

Camera position embedding for view conditioning.

CameraPose

Camera pose for rendering.

CameraPoseEncoder<T>

Encodes camera pose (polar, azimuth, radius) into embeddings.
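A (polar, azimuth, radius) pose is a point in spherical coordinates; encoders like this typically convert it to a Cartesian camera position and apply a sinusoidal embedding. A minimal sketch of that idea (function name, conventions, and embedding size are illustrative assumptions, not the AiDotNet API):

```python
import numpy as np

def camera_pose_features(polar, azimuth, radius, dim=8):
    """Toy sketch: map a (polar, azimuth, radius) camera pose to a
    feature vector. Assumes a y-up convention; not the library's API."""
    # Spherical -> Cartesian camera position.
    x = radius * np.sin(polar) * np.cos(azimuth)
    y = radius * np.cos(polar)
    z = radius * np.sin(polar) * np.sin(azimuth)
    pos = np.array([x, y, z])
    # NeRF-style sinusoidal encoding of each coordinate.
    freqs = 2.0 ** np.arange(dim // 2)
    enc = np.concatenate([np.sin(pos[:, None] * freqs),
                          np.cos(pos[:, None] * freqs)], axis=1)
    return enc.ravel()

feat = camera_pose_features(np.pi / 2, 0.0, 1.0)  # 3 coords x 8 features
```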

ConsistencyModel<T>

Consistency Model for single-step or few-step image generation.

ControlNetEncoder<T>

ControlNet encoder that processes control signals.

ControlNetModel<T>

ControlNet model for adding spatial conditioning to diffusion models.

DallE3Model<T>

DALL-E 3 style text-to-image model with advanced prompt understanding and high-fidelity image generation.

DiffWaveModel<T>

DiffWave model for high-quality audio waveform synthesis using diffusion.

DiffWaveNetwork<T>

DiffWave neural network with dilated convolutions.

DiffWaveResidualBlock<T>

Residual block for DiffWave with dilated convolution.
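DiffWave stacks residual blocks whose dilation doubles each layer and resets every few layers, so the receptive field grows exponentially per cycle. A quick sketch of that arithmetic (the layer count, cycle length, and kernel size below are typical DiffWave numbers, not necessarily the AiDotNet defaults):

```python
def receptive_field(n_layers=30, cycle=10, kernel_size=3):
    """Receptive field of a stack of dilated convolutions whose
    dilation doubles each layer and resets every `cycle` layers."""
    rf = 1
    for layer in range(n_layers):
        dilation = 2 ** (layer % cycle)
        rf += (kernel_size - 1) * dilation
    return rf

# 30 layers in 3 cycles of dilations 1..512 cover thousands of samples.
print(receptive_field())  # -> 6139
```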

DreamFusionConfig

Configuration for DreamFusion model.

DreamFusionModel<T>

DreamFusion model for text-to-3D generation via Score Distillation Sampling (SDS). Uses a 2D diffusion prior to optimize a 3D neural radiance field representation. Based on "DreamFusion: Text-to-3D using 2D Diffusion" (Poole et al., 2022).
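In Score Distillation Sampling, the gradient pushed into the 3D parameters is w(t) * (eps_hat - eps): the difference between the diffusion prior's noise prediction for a noised render and the noise actually added. A toy sketch of one such update loop, where the "render" is just a flat parameter grid and the prior is a hypothetical stand-in (everything here is illustrative, not the library's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def sds_grad(noise, eps_hat, w=1.0):
    """SDS gradient (Poole et al., 2022): w(t) * (eps_hat - eps).
    A real setup backpropagates this through the renderer into the
    NeRF parameters; here the render IS the parameters."""
    return w * (eps_hat - noise)

params = rng.normal(size=(4, 4))  # stand-in for a differentiable render
for step in range(100):
    noise = rng.normal(size=params.shape)
    # Hypothetical prior that nudges predictions toward zero-mean renders.
    eps_hat = noise + 0.1 * params
    params -= 0.5 * sds_grad(noise, eps_hat)
# params shrink toward the prior's preference (zero) over the iterations.
```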

DreamMesh<T>

Simple mesh representation for DreamFusion.

IPAdapterModel<T>

IP-Adapter model for image-based prompt conditioning in diffusion models.

ImageEncoder<T>

Image encoder for extracting features from reference images.

ImageProjector<T>

Projects image features to text embedding space.

MVDreamConfig

Configuration for MVDream model.

MVDreamModel<T>

MVDream - Multi-View Diffusion Model for 3D-consistent image generation.

MelodyEncoder<T>

Melody encoder for extracting melodic features from audio.

MotionModuleConfig

Configuration for AnimateDiff motion modules.

MultiViewAttention<T>

Multi-view attention module for cross-view consistency.

MultiViewUNet<T>

Multi-view aware U-Net for MVDream.

MusicGenModel<T>

MusicGen - Diffusion-based music generation model with advanced musical controls.

NeRFNetwork<T>

Neural Radiance Field network for 3D representation.

NeRFResult<T>

Result from DreamFusion generation.

PixArtModel<T>

PixArt-α model for efficient high-quality text-to-image generation using DiT architecture.

PixArtOptions<T>

Options for PixArt-α model configuration.

PointEModel<T>

Point-E model for text-to-3D point cloud generation.

PointEModel<T>.PointCounts

Standard Point-E point counts.

RhythmEncoder<T>

Rhythm encoder for extracting beat/rhythm features from audio.

RiffusionModel<T>

Riffusion model for music generation via spectrogram diffusion.

SDXLModel<T>

Stable Diffusion XL (SDXL) model for high-resolution image generation.

SDXLRefiner<T>

SDXL Refiner model for enhancing generated images.

ShapEModel<T>

Shap-E model for text-to-3D and image-to-3D generation with implicit neural representations.

SpectrogramConfig

Configuration for spectrogram generation.

StableVideoDiffusion<T>

Stable Video Diffusion (SVD) model for image-to-video generation.

VideoCrafterModel<T>

VideoCrafter model for high-quality text-to-video and image-to-video generation.

Zero123Model<T>

Zero-1-to-3 model for novel view synthesis from a single image.

Structs

DreamVector3<T>

3D vector type for DreamFusion.

Enums

AudioLDM2Variant

AudioLDM 2 model variant.

ControlType

Types of control signals supported by ControlNet.

MusicGenSize

MusicGen model size variants.