Namespace AiDotNet.Diffusion.Models
Classes
- AnimateDiffModel<T>
AnimateDiff model for text-to-video and image-to-video generation.
- AudioLDM2Model<T>
AudioLDM 2 - Enhanced Audio Latent Diffusion Model with dual text encoders.
- AudioLDMModel<T>
Audio Latent Diffusion Model (AudioLDM) for text-to-audio generation.
- CameraEmbedding<T>
Camera position embedding for view conditioning.
- CameraPose
Camera pose for rendering.
- CameraPoseEncoder<T>
Encodes camera pose (polar, azimuth, radius) into embeddings.
- ConsistencyModel<T>
Consistency Model for single-step or few-step image generation.
- ControlNetEncoder<T>
ControlNet encoder that processes control signals.
- ControlNetModel<T>
ControlNet model for adding spatial conditioning to diffusion models.
- DallE3Model<T>
DALL-E 3-style text-to-image model with advanced prompt understanding and high-fidelity image generation.
- DiffWaveModel<T>
DiffWave model for high-quality audio waveform synthesis using diffusion.
- DiffWaveNetwork<T>
DiffWave neural network with dilated convolutions.
- DiffWaveResidualBlock<T>
Residual block for DiffWave with dilated convolution.
- DreamFusionConfig
Configuration for DreamFusion model.
- DreamFusionModel<T>
DreamFusion model for text-to-3D generation via Score Distillation Sampling (SDS). Uses a 2D diffusion prior to optimize a 3D neural radiance field representation. Based on "DreamFusion: Text-to-3D using 2D Diffusion" (Poole et al., 2022).
- DreamMesh<T>
Simple mesh representation for DreamFusion.
- IPAdapterModel<T>
IP-Adapter model for image-based prompt conditioning in diffusion models.
- ImageEncoder<T>
Image encoder for extracting features from reference images.
- ImageProjector<T>
Projects image features to text embedding space.
- MVDreamConfig
Configuration for MVDream model.
- MVDreamModel<T>
MVDream - Multi-View Diffusion Model for 3D-consistent image generation.
- MelodyEncoder<T>
Melody encoder for extracting melodic features from audio.
- MotionModuleConfig
Configuration for AnimateDiff motion modules.
- MultiViewAttention<T>
Multi-view attention module for cross-view consistency.
- MultiViewUNet<T>
Multi-view aware U-Net for MVDream.
- MusicGenModel<T>
MusicGen - Diffusion-based music generation model with advanced musical controls.
- NeRFNetwork<T>
Neural Radiance Field network for 3D representation.
- NeRFResult<T>
Result from DreamFusion generation.
- PixArtModel<T>
PixArt-α model for efficient high-quality text-to-image generation using the DiT (Diffusion Transformer) architecture.
- PixArtOptions<T>
Options for PixArt-α model configuration.
- PointEModel<T>
Point-E model for text-to-3D point cloud generation.
- PointEModel<T>.PointCounts
Standard Point-E point counts.
- RhythmEncoder<T>
Rhythm encoder for extracting beat/rhythm features from audio.
- RiffusionModel<T>
Riffusion model for music generation via spectrogram diffusion.
- SDXLModel<T>
Stable Diffusion XL (SDXL) model for high-resolution image generation.
- SDXLRefiner<T>
SDXL Refiner model for enhancing generated images.
- ShapEModel<T>
Shap-E model for text-to-3D and image-to-3D generation with implicit neural representations.
- SpectrogramConfig
Configuration for spectrogram generation.
- StableVideoDiffusion<T>
Stable Video Diffusion (SVD) model for image-to-video generation.
- VideoCrafterModel<T>
VideoCrafter model for high-quality text-to-video and image-to-video generation.
- Zero123Model<T>
Zero-1-to-3 model for novel view synthesis from a single image.
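The DreamFusionModel<T> entry above describes optimizing a 3D radiance field via Score Distillation Sampling (SDS). As a language-agnostic sketch of the core SDS update from Poole et al. (2022) — illustrative only, not the AiDotNet API (the function and parameter names here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def sds_gradient(rendered, t, noise_pred_fn, alphas_cumprod):
    """One Score Distillation Sampling step (Poole et al., 2022).

    rendered:        image rendered from the current 3D representation (NeRF)
    t:               diffusion timestep index
    noise_pred_fn:   frozen 2D diffusion prior predicting the added noise
    alphas_cumprod:  cumulative noise-schedule products, one per timestep
    """
    a_t = alphas_cumprod[t]
    eps = rng.standard_normal(rendered.shape)               # sample noise
    x_t = np.sqrt(a_t) * rendered + np.sqrt(1 - a_t) * eps  # forward-diffuse
    eps_hat = noise_pred_fn(x_t, t)                         # prior's noise estimate
    w_t = 1 - a_t                                           # a common weighting choice
    # This residual is backpropagated through the renderer into the NeRF
    # parameters; the diffusion U-Net's Jacobian is skipped, which is what
    # makes SDS cheap compared to differentiating through the denoiser.
    return w_t * (eps_hat - eps)
```

If the prior's noise estimate matches the injected noise exactly, the gradient vanishes and the 3D representation is left unchanged; otherwise the residual nudges the rendered view toward images the 2D prior considers likely for the text prompt.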
Structs
- DreamVector3<T>
3D vector type for DreamFusion.
Enums
- AudioLDM2Variant
AudioLDM 2 model variant.
- ControlType
Types of control signals supported by ControlNet.
- MusicGenSize
MusicGen model size variants.