Class IPAdapterModel<T>

Namespace
AiDotNet.Diffusion.Models
Assembly
AiDotNet.dll

IP-Adapter model for image-based prompt conditioning in diffusion models.

public class IPAdapterModel<T> : LatentDiffusionModelBase<T>, ILatentDiffusionModel<T>, IDiffusionModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
LatentDiffusionModelBase<T>
IPAdapterModel<T>
Implements
ILatentDiffusionModel<T>
IDiffusionModel<T>
IFullModel<T, Tensor<T>, Tensor<T>>
IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>
IModelSerializer
ICheckpointableModel
IParameterizable<T, Tensor<T>, Tensor<T>>
IFeatureAware
IFeatureImportance<T>
ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>
IGradientComputable<T, Tensor<T>, Tensor<T>>
IJitCompilable<T>

Examples

// Create an IP-Adapter model
var ipAdapter = new IPAdapterModel<float>();

// Generate with image reference
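// (LoadImage stands in for your own image-loading helper that returns a Tensor<T>)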
var referenceImage = LoadImage("style_reference.png");
var image = ipAdapter.GenerateWithImagePrompt(
    textPrompt: "A beautiful landscape",
    imagePrompt: referenceImage,
    imagePromptWeight: 0.7);

// Multi-image reference
var faceImage = LoadImage("face.png");
var styleImage = LoadImage("art_style.png");
var composed = ipAdapter.GenerateWithMultiImagePrompt(
    textPrompt: "Portrait painting",
    imagePrompts: new[] { faceImage, styleImage },
    imageWeights: new[] { 0.8, 0.5 });

Remarks

IP-Adapter (Image Prompt Adapter) enables using reference images as prompts to guide image generation. It decouples cross-attention for text and image features, allowing fine-grained control over image style, composition, and content transfer.

For Beginners: IP-Adapter lets you use pictures as instructions for the AI instead of just text.

Think of it like:

  • Showing someone a photo and saying "make something like this"
  • The AI extracts the style, composition, and content from your image
  • It then applies those elements to create new images

Use cases:

  • Style transfer: "Generate in the style of this artwork"
  • Face preservation: Keep a person's likeness in different scenes
  • Object consistency: Maintain the same object across images
  • Scene composition: Use reference for layout/arrangement

Key advantage: Combines with text prompts for precise control

Technical details:

  • Uses a pretrained image encoder (such as CLIP ViT)
  • Projects image features into the text embedding space
  • Injects them via a decoupled cross-attention mechanism (sketched below)
  • Supports multiple reference images (multi-IP)
  • Adjustable image prompt weight (0-1)

Reference: Ye et al., "IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models", 2023
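
The decoupled combination can be illustrated with a minimal sketch (CombineBranches and the flattened float[] attention outputs are hypothetical, for illustration only, and not the library's internal API):

// Decoupled cross-attention: the text branch is computed as usual and the
// image branch is added on top, scaled by the image prompt weight (λ):
//   output = Attention(Q, K_text, V_text) + λ * Attention(Q, K_image, V_image)
static float[] CombineBranches(float[] textAttnOut, float[] imageAttnOut, float imagePromptWeight)
{
    var combined = new float[textAttnOut.Length];
    for (int i = 0; i < combined.Length; i++)
        combined[i] = textAttnOut[i] + imagePromptWeight * imageAttnOut[i];
    return combined;
}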

Constructors

IPAdapterModel()

Initializes a new instance of IPAdapterModel with default parameters.

public IPAdapterModel()

IPAdapterModel(DiffusionModelOptions<T>?, INoiseScheduler<T>?, UNetNoisePredictor<T>?, StandardVAE<T>?, IConditioningModule<T>?, int, int?)

Initializes a new instance of IPAdapterModel with custom parameters.

public IPAdapterModel(DiffusionModelOptions<T>? options = null, INoiseScheduler<T>? scheduler = null, UNetNoisePredictor<T>? baseUNet = null, StandardVAE<T>? vae = null, IConditioningModule<T>? conditioner = null, int embedDim = 768, int? seed = null)

Parameters

options DiffusionModelOptions<T>

Configuration options for the diffusion model.

scheduler INoiseScheduler<T>

Optional custom scheduler.

baseUNet UNetNoisePredictor<T>

Optional base U-Net noise predictor.

vae StandardVAE<T>

Optional custom VAE.

conditioner IConditioningModule<T>

Optional conditioning module for text encoding.

embedDim int

Image embedding dimension.

seed int?

Optional random seed for reproducibility.
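
For example (argument values are illustrative; 768 matches the CLIP ViT-L/14 embedding width):

// Construct with an explicit embedding dimension and a fixed seed
var ipAdapter = new IPAdapterModel<float>(
    embedDim: 768,
    seed: 42);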

Properties

Conditioner

Gets the conditioning module (optional, for conditioned generation).

public override IConditioningModule<T>? Conditioner { get; }

Property Value

IConditioningModule<T>

ImagePromptWeight

Gets or sets the default image prompt weight (0-1).

public double ImagePromptWeight { get; set; }

Property Value

double
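
A short usage sketch (assuming, as the nullable imagePromptWeight parameters of the generation methods suggest, that calls omitting a weight fall back to this default):

ipAdapter.ImagePromptWeight = 0.6;
// Subsequent generation calls that pass no imagePromptWeight would use 0.6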

LatentChannels

Gets the number of latent channels.

public override int LatentChannels { get; }

Property Value

int

Remarks

Typically 4 for Stable Diffusion models.

NoisePredictor

Gets the noise predictor model (U-Net, DiT, etc.).

public override INoisePredictor<T> NoisePredictor { get; }

Property Value

INoisePredictor<T>

ParameterCount

Gets the number of parameters in the model.

public override int ParameterCount { get; }

Property Value

int

Remarks

This property returns the total count of trainable parameters in the model. It's useful for understanding model complexity and memory requirements.
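
For example, a rough memory estimate for a float-parameterized model (4 bytes per parameter):

long approxBytes = (long)ipAdapter.ParameterCount * sizeof(float);
Console.WriteLine($"{ipAdapter.ParameterCount:N0} parameters (~{approxBytes / (1024.0 * 1024.0):F1} MB)");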

VAE

Gets the VAE model used for encoding and decoding.

public override IVAEModel<T> VAE { get; }

Property Value

IVAEModel<T>

Methods

Clone()

Creates a deep copy of the model.

public override IDiffusionModel<T> Clone()

Returns

IDiffusionModel<T>

A new instance with the same parameters.

DeepCopy()

Creates a deep copy of this object.

public override IFullModel<T, Tensor<T>, Tensor<T>> DeepCopy()

Returns

IFullModel<T, Tensor<T>, Tensor<T>>

GenerateWithImagePrompt(string, Tensor<T>, string?, int, int, int, double?, double?, int?)

Generates an image with image prompt conditioning.

public virtual Tensor<T> GenerateWithImagePrompt(string textPrompt, Tensor<T> imagePrompt, string? negativePrompt = null, int width = 512, int height = 512, int numInferenceSteps = 50, double? guidanceScale = null, double? imagePromptWeight = null, int? seed = null)

Parameters

textPrompt string

The text prompt describing the desired image.

imagePrompt Tensor<T>

The reference image for conditioning.

negativePrompt string

Optional negative prompt.

width int

Output image width.

height int

Output image height.

numInferenceSteps int

Number of denoising steps.

guidanceScale double?

Classifier-free guidance scale.

imagePromptWeight double?

Weight for image prompt (0-1).

seed int?

Optional random seed.

Returns

Tensor<T>

The generated image tensor.
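
A fuller call might look like this (prompt strings and values are illustrative; referenceImage is a Tensor<T> as in the class examples above):

var styled = ipAdapter.GenerateWithImagePrompt(
    textPrompt: "A castle at sunset",
    imagePrompt: referenceImage,
    negativePrompt: "blurry, low quality",
    width: 768,
    height: 512,
    numInferenceSteps: 30,
    imagePromptWeight: 0.5,   // lean toward the text prompt
    seed: 42);                // reproducible output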

GenerateWithMultiImagePrompt(string, Tensor<T>[], double[]?, string?, int, int, int, double?, int?)

Generates an image with multiple image prompts.

public virtual Tensor<T> GenerateWithMultiImagePrompt(string textPrompt, Tensor<T>[] imagePrompts, double[]? imageWeights = null, string? negativePrompt = null, int width = 512, int height = 512, int numInferenceSteps = 50, double? guidanceScale = null, int? seed = null)

Parameters

textPrompt string

The text prompt.

imagePrompts Tensor<T>[]

Array of reference images.

imageWeights double[]

Optional weights for each image.

negativePrompt string

Optional negative prompt.

width int

Output width.

height int

Output height.

numInferenceSteps int

Number of steps.

guidanceScale double?

Guidance scale.

seed int?

Random seed.

Returns

Tensor<T>

The generated image.

GetParameters()

Gets the parameters that can be optimized.

public override Vector<T> GetParameters()

Returns

Vector<T>

SetParameters(Vector<T>)

Sets the model parameters.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

The parameter vector to set.

Remarks

This method allows direct modification of the model's internal parameters, which is useful for optimization algorithms that update parameters iteratively. If the length of parameters does not match ParameterCount, an ArgumentException is thrown.
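
A minimal round-trip sketch:

var parameters = ipAdapter.GetParameters();   // length equals ParameterCount
// ... let an optimizer adjust the vector in place ...
ipAdapter.SetParameters(parameters);          // a length mismatch throws ArgumentException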

Exceptions

ArgumentException

Thrown when the length of parameters does not match ParameterCount.