Enum FewShotSelectionStrategy
Represents strategies for selecting few-shot examples in prompt templates.
public enum FewShotSelectionStrategy
Fields
ClusterBased = 5
Select examples using a clustering approach to ensure broad coverage.
For Beginners: Cluster-based selection groups similar examples and picks representatives.
Think of organizing a photo collection:
- Group 1: Beach photos
- Group 2: Mountain photos
- Group 3: City photos
- Group 4: Portrait photos
Instead of showing all beach photos, you show one representative from each group.
How it works:
- Cluster all examples into groups of similar items
- Select examples from different clusters
- Ensure each cluster is represented
- Optionally, weight selection by cluster size or importance
Example - Customer support queries:
Clusters discovered:
- Billing issues (100 examples)
- Technical problems (150 examples)
- Account questions (50 examples)
- Shipping inquiries (75 examples)
Selection for 4 examples:
- Pick 1 from Billing issues
- Pick 1 from Technical problems
- Pick 1 from Account questions
- Pick 1 from Shipping inquiries
Result: Examples that cover all major types of customer queries.
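As a concrete illustration, here is a minimal standalone sketch of round-robin selection across precomputed clusters. The ClusteredExample record and the support-query strings are hypothetical stand-ins, not part of this library:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical example type: the text plus a precomputed cluster id.
public record ClusteredExample(string Text, int Cluster);

public static class ClusterBasedSelection
{
    // Round-robin across clusters so every cluster is represented
    // before any cluster contributes a second example.
    public static List<ClusteredExample> Select(
        IEnumerable<ClusteredExample> pool, int count)
    {
        var queues = pool.GroupBy(e => e.Cluster)
                         .Select(g => new Queue<ClusteredExample>(g))
                         .ToList();
        var selected = new List<ClusteredExample>();
        while (selected.Count < count && queues.Any(q => q.Count > 0))
        {
            foreach (var q in queues)
            {
                if (selected.Count == count) break;
                if (q.Count > 0) selected.Add(q.Dequeue());
            }
        }
        return selected;
    }

    public static void Main()
    {
        var pool = new[]
        {
            new ClusteredExample("Refund not processed", 0),      // billing
            new ClusteredExample("App crashes on login", 1),      // technical
            new ClusteredExample("Charged twice this month", 0),  // billing
            new ClusteredExample("How do I change my email?", 2), // account
            new ClusteredExample("Where is my package?", 3),      // shipping
        };
        foreach (var e in Select(pool, 4))
            Console.WriteLine($"[cluster {e.Cluster}] {e.Text}");
        // Prints one example from each of the four clusters.
    }
}
```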
Advantages:
- Systematic coverage of the example space
- Prevents over-representation of any one type
- Good for varied input distributions
- Generalizes well
Disadvantages:
- Requires pre-computed clusters
- May not adapt to individual queries
- Setup overhead for clustering
Use this when:
- Examples naturally fall into categories
- You want guaranteed coverage of all categories
- Example pool has uneven distribution
- Building a general-purpose system
Diversity = 2
Select diverse examples to maximize coverage of different patterns.
For Beginners: Diversity selection picks examples that are different from each other.
Think of it like choosing a diverse team:
- You don't want everyone with the same skills
- You want a variety of perspectives and strengths
- Together, they cover more ground than similar individuals would
The goal is to show the model different types of inputs and outputs so it understands the full range of possibilities.
Example - Sentiment classification:
Instead of all similar examples:
- "Great product!" → Positive
- "Loved it!" → Positive
- "Excellent quality!" → Positive
Diverse selection chooses:
- "Great product!" → Positive (enthusiastic positive)
- "It's okay, nothing special." → Neutral (mild, mixed)
- "Complete waste of money!" → Negative (strong negative)
- "Not bad, but could be better." → Mixed (constructive criticism)
The diverse set teaches the model to handle different tones, lengths, and edge cases.
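One common way to realize this, assuming each example already has an embedding, is greedy farthest-point selection: repeatedly pick the candidate least similar to anything chosen so far. A minimal sketch (the Cosine helper and toy 2-d vectors are illustrative, not this library's API):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DiversitySelection
{
    // Cosine similarity between two embedding vectors.
    static double Cosine(double[] a, double[] b)
    {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (Math.Sqrt(na) * Math.Sqrt(nb));
    }

    // Greedy farthest-point selection: each pick is the example least
    // similar to anything already chosen, maximizing spread.
    public static List<int> Select(double[][] embeddings, int count)
    {
        var selected = new List<int> { 0 }; // seed with an arbitrary example
        while (selected.Count < count)
        {
            int best = -1;
            double lowestMaxSim = double.MaxValue;
            for (int i = 0; i < embeddings.Length; i++)
            {
                if (selected.Contains(i)) continue;
                // How close is candidate i to its nearest selected neighbor?
                double maxSim = selected.Max(s => Cosine(embeddings[i], embeddings[s]));
                if (maxSim < lowestMaxSim) { lowestMaxSim = maxSim; best = i; }
            }
            selected.Add(best);
        }
        return selected;
    }

    public static void Main()
    {
        // Toy 2-d "embeddings": three near-duplicates and one outlier.
        var embeddings = new[]
        {
            new[] { 1.0, 0.0 }, new[] { 0.95, 0.05 },
            new[] { 0.9, 0.1 }, new[] { 0.0, 1.0 },
        };
        // Picks the seed plus the outlier, skipping the near-duplicates.
        Console.WriteLine(string.Join(", ", Select(embeddings, 2))); // 0, 3
    }
}
```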
Advantages:
- Model learns broader patterns
- Better generalization
- Handles edge cases better
- Less likely to overfit to specific patterns
Disadvantages:
- May not include the most relevant examples for specific queries
- Requires clustering or diversity metrics
- More complex to implement
Use this when:
- Input space is large and varied
- You want robust, generalizable performance
- Examples naturally cluster into groups
- You're building a general-purpose system
Fixed = 4
Select examples in a fixed, predetermined order.
For Beginners: Fixed ordering uses the same examples in the same order every time.
Like a textbook that always shows the same examples:
- Example 1 is always first
- Example 2 is always second
- And so on...
Use cases:
- You've carefully curated the best examples
- Order matters (simple to complex, for instance)
- Consistency across all queries is important
- You want predictable, reproducible behavior
Example - Teaching code style:
Always show these examples in this order:
- Simple variable naming
- Function documentation
- Error handling
- Complex class structure
This builds from basics to advanced, regardless of the query.
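In code this strategy is a constant: the same curated list comes back for every query. A minimal sketch using the code-style examples above:

```csharp
using System;
using System.Collections.Generic;

public static class FixedSelection
{
    // The curated examples, hand-ordered from simple to complex.
    static readonly IReadOnlyList<string> Curated = new[]
    {
        "Simple variable naming",
        "Function documentation",
        "Error handling",
        "Complex class structure",
    };

    // Every query gets the same examples in the same order:
    // no computation, fully reproducible.
    public static IReadOnlyList<string> Select() => Curated;

    public static void Main()
    {
        foreach (var example in Select())
            Console.WriteLine(example);
    }
}
```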
Advantages:
- Completely predictable
- No computation needed
- Fastest option
- Can carefully control the learning progression
Disadvantages:
- Not adaptive to different queries
- May include irrelevant examples
- Less flexible
Use this when:
- You have a canonical set of examples
- All queries are similar enough that the same examples work
- Speed and consistency are critical
- You want to maintain exact reproducibility
MaximumMarginalRelevance = 3
Select examples based on maximum marginal relevance (a balance between relevance and diversity).
For Beginners: MMR (Maximum Marginal Relevance) balances being relevant AND diverse.
Think of it like building a playlist:
- Pure relevance: All songs sound almost identical (boring!)
- Pure diversity: Random songs that don't fit together (confusing!)
- MMR: Similar style, but each song brings something different (perfect!)
How it works:
- Start with the most relevant example
- For the next example, consider both:
  - How relevant it is to the query (good)
  - How different it is from already-selected examples (also good)
- Pick the example with the best balance
- Repeat until you have enough examples
Example - Question answering about Python:
Query: "How do I sort a list in Python?"
Step 1: Pick the most relevant example.
Selected: "How to reverse a list → list.reverse()" (very similar to sorting)
Step 2: Balance relevance and diversity for the next pick.
Candidates:
- "How to sort a dictionary" (Relevance: 0.9, Diversity: 0.4) → Score: 0.65
- "How to use list comprehensions" (Relevance: 0.5, Diversity: 0.9) → Score: 0.70
- "How to remove duplicates" (Relevance: 0.7, Diversity: 0.7) → Score: 0.70
(Each score is the average of relevance and diversity, i.e., an equal 50/50 balance.)
Selected: "How to use list comprehensions" or "How to remove duplicates"
The result: Examples that are all related to the query but show different aspects of working with lists.
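A minimal sketch of the classic MMR loop over precomputed embeddings. The lambda parameter is the relevance/diversity balance noted under Disadvantages; lambda = 0.5 ranks candidates the same way as the 50/50 averaging in the worked example above, since diversity is just 1 minus redundancy. This is an illustration, not this library's implementation:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MmrSelection
{
    // Cosine similarity between two embedding vectors.
    static double Cosine(double[] a, double[] b)
    {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (Math.Sqrt(na) * Math.Sqrt(nb));
    }

    // Classic MMR: score = lambda * relevance - (1 - lambda) * redundancy,
    // where redundancy is similarity to the closest already-selected example.
    public static List<int> Select(
        double[] query, double[][] examples, int count, double lambda = 0.5)
    {
        var selected = new List<int>();
        while (selected.Count < count)
        {
            int best = -1;
            double bestScore = double.MinValue;
            for (int i = 0; i < examples.Length; i++)
            {
                if (selected.Contains(i)) continue;
                double relevance = Cosine(query, examples[i]);
                double redundancy = selected.Count == 0
                    ? 0.0
                    : selected.Max(s => Cosine(examples[i], examples[s]));
                double score = lambda * relevance - (1 - lambda) * redundancy;
                if (score > bestScore) { bestScore = score; best = i; }
            }
            selected.Add(best);
        }
        return selected;
    }

    public static void Main()
    {
        var query = new[] { 1.0, 0.0, 0.0 };
        var examples = new[]
        {
            new[] { 0.80, 0.60, 0.00 }, // most relevant
            new[] { 0.79, 0.61, 0.00 }, // near-duplicate of example 0
            new[] { 0.70, 0.00, 0.71 }, // less relevant but distinct
        };
        // Pure relevance would pick 0 then 1; MMR picks 0 then 2.
        Console.WriteLine(string.Join(", ", Select(query, examples, 2)));
    }
}
```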
Advantages:
- Best of both worlds (relevance + diversity)
- Often outperforms pure relevance or pure diversity
- Prevents redundant examples
- Provides comprehensive coverage
Disadvantages:
- More complex to implement
- Slower than simple strategies
- Requires tuning the relevance/diversity balance parameter
Use this when:
- You want the best possible performance
- You have the computational resources
- Both relevance and diversity matter
- Quality is the top priority
Random = 0
Select examples randomly from the available pool.
For Beginners: Random selection picks examples without any particular logic.
Like drawing names from a hat:
- Each example has an equal chance of being selected
- No consideration of relevance or similarity
- Simple and fast
- Good baseline approach
Example:
- Available examples: 100 sentiment classification examples
- Need: 3 examples
- Strategy: randomly pick 3 of the 100
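A minimal sketch using a partial Fisher-Yates shuffle, so each example is equally likely and only k swaps are needed; any uniform sampler would do:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RandomSelection
{
    // Partial Fisher-Yates shuffle: k distinct picks, each example
    // equally likely, in O(k) swaps.
    public static List<T> Select<T>(IReadOnlyList<T> pool, int k, Random rng)
    {
        var indices = Enumerable.Range(0, pool.Count).ToArray();
        for (int i = 0; i < k; i++)
        {
            int j = rng.Next(i, indices.Length);
            (indices[i], indices[j]) = (indices[j], indices[i]);
        }
        return indices.Take(k).Select(i => pool[i]).ToList();
    }

    public static void Main()
    {
        var pool = Enumerable.Range(1, 100).Select(n => $"example {n}").ToList();
        // A fixed seed makes the "random" draw reproducible between runs.
        var picks = Select(pool, 3, new Random(42));
        Console.WriteLine(string.Join(", ", picks));
    }
}
```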
Advantages:
- Very fast
- No special setup required
- Unbiased
- Good for diverse example sets
Disadvantages:
- May pick irrelevant examples
- Performance varies between runs
- Doesn't adapt to the specific query
Use this when:
- You have diverse, high-quality examples
- All examples are roughly equally useful
- Simplicity is important
- You don't have semantic similarity infrastructure
SemanticSimilarity = 1
Select the examples most semantically similar to the current input.
For Beginners: Semantic similarity picks examples that are most similar to the current query.
Think of it like a tutor who:
- Looks at the problem the student is struggling with
- Finds previous problems that are similar
- Shows those similar examples to help the student understand
How it works:
- Convert the query and all examples into embeddings (numerical vectors that capture meaning)
- Calculate similarity between query and each example
- Select the most similar examples
Example:
Query: "This restaurant was absolutely terrible, worst experience ever."
Example pool:
- "The movie was amazing, I loved it!" (Low similarity)
- "Bad service, would not recommend this place." (High similarity - both negative reviews)
- "The weather is nice today." (Low similarity)
- "Horrible food, never going back." (High similarity - both negative, about food)
Selected examples:
- "Bad service, would not recommend this place."
- "Horrible food, never going back."
Advantages:
- Examples are relevant to the specific query
- Often better performance than random
- Adapts to different types of queries
Disadvantages:
- Requires embedding model
- Slower than random (need to compute similarities)
- May lack diversity if all examples are too similar
Use this when:
- Query types vary significantly
- Relevant examples significantly improve performance
- You have an embedding model available
- Quality is more important than speed
Remarks
For Beginners: Few-shot selection strategies determine which examples to show the language model.
Think of it like choosing which practice problems to show a student:
- You could show random problems
- You could show problems similar to the current one
- You could show problems that cover diverse scenarios
- You could show the most helpful examples
The right strategy depends on what you're trying to teach, the shape of your example pool, and your speed and infrastructure constraints. Different strategies can significantly impact the model's performance.