Class GloVe<T>

Namespace
AiDotNet.NeuralNetworks
Assembly
AiDotNet.dll

GloVe (Global Vectors for Word Representation) neural network implementation.

public class GloVe<T> : NeuralNetworkBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable, IEmbeddingModel<T>

Type Parameters

T

The numeric type used for calculations (typically float or double).

Inheritance
GloVe<T>
Implements
IFullModel<T, Tensor<T>, Tensor<T>>
IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>
IParameterizable<T, Tensor<T>, Tensor<T>>
ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>
IGradientComputable<T, Tensor<T>, Tensor<T>>

Remarks

GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.

For Beginners: If Word2Vec is like a student learning from newspapers one page at a time, GloVe is like a researcher who surveys the entire library at once. It builds a giant table showing how often every word in the vocabulary appears near every other word. It then finds an "address" (a vector) for each word so that how closely two addresses line up approximates those counts.

The GloVe model is well known for solving word analogies with vector arithmetic, for example: "King - Man + Woman ≈ Queen."
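The analogy arithmetic can be sketched with plain arrays. This is only an illustration: the two-dimensional vectors below are invented so the arithmetic is easy to follow, and the helper names (Cosine, Analogy, toyVectors) are hypothetical and not part of AiDotNet. Real GloVe vectors are learned from a corpus, and the nearest neighbour is only approximately the expected word.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity between two vectors of equal length.
static double Cosine(double[] a, double[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Returns the word whose vector is closest to (a - b + c), excluding the query words.
static string Analogy(IReadOnlyDictionary<string, double[]> table, string a, string b, string c)
{
    double[] target = table[a].Select((x, i) => x - table[b][i] + table[c][i]).ToArray();
    return table
        .Where(kv => kv.Key != a && kv.Key != b && kv.Key != c)
        .OrderByDescending(kv => Cosine(target, kv.Value))
        .First().Key;
}

// Toy vectors made up for this example (not real GloVe output).
var toyVectors = new Dictionary<string, double[]>
{
    ["king"]  = new[] { 0.9, 0.8 },
    ["man"]   = new[] { 0.9, 0.1 },
    ["woman"] = new[] { 0.1, 0.1 },
    ["queen"] = new[] { 0.1, 0.8 },
    ["apple"] = new[] { 0.5, -0.7 },
};

string analogyAnswer = Analogy(toyVectors, "king", "man", "woman");
Console.WriteLine(analogyAnswer);
```

Excluding the three query words from the candidates is a common convention, since the closest vector to the target is otherwise often one of the inputs.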

Constructors

GloVe(NeuralNetworkArchitecture<T>, ITokenizer?, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>?, int, int, int, ILossFunction<T>?, double)

Initializes a new instance of the GloVe embedding model.

public GloVe(NeuralNetworkArchitecture<T> architecture, ITokenizer? tokenizer = null, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>? optimizer = null, int vocabSize = 10000, int embeddingDimension = 100, int maxTokens = 512, ILossFunction<T>? lossFunction = null, double maxGradNorm = 1)

Parameters

architecture NeuralNetworkArchitecture<T>

The configuration defining the model's metadata.

tokenizer ITokenizer

Optional tokenizer for text processing.

optimizer IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>

Optional optimizer for training.

vocabSize int

The size of the vocabulary (default: 10000).

embeddingDimension int

The dimension of the word vectors (default: 100).

maxTokens int

The maximum tokens per input (default: 512).

lossFunction ILossFunction<T>

Optional loss function. Defaults to Mean Squared Error.

maxGradNorm double

Maximum gradient norm for stability (default: 1.0).

Remarks

For Beginners: This constructor builds the framework for the model. You can decide how many words it should know and how detailed its "dictionary" should be.
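The maxGradNorm parameter suggests gradient-norm clipping. The page does not say how the library applies it, so the following is a minimal sketch of the usual technique under that assumption: if the gradient's L2 norm exceeds the limit, the whole gradient is rescaled to that norm. ClipByNorm is a hypothetical helper, not an AiDotNet method.

```csharp
using System;

// Rescales the gradient so that its L2 norm does not exceed maxGradNorm.
// Gradients already within the limit are returned unchanged (as a copy).
static double[] ClipByNorm(double[] gradient, double maxGradNorm)
{
    double sumOfSquares = 0;
    foreach (double g in gradient)
        sumOfSquares += g * g;
    double norm = Math.Sqrt(sumOfSquares);

    if (norm <= maxGradNorm)
        return (double[])gradient.Clone();

    double scale = maxGradNorm / norm;
    var clipped = new double[gradient.Length];
    for (int i = 0; i < gradient.Length; i++)
        clipped[i] = gradient[i] * scale;
    return clipped;
}

// A gradient with norm 5 is scaled down to norm 1 (the documented default limit).
double[] clippedGradient = ClipByNorm(new[] { 3.0, 4.0 }, 1.0);
Console.WriteLine(string.Join(", ", clippedGradient));
```

Clipping keeps the direction of the update and only limits its size, which is why it is often used to stabilise training.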

Properties

EmbeddingDimension

Gets the dimensionality of the embedding vectors produced by this model.

public int EmbeddingDimension { get; }

Property Value

int

Remarks

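The embedding dimension determines the size of the vector representation. For embedding models in general, common dimensions range from 128 to 1536, with larger dimensions typically capturing more nuanced semantic relationships at the cost of memory and computation. This class defaults to 100 (the constructor's embeddingDimension parameter), and published GloVe vectors typically have 50 to 300 dimensions.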

For Beginners: This is how many numbers represent each piece of text.

Think of it like describing a person:

  • Low dimension (128): Basic traits like height, weight, age
  • High dimension (768): Detailed description including personality, preferences, habits
  • Very high dimension (1536): Extremely detailed profile

More dimensions = more detailed understanding, but also more storage space needed.

MaxTokens

Gets the maximum length of text (in tokens) that this model can process.

public int MaxTokens { get; }

Property Value

int

Remarks

Most embedding models have a maximum context length beyond which text must be truncated. Common limits range from 512 to 8192 tokens. Implementations should handle text exceeding this limit gracefully, either by truncation or raising an exception.

For Beginners: This is the maximum amount of text the model can understand at once.

Think of it like a reader's attention span:

  • Short span (512 tokens): Can read about a paragraph
  • Medium span (2048 tokens): Can read a few pages
  • Long span (8192 tokens): Can read a short chapter

If your text is longer, it needs to be split into chunks. A token is roughly a word, so 512 tokens is about one to two paragraphs.
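Splitting long text into chunks can be sketched as follows. This assumes a plain whitespace split as a stand-in for the model's ITokenizer, whose behaviour is not described on this page; ChunkTokens is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

// Splits a token sequence into consecutive chunks of at most maxTokens tokens.
static List<string[]> ChunkTokens(string[] tokens, int maxTokens)
{
    if (maxTokens <= 0)
        throw new ArgumentOutOfRangeException(nameof(maxTokens));

    var chunks = new List<string[]>();
    for (int start = 0; start < tokens.Length; start += maxTokens)
    {
        int length = Math.Min(maxTokens, tokens.Length - start);
        var chunk = new string[length];
        Array.Copy(tokens, start, chunk, 0, length);
        chunks.Add(chunk);
    }
    return chunks;
}

// Whitespace tokenization is an assumption made for this example only.
string[] demoTokens = "the quick brown fox jumps".Split(' ', StringSplitOptions.RemoveEmptyEntries);
List<string[]> demoChunks = ChunkTokens(demoTokens, 2);
Console.WriteLine(demoChunks.Count); // prints 3 (chunks of 2, 2 and 1 tokens)
```

Each chunk could then be embedded separately, with the chunk size set from MaxTokens.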

Methods

Backward(Tensor<T>)

Propagates error gradients backward through the model layers.

public Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

The error signal from the loss function.

Returns

Tensor<T>

The calculated gradient for the input.

Remarks

For Beginners: This method traces mistakes back to their source. It figures out which word coordinates need to change to better match the real-world data.

CreateNewInstance()

Creates a new instance of the same type as this neural network.

protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()

Returns

IFullModel<T, Tensor<T>, Tensor<T>>

A new instance of the same neural network type.

Remarks

For Beginners: This creates a blank version of the same type of neural network.

It's used internally by methods like DeepCopy and Clone to create the right type of network before copying the data into it.

DeserializeNetworkSpecificData(BinaryReader)

Deserializes network-specific data that was not covered by the general deserialization process.

protected override void DeserializeNetworkSpecificData(BinaryReader reader)

Parameters

reader BinaryReader

The BinaryReader to read the data from.

Remarks

This method is called at the end of the general deserialization process to allow derived classes to read any additional data specific to their implementation.

For Beginners: Think of serialization as packing a suitcase that has a special compartment; this method unpacks that compartment. After the main deserialization method has unpacked the common items (layers, parameters), this method allows each specific type of neural network to unpack its own unique items that were stored during serialization.

Embed(string)

Encodes a sentence as a single summary vector (embedding).

public Vector<T> Embed(string text)

Parameters

text string

The sentence or text to encode.

Returns

Vector<T>

A normalized summary vector.

Remarks

For Beginners: If you give this a sentence like "I love technology," it looks up the address for every word and averages them, which gives the "geographic center" of that sentence's meaning.
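The averaging described above can be sketched with plain arrays. This is an assumption about the general technique (mean pooling followed by L2 normalization), not a copy of the library's code; MeanPoolAndNormalize and the sample vectors are hypothetical.

```csharp
using System;

// Averages the word vectors of a sentence, then scales the result to unit length.
static double[] MeanPoolAndNormalize(double[][] wordVectors)
{
    if (wordVectors.Length == 0)
        throw new ArgumentException("At least one word vector is required.", nameof(wordVectors));

    int dim = wordVectors[0].Length;
    var mean = new double[dim];
    foreach (double[] vector in wordVectors)
        for (int i = 0; i < dim; i++)
            mean[i] += vector[i] / wordVectors.Length;

    double sumOfSquares = 0;
    foreach (double value in mean)
        sumOfSquares += value * value;
    double norm = Math.Sqrt(sumOfSquares);

    // Leave an all-zero vector as it is to avoid dividing by zero.
    if (norm > 0)
        for (int i = 0; i < dim; i++)
            mean[i] /= norm;
    return mean;
}

// Three made-up word vectors standing in for "I", "love", "technology".
double[][] sentenceVectors = { new[] { 1.0, 0.0 }, new[] { 0.0, 1.0 }, new[] { 1.0, 1.0 } };
double[] sentenceEmbedding = MeanPoolAndNormalize(sentenceVectors);
Console.WriteLine(string.Join(", ", sentenceEmbedding));
```

Normalizing to unit length means a dot product between two sentence embeddings equals their cosine similarity.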

EmbedAsync(string)

Asynchronously embeds a single text string into a vector representation.

public Task<Vector<T>> EmbedAsync(string text)

Parameters

text string

The text to embed.

Returns

Task<Vector<T>>

A task representing the async operation, with the resulting vector.

EmbedBatch(IEnumerable<string>)

Encodes a whole batch of sentences at once for speed.

public Matrix<T> EmbedBatch(IEnumerable<string> texts)

Parameters

texts IEnumerable<string>

The collection of texts to encode.

Returns

Matrix<T>

A matrix where each row is an embedding vector.

Remarks

For Beginners: This is a high-speed way to process many sentences at the same time.

EmbedBatchAsync(IEnumerable<string>)

Asynchronously embeds multiple text strings into vector representations in a single batch operation.

public Task<Matrix<T>> EmbedBatchAsync(IEnumerable<string> texts)

Parameters

texts IEnumerable<string>

The collection of texts to embed.

Returns

Task<Matrix<T>>

A task representing the async operation, with the resulting matrix.

Forward(Tensor<T>)

Performs a forward pass to retrieve embeddings for given token IDs.

public Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

A tensor containing token indices.

Returns

Tensor<T>

A tensor containing the resulting embeddings.

Remarks

For Beginners: This is the lookup process. You give the model word ID numbers, and it returns the coordinates for those words from its internal memory.
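Conceptually, the lookup is row selection from an embedding table. The sketch below uses a plain two-dimensional array instead of Tensor<T>, so it shows the idea only; Lookup and toyTable are hypothetical names and the values are made up.

```csharp
using System;

// Returns one row of the embedding table for each token ID.
static double[][] Lookup(double[,] table, int[] tokenIds)
{
    int vocabSize = table.GetLength(0);
    int dim = table.GetLength(1);
    var result = new double[tokenIds.Length][];

    for (int n = 0; n < tokenIds.Length; n++)
    {
        int id = tokenIds[n];
        if (id < 0 || id >= vocabSize)
            throw new ArgumentOutOfRangeException(nameof(tokenIds), $"Token ID {id} is outside the vocabulary.");

        result[n] = new double[dim];
        for (int d = 0; d < dim; d++)
            result[n][d] = table[id, d];
    }
    return result;
}

// A toy table with a vocabulary of 3 words and 2 dimensions per word.
double[,] toyTable = { { 0.1, 0.2 }, { 0.3, 0.4 }, { 0.5, 0.6 } };
double[][] lookedUpRows = Lookup(toyTable, new[] { 2, 0 });
Console.WriteLine(string.Join(", ", lookedUpRows[0]));
```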

GetModelMetadata()

Returns technical details and configuration info about the GloVe model.

public override ModelMetadata<T> GetModelMetadata()

Returns

ModelMetadata<T>

A metadata object containing vocabulary and dimension details.

InitializeLayers()

Sets up the neural network layers required for the GloVe architecture.

protected override void InitializeLayers()

Remarks

GloVe utilizes four primary learnable components: two embedding matrices (W and W_tilde) and two bias vectors (b and b_tilde). This method initializes them as standard layers to leverage the library's built-in GPU and AutoDiff support.

For Beginners: This method creates the internal "books" the model uses to store its knowledge. It creates two main lists of coordinates and two lists of "popularity scores" for words, then combines them to find the most balanced representation.
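For reference, the four components named above appear in the objective from the original GloVe formulation as follows. Here V is the vocabulary size, X_{ij} is the co-occurrence count for words i and j, w_i and \tilde{w}_j are rows of W and W_tilde, and b_i and \tilde{b}_j are the biases. This is the published objective; this class documents Mean Squared Error as its default loss, so its training code may not match it term for term.

```latex
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
\qquad
f(x) =
\begin{cases}
  (x / x_{\max})^{\alpha} & \text{if } x < x_{\max}, \\
  1 & \text{otherwise.}
\end{cases}
```

The original paper uses x_max = 100 and alpha = 3/4, and commonly takes the sum W + W_tilde as the final word vectors.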

Predict(Tensor<T>)

Makes a prediction using the neural network.

public override Tensor<T> Predict(Tensor<T> input)

Parameters

input Tensor<T>

The input data to process.

Returns

Tensor<T>

The network's prediction.

Remarks

For Beginners: This is the main method you'll use to get results from your trained neural network. You provide some input data (like an image or text), and the network processes it through all its layers to produce an output (like a classification or prediction).

SerializeNetworkSpecificData(BinaryWriter)

Serializes network-specific data that is not covered by the general serialization process.

protected override void SerializeNetworkSpecificData(BinaryWriter writer)

Parameters

writer BinaryWriter

The BinaryWriter to write the data to.

Remarks

This method is called at the end of the general serialization process to allow derived classes to write any additional data specific to their implementation.

For Beginners: Think of this as packing a special compartment in your suitcase. While the main serialization method packs the common items (layers, parameters), this method allows each specific type of neural network to pack its own unique items that other networks might not have.

Train(Tensor<T>, Tensor<T>)

Trains the model on a batch of word pairs and their co-occurrence counts.

public override void Train(Tensor<T> input, Tensor<T> expectedOutput)

Parameters

input Tensor<T>

The word pair indices.

expectedOutput Tensor<T>

The actual co-occurrence counts from the dataset.

Remarks

For Beginners: This is how the model gets smarter. You show it two words and how often they appeared together in your data. The model then adjusts its "addresses" for those words so the distances between them reflect that frequency.
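The word pairs and counts passed to Train have to come from somewhere. A minimal sketch of building them from a sequence of token IDs follows, assuming a symmetric context window and unweighted counts; the page does not specify how this library expects the statistics to be prepared, and CountCooccurrences is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

// Counts how often each ordered pair of token IDs appears within windowSize
// positions of each other. Every co-occurrence is recorded in both directions.
static Dictionary<(int, int), double> CountCooccurrences(int[] tokenIds, int windowSize)
{
    var counts = new Dictionary<(int, int), double>();

    for (int i = 0; i < tokenIds.Length; i++)
    {
        int last = Math.Min(tokenIds.Length - 1, i + windowSize);
        for (int j = i + 1; j <= last; j++)
        {
            var forward = (tokenIds[i], tokenIds[j]);
            var backward = (tokenIds[j], tokenIds[i]);
            counts[forward] = counts.TryGetValue(forward, out double f) ? f + 1 : 1;
            counts[backward] = counts.TryGetValue(backward, out double b) ? b + 1 : 1;
        }
    }
    return counts;
}

// Token IDs for a made-up four-word text, with a window of one word on each side.
Dictionary<(int, int), double> pairCounts = CountCooccurrences(new[] { 0, 1, 2, 1 }, 1);
foreach (var entry in pairCounts)
    Console.WriteLine($"{entry.Key} -> {entry.Value}");
```

Each dictionary key would supply one word pair and each value the matching count. The original GloVe formulation additionally down-weights pairs that are further apart within the window, which this sketch leaves out.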

UpdateParameters(Vector<T>)

Updates the internal weights and biases of the model.

public override void UpdateParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

The new coordinates and scores for the model.

Remarks

For Beginners: This method actually moves the words around on the map. It updates the "addresses" of the words based on what it learned in the backward pass.