Class GloVe<T>
Namespace: AiDotNet.NeuralNetworks
Assembly: AiDotNet.dll
GloVe (Global Vectors for Word Representation) neural network implementation.
public class GloVe<T> : NeuralNetworkBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable, IEmbeddingModel<T>
Type Parameters
T: The numeric type used for calculations (typically float or double).
Inheritance: NeuralNetworkBase<T> → GloVe<T>
Remarks
GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.
For Beginners: If Word2Vec is like a student learning from reading newspapers one page at a time, GloVe is like a researcher who looks at the entire library all at once. It builds a giant table showing how often every word in the vocabulary appears near every other word. It then uses clever math to find the best "address" for each word, so that the way two words' addresses line up reflects (approximately, not perfectly) how often those two words appear together.
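The "giant table" can be sketched in C#. This is a minimal illustration of the general technique, not AiDotNet's implementation: it assumes the text has already been converted to integer token IDs, uses a symmetric context window, and adds 1/d for two words that are d positions apart, which is the distance weighting described in the original GloVe paper. The names CooccurrenceSketch and Count are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: builds a sparse GloVe-style co-occurrence table.
public static class CooccurrenceSketch
{
    // Returns a table keyed by (wordId, contextId); values are distance-weighted counts.
    public static Dictionary<(int, int), double> Count(IReadOnlyList<int> tokenIds, int windowSize)
    {
        var counts = new Dictionary<(int, int), double>();
        for (int i = 0; i < tokenIds.Count; i++)
        {
            // Scan only to the right, then record both directions so the table stays symmetric.
            for (int d = 1; d <= windowSize && i + d < tokenIds.Count; d++)
            {
                double weight = 1.0 / d; // closer neighbors count more
                Add(counts, (tokenIds[i], tokenIds[i + d]), weight);
                Add(counts, (tokenIds[i + d], tokenIds[i]), weight);
            }
        }
        return counts;
    }

    private static void Add(Dictionary<(int, int), double> counts, (int, int) key, double weight)
    {
        counts.TryGetValue(key, out double current); // current is 0 when the key is new
        counts[key] = current + weight;
    }
}
```

For the token IDs {0, 1, 2} and a window of 2, the pair (0, 1) gets 1.0 and the pair (0, 2) gets 0.5. A real corpus produces a very sparse table, which is why a dictionary is used here instead of a dense vocabSize × vocabSize array.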
The GloVe model is famous for capturing word analogies through vector arithmetic: in well-trained vectors, "king" - "man" + "woman" lands closest to "queen".
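The analogy arithmetic can be shown with made-up two-dimensional vectors (one axis loosely meaning "royalty", the other "gender"). These toy vectors are illustrative assumptions, not output of this class; with trained GloVe vectors the same nearest-neighbor search runs over the whole vocabulary. AnalogySketch, Cosine, and Solve are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: solves "a is to b as c is to ?" with cosine similarity.
public static class AnalogySketch
{
    public static double Cosine(double[] a, double[] b)
    {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.Sqrt(na) * Math.Sqrt(nb));
    }

    // Finds the word whose vector is most similar to b - a + c.
    public static string Solve(Dictionary<string, double[]> vectors, string a, string b, string c)
    {
        double[] target = vectors[b].Zip(vectors[a], (x, y) => x - y)
                                    .Zip(vectors[c], (x, y) => x + y)
                                    .ToArray();
        return vectors.Keys
            .Where(w => w != a && w != b && w != c) // exclude the query words themselves
            .OrderByDescending(w => Cosine(vectors[w], target))
            .First();
    }
}
```

With king = (1, 1), man = (0, 1), woman = (0, -1), queen = (1, -1), and apple = (-1, 0), the target king - man + woman equals (1, -1), so Solve(vectors, "man", "king", "woman") returns "queen". Excluding the query words follows the usual evaluation convention, since otherwise one of them can come out on top.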
Constructors
GloVe(NeuralNetworkArchitecture<T>, ITokenizer?, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>?, int, int, int, ILossFunction<T>?, double)
Initializes a new instance of the GloVe embedding model.
public GloVe(NeuralNetworkArchitecture<T> architecture, ITokenizer? tokenizer = null, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>? optimizer = null, int vocabSize = 10000, int embeddingDimension = 100, int maxTokens = 512, ILossFunction<T>? lossFunction = null, double maxGradNorm = 1)
Parameters
architecture (NeuralNetworkArchitecture<T>): The configuration defining the model's metadata.
tokenizer (ITokenizer): Optional tokenizer for text processing.
optimizer (IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>): Optional optimizer for training.
vocabSize (int): The size of the vocabulary (default: 10000).
embeddingDimension (int): The dimension of the word vectors (default: 100).
maxTokens (int): The maximum tokens per input (default: 512).
lossFunction (ILossFunction<T>): Optional loss function. Defaults to Mean Squared Error.
maxGradNorm (double): Maximum gradient norm for stability (default: 1.0).
Remarks
For Beginners: This constructor builds the framework for the model. Use vocabSize to decide how many words it knows and embeddingDimension to decide how detailed each word's "address" is.
Properties
EmbeddingDimension
Gets the dimensionality of the embedding vectors produced by this model.
public int EmbeddingDimension { get; }
Property Value
Remarks
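The embedding dimension determines the size of the vector representation. For general-purpose embedding models, common dimensions range from 128 to 1536. GloVe vectors are typically smaller: this class defaults to 100, and the widely used pretrained GloVe releases include 50-, 100-, 200-, and 300-dimensional vectors. Larger dimensions typically capture more nuanced semantic relationships at the cost of memory and computation.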
For Beginners: This is how many numbers represent each piece of text.
Think of it like describing a person:
- Low dimension (128): Basic traits like height, weight, age
- High dimension (768): Detailed description including personality, preferences, habits
- Very high dimension (1536): Extremely detailed profile
More dimensions = more detailed understanding, but also more storage space needed.
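A quick way to estimate that storage cost is to count parameters. This sketch assumes the four learnable components described under InitializeLayers (W and W_tilde with vocabSize × embeddingDimension values each, plus the bias vectors b and b_tilde with vocabSize values each) and ignores optimizer state and any other per-layer overhead. GloVeSizeSketch is a hypothetical helper, not part of AiDotNet.

```csharp
// Hypothetical helper: rough parameter and memory estimate for a GloVe model.
public static class GloVeSizeSketch
{
    // Two embedding matrices plus two bias vectors (assumed layout).
    public static long ParameterCount(int vocabSize, int embeddingDimension) =>
        2L * vocabSize * embeddingDimension + 2L * vocabSize;

    // bytesPerValue is 4 for float and 8 for double.
    public static long Bytes(int vocabSize, int embeddingDimension, int bytesPerValue) =>
        ParameterCount(vocabSize, embeddingDimension) * bytesPerValue;
}
```

With the constructor defaults (vocabSize = 10000, embeddingDimension = 100), this gives 2 × 10000 × 100 + 2 × 10000 = 2,020,000 parameters, about 8.1 MB as float or 16.2 MB as double. Doubling embeddingDimension roughly doubles the cost.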
MaxTokens
Gets the maximum length of text (in tokens) that this model can process.
public int MaxTokens { get; }
Property Value
Remarks
Most embedding models have a maximum context length beyond which text must be truncated. Common limits range from 512 to 8192 tokens. Implementations should handle text that exceeds this limit gracefully, either by truncating it or by raising an exception.
For Beginners: This is the maximum amount of text the model can understand at once.
Think of it like a reader's attention span:
- Short span (512 tokens): Can read about a page
- Medium span (2048 tokens): Can read a few pages
- Long span (8192 tokens): Can read a short chapter
If your text is longer, it needs to be split into chunks. (A token is roughly a word, so 512 tokens is about a page of text.)
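Splitting text into chunks can be sketched as below. The sketch takes the "a token is roughly a word" rule of thumb literally and counts whitespace-separated words; a real tokenizer (such as the ITokenizer passed to the constructor) usually produces more tokens than words, so a safety margin below MaxTokens is advisable. ChunkSketch and SplitByWords are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits long text into word-count-limited chunks.
public static class ChunkSketch
{
    private static readonly char[] Whitespace = { ' ', '\t', '\r', '\n' };

    // Splits text into consecutive chunks of at most maxWords words each.
    // Assumption: one whitespace-separated word is treated as one token.
    public static List<string> SplitByWords(string text, int maxWords)
    {
        if (maxWords <= 0) throw new ArgumentOutOfRangeException(nameof(maxWords));
        string[] words = text.Split(Whitespace, StringSplitOptions.RemoveEmptyEntries);
        var chunks = new List<string>();
        for (int start = 0; start < words.Length; start += maxWords)
        {
            int count = Math.Min(maxWords, words.Length - start);
            chunks.Add(string.Join(" ", words, start, count));
        }
        return chunks;
    }
}
```

Each chunk can then be embedded on its own, for example with Embed or EmbedBatch; how (or whether) to combine the chunk vectors afterward is an application choice.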
Methods
Backward(Tensor<T>)
Propagates error gradients backward through the model layers.
public Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): The error signal from the loss function.
Returns
- Tensor<T>
The calculated gradient for the input.
Remarks
For Beginners: This method traces mistakes back to their source. It figures out which word coordinates need to change to better match the real-world data.
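For a single word pair (i, j), the term contributed by the original GloVe paper's weighted least-squares objective, and its gradients, are shown below. This is a reference derivation, not a description of this class's Backward code; the constructor's default loss is Mean Squared Error, so the weighting f may or may not apply here. In the formulas, w_i is row i of W, \tilde{w}_j is row j of W_tilde, b_i and \tilde{b}_j are the biases, X_{ij} is the co-occurrence count, and f is a weighting function of that count.

```latex
e_{ij} = w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij},
\qquad
J_{ij} = f(X_{ij})\, e_{ij}^{2}
\\[6pt]
\frac{\partial J_{ij}}{\partial w_i} = 2 f(X_{ij})\, e_{ij}\, \tilde{w}_j,
\qquad
\frac{\partial J_{ij}}{\partial \tilde{w}_j} = 2 f(X_{ij})\, e_{ij}\, w_i,
\qquad
\frac{\partial J_{ij}}{\partial b_i} = \frac{\partial J_{ij}}{\partial \tilde{b}_j} = 2 f(X_{ij})\, e_{ij}
```

Only the rows for words i and j receive a nonzero gradient from this pair, which is the precise sense in which backpropagation "figures out which word coordinates need to change."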
CreateNewInstance()
Creates a new instance of the same type as this neural network.
protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
A new instance of the same neural network type.
Remarks
For Beginners: This creates a blank version of the same type of neural network.
It's used internally by methods like DeepCopy and Clone to create the right type of network before copying the data into it.
DeserializeNetworkSpecificData(BinaryReader)
Deserializes network-specific data that was not covered by the general deserialization process.
protected override void DeserializeNetworkSpecificData(BinaryReader reader)
Parameters
reader (BinaryReader): The BinaryReader to read the data from.
Remarks
This method is called at the end of the general deserialization process to allow derived classes to read any additional data specific to their implementation.
For Beginners: Continuing the suitcase analogy from SerializeNetworkSpecificData, this is like unpacking that special compartment. After the main deserialization method has unpacked the common items (layers, parameters), this method lets each specific type of neural network unpack the unique items it stored during serialization.
Embed(string)
Turns a sentence into a single summary vector (embedding).
public Vector<T> Embed(string text)
Parameters
text (string): The sentence or text to encode.
Returns
- Vector<T>
A normalized summary vector.
Remarks
For Beginners: If you give this a sentence like "I love technology," it looks up the address of every word and averages them, which gives the "geographic center" of that sentence's meaning. The result is then normalized.
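The averaging step can be sketched as follows. It assumes that "normalized" means scaling to unit L2 length and that every word vector counts equally; how this class treats unknown words, or whether it weights words differently, is not stated here. MeanPoolSketch is a hypothetical name, and plain double[] arrays stand in for Vector<T>.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: mean-pools word vectors into one unit-length sentence vector.
public static class MeanPoolSketch
{
    public static double[] Embed(IReadOnlyList<double[]> wordVectors)
    {
        if (wordVectors.Count == 0)
            throw new ArgumentException("At least one word vector is required.", nameof(wordVectors));

        // 1. Average the word vectors component by component.
        int dim = wordVectors[0].Length;
        var mean = new double[dim];
        foreach (var v in wordVectors)
            for (int i = 0; i < dim; i++) mean[i] += v[i] / wordVectors.Count;

        // 2. Scale the average to unit L2 length (skipped for an all-zero vector).
        double norm = 0;
        for (int i = 0; i < dim; i++) norm += mean[i] * mean[i];
        norm = Math.Sqrt(norm);
        if (norm > 0)
            for (int i = 0; i < dim; i++) mean[i] /= norm;
        return mean;
    }
}
```

Averaging (1, 0) and (0, 1) gives (0.5, 0.5), which normalizes to about (0.7071, 0.7071). Because the result has unit length, the dot product of two such sentence vectors equals their cosine similarity.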
EmbedAsync(string)
Asynchronously embeds a single text string into a vector representation.
public Task<Vector<T>> EmbedAsync(string text)
Parameters
text (string): The text to embed.
Returns
- Task<Vector<T>>
A task representing the async operation, with the resulting vector.
EmbedBatch(IEnumerable<string>)
Encodes a whole batch of sentences at once for speed.
public Matrix<T> EmbedBatch(IEnumerable<string> texts)
Parameters
texts (IEnumerable<string>): The collection of texts to encode.
Returns
- Matrix<T>
A matrix where each row is an embedding vector.
Remarks
For Beginners: This is a high-speed way to process many sentences at the same time.
EmbedBatchAsync(IEnumerable<string>)
Asynchronously embeds multiple text strings into vector representations in a single batch operation.
public Task<Matrix<T>> EmbedBatchAsync(IEnumerable<string> texts)
Parameters
texts (IEnumerable<string>): The collection of texts to embed.
Returns
- Task<Matrix<T>>
A task representing the async operation, with the resulting matrix.
Forward(Tensor<T>)
Performs a forward pass to retrieve embeddings for the given token IDs.
public Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): A tensor containing token indices.
Returns
- Tensor<T>
A tensor containing the resulting embeddings.
Remarks
For Beginners: This is the lookup process. You give the model word ID numbers, and it returns the coordinates for those words from its internal memory.
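The lookup amounts to selecting rows of an embedding table by index. In this sketch the table is a jagged double[][] standing in for the model's internal layer, and out-of-range IDs raise an exception; both choices are assumptions, and LookupSketch is a hypothetical name.

```csharp
using System;

// Hypothetical helper: embedding lookup by token ID.
public static class LookupSketch
{
    // Returns one copied row of the table per token ID, in input order.
    public static double[][] Lookup(double[][] table, int[] tokenIds)
    {
        var result = new double[tokenIds.Length][];
        for (int i = 0; i < tokenIds.Length; i++)
        {
            int id = tokenIds[i];
            if (id < 0 || id >= table.Length)
                throw new ArgumentOutOfRangeException(nameof(tokenIds), $"Token ID {id} is outside the vocabulary.");
            result[i] = (double[])table[id].Clone(); // copy so callers cannot modify the table
        }
        return result;
    }
}
```

For example, IDs {2, 0} return row 2 followed by row 0.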
GetModelMetadata()
Returns technical details and configuration info about the GloVe model.
public override ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
A metadata object containing vocabulary and dimension details.
InitializeLayers()
Sets up the neural network layers required for the GloVe architecture.
protected override void InitializeLayers()
Remarks
GloVe uses four learnable components: two embedding matrices, W (word vectors) and W_tilde (context-word vectors), and two bias vectors, b and b_tilde. This method initializes them as standard layers to leverage the library's built-in GPU and AutoDiff support.
For Beginners: This method creates the internal "books" the model uses to store its knowledge. It creates two main lists of coordinates and two lists of "popularity scores" for words, then combines them to find the most balanced representation.
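For reference, these four components appear in the weighted least-squares objective of the original GloVe paper (Pennington, Socher, and Manning, 2014), shown below with V = vocabSize, w_i as row i of W, \tilde{w}_j as row j of W_tilde, and X_{ij} as the co-occurrence count of words i and j. Whether this class uses exactly this weighting, or the plain Mean Squared Error listed as the constructor's default loss, is not stated here.

```latex
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^{2},
\qquad
f(x) =
\begin{cases}
  (x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\
  1 & \text{otherwise}
\end{cases}
```

The paper uses x_max = 100 and α = 3/4. Because f(0) = 0, pairs that never co-occur drop out, so the sum effectively runs over the nonzero entries of X. The paper also uses W + W_tilde as the final word vectors, which is one common way of combining the two sets; whether this class combines them the same way is an assumption.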
Predict(Tensor<T>)
Makes a prediction using the neural network.
public override Tensor<T> Predict(Tensor<T> input)
Parameters
input (Tensor<T>): The input data to process.
Returns
- Tensor<T>
The network's prediction.
Remarks
For Beginners: This is the main method you'll use to get results from your trained neural network. You provide some input data (like an image or text), and the network processes it through all its layers to produce an output (like a classification or prediction).
SerializeNetworkSpecificData(BinaryWriter)
Serializes network-specific data that is not covered by the general serialization process.
protected override void SerializeNetworkSpecificData(BinaryWriter writer)
Parameters
writer (BinaryWriter): The BinaryWriter to write the data to.
Remarks
This method is called at the end of the general serialization process to allow derived classes to write any additional data specific to their implementation.
For Beginners: Think of this as packing a special compartment in your suitcase. While the main serialization method packs the common items (layers, parameters), this method allows each specific type of neural network to pack its own unique items that other networks might not have.
Train(Tensor<T>, Tensor<T>)
Trains the model on a batch of word pairs and their co-occurrence counts.
public override void Train(Tensor<T> input, Tensor<T> expectedOutput)
Parameters
input (Tensor<T>): The word pair indices.
expectedOutput (Tensor<T>): The actual co-occurrence counts from the dataset.
Remarks
For Beginners: This is how the model gets smarter. You show it pairs of words and how often each pair appeared together in your data. The model then adjusts the "addresses" (and popularity scores) of those words so that the way their addresses line up better reflects that frequency.
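The per-pair error that training reduces can be sketched with the weighting function and weighted squared error from the GloVe paper. The defaults xMax = 100 and alpha = 0.75 are the paper's values, not confirmed AiDotNet settings, and the sketch assumes expectedOutput holds raw counts that still need a logarithm. GloVeLossSketch is a hypothetical name.

```csharp
using System;

// Hypothetical helper: GloVe-paper weighting and per-pair weighted squared error.
public static class GloVeLossSketch
{
    // f(x) = (x / xMax)^alpha below xMax, and 1 at or above it.
    public static double Weight(double count, double xMax = 100.0, double alpha = 0.75) =>
        count < xMax ? Math.Pow(count / xMax, alpha) : 1.0;

    // Weighted squared error for one (word, context) pair; count must be > 0.
    public static double PairLoss(double[] w, double[] wTilde, double b, double bTilde, double count)
    {
        double dot = 0;
        for (int k = 0; k < w.Length; k++) dot += w[k] * wTilde[k];
        double error = dot + b + bTilde - Math.Log(count);
        return Weight(count) * error * error;
    }
}
```

Pairs with counts of 100 or more get full weight 1.0, while rarer pairs are down-weighted (a count of 1 gets 0.01^0.75 ≈ 0.0316), which keeps noisy, rare co-occurrences from dominating training.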
UpdateParameters(Vector<T>)
Updates the internal weights and biases of the model.
public override void UpdateParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): The new coordinates and scores for the model.
Remarks
For Beginners: This method actually moves the words around on the map. It updates the "addresses" of the words based on what it learned in the backward pass.
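Because UpdateParameters is described as taking the new coordinates and scores rather than changes to them, the updated values have to be computed beforehand. The sketch below shows one common way to do that: clip the gradient to a maximum L2 norm (the idea behind the constructor's maxGradNorm, though how this class applies it is not stated here) and then take a plain gradient-descent step. Flat double[] arrays stand in for Vector<T>, and SgdSketch is a hypothetical name.

```csharp
using System;

// Hypothetical helper: gradient clipping followed by one gradient-descent step.
public static class SgdSketch
{
    // Returns parameters - learningRate * g, where g is the gradient rescaled
    // to L2 norm maxGradNorm whenever its norm exceeds that limit.
    public static double[] Step(double[] parameters, double[] gradient, double learningRate, double maxGradNorm = 1.0)
    {
        if (parameters.Length != gradient.Length)
            throw new ArgumentException("Parameters and gradient must have the same length.");

        double norm = 0;
        foreach (double g in gradient) norm += g * g;
        norm = Math.Sqrt(norm);
        double scale = norm > maxGradNorm ? maxGradNorm / norm : 1.0;

        var updated = new double[parameters.Length];
        for (int k = 0; k < parameters.Length; k++)
            updated[k] = parameters[k] - learningRate * scale * gradient[k];
        return updated;
    }
}
```

A gradient of (3, 4) has norm 5, so with maxGradNorm = 1 it is rescaled to (0.6, 0.8) before the step; its direction is unchanged and only its length is capped.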