Class FastText<T>

Namespace
AiDotNet.NeuralNetworks
Assembly
AiDotNet.dll

FastText neural network implementation, an extension of Word2Vec that considers subword information.

public class FastText<T> : NeuralNetworkBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable, IEmbeddingModel<T>

Type Parameters

T

The numeric type used for calculations (typically float or double).

Inheritance
object
NeuralNetworkBase<T>
FastText<T>
Implements
INeuralNetworkModel<T>
INeuralNetwork<T>
IFullModel<T, Tensor<T>, Tensor<T>>
IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>
IModelSerializer
ICheckpointableModel
IParameterizable<T, Tensor<T>, Tensor<T>>
IFeatureAware
IFeatureImportance<T>
ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>
IGradientComputable<T, Tensor<T>, Tensor<T>>
IJitCompilable<T>
IInterpretableModel<T>
IInputGradientComputable<T>
IDisposable
IEmbeddingModel<T>

Remarks

FastText is a model for learning word representations and performing sentence classification. It improves upon the original Word2Vec by representing each word as a bag of character n-grams. This approach allows the model to compute word representations for words that did not appear in the training data (out-of-vocabulary words).

For Beginners: Most models see words like "playing" and "played" as completely different things. FastText is smarter: it breaks words into pieces (like "play", "ing", and "ed"). Because it knows what "play" means, it can guess the meaning of a new word like "player" even if it has never seen it before. It's like a person who can understand a complex new word by looking at its root and its suffix.
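To make the subword idea concrete, here is an illustrative sketch of FastText-style character n-gram extraction. This is not this library's internal code; the "<" and ">" boundary markers and the 3-to-6 length range follow the original FastText paper.

using System.Collections.Generic;

// Wrap the word in boundary markers, then emit every fragment of
// length minN..maxN. "play" becomes "<play>", yielding "<pl", "pla",
// "lay", "ay>", "<pla", "play", "lay>", ..., "<play>".
static IEnumerable<string> CharNgrams(string word, int minN = 3, int maxN = 6)
{
    string padded = "<" + word + ">";
    for (int n = minN; n <= maxN; n++)
        for (int i = 0; i + n <= padded.Length; i++)
            yield return padded.Substring(i, n);
}

Because "playing" and "played" share fragments like "<pl" and "play", their vectors end up close together even when one of the words is rare or unseen.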

Constructors

FastText(NeuralNetworkArchitecture<T>, ITokenizer?, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>?, int, int, int, int, ILossFunction<T>?, double)

Initializes a new instance of the FastText model.

public FastText(NeuralNetworkArchitecture<T> architecture, ITokenizer? tokenizer = null, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>? optimizer = null, int vocabSize = 10000, int bucketSize = 2000000, int embeddingDimension = 100, int maxTokens = 512, ILossFunction<T>? lossFunction = null, double maxGradNorm = 1)

Parameters

architecture NeuralNetworkArchitecture<T>

The architecture configuration that defines the model's structure and metadata.

tokenizer ITokenizer

Optional tokenizer for text processing.

optimizer IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>

Optional optimizer for training.

vocabSize int

The size of the vocabulary (default: 10000).

bucketSize int

The number of subword buckets (default: 2,000,000).

embeddingDimension int

The dimension of the word vectors (default: 100).

maxTokens int

The maximum tokens per sentence (default: 512).

lossFunction ILossFunction<T>

Optional loss function. Defaults to Binary Cross Entropy.

maxGradNorm double

Maximum gradient norm for stability (default: 1.0).
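A minimal construction sketch follows; the architecture variable is assumed to be an already-built NeuralNetworkArchitecture<double>, whose own construction is library-specific and not shown.

// All values below are the documented defaults, written out for clarity.
var model = new FastText<double>(
    architecture,
    vocabSize: 10000,          // whole-word vocabulary size
    bucketSize: 2_000_000,     // hash buckets for character n-grams
    embeddingDimension: 100,   // length of each embedding vector
    maxTokens: 512);           // longer inputs are truncated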

Properties

EmbeddingDimension

Gets the dimensionality of the embedding vectors produced by this model.

public int EmbeddingDimension { get; }

Property Value

int

Remarks

The embedding dimension determines the size of the vector representation. Common dimensions range from 128 to 1536, with larger dimensions typically capturing more nuanced semantic relationships at the cost of memory and computation.

For Beginners: This is how many numbers represent each piece of text.

Think of it like describing a person:

  • Low dimension (128): Basic traits like height, weight, age
  • High dimension (768): Detailed description including personality, preferences, habits
  • Very high dimension (1536): Extremely detailed profile

More dimensions = more detailed understanding, but also more storage space needed.
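As a quick sanity check (a sketch that assumes Vector<T> exposes a Length property), every vector returned by Embed has exactly EmbeddingDimension entries:

Vector<double> v = model.Embed("hello world");
Console.WriteLine(v.Length == model.EmbeddingDimension); // expected: True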

MaxTokens

Gets the maximum length of text (in tokens) that this model can process.

public int MaxTokens { get; }

Property Value

int

Remarks

Most embedding models have a maximum context length beyond which text must be truncated. Common limits range from 512 to 8192 tokens. Implementations should handle text exceeding this limit gracefully, either by truncation or raising an exception.

For Beginners: This is the maximum amount of text the model can understand at once.

Think of it like a reader's attention span:

  • Short span (512 tokens): Can read about a paragraph
  • Medium span (2048 tokens): Can read a few pages
  • Long span (8192 tokens): Can read a short chapter

If your text is longer, it needs to be split into chunks, as sketched below. (A token is roughly a word, so 512 tokens ≈ 1-2 paragraphs.)
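A naive chunking sketch, using whitespace-separated words as a rough stand-in for tokens (a real ITokenizer may count tokens differently):

using System;
using System.Collections.Generic;
using System.Linq;

// Split text into pieces of at most maxTokens words each.
static IEnumerable<string> ChunkByTokens(string text, int maxTokens)
{
    string[] words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    for (int i = 0; i < words.Length; i += maxTokens)
        yield return string.Join(" ", words.Skip(i).Take(maxTokens));
}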

Methods

Backward(Tensor<T>)

Propagates error gradients backward for learning.

public Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

The gradient of the loss with respect to the network's output.

Returns

Tensor<T>

CreateNewInstance()

Creates a new instance of the same type as this neural network.

protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()

Returns

IFullModel<T, Tensor<T>, Tensor<T>>

A new instance of the same neural network type.

Remarks

For Beginners: This creates a blank version of the same type of neural network.

It's used internally by methods like DeepCopy and Clone to create the right type of network before copying the data into it.

DeserializeNetworkSpecificData(BinaryReader)

Deserializes network-specific data that was not covered by the general deserialization process.

protected override void DeserializeNetworkSpecificData(BinaryReader reader)

Parameters

reader BinaryReader

The BinaryReader to read the data from.

Remarks

This method is called at the end of the general deserialization process to allow derived classes to read any additional data specific to their implementation.

For Beginners: Continuing the suitcase analogy from SerializeNetworkSpecificData(BinaryWriter), this is like unpacking that special compartment. After the main deserialization method has unpacked the common items (layers, parameters), this method lets each specific type of neural network unpack its own unique items that were stored during serialization.

Embed(string)

Turns text into a robust embedding vector using both word and subword information.

public Vector<T> Embed(string text)

Parameters

text string

The text to embed.

Returns

Vector<T>

Remarks

For Beginners: This is the final step where the model summarizes your text. It looks at the meaning of every word and the meaning of every word fragment (like roots and suffixes), averages them all together, and gives you one final "meaning coordinate" for the entire sentence.
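For example (a usage sketch):

// One vector summarizing the whole sentence, built by averaging the
// word vectors and subword (n-gram) vectors of every token.
Vector<double> sentence = model.Embed("the player kept playing");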

EmbedAsync(string)

Asynchronously embeds a single text string into a vector representation.

public Task<Vector<T>> EmbedAsync(string text)

Parameters

text string

The text to embed.

Returns

Task<Vector<T>>

A task representing the async operation, with the resulting vector.
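A usage sketch inside an async method; EmbedBatchAsync below follows the same pattern for collections of texts:

// Await the embedding without blocking the calling thread.
Vector<double> v = await model.EmbedAsync("hello world");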

EmbedBatch(IEnumerable<string>)

Encodes a batch of texts for high-throughput processing.

public Matrix<T> EmbedBatch(IEnumerable<string> texts)

Parameters

texts IEnumerable<string>

The texts to encode.

Returns

Matrix<T>

A matrix where each row is the embedding for one input text.
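A batch-usage sketch; each row of the returned matrix corresponds to one input text, in order:

string[] texts = { "playing", "played", "player" };
Matrix<double> embeddings = model.EmbedBatch(texts); // one row per text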

EmbedBatchAsync(IEnumerable<string>)

Asynchronously embeds multiple text strings into vector representations in a single batch operation.

public Task<Matrix<T>> EmbedBatchAsync(IEnumerable<string> texts)

Parameters

texts IEnumerable<string>

The collection of texts to embed.

Returns

Task<Matrix<T>>

A task representing the async operation, with the resulting matrix.

Forward(Tensor<T>)

Performs a forward pass to retrieve representations.

public Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

The input tensor to process.

Returns

Tensor<T>

GetModelMetadata()

Retrieves detailed metadata about the FastText model configuration.

public override ModelMetadata<T> GetModelMetadata()

Returns

ModelMetadata<T>

InitializeLayers()

Configures the layers needed for FastText, including word and subword embedding tables.

protected override void InitializeLayers()

Remarks

This method initializes:

  1. A standard word embedding table (Layer 0).
  2. A large n-gram embedding table (Layer 1) using the hashing trick.
  3. A projection head used for training.

For Beginners: This method builds the internal storage for the model. It creates two main "dictionaries"—one for whole words and a much larger one for word fragments. These are used together to understand sentences.
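To illustrate the hashing trick, the sketch below maps an n-gram string to a row index in the bucket table, so even an n-gram never seen in training still gets a vector. FNV-1a is used here as a stand-in; the hash function this library actually uses is not specified on this page.

// Map an n-gram to one of bucketSize embedding rows.
static int BucketIndex(string ngram, int bucketSize)
{
    unchecked
    {
        uint h = 2166136261;            // FNV-1a offset basis
        foreach (char c in ngram)
            h = (h ^ c) * 16777619;     // FNV-1a prime
        return (int)(h % (uint)bucketSize);
    }
}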

Predict(Tensor<T>)

Makes a prediction using the neural network.

public override Tensor<T> Predict(Tensor<T> input)

Parameters

input Tensor<T>

The input data to process.

Returns

Tensor<T>

The network's prediction.

Remarks

For Beginners: This is the main method you'll use to get results from your trained neural network. You provide some input data (like an image or text), and the network processes it through all its layers to produce an output (like a classification or prediction).
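A minimal sketch; how token ids are packed into a Tensor<T> is library-specific, so inputTensor is assumed to be pre-built:

Tensor<double> output = model.Predict(inputTensor);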

SerializeNetworkSpecificData(BinaryWriter)

Serializes network-specific data that is not covered by the general serialization process.

protected override void SerializeNetworkSpecificData(BinaryWriter writer)

Parameters

writer BinaryWriter

The BinaryWriter to write the data to.

Remarks

This method is called at the end of the general serialization process to allow derived classes to write any additional data specific to their implementation.

For Beginners: Think of this as packing a special compartment in your suitcase. While the main serialization method packs the common items (layers, parameters), this method allows each specific type of neural network to pack its own unique items that other networks might not have.

Train(Tensor<T>, Tensor<T>)

Trains the model on a single step of data using standard backpropagation.

public override void Train(Tensor<T> input, Tensor<T> expectedOutput)

Parameters

input Tensor<T>

The input data for this training step.

expectedOutput Tensor<T>

The expected output (target) for the given input.
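A training-loop skeleton (a sketch: batches is a hypothetical sequence of (input, expected) tensor pairs; producing them from a corpus depends on the tokenizer and task):

for (int epoch = 0; epoch < 5; epoch++)
{
    foreach (var (input, expected) in batches)
        model.Train(input, expected);   // one backpropagation step per call
}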

UpdateParameters(Vector<T>)

Updates all trainable weights in the FastText model.

public override void UpdateParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

A vector containing the new values for all trainable weights in the model.