Class FastText<T>
Namespace: AiDotNet.NeuralNetworks
Assembly: AiDotNet.dll
FastText neural network implementation, an extension of Word2Vec that considers subword information.
public class FastText<T> : NeuralNetworkBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable, IEmbeddingModel<T>
Type Parameters
T: The numeric type used for calculations (typically float or double).
- Inheritance
- object → NeuralNetworkBase<T> → FastText<T>
Remarks
FastText is a method for learning word representations and performing sentence classification. It improves upon the original Word2Vec by representing each word as a bag of character n-grams. This approach allows the model to compute word representations for words that did not appear in the training data (out-of-vocabulary words).
For Beginners: Most models see words like "playing" and "played" as completely different things. FastText is smarter: it breaks words into pieces (like "play", "ing", and "ed"). Because it knows what "play" means, it can guess the meaning of a new word like "player" even if it has never seen it before. It's like a person who can understand a complex new word by looking at its root and its suffix.
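The subword idea can be sketched in a few lines. The helper below is purely illustrative and is not AiDotNet's internal tokenization; the names CharNGrams, minN, and maxN are hypothetical, chosen here to demonstrate the boundary-marked character n-grams described in the FastText paper.

using System;
using System.Collections.Generic;

class SubwordDemo
{
    // Character n-grams with word-boundary markers ("<" and ">").
    static List<string> CharNGrams(string word, int minN = 3, int maxN = 6)
    {
        string padded = "<" + word + ">";
        var grams = new List<string>();
        for (int n = minN; n <= maxN; n++)
            for (int i = 0; i + n <= padded.Length; i++)
                grams.Add(padded.Substring(i, n));
        grams.Add(padded); // the whole word is also kept as a feature
        return grams;
    }

    static void Main()
    {
        // "playing" shares n-grams such as "play" with "played" and "player",
        // which is why unseen words can still get sensible vectors.
        Console.WriteLine(string.Join(", ", CharNGrams("playing")));
    }
}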
Constructors
FastText(NeuralNetworkArchitecture<T>, ITokenizer?, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>?, int, int, int, int, ILossFunction<T>?, double)
Initializes a new instance of the FastText model.
public FastText(NeuralNetworkArchitecture<T> architecture, ITokenizer? tokenizer = null, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>? optimizer = null, int vocabSize = 10000, int bucketSize = 2000000, int embeddingDimension = 100, int maxTokens = 512, ILossFunction<T>? lossFunction = null, double maxGradNorm = 1)
Parameters
architecture NeuralNetworkArchitecture<T>: The configuration defining the model metadata.
tokenizer ITokenizer: Optional tokenizer for text processing.
optimizer IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>: Optional optimizer for training.
vocabSize int: The size of the vocabulary (default: 10,000).
bucketSize int: The number of subword hash buckets (default: 2,000,000).
embeddingDimension int: The dimension of the word vectors (default: 100).
maxTokens int: The maximum number of tokens per sentence (default: 512).
lossFunction ILossFunction<T>: Optional loss function; defaults to binary cross-entropy.
maxGradNorm double: Maximum gradient norm for stability (default: 1.0).
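A minimal construction sketch. The NeuralNetworkArchitecture<double> setup is application-specific and elided here; everything else uses the parameters documented above.

// Assumes an architecture object configured elsewhere for your task.
var architecture = new NeuralNetworkArchitecture<double>(/* ... */);

var model = new FastText<double>(
    architecture,
    vocabSize: 30_000,       // grow this for larger corpora
    bucketSize: 2_000_000,   // subword hash buckets (the default)
    embeddingDimension: 100,
    maxTokens: 512);
// tokenizer, optimizer, and lossFunction fall back to their defaults.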
Properties
EmbeddingDimension
Gets the dimensionality of the embedding vectors produced by this model.
public int EmbeddingDimension { get; }
Property Value
- int
Remarks
The embedding dimension determines the size of the vector representation. Common dimensions range from 128 to 1536, with larger dimensions typically capturing more nuanced semantic relationships at the cost of memory and computation.
For Beginners: This is how many numbers represent each piece of text.
Think of it like describing a person:
- Low dimension (128): Basic traits like height, weight, age
- High dimension (768): Detailed description including personality, preferences, habits
- Very high dimension (1536): Extremely detailed profile
More dimensions = more detailed understanding, but also more storage space needed.
MaxTokens
Gets the maximum length of text (in tokens) that this model can process.
public int MaxTokens { get; }
Property Value
- int
Remarks
Most embedding models have a maximum context length beyond which text must be truncated. Common limits range from 512 to 8192 tokens. Implementations should handle text exceeding this limit gracefully, either by truncation or raising an exception.
For Beginners: This is the maximum amount of text the model can understand at once.
Think of it like a reader's attention span:
- Short span (512 tokens): Can read about a paragraph
- Medium span (2048 tokens): Can read a few pages
- Long span (8192 tokens): Can read a short chapter
If your text is longer, it needs to be split into chunks. (A token is roughly a word, so 512 tokens is about one to two paragraphs.)
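When a document may exceed MaxTokens, a common approach is to split it before embedding. The sketch below uses naive whitespace splitting purely for illustration; the model's actual tokenizer (the ITokenizer passed to the constructor) may count tokens differently.

using System;
using System.Collections.Generic;
using System.Linq;

class ChunkDemo
{
    // Naive whitespace chunking; real tokenizers split text differently.
    static IEnumerable<string> Chunk(string text, int maxTokens)
    {
        string[] words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0; i < words.Length; i += maxTokens)
            yield return string.Join(" ", words.Skip(i).Take(maxTokens));
    }

    static void Main()
    {
        foreach (string chunk in Chunk("a very long document ...", 512))
            Console.WriteLine(chunk);
    }
}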
Methods
Backward(Tensor<T>)
Propagates error gradients backward for learning.
public Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient Tensor<T>: The gradient of the loss with respect to the network's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the network's input.
CreateNewInstance()
Creates a new instance of the same type as this neural network.
protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
A new instance of the same neural network type.
Remarks
For Beginners: This creates a blank version of the same type of neural network.
It's used internally by methods like DeepCopy and Clone to create the right type of network before copying the data into it.
DeserializeNetworkSpecificData(BinaryReader)
Deserializes network-specific data that was not covered by the general deserialization process.
protected override void DeserializeNetworkSpecificData(BinaryReader reader)
Parameters
reader BinaryReader: The BinaryReader to read the data from.
Remarks
This method is called at the end of the general deserialization process to allow derived classes to read any additional data specific to their implementation.
For Beginners: This is like unpacking a special compartment in your suitcase (see SerializeNetworkSpecificData for the packing side of the analogy). After the main deserialization method has unpacked the common items (layers, parameters), this method allows each specific type of neural network to unpack its own unique items that were stored during serialization.
Embed(string)
Turns text into a robust embedding vector using both word and subword information.
public Vector<T> Embed(string text)
Parameters
text string: The text to embed.
Returns
- Vector<T>
The embedding vector representing the input text.
Remarks
For Beginners: This is the final step where the model summarizes your text. It looks at the meaning of every word and the meaning of every word fragment (like roots and suffixes), averages them all together, and gives you one final "meaning coordinate" for the entire sentence.
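A usage sketch comparing two embeddings with cosine similarity. It assumes a trained model instance and that Vector<double> supports integer indexing (an assumption about the AiDotNet type).

// 'model' is a trained FastText<double>; Vector<double> indexing is assumed.
Vector<double> a = model.Embed("the cat sat on the mat");
Vector<double> b = model.Embed("a cat was sitting on a rug");

double dot = 0, normA = 0, normB = 0;
for (int i = 0; i < model.EmbeddingDimension; i++)
{
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
}
double cosine = dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
// Values near 1 mean the two texts are close in meaning.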
EmbedAsync(string)
Asynchronously embeds a single text string into a vector representation.
public Task<Vector<T>> EmbedAsync(string text)
Parameters
text string: The text to embed.
Returns
- Task<Vector<T>>
A task representing the async operation, with the resulting vector.
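Typical usage from inside an async method (a sketch; 'model' is assumed to be a constructed FastText<double>):

Vector<double> vector = await model.EmbedAsync("hello world");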
EmbedBatch(IEnumerable<string>)
Encodes a batch of texts for high-throughput processing.
public Matrix<T> EmbedBatch(IEnumerable<string> texts)
Parameters
texts IEnumerable<string>: The texts to encode.
Returns
- Matrix<T>
A matrix where each row is the embedding for one input text.
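Batching is usually faster than calling Embed in a loop when processing many texts. A sketch, assuming a constructed model:

string[] docs = { "first document", "second document", "third document" };
Matrix<double> embeddings = model.EmbedBatch(docs);
// Row i of 'embeddings' is the vector for docs[i], and each row has
// model.EmbeddingDimension columns.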
EmbedBatchAsync(IEnumerable<string>)
Asynchronously embeds multiple text strings into vector representations in a single batch operation.
public Task<Matrix<T>> EmbedBatchAsync(IEnumerable<string> texts)
Parameters
texts IEnumerable<string>: The collection of texts to embed.
Returns
- Task<Matrix<T>>
A task representing the async operation, with the resulting matrix.
Forward(Tensor<T>)
Performs a forward pass to retrieve representations.
public Tensor<T> Forward(Tensor<T> input)
Parameters
input Tensor<T>: The input tensor to process.
Returns
- Tensor<T>
GetModelMetadata()
Retrieves detailed metadata about the FastText model configuration.
public override ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
A ModelMetadata<T> object describing the model's configuration.
InitializeLayers()
Configures the layers needed for FastText, including word and subword embedding tables.
protected override void InitializeLayers()
Remarks
This method initializes:
1. A standard word embedding table (Layer 0).
2. A large n-gram embedding table (Layer 1) using the hashing trick.
3. A projection head used for training.
For Beginners: This method builds the internal storage for the model. It creates two main "dictionaries"—one for whole words and a much larger one for word fragments. These are used together to understand sentences.
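The "hashing trick" means n-grams are never stored in a dictionary: each n-gram is hashed directly to one of bucketSize rows of the fixed-size subword table, so memory stays bounded no matter how many distinct n-grams occur. A conceptual sketch follows; the reference fastText implementation uses FNV-1a as below, but AiDotNet's actual hash function is an assumption here.

using System;

class SubwordHashDemo
{
    // FNV-1a over UTF-16 code units (the reference implementation hashes bytes).
    static uint Fnv1a(string s)
    {
        uint h = 2166136261;
        foreach (char c in s)
        {
            h ^= c;
            h *= 16777619;
        }
        return h;
    }

    // Map an n-gram to a row index in the subword embedding table.
    static int BucketFor(string ngram, int bucketSize)
        => (int)(Fnv1a(ngram) % (uint)bucketSize);

    static void Main()
    {
        Console.WriteLine(BucketFor("<pla", 2_000_000));
    }
}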
Predict(Tensor<T>)
Makes a prediction using the neural network.
public override Tensor<T> Predict(Tensor<T> input)
Parameters
input Tensor<T>: The input data to process.
Returns
- Tensor<T>
The network's prediction.
Remarks
For Beginners: This is the main method you'll use to get results from your trained neural network. You provide some input data (like an image or text), and the network processes it through all its layers to produce an output (like a classification or prediction).
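A brief sketch; how the input tensor is built depends on your data pipeline and is assumed here:

// 'input' is a Tensor<double> prepared by your preprocessing code.
Tensor<double> output = model.Predict(input);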
SerializeNetworkSpecificData(BinaryWriter)
Serializes network-specific data that is not covered by the general serialization process.
protected override void SerializeNetworkSpecificData(BinaryWriter writer)
Parameters
writer BinaryWriter: The BinaryWriter to write the data to.
Remarks
This method is called at the end of the general serialization process to allow derived classes to write any additional data specific to their implementation.
For Beginners: Think of this as packing a special compartment in your suitcase. While the main serialization method packs the common items (layers, parameters), this method allows each specific type of neural network to pack its own unique items that other networks might not have.
Train(Tensor<T>, Tensor<T>)
Trains the model on a single step of data using standard backpropagation.
public override void Train(Tensor<T> input, Tensor<T> expectedOutput)
Parameters
input Tensor<T>: The input data for this training step.
expectedOutput Tensor<T>: The target output the network should learn to produce.
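A sketch of a simple training loop. How the input and target tensors are built depends on your data pipeline; 'batches' is a hypothetical sequence of prepared pairs.

// batches: IEnumerable<(Tensor<double> Input, Tensor<double> Target)>
for (int epoch = 0; epoch < 5; epoch++)
{
    foreach (var (input, target) in batches)
        model.Train(input, target);
}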
UpdateParameters(Vector<T>)
Updates all trainable weights in the FastText model.
public override void UpdateParameters(Vector<T> parameters)
Parameters
parameters Vector<T>: A flat vector of new values for the model's trainable parameters.