Class SGPT<T>

Namespace
AiDotNet.NeuralNetworks
Assembly
AiDotNet.dll

SGPT (Sentence GPT) neural network implementation using decoder-only transformer architectures.

public class SGPT<T> : TransformerEmbeddingNetwork<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable, IEmbeddingModel<T>

Type Parameters

T

The numeric type used for calculations (typically float or double).

Inheritance
SGPT<T>
Implements
IFullModel<T, Tensor<T>, Tensor<T>>
IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>
IParameterizable<T, Tensor<T>, Tensor<T>>
ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>
IGradientComputable<T, Tensor<T>, Tensor<T>>

Remarks

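SGPT uses large decoder-only language models (such as GPT-2 or GPT-Neo) to generate sentence embeddings. Because attention in these models is unidirectional, only the last token of a sequence has seen every token before it, so pooling on the last token lets its hidden state summarize the entire sentence. Note that this class defaults to PoolingStrategy.Mean; pass a different poolingStrategy to the constructor to change it.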

For Beginners: Most AI models are like "readers" who read a whole sentence and then think about it. SGPT is like a "writer." Because it's trained to write sentences one word at a time, it has a very deep understanding of how sentences are built. When it finishes a sentence, the very last word it would have written contains a "mental summary" of everything that came before it. SGPT uses that summary as the coordinate (embedding) for the whole sentence.
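
The difference between last-token pooling and mean pooling can be sketched on plain arrays. This is a minimal illustration under stated assumptions, not AiDotNet's implementation: hiddenStates, LastTokenPool, and MeanPool are hypothetical names, and the token vectors are made-up values.

```csharp
using System;

// Hypothetical per-token outputs for a 3-token sentence (embedding dimension 2).
// In a real model these would come from the final transformer layer.
double[][] hiddenStates =
{
    new[] { 1.0, 0.0 },
    new[] { 0.0, 2.0 },
    new[] { 2.0, 4.0 },
};

// Last-token pooling: with unidirectional attention, only the final position
// has seen every earlier token, so its vector is used as the summary.
static double[] LastTokenPool(double[][] states) =>
    (double[])states[states.Length - 1].Clone();

// Mean pooling: average each dimension across all token positions.
static double[] MeanPool(double[][] states)
{
    var pooled = new double[states[0].Length];
    foreach (var token in states)
        for (int d = 0; d < pooled.Length; d++)
            pooled[d] += token[d];
    for (int d = 0; d < pooled.Length; d++)
        pooled[d] /= states.Length;
    return pooled;
}

Console.WriteLine(string.Join(", ", LastTokenPool(hiddenStates)));
Console.WriteLine(string.Join(", ", MeanPool(hiddenStates)));
```

The two strategies generally produce different vectors for the same sentence, so embeddings created with one strategy should not be compared against embeddings created with the other.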

Constructors

SGPT(NeuralNetworkArchitecture<T>, ITokenizer?, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>?, int, int, int, int, int, int, PoolingStrategy, ILossFunction<T>?, double)

Initializes a new instance of the SGPT model.

public SGPT(NeuralNetworkArchitecture<T> architecture, ITokenizer? tokenizer = null, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>? optimizer = null, int vocabSize = 50257, int embeddingDimension = 768, int maxSequenceLength = 1024, int numLayers = 12, int numHeads = 12, int feedForwardDim = 3072, TransformerEmbeddingNetwork<T>.PoolingStrategy poolingStrategy = PoolingStrategy.Mean, ILossFunction<T>? lossFunction = null, double maxGradNorm = 1)

Parameters

architecture NeuralNetworkArchitecture<T>

The configuration defining the model's structure.

tokenizer ITokenizer

Optional tokenizer for text processing.

optimizer IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>

Optional optimizer for training.

vocabSize int

The size of the vocabulary (default: 50257 for GPT-2).

embeddingDimension int

The dimension of the embeddings (default: 768).

maxSequenceLength int

The maximum length of input sequences (default: 1024).

numLayers int

The number of transformer layers (default: 12).

numHeads int

The number of attention heads (default: 12).

feedForwardDim int

The hidden dimension of feed-forward networks (default: 3072).

poolingStrategy TransformerEmbeddingNetwork<T>.PoolingStrategy

The strategy used to pool per-token outputs into a single embedding (default: Mean; last-token pooling is a common alternative for decoder-only models).

lossFunction ILossFunction<T>

Optional loss function.

maxGradNorm double

Maximum gradient norm, used to clip gradients for training stability (default: 1.0).
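
The name maxGradNorm suggests gradient clipping by norm, although this reference does not describe how AiDotNet applies the limit. The following is a minimal sketch of the common technique, assuming a single flat gradient vector; ClipByNorm is a hypothetical helper, not part of the library.

```csharp
using System;

// Rescale a gradient vector so its L2 norm does not exceed maxGradNorm.
// Hypothetical helper; AiDotNet's own clipping logic may differ.
static double[] ClipByNorm(double[] gradient, double maxGradNorm)
{
    double sumOfSquares = 0.0;
    foreach (double g in gradient)
        sumOfSquares += g * g;
    double norm = Math.Sqrt(sumOfSquares);

    // Gradients already within the limit are returned unchanged.
    if (norm <= maxGradNorm)
        return (double[])gradient.Clone();

    // Otherwise shrink every component by the same factor, preserving direction.
    double scale = maxGradNorm / norm;
    var clipped = new double[gradient.Length];
    for (int i = 0; i < gradient.Length; i++)
        clipped[i] = gradient[i] * scale;
    return clipped;
}

// A gradient of norm 5 is scaled down to the limit of 1.0 (the constructor's default).
double[] clippedGradient = ClipByNorm(new[] { 3.0, 4.0 }, maxGradNorm: 1.0);
Console.WriteLine(string.Join(", ", clippedGradient));
```

Clipping changes only the length of the gradient, not its direction, which is why it limits the size of an update without changing where the update points.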

Methods

CreateNewInstance()

Creates a new instance of the same type as this neural network.

protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()

Returns

IFullModel<T, Tensor<T>, Tensor<T>>

A new instance of the same neural network type.

Remarks

For Beginners: This creates a blank version of the same type of neural network.

It's used internally by methods like DeepCopy and Clone to create the right type of network before copying the data into it.

DeserializeNetworkSpecificData(BinaryReader)

Deserializes network-specific data that was not covered by the general deserialization process.

protected override void DeserializeNetworkSpecificData(BinaryReader reader)

Parameters

reader BinaryReader

The BinaryReader to read the data from.

Remarks

This method is called at the end of the general deserialization process to allow derived classes to read any additional data specific to their implementation.

For Beginners: Think of serialization as packing a suitcase, with a special compartment for items unique to each network type; this method unpacks that compartment. After the main deserialization method has unpacked the common items (layers, parameters), this method lets each specific type of neural network unpack the unique items that were stored during serialization.

Embed(string)

Encodes a single string into a normalized summary vector.

public override Vector<T> Embed(string text)

Parameters

text string

The text to encode.

Returns

Vector<T>

A normalized embedding vector.

Remarks

For Beginners: This is the main use case. You give the model a sentence; it processes the text through all its layers, combines the per-token results using your chosen pooling strategy (such as averaging them), and returns a single normalized list of numbers.
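
One practical consequence of a normalized embedding is that cosine similarity reduces to a dot product. The sketch below assumes "normalized" means unit L2 length, which this reference does not state explicitly; Normalize and Dot are hypothetical helpers, and the input vectors are made-up stand-ins for real embeddings.

```csharp
using System;

// Scale a vector to unit L2 length (the assumed meaning of "normalized").
static double[] Normalize(double[] v)
{
    double sumOfSquares = 0.0;
    foreach (double x in v)
        sumOfSquares += x * x;
    double norm = Math.Sqrt(sumOfSquares);
    if (norm == 0.0)
        return (double[])v.Clone(); // leave an all-zero vector unchanged

    var unit = new double[v.Length];
    for (int i = 0; i < v.Length; i++)
        unit[i] = v[i] / norm;
    return unit;
}

// Plain dot product of two equal-length vectors.
static double Dot(double[] a, double[] b)
{
    double sum = 0.0;
    for (int i = 0; i < a.Length; i++)
        sum += a[i] * b[i];
    return sum;
}

// Made-up stand-ins for two sentence embeddings.
double[] first = Normalize(new[] { 3.0, 4.0 });
double[] second = Normalize(new[] { 4.0, 3.0 });

// For unit vectors, the dot product equals the cosine similarity.
double similarity = Dot(first, second);
Console.WriteLine(similarity);
```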

EmbedAsync(string)

Asynchronously embeds a single text string into a vector representation.

public override Task<Vector<T>> EmbedAsync(string text)

Parameters

text string

The text to embed.

Returns

Task<Vector<T>>

A task that represents the asynchronous operation and completes with the resulting embedding vector.

EmbedBatchAsync(IEnumerable<string>)

Asynchronously embeds multiple text strings into vector representations in a single batch operation.

public override Task<Matrix<T>> EmbedBatchAsync(IEnumerable<string> texts)

Parameters

texts IEnumerable<string>

The collection of texts to embed.

Returns

Task<Matrix<T>>

A task that represents the asynchronous operation and completes with the resulting matrix of embeddings.

GetModelMetadata()

Retrieves metadata about the SGPT model.

public override ModelMetadata<T> GetModelMetadata()

Returns

ModelMetadata<T>

Metadata containing model type and naming information.

InitializeLayers()

Configures the transformer layers for the SGPT model using decoder-only defaults from LayerHelper.

protected override void InitializeLayers()

Remarks

For Beginners: This method builds the "writer's brain." It stacks transformer layers configured in the decoder-only style, so information flows from the start of a sentence to the very end.
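
In decoder-only transformers, the start-to-end flow of information is typically enforced with a causal attention mask. The sketch below illustrates that general idea only; it is not the LayerHelper code, and BuildCausalMask is a hypothetical helper.

```csharp
using System;

// mask[i, j] is true when position i is allowed to attend to position j.
// In a causal (decoder-only) mask, a token sees itself and earlier tokens only.
static bool[,] BuildCausalMask(int sequenceLength)
{
    var mask = new bool[sequenceLength, sequenceLength];
    for (int i = 0; i < sequenceLength; i++)
        for (int j = 0; j <= i; j++)
            mask[i, j] = true;
    return mask;
}

// Print the mask for a 4-token sequence as rows of 1s (visible) and 0s (hidden).
bool[,] causalMask = BuildCausalMask(4);
for (int i = 0; i < 4; i++)
{
    for (int j = 0; j < 4; j++)
        Console.Write(causalMask[i, j] ? "1 " : "0 ");
    Console.WriteLine();
}
```

The last row of the mask is the only one that is entirely visible, which is the reason the final token can act as a summary of the whole sequence.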

SerializeNetworkSpecificData(BinaryWriter)

Serializes network-specific data that is not covered by the general serialization process.

protected override void SerializeNetworkSpecificData(BinaryWriter writer)

Parameters

writer BinaryWriter

The BinaryWriter to write the data to.

Remarks

This method is called at the end of the general serialization process to allow derived classes to write any additional data specific to their implementation.

For Beginners: Think of this as packing a special compartment in your suitcase. While the main serialization method packs the common items (layers, parameters), this method allows each specific type of neural network to pack its own unique items that other networks might not have.
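
The pairing between SerializeNetworkSpecificData and DeserializeNetworkSpecificData can be sketched with the standard BinaryWriter and BinaryReader types. This is an illustration under stated assumptions: the fields SGPT actually persists are not listed in this reference, so the three values below are hypothetical, chosen from the constructor defaults.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical model-specific settings; the fields SGPT really writes
// are not documented here.
int vocabSize = 50257;
int embeddingDimension = 768;
double maxGradNorm = 1.0;

using var stream = new MemoryStream();

// Write side: the order of the Write calls defines the binary layout.
using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
{
    writer.Write(vocabSize);
    writer.Write(embeddingDimension);
    writer.Write(maxGradNorm);
}

// Read side: values must be read back in the same order and with the same types.
stream.Position = 0;
using var reader = new BinaryReader(stream);
int restoredVocabSize = reader.ReadInt32();
int restoredEmbeddingDimension = reader.ReadInt32();
double restoredMaxGradNorm = reader.ReadDouble();

Console.WriteLine($"{restoredVocabSize}, {restoredEmbeddingDimension}, {restoredMaxGradNorm}");
```

Because the binary format carries no field names, any override of one method needs a matching change in the other; reading in a different order or with a different type would silently produce wrong values.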