Class CodeModelBase<T>
- Namespace
- AiDotNet.ProgramSynthesis.Engines
- Assembly
- AiDotNet.dll
Base class for code models that provides shared tokenization, task dispatch, and structured outputs.
public abstract class CodeModelBase<T> : NeuralNetworkBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable, ICodeModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>
Type Parameters
- T
The numeric type used for calculations (e.g., double, float).
- Inheritance
- CodeModelBase<T>
- Implements
- ICodeModel<T>
Constructors
CodeModelBase(CodeSynthesisArchitecture<T>, ILossFunction<T>, ITokenizer?)
protected CodeModelBase(CodeSynthesisArchitecture<T> architecture, ILossFunction<T> lossFunction, ITokenizer? tokenizer = null)
Parameters
- architecture (CodeSynthesisArchitecture<T>)
- lossFunction (ILossFunction<T>)
- tokenizer (ITokenizer)
Properties
CodeArchitecture
protected CodeSynthesisArchitecture<T> CodeArchitecture { get; }
Property Value
- CodeSynthesisArchitecture<T>
MaxSequenceLength
Gets the maximum sequence length (in tokens) that the model can process.
public int MaxSequenceLength { get; }
Property Value
- int
Remarks
Code models process code as sequences of tokens. This property specifies the maximum number of tokens the model can handle at once.
For Beginners: This is like the maximum length of code the model can read at once.
Code is broken into pieces called "tokens" (like words in a sentence). This number tells you the maximum number of tokens the model can process, which roughly corresponds to how long a code file can be.
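The following is a minimal sketch of how a caller might respect a token limit such as MaxSequenceLength before passing input to a model. The helper name `TruncateToLimit` and the plain `int[]` token representation are assumptions made for illustration; this page does not state whether CodeModelBase<T> itself truncates, pads, or rejects over-long input.

```csharp
using System;

public static class SequenceLimitSketch
{
    // Hypothetical helper (not part of AiDotNet): keeps at most
    // maxSequenceLength token ids, dropping any tokens past the limit.
    public static int[] TruncateToLimit(int[] tokenIds, int maxSequenceLength)
    {
        if (tokenIds is null) throw new ArgumentNullException(nameof(tokenIds));
        if (maxSequenceLength < 0) throw new ArgumentOutOfRangeException(nameof(maxSequenceLength));

        // Input already fits: return it unchanged.
        if (tokenIds.Length <= maxSequenceLength) return tokenIds;

        var truncated = new int[maxSequenceLength];
        Array.Copy(tokenIds, truncated, maxSequenceLength);
        return truncated;
    }
}
```

Truncating from the end is only one possible policy; for completion-style tasks it may be preferable to keep the most recent tokens instead.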
TargetLanguage
Gets the target programming language for this model.
public ProgramLanguage TargetLanguage { get; }
Property Value
- ProgramLanguage
Remarks
Specifies which programming language this model is designed to work with. Some models are language-specific, while others can work with multiple languages.
For Beginners: This tells you which programming language the model knows.
Like a translator who specializes in French or Spanish, code models often specialize in specific programming languages like Python or Java.
Tokenizer
protected ITokenizer Tokenizer { get; }
Property Value
- ITokenizer
VocabularySize
Gets the vocabulary size of the model.
public int VocabularySize { get; }
Property Value
- int
Remarks
The vocabulary consists of all the tokens (keywords, operators, identifiers, etc.) that the model knows and can work with.
For Beginners: This is like the model's dictionary size.
It tells you how many different code tokens the model knows. A larger vocabulary means the model can handle more diverse code patterns and identifiers.
Methods
CreateTransformerModelMetadata(string, IReadOnlyDictionary<string, object>?, string)
protected ModelMetadata<T> CreateTransformerModelMetadata(string modelName, IReadOnlyDictionary<string, object>? extraInfo, string optimizerName)
Parameters
- modelName (string)
- extraInfo (IReadOnlyDictionary<string, object>)
- optimizerName (string)
Returns
- ModelMetadata<T>
DecodeCode(Tensor<T>)
Decodes a vector representation back into source code.
public string DecodeCode(Tensor<T> encoding)
Parameters
- encoding (Tensor<T>)
The encoded representation to decode.
Returns
- string
The decoded source code as a string.
Remarks
Decoding transforms the model's internal numerical representation back into human-readable source code.
For Beginners: Decoding converts the AI's numerical format back to readable code.
After the AI processes code in numerical form, we need to convert it back to text that humans can read and computers can execute. This is the reverse of encoding.
EncodeCode(string)
Encodes source code into a vector representation.
public Tensor<T> EncodeCode(string code)
Parameters
- code (string)
The source code to encode.
Returns
- Tensor<T>
A tensor representing the encoded code.
Remarks
Encoding transforms source code (text) into a numerical representation that the model can process. This representation captures semantic information about the code.
For Beginners: Encoding converts code text into numbers the AI can understand.
Computers cannot work with raw text directly, so the code is converted into numerical form. This encoding captures the meaning of the code, not just its characters, much like expressing an emotion as an emoji: different form, same meaning.
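To make the text-to-numbers round trip concrete, here is a toy sketch of encoding and decoding. `ToyCodeTokenizer` is a hypothetical stand-in, not the AiDotNet `ITokenizer`, and it only maps whitespace-separated tokens to integer ids; a real code model would use a learned vocabulary and produce a `Tensor<T>` that carries semantic information.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy tokenizer for illustration only.
public sealed class ToyCodeTokenizer
{
    private readonly Dictionary<string, int> _tokenToId = new Dictionary<string, int>();
    private readonly List<string> _idToToken = new List<string>();

    // Number of distinct tokens seen so far.
    public int VocabularySize => _idToToken.Count;

    // Splits on whitespace and assigns each new token the next free id.
    public int[] Encode(string code)
    {
        if (code is null) throw new ArgumentNullException(nameof(code));

        string[] tokens = code.Split(
            new[] { ' ', '\t', '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries);

        var ids = new int[tokens.Length];
        for (int i = 0; i < tokens.Length; i++)
        {
            if (!_tokenToId.TryGetValue(tokens[i], out int id))
            {
                id = _idToToken.Count;
                _tokenToId[tokens[i]] = id;
                _idToToken.Add(tokens[i]);
            }
            ids[i] = id;
        }
        return ids;
    }

    // Maps ids back to tokens and joins them with single spaces.
    public string Decode(int[] ids)
    {
        if (ids is null) throw new ArgumentNullException(nameof(ids));

        var tokens = new string[ids.Length];
        for (int i = 0; i < ids.Length; i++)
        {
            tokens[i] = _idToToken[ids[i]];
        }
        return string.Join(" ", tokens);
    }
}
```

In this sketch the round trip preserves tokens but normalizes whitespace, so `Decode(Encode(code))` equals the original only when tokens are separated by single spaces.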
GetEmbeddings(string)
Gets embeddings for code tokens.
public virtual Tensor<T> GetEmbeddings(string code)
Parameters
- code (string)
The source code to get embeddings for.
Returns
- Tensor<T>
A tensor containing token embeddings.
Remarks
Embeddings are dense vector representations of code tokens that capture semantic similarities. Similar code constructs have similar embeddings.
For Beginners: Embeddings represent each piece of code as a point in space.
Code with similar meaning is placed close together in this space. For example, "for loop" and "while loop" would be near each other because they're both loops, but far from "function definition" because that's a different concept.
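"Close together" is commonly measured with cosine similarity. The sketch below assumes two embedding vectors have already been extracted as `double[]` arrays; how to flatten or pool the `Tensor<T>` returned by GetEmbeddings is not shown on this page, and `EmbeddingSimilaritySketch` is a hypothetical helper rather than part of the library.

```csharp
using System;

public static class EmbeddingSimilaritySketch
{
    // Cosine similarity: dot(a, b) / (|a| * |b|).
    // Returns a value in [-1, 1]; 0 is returned if either vector is all zeros.
    public static double CosineSimilarity(double[] a, double[] b)
    {
        if (a is null) throw new ArgumentNullException(nameof(a));
        if (b is null) throw new ArgumentNullException(nameof(b));
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        if (normA == 0.0 || normB == 0.0) return 0.0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

Under this measure, embeddings pointing in the same direction score near 1, unrelated (orthogonal) embeddings score near 0, and opposite directions score near -1.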
PerformBugDetection(CodeBugDetectionRequest)
protected virtual CodeBugDetectionResult PerformBugDetection(CodeBugDetectionRequest request)
Parameters
- request (CodeBugDetectionRequest)
Returns
- CodeBugDetectionResult
PerformBugFixing(CodeBugFixingRequest)
protected virtual CodeBugFixingResult PerformBugFixing(CodeBugFixingRequest request)
Parameters
- request (CodeBugFixingRequest)
Returns
- CodeBugFixingResult
PerformCloneDetection(CodeCloneDetectionRequest)
protected virtual CodeCloneDetectionResult PerformCloneDetection(CodeCloneDetectionRequest request)
Parameters
- request (CodeCloneDetectionRequest)
Returns
- CodeCloneDetectionResult
PerformCodeReview(CodeReviewRequest)
protected virtual CodeReviewResult PerformCodeReview(CodeReviewRequest request)
Parameters
- request (CodeReviewRequest)
Returns
- CodeReviewResult
PerformCompletion(CodeCompletionRequest)
protected virtual CodeCompletionResult PerformCompletion(CodeCompletionRequest request)
Parameters
- request (CodeCompletionRequest)
Returns
- CodeCompletionResult
PerformDocumentation(CodeDocumentationRequest)
protected virtual CodeDocumentationResult PerformDocumentation(CodeDocumentationRequest request)
Parameters
- request (CodeDocumentationRequest)
Returns
- CodeDocumentationResult
PerformGeneration(CodeGenerationRequest)
protected virtual CodeGenerationResult PerformGeneration(CodeGenerationRequest request)
Parameters
- request (CodeGenerationRequest)
Returns
- CodeGenerationResult
PerformRefactoring(CodeRefactoringRequest)
protected virtual CodeRefactoringResult PerformRefactoring(CodeRefactoringRequest request)
Parameters
- request (CodeRefactoringRequest)
Returns
- CodeRefactoringResult
PerformSearch(CodeSearchRequest)
protected virtual CodeSearchResult PerformSearch(CodeSearchRequest request)
Parameters
- request (CodeSearchRequest)
Returns
- CodeSearchResult
PerformSummarization(CodeSummarizationRequest)
protected virtual CodeSummarizationResult PerformSummarization(CodeSummarizationRequest request)
Parameters
- request (CodeSummarizationRequest)
Returns
- CodeSummarizationResult
PerformTask(CodeTaskRequestBase)
Performs a code-related task and returns a structured result type.
public CodeTaskResultBase PerformTask(CodeTaskRequestBase request)
Parameters
- request (CodeTaskRequestBase)
The task request.
Returns
- CodeTaskResultBase
A structured task result.
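One way a public `PerformTask` entry point can route a structured request to overridable `Perform*` handlers is type-based dispatch. The sketch below is an assumption about the general pattern, not the actual CodeModelBase<T> implementation; all `Sketch*` types are hypothetical stand-ins, and the real request and result classes have members not shown on this page.

```csharp
using System;

// Hypothetical stand-ins for the request/result hierarchy.
public abstract class SketchRequest { public string Code { get; set; } = ""; }
public sealed class SketchCompletionRequest : SketchRequest { }
public sealed class SketchSummarizationRequest : SketchRequest { }

public abstract class SketchResult { public string Text { get; set; } = ""; }
public sealed class SketchCompletionResult : SketchResult { }
public sealed class SketchSummarizationResult : SketchResult { }

public class SketchCodeModel
{
    // Public entry point: selects a handler from the runtime type of the request.
    public SketchResult PerformTask(SketchRequest request)
    {
        if (request is null) throw new ArgumentNullException(nameof(request));

        switch (request)
        {
            case SketchCompletionRequest completion:
                return PerformCompletion(completion);
            case SketchSummarizationRequest summarization:
                return PerformSummarization(summarization);
            default:
                throw new NotSupportedException(
                    "Unsupported request type: " + request.GetType().Name);
        }
    }

    // Derived models override these to supply task-specific behavior.
    protected virtual SketchCompletionResult PerformCompletion(SketchCompletionRequest request)
        => new SketchCompletionResult { Text = request.Code + " /* completed */" };

    protected virtual SketchSummarizationResult PerformSummarization(SketchSummarizationRequest request)
        => new SketchSummarizationResult { Text = "summary of " + request.Code.Length + " characters" };
}
```

With this shape, callers depend on a single public method, while each derived model customizes only the tasks it supports by overriding the matching protected handler.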
PerformTask(string, CodeTask)
Performs a code-related task on the input code.
[Obsolete("Use PerformTask(CodeTaskRequestBase) for structured outputs.")]
public string PerformTask(string code, CodeTask task)
Parameters
- code (string)
- task (CodeTask)
Returns
- string
The result of the task as a string.
Remarks
This method performs code-related tasks such as completion, summarization, and bug detection, selected by the specified task type.
For Beginners: This method lets you tell the model what to do with the code.
You provide code and specify what you want done with it:
- Complete it
- Summarize it
- Find bugs
- Generate documentation
The model then performs that specific task and returns the result.
PerformTestGeneration(CodeTestGenerationRequest)
protected virtual CodeTestGenerationResult PerformTestGeneration(CodeTestGenerationRequest request)
Parameters
- request (CodeTestGenerationRequest)
Returns
- CodeTestGenerationResult
PerformTranslation(CodeTranslationRequest)
protected virtual CodeTranslationResult PerformTranslation(CodeTranslationRequest request)
Parameters
- request (CodeTranslationRequest)
Returns
- CodeTranslationResult
PerformUnderstanding(CodeUnderstandingRequest)
protected virtual CodeUnderstandingResult PerformUnderstanding(CodeUnderstandingRequest request)
Parameters
- request (CodeUnderstandingRequest)
Returns
- CodeUnderstandingResult
Predict(Tensor<T>)
Makes a prediction using the neural network.
public override Tensor<T> Predict(Tensor<T> input)
Parameters
- input (Tensor<T>)
The input data to process.
Returns
- Tensor<T>
The network's prediction.
Remarks
For Beginners: This is the main method you'll use to get results from your trained neural network. You provide some input data (like an image or text), and the network processes it through all its layers to produce an output (like a classification or prediction).
TrainWithOptimizer(Tensor<T>, Tensor<T>, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>)
protected void TrainWithOptimizer(Tensor<T> input, Tensor<T> expectedOutput, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>> optimizer)
Parameters
- input (Tensor<T>)
- expectedOutput (Tensor<T>)
- optimizer (IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>)
UpdateParameters(Vector<T>)
Updates the network's parameters with new values.
public override void UpdateParameters(Vector<T> parameters)
Parameters
- parameters (Vector<T>)
The new parameter values to set.
Remarks
For Beginners: During training, a neural network's internal values (parameters) get adjusted to improve its performance. This method allows you to update all those values at once by providing a complete set of new parameters.
This is typically used by optimization algorithms that calculate better parameter values based on training data.
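As a rough illustration of where a complete set of new parameters might come from, the sketch below computes one plain gradient descent step over a flat parameter array. Plain gradient descent, the helper name `GradientStep`, and the use of `double[]` in place of `Vector<T>` are all assumptions for illustration; the optimizers actually used with this class are not described on this page.

```csharp
using System;

public static class ParameterUpdateSketch
{
    // Hypothetical helper: returns parameters - learningRate * gradients
    // as a new array, leaving the inputs unmodified.
    public static double[] GradientStep(double[] parameters, double[] gradients, double learningRate)
    {
        if (parameters is null) throw new ArgumentNullException(nameof(parameters));
        if (gradients is null) throw new ArgumentNullException(nameof(gradients));
        if (parameters.Length != gradients.Length)
            throw new ArgumentException("Parameters and gradients must have the same length.");

        var updated = new double[parameters.Length];
        for (int i = 0; i < parameters.Length; i++)
        {
            updated[i] = parameters[i] - learningRate * gradients[i];
        }
        return updated;
    }
}
```

In a real training loop, the resulting values would be wrapped in a `Vector<T>` and handed to UpdateParameters so that the whole network is updated in one call.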