Interface ICodeModel<T>
Namespace: AiDotNet.ProgramSynthesis.Interfaces
Assembly: AiDotNet.dll
Represents a code understanding model capable of processing and analyzing source code.
public interface ICodeModel<T> : IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>
Type Parameters
T: The numeric type used for calculations (e.g., double, float).
Remarks
ICodeModel defines the interface for models that can understand, encode, and analyze source code. These models are typically pre-trained on large corpora of code and can perform tasks like code completion, bug detection, and code summarization.
For Beginners: A code model is like an AI that understands programming.
Just as language models understand human languages, code models understand programming languages. They can:
- Read and comprehend code
- Suggest completions while you're writing
- Find bugs and issues
- Explain what code does
- Translate between programming languages
This interface defines what capabilities a code model should have.
Properties
MaxSequenceLength
Gets the maximum sequence length (in tokens) that the model can process.
int MaxSequenceLength { get; }
Property Value
- int
Remarks
Code models process code as sequences of tokens. This property specifies the maximum number of tokens the model can handle at once.
For Beginners: This is like the maximum length of code the model can read at once.
Code is broken into pieces called "tokens" (like words in a sentence). This number is the largest number of tokens the model can process in a single call. It limits how much code the model can read at once, so a long file may need to be split into smaller pieces or truncated.
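Inputs longer than MaxSequenceLength cannot be processed in one pass, so callers often split a token sequence into windows. The sketch below assumes the code has already been tokenized into integer IDs; TokenChunker and its overlap parameter are hypothetical helpers for illustration, not part of AiDotNet.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of AiDotNet: splits a token-ID sequence into
// windows no longer than maxSequenceLength. Consecutive windows share `overlap`
// tokens so that context is not lost at window boundaries.
public static class TokenChunker
{
    public static List<int[]> Chunk(IReadOnlyList<int> tokenIds, int maxSequenceLength, int overlap = 0)
    {
        if (maxSequenceLength <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxSequenceLength));
        if (overlap < 0 || overlap >= maxSequenceLength)
            throw new ArgumentOutOfRangeException(nameof(overlap));

        var chunks = new List<int[]>();
        int step = maxSequenceLength - overlap;
        for (int start = 0; start < tokenIds.Count; start += step)
        {
            int length = Math.Min(maxSequenceLength, tokenIds.Count - start);
            var chunk = new int[length];
            for (int i = 0; i < length; i++)
                chunk[i] = tokenIds[start + i];
            chunks.Add(chunk);

            // Stop once the final token has been included in a window.
            if (start + length >= tokenIds.Count)
                break;
        }
        return chunks;
    }
}
```

For example, 10 tokens with a limit of 4 and no overlap produce windows of 4, 4, and 2 tokens. An overlap of 1 repeats the last token of each window at the start of the next.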
TargetLanguage
Gets the target programming language for this model.
ProgramLanguage TargetLanguage { get; }
Property Value
- ProgramLanguage
Remarks
Specifies which programming language this model is designed to work with. Some models are language-specific, while others can work with multiple languages.
For Beginners: This tells you which programming language the model knows.
Like a translator who specializes in French or Spanish, code models often specialize in specific programming languages like Python or Java.
VocabularySize
Gets the vocabulary size of the model.
int VocabularySize { get; }
Property Value
- int
Remarks
The vocabulary consists of all the tokens (keywords, operators, identifiers, etc.) that the model knows and can work with.
For Beginners: This is like the model's dictionary size.
It tells you how many different code tokens the model knows. A larger vocabulary lets the model represent more identifiers and code patterns as whole tokens, instead of breaking them into smaller pieces or treating them as unknown.
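The relationship between a vocabulary and VocabularySize can be illustrated with a toy word-level vocabulary. ToyVocabulary is a hypothetical class, not AiDotNet's tokenizer; real code models typically use learned subword vocabularies, and the <unk> fallback shown here is one common convention among several.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy vocabulary, not the AiDotNet implementation: maps each known
// token to an integer ID and sends unseen tokens to a reserved <unk> ID.
public sealed class ToyVocabulary
{
    public const string UnknownToken = "<unk>";
    private readonly Dictionary<string, int> _ids = new Dictionary<string, int>();

    public ToyVocabulary(IEnumerable<string> tokens)
    {
        Add(UnknownToken); // ID 0 is reserved for out-of-vocabulary tokens.
        foreach (var token in tokens)
            Add(token);
    }

    // Plays the role of VocabularySize: the number of distinct known tokens.
    public int Size => _ids.Count;

    // Known tokens get their own ID; anything else maps to the <unk> ID.
    public int GetId(string token) =>
        _ids.TryGetValue(token, out int id) ? id : _ids[UnknownToken];

    private void Add(string token)
    {
        if (_ids.ContainsKey(token))
            return; // Duplicates do not grow the vocabulary.
        int nextId = _ids.Count;
        _ids[token] = nextId;
    }
}
```

Built from the tokens "def", "return", "(", ")", and a second "def", this vocabulary has a size of 5: four distinct tokens plus <unk>.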
Methods
DecodeCode(Tensor<T>)
Decodes a vector representation back into source code.
string DecodeCode(Tensor<T> encoding)
Parameters
encoding (Tensor<T>): The encoded representation to decode.
Returns
- string
The decoded source code as a string.
Remarks
Decoding transforms the model's internal numerical representation back into human-readable source code.
For Beginners: Decoding converts the AI's numerical format back to readable code.
After the AI processes code in numerical form, we need to convert it back to text that humans can read and computers can execute. This is the reverse of encoding.
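Assuming the model's output has already been reduced to a sequence of token IDs (for example, by taking the most likely token at each position), the last step of decoding is a table lookup. ToyDecoder is a hypothetical illustration and does not reflect how DecodeCode is implemented internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration, not AiDotNet's DecodeCode: turns a sequence of
// token IDs back into text by looking each ID up in an ID-to-token table.
public static class ToyDecoder
{
    public static string Decode(IReadOnlyList<int> tokenIds, IReadOnlyList<string> idToToken)
    {
        var parts = new List<string>(tokenIds.Count);
        foreach (int id in tokenIds)
        {
            if (id < 0 || id >= idToToken.Count)
                throw new ArgumentOutOfRangeException(nameof(tokenIds), $"Unknown token ID {id}.");
            parts.Add(idToToken[id]);
        }

        // Real tokenizers record whitespace or merge markers inside their tokens;
        // this toy version simply joins tokens with single spaces.
        return string.Join(" ", parts);
    }
}
```

With the table { "<unk>", "return", "x", "+", "1" }, the IDs 1, 2, 3, 4 decode to "return x + 1".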
EncodeCode(string)
Encodes source code into a vector representation.
Tensor<T> EncodeCode(string code)
Parameters
code (string): The source code to encode.
Returns
- Tensor<T>
A tensor representing the encoded code.
Remarks
Encoding transforms source code (text) into a numerical representation that the model can process. This representation captures semantic information about the code.
For Beginners: Encoding converts code text into numbers the AI can understand.
Computers can't work with text directly, so code is converted into numerical form. A good encoding captures what the code means, not just which characters it contains, much as a translation keeps the meaning of a sentence while changing its form.
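A minimal sketch of the first stage of encoding, turning text into token IDs, follows. The regex tokenizer and the ToyEncoder class are assumptions for illustration; EncodeCode itself returns a Tensor<T> and may use a different tokenizer followed by learned model layers.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical illustration, not AiDotNet's EncodeCode: splits source code into
// identifier, number, and symbol tokens, then maps each token to an integer ID.
// A real code model would go on to turn these IDs into learned vectors.
public static class ToyEncoder
{
    // Identifiers/keywords, integer literals, or any single non-whitespace character.
    private static readonly Regex TokenPattern =
        new Regex(@"[A-Za-z_][A-Za-z0-9_]*|\d+|\S", RegexOptions.Compiled);

    public static List<string> Tokenize(string code)
    {
        var tokens = new List<string>();
        foreach (Match m in TokenPattern.Matches(code))
            tokens.Add(m.Value);
        return tokens;
    }

    public static int[] Encode(string code, IReadOnlyDictionary<string, int> vocabulary, int unknownId = 0)
    {
        var tokens = Tokenize(code);
        var ids = new int[tokens.Count];
        for (int i = 0; i < tokens.Count; i++)
            ids[i] = vocabulary.TryGetValue(tokens[i], out int id) ? id : unknownId;
        return ids;
    }
}
```

For instance, "return x+10;" tokenizes to "return", "x", "+", "10", and ";". Tokens missing from the vocabulary fall back to unknownId.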
GetEmbeddings(string)
Gets embeddings for code tokens.
Tensor<T> GetEmbeddings(string code)
Parameters
code (string): The source code to get embeddings for.
Returns
- Tensor<T>
A tensor containing token embeddings.
Remarks
Embeddings are dense vector representations of code tokens that capture semantic similarities. Similar code constructs have similar embeddings.
For Beginners: Embeddings represent each piece of code as a point in space.
Code with similar meaning is placed close together in this space. For example, "for loop" and "while loop" would be near each other because they're both loops, but far from "function definition" because that's a different concept.
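Closeness between embeddings is commonly measured with cosine similarity. The sketch below uses plain double[] vectors as stand-ins. How per-token vectors are extracted from the Tensor<T> returned by GetEmbeddings, and whether they are averaged first, are assumptions left to the caller.

```csharp
using System;

// Cosine similarity between two embedding vectors: 1 means the vectors point in
// the same direction, 0 means they are orthogonal (unrelated), -1 means opposite.
// Plain double[] arrays stand in for data taken from a Tensor<T>.
public static class EmbeddingMath
{
    public static double CosineSimilarity(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Embeddings must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        if (normA == 0 || normB == 0)
            throw new ArgumentException("Cannot compare a zero vector.");
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

Cosine similarity ignores vector length and compares only direction, which is why it is a common choice for comparing embeddings.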
PerformTask(CodeTaskRequestBase)
Performs a code-related task and returns a structured result.
CodeTaskResultBase PerformTask(CodeTaskRequestBase request)
Parameters
request (CodeTaskRequestBase): The task request.
Returns
- CodeTaskResultBase
A structured task result.
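A structured overload lets each task return its own result type. Callers then inspect the result with pattern matching instead of parsing text. The records below are hypothetical stand-ins that mirror the idea of CodeTaskRequestBase and CodeTaskResultBase. Their names and fields, and the toy TODO-scanning logic, are not taken from AiDotNet.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical request/result hierarchy (C# 9 records). None of these names come
// from AiDotNet; they only illustrate the "typed request in, typed result out" idea.
public abstract record ToyTaskRequest(string Code);
public sealed record ToySummaryRequest(string Code) : ToyTaskRequest(Code);
public sealed record ToyIssueScanRequest(string Code) : ToyTaskRequest(Code);

public abstract record ToyTaskResult;
public sealed record ToySummaryResult(string Summary) : ToyTaskResult;
public sealed record ToyIssueScanResult(IReadOnlyList<int> SuspiciousLines) : ToyTaskResult;

public static class ToyTaskRunner
{
    // Each request type maps to its own result type, so callers read fields
    // such as SuspiciousLines directly instead of parsing a string.
    public static ToyTaskResult Perform(ToyTaskRequest request) => request switch
    {
        ToySummaryRequest r => new ToySummaryResult($"Code with {r.Code.Split('\n').Length} line(s)."),
        ToyIssueScanRequest r => new ToyIssueScanResult(FindLinesContaining(r.Code, "TODO")),
        _ => throw new NotSupportedException(request.GetType().Name)
    };

    // Toy heuristic standing in for a model: flags lines that contain the pattern.
    private static List<int> FindLinesContaining(string code, string pattern)
    {
        var hits = new List<int>();
        string[] lines = code.Split('\n');
        for (int i = 0; i < lines.Length; i++)
            if (lines[i].Contains(pattern))
                hits.Add(i + 1); // 1-based line numbers
        return hits;
    }
}
```

With the real API, the equivalent is to pass a concrete CodeTaskRequestBase subclass to PerformTask and check the returned CodeTaskResultBase for the expected derived type. The concrete subclass names are not listed on this page.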
PerformTask(string, CodeTask)
Performs a code-related task on the input code.
[Obsolete("Use PerformTask(CodeTaskRequestBase) for structured outputs.")]
string PerformTask(string code, CodeTask task)
Parameters
code (string): The source code to process.
task (CodeTask): The task to perform on the code.
Returns
- string
The result of the task as a string.
Remarks
This method lets the model perform various code-related tasks, such as completion, summarization, and bug detection, depending on the specified task type.
For Beginners: This method lets you tell the model what to do with the code.
You provide code and specify what you want done with it:
- Complete it
- Summarize it
- Find bugs
- Generate documentation
The model then performs that specific task and returns the result.
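The obsolete overload follows a pattern in which a single enum selects the task and every outcome is returned as a string. The sketch below imitates that pattern with a hypothetical ToyCodeTask enum whose members mirror the tasks listed above. It is not AiDotNet's CodeTask, and each branch is a trivial placeholder rather than a model call.

```csharp
using System;

// Hypothetical enum and dispatcher imitating the (code, task) -> string style.
// ToyCodeTask is NOT AiDotNet's CodeTask; every branch is a placeholder.
public enum ToyCodeTask { Completion, Summarization, BugDetection, Documentation }

public static class LegacyToyRunner
{
    public static string RunLegacy(string code, ToyCodeTask task) => task switch
    {
        ToyCodeTask.Completion => code + " /* completion would go here */",
        ToyCodeTask.Summarization => $"{code.Split('\n').Length} line(s) of code",
        ToyCodeTask.BugDetection => code.Contains("TODO") ? "Possible issue: unresolved TODO" : "No issues found",
        ToyCodeTask.Documentation => "/// <summary>Describe this code here.</summary>",
        _ => throw new ArgumentOutOfRangeException(nameof(task))
    };
}
```

Because every branch returns plain text, the caller must parse the string to act on the result. The obsolete message points to the structured PerformTask(CodeTaskRequestBase) overload for this reason.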