Interface ICodeModel<T>

Namespace
AiDotNet.ProgramSynthesis.Interfaces
Assembly
AiDotNet.dll

Represents a code understanding model capable of processing and analyzing source code.

public interface ICodeModel<T> : IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations (e.g., double, float).

Remarks

ICodeModel<T> defines the interface for models that can understand, encode, and analyze source code. These models are typically pre-trained on large corpora of code and can perform tasks such as code completion, bug detection, and code summarization.

For Beginners: A code model is like an AI that understands programming.

Just as language models understand human languages, code models understand programming languages. They can:

  • Read and comprehend code
  • Suggest completions while you're writing
  • Find bugs and issues
  • Explain what code does
  • Translate between programming languages

This interface defines what capabilities a code model should have.

Properties

MaxSequenceLength

Gets the maximum sequence length (in tokens) that the model can process.

int MaxSequenceLength { get; }

Property Value

int

Remarks

Code models process code as sequences of tokens. This property specifies the maximum number of tokens the model can handle at once.

For Beginners: This is like the maximum length of code the model can read at once.

Code is broken into pieces called "tokens" (like words in a sentence). This number tells you the maximum number of tokens the model can process, which roughly corresponds to how long a code file can be.
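One way to respect this limit is to split long inputs into windows before handing them to the model. The sketch below is illustrative only: ContextWindowSketch, Tokenize, and SplitIntoWindows are hypothetical names, not part of AiDotNet, and the whitespace tokenizer is an assumption. Real code models usually use subword tokenizers, so actual token counts will differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of AiDotNet.
public static class ContextWindowSketch
{
    private static readonly char[] Whitespace = { ' ', '\t', '\r', '\n' };

    // Stand-in tokenizer (assumption): splits on whitespace only.
    public static string[] Tokenize(string code) =>
        code.Split(Whitespace, StringSplitOptions.RemoveEmptyEntries);

    // Splits a token sequence into consecutive windows, each holding
    // at most maxSequenceLength tokens.
    public static List<string[]> SplitIntoWindows(string[] tokens, int maxSequenceLength)
    {
        if (maxSequenceLength <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxSequenceLength));

        var windows = new List<string[]>();
        for (int start = 0; start < tokens.Length; start += maxSequenceLength)
        {
            int length = Math.Min(maxSequenceLength, tokens.Length - start);
            var window = new string[length];
            Array.Copy(tokens, start, window, 0, length);
            windows.Add(window);
        }
        return windows;
    }
}
```

In practice the limit would come from the model's MaxSequenceLength property, and the token count from the model's own tokenizer.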

TargetLanguage

Gets the target programming language for this model.

ProgramLanguage TargetLanguage { get; }

Property Value

ProgramLanguage

Remarks

Specifies which programming language this model is designed to work with. Some models are language-specific, while others can work with multiple languages.

For Beginners: This tells you which programming language the model knows.

Like a translator who specializes in French or Spanish, code models often specialize in specific programming languages like Python or Java.

VocabularySize

Gets the vocabulary size of the model.

int VocabularySize { get; }

Property Value

int

Remarks

The vocabulary consists of all the tokens (keywords, operators, identifiers, etc.) that the model knows and can work with.

For Beginners: This is like the model's dictionary size.

It tells you how many different code tokens the model knows. A larger vocabulary means the model can handle more diverse code patterns and identifiers.
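To make the idea of a vocabulary concrete, the sketch below builds a toy token-to-ID table. ToyVocabulary, UnknownId, and GetId are hypothetical names introduced for illustration and are not AiDotNet types. Mapping unseen tokens to a reserved "<unk>" entry is a common convention, assumed here rather than taken from this interface.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy vocabulary, not part of AiDotNet.
public sealed class ToyVocabulary
{
    // ID reserved for tokens the vocabulary has never seen (assumed convention).
    public const int UnknownId = 0;

    private readonly Dictionary<string, int> _ids = new Dictionary<string, int>();

    public ToyVocabulary(IEnumerable<string> tokens)
    {
        _ids.Add("<unk>", UnknownId);
        foreach (var token in tokens)
        {
            // Each distinct token receives the next free ID.
            if (!_ids.ContainsKey(token))
                _ids.Add(token, _ids.Count);
        }
    }

    // Number of distinct entries, including the reserved unknown token.
    public int VocabularySize => _ids.Count;

    // Returns the token's ID, or UnknownId if the token is not in the vocabulary.
    public int GetId(string token) =>
        _ids.TryGetValue(token, out var id) ? id : UnknownId;
}
```

A token outside the table falls back to the unknown ID, which is one reason a larger vocabulary can represent more varied code.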

Methods

DecodeCode(Tensor<T>)

Decodes an encoded tensor representation back into source code.

string DecodeCode(Tensor<T> encoding)

Parameters

encoding Tensor<T>

The encoded representation to decode.

Returns

string

The decoded source code as a string.

Remarks

Decoding transforms the model's internal numerical representation back into human-readable source code.

For Beginners: Decoding converts the AI's numerical format back to readable code.

After the AI processes code in numerical form, we need to convert it back to text that humans can read and computers can execute. This is the reverse of encoding.

EncodeCode(string)

Encodes source code into a vector representation.

Tensor<T> EncodeCode(string code)

Parameters

code string

The source code to encode.

Returns

Tensor<T>

A tensor representing the encoded code.

Remarks

Encoding transforms source code (text) into a numerical representation that the model can process. This representation captures semantic information about the code.

For Beginners: Encoding converts code text into numbers the AI can understand.

Models operate on numbers rather than raw text, so code is converted into numerical form first. This encoding captures the meaning of the code, not just its characters. It is like translating emotions into emoji: different form, same meaning.
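The relationship between encoding and decoding can be illustrated with a deliberately simple stand-in. The sketch below is not how a code model encodes text: RoundTripSketch is a hypothetical class that maps each character to one number, with plain arrays standing in for Tensor<T>. It captures no semantics and only shows the text-to-numbers-to-text round trip.

```csharp
using System;

// Hypothetical character-level stand-in, not part of AiDotNet.
// A real model produces learned representations instead.
public static class RoundTripSketch
{
    // Maps each UTF-16 code unit of the input to one number.
    public static double[] Encode(string code)
    {
        var encoding = new double[code.Length];
        for (int i = 0; i < code.Length; i++)
            encoding[i] = code[i];
        return encoding;
    }

    // Reverses Encode: turns each number back into its character.
    public static string Decode(double[] encoding)
    {
        var chars = new char[encoding.Length];
        for (int i = 0; i < encoding.Length; i++)
            chars[i] = (char)encoding[i];
        return new string(chars);
    }
}
```

With this toy scheme the round trip is lossless by construction. Whether a real model's DecodeCode reproduces the input to EncodeCode exactly depends on the implementation.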

GetEmbeddings(string)

Gets embeddings for code tokens.

Tensor<T> GetEmbeddings(string code)

Parameters

code string

The source code to get embeddings for.

Returns

Tensor<T>

A tensor containing token embeddings.

Remarks

Embeddings are dense vector representations of code tokens that capture semantic similarities. Similar code constructs have similar embeddings.

For Beginners: Embeddings represent each piece of code as a point in space.

Code with similar meaning is placed close together in this space. For example, "for loop" and "while loop" would be near each other because they're both loops, but far from "function definition" because that's a different concept.
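"Close together" is commonly measured with cosine similarity. The sketch below computes it over plain arrays. EmbeddingSimilaritySketch is a hypothetical helper, not part of AiDotNet, and how embedding vectors are read out of the Tensor<T> returned by GetEmbeddings is not shown here.

```csharp
using System;

// Hypothetical helper, not part of AiDotNet.
public static class EmbeddingSimilaritySketch
{
    // Cosine similarity: dot(a, b) / (|a| * |b|).
    // Returns 0 when either vector has zero length to avoid dividing by zero.
    public static double CosineSimilarity(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        if (normA == 0 || normB == 0)
            return 0;

        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

Values near 1 indicate vectors pointing in nearly the same direction, and values near 0 indicate unrelated directions.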

PerformTask(CodeTaskRequestBase)

Performs a code-related task and returns a structured result.

CodeTaskResultBase PerformTask(CodeTaskRequestBase request)

Parameters

request CodeTaskRequestBase

The task request.

Returns

CodeTaskResultBase

A structured task result.
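A structured result lets callers branch on the result type instead of parsing a string. The sketch below shows that pattern with stand-in types. SketchResult, SketchSummaryResult, and SketchBugReportResult are hypothetical and are not the members or subclasses of CodeTaskResultBase, whose actual shape is not described on this page.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for a result hierarchy, not AiDotNet types.
public abstract class SketchResult
{
}

public sealed class SketchSummaryResult : SketchResult
{
    public string Summary { get; set; } = "";
}

public sealed class SketchBugReportResult : SketchResult
{
    public List<string> Issues { get; } = new List<string>();
}

public static class StructuredResultSketch
{
    // Branches on the concrete result type and reads typed fields directly.
    public static string Describe(SketchResult result)
    {
        switch (result)
        {
            case SketchSummaryResult summary:
                return "Summary: " + summary.Summary;
            case SketchBugReportResult report:
                return "Issues found: " + report.Issues.Count;
            default:
                return "Unrecognized result type";
        }
    }
}
```

With a plain string return value, the same caller would have to infer from the text which task ran and what it produced.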

PerformTask(string, CodeTask)

Performs a code-related task on the input code.

[Obsolete("Use PerformTask(CodeTaskRequestBase) for structured outputs.")]
string PerformTask(string code, CodeTask task)

Parameters

code string

The source code to process.

task CodeTask

The type of task to perform.

Returns

string

The result of the task as a string.

Remarks

This method lets the model perform code-related tasks such as completion, summarization, and bug detection, selected by the task argument. This overload is marked obsolete; use PerformTask(CodeTaskRequestBase) for structured outputs.

For Beginners: This method lets you tell the model what to do with the code.

You provide code and specify what you want done with it:

  • Complete it
  • Summarize it
  • Find bugs
  • Generate documentation

The model then performs that specific task and returns the result.