Namespace AiDotNet.Tokenization.CodeTokenization
Classes
- CodeBertTokenizer
CodeBERT-compatible tokenizer for program-synthesis and code-understanding tasks. Combines WordPiece tokenization with code-aware preprocessing.
- CodeTokenizer
Code-aware tokenizer that handles programming language constructs. Supports identifier splitting, keyword recognition, and language-specific patterns.
- TreeSitterTokenizer
AST-aware tokenizer using Tree-sitter for parsing source code into syntax trees. Provides structure-aware tokenization that understands programming language grammar.
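The identifier splitting and keyword recognition mentioned for CodeTokenizer can be illustrated with a minimal, language-neutral sketch. It is written in Python for brevity and does not use or mirror the AiDotNet API: the `KEYWORDS` set, `split_identifier`, and `tokenize` are hypothetical names introduced only for this example, and the splitting rules (camelCase, PascalCase, snake_case, acronym runs) are one common convention, not necessarily the one the library applies.

```python
import re

# Hypothetical keyword list for illustration only; a real code tokenizer
# would typically keep a separate list per programming language.
KEYWORDS = {"def", "return", "if", "else", "for", "while", "class", "import"}

# Lexemes: an identifier, a run of digits, or any single non-space character.
_LEXEME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S")

# Sub-words inside an identifier, tried in this order:
#   1. an acronym run that is followed by a capitalized word ("HTTP" in "HTTPResponse")
#   2. an optionally capitalized lowercase word ("parse", "Response")
#   3. a trailing acronym run ("ID" in "userID")
#   4. a run of digits
_SUBWORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_identifier(name):
    """Split camelCase, PascalCase, and snake_case names into lowercase parts."""
    parts = _SUBWORD.findall(name)
    # Names with no letters or digits (for example "_") are returned unchanged.
    return [p.lower() for p in parts] or [name]


def tokenize(source):
    """Tokenize source text, keeping keywords whole and splitting identifiers."""
    tokens = []
    for lexeme in _LEXEME.findall(source):
        if lexeme in KEYWORDS:
            tokens.append(lexeme)  # keyword: emitted as a single token
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            tokens.extend(split_identifier(lexeme))  # identifier: split into parts
        else:
            tokens.append(lexeme)  # number or punctuation: emitted as-is
    return tokens
```

Under these assumptions, `split_identifier("parseHTTPResponse")` returns `["parse", "http", "response"]`, and `tokenize("def getUserName(user_id): return user_id")` keeps `def` and `return` intact while splitting `getUserName` into `get`, `user`, `name`. Splitting identifiers this way keeps the vocabulary small, which is why it is often applied before a subword scheme such as WordPiece.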
Enums
- ProgrammingLanguage
Programming languages supported by the code tokenizer.
- TreeSitterLanguage
Supported programming languages for Tree-sitter parsing.