Package-level declarations

Types

interface Encoding

An interface that provides the core encoding functionality and access to the predefined encoding types.

class EncodingConfig(val pattern: Regex, val mergeableRanks: Map<ByteString, Int>, val specialTokens: Map<ByteString, Int>, val explicitNVocab: Int? = null)

Holds the configuration for token encoding: the split pattern, the mergeable ranks used for byte pair encoding (BPE), the special-token mappings, and an optional explicit vocabulary size.
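
To illustrate how the four constructor parameters might fit together, here is a minimal, dependency-free sketch. `SketchEncodingConfig`, `preSplit`, and `vocabSizeMatches` are hypothetical names, not part of the library. The sketch uses `String` keys in place of `ByteString`, and it assumes that `explicitNVocab`, when set, is meant to equal the number of mergeable ranks plus special tokens. That assumption is not confirmed by this page.

```kotlin
// Hypothetical mirror of EncodingConfig. String keys replace ByteString so the
// sketch needs no external dependency. This is NOT the library's class.
data class SketchEncodingConfig(
    val pattern: Regex,
    val mergeableRanks: Map<String, Int>,
    val specialTokens: Map<String, Int>,
    val explicitNVocab: Int? = null,
)

// Splits text into pieces with the configured pattern, the step that would
// normally precede BPE merging.
fun preSplit(config: SketchEncodingConfig, text: String): List<String> =
    config.pattern.findAll(text).map { it.value }.toList()

// Assumed consistency check: a declared vocabulary size, if present, should
// equal the number of mergeable ranks plus the number of special tokens.
fun vocabSizeMatches(config: SketchEncodingConfig): Boolean {
    val actual = config.mergeableRanks.size + config.specialTokens.size
    return config.explicitNVocab == null || config.explicitNVocab == actual
}

// Builds a tiny illustrative configuration with made-up ranks and tokens.
fun sampleConfig(): SketchEncodingConfig = SketchEncodingConfig(
    pattern = Regex("""\s+|\w+|[^\s\w]+"""),
    mergeableRanks = mapOf("a" to 0, "b" to 1, "ab" to 2),
    specialTokens = mapOf("<|endoftext|>" to 3),
    explicitNVocab = 4,
)

fun main() {
    val config = sampleConfig()
    println(preSplit(config, "ab ab!"))
    println(vocabSizeMatches(config))
}
```

The default of `null` for `explicitNVocab` is kept from the documented signature, so the check is skipped when no size is declared.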

interface Tokenizer

A public interface for tokenization and detokenization of text. Its primary operations are encode, which converts text to a sequence of integers (tokens), and decode, which converts a sequence of integers back to text.
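
The encode/decode round trip described above can be sketched with a toy byte-level tokenizer. `SketchTokenizer` and `ByteLevelTokenizer` are hypothetical stand-ins: the real `Tokenizer` method signatures are not shown on this page and may differ (for example in parameter names, return types, or handling of special tokens).

```kotlin
// Hypothetical stand-in for the library's Tokenizer interface. The signatures
// below are assumptions made for illustration only.
interface SketchTokenizer {
    fun encode(text: String): List<Int>
    fun decode(tokens: List<Int>): String
}

// Toy implementation: each UTF-8 byte of the input becomes one token (0..255).
// A real BPE tokenizer would merge byte sequences according to learned ranks.
class ByteLevelTokenizer : SketchTokenizer {
    override fun encode(text: String): List<Int> =
        text.toByteArray(Charsets.UTF_8).map { it.toInt() and 0xFF }

    override fun decode(tokens: List<Int>): String =
        tokens.map { it.toByte() }.toByteArray().toString(Charsets.UTF_8)
}

fun main() {
    val tokenizer: SketchTokenizer = ByteLevelTokenizer()
    val tokens = tokenizer.encode("hi")
    println(tokens)                   // [104, 105]
    println(tokenizer.decode(tokens)) // hi
}
```

Whatever the concrete implementation, the property worth checking is that decode applied to the result of encode returns the original text.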