EncodingConfig

class EncodingConfig(val pattern: Regex, val mergeableRanks: Map<ByteString, Int>, val specialTokens: Map<ByteString, Int>, val explicitNVocab: Int? = null)

Holds the configuration for a token encoding: the split pattern, the mergeable ranks, and the special tokens needed to perform byte pair encoding (BPE).

Constructors

constructor(pattern: Regex, mergeableRanks: Map<ByteString, Int>, specialTokens: Map<ByteString, Int>, explicitNVocab: Int? = null)

Types

object Companion

Properties

val explicitNVocab: Int? = null

The expected number of tokens in the vocabulary. If provided, the combined count of mergeable tokens and special tokens is checked to equal this number.
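The size check described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `vocabSizeMatches` is a hypothetical helper, and `String` keys stand in for the `ByteString` keys of the real maps so that the snippet needs no dependencies.

```kotlin
// Hypothetical helper, not part of the library's API.
// String keys stand in for the ByteString keys used by EncodingConfig.
fun vocabSizeMatches(
    mergeableRanks: Map<String, Int>,
    specialTokens: Map<String, Int>,
    explicitNVocab: Int?,
): Boolean {
    // With no explicit size there is nothing to verify.
    if (explicitNVocab == null) return true
    // The two maps together must account for every token in the vocabulary.
    return mergeableRanks.size + specialTokens.size == explicitNVocab
}
```

With three mergeable tokens and one special token, `vocabSizeMatches(ranks, special, 4)` returns `true`, `vocabSizeMatches(ranks, special, 5)` returns `false`, and passing `null` skips the check.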

val mergeableRanks: Map<ByteString, Int>

A dictionary mapping mergeable token bytes to their ranks. The ranks must correspond to merge priority.
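To illustrate what "merge priority" can mean in practice, here is a rough sketch of rank-driven merging. It assumes, as in common BPE implementations, that a lower rank means an earlier merge; the source does not state the direction. `bpeMerge` is a hypothetical function, and it works on characters with `String` keys rather than on bytes with `ByteString` keys.

```kotlin
// Hypothetical sketch of rank-driven merging, not the library's encoder.
// Assumption: a lower rank means the pair is merged earlier.
fun bpeMerge(piece: String, ranks: Map<String, Int>): List<String> {
    // Start from single characters (a real encoder would start from bytes).
    val parts = piece.map { it.toString() }.toMutableList()
    while (parts.size > 1) {
        // Find the adjacent pair whose concatenation has the lowest rank.
        var bestIndex = -1
        var bestRank = Int.MAX_VALUE
        for (i in 0 until parts.size - 1) {
            val rank = ranks[parts[i] + parts[i + 1]] ?: continue
            if (rank < bestRank) {
                bestRank = rank
                bestIndex = i
            }
        }
        if (bestIndex == -1) break // no mergeable pair remains
        // Merge the winning pair into a single part.
        parts[bestIndex] = parts[bestIndex] + parts[bestIndex + 1]
        parts.removeAt(bestIndex + 1)
    }
    return parts
}
```

Under this assumption, with `"bc"` ranked below `"ab"`, the piece `"abc"` becomes `["a", "bc"]`: the lower-ranked pair wins even though `"ab"` appears first in the text.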

val pattern: Regex

The regex pattern used to split the input text.
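As a small illustration of pattern-based splitting, the sketch below collects every match of a `Regex` in order, using only the Kotlin standard library. `splitWithPattern` is a hypothetical helper, and whether the library splits text exactly this way is an assumption.

```kotlin
// Hypothetical helper: returns each regex match in order of appearance.
// Text that the pattern does not match is dropped, so a practical pattern
// should be written to cover every character of the input.
fun splitWithPattern(pattern: Regex, text: String): List<String> =
    pattern.findAll(text).map { it.value }.toList()
```

With the illustrative pattern `Regex("""\s?\w+|\s?[^\s\w]+|\s+""")`, the input `"Hello, world!"` splits into the four pieces `"Hello"`, `","`, `" world"`, and `"!"`. Real encodings ship their own, more elaborate patterns.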

val specialTokens: Map<ByteString, Int>

A dictionary mapping special token byte strings to their token values.