public class Encoder extends Object
This class represents a table mapping tuples of strings to integer values. It
is used by LayeredTokenPattern for matching patterns against
LayeredSequence objects.
The core of this class is a mapping from string tuples of length n to
integers 0 <= i < MAX_SIZE. The mapping is defined by
a list of n sets of String symbols S_1, ..., S_n, and a
special symbol UNK. The mapping assigns an integer value to
each tuple (x_1, ..., x_n), where x_i is either in
S_i or is the symbol UNK. For example, if n = 2 and
S_1 = S_2 = {0,1}, then a possible mapping would be
(0,0) => 0, (0,1) => 1, (0,UNK) => 2,
(1,0) => 3, (1,1) => 4, (1,UNK) => 5,
(UNK,0) => 6, (UNK,1) => 7, (UNK,UNK) => 8.
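The example above is consistent with a row-major (mixed-radix) numbering in which UNK takes the last slot of each position. The formula below reproduces it. It is one illustrative choice, not a documented property of the class, and the symbols r_i and idx_i are introduced here only for illustration (idx_i numbers the elements of S_i in some fixed order).

$$
r_i = |S_i| + 1, \qquad \mathrm{idx}_i : S_i \cup \{\mathrm{UNK}\} \to \{0, \dots, |S_i|\}, \qquad \mathrm{idx}_i(\mathrm{UNK}) = |S_i|
$$

$$
\mathrm{code}(y_1, \dots, y_n) = \sum_{i=1}^{n} \mathrm{idx}_i(y_i) \prod_{j=i+1}^{n} r_j
$$

$$
n = 2,\ r_1 = r_2 = 3:\quad \mathrm{code}(y_1, y_2) = 3\,\mathrm{idx}(y_1) + \mathrm{idx}(y_2), \qquad \mathrm{code}(1, \mathrm{UNK}) = 3 \cdot 1 + 2 = 5
$$

With idx(0) = 0, idx(1) = 1 and idx(UNK) = 2, every entry of the example table follows, and the codes fill exactly the range 0 to r_1 r_2 - 1 = 8.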
Given a String tuple (x_1, ..., x_n), it is mapped to an integer
value as follows. First, it is mapped to an intermediate tuple
(y_1, ..., y_n), where y_i = x_i if x_i is in
S_i, otherwise y_i = UNK. Then the value of
(y_1, ..., y_n) according to the mapping is returned. This procedure
is implemented in the method encode(String[]), which
represents tuples as String arrays.
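The two-step procedure can be sketched in Java as follows. This is a hypothetical illustration, not the library's code: the class name TupleEncoderSketch, the placeholder UNK string "<UNK>", and the row-major numbering (UNK in the last slot of each position) are all assumptions, chosen so that the output reproduces the example mapping above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the library's Encoder: assumes a row-major
// (mixed-radix) numbering in which UNK takes the last slot of each position.
class TupleEncoderSketch {
    // Placeholder value; the real Encoder.UNK string is not given on this page.
    static final String UNK = "<UNK>";

    // For each position i, maps each symbol in S_i to a slot 0..|S_i|-1.
    private final List<Map<String, Integer>> slots = new ArrayList<>();

    TupleEncoderSketch(List<Set<String>> symbols) {
        for (Set<String> s : symbols) {
            Map<String, Integer> m = new HashMap<>();
            for (String sym : s) {
                m.put(sym, m.size()); // slot = position in the set's iteration order
            }
            slots.add(m);
        }
    }

    // Assumes the table size was already verified to be at most 2^16,
    // so every value fits in a char.
    char encode(String[] tuple) {
        if (tuple.length != slots.size()) {
            throw new IllegalArgumentException("expected a tuple of length " + slots.size());
        }
        int value = 0;
        for (int i = 0; i < tuple.length; i++) {
            Map<String, Integer> m = slots.get(i);
            // Step 1: y_i = x_i if x_i is in S_i; otherwise UNK, which uses slot |S_i|.
            Integer slot = m.get(tuple[i]);
            int y = (slot != null) ? slot : m.size();
            // Step 2: fold y_i into the mixed-radix value; position i has |S_i| + 1 slots.
            value = value * (m.size() + 1) + y;
        }
        return (char) value;
    }

    public static void main(String[] args) {
        Set<String> bits = new LinkedHashSet<>(Arrays.asList("0", "1"));
        TupleEncoderSketch enc = new TupleEncoderSketch(Arrays.asList(bits, bits));
        System.out.println((int) enc.encode(new String[] {"1", "0"}));   // 3
        System.out.println((int) enc.encode(new String[] {"1", UNK}));   // 5
        System.out.println((int) enc.encode(new String[] {"1", "foo"})); // 5: "foo" is not in S_2
    }
}
```

A LinkedHashSet is used in the demo so that slot numbers follow insertion order; with a plain HashSet the sketch would still be a valid mapping, just with different values, which matches the class's statement that actual values are not guaranteed.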
No guarantee is made about the actual integer value assigned to each tuple.
The mapping cannot have more than 2^16 entries. Because position i admits |S_i| + 1 values (the symbols in S_i plus UNK), the table has
(|S_1|+1) * (|S_2|+1) * ... * (|S_n|+1) entries, and this product must be less than or equal
to 2^16.
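A minimal sketch of the size check, assuming the bound is enforced when the table is built. The class and helper names (TableSizeCheck, checkedTableSize, symbols) and the use of IllegalArgumentException instead of the SequenceException that the real constructor declares are assumptions. Accumulating the product in a long and stopping as soon as it exceeds 2^16 avoids integer overflow for large symbol sets.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the documented API.
class TableSizeCheck {
    static final int LIMIT = 1 << 16; // 2^16, the bound stated above

    // Returns (|S_1|+1) * ... * (|S_n|+1), or throws if that exceeds 2^16.
    static int checkedTableSize(List<Set<String>> symbols) {
        long product = 1;
        for (Set<String> s : symbols) {
            product *= s.size() + 1L;  // +1 for the UNK slot of this position
            if (product > LIMIT) {     // stop early so the long can never overflow
                throw new IllegalArgumentException("encoding table would exceed 2^16 entries");
            }
        }
        return (int) product;
    }

    // Builds a set of k distinct dummy symbols for the demo.
    static Set<String> symbols(int k) {
        Set<String> s = new HashSet<>();
        for (int i = 0; i < k; i++) {
            s.add("s" + i);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(checkedTableSize(Arrays.asList(symbols(2), symbols(2))));     // 9
        System.out.println(checkedTableSize(Arrays.asList(symbols(255), symbols(255)))); // 65536
        try {
            checkedTableSize(Arrays.asList(symbols(256), symbols(255))); // 257 * 256 = 65792
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The second call shows that the bound is inclusive: two positions with 255 symbols each give exactly 256 * 256 = 2^16 entries, which is still accepted.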

| Modifier and Type | Field and Description |
|---|---|
| static int | MAX_SIZE: The maximum encoding size. |
| static String | UNK: The "unknown" symbol. |


| Constructor and Description |
|---|
| Encoder(List<Set<String>> symbols): Constructs a new encoding table using the given symbol sets. |


| Modifier and Type | Method and Description |
|---|---|
| char | encode(String[] tuple): Encodes the given tuple (represented as a String array) to its integer value, represented as a char. |
| char[] | encodeClass(int index, String value): Encodes a "class" of tuples that all have the symbol value at the given layer index. |
| int | size() |
| int | tableSize() |

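encodeClass can be pictured as fixing one position of the tuple and enumerating every combination of the other positions. The sketch below is hypothetical: the class EncodeClassSketch, its static encodeClass(List<Set<String>>, int, String) signature, the row-major numbering, treating a value outside S_index as UNK, and throwing IndexOutOfBoundsException (rather than SequenceException) are all assumptions, not the library's behavior.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the library's Encoder.encodeClass.
class EncodeClassSketch {
    // Slot of a symbol in S_i: its position in iteration order, or |S_i| (UNK)
    // when the symbol is not in S_i. Treating unknown values as UNK is an assumption.
    static int slot(Set<String> s, String symbol) {
        int i = 0;
        for (String x : s) {
            if (x.equals(symbol)) {
                return i;
            }
            i++;
        }
        return s.size();
    }

    // Returns the codes of all tuples whose position `index` holds `value`,
    // assuming a row-major (mixed-radix) numbering of the table.
    static char[] encodeClass(List<Set<String>> symbols, int index, String value) {
        int n = symbols.size();
        if (index < 0 || index >= n) {
            throw new IndexOutOfBoundsException("index " + index + " out of bounds for " + n + " positions");
        }
        // The class has one tuple per combination of the free positions.
        int count = 1;
        for (int j = 0; j < n; j++) {
            if (j != index) {
                count *= symbols.get(j).size() + 1;
            }
        }
        char[] out = new char[count];
        int[] y = new int[n];                 // slots of the current tuple; free ones start at 0
        y[index] = slot(symbols.get(index), value);
        for (int k = 0; k < count; k++) {
            int code = 0;
            for (int i = 0; i < n; i++) {
                code = code * (symbols.get(i).size() + 1) + y[i];
            }
            out[k] = (char) code;
            // Advance the free positions like an odometer, last position fastest.
            for (int i = n - 1; i >= 0; i--) {
                if (i == index) {
                    continue;
                }
                if (++y[i] <= symbols.get(i).size()) {
                    break;
                }
                y[i] = 0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> bits = new LinkedHashSet<>(Arrays.asList("0", "1"));
        char[] codes = encodeClass(Arrays.asList(bits, bits), 0, "1");
        for (char c : codes) {
            System.out.print((int) c + " ");
        }
        System.out.println();                 // prints 3 4 5
    }
}
```

Under these assumptions, fixing position 0 to "1" in the two-position example yields the codes of (1,0), (1,1) and (1,UNK), namely 3, 4 and 5.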
public static final int MAX_SIZE
public static final String UNK
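public Encoder(List<Set<String>> symbols) throws SequenceException
Constructs a new encoding table using the given symbol sets. … UNK.
Parameters: symbols - the symbol sets S_1, ..., S_n, in tuple order
Throws: SequenceException - if the symbol sets result in an encoding table larger than MAX_SIZE.
public int size()
public int tableSize()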
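public char encode(String[] tuple) throws SequenceException
Encodes the given tuple (represented as a String array) to its integer value, represented as a char.
Parameters: tuple - the tuple to encode, as a String array
Throws: SequenceException - if unable to encode the tuple
public char[] encodeClass(int index, String value) throws SequenceException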
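Encodes a "class" of tuples that all have the symbol value at the given layer index. For example, given the mapping (0,0) => 0, (0,1) => 1, (0,UNK) => 2, ..., calling this method with index = 0 and value = 1 returns the encodings of (1,0), (1,1), and (1,UNK) as an array (3, 4, and 5 under the full example mapping in the class description).
Parameters: index - the position in the tuple (defined by the order of sets passed to the constructor)
value - the symbol that every tuple in the class has at position index
Throws: SequenceException - if the index is out of bounds, or if any of the resulting tuples cannot be encoded
Copyright © 2010-2013 University of Washington CSE. All Rights Reserved.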