java.lang.Object
  edu.washington.cs.knowitall.sequence.Encoder
public class Encoder
This class represents a table mapping tuples of strings to integer values. It
is used by LayeredTokenPattern for matching patterns against
LayeredSequence objects.
The core of this class is a mapping from string tuples of length n to
integers 0 <= i < MAX_SIZE. The mapping is defined by
a list of n sets of String symbols S_1, ..., S_n, and a
special symbol UNK. The mapping assigns an integer value to
each tuple (x_1, ..., x_n), where x_i is either in
S_i or is the symbol UNK. For example, if n = 2 and
S_1 = S_2 = {0,1}, then a possible mapping would be
(0,0) => 0, (0,1) => 1,
(0, UNK) => 2, (1,0) => 3, (1,1) => 4, (1,UNK) => 5, (UNK,0) => 6,
(UNK,1) => 7, (UNK,UNK) => 8.
Given a String tuple (x_1, ..., x_n), it is mapped to an integer
value as follows. First, it is mapped to an intermediate tuple
(y_1, ..., y_n), where y_i = x_i if x_i is in
S_i, otherwise y_i = UNK. Then the value of
(y_1, ..., y_n) according to the mapping is returned. This procedure
is implemented in the method encode(String[]), which
represents tuples as String arrays.
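The two-step procedure above (replace unknown symbols with UNK, then look up the resulting tuple) can be sketched as a mixed-radix number. This is only an illustration under stated assumptions, not the library's implementation: the class name `TupleEncodingSketch`, the literal `"UNK"`, and the choice to give UNK the last digit in each layer are all hypothetical, and the real class makes no guarantee about which integers it assigns.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: one possible tuple-to-integer mapping, not the real Encoder.
class TupleEncodingSketch {
    // Assumed spelling of the unknown symbol; the real value of Encoder.UNK is not shown here.
    static final String UNK = "UNK";

    // For each layer i, maps every symbol of S_i to a digit in 0..|S_i|-1.
    // The digit |S_i| is reserved for UNK.
    private final List<Map<String, Integer>> digits = new ArrayList<>();

    TupleEncodingSketch(List<Set<String>> symbols) {
        for (Set<String> layer : symbols) {
            Map<String, Integer> layerDigits = new HashMap<>();
            for (String symbol : layer) {
                // Digits follow the set's iteration order.
                layerDigits.put(symbol, layerDigits.size());
            }
            digits.add(layerDigits);
        }
    }

    // Maps (x_1, ..., x_n) to an integer by treating each layer as one digit
    // of a mixed-radix number with radix |S_i| + 1.
    int encode(String[] tuple) {
        if (tuple.length != digits.size()) {
            throw new IllegalArgumentException("expected " + digits.size() + " layers");
        }
        int code = 0;
        for (int i = 0; i < tuple.length; i++) {
            Map<String, Integer> layerDigits = digits.get(i);
            int radix = layerDigits.size() + 1;
            // Step 1: anything outside S_i (including UNK itself) becomes the UNK digit.
            int digit = layerDigits.getOrDefault(tuple[i], layerDigits.size());
            // Step 2: accumulate the digit into the code.
            code = code * radix + digit;
        }
        return code;
    }
}
```

With two layers built from insertion-ordered sets containing "0" and "1", this sketch happens to reproduce the example mapping above: ("0","1") encodes to 1, ("1", anything unknown) to 5, and (UNK, UNK) to 8.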
There is no guarantee on the actual integer values assigned to each tuple.
The mapping cannot be larger than 2^16. This means that the product
(|S_1|+1) * (|S_2|+1) * ... * (|S_n| + 1) must be less than or equal
to 2^16.
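As a rough illustration of the size constraint, the product can be computed before building a table. The helper below is a hypothetical sketch (`TableSizeSketch`, `requiredSize`, and `fits` are not part of this API), and it hard-codes 2^16 as the limit taken from the text above rather than reading `MAX_SIZE`.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the Encoder API.
class TableSizeSketch {
    // Limit stated in the class description: at most 2^16 entries.
    static final long LIMIT = 1L << 16;

    // Computes (|S_1| + 1) * ... * (|S_n| + 1), stopping early once the
    // product exceeds the limit so the long cannot overflow.
    static long requiredSize(List<Set<String>> symbols) {
        long product = 1;
        for (Set<String> layer : symbols) {
            product *= layer.size() + 1L; // +1 for the UNK symbol
            if (product > LIMIT) {
                break;
            }
        }
        return product;
    }

    // True if the symbol sets satisfy the documented bound.
    static boolean fits(List<Set<String>> symbols) {
        return requiredSize(symbols) <= LIMIT;
    }
}
```

For the running example with n = 2 and S_1 = S_2 = {0,1}, the product is 3 * 3 = 9, well under the limit. Two layers of 255 symbols each give exactly 256 * 256 = 2^16 and still fit; a third such layer does not.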
| Field Summary | |
|---|---|
static int |
MAX_SIZE
The maximum encoding size. |
static String |
UNK
The "unknown" symbol. |
| Constructor Summary | |
|---|---|
Encoder(List<Set<String>> symbols)
Constructs a new encoding table using the given symbol sets. |
|
| Method Summary | |
|---|---|
char |
encode(String[] tuple)
Encodes the given tuple (represented as a String array) to its integer value, represented as a char. |
char[] |
encodeClass(int index,
String value)
Encodes a "class" of tuples that all have the symbol value in the given layer index. |
int |
size()
|
int |
tableSize()
|
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final int MAX_SIZE
public static final String UNK
| Constructor Detail |
|---|
public Encoder(List<Set<String>> symbols)
throws SequenceException
UNK.
symbols -
SequenceException - if the symbol sets result in an encoding table larger than
MAX_SIZE.
| Method Detail |
|---|
public int size()
public int tableSize()
public char encode(String[] tuple)
throws SequenceException
tuple -
SequenceException - if unable to encode the tuple
public char[] encodeClass(int index,
String value)
throws SequenceException
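Encodes a "class" of tuples that all have the symbol value in the given layer
index. For example, if the mapping is (0,0) => 0, (0,1) => 1,
(0,UNK) => 2, ..., then calling this method with index = 0
and value = 1 will return the encodings of (1,0), (1,1),
and (1,UNK) as an array.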
index - the position in the tuple (defined by the order of sets passed
to the constructor)
value -
SequenceException - if the index is out of bounds, or if any of the resulting
tuples cannot be encoded
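One way to picture what a tuple class contains is to enumerate every code whose digit at the chosen layer matches the fixed symbol. The sketch below is hypothetical and not the library's implementation: it assumes a mixed-radix encoding in which layer i has radix |S_i| + 1, the first layer is the most significant digit, and the table holds at most 2^16 entries so each code fits in a char. The names `EncodeClassSketch`, `radices`, and `digit` are illustrative only.

```java
// Hypothetical sketch of enumerating a tuple class; not the real Encoder.
class EncodeClassSketch {
    // radices[i] = |S_i| + 1 (the extra slot is UNK).
    // digit is the position of the fixed symbol within layer `index`.
    // Assumes the product of all radices is at most 2^16.
    static char[] encodeClass(int[] radices, int index, int digit) {
        if (index < 0 || index >= radices.length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        if (digit < 0 || digit >= radices[index]) {
            throw new IllegalArgumentException("digit: " + digit);
        }
        // Total number of tuples in the table.
        int total = 1;
        for (int radix : radices) {
            total *= radix;
        }
        // Place value of the chosen layer: product of the radices after it.
        int weight = 1;
        for (int i = index + 1; i < radices.length; i++) {
            weight *= radices[i];
        }
        // Exactly total / radices[index] tuples share the fixed digit.
        char[] codes = new char[total / radices[index]];
        int next = 0;
        for (int code = 0; code < total; code++) {
            if ((code / weight) % radices[index] == digit) {
                codes[next++] = (char) code;
            }
        }
        return codes;
    }
}
```

Under these assumptions, two layers with radix 3 each, index = 0, and the digit for symbol 1 yield the codes 3, 4, and 5, which line up with (1,0), (1,1), and (1,UNK) in the example mapping.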