public class HMM extends TokenClassifier
Note that this HMM assumes that tokens are emitted by states, not by arcs. However, the start and end states do not emit tokens, so a sequence of N tokens is matched by a sequence of N+2 states, including the start and end states.
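The state-emission convention can be illustrated with a minimal, self-contained Viterbi sketch. It is not the code of this class: the class name `StateEmittingViterbiSketch`, the array-based model layout, the use of log probabilities, and the toy numbers are all assumptions made for illustration. State 0 plays the role of the non-emitting start state and the last state the non-emitting end state, so decoding N tokens yields a path of N+2 states.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of the HMM class documented here.
class StateEmittingViterbiSketch {
    /**
     * State 0 is the non-emitting start state and the last state is the
     * non-emitting end state; every state in between emits one token.
     * logTrans[i][j] = log P(next state j | current state i)
     * logEmit[s][w]  = log P(token w | state s); rows for start and end are unused.
     * Returns the best state sequence (length tokens.length + 2), or null if
     * no path can generate the tokens.
     */
    static int[] decode(double[][] logTrans, double[][] logEmit, int[] tokens) {
        int end = logTrans.length - 1;
        int n = tokens.length;
        if (n == 0) {
            return logTrans[0][end] > Double.NEGATIVE_INFINITY ? new int[] {0, end} : null;
        }
        double[][] score = new double[n][end + 1];
        int[][] back = new int[n][end + 1];
        for (double[] row : score) Arrays.fill(row, Double.NEGATIVE_INFINITY);
        // First token: leave the start state and emit from state s.
        for (int s = 1; s < end; s++) {
            score[0][s] = logTrans[0][s] + logEmit[s][tokens[0]];
        }
        // Remaining tokens: best predecessor p for each emitting state s.
        for (int t = 1; t < n; t++) {
            for (int s = 1; s < end; s++) {
                for (int p = 1; p < end; p++) {
                    double v = score[t - 1][p] + logTrans[p][s] + logEmit[s][tokens[t]];
                    if (v > score[t][s]) { score[t][s] = v; back[t][s] = p; }
                }
            }
        }
        // Final transition into the non-emitting end state.
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int s = 1; s < end; s++) {
            double v = score[n - 1][s] + logTrans[s][end];
            if (v > bestScore) { bestScore = v; best = s; }
        }
        if (best < 0) return null;
        int[] path = new int[n + 2];
        path[0] = 0;
        path[n + 1] = end;
        for (int t = n - 1, cur = best; t >= 0; t--) {
            path[t + 1] = cur;
            cur = back[t][cur];
        }
        return path;
    }

    // Converts a matrix of probabilities to log probabilities (log 0 = -Infinity).
    static double[][] log(double[][] p) {
        double[][] out = new double[p.length][];
        for (int i = 0; i < p.length; i++) {
            out[i] = new double[p[i].length];
            for (int j = 0; j < p[i].length; j++) out[i][j] = Math.log(p[i][j]);
        }
        return out;
    }

    // Toy model (made-up numbers): states start=0, A=1, B=2, end=3; tokens 0 and 1.
    static int[] demo() {
        double[][] trans = log(new double[][] {
            {0.0, 0.9, 0.1, 0.0},   // from start
            {0.0, 0.1, 0.8, 0.1},   // from A
            {0.0, 0.1, 0.1, 0.8},   // from B
            {0.0, 0.0, 0.0, 0.0},   // from end (unused)
        });
        double[][] emit = log(new double[][] {
            {0.0, 0.0}, {0.9, 0.1}, {0.1, 0.9}, {0.0, 0.0}
        });
        return decode(trans, emit, new int[] {0, 1});
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));   // prints [0, 1, 2, 3]
    }
}
```

For the two-token input in `demo`, the returned path has four entries: the start state, one emitting state per token, and the end state.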
The HMM incorporates an auxiliary memory in the form of a document dictionary ('cache'), which is intended for use in name tagging. If a word has once been tagged as a specific type of name ("Mr. John Park") within a document, this can be recorded so that subsequent uses of the name will be consistently tagged even if the context is ambiguous ("Park").
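The idea of a per-document name cache can be sketched as follows. This is a hypothetical stand-alone class, not the cache used inside HMM: the class name `NameCacheSketch` and the methods `record` and `lookup` are invented for illustration, and how the real decoder consults its cache is not described here. Only the notions of a set of tags to cache and of clearing the cache at a new document are taken from the text above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of the HMM class documented here.
class NameCacheSketch {
    private final Set<String> tagsToCache = new HashSet<>();
    private final Map<String, String> cache = new HashMap<>();

    // tags: the tags (for example name types) whose tokens should be remembered.
    NameCacheSketch(String... tags) {
        for (String tag : tags) tagsToCache.add(tag);
    }

    // Forget everything when a new document starts.
    void newDocument() {
        cache.clear();
    }

    // Remember each token that received one of the cached tags.
    // Assumption: a later tagging of the same token overwrites an earlier one.
    void record(String[] tokens, String[] tags) {
        for (int i = 0; i < tokens.length; i++) {
            if (tagsToCache.contains(tags[i])) cache.put(tokens[i], tags[i]);
        }
    }

    // Returns the tag previously recorded for this token, or null if none.
    String lookup(String token) {
        return cache.get(token);
    }

    public static void main(String[] args) {
        NameCacheSketch cache = new NameCacheSketch("person");
        cache.record(new String[] {"Mr.", "John", "Park"},
                     new String[] {"other", "person", "person"});
        System.out.println(cache.lookup("Park"));   // prints person
        cache.newDocument();
        System.out.println(cache.lookup("Park"));   // prints null
    }
}
```

A later, ambiguous occurrence of "Park" in the same document can then be resolved by looking it up, while a new document starts with an empty cache.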
In addition to generating the best path, the Viterbi decoder can compute
the margin (the difference in score between the best and
second best path) and alternative (N-best) paths. To obtain the margin,
one must call recordMargin before calling viterbi
and getMargin after calling viterbi. To obtain
N-best paths, one must call setNbest before calling viterbi.
Then, after calling viterbi, each call on nextBest
returns the next best path.
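To make the terms concrete, the following sketch ranks every path of a tiny state-emitting model by brute force and reads off the margin and the N-best order. It is not how this class computes them: the class `NBestSketch`, its methods, the log-probability scoring, and the toy numbers are assumptions chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of the HMM class documented here.
class NBestSketch {
    static final class ScoredPath {
        final int[] states;   // emitting states only, one per token
        final double score;   // log probability of the whole path
        ScoredPath(int[] states, double score) { this.states = states; this.score = score; }
    }

    // State 0 is the start state, the last state is the end state.
    // Enumerates all paths with non-zero probability, best first.
    static List<ScoredPath> rankAllPaths(double[][] logTrans, double[][] logEmit, int[] tokens) {
        List<ScoredPath> out = new ArrayList<>();
        extend(logTrans, logEmit, tokens, new int[tokens.length], 0, 0, 0.0, out);
        out.sort(Comparator.comparingDouble((ScoredPath p) -> p.score).reversed());
        return out;
    }

    private static void extend(double[][] logTrans, double[][] logEmit, int[] tokens,
                               int[] states, int pos, int prev, double score,
                               List<ScoredPath> out) {
        int end = logTrans.length - 1;
        if (pos == tokens.length) {
            double total = score + logTrans[prev][end];
            if (total > Double.NEGATIVE_INFINITY) out.add(new ScoredPath(states.clone(), total));
            return;
        }
        for (int s = 1; s < end; s++) {
            states[pos] = s;
            extend(logTrans, logEmit, tokens, states, pos + 1, s,
                   score + logTrans[prev][s] + logEmit[s][tokens[pos]], out);
        }
    }

    // Margin: score of the best path minus score of the second best path.
    static double margin(List<ScoredPath> ranked) {
        return ranked.get(0).score - ranked.get(1).score;
    }

    // Toy model (made-up numbers): states start=0, A=1, B=2, end=3; tokens 0 and 1.
    static List<ScoredPath> demo() {
        double[][] trans = {
            {0.0, 0.9, 0.1, 0.0}, {0.0, 0.1, 0.8, 0.1}, {0.0, 0.1, 0.1, 0.8}, {0.0, 0.0, 0.0, 0.0}
        };
        double[][] emit = { {0.0, 0.0}, {0.9, 0.1}, {0.1, 0.9}, {0.0, 0.0} };
        for (double[] row : trans) for (int j = 0; j < row.length; j++) row[j] = Math.log(row[j]);
        for (double[] row : emit) for (int j = 0; j < row.length; j++) row[j] = Math.log(row[j]);
        return rankAllPaths(trans, emit, new int[] {0, 1});
    }

    public static void main(String[] args) {
        List<ScoredPath> ranked = demo();
        System.out.println("paths: " + ranked.size());
        System.out.println("margin: " + margin(ranked));
    }
}
```

With the class itself, the corresponding call order is recordMargin, viterbi, getMargin for the margin, and setNbest, viterbi, then repeated calls on nextBest until it returns null for the N-best paths.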
| Modifier and Type | Field and Description |
|---|---|
| `double` | `pathProbability` after either the viterbi decode or the nextBest method has been invoked, the probability along the most recently returned path. |
| `protected static double` | `UNLIKELY` |
| `double` | `viterbiProbability` after the viterbi decoder method has been invoked, the probability along the best path found by the decoder. |
| Constructor and Description |
|---|
| `HMM()` create a new HMM using instances of BasicHMMemitter to control emission of tokens from states. |
| `HMM(Class emitterClass)` create a new HMM using instances of emitterClass to control emission of tokens from states. |
| Modifier and Type | Method and Description |
|---|---|
| `void` | `addState(HMMstate state)` add state state to the HMM. |
| `void` | `computeProbabilities()` compute the probabilities for token emission and state transition from the counts acquired in training. |
| `void` | `createModel()` |
| `double` | `getLocalMargin(Document doc, Annotation[] tokens, String excludedTag, int excludedTagStart, int excludedTagEnd)` returns the margin for assigning a particular tag to a sequence of tokens. |
| `double` | `getMargin()` if invoked after a call on 'viterbi', returns the margin (the difference in score between the best and second best analyses). |
| `double` | `getPathProbability()` after either the viterbi decode or the nextBest method has been invoked, returns the probability along the most recently returned path. |
| `HMMstate` | `getState(String stateName)` returns state with given name, or null if no such state |
| `void` | `load(Reader HMMReader)` read a description of an HMM from HMMReader. |
| `void` | `load(String fileName)` |
| `static void` | `main(String[] args)` tests the Viterbi decoder and the N-best path generator on a simple 4-word noun phrase using a 4-state HMM. |
| `void` | `newDocument()` clears the name cache for the document. |
| `String[]` | `nextBest()` an N-best-paths generator for HMMs. |
| `int[]` | `nextBestPath()` an N-best-paths generator for HMMs. |
| `void` | `print()` print a complete description of the HMM (all states and arcs) to System.out. |
| `void` | `recordMargin()` enable the recording of the margin (the difference in score between the best and second best analysis) by the Viterbi decoder. |
| `void` | `resetForTraining()` |
| `void` | `setNbest()` enables N-best search. |
| `void` | `setTagsToCache(String[] tags)` |
| `void` | `store(PrintWriter stream)` save the HMM to stream in a form which can be reloaded using load(java.io.Reader). |
| `void` | `store(String fileName)` |
| `void` | `train(Document doc, Annotation[] tokens, String[] tags)` a slower algorithm for training the HMM. |
| `void` | `train0(Document doc, Annotation[] tokens, String[] tags)` a fast, simple algorithm for training the HMM. |
| `String[]` | `viterbi(Document doc, Annotation[] tokens)` a Viterbi decoder for HMMs. |
| `int[]` | `viterbiPath(Document doc, Annotation[] tokens)` a Viterbi decoder for HMMs. |
`protected static final double UNLIKELY`

`public double viterbiProbability`

`public double pathProbability`

`public HMM()`

create a new HMM using instances of BasicHMMemitter to control emission of tokens from states.

`public HMM(Class emitterClass)`

create a new HMM using instances of emitterClass to control emission of tokens from states.

`public void setTagsToCache(String[] tags)`

`public void load(Reader HMMReader) throws IOException`

read a description of an HMM from HMMReader. The description consists of lines ...

Throws: IOException

`public void load(String fileName)`

load in class TokenClassifier

`public void addState(HMMstate state)`

add state state to the HMM.

`public HMMstate getState(String stateName)`

returns state with given name, or null if no such state

`public void resetForTraining()`

`public void newDocument()`

clears the name cache for the document.

newDocument in class TokenClassifier

`public void train0(Document doc, Annotation[] tokens, String[] tags)`

a fast, simple algorithm for training the HMM.

`public void train(Document doc, Annotation[] tokens, String[] tags)`

a slower algorithm for training the HMM.

train in class TokenClassifier

`public void computeProbabilities()`

compute the probabilities for token emission and state transition from the counts acquired in training.

`public void createModel()`

createModel in class TokenClassifier

`public void print()`

print a complete description of the HMM (all states and arcs) to System.out.

`public void store(PrintWriter stream)`

save the HMM to stream in a form which can be reloaded using load(java.io.Reader).

`public void store(String fileName)`

store in class TokenClassifier

`public int[] viterbiPath(Document doc, Annotation[] tokens)`

a Viterbi decoder for HMMs. Given tokens, on document doc, returns the most likely path which can generate those tokens. The value returned is an array of the states (indexes into states) along the most likely path.

`public String[] viterbi(Document doc, Annotation[] tokens)`

a Viterbi decoder for HMMs. Given tokens, on document doc, returns the most likely path which can generate those tokens. The value returned is an array of the tags associated with the states along the most likely path.

viterbi in class TokenClassifier

`public double getPathProbability()`

after either the viterbi decode or the nextBest method has been invoked, returns the probability along the most recently returned path.

getPathProbability in class TokenClassifier

`public void recordMargin()`

enable the recording of the margin (the difference in score between the best and second best analysis) by the Viterbi decoder.

recordMargin in class TokenClassifier

`public double getMargin()`

if invoked after a call on 'viterbi', returns the margin (the difference in score between the best and second best analyses).

getMargin in class TokenClassifier

`public double getLocalMargin(Document doc, Annotation[] tokens, String excludedTag, int excludedTagStart, int excludedTagEnd)`

returns the margin for assigning a particular tag to a sequence of tokens.

getLocalMargin in class TokenClassifier

Parameters:

- `doc` - the Document containing the sentence being tagged
- `tokens` - the token annotations for the sentence
- `excludedTag` - the tag assigned to the sequence
- `excludedTagStart` - the index of the first token being assigned this tag
- `excludedTagEnd` - the index of the last token being assigned this tag

`public void setNbest()`

enables N-best search. Call this before viterbi if you intend to also call nextBest.

`public int[] nextBestPath()`

an N-best-paths generator for HMMs. Assumes that viterbiPath has already been called with an array of token annotations and has returned the most likely path. Each subsequent call on nextBest returns the next best (next most likely) path, or null if no further paths can be found. The value returned is an array of the states (indexes into states) along the most likely path.

`public String[] nextBest()`

an N-best-paths generator for HMMs. Assumes that viterbi has already been called with an array of token annotations and has returned the most likely path. Each subsequent call on nextBest returns the next best (next most likely) path, or null if no further paths can be found. The value returned is an array of the tags associated with the states along the most likely path.

nextBest in class TokenClassifier

`public static void main(String[] args)`

tests the Viterbi decoder and the N-best path generator on a simple 4-word noun phrase using a 4-state HMM.
Copyright © 2016 New York University. All rights reserved.