public class GISTrainer extends Object
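An implementation of Generalized Iterative Scaling (GIS). The reference paper for this implementation is available at ftp://ftp.cis.upenn.edu/pub/ircs/tr/97-08.ps.Z.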
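The following is a rough, self-contained illustration of the basic GIS weight update. It is a hypothetical sketch, not GISTrainer's source. It trains weights for binary (predicate, outcome) features and assumes that every event has exactly C active predicates, so no correction feature is needed. The class name `GisSketch`, the `"predicate=outcome"` feature encoding, and the toy weather data are all invented for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of plain GIS over binary (predicate, outcome) features; not GISTrainer's code. */
final class GisSketch {

    /** One training event: its active predicates and its observed outcome. */
    static final class Event {
        final List<String> context;
        final String outcome;

        Event(String outcome, String... context) {
            this.outcome = outcome;
            this.context = Arrays.asList(context);
        }
    }

    /** Feature key pairing a predicate with an outcome (hypothetical encoding). */
    static String feature(String predicate, String outcome) {
        return predicate + "=" + outcome;
    }

    /** p(y | context) proportional to exp(sum of the weights of the active features for y). */
    static Map<String, Double> eval(List<String> context, List<String> outcomes,
                                    Map<String, Double> lambda) {
        Map<String, Double> probs = new HashMap<>();
        double z = 0.0;
        for (String y : outcomes) {
            double sum = 0.0;
            for (String p : context) {
                sum += lambda.getOrDefault(feature(p, y), 0.0);
            }
            double score = Math.exp(sum);
            probs.put(y, score);
            z += score;
        }
        for (String y : outcomes) {
            probs.put(y, probs.get(y) / z);
        }
        return probs;
    }

    /** Plain GIS; assumes every event has exactly c active predicates (no correction feature). */
    static Map<String, Double> train(List<Event> events, List<String> outcomes,
                                     int c, int iterations) {
        // Observed counts; only (predicate, outcome) pairs seen in training become features.
        Map<String, Double> observed = new HashMap<>();
        for (Event e : events) {
            for (String p : e.context) {
                observed.merge(feature(p, e.outcome), 1.0, Double::sum);
            }
        }
        Map<String, Double> lambda = new HashMap<>();
        for (String f : observed.keySet()) {
            lambda.put(f, 0.0);
        }
        for (int it = 0; it < iterations; it++) {
            // Counts each feature is expected to have under the current model.
            Map<String, Double> expected = new HashMap<>();
            for (Event e : events) {
                Map<String, Double> probs = eval(e.context, outcomes, lambda);
                for (String p : e.context) {
                    for (String y : outcomes) {
                        String f = feature(p, y);
                        if (observed.containsKey(f)) {
                            expected.merge(f, probs.get(y), Double::sum);
                        }
                    }
                }
            }
            // GIS update, applied to all weights at once: lambda_f += (1 / C) * log(observed_f / expected_f).
            for (String f : observed.keySet()) {
                double update = Math.log(observed.get(f) / expected.get(f)) / c;
                lambda.put(f, lambda.get(f) + update);
            }
        }
        return lambda;
    }

    public static void main(String[] args) {
        List<String> outcomes = Arrays.asList("in", "out");
        List<Event> events = Arrays.asList(
                new Event("out", "sunny", "warm"),
                new Event("out", "sunny", "cold"),
                new Event("in", "rainy", "cold"),
                new Event("in", "rainy", "warm"));
        Map<String, Double> lambda = train(events, outcomes, 2, 50);
        System.out.println(eval(Arrays.asList("sunny", "warm"), outcomes, lambda));
    }
}
```

Dividing each step by C is what lets every weight be updated at once from a single pass of expected counts.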
By default, the slack parameter used in the above implementation is omitted from the computation, and a method for updating with Gaussian smoothing has been added, following Clark and Curran (2003), "Investigating GIS and Smoothing for Maximum Entropy Taggers":
http://acl.ldc.upenn.edu/E/E03/E03-1071.pdf
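The Gaussian-smoothed update can be sketched as follows. This reflects our reading of the cited paper and is not GISTrainer's actual code. Under that reading, the GIS step delta for a weight lambda solves observed = expected * exp(C * delta) + (lambda + delta) / sigma^2. This equation has no closed form, so the hypothetical helper `solveDelta` finds delta with Newton's method. We assume `observed` and `expected` are already on the same scale as the penalty term, for example expectations normalized by the number of events.

```java
/**
 * Hypothetical sketch of one Gaussian-smoothed GIS weight update (not GISTrainer's code).
 * Solves observed = expected * exp(c * delta) + (lambda + delta) / sigma^2 for delta.
 */
final class GaussianGisUpdate {

    static double solveDelta(double observed, double expected, double lambda,
                             double c, double sigma) {
        double sigmaSq = sigma * sigma;
        double delta = 0.0;
        for (int i = 0; i < 100; i++) {
            double modelTerm = expected * Math.exp(c * delta);
            // g(delta) is increasing and convex, so Newton's method converges
            // (barring overflow for extreme inputs).
            double g = modelTerm + (lambda + delta) / sigmaSq - observed;
            double slope = c * modelTerm + 1.0 / sigmaSq;
            double step = g / slope;
            delta -= step;
            if (Math.abs(step) < 1e-12) {
                break;
            }
        }
        return delta;
    }

    public static void main(String[] args) {
        double plain = Math.log(0.4 / 0.2) / 2.0; // unsmoothed GIS step with C = 2
        double smoothed = solveDelta(0.4, 0.2, 0.0, 2.0, 1.0);
        System.out.println(plain + " vs " + smoothed);
    }
}
```

As sigma grows, the penalty term vanishes and delta approaches the unsmoothed step log(observed / expected) / C. A small sigma pulls weights toward zero.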
The slack parameter can be used by setting useSlackParameter to true.
Gaussian smoothing can be used by setting useGaussianSmoothing to true.
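In the GIS literature, the feature values of every event must sum to the same constant C. A correction (slack) feature makes up the difference for events that fall short. The text above does not spell out what useSlackParameter controls. Assuming it governs such a correction feature, the extra quantities it introduces can be sketched as follows. `CorrectionFeature` and its methods are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the GIS correction (slack) feature; not GISTrainer's code.
 * GIS requires every event's feature values to sum to the same constant C.
 */
final class CorrectionFeature {

    /** C is the largest total feature value found on any event. */
    static int correctionConstant(int[] featureSums) {
        int c = 0;
        for (int sum : featureSums) {
            c = Math.max(c, sum);
        }
        return c;
    }

    /** Correction feature value per event: C minus that event's own total. */
    static int[] correctionValues(int[] featureSums) {
        int c = correctionConstant(featureSums);
        int[] values = new int[featureSums.length];
        for (int i = 0; i < featureSums.length; i++) {
            values[i] = c - featureSums[i];
        }
        return values;
    }

    public static void main(String[] args) {
        int[] sums = {2, 3, 1}; // number of active binary features per event
        System.out.println(Arrays.toString(correctionValues(sums))); // [1, 0, 2]
    }
}
```

Omitting the correction feature saves a parameter and its expectation computation, at the cost of the constant-sum condition no longer being enforced exactly.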
A prior can be used to train models that converge to the distribution which satisfies the empirical constraints of the training data while minimizing the relative entropy to the specified prior. By default, the uniform distribution is used as the prior.

| Modifier and Type | Method and Description |
|---|---|
| void | setGaussianSigma(double sigmaValue)<br>Sets the sigma value used for Gaussian smoothing and enables Gaussian smoothing. |
| void | setSmoothing(boolean smooth)<br>Sets whether this trainer will use smoothing while training the model. |
| void | setSmoothingObservation(double timesSeen)<br>Sets the number of times the trainer should imagine it saw a feature that it did not actually see. |
| opennlp.maxent.GISModel | trainModel(opennlp.model.EventStream eventStream, int iterations, int cutoff)<br>Trains a GIS model on the events in the specified event stream, using the specified number of iterations and the specified count cutoff. |
| opennlp.maxent.GISModel | trainModel(int iterations, opennlp.model.DataIndexer di, int cutoff)<br>Trains a model using the GIS algorithm. |
| opennlp.maxent.GISModel | trainModel(int iterations, opennlp.model.DataIndexer di, opennlp.model.Prior modelPrior, int cutoff)<br>Trains a model using the GIS algorithm with the specified prior. |
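The trainModel overload that accepts an opennlp.model.Prior produces a model shaped by that prior. The sketch below assumes the prior enters multiplicatively as p(y | x) proportional to prior(y | x) * exp(sum of active feature weights), which is the standard form of a minimum-relative-entropy model. `PriorWeightedModel` and its methods are hypothetical and are not the opennlp.model.Prior interface.

```java
import java.util.Arrays;

/** Hypothetical sketch: a conditional model combining a prior with feature weights, in log space. */
final class PriorWeightedModel {

    /**
     * p(y | x) = prior(y | x) * exp(score(y)) / Z, where score(y) is the sum of the
     * weights of the features active for outcome y.
     */
    static double[] eval(double[] logPrior, double[] scores) {
        double[] logits = new double[scores.length];
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < scores.length; i++) {
            logits[i] = logPrior[i] + scores[i];
            max = Math.max(max, logits[i]);
        }
        double z = 0.0;
        double[] probs = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            // Subtracting the max keeps exp() from overflowing.
            probs[i] = Math.exp(logits[i] - max);
            z += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= z;
        }
        return probs;
    }

    /** Log of the uniform prior over n outcomes (the default prior). */
    static double[] uniformLogPrior(int n) {
        double[] logPrior = new double[n];
        Arrays.fill(logPrior, -Math.log(n));
        return logPrior;
    }

    public static void main(String[] args) {
        double[] zeroWeights = {0.0, 0.0};
        double[] skewedLogPrior = {Math.log(0.2), Math.log(0.8)};
        // With no feature evidence, the model reproduces the prior.
        System.out.println(Arrays.toString(eval(skewedLogPrior, zeroWeights)));
        // With a uniform prior, the model reduces to an ordinary maximum entropy model.
        System.out.println(Arrays.toString(eval(uniformLogPrior(2), new double[] {0.0, Math.log(3.0)})));
    }
}
```

Working in log space means a log prior can be added directly to the summed feature weights before normalizing.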
public void setSmoothing(boolean smooth)
Parameters:
smooth - true if smoothing is desired, false if not

public void setSmoothingObservation(double timesSeen)
Parameters:
timesSeen - the (possibly fractional) number of times the trainer should imagine it saw a feature that it did not actually see

public void setGaussianSigma(double sigmaValue)
Parameters:
sigmaValue - the sigma value used for Gaussian smoothing; useGaussianSmoothing is set to true automatically

public opennlp.maxent.GISModel trainModel(opennlp.model.EventStream eventStream,
int iterations,
int cutoff)
throws IOException
Parameters:
eventStream - A stream of all events.
iterations - The number of iterations to use for GIS.
cutoff - The number of times a feature must occur to be included.
Throws:
IOException

public opennlp.maxent.GISModel trainModel(int iterations,
opennlp.model.DataIndexer di,
int cutoff)
Parameters:
iterations - The number of GIS iterations to perform.
di - The data indexer used to compress events in memory.

public opennlp.maxent.GISModel trainModel(int iterations,
opennlp.model.DataIndexer di,
opennlp.model.Prior modelPrior,
int cutoff)
Parameters:
iterations - The number of GIS iterations to perform.
di - The data indexer used to compress events in memory.
modelPrior - The prior distribution used to train this model.

Copyright © 2016 New York University. All rights reserved.