Package de.julielab.jcore.ae.jpos.tagger
Class POSTagger
- java.lang.Object
  - de.julielab.jcore.ae.jpos.tagger.POSTagger
- All Implemented Interfaces: Serializable
public class POSTagger extends Object implements Serializable
General class which does all the ML stuff.
- Author: hellrich, based on tomanek
- See Also: Serialized Form
-
-
Method Summary
Modifier and Type, Method, Description:
- Properties getFeatureConfig()
- Object getModel(): return the model
- int getNumber_Iterations()
- boolean isTrained(): returns true when model has been successfully trained.
- Sentence PPDtoUnits(String sentence): takes a sentence in piped format and returns the corresponding unit sentence as a Sentence object
- ArrayList<String> predictForCLI(ArrayList<Sentence> sentences): predict the entity labels by means of a previously learned model.
- void predictForUIMA(Sentence sentence): predicts the entity labels by means of a model.
- static POSTagger readModel(File modelFile): load a previously trained FeatureSubsetModel (CRF4+Properties) which was stored as serialized object to disk.
- static POSTagger readModel(InputStream is): load a previously trained FeatureSubsetModel (CRF4+Properties) which was stored as serialized object to disk.
- void set_Number_Iterations(int number_iter)
- void setFeatureConfig(Properties featureConfig)
- Sentence textToUnits(String sentence)
- void train(ArrayList<Sentence> sentences): this is to train a NE model (based on CRF); when trained, the model is stored internally.
- void writeModel(String filename): Save the model learned to disk.
-
-
-
Constructor Detail
-
POSTagger
public POSTagger()
default constructor
-
POSTagger
public POSTagger(File featureConfigFile)
Constructor that takes a feature config file.
- Parameters:
- featureConfigFile - the feature config file
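Since getFeatureConfig() returns a java.util.Properties object, the feature configuration is presumably key/value text. The following is a minimal sketch of parsing such text with the standard Properties API. It assumes the config file uses the usual .properties syntax, which this page does not confirm; the class name FeatureConfigSketch and the keys shown are hypothetical and are not actual JCoRe feature names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class FeatureConfigSketch {

    // Parse key=value configuration text into a Properties object.
    // Assumption: the feature config follows standard .properties syntax.
    static Properties parse(String text) {
        Properties config = new Properties();
        try (StringReader reader = new StringReader(text)) {
            config.load(reader);
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return config;
    }

    public static void main(String[] args) {
        // Hypothetical keys, used only for illustration.
        String text = "example_feature_enabled = true\nexample_window_size = 2\n";
        Properties config = parse(text);
        System.out.println(config.getProperty("example_feature_enabled"));
        System.out.println(config.getProperty("example_window_size"));
    }
}
```

A Properties object built this way could then be handed to setFeatureConfig(Properties), as an alternative to the File-based constructor.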
-
-
Method Detail
-
isTrained
public boolean isTrained()
Returns true when the model has been successfully trained.
- Returns:
- true if trained
-
train
public void train(ArrayList<Sentence> sentences)
Trains an NE model (based on CRF); once trained, the model is stored internally. The model can be saved to disk using writeModel.
- Parameters:
- sentences - training data, an ArrayList of Sentence objects, File which contains the feature subset to be used in a text format
-
predictForUIMA
public void predictForUIMA(Sentence sentence)
Predicts the entity labels by means of a model. This method is needed by UIMA-JNET!
- Parameters:
- sentence - a Sentence object containing all units (= tokens) of that sentence
-
predictForCLI
public ArrayList<String> predictForCLI(ArrayList<Sentence> sentences)
Predicts the entity labels by means of a previously learned model. This method is used by the JNET stand-alone version (for UIMA-JNET, see the other predict method). The output is an ArrayList of IOB strings.
- Parameters:
- sentences - an ArrayList of Sentence objects
- Returns:
- IOB output for the sentences to be predicted. Each element of the ArrayList is a string which refers to one word and its label ("token\tlabel")
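Because each returned element is a "token\tlabel" string, callers will typically split it before further processing. The sketch below shows one way to do that. The class TaggedTokenSketch and the sample tokens and labels are hypothetical; only the tab-separated layout comes from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TaggedTokenSketch {

    // Split one "token\tlabel" string into a two-element array {token, label}.
    // The last tab is used as separator, so a token containing a tab stays intact.
    static String[] split(String line) {
        int tab = line.lastIndexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("no tab separator in: " + line);
        }
        return new String[] { line.substring(0, tab), line.substring(tab + 1) };
    }

    // Split every element of a prediction list.
    static List<String[]> splitAll(List<String> lines) {
        List<String[]> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.add(split(line));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical output; real labels depend on the trained model.
        List<String> predicted = Arrays.asList("The\tDT", "cell\tNN");
        for (String[] pair : splitAll(predicted)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```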
-
writeModel
public void writeModel(String filename)
Saves the learned model to disk. This is done via Java's object serialization.
- Parameters:
- filename - where to write it (full path!)
-
readModel
public static POSTagger readModel(InputStream is) throws IOException, FileNotFoundException, ClassNotFoundException
Loads a previously trained FeatureSubsetModel (CRF4+Properties) which was stored on disk as a serialized object.
- Parameters:
- is - InputStream for a serialized featureSubsetModel
- Throws:
- IOException
- FileNotFoundException
- ClassNotFoundException
-
readModel
public static POSTagger readModel(File modelFile) throws IOException, FileNotFoundException, ClassNotFoundException
Loads a previously trained FeatureSubsetModel (CRF4+Properties) which was stored on disk as a serialized object.
- Parameters:
- modelFile - where to find the serialized featureSubsetModel (full path!)
- Throws:
- IOException
- FileNotFoundException
- ClassNotFoundException
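writeModel and readModel rely on Java's object serialization, which is why readModel declares IOException and ClassNotFoundException. The following sketch illustrates that general mechanism with a stand-in class; it is not the POSTagger implementation. ToyModel, SerializationSketch, and the in-memory buffer are assumptions made for illustration, where the real methods work with a file path or an InputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

final class SerializationSketch {

    // Hypothetical stand-in for a trained model; not the real POSTagger.
    static final class ToyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final int iterations;

        ToyModel(int iterations) {
            this.iterations = iterations;
        }
    }

    // Serialize an object to any output stream (a FileOutputStream would write to disk).
    static void write(Serializable model, OutputStream out) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(model);
        }
    }

    // Deserialize an object; fails with ClassNotFoundException if the class is not on the classpath.
    static Object read(InputStream in) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(in)) {
            return ois.readObject();
        }
    }

    // Write to an in-memory buffer and read back, wrapping checked exceptions.
    static Object roundTrip(Serializable model) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            write(model, buffer);
            return read(new ByteArrayInputStream(buffer.toByteArray()));
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ToyModel restored = (ToyModel) roundTrip(new ToyModel(50));
        System.out.println(restored.iterations);
    }
}
```

Serialized objects are generally tied to the class definition that wrote them, so a model file is best read back with the same library version that produced it.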
-
getModel
public Object getModel()
return the model
-
setFeatureConfig
public void setFeatureConfig(Properties featureConfig)
-
getFeatureConfig
public Properties getFeatureConfig()
-
PPDtoUnits
public Sentence PPDtoUnits(String sentence)
Takes a sentence in piped format and returns the corresponding unit sentence as a Sentence object.
- Parameters:
- sentence - in piped format to be converted
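This page does not define the piped format. As a rough illustration only, the sketch below assumes whitespace-separated units of the form token|label and splits them into pairs; the actual format accepted by PPDtoUnits may differ. PipedFormatSketch and the sample sentence are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class PipedFormatSketch {

    // Assumption: units are separated by whitespace and written as "token|label".
    // The last '|' is used as separator, so a token that itself contains '|' stays intact.
    static List<String[]> parse(String sentence) {
        List<String[]> units = new ArrayList<>();
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return units;
        }
        for (String unit : trimmed.split("\\s+")) {
            int bar = unit.lastIndexOf('|');
            if (bar < 0) {
                throw new IllegalArgumentException("no '|' separator in unit: " + unit);
            }
            units.add(new String[] { unit.substring(0, bar), unit.substring(bar + 1) });
        }
        return units;
    }

    public static void main(String[] args) {
        // Hypothetical input; the tags are illustrative.
        for (String[] unit : parse("The|DT cell|NN divides|VBZ")) {
            System.out.println(unit[0] + " -> " + unit[1]);
        }
    }
}
```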
-
getNumber_Iterations
public int getNumber_Iterations()
-
set_Number_Iterations
public void set_Number_Iterations(int number_iter)
-
-