Package de.julielab.jcore.ae.jnet.tagger
Class NETagger
- java.lang.Object
  - de.julielab.jcore.ae.jnet.tagger.NETagger
public class NETagger extends java.lang.Object

General class that handles the machine-learning work: training, prediction, and reading and writing models. TODO: confidence estimation also for IOB (not only IO).

Author:
- tomanek
Method Summary
Modifier and Type, Method, Description:
- java.util.Properties getFeatureConfig()
- java.lang.Object getModel(): Returns the model.
- int getNumber_Iterations()
- boolean is_Max_Ent()
- boolean isTrained(): Returns true when the model has been successfully trained.
- Sentence PPDtoUnits(java.lang.String sentence): Takes a sentence in piped format and returns the corresponding unit sentence as a Sentence object.
- void predict(Sentence sentence, boolean showSegmentConfidence): Predicts the entity labels by means of a model.
- java.util.ArrayList<java.lang.String> predictIOB(java.util.ArrayList<Sentence> sentences, boolean showSegmentConfidence): Predicts the entity labels by means of a previously learned model.
- void readModel(java.io.File f): Loads a previously trained FeatureSubsetModel (CRF4 + Properties) that was stored to disk as a serialized object.
- void readModel(java.io.InputStream is): Loads a previously trained FeatureSubsetModel (CRF4 + Properties) that was stored to disk as a serialized object.
- void set_Max_Ent(boolean me_train)
- void set_Number_Iterations(int number_iter)
- void setFeatureConfig(java.util.Properties featureConfig)
- void train(java.util.ArrayList<Sentence> sentences): Trains a NE model (based on a CRF); once trained, the model is stored internally.
- void writeModel(java.lang.String filename): Saves the learned model to disk.
Method Detail
-
isTrained
public boolean isTrained()
Returns true when the model has been successfully trained.
Returns:
- true if trained
-
train
public void train(java.util.ArrayList<Sentence> sentences)
Trains a named-entity (NE) model based on a CRF; once trained, the model is stored internally. The model can be saved to disk with writeModel.
Parameters:
sentences - the training data, an ArrayList of Sentence objects; File that contains the feature subset to be used, in a text format
-
predict
public void predict(Sentence sentence, boolean showSegmentConfidence)
Predicts the entity labels by means of a model. This method is required by UIMA-JNET.
Parameters:
sentence - a Sentence object containing all units (= tokens) of that sentence
showSegmentConfidence - when this flag is set to true, a confidence is estimated for every entity found. The confidence is stored in the Unit object.
-
predictIOB
public java.util.ArrayList<java.lang.String> predictIOB(java.util.ArrayList<Sentence> sentences, boolean showSegmentConfidence)
Predicts the entity labels by means of a previously learned model. This method is used by the stand-alone version of JNET (for UIMA-JNET, see the other predict method). The output is an ArrayList of IOB lines.
Parameters:
sentences - an ArrayList of Sentence objects
showSegmentConfidence - when this flag is set to true, a confidence is estimated for every entity found. The confidence is written to the IOB output file.
Returns:
- IOB output for the sentences to be predicted. Each element of the ArrayList is a string which refers to one word and its label ("token\tlabel")
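Since predictIOB returns one "token\tlabel" string per word, a caller often needs to group those lines back into entity spans. The sketch below does that under an assumption not stated on this page: labels follow the common IOB convention "O", "B-<type>", "I-<type>". The class IobSpans and record Entity are hypothetical names for illustration, and any extra tab-separated column (for example a confidence value) is simply ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JNET: groups "token\tlabel" lines into entity spans.
// ASSUMPTION: labels are "O", "B-<type>" or "I-<type>"; extra tab-separated columns are ignored.
final class IobSpans {

    /** One recognized entity: its type and the tokens it covers. */
    record Entity(String type, List<String> tokens) {}

    static List<Entity> decode(List<String> iobLines) {
        List<Entity> entities = new ArrayList<>();
        String currentType = null;                  // type of the open entity, or null if none
        List<String> currentTokens = new ArrayList<>();

        for (String line : iobLines) {
            String[] parts = line.split("\t");
            if (parts.length < 2) {
                throw new IllegalArgumentException("expected token<TAB>label: " + line);
            }
            String token = parts[0];
            String label = parts[1];

            // An "I-" label of the same type as the open entity extends it.
            if (label.startsWith("I-") && label.substring(2).equals(currentType)) {
                currentTokens.add(token);
                continue;
            }
            // Any other label closes the open entity, if there is one.
            if (currentType != null) {
                entities.add(new Entity(currentType, List.copyOf(currentTokens)));
                currentType = null;
                currentTokens.clear();
            }
            // "B-" (or an "I-" with no matching open entity) starts a new entity.
            if (label.startsWith("B-") || label.startsWith("I-")) {
                currentType = label.substring(2);
                currentTokens.add(token);
            }
            // "O" tokens lie outside every entity and are skipped.
        }
        if (currentType != null) {
            entities.add(new Entity(currentType, List.copyOf(currentTokens)));
        }
        return entities;
    }
}
```

Treating a stray "I-" as the start of a new entity is a lenient choice; a stricter decoder could reject such sequences instead.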
-
writeModel
public void writeModel(java.lang.String filename)
Saves the learned model to disk. This is done via Java's object serialization.
Parameters:
filename - where to write the model (full path)
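Because writeModel relies on Java's object serialization, the core of such a method is an ObjectOutputStream wrapped around a file stream. The following is a minimal sketch, assuming a hypothetical ModelBundle class that stands in for the serialized FeatureSubsetModel; the real class holds a CRF4 model plus Properties, which is replaced here by a placeholder weight array.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Properties;

// Hypothetical stand-in for the serialized model (not JNET's FeatureSubsetModel).
final class ModelBundle implements Serializable {
    private static final long serialVersionUID = 1L;

    final Properties featureConfig;   // Properties is Serializable (it extends Hashtable)
    final double[] weights;           // placeholder for the trained CRF parameters

    ModelBundle(Properties featureConfig, double[] weights) {
        this.featureConfig = featureConfig;
        this.weights = weights;
    }
}

final class ModelWriter {
    /** Serializes the whole object graph of the model to the given path. */
    static void write(ModelBundle model, String filename) throws IOException {
        // try-with-resources flushes and closes the stream chain even if writeObject fails.
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(filename)))) {
            out.writeObject(model);
        }
    }
}
```

Every object reachable from the model must itself be Serializable, otherwise writeObject throws a NotSerializableException.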
-
readModel
public void readModel(java.io.File f) throws java.io.IOException, java.io.FileNotFoundException, java.lang.ClassNotFoundException
Loads a previously trained FeatureSubsetModel (CRF4 + Properties) that was stored to disk as a serialized object.
Parameters:
f - the file containing the serialized FeatureSubsetModel
Throws:
java.io.IOException
java.io.FileNotFoundException
java.lang.ClassNotFoundException
-
readModel
public void readModel(java.io.InputStream is) throws java.io.IOException, java.io.FileNotFoundException, java.lang.ClassNotFoundException
Loads a previously trained FeatureSubsetModel (CRF4 + Properties) that was stored to disk as a serialized object.
Parameters:
is - input stream of the serialized FeatureSubsetModel
Throws:
java.io.IOException
java.io.FileNotFoundException
java.lang.ClassNotFoundException
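The two readModel overloads accept the same serialized data from different sources. One plausible way to structure such a pair, sketched here as an assumption rather than as JNET's actual code, is to let the File overload open a stream and delegate to the InputStream overload, keeping the result as a plain Object in the way getModel declares it. ModelLoader is a hypothetical class name.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;

// Hypothetical sketch, not JNET's implementation: File overload delegates to the stream overload.
final class ModelLoader {
    private Object model;   // held as Object, mirroring getModel()'s declared return type

    /** Deserializes a model from a stream owned (and closed) by the caller. */
    void readModel(InputStream is) throws IOException, ClassNotFoundException {
        // No extra buffering here, so this method never reads past the serialized object
        // in a stream the caller may still be using.
        ObjectInputStream in = new ObjectInputStream(is);
        model = in.readObject();
    }

    /** Opens the file, buffers it, and delegates; FileNotFoundException signals a missing file. */
    void readModel(File f) throws IOException, ClassNotFoundException {
        try (InputStream is = new BufferedInputStream(new FileInputStream(f))) {
            readModel(is);
        }
    }

    Object getModel() {
        return model;
    }
}
```

The InputStream overload makes it possible to load a model packaged inside a JAR or compressed archive, while the File overload covers the common on-disk case.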
-
getModel
public java.lang.Object getModel()
Returns the model.
-
setFeatureConfig
public void setFeatureConfig(java.util.Properties featureConfig)
-
getFeatureConfig
public java.util.Properties getFeatureConfig()
-
PPDtoUnits
public Sentence PPDtoUnits(java.lang.String sentence)
Takes a sentence in piped format and returns the corresponding unit sentence as a Sentence object.
Parameters:
sentence - the sentence in piped format to be converted
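This page does not spell out the piped format itself. The sketch below assumes a layout in which tokens are separated by whitespace and each token is written as word|feature_1|...|feature_n|label; both that layout and the names PipedFormat and PipedUnit are hypothetical, and the result is a plain list rather than JNET's Sentence and Unit classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of splitting a "piped" sentence into units.
// ASSUMED format (not confirmed by this page): whitespace-separated tokens,
// each written as word|feature_1|...|feature_n|label.
final class PipedFormat {

    /** Hypothetical stand-in for a unit: the word, any extra feature fields, and its label. */
    record PipedUnit(String word, List<String> features, String label) {}

    static List<PipedUnit> parse(String sentence) {
        List<PipedUnit> units = new ArrayList<>();
        for (String token : sentence.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;   // a blank sentence splits into a single empty token
            }
            // Limit -1 keeps trailing empty fields instead of silently dropping them.
            String[] fields = token.split("\\|", -1);
            if (fields.length < 2) {
                throw new IllegalArgumentException("expected word|...|label, got: " + token);
            }
            String word = fields[0];
            String label = fields[fields.length - 1];
            List<String> features = List.copyOf(Arrays.asList(fields).subList(1, fields.length - 1));
            units.add(new PipedUnit(word, features, label));
        }
        return units;
    }
}
```

Under this assumed format a word cannot itself contain "|", since the separator has no escape mechanism.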
-
getNumber_Iterations
public int getNumber_Iterations()
-
set_Number_Iterations
public void set_Number_Iterations(int number_iter)
-
is_Max_Ent
public boolean is_Max_Ent()
-
set_Max_Ent
public void set_Max_Ent(boolean me_train)