Class NETagger


  • public class NETagger
    extends java.lang.Object
    General class that handles all the machine-learning functionality. TODO: support confidence estimation for IOB as well (not only IO).
    Author:
    tomanek
    • Constructor Summary

      Constructors 
      Constructor Description
      NETagger()
      default constructor
      NETagger​(java.io.File featureConfigFile)
      constructor for feature config file
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      java.util.Properties getFeatureConfig()  
      java.lang.Object getModel()
      Returns the model.
      int getNumber_Iterations()  
      boolean is_Max_Ent()  
      boolean isTrained()
      Returns true when the model has been successfully trained.
      Sentence PPDtoUnits​(java.lang.String sentence)
      Takes a sentence in piped format and returns the corresponding unit sentence as a Sentence object.
      void predict​(Sentence sentence, boolean showSegmentConfidence)
      predicts the entity labels by means of a model.
      java.util.ArrayList<java.lang.String> predictIOB​(java.util.ArrayList<Sentence> sentences, boolean showSegmentConfidence)
      predict the entity labels by means of a previously learned model.
      void readModel​(java.io.File f)
      Loads a previously trained FeatureSubsetModel (CRF4+Properties) that was stored to disk as a serialized object.
      void readModel​(java.io.InputStream is)
      Loads a previously trained FeatureSubsetModel (CRF4+Properties) that was stored to disk as a serialized object.
      void set_Max_Ent​(boolean me_train)  
      void set_Number_Iterations​(int number_iter)  
      void setFeatureConfig​(java.util.Properties featureConfig)  
      void train​(java.util.ArrayList<Sentence> sentences)
      Trains an NE model (based on CRF); once trained, the model is stored internally.
      void writeModel​(java.lang.String filename)
      Saves the learned model to disk.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • NETagger

        public NETagger()
        default constructor
      • NETagger

        public NETagger​(java.io.File featureConfigFile)
        constructor for feature config file
        Parameters:
        featureConfigFile - file containing the feature configuration
    • Method Detail

      • isTrained

        public boolean isTrained()
        Returns true when the model has been successfully trained.
        Returns:
        true if trained
      • train

        public void train​(java.util.ArrayList<Sentence> sentences)
        Trains an NE model (based on CRF); once trained, the model is stored internally. The model can be saved to disk using the writeModel method.
        Parameters:
        sentences - training data, an ArrayList of Sentence objects
      • predict

        public void predict​(Sentence sentence,
                            boolean showSegmentConfidence)
        Predicts the entity labels by means of a model. This method is needed by UIMA-JNET.
        Parameters:
        sentence - a Sentence object containing all units (= tokens) of that sentence
        showSegmentConfidence - when this flag is set to true, a confidence is estimated for all found entities. The confidence is stored in the Unit object.
      • predictIOB

        public java.util.ArrayList<java.lang.String> predictIOB​(java.util.ArrayList<Sentence> sentences,
                                                                boolean showSegmentConfidence)
        Predicts the entity labels by means of a previously learned model. This method is used by the JNET stand-alone version (for UIMA-JNET, see the other predict method). The output is an ArrayList of IOB strings.
        Parameters:
        sentences - an ArrayList of Sentence objects
        showSegmentConfidence - when this flag is set to true, a confidence is estimated for all found entities. The confidence is written to the IOB output.
        Returns:
        IOB output for the sentences to be predicted. Each element of the ArrayList is a string which refers to one word and its label ("token\tlabel")
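        The "token\tlabel" layout described above can be consumed with a few lines of plain Java. The following is a minimal sketch, not part of JNET: the class IobOutputReader, its nested TaggedToken holder, and the sample tokens and labels are all invented for illustration. It assumes that elements without a tab (for example, empty lines) can be skipped and that any column after the label (such as a confidence value, whose exact layout is not specified here) can be ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of JNET) for reading "token\tlabel" strings. */
final class IobOutputReader {

    /** Simple value holder for one token and its predicted label. */
    static final class TaggedToken {
        final String token;
        final String label;

        TaggedToken(String token, String label) {
            this.token = token;
            this.label = label;
        }
    }

    /** Splits each element at tab characters; elements with fewer than two columns are skipped. */
    static List<TaggedToken> parse(List<String> iobLines) {
        List<TaggedToken> result = new ArrayList<>();
        for (String line : iobLines) {
            String[] columns = line.split("\t");
            if (columns.length < 2) {
                continue; // assumption: e.g. an empty line between sentences
            }
            // Assumption: column 0 is the token, column 1 the label; further columns are ignored.
            result.add(new TaggedToken(columns[0], columns[1]));
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented sample data in the documented "token\tlabel" shape.
        List<String> sample = new ArrayList<>();
        sample.add("IL-2\tB-gene");
        sample.add("binds\tO");
        for (TaggedToken t : parse(sample)) {
            System.out.println(t.token + " -> " + t.label);
        }
    }
}
```

        In real use, the list passed to parse would be the return value of predictIOB rather than hand-written strings.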
      • writeModel

        public void writeModel​(java.lang.String filename)
        Saves the learned model to disk. This is done via Java's object serialization.
        Parameters:
        filename - where to write it (full path!)
      • readModel

        public void readModel​(java.io.File f)
                       throws java.io.IOException,
                              java.io.FileNotFoundException,
                              java.lang.ClassNotFoundException
        Loads a previously trained FeatureSubsetModel (CRF4+Properties) that was stored to disk as a serialized object.
        Parameters:
        f - file containing the serialized FeatureSubsetModel
        Throws:
        java.io.IOException
        java.io.FileNotFoundException
        java.lang.ClassNotFoundException
      • readModel

        public void readModel​(java.io.InputStream is)
                       throws java.io.IOException,
                              java.io.FileNotFoundException,
                              java.lang.ClassNotFoundException
        Loads a previously trained FeatureSubsetModel (CRF4+Properties) that was stored to disk as a serialized object.
        Parameters:
        is - input stream of the serialized FeatureSubsetModel
        Throws:
        java.io.IOException
        java.io.FileNotFoundException
        java.lang.ClassNotFoundException
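        Since writeModel and readModel rely on Java's object serialization, the round trip can be illustrated with the standard ObjectOutputStream and ObjectInputStream classes. This is a sketch under assumptions, not the actual JNET implementation: DummyModel is a hypothetical stand-in for the real FeatureSubsetModel, the property key is made up, and in-memory byte streams replace the file on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.Properties;

final class SerializationSketch {

    /** Hypothetical stand-in for a model plus its feature configuration; NOT the real FeatureSubsetModel. */
    static final class DummyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final Properties featureConfig;
        final int iterations;

        DummyModel(Properties featureConfig, int iterations) {
            this.featureConfig = featureConfig;
            this.iterations = iterations;
        }
    }

    /** Serializes any Serializable object to the given stream. */
    static void write(Serializable model, OutputStream out) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(model);
        }
    }

    /** Reads one serialized object back; may throw ClassNotFoundException, as readModel does. */
    static Object read(InputStream in) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(in)) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        Properties config = new Properties();
        config.setProperty("example_feature", "true"); // made-up key for illustration

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        write(new DummyModel(config, 10), buffer);

        DummyModel restored = (DummyModel) read(new ByteArrayInputStream(buffer.toByteArray()));
        System.out.println(restored.featureConfig.getProperty("example_feature"));
        System.out.println(restored.iterations);
    }
}
```

        The ClassNotFoundException declared by both readModel overloads is typical of this mechanism: deserialization fails with it when the class of the stored object is not on the classpath.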
      • getModel

        public java.lang.Object getModel()
        Returns the model.
      • setFeatureConfig

        public void setFeatureConfig​(java.util.Properties featureConfig)
      • getFeatureConfig

        public java.util.Properties getFeatureConfig()
      • PPDtoUnits

        public Sentence PPDtoUnits​(java.lang.String sentence)
        Takes a sentence in piped format and returns the corresponding unit sentence as a Sentence object.
        Parameters:
        sentence - in piped format to be converted
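        The piped format is not specified on this page. The sketch below therefore rests entirely on an assumption: that a piped sentence is a single line of whitespace-separated tokens, each consisting of fields joined by the '|' character. The class PipedFormatSketch and the sample line are hypothetical and only show how such a string could be broken into per-token fields. The sketch does not reproduce what PPDtoUnits does internally or how it builds Sentence and Unit objects.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration (not part of JNET) of splitting a '|'-delimited sentence. */
final class PipedFormatSketch {

    /**
     * Assumption: tokens are separated by whitespace and the fields of a token by '|'.
     * Returns one String array of fields per token.
     */
    static List<String[]> split(String pipedSentence) {
        List<String[]> tokens = new ArrayList<>();
        for (String token : pipedSentence.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an empty input line yields no tokens
            }
            tokens.add(token.split("\\|"));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Invented sample line; the field layout is an assumption.
        for (String[] fields : split("IL-2|B-gene binds|O")) {
            System.out.println(String.join(" / ", fields));
        }
    }
}
```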
      • getNumber_Iterations

        public int getNumber_Iterations()
      • set_Number_Iterations

        public void set_Number_Iterations​(int number_iter)
      • is_Max_Ent

        public boolean is_Max_Ent()
      • set_Max_Ent

        public void set_Max_Ent​(boolean me_train)