Class EntityAnnotator

  • All Implemented Interfaces:
    org.apache.uima.analysis_component.AnalysisComponent

    public class EntityAnnotator
    extends org.apache.uima.analysis_component.JCasAnnotator_ImplBase
    • Constructor Summary

      Constructors 
      Constructor Description
      EntityAnnotator()  
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      protected Sentence createUnitSentence​(java.util.List<de.julielab.jcore.types.Token> tokenList, org.apache.uima.jcas.JCas JCas, java.util.ArrayList<java.util.HashMap<java.lang.String,​java.lang.String>> metaList, de.julielab.jcore.utility.index.JCoReCoverIndex<de.julielab.jcore.types.Abbreviation> abbreviationIndex, de.julielab.jcore.utility.index.JCoReCoverIndex<de.julielab.jcore.types.Token> tokenIndex)
      Takes all info about meta data and generates the corresponding unit sequence represented by a Sentence object.
      protected boolean ignoreLabel​(org.apache.uima.jcas.JCas aJCas, int start, int end, de.julielab.jcore.utility.index.JCoReCoverIndex<de.julielab.jcore.types.Abbreviation> abbreviationIndex)
      Tests whether an annotation should be ignored because its label falls on an abbreviation that has not been introduced.
      void initialize​(org.apache.uima.UimaContext aContext)
      Initialisation of UIMA-JNET.
      void process​(org.apache.uima.jcas.JCas aJCas)
      Processes the current CAS.
      protected Sentence removeDuplicatedTokens​(Sentence unitSentence)
      Removes duplicate tokens in a unit sentence (i.e., tokens having the same offset position).
      void writeToCAS​(Sentence unitSentence, org.apache.uima.jcas.JCas aJCas, de.julielab.jcore.utility.index.JCoReCoverIndex<de.julielab.jcore.types.Abbreviation> abbreviationIndex)
      Creates the respective UIMA annotations from JNET's predictions.
      • Methods inherited from class org.apache.uima.analysis_component.JCasAnnotator_ImplBase

        getRequiredCasInterface, process
      • Methods inherited from class org.apache.uima.analysis_component.Annotator_ImplBase

        getCasInstancesRequired, hasNext, next
      • Methods inherited from class org.apache.uima.analysis_component.AnalysisComponent_ImplBase

        batchProcessComplete, collectionProcessComplete, destroy, getContext, getResultSpecification, reconfigure, setResultSpecification
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • abbrevPattern

        public java.util.regex.Pattern abbrevPattern
      • expandAbbr

        protected boolean expandAbbr
      • confidenceThresholdForConsistencyPreservation

        protected float confidenceThresholdForConsistencyPreservation
      • showSegmentConf

        protected boolean showSegmentConf
      • entityMentionTypes

        protected java.util.TreeSet<java.lang.String> entityMentionTypes
    • Constructor Detail

      • EntityAnnotator

        public EntityAnnotator()
    • Method Detail

      • initialize

        public void initialize​(org.apache.uima.UimaContext aContext)
                        throws org.apache.uima.resource.ResourceInitializationException
        Initialisation of UIMA-JNET. Reads in and checks the descriptor's parameters.
        Specified by:
        initialize in interface org.apache.uima.analysis_component.AnalysisComponent
        Overrides:
        initialize in class org.apache.uima.analysis_component.AnalysisComponent_ImplBase
        Throws:
        org.apache.uima.resource.ResourceInitializationException
      • process

        public void process​(org.apache.uima.jcas.JCas aJCas)
                     throws org.apache.uima.analysis_engine.AnalysisEngineProcessException
        Processes the current CAS. If abbreviation expansion is turned on, each abbreviation is replaced by its full form, which is then used during prediction. The labels of the full form are afterwards applied to the original short form.
        Specified by:
        process in class org.apache.uima.analysis_component.JCasAnnotator_ImplBase
        Throws:
        org.apache.uima.analysis_engine.AnalysisEngineProcessException
      • removeDuplicatedTokens

        protected Sentence removeDuplicatedTokens​(Sentence unitSentence)
        Removes duplicate tokens in a unit sentence (i.e., tokens having the same offset position). This is necessary if abbreviations in the sentence were expanded for prediction; in that case, this method must be called before the predictions are written into the CAS. When the tokens within an abbreviation's long form differ in their predictions, the outside label is assumed for the abbreviation.
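        The collapsing rule described above can be illustrated with a minimal sketch. It is not the actual implementation: UnitSketch is a hypothetical stand-in for JNET's Unit, the outside label is assumed to be "O", and units that share an offset are assumed to be adjacent in the sequence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a JNET unit: character offsets plus the predicted label.
final class UnitSketch {
    final int begin;
    final int end;
    final String label;

    UnitSketch(int begin, int end, String label) {
        this.begin = begin;
        this.end = end;
        this.label = label;
    }
}

final class DuplicateTokenSketch {
    // Assumption: "O" is the outside label.
    static final String OUTSIDE = "O";

    // Collapses runs of adjacent units that share the same offsets into one unit.
    // If the labels within a run disagree, the collapsed unit gets the outside label.
    static List<UnitSketch> collapse(List<UnitSketch> units) {
        List<UnitSketch> result = new ArrayList<>();
        int i = 0;
        while (i < units.size()) {
            UnitSketch first = units.get(i);
            String label = first.label;
            int j = i + 1;
            while (j < units.size()
                    && units.get(j).begin == first.begin
                    && units.get(j).end == first.end) {
                if (!units.get(j).label.equals(first.label)) {
                    label = OUTSIDE;
                }
                j++;
            }
            result.add(new UnitSketch(first.begin, first.end, label));
            i = j;
        }
        return result;
    }
}
```

        In this sketch, an abbreviation whose expanded tokens were all labelled identically keeps that label, while any disagreement falls back to the outside label, as the description states.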
      • createUnitSentence

        protected Sentence createUnitSentence​(java.util.List<de.julielab.jcore.types.Token> tokenList,
                                              org.apache.uima.jcas.JCas JCas,
                                              java.util.ArrayList<java.util.HashMap<java.lang.String,​java.lang.String>> metaList,
                                              de.julielab.jcore.utility.index.JCoReCoverIndex<de.julielab.jcore.types.Abbreviation> abbreviationIndex,
                                              de.julielab.jcore.utility.index.JCoReCoverIndex<de.julielab.jcore.types.Token> tokenIndex)
        Takes all meta data and generates the corresponding unit sequence, represented by a Sentence object. Abbreviations are expanded when this is specified in the descriptor. Only abbreviations that span a single token can be handled here; the other case (which is very rare and thus probably not relevant) is ignored.
        Parameters:
        tokenList - a list of Token objects of the current sentence
        JCas - the CAS we are working on
        metaList - an ArrayList of meta-info HashMaps which specify the meta information of the respective token
        abbreviationIndex -
        tokenIndex -
        Returns:
        an array of two sequences of units containing all available meta data for the corresponding tokens. In the first sequence, abbreviations are expanded to their full form. In the second sequence, the tokens are in their original form.
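        A minimal sketch of the single-token expansion step follows. It is an illustration under stated assumptions, not the actual implementation: Tok and Abbr are hypothetical stand-ins for the Token and Abbreviation types, the full form is assumed to be split on whitespace, and every expanded token is assumed to keep the offsets of the abbreviation it replaces.

```java
import java.util.ArrayList;
import java.util.List;

final class ExpansionSketch {
    // Hypothetical token: character offsets plus surface form.
    static final class Tok {
        final int begin;
        final int end;
        final String text;

        Tok(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    // Hypothetical abbreviation: offsets of the short form plus its full form.
    static final class Abbr {
        final int begin;
        final int end;
        final String fullForm;

        Abbr(int begin, int end, String fullForm) {
            this.begin = begin;
            this.end = end;
            this.fullForm = fullForm;
        }
    }

    // Replaces each token that exactly matches an abbreviation span by the
    // whitespace-separated tokens of the full form. Expanded tokens keep the
    // offsets of the abbreviation. Multi-token abbreviations are left untouched.
    static List<Tok> expand(List<Tok> tokens, List<Abbr> abbreviations, boolean expandAbbr) {
        List<Tok> units = new ArrayList<>();
        for (Tok t : tokens) {
            Abbr match = expandAbbr ? findExact(t, abbreviations) : null;
            if (match == null || match.fullForm.trim().isEmpty()) {
                units.add(t);
                continue;
            }
            for (String part : match.fullForm.trim().split("\\s+")) {
                units.add(new Tok(t.begin, t.end, part));
            }
        }
        return units;
    }

    // Returns the abbreviation whose span equals the token span, or null.
    private static Abbr findExact(Tok t, List<Abbr> abbreviations) {
        for (Abbr a : abbreviations) {
            if (a.begin == t.begin && a.end == t.end) {
                return a;
            }
        }
        return null;
    }
}
```

        Because the expanded tokens share one offset range in this sketch, they can later be recognised as duplicates and collapsed back to the short form.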
      • writeToCAS

        public void writeToCAS​(Sentence unitSentence,
                               org.apache.uima.jcas.JCas aJCas,
                               de.julielab.jcore.utility.index.JCoReCoverIndex<de.julielab.jcore.types.Abbreviation> abbreviationIndex)
        Creates the respective UIMA annotations from JNET's predictions. To do so, it loops over JNET's Sentence object, which contains a prediction/label for each Unit (i.e., for each token).
        Parameters:
        unitSentence - the current Sentence object
        aJCas - the CAS to write the annotations to
        abbreviationIndex -
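        The loop over units can be sketched as follows. This is only an illustration: Unit and Mention are hypothetical stand-ins for JNET's Unit and the resulting UIMA annotation, the outside label is assumed to be "O", and merging adjacent units with the same label into one mention is an assumption about the label scheme rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;

final class MentionSketch {
    // Assumption: "O" is the outside label.
    static final String OUTSIDE = "O";

    // Hypothetical unit: token offsets plus predicted label.
    static final class Unit {
        final int begin;
        final int end;
        final String label;

        Unit(int begin, int end, String label) {
            this.begin = begin;
            this.end = end;
            this.label = label;
        }
    }

    // Hypothetical result: one entity mention span with its type.
    static final class Mention {
        final int begin;
        final int end;
        final String type;

        Mention(int begin, int end, String type) {
            this.begin = begin;
            this.end = end;
            this.type = type;
        }
    }

    // Walks over the units and turns each run of adjacent units carrying the
    // same non-outside label into a single mention spanning the whole run.
    static List<Mention> toMentions(List<Unit> units) {
        List<Mention> mentions = new ArrayList<>();
        int i = 0;
        while (i < units.size()) {
            Unit u = units.get(i);
            if (OUTSIDE.equals(u.label)) {
                i++;
                continue;
            }
            int end = u.end;
            int j = i + 1;
            while (j < units.size() && units.get(j).label.equals(u.label)) {
                end = units.get(j).end;
                j++;
            }
            mentions.add(new Mention(u.begin, end, u.label));
            i = j;
        }
        return mentions;
    }
}
```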
      • ignoreLabel

        protected boolean ignoreLabel​(org.apache.uima.jcas.JCas aJCas,
                                      int start,
                                      int end,
                                      de.julielab.jcore.utility.index.JCoReCoverIndex<de.julielab.jcore.types.Abbreviation> abbreviationIndex)
        Tests whether an annotation should be ignored because its label falls on an abbreviation that has not been introduced.
        Parameters:
        aJCas -
        start -
        end -
        abbreviationIndex -
        coveredText -
        Returns: