Package de.julielab.jcore.ae.jnet.uima
Class EntityAnnotator
- java.lang.Object
  - org.apache.uima.analysis_component.AnalysisComponent_ImplBase
    - org.apache.uima.analysis_component.Annotator_ImplBase
      - org.apache.uima.analysis_component.JCasAnnotator_ImplBase
        - de.julielab.jcore.ae.jnet.uima.EntityAnnotator
- All Implemented Interfaces:
org.apache.uima.analysis_component.AnalysisComponent
public class EntityAnnotator extends org.apache.uima.analysis_component.JCasAnnotator_ImplBase
Field Summary
Modifier and Type                    Field
protected static String              ABBREV_PATTERN
Pattern                              abbrevPattern
protected float                      confidenceThresholdForConsistencyPreservation
protected ConsistencyPreservation    consistencyPreservation
protected TreeSet<String>            entityMentionTypes
protected boolean                    expandAbbr
protected NegativeList               negativeList
protected boolean                    showSegmentConf
Constructor Summary
Constructor
EntityAnnotator()
Method Summary
- protected Sentence createUnitSentence(List<Token> tokenList, org.apache.uima.jcas.JCas JCas, ArrayList<HashMap<String,String>> metaList, JCoReCoverIndex<Abbreviation> abbreviationIndex, JCoReCoverIndex<Token> tokenIndex)
  Takes all meta-data information and generates the corresponding unit sequence, represented by a Sentence object.
- protected boolean ignoreLabel(org.apache.uima.jcas.JCas aJCas, int start, int end, JCoReCoverIndex<Abbreviation> abbreviationIndex)
  Tests whether an annotation should be ignored because its label lies on an abbreviation that was not introduced.
- void initialize(org.apache.uima.UimaContext aContext)
  Initialization of UIMA-JNET.
- void process(org.apache.uima.jcas.JCas aJCas)
  Processes the current CAS.
- protected Sentence removeDuplicatedTokens(Sentence unitSentence)
  Removes duplicate tokens in a unit sentence (i.e., tokens having the same offset position).
- void writeToCAS(Sentence unitSentence, org.apache.uima.jcas.JCas aJCas, JCoReCoverIndex<Abbreviation> abbreviationIndex)
  Creates the respective UIMA annotations from JNET's predictions.
Methods inherited from class org.apache.uima.analysis_component.JCasAnnotator_ImplBase
getRequiredCasInterface, process
-
Methods inherited from class org.apache.uima.analysis_component.Annotator_ImplBase
getCasInstancesRequired, hasNext, next
Field Detail
-
ABBREV_PATTERN
protected static final String ABBREV_PATTERN
- See Also:
- Constant Field Values
-
abbrevPattern
public Pattern abbrevPattern
-
expandAbbr
protected boolean expandAbbr
-
consistencyPreservation
protected ConsistencyPreservation consistencyPreservation
-
confidenceThresholdForConsistencyPreservation
protected float confidenceThresholdForConsistencyPreservation
-
showSegmentConf
protected boolean showSegmentConf
-
negativeList
protected NegativeList negativeList
Method Detail
-
initialize
public void initialize(org.apache.uima.UimaContext aContext) throws org.apache.uima.resource.ResourceInitializationException
Initialization of UIMA-JNET. Reads in and checks the descriptor's parameters.
- Specified by:
  initialize in interface org.apache.uima.analysis_component.AnalysisComponent
- Overrides:
  initialize in class org.apache.uima.analysis_component.AnalysisComponent_ImplBase
- Throws:
  org.apache.uima.resource.ResourceInitializationException
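Reading and checking the descriptor's parameters can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the component's actual code: a plain Map stands in for UimaContext (in a real annotator each value would come from aContext.getConfigParameterValue(name)), the parameter names ExpandAbbreviations, ShowSegmentConfidence and ConsistencyPreservationThreshold are hypothetical, the [0, 1] range check is an assumed validation rule, and IllegalArgumentException replaces the ResourceInitializationException that initialize declares.

```java
import java.util.Map;

// Hypothetical sketch: a Map stands in for UimaContext, and the parameter
// names are illustrative assumptions, not EntityAnnotator's descriptor names.
final class ParamCheckSketch {

    // Hypothetical holder for values the annotator keeps in fields such as
    // expandAbbr, showSegmentConf and confidenceThresholdForConsistencyPreservation.
    record Settings(boolean expandAbbr, boolean showSegmentConf, float consistencyThreshold) {}

    static Settings readParameters(Map<String, Object> params) {
        boolean expandAbbr = readBoolean(params, "ExpandAbbreviations", false);
        boolean showSegmentConf = readBoolean(params, "ShowSegmentConfidence", false);
        Object raw = params.get("ConsistencyPreservationThreshold");
        if (raw != null && !(raw instanceof Number)) {
            throw new IllegalArgumentException("ConsistencyPreservationThreshold must be numeric");
        }
        float threshold = raw == null ? 0f : ((Number) raw).floatValue();
        // Assumed validation rule: a confidence threshold must lie in [0, 1].
        if (threshold < 0f || threshold > 1f) {
            throw new IllegalArgumentException("threshold out of range: " + threshold);
        }
        return new Settings(expandAbbr, showSegmentConf, threshold);
    }

    // Returns the Boolean stored under name, the fallback if it is absent,
    // and fails if the value has the wrong type.
    private static boolean readBoolean(Map<String, Object> params, String name, boolean fallback) {
        Object value = params.get(name);
        if (value == null) {
            return fallback;
        }
        if (!(value instanceof Boolean)) {
            throw new IllegalArgumentException(name + " must be a Boolean");
        }
        return (Boolean) value;
    }

    public static void main(String[] args) {
        Settings settings = readParameters(Map.of("ExpandAbbreviations", true));
        System.out.println(settings);
    }
}
```

Failing fast on a wrongly typed or out-of-range value mirrors the "checks" part of the description: a misconfigured descriptor is rejected at initialization time rather than during processing.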
-
process
public void process(org.apache.uima.jcas.JCas aJCas) throws org.apache.uima.analysis_engine.AnalysisEngineProcessException
Processes the current CAS. If abbreviation expansion is turned on, each abbreviation is replaced by its full form, which is then used during prediction. The labels predicted for this full form are afterwards applied to the original short form.
- Specified by:
  process in class org.apache.uima.analysis_component.JCasAnnotator_ImplBase
- Throws:
  org.apache.uima.analysis_engine.AnalysisEngineProcessException
-
removeDuplicatedTokens
protected Sentence removeDuplicatedTokens(Sentence unitSentence)
Removes duplicate tokens in a unit sentence (i.e., tokens having the same offset position). This is necessary if abbreviations in the sentence were expanded for prediction; in that case, this method must be called after prediction and before the predictions are written into the CAS. If the tokens of an abbreviation's long form receive different predictions, the outside label is assigned to the abbreviation.
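The merging rule can be illustrated with a small sketch. It assumes that every long-form token carries the offsets of the original short form, that the outside label is called "O", and that a unit can be modelled as a hypothetical Unit record; none of these names are JNET's actual types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the duplicate-removal rule. Unit and the outside label "O"
// are hypothetical stand-ins, not JNET's actual types.
final class DedupSketch {

    static final String OUTSIDE = "O"; // assumed name of the outside label

    // Hypothetical unit: offsets of the original token plus its predicted label.
    record Unit(int begin, int end, String label) {}

    static List<Unit> removeDuplicatedTokens(List<Unit> units) {
        // Group units by (begin, end); LinkedHashMap keeps sentence order.
        Map<List<Integer>, List<Unit>> byOffset = new LinkedHashMap<>();
        for (Unit u : units) {
            byOffset.computeIfAbsent(List.of(u.begin(), u.end()), k -> new ArrayList<>()).add(u);
        }
        List<Unit> result = new ArrayList<>();
        for (List<Unit> group : byOffset.values()) {
            Unit first = group.get(0);
            // Keep the label only if all long-form tokens agree; otherwise fall back to outside.
            boolean agree = group.stream().allMatch(u -> u.label().equals(first.label()));
            result.add(new Unit(first.begin(), first.end(), agree ? first.label() : OUTSIDE));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Unit> expanded = List.of(
                new Unit(0, 4, "protein"),   // "interleukin", expanded from "IL-2"
                new Unit(0, 4, "protein"),   // "2", expanded from "IL-2"
                new Unit(5, 10, "O"));       // "binds"
        System.out.println(removeDuplicatedTokens(expanded));
    }
}
```

Grouping by offset rather than by adjacency keeps the sketch correct even if the duplicates are not consecutive; falling back to the outside label on disagreement is the conservative choice the description prescribes.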
-
createUnitSentence
protected Sentence createUnitSentence(List<Token> tokenList, org.apache.uima.jcas.JCas JCas, ArrayList<HashMap<String,String>> metaList, JCoReCoverIndex<Abbreviation> abbreviationIndex, JCoReCoverIndex<Token> tokenIndex)
Takes all information about the meta data and generates the corresponding unit sequence, represented by a Sentence object. Abbreviations are expanded if this is specified in the descriptor. Only abbreviations that span a single token can be handled here; the other case (which is very rare and thus probably not relevant) is ignored.
- Parameters:
  tokenList - a list of Token objects of the current sentence
  JCas - the CAS we are working on
  metaList - an ArrayList of meta-information HashMaps, each specifying the meta information of the respective token
  abbreviationIndex
  tokenIndex
- Returns:
  an array of two sequences of units containing all available meta data for the corresponding tokens. In the first sequence, abbreviations are expanded to their full form; in the second sequence, the tokens keep their original form.
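The single-token expansion step can be sketched as follows. This is an assumption-laden illustration, not the method itself: Token and Span are hypothetical records, a Map keyed by offsets replaces the JCoReCoverIndex<Abbreviation>, and the full form is split on whitespace. Keying by the exact token span reproduces the documented restriction: an abbreviation covering several tokens never matches a single token and is therefore left untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of single-token abbreviation expansion. Token, Span and the
// whitespace split of the full form are simplifying assumptions.
final class ExpansionSketch {

    record Token(int begin, int end, String text) {}

    // Hypothetical key for an abbreviation's offsets in the document.
    record Span(int begin, int end) {}

    static List<Token> expand(List<Token> tokens, Map<Span, String> fullFormBySpan) {
        List<Token> units = new ArrayList<>();
        for (Token t : tokens) {
            // Only an abbreviation whose span equals exactly one token is found here;
            // abbreviations spanning several tokens never match and are left as-is.
            String fullForm = fullFormBySpan.get(new Span(t.begin(), t.end()));
            if (fullForm == null) {
                units.add(t);
                continue;
            }
            // Each long-form token keeps the short form's offsets, so the
            // duplicates can be collapsed again after prediction.
            for (String part : fullForm.trim().split("\\s+")) {
                units.add(new Token(t.begin(), t.end(), part));
            }
        }
        return units;
    }

    public static void main(String[] args) {
        List<Token> sentence = List.of(new Token(0, 4, "IL-2"), new Token(5, 10, "binds"));
        Map<Span, String> fullForms = Map.of(new Span(0, 4), "interleukin 2");
        expand(sentence, fullForms).forEach(System.out::println);
    }
}
```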
-
writeToCAS
public void writeToCAS(Sentence unitSentence, org.apache.uima.jcas.JCas aJCas, JCoReCoverIndex<Abbreviation> abbreviationIndex)
Creates the respective UIMA annotations from JNET's predictions. To do so, it loops over JNET's Sentence objects, which contain a predicted label for each Unit (i.e., for each token).
- Parameters:
  unitSentence - the current Sentence object
  aJCas - the CAS to write the annotations to
  abbreviationIndex
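One way to turn per-unit labels into mention spans is sketched below. It assumes IO-style labels, where a run of consecutive units sharing the same non-outside label forms one mention and "O" is the outside label; whether JNET uses IO or BIO labels is not stated here, and Unit and Mention are hypothetical records. In the real method, each span would become a UIMA annotation added to aJCas; the sketch only computes the spans so that it runs without UIMA on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: merge consecutive units with the same non-outside label into one
// mention span. IO labelling and the records below are assumptions.
final class SpanSketch {

    static final String OUTSIDE = "O"; // assumed outside label

    record Unit(int begin, int end, String label) {}

    record Mention(int begin, int end, String type) {}

    static List<Mention> toMentions(List<Unit> units) {
        List<Mention> mentions = new ArrayList<>();
        Unit start = null; // first unit of the mention currently being built
        Unit last = null;  // unit seen in the previous iteration
        for (Unit u : units) {
            // A label change closes the open mention at the previous unit's end.
            if (start != null && !u.label().equals(start.label())) {
                mentions.add(new Mention(start.begin(), last.end(), start.label()));
                start = null;
            }
            // Any non-outside label opens a new mention if none is open.
            if (start == null && !u.label().equals(OUTSIDE)) {
                start = u;
            }
            last = u;
        }
        if (start != null) {
            mentions.add(new Mention(start.begin(), last.end(), start.label()));
        }
        return mentions;
    }

    public static void main(String[] args) {
        List<Unit> units = List.of(
                new Unit(0, 5, "O"), new Unit(6, 10, "protein"), new Unit(11, 14, "protein"));
        System.out.println(toMentions(units));
    }
}
```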
-
ignoreLabel
protected boolean ignoreLabel(org.apache.uima.jcas.JCas aJCas, int start, int end, JCoReCoverIndex<Abbreviation> abbreviationIndex)
Tests whether an annotation should be ignored because its label lies on an abbreviation that was not introduced.
- Parameters:
  aJCas
  start
  end
  abbreviationIndex
- Returns:
  true if the annotation should be ignored
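The check can be sketched as below, under several assumptions: a list replaces the JCoReCoverIndex<Abbreviation>, each abbreviation occurrence is a hypothetical Abbrev record with an introduced flag recording whether its full form was defined in the text, and a label counts as lying on an abbreviation only if its start and end offsets match the abbreviation exactly.

```java
import java.util.List;

// Sketch of the ignoreLabel check. Abbrev and its "introduced" flag are
// hypothetical; a list replaces the JCoReCoverIndex used by EntityAnnotator.
final class IgnoreLabelSketch {

    // Hypothetical abbreviation occurrence: offsets plus whether its full form was introduced.
    record Abbrev(int begin, int end, boolean introduced) {}

    // Returns true if the span start..end is exactly an abbreviation that was never introduced.
    static boolean ignoreLabel(int start, int end, List<Abbrev> abbreviations) {
        for (Abbrev a : abbreviations) {
            if (a.begin() == start && a.end() == end) {
                return !a.introduced();
            }
        }
        return false; // not an abbreviation: keep the label
    }

    public static void main(String[] args) {
        List<Abbrev> abbrevs = List.of(new Abbrev(10, 14, false), new Abbrev(30, 33, true));
        System.out.println(ignoreLabel(10, 14, abbrevs));
        System.out.println(ignoreLabel(30, 33, abbrevs));
    }
}
```

The rationale for such a filter is that a short form without a defining full form offers little evidence for the predicted entity type, so discarding its label trades a little recall for precision.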