| Package | Description |
|---|---|
| edu.nyu.jet | The root Jet package provides the methods for top-level system control and for the Console. |
| edu.nyu.jet.aceJet | The AceJet package provides the classes and methods for the Automatic Content Extraction (ACE) evaluation. |
| edu.nyu.jet.chunk | |
| edu.nyu.jet.format | |
| edu.nyu.jet.hmm | The HMM package includes the classes for Hidden Markov Models, and part-of-speech and name taggers implemented using HMMs. A separate description is available of the overall structure and external representation of these models, provided for those who wish to modify the models. |
| edu.nyu.jet.lex | The Lex package incorporates the code for reading dictionaries, looking words up in dictionaries, and tokenizing text. |
| edu.nyu.jet.ne | The NE package contains code for annotating extended named entities using a large dictionary and a set of transformation rules. |
| edu.nyu.jet.parser | The Parser package includes several types of parsers (top-down, bottom-up, and chart). |
| edu.nyu.jet.pat | The Pat package encapsulates the basic pattern application mechanism of Jet: sets of pattern/action rules which can be applied to a document to add or modify annotations on the document. The external form of the pattern language is described below; the classes used to encode these patterns are summarized separately. |
| edu.nyu.jet.refres | The Refres package provides the methods for identifying coreference relations within a document. |
| edu.nyu.jet.scorer | The Scorer package provides the classes for scoring an annotated document against an answer key. |
| edu.nyu.jet.time | The Time package contains code for annotating time expressions in text following the TIMEX2 standard. |
| edu.nyu.jet.tipster | The Tipster package provides the basic methods for recording information about documents. It is loosely based on the 'Tipster Architecture' developed by R. Grishman as part of the Government-sponsored Tipster program. The basic objects are Documents and Annotations; a Document is a container for the text of the document and a set of Annotations on the Document. |
| edu.nyu.jet.util | |
| edu.nyu.jet.zoner | The Zoner package contains methods for identifying text segments (sentences, etc.) within the document. |
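
The Document/Annotation model described for edu.nyu.jet.tipster can be pictured with a small self-contained sketch. The classes `SketchDocument` and `SketchAnnotation` below are hypothetical stand-ins written for illustration, not the Jet classes; they only assume that an annotation is a type attached to a character span of the document text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an annotation: a type plus a character span [start, end).
final class SketchAnnotation {
    final String type;
    final int start;
    final int end;

    SketchAnnotation(String type, int start, int end) {
        this.type = type;
        this.start = start;
        this.end = end;
    }
}

// Hypothetical stand-in for a document: the text plus the annotations recorded on it.
final class SketchDocument {
    private final String text;
    private final List<SketchAnnotation> annotations = new ArrayList<>();

    SketchDocument(String text) {
        this.text = text;
    }

    // Record an annotation of the given type over characters [start, end).
    void annotate(String type, int start, int end) {
        if (start < 0 || end > text.length() || start > end) {
            throw new IllegalArgumentException("span outside document text");
        }
        annotations.add(new SketchAnnotation(type, start, end));
    }

    // The text covered by an annotation.
    String textOf(SketchAnnotation a) {
        return text.substring(a.start, a.end);
    }

    // All annotations of one type, in the order they were added.
    List<SketchAnnotation> annotationsOfType(String type) {
        List<SketchAnnotation> result = new ArrayList<>();
        for (SketchAnnotation a : annotations) {
            if (a.type.equals(type)) {
                result.add(a);
            }
        }
        return result;
    }
}
```

Keeping annotations separate from the text (stand-off annotation) lets several components add layers such as tokens, sentences, and names without modifying the text itself.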
| Modifier and Type | Method and Description |
|---|---|
| static Document | JetTest.readDocument(BufferedReader rdr): read a document from 'rdr'. |
| Modifier and Type | Method and Description |
|---|---|
| static void | Control.applyScript(Document doc, Span span, String script): apply script 'script' to span 'span' of document 'doc'. |
| static void | Control.processDocument(Document doc, BufferedWriter writer, boolean viewable, int docNo): apply the processDocument script to (all of) document 'doc'. |
| static void | Control.processSentence(Document doc, Span sentenceSpan): apply the processSentence script to span 'sentenceSpan' of document 'doc'. |
| Modifier and Type | Method and Description |
|---|---|
| Document | AceDocument.JetDocument() |
| Modifier and Type | Method and Description |
|---|---|
void |
EventTagger.acquirePatterns(Document doc,
AceDocument aceDoc,
String docId)
trains the tagger from document 'doc' and corresponding AceDocument
(APF file) aceDoc.
|
static void |
APFtoXML.addAnnotations(Document doc,
AceDocument aceDoc) |
static boolean |
Ace.allLowerCase(Document doc) |
static boolean |
Ace.allLowerCase(Document doc,
Span span)
return true if either all the letters in span are
lower case, or the fraction of letters which are upper case
exceeds MAX_UPPER.
|
static void |
Ace.buildAceEntities(Document doc,
String docId,
AceDocument aceDoc)
create ACE entities from entity annotations produced by refres.
|
static void |
FindAceValues.buildAceValues(Document doc,
String docId,
AceDocument aceDoc)
adds to AceDocument
aceDoc the Ace Values contained in
Document doc. |
void |
NewEventTagger.collectAnchorsFromDocument(Document doc,
AceDocument aceDoc,
String docPath)
collects the anchors (triggers) from an annotated document.
|
static void |
EventEval.evalEvents(Document doc,
AceDocument aceDoc,
String docId)
evaluate the event tagger on Document
doc. |
void |
EventTagger.evaluatePatterns(Document doc,
AceDocument aceDoc,
String docId)
applies the learned patterns to Document 'doc' and records the
number of times it produced a correct or incorrect event.
|
void |
EventTagger.eventCoref(AceDocument aceDoc,
Document doc,
SyntacticRelationSet relations)
performs coreference on the events in an Ace document.
|
void |
NewEventTagger.eventCoref(AceDocument aceDoc,
Document doc,
SyntacticRelationSet relations)
performs coreference on the events in an Ace document.
|
static void |
LearnRelations.findRelations(String currentDoc,
Document d,
AceDocument aceDoc)
relation 'decoder': using previously learned patterns, identifies
the relations in document 'd' (from file name 'currentDoc') and adds them
as AceRelations to AceDocument 'aceDoc'.
|
static void |
RelationTagger.findRelations(String currentDoc,
Document d,
AceDocument ad)
relation 'decoder': using previously trained models, identifies
the relations in document 'd' (from file name 'currentDoc') and adds them
as AceRelations to AceDocument 'ad'.
|
static void |
DepPathRelationTagger.findRelations(String currentDoc,
Document d,
AceDocument ad)
relation 'decoder': identifies the relations in document 'd'
(from file name 'currentDoc') and adds them
as AceRelations to AceDocument 'ad'.
|
static String |
Ace.getDocId(Document doc)
returns the document ID of Document
doc, if found,
else returns null. |
static String |
FindAceValues.getTypeSubtype(Document doc,
Annotation mention)
returns the AceValue type and subtype of a mention: Numeric, Crime,
Sentence, Contact-Info, ...
|
static String |
EDTtype.getTypeSubtype(Document doc,
Annotation entity,
Annotation mention)
returns the EDT type of a mention: PERSON, GPE, ORGANIZATION,
LOCATION, FACILITY, or OTHER (where OTHER indicates that it is not
an EDT mention).
|
static boolean |
EDTtype.hasGenericHead(Document doc,
Annotation mention) |
String |
Gazetteer.locationType(Document doc,
String locationName)
returns "country", "stateorprovince", or "city" as the type
of 'locationName' based on three types of evidence:
- entries in the gazetteer itself
- a coreferential nominal mention, typically from a
construct 'city of X' or 'X province'
- the last token of the location name
|
AceEvent |
EventPattern.match(Span anchorExtent,
String anchor,
Document doc,
SyntacticRelationSet relations,
AceDocument aceDoc)
match an anchor and its context against the event patterns; if the
match is successful, build and return an AceEvent.
|
int |
ChunkPath.matchFromLeft(int posn,
Document doc) |
AceMention |
AcePatternNode.matchFromLeft(int posn,
Document doc,
AceDocument aceDoc)
looks for an entity mention matching the AcePatternNode starting at position
posn in Document doc. |
int |
ChunkPath.matchFromRight(int posn,
Document doc) |
AceMention |
AcePatternNode.matchFromRight(int posn,
Document doc,
AceDocument aceDoc)
looks for an entity mention matching the AcePatternNode ending at position
posn in Document doc. |
int |
AcePatternNode.matchOnHead(int posn,
Document doc,
AceDocument aceDoc)
looks for an entity mention matching the AcePatternNode whose head
begins at position
posn in Document doc. |
static String |
APFtoXML.processDocument(Document doc,
AceDocument aceDoc) |
static AceDocument |
Ace.processDocument(Document doc,
String sourceId,
String sourceFile,
String docPathBase)
process a (Jet) document and create a corresponding AceDocument.
|
void |
EventTagger.tag(Document doc,
AceDocument aceDoc,
String currentDocPath,
String docId)
identify ACE events in Document 'doc' and add them to 'aceDoc'.
|
void |
NewEventTagger.tag(Document doc,
AceDocument aceDoc,
String currentDocPath,
String docId)
identify ACE events in Document 'doc' and add them to 'aceDoc'.
|
void |
PerfectNameTagger.tag(Document doc,
Span span)
tag Span 'span' of Document 'doc' with ENAMEX annotations.
|
void |
PerfectNameTagger.tagDocument(Document doc)
tag the entire Document 'doc' with ENAMEX annotations.
|
static void |
Ace.tagReciprocalRelations(Document doc)
assigns reciprocal relations subject-1 and object-1
|
static boolean |
Ace.titleCase(Document doc,
Span span)
returns true if Span
span of Document doc
appears to be capitalized as a title: if there are no words
beginning with a lower-case letter except for a small list of
function words (articles, possessive pronouns, prepositions, ...). |
void |
NewEventTagger.trainOnDocument(Document doc,
AceDocument aceDoc,
String docPath)
trains the four statistical models on an annotated document.
|
static boolean |
PerfectAce.validMention(Document doc,
Annotation head,
String cat) |
void |
AceDocument.write(PrintWriter w,
Document doc)
writes the AceDocument to 'w' in APF format.
|
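
The capitalization test described for Ace.titleCase can be sketched as follows. This is a minimal illustration that works on an array of word strings rather than on a Document and Span, and the function-word list is an assumption made for the example (this page does not give the list Jet uses); `TitleCaseSketch` and `looksLikeTitle` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class TitleCaseSketch {
    // Illustrative function-word list only; the list used by Jet is not shown on this page.
    private static final Set<String> FUNCTION_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "the", "of", "in", "on", "for", "to", "and",
            "his", "her", "its", "their"));

    // Returns true if no word begins with a lower-case letter,
    // except for words in the function-word list.
    static boolean looksLikeTitle(String[] words) {
        for (String w : words) {
            if (w.isEmpty()) {
                continue;
            }
            boolean startsLower = Character.isLowerCase(w.charAt(0));
            if (startsLower && !FUNCTION_WORDS.contains(w)) {
                return false;
            }
        }
        return true;
    }
}
```

For example, the words of "The Fall of the House" pass the test, while "The fall of Rome" fails because "fall" is lower case and is not a function word.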
| Constructor and Description |
|---|
| AceEventAnchor(Span head, Span jetHead, String text, Document doc) |
| AceRelationMention(String id, AceEntityMention arg1, AceEntityMention arg2, Document doc) |
| ChunkPath(Document doc, AceMention m1, AceMention m2): builds the ChunkPath between two AceMentions. |
| ChunkPath(Document doc, int from, int to): builds the ChunkPath from position 'from' to position 'to' in Document 'doc'. |
| Modifier and Type | Method and Description |
|---|---|
void |
ChunkDependencyAnalyzer.analyzeChunkDependency(Document doc,
ParseTreeNode tree) |
void |
TreeBasedChunker.chunk(Document doc,
ParseTreeNode tree) |
static void |
Chunker.chunk(Document doc,
Span span)
adds chunks (annotations of type ng) to Span 'span' of
Document 'doc'.
|
double |
TokenClassifier.getLocalMargin(Document doc,
Annotation[] tokens,
String excludedTag,
int excludedTagStart,
int excludedTagEnd) |
String[] |
MaxEntNE.simpleDecoder(Document doc,
Annotation[] tokens)
assign the best tag for each token using a simple deterministic
left-to-right tagger (which may not find the most probable path).
|
void |
MENameTagger.tag(Document doc,
Span span)
tag span 'span' of Document 'doc' with Named Entity annotations.
|
void |
MENameTagger.tagDocument(Document doc)
tag document
doc for named entities. |
static void |
Onoma.tagDrugs(Document doc,
Span span)
This is a stub which remains from code that was added at SRI's
request for Dovetail in order to tag drug names.
|
static void |
Onoma.tagNames(Document doc,
Span span)
tag names which appear in the onomasticon, adding an ENAMEX annotation
with features TYPE and SUBTYPE.
|
void |
MaxEntNE.train(Document doc,
Annotation[] tokens,
String[] tags)
train the model on a sequence of words from Document doc.
|
abstract void |
TokenClassifier.train(Document doc,
Annotation[] tokens,
String[] tags) |
String[] |
MaxEntNE.viterbi(Document doc,
Annotation[] tokens)
assign the best tag for each token using a Viterbi decoder.
|
abstract String[] |
TokenClassifier.viterbi(Document doc,
Annotation[] tokens) |
| Modifier and Type | Method and Description |
|---|---|
| Document | Treebank.getDocument() |
| Modifier and Type | Method and Description |
|---|---|
void |
PTBReader.addAnnotations(List<ParseTreeNode> trees,
Document doc,
String targetAnnotation,
Span span,
boolean jetCategories)
Adds constit annotations to an existing Document
doc to
represent the parse tree structure of a set of trees trees. |
void |
PTBReader.addAnnotations(List<ParseTreeNode> trees,
List<Integer> offsets,
Document doc,
String targetAnnotation,
Span span,
boolean jetCategories)
Adds constit annotations to an existing Document
doc to
represent the parse tree structure of a set of trees trees. |
void |
PTBReader.addAnnotations(ParseTreeNode tree,
Document doc,
Span span,
boolean jetCategories)
Adds constit annotations to an existing Document
doc to
represent the parse tree structure tree. |
| Constructor and Description |
|---|
| Treebank(Document doc, List<ParseTreeNode> parseTreeList) |
| Modifier and Type | Method and Description |
|---|---|
void |
HMMannotator.annotate(Document doc)
use the HMM to add annotations to Document 'doc'.
|
void |
HMMTagger.annotate(Document doc,
Span span,
String type)
tag 'span' of 'doc' according to the Penn Tree Bank tag set.
|
void |
HMMannotator.annotateNbest(Document doc,
int n,
String hypId)
use the HMM to add N-best annotations to Document 'doc'.
|
void |
HMMannotator.annotateSpan(Document doc,
Span textSpan)
use the HMM to add annotations to Span 'textSpan' of Document 'doc'.
|
ArrayList |
HMMannotator.annotateSpanNbest(Document doc,
Span textSpan,
int n,
String hypId)
use the HMM to add annotations to Span 'textSpan' of Document 'doc'.
|
double |
HMM.getLocalMargin(Document doc,
Annotation[] tokens,
String excludedTag,
int excludedTagStart,
int excludedTagEnd)
returns the margin for assigning a particular tag to a sequence of
tokens.
|
static boolean |
HMMNameTagger.inZone(Document doc,
Span span,
String zoneType)
returns 'true' if Span 'span' is enclosed in an annotation of type
'zoneType'.
|
void |
HMMTagger.prune(Document doc,
Span span)
prune existing 'constit' annotations on 'span' of 'doc' using information
from a part-of-speech tagger.
|
static void |
Retagger.pruneConstit(Document d,
Span zone)
prunes constit annotations obtained from lexical look-up
using Penn tags (recorded as tagger annotations).
|
void |
HMMTagger.score(Document doc,
Document key)
compare the 'constit' tags of Documents 'doc' and 'key', and report (to
System.out) the agreement rate.
|
void |
HMMNameTagger.tag(Document doc,
Span span)
tag span 'span' of Document 'doc' with Named Entity annotations.
|
ArrayList |
XNameTagger.tag(Document doc,
Span span,
String sentno)
tag span 'span' of Document 'doc' with N-best Named Entity annotations.
|
void |
HMMNameTagger.tagDocument(Document doc) |
void |
HMMTagger.tagJet(Document doc,
Span span)
tag 'span' of 'doc' according to the Jet part of speech set.
|
void |
HMMTagger.tagPenn(Document doc,
Span span)
tag 'span' of 'doc' according to the Penn Tree Bank tag set.
|
static void |
HMMNameTagger.tagPersonZone(Document doc,
Span span,
HMMannotator annotator) |
void |
HMMannotator.train(Document doc)
use the annotations on Document 'doc' to train the HMM.
|
void |
HMM.train(Document doc,
Annotation[] tokens,
String[] tags)
a slower algorithm for training the HMM.
|
void |
HMM.train0(Document doc,
Annotation[] tokens,
String[] tags)
a fast, simple algorithm for training the HMM.
|
void |
HMMannotator.trainOnSpan(Document doc,
Span textSpan)
use the annotations on Span 'textSpan' of Document 'doc' to train the HMM.
|
String[] |
HMM.viterbi(Document doc,
Annotation[] tokens)
a Viterbi decoder for HMMs.
|
int[] |
HMM.viterbiPath(Document doc,
Annotation[] tokens)
a Viterbi decoder for HMMs.
|
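
The table above lists Viterbi decoders (HMM.viterbi, HMM.viterbiPath) without describing the algorithm. The following is a generic textbook sketch of Viterbi decoding over log-probabilities, not Jet's implementation: the class name `ViterbiSketch`, the array-based model representation, and the integer encoding of states and observations are all assumptions made for the example.

```java
final class ViterbiSketch {
    /**
     * Finds the most probable state sequence for a sequence of observations.
     *
     * start[s]    = log P(first state is s)
     * trans[r][s] = log P(next state is s | current state is r)
     * emit[s][o]  = log P(observation o | state s)
     * obs         = observation indices
     */
    static int[] decode(double[] start, double[][] trans, double[][] emit, int[] obs) {
        int n = obs.length;
        int k = start.length;
        if (n == 0) {
            return new int[0];
        }
        double[][] score = new double[n][k]; // best log-probability of any path ending in state s at time t
        int[][] back = new int[n][k];        // predecessor state on that best path

        for (int s = 0; s < k; s++) {
            score[0][s] = start[s] + emit[s][obs[0]];
        }
        for (int t = 1; t < n; t++) {
            for (int s = 0; s < k; s++) {
                int bestPrev = 0;
                double best = Double.NEGATIVE_INFINITY;
                for (int r = 0; r < k; r++) {
                    double candidate = score[t - 1][r] + trans[r][s];
                    if (candidate > best) {
                        best = candidate;
                        bestPrev = r;
                    }
                }
                score[t][s] = best + emit[s][obs[t]];
                back[t][s] = bestPrev;
            }
        }

        // Pick the best final state, then follow the back pointers.
        int last = 0;
        for (int s = 1; s < k; s++) {
            if (score[n - 1][s] > score[n - 1][last]) {
                last = s;
            }
        }
        int[] path = new int[n];
        path[n - 1] = last;
        for (int t = n - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }
}
```

Because every state keeps its own best-scoring predecessor, the decoder can recover from a locally attractive but globally poor choice, which a simple left-to-right tagger cannot.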
| Modifier and Type | Method and Description |
|---|---|
static int |
Lexicon.annotateWithDefinitions(Document doc,
int posn)
annotateWithDefinitions looks for the longest defined lexical item
consisting of the tokens starting at position posn; if such
an item is found, then for each definition of this item, an
annotation of type constit is added to the item, with the
item's definition as its attributes.
|
static void |
Lexicon.annotateWithDefinitions(Document doc,
int start,
int end) |
static Annotation[] |
Tokenizer.gatherTokens(Document doc,
Span span)
returns an array containing all token annotations in
span of doc. |
static String[] |
Tokenizer.gatherTokenStrings(Document doc,
Span span)
returns an array of Strings corresponding to all the tokens
in
span of doc. |
int |
LexicalEntry.matches(Document doc,
int posn)
determines whether the lexical entry matches the tokens in
Document doc, starting at position posn.
|
static int |
Tokenizer.skipWS(Document doc,
int posn,
int end)
advances to the next non-whitespace character in a document.
|
static int |
Tokenizer.skipWSX(Document doc,
int posn,
int end)
advances to the next non-whitespace character in a document,
skipping any XML tags.
|
void |
Stemmer.tagStem(Document doc,
Span span)
Adds a stem feature to each token annotation if the token text and stem are
different.
|
static void |
Tokenizer.tokenize(Document doc,
Span span)
tokenizes the portion of Document doc covered by span.
|
static void |
Tokenizer.tokenizeOnWS(Document doc,
Span span)
tokenizes portion 'span' of 'doc', splitting only on white space.
|
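
The longest-match lookup described for Lexicon.annotateWithDefinitions can be sketched as below. This is an illustration under stated assumptions, not the Jet code: `LongestMatchSketch` is a hypothetical class, entries are keyed by space-joined token strings, and the method returns the number of tokens matched (the meaning of the real method's int return value is not stated on this page).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class LongestMatchSketch {
    // Hypothetical lexicon: a space-joined token sequence mapped to a definition string.
    private final Map<String, String> entries = new HashMap<>();
    private int maxLength = 0; // length in tokens of the longest entry

    void add(String[] tokens, String definition) {
        entries.put(String.join(" ", tokens), definition);
        maxLength = Math.max(maxLength, tokens.length);
    }

    // Returns the number of tokens in the longest entry that starts at tokens[posn],
    // or 0 if no entry starts there. Longer candidates are tried first.
    int longestMatch(String[] tokens, int posn) {
        int limit = Math.min(maxLength, tokens.length - posn);
        for (int len = limit; len >= 1; len--) {
            String key = String.join(" ", Arrays.copyOfRange(tokens, posn, posn + len));
            if (entries.containsKey(key)) {
                return len;
            }
        }
        return 0;
    }
}
```

Trying the longest candidate first means a multi-word entry such as "New York" is preferred over the single-word entry "New" when both are defined.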
| Modifier and Type | Method and Description |
|---|---|
boolean |
PartOfSpeechRule.accept(Document doc,
Annotation[] tokens,
int n) |
boolean |
RegexpRule.accept(Document doc,
Annotation[] tokens,
int pos) |
boolean |
NamedEntityRule.accept(Document doc,
Annotation[] tokens,
int n) |
boolean |
StringRule.accept(Document doc,
Annotation[] tokens,
int n) |
boolean |
MatchRule.accept(Document doc,
Annotation[] tokens,
int n) |
boolean |
ClassRule.accept(Document doc,
Annotation[] tokens,
int n,
ClassHierarchyResolver resolver) |
boolean |
TransformRule.accept(Document doc,
Annotation[] tokens,
int pos,
ClassHierarchyResolver resolver)
determines whether the left-hand side of the rule matches the tokens
beginning with token[pos].
|
boolean |
MatchRuleItem.accept(Document doc,
Annotation[] tokens,
int pos,
ClassHierarchyResolver resolver) |
void |
NameAnnotator.annotate(Document doc)
annotate document
doc with named entity (ENAMEX)
annotations using the dictionary and rules of the ENE tagger. |
void |
DictionaryTagger.annotate(Document doc)
look up all the tokens in 'doc' in the ENE dictionary.
|
void |
ClassAnnotator.annotate(Document doc,
Span span) |
void |
NameAnnotator.annotate(Document doc,
Span span)
annotate the text in
span with named entity (ENAMEX)
annotations using the dictionary and rules of the ENE tagger. |
void |
CRFNameTagger.annotate(Document doc,
Span span) |
void |
DictionaryTagger.annotate(Document doc,
Span span)
look up the tokens in
span in the ENE dictionary and
record the results on the NE_INTERNAL annotations for these tokens. |
void |
TransformRules.apply(Document doc,
Span span)
applies the transformation rules to 'span'.
|
void |
Evaluator.evaluate(Document systemOut,
Document goldOut) |
static void |
NamedEntityUtil.packNamedEntity(Document doc,
Span span,
String system)
create ENAMEX annotations from the NE_INTERNAL annotations used internally
by the Extended Named Entity annotator.
|
static void |
NamedEntityUtil.packNamedEntity(Document doc,
String system) |
static void |
NamedEntityUtil.splitToNamedEntity(Document doc) |
static void |
NamedEntityUtil.splitToNamedEntity(Document doc,
Span span)
create NE_INTERNAL annotations for use by the Extended Named Entity
annotator.
|
void |
TransformRule.transform(Document doc,
Annotation[] tokens,
int pos)
applies the transformation (right-hand part) of the rule to the tokens
starting with token[pos].
|
| Modifier and Type | Method and Description |
|---|---|
| void | CRFNameTagger.train(Collection<Document> docs) |
| Constructor and Description |
|---|
| DocumentToSentenceIterator(Document doc, String textSegmentName) |
| DocumentToSentenceIterator(Document doc, String textSegmentName, int firstIndex) |
| Modifier and Type | Method and Description |
|---|---|
void |
SyntacticRelationSet.addRelations(Document doc)
search Document
doc for syntactic relations (encoded
as features on annotations) and add them to the SyntacticRelationSet. |
static void |
AddSyntacticRelations.annotate(Document doc,
Span span)
annotate the constituents of document 'doc' within Span 'span'.
|
static void |
StatParser.buildParserInput(Document doc,
int start,
int end,
boolean setPOS)
build the arrays 'words', 'spans', 'wordDefns', and 'pennPOS' for
the parser:
words[i] = the i-th word string, for the PTB parser
(normalized to PTB form, such as -LRB- for '(')
spans[i] = the span of the i-th word string
wordDefns[i] = the Jet word defn of the i-th sentence element
(if there are several defns, takes the first one)
pennPOS[i] = the PTB POS for the i-th sentence element
|
static Annotation |
StatParser.buildWordDefn(Document doc,
String word,
Span span,
Annotation wordDefn,
String pennPOS) |
static Vector |
Parsers.BUParse(Document doc,
int posn,
int end,
Grammar gram)
apply a bottom-up ('immediate constituent') parser to characters
posn to end of Document using grammar gram.
|
static Vector |
Parsers.chartParse(Document doc,
int posn,
int end,
Grammar gram)
apply a top-down active chart parser to characters
posn to end of Document doc using grammar gram.
|
static void |
StatParser.clearInputAnnotations(Document doc)
for ACE: erase all the characters within ANNOTATION ...
|
static void |
StatParser.deleteUnusedConstits(Document doc,
Span span,
Annotation rootAnnotation)
deletes all annotations of type 'constit' within span 'span' of
Document 'doc' which are not descendants of 'rootAnnotation'.
|
static void |
StatParser.fixHyphenatedItems(Document doc)
for hyphenated forms X-Y, which are treated as three separate tokens
by the ACE tokenizer, create a single constituent with category 'hyphword'
and Penn POS JJ.
|
static String |
SynFun.getHead(Document doc,
Annotation ann)
returns the head string of constituent 'ann' in a parse tree.
|
static String |
SynFun.getName(Document doc,
Annotation constit)
returns the name associated with a noun phrase, as a single
string, or null if the np does not have a name.
|
static String |
SynFun.getNameOrHead(Document doc,
Annotation ann)
if the head (the end of the 'headC' chain) of constituent 'ann'
is a name, return the name itself (with tokens connected by '-');
otherwise return the head as determined by 'getHead'.
|
static Annotation |
ParseTreeNode.makeParseAnnotations(Document doc,
ParseTreeNode n)
given a parse tree in the form of nested ParseTreeNodes, adds an
Annotation of type 'constit' to Document 'doc' for each non-terminal node
in the tree.
|
static Vector |
Parsers.parse(Document doc,
int posn,
int end,
Grammar gram)
parse characters posn to end of Document using
grammar gram.
|
static ParseTreeNode |
StatParser.parse(Document doc,
Span span)
parse the sentence in 'span' of Document 'doc'.
|
static SyntacticRelationSet |
DepParser.parseDocument(Document doc)
parse all the sentences in Document 'doc', returning a
SyntacticRelationSet containing all the dependency relations.
|
static void |
DepParser.parseSentence(Document doc,
Span span,
SyntacticRelationSet relations)
generate the dependency parse for a sentence, adding its arcs to
'relations'.
|
static boolean |
Parsers.recognize(Document doc,
int posn,
int end,
Grammar gram)
apply a top-down recognizer to characters posn to end
of Document using grammar gram.
|
static Vector |
Parsers.TDParse(Document doc,
int posn,
int end,
Grammar gram)
apply a top-down backtracking parser to characters posn to
end of Document using grammar gram.
|
static void |
ParseTreeNode.terminalToToken(Document doc,
ParseTreeNode node) |
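
The description of StatParser.buildParserInput mentions normalizing word strings to PTB form, such as -LRB- for '('. A minimal sketch of that kind of normalization follows. The class `PtbNormalizeSketch` is hypothetical, and only the '(' mapping comes from this page; the other bracket escapes are the standard Penn Treebank conventions, and whether Jet applies all of them is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

final class PtbNormalizeSketch {
    // Standard Penn Treebank bracket escapes.
    private static final Map<String, String> BRACKETS = new HashMap<>();
    static {
        BRACKETS.put("(", "-LRB-");
        BRACKETS.put(")", "-RRB-");
        BRACKETS.put("[", "-LSB-");
        BRACKETS.put("]", "-RSB-");
        BRACKETS.put("{", "-LCB-");
        BRACKETS.put("}", "-RCB-");
    }

    // Returns a new array in which bracket tokens are replaced by their PTB escapes;
    // all other tokens are copied unchanged.
    static String[] normalize(String[] tokens) {
        String[] words = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            words[i] = BRACKETS.getOrDefault(tokens[i], tokens[i]);
        }
        return words;
    }
}
```

The escapes are needed because parentheses delimit constituents in the bracketed tree notation, so a literal bracket token would otherwise be ambiguous.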
| Modifier and Type | Method and Description |
|---|---|
void |
PatternSet.apply(Document doc)
applies the rules in the PatternSet to the entire document.
|
void |
PatternSet.apply(Document doc,
Span span) |
void |
PatternCollection.apply(String patternSetName,
Document doc)
applies the rules in the named PatternSet to the document.
|
void |
PatternCollection.apply(String patternSetName,
Document doc,
Span span)
applies the rules in the named PatternSet to the specified span.
|
void |
FinalPatternNode.eval(Document doc,
int posn,
HashMap bindings,
PatternApplication patap)
Method invoked when this node is reached during pattern matching;
records the actions to be performed.
|
abstract void |
PatternNode.eval(Document doc,
int posn,
HashMap bindings,
PatternApplication patap) |
void |
InternalPatternNode.eval(Document doc,
int posn,
HashMap bindings,
PatternApplication patap) |
void |
PatternArc.eval(Document doc,
int posn,
String tokenString,
HashMap bindings,
PatternApplication patap) |
void |
GetStartPatternElement.eval(Document doc,
int posn,
String tokenString,
HashMap bindings,
PatternApplication patap,
PatternNode node) |
void |
GetEndPatternElement.eval(Document doc,
int posn,
String tokenString,
HashMap bindings,
PatternApplication patap,
PatternNode node) |
void |
AssignmentPatternElement.eval(Document doc,
int posn,
String tokenString,
HashMap bindings,
PatternApplication patap,
PatternNode node) |
void |
UndefinedCapPatternElement.eval(Document doc,
int posn,
String tokenString,
HashMap bindings,
PatternApplication patap,
PatternNode node) |
void |
AnnotationPatternElement.eval(Document doc,
int posn,
String tokenString,
HashMap bindings,
PatternApplication patap,
PatternNode node) |
void |
TokenStringPatternElement.eval(Document doc,
int posn,
String tokenString,
HashMap bindings,
PatternApplication patap,
PatternNode node) |
abstract void |
AtomicPatternElement.eval(Document doc,
int posn,
String tokenString,
HashMap bindings,
PatternApplication patap,
PatternNode node) |
void |
NullPatternElement.eval(Document doc,
int posn,
String tokenString,
HashMap bindings,
PatternApplication patap,
PatternNode node) |
void |
IntegerPatternElement.eval(Document doc,
int posn,
String tokenString,
HashMap bindings,
PatternApplication patap,
PatternNode node) |
String |
StringExpression.evaluate(Document doc,
PatternApplication patap)
evaluates the StringExpression at the time when print or write
action is performed.
|
static void |
NewAnnotationAction.hideAnnotations(Document doc,
String type,
Span span)
hides (adds the 'hidden' feature to) all annotations of type 'type'
beginning at the starting position of span 'span'.
|
int |
AddFeaturesAction.perform(Document doc,
PatternApplication patap)
performs the action, adding the specified Annotation.
|
int |
PrintAction.perform(Document doc,
PatternApplication patap)
performs the "print" action, writing the message to the Console.
|
abstract int |
Action.perform(Document doc,
PatternApplication patap) |
int |
WriteAction.perform(Document doc,
PatternApplication patap)
performs the "write" action, writing the message to standard output.
|
int |
NewAnnotationAction.perform(Document doc,
PatternApplication patap)
performs the action, adding the specified Annotation.
|
| Constructor and Description |
|---|
| PatternApplication(Document doc, int start) |
| Modifier and Type | Method and Description |
|---|---|
static void |
CorefFilter.buildEntitiesFromLinkedMentions(Document doc)
buildEntitiesFromLinkedMentions takes a Document with annotations
following the MUC standard (linked-mention representation)
coref id=mention-ID ref=prior-mention-ID and generates the internal representation of coreference, with annotations of the form entity mentions=Vector(mentions) |
static void |
CorefFilter.buildEntitiesFromMentions(Document doc)
buildEntitiesFromMentions takes a Document with coreference marked
in numbered-entity form, with Annotations
of the form
mention entity=entity-ID where coreferential entities share the same entity-ID, and generates annotations of the form entity mentions=Vector(mentions) |
static void |
CorefFilter.buildLinkedMentionsFromEntities(Document doc)
buildLinkedMentionsFromEntities takes a Document with coreference information
in the form of mention attributes on entities and generates Annotations
of the form
COREF ID="n1" REF="n2" over the heads of mentions, where coreferential mentions are linked by having the same entity-ID. |
static void |
CorefFilter.buildMentionsFromEntities(Document doc)
buildMentionsFromEntities takes a Document with coreference information
in the form of mention attributes on entities and generates Annotations
of the form
mention entity=entity-ID over the heads of mentions, where coreferential mentions are linked by having the same entity-ID. |
static void |
CorefCompare.compareDocuments(Document response,
Document key)
compare the entity annotations (coreference) in Documents
'response' and 'key', updating Document 'response'.
|
static int |
Hobbs.distance(Document doc,
Annotation m1,
Annotation m2,
ArrayList<Annotation> antecedents,
Vector sentences)
computes the distance (number of mention nodes traversed) in a Hobbs search
starting from parse tree node 'm2' and searching backwards for parse
tree node 'm1'.
|
static Vector |
CorefScorer.findMentions(Document doc)
return a Vector of all the mentions in the document (the union of
the 'mentions' feature of all entities).
|
static Vector<Annotation> |
Resolve.gatherClauses(Document doc,
Span span)
returns the set of all clauses (constituents of category s, rn-wh,
or rn-vingo) within Span
span of Document doc. |
static Vector<Annotation> |
Resolve.gatherMentions(Document doc,
Span span)
returns the set of all mentions -- constituents which are
subject to reference resolution.
|
static HashMap<Annotation,Annotation> |
Resolve.gatherSyntacticCoref(Document doc,
Vector<Annotation> mentions,
Vector<Annotation> clauses)
gatherSyntacticCoref looks for particular syntactic patterns in the
text which indicate coreference, and returns a Map with one entry
for each such syntactic coreference, linking the anaphor to the
antecedent.
|
static String[] |
Resolve.getHeadTokens(Document doc,
Annotation constit) |
static String[] |
Resolve.getNameTokens(Document doc,
Annotation constit)
returns the name associated with a noun phrase, as an array of token
strings, or null if the np does not have a name.
|
static boolean |
Resolve.matchPronoun(Document doc,
Annotation anaphor,
String mentionHead,
Annotation ent)
return true if pronoun 'mentionHead' is a possible anaphor for
entity 'ent' (this also includes possessive pronouns of category
'det', and headless noun phrases of category 'np').
|
static float |
MaxEntResolve.matchPronoun(Document doc,
Annotation anaphor,
String pronoun,
Annotation entity,
boolean parse,
ArrayList<Annotation> antecedents)
return the probability that pronoun 'pronoun' is a possible anaphor for
entity 'entity' (this also includes possessive pronouns of category
'det', and headless noun phrases of category 'np').
|
static boolean |
Resolve.nameNomCoref(Document doc,
String det,
String mentionHead,
Annotation mention,
Annotation entity)
return true if a common noun phrase headed by 'mentionHead' is a possible
anaphoric reference to the (named) entity 'entity'.
|
static boolean |
Resolve.nomInName(Document doc,
Annotation mention,
Annotation entity) |
static void |
MaxEntResolve.references(Document doc,
Span span)
Resolve.references resolves the mentions (noun groups) in
span of Document doc. |
static void |
Resolve.references(Document doc,
Span span)
Resolve.references resolves the mentions (noun groups) in
span of Document doc. |
static void |
Resolve.references(Document doc,
Span span,
Vector<Annotation> mentions,
Vector<Annotation> clauses) |
static void |
MaxEntResolve.references(Document doc,
Span span,
Vector mentions,
Vector clauses) |
abstract void |
DocumentScorer.score(Document responseDoc,
Document keyDoc)
compute a coreference score between two documents,
responseDoc and keyDoc. |
void |
CorefScorer.score(Document responseDoc,
Document keyDoc)
compare the two documents,
responseDoc and
keyDoc, setting recall and
precision. |
void |
PronounScorer.score(Document responseDoc,
Document keyDoc)
compare the two documents,
responseDoc and
keyDoc, setting accuracy and
overallAccuracy. |
static void |
MaxEntResolve.train(Document doc)
trains coreference from a document
doc marked with
coref tags. |
static void |
MaxEntResolve.trainOnMention(Document doc,
Annotation mention)
add information on mention
mention and its possible
antecedents to the training data which will be used to train the
coreference model. |
static void |
Resolve.updateEvents(Document doc,
Span span,
Map mentionToEntity)
updates events based on reference resolution.
|
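
The conversion described for CorefFilter.buildEntitiesFromLinkedMentions, from linked mentions (each mention has an id and an optional ref to a prior mention) to entities (each entity is a collection of mentions), can be sketched as follows. This is an illustration only: `LinkedMentionSketch` is a hypothetical class, mentions are represented by their id strings, and the sketch assumes that mentions are supplied in document order so that every ref points to a mention already seen.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LinkedMentionSketch {
    /**
     * refs maps each mention id to the id of a prior coreferential mention,
     * or to null if the mention starts a new entity. Iteration order of the
     * map must be document order (for example, a LinkedHashMap).
     * Returns one list of mention ids per entity.
     */
    static List<List<String>> buildEntities(Map<String, String> refs) {
        Map<String, List<String>> entityOf = new HashMap<>();
        List<List<String>> entities = new ArrayList<>();
        for (Map.Entry<String, String> e : refs.entrySet()) {
            String id = e.getKey();
            String ref = e.getValue();
            // Join the entity of the referenced mention if there is one.
            List<String> entity = (ref == null) ? null : entityOf.get(ref);
            if (entity == null) {
                entity = new ArrayList<>();
                entities.add(entity);
            }
            entity.add(id);
            entityOf.put(id, entity);
        }
        return entities;
    }
}
```

A mention whose ref is missing or unknown starts a new entity in this sketch; a real implementation might instead report such a dangling reference.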
| Constructor and Description |
|---|
| EntityView(Document doc, int docNo): Creates an EntityView for Document 'doc'. |
| Modifier and Type | Field and Description |
|---|---|
| Document | SGMLScorer.doc1 |
| Document | SGMLScorer.doc2 |
| Modifier and Type | Method and Description |
|---|---|
static Document |
SGMLProcessor.sgmlToDoc(Document doc,
String tag)
Takes a
Document 'doc' whose text contains
SGML markup; deletes all existing annotations and returns the
doc with 'tag' tags removed from the text and 'tag'
annotations added to the document. |
static Document |
SGMLProcessor.sgmlToDoc(Document doc,
String[] tags) |
static Document |
SGMLProcessor.sgmlToDoc(Document doc,
String sgmlText,
String tag) |
static Document |
SGMLProcessor.sgmlToDoc(Document doc,
String sgmlText,
String[] tags) |
static Document |
SGMLProcessor.sgmlToDoc(String sgmlText,
String tag)
Converts an SGML-marked String 'sgmlText' to a
Document
instance with 'tag' tags removed from the text and 'tag'
annotations added to the document. |
static Document |
SGMLProcessor.sgmlToDoc(String sgmlText,
String[] tags)
Converts an SGML-marked String 'sgmlText' to a
Document
instance with the tags listed in 'tags' removed from the text and
corresponding annotations added to the document. |
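The sgmlToDoc descriptions above amount to two steps: remove the markup for a given tag from the text, and record an annotation over the region the tag enclosed. The sketch below shows that idea on plain strings. It is not the Jet implementation: `SgmlStripSketch`, `TaggedSpan`, and `Result` are hypothetical names, and the code assumes tags have no attributes and are not nested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in, not the Jet SGMLProcessor: strips <tag>...</tag>
// markup for one tag name and records where each tagged region ends up.
final class SgmlStripSketch {

    /** Half-open character range [start, end) in the stripped text. */
    static final class TaggedSpan {
        final String tag;
        final int start;
        final int end;
        TaggedSpan(String tag, int start, int end) {
            this.tag = tag;
            this.start = start;
            this.end = end;
        }
    }

    static final class Result {
        final String text;
        final List<TaggedSpan> spans;
        Result(String text, List<TaggedSpan> spans) {
            this.text = text;
            this.spans = spans;
        }
    }

    static Result strip(String sgmlText, String tag) {
        // Assumption: tags carry no attributes and do not nest.
        Matcher m = Pattern.compile("</?" + Pattern.quote(tag) + ">").matcher(sgmlText);
        StringBuilder out = new StringBuilder();
        List<TaggedSpan> spans = new ArrayList<>();
        int last = 0;     // end of the previous tag in the input
        int openAt = -1;  // output offset of the currently open tag, or -1
        while (m.find()) {
            out.append(sgmlText, last, m.start()); // copy text between tags
            last = m.end();
            if (m.group().startsWith("</")) {
                if (openAt >= 0) {
                    spans.add(new TaggedSpan(tag, openAt, out.length()));
                    openAt = -1;
                }
            } else {
                openAt = out.length();
            }
        }
        out.append(sgmlText, last, sgmlText.length());
        return new Result(out.toString(), spans);
    }
}
```

The offsets in each `TaggedSpan` refer to the stripped text, which is the property an annotation over the cleaned document needs.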
| Modifier and Type | Method and Description |
|---|---|
static void |
SGMLProcessor.dereference(Document doc)
convert all references to Annotations appearing as features of
other annotations from their string form ("#nnnn", where nnnn
is the id of the Annotation being referenced) to actual pointers
to Annotations.
|
static Document |
SGMLProcessor.sgmlToDoc(Document doc,
String tag)
Takes a
Document 'doc' whose text contains
SGML markup; deletes all existing annotations and returns the
doc with 'tag' tags removed from the text and 'tag'
annotations added to the document. |
static Document |
SGMLProcessor.sgmlToDoc(Document doc,
String[] tags) |
static Document |
SGMLProcessor.sgmlToDoc(Document doc,
String sgmlText,
String tag) |
static Document |
SGMLProcessor.sgmlToDoc(Document doc,
String sgmlText,
String[] tags) |
void |
NameTagger.tag(Document doc,
Span span) |
void |
NameTagger.tagDocument(Document doc) |
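The dereference entry in the table above describes replacing feature values of the form "#nnnn" with the annotation whose id is nnnn. A minimal sketch of that lookup follows. The `Ann` class and its feature map are hypothetical stand-ins for Jet's Annotation, and leaving unresolved ids untouched is an assumption of this sketch, not documented Jet behavior.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not the Jet SGMLProcessor: resolves "#nnnn" feature
// values to the annotation object with that id.
final class DereferenceSketch {

    static final class Ann {
        final int id;
        final Map<String, Object> features = new LinkedHashMap<>();
        Ann(int id) {
            this.id = id;
        }
    }

    static void dereference(List<Ann> annotations) {
        // First pass: index every annotation by its id.
        Map<Integer, Ann> byId = new HashMap<>();
        for (Ann a : annotations) {
            byId.put(a.id, a);
        }
        // Second pass: swap string references for object references.
        for (Ann a : annotations) {
            for (Map.Entry<String, Object> e : a.features.entrySet()) {
                Object v = e.getValue();
                if (v instanceof String && ((String) v).matches("#\\d{1,9}")) {
                    Ann target = byId.get(Integer.parseInt(((String) v).substring(1)));
                    if (target != null) {
                        e.setValue(target); // assumption: unknown ids stay as strings
                    }
                }
            }
        }
    }
}
```

Indexing all annotations before rewriting any feature lets a reference point forward to an annotation that appears later in the list.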
| Constructor and Description |
|---|
SGMLScorer(Document doc1,
Document doc2)
Construct an
SGMLScorer to compare two Documents. |
| Modifier and Type | Method and Description |
|---|---|
void |
NumberAnnotator.annotate(Document doc)
Annotates number expressions and normalizes their values.
|
void |
NumberAnnotator.annotate(Document doc,
Span span)
Annotates number expressions and normalizes their values.
|
void |
TimeAnnotator.annotate(Document doc,
Span span,
org.joda.time.DateTime ref)
annotate the time expressions in 'span' with TIMEX2 annotations.
|
void |
ScriptRule.apply(Document doc,
List<Object> values,
Span span,
org.joda.time.DateTime ref) |
abstract void |
TimeRule.apply(Document doc,
List<Object> values,
Span span,
org.joda.time.DateTime ref) |
void |
SimpleTimeRule.apply(Document doc,
List<Object> values,
Span span,
org.joda.time.DateTime ref) |
PatternMatchResult |
DayOfWeekPattern.match(Document doc,
List<Annotation> tokens,
int offset) |
PatternMatchResult |
NumberPattern.match(Document doc,
List<Annotation> tokens,
int offset) |
PatternMatchResult |
RegexPattern.match(Document doc,
List<Annotation> tokens,
int offset) |
PatternMatchResult |
MonthPattern.match(Document doc,
List<Annotation> tokens,
int offset) |
abstract PatternMatchResult |
PatternItem.match(Document doc,
List<Annotation> tokens,
int offset)
if tokens[offset] matches the Pattern Item, return a PatternMatchResult
containing the normalized value of the matched token along with
the span of the matched token, else
null. |
PatternMatchResult |
TimePattern.match(Document doc,
List<Annotation> tokens,
int offset)
if the tokens beginning at tokens[offset] constitute a time expression,
return a PatternMatchResult incorporating that time expression and
its span; otherwise return
null. |
PatternMatchResult |
StringPattern.match(Document doc,
List<Annotation> tokens,
int offset) |
Span |
TimeRule.matches(Document doc,
List<Annotation> tokens,
int offset,
org.joda.time.DateTime ref,
List<Object> values)
matches the pattern portion of the current TimeRule against the sequence
of
tokens in doc starting with |
static void |
TimeMain.processDocument(Document doc)
determines the reference time and adds TIMEX2 annotations to all the
TEXT fields of document
doc. |
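The PatternItem.match entry in the table above states a contract: if tokens[offset] matches, return a result holding the normalized value and the span of the matched token; otherwise return null. The sketch below illustrates that contract for a month name. `MonthItemSketch`, `Token`, and `MatchResult` are hypothetical stand-ins for the Jet classes, and normalizing a month to its number (1 to 12) is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of the match contract, not the Jet MonthPattern.
final class MonthItemSketch {

    static final class Token {
        final String text;
        final int start;
        final int end;
        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    static final class MatchResult {
        final int value; // normalized value: month number, 1..12 (assumed)
        final int start; // span of the matched token
        final int end;
        MatchResult(int value, int start, int end) {
            this.value = value;
            this.start = start;
            this.end = end;
        }
    }

    private static final List<String> MONTHS = Arrays.asList(
        "january", "february", "march", "april", "may", "june", "july",
        "august", "september", "october", "november", "december");

    /** Returns a result if tokens[offset] is a month name, else null. */
    static MatchResult match(List<Token> tokens, int offset) {
        if (offset < 0 || offset >= tokens.size()) {
            return null;
        }
        Token t = tokens.get(offset);
        int index = MONTHS.indexOf(t.text.toLowerCase(Locale.ROOT));
        if (index < 0) {
            return null;
        }
        return new MatchResult(index + 1, t.start, t.end);
    }
}
```

Returning null for a non-match, rather than throwing, lets a caller try several pattern items at the same offset in turn.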
| Modifier and Type | Class and Description |
|---|---|
class |
ExternalDocument
a Document associated with a file.
|
| Modifier and Type | Field and Description |
|---|---|
protected Document |
View.document |
| Modifier and Type | Method and Description |
|---|---|
Document |
Span.document()
Returns the Document associated with a Span.
|
| Modifier and Type | Method and Description |
|---|---|
boolean |
AnnotationTool.annotateDocument(Document doc,
Span annotationZone)
display annotation tool with Document 'doc', allowing user to
add annotations within Span 'annotationZone' of the document.
|
int |
Span.endNoWS(Document doc)
Returns the end of the span, after trimming any white space at the
end of the span.
|
void |
Span.setDocument(Document doc)
Sets the Document associated with a Span.
|
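Span.endNoWS, listed in the table above, returns the end of a span after trimming white space at the end of the span. The same computation can be sketched on a plain string with explicit offsets. `EndNoWsSketch` and its parameters are hypothetical; the real method takes a Document and reads the span's own start and end.

```java
// Hypothetical illustration, not the Jet Span class: a span is the half-open
// character range [start, end) of the given text.
final class EndNoWsSketch {

    /** Returns the end offset after dropping trailing white space, never below start. */
    static int endNoWS(String text, int start, int end) {
        int e = end;
        while (e > start && Character.isWhitespace(text.charAt(e - 1))) {
            e--;
        }
        return e;
    }
}
```

For a span that is entirely white space, this sketch returns the start offset, so the trimmed span is empty rather than inverted.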
| Constructor and Description |
|---|
Document(Document doc) |
View(Document doc,
int docNo)
Creates a new View.
|
| Modifier and Type | Method and Description |
|---|---|
static Document |
AceUtils.loadAnnotatedDocument(File file)
Reads annotation information written in APF format and constructs
a Jet.Tipster.Document object. |
| Modifier and Type | Method and Description |
|---|---|
static Collection<Document> |
AceUtils.loadAnnotatedDocumentsFromDirectory(File dir)
Reads annotation information files written in APF format and constructs
Jet.Tipster.Document objects from a specified directory. |
| Modifier and Type | Method and Description |
|---|---|
static void |
AceUtils.writeNamedEntities(Document doc,
Writer out) |
| Modifier and Type | Method and Description |
|---|---|
static void |
SpecialZoner.findDateline(Document doc,
int textOffset,
String text)
finds datelines in newswire texts, and marks them with a
dateline annotation.
|
static void |
SpecialZoner.findSpecialZones(Document doc)
marks datelines and textBreaks (blank lines and rules) within the
TEXT annotation of Document 'doc'.
|
static void |
SpecialZoner.findTextBreaks(Document doc,
int textOffset,
String text)
finds text breaks marked by double blank lines or by lines consisting entirely
of "-", "~", and "_" characters, and marks the line with a textBreak
annotation.
|
static void |
SpeechSplitter.split(Document doc,
Span textSpan) |
static void |
SentenceSplitter.split(Document doc,
Span textSpan)
splits the text in textSpan into sentences, adding sentence
annotations to the document.
|
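The findTextBreaks entry in the table above names two triggers for a text break: double blank lines, and lines consisting entirely of "-", "~", and "_" characters. A minimal sketch of that test on plain text follows. `TextBreakSketch` is a hypothetical name, the sketch reports line indices instead of adding annotations, and treating the second of two consecutive blank lines as the break line is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not the Jet SpecialZoner.
final class TextBreakSketch {

    /** True if the line consists entirely of '-', '~', and '_' characters. */
    static boolean isRule(String line) {
        String s = line.trim();
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '-' && c != '~' && c != '_') {
                return false;
            }
        }
        return true;
    }

    /** Returns the zero-based indices of lines that act as text breaks. */
    static List<Integer> findBreakLines(String text) {
        String[] lines = text.split("\n", -1);
        List<Integer> breaks = new ArrayList<>();
        for (int i = 0; i < lines.length; i++) {
            boolean blank = lines[i].trim().isEmpty();
            boolean previousBlank = i > 0 && lines[i - 1].trim().isEmpty();
            // Assumption: the second blank line of a pair marks the break.
            if (isRule(lines[i]) || (blank && previousBlank)) {
                breaks.add(i);
            }
        }
        return breaks;
    }
}
```

A single blank line is not reported, which keeps ordinary paragraph gaps from being treated as breaks.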
| Constructor and Description |
|---|
SentenceSet(Document doc)
create a new SentenceSet from a document by retrieving all the
sentence annotations.
|
Copyright © 2016 New York University. All rights reserved.