| Class | Description |
|---|---|
| Annotation | An Annotation assigns a type and a set of features to a portion of a Document. |
| AnnotationColor | Provides a mechanism for associating particular highlighting colors with particular annotation types in Document displays. |
| AnnotationTool | A tool for manually adding annotations to a Document. |
| CollectionAnnotationTool | A tool for displaying a collection and allowing the AnnotationTool to be invoked on documents in the collection. |
| CollectionView | Display of a DocumentCollection, with buttons to select views of individual Documents. |
| Document | Document provides a container for the text of a document and the annotations on a document. |
| DocumentCollection | A set of ExternalDocuments. |
| ExternalDocument | A Document associated with a file. |
| MakeCollection | MakeCollection input-file collection-file collection-directory: converts a single file with multiple documents into a Jet collection. input-file: a concatenated collection of documents in a single file, each beginning with |
| Span | A portion of a document, represented by its starting and ending character positions, and a pointer to the document. |
| View | Displays a Document with its annotations. |

In the course of processing, the Jet system builds up a good deal of information about the words and phrases in a Document: simple things like parts-of-speech for individual words and type information (person/company/location) for names, as well as more complex things like phrases and clauses (with internal structure). We want a single class of object for capturing all of this information and associating it with a Document. The class we use for this purpose is the Annotation. An Annotation is associated with a Span (substring) of the text of a Document. The Annotation has a type and a set of features with values. For example, an annotation can indicate that a portion of a document is a sentence, or is a token with a given part-of-speech. More complex structures can be built by having Annotations which point to other Annotations.
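The data model described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class names SketchSpan, SketchAnnotation and SketchDocument, the feature names, and the sample text are all hypothetical, and the sketch does not reproduce the signatures of the Jet classes listed in the table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT the Jet classes.
class SketchSpan {
    final int start; // offset of the first character of the span
    final int end;   // offset just past the last character of the span
    SketchSpan(int start, int end) { this.start = start; this.end = end; }
}

class SketchAnnotation {
    final String type;
    final SketchSpan span;
    final Map<String, Object> features = new LinkedHashMap<>();
    SketchAnnotation(String type, SketchSpan span) { this.type = type; this.span = span; }
    // Adds one feature/value pair and returns this annotation for chaining.
    SketchAnnotation with(String feature, Object value) {
        features.put(feature, value);
        return this;
    }
}

class SketchDocument {
    final String text;
    final List<SketchAnnotation> annotations = new ArrayList<>();
    SketchDocument(String text) { this.text = text; }
    SketchAnnotation add(SketchAnnotation a) { annotations.add(a); return a; }
    String textOf(SketchAnnotation a) { return text.substring(a.span.start, a.span.end); }
}

class AnnotationSketch {
    static SketchDocument build() {
        SketchDocument doc = new SketchDocument("Fred Smith runs.");
        // A token carrying a part-of-speech feature.
        doc.add(new SketchAnnotation("token", new SketchSpan(11, 15)).with("pos", "verb"));
        // A name carrying type information.
        SketchAnnotation name =
            doc.add(new SketchAnnotation("name", new SketchSpan(0, 10)).with("kind", "person"));
        // A sentence whose "subject" feature points to another annotation.
        doc.add(new SketchAnnotation("sentence", new SketchSpan(0, 16)).with("subject", name));
        return doc;
    }

    public static void main(String[] args) {
        SketchDocument doc = build();
        for (SketchAnnotation a : doc.annotations) {
            System.out.println(a.type + " \"" + doc.textOf(a) + "\" " + a.features.keySet());
        }
    }
}
```

Storing feature values as Object is what lets one annotation refer to another, which is how the more complex structures mentioned above would be assembled in this sketch.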
A Document is processed in a series of stages, such as tokenization, sentence splitting, dictionary look-up, and pattern matching. Each stage uses the Annotations placed on the Document by previous stages, and adds its own Annotations to the Document.
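The staged processing described above might look roughly like the following. This is a minimal sketch, not Jet's actual pipeline: the Stage interface, the two example stages, and the "token" and "capitalized" annotation types are assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal types for illustration; not the Jet API.
class StageAnnotation {
    final String type;
    final int start;
    final int end;
    StageAnnotation(String type, int start, int end) {
        this.type = type;
        this.start = start;
        this.end = end;
    }
}

class StageDocument {
    final String text;
    final List<StageAnnotation> annotations = new ArrayList<>();
    StageDocument(String text) { this.text = text; }
    // Returns a fresh list of the annotations of the given type.
    List<StageAnnotation> ofType(String type) {
        List<StageAnnotation> result = new ArrayList<>();
        for (StageAnnotation a : annotations) {
            if (a.type.equals(type)) result.add(a);
        }
        return result;
    }
}

interface Stage {
    void process(StageDocument doc);
}

// Stage 1: marks each maximal run of non-whitespace characters as a "token".
class WhitespaceTokenizer implements Stage {
    public void process(StageDocument doc) {
        int i = 0;
        int n = doc.text.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(doc.text.charAt(i))) i++;
            int start = i;
            while (i < n && !Character.isWhitespace(doc.text.charAt(i))) i++;
            if (i > start) doc.annotations.add(new StageAnnotation("token", start, i));
        }
    }
}

// Stage 2: reads the "token" annotations left by stage 1 and adds a
// "capitalized" annotation over each token that starts with an upper-case letter.
class CapitalizedMarker implements Stage {
    public void process(StageDocument doc) {
        for (StageAnnotation t : doc.ofType("token")) {
            if (Character.isUpperCase(doc.text.charAt(t.start))) {
                doc.annotations.add(new StageAnnotation("capitalized", t.start, t.end));
            }
        }
    }
}

class PipelineSketch {
    static StageDocument run(String text) {
        StageDocument doc = new StageDocument(text);
        Stage[] stages = { new WhitespaceTokenizer(), new CapitalizedMarker() };
        for (Stage stage : stages) stage.process(doc); // order matters: later stages read earlier output
        return doc;
    }

    public static void main(String[] args) {
        StageDocument doc = run("Fred met Mary in Paris");
        System.out.println(doc.ofType("token").size() + " tokens, "
            + doc.ofType("capitalized").size() + " capitalized");
    }
}
```

The second stage never inspects the raw text for word boundaries; it relies entirely on the annotations added by the first, which is the dependency between stages that the paragraph describes.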
Annotations provide a mark-up capability very similar to that of SGML or XML (although Annotations do not have to be nested the way SGML/XML mark-up is). The Document class provides a method for converting selected Annotations on a Document to XML mark-up, and in the future will have a method for converting XML mark-up to Annotations. In addition, the Document class provides a method for viewing a Document and highlighting selected annotations (this is very primitive at present).
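To make the relationship to XML concrete, here is one way selected annotations could be written out as inline mark-up. This is a hedged sketch and not the Document class's own conversion method: MarkupAnnotation, XmlSketch and toXml are hypothetical names, features are omitted, and the sketch assumes the selected annotations are properly nested, rejecting crossing spans because XML cannot express them directly.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Deque;
import java.util.List;

// Hypothetical type for illustration; not the Jet API.
class MarkupAnnotation {
    final String type;
    final int start;
    final int end;
    MarkupAnnotation(String type, int start, int end) {
        this.type = type;
        this.start = start;
        this.end = end;
    }
}

class XmlSketch {
    static String escape(char c) {
        switch (c) {
            case '&': return "&amp;";
            case '<': return "&lt;";
            case '>': return "&gt;";
            default:  return String.valueOf(c);
        }
    }

    // Emits the text with <type>...</type> tags for annotations of the selected types.
    static String toXml(String text, List<MarkupAnnotation> annotations,
                        Collection<String> selectedTypes) {
        List<MarkupAnnotation> selected = new ArrayList<>();
        for (MarkupAnnotation a : annotations) {
            if (!selectedTypes.contains(a.type)) continue;
            if (a.start < 0 || a.end > text.length() || a.start >= a.end) {
                throw new IllegalArgumentException("bad span for " + a.type);
            }
            selected.add(a);
        }
        // Outer spans first: earlier start, and for equal starts the later end.
        selected.sort((a, b) -> a.start != b.start
            ? Integer.compare(a.start, b.start)
            : Integer.compare(b.end, a.end));

        StringBuilder out = new StringBuilder();
        Deque<MarkupAnnotation> open = new ArrayDeque<>(); // innermost open element on top
        int next = 0;
        for (int pos = 0; pos <= text.length(); pos++) {
            // Close every element that ends here, innermost first.
            while (!open.isEmpty() && open.peek().end == pos) {
                out.append("</").append(open.pop().type).append(">");
            }
            // Open every element that starts here, outermost first.
            while (next < selected.size() && selected.get(next).start == pos) {
                MarkupAnnotation a = selected.get(next++);
                if (!open.isEmpty() && a.end > open.peek().end) {
                    throw new IllegalArgumentException("crossing annotations cannot be nested");
                }
                out.append("<").append(a.type).append(">");
                open.push(a);
            }
            if (pos < text.length()) out.append(escape(text.charAt(pos)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<MarkupAnnotation> annotations = Arrays.asList(
            new MarkupAnnotation("token", 11, 15),
            new MarkupAnnotation("name", 0, 10),
            new MarkupAnnotation("sentence", 0, 16));
        System.out.println(toXml("Fred Smith runs.", annotations,
            Arrays.asList("sentence", "name")));
    }
}
```

Because Annotations need not nest, a conversion that must accept arbitrary annotation sets would need some other strategy for crossing spans; this sketch simply refuses them.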
Copyright © 2016 New York University. All rights reserved.