public class PTBReader extends Object
| Constructor and Description |
|---|
| PTBReader() |
| Modifier and Type | Method and Description |
|---|---|
| void | addAnnotations(List<ParseTreeNode> trees, Document doc, String targetAnnotation, Span span, boolean jetCategories): Adds constit annotations to an existing Document doc to represent the parse tree structure of the set of trees trees. |
| void | addAnnotations(List<ParseTreeNode> trees, List<Integer> offsets, Document doc, String targetAnnotation, Span span, boolean jetCategories): Adds constit annotations to an existing Document doc to represent the parse tree structure of the set of trees trees. |
| void | addAnnotations(ParseTreeNode tree, Document doc, Span span, boolean jetCategories): Adds constit annotations to an existing Document doc to represent the parse tree structure tree. |
| List<Integer> | getOffsets() |
| Treebank | load(File file): Builds a Document object from a Penn Treebank corpus. |
| Treebank | load(File file, String encoding): Builds a Document object from a Penn Treebank corpus. |
| Treebank | load(Reader in): Builds a Jet.Tipster.Document object from a Penn Treebank corpus. |
| List<ParseTreeNode> | loadParseTrees(File file) |
| List<ParseTreeNode> | loadParseTrees(Reader in): Loads parse trees from a Penn Treebank corpus. |
| static void | main(String[] args): Converts a set of Penn Treebank files into text documents. |
| void | setAddingToken(boolean b): Sets whether tokens are added automatically. |
| void | setBackslashAsEscapeCharacter(boolean b): Sets whether a backslash is treated as an escape character. |
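The load and loadParseTrees methods read Penn Treebank files, which store each parse tree in bracketed form such as (S (NP (DT the) (NN cat)) (VP (VBD sat))). The following is a minimal, self-contained sketch of how such a string could be turned into a tree. It is not Jet's implementation: SketchNode and BracketSketch are hypothetical names introduced here for illustration, and the sketch ignores comments, offsets, and escape characters.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a parse tree node; this is NOT Jet's ParseTreeNode. */
final class SketchNode {
    final String label;                         // category such as "NP", or the word itself for a leaf
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String label) { this.label = label; }

    boolean isLeaf() { return children.isEmpty(); }

    /** Collects the words at the leaves, left to right. */
    void collectWords(List<String> out) {
        if (isLeaf()) { out.add(label); return; }
        for (SketchNode child : children) child.collectWords(out);
    }
}

/** Hypothetical recursive-descent reader for one bracketed tree. */
final class BracketSketch {
    private final String text;
    private int pos = 0;

    private BracketSketch(String text) { this.text = text; }

    static SketchNode parse(String text) {
        BracketSketch p = new BracketSketch(text);
        p.skipSpaces();
        SketchNode root = p.readNode();
        p.skipSpaces();
        if (p.pos != text.length()) throw new IllegalArgumentException("trailing input at " + p.pos);
        return root;
    }

    private SketchNode readNode() {
        if (pos < text.length() && text.charAt(pos) == '(') {
            pos++;                              // consume '('
            skipSpaces();
            // Treebank files often wrap each tree in an unlabeled bracket: "( (S ...) )".
            String label = (pos < text.length() && text.charAt(pos) == '(') ? "" : readToken();
            SketchNode node = new SketchNode(label);
            skipSpaces();
            while (pos < text.length() && text.charAt(pos) != ')') {
                node.children.add(readNode());
                skipSpaces();
            }
            if (pos >= text.length()) throw new IllegalArgumentException("missing ')'");
            pos++;                              // consume ')'
            return node;
        }
        return new SketchNode(readToken());     // a bare token is a word
    }

    private String readToken() {
        int start = pos;
        while (pos < text.length() && "() \t\r\n".indexOf(text.charAt(pos)) < 0) pos++;
        if (start == pos) throw new IllegalArgumentException("expected a token at " + pos);
        return text.substring(start, pos);
    }

    private void skipSpaces() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        SketchNode root = parse("(S (NP (DT the) (NN cat)) (VP (VBD sat)))");
        List<String> words = new ArrayList<>();
        root.collectWords(words);
        System.out.println(words);              // prints [the, cat, sat]
    }
}
```

A real reader would also need to handle the sentence-offset comments and the backslash escaping that this class exposes through getOffsets() and setBackslashAsEscapeCharacter(boolean); the sketch leaves both out to keep the recursion visible.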
public void addAnnotations(ParseTreeNode tree, Document doc, Span span, boolean jetCategories)
Adds constit annotations to an existing Document doc to represent the parse tree structure tree.
Parameters:
tree - the parse tree (for a portion of Document doc)
doc - the document
span - the portion of doc covered by the parse tree
jetCategories - if true, use Jet categories as terminal categories (if false, use categories read from parse trees)

public void addAnnotations(List<ParseTreeNode> trees, Document doc, String targetAnnotation, Span span, boolean jetCategories)
Adds constit annotations to an existing Document doc to represent the parse tree structure of the set of trees trees.
Parameters:
trees - list of parse trees
doc - document to which annotations should be added
targetAnnotation - name of the annotation used to determine the spans to which parse tree annotations are added
span - target span
jetCategories - if false, use lexical categories from Penn Tree Bank; if true, use categories from Jet

public void addAnnotations(List<ParseTreeNode> trees, List<Integer> offsets, Document doc, String targetAnnotation, Span span, boolean jetCategories)
Adds constit annotations to an existing Document doc to represent the parse tree structure of the set of trees trees. This version is provided for parse tree files which include sentence offsets.
Parameters:
trees - list of parse trees
offsets - list of the starting position (in doc) of the text corresponding to each parse tree
doc - document to which annotations should be added
targetAnnotation - name of the annotation that receives a 'parse' feature pointing to the parse tree
span - target span
jetCategories - if false, use lexical categories from Penn Tree Bank; if true, use categories from Jet

public List<ParseTreeNode> loadParseTrees(Reader in) throws IOException, InvalidFormatException
This method loads the parse trees but does not determine annotation spans or set annotations. It also sets offsets to a list of the sentence offsets, if they are encoded as comments preceding each tree.
Parameters:
in - the Reader from which the Penn Trees are read
Throws:
IOException
InvalidFormatException

public List<ParseTreeNode> loadParseTrees(File file) throws IOException, InvalidFormatException
Throws:
IOException
InvalidFormatException

public Treebank load(Reader in) throws IOException, InvalidFormatException
Parameters:
in -
Throws:
IOException
InvalidFormatException

public Treebank load(File file) throws IOException, InvalidFormatException
Parameters:
file -
Throws:
IOException
InvalidFormatException

public Treebank load(File file, String encoding) throws IOException, InvalidFormatException
Parameters:
file -
encoding -
Throws:
IOException
InvalidFormatException

public void setBackslashAsEscapeCharacter(boolean b)
Parameters:
b -

public void setAddingToken(boolean b)
Parameters:
b -

Copyright © 2016 New York University. All rights reserved.
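The offsets parameter of addAnnotations is described as the starting position (in doc) of the text corresponding to each parse tree. As an illustration of why such a starting position is useful, the sketch below aligns the leaf words of one tree with the document text, beginning at a given offset, and returns a character span for each word. This is an assumption about how alignment could work, not Jet's code: SpanSketch and alignWords are hypothetical names, and spans are plain int pairs instead of Jet's Span type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; it does not use or reproduce any Jet class. */
final class SpanSketch {

    /**
     * Returns one {start, end} pair (end exclusive) per word, scanning docText
     * from 'offset' and skipping whitespace between words.
     */
    static List<int[]> alignWords(String docText, int offset, List<String> words) {
        List<int[]> spans = new ArrayList<>();
        int pos = offset;
        for (String word : words) {
            // Skip whitespace that separates tokens in the document text.
            while (pos < docText.length() && Character.isWhitespace(docText.charAt(pos))) pos++;
            if (!docText.startsWith(word, pos)) {
                throw new IllegalArgumentException("word '" + word + "' not found at position " + pos);
            }
            spans.add(new int[] {pos, pos + word.length()});
            pos += word.length();
        }
        return spans;
    }

    public static void main(String[] args) {
        String docText = "The cat sat. A dog ran.";
        int offset = docText.indexOf("A dog");   // start of the second sentence
        List<String> words = Arrays.asList("A", "dog", "ran", ".");
        for (int[] span : alignWords(docText, offset, words)) {
            System.out.println(span[0] + "-" + span[1] + " " + docText.substring(span[0], span[1]));
        }
    }
}
```

Without a per-tree offset, an aligner of this kind would have to search the document for each sentence, which is ambiguous when the same sentence occurs more than once.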