public class PennTreebankTokenizer extends Tokenizer_ImplBase
| Constructor and Description |
|---|
| PennTreebankTokenizer() |

| Modifier and Type | Method and Description |
|---|---|
| String[] | getTokenTexts(String text): Tokenizes the input text and returns a string array corresponding to the tokens. |
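
The regex values behind the static Pattern fields of this class are not shown on this page, so the following is only a minimal sketch of how a regex-driven getTokenTexts could work. TreebankStyleSketch and all three pattern definitions are hypothetical stand-ins, named after the documented fields for readability; they are not the library's implementation.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in, NOT the library class: the actual regexes used by
// PennTreebankTokenizer are not documented here, so these are assumptions.
final class TreebankStyleSketch {
    // Assumed: every comma becomes its own token.
    static final Pattern commaPattern = Pattern.compile(",");
    // Assumed: only a period at the very end of the input is split off.
    static final Pattern periodPattern = Pattern.compile("\\.$");
    // Assumed: runs of whitespace collapse to a single space.
    static final Pattern multipleWhitespacePattern = Pattern.compile("\\s+");

    // Pads punctuation with spaces, normalizes whitespace, then splits on spaces.
    static String[] getTokenTexts(String text) {
        String s = text.trim();
        s = commaPattern.matcher(s).replaceAll(" , ");
        s = periodPattern.matcher(s).replaceAll(" .");
        s = multipleWhitespacePattern.matcher(s).replaceAll(" ").trim();
        return s.isEmpty() ? new String[0] : s.split(" ");
    }

    public static void main(String[] args) {
        // prints "Hello|,|world|."
        System.out.println(String.join("|", getTokenTexts("Hello, world.")));
    }
}
```

With the real class on the classpath, the equivalent call would presumably be `new PennTreebankTokenizer().getTokenTexts(text)`, using the no-argument constructor and method listed above. The tokens it returns may differ from this sketch.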
getTokens
public static Pattern ampersandPattern
public static String ampersandRegex
public static Pattern beginOrEndPattern
public static String beginOrEndRegex
public static Pattern bracesPattern
public static String bracesRegex
public static String closedBracesRegex
public static Pattern colonPattern
public static String colonRegex
public static Pattern commaPattern
public static String commaRegex
public static Pattern dashPattern
public static Pattern dollarSignPattern
public static String dollarSignRegex
public static Pattern doubleQuotePattern
public static String doubleQuoteRegex
public static Pattern ellipsisPattern
public static String ellipsisRegex
public static Pattern extraSpacePattern
public static String extraSpaceRegex
public static Pattern multipleWhitespacePattern
public static String multipleWhitespaceRegex
public static Pattern nonFinalPunctPattern
public static String nonFinalPunctRegex
public static Pattern nonPeriodPunctPattern
public static String nonPeriodPunctRegex
public static Pattern oneWordAbbreviationPattern
public static String oneWordAbbreviationRegex
public static String openBracesRegex
public static Pattern periodPattern
public static String periodRegex
public static Pattern quotePattern
public static String quoteRegex
public static Pattern singleQuotePattern
public static String singleQuoteRegex
public static Pattern tAbbreviationPattern
public static String tAbbreviationRegex
public static Pattern[] threeWordAbbreviationPatterns
public static String[] threeWordAbbreviationRegexes
public static Pattern tripleQuotePattern
public static String tripleQuoteRegex
public static Pattern[] twoWordAbbreviationPatterns
public static String[] twoWordAbbreviationRegexes
public PennTreebankTokenizer()
public String[] getTokenTexts(String text)
getTokenTexts in class Tokenizer_ImplBase

Copyright © 2014. All rights reserved.