public class Normalizer extends Object
| Modifier and Type | Field and Description |
|---|---|
| static Pattern | apostrophe |
| static Pattern | doubleAsciiQuote |
| static Pattern | doubleAsciiQuoteAlphaNumeric |
| static Pattern | ellipsis |
| static Pattern | invertSingleAsciiQuote |
| static Pattern | leftDoubleQuote |
| static Pattern | leftSingleQuote |
| static Pattern | longDash |
| static Pattern | oneFourth |
| static Pattern | oneHalf |
| static Pattern | oneThird |
| static Pattern | rightDoubleQuote |
| static Pattern | rightSingleQuote |
| static Pattern | singleAsciiQuote |
| static String | THREE_DOTS |
| static Pattern | threeQuarters |
| static String | TO_ASCII_SINGLE_QUOTE |
| static Pattern | toAsciiDoubleQuote |
| static Pattern | toAsciiSingleQuote |
| static Pattern | twoThirds |
| Modifier and Type | Method and Description |
|---|---|
| static void | convertNonCanonicalStrings(List<Token> sentence, String lang) Converts non-Unicode and other strings into their Unicode counterparts. |
| static void | normalizeDoubleQuotes(List<Token> sentence, String lang) Normalizes double and ambiguous quotes according to language and corpus. |
| static void | normalizeQuotes(List<Token> sentence, String lang) Normalizes non-ambiguous quotes according to language and corpus. |
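All three methods return void and take the token list itself, which suggests they rewrite the tokens in place rather than returning a new list. The following is a minimal sketch of that calling pattern only. The `Tok` class, the `CURLY_DOUBLE` pattern, and the rule that only English input is changed are all hypothetical stand-ins; this page does not show the real Token class, the regular expressions, or the per-language rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class InPlaceSketch {
    // Hypothetical minimal token type; NOT the library's Token class.
    static final class Tok {
        private String value;
        Tok(String value) { this.value = value; }
        String getValue() { return value; }
        void setValue(String value) { this.value = value; }
    }

    // Assumed pattern: left or right curly double quote.
    static final Pattern CURLY_DOUBLE = Pattern.compile("[\u201C\u201D]");

    // Mirrors the (List, String) -> void shape of the documented methods:
    // each token is rewritten in place and nothing is returned.
    static void normalizeDoubleQuotes(List<Tok> sentence, String lang) {
        // Assumption for illustration: only English text is converted.
        if (!"en".equalsIgnoreCase(lang)) {
            return;
        }
        for (Tok t : sentence) {
            t.setValue(CURLY_DOUBLE.matcher(t.getValue()).replaceAll("\""));
        }
    }

    public static void main(String[] args) {
        List<Tok> sentence = new ArrayList<>();
        sentence.add(new Tok("\u201C"));
        sentence.add(new Tok("Hello"));
        sentence.add(new Tok("\u201D"));
        normalizeDoubleQuotes(sentence, "en");
        for (Tok t : sentence) {
            System.out.println(t.getValue());
        }
    }
}
```

Because the list is mutated, a caller that needs the original token values would have to copy them before calling.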
public static final String THREE_DOTS
public static final Pattern ellipsis
public static final Pattern longDash
public static final Pattern oneFourth
public static final Pattern oneThird
public static final Pattern oneHalf
public static final Pattern twoThirds
public static final Pattern threeQuarters
public static final Pattern apostrophe
public static final Pattern leftSingleQuote
public static final Pattern rightSingleQuote
public static final Pattern leftDoubleQuote
public static final Pattern rightDoubleQuote
public static final Pattern singleAsciiQuote
public static final Pattern invertSingleAsciiQuote
public static final Pattern doubleAsciiQuote
public static final Pattern doubleAsciiQuoteAlphaNumeric
public static final String TO_ASCII_SINGLE_QUOTE
public static final Pattern toAsciiSingleQuote
public static final Pattern toAsciiDoubleQuote
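The fields above are precompiled Pattern constants paired with a few replacement strings. As a rough illustration of how such constants are typically applied, the sketch below compiles three patterns once and chains `Matcher.replaceAll` calls. The regular expressions and the replacements `"1/2"` and `"--"` are assumptions chosen for the example; the values used by Normalizer are not listed on this page.

```java
import java.util.regex.Pattern;

final class PatternSketch {
    // Assumed replacement text for an ellipsis character.
    static final String THREE_DOTS = "...";

    // Assumed patterns, compiled once and shared, as static final fields.
    static final Pattern ELLIPSIS = Pattern.compile("\u2026");
    static final Pattern ONE_HALF = Pattern.compile("\u00BD");
    static final Pattern LONG_DASH = Pattern.compile("\u2014");

    // Applies each pattern in turn and returns the rewritten string.
    static String normalize(String text) {
        String out = ELLIPSIS.matcher(text).replaceAll(THREE_DOTS);
        out = ONE_HALF.matcher(out).replaceAll("1/2");
        out = LONG_DASH.matcher(out).replaceAll("--");
        return out;
    }

    public static void main(String[] args) {
        // Prints: Wait... add 1/2 cup -- slowly
        System.out.println(normalize("Wait\u2026 add \u00BD cup \u2014 slowly"));
    }
}
```

Keeping the patterns as static final fields avoids recompiling the regular expressions for every token.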
public static void convertNonCanonicalStrings(List<Token> sentence, String lang)
sentence - the list of tokens
lang - the language
public static void normalizeQuotes(List<Token> sentence, String lang)
sentence - the list of tokens
lang - the language
Copyright © 2016 IXA pipes. All rights reserved.