public final class OpenKoreanTextProcessorJava extends Object
| Constructor and Description |
|---|
| OpenKoreanTextProcessorJava() |
| Modifier and Type | Method and Description |
|---|---|
| static void | addNounsToDictionary(List<String> words)<br>Add user-defined words to the noun dictionary. |
| static String | detokenize(List<String> tokens)<br>Detokenize the input list of words. |
| static List<KoreanPhraseExtractor.KoreanPhrase> | extractPhrases(scala.collection.Seq<KoreanToken> tokens, boolean filterSpam, boolean includeHashtags)<br>Extract phrases from Korean input text. |
| static CharSequence | normalize(CharSequence text)<br>Normalize Korean text, e.g. 그랰ㅋㅋㅋㅋㅋㅋ -> 그래ㅋㅋ. |
| static List<Sentence> | splitSentences(CharSequence text)<br>Split input text into sentences. |
| static scala.collection.Seq<KoreanToken> | tokenize(CharSequence text)<br>Tokenize with the builder options. |
| static List<KoreanTokenJava> | tokensToJavaKoreanTokenList(scala.collection.Seq<KoreanToken> tokens) |
| static List<KoreanTokenJava> | tokensToJavaKoreanTokenList(scala.collection.Seq<KoreanToken> tokens, boolean keepSpace)<br>Transforms the tokenization output to List<KoreanTokenJava>. |
| static List<String> | tokensToJavaStringList(scala.collection.Seq<KoreanToken> tokens) |
| static List<String> | tokensToJavaStringList(scala.collection.Seq<KoreanToken> tokens, boolean keepSpace)<br>Tokenize with the builder options into a String Iterable. |
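The `normalize` entry above gives one example of its effect: 그랰ㅋㅋㅋㅋㅋㅋ -> 그래ㅋㅋ. With the library on the classpath, that result would come from calling `OpenKoreanTextProcessorJava.normalize(text)`. The standalone sketch below only approximates that single documented example in plain Java so the transformation is easy to follow. It is not the library's implementation, and the class and method names (`NormalizeSketch`, `normalizeLaughter`) are hypothetical. It assumes two rules: a run of the jamo ㅋ is shortened to at most two, and a syllable whose final consonant is ㅋ loses that final when it is directly followed by ㅋ.

```java
// Illustrative approximation only; NOT the OpenKoreanText implementation.
final class NormalizeSketch {
    private static final char KIEUK = '\u314B';     // compatibility jamo ㅋ
    private static final int HANGUL_BASE = 0xAC00;  // first precomposed syllable (가)
    private static final int HANGUL_LAST = 0xD7A3;  // last precomposed syllable (힣)
    private static final int FINAL_COUNT = 28;      // final-consonant slots per initial/medial pair
    private static final int FINAL_KIEUK = 24;      // index of ㅋ among the final consonants

    /** Hypothetical helper: trims laughter runs as described in the lead-in. */
    static String normalizeLaughter(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == KIEUK) {
                // Measure the run of ㅋ and keep at most two of them.
                int j = i;
                while (j < text.length() && text.charAt(j) == KIEUK) {
                    j++;
                }
                out.append(KIEUK);
                if (j - i >= 2) {
                    out.append(KIEUK);
                }
                i = j;
                continue;
            }
            boolean nextIsKieuk = i + 1 < text.length() && text.charAt(i + 1) == KIEUK;
            boolean isSyllable = c >= HANGUL_BASE && c <= HANGUL_LAST;
            if (nextIsKieuk && isSyllable
                    && (c - HANGUL_BASE) % FINAL_COUNT == FINAL_KIEUK) {
                // Subtracting the final-consonant index yields the same syllable
                // without a final consonant (랰 becomes 래).
                c = (char) (c - FINAL_KIEUK);
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalizeLaughter("그랰ㅋㅋㅋㅋㅋㅋ"));
    }
}
```

The arithmetic relies on the Unicode layout of precomposed Hangul syllables, where each initial/medial pair occupies 28 consecutive code points ordered by final consonant. The real `normalize` method may apply further rules that this sketch does not attempt to cover.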
public static CharSequence normalize(CharSequence text)
text - Input text.

public static scala.collection.Seq<KoreanToken> tokenize(CharSequence text)
text - Input text.

public static void addNounsToDictionary(List<String> words)
words - List of user nouns.

public static List<KoreanTokenJava> tokensToJavaKoreanTokenList(scala.collection.Seq<KoreanToken> tokens, boolean keepSpace)
tokens - Korean tokens (output of tokenize(CharSequence text)).

public static List<KoreanTokenJava> tokensToJavaKoreanTokenList(scala.collection.Seq<KoreanToken> tokens)

public static List<String> tokensToJavaStringList(scala.collection.Seq<KoreanToken> tokens, boolean keepSpace)
tokens - Korean tokens (output of tokenize(CharSequence text)).

public static List<String> tokensToJavaStringList(scala.collection.Seq<KoreanToken> tokens)

public static List<Sentence> splitSentences(CharSequence text)
text - Input text.

public static List<KoreanPhraseExtractor.KoreanPhrase> extractPhrases(scala.collection.Seq<KoreanToken> tokens, boolean filterSpam, boolean includeHashtags)
tokens - Korean tokens (output of tokenize(CharSequence text)).

Copyright © 2014–2017. All rights reserved.