org.languagetool.tokenizers.uk
Class UkrainianWordTokenizer
java.lang.Object
org.languagetool.tokenizers.uk.UkrainianWordTokenizer
- All Implemented Interfaces:
- Tokenizer
public class UkrainianWordTokenizer
- extends Object
- implements Tokenizer
Tokenizes a sentence into words.
Punctuation and whitespace characters each get their own token.
Specific to Ukrainian: apostrophes (U+0027 and U+2019) are not in the delimiter list, because they are part of the word.
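The delimiter behavior described above can be sketched with `java.util.StringTokenizer` in return-delimiters mode. This is a simplified illustration under stated assumptions, not LanguageTool's implementation: the class name `ApostropheAwareTokenizerSketch` and the `DELIMITERS` set are hypothetical. The only detail taken from the description is that neither apostrophe (U+0027, U+2019) is a delimiter, so a word such as "п'ять" stays whole.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

/**
 * Simplified sketch, not LanguageTool's actual code.
 * Splits text on a hypothetical delimiter set and returns every
 * delimiter character as its own token.
 */
final class ApostropheAwareTokenizerSketch {

    // Hypothetical delimiter list: whitespace plus common punctuation.
    // Both apostrophes (U+0027 and U+2019) are intentionally absent,
    // so they remain inside the surrounding word token.
    static final String DELIMITERS =
            " \t\n\r\u00A0.,;:!?()[]{}\"«»\u2013\u2014…";

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // returnDelims = true: each delimiter character becomes a separate token
        StringTokenizer st = new StringTokenizer(text, DELIMITERS, true);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // ASCII apostrophe in the first word, U+2019 in the last one
        System.out.println(tokenize("Прем'єр сказав: п’ять."));
    }
}
```

Because every delimiter is returned as a token, concatenating the tokens in this sketch reproduces the input text exactly.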
- Author:
- Andriy Rysin
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
UkrainianWordTokenizer
public UkrainianWordTokenizer()
tokenize
public List<String> tokenize(String text)
- Specified by:
tokenize in interface Tokenizer
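Since `tokenize` returns whitespace and punctuation as tokens alongside words, a caller that wants only words has to filter the list. The sketch below is hypothetical throughout: it defines a local stand-in interface `TokenizerStandIn` with the same `List<String> tokenize(String text)` signature, a helper `words`, and a stub tokenizer used in place of `new UkrainianWordTokenizer()`, so it runs without LanguageTool on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical local stand-in mirroring the tokenize signature; not LanguageTool's Tokenizer. */
interface TokenizerStandIn {
    List<String> tokenize(String text);
}

final class WordFilterSketch {

    /** Keeps only tokens that contain at least one letter, dropping whitespace and punctuation tokens. */
    static List<String> words(TokenizerStandIn tokenizer, String text) {
        List<String> result = new ArrayList<>();
        for (String token : tokenizer.tokenize(text)) {
            if (token.codePoints().anyMatch(Character::isLetter)) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical stub returning a fixed token list for the demo input
        TokenizerStandIn stub = text -> List.of("Прем'єр", " ", "сказав", ",", " ", "що", ".");
        System.out.println(words(stub, "Прем'єр сказав, що."));
    }
}
```

The letter check keeps apostrophe-containing words such as "Прем'єр" intact, since the token as a whole contains letters.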
Copyright © 2013. All Rights Reserved.