- All Implemented Interfaces:
- Tokenizer
public class UkrainianWordTokenizer
extends Object
implements Tokenizer
Tokenizes a sentence into words.
Punctuation and whitespace characters each get their own token.
Specific to Ukrainian: apostrophes (U+0027 and U+2019) are not in the delimiter list, as they are part of the word.
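The behaviour described above can be illustrated with a minimal sketch. This is not the actual implementation of UkrainianWordTokenizer: the class name `ApostropheAwareTokenizerSketch`, the `tokenize` signature, and the delimiter set are all assumptions made for illustration. It shows only the core idea: split on a fixed list of delimiters, return each delimiter as its own token, and leave both apostrophe forms out of that list so they stay inside the word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical sketch, not the real UkrainianWordTokenizer.
public class ApostropheAwareTokenizerSketch {

    // Assumed delimiter list: whitespace plus common punctuation, including
    // the guillemets (U+00AB, U+00BB) and the ellipsis (U+2026).
    // The apostrophes U+0027 and U+2019 are deliberately absent.
    private static final String DELIMITERS =
            " \t\n\r.,;:!?()[]{}\"\u00AB\u00BB\u2026";

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // returnDelims = true: every delimiter character is returned
        // as a separate one-character token instead of being dropped.
        StringTokenizer st = new StringTokenizer(text, DELIMITERS, true);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }
}
```

Under these assumptions, tokenizing `м'ясо і п’ять.` yields six tokens: `м'ясо`, a space, `і`, a space, `п’ять`, and `.`. Both apostrophe variants remain part of their words.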
- Author:
- Andriy Rysin