org.languagetool.tokenizers.uk
Class UkrainianWordTokenizer

java.lang.Object
  extended by org.languagetool.tokenizers.uk.UkrainianWordTokenizer
All Implemented Interfaces:
Tokenizer

public class UkrainianWordTokenizer
extends Object
implements Tokenizer

Tokenizes a sentence into words. Punctuation and whitespace characters each get their own token. Specific to Ukrainian: the apostrophes ' (0x27) and ’ (U+2019) are not in the list of delimiters, because in Ukrainian they are part of the word.

Author:
Andriy Rysin

Constructor Summary
UkrainianWordTokenizer()
           
 
Method Summary
 List<String> tokenize(String text)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

UkrainianWordTokenizer

public UkrainianWordTokenizer()
Method Detail

tokenize

public List<String> tokenize(String text)
Specified by:
tokenize in interface Tokenizer
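
The behavior described for this class (delimiters returned as their own tokens, apostrophes kept inside words) can be illustrated with a minimal sketch. This is not the LanguageTool source: the class name ApostropheAwareTokenizerSketch and the delimiter set are assumptions made for illustration, and the real delimiter list may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical sketch, not the actual UkrainianWordTokenizer implementation.
class ApostropheAwareTokenizerSketch {

  // Assumed delimiter set: whitespace and some common punctuation.
  // The apostrophes ' (0x27) and ’ (U+2019) are deliberately absent,
  // so they never split a word.
  private static final String DELIMITERS = " \t\n\r.,;:!?()[]{}\"";

  static List<String> tokenize(String text) {
    List<String> tokens = new ArrayList<>();
    // returnDelims = true makes every delimiter character its own token.
    StringTokenizer st = new StringTokenizer(text, DELIMITERS, true);
    while (st.hasMoreTokens()) {
      tokens.add(st.nextToken());
    }
    return tokens;
  }

  public static void main(String[] args) {
    // The word with the apostrophe stays in one piece; the space and
    // the full stop are separate tokens.
    System.out.println(tokenize("Це м'яч."));
  }
}
```

With this sketch, a word such as м'яч or п’ять comes back as a single token, while each space and punctuation mark is returned separately. The real class is used the same way: create it with the no-argument constructor and call tokenize(String), which returns a List<String>.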


Copyright © 2013. All Rights Reserved.