Class UkrainianWordTokenizer

java.lang.Object
org.languagetool.tokenizers.uk.UkrainianWordTokenizer
All Implemented Interfaces:
Tokenizer

public class UkrainianWordTokenizer extends Object implements Tokenizer
Tokenizes a sentence into words. Punctuation and whitespace get their own tokens. Specific to Ukrainian: apostrophes (U+0027 and U+2019) are not in the list of separators, because they are part of the word.
Author:
Andriy Rysin
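
The apostrophe rule described above can be illustrated with a small standalone sketch. This is not the LanguageTool implementation: the class name ApostropheTokenizerSketch, the TOKEN pattern, and the static tokenize method are hypothetical and exist only to show the idea that an apostrophe between letters stays inside the word, while every other punctuation or whitespace character becomes its own token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; NOT the UkrainianWordTokenizer source.
public class ApostropheTokenizerSketch {

    // A word is a run of letters/digits, optionally continued by an
    // apostrophe (U+0027 or U+2019) followed by more letters/digits.
    // Any other single character (punctuation, whitespace) is its own token.
    private static final Pattern TOKEN = Pattern.compile(
            "[\\p{L}\\p{N}]+(?:['\\u2019][\\p{L}\\p{N}]+)*|.",
            Pattern.DOTALL);

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Two Ukrainian words written with Unicode escapes: the first uses
        // U+0027 as its apostrophe, the second uses U+2019.
        String sample = "\u041c'\u044f\u0442\u0430, \u043f\u2019\u044f\u0442\u044c.";
        for (String token : tokenize(sample)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

The sample string is "М'ята, п’ять."; both words keep their apostrophes, while the comma, the space, and the full stop are emitted as separate tokens. The real class is used through the Tokenizer interface it implements, and its rules are considerably more detailed than this sketch.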
  • Field Details

    • WORDS_WITH_BRACKETS_PATTERN

      public static final Pattern WORDS_WITH_BRACKETS_PATTERN
  • Constructor Details

    • UkrainianWordTokenizer

      public UkrainianWordTokenizer()
  • Method Details