Class CatalanWordTokenizer

java.lang.Object
org.languagetool.tokenizers.WordTokenizer
org.languagetool.tokenizers.ca.CatalanWordTokenizer
All Implemented Interfaces:
Tokenizer

public class CatalanWordTokenizer extends WordTokenizer
Tokenizes a sentence into words. Punctuation and whitespace get their own tokens. Hyphens and apostrophes receive special treatment for Catalan.
Author:
Jaume Ortolà
  • Field Details

  • Constructor Details

    • CatalanWordTokenizer

      public CatalanWordTokenizer()
  • Method Details

    • tokenize

      public List<String> tokenize(String text)
      Specified by:
      tokenize in interface Tokenizer
      Overrides:
      tokenize in class WordTokenizer
      Parameters:
      text - Text to tokenize
      Returns:
      List of tokens. Note: the special strings xxCA_APOSxx and xxCA_HYPHENxx are used to replace apostrophes and hyphens, respectively.
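
The placeholder note above can be illustrated with a small, self-contained sketch. This is not the LanguageTool implementation: the class name PlaceholderTokenizerSketch, the regular expressions, and the rule that only an apostrophe or hyphen between two letters is protected are all assumptions made for illustration. It shows only the general idea of swapping a character for a placeholder string so that a generic splitter does not break the word, then restoring the character in the emitted tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the code of CatalanWordTokenizer.
class PlaceholderTokenizerSketch {
    // Placeholder strings as named in the Returns note.
    static final String APOS = "xxCA_APOSxx";
    static final String HYPHEN = "xxCA_HYPHENxx";

    // Assumption: protect an apostrophe or hyphen only when it sits between two letters.
    private static final Pattern INNER_APOS = Pattern.compile("(?<=\\p{L})'(?=\\p{L})");
    private static final Pattern INNER_HYPHEN = Pattern.compile("(?<=\\p{L})-(?=\\p{L})");

    // A token is a run of letters, digits, or underscores, or any other single character.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}_]+|.", Pattern.DOTALL);

    static List<String> tokenize(String text) {
        // Step 1: shield word-internal apostrophes and hyphens with placeholders.
        String shielded = INNER_APOS.matcher(text).replaceAll(APOS);
        shielded = INNER_HYPHEN.matcher(shielded).replaceAll(HYPHEN);

        // Step 2: split generically; placeholders consist of word characters, so they stay inside the word.
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(shielded);
        while (m.find()) {
            // Step 3: restore the original characters in each token.
            tokens.add(m.group().replace(APOS, "'").replace(HYPHEN, "-"));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("l'home, digues-me"));
    }
}
```

With LanguageTool on the classpath, the documented entry point is new CatalanWordTokenizer().tokenize(text); where the real class places token boundaries around apostrophes and hyphens is not specified on this page, so the sketch should not be read as a description of its output.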