Class FrenchWordTokenizer

java.lang.Object
org.languagetool.tokenizers.WordTokenizer
org.languagetool.tokenizers.fr.FrenchWordTokenizer
All Implemented Interfaces:
Tokenizer

public class FrenchWordTokenizer extends WordTokenizer
Tokenizes a sentence into words. Punctuation marks and whitespace each get their own token. Hyphens and apostrophes receive special treatment for French.
Author:
Jaume Ortolà
  • Constructor Details

    • FrenchWordTokenizer

      public FrenchWordTokenizer()
  • Method Details

    • tokenize

      public List<String> tokenize(String text)
      Specified by:
      tokenize in interface Tokenizer
      Overrides:
      tokenize in class WordTokenizer
      Parameters:
      text - Text to tokenize
      Returns:
      List of tokens. Note: a special string xxFR_APOSxx is used to replace apostrophes, and xxFR_HYPHENxx to replace hyphens.
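The note about xxFR_APOSxx and xxFR_HYPHENxx suggests a placeholder technique: characters that a generic splitter would break on are swapped for a marker first, then restored. The following is a minimal, self-contained sketch of that general idea only. The class name PlaceholderTokenizerSketch, the delimiter set, and the rule "protect an apostrophe or hyphen that sits between two letters" are assumptions made for illustration; this is not the actual FrenchWordTokenizer implementation, whose French-specific rules are not described on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical illustration of placeholder-based tokenization.
// Not the LanguageTool implementation; all rules here are assumptions.
final class PlaceholderTokenizerSketch {

    // Marker strings taken from the Returns note above.
    private static final String APOS = "xxFR_APOSxx";
    private static final String HYPHEN = "xxFR_HYPHENxx";

    // Assumed delimiter set: whitespace plus common punctuation,
    // including the apostrophe and hyphen themselves.
    private static final String DELIMS = " \t\n\r.,;:!?'-()\"";

    static List<String> tokenize(String text) {
        // Step 1: protect apostrophes and hyphens that sit between two
        // letters, so the splitter below does not break on them.
        String protectedText = text
            .replaceAll("(?<=\\p{L})'(?=\\p{L})", APOS)
            .replaceAll("(?<=\\p{L})-(?=\\p{L})", HYPHEN);

        // Step 2: split on the delimiters, returning each delimiter
        // character as its own token.
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(protectedText, DELIMS, true);
        while (st.hasMoreTokens()) {
            // Step 3: restore the original characters in each token.
            tokens.add(st.nextToken()
                .replace(APOS, "'")
                .replace(HYPHEN, "-"));
        }
        return tokens;
    }
}
```

With this sketch, PlaceholderTokenizerSketch.tokenize("L'arc-en-ciel, ici.") yields five tokens: "L'arc-en-ciel", ",", " ", "ici" and ".". The real class may split elisions and hyphenated forms differently, so treat the result as a demonstration of the placeholder step rather than of French tokenization rules.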