Package org.languagetool.tokenizers.nl
Class DutchWordTokenizer
java.lang.Object
org.languagetool.tokenizers.WordTokenizer
org.languagetool.tokenizers.nl.DutchWordTokenizer
All Implemented Interfaces:
Tokenizer
Field Summary
Fields inherited from class org.languagetool.tokenizers.WordTokenizer
REMOVED_EMOJI
Constructor Summary
Constructors
DutchWordTokenizer()
Method Summary
Methods inherited from class org.languagetool.tokenizers.WordTokenizer
getProtocols, isCurrencyExpression, isEMail, isUrl, joinEMails, joinEMailsAndUrls, joinUrls, replaceEmojis, restoreEmojis, splitCurrencyExpression
Constructor Details
DutchWordTokenizer
public DutchWordTokenizer()
Method Details
tokenize
Tokenizes like WordTokenizer, except for words such as "oma's" that contain an apostrophe in the middle.
Specified by:
tokenize in interface Tokenizer
Overrides:
tokenize in class WordTokenizer
Parameters:
text - Text to tokenize
Returns:
List of tokens
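The apostrophe exception described for tokenize can be illustrated with a small standalone sketch. This is not LanguageTool's implementation and does not call its API: the class name ApostropheTokenizerSketch, the DELIMITERS set, and the rejoin step are all assumptions chosen for illustration, and the sketch assumes the intended effect is that a word like "oma's" comes back as one token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical illustration only; not the real DutchWordTokenizer.
class ApostropheTokenizerSketch {

    // Assumed, heavily reduced delimiter set. The real tokenizer obtains its
    // delimiters from getTokenizingCharacters(), which is not reproduced here.
    private static final String DELIMITERS = " \t\n.,;:!?()\"'";

    static List<String> tokenize(String text) {
        // Step 1: split on every delimiter and keep the delimiters as tokens.
        List<String> raw = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, DELIMITERS, true);
        while (st.hasMoreTokens()) {
            raw.add(st.nextToken());
        }

        // Step 2: rejoin "letters ' letters" so the word stays in one piece.
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < raw.size()) {
            String token = raw.get(i);
            boolean apostropheInsideWord = isLetters(token)
                    && i + 2 < raw.size()
                    && raw.get(i + 1).equals("'")
                    && isLetters(raw.get(i + 2));
            if (apostropheInsideWord) {
                token = token + "'" + raw.get(i + 2);
                i += 2; // skip the apostrophe and the suffix just merged
            }
            result.add(token);
            i++;
        }
        return result;
    }

    // True if the string is non-empty and consists only of letters.
    private static boolean isLetters(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int k = 0; k < s.length(); k++) {
            if (!Character.isLetter(s.charAt(k))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Oma's fiets is rood."));
    }
}
```

In this sketch "Oma's" is returned as a single token, while an apostrophe that does not sit between two letter runs (for example a leading one, as in "'s ochtends") remains a separate token. Whitespace and punctuation are kept as tokens of their own, which is also an assumption of the sketch.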
getTokenizingCharacters
Overrides:
getTokenizingCharacters in class WordTokenizer