Class StreamCssTokenizer

java.lang.Object
org.jhotdraw8.css.parser.StreamCssTokenizer
All Implemented Interfaces:
CssTokenizer

public class StreamCssTokenizer extends Object implements CssTokenizer
StreamCssTokenizer processes an input stream of characters into tokens for the CssParser.

The tokenizer implements the ISO 14977 EBNF productions listed below. Only productions with all caps names are returned as tokens. Productions with lowercase names are used as internal macros.

The tokenizer uses CssScanner for preprocessing the input stream. The preprocessed input stream does not contain the following characters: \000, \r, \f.
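
The preprocessing step can be illustrated with a minimal sketch. It assumes the substitutions described in CSS Syntax Module Level 3 (CR LF, CR and FF become a single LF; NUL becomes U+FFFD); the actual behavior of CssScanner is not specified on this page and may differ. The class and method names are hypothetical and not part of the jhotdraw8 API.

```java
final class PreprocessSketch {
    /**
     * Hypothetical helper (not a jhotdraw8 method): returns a copy of the
     * input that contains no NUL, CR or FF characters.
     */
    public static String preprocess(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            switch (c) {
                case '\r':
                    // Assumption: CR LF collapses into one LF, a lone CR becomes LF.
                    if (i + 1 < in.length() && in.charAt(i + 1) == '\n') {
                        i++;
                    }
                    out.append('\n');
                    break;
                case '\f':
                    // Assumption: form feed is treated as a newline.
                    out.append('\n');
                    break;
                case '\0':
                    // Assumption: NUL is replaced by the replacement character.
                    out.append('\uFFFD');
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this normalization in place, the newline production only has to recognize '\n'.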

 IDENT       = ident ;
 AT_KEYWORD  = "@" , ident ;
 STRING      = string ;
 BAD_STRING  = badstring ;
 BAD_URI     = baduri ;
 BAD_COMMENT = badcomment ;
 HASH        = '#' , name ;
 NUMBER      = num ;
 PERCENTAGE  = num , '%' ;
 DIMENSION   = num , ident ;
 URI         = ( "url(" , w , string , w , ')'
               | "url(" , w, { urichar | nonascii | escape }-, w, ')'
               ) ;
 UNICODE_RANGE = "u+", ( hexd, 5 * [ hexd ]
                       | hexd, 5 * [ hexd ] , [ '-', hexd, 5 * [ hexd ] ]
                       | [ hexd, 4 * [ hexd ] ] , '?' , 5 * [ '?' ]
                       ) ;
 CDO           = "<!--" ;
 CDC           = "-->" ;
 :             = ':' ;
 ;             = ';' ;
 {             = '{' ;
 }             = '}' ;
 (             = '(' ;
 )             = ')' ;
 [             = '[' ;
 ]             = ']' ;
 S             = { w }- ;
 COMMENT       = '/', '*' , { ? anything but '*' followed by '/' ? } , '*', '/' ;
 ROUND_BLOCK   = ident , '(' ;
 INCLUDE_MATCH = '~', '=' ;
 DASH_MATCH    = '|', '=' ;
 PREFIX_MATCH  = '^', '=' ;
 SUFFIX_MATCH  = '$', '=' ;
 SUBSTRING_MATCH
               = '*', '=' ;
 COLUMN        = '|', '|' ;
 DELIM         = ? any other character not matched by the above rules,
                   and neither a single nor a double quote ? ;

 ident         = [ '-' ] , nmstart , { nmchar }
               | '--' , { nmchar } ;
 name          = { nmchar }- ;
 nmstart       = '_' | letter | nonascii | escape ;
 nonascii      = ? U+00A0 through U+10FFFF ? ;
 letter        = ? 'a' through 'z' or 'A' through 'Z' ? ;
 unicode       = '\' , ( 6 * hexd
                       | hexd , 5 * [hexd] , w
                       );
 escape        = ( unicode
                 | '\' , -( newline | hexd)
                 ) ;
 nmchar        = '_' | letter | digit | '-' | nonascii | escape ;
 num           = [ '+' | '-' ] ,
                 ( { digit }-
                 | { digit } , '.' , { digit }-
                 ) ,
                 [ 'e'  , [ '+' | '-' ] , { digit }- ] ;
 digit         = ? '0' through '9' ? ;
 letter        = ? 'a' through 'z' ? | ? 'A' through 'Z' ? ;
 string        = string1 | string2 ;
 string1       = '"' , { -( newline | '"' ) | '\' , newline | escape } , '"' ;
 string2       = "'" , { -( newline | "'" ) | '\' , newline | escape } , "'" ;
 badstring     = badstring1 | badstring2 ;
 badstring1    = '"' , { -( newline | '"' ) | '\' , newline | escape } ;
 badstring2    = "'" , { -( newline | "'" ) | '\' , newline | escape } ;
 badcomment    = badcomment1 | badcomment2 ;
 badcomment1   = '/' , '*' , { ? anything but '*' followed by '/' ? } , '*' ;
 badcomment2   = '/' , '*' , { ? anything but '*' followed by '/' ? } ;
 baduri        = baduri1 | baduri2 | baduri3 ;
 baduri1       = "url(" , w , { urichar | nonascii | escape } , w ;
 baduri2       = "url(" , w , string, w ;
 baduri3       = "url(" , w , badstring ;
 newline       = '\n' ;
 w             = { ' ' | '\t' | newline } ;
 urichar       = '!' | '#' | '$' | '%' | '&' | ? '*' through '[' ?
                 | ? ']' through '~' ? ;
 hexd          = digit | ? 'a' through 'f' ? | ? 'A' through 'F' ? ;
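
As an illustration of how the num production separates NUMBER, PERCENTAGE and DIMENSION, the following sketch matches num with a regular expression and then inspects what follows it. It is not the StreamCssTokenizer implementation: the class, the method and the string result format are hypothetical, and ident is simplified to ASCII characters without escape or nonascii.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumericTokenSketch {
    // num: optional sign, then digits or [digits] '.' digits, then an optional
    // lowercase 'e' exponent. The decimal alternative is tried first so that
    // "1.5" is consumed completely.
    private static final Pattern NUM =
            Pattern.compile("[+-]?(?:[0-9]*\\.[0-9]+|[0-9]+)(?:e[+-]?[0-9]+)?");

    // Simplified ident (assumption): ASCII only, no escapes, no "--" form.
    private static final Pattern IDENT =
            Pattern.compile("-?[_a-zA-Z][_a-zA-Z0-9-]*");

    /**
     * Hypothetical helper: returns the token type and its text for a numeric
     * token at the start of the input, or null if the input does not start
     * with num.
     */
    public static String classify(String input) {
        Matcher num = NUM.matcher(input);
        if (!num.lookingAt()) {
            return null;
        }
        int end = num.end();
        // PERCENTAGE = num , '%'
        if (end < input.length() && input.charAt(end) == '%') {
            return "PERCENTAGE " + input.substring(0, end + 1);
        }
        // DIMENSION = num , ident
        Matcher ident = IDENT.matcher(input);
        ident.region(end, input.length());
        if (ident.lookingAt()) {
            return "DIMENSION " + input.substring(0, ident.end());
        }
        // NUMBER = num
        return "NUMBER " + input.substring(0, end);
    }
}
```

In this sketch the exponent is matched greedily, so "1e3" is read as a NUMBER, while "1em" falls back to a DIMENSION with the unit "em" because no digits follow the 'e'.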
 

References:

CSS Syntax Module Level 3, Paragraph 4. Tokenization
w3.org
CSS Values and Units Module Level 4, Paragraph 4.5 Resource Locators: the <url> type
drafts.csswg.org
Author:
Werner Randelshofer