The lexer is composed of the following rules:
Contrary to CSS 3, which allows any encoding as long as the first 128 bytes match ASCII sufficiently, CSS Preprocessor only accepts UTF-8. This is because (1) nearly all CSS files are ASCII anyway and therefore already UTF-8 compatible, and (2) the Snap! Websites environment uses UTF-8 throughout all of its documents (although text data in memory may use a different format such as UTF-16 or UTF-32).
The input stream is checked for invalid data. The lexer generates an error if an invalid character is found. Characters that are considered invalid are:
Note that parsing continues after such errors. However, if one or more errors occurred while parsing an input stream, you should not use the output since it is likely invalid.
An example using the CSS Preprocessor lexer with a string:
A valid character is any character code point defined between 0x000000 and 0x10FFFF inclusive.
The input-stream defines a small set of characters within that range that are considered invalid in CSS Preprocessor streams. Any character considered invalid is replaced by the 0xFFFD code point so the rest of the implementation does not have to check for invalid characters each time.
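The replacement step described above can be sketched as follows. This is only an illustration: the function names and the `is_invalid` callback are hypothetical and are not part of the CSS Preprocessor API, and the actual set of invalid characters is the one defined by the input-stream rules, not by this sketch.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Replacement code point named in the text above.
constexpr char32_t REPLACEMENT_CHARACTER = 0xFFFD;

// True when c lies in the valid range 0x000000..0x10FFFF inclusive.
constexpr bool in_valid_range(char32_t c)
{
    return c <= 0x10FFFF;
}

// Replace every character that is out of range, or that the caller's
// predicate flags as invalid, with 0xFFFD.
// NOTE: `is_invalid` is a hypothetical hook standing in for the real
// list of invalid characters, which this sketch does not reproduce.
inline std::u32string sanitize(std::u32string input,
                               std::function<bool(char32_t)> const & is_invalid)
{
    for(auto & c : input)
    {
        if(!in_valid_range(c) || is_invalid(c))
        {
            c = REPLACEMENT_CHARACTER;
        }
    }
    return input;
}
```

Because invalid characters are rewritten once at the input level, later stages only ever see code points they can handle, plus 0xFFFD.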
An ASCII character is any value between 0 and 127 inclusive.
CSS 3 references ASCII and non-ASCII characters.
A NON-ASCII character is any valid character code point over 127.
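The ASCII and NON-ASCII definitions above translate into two simple range checks. The function names are illustrative only, and the additional characters that the input stream marks as invalid are not modeled here.

```cpp
#include <cassert>

// ASCII: any value between 0 and 127 inclusive.
constexpr bool is_ascii(char32_t c)
{
    return c <= 127;
}

// NON-ASCII: any code point over 127, up to 0x10FFFF inclusive.
// Characters flagged as invalid by the input stream are not modeled.
constexpr bool is_non_ascii(char32_t c)
{
    return c > 127 && c <= 0x10FFFF;
}

// Compile-time checks of the boundaries.
static_assert(is_ascii(127), "127 is the last ASCII value");
static_assert(is_non_ascii(128), "128 is the first NON-ASCII value");
```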
Note that "anything" means any character that is not considered invalid by the CSS Preprocessor implementation.
The CSS Preprocessor returns C++ comments that appear on consecutive lines as a single C++ comment token. This is done because a comment can be marked with @preserve in order to keep it in the output. In most cases this is used for the comment at the top or bottom of the document that holds the copyright notice.
CSS Preprocessor counts lines starting at one and increments the counter each time a "\n", "\r", or "\r\n" sequence is found.
CSS Preprocessor resets the line counter back to one and increments the page counter by one each time a "\f" is found.
CSS Preprocessor also counts the total number of lines and pages in a separate counter.
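The counting rules above can be sketched as a small loop. The `position_counter` type and its field names are hypothetical, not the CSS Preprocessor API. The sketch also assumes that the page counter starts at one and that a form feed does not add to the total line count; the text above does not settle either point.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical counter type; names are illustrative only.
struct position_counter
{
    int line = 1;         // current line, restarts at 1 on each "\f"
    int page = 1;         // current page (assumed to start at 1)
    int total_lines = 1;  // lines seen across all pages (assumption)
};

inline position_counter count_position(std::string const & input)
{
    position_counter p;
    for(std::size_t i = 0; i < input.size(); ++i)
    {
        char const c = input[i];
        if(c == '\r')
        {
            // "\r\n" counts as a single newline
            if(i + 1 < input.size() && input[i + 1] == '\n')
            {
                ++i;
            }
            ++p.line;
            ++p.total_lines;
        }
        else if(c == '\n')
        {
            ++p.line;
            ++p.total_lines;
        }
        else if(c == '\f')
        {
            // new page: the line counter restarts at one
            p.line = 1;
            ++p.page;
        }
    }
    return p;
}
```

Keeping a separate total is what makes it possible to report an absolute position even after a form feed has reset the per-page line number.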
The line number is used to print out errors. If you use paging (\f), you may have a harder time finding your errors in the current version.
Note that line counting also happens in C++ and C-like comments.
URLs do not accept non-printable characters unless they are written between quotes. This rule shows which characters are considered non-printable in CSS 3.
Whitespaces are quite important in CSS since they are required in many cases. For example, a dash (-) can start an identifier, so you want to add a space after a dash if you want to use the minus sign.
The CSS Preprocessor documentation often references the WHITESPACE token to mean any number of whitespace characters, including zero. It may be written as WHITESPACE* (zero or more whitespace characters) or WHITESPACE+ (one or more whitespace characters) to be more explicit.
CSS 3 defines a whitespace, a whitespace-token, and a ws* to represent all those possibilities.
A hexadecimal digit is any digit (0-9) or a letter from A to F, either in lowercase (a-f) or in uppercase (A-F).
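As a minimal sketch of that definition (the function names are illustrative, not taken from the CSS Preprocessor sources):

```cpp
#include <cassert>

// True for 0-9, a-f and A-F.
constexpr bool is_hexdigit(char32_t c)
{
    return (c >= U'0' && c <= U'9')
        || (c >= U'a' && c <= U'f')
        || (c >= U'A' && c <= U'F');
}

// Numeric value of a hexadecimal digit, or -1 when c is not one.
constexpr int hexdigit_value(char32_t c)
{
    return (c >= U'0' && c <= U'9') ? static_cast<int>(c - U'0')
         : (c >= U'a' && c <= U'f') ? static_cast<int>(c - U'a') + 10
         : (c >= U'A' && c <= U'F') ? static_cast<int>(c - U'A') + 10
         : -1;
}
```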
An escape allows hexadecimal or direct escaping of any character except the newline character (a '\' followed by any newline is not a valid escape). Strings are an exception to this newline rule.
The hexadecimal syntax allows for any number from 0 to 0xFFFFFF. However, the same constraint applies to escape characters and only code points that are considered valid from the input stream are considered valid in an escape sequence. This means any character between 0 and 0x10FFFF except those marked as invalid in the input-stream section.
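The range constraint on hexadecimal escapes can be illustrated with the sketch below. The function name is hypothetical and the code requires C++17. It only checks the number of digits and the 0x10FFFF upper bound; the extra characters that the input-stream section marks as invalid are deliberately left out.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Decode the hexadecimal digits of an escape (the part after the '\').
// Returns the code point, or std::nullopt when there are no digits,
// more than six digits, a non-hexadecimal character, or a value above
// 0x10FFFF. Other invalid characters are not modeled in this sketch.
inline std::optional<char32_t> decode_hex_escape(std::string_view digits)
{
    if(digits.empty() || digits.size() > 6)
    {
        return std::nullopt;
    }
    char32_t value = 0;
    for(char const d : digits)
    {
        int v = -1;
        if(d >= '0' && d <= '9')      v = d - '0';
        else if(d >= 'a' && d <= 'f') v = d - 'a' + 10;
        else if(d >= 'A' && d <= 'F') v = d - 'A' + 10;
        if(v < 0)
        {
            return std::nullopt;
        }
        value = value * 16 + static_cast<char32_t>(v);
    }
    if(value > 0x10FFFF)
    {
        // syntactically fine (up to 0xFFFFFF) but not a valid code point
        return std::nullopt;
    }
    return value;
}
```

For instance, the digits "41" decode to the letter 'A', while "FFFFFF" fits the syntax but is rejected because it lies above 0x10FFFF.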
We do not have any exceptions for identifiers. Our lexer returns the same identifiers as CSS 3 allows. Note that there is one extension compared to CSS 2.x: characters 0x80 to 0x9F are accepted in identifiers.
Note that since you can write an identifier using escape characters, it can really be composed of any character except NEWLINE and invalid characters. This allows for CSS to represent any character that can be used in an attribute value in HTML.
A FUNCTION token is an IDENTIFIER immediately followed by an open parenthesis. No WHITESPACE is allowed between the IDENTIFIER and the parenthesis.
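The "immediately followed" rule can be sketched as a one-character lookahead. The enumeration and function names are hypothetical, and the sketch assumes the identifier itself has already been read (C++17 for `std::string_view`).

```cpp
#include <cassert>
#include <string_view>

// Hypothetical result type for this sketch.
enum class identifier_kind { IDENTIFIER, FUNCTION };

// `following` is the input right after the last character of the
// identifier. Only an immediate '(' turns it into a FUNCTION; any
// whitespace in between leaves it as a plain IDENTIFIER.
inline identifier_kind classify_identifier(std::string_view following)
{
    return !following.empty() && following.front() == '('
                ? identifier_kind::FUNCTION
                : identifier_kind::IDENTIFIER;
}
```

With this rule, "rgb(" yields a FUNCTION while "rgb (" yields an IDENTIFIER followed by other tokens.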
Various CSS definitions require an AT-KEYWORD. Note that the keyword can be any full identifier: it may start with a dash, include escape sequences, etc.
Our extensions generally make use of AT-KEYWORD commands to extend the capabilities of CSS.
The PLACEHOLDER is a CSS Preprocessor extension allowing for the definition of rules that do not get included in your CSS unless they get referenced.
Variables are a CSS Preprocessor extension, very similar to the variables defined in the SASS language (and also to variables in PHP).
The name of a variable is very limited on purpose.
Just like regular identifiers followed by a '(', we view variables immediately followed by a '(' as Variable Functions.
The name of a variable function is very limited on purpose.
The HASH token is like an identifier, except that its first character is not limited.
Strings can be written between '...' or "...". The string can contain a quote if properly escaped. You may either escape the quote itself or use the corresponding hexadecimal encoding:
Of course, you can use ' in a string quoted with " and vice versa.
Strings accept a backslash followed by a newline to insert a newline in the string and write that string on multiple lines. In other words, the backslash is removed, but not the '\n' character.
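A minimal sketch of that continuation rule, applied to the body of a string, is shown below. The function name is hypothetical. The sketch only handles a backslash followed by '\n' and leaves every other escape untouched, where a full lexer would decode them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Drop a backslash that is immediately followed by '\n'; keep the '\n'.
// Other escapes are copied verbatim (this sketch does not decode them).
inline std::string apply_line_continuation(std::string const & body)
{
    std::string result;
    result.reserve(body.size());
    for(std::size_t i = 0; i < body.size(); ++i)
    {
        if(body[i] == '\\' && i + 1 < body.size())
        {
            if(body[i + 1] == '\n')
            {
                // skip the backslash; the newline is copied on the
                // next iteration
                continue;
            }
            // copy any other escape as a pair so an escaped backslash
            // followed by a newline is not mistaken for a continuation
            result += body[i];
            result += body[i + 1];
            ++i;
            continue;
        }
        result += body[i];
    }
    return result;
}
```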
The URL token is quite peculiar in CSS. This keyword has been available since CSS 1, which did not really offer functions per se. For that reason it allows some backward compatible syntax, which would certainly be quite different in CSS 3 had they chosen not to allow unquoted URLs (i.e. to only allow quoted URLs).
Also because of that, the URL is a special token and not a function. Note that the syntax allows for an empty URL, which is important to be able to cancel a previous URL definition (for example, to overwrite a background image with nothing).
The URL can include nearly any character except spaces, parentheses, quotes, and the backslash. To include such characters you may either ESCAPE them or use quotes.
CSS 3 distinguishes between integers and floating point numbers, although an integer is defined simply as a floating point number with no decimal digits after the period and no exponent.
The CSS Preprocessor lexer returns two different types of tokens: INTEGER and DECIMAL_NUMBER. The compiler may force the use of one or the other in a few places where the type has to be an INTEGER or a DECIMAL_NUMBER. For example, a PERCENT number always uses a DECIMAL_NUMBER.
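The distinction between the two token types can be sketched as follows. The enumeration reuses the token names from the text, but the type and function are hypothetical. The sketch assumes the input is already a well-formed number and treats any period or exponent as making it a DECIMAL_NUMBER (C++17).

```cpp
#include <cassert>
#include <string_view>

// Hypothetical type mirroring the two token names used above.
enum class number_type { INTEGER, DECIMAL_NUMBER };

// Assumes `text` is a well-formed number such as "-5", "+3.0" or "1e3".
// A period or an exponent marker makes it a DECIMAL_NUMBER (assumption).
inline number_type classify_number(std::string_view text)
{
    for(char const c : text)
    {
        if(c == '.' || c == 'e' || c == 'E')
        {
            return number_type::DECIMAL_NUMBER;
        }
    }
    return number_type::INTEGER;
}
```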
The CSS 3 lexer is expected to include the signs as part of a number (to simplify the rest of the grammar.) This is important because otherwise rules such as a background field would look like expressions:
Here the +3px and -5px are viewed as two distinct numbers. If we were to make the + and - operators instead of part of the numbers, these two numbers would look like a subtraction (3px - 5px). When you write expressions, you should in any case always add spaces around your operators. Another case that may catch you is negating the result of a function call. Without the space, the dash becomes part of the function name. In the following, you are calling a function named 'color-saturate' instead of subtracting 'saturate($color, -33%)' from '$color':
The correct expression would be:
Example of numbers:
When a number is immediately followed by an identifier, the result is a dimension.
Note that the name of a dimension can start with the character 'e' (e.g. "13em"); however, if the character 'e' is followed by a sign ('+' or '-') or a DIGIT, then the 'e' is taken as the exponent character.
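The lookahead that separates an exponent from a dimension name can be sketched like this. The function names are hypothetical, and accepting an uppercase 'E' is an assumption borrowed from CSS 3, since the text above only mentions 'e' (C++17).

```cpp
#include <cassert>
#include <string_view>

constexpr bool is_digit(char c)
{
    return c >= '0' && c <= '9';
}

// `rest` is the input immediately after the digits of a number.
// Returns true when it starts an exponent; false when a leading 'e'
// begins a dimension name instead, or when there is no 'e' at all.
inline bool starts_exponent(std::string_view rest)
{
    // uppercase 'E' is accepted as an assumption based on CSS 3
    if(rest.size() < 2 || (rest[0] != 'e' && rest[0] != 'E'))
    {
        return false;
    }
    char const next = rest[1];
    return next == '+' || next == '-' || is_digit(next);
}
```

So in "13em" the remainder "em" is a dimension name, while in "13e3" the remainder "e3" is an exponent.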
The lexer lets you enter any dimension. At some point the compiler will make sure that all dimensions are understood by CSS. That being said, the CSS Preprocessor is likely to understand many other dimensions and convert them to one that CSS 3 understands.
You may find a complete list of supported CSS 3 dimensions here: http://www.w3.org/TR/css3-values/
A PERCENT is a number immediately followed by the '%' character. Internally a PERCENT is always represented as a decimal number, even if the number was an integer (integers are automatically converted as required).
Note that the percent character can be appended using the escape character. In that case, it is viewed as a dimension which will fail validation.
Define a range of Unicode characters from their code points. The expression allows for:
The mask mechanism actually generates a range like the third syntax, only it replaces the '?' character with '0' for the start code point and with 'f' for the end code point.
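The mask expansion just described can be sketched as follows. The function name is hypothetical and the code requires C++17. The sketch takes the part that follows "U+", limits it to six characters (an assumption based on CSS 3), and does not check that the resulting code points are valid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// Expand a mask such as "4??" into a (start, end) pair:
// '?' becomes '0' for the start code point and 'f' for the end.
// Returns std::nullopt when the mask is empty, longer than six
// characters, or contains anything but hexadecimal digits and '?'.
inline std::optional<std::pair<std::uint32_t, std::uint32_t>>
mask_to_range(std::string const & mask)
{
    if(mask.empty() || mask.size() > 6)
    {
        return std::nullopt;
    }
    std::string start(mask);
    std::string end(mask);
    for(std::size_t i = 0; i < mask.size(); ++i)
    {
        char const c = mask[i];
        bool const hex = (c >= '0' && c <= '9')
                      || (c >= 'a' && c <= 'f')
                      || (c >= 'A' && c <= 'F');
        if(c == '?')
        {
            start[i] = '0';
            end[i] = 'f';
        }
        else if(!hex)
        {
            return std::nullopt;
        }
    }
    return std::make_pair(
        static_cast<std::uint32_t>(std::stoul(start, nullptr, 16)),
        static_cast<std::uint32_t>(std::stoul(end, nullptr, 16)));
}
```

For example, the mask "4??" expands to the range 0x400 to 0x4FF.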
A Unicode range is used by @font-face definitions to limit the number of characters to be loaded for a page.
The include match operator ('~=') matches when the value on the right is one of the words in the value on the left. The value on the left is a list of whitespace separated words (e.g. a list of classes).
The dash match operator ('|=') matches when the first element of the hyphen separated list of words on the left is equal to the value on the right.
The prefix match operator ('^=') matches when the value on the left starts with the value on the right.
The suffix match operator ('$=') matches when the value on the left ends with the value on the right.
The substring match operator ('*=') matches when the value on the right is found in the value on the left.
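The five match rules above can be written as small string predicates. This is only a sketch of the matching semantics as worded here, not of how the lexer tokenizes the operators, and the function names are hypothetical. Treating an empty right-hand value as never matching for the prefix, suffix, and substring cases follows the CSS 3 selectors specification.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Include match: `value` is one of the whitespace separated words.
inline bool include_match(std::string const & attr, std::string const & value)
{
    std::istringstream words(attr);
    std::string word;
    while(words >> word)
    {
        if(word == value)
        {
            return true;
        }
    }
    return false;
}

// Dash match: the first element of the hyphen separated list equals
// `value` (find() returns npos without a hyphen, so the whole string
// is compared in that case).
inline bool dash_match(std::string const & attr, std::string const & value)
{
    return attr.substr(0, attr.find('-')) == value;
}

// Prefix match: `attr` starts with `value`.
inline bool prefix_match(std::string const & attr, std::string const & value)
{
    return !value.empty() && attr.compare(0, value.size(), value) == 0;
}

// Suffix match: `attr` ends with `value`.
inline bool suffix_match(std::string const & attr, std::string const & value)
{
    return !value.empty()
        && attr.size() >= value.size()
        && attr.compare(attr.size() - value.size(), value.size(), value) == 0;
}

// Substring match: `value` appears somewhere in `attr`.
inline bool substring_match(std::string const & attr, std::string const & value)
{
    return !value.empty() && attr.find(value) != std::string::npos;
}
```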
The CSS 3 documentation says:
At this point I am not too sure whether that means it is only a lexer artifact or whether it would be an operator people can use.
All of that being said, since we support the Logical AND '&&' (a CSS Preprocessor extension), we accept this operator as the Logical OR in our expressions.
The OR operator takes two boolean values. If at least one of them is true, the result is true; otherwise it is false.
You may also use the 'or' identifier.
The '&&' operator is the logical AND operator. It returns true when its left-hand and right-hand sides are both true.
You may also use the 'and' identifier.
CSS 3 clearly uses '=' to test for equality. Somehow, SASS added '==' which is really not consistent. To be more compatible with SASS, we support both. At this point we do not warn or anything when '==' is found. We may do so later. Internally, we immediately convert '==' to the exact same token as '='.
We offer a 'not equal' operator for our expressions and also attributes.
The attribute extension is converted by the compiler to valid CSS as in:
We added the ':=' operator to allow one to set a variable within an expression. For example, you could write an assignment of a long expression, then reuse that value many times in the rest of the expression:
We added the '<=' operator to allow values in an expression to be compared with each other.
We added the '>=' operator to allow values in an expression to be compared with each other.
The '**' operator is an extension that allows you to raise a number (left-hand side) to the power of another (right-hand side). Note that dimensions (numbers with a unit) cannot be used with the '**' operator.
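The restriction on dimensions can be illustrated with the sketch below. The `number_t` structure and the `power()` function are hypothetical and are not the compiler's actual types. The sketch simply refuses any operand that carries a unit (C++17).

```cpp
#include <cassert>
#include <cmath>
#include <optional>
#include <string>

// Hypothetical value type: a number with an optional unit ("" = none).
struct number_t
{
    double      value;
    std::string unit;
};

// Compute lhs ** rhs, or return std::nullopt when either operand is a
// dimension, since dimensions cannot be used with '**'.
inline std::optional<double> power(number_t const & lhs, number_t const & rhs)
{
    if(!lhs.unit.empty() || !rhs.unit.empty())
    {
        return std::nullopt;
    }
    return std::pow(lhs.value, rhs.value);
}
```

Returning an empty optional leaves it to the caller to decide how the error is reported.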
The Comment Document Open is recognized so that it can be skipped when reading a block of data coming from an HTML <style> tag.
The Comment Document Close is recognized so that it can be skipped when reading a block of data coming from an HTML <style> tag.
The DELIMITER is activated for any character that does not activate any other lexer rule.
For example, a period that is not followed by a DIGIT is returned as itself. The grammar generally shows delimiters using simple quoted strings rather than their node_type_t names.
The delimiters actually return a specific node_type_t value for each one of these characters:
Any character that does not match one of these DELIMITER characters, or another lexer token, generates an immediate lexer error.
Documentation of CSS Preprocessor.
This document is part of the Snap! Websites Project.
Copyright by Made to Order Software Corp.