-
In classical compiler design, a lexer’s job is to convert a flat character stream into a flat token stream. It does not understand nesting or structure beyond very local patterns.
-
In the standard model, the lexer is intentionally simple, and understanding nesting is the parser’s job.
-
From a theoretical lexer design standpoint, the key rule is:
-
A lexer should encode only what is unambiguous at the lexical level, and nothing that depends on grammar or semantics.
-
A lexer must never build language constructs whose shape depends on syntax or meaning, even if the meaning is “known”.
-
For example, consider Vector2(20, 30): knowing that Vector2 is “just two f32s” is semantic knowledge, not lexical knowledge.
-
Examples
Example 1
-
[node name="flor" parent="." unique_id=2138173886 instance=ExtResource("3_7sc02")]
LBRACKET
IDENT(node)
IDENT(name)
EQUAL
STRING("flor")
IDENT(parent)
EQUAL
STRING(".")
IDENT(unique_id)
EQUAL
NUMBER(2138173886)
IDENT(instance)
EQUAL
IDENT(ExtResource)
LPAREN
STRING("3_7sc02")
RPAREN
RBRACKET
Example 2
-
Vector2(506, 323) must be tokenized into multiple tokens, not one.
-
Each token independently fits your { type, value } model.
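As a concrete sketch of that { type, value } model, here is one minimal way to represent it in Python (the `Token` dataclass and the field names are illustrative assumptions, not a fixed API):

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass(frozen=True)
class Token:
    type: str                     # e.g. "IDENTIFIER", "NUMBER_LITERAL", "LEFT_PAREN"
    value: Optional[Any] = None   # payload for literals/identifiers; None for pure punctuation

# Vector2(506, 323) as a flat token stream -- no nesting, no meaning:
tokens = [
    Token("IDENTIFIER", "Vector2"),
    Token("LEFT_PAREN"),
    Token("NUMBER_LITERAL", 506),
    Token("COMMA"),
    Token("NUMBER_LITERAL", 323),
    Token("RIGHT_PAREN"),
]
```

Note that the stream is a flat list: the parentheses are just two more tokens, not a grouping.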
-
Exact theoretical token sequence
-
Token
-
type: IDENTIFIER
-
value: "Vector2"
-
-
Token
-
type: LEFT_PAREN
-
value: "(" or None
-
-
Token
-
type: NUMBER_LITERAL
-
value: 506 (numeric value, not string)
-
-
Token
-
type: COMMA
-
value: "," or None
-
-
Token
-
type: NUMBER_LITERAL
-
value: 323 (numeric value, not string)
-
-
Token
-
type: RIGHT_PAREN
-
value: ")" or None
-
-
That is the full and correct lexical output.
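A lexer producing exactly this kind of output can be sketched as a small regex-driven tokenizer (a minimal illustration; the token spec below is an assumption chosen to cover both examples, not a real grammar, and unmatched characters are simply skipped):

```python
import re

# Ordered token spec: longest/most-specific patterns first.
TOKEN_SPEC = [
    ("NUMBER_LITERAL", r"\d+"),
    ("IDENTIFIER",     r"[A-Za-z_]\w*"),
    ("STRING",         r'"[^"]*"'),
    ("LEFT_PAREN",     r"\("),
    ("RIGHT_PAREN",    r"\)"),
    ("COMMA",          r","),
    ("EQUAL",          r"="),
    ("LBRACKET",       r"\["),
    ("RBRACKET",       r"\]"),
    ("SKIP",           r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    for m in MASTER.finditer(text):
        kind = m.lastgroup
        if kind == "SKIP":
            continue
        value = m.group()
        if kind == "NUMBER_LITERAL":
            value = int(value)       # numeric value, not string
        elif kind == "STRING":
            value = value[1:-1]      # strip the surrounding quotes
        yield (kind, value)

print(list(tokenize("Vector2(506, 323)")))
# [('IDENTIFIER', 'Vector2'), ('LEFT_PAREN', '('), ('NUMBER_LITERAL', 506),
#  ('COMMA', ','), ('NUMBER_LITERAL', 323), ('RIGHT_PAREN', ')')]
```

The same spec also covers Example 1: `[node name="flor" ...]` falls out of the LBRACKET/IDENT/EQUAL/STRING rules with no special-casing, because the lexer never asks what any of it means.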
-
Parser responsibility (high level):
-
The parser’s job is to:
-
Consume tokens according to grammar rules
-
Establish structure and relationships
-
Produce an AST, not values
-
-
The parser does not:
-
Decide what Vector2 means
-
Construct arrays
-
Convert to f32
-
Perform semantic validation
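To make that boundary concrete, here is a sketch of a parser for the call shape IDENTIFIER '(' NUMBER (',' NUMBER)* ')' over (type, value) token tuples. The `Call` and `Number` node names are hypothetical; the point is that the output is pure structure, with no f32 conversion and no opinion on what Vector2 means:

```python
from dataclasses import dataclass

@dataclass
class Number:
    value: int            # carried as-lexed; no conversion to f32 here

@dataclass
class Call:
    callee: str           # just a name; the parser does not resolve it
    args: list

def parse_call(tokens):
    """Consume tokens per the grammar rule and build an AST node."""
    pos = 0
    def expect(kind):
        nonlocal pos
        actual, value = tokens[pos]
        assert actual == kind, f"expected {kind}, got {actual}"
        pos += 1
        return value
    callee = expect("IDENTIFIER")
    expect("LEFT_PAREN")
    args = [Number(expect("NUMBER_LITERAL"))]
    while tokens[pos][0] == "COMMA":
        expect("COMMA")
        args.append(Number(expect("NUMBER_LITERAL")))
    expect("RIGHT_PAREN")
    return Call(callee, args)

ast = parse_call([
    ("IDENTIFIER", "Vector2"), ("LEFT_PAREN", "("),
    ("NUMBER_LITERAL", 506), ("COMMA", ","),
    ("NUMBER_LITERAL", 323), ("RIGHT_PAREN", ")"),
])
# ast is Call("Vector2", [Number(506), Number(323)]) -- shape, not meaning
```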
-
-
-
Where values are manipulated
-
Values are first legitimately created, evaluated, and manipulated during semantic analysis / constant evaluation, after parsing but before code generation.
-
Semantic analysis / constant evaluation / lowering
-
This phase may have different names, but conceptually it is where:
-
Symbols are resolved
-
Types are assigned
-
Expressions may be evaluated
-
Constants may be folded
-
Builtins may be lowered
-
IR-friendly representations are produced
-
-
This is the first phase allowed to manipulate actual values.
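A tiny constant-evaluation/lowering pass might look like the following sketch. The `BUILTINS` table, the `lower_call` helper, and the `"vec2_const"` IR tag are all assumptions made up for illustration; what matters is that this is the first place the name Vector2 is resolved and its arguments become actual values:

```python
# Hypothetical lowering table: callee name -> rule producing an IR-friendly form.
BUILTINS = {
    "Vector2": lambda args: ("vec2_const", [float(a) for a in args]),
}

def lower_call(callee, arg_values):
    rule = BUILTINS.get(callee)
    if rule is None:
        # Symbol resolution happens here, not in the lexer or parser.
        raise NameError(f"unresolved symbol: {callee}")
    # Evaluation + lowering: arguments are folded into concrete floats.
    return rule(arg_values)

print(lower_call("Vector2", [506, 323]))  # ('vec2_const', [506.0, 323.0])
```

Only at this stage is it legitimate to use the semantic fact that Vector2 is “just two f32s”.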
-
-