-
Quick theory around precedence, loops, and recursion.
-
No code.
-
-
Computerphile - Basic theory explanation for a parser, thinking of it as grammar/sentences.
-
No code.
-
-
Chomsky grammar hierarchy:
0. Unrestricted.
-
1. Context-sensitive.
-
2. Context-free.
-
3. Regular.
-
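As a quick hedged illustration (toy productions of my own, not from the source), each level further restricts the allowed rule shapes:

```
0. Unrestricted:       a S b → c         (any string rewrites to any string)
1. Context-sensitive:  a S b → a c b     (S rewrites only in this context)
2. Context-free:       S → a S b | ε     (one nonterminal on the left)
3. Regular:            S → a S | a       (linear rules only)
```
-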
Parsing strategy direction
-
How the tree is built.
-
How the parser is constructed relative to the grammar derivation.
-
Top-down → builds the tree from the start symbol toward the leaves
-
Bottom-up → builds the tree from tokens toward the start symbol
-
Formally, it corresponds to:
-
top-down → leftmost derivation
-
bottom-up → reverse rightmost derivation
-
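A small hedged example with the toy grammar S → A B, A → a, B → b makes the correspondence visible:

```
Leftmost (top-down):      S ⇒ A B ⇒ a B ⇒ a b
Rightmost (reference):    S ⇒ A B ⇒ A b ⇒ a b
Rightmost in reverse
(bottom-up reductions):   a b ⇐ A b ⇐ A B ⇐ S
```
-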
Top-down Parsing
-
A top-down parser discovers and processes the hierarchical tree starting from the top, and incrementally works its way first downwards and then rightwards.
-
Top-down parsing eagerly decides what a construct is much earlier, when it has only scanned the leftmost symbol of that construct and has not yet parsed any of its parts.
-
Builds the parse tree from root to leaves
-
Predicts productions before seeing all input
-
Commonly implemented as recursive descent or LL parsers
-
Pros
-
Simple to implement by hand
-
Easy to understand and debug
-
Good error reporting
-
-
Cons
-
Cannot handle left-recursive grammars without rewriting
-
Less powerful grammar coverage
-
May require grammar refactoring (left factoring)
-
-
Typical use cases
-
Hand-written parsers
-
Simple languages
-
Config formats
-
Bottom-Up Parsing
-
A bottom-up parser discovers and processes the tree starting from the bottom-left end, and incrementally works its way upwards and rightwards.
-
A parser may act on the structure hierarchy's low, mid, and highest levels without ever creating an actual data tree; the tree is then merely implicit in the parser's actions. Bottom-up parsing patiently waits until it has scanned and parsed all parts of some construct before committing to what the combined construct is.
-
If a language grammar has multiple rules that may start with the same leftmost symbols but have different endings, then that grammar can be efficiently handled by a deterministic bottom-up parse but cannot be handled top-down without guesswork and backtracking. So bottom-up parsers in practice handle a somewhat larger range of computer language grammars than deterministic top-down parsers do.
-
Computerphile - Parsing Bottom Up, Shift, Reduction.
-
Looking for a longer and longer string is called shifting.
-
Going up and making it more abstract is called reducing.
-
Bottom-up is basically saying 'I want the longest possible handle'. There was an intuition that bottom-up is more powerful, since you gather more contextual information before committing.
-
-
Builds the parse tree from leaves to root
-
Shifts tokens and reduces using productions
-
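To make shift and reduce concrete, here is a minimal hand-rolled sketch (the toy grammar E → E "+" n | n and the trace format are my own, not from the source):

```python
def shift_reduce(tokens):
    # Toy shift-reduce parse for:  E -> E "+" n  |  n
    # Shift moves the next token onto the stack; reduce replaces a
    # handle on top of the stack with its nonterminal.
    stack, trace = [], []
    tokens = list(tokens)
    while tokens or stack != ["E"]:
        if stack[-3:] == ["E", "+", "n"]:
            stack[-3:] = ["E"]
            trace.append("reduce E -> E + n")
        elif stack[-1:] == ["n"]:
            stack[-1:] = ["E"]
            trace.append("reduce E -> n")
        elif tokens:
            stack.append(tokens.pop(0))
            trace.append("shift " + stack[-1])
        else:
            raise SyntaxError("cannot make progress: " + str(stack))
    return trace
```

Running shift_reduce(["n", "+", "n"]) yields the shift/reduce trace, ending with the start symbol E alone on the stack.
-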
Pros
-
Handles a larger class of grammars
-
Naturally supports left recursion
-
Often more suitable for complex programming languages
-
-
Cons
-
Harder to implement manually
-
Parser tables can be complex
-
Error messages are usually worse
-
-
Typical use cases
-
Production compilers
-
Parser generators (Yacc/Bison)
-
Complex expression grammars
-
Formal Grammar Families
-
What class of grammars can be parsed predictively.
-
It is a formal language classification.
-
"What grammars can be parsed deterministically with limited lookahead?"
-
In practice :
-
LL parsers → top-down
-
Except : Not all top-down parsers are LL
-
Examples:
-
Pratt parser
-
PEG / Packrat
-
-
-
-
LR parsers → bottom-up
-
-
Comparing :
-
LR parsers can handle a larger range of languages and grammars than precedence parsers or top-down LL parsing. This is because the LR parser waits until it has seen an entire instance of some grammar pattern before committing to what it has found. An LL parser has to decide or guess what it is seeing much sooner, when it has only seen the leftmost input symbol of that pattern.
-
LL parser
-
Meaning :
-
Left-to-right scan, performing a leftmost derivation of the sentence.
-
LL(k)
-
L: scan left-to-right
-
L: produce leftmost derivation
-
k: k tokens of lookahead
-
-
LR parser
-
Meaning :
-
Left-to-right scan, rightmost derivation in reverse.
-
LR(k)
-
L: scan left-to-right
-
R: reverse rightmost derivation
-
k: k lookahead
-
-
-
A type of bottom-up parser that analyzes deterministic context-free languages in linear time.
-
This is ideal for computer languages, but LR parsers are not suited to human languages, which need more flexible but inevitably slower methods.
-
-
Reads input text from left to right without backing up.
-
To avoid backtracking or guessing, the LR parser is allowed to peek ahead at k lookahead input symbols before deciding how to parse earlier symbols.
-
Variants :
-
SLR parsers (Simple LR)
-
LALR parsers (Look Ahead LR)
-
Look-ahead, left-to-right, rightmost derivation parser.
-
Shift-reduce parser
-
-
canonical LR(1) parsers
-
minimal LR(1) parsers
-
generalized LR parsers
-
Recursive Descent Parser (RD)
-
It's a top-down parsing strategy where each grammar rule is implemented as a function. It remains one of the most widely used parsing approaches in modern language tooling, especially for hand-written compilers, interpreters, and DSLs.
-
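As a minimal sketch of the rule-as-function idea (the toy grammar and names below are hypothetical, not from the source):

```python
# Toy grammar, one function per rule:
#   Stmt -> "print" Expr
#   Expr -> NUMBER
def parse_stmt(tokens):
    if not tokens or tokens[0] != "print":
        raise SyntaxError("expected 'print'")
    expr, rest = parse_expr(tokens[1:])  # the rule body calls other rules
    return ("print", expr), rest

def parse_expr(tokens):
    if not tokens or not tokens[0].isdigit():
        raise SyntaxError("expected a number")
    return int(tokens[0]), tokens[1:]
```

Each function mirrors one production and returns the parsed node plus the remaining tokens.
-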
RD relation to LL :
-
All predictive recursive descent parsers are LL parsers.
-
But not all recursive descent parsers are LL.
-
-
excellent control over diagnostics
-
easy grammar evolution
-
predictable performance
-
no generator dependency
-
incremental parsing
-
error recovery control
-
partial AST construction
-
fast reparsing
-
Limitations :
-
Recursive descent cannot handle left recursion.
-
Expr → Expr "+" Term
-
This would loop forever in RD.
-
This can be avoided by rewriting the rule iteratively or by pairing RD with a Pratt-style expression parser.
-
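One common fix, as a hedged sketch: rewrite Expr → Expr "+" Term | Term as Expr → Term ("+" Term)* and parse the repetition with a loop (Term is just a number token here, for illustration):

```python
def parse_expr_iter(tokens):
    # Expr -> Term ("+" Term)* : the loop replaces the left recursion.
    pos = 0
    def term():
        nonlocal pos
        val = int(tokens[pos])      # Term is simply a number token here
        pos += 1
        return val
    node = term()
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        node = ("+", node, term())  # still builds a left-associative tree
    return node
```

Note the result is still left-associative, matching what the left-recursive rule expressed.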
-
LL/RD parsers cannot naturally handle highly ambiguous constructs.
T(a);
-
Is it:
-
a variable declaration?
-
a function call?
-
a cast?
-
-
LR parsers handle this more mechanically; RD requires manual disambiguation.
-
-
Deep lookahead languages get messy
-
If your language needs large lookahead, you start writing hacks.
if peek(p).kind == .Something && peek(p,2).kind == .SomethingElse && peek(p,3).kind == ...
-
When this appears frequently, the grammar may be better suited to LR.
-
-
Error recovery is manual work
-
Parser generators give you structured recovery.
-
With RD + Pratt you must design:
-
synchronization tokens
-
recovery points
-
diagnostic strategy
-
-
-
-
"statements are easy with RD; expressions are hard without Pratt".
-
Used in :
-
Go compiler: hand-written RD + precedence climbing for expressions
-
Rust compiler: largely hand-written top-down parser
-
Swift compiler: hand-written RD
-
Extremely common in:
-
embeddable languages
-
game scripting
-
configuration languages
-
DSLs
-
-
IDE parsers and tooling.
-
Plain recursive descent (LL(1)-style)
-
The classic textbook form.
-
Characteristics:
-
one-token lookahead
-
no backtracking
-
grammar must be LL(1)
-
very fast and simple
-
-
Used in :
-
simple languages
-
teaching compilers
-
carefully designed grammars
-
-
Limitations
-
cannot handle left recursion
-
grammar often must be refactored
-
limited expressiveness
-
-
Today: rarely used in pure form for real languages.
-
Recursive descent with manual lookahead (LL(k))
-
Most real-world RD parsers are here.
-
Characteristics:
-
multiple-token lookahead when needed
-
selective disambiguation logic
-
still mostly predictive
-
no full backtracking
-
-
Used in :
-
Go
-
Swift
-
parts of Rust
-
-
Why it works well
-
Most programming languages are designed to be easy to parse with limited lookahead.
-
This is the dominant modern RD style.
-
Recursive descent with backtracking
-
Characteristics:
-
try one rule, rewind on failure
-
conceptually simple
-
can parse more grammars
-
-
Pros
-
flexible
-
easy grammar expression
-
-
Cons
-
potentially exponential
-
poor worst-case performance
-
tricky error messages
-
-
Modern usage
-
Used cautiously in:
-
some DSL parsers
-
quick prototypes
-
parser combinator libraries
-
-
Rare in performance-critical compilers unless heavily controlled.
-
Packrat / PEG parsers (memoized recursive descent)
-
Technically still recursive descent, but with memoization.
-
Characteristics:
-
unlimited lookahead
-
linear time via memoization
-
ordered choice (PEG semantics)
-
handles many ambiguous cases cleanly
-
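A hedged packrat sketch (the toy PEG below is my own construction, not from the source): ordered choice tries the alternatives in order, and memoizing results per input position is what keeps the time linear:

```python
from functools import lru_cache

def packrat_parse(text):
    # Toy PEG:  Expr <- NUMBER "+" Expr / NUMBER   (ordered choice)
    @lru_cache(maxsize=None)          # memoize per position -> linear time
    def expr(pos):
        n = number(pos)
        if n is None:
            return None
        val, after = n
        if text[after:after + 1] == "+":      # try the first alternative
            rest = expr(after + 1)
            if rest is not None:
                rval, rafter = rest
                return val + rval, rafter
        return val, after                     # fall back to the second
    def number(pos):
        end = pos
        while end < len(text) and text[end].isdigit():
            end += 1
        return (int(text[pos:end]), end) if end > pos else None
    result = expr(0)
    return result[0] if result and result[1] == len(text) else None
```

The ordered-choice fallback is what gives PEGs their deterministic, unambiguous semantics.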
-
Where used
-
Modern parsing ecosystems such as:
-
PEG-based language tools
-
some configuration languages
-
research and newer language implementations
-
-
-
Pros:
-
very expressive grammar
-
no LL/LR refactoring
-
deterministic
-
-
Cons:
-
high memory usage
-
different ambiguity semantics
-
can surprise language designers
-
Recursive descent + Pratt (very common hybrid)
-
This is arguably the most common real-world pattern for hand-written languages.
-
Structure:
-
recursive descent → statements/declarations
-
Pratt parser → expressions
-
-
Used because:
-
statements are easy with RD
-
expressions are hard without Pratt
-
clean separation of concerns
-
-
Common in:
-
interpreters
-
scripting languages
-
modern hobby/indie compilers
-
many production hand-written parsers
-
Operator-precedence Parser
-
The term operator-precedence parsing is overloaded:
-
Classical operator-precedence parsers (Floyd, bottom-up) → rare today
-
Pratt parsing (top-down operator precedence) → widely used in hand-written parsers
-
They solve similar problems but are architecturally different.
-
Top-down Operator-Precedence Parser (Pratt Parsing)
-
"Top Down Operator-Precedence", based on recursive descent.
-
LL or LR? :
-
Pratt is a technique inside recursive descent, not a competing parser family like LL vs LR.
-
Pratt is not an alternative to LL or LR.
-
LL and LR are formal grammar families defined by how a parser recognizes productions in a context-free grammar. Pratt parsing is a procedural expression-parsing technique that does not fit either formal family.
-
-
It is a recursive descent parsing technique used to parse expressions with operator precedence and associativity in a very flexible way.
-
Instead of using a fixed grammar with many precedence levels, Pratt parsing assigns each token two behaviors:
-
nud (null denotation) → how the token behaves when it appears at the start of an expression
-
led (left denotation) → how the token behaves when it appears in the middle of an expression (infix/postfix)
-
-
Each operator also has a binding power (precedence) that controls how tightly it binds.
-
The parser then uses a single recursive function that:
-
Parses a prefix expression (via nud)
-
While the next operator has higher precedence than the current context:
-
consumes the operator
-
parses the right-hand side (via led)
-
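The loop above can be sketched as follows (tokens and binding powers are my own choices, not from the source): the nud step handles the prefix position, the led step folds in infix operators, and the binding power bounds the loop:

```python
BP = {"+": 10, "*": 20}  # binding powers: higher binds tighter (hypothetical)

def pratt_parse(tokens):
    pos = 0
    def expr(min_bp):
        nonlocal pos
        tok = tokens[pos]; pos += 1
        # nud: the token at the start of a (sub)expression
        if tok == "-":
            left = ("neg", expr(100))     # prefix minus binds very tightly
        else:
            left = int(tok)               # number literal
        # led: consume infix operators while they bind tighter than min_bp
        while pos < len(tokens) and BP.get(tokens[pos], 0) > min_bp:
            op = tokens[pos]; pos += 1
            left = (op, left, expr(BP[op]))
        return left
    return expr(0)
```

pratt_parse(["1", "+", "2", "*", "3"]) groups the multiplication first, because * carries the higher binding power.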
-
-
Pratt Parsing, Binding-power.
-
A bit of Rust code.
-
-
Top-down
-
Hand-written
-
Function-driven (nud/led)
-
Expression-focused
-
Extremely ergonomic
-
It excels at handling:
-
Binary operators (+ - * /)
-
Unary operators (-x, !x)
-
Postfix operators (x++, x!)
-
Mixed precedences
-
Right/left associativity
-
Custom operators
-
-
Typical use cases
-
Arithmetic/logical expressions
-
Scripting languages
-
DSLs
-
Many production languages' expression subsystems
-
-
Used in :
-
Lua
-
JavaScript engines (hand-written parsers in some implementations)
-
Monkey language / Crafting Interpreters ecosystem
-
-
Even though Pratt is popular, many production compilers instead use:
-
Recursive descent + precedence climbing
-
Used by:
-
Go compiler
-
many C/C++ frontends
-
some Rust tooling
-
-
Very similar in spirit to Pratt but simpler structurally.
-
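For contrast, a hedged precedence-climbing sketch (the table and tokens are my own): one generic loop consults a precedence/associativity table instead of per-token nud/led handlers:

```python
PREC = {"+": (1, "left"), "*": (2, "left"), "^": (3, "right")}  # hypothetical

def climb(tokens):
    pos = 0
    def atom():
        nonlocal pos
        val = int(tokens[pos]); pos += 1
        return val
    def expr(min_prec):
        nonlocal pos
        left = atom()
        while pos < len(tokens) and tokens[pos] in PREC \
                and PREC[tokens[pos]][0] >= min_prec:
            op = tokens[pos]; pos += 1
            prec, assoc = PREC[op]
            # left-assoc: the recursive call must demand strictly higher prec
            left = (op, left, expr(prec + 1 if assoc == "left" else prec))
        return left
    return expr(1)
```

Associativity falls out of whether the recursive call requires strictly higher precedence (left) or the same precedence (right).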
-
LR/LALR parsers
-
Still dominant in many large compilers via tools like Bison.
-
Used historically and currently in parts of:
-
C toolchains
-
SQL parsers
-
many legacy compilers
-
-
-
Limitations :
-
Extremely complex expression grammars can stress Pratt
-
Pratt handles most languages well, but becomes tricky with:
-
mixfix operators
-
user-defined precedence
-
context-sensitive operators
-
-
Still solvable, but requires care.
-
-
Classical Operator-Precedence Parsers (Floyd style)
-
Bottom-up
-
Table-driven
-
Based on precedence relations between terminals
-
Related to shift-reduce parsing
-
Interprets an operator-precedence grammar.
-
Were used in :
-
Some early ALGOL compilers
-
Academic compilers for expression grammars
-
Teaching examples in compiler textbooks.
-
-
Used in :
-
embedded math evaluators
-
spreadsheet formula engines
-
simple rule engines
-
legacy financial systems
-
Reason: small tables, predictable behavior.
-
-
They were quickly displaced by LR-family parsers.
-
Mostly of historical/academic interest today
-
Example: most calculators use operator-precedence parsers to convert from the human-readable infix notation, which relies on order of operations, to a format optimized for evaluation, such as Reverse Polish notation (RPN).
-
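That infix-to-RPN conversion can be sketched with Dijkstra's shunting-yard algorithm, a close relative of table-driven operator-precedence parsing (the precedence table below is my own):

```python
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}  # all left-associative here

def to_rpn(tokens):
    out, ops = [], []
    for tok in tokens:
        if tok in PREC:
            # pop operators that bind at least as tightly (left-assoc)
            while ops and PREC[ops[-1]] >= PREC[tok]:
                out.append(ops.pop())
            ops.append(tok)
        else:
            out.append(tok)              # operands go straight to the output
    while ops:
        out.append(ops.pop())            # flush the remaining operators
    return out
```

Because the algorithm only compares table entries, swapping in a different table at run time changes the language it accepts, which is the run-time-operator-table property described below.
-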
Operator-precedence parsers are not used often in practice; however, they do have some properties that make them useful within a larger design. First, they are simple enough to write by hand, which is not generally the case with more sophisticated shift-reduce parsers. Second, they can be written to consult an operator table at run time, which makes them suitable for languages that can add to or change their operators while parsing. (An example is Haskell, which allows user-defined infix operators with custom associativity and precedence; consequently, an operator-precedence parser must be run on the program after parsing of all referenced modules.)