To extend a simple calculator to evaluate full arithmetic expressions—including those with mixed operators and parentheses—we structure the grammar in hierarchical layers, each representing a different precedence level. This ensures correct evaluation order without requiring explicit parentheses in every case.
The grammar is divided into three nonterminals: Expr, Factor, and Term, ordered from lowest to highest precedence.
- Expr handles addition and subtraction.
- Factor handles multiplication and division.
- Term represetns the most basic units: numbers or parenthesized expressions.
Here’s the grammar definition:
use std::str::FromStr;
grammar;
pub Expr: i32 = {
<l:Expr> "+" <r:Factor> => l + r,
<l:Expr> "-" <r:Factor> => l - r,
Factor,
};
Factor: i32 = {
<l:Factor> "*" <r:Term> => l * r,
<l:Factor> "/" <r:Term> => l / r,
Term,
};
Term: i32 = {
Num,
"(" <Expr> ")",
};
Num: i32 = {
r"[0-9]+" => i32::from_str(<>).unwrap(),
};
This structure enforces precedence naturally. For example, in the expression 2 + 3 * 4, the parser must match the multiplication before the addition. Since * operates on Term values and + operates on Factor values, the multiplication is parsed first—because a Factor cannot contain an Expr on its left side. This prevents the incorrect interpretation where addition would be performed before multiplication.
The hierarchy ensures that only valid parse trees are possible. Any attempt to parse 2 + 3 * 4 as (2 + 3) * 4 fails because the left operand of the multiplication would need to be an Expr, which is not allowed within a Factor.
An alternative approach using LALRPOP’s built-in attributes simplifies the grammar when dealing with many precedence levels. Instead of nesting nonterminals, we define a single Expr with explicit precedence and associativity annotations:
pub Expr: i32 = {
#[precedence(level = "0")] // Highest precedence
Term,
#[precedence(level = "1")] #[assoc(side = "left")]
<l:Expr> "*" <r:Expr> => l * r,
<l:Expr> "/" <r:Expr> => l / r,
#[precedence(level = "2")] #[assoc(side = "left")]
<l:Expr> "+" <r:Expr> => l + r,
<l:Expr> "-" <r:Expr> => l - r,
};
Here, lower level values indicate higher precedence. So Term (level 0) binds tightest, followed by multiplication/division (level 1), then addition/subtraction (level 2). This means 13 + 7 * 3 is correctly interpreted as 13 + (7 * 3).
The #[assoc(side = "left")] annotation ensures left associativity, which is essential for avoiding ambiguity in chains like 1 + 2 + 3. Without it, the parser might ambiguously interpret the expression as either (1 + 2) + 3 or 1 + (2 + 3). Left associativity forces the leftmost operation to bind first, matching standard mathematical convention.
Only the top-level nonterminal (Expr) is marked pub, as it is the only one intended for external access. Marking helper nonterminals as public generates unnecessary boilerplate (like new() methods), which can trigger unused-function warnings. If such warnings arise, remove pub from any nonterminal that is not the entry point.
Parentheses are handled by allowing Term to recursively contain an Expr wrapped in ( ), enabling users to override default precedence explicitly—e.g., (2 + 3) * 4.