Constant folding is a compiler optimization technique that evaluates constant expressions at compile time and replaces them with their computed values, reducing runtime computation overhead.
Constant Folding in Traditional Compilers
Traditional compilers identify constant expressions during compilation and replace them with precomputed results. For example:
seconds_per_day = 24 * 60 * 60
This is optimized to:
seconds_per_day = 86400
In CPython, this optimization is visible in the bytecode:
import dis
dis.dis("seconds_per_day = 24*60*60")
Output:
1 0 LOAD_CONST 0 (86400)
2 STORE_NAME 0 (seconds_per_day)
4 LOAD_CONST 1 (None)
6 RETURN_VALUE
The interpreter directly loads the constant 86400, avoiding three loads and two multiplications at runtime.
For an expression to be eligible for constant folding, all its subexpressions must be constants. This often requires constant propagation, which substitutes variables with their known constant values.
Consider this code:
int x = 14;
int y = 7 - x / 2;
return y * (28 / x + 2);
After constant propagation (replacing x with 14):
int x = 14;
int y = 7 - 14 / 2;
return y * (28 / 14 + 2);
Now, both 7 - 14/2 and 28/14 + 2 are constant expressions and can be folded:
int x = 14;
int y = 0;
return y * 4;
Further propagation replaces y with 0:
int x = 14;
int y = 0;
return 0 * 4;
Finally, 0 * 4 folds to 0:
int x = 14;
int y = 0;
return 0;
Constant propagation relies on data-flow analysis over the control flow graph (CFG). For each basic block n, the set of reachable definitions Reaches(n) is computed as:
\[ Reaches(n) = \bigcup_{m \in preds(n)} \left( DEDef(m) \cup (Reaches(m) \cap \overline{DefKill(m)}) \right) \]
Where:
preds(n): predecessors of blocknDEDef(m): definitions in blockmthat reach its exit without redefinitionDefKill(m): definitions killed (redefined) in blockm
A definition is reachable at a use site if it appears in Reaches(n) and hasn't been redefined along the path from entry to the use.
Note: language-specific rules affect what qualifies as a compile-time constant. In Java, only final variables initialized with constant expressions are treated as compile-time constants. Thus:
String a = "a";
String bc = "bc";
String s1 = a + bc; // Not folded—uses StringBuilder at runtime
Languages also impose practical limits. In CPython:
2**64is folded (result fits in internal representation), but4**64is not:
dis.dis("x = 2**64") # LOAD_CONST with 18446744073709551616
dis.dis("x = 4**64") # BINARY_POWER remains
Similarly, string repetition is only folded if the result length ≤ 4096:
dis.dis("s = '-' * 4097") # BINARY_MULTIPLY remains
Constant Folding in AI Compilers
AI compilers apply constant folding to computation graphs rather then abstract syntax trees. A node is folded if its inputs are compile-time constants or become constants after prior folding.
Common foldable patterns include:
- Constant-input operations: If all inputs to an op (e.g.,
AddN) are constant tensors, compute the result at compile time and replace the op with a constant node. - Shape-dependent ops: Operations like
SizeorShapedepend only on input tensor shapes. If shapes are known statically, these ops are replaced with constant values. For a tensor of shape(1,2,3,4),Size()becomes the constant24. - Chained folding: Even if an op’s inputs aren’t initially constant, they may become so after folding upstream nodes. For example, if
AddN1is folded to a constant, andAddN2consumes it along with another constant,AddN2can then be folded too.
However, AI compilers often avoid folding very large constants to prevent graph bloat.
Implementation Examples
Traditional Compiler (CPython)
CPython uses astfold_expr to recursively traverse the AST. For binary operations:
- Fold left and right operands recursively
- If both are constants, call
fold_binop - Perform the operation (e.g., via
PyNumber_Add) - Replace the AST node with the result
AI Compiler (TensorFlow)
TensorFlow’s constant folding proceeds as follows:
- Traverse the graph in reverse post-order (ensuring inputs are processed before consumers)
- Identify constant nodes and foldable shape ops (e.g.,
Shape,Size) - Build a "constant subgraph" containing all foldable nodes and their dependencies
- Mark nodes whose outputs feed non-constant parts of the graph as "to be folded"
- For each such node, execute it in a sandboxed environment (since all inputs are constants) and replace it with the resulting constant tensor
This approach ensures correctness while maximizing optimizaton opportunities across complex computation graphs.