The naive CSE rule would be
e'[e, e] ~> let x = e in e'[x, x].
That is, whenever a subexpression e
occurs twice in the expression e'
, we use a let-binding to compute e
once. This however leads itself to some trivial space leaks. For example
sum [1..n] + prod [1..n]
is typically O(1)
space usage in a lazy functional programming language like Haskell (as sum
and prod
would tail-recurse and blah blah blah), but would become O(n)
when the naive CSE rule is enacted. This can be terrible for programs when n
is high!
The approach is then to make this rule more specific, restricting it to a small set of cases that we know won't have the problem. We can begin by more specifically enumerating the problems with the naive rule, which will form a set of priorities for us to develop a better CSE:
The two occurrences of e
might be far apart in e'
, leading to a long lifetime for the let x = e
binding.
The let-binding must always allocate a closure where previously there might not have been one.
This can create an unbound number of closures.
There are cases where the closure might never deallocate.
Something better
let x = e in e'[e] ~> let x = e in e'[x]
This is a more conservative rule but is much safer. Here we recognize that e
appears twice but the first occurrence syntactically dominates the second expression, meaning here that the programmer has already introduced a let-binding. We can safely just reuse that let-binding and replace the second occurrence of e
with x
. No new closures are allocated.
Another example of syntactic domination:
case e of { x -> e'[e] } ~> case e of { x -> e'[x] }
And yet another:
case e of {
Constructor x0 x1 ... xn ->
e'[e]
}
~>
case e of {
Constructor x0 x1 ... xn ->
e'[Constructor x0 x1 ... xn]
}
These rules all take advantage of existing structure in the program to ensure that the kinetics of space usage remain the same before and after the transformation. They are much more conservative than the original CSE but they are also much safer.
See also
For a full discussion of CSE in a lazy FPL, read Chitil's (very accessible) 1997 paper. For a full treatment of how CSE works in a production compiler, see GHC's CSE.hs module, which is documented very thoroughly thanks to GHC's tradition of writing long footnotes. The comment-to-code ratio in that module is off the charts. Also note how old that file is (1993)!
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…