(++)
's notoriously bad asymptotics come about when you use it in a left-associative manner - that is, when (++)
's left argument is the result of another call to (++)
. Right-associative expressions run efficiently, though.
To put it more concretely: evaluating a left-nested append of m
lists like
((ws ++ xs) ++ ys) ++ zs -- m = 3 in this example
to WHNF requires you to force m
thunks, because (++)
is strict in its left argument.
case (
case (
case ws of { [] -> xs ; (w:ws) -> w:(ws ++ xs) }
) of { [] -> ys ; (x:xs) -> x:(xs ++ ys) }
) of { [] -> zs ; (y:ys) -> y:(ys ++ zs) }
In general, to fully evaluate n
elements of such a list, this'll require forcing that stack of m
thunks n
times, for O(m*n) complexity. When the entire list is built from appends of singleton lists (ie (([w] ++ [x]) ++ [y]) ++ [z]
), m = n
so the cost is O(n2).
Evaluating a right-nested append like
ws ++ (xs ++ (ys ++ zs))
to WHNF is much easier (O(1)):
case ws of
[] -> xs ++ (ys ++ zs)
(w:ws) -> w:(ws ++ (xs ++ (ys ++ zs)))
Evaluating n
elements just requires evaluating n
thunks, which is about as good as you can expect to do.
Difference lists work by paying a small (O(m)) up-front cost to automatically re-associate calls to (++)
.
newtype DList a = DList ([a] -> [a])
fromList xs = DList (xs ++)
toList (DList f) = f []
instance Monoid (DList a) where
mempty = fromList []
DList f `mappend` DList g = DList (f . g)
Now a left-nested expression,
toList (((fromList ws <> fromList xs) <> fromList ys) <> fromList zs)
gets evaluated in a right-associative manner:
((((ws ++) . (xs ++)) . (ys ++)) . (zs ++)) []
-- expand innermost (.)
(((l0 -> ws ++ (xs ++ l0)) . (ys ++)) . (zs ++)) []
-- expand innermost (.)
((l1 -> (l0 -> ws ++ (xs ++ l0)) (ys ++ l1)) . (zs ++)) []
-- beta reduce
((l1 -> ws ++ (xs ++ (ys ++ l1))) . (zs ++)) []
-- expand innermost (.)
(l2 -> (l1 -> ws ++ (xs ++ (ys ++ l1))) (zs ++ l2)) []
-- beta reduce
(l2 -> ws ++ (xs ++ (ys ++ (zs ++ l2)))) []
-- beta reduce
ws ++ (xs ++ (ys ++ (zs ++ [])))
You perform O(m) steps to evaluate the composed function, then O(n) steps to evaluate the resulting expression, for a total complexity of O(m+n), which is asymptotically better than O(m*n). When the list is made up entirely of appends of singleton lists, m = n
and you get O(2n) ~ O(n), which is asymptotically better than O(n2).
This trick works for any Monoid
.
newtype RMonoid m = RMonoid (m -> m) -- "right-associative monoid"
toRM m = RMonoid (m <>)
fromRM (RMonoid f) = f mempty
instance Monoid m => Monoid (RMonoid m):
mempty = toRM mempty
RMonoid f `mappend` RMonoid g = RMonoid (f . g)
See also, for example, the Codensity monad, which applies this idea to monadic expressions built using (>>=)
(rather than monoidal expressions built using (<>)
).
I hope I have convinced you that (++)
only causes a problem when used in a left-associative fashion. Now, you can easily write a list-like data structure for which append is strict in its right argument, and left-associative appends are therefore not a problem.
data Snoc a = Nil | Snoc (Snoc a) a
xs +++ Nil = xs
xs +++ (Snoc ys y) = Snoc (xs +++ ys) y
We recover O(1) WHNF of left-nested appends,
((ws +++ xs) +++ ys) +++ zs
case zs of
Nil -> (ws +++ xs) +++ ys
Snoc zs z -> Snoc ((ws +++ xs) +++ ys) +++ zs) z
but at the expense of slow right-nested appends.
ws +++ (xs +++ (ys +++ zs))
case (
case (
case zs of { Nil -> ys ; (Snoc zs z) -> Snoc (ys +++ zs) z }
) of { Nil -> xs ; (Snoc ys y) -> Snoc (xs +++ ys) y }
) of { Nil -> ws ; (Snoc xs x) -> Snoc (ws +++ xs) y }
Then, of course, you end up writing a new type of difference list which reassociates appends to the left!
newtype LMonoid m = LMonoid (m -> m) -- "left-associative monoid"
toLM m = LMonoid (<> m)
fromLM (LMonoid f) = f mempty
instance Monoid m => Monoid (LMonoid m):
mempty = toLM mempty
LMonoid f `mappend` LMonoid g = LMonoid (g . f)