haskell - Explicit Purely-Functional Data-Structure For Difference Lists

Question

Welcome To Ask or Share your Answers For Others

haskell - Explicit Purely-Functional Data-Structure For Difference Lists

asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

haskell - Explicit Purely-Functional Data-Structure For Difference Lists

In Haskell, difference lists, in the sense of

[a] representation of a list with an efficient concatenation operation

seem to be implemented in terms of function composition.

Functions and (dynamic) function compositions, though, must also be represented somehow in the computer's memory using data structures, which raises the question of how dlists could be implemented in Haskell without using function compositions, but, rather, through some basic purely-functional node-based data structures. How could this be done with the same performance guarantees as those of function composition?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-17T01:13:37+0000

(++)'s notoriously bad asymptotics come about when you use it in a left-associative manner - that is, when (++)'s left argument is the result of another call to (++). Right-associative expressions run efficiently, though.

To put it more concretely: evaluating a left-nested append of m lists like

((ws ++ xs) ++ ys) ++ zs  -- m = 3 in this example

to WHNF requires you to force m thunks, because (++) is strict in its left argument.

case (
    case (
        case ws of { [] -> xs ; (w:ws) -> w:(ws ++ xs) }
    ) of { [] -> ys ; (x:xs) -> x:(xs ++ ys) }
) of { [] -> zs ; (y:ys) -> y:(ys ++ zs) }

In general, to fully evaluate n elements of such a list, this'll require forcing that stack of m thunks n times, for O(m*n) complexity. When the entire list is built from appends of singleton lists (ie (([w] ++ [x]) ++ [y]) ++ [z]), m = n so the cost is O(n²).

Evaluating a right-nested append like

ws ++ (xs ++ (ys ++ zs))

to WHNF is much easier (O(1)):

case ws of
    [] -> xs ++ (ys ++ zs)
    (w:ws) -> w:(ws ++ (xs ++ (ys ++ zs)))

Evaluating n elements just requires evaluating n thunks, which is about as good as you can expect to do.

Difference lists work by paying a small (O(m)) up-front cost to automatically re-associate calls to (++).

newtype DList a = DList ([a] -> [a])
fromList xs = DList (xs ++)
toList (DList f) = f []

instance Monoid (DList a) where
    mempty = fromList []
    DList f `mappend` DList g = DList (f . g)

Now a left-nested expression,

toList (((fromList ws <> fromList xs) <> fromList ys) <> fromList zs)

gets evaluated in a right-associative manner:

((((ws ++) . (xs ++)) . (ys ++)) . (zs ++)) []

-- expand innermost (.)
(((l0 -> ws ++ (xs ++ l0)) . (ys ++)) . (zs ++)) []

-- expand innermost (.)
((l1 -> (l0 -> ws ++ (xs ++ l0)) (ys ++ l1)) . (zs ++)) []
-- beta reduce
((l1 -> ws ++ (xs ++ (ys ++ l1))) . (zs ++)) []

-- expand innermost (.)
(l2 -> (l1 -> ws ++ (xs ++ (ys ++ l1))) (zs ++ l2)) []
-- beta reduce
(l2 -> ws ++ (xs ++ (ys ++ (zs ++ l2)))) []
-- beta reduce
ws ++ (xs ++ (ys ++ (zs ++ [])))

You perform O(m) steps to evaluate the composed function, then O(n) steps to evaluate the resulting expression, for a total complexity of O(m+n), which is asymptotically better than O(m*n). When the list is made up entirely of appends of singleton lists, m = n and you get O(2n) ~ O(n), which is asymptotically better than O(n²).

This trick works for any Monoid.

newtype RMonoid m = RMonoid (m -> m)  -- "right-associative monoid"
toRM m = RMonoid (m <>)
fromRM (RMonoid f) = f mempty
instance Monoid m => Monoid (RMonoid m):
    mempty = toRM mempty
    RMonoid f `mappend` RMonoid g = RMonoid (f . g)

See also, for example, the Codensity monad, which applies this idea to monadic expressions built using (>>=) (rather than monoidal expressions built using (<>)).

I hope I have convinced you that (++) only causes a problem when used in a left-associative fashion. Now, you can easily write a list-like data structure for which append is strict in its right argument, and left-associative appends are therefore not a problem.

data Snoc a = Nil | Snoc (Snoc a) a

xs +++ Nil = xs
xs +++ (Snoc ys y) = Snoc (xs +++ ys) y

We recover O(1) WHNF of left-nested appends,

((ws +++ xs) +++ ys) +++ zs

case zs of
    Nil -> (ws +++ xs) +++ ys
    Snoc zs z -> Snoc ((ws +++ xs) +++ ys) +++ zs) z

but at the expense of slow right-nested appends.

ws +++ (xs +++ (ys +++ zs))

case (
    case (
        case zs of { Nil -> ys ; (Snoc zs z) -> Snoc (ys +++ zs) z }
    ) of { Nil -> xs ; (Snoc ys y) -> Snoc (xs +++ ys) y }
) of { Nil -> ws ; (Snoc xs x) -> Snoc (ws +++ xs) y }

Then, of course, you end up writing a new type of difference list which reassociates appends to the left!

newtype LMonoid m = LMonoid (m -> m)  -- "left-associative monoid"
toLM m = LMonoid (<> m)
fromLM (LMonoid f) = f mempty
instance Monoid m => Monoid (LMonoid m):
    mempty = toLM mempty
    LMonoid f `mappend` LMonoid g = LMonoid (g . f)

Categories

haskell - Explicit Purely-Functional Data-Structure For Difference Lists

haskell - Explicit Purely-Functional Data-Structure For Difference Lists

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags