data structures - Implement an immutable deque as a balanced binary tree?

Question

Welcome To Ask or Share your Answers For Others

data structures - Implement an immutable deque as a balanced binary tree?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

data structures - Implement an immutable deque as a balanced binary tree?

I've been thinking for a while about how to go about implementing a deque (that is, a double-ended queue) as an immutable data structure.

There seem to be different ways of doing this. AFAIK, immutable data structures are generally hierarchical, so that major parts of it can be reused after modifying operations such as the insertion or removal of an item.

Eric Lippert has two articles on his blog about this topic, along with sample implementations in C#.

Both of his implementations strike me as more elaborate than is actually necessary. Couldn't deques simply be implemented as binary trees, where elements can only be inserted or removed on the very "left" (the front) and on the very "right" (the back) of the tree?

                               o
                              / 
                             …   …
                            /     
                           …       …
                          /      / 
              front -->  L   …   …   R  <-- back

Additionally, the tree would be kept reasonably balanced with rotations:

right rotations upon insertion at the front or upon removal from the back, and
left rotations upon removal from the front or insertion at the back.

Eric Lippert is, in my opinion, a very smart person whom I deeply respect, yet he apparently didn't consider this approach. Thus I wonder, was it for a good reason? Is my suggested way of implementing deques na?ve?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T17:43:15+0000

As Daniel noted, implementing immutable deques with well-known balanced search trees like AVL or red-black trees gives Θ(lg n) worst case complexity. Some of the implementations Lippert discusses may seem elaborate at first glance, but there are many immutable deques with o(lg n) worst or average or amortized complexity that are built from balanced trees along with two simple ideas:

Reverse the Spines

To perform deque operations on a traditional balanced search tree, we need access to the ends, but we only have access to the center. To get to the left end, we must navigate left child pointers until we finally reach a dead end. It would be preferable to have a pointer to the left and right ends without all that navigation effort. In fact, we really don't need access to the root node very often. Let's store a balanced search tree so that access to the ends is O(1).

Here is an example in C of how you might normally store an AVL tree:
```
struct AVLTree {
  const char * value;
  int height;
  struct AVLTree * leftChild;
  struct AVLTree * rightChild;
};
```
To set up the tree so that we can start at the edges and move towards the root, we change the tree and store all of the pointers along the paths from the root to the left and rightmost children in reverse. (These paths are called the left and right spine, respectively). Just like reversing a singly-linked list, the last element becomes the first, so the leftmost child is now easily accessible.

This is a little tricky to understand. To help explain it, imagine that you only did this for the left spine:
```
struct LeftSpine {
  const char * value;
  int height;
  struct AVLTree * rightChild;
  struct LeftSpine * parent;
};
```
In some sense, the leftmost child is now the "root" of the tree. If you drew the tree this way, it would look very strange, but if you simply take your normal drawing of a tree and reverse all of the arrows on the left spine, the meaning of the LeftSpine struct should become clearer. Access to the left side of the tree is now immediate. The same can be done for the right spine:
```
struct RightSpine {
  double value;
  int height;
  struct AVLTree * leftChild;
  struct RightSpine * parent;
};
```
If you store both a left and a right spine as well as the center element, you have immediate access to both ends. Inserting and deleting may still be Ω(lg n), because rebalancing operations may require traversing the entire left or right spine, but simply viewing to the left and rightmost elements is now O(1).

An example of this strategy is used to make purely functional treaps with implementations in SML and Java (more documentation). This is also a key idea in several other immutable deques with o(lg n) performance.
Bound the Rabalancing Work

As noted above, inserting at the left or rightmost end of an AVL tree can require Ω(lg n) time for traversing the spine. Here is an example of an AVL tree that demonstrates this:

A full binary tree is defined by induction as:
- A full binary tree of height 0 is an empty node.
- A full binary tree of height n+1 has a root node with full binary trees of height n as children.
Pushing an element onto the left of a full binary tree will necessarily increase the maximum height of the tree. Since the AVL trees above store that information in each node, and since every tree along the left spine of a full binary tree is also a full binary tree, pushing an element onto the left of an AVL deque that happens to be a full binary tree will require incrementing Ω(lg n) height values along the left spine.

(Two notes on this: (a) You can store AVL trees without keeping the height in the node; instead you keep only balance information (left-taller, right-taller, or even). This doesn't change the performance of the above example. (b) In AVL trees, you might need to do not only Ω(lg n) balance or height information updates, but Ω(lg n) rebalancing operations. I don't recall the details of this, and it may be only on deletions, rather than insertions.)

In order to achieve o(lg n) deque operations, we need to limit this work. Immutable deques represented by balanced trees usually use at least one of the following strategies:
- Anticipate where rebalancing will be needed. If you are using a tree that requires o(lg n) rebalancing but you know where that rebalancing will be needed and you can get there quickly enough, you can perform your deque operations in o(lg n) time. Deques that use this as a strategy will store not just two pointers into the deque (the ends of the left and right spines, as discussed above), but some small number of jump pointers to places higher along the spines. Deque operations can then access the roots of the trees pointed to by the jump pointers in O(1) time. If o(lg n) jump pointers are maintained for all of the places where rebalancing (or changing node information) will be needed, deque operations can take o(lg n) time.
  
  (Of course, this makes the tree actually a dag, since the trees on the spines pointed to by jump pointers are also pointed to by their children on the spine. Immutable data structures don't usually get along with non-tree graphs, since replacing a node pointed to by more than one other node requires replacing all of the other nodes that point to it. I have seen this fixed by just eliminating the non-jump pointers, turning the dag back into a tree. One can then store a singly-linked list with jump pointers as a list of lists. Each subordinate list contains all of the nodes between the head of that list and its jump pointer. This requires some care to deal with partially overlapping jump pointers, and a full explanation is probably not appropriate for this aside.)
  
  This is one of the tricks used by Tsakalidis in his paper "AVL Trees for localized search" to allow O(1) deque operations on AVL trees with a relaxed balance condition. It is also the main idea used by Kaplan and Tarjan in their paper "Purely functional, real-time deques with catenation" and a later refinement of that by Mihaesau and Tarjan. Munro et al.'s "Deterministic Skip Lists" also deserves a mention here, though translating skip lists to an immutable setting by using trees sometimes changes the properties that allow such efficient modification near the ends. For examples of the translation, see Messeguer's "Skip trees, an alternative data structure to Skip lists in a concurrent approach", Dean and Jones's "Exploring the duality between skip lists and binary search trees", and Lamoureux and Nickerson's "On the Equivalence of B-trees and deterministic skip lists".
- Do the work in bulk. In the full binary tree example above, no rebalancing is needed on a push, but Ω(lg n) nodes need to have their height or balance information updated. Instead of actually doing the incrementation, you could simply mark the spine at the ends as needing incrementation.
  
  One way to understand this process is by analogy to binary numbers. (2^n)-1 is represented in binary by a string of n 1's. When adding 1 to this number, you need to change all of the 1's to 0's and then add a 1 at the end. The following Haskell encodes binary numbers as non-empty strings of bits, least significant first.
```
data Bit = Zero | One

type Binary = (Bit,[Bit])

incr :: Binary -> Binary
incr (Zero,x) = (One,x)
incr (One,[]) = (Zero,[One])
incr (One,(x:xs)) = 
    let (y,ys) = incr (x,xs)
    in (Zero,y:ys)
```
  incr is a recursive function, and for numbers of the form (One,replicate k One), incr calls itself Ω(k) times.
  
  Instead, we might represent groups of equal bits by only the number of bits in the group. Neighboring bits or groups of bits are combined into one group if they are equal (in value, not in number). We can increment in O(1) time:
```
data Bits = Zeros Int | Ones Int

type SegmentedBinary = (Bits,[Bits])

segIncr :: SegmentedBinary -> SegmentedBinary
segIncr (Zeros 1,[]) = (Ones 1,[])
segIncr (Zeros 1,(Ones n:rest)) = (Ones (n+1),rest)
segIncr (Zeros n,rest) = (Ones 1,Zeros (n-1):rest)
segIncr (Ones n,[]) = (Zeros n,[Ones 1])
segIncr (Ones n,(Zeros 1:Ones m:rest)) = (Zeros n,Ones (m+1):rest)
segIncr (Ones n,(Zeros p:rest)) = (Zeros n,Ones 1:Zeros (p-1):rest)
```
  Since segIncr is not recursive and doesn't call any functions other than plus and minus on Ints, you can see it takes O(1) time.
  
  Some of the deques mentioned in the section above entitled "Anticipate where rebalancing will be needed" actually use a different numerically-inspired technique called "redundant number systems" to limit the rebalancing work to O(1) and locate it quickly. Redundant numerical representations are fascinating, but possibly too far afield for this discussion. Elmasry et al.'s "Strictly-regular number system and data structures" is not a bad place to start reading about that topic. Hinze's "Bootstrapping one-sided flexible arrays" may also be useful.
  
  In "Making data structures persistent", Driscoll et al. describe lazy recoloring, which they attribute to Tsakalidis. They apply it to red-black trees, which can be rebalanced after insertion or deletion with O(1) rotations (but Ω(lg n) recolorings) (see Tarjan's "Updataing a balanced tree in O(1) rotations"). The core of the idea is to mark a large path of nodes that need to be recolored but not rotated. A similar idea is used on AVL trees in the older versions of Brown & Tarjan's "A fast merging algorithm". (Newer versions of the same work use 2-3 trees; I have not read the newer ones and I do not know if they use any techniques like lazy recoloring.)
- Randomize. Treaps, mentioned above, can be implemented in a functional setting so that they perform deque operations on O(1) time on average. Since deques do not need to inspect their elements, this average is not susceptible to malicious input degrading performance, unlike simple (no rebalancing) binary search trees, which are fast on average input. Treaps use an independent source of random bits instead of relying on randomness from t

Categories

data structures - Implement an immutable deque as a balanced binary tree?

data structures - Implement an immutable deque as a balanced binary tree?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags