Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
125 views
in Technique[技术] by (71.8m points)

javascript - How to store random integers from the set of 32-bit integers in a compact data structure to rapidly check membership?

I am thinking about how to organize/allocate memory and that led me to this question, which I have distilled to its essence, so it may seem out of the blue, but the memory question is too complicated and confusing and distracting, so I am not asking it here. I tried asking it earlier and got no response. So here it goes.

Say you have a system where you want to check if you have some integer exists in it as fast as possible, and you want to be able to add and remove integers as fast as possible too. That is, it's a key-store essentially, not even a key-value-store. It only stores integer keys, and I guess theoretically the value would conceptually be a true boolean, but I don't think it needs to be there necessarily.

One solution is to use sparse arrays. For example, this array would return true in O(1) time for 3, 5, and 9.

[-, -, 3, -, 5, -, -, -, 9, -, -, ...]

You just specify the integer - 1 and get back if it exists at that position in the array. But the problem with this is twofold. First, it takes up a lot of wasted memory with all these null values. And secondly, for my project, I have the added constraint that arrays can only exist at sizes 1, 2, 4, 8, 16, and 32 items in length. So I am stuck using short arrays. Which means I'm left with trees of some sort most likely.

So these are the constraints:

  1. Positive 32-bit integers.
  2. Compact data structure (minimal wasted space).
  3. Fast lookup times.
  4. Fast deletion/insertion times.
  5. Array sizes in the data structure can only be sized at 1, 2, 4, 8, 16, or 32 items.

One approach I am thinking that might work that fits these constraints is a sort of tree, like a B+tree. But I am not super familiar with these yet, still playing with them. I think it would be like a key-only B+tree (no values, and hence no leaves). But not entirely sure what this would look like (in JavaScript, if I had to pick a language).

Another approach I thought of is a binary trie. But this would take up to 32 steps per integer to add and check membership, which seems like we could potentially optimize somehow.

What is the main approach to storing sequential 32-bit integers in some data structure for rapidly checking membership, doing insertions, and doing removals? Perhaps this can learn from IP lookup tables or things of that nature.

I am using this for a custom programming language, though seeing an implementation in JavaScript would be helpful. If it's just the name of the data structure you're providing, please explain how it works or implement it in JavaScript without resorting to JavaScript built-ins which may already solve the problem.

question from:https://stackoverflow.com/questions/65645449/how-to-store-random-integers-from-the-set-of-32-bit-integers-in-a-compact-data-s

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

You wrote:

One approach I am thinking that might work that fits these constraints is a sort of tree, like a B+tree. [...] I think it would be like a key-only B+tree (no values, and hence no leaves). But not entirely sure what this would look like (in JavaScript, if I had to pick a language).

Indeed, a key-only B(+)-tree would be a possible solution. It solves the problem of the wasted space you get from a sparse array solution. It is also a solution that works well when the data type happens to be string or float instead of number, which would be a problem for the sparse array solution.

The code I post below is similar to another B+ tree related answer I posted earlier on another question of yours, but which was not a search tree.

The idea is to use a B+ tree, taking into account the extra requirement about array sizes that are powers of 2. The actual data keys are found in the bottom level of the tree. The internal nodes repeat the smallest key that its subtree contains. This can be used for deciding which child to choose when walking from the root down the tree, in search for a bottom-level location where a key should be found or should be inserted.

The tree is kept compact by distributing children/keys to neighbors when a node gets full, only splitting a node when there is no room in the neighbors either. When a key is deleted, the affected node will merge with a neighbor when possible. Each node (except the root) is guaranteed to have at least half the use of the maximum (32) node capacity -- either for storing children or keys (at the bottom level).

Keys are located starting the search at the root. A child is selected using binary search so that the key of that child is the greatest key that is less or equal to the key were are locating. If the key is not found at the bottom level, we have at least the location where it should be, and it could be inserted there.

Implementation

class Node {
    constructor(capacity) {
        // Mimic fixed-size array (avoid accidentally growing it)
        this.children = Object.seal(Array(capacity).fill(null));
        this.childCount = 0; // Number of used slots in children array
        this.key = null; // Property for supporting a search
        // Maintain back-link to parent.
        this.parent = null;
        // Per level in the tree, maintain a doubly linked list
        this.prev = this.next = null;
    }
    setCapacity(capacity) {
        if (capacity < 1) return;
        // Here we make a new array, and copy the data into it
        let children = Object.seal(Array(capacity).fill(null));
        for (let i = 0; i < this.childCount; i++) children[i] = this.children[i];
        this.children = children;
    }
    isLeaf() {
        return !(this.children[0] instanceof Node);
    }
    index() {
        return this.parent.children.indexOf(this);
    }
    updateKey() {
        this.key = this.isLeaf() ? this.children[0] : this.children[0].key;
        for (let node = this.parent; node; node = node.parent) {
            node.key = node.children[0].key;
        }
    }
    wipe(start, end) {
        this.children.copyWithin(start, end, this.childCount);
        for (let i = this.childCount - end + start; i < this.childCount; i++) {
            this.children[i] = null;
        }
        this.childCount -= end - start;
        // Reduce allocated size if possible
        if (this.childCount * 2 <= this.children.length) this.setCapacity(this.children.length / 2);
        // Update key if first item changed
        if (start === 0) this.updateKey();
    }
    moveFrom(neighbor, target, start, count=1) {
        // Note: `start` can have two meanings:
        //   if neighbor is null, it is the key/Node to move to the target
        //   if neighbor is a Node, it is the index from where keys(s)/Node(s) have to be moved to the target
        // Make room in target node
        if (this.childCount + count > this.children.length) this.setCapacity(this.children.length * 2);
        this.children.copyWithin(target + count, target, Math.max(target + count, this.childCount));
        this.childCount += count;
        if (neighbor !== null) {
            // Copy the children
            for (let i = 0; i < count; i++) {
                this.children[target + i] = neighbor.children[start + i];
            }
            // Remove the original references
            neighbor.wipe(start, start + count);
        } else {
            this.children[target] = start; // start is key to insert
        }
        // Set parent link(s)
        if (!this.isLeaf()) {
            for (let i = 0; i < count; i++) {
                this.children[target + i].parent = this;
            }
        }
        // Update key if first item changed
        if (target === 0) this.updateKey();
    }
    moveToNext(count) {
        this.next.moveFrom(this, 0, this.childCount - count, count);
    }
    moveFromNext(count) {
        this.moveFrom(this.next, this.childCount, 0, count);
    }
    basicRemove(index) {
        if (!this.isLeaf()) {
            // Take node out of the level's linked list
            let prev = this.children[index].prev;
            let next = this.children[index].next;
            if (prev) prev.next = next;
            if (next) next.prev = prev;
        }
        this.wipe(index, index + 1);
    }
    basicInsert(index, value) { // value can be a key or a Node
        this.moveFrom(null, index, value);
        if (value instanceof Node) {
            // Insert node in the level's linked list
            if (index > 0) {
                value.prev = this.children[index-1];
                value.next = value.prev.next;
            } else if (this.childCount > 1) {
                value.next = this.children[1];
                value.prev = value.next.prev;
            }
            if (value.prev) value.prev.next = value;
            if (value.next) value.next.prev = value;
        }
    }
    pairWithSmallest() {            
        return this.prev && (!this.next || this.next.childCount > this.prev.childCount)
            ? [this.prev, this] : [this, this.next];
    }
    toString() {
        return "[" + this.children.map(v => v??"-").join() + "]";
    }
}

class Tree {
    constructor(nodeCapacity=32) {
        this.nodeCapacity = nodeCapacity;
        this.root = new Node(1);
        this.first = this.root; // Head of doubly linked list at bottom level
    }
    locate(key) {
        let node = this.root;
        while (!node.isLeaf()) {
            // Binary search among keys
            let low = 0;
            let high = node.childCount;
            while (low < high) {
                let index = (low + high) >> 1;
                if (key >= node.children[index].key) {
                    low = index + 1;
                } else {
                    high = index;
                }
            }
            if (low) low--;
            node = node.children[low];
        }
        // Binary search in leaf
        let low = 0;
        let high = node.childCount;
        while (low < high) {
            let index = (low + high) >> 1;
            if (key > node.children[index]) {
                low = index + 1;
            } else {
                high = index;
            }
        }
        return [node, low];
    }
    has(key) {
        let [node, index] = this.locate(key);
        return index < node.childCount && node.children[index] === key;
    }
    remove(key) {
        let [node, index] = this.locate(key);
        if (index >= node.childCount || node.children[index] !== key) return; // not found
        while (true) {
            node.basicRemove(index);

            // Exit when node's fill ratio is fine
            if (!node.parent || node.childCount * 2 > this.nodeCapacity) return;
            // Node has potentially too few children, we should either merge or redistribute
            
            let [left, right] = node.pairWithSmallest();
            
            if (!left || !right) { // A node with no siblings? Must become the root!
                this.root = node;
                node.parent = null;
                return;
            }
            let sumCount = left.childCount + right.childCount;
            let childCount = sumCount >> 1;
            
            // Check whether to merge or to redistribute
            if (sumCount > this.nodeCapacity) { // redistribute
                // Move some data from the bigger to the smaller node
                let shift = childCount - node.childCount;
                if (!shift) { // Boundary case: when a redistribution would bring no improvement
                    console.assert(node.childCount * 2 === this.nodeCapacity && sumCount === this.nodeCapacity + 1);
                    return;
                }
                if (node === left) { // move some children from right to left
                    left.moveFromNext(shift);
                } else { // move some children from left to right
                    left.moveToNext(shift);
                }
                return;
            }
            
            // Merge:
            // Move all data from the right to the left
            left.moveFromNext(right.childCount);
            // Prepare to delete right node
            node = right.parent;
            index = right.index();
        }
    }
    insert(value) {  // value is a key
        let [node, index] = this.locate(value);
        if (index < node.childCount && node[index] === value) return; // already present
        while (node.childCount === this.nodeCapacity) { // No room here
            if (index === 0 && node.prev && node.prev.childCount < this.nodeCapacity) {
                return node.prev.basicInsert(node.prev.childCount, value);
            }
            // Check whether we can redistribute (to avoid a split)
            if (node !== this.root) {
                let [left, right] = node.pairWithSmallest();
                let joinedIndex = left === node ? index : left.childCount + index;
                let sumCount = left.childCount + right.childCount + 1;
                if (sumCount <= 2 * this.nodeCapacity) { // redistribute
                    let childCount = sumCount >> 1;
                    if (node === right) { // redistribute to the left
                        let insertInLeft = joinedIndex < childCount;
                        left.moveFromNext(childCount - left.childCount - +insertInLeft);
                    } else { // redistribute to the right
                        let insertInRight = index >= sumCount - childCount;
                        left.moveToNext(childCount - right.childCount - +insertInRight);
                    }
                    if (joinedIndex > left.childCount || 
                            joinedIndex === left.childCount && left.childCount > right.childCount) {
                        right.basicInsert(joinedIndex - left.childCount, value);
                    } else {
                        left.basicInsert(joinedIndex, value);
                    }
                    return;
                }
            }
            // Cannot redistribute: split node
            let childCount = node.childCount >&g

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...