c++ - Is using a vector of boolean values slower than a dynamic bitset?

Question

Welcome To Ask or Share your Answers For Others

c++ - Is using a vector of boolean values slower than a dynamic bitset?

1 Answer

深蓝 · Answer 1 · 2021-10-17T02:50:10+0000

A great deal here depends on how many Boolean values you're working with.

Both bitset and vector<bool> normally use a packed representation where a Boolean is stored as only a single bit.

On one hand, that imposes some overhead in the form of bit manipulation to access a single value.

On the other hand, that also means many more of your Booleans will fit in your cache.

If you're using a lot of Booleans (e.g., implementing a sieve of Eratosthenes) fitting more of them in the cache will almost always end up a net gain. The reduction in memory use will gain you a lot more than the bit manipulation loses.

Most of the arguments against std::vector<bool> come back to the fact that it is not a standard container (i.e., it does not meet the requirements for a container). IMO, this is mostly a question of expectations -- since it says vector, many people expect it to be a container (other types of vectors are), and they often react negatively to the fact that vector<bool> isn't a container.

If you're using the vector in a way that really requires it to be a container, then you probably want to use some other combination -- either deque<bool> or vector<char> can work fine. Think before you do that though -- there's a lot of (lousy, IMO) advice that vector<bool> should be avoided in general, with little or no explanation of why it should be avoided at all, or under what circumstances it makes a real difference to you.

Yes, there are situations where something else will work better. If you're in one of those situations, using something else is clearly a good idea. But, be sure you're really in one of those situations first. Anybody who tells you (for example) that "Herb says you should use vector<char>" without a lot of explanation about the tradeoffs involved should not be trusted.

Let's give a real example. Since it was mentioned in the comments, let's consider the Sieve of Eratosthenes:

#include <vector>
#include <iostream>
#include <iterator>
#include <chrono>

unsigned long primes = 0;

template <class bool_t>
unsigned long sieve(unsigned max) {
    std::vector<bool_t> sieve(max, false);
    sieve[0] = sieve[1] = true;

    for (int i = 2; i < max; i++) {
        if (!sieve[i]) {
            ++primes;
            for (int temp = 2 * i; temp < max; temp += i)
                sieve[temp] = true;
        }
    }
    return primes;
}

// Warning: auto return type will fail with older compilers
// Fine with g++ 5.1 and VC++ 2015 though.
//
template <class F>
auto timer(F f, int max) {
    auto start = std::chrono::high_resolution_clock::now();
    primes += f(max);
    auto stop = std::chrono::high_resolution_clock::now();

    return stop - start;
}

int main() {
    using namespace std::chrono;

    unsigned number = 100000000;

    auto using_bool = timer(sieve<bool>, number);
    auto using_char = timer(sieve<char>, number);

    std::cout << "ignore: " << primes << "
";
    std::cout << "Time using bool: " << duration_cast<milliseconds>(using_bool).count() << "
";
    std::cout << "Time using char: " << duration_cast<milliseconds>(using_char).count() << "
";
}

We've used a large enough array that we can expect a large portion of it to occupy main memory. I've also gone to a little pain to ensure that the only thing that changes between one invocation and the other is the use of a vector<char> vs. vector<bool>. Here are some results. First with VC++ 2015:

ignore: 34568730
Time using bool: 2623
Time using char: 3108

...then the time using g++ 5.1:

ignore: 34568730
Time using bool: 2359
Time using char: 3116

Obviously, the vector<bool> wins in both cases--by around 15% with VC++, and over 30% with gcc. Also note that in this case, I've chosen the size to show vector<char> in quite favorable light. If, for example, I reduce number from 100000000 to 10000000, the time differential becomes much larger:

ignore: 3987474
Time using bool: 72
Time using char: 249

Although I haven't done a lot of work to confirm, I'd guess that in this case, the version using vector<bool> is saving enough space that the array fits entirely in the cache, while the vector<char> is large enough to overflow the cache, and involve a great deal of main memory access.

Categories

c++ - Is using a vector of boolean values slower than a dynamic bitset?

c++ - Is using a vector of boolean values slower than a dynamic bitset?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags