arrays - Why setting HashTable's length to a Prime Number is a good practice?

Question

Welcome To Ask or Share your Answers For Others

arrays - Why setting HashTable's length to a Prime Number is a good practice?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

arrays - Why setting HashTable's length to a Prime Number is a good practice?

I was going through Eric Lippert's latest Blog post for Guidelines and rules for GetHashCode when i hit this para:

We could be even more clever here; just as a List resizes itself when it gets full, the bucket set could resize itself as well, to ensure that the average bucket length stays low. Also, for technical reasons it is often a good idea to make the bucket set length a prime number, rather than 100. There are plenty of improvements we could make to this hash table. But this quick sketch of a naive implementation of a hash table will do for now. I want to keep it simple.

So looks like i'm missing something. Why is it a good practice to set it to a prime number?.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T18:37:12+0000

You can find people that suggest the two opposite ends of the spectrum. On the one side, choosing a prime number for the size of the hash table will reduce the chances of collisions, even if the hash function is not too effective distributing the results. Note that if (in the simplest example to argue about) a power of 2 size is decided, only the lower bits affect the bucket, while for a prime number most bits in the result of the hash will be used.

On the other hand, you can gain more by choosing a better hash function, or even rehashing he result of the hash function by applying some bit operations, and using a power of 2 hash size to speed up calculations.

As an example from real life, Java HashTable were initially implemented by using prime (or almost prime sizes), but from Java 1.4 on, the design was changed to use power of two number of buckets and added a second fast hash function applied to the result of the initial hash. An interesting article commenting that change can be found here.

So basically:

a prime number helps dispersing the inputs across the different buckets even in the event of not-so-good hash functions.
a similar effect can be achieved by post processing the result of the hash function, and using a power of 2 size to speedup the modulo operation (bit mask) and compensate for the post processing.

Categories

arrays - Why setting HashTable's length to a Prime Number is a good practice?

arrays - Why setting HashTable's length to a Prime Number is a good practice?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags