I often see or hear of modulus being used as a last step of hashing, or after hashing, e.g. h(input) % N, where h is the hash function and % is the modulus operator.
Indeed.
If I am designing a hash table, and want to map a large set of keys to a smaller space of indices for the hash table, doesn't the modulus operator achieve that?
That's precisely the purpose of the modulo operator: to restrict the range of array indexes, so yes.
But you cannot simply use the modulo operator by itself, because it requires an integer operand: you cannot take the "modulo of a string over N" or the "modulo of an object-graph over N"[1].
Furthermore, if I wanted to randomize the distribution across those locations within the hash table, is the remainder generated by modulus not sufficient?
No, the remainder alone is not sufficient - the modulo operator doesn't give you pseudorandom output, nor does it have any kind of avalanche effect, which means that similar input values produce similar output values. That results in clustering in your hashtable bins and in subpar performance: the greatly increased likelihood of hash-collisions forces you into slower techniques like linear probing, which defeats the purpose of a hashtable because you lose O(1) lookup times.
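To make the clustering concrete, here is a small hypothetical Python demonstration: the keys are all multiples of 16 (think aligned addresses or strided record IDs), the table has 8 buckets, and SHA-256 stands in for a hash with an avalanche effect:

```python
import hashlib
from collections import Counter

N = 8  # bucket count (a power of two, as many tables use)

# Structured, "similar" keys: every multiple of 16 up to 16,000.
keys = [i * 16 for i in range(1000)]

def h(value: int) -> int:
    """Stand-in hash with an avalanche effect (SHA-256 over the value's bytes)."""
    digest = hashlib.sha256(value.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")

# Modulo alone: every key is divisible by N, so they all cluster in bucket 0.
print(Counter(k % N for k in keys))       # Counter({0: 1000})

# Hash first, then modulo: the same keys spread roughly evenly over the buckets.
print(Counter(h(k) % N for k in keys))    # ~125 keys per bucket
```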
What does the hashing function h provide on top of the modulus operator?
The domain of h can be anything, especially non-integer values, while its output is an integer that the modulo operator can then reduce. A good h also gives you the pseudorandom, avalanche-style distribution described above, which modulo alone cannot.
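As a sketch of what such an h can look like, here is a toy 64-bit FNV-1a implementation in Python (chosen only for brevity; any hash with a decent distribution would do):

```python
def fnv1a_64(data: bytes) -> int:
    """Toy 64-bit FNV-1a: folds an arbitrary byte sequence into one integer."""
    h = 0xcbf29ce484222325                 # FNV-1a 64-bit offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x100000001b3) % 2**64    # FNV-1a 64-bit prime, kept to 64 bits
    return h

# The domain is any byte sequence (here a string); only the result is an
# integer, which the modulo operator can then reduce to a bucket index.
print(fnv1a_64("hello world".encode()) % 16)
```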
[1] Technically speaking, this is possible if you use the value of the memory address of an object (i.e. an object pointer), but that doesn't work if you have hashtable keys that don't use object identity, such as a stack-allocated object or custom struct.
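A quick Python illustration of that caveat, using CPython's id() (which happens to return the object's memory address) as the integer fed to modulo:

```python
N = 16

class Point:
    """A value-like key with no __eq__/__hash__ overrides, so identity semantics."""
    def __init__(self, x: int, y: int):
        self.x, self.y = x, y

a = Point(1, 2)
b = Point(1, 2)   # same contents, different object (different address)

# Address-based "hashing": equal-by-value keys usually land in different buckets.
print(id(a) % N, id(b) % N)
```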