I know the hashing principle behind HashMap in Java, so I wanted to know how hashing works in Hive when it buckets data into multiple buckets.
I recently had to dig into some Hive source code to figure this out for myself. Here's what I found:
For an integer field, the hash is just the integer value. For a string, it uses a variant of Java's String hashCode. When hashing multiple values, the hash is computed with a variant of Java's List hashCode.
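To make that concrete, here is a rough sketch in plain Java of the scheme described above. It is not Hive's code: the class and method names are made up for illustration, the string and multi-value hashes use the exact Java String and List formulas as stand-ins for Hive's variants (which may differ in seed or in the bytes they iterate over), and the final step that maps a hash to a bucket number (mask off the sign bit, then take the remainder) is my assumption about how a hash gets turned into a bucket index. Behaviour may also differ between Hive versions.

```java
public class BucketHashSketch {

    // Integer field: the hash is simply the value itself.
    static int hashInt(int value) {
        return value;
    }

    // String field: the same polynomial formula as Java's String.hashCode,
    // used here as a stand-in for Hive's variant (assumption).
    static int hashString(String value) {
        int h = 0;
        for (int i = 0; i < value.length(); i++) {
            h = 31 * h + value.charAt(i);
        }
        return h;
    }

    // Multiple fields: combine the per-field hashes with the same formula
    // as Java's List.hashCode (seed 1, multiplier 31). Hive's variant may
    // use a different seed (assumption).
    static int combine(int... fieldHashes) {
        int h = 1;
        for (int fieldHash : fieldHashes) {
            h = 31 * h + fieldHash;
        }
        return h;
    }

    // Hypothetical final step: clear the sign bit so the result is
    // non-negative, then take the remainder by the bucket count.
    static int bucketFor(int hash, int numBuckets) {
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        // Bucket a row on a single integer column.
        System.out.println(bucketFor(hashInt(42), 8));

        // Bucket a row on two columns: an integer and a string.
        int rowHash = combine(hashInt(42), hashString("hive"));
        System.out.println(bucketFor(rowHash, 8));
    }
}
```

The masking step matters because Java's `%` can return a negative result for a negative hash, which would not be a valid bucket index.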