When sharding data by a non-integer primary key or a composite key (usually concatenated in to a string), developers often rely on Java's built-in String.hashCode() method to derive a hash value, which is then used for modulo-based sharding. While convenient, this approach can produce negative hash codes, leading to invalid shard numbers. Understanding why this happens and how to mitigate it is essential.
Java's String.hashCode() calculates a hash using the following formula:
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
Where s[i] is the i-th character of the string, n is the string length. The computation uses int arithmetic, meaning results wrap around on overflow. The implementation is:
public int hashCode() {
int h = hash;
if (h == 0 && value.length > 0) {
char val[] = value;
for (int i = 0; i < value.length; i++) {
h = 31 * h + val[i];
}
hash = h;
}
return h;
}
The hash is an int, so its range is Integer.MIN_VALUE (-2147483648) to Integer.MAX_VALUE (2147483647). When the string is long enough, the repeated multiplication by 31 and addition of character values can exceed Integer.MAX_VALUE, causing integer overflow and a negative result.
Here are some illustrative examples:
String hashStr0 = "35953305172933/";
System.out.println(hashStr0.hashCode()); // 2147483647 Integer.MAX_VALUE
System.out.println(Math.abs(hashStr0.hashCode())); // 2147483647 Integer.MAX_VALUE
System.out.println("-------------------");
String hashStr = "359533051729330";
System.out.println(hashStr.hashCode()); // -2147483648 Integer.MIN_VALUE
System.out.println(Math.abs(hashStr.hashCode())); // -2147483648 Integer.MIN_VALUE
System.out.println("-------------------");
String hashStr2 = "56800004874";
System.out.println(hashStr2.hashCode()); // -2082984168
System.out.println(Math.abs(hashStr2.hashCode())); // 2082984168
System.out.println("-------------------");
String hashStr3 = "";
System.out.println(hashStr3.hashCode()); // 0
System.out.println(Math.abs(hashStr3.hashCode())); // 0
Notice that "359533051729330" yields Integer.MIN_VALUE. Calling Math.abs() on Integer.MIN_VALUE returns Integer.MIN_VALUE (a negative number) because the absolute value overflows int range. This makes Math.abs() unreliable for ensuring non-negative results in edge cases.
To safely obtain a non-negative hash, use a bitwise AND with Integer.MAX_VALUE. This clears the sign bit while preserving the rest of the value:
int safeHash = str.hashCode() & Integer.MAX_VALUE;
Since Integer.MAX_VALUE is 0x7FFFFFFF, and its most significant bit is 0, the result of & is always non-negative (in the range 0 to Integer.MAX_VALUE). This approach handles all strings, including those whose hash code is Integer.MIN_VALUE.