[SERVER-22839] Replace the hashing function for StringData Created: 24/Feb/16 Updated: 26/Aug/19 Resolved: 26/Aug/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Internal Code |
| Affects Version/s: | 3.3.2 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Mira Carey | Assignee: | Billy Donahue |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Sprint: | Dev Tools 2019-09-09 |
| Participants: |
| Description |
|
StringData uses MurmurHash for its hashing function. On 32-bit systems, it returns the 32-bit hash output decoded as a little-endian uint32_t. On 64-bit systems, it returns a uint64_t decoded little-endian from the 128-bit hash output, so half of that output is thrown away. We should pick something better than MurmurHash, perhaps CityHash. Choosing a hash with a native 64-bit output would also let us stop generating 128 bits of hash only to throw half of it away. |
| Comments |
| Comment by Billy Donahue [ 26/Aug/19 ] |
|
This is an old ticket and it looks like StringData doesn't hash itself anymore. Maybe an obsolete concern? Reopen if I'm missing something here. |
| Comment by Adam Midvidy [ 24/Feb/16 ] |
|
WiredTiger already has a CityHash implementation: https://github.com/mongodb/mongo/blob/f950ac39aeea87cd64f5832b2d5fab451f220b58/src/third_party/wiredtiger/src/support/hash_city.c |
| Comment by Mira Carey [ 24/Feb/16 ] |
|
redbeard0531, you'd mentioned wanting to consider other hashes. Did you have any workloads in mind where the hashing function might make a difference? |
| Comment by Andy Schwerin [ 24/Feb/16 ] |
|
I don't really see a problem with having extra intermediate state while computing the hash, and I don't really care about the endianness of the output hash except insofar as we have to clean it up to work on big-endian systems. There's nothing wrong per se with switching hash functions, but are we going to receive observable benefits? |