[SERVER-22839] Replace the hashing function for StringData Created: 24/Feb/16  Updated: 26/Aug/19  Resolved: 26/Aug/19

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: 3.3.2
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Mira Carey Assignee: Billy Donahue
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Sprint: Dev Tools 2019-09-09
Participants:

 Description   

StringData uses MurmurHash for its hashing function. On 32-bit systems it returns a little-endian decoded uint32_t of the 32-bit hash output. On 64-bit systems it returns a little-endian decoded uint64_t of the 128-bit hash output.
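The output reduction described above can be sketched in C++ roughly as follows. This is a minimal illustration, not MongoDB's actual code: `decodeLittleEndian` and `reduceDigest` are hypothetical names, and the sketch assumes the murmur digest has already been written into a byte buffer (4 bytes for the 32-bit variant, 16 bytes for the 128-bit variant).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: decode the first sizeof(UInt) bytes of a buffer as a
// little-endian unsigned integer. Combining bytes with shifts (instead of
// memcpy into an integer) yields the same value on little- and big-endian hosts.
template <typename UInt>
constexpr UInt decodeLittleEndian(const unsigned char* bytes) {
    UInt value = 0;
    for (std::size_t i = 0; i < sizeof(UInt); ++i) {
        // Byte i contributes bits [8*i, 8*i + 8) of the result.
        value |= static_cast<UInt>(bytes[i]) << (8 * i);
    }
    return value;
}

// Hypothetical reduction of a murmur digest to a native-width hash value.
// On a 64-bit build (8-byte std::size_t) only the first 8 bytes of a
// 16-byte digest are read; the remaining 8 bytes are discarded.
// On a 32-bit build (4-byte std::size_t) the whole 4-byte digest is decoded.
constexpr std::size_t reduceDigest(const unsigned char* digest) {
    return decodeLittleEndian<std::size_t>(digest);
}
```

For a digest whose bytes are `01 02 ... 10`, a 64-bit build yields `0x0807060504030201`, regardless of host byte order. The second half of the 128-bit digest is never read, which is the wasted work the ticket refers to.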

We should pick something better than murmur, perhaps CityHash. That would also let us stop generating 128 bits of hash only to throw half of it away.



 Comments   
Comment by Billy Donahue [ 26/Aug/19 ]

This is an old ticket and it looks like StringData doesn't hash itself anymore. Maybe an obsolete concern? Reopen if I'm missing something here.

Comment by Adam Midvidy [ 24/Feb/16 ]

WiredTiger already has a CityHash implementation: https://github.com/mongodb/mongo/blob/f950ac39aeea87cd64f5832b2d5fab451f220b58/src/third_party/wiredtiger/src/support/hash_city.c

Comment by Mira Carey [ 24/Feb/16 ]

redbeard0531, you'd mentioned wanting to consider other hashes. Did you have any workloads in mind where the hashing function might make a difference?

Comment by Andy Schwerin [ 24/Feb/16 ]

I don't really see a problem with having extra intermediate state while computing the hash, and I don't really care about the endianness of the output hash, except insofar as we have to clean it up to work on big-endian systems. There's nothing wrong per se with switching hash functions, but are we going to receive observable benefits?

Generated at Thu Feb 08 04:01:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.