[SERVER-78126] For specific kinds of input, mongo::Value() always hashes to the same result on big-endian platforms Created: 15/Jun/23 Updated: 29/Oct/23 Resolved: 22/Jun/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 6.0.6, 6.3.1, 4.4.22, 5.0.18, 7.0.0-rc3 |
| Fix Version/s: | 7.1.0-rc0, 5.0.19, 4.4.23, 7.0.0-rc6, 6.0.8 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | David Storch | Assignee: | David Storch |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v7.0, v6.0, v5.0, v4.4 |
| Steps To Reproduce: | The explain results printed by this script show that the query is much slower on s390x than on little-endian platforms. The effect gets more severe as the number of documents increases. |
| Sprint: | QE 2023-06-26 |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
On big-endian platforms, there are situations in which large classes of unequal mongo::Value objects all hash to exactly the same hash code. The consequences can be severe for the performance of the aggregation framework. For example, $group is implemented as a hash aggregation, meaning that the Value objects on which we are aggregating serve as keys in a hash table. If all such keys have the same hash, then during the $group build phase each insertion into the hash table degrades from O(1) to O(n), and the build phase as a whole therefore degrades from O(n) to O(n^2). There are presumably other code paths that use Value objects as keys in a hash table and could be subject to the same performance problem.

The specifics of the bug are a bit subtle, so I will explain it by example. Imagine we have two values which represent arrays of strings: ["a", "x"] and ["b", "x"].
The hashes should be different because the first elements of the two arrays differ. The computation of the hash involves the following steps:

1. hash_combine() the initial seed with the Array type tag.
2. hash_combine() the String type tag of the first element.
3. MurmurHash the contents of the first string ("a"), seeding with the current hash and writing the result back into it.
4. hash_combine() the String type tag of the second element.
5. MurmurHash the contents of the second string ("x"), in the same way as step 3.
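For concreteness, here is a rough sketch of that composition in code. It assumes the standard smhasher signature for MurmurHash3_x86_32; the function name and tag constants are illustrative stand-ins, not the actual mongo identifiers:

```cpp
#include <cstdint>
#include <string>

#include <boost/functional/hash.hpp>

// Standard smhasher signature, assumed to be linked in from the vendored
// MurmurHash3 sources:
void MurmurHash3_x86_32(const void* key, int len, uint32_t seed, void* out);

// Illustrative stand-ins for the BSON type tags; not the actual mongo constants.
constexpr int kArrayTag = 4;
constexpr int kStringTag = 2;

// Sketch of the five-step composition above for a two-element string array.
// Note how steps 3 and 5 truncate the 64-bit running hash to a 32-bit seed
// and then write the 32-bit result through a pointer to the 64-bit value --
// the pattern at the heart of this bug.
size_t hashTwoStringArray(const std::string& s1, const std::string& s2) {
    size_t seed = 0;
    boost::hash_combine(seed, kArrayTag);                    // step 1
    boost::hash_combine(seed, kStringTag);                   // step 2
    MurmurHash3_x86_32(s1.data(), static_cast<int>(s1.size()),
                       static_cast<uint32_t>(seed), &seed);  // step 3
    boost::hash_combine(seed, kStringTag);                   // step 4
    MurmurHash3_x86_32(s2.data(), static_cast<int>(s2.size()),
                       static_cast<uint32_t>(seed), &seed);  // step 5
    return seed;
}
```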
Imagine that we are hashing ["a", "x"] on a big-endian platform and we have already performed steps 1 and 2. We have some 64-bit hash at this point, which we can represent as 4 high-order bytes followed by 4 low-order bytes: 0xhhhhhhhhllllllll. The problem lies with how we call into MurmurHash here. The call takes a uint32_t as the seed, which means that the seed becomes 0xllllllll. We pass a pointer to the 64-bit hash (&seed) as the location into which the 32-bit result should be written. On a big-endian machine, this means that we write the resulting 32-bit hash into the 4 high-order bytes. Let's use 0xgggggggg to notate the resulting hash code. The resulting 64-bit hash is therefore 0xggggggggllllllll: the 32 low-order bits do not change at all and remain 0xllllllll.

Here's where things get even more interesting. Step 4 is to hash_combine() the second String type tag. The implementation of boost::hash_combine() is such that the 32 low-order bits of the result are a function of the low-order bits of the incoming seed. This means that the low-order bits of the result are some function of 0xllllllll, which we can notate as f(0xllllllll).

The final step is to MurmurHash the second string. As before, f(0xllllllll) becomes the seed to the hash function and the resulting 32-bit hash is written into the high-order bits. The critical observation here is that the value of f(0xllllllll) does not depend at all on the contents of the first string ("a")! It is only a function of the initial seed and the type tags. This means that ["a", "x"] and ["b", "x"] will have the same hash. Indeed, the hash does not depend whatsoever on the value of the first string in the array!

The short version of all this is that we use MurmurHash in such a way that, on big-endian platforms, the seed is the low-order 32 bits and the resulting hash is written into the high-order 32 bits. If you compose multiple such calls to MurmurHash, the resulting hash does not depend at all on the first of the hashed values. This leads to tons of hash collisions.

I've provided a repro script in the "steps to reproduce" section which demonstrates how this can cause a performance problem for a $group query on a big-endian platform. I've verified that the problem exists on HEAD of the v4.4 branch. I still need to check that the bug exists on the master branch, but I think it does, given that the code hasn't changed much.
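To make the byte-level behavior concrete, here is a small self-contained sketch (not mongo source) of the aliasing write described above, where a 32-bit result is stored through a pointer to the 64-bit running hash:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    uint64_t seed = 0x1111111122222222ULL;  // 0xhhhhhhhhllllllll
    uint32_t result = 0x33333333U;          // stands in for the 0xgggggggg MurmurHash output

    // The buggy pattern writes the 32-bit result through a pointer to the
    // 64-bit running hash, i.e. into the first 4 bytes of its object
    // representation:
    std::memcpy(&seed, &result, sizeof(result));

    // Big-endian:    seed == 0x3333333322222222 -- low-order half untouched
    // Little-endian: seed == 0x1111111133333333 -- low-order half replaced
    std::printf("%016llx\n", static_cast<unsigned long long>(seed));
    return 0;
}
```
|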
| Comments |
| Comment by Githook User [ 23/Jun/23 ] |
|
Author: David Storch <david.storch@mongodb.com> (dstorch)
Message: This is a partial backport of commit 1e54a90b42. It (cherry picked from commit 5a4520f0ac6139231ff71fa64d09a304c0e64cc5) |
| Comment by David Storch [ 23/Jun/23 ] |
|
I'm marking this as fixed in 7.0.0-rc6 even though the commit for this ticket has not been backported to the 7.0 branch. The reason is that we have opted to backport the changes from the related ticket instead.

As of this writing, the fix for the issue is present in the master, 7.0, and 6.0 branches. The backports for inclusion in 5.0.19 and 4.4.23 should follow soon. |
| Comment by Githook User [ 23/Jun/23 ] |
|
Author: David Storch <david.storch@mongodb.com> (dstorch)
Message: This is a partial backport of commit 1e54a90b42. It |
| Comment by Githook User [ 22/Jun/23 ] |
|
Author: David Storch <david.storch@mongodb.com> (dstorch)
Message: Also introduces a new, safer interface for MurmurHash3 |
| Comment by David Storch [ 15/Jun/23 ] |
|
I confirmed that the bug affects a recent build of the master branch. I've updated the "Affects Versions" field of this ticket accordingly. |
| Comment by David Storch [ 15/Jun/23 ] |
|
We are planning to schedule the related ticket.

My proposal, though, unless folks think otherwise, would be to introduce a set of helper functions in the mongo code base, through which we call MurmurHash3, that are easier to use correctly, and then change all the callers to use this interface rather than calling MurmurHash3 directly. Then we can change some or all of these call sites to use Abseil as follow-up work (e.g.
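A minimal sketch of the kind of helper interface being proposed; the name and signature here are hypothetical, not the interface that actually landed:

```cpp
#include <cstddef>
#include <cstdint>

// Assumed available from the vendored MurmurHash3 sources (standard smhasher
// signature):
void MurmurHash3_x86_32(const void* key, int len, uint32_t seed, void* out);

// Hypothetical helper: takes and returns the full hash by value, so callers
// cannot accidentally alias the seed and the output buffer. The 32-bit result
// replaces the seed explicitly, which behaves identically on big- and
// little-endian platforms.
inline uint32_t murmur3_32(const void* key, size_t len, uint32_t seed) {
    uint32_t out;
    MurmurHash3_x86_32(key, static_cast<int>(len), seed, &out);
    return out;
}
```

Returning by value rather than writing through an out-pointer removes the aliasing hazard entirely; composing calls then folds the full previous hash into the next seed. |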
| Comment by Andy Schwerin [ 15/Jun/23 ] |
|
We should probably look at all the callers of MurmurHash3_x86_32 and replace them with something safer. Maybe the absl hasher if we can switch away from MurmurHash?
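For illustration, a hedged sketch of what an absl-based call site might look like (hypothetical function, not a committed change):

```cpp
#include <string>
#include <vector>

#include "absl/hash/hash.h"

// absl::Hash composes element hashes internally, so there is no seed/out-pointer
// pair to misuse and no endianness trap for the caller.
size_t hashStringArray(const std::vector<std::string>& values) {
    return absl::Hash<std::vector<std::string>>{}(values);
}
```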
|