Core Server / SERVER-78126

For specific kinds of input, mongo::Value() always hashes to the same result on big-endian platforms

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 7.1.0-rc0, 5.0.19, 4.4.23, 7.0.0-rc6, 6.0.8
    • Affects Version/s: 6.0.6, 6.3.1, 4.4.22, 5.0.18, 7.0.0-rc3
    • Component/s: None
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v7.0, v6.0, v5.0, v4.4
    • Steps To Reproduce:

      The explain results printed by this script will show that this query is much slower on s390x than on little-endian platforms. The effect gets more severe as the number of documents increases.

      (function() {
      "use strict";
      
      const RANDOM_STR_LEN = 50;
      
      const coll = db.c;
      coll.drop();
      
      function generateRandomStr() {
          // Draw from an alphabet of lowercase letters.
          const characters = "abcdefghijklmnopqrstuvwxyz";
          let result = "";
      
          for (let i = 0; i < RANDOM_STR_LEN; i++) {
              const randomIndex = Math.floor(Math.random() * characters.length);
              result += characters.charAt(randomIndex);
          }
      
          return result;
      }
      
      // Generate documents where "a" is a random string and "b" is always the same string.
      for (let i = 0; i < 10000; ++i) {
          const doc = {_id: i, a: generateRandomStr(), b: "constantString"};
          assert.commandWorked(coll.insert(doc));
      }
      
      let pipeline = [{$group: {_id: {a: "$a", b: "$b"}}}];
      let explain = coll.explain("executionStats").aggregate(pipeline);
      // The explain output shows that the $group stage is super slow on big-endian platforms.
      printjson(explain);
      }());
      
    • Sprint: QE 2023-06-26

      On big-endian platforms, there are situations in which large classes of unequal mongo::Value objects can hash to the exact same hash code. The consequences can be severe for the performance of the aggregation framework. For example, $group is implemented as a hash aggregation, meaning that the Value objects on which we are aggregating serve as keys in a hash table. If all such keys have the same hash, then during the $group build phase each insertion into the hash table will degrade from O(1) to O(n). The build phase as a whole therefore degrades from O(n) to O(n^2). Presumably there could be other code paths where we use Value objects as keys in a hash table which could also be subject to a performance problem.
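
      As a concrete illustration of the complexity claim, here is a small, purely hypothetical C++ sketch (not MongoDB code): a hasher that sends every key to the same bucket makes each insertion scan all previously inserted keys, so building the table is quadratic.

      #include <chrono>
      #include <iostream>
      #include <string>
      #include <unordered_set>

      // Pathological hasher: every key reports the same hash, as the colliding
      // Value objects do on big-endian platforms.
      struct ConstantHash {
          size_t operator()(const std::string&) const {
              return 42;
          }
      };

      int main() {
          std::unordered_set<std::string, ConstantHash> groups;
          const int n = 20000;
          const auto start = std::chrono::steady_clock::now();
          for (int i = 0; i < n; ++i) {
              // Every insert lands in one bucket and compares against all prior
              // keys: O(i) work per insert, O(n^2) for the whole build.
              groups.insert("key" + std::to_string(i));
          }
          const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                              std::chrono::steady_clock::now() - start)
                              .count();
          std::cout << "built " << groups.size() << " groups in " << ms << " ms\n";
          return 0;
      }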

      The specifics of the bug are a bit subtle, so I will explain it by example. Imagine we have two values which represent arrays of strings:

      1. Value 1: ["a", "x"]
      2. Value 2: ["b", "x"]

      The hashes should be different because the first elements of the two arrays differ. The computation of the hash involves the following steps:

      1. boost::hash_combine() the Array type tag with the initial seed.
      2. boost::hash_combine() the first String type tag.
      3. Hash the first string using MurmurHash3.
      4. boost::hash_combine() the second String type tag.
      5. Hash the second string using MurmurHash3.

      Imagine that we are hashing ["a", "x"] on a big-endian platform and we have already performed steps 1 and 2. At this point we have some 64-bit hash, which we can represent as 4 high-order bytes followed by 4 low-order bytes: 0xhhhhhhhhllllllll. The problem lies in how we call into MurmurHash here. The call takes a uint32_t seed, so the 64-bit hash is truncated and the seed becomes 0xllllllll. We pass a pointer to the 64-bit hash (&seed) as the location into which the 32-bit result should be written. On a big-endian machine, this means that we write the resulting 32-bit hash into the 4 high-order bytes. Let's use 0xgggggggg to denote the resulting hash code. This means that the resulting hash is 0xggggggggllllllll. The 32 low-order bits do not change at all and remain 0xllllllll.
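
      To make that concrete, here is a minimal, self-contained C++ sketch of the calling pattern just described. It is not the actual server code: murmur32StandIn only mimics the MurmurHash3_x86_32 signature (const void* key, int len, uint32_t seed, void* out), and its body is a trivial placeholder, because the problem lies in how the call is composed, not in the hash function itself.

      #include <cstdint>
      #include <cstdio>
      #include <cstring>

      // Stand-in with the same shape as MurmurHash3_x86_32: 32-bit seed in,
      // 32-bit result written through the 'out' pointer.
      void murmur32StandIn(const void* key, int len, uint32_t seed, void* out) {
          uint32_t h = seed;
          const unsigned char* p = static_cast<const unsigned char*>(key);
          for (int i = 0; i < len; ++i) {
              h = (h ^ p[i]) * 0x01000193u;  // placeholder mixing
          }
          std::memcpy(out, &h, sizeof(h));  // writes exactly 4 bytes
      }

      int main() {
          // The running 64-bit hash after steps 1 and 2, i.e. 0xhhhhhhhhllllllll.
          uint64_t runningHash = 0x1111111122222222ull;

          // The problematic pattern: the seed argument truncates the 64-bit hash
          // to its low 32 bits (0x22222222), and the 32-bit result is written
          // through &runningHash, i.e. over the first four bytes in memory.
          murmur32StandIn("a", 1, static_cast<uint32_t>(runningHash), &runningHash);

          // On a big-endian machine those first four bytes are the HIGH-order
          // bytes, so the low half 0x22222222 survives unchanged and the result
          // is 0xggggggggllllllll. On a little-endian machine the result lands in
          // the low half instead, which is why only big-endian platforms suffer.
          std::printf("running hash after the call: 0x%016llx\n",
                      static_cast<unsigned long long>(runningHash));
          return 0;
      }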

      Here's where things get even more interesting. Step 4 is to hash_combine() the second String type tag. The implementation of boost::hash_combine() is such that the 32 low-order bits of the result are a function of the low-order bits of the incoming seed. This means that the low-order bits of the result are some function of 0xllllllll, which we can notate as f(0xllllllll). The final step is to MurmurHash the second string. As before, f(0xllllllll) becomes the seed to the hash function and the resulting 32-bit hash is written into the high-order bits. The critical observation here is that the value of f(0xllllllll) does not depend at all on the contents of the first string ("a")! It is only a function of the initial seed and the type tags. This means that ["a", "x"] and ["b", "x"] will have the same hash. Indeed, the hash does not depend whatsoever on the value of the first string in the array!

      The short version of all this is that we use MurmurHash in such a way that, on big-endian platforms, the seed is the low-order 32 bits and the resulting hash is written into the high-order 32 bits. If you compose multiple such calls to MurmurHash, the resulting hash does not depend at all on the first of the two hashed values. This leads to tons of hash collisions.
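
      The end-to-end collision can also be modeled in a few lines. The sketch below is hypothetical: hashCombine64 and murmur32StandIn are simplified stand-ins for boost::hash_combine and MurmurHash3 (the only property that matters is the one described above, namely that the combiner's low 32 bits depend only on the seed's low 32 bits), and the big-endian write is simulated explicitly so the demo behaves the same on any host. Under that model, ["a", "x"] and ["b", "x"] produce identical 64-bit hashes.

      #include <cstdint>
      #include <cstdio>
      #include <string>
      #include <vector>

      // Simplified 64-bit combiner. As described above, the low 32 bits of the
      // result depend only on the low 32 bits of the incoming seed (the
      // multiplication and addition carry upward only).
      uint64_t hashCombine64(uint64_t seed, uint64_t k) {
          const uint64_t m = 0xc6a4a7935bd1e995ull;
          k *= m;
          k ^= k >> 47;
          k *= m;
          seed ^= k;
          seed *= m;
          seed += 0xe6546b64ull;
          return seed;
      }

      // Placeholder 32-bit string hash standing in for MurmurHash3.
      uint32_t murmur32StandIn(const std::string& s, uint32_t seed) {
          uint32_t h = seed;
          for (unsigned char c : s) {
              h = (h ^ c) * 0x01000193u;
          }
          return h;
      }

      // Models the buggy composition on a big-endian platform: the 32-bit result
      // replaces the HIGH half of the running hash while the LOW half survives.
      uint64_t buggyArrayHash(const std::vector<std::string>& arr) {
          const uint64_t kArrayTag = 4, kStringTag = 2;  // made-up type tags
          uint64_t h = hashCombine64(/*initial seed*/ 0, kArrayTag);
          for (const std::string& s : arr) {
              h = hashCombine64(h, kStringTag);
              const uint32_t out = murmur32StandIn(s, static_cast<uint32_t>(h));
              h = (static_cast<uint64_t>(out) << 32) | (h & 0xffffffffull);
          }
          return h;
      }

      int main() {
          const uint64_t h1 = buggyArrayHash({"a", "x"});
          const uint64_t h2 = buggyArrayHash({"b", "x"});
          // The first element never influences the final hash, so these collide.
          std::printf("0x%016llx\n0x%016llx\n",
                      static_cast<unsigned long long>(h1),
                      static_cast<unsigned long long>(h2));
          return 0;
      }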

      I've provided a repro script in the "steps to reproduce" section which demonstrates how this can cause a performance problem for a $group query on a big-endian platform. I've verified that the problem exists on HEAD of the v4.4 branch. I still need to check that the bug exists on the master branch, but I think it does given that the code hasn't changed much.

            Assignee:
            David Storch
            Reporter:
            David Storch
            Votes:
            1
            Watchers:
            23

              Created:
              Updated:
              Resolved: