[SERVER-56056] [sbe][sharding] Expected 1 chunk skip but saw none in shard3.js Created: 12/Apr/21  Updated: 29/Oct/23  Resolved: 21/Apr/21

Status: Closed
Project: Core Server
Component/s: Querying, Sharding
Affects Version/s: None
Fix Version/s: 5.0.0-rc0

Type: Bug Priority: Major - P3
Reporter: Kyle Suarez Assignee: Justin Seyster
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-55092 [SBE] Fix shard filtering for hashed ... Closed
related to SERVER-55010 Enable sharding suite against SBE bui... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Query Execution 2021-05-03
Participants:

 Description   

The test asserts that we have 1 chunk skipped in a SHARDING_FILTER stage:

var e = a.find().explain("executionStats").executionStats;
assert.eq(3, e.nReturned, "ex1");
assert.eq(0, e.totalKeysExamined, "ex2");
assert.eq(4, e.totalDocsExamined, "ex3");
 
var chunkSkips = 0;
for (var shard in e.executionStages.shards) {
    var theShard = e.executionStages.shards[shard];
    chunkSkips += getChunkSkips(theShard.executionStages);
}
assert.eq(1, chunkSkips, "ex4"); // <----- fails

In the error, we got no chunk skips at all:

[js_test:shard3] uncaught exception: Error: [1] != [0] are not equal : ex4 :
[js_test:shard3] doassert@src/mongo/shell/assert.js:20:14
[js_test:shard3] assert.eq@src/mongo/shell/assert.js:179:9
[js_test:shard3] @jstests/sharding/shard3.js:91:1
[js_test:shard3] @jstests/sharding/shard3.js:1:2
[js_test:shard3] failed to load: jstests/sharding/shard3.js
[js_test:shard3] exiting with code -3

The analyze plan helper looks for a SHARDING_FILTER stage in the explain output and extracts the chunkSkips field:

/**
 * Get the number of chunk skips for the BSON exec stats tree rooted at 'root'.
 */
function getChunkSkips(root) {
    if (root.stage === "SHARDING_FILTER") {
        return root.chunkSkips;
    } else if ("inputStage" in root) {
        return getChunkSkips(root.inputStage);
    } else if ("inputStages" in root) {
        var skips = 0;
        for (var i = 0; i < root.inputStages.length; i++) {
            skips += getChunkSkips(root.inputStages[i]);
        }
        return skips;
    }
 
    return 0;
}

Also, "shard3.js" is one of those old tests whose assertion messages ("ex1" through "ex4") say very little; it should report a more useful error message when this assertion does trip.



 Comments   
Comment by Githook User [ 21/Apr/21 ]

Author: Justin Seyster (jseyster) <justin.seyster@mongodb.com>

Message: SERVER-56056 Correctly extract chunkSkips metric from SBE explain
Branch: master
https://github.com/mongodb/mongo/commit/ccca20406141099cf6f778cd01e633ecd1fd8f11

Comment by Anton Korshunov [ 19/Apr/21 ]

Chunk skips need to be calculated differently in SBE. Since SBE has no dedicated PlanStage for the shard filter, we cannot use the same approach as in the classic engine. In SBE, a shard filter is implemented as a regular filter stage with a shardFilter expression. So, in the getChunkSkips function used by this test, we can get the nodeId of the SHARDING_FILTER QSN from the queryPlanner subsection of the explain output, walk the executionStats tree, and find the first filter stage whose nodeId matches it. That stage has advances and numTested metrics; subtracting the former from the latter should give the chunkSkips value.
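
A rough sketch of that approach, not the committed fix. The field names "planNodeId", "numTested" and "advanced", the stage name "filter", and the helper names are assumptions made for illustration; the actual SBE explain output may name them differently.

```javascript
// Hypothetical helpers; field and stage names below are assumed, not confirmed.

// Collect child stages regardless of which key the explain tree uses for them.
function childStages(node) {
    var children = [];
    if (node.inputStage) children.push(node.inputStage);
    if (node.inputStages) children = children.concat(node.inputStages);
    if (node.outerStage) children.push(node.outerStage);
    if (node.innerStage) children.push(node.innerStage);
    return children;
}

// Find the node id of the SHARDING_FILTER QSN in the queryPlanner winning plan.
function findShardingFilterNodeId(qsn) {
    if (qsn.stage === "SHARDING_FILTER" && qsn.planNodeId !== undefined) {
        return qsn.planNodeId;
    }
    var children = childStages(qsn);
    for (var i = 0; i < children.length; i++) {
        var id = findShardingFilterNodeId(children[i]);
        if (id !== null) return id;
    }
    return null;
}

// Find the first SBE filter stage that was generated from the given QSN node id.
function findFilterStage(execStage, nodeId) {
    if (execStage.stage === "filter" && execStage.planNodeId === nodeId) {
        return execStage;
    }
    var children = childStages(execStage);
    for (var i = 0; i < children.length; i++) {
        var found = findFilterStage(children[i], nodeId);
        if (found !== null) return found;
    }
    return null;
}

// chunkSkips = documents tested by the shard filter minus documents it let through.
function getSbeChunkSkips(winningPlan, execRoot) {
    var nodeId = findShardingFilterNodeId(winningPlan);
    if (nodeId === null) return 0;
    var filterStage = findFilterStage(execRoot, nodeId);
    if (filterStage === null) return 0;
    return filterStage.numTested - filterStage.advanced;
}

// Made-up explain fragments, shaped only to exercise the helpers above.
var sampleWinningPlan = {
    stage: "SHARDING_FILTER",
    planNodeId: 2,
    inputStage: {stage: "COLLSCAN", planNodeId: 1}
};
var sampleExecStages = {
    stage: "filter",
    planNodeId: 2,
    numTested: 3,
    advanced: 2,
    inputStage: {stage: "project", planNodeId: 1, inputStage: {stage: "scan", planNodeId: 1}}
};
var sampleSkips = getSbeChunkSkips(sampleWinningPlan, sampleExecStages);
```

Matching on the node id rather than on the stage name alone keeps the count from picking up ordinary filter stages that have nothing to do with shard filtering.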

Comment by Eric Cox (Inactive) [ 15/Apr/21 ]

This fix looks non-trivial. Moving this back to open.

Comment by Eric Cox (Inactive) [ 15/Apr/21 ]

The executionStats do show an nReturned of 3 from the project inputStage of the filter stage that uses the shardFilterer check, while the final nReturned from the plan is 2. So we are definitely skipping 1 document. However, getChunkSkips() sums over a chunkSkips field in the execution stats. I think the problem here is that we aren't computing chunkSkips when we do shard filtering in SBE.

$ git grep chunkSkips
jstests/libs/analyze_plan.js:        return root.chunkSkips;
jstests/sharding/shard3.js:var chunkSkips = 0;
jstests/sharding/shard3.js:    chunkSkips += getChunkSkips(theShard.executionStages);
jstests/sharding/shard3.js:assert.eq(1, chunkSkips, "ex4");
src/mongo/db/exec/plan_stats.h:    ShardingFilterStats() : chunkSkips(0) {}
src/mongo/db/exec/plan_stats.h:    size_t chunkSkips;
src/mongo/db/exec/shard_filter.cpp:                ++_specificStats.chunkSkips;
src/mongo/db/query/plan_explainer_impl.cpp:            bob->appendNumber("chunkSkips", static_cast<long long>(spec->chunkSkips));

Comment by Kyle Suarez [ 12/Apr/21 ]

Seems possibly related to SERVER-55092, though I think I ran this patch build after that commit went in.

Generated at Thu Feb 08 05:38:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.