[SERVER-70810] SHARDING_FILTER stage missing on shards from cluster count command explain with query predicate Created: 24/Oct/22  Updated: 02/Feb/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Matt Boros Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
is duplicated by SERVER-70811 SHARDING_FILTER stage missing from db... Closed
Related
related to SERVER-3645 Sharded collection counts (on primary... Closed
Assigned Teams:
Query Optimization
Operating System: ALL
Participants:

 Description   

This bug has varying severity depending on the version. The query is

db.coll.count({predicate}) 

with secondary reads.

On 4.4, the result is incorrect, and includes orphan documents. A SHARDING_FILTER is missing from the explain plan.

On 5.0 and 6.0, orphan documents are filtered out and the count is correct, but the SHARDING_FILTER stage is not reported in the explain plan. This makes me think the explain is incorrect, and the stage is actually included.

Repro script (based on shard_filtering.js):

 (function() {
    "use strict";
    
    load("jstests/libs/analyze_plan.js");
    
    // Deliberately inserts orphans outside of migration.
    TestData.skipCheckOrphans = true;
    const st = new ShardingTest({shards: 2, rs: {nodes: 2}});
    const collName = "test.shardfilter";
    const mongosDb = st.s.getDB("test");
    const mongosColl = st.s.getCollection(collName);
    
    assert.commandWorked(st.s.adminCommand({enableSharding: "test"}));
    st.ensurePrimaryShard("test", st.shard1.name);
    assert.commandWorked(
        st.s.adminCommand({shardCollection: collName, key: {a: 1, "b.c": 1, "d.e.f": 1}}));
    
    // Put a chunk with no data onto shard0 in order to make sure that both shards get targeted.
    assert.commandWorked(st.s.adminCommand({split: collName, middle: {a: 20, "b.c": 0, "d.e.f": 0}}));
    assert.commandWorked(st.s.adminCommand({split: collName, middle: {a: 30, "b.c": 0, "d.e.f": 0}}));
    assert.commandWorked(st.s.adminCommand(
        {moveChunk: collName, find: {a: 25, "b.c": 0, "d.e.f": 0}, to: st.shard0.shardName}));
    
    // Shard the collection and insert some docs.
    const docs = [
        {_id: 0, a: 1, b: {c: 1}, d: {e: {f: 1}}, g: 100, z: "z"},
        {_id: 1, a: 1, b: {c: 2}, d: {e: {f: 2}}, g: 100.9, z: "z"},
        {_id: 2, a: 1, b: {c: 3}, d: {e: {f: 3}}, g: "a", z: "z"},
        {_id: 3, a: 1, b: {c: 3}, d: {e: {f: 3}}, g: [1, 2, 3], z: "z"},
        {_id: 4, a: "a", b: {c: "b"}, d: {e: {f: "c"}}, g: null, z: "z"},
        {_id: 5, a: 1.0, b: {c: "b"}, d: {e: {f: Infinity}}, g: NaN, z: "z"},
    ];
    assert.commandWorked(mongosColl.insert(docs));
    assert.eq(mongosColl.find().itcount(), 6);
    
    // Insert some documents with valid partial shard keys to both shards. The versions of these
    // documents on shard0 are orphans, since all of the data is owned by shard1.
    const docsWithMissingAndNullKeys = [
        {_id: 6, a: "missingParts", z: "z"},
        {_id: 7, a: null, b: {c: 1}, d: {e: {f: 1}}, z: "z"},
        {_id: 8, a: "null", b: {c: null}, d: {e: {f: 1}}, z: "z"},
        {_id: 9, a: "deepNull", b: {c: 1}, d: {e: {f: null}}, z: "z"},
    ];
    assert.commandWorked(st.shard0.getCollection(collName).insert(docsWithMissingAndNullKeys));
    assert.commandWorked(st.shard1.getCollection(collName).insert(docsWithMissingAndNullKeys));
    
    // Insert orphan docs without missing or null shard keys onto shard0 and test that they get filtered
    // out.
    const orphanDocs = [
        {_id: 10, a: 100, b: {c: 10}, d: {e: {f: 999}}, g: "a", z: "z"},
        {_id: 11, a: 101, b: {c: 11}, d: {e: {f: 1000}}, g: "b", z: "z"}
    ];
    assert.commandWorked(st.shard0.getCollection(collName).insert(orphanDocs));
    assert.eq(mongosColl.find().itcount(), 10);
 
    // With primary read pref, count with predicate filters out orphans
    assert.eq(mongosColl.count({z: "z"}), 10);
    // The explain plan includes a sharding filter
    jsTestLog(mongosColl.explain().count({z: "z"}));
 
    mongosDb.shardfilter.getMongo().setReadPref("secondary");
    // With secondary read pref, count with predicate still filters out orphans
    assert.eq(mongosColl.count({z: "z"}), 10);
    // The following explain doesn't include a sharding filter
    jsTestLog(mongosColl.explain().count({z: "z"}));
    
    st.stop();
})();



 Comments   
Comment by Joe Kanaan [ 02/Feb/23 ]

This is a benign issue with the explain output. We will fix this when we revisit the explain command in a future project.

Comment by James Wahlin [ 25/Jan/23 ]

I confirmed that an explain of a count command on a sharded collection will omit the shard filtering stage on the individual shards. I would propose that we focus this ticket on fixing that problem.

Historically the count command did not filter for orphans. This was addressed for MongoDB 4.0 under SERVER-3645, which notably did not tackle shard filtering on secondaries. I suspect that fix for secondaries came later which is why 4.4 does not filter but later versions do.

Generated at Thu Feb 08 06:17:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.