[SERVER-34338] find explain in legacy query path on mongos does not retry on StaleShardVersion
Created: 05/Apr/18  Updated: 29/Oct/23  Resolved: 01/Jun/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.6.3, 3.7.3
Fix Version/s: 4.0.0, 4.1.1

Type: Bug
Priority: Major - P3
Reporter: Esha Maharishi (Inactive)
Assignee: Esha Maharishi (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
is related to SERVER-29449 Explain of find command does not tran... Backlog
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0
Steps To Reproduce:

(function() {
    "use strict";
 
    const dbName = "test";
    const collName = "foo";
    const ns = dbName + "." + collName;
 
    const st = new ShardingTest({mongos: 2, shards: 1, verbose: 2});
 
    let staleMongos = st.s0;
    let freshMongos = st.s1;
 
    jsTest.log("Make the stale mongos load a cache entry for db " + dbName + " once");
    assert.writeOK(staleMongos.getDB(dbName).getCollection(collName).insert({_id: 1}));
 
    jsTest.log("Call shardCollection on " + ns + " from the fresh mongos");
    assert.commandWorked(freshMongos.adminCommand({enableSharding: dbName}));
    assert.commandWorked(freshMongos.adminCommand({shardCollection: ns, key: { "_id": 1 }}));
 
    jsTest.log("Ensure the shard knows " + ns + " is sharded");
    assert.commandWorked(st.shard0.adminCommand({_flushRoutingTableCacheUpdates: ns, syncFromConfig: true}));
 
    jsTest.log("Run explain find on " + ns + " from the stale mongos");
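    // Before the fix, the test is expected to fail here: the legacy query path on
    // mongos does not retry "find explain" on StaleShardVersion.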
    staleMongos.getDB(dbName).getCollection(collName).find({$query: {}, $explain: true}).next();
 
    st.stop();
})();

Sprint: Sharding 2018-05-21, Sharding 2018-06-04
Participants:

 Description   

This is a regression introduced by this line in this commit on 3.6, where we pulled the StaleShardVersion handling in mongos up from scatterGather() to strategy::runCommand().

"find explain" is the only operation that goes through the legacy query path without internally handling StaleShardVersion. No existing test runs "find explain" from a stale mongos with --shellReadMode=legacy, so this was not caught.

See attached repro, which should be run with --shellReadMode=legacy to make the "find explain" go through the legacy query path on mongos.
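
To illustrate the kind of retry the legacy path was missing, here is a minimal conceptual sketch in plain JavaScript. It is not the mongos implementation: StaleShardVersionError, runWithStaleVersionRetry, explainFind, refresh, and routingCache are all hypothetical names invented for this example, and the attempt limit of 3 is an arbitrary assumption. The sketch only shows the general pattern of refreshing stale routing information and re-running the operation.

```javascript
"use strict";

// Hypothetical error type standing in for a StaleShardVersion response from a shard.
class StaleShardVersionError extends Error {
    constructor(ns) {
        super("stale shard version for " + ns);
        this.name = "StaleShardVersionError";
        this.ns = ns;
    }
}

// Runs `op`. On a stale-version error, calls `refreshRoutingInfo` and tries again,
// up to `maxAttempts` total attempts. Any other error is rethrown immediately.
function runWithStaleVersionRetry(op, refreshRoutingInfo, maxAttempts = 3) {
    for (let attempt = 1;; attempt++) {
        try {
            return op();
        } catch (e) {
            if (!(e instanceof StaleShardVersionError) || attempt >= maxAttempts) {
                throw e;
            }
            refreshRoutingInfo(e.ns);
        }
    }
}

// Toy router state: the cache starts out believing the collection is unsharded,
// like the stale mongos in the repro.
const routingCache = {"test.foo": {sharded: false}};
const refreshes = [];

// Hypothetical stand-in for "find explain": fails while the cache is stale.
function explainFind(ns) {
    if (!routingCache[ns].sharded) {
        throw new StaleShardVersionError(ns);
    }
    return {ok: 1, ns: ns};
}

// Hypothetical stand-in for reloading routing information.
function refresh(ns) {
    refreshes.push(ns);
    routingCache[ns] = {sharded: true};
}

const result = runWithStaleVersionRetry(() => explainFind("test.foo"), refresh);
console.log(JSON.stringify(result), "after", refreshes.length, "refresh(es)");
```

In this toy model the first attempt fails with the stale-version error, the cache is refreshed once, and the second attempt succeeds. Without the wrapper, the first error would propagate straight to the caller, which is analogous to the behavior this ticket describes.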



 Comments   
Comment by Githook User [ 01/Jun/18 ]

Author: Esha Maharishi (EshaMaharishi) <esha.maharishi@mongodb.com>

Message: SERVER-34338 Also blacklist explainFind_stale_mongos.js from last_stable_misc.yml and the file that generates the split-up suites for last_stable
Branch: v4.0
https://github.com/mongodb/mongo/commit/1bc33e263dfbc14c6f87937bf562b03091ca848e

Comment by Githook User [ 01/Jun/18 ]

Author: Esha Maharishi (EshaMaharishi) <esha.maharishi@mongodb.com>

Message: SERVER-34338 Also blacklist explainFind_stale_mongos.js from last_stable_misc.yml and the file that generates the split-up suites for last_stable
Branch: master
https://github.com/mongodb/mongo/commit/49da2e52dd44b4163aba3e5f39071614e4d2d11c

Comment by Githook User [ 01/Jun/18 ]

Author: Esha Maharishi (EshaMaharishi) <esha.maharishi@mongodb.com>

Message: SERVER-34338 Find explain in legacy query path on mongos does not retry on StaleShardVersion

(cherry picked from commit 3e31679dfa6d85ebe48855582304c8cb7a635b0d)
Branch: v4.0
https://github.com/mongodb/mongo/commit/1f3c873569959e5e0c547bad9030a5ba2b3eb0dc

Comment by Githook User [ 01/Jun/18 ]

Author: Esha Maharishi (EshaMaharishi) <esha.maharishi@mongodb.com>

Message: SERVER-34338 Find explain in legacy query path on mongos does not retry on StaleShardVersion
Branch: master
https://github.com/mongodb/mongo/commit/3e31679dfa6d85ebe48855582304c8cb7a635b0d

Comment by Charlie Swanson [ 25/Apr/18 ]

Linking to SERVER-29449, only because I believe this is an additional symptom of separate code paths for explain/non-explain in the find command.
