[SERVER-70745] explain command for cluster delete results in error when on shardsvr mongod Created: 20/Oct/22  Updated: 29/Oct/23  Resolved: 01/Nov/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.2.0-rc0

Type: Bug Priority: Major - P3
Reporter: Wenqin Ye Assignee: Wenqin Ye
Resolution: Fixed Votes: 0
Labels: sharding-nyc-subteam2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

If you run the explain command for a cluster delete (clusterDelete) on a shardsvr mongod, you get an error. However, if you run the same command without explain on a shardsvr mongod, it works.

Code to reproduce the error:

(function() {
const kDBName = "foo";
const kCollName = "bar";
const st = new ShardingTest({mongos: 1, shards: 1, config: 1});
const clusterCommandsCases = [
    // This does not error
    //{cmd: {clusterDelete: kCollName, deletes: [{q: {}, limit: 0}]}},    
 
    // This errors
    {cmd: {explain: {clusterDelete: `${kCollName}`, deletes: [{q: {}, limit: 0}]}}}
];
 
assert.commandWorked(st.s0.adminCommand({enablesharding: kDBName}));
assert.commandWorked(st.s0.adminCommand({shardcollection: kDBName + "." + kCollName, key: {a: 1}}));
 
for (let testCase of clusterCommandsCases) {
    assert.commandWorked(st.rs0.getPrimary().getDB(kDBName).runCommand(testCase.cmd),
                         tojson(testCase.cmd));
}
 
st.stop();
})(); 



 Comments   
Comment by Githook User [ 01/Nov/22 ]

Author:

{'name': 'wenqinYe', 'email': 'wenqin908@gmail.com', 'username': 'wenqinYe'}

Message: SERVER-70745 explain command for cluster delete results in error when on shardsvr mongod
Branch: master
https://github.com/mongodb/mongo/commit/52622f9c3cc14aee28efcec2376fc06edb20ebdb

Comment by Wenqin Ye [ 25/Oct/22 ]

After investigating this bug, I found that the explain method creates a request that gets routed to the same shard server, which causes the explain method to be called again, in an infinite loop.
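
The re-entry described above can be pictured with a small toy model in plain JavaScript. This is only an illustration of the control flow, not the server's code: explainOnShard and MAX_HOPS are hypothetical names, and the hop cap exists only so that the sketch terminates instead of looping forever.

```javascript
// Toy model only: these names are hypothetical and are not MongoDB server APIs.
const MAX_HOPS = 5;  // artificial cap so the sketch terminates

function explainOnShard(cmd, hops = 0) {
    if (hops >= MAX_HOPS) {
        throw new Error(`explain re-entered ${hops} times without making progress`);
    }
    // The explain handler builds a new explain request around the same
    // cluster command...
    const forwarded = {explain: cmd.explain};
    // ...and in the reported case that request is routed back to the same
    // shard server, so the handler runs again with an identical request.
    return explainOnShard(forwarded, hops + 1);
}

let loopError = null;
try {
    explainOnShard({explain: {clusterDelete: "bar", deletes: [{q: {}, limit: 0}]}});
} catch (e) {
    loopError = e.message;
}
console.log(loopError);
```

Because the forwarded request is identical to the incoming one, nothing changes between hops, which is why the real command never completes successfully.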

In our Slack conversations discussing this bug, Cheahuychou said there were two ways to fix this: 

  1. Remove the "cluster" prefix before forwarding the explain command to shards, so shards receive just the "insert"/"update"/"delete".
  2. Not support the explain version of the cluster commands that got linked in mongods (i.e. add uasserts at the start of the commands).

We (Cheahuychou, Andrew Witten and I) decided that the best option would be #2, since cluster commands (such as clusterDelete) are only used internally, so an explain on them doesn't make sense. 
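
For comparison, the two options can be sketched in plain JavaScript as follows. This is a rough illustration under stated assumptions, not the code from the actual commit: stripClusterPrefix, rejectExplainOfClusterCommand and the error message are all made up for the example, and the real fix uses uasserts inside the server's command implementations.

```javascript
// Hypothetical helpers for illustration; not MongoDB server code.
const CLUSTER_PREFIX = "cluster";

// A command's name is the first key of its command document.
function commandName(cmd) {
    return Object.keys(cmd)[0];
}

// Option 1: rename e.g. clusterDelete -> delete before forwarding to shards.
function stripClusterPrefix(innerCmd) {
    const name = commandName(innerCmd);
    if (!name.startsWith(CLUSTER_PREFIX)) {
        return innerCmd;
    }
    const rest = name.slice(CLUSTER_PREFIX.length);
    const plainName = rest.charAt(0).toLowerCase() + rest.slice(1);
    const {[name]: target, ...others} = innerCmd;
    // Keep the command name as the first key of the new document.
    return {[plainName]: target, ...others};
}

// Option 2 (the one chosen): fail fast, in the spirit of a uassert at the
// start of the command. The message text here is invented.
function rejectExplainOfClusterCommand(cmd) {
    if (commandName(cmd) !== "explain") {
        return;
    }
    const innerName = commandName(cmd.explain);
    if (innerName.startsWith(CLUSTER_PREFIX)) {
        throw new Error(`explain is not supported for ${innerName}`);
    }
}

const explainCmd = {explain: {clusterDelete: "bar", deletes: [{q: {}, limit: 0}]}};

// Option 1 would forward {delete: "bar", deletes: [...]} to the shard.
const stripped = stripClusterPrefix(explainCmd.explain);

// Option 2 rejects the explain up front; the plain cluster command still passes.
let rejected = null;
try {
    rejectExplainOfClusterCommand(explainCmd);
} catch (e) {
    rejected = e.message;
}
rejectExplainOfClusterCommand(explainCmd.explain);
```

In this sketch, option 2 keeps the non-explain clusterDelete path untouched, which matches the reasoning above that these commands are internal-only and do not need an explain variant.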

Generated at Thu Feb 08 06:17:00 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.