Type: Bug
Resolution: Fixed
Priority: Major - P3
Affects Version/s: 4.4.0-rc7
Component/s: Sharding
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.4, v4.2
Sprint: Sharding 2020-07-13, Sharding 2020-07-27, Sharding 2020-08-10
20
If a StaleShardVersion error is encountered while running BatchWriteExec::executeBatch, we call ChunkManagerTargeter::noteStaleShardResponse, which populates the _remoteShardVersions map with the stale-version error info.
The targeter then attempts a refresh, which triggers a refresh of the routingInfo; this can throw a ShardNotFound exception (caught here). At this point executeBatch loops again and calls ChunkManagerTargeter::noteCouldNotTarget. Since the _remoteShardVersions map is still populated from the earlier stale-version error, we hit this dassert.
The bug was triggered by the jstests/concurrency/fsm_workloads_add_remove_shards/clusterwide_ops_with_add_drop_shards.js test, which manually drops the `config.shards` collection through the mongos, causing the ShardNotFound error. It only happens on the Enterprise RHEL 6.2 DEBUG Code Coverage and {UB}SAN Enterprise Ubuntu 18.04 DEBUG variants, both on 4.4 and on master.