  1. Core Server
  2. SERVER-48504

Combination of StaleShardVersion and ShardNotFound exceptions makes the ChunkManagerTargeter crash

    • Fully Compatible
    • ALL
    • v4.4, v4.2
    • The bug was triggered by the jstests/concurrency/fsm_workloads_add_remove_shards/clusterwide_ops_with_add_drop_shards.js test, which manually drops the `config.shards` collection of the mongos, causing the ShardNotFound error.
    • Sharding 2020-07-13, Sharding 2020-07-27, Sharding 2020-08-10
    • 20

      If a StaleShardVersion error is encountered while running BatchWriteExec::executeBatch, we call ChunkManagerTargeter::noteStaleShardResponse, which populates the _remoteShardVersions map with the stale error info.
      A refresh of the targeter is then attempted, which triggers a refresh of the routingInfo. This refresh can throw a ShardNotFound exception, which is caught. At this point we loop again in the executeBatch function and call ChunkManagerTargeter::noteCouldNotTarget; since the _remoteShardVersions map is not empty, we hit a dassert.
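
The sequence above can be sketched with a small toy model. This is not the MongoDB implementation: `ToyTargeter`, `replayFailingSequence`, the `shardRegistryIntact` flag, and the local `ShardNotFound` type are hypothetical stand-ins, and the check inside `noteCouldNotTarget` only assumes that the real dassert requires an empty `_remoteShardVersions` map, as the description implies.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the ShardNotFound error raised by a routing refresh.
struct ShardNotFound : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Toy model of the targeter; it keeps only the state relevant to the bug.
class ToyTargeter {
public:
    // Records the stale-version info reported by a shard.
    void noteStaleShardResponse(const std::string& shardId, int staleVersion) {
        _remoteShardVersions[shardId] = staleVersion;
    }

    // Returns false where the debug assertion would fire (assumed condition:
    // no stale-version info may be pending when targeting fails).
    bool noteCouldNotTarget() {
        if (!_remoteShardVersions.empty()) {
            return false;
        }
        _needsTargetingRefresh = true;
        return true;
    }

    // Clears the pending state only if the routing refresh succeeds.
    void refreshIfNeeded(bool shardRegistryIntact) {
        if (!shardRegistryIntact) {
            // The throw happens before the map is cleared, so the stale
            // entries survive the failed refresh.
            throw ShardNotFound("shard not found during routing refresh");
        }
        _remoteShardVersions.clear();
        _needsTargetingRefresh = false;
    }

private:
    std::map<std::string, int> _remoteShardVersions;
    bool _needsTargetingRefresh = false;
};

// Replays the sequence from the description and reports whether the
// assertion condition in noteCouldNotTarget would hold.
bool replayFailingSequence(bool shardRegistryIntact) {
    ToyTargeter targeter;
    targeter.noteStaleShardResponse("shard0", 1);
    try {
        targeter.refreshIfNeeded(shardRegistryIntact);
    } catch (const ShardNotFound&) {
        // Swallowed, mirroring the catch mentioned in the description;
        // the batch loop then runs another iteration.
    }
    return targeter.noteCouldNotTarget();
}
```

In this model `replayFailingSequence(true)` leaves the map empty and passes the check, while `replayFailingSequence(false)` reaches `noteCouldNotTarget` with stale entries still pending, which corresponds to the crash reported here.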

       

      The bug was triggered by the jstests/concurrency/fsm_workloads_add_remove_shards/clusterwide_ops_with_add_drop_shards.js test, which manually drops the `config.shards` collection of the mongos, causing the ShardNotFound error. It only happens on the Enterprise RHEL 6.2 DEBUG Code Coverage and the {UB}SAN Enterprise Ubuntu 18.04 DEBUG build variants, both in 4.4 and in master.

       

       

       

            Assignee:
            tommaso.tocci@mongodb.com Tommaso Tocci
            Reporter:
            tommaso.tocci@mongodb.com Tommaso Tocci
            Votes:
            0
            Watchers:
            3
