Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-46487

The mongos routing for scatter/gather ops can have unbounded latency

    • Fully Compatible
    • ALL
    • v4.4, v4.2, v4.0, v3.6
    • Sharding 2020-04-20

      Issue Status as of April 2020

      ISSUE SUMMARY

      The mongoS routing of scatter/gather operations iterates over the boundaries for the collection’s chunks sorted in ascending order. As the list is iterated, shards that own a chunk for the collection are discovered. If all shards in the cluster that own a chunk for the collection are discovered partway through the iteration, the iteration early-exits and the mongoS targets all of these shards.

      This logic is suboptimal if a large number of the sorted chunks are initially owned by a subset of shards. This can result in a large number of iterations needed to discover all shards. As an example, on a 2-shard cluster with a collection where the first 100k sorted chunks are owned on shard 1, the mongoS iterates over the boundaries for 100k chunks when routing a scatter/gather operation.

      USER IMPACT

      On a sharded collection where a large number of the sorted chunks (e.g. 100k) are initially owned by a subset of shards, a scatter/gather operation can display increased latency (e.g. milliseconds) or increased mongoS CPU utilization.
      This chunk distribution can result from adding a shard to the cluster, as contiguous low chunk ranges get migrated into the new shard.

      WORKAROUND

      Rearrange the chunks in a collection so that the initial chunks in the sorted list discover all the shards. The script below can be used to verify the number of iterations required to route a scatter/gather operation.

      var ns = 'mydb.mycoll'
      var nShards = db.getSiblingDB('config').shards.count();
      var count = 0;
      var shards = [];
      print('Iterating ' + ns + ' chunks sorted in ascending order...')
      var cursor = db.getSiblingDB('config').chunks.aggregate([{$match : {ns: ns}}, {$sort : { min : 1 }}]);
      while (cursor.hasNext() && shards.length != nShards) {
         let nextShard = cursor.next().shard;
         if (!shards.includes(nextShard)) {
            shards.push(nextShard);
            print("  discovered shard " + nextShard + " at chunk iteration " + count)
         };
         ++count;
      }
      print('Done. ' + count + ' chunk iterations to route a scatter/gather operation.')
      

            Assignee:
            gregory.noma@mongodb.com Gregory Noma
            Reporter:
            josef.ahmad@mongodb.com Josef Ahmad
            Votes:
            4 Vote for this issue
            Watchers:
            29 Start watching this issue

              Created:
              Updated:
              Resolved: