Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-46487

The mongos routing for scatter/gather ops can have unbounded latency

    XMLWordPrintable

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v4.4, v4.2, v4.0, v3.6
    • Sprint:
      Sharding 2020-04-20
    • Case:

      Description

      Issue Status as of April 2020

      ISSUE SUMMARY

      The mongoS routing of scatter/gather operations iterates over the boundaries for the collection’s chunks sorted in ascending order. As the list is iterated, shards that own a chunk for the collection are discovered. If all shards in the cluster that own a chunk for the collection are discovered partway through the iteration, the iteration early-exits and the mongoS targets all of these shards.

      This logic is suboptimal if a large number of the sorted chunks are initially owned by a subset of shards. This can result in a large number of iterations needed to discover all shards. As an example, on a 2-shard cluster with a collection where the first 100k sorted chunks are owned on shard 1, the mongoS iterates over the boundaries for 100k chunks when routing a scatter/gather operation.

      USER IMPACT

      On a sharded collection where a large number of the sorted chunks (e.g. 100k) are initially owned by a subset of shards, a scatter/gather operation can display increased latency (e.g. milliseconds) or increased mongoS CPU utilization.
      This chunk distribution can result from adding a shard to the cluster, as contiguous low chunk ranges get migrated into the new shard.

      WORKAROUND

      Rearrange the chunks in a collection so that the initial chunks in the sorted list discover all the shards. The script below can be used to verify the number of iterations required to route a scatter/gather operation.

      var ns = 'mydb.mycoll'
      var nShards = db.getSiblingDB('config').shards.count();
      var count = 0;
      var shards = [];
      print('Iterating ' + ns + ' chunks sorted in ascending order...')
      var cursor = db.getSiblingDB('config').chunks.aggregate([{$match : {ns: ns}}, {$sort : { min : 1 }}]);
      while (cursor.hasNext() && shards.length != nShards) {
         let nextShard = cursor.next().shard;
         if (!shards.includes(nextShard)) {
            shards.push(nextShard);
            print("  discovered shard " + nextShard + " at chunk iteration " + count)
         };
         ++count;
      }
      print('Done. ' + count + ' chunk iterations to route a scatter/gather operation.')
      

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              gregory.noma Gregory Noma
              Reporter:
              josef.ahmad Josef Ahmad
              Participants:
              Votes:
              4 Vote for this issue
              Watchers:
              24 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: