Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-29405

After move chunk out, pause for secondary queries to drain

    • Type: Icon: New Feature New Feature
    • Resolution: Fixed
    • Priority: Icon: Major - P3 Major - P3
    • 3.5.10
    • Affects Version/s: 3.5.8
    • Component/s: Sharding
    • None
    • Fully Compatible
    • Sharding 2017-07-10

      When moving a chunk off a shard finishes, secondary read queries might still depend on the chunk. We need to obey a cluster-wide config parameter that specifies how long after each chunk move completes before we begin actually deleting documents in the range, because secondaries don't have any choice about performing deletes as they appear in the oplog, and must kill any dependent queries still running.

      By default, range deletion on an emigrated chunk, or any range deletion on a range that a running query still depends on, is delayed until all such queries terminate, or 15 minutes, whichever is longer. Other range deletions proceed in the meantime, most particularly of ranges about to be migrated in. The delay is configurable per-server with e.g.

      {setParameter: {orphanCleanupDelaySecs: 0}}
      

      In tests this value is set to 2.

      This behavior will probably need integration into management tools. For example, when migrating chunks off of a shard whose storage usage has been found to be growing at an alarming rate, it probably should be reduced, temporarily, to zero. Users who run queries on shard secondaries that run over 15 minutes may want to increase it. Users who run queries on secondaries that always complete in much less than 15 minutes may want to reduce it.

            Assignee:
            nathan.myers Nathan Myers
            Reporter:
            nathan.myers Nathan Myers
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

              Created:
              Updated:
              Resolved: