When a chunk migration off a shard finishes, read queries running on secondaries might still depend on the chunk. Secondaries have no choice about performing deletes as they appear in the oplog, and must kill any dependent queries still running. We therefore need to obey a cluster-wide config parameter that specifies how long to wait after each chunk move completes before we begin actually deleting documents in the range.
By default, range deletion on an emigrated chunk, or on any range that a running query still depends on, is delayed until all such queries terminate or 15 minutes have elapsed, whichever is later. Other range deletions proceed in the meantime, in particular deletions of ranges that are about to be migrated in. The delay is configurable per server with e.g.
{setParameter: {orphanCleanupDelaySecs: 0}}
In tests this value is set to 2 seconds.
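As a rough illustration of the "whichever is later" rule described above, the timing can be modeled as a pure function. This is a sketch only, not the server's implementation; the function name and arguments are hypothetical, and times are plain seconds on an arbitrary clock.

```python
from typing import Iterable

# 15 minutes, the default delay described in this ticket.
DEFAULT_ORPHAN_CLEANUP_DELAY_SECS = 15 * 60


def earliest_delete_time(
    migration_done_at: float,
    dependent_query_end_times: Iterable[float],
    delay_secs: float = DEFAULT_ORPHAN_CLEANUP_DELAY_SECS,
) -> float:
    """Return the earliest time at which deletion of the range may begin.

    Hypothetical model: deletion waits for both the configured delay and
    the end of every query that still depends on the range.
    """
    delay_expires_at = migration_done_at + delay_secs
    # With no dependent queries, only the delay matters.
    last_query_end = max(dependent_query_end_times, default=migration_done_at)
    return max(delay_expires_at, last_query_end)
```

Under this model, a migration finishing at time 0 with dependent queries ending at 120 and 300 seconds may be cleaned up at 900 seconds; a single query ending at 1200 seconds pushes cleanup out to 1200; and with `delay_secs=0` cleanup waits only for the queries themselves.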
This behavior will probably need integration into management tools. For example, when migrating chunks off of a shard whose storage usage has been found to be growing at an alarming rate, the delay probably should be reduced, temporarily, to zero. Users whose queries on shard secondaries run longer than 15 minutes may want to increase it. Users whose queries on secondaries always complete in much less than 15 minutes may want to reduce it.
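A management tool might encode the guidance above along these lines. This is a minimal sketch under stated assumptions: the helper names, the `storage_pressure` flag, and the 60-second safety margin are all hypothetical, and it assumes the parameter can be changed at runtime through the `setParameter` admin command.

```python
def choose_orphan_cleanup_delay_secs(
    longest_secondary_query_secs: float,
    storage_pressure: bool = False,
    margin_secs: int = 60,  # hypothetical safety margin, not from the ticket
) -> int:
    """Pick a delay following the guidance in this ticket (hypothetical helper)."""
    if storage_pressure:
        # Shard is filling up: reclaim space as soon as possible.
        return 0
    # Otherwise cover the longest expected secondary query, plus a margin.
    return int(longest_secondary_query_secs) + margin_secs


def set_parameter_command(delay_secs: int) -> dict:
    """Build a setParameter command document for the chosen delay."""
    # The command name must be the first key; Python dicts keep insertion order.
    return {"setParameter": 1, "orphanCleanupDelaySecs": delay_secs}
```

The resulting document could then be sent to each shard server, for instance with PyMongo's `client.admin.command(...)`. A tool lowering the delay to zero during an emergency drain would presumably also want to restore the previous value afterwards.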
related to
- SERVER-87673 Queries which run on secondaries and exceed orphanCleanupDelaySecs may miss documents which were donated by chunk migration (Backlog)
- SERVER-31837 Recipient shard should not wait for `orphanCleanupDelaySecs` (Blocked)
- SERVER-14873 Ability to pause background rangeDelete jobs (Closed)