[SERVER-29405] After move chunk out, pause for secondary queries to drain Created: 30/May/17  Updated: 30/Oct/23  Resolved: 30/Aug/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.5.8
Fix Version/s: 3.5.10

Type: New Feature
Priority: Major - P3
Reporter: Nathan Myers
Assignee: Nathan Myers
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
is documented by DOCS-10446 Docs for SERVER-29405: After move chu... Closed
Related
related to SERVER-31837 Recipient shard should not wait for `... Backlog
related to SERVER-14873 Ability to pause background rangeDele... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2017-07-10

 Description   

When a chunk migration off a shard completes, secondary read queries might still depend on documents in the migrated range. We need to obey a cluster-wide config parameter that specifies how long to wait after each chunk move completes before actually deleting documents in the range, because secondaries have no choice about applying deletes as they appear in the oplog, and must kill any dependent queries still running.

By default, range deletion on an emigrated chunk, or any range deletion on a range that a running query still depends on, is delayed until all such queries terminate, or 15 minutes, whichever is longer. Other range deletions proceed in the meantime, in particular deletions of ranges about to be migrated in. The delay is configurable per server, e.g. with:

db.adminCommand( { setParameter: 1, orphanCleanupDelaySecs: 0 } )

In tests this value is set to 2.

This behavior will probably need integration into management tools. For example, when migrating chunks off of a shard whose storage usage is growing at an alarming rate, the delay should probably be reduced, temporarily, to zero (see the sketch below). Users whose queries on shard secondaries run for longer than 15 minutes may want to increase it; users whose secondary queries always complete in much less than 15 minutes may want to reduce it.
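
For illustration only (not part of the original ticket): a minimal mongo shell sketch of temporarily dropping the delay to zero on a shard's primary while draining that shard, then restoring the default of 900 seconds. The connection target and the exact values are assumptions.

// Connect directly to the shard's primary (not mongos) before draining it.
// Lower the delay so emigrated ranges are cleaned up immediately.
db.adminCommand( { setParameter: 1, orphanCleanupDelaySecs: 0 } )

// ... drain the shard (e.g. removeShard or manual moveChunk commands) ...

// Restore the default once the drain has finished.
db.adminCommand( { setParameter: 1, orphanCleanupDelaySecs: 900 } )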



 Comments   
Comment by Kevin Albertson [ 17/Oct/17 ]

I'm documenting this parameter and want to confirm I have the right understanding and check a few points.

IIUC we want to avoid killing queries on shard secondaries depending on chunks that have emigrated. This parameter controls how long the primary in a shard will wait before deleting a chunk, to avoid killing those dependent queries.

I have the following questions:
1. Is this really a "cluster-wide" config parameter, or just meant to be set on a single mongod instance? Checking on my local sharded cluster, I can only set this on a mongod.
2. Is "until all such queries terminate, or 15 minutes, whichever is longer", meant to say "whichever comes first"? If we are able to wait until all dependent queries terminate before deleting, I don't see why this parameter would be necessary.
3. Setting this parameter to 0 effectively means any queries currently reading from emigrated chunks on a shard secondary will be killed as soon as the delete is replicated, correct?
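
(Added for reference, not part of the original comment: a minimal sketch of checking and setting the parameter on a single shard mongod, consistent with the observation in question 1; the value shown is illustrative.)

// Run against a shard member directly, not against mongos.
db.adminCommand( { getParameter: 1, orphanCleanupDelaySecs: 1 } )   // read the current value
db.adminCommand( { setParameter: 1, orphanCleanupDelaySecs: 300 } ) // returns the previous value in "was"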

Comment by Githook User [ 30/Jun/17 ]

Author: Nathan Myers <nathan.myers@10gen.com> (GitHub user nathan-myers-mongo)

Message: SERVER-29405 delay deleting orphaned shard chunks

When deleting the donor range after migrating a chunk off of a shard,
the range deleter will schedule the deletion at some time in the future,
according to a server parameter orphanCleanupDelaySecs, which defaults
to 900, or 15 minutes. It does not delay range deletions preparatory to
migrating a range in, and does not put off deleting the donor range if
the moveChunk command has set the option _waitForDelete.

The file jstests/sharding/write_commands_sharding_state.js had CR (0x0D)
line endings, which made the patch fail lint. The substantive changes
in the file were to add "_waitForDelete" options to the moveChunk
commands.
Branch: master
https://github.com/mongodb/mongo/commit/c63465a42ed89ee6563841d7b349fa85de69963e
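
For illustration only (the namespace, chunk key, and shard name below are made up, not from the commit): a minimal mongo shell sketch of a moveChunk that sets _waitForDelete, so the donor waits for the range deletion to finish instead of scheduling it after the orphanCleanupDelaySecs delay.

// Issued against mongos; _waitForDelete: true makes moveChunk block until
// the donor's range deletion completes, bypassing the scheduled delay.
db.adminCommand( {
    moveChunk: "test.users",
    find: { _id: 0 },
    to: "shard0001",
    _waitForDelete: true
} )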

Comment by Nathan Myers [ 30/May/17 ]

This requires defining and documenting a new cluster-wide parameter.
