[DOCS-10446] Docs for SERVER-29405: After move chunk out, pause for secondary queries to drain
Created: 27/Jun/17  Updated: 29/Oct/23  Resolved: 25/Oct/17

Status: Closed
Project: Documentation
Component/s: Server
Affects Version/s: None
Fix Version/s: 3.5.10

Type: Task Priority: Major - P3
Reporter: Emily Hall Assignee: Kevin Albertson
Resolution: Fixed Votes: 0
Labels: Sharding
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
documents SERVER-29405 After move chunk out, pause for secon... Closed
Epic Link: DOCS: 3.6 Server

 Description   

Documentation Request Summary:

A new server startup setParameter, orphanCleanupDelaySecs, needs to be documented, along with usage advice.

Engineering Ticket Description:

When a chunk migration off a shard finishes, secondary read queries might still depend on the chunk. We need to obey a cluster-wide config parameter that specifies how long to wait after each chunk move completes before actually deleting documents in the range. The wait is needed because secondaries have no choice but to perform deletes as they appear in the oplog, and must kill any dependent queries still running.

By default, range deletion on an emigrated chunk, or any range deletion on a range that a running query still depends on, is delayed until all such queries terminate or 15 minutes have passed, whichever is longer. Other range deletions proceed in the meantime, in particular those of ranges about to be migrated in. The delay is configurable per server with, e.g.:

{setParameter: {orphanCleanupDelaySecs: 0}}

In tests this value is set to 2.
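
The "whichever is longer" rule above can be sketched in a few lines of Python. This is only a model of the timing described in this ticket, not the server's implementation: the function name, its arguments, and the use of plain seconds for timestamps are assumptions made for illustration.

```python
# Illustrative model only; names and signatures are hypothetical.
DEFAULT_ORPHAN_CLEANUP_DELAY_SECS = 900  # 15 minutes, the default named above


def earliest_range_deletion(migration_done_at, dependent_query_end_times,
                            delay_secs=DEFAULT_ORPHAN_CLEANUP_DELAY_SECS):
    """Return the earliest time (in seconds) at which range deletion may start.

    Deletion waits for the later of:
      * the end of the last query known to depend on the range, and
      * migration_done_at + delay_secs.
    """
    delay_expiry = migration_done_at + delay_secs
    # With no dependent queries, only the delay matters.
    last_query_end = max(dependent_query_end_times, default=migration_done_at)
    return max(delay_expiry, last_query_end)
```

With the test value of 2, a range whose migration finished at t=100 and whose last dependent query ends at t=150 would become deletable at t=150; with no dependent queries it would become deletable at t=102.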

This behavior will probably need integration into management tools. For example, when migrating chunks off of a shard whose storage usage has been found to be growing at an alarming rate, the delay should probably be reduced, temporarily, to zero. Users whose queries on shard secondaries run longer than 15 minutes may want to increase it. Users whose queries on secondaries always complete in much less than 15 minutes may want to reduce it.
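
As a rough illustration of how a management tool might apply this advice, the sketch below picks a delay and builds the matching mongod startup arguments. The helper names, the `storage_alarm` flag, and the 60-second safety margin are all assumptions invented for this example; only the parameter name and the standard `--setParameter name=value` startup syntax come from the server. Whether the value can also be changed on a running mongod is not stated in this ticket.

```python
# Hypothetical tuning helper; thresholds and names are assumptions.
def suggest_delay_secs(longest_secondary_query_secs, storage_alarm=False,
                       margin_secs=60):
    """Suggest a value for orphanCleanupDelaySecs.

    storage_alarm: True when the donor shard's storage is growing
    alarmingly, in which case the delay is dropped to zero.
    margin_secs: arbitrary headroom added on top of the longest query.
    """
    if storage_alarm:
        return 0
    return longest_secondary_query_secs + margin_secs


def mongod_args(delay_secs):
    """Build the mongod startup arguments that set the delay."""
    return ["--setParameter", f"orphanCleanupDelaySecs={delay_secs}"]
```

Under these assumptions, a deployment whose longest secondary query takes 30 minutes would get a delay above the 15-minute default, while one whose queries finish in 30 seconds would get a much shorter delay.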



 Comments   
Comment by Kevin Albertson [ 26/Oct/17 ]

The commit that resolved this has the wrong DOCS ticket noted. The commit is here: https://github.com/mongodb/docs/commit/4cc42a79f24fc357890a905d642e922389841f47

Comment by Kevin Albertson [ 23/Oct/17 ]

From discussing with Nathan:

  • This is set on a mongod, not a mongos. Ideally Ops Manager should take care of setting this on every mongod.
  • Secondaries won't kill queries that depend on orphaned data being deleted, but we should make users aware that long-running queries on shard secondaries may see "disappearing documents" if this parameter is too low.
  • Deletion occurs on the primary. The primary is unaware of queries running on the secondary, so it cannot automatically wait until all secondary queries have completed before deleting.
  • If the value is too high, orphaned documents may fill up storage space.
  • If the value is too low, shard secondary queries are more likely to see disappearing documents.
  • Shard secondary queries will not use emigrated chunks if they were emigrated before the query started, even if they have not been deleted.
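
The points above can be condensed into a small Python check for when a secondary query is at risk of seeing disappearing documents. This is a simplified model: the function and argument names are hypothetical, timestamps are plain seconds, and replication lag and the time the deletion itself takes are ignored.

```python
# Simplified risk model; not how the server decides anything.
def may_see_disappearing_documents(query_start, query_end,
                                   chunk_emigrated_at, delay_secs):
    """Return True if a secondary query could overlap the range deletion."""
    if chunk_emigrated_at < query_start:
        # The chunk was emigrated before the query started, so the query
        # does not use it, even if its documents are still on disk.
        return False
    # Otherwise the query may still be reading the chunk; it is at risk
    # if it is still running once deletion is allowed to begin.
    deletion_may_start = chunk_emigrated_at + delay_secs
    return query_end > deletion_may_start
```

For a chunk emigrated at t=10 with the default delay of 900, a query running from t=0 to t=1000 is at risk, while one running from t=0 to t=500 is not.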