[SERVER-63243] Range deleter must not clean up orphan ranges in a round-robin fashion Created: 03/Feb/22  Updated: 29/Oct/23  Resolved: 07/Jun/22

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 5.0.10, 4.4.16, 6.0.0-rc10, 6.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Pierlauro Sciarelli Assignee: Allison Easton
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.0, v5.0, v4.4
Sprint: Sharding EMEA 2022-04-04, Sharding EMEA 2022-06-13
Participants:
Case:

 Description   

When more than one range deletion task is ready and the ranges they refer to each contain more than rangeDeleterBatchSize documents, the range deleter currently processes them in a round-robin fashion.

Example with two ready range deletion task documents, A and B:

  • Pick A's range, delete batch according to rangeDeleterBatchSize, re-enqueue for deletion
  • Pick B's range, delete batch according to rangeDeleterBatchSize, re-enqueue for deletion
  • Pick A's range, delete batch according to rangeDeleterBatchSize, re-enqueue for deletion
  • ...continue until the orphan documents in both ranges have been cleaned up...
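
The interleaving described above can be reproduced with a small simulation. This is only an illustrative sketch, not the server's implementation: the function name `round_robin_order`, the in-memory queue, and the document counts are all assumptions made for the example, and a task is dropped from the queue as soon as its range is empty.

```python
from collections import deque


def round_robin_order(range_sizes, batch_size):
    """Simulate round-robin batching over several ready range deletions.

    range_sizes maps a (hypothetical) task name to its number of orphan
    documents. Returns the list of (task name, documents deleted) batches
    in the order they would be executed.
    """
    queue = deque(range_sizes.items())
    order = []
    while queue:
        name, remaining = queue.popleft()
        deleted = min(batch_size, remaining)
        order.append((name, deleted))
        remaining -= deleted
        if remaining > 0:
            # The task goes to the back of the queue, behind every other
            # ready task, instead of continuing with the same range.
            queue.append((name, remaining))
    return order


if __name__ == "__main__":
    for name, deleted in round_robin_order({"A": 5, "B": 4}, batch_size=2):
        print(f"range {name}: deleted {deleted} documents")
```

In this model, task A needs five scheduling rounds to finish even though it only requires three batches, because every batch of B is interleaved with it.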

As a result, users may run into issues such as:

  • The balancer may get blocked when trying to move a range back to the old donor.
  • With sufficiently fast migrations, the time for a range deletion task to complete increases exponentially. (The more ranges are enqueued, the less likely any given range is to be chosen for deletion.)

SERVER-61637 increased the default batch size to MAX_INT in order to work around this issue, but the problem still stands when the parameter is set to a lower value. SERVER-47699 largely removed the need for batching, but in some scenarios it may still be necessary to decrease the batch size (e.g. if a machine is under very intense user CRUD load).

The objective of this ticket is to make sure that once a batch from a range has been deleted, the next round keeps deleting from the same range.

One solution could be to loop within the deletion task as long as the number of deleted documents for the range is greater than zero.
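
A minimal sketch of that loop, assuming a hypothetical `delete_batch(batch_size)` callable that removes up to `batch_size` documents from one range and returns how many it removed. Neither that callable nor `make_fake_range` corresponds to an actual server API; they only stand in for a range of orphan documents.

```python
def delete_range_fully(delete_batch, batch_size):
    """Keep deleting batches from one range until a batch deletes nothing.

    delete_batch is a hypothetical callable: it deletes at most batch_size
    documents from the range and returns the number actually deleted.
    Returns the total number of documents deleted from the range.
    """
    total = 0
    while True:
        deleted = delete_batch(batch_size)
        if deleted == 0:
            # The range is empty, so the next ready task can be picked up.
            return total
        total += deleted


def make_fake_range(num_docs):
    """Build an in-memory stand-in for a range holding num_docs orphans."""
    remaining = [num_docs]

    def delete_batch(batch_size):
        deleted = min(batch_size, remaining[0])
        remaining[0] -= deleted
        return deleted

    return delete_batch


if __name__ == "__main__":
    # Range A is drained completely before range B is touched.
    for name, size in {"A": 5, "B": 4}.items():
        total = delete_range_fully(make_fake_range(size), batch_size=2)
        print(f"range {name}: deleted {total} documents")
```

The loop issues one extra batch that deletes zero documents to detect that the range is empty, which keeps the termination condition identical to the one proposed above.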



 Comments   
Comment by Githook User [ 30/Jun/22 ]

Author:

{'name': 'Allison Easton', 'email': 'allison.easton@mongodb.com', 'username': 'allisoneaston'}

Message: SERVER-63243 Range deleter must not clean up orphan ranges in a round-robin fashion

(cherry picked from commit f44581d5bfe275a3b9f0454dd7843c04ccfd1f2d)
Branch: v4.4
https://github.com/mongodb/mongo/commit/d151213391fddc1e0a7056bed7853ba955001e1b

Comment by Githook User [ 28/Jun/22 ]

Author:

{'name': 'Allison Easton', 'email': 'allison.easton@mongodb.com', 'username': 'allisoneaston'}

Message: SERVER-63243 Range deleter must not clean up orphan ranges in a round-robin fashion

(cherry picked from commit f44581d5bfe275a3b9f0454dd7843c04ccfd1f2d)
Branch: v5.0
https://github.com/mongodb/mongo/commit/a1c118ad5c2ac78c8751b65ec8a4f5e27fc719fa

Comment by Githook User [ 10/Jun/22 ]

Author:

{'name': 'Allison Easton', 'email': 'allison.easton@mongodb.com', 'username': 'allisoneaston'}

Message: SERVER-63243 Range deleter must not clean up orphan ranges in a round-robin fashion

(cherry picked from commit f44581d5bfe275a3b9f0454dd7843c04ccfd1f2d)
Branch: v6.0
https://github.com/mongodb/mongo/commit/119a34c2e06680d35c089effa2c9c809e1bd1102

Comment by Githook User [ 07/Jun/22 ]

Author:

{'name': 'Allison Easton', 'email': 'allison.easton@mongodb.com', 'username': 'allisoneaston'}

Message: SERVER-63243 Range deleter must not clean up orphan ranges in a round-robin fashion
Branch: master
https://github.com/mongodb/mongo/commit/f44581d5bfe275a3b9f0454dd7843c04ccfd1f2d

Comment by Pierlauro Sciarelli [ 31/May/22 ]

kaloian.manassiev@mongodb.com observed that even without withDelayBetweenIterations the range deletion gets rescheduled behind the others. Either we made a wrong assessment at the time of closing the ticket, or something has changed and the executor is now yielding.

Reopening the ticket so that we don't forget to double-check.

Comment by Allison Easton [ 28/Mar/22 ]

This ticket was resolved by SERVER-62368. With withDelayBetweenIterations removed, each range deletion is processed fully before moving on to the next range deletion task.

Generated at Thu Feb 08 05:57:15 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.