[SERVER-66279] Use the BatchedDeleteStage in the range deleter Created: 06/May/22  Updated: 04/Jul/23  Resolved: 04/Jul/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Josef Ahmad Assignee: Pierlauro Sciarelli
Resolution: Won't Do Votes: 0
Labels: range-deleter-improvements
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-77543 Support `fromMigrate` in batched dele... Closed
Duplicate
is duplicated by SERVER-67331 Use efficient multi-deletes for range... Closed
is duplicated by SERVER-72298 Use batch deletions to delete orphane... Closed
Related
related to SERVER-78661 Complete TODO listed in SERVER-66279 Closed
Assigned Teams:
Sharding EMEA
Sprint: Sharding EMEA 2023-06-12, Sharding EMEA 2023-06-26, Sharding EMEA 2023-07-10
Participants:

 Description   

The new BatchedDeleteStage deletes documents efficiently in batches.

Dependencies that I'm aware of to switch the range deleter from the DeleteStage to the BatchedDeleteStage:

  • Support for the fromMigrate parameter: depends on SERVER-64107, which in turn depends on SERVER-65859.
  • Support for the returnDeleted parameter.
  • Support for the removeSaver parameter (for moveParanoia).


 Comments   
Comment by Pierlauro Sciarelli [ 04/Jul/23 ]

Closing as "won't do" because, after some internal testing, it was decided that the increase in latency caused by batched deletions (compared to doc-by-doc deletions) does not play well with the requirement that the range deleter must not impact user workload.

Comment by Louis Williams [ 22/Dec/22 ]

Within a range deleter batch, the BatchedDeleteStage will further allow you to split up a multi-delete into its own batches. This can be tuned by specifying a doc limit and a time limit.

The batched deletes will probably not create significant performance improvements on workloads that are IO-bound, especially those deleting data that is out of cache. That said, batched deletes will reduce the journaling rate, so instead of one disk write request per document, we will only need one per batch.
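
To make the doc-limit/time-limit splitting concrete, here is a minimal Python sketch of the idea. It is not the server's implementation (the BatchedDeleteStage lives in the C++ server code). The names `split_into_batches`, `doc_limit`, `time_limit_s` and `clock` are hypothetical and chosen only for illustration. Each closed batch stands in for one commit, i.e. one journal write request instead of one per document.

```python
import time
from typing import Callable, Iterable, List


def split_into_batches(
    docs: Iterable[int],
    doc_limit: int,
    time_limit_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> List[List[int]]:
    """Group docs into batches (hypothetical helper, illustration only).

    A batch is closed when it holds doc_limit documents or when
    time_limit_s has elapsed since its first document was staged.
    """
    if doc_limit < 1:
        raise ValueError("doc_limit must be at least 1")
    batches: List[List[int]] = []
    current: List[int] = []
    started = 0.0
    for doc in docs:
        if not current:
            started = clock()  # the batch timer starts with its first document
        current.append(doc)
        if len(current) >= doc_limit or clock() - started >= time_limit_s:
            batches.append(current)  # one simulated commit for the whole batch
            current = []
    if current:
        batches.append(current)  # commit the final, partially filled batch
    return batches


docs = list(range(10))
batches = split_into_batches(docs, doc_limit=4, time_limit_s=60.0)
print(len(docs), "per-document commits vs", len(batches), "batched commits")
```

In this toy model, a smaller doc limit or time limit means more, shorter commits; larger limits mean fewer write requests but longer individual batches, which is the latency trade-off discussed in this ticket.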

Comment by Garaudy Etienne [ 22/Dec/22 ]

Whatever value we use for rangeDeleterBatchSize will now be the number of documents we delete in a single batch. Therefore, we should reduce the default batch size to a reasonable value based on testing (anywhere between 128 and 5000?), since we will no longer be deleting documents one at a time.
