[SERVER-64557] BatchedDeleteStage should consider yielding based on document and time targets Created: 16/Mar/22  Updated: 09/May/22  Resolved: 09/May/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Haley Connelly Assignee: Josef Ahmad
Resolution: Won't Do Votes: 0
Labels: PM-2227-M3
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 10MB docs.png     PNG File 10MB-docs-2MB-stageSizeTarget.png     PNG File 1kB.png    
Issue Links:
Related
is related to SERVER-63039 Add a staged document size target to ... Closed
Sprint: Execution Team 2022-04-18, Execution Team 2022-05-02, Execution Team 2022-05-16
Participants:

 Description   

Documents fetched during the BatchedDeletePhase must be refetched if the snapshot changes before the batch is ready to be deleted.

We need to ensure that the YieldPolicy tied to the BatchedDeleteStage limits the auto-yields that cause the snapshot to be abandoned.
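A toy Python model may help show why a yield forces refetches. It assumes that each staged document records the snapshot it was read in, and that every auto-yield moves the stage to a new snapshot. The names `StagedDoc`, `BatchModel`, `stage`, `yield_and_restore` and `commit` are hypothetical and do not correspond to MongoDB's C++ code.

```python
from dataclasses import dataclass, field


@dataclass
class StagedDoc:
    # Hypothetical record: the document's record id plus the snapshot it was read in.
    record_id: int
    snapshot_id: int


@dataclass
class BatchModel:
    """Toy model of a staged delete batch; not MongoDB's actual implementation."""
    current_snapshot: int = 0
    staged: list = field(default_factory=list)
    refetches: int = 0

    def stage(self, record_id: int) -> None:
        # Fetch the document under the current snapshot and stage it for deletion.
        self.staged.append(StagedDoc(record_id, self.current_snapshot))

    def yield_and_restore(self) -> None:
        # Assumption: an auto-yield abandons the snapshot and a new one is used on restore.
        self.current_snapshot += 1

    def commit(self) -> list:
        deleted = []
        for doc in self.staged:
            if doc.snapshot_id != self.current_snapshot:
                # The document was read in an abandoned snapshot. Here we only count
                # the refetch; a real implementation would re-read the document and
                # re-check the delete predicate before removing it.
                self.refetches += 1
            deleted.append(doc.record_id)
        self.staged.clear()
        return deleted
```

Staging three documents, yielding once, and then staging a fourth leaves three documents that were read in the abandoned snapshot, so all three are counted as refetches when the batch commits.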



 Comments   
Comment by Josef Ahmad [ 09/May/22 ]

With SERVER-66105 setting the batchedDeletesTargetStagedDocBytes default to 2MB, refetches due to yielding no longer appear significant. The observed refetch rate remains below 1% for 1kB documents, as before, and our benchmarks show a similarly low refetch rate for larger documents. This leaves performance unchanged for smaller documents and brings the batched deleter's performance on par with the doc-by-doc deleter for large documents. We've also added the refetchesDueToYield diagnostic metric for observability.

Closing this ticket as Won't Do, as the batched deleter performance with the current auto-yielding policy appears to be satisfactory.

Comment by Josef Ahmad [ 04/May/22 ]

We can reduce refetches considerably by setting a lower default for batchedDeletesTargetStagedDocBytes. With this target set to 2MB instead of the current 30MB, we get zero refetches on a mass deletion of 10MB documents, and roughly the same negligible refetch rate on the 1kB document test. The attached chart (10MB-docs-2MB-stageSizeTarget.png) shows the metrics of the 10MB workload with batchedDeletesTargetStagedDocBytes set to 2MB. Note the throughput speed-up compared to the chart in my previous comment (serverStatus batchedDeletes docs rate).
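The effect of lowering batchedDeletesTargetStagedDocBytes can be approximated with a simple staging loop. This sketch assumes staging stops as soon as either the per-batch document target or the staged-bytes target is reached, and it ignores time limits. The function `docs_per_staged_batch` is hypothetical, and the document target of 10 comes from the 01/Apr comment.

```python
KB = 1024
MB = 1024 * KB


def docs_per_staged_batch(doc_size: int, target_docs: int, target_staged_bytes: int) -> int:
    """Hypothetical model: count documents staged before either target is met."""
    staged_docs = 0
    staged_bytes = 0
    # Keep staging while both the document-count and staged-bytes targets are unmet.
    while staged_docs < target_docs and staged_bytes < target_staged_bytes:
        staged_docs += 1
        staged_bytes += doc_size
    return staged_docs
```

Under these assumptions, 10MB documents stage 3 per batch with a 30MB target but only 1 with a 2MB target, so a yield between commits leaves no already-fetched documents waiting to be deleted. 1kB documents still hit the 10-document target long before either byte limit, which is consistent with their refetch rate being unaffected.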

Comment by Haley Connelly [ 13/Apr/22 ]

We are still waiting on design decisions for the best course of action here. This ticket is blocked until we come to a consensus.

Comment by Josef Ahmad [ 01/Apr/22 ]

If it helps to make an informed decision, I've run an instrumented version of the server that reports the refetches of staged documents due to yielding.

On a mass deletion of 1kB documents, we refetch fewer than 1 document in every 100. (EDIT: fixed typo)

On a mass deletion of 10MB documents, we actually end up refetching almost every document. This is probably because, although the target batch size is 10 documents, each batch actually commits only 2-3 documents, since the targetBatchTime of 5ms is met first. This means we'll likely yield before the next iteration or the one after it, so we're likely to refetch some of the remaining staged documents.
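The mechanism described above can be sketched as a small simulation: a time-limited commit pass leaves documents staged, and a yield before the next pass forces them to be refetched. It assumes a fixed per-document deletion cost and, as a worst case, that the stage yields between every pair of commit passes. The function name and its parameters are hypothetical, and the numbers used are illustrative, not measurements.

```python
def simulate_time_limited_commits(staged_docs: int,
                                  delete_cost_ms: float,
                                  target_batch_time_ms: float) -> tuple[int, int]:
    """Return (commit_passes, distinct_docs_refetched) for one staged batch."""
    pending = list(range(staged_docs))  # record ids still staged for deletion
    refetched = set()                   # distinct documents refetched at least once
    passes = 0
    while pending:
        elapsed_ms = 0.0
        committed = 0
        # Always delete at least one document, then stop once the time target is met.
        while pending and (committed == 0 or elapsed_ms < target_batch_time_ms):
            pending.pop(0)
            elapsed_ms += delete_cost_ms
            committed += 1
        passes += 1
        if pending:
            # Assumed worst case: the stage yields before the next pass, so every
            # document still staged was read in an abandoned snapshot.
            refetched.update(pending)
    return passes, len(refetched)
```

With 10 staged documents that each take 2.5ms to delete and a 5ms time target, each pass commits 2 documents, so the batch needs 5 passes and 8 of the 10 documents are refetched at least once. With a single staged document per batch, nothing is left over and nothing is refetched.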

Generated at Thu Feb 08 06:00:37 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.