[SERVER-61849] Make collection defragmenter cleanup asynchronous Created: 02/Dec/21  Updated: 06/Dec/22  Resolved: 05/Jan/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Minor - P4
Reporter: Tommaso Tocci Assignee: [DO NOT USE] Backlog - Sharding EMEA
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Sharding EMEA
Participants:

 Description   

The cleanup of the defragmentation state is currently done in BalancerDefragmentationPolicyImpl::refreshCollectionDefragmentationStatus. As part of this cleanup we remove the data size estimation from all the chunks; for collections with millions of chunks this could take seconds. Because the cleanup is done synchronously in that function, it blocks the balancer round and the streaming action (since we are holding the _streamingMutex lock).

My proposal is to introduce an ad-hoc defragmentation phase responsible for performing the cleanup procedure, so that we can execute it asynchronously.

Regarding the interruption of the defragmentation process, I would structure it as follows:
1. refreshCollectionDefragmentationStatus sets an interrupted flag on the collectionState.
2. All the phases stop producing actions if the interrupted flag is set and eventually move to the subsequent phase.
3. The last phase clears the data size estimations.
4. When all the phases are completed, we can clear the collection entry in config.collections and remove the state from the _defragmentationState map.
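
The four steps above could be sketched roughly as follows. This is a minimal standalone model, not the actual balancer code: the DefragmentationSketch class, the Phase and Action enums, the counters in CollectionState and the nextAction/requestInterruption names are all hypothetical and exist only for illustration. It also assumes the cleanup phase clears the data size estimations in small batches, which the proposal does not specify.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical phases: one working phase followed by the dedicated cleanup phase.
enum class Phase { kDefragment, kCleanup, kFinished };

// Hypothetical kinds of action handed out to the caller.
enum class Action { kNone, kDefragment, kClearDataSize };

// Hypothetical per-collection state; the counters stand in for real chunk data.
struct CollectionState {
    Phase phase = Phase::kDefragment;
    bool interrupted = false;            // set in step 1
    std::size_t pendingActions = 3;      // work left in the working phase
    std::size_t chunksWithEstimate = 5;  // data size estimations still to clear
};

class DefragmentationSketch {
public:
    void track(const std::string& ns, const CollectionState& state) {
        _defragmentationState[ns] = state;
    }

    bool isTracked(const std::string& ns) const {
        return _defragmentationState.count(ns) != 0;
    }

    // Step 1: only flag the state; no chunk is touched here, so this is cheap.
    void requestInterruption(const std::string& ns) {
        auto it = _defragmentationState.find(ns);
        if (it != _defragmentationState.end()) {
            it->second.interrupted = true;
        }
    }

    // Produces at most one small unit of work per call.
    Action nextAction(const std::string& ns, std::size_t cleanupBatchSize) {
        auto it = _defragmentationState.find(ns);
        if (it == _defragmentationState.end()) {
            return Action::kNone;
        }
        CollectionState& state = it->second;

        if (state.phase == Phase::kDefragment) {
            // Step 2: an interrupted phase stops producing actions and moves on.
            if (!state.interrupted && state.pendingActions > 0) {
                --state.pendingActions;
                return Action::kDefragment;
            }
            state.phase = Phase::kCleanup;
        }

        if (state.phase == Phase::kCleanup) {
            // Step 3: the last phase clears the estimations, one batch per call.
            if (state.chunksWithEstimate > 0) {
                const std::size_t batch = std::max<std::size_t>(cleanupBatchSize, 1);
                state.chunksWithEstimate -= std::min(batch, state.chunksWithEstimate);
                return Action::kClearDataSize;
            }
            state.phase = Phase::kFinished;
        }

        // Step 4: all phases are done, so the in-memory state can be dropped.
        assert(state.phase == Phase::kFinished);
        _defragmentationState.erase(it);
        return Action::kNone;
    }

private:
    std::map<std::string, CollectionState> _defragmentationState;
};
```

In this model requestInterruption only flips a flag, so the caller never waits on the chunk cleanup; the expensive part is spread across later nextAction calls.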



 Comments   
Comment by Paolo Polato [ 17/Dec/21 ]

This ticket is intended only as a performance optimisation and is not a must-have for the epic delivery.

Generated at Thu Feb 08 05:53:28 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.