[SERVER-63973] Investigate compact not holding the PBWM lock Created: 24/Feb/22  Updated: 06/Dec/22  Resolved: 10/Oct/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Haley Connelly Assignee: Backlog - Storage Execution Team
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-68694 The compact command should not block ... Closed
Related
is related to SERVER-64031 serverStatus should not take PBWM lock Closed
Assigned Teams:
Storage Execution

Comments
Comment by Louis Williams [ 28/Feb/22 ]

milkie, I think that even if we set the node to RECOVERING, it will still stop replicating and show the same availability problems unless we also eliminate the PBWM lock.

Separately, I also think serverStatus should opt out of the PBWM lock, since it is important for observability.

Comment by Eric Milkie [ 25/Feb/22 ]

I wonder if we should revert to automatically setting the RECOVERING state during compact, so that users don't need to worry about a secondary that is not entirely usable. The easiest way for a user to avoid sending read requests to such a node is indeed to set maintenance mode.

Comment by Louis Williams [ 25/Feb/22 ]

If compact doesn't take the PBWM lock, which I don't believe it should, then we don't need to make it yield. We just need to make clear in our documentation that the node in question is "available" but will be under extra load, which may leave it not entirely usable.

Comment by Josef Ahmad [ 25/Feb/22 ]

It would also be useful to determine whether we can make compact yield periodically, to avoid contention with other acquirers requesting a stronger lock mode.

Generated at Thu Feb 08 05:59:09 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.