[SERVER-63973] Investigate compact not holding the PBWM lock
Created: 24/Feb/22 | Updated: 06/Dec/22 | Resolved: 10/Oct/22
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Haley Connelly | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Storage Execution |
| Participants: | |
| Case: | (copied to CRM) |
| Comments |
| Comment by Louis Williams [ 28/Feb/22 ] |
|
milkie, I think even if we set the mode to RECOVERING, the node will stop replicating and have the same observed availability problems unless we also eliminate the PBWM lock. Separately, I think serverStatus should opt out of the PBWM lock, since this is important to observability. |
| Comment by Eric Milkie [ 25/Feb/22 ] |
|
I wonder if we should revert to setting RECOVERING state automatically during compact, so that users don't need to worry about one secondary being not entirely usable. The easiest way for a user to avoid sending read requests to a not entirely usable node is indeed to simply set maintenance mode. |
| Comment by Louis Williams [ 25/Feb/22 ] |
|
If compact doesn't take the PBWM lock, which I don't believe it should, then we don't need to make it yield. We just need to make it clear in our documentation that the node in question is "available", but will be under extra load that may leave it not entirely usable. |
| Comment by Josef Ahmad [ 25/Feb/22 ] |
|
It would also be useful to determine whether we can make compact yield periodically, to avoid contention with other acquirers requesting a stronger mode.
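
For illustration only, the periodic-yield idea can be sketched in Python with a plain mutex. This is not the server's lock manager: `compact_with_yield`, `chunks`, and `yield_every` are hypothetical names, and `threading.Lock` has no lock modes and no fairness guarantee, so the sketch shows only the release-and-reacquire pattern, not how compact or the PBWM lock actually behave.

```python
import threading


def compact_with_yield(lock, chunks, yield_every=2):
    """Run each callable in `chunks` while holding `lock`, releasing and
    reacquiring the lock after every `yield_every` chunks so that waiting
    acquirers get a chance to run. Returns the number of yields performed.

    Hypothetical helper for illustration; a plain mutex stands in for a
    multi-mode lock.
    """
    yields = 0
    lock.acquire()
    try:
        for i, chunk in enumerate(chunks, start=1):
            chunk()  # one unit of work performed under the lock
            # Yield between batches, but not after the final chunk.
            if i % yield_every == 0 and i < len(chunks):
                lock.release()  # window in which other acquirers may proceed
                yields += 1
                lock.acquire()
    finally:
        lock.release()
    return yields


if __name__ == "__main__":
    done = []
    lock = threading.Lock()
    work = [lambda: done.append(1)] * 5
    print(compact_with_yield(lock, work, yield_every=2), len(done))
```

The trade-off in such a design is the yield interval: a smaller `yield_every` reduces how long other acquirers wait, at the cost of more release and reacquire overhead for the long-running operation.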