[SERVER-48398] Writing config document to "local.system.replset" should not acquire PBWM lock. Created: 26/May/20  Updated: 08/Jan/24

Status: Backlog
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Suganthi Mani Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 0
Labels: former-quick-wins
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-50519 resumable index build hangs waiting f... Closed
related to SERVER-38341 Remove Parallel Batch Writer Mutex Closed
related to SERVER-60351 Writing last vote to "local.replset.e... Closed
is related to SERVER-48399 Writing config document to "local.sys... Closed
Assigned Teams:
Replication
Participants:
Linked BF Score: 144

 Description   

It seems that when a node persists a new config to the "local.system.replset" collection, it takes the "local" database lock in exclusive (X) mode and the PBWM (Parallel Batch Writer Mutex) lock in IS mode. This can lead to two major side effects.
1) Since the PBWM lock is taken in IS mode, this can block the secondary oplog applier, which requires the PBWM in X mode. This can result in replication lag.
2) Since it takes the "local" database lock in X mode, this can block other readers and writers of the local database. This will be addressed by SERVER-48399.

  • In particular, if this node (X) is the sync source for node Y, the oplog fetcher of node Y can be blocked behind the reconfig-via-heartbeat thread due to the database lock conflict, leading to replication lag.
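
Both side effects above come down to lock-mode compatibility. As a rough illustration only, here is a toy C++ model of the standard intent-lock compatibility rules (IS, IX, S, X). The names Mode, compatible and checkTicketScenario are made up for this sketch and are not the server's lock manager API. The assumption that a local database reader takes the database lock in IS mode is mine and is not stated in the ticket.

```cpp
#include <cassert>

// Toy model of multi-granularity lock modes; names are illustrative only.
enum class Mode { IS, IX, S, X };

// Standard intent-lock compatibility: true when both modes can be held at once.
constexpr bool compatible(Mode a, Mode b) {
    if (a == Mode::X || b == Mode::X) return false;   // X excludes every other mode
    if (a == Mode::IS || b == Mode::IS) return true;  // IS coexists with IS, IX and S
    return a == b;                                    // IX+IX and S+S are fine; IX vs S conflicts
}

// Side effect 1: the config write's PBWM IS request conflicts with the applier's PBWM X.
static_assert(!compatible(Mode::IS, Mode::X), "IS must conflict with X");

// Side effect 2 (assumes a reader takes the "local" database lock in IS mode):
// the config write's database X lock conflicts with that reader.
static_assert(!compatible(Mode::X, Mode::IS), "X must conflict with IS");

// Runtime version of the same checks, plus one pair that does not conflict.
void checkTicketScenario() {
    assert(!compatible(Mode::IS, Mode::X));
    assert(!compatible(Mode::X, Mode::IS));
    assert(compatible(Mode::IS, Mode::IX));
}
```

In this model any X request or holder conflicts with everything else, which is why an IS acquisition on the PBWM is enough to get in the way of the applier.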

Generally, upserts on a non-replicated collection don't need to acquire the PBWM lock. Basically, storeLocalConfigDocument should acquire the DB lock under a ShouldNotConflictWithSecondaryBatchApplicationBlock.
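
To make the proposal concrete, below is a self-contained toy sketch of the scoped opt-out idea. This is not the server code: ToyLocker, NoBatchConflictBlock, locksForLocalWrite and storeConfigSketch are hypothetical stand-ins invented for illustration, and the sketch only assumes that the real block behaves like a scoped guard that flips a per-operation flag and restores it on exit.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for per-operation lock state; not the server's Locker class.
struct ToyLocker {
    bool conflictsWithBatchApplication = true;  // by default a write also takes the PBWM
};

// RAII guard: opts the operation out of the PBWM for its lifetime,
// then restores whatever setting was in place before.
class NoBatchConflictBlock {
public:
    explicit NoBatchConflictBlock(ToyLocker& locker)
        : _locker(locker), _previous(locker.conflictsWithBatchApplication) {
        _locker.conflictsWithBatchApplication = false;
    }
    ~NoBatchConflictBlock() {
        _locker.conflictsWithBatchApplication = _previous;
    }
    NoBatchConflictBlock(const NoBatchConflictBlock&) = delete;
    NoBatchConflictBlock& operator=(const NoBatchConflictBlock&) = delete;

private:
    ToyLocker& _locker;
    bool _previous;
};

// Lists the locks a toy write to the "local" database would request.
std::vector<std::string> locksForLocalWrite(const ToyLocker& locker) {
    std::vector<std::string> locks;
    if (locker.conflictsWithBatchApplication) {
        locks.push_back("PBWM");
    }
    locks.push_back("local");
    return locks;
}

// Sketch of the proposed shape: take the database lock inside the opt-out scope.
std::vector<std::string> storeConfigSketch(ToyLocker& locker) {
    NoBatchConflictBlock noConflict(locker);
    return locksForLocalWrite(locker);
}
```

In this toy model, a plain write requests both "PBWM" and "local", while storeConfigSketch requests only "local" and leaves the locker's flag as it found it once the guard goes out of scope.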



 Comments   
Comment by Suganthi Mani [ 26/May/20 ]

+ Writing the last vote document should also not acquire the PBWM lock.

Generated at Thu Feb 08 05:17:00 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.