[SERVER-37340] Make FCV upgrade use command oplog entry instead of observing writes to the FCV document Created: 27/Sep/18  Updated: 21/Feb/23

Status: Backlog
Project: Core Server
Component/s: Upgrade/Downgrade
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 0
Labels: pm-2821-quick-wins
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-34483 Avoid taking DBLocks when clearing in... Closed
Assigned Teams:
Replication
Participants:

 Description   

In the current FCV upgrade logic, the primary node in a replica set executes a command and has the opportunity to acquire strong locks for consistency. The secondaries, however, only get to observe writes to the FCV document. Those writes are executed while already holding a hierarchy of locks, so the secondaries are not allowed to take any further locks on their own, because doing so risks introducing deadlocks.

During 4.0 development we uncovered cases (SERVER-34483) where upgrade requires the acquisition of strong locks on secondaries. Because of this, it would be much more convenient if the FCV upgrade logic wrote a command oplog entry, which on the secondary nodes executes serially.
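
To make the contrast concrete, here is a minimal toy model in Python. It is a sketch under stated assumptions, not server code: the `ToySecondary` class, the lock names, and in particular the shape of the 'c' entry (`COMMAND_ENTRY`) are hypothetical, since this ticket does not define a format for the proposed command oplog entry. The update entry is a simplified version of a write to the FCV document in admin.system.version.

```python
FCV_NS = "admin.system.version"
FCV_ID = "featureCompatibilityVersion"

# Simplified 'u' entry: a plain document write to the FCV document.
UPDATE_ENTRY = {
    "op": "u",
    "ns": FCV_NS,
    "o2": {"_id": FCV_ID},
    "o": {"$set": {"version": "4.0"}},
}

# Hypothetical 'c' entry: the shape is an assumption made for illustration.
COMMAND_ENTRY = {
    "op": "c",
    "ns": "admin.$cmd",
    "o": {"setFeatureCompatibilityVersion": "4.0"},
}


class ToySecondary:
    """Toy secondary that tracks which (made-up) locks it currently holds."""

    def __init__(self):
        self.held_locks = []
        self.fcv = "3.6"
        self.log = []

    def _upgrade_work_needing_strong_lock(self):
        # Stand-in for upgrade work that needs an exclusive lock.
        # Taking it while other locks are held is modeled as an error,
        # mirroring the deadlock risk described above.
        if self.held_locks:
            raise RuntimeError(
                "cannot take a strong lock while holding: "
                + ", ".join(self.held_locks)
            )
        self.held_locks.append("Global:X")
        self.log.append("ran upgrade work under Global:X")
        self.held_locks.pop()

    def apply(self, entry):
        if entry["op"] == "u" and entry["ns"] == FCV_NS:
            # Write path: the observer fires inside the write, with the
            # lock hierarchy for that write already held.
            self.held_locks.extend(["Global:IX", "Database:IX", "Collection:IX"])
            try:
                self._upgrade_work_needing_strong_lock()
                self.fcv = entry["o"]["$set"]["version"]
            finally:
                self.held_locks.clear()
        elif entry["op"] == "c" and "setFeatureCompatibilityVersion" in entry["o"]:
            # Command path: applied serially with no locks held, so the
            # handler is free to take the strong lock itself.
            self._upgrade_work_needing_strong_lock()
            self.fcv = entry["o"]["setFeatureCompatibilityVersion"]
        else:
            raise ValueError("entry not handled by this toy model")
```

In this model, `ToySecondary().apply(UPDATE_ENTRY)` raises `RuntimeError` because the observer runs under the write's locks, while `ToySecondary().apply(COMMAND_ENTRY)` completes and sets `fcv` to "4.0".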



 Comments   
Comment by Kaloian Manassiev [ 30/Jun/22 ]

Since FCV is a macro operation that potentially impacts the entire instance, it should not be using OpObservers on documents, at the very least because of the locking problem. Because of this, I still believe there is merit in making it use 'c' oplog entries.

Passing this ticket to the replication team, since they own the FCV infrastructure, to possibly be done under PM-2821.

Comment by Max Hirschhorn [ 12/Nov/21 ]

"it would be much more convenient if the FCV upgrade logic wrote a command oplog entry, which on the secondary nodes executes serially."

The changes from ff982a6 as part of SERVER-40169 made it so writes to the admin.system.version collection are applied in their own batch on secondaries.
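
As a rough illustration of what "applied in their own batch" means, the following Python sketch splits a stream of oplog entries so that any entry targeting admin.system.version is isolated. It is a toy model, not the batching code from ff982a6; the function name `split_into_batches` and the `max_batch_size` parameter are hypothetical.

```python
FCV_NS = "admin.system.version"


def split_into_batches(entries, max_batch_size=3):
    """Group entries into batches, isolating writes to admin.system.version."""
    batches = []
    current = []
    for entry in entries:
        if entry["ns"] == FCV_NS:
            # Flush whatever was accumulated, then emit this entry alone.
            if current:
                batches.append(current)
                current = []
            batches.append([entry])
        else:
            current.append(entry)
            if len(current) == max_batch_size:
                batches.append(current)
                current = []
    if current:
        batches.append(current)
    return batches


example_entries = [
    {"op": "i", "ns": "test.a"},
    {"op": "i", "ns": "test.a"},
    {"op": "u", "ns": FCV_NS},
    {"op": "i", "ns": "test.b"},
]
example_batches = split_into_batches(example_entries)
```

Here `example_batches` has three batches of sizes 2, 1, and 1, with the admin.system.version write alone in the middle batch. Isolating the write means no other oplog entries are applied alongside it in the same batch, but the sketch says nothing about concurrent readers.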

I'm not sure this entirely solves the lock upgrade problem mentioned in the description, because there can still be readers on secondaries. Assigning this to Sharding EMEA because they did work on setFCV in MongoDB 5.0, and I'd like more input from Kal.
