[SERVER-38473] Replace uses of UninterruptibleLockGuard and MODE_X collection locks with uses of a database sharding state mutex for movePrimary functions Created: 07/Dec/18  Updated: 29/Oct/23  Resolved: 05/Feb/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.1.8

Type: Task Priority: Major - P3
Reporter: Siyuan Zhou Assignee: Blake Oler
Resolution: Fixed Votes: 0
Labels: prepare_interruptibility, uninterruptibleLockGuardRemoval
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-33577 Remove UninterruptibleLockGuards in s... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2019-01-14, Sharding 2019-01-28, Sharding 2019-02-11
Participants:

 Description   

We use UninterruptibleLockGuard in several places for movePrimary. Because these code paths acquire strong locks on normal databases and collections, they will be blocked by prepared transactions, causing deadlock on stepdown or shutdown.

Here's a list of all the occurrences of UninterruptibleLockGuard for movePrimaries.

We will employ a new DatabaseShardingStateLock (similar to the CollectionShardingRuntimeLock) to safeguard concurrent access to the database critical section.
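As a rough illustration of the idea only, a lock guard of this kind could be sketched in standard C++ as below. ToyDatabaseShardingState and ToyDssLock are hypothetical names made up for this sketch; they are not the classes added by the commits on this ticket, and the real implementation may differ in every detail.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <utility>
#include <variant>

// Hypothetical stand-in for a per-database sharding state; not MongoDB's class.
struct ToyDatabaseShardingState {
    std::shared_mutex stateMutex;  // guards the in-memory sharding state
    bool inCriticalSection = false;
};

// Hypothetical RAII guard: shared mode for readers, exclusive mode for writers.
// The mutex is released when the guard goes out of scope.
class ToyDssLock {
    using Shared = std::shared_lock<std::shared_mutex>;
    using Exclusive = std::unique_lock<std::shared_mutex>;

public:
    static ToyDssLock lockShared(ToyDatabaseShardingState& dss) {
        return ToyDssLock(Shared(dss.stateMutex));
    }
    static ToyDssLock lockExclusive(ToyDatabaseShardingState& dss) {
        return ToyDssLock(Exclusive(dss.stateMutex));
    }
    bool isExclusive() const {
        return std::holds_alternative<Exclusive>(_lock);
    }

private:
    explicit ToyDssLock(Shared lock) : _lock(std::move(lock)) {}
    explicit ToyDssLock(Exclusive lock) : _lock(std::move(lock)) {}

    std::variant<Shared, Exclusive> _lock;
};
```

A caller would write something like `auto lock = ToyDssLock::lockExclusive(dss);` before touching `dss.inCriticalSection`. Because this mutex is separate from the database lock hierarchy, acquiring it never waits on locks held by prepared transactions.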

Once leaving the critical section no longer conflicts with prepared transactions, these operations can run after prepared transactions yield their locks on stepdown or are killed on shutdown. See SERVER-38162 for the order of events on shutdown and SERVER-38282 for stepdown.

The following function signature changes will allow us to relax the database locking to IX instead of X under the above UninterruptibleLockGuards. Database locks marked "(previously X)" have a proposed change. All other database locks remain unchanged.

  • enterCriticalSectionCatchUpPhase: Database X Lock, DSSLock X Lock
  • enterCriticalSectionCommitPhase: Database X Lock, DSSLock X Lock
  • exitCriticalSection: Database IX Lock (previously X), DSSLock X Lock
  • getCriticalSectionSignal: Database IS Lock, DSSLock IS Lock
  • getDbVersion: Database IS or X Lock (situational), DSSLock IS Lock
  • setDbVersion (when setting dbVersion to a meaningful value): Database X Lock, DSSLock X Lock
  • setDbVersion (when setting dbVersion to boost::none, AKA "clearing" the dbVersion): Database IX Lock (previously X), DSSLock X Lock
  • checkDbVersion: Database IS or X Lock (situational), DSSLock IS Lock
  • getMovePrimarySourceManager: No Database Lock (reflects current usage), DSSLock IS Lock
  • setMovePrimarySourceManager: Database X Lock, DSSLock X Lock
  • clearMovePrimarySourceManager: Database IX Lock (previously X), DSSLock X Lock
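
To make the effect of relaxing X to IX concrete, the standard multi-granularity lock compatibility rules for the four modes can be written down as a small function. This is a generic sketch, not code from the server's lock manager; the LockMode enum and compatible() are names invented here. It assumes that a prepared transaction holds an intent-exclusive (IX) lock on the database, which is why an X request blocks behind it while an IX request does not.

```cpp
#include <cassert>

// Generic intent-lock modes; hypothetical enum, not the server's definition.
enum class LockMode { IS, IX, S, X };

// Returns true if a lock requested in `requested` mode can be granted
// while another operation holds the same resource in `held` mode.
constexpr bool compatible(LockMode held, LockMode requested) {
    switch (held) {
        case LockMode::IS:
            return requested != LockMode::X;
        case LockMode::IX:
            return requested == LockMode::IS || requested == LockMode::IX;
        case LockMode::S:
            return requested == LockMode::IS || requested == LockMode::S;
        case LockMode::X:
            return false;
    }
    return false;
}
```

Under that assumption, `compatible(LockMode::IX, LockMode::X)` is false, so the current X acquisition waits for the prepared transaction, whereas `compatible(LockMode::IX, LockMode::IX)` is true, so the relaxed acquisition proceeds.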

These situations can use relaxed locking as a result:

  1. On the source node for movePrimary, if we can't commit the moved primary, we check if the node has stepped down. If it has stepped down, we must set the database version to boost::none to indicate that we no longer know the authoritative version. Since setting the database version to boost::none is semantically equivalent to "clearing" the database version, we can now use a Database IX Lock instead of a Database X Lock when doing so. The DSSLock in exclusive mode will prevent concurrent changes to the database version.
  2. On the source node for movePrimary, we must clear state variables on cleanup. This includes clearing the movePrimarySourceManager and criticalSection variables. We may now use a Database IX Lock instead of a Database X Lock, since the DSSLock in exclusive mode will prevent concurrent changes to the database sharding state's in-memory variables.
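
The two situations above can be sketched with a toy class whose mutators require proof of an exclusive lock in their signatures. Everything here is hypothetical: ToyDss, its members, and cleanupAfterFailedCommit are invented for illustration, std::optional stands in for boost::optional, an int stands in for the database version, and the Database IX Lock that the caller would also hold is omitted.

```cpp
#include <cassert>
#include <mutex>
#include <optional>
#include <shared_mutex>

// Hypothetical stand-in for the database sharding state; not MongoDB's class.
class ToyDss {
public:
    // Token types: passing one shows the caller has locked the state mutex.
    using SharedLock = std::shared_lock<std::shared_mutex>;
    using ExclusiveLock = std::unique_lock<std::shared_mutex>;

    SharedLock lockShared() { return SharedLock(_mutex); }
    ExclusiveLock lockExclusive() { return ExclusiveLock(_mutex); }

    // Readers only need the shared token.
    std::optional<int> getDbVersion(const SharedLock&) const { return _dbVersion; }
    bool movePrimaryInProgress(const SharedLock&) const { return _movePrimaryInProgress; }

    // Writers need the exclusive token, which serializes all changes.
    void setDbVersion(const ExclusiveLock&, std::optional<int> version) {
        _dbVersion = version;
    }
    void setMovePrimaryInProgress(const ExclusiveLock&, bool inProgress) {
        _movePrimaryInProgress = inProgress;
    }

private:
    std::shared_mutex _mutex;
    std::optional<int> _dbVersion;  // empty means "version unknown"
    bool _movePrimaryInProgress = false;
};

// Cleanup on the source node after a failed commit (situations 1 and 2).
void cleanupAfterFailedCommit(ToyDss& dss, bool steppedDown) {
    auto lock = dss.lockExclusive();
    if (steppedDown) {
        // We no longer know the authoritative version, so clear it.
        dss.setDbVersion(lock, std::nullopt);
    }
    // Always clear the in-memory movePrimary state.
    dss.setMovePrimaryInProgress(lock, false);
}
```

In this sketch the exclusive token alone protects the in-memory fields, which mirrors the argument above that the database lock no longer has to be X for these paths.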


 Comments   
Comment by Githook User [ 05/Feb/19 ]

Author: Blake Oler <blake.oler@mongodb.com> (BlakeIsBlake)

Message: SERVER-38473 Fix lint
Branch: master
https://github.com/mongodb/mongo/commit/5647fa34c70e1d7d255fcd85840115e95d81466b

Comment by Githook User [ 05/Feb/19 ]

Author: Blake Oler <blake.oler@mongodb.com> (BlakeIsBlake)

Message: SERVER-38473 Create DatabaseShardingStateLock to ensure concurrency around DatabaseShardingState usage
Branch: master
https://github.com/mongodb/mongo/commit/75d64b2e170abf1d22058e3afffdaf872a4c067c
