Core Server / SERVER-39092

ReplicationStateTransitionLockGuard should be resilient to exceptions thrown before waitForLockUntil()



    • Type: Bug
    • Status: In Progress
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.1 Required
    • Component/s: Replication
    • Labels:
    • Operating System:
    • Sprint: Repl 2019-02-25, Repl 2019-03-11
    • Linked BF Score:


      The ReplicationStateTransitionLockGuard destructor contains an invariant that the lock result is not LOCK_WAITING before it unlocks the RSTL (replication state transition lock). However, the valid sequence of events below causes the destructor to run with _result still set to LOCK_WAITING.

      1) Thread A issues a stepdown command (triggered either by a heartbeat or by the user).

      2) Thread B issues a conditional stepdown triggered by the user.

      3) Thread A marks thread B as killed.

      4) Thread A acquires the RSTL in X mode.

      5) Thread B enqueues its RSTL request and sets _result to LOCK_WAITING.

      6) Thread B calls ReplicationStateTransitionLockGuard::waitForLockUntil with a non-zero timeout.

      7) Thread B's wait for the RSTL times out, so the ReplicationStateTransitionLockGuard destructor runs with _result still set to LOCK_WAITING, and the invariant fails.

      Note: There is no risk of the RSTL state being left behind, because unlockOnErrorGuard in LockerImpl::lockComplete cleans up the state in both the lock manager and the locker on any failed lock attempt. In the scenario above, there is effectively nothing left to clean up by the time the ReplicationStateTransitionLockGuard destructor runs.
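
      The scenario and one possible way to make the destructor tolerate it can be sketched with a toy model. This is not the MongoDB implementation and not necessarily the fix that was adopted: ToyLocker, ToyTransitionGuard, LockResult and runContendedStepdown are hypothetical names, and the failed wait is modelled as a thrown std::runtime_error. The sketch assumes, as the note above states, that the locker cleans up its own pending request when the wait fails, so the guard only needs to unlock when the lock was actually granted.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins for illustration; not the real MongoDB types.
enum class LockResult { kOk, kWaiting };

// Toy locker: records unlock calls and simulates a lock held by another thread.
struct ToyLocker {
    bool heldByOther = false;  // e.g. thread A holds the lock in X mode
    bool heldByUs = false;
    int unlockCalls = 0;

    // Enqueue a request: granted at once if uncontended, otherwise pending.
    LockResult lockBegin() {
        if (heldByOther) return LockResult::kWaiting;
        heldByUs = true;
        return LockResult::kOk;
    }

    // Wait for a pending request. In this toy a contended wait always fails,
    // and the locker is assumed to have discarded its pending request itself.
    void lockComplete() {
        if (heldByOther) throw std::runtime_error("lock wait timed out");
        heldByUs = true;
    }

    void unlock() {
        heldByUs = false;
        ++unlockCalls;
    }
};

class ToyTransitionGuard {
public:
    explicit ToyTransitionGuard(ToyLocker& locker)
        : _locker(locker), _result(_locker.lockBegin()) {}

    // May throw, in which case _result stays kWaiting.
    void waitForLock() {
        _locker.lockComplete();
        _result = LockResult::kOk;
    }

    // Unlock only if the lock was granted, instead of asserting that
    // _result is never kWaiting at destruction time.
    ~ToyTransitionGuard() {
        if (_result == LockResult::kOk) _locker.unlock();
    }

    LockResult result() const { return _result; }

private:
    ToyLocker& _locker;
    LockResult _result;
};

// Replays the contended sequence; returns true if the wait threw.
bool runContendedStepdown(ToyLocker& locker) {
    locker.heldByOther = true;
    try {
        ToyTransitionGuard guard(locker);  // _result == kWaiting
        guard.waitForLock();               // throws; guard is destroyed while waiting
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

      In this sketch the destructor treats a still-pending result as "nothing to release", so the failed wait unwinds cleanly, while an uncontended guard still unlocks exactly once when it goes out of scope.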



