Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-36514

Hold lock on oplog as soon as optime is reserved



    • Type: Task
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Replication
    • Labels:


      There exists a potential 3-way deadlock currently.

      1. First, perform an insert into a replicated collection using insertDocuments(). An optime is generated, but not committed. If another write transaction commits after this with a later optime, a "hole" is created by the timestamped write that is not yet committed.

      2. Someone tries to read the oplog, for example a secondary querying against its sync source. Oplog reads call waitForAllEarlierOplogWritesToBeVisible() to wait for all uncommitted operations to become committed and visible.

      • This waits for the uncommitted insert in step 1 to be committed while holding an IS lock on the oplog (and thus on the 'local' database).

      3. A DDL operation is received on the "local" database, and enqueues a request for an X lock on the 'local' database.

      Now the local database DDL operation in thread 3 blocks behind the IX lock held by thread 2. Thread 2 can't complete until the insert in thread 1 completes. And the insert in thread 1 winds up blocking when it goes to acquire an IX lock on the local database in order to write its oplog entry. That lock acquisition is blocked by the pending X lock request by thread 3.

      This deadlock was made far more likely to hit as part of SERVER-35365, which changed all mapReduce commands to perform DDL operations on the local database.

      One way to fix this would be if thread 1 took the lock on the oplog before creating the oplog hole and held on to that lock until the oplog entry was written and the hole removed.


          Issue Links



              spencer Spencer Brody (Inactive)
              spencer Spencer Brody (Inactive)
              0 Vote for this issue
              9 Start watching this issue