Core Server / SERVER-35367

Hold locks in fewer callers of waitForAllEarlierOplogWritesToBeVisible()

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 4.0.0-rc1, 4.1.1
    • Fix Version/s: 4.0.2, 4.1.2
    • Component/s: Storage
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v4.0, v3.6
    • Sprint: Repl 2018-07-30, Repl 2018-08-13, Repl 2018-08-27
    • Linked BF Score: 36

      Description

      ReplicationCoordinatorExternalStateImpl::waitForAllEarlierOplogWritesToBeVisible() holds a collection lock on the oplog while doing a blocking wait. This can cause the hang described below:

      1. First, perform an insert into a replicated collection using insertDocuments(). An optime is generated, but not committed. If another write occurs after this at a later optime, a "hole" is created in the oplog by the timestamped write that is not yet committed.

      2. A reader using readConcern "atClusterTime" or "afterClusterTime" begins a read. This uses ReplicationCoordinatorExternalStateImpl::waitForAllEarlierOplogWritesToBeVisible() to wait for all uncommitted operations to become committed and visible.

      • This waits for the uncommitted insert in step 1 to be committed while holding a DBLock("local", MODE_IS).

      3. A dropCollection command is received on the "local" database, and enqueues a DBLock("local", MODE_X).

      4. The insert from step 1 completes in the storage engine and attempts to write the oplog entry at the generated optime. To do so, it attempts to acquire a DBLock("local", MODE_IX).

      • The previously enqueued dropCollection operation prevents the insert from acquiring the "local" database lock.
      • waitForAllEarlierOplogWritesToBeVisible() holds its lock while waiting for the insert to become visible. The insert waits behind the dropCollection operation, which in turn waits for the reader to release its lock, so none of the three operations can make progress.
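
The lock ordering in steps 2 to 4 can be traced with a small toy model. This is only a sketch: FifoLock, the mode table, and the owner names are hypothetical, and it assumes a strictly first-in, first-out grant policy, which simplifies the server's real lock manager.

```python
from collections import deque

# Which lock modes may be held together (standard intent-lock rules).
COMPATIBLE = {
    ("IS", "IS"): True,  ("IS", "IX"): True,  ("IS", "X"): False,
    ("IX", "IS"): True,  ("IX", "IX"): True,  ("IX", "X"): False,
    ("X", "IS"): False,  ("X", "IX"): False,  ("X", "X"): False,
}


class FifoLock:
    """Toy fair lock: a request is granted only when it is compatible with
    every current holder and no earlier request is still queued ahead of it."""

    def __init__(self):
        self.granted = {}     # owner -> mode currently held
        self.queue = deque()  # pending (owner, mode) requests, oldest first

    def request(self, owner, mode):
        self.queue.append((owner, mode))
        self._grant()
        return owner in self.granted

    def release(self, owner):
        del self.granted[owner]
        self._grant()

    def _grant(self):
        # Grant from the head of the queue; stop at the first conflict so
        # later requests cannot jump ahead of an earlier waiter.
        while self.queue:
            owner, mode = self.queue[0]
            if all(COMPATIBLE[(held, mode)] for held in self.granted.values()):
                self.granted[owner] = mode
                self.queue.popleft()
            else:
                break


def insert_gets_lock(reader_holds_lock_while_waiting):
    """Replay steps 2-4 and report whether the insert ever acquires MODE_IX."""
    lock = FifoLock()
    if reader_holds_lock_while_waiting:
        lock.request("reader", "IS")  # step 2: reader blocks while holding IS
    lock.request("drop", "X")         # step 3: dropCollection enqueues X
    lock.request("insert", "IX")      # step 4: insert enqueues IX
    if "drop" in lock.granted:
        lock.release("drop")          # the drop finishes once it holds the lock
    return "insert" in lock.granted
```

In this model, insert_gets_lock(True) returns False: the reader never releases its lock because it is waiting on the insert, so the queue never drains. With insert_gets_lock(False) the drop runs first and the insert is granted afterwards.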

      This method should be redesigned so that a collection lock is not required to be held while waiting for the last oplog entry to become visible.
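
One possible shape for such a redesign is to hold the lock only long enough to read the target timestamp, then block on a separate condition with no lock held. The sketch below illustrates that pattern only; ToyOplog and all of its members are hypothetical stand-ins, not the server's interfaces or the fix that was committed.

```python
import threading


class ToyOplog:
    """Hypothetical stand-in for the oplog, used only to show the pattern."""

    def __init__(self):
        self.lock = threading.Lock()        # stands in for the oplog lock
        self._cond = threading.Condition()  # guards visibility state only
        self.latest = 0                     # newest reserved timestamp
        self.visible = 0                    # newest timestamp readers may see

    def make_visible(self, ts):
        with self._cond:
            self.visible = max(self.visible, ts)
            self._cond.notify_all()

    def wait_for_earlier_writes(self, timeout=None):
        # Hold the lock just long enough to capture the target timestamp.
        with self.lock:
            target = self.latest
        # Block with the lock released, so writers and drops can acquire it.
        with self._cond:
            return self._cond.wait_for(lambda: self.visible >= target, timeout)


def demo():
    oplog = ToyOplog()
    oplog.latest = 5

    def writer():
        # The writer can take the lock even while the reader is waiting.
        with oplog.lock:
            oplog.make_visible(5)

    timer = threading.Timer(0.05, writer)
    timer.start()
    ok = oplog.wait_for_earlier_writes(timeout=5)
    timer.join()
    return ok
```

Here demo() returns once the writer has published timestamp 5. The writer acquires the lock while the reader is still blocked, which is the interleaving that hangs in the scenario described above.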

              People

              Assignee:
              spencer Spencer Brody (Inactive)
              Reporter:
              louis.williams Louis Williams
              Votes: 0
              Watchers: 13