Checkpoint pick-up / Oplog race leads to inconsistent metadata


    • Storage Engines - Foundations
    • None
    • 5

      An investigation in SERVER-116085 uncovered a potential race condition when a follower installs a checkpoint. During checkpoint installation, ingest constituents are created, and their layered:, colgroup: and file: entries are added to the shared metadata without holding the schema lock.

      If an oplog schema operation runs concurrently, it may read the metadata while it is only partially updated. For example, the oplog may observe that a layered: entry already exists and assume the corresponding metadata is fully present. When it later validates the file: entry, however, that entry may not exist yet, which triggers an assertion failure.

      Problem
      Checkpoint installation changes shared metadata in a non-atomic way relative to other schema operations.

      Solution

      Take the schema lock while installing a checkpoint, ideally holding it for the entire checkpoint pick-up/install phase.
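      A minimal sketch of the proposed fix, in C. Every name below (struct shared_metadata, schema_lock, install_checkpoint, oplog_validate) is hypothetical and does not correspond to the real storage engine code. A pthread mutex stands in for the schema lock, and three flags stand in for the layered:, colgroup: and file: entries. Because the installer and the oplog reader take the same lock, the reader can never observe layered: without file:.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, simplified model of shared metadata: one flag per entry
 * type that checkpoint installation adds for an ingest constituent. */
struct shared_metadata {
    bool layered_present;
    bool colgroup_present;
    bool file_present;
};

/* Hypothetical stand-in for the schema lock. */
static pthread_mutex_t schema_lock = PTHREAD_MUTEX_INITIALIZER;

/* Checkpoint install (hypothetical): all three entries are added while the
 * schema lock is held, so other schema operations never see a partial set. */
void install_checkpoint(struct shared_metadata *md)
{
    pthread_mutex_lock(&schema_lock);
    md->layered_present = true;
    md->colgroup_present = true;
    md->file_present = true;
    pthread_mutex_unlock(&schema_lock);
}

/* Oplog schema operation (hypothetical): reads under the same lock, so
 * "layered: exists" implies colgroup: and file: exist too. Returns whether
 * the layered: entry was found. */
bool oplog_validate(const struct shared_metadata *md)
{
    bool present;

    pthread_mutex_lock(&schema_lock);
    present = md->layered_present;
    if (present) {
        /* Without the lock on both sides, this is the check that can fail. */
        assert(md->colgroup_present && md->file_present);
    }
    pthread_mutex_unlock(&schema_lock);
    return present;
}
```

      Holding the lock across the whole pick-up/install phase, rather than around each individual metadata insert, is what makes the three entries appear atomically to other schema operations; locking each insert separately would still expose the partial state described above.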

            Assignee: Sid Mahajan
            Reporter: Sid Mahajan
            Votes: 0
            Watchers: 5
