Core Server / SERVER-58636

Initial syncing node can miss final oplog entry when calculating stopTimestamp against a secondary sync source


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 4.4.7, 5.0.0-rc8
    • Fix Version/s: 4.4.11, 5.1.0-rc0, 5.0.5
    • Component/s: None
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v5.0, v4.4, v4.2, v4.0
    • Sprint:
      Repl 2021-08-09, Repl 2021-08-23, Repl 2021-09-06, Repl 2021-09-20, Repl 2021-10-04, Repl 2021-10-18
    • Linked BF Score:
      115

      Description

      (Discoverability update, 10/4/21: We found that this happens because the relevant read on the secondary sync source uses a readSource of lastApplied, as it does in most other use cases. Making that read untimestamped solves the problem.)
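
      The following Python sketch models the discoverability update above. It is a hypothetical, simplified model, not server code: SyncSourceModel and its methods are illustrative names, timestamps are plain integers, and it assumes that the affected read is the one that finds the sync source's latest oplog entry while that entry (at T + 1) has been written but lastApplied still equals T. Under those assumptions, a read at lastApplied misses the entry and an untimestamped read returns it.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified model of a secondary sync source; not MongoDB code.
# Timestamps are plain integers (real oplog timestamps are (secs, inc) pairs).
@dataclass
class SyncSourceModel:
    oplog: dict[int, str] = field(default_factory=dict)  # ts -> operation
    last_applied: int = 0

    def write_oplog_entry(self, ts: int, op: str) -> None:
        # The entry exists in the oplog, but lastApplied has not advanced yet.
        self.oplog[ts] = op

    def finish_batch(self, ts: int) -> None:
        # The batch containing ts completes, so lastApplied moves forward.
        self.last_applied = ts

    def latest_oplog_ts(self, read_at_last_applied: bool) -> int:
        # A read with readSource lastApplied only sees entries <= lastApplied;
        # an untimestamped read sees every entry that has been written.
        visible = [ts for ts in self.oplog
                   if not read_at_last_applied or ts <= self.last_applied]
        return max(visible, default=0)


T = 10
src = SyncSourceModel()
src.write_oplog_entry(T, "insert into foo")
src.finish_batch(T)
src.write_oplog_entry(T + 1, "drop foo")  # batch still in progress
print(src.latest_oplog_ts(read_at_last_applied=True))   # 10: drop is missed
print(src.latest_oplog_ts(read_at_last_applied=False))  # 11: drop is included
```

      In this model, only the read source changes between the two calls, which mirrors the fix described in the update.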

      During secondary oplog application, the calls to applyCommand_inlock and scheduleOplogWrites are not atomic. As a result, when an initial syncing node chooses a secondary as its sync source, it can see that a command such as drop has been applied, yet miss that command's oplog entry when calculating the stopTimestamp.

      The following can happen:

      1. The initial syncing node sees that the drop of collection foo has been applied on a secondary sync source (but the drop's oplog entry has not been written yet). The collectionCloner stops with a NamespaceNotFound error, expecting the drop to be applied during the initial sync oplog application phase.
      2. The initial syncing node fetches the sync source's lastApplied and sets the stopTimestamp to T.
      3. The sync source writes the oplog entry for the drop from step 1 at timestamp T + 1.
      4. The initial syncing node reaches stopTimestamp T and transitions to secondary. It then applies the drop at T + 1 and crashes because collection foo does not exist.
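
      The four steps above can be replayed with a small Python sketch. Everything in it is hypothetical: NamespaceNotFound, apply_drop, and replay are illustrative names, not server APIs. It assumes, as step 1 implies, that initial sync oplog application tolerates a drop of a collection the cloner never created, while a node that has already become a secondary fails on it. Under those assumptions, a stopTimestamp of T leads to the crash and a stopTimestamp that covers T + 1 does not.

```python
class NamespaceNotFound(Exception):
    """Hypothetical stand-in for the server's NamespaceNotFound error."""


def apply_drop(collections: set[str], ns: str, during_initial_sync: bool) -> None:
    if ns not in collections:
        if during_initial_sync:
            # Assumed: initial sync tolerates the missing collection, since
            # the cloner may have raced with the drop (step 1).
            return
        # After the transition to secondary, the missing collection is fatal.
        raise NamespaceNotFound(ns)
    collections.discard(ns)


def replay(stop_ts: int, oplog: dict[int, str]) -> str:
    """Apply drop entries in timestamp order. Entries <= stop_ts are applied
    during initial sync; later ones after the node becomes a secondary."""
    collections: set[str] = set()  # step 1: the cloner skipped 'foo'
    try:
        for ts in sorted(oplog):
            apply_drop(collections, oplog[ts], during_initial_sync=ts <= stop_ts)
    except NamespaceNotFound as exc:
        return f"crash: NamespaceNotFound({exc})"
    return "ok"


T = 10
oplog = {T + 1: "foo"}  # step 3: the drop's oplog entry lands at T + 1
print(replay(stop_ts=T, oplog=oplog))      # crash: NamespaceNotFound(foo)
print(replay(stop_ts=T + 1, oplog=oplog))  # ok
```

      The only difference between the two runs is whether the stopTimestamp includes the drop's entry, which is the value the initial syncing node computes in step 2.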


            People

            Assignee:
            vesselina.ratcheva Vesselina Ratcheva
            Reporter:
            jason.chan Jason Chan
            Votes:
            0
            Watchers:
            12
