Core Server / SERVER-42910

Oplog query with higher timestamp but lower term than the sync source shouldn't time out due to afterClusterTime


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.13, 4.2.1, 4.3.1
    • Component/s: Replication
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v4.2, v4.0
    • Sprint:
      Repl 2019-08-26

      Description

      SERVER-33812 attached afterClusterTime to all oplog queries. A node whose last optime has a higher timestamp but a lower term than its sync source should roll back after receiving an empty batch. For example, the old primary is at (ts: 9, term: 1) while the new primary is at (ts: 8, term: 2). Instead, the oplog query fails with MaxTimeMSExpired (added in SERVER-35200). I believe the query times out while waiting for afterClusterTime. In production, the old primary will very likely still roll back once new writes arrive with an even higher timestamp, possibly from the periodic no-op writer. Even so, this is a liveness issue.
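
      The scenario above can be modelled with a small Python sketch. This is not server code: OpTime, is_newer, and oplog_query are hypothetical names, timestamps are simplified to integers, and the sketch assumes that optimes compare by term first and then by timestamp. It only illustrates why waiting for afterClusterTime can never be satisfied when the sync source's latest timestamp (8) is behind the syncing node's (9) and no new writes arrive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpTime:
    ts: int    # simplified timestamp (real timestamps are not plain ints)
    term: int  # election term


def is_newer(a: OpTime, b: OpTime) -> bool:
    # Assumption: optimes compare by term first, then by timestamp.
    return (a.term, a.ts) > (b.term, b.ts)


def oplog_query(source_oplog, last_fetched, use_after_cluster_time):
    """Hypothetical model of the sync source answering an oplog query."""
    source_latest_ts = max(entry.ts for entry in source_oplog)
    if use_after_cluster_time and source_latest_ts < last_fetched.ts:
        # The source would block until its clock reaches last_fetched.ts;
        # with no new writes, that wait ends only when the time limit expires.
        return "MaxTimeMSExpired"
    # Without the wait, the source returns entries at or after the timestamp.
    return [e for e in source_oplog if e.ts >= last_fetched.ts]


# Values from the description: old primary (ts: 9, term: 1),
# new primary (ts: 8, term: 2). The earlier entry is illustrative only.
old_primary_last = OpTime(ts=9, term=1)
new_primary_oplog = [OpTime(ts=7, term=1), OpTime(ts=8, term=2)]

with_wait = oplog_query(new_primary_oplog, old_primary_last, True)
without_wait = oplog_query(new_primary_oplog, old_primary_last, False)
source_is_ahead = is_newer(new_primary_oplog[-1], old_primary_last)
```

      In this model, the query with the afterClusterTime wait times out, while the query without it returns an empty batch from a sync source that is ahead by term, which is the condition the description expects to trigger rollback.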

       

       

              People

              Assignee:
              siyuan.zhou Siyuan Zhou
              Reporter:
              siyuan.zhou Siyuan Zhou
              Participants:
              Votes:
              0
              Watchers:
              9
