  1. Core Server
  2. SERVER-45010

Clean shutdown after rollbackViaRefetch with eMRC=false can cause us to incorrectly overwrite unstable checkpoints

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.2.4, 4.3.3
    • Affects Version/s: 4.2.1, 4.3.2
    • Component/s: Replication
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v4.2
    • Sprint: Repl 2019-12-30, Repl 2020-01-13, Repl 2020-01-27
    • 50

      At the end of rollbackViaRefetch with enableMajorityReadConcern=false (eMRC=false), we take an unstable checkpoint before proceeding. Once that completes, we enter RECOVERING state and try to catch up our oplog. If the server is shut down cleanly after we take this unstable checkpoint, however, shutdown explicitly takes another checkpoint, and that checkpoint is stable because a stable timestamp is set. After rollbackViaRefetch we do set the initialDataTimestamp ahead of the stable timestamp so that no checkpoints are taken, but the logic that normally enforces this in the WTCheckpointThread is bypassed during shutdown, so we take a full stable checkpoint regardless of the initialDataTimestamp value. Taking a stable checkpoint and recovering from it on restart breaks the assumptions that rollbackViaRefetch with eMRC=false relies on for correctness. See SERVER-38925 for an explanation of why these unstable checkpoints are necessary.
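
      The mismatch between the two checkpoint paths can be sketched as a toy model. This is not server code: `EngineState`, `periodic_checkpoint`, and `shutdown_checkpoint_as_reported` are hypothetical names, timestamps are plain integers, and the model covers only the decision described above (the periodic thread skips checkpoints while initialDataTimestamp is ahead of the stable timestamp, and the shutdown path does not consult it).

```python
from dataclasses import dataclass
from enum import Enum


class Checkpoint(Enum):
    NONE = "none"      # no checkpoint is taken
    STABLE = "stable"  # checkpoint taken at the stable timestamp


@dataclass
class EngineState:
    """Hypothetical stand-in for the storage engine's timestamp state."""
    stable_timestamp: int
    initial_data_timestamp: int


def periodic_checkpoint(state: EngineState) -> Checkpoint:
    """Models the periodic checkpoint thread's guard as described in the ticket.

    While initialDataTimestamp is ahead of the stable timestamp,
    no checkpoint is taken.
    """
    if state.initial_data_timestamp > state.stable_timestamp:
        return Checkpoint.NONE
    return Checkpoint.STABLE


def shutdown_checkpoint_as_reported(state: EngineState) -> Checkpoint:
    """Models the reported shutdown behavior: the guard above is bypassed,

    so a stable checkpoint is taken regardless of initialDataTimestamp.
    """
    return Checkpoint.STABLE


# State after rollbackViaRefetch with eMRC=false (illustrative values only):
# initialDataTimestamp has been set ahead of the stable timestamp.
after_rollback = EngineState(stable_timestamp=10, initial_data_timestamp=20)

periodic_result = periodic_checkpoint(after_rollback)
shutdown_result = shutdown_checkpoint_as_reported(after_rollback)

# The two paths disagree for the same state, which is the bug described here.
paths_disagree = periodic_result is not shutdown_result
```

      In this model the two paths return different results for the same post-rollback state. The sketch only restates the reported behavior and makes no claim about how the fix was implemented.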

        1. bf-15306-repro.diff
          3 kB
          William Schultz

            Assignee:
            william.schultz@mongodb.com William Schultz (Inactive)
            Reporter:
            william.schultz@mongodb.com William Schultz (Inactive)
            Votes:
            0
            Watchers:
            10
