Rollback may not happen to primary when resumed from sleep state

XMLWordPrintableJSON

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • Affects Version/s: 2.4.4
    • Component/s: Replication
    • None
    • Environment:
      Ubuntu 12.04 on a Dell XPS13 laptop
    • ALL
    • Hide

      1) A script is inserting one document per second. (The documents happen to contain output of unix date command.) See rstest.sh.

      2) The primary node (27001 in the log) is running in foreground in the shell and is stopped with Ctrl-Z.

      3) Another node (27002) becomes primary as designed.

      3) More inserts are directed to the new primary (via mongos node).

      4) Inserts are stopped. No writes are happening anywhere.

      5) A few minutes later the original primary continues (type fg in the shell). It now:

      6) proceeds to write to oplog one pending write that it was handling when stopped (this is probably not reproducible 100% of the time, but with some statistical certainty. You can remove the sleep 1 from rstest.sh to increase probability.)

      7) the oplog write gets the timestamp of current time, not the time when the insert was sent from client and this node was still primary.

      8) the old primary sends heartbeats to other members claiming to be primary and also claiming to have most recent OpTime in its oplog.

      9) the new/current primary freaks out, yields, and rolls back all the inserts it had written in (3) above. Joins the old primary as a secondary.

      Note: in the syslog there are also repeated calls to FindCommonPoint for he next 2 minutes or so, complaining about a 10 seconds gap in oplogs. I didn't troubleshoot what that is about, but probably there may be another bug lurking somewhere?

      Show
      1) A script is inserting one document per second. (The documents happen to contain output of unix date command.) See rstest.sh. 2) The primary node (27001 in the log) is running in foreground in the shell and is stopped with Ctrl-Z. 3) Another node (27002) becomes primary as designed. 3) More inserts are directed to the new primary (via mongos node). 4) Inserts are stopped. No writes are happening anywhere. 5) A few minutes later the original primary continues (type fg in the shell). It now: 6) proceeds to write to oplog one pending write that it was handling when stopped (this is probably not reproducible 100% of the time, but with some statistical certainty. You can remove the sleep 1 from rstest.sh to increase probability.) 7) the oplog write gets the timestamp of current time, not the time when the insert was sent from client and this node was still primary. 8) the old primary sends heartbeats to other members claiming to be primary and also claiming to have most recent OpTime in its oplog. 9) the new/current primary freaks out, yields, and rolls back all the inserts it had written in (3) above. Joins the old primary as a secondary. Note: in the syslog there are also repeated calls to FindCommonPoint for he next 2 minutes or so, complaining about a 10 seconds gap in oplogs. I didn't troubleshoot what that is about, but probably there may be another bug lurking somewhere?
    • None
    • 3
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      A replica set primary is stopped with Ctrl-Z. After it is resumed, it completes one final write operation, returns to the replica set claiming to be primary and claiming to have a more recent state of the oplog. This causes the current (real) primary to yield and to roll back everything it has written in the meantime.

      Proposed fix: The real primary node should be able to know that it has been elected with majorty support, and that writes accepted after that election are valid, and the returning primary should be rejected.

        1. rollbacks-when-old-primary-comes-back.log
          67 kB
        2. rollbacks-when-old-primary-comes-back-take2.log
          66 kB
        3. rstest.sh
          0.2 kB

              Assignee:
              Unassigned
              Reporter:
              Henrik Ingo (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

                Created:
                Updated:
                Resolved: