Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-38089

Fatal error 40292 due to missing oplog entry at the start time stamp

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Cannot Reproduce
    • Affects Version/s: 4.0.1, 4.0.4
    • Fix Version/s: None
    • Component/s: Replication
    • Labels:
    • Operating System:
      ALL
    • Steps To Reproduce:
      Hide

      I have the various log, turtle, wt, and diagnostic data files. But I have not idea how to recreate it. 

      Am attaching all the pertinent data,  I could think of.

      • mongod-1.tgz: Secondary (with the problem)
      • mongod-0.tgz: Primary

      We are using monogdb cloud manager. The tar file contains logs of the automation agents as well.

      Since the automation agent keeps trying to restart the server, you'd find repetitive entries. It contains errors from 4.0.1 and from 4.0.4.

       

      Show
      I have the various log, turtle, wt, and diagnostic data files. But I have not idea how to recreate it.  Am attaching all the pertinent data,  I could think of. mongod-1.tgz: Secondary (with the problem) mongod-0.tgz: Primary We are using monogdb cloud manager. The tar file contains logs of the automation agents as well. Since the automation agent keeps trying to restart the server, you'd find repetitive entries. It contains errors from 4.0.1 and from 4.0.4.  

      Description

       

      A series of weird happenings have ended up in a situation triggering Fatal Error 40292 in mongo/db/repl/replication_recovery.cpp line 133.

      We have a PSA configuration. The entire issue started when we our DB approached (or hit) the disk limit. We resized the disk about mid October. From there on we've not been able to have a stable system. The servers keep going down.

      I got involved two days ago, so I don't have the entire history.

      I found that our system was triggering WT-4335, with version 4.0.1. I upgraded to 4.0.4 yesterday. I am guessing now the fatal error is triggering at the root cause. (The WT-4335 was triggering after all the oplog replay, now the error is at the start itself.)

      2018-11-12T05:41:22.612+0000 I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory '/mongo/rs0_8/diagnostic.data'
      2018-11-12T05:41:22.614+0000 I REPL [initandlisten] Rollback ID is 12
      2018-11-12T05:41:22.614+0000 I REPL [initandlisten] Recovering from stable timestamp: Timestamp(1539794142, 116) (top of oplog: { ts: Timestamp(1541830881, 1), t: 85 }, appliedThrough: { ts: Timestamp(0, 0), t: -1 }, TruncateAfter: Timestamp(0, 0))
      2018-11-12T05:41:22.614+0000 I REPL [initandlisten] Starting recovery oplog application at the stable timestamp: Timestamp(1539794142, 116)
      2018-11-12T05:41:22.614+0000 I REPL [initandlisten] Replaying stored operations from { : Timestamp(1539794142, 116) } (exclusive) to { : Timestamp(1541830881, 1) } (inclusive).
      {{2018-11-12T05:41:22.617+0000 F REPL [initandlisten] Oplog entry at { : Timestamp(1539794142, 116) } is missing; actual entry found is { : Timestamp(1539794616, 312) }}}
      2018-11-12T05:41:22.617+0000 F - [initandlisten] Fatal Assertion 40292 at src/mongo/db/repl/replication_recovery.cpp 134
      {{2018-11-12T05:41:22.617+0000 F - [initandlisten] }}

      For reference, here is log with 4.0.1 (prior to the above error).

      2018-11-11T10:23:54.182+0000 I ROLLBACK [rsBackgroundSync] Rolling back to the stable timestamp. StableTimestamp: Timestamp(1539794142, 116) Initial Data Timestamp: Timestamp(1539794142, 116)2018-11-11T10:23:54.275+0000 E STORAGE [rsBackgroundSync] WiredTiger error (22) [1541931834:274955][23314:0x7f35f5bd7700], WT_CONNECTION.rollback_to_stable: rollback_to_stable illegal with active transactions: Invalid argument Raw: [1541931834:274955][23314:0x7f35f5bd7700], WT_CONNECTION.rollback_to_stable: rollback_to_stable illegal with active transactions: Invalid argument2018-11-11T10:23:54.275+0000 F ROLLBACK [rsBackgroundSync] RecoverToStableTimestamp failed. :: caused by :: UnrecoverableRollbackError: Error rolling back to stable. Err: Invalid argument

       

       

       

        Attachments

        1. image-2018-11-13-21-54-57-681.png
          image-2018-11-13-21-54-57-681.png
          337 kB
        2. mongod-0.tgz
          78.95 MB
        3. mongod-1.tgz
          59.13 MB
        4. mongodb-0.log-2018-11-14
          30.12 MB
        5. mongodb-1.log-2018-11-14
          6.53 MB
        6. node1.png
          node1.png
          313 kB

          Issue Links

            Activity

              People

              Assignee:
              daniel.hatcher Danny Hatcher (Inactive)
              Reporter:
              DrArunL Arun Lakhotia [X]
              Participants:
              Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: