  1. Core Server
  2. SERVER-53482

Invariant failure: opCtx != nullptr && _opCtx == nullptr in src/mongo/db/client.cpp, line: 126

    • Type: Question
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: 4.4.1, 4.4.2
    • Component/s: None
    • Labels: None

      I've seen the above on three different occasions since upgrading to 4.4.x.

      The first time was on 4.4.1.  I was adding a hidden node to the replica set.  The primary locked up for a few seconds and then all three nodes crashed with the same fatal error.

      The second time was on 4.4.x and I don't recall anything special going on.  It only crashed one instance that time.

      The third time was on 4.4.2-ent (Cloud Manager Backup limited license) and it happened unprovoked.  It started with the primary locking up for about 10 seconds (no log activity, failed health check) and then the remainder of the servers crashed as soon as they were elected primary.

      In all three cases, the ctx in the log was TopologyVersionObserver; the most recent case also had logs with ctx=monitoring-keys-for-HMAC.

      I could not find any similar issues regarding this particular invariant failure (client.cpp).

      Happy to provide additional context/logs if you feel this is worth investigating, otherwise I will likely have to revert to 4.2.  The above invariant failure has only happened on 3 occasions, but the mongod freeze they started with has happened over a dozen times in the last 2 months.

        1. 2020-11-09T00_16_19.496.zip
          37.26 MB
        2. 2020-12-19T21_21_13.189.zip
          37.17 MB

            Assignee:
            dmitry.agranat@mongodb.com Dmitry Agranat
            Reporter:
            b.granetzke@fetchrewards.com Brian Granetzke
            Votes:
            0
            Watchers:
            13
