[SERVER-67643] Arbiter crash with Invariant failure Created: 29/Jun/22  Updated: 09/Aug/22  Resolved: 09/Aug/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: William N/A Assignee: Chris Kelly
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

Below the log with backtrace:{}

2022-06-28T17:01:43.083+0000 F -        [replexec-1053] Invariant failure opTime.getTimestamp().getInc() > 0 str::stream() << "Impossible optime received: " << opTime.toString() src/mongo/db/repl/replication_coordinator_impl.cpp 1172
2022-06-28T17:01:43.084+0000 F -        [replexec-1053]***aborting after invariant() failure
2022-06-28T17:01:43.146+0000 F -        [replexec-1053] Got signal: 6 (Aborted). 0x563ffd59db11 0x563ffd59cd29 0x563ffd59d20d 0x7f50a5d34690 0x7f50a598e277 0x7f50a598f968 0x563ffbcede31 0x563ffc46bb27 0x563ffc46ca97 0x563ffc48538f 0x563ffd0156f6 0x563ffd018ea3 0x563ffd01938b 0x563ffcd7c27c 0x563ffcd7c77c 0x563ffcd7d166 0x563ffd6ad240 0x7f50a5d2cde5 0x7f50a5a56bad
----- BEGIN BACKTRACE -----
{{

Unknown macro: {"backtrace"}

,{"b":"563FFB359000","o":"2243D29"},{"b":"563FFB359000","o":"224420D"},{"b":"7F50A5D25000","o":"F690"},{"b":"7F50A5958000","o":"36277","s":"gsignal"},{"b":"7F50A5958000","o":"37968","s":"abort"},{"b":"563FFB359000","o":"994E31","s":"ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j"},{"b":"563FFB359000","o":"1112B27","s":"_ZN5mongo4repl26ReplicationCoordinatorImpl30_setMyLastAppliedOpTime_inlockERKNS0_6OpTimeEbNS0_22ReplicationCoordinator15DataConsistencyE"},{"b":"563FFB359000","o":"1113A97","s":"_ZN5mongo4repl26ReplicationCoordinatorImpl26_advanceCommitPoint_inlockERKNS0_6OpTimeE"},{"b":"563FFB359000","o":"112C38F","s":"_ZN5mongo4repl26ReplicationCoordinatorImpl24_handleHeartbeatResponseERKNS_8executor12TaskExecutor25RemoteCommandCallbackArgsEi"},{"b":"563FFB359000","o":"1CBC6F6"},{"b":"563FFB359000","o":"1CBFEA3","s":"_ZN5mongo8executor22ThreadPoolTaskExecutor11runCallbackESt10shared_ptrINS1_13CallbackStateEE"},{"b":"563FFB359000","o":"1CC038B"},{"b":"563FFB359000","o":"1A2327C","s":"_ZN5mongo10ThreadPool10_doOneTaskEPSt11unique_lockISt5mutexE"},{"b":"563FFB359000","o":"1A2377C","s":"_ZN5mongo10ThreadPool13_consumeTasksEv"},{"b":"563FFB359000","o":"1A24166","s":"_ZN5mongo10ThreadPool17_workerThreadBodyEPS0_RKNSt7_cxx1112basic_stringIcSt11char_traitsIcESaIcEEE"},{"b":"563FFB359000","o":"2354240"},{"b":"7F50A5D25000","o":"7DE5"},{"b":"7F50A5958000","o":"FEBAD","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.6.8", "gitVersion" : "6bc9ed599c3fa164703346a22bad17e33fa913e4", "compiledModules" : [], "uname" :

Unknown macro: { "sysname" }

, "somap" : [ { "b" : "563FFB359000", "elfType" : 3, "buildId" : "C84EF0975B8E908850A77231839734AFF5317D2E" }, { "b" : "7FFC2BFF3000", "elfType" : 3, "buildId" : "C66D048A3FE5381D0F296EFC151FC9381952705A" }, { "b" : "7F50A6F34000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "9E5E0BF5F22DE7555BC4B9853240817147489258" }, { "b" : "7F50A6AD6000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "0B7F7487280FE68AF9302A282FAE37776A99BC80" }, { "b" : "7F50A6865000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "9C4EB34A346260F2A77746F4E5ED837619137DB7" }, { "b" : "7F50A6661000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "0E5CD5BAA5EE8BF3648A5031B088F9A78C89364F" }, { "b" : "7F50A6459000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "F2701E2A24459D5B55DF5549D585F091E7BCF07A" }, { "b" : "7F50A6157000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "07FB92AFEF1756F093371CE60C3AE85DD3A06325" }, { "b" : "7F50A5F41000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "A03C9A80E995ED5F43077AB754A258FA0E34C3CD" }, { "b" : "7F50A5D25000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "D973C39D1900DC61D8519C653C3BC405692DE563" }, { "b" : "7F50A5958000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "AF310F56618FC1EF9158973484F60942F11CC0FB" }, { "b" : "7F50A714D000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "8402047FD4A85B3CD1142346EA06BCD6E15A82A3" }, { "b" : "7F50A5742000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "89C6AF118B6B4FB6A73AE1813E2C8BDD722956D1" }, { "b" : "7F50A54F5000", "path" : "/usr/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "B4C91D3D76D4819DDD1F7D2360F600AC13280628" }, { "b" : "7F50A520D000", "path" : "/usr/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "295B8EDE0A3D565218B9086DECC047FB28F235E5" }, { "b" : "7F50A500A000", "path" : "/usr/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "AB007F5DF96C66E515542598F5BE1429ED63D86F" }, { "b" : "7F50A4DD7000", "path" : "/usr/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "68572D6C73CAEE5B0F36E50873B092275DE5697C" }, { "b" : "7F50A4BC9000", "path" : "/usr/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "2B9C3C8ED0AC9CE675617BDB4174D5EACF1FB0FF" }, { "b" : "7F50A49C6000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "37A58210FA50C91E09387765408A92909468D25B" }, { "b" : "7F50A47A5000", "path" : "/usr/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "F5054DC94443326819FBF3065CFDF5E4726F57EE" } ] }}}}
 mongod-cluster-canais(_ZN5mongo15printStackTraceERSo+0x41) [0x563ffd59db11]
 mongod-cluster-canais(+0x2243D29) [0x563ffd59cd29]
 mongod-cluster-canais(+0x224420D) [0x563ffd59d20d]
 libpthread.so.0(+0xF690) [0x7f50a5d34690]
 libc.so.6(gsignal+0x37) [0x7f50a598e277]
 libc.so.6(abort+0x148) [0x7f50a598f968]
 mongod-cluster-canais(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0x0) [0x563ffbcede31]
 mongod-cluster-canais(_ZN5mongo4repl26ReplicationCoordinatorImpl30_setMyLastAppliedOpTime_inlockERKNS0_6OpTimeEbNS0_22ReplicationCoordinator15DataConsistencyE+0x347) [0x563ffc46bb27]
 mongod-cluster-canais(_ZN5mongo4repl26ReplicationCoordinatorImpl26_advanceCommitPoint_inlockERKNS0_6OpTimeE+0x67) [0x563ffc46ca97]
 mongod-cluster-canais(_ZN5mongo4repl26ReplicationCoordinatorImpl24_handleHeartbeatResponseERKNS_8executor12TaskExecutor25RemoteCommandCallbackArgsEi+0x29CF) [0x563ffc48538f]
 mongod-cluster-canais(+0x1CBC6F6) [0x563ffd0156f6]
 mongod-cluster-canais(_ZN5mongo8executor22ThreadPoolTaskExecutor11runCallbackESt10shared_ptrINS1_13CallbackStateEE+0x1B3) [0x563ffd018ea3]
 mongod-cluster-canais(+0x1CC038B) [0x563ffd01938b]
 mongod-cluster-canais(_ZN5mongo10ThreadPool10_doOneTaskEPSt11unique_lockISt5mutexE+0x14C) [0x563ffcd7c27c]
 mongod-cluster-canais(_ZN5mongo10ThreadPool13_consumeTasksEv+0xBC) [0x563ffcd7c77c]
 mongod-cluster-canais(ZN5mongo10ThreadPool17_workerThreadBodyEPS0_RKNSt7_cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x96) [0x563ffcd7d166]
 mongod-cluster-canais(+0x2354240) [0x563ffd6ad240]
 libpthread.so.0(+0x7DE5) [0x7f50a5d2cde5]
 libc.so.6(clone+0x6D) [0x7f50a5a56bad]
-----  END BACKTRACE  -----

Participants:

 Description   

Hi All,

We're facing recurrent issues with an Arbiter that crashes with Invariant failure. It starts without any error when manually started.

Cluster Version: 3.6.6

Arbiter version: 3.6.8

rs.status()

{{{}}
    "set" : "RS",
    "date" : ISODate("2022-06-29T10:25:53.269-03:00"),
    "myState" : 2,
    "term" : 16,
    "syncingTo" : "db-mongo-prda-canais20.agiplan.aws.local:27017",
    "syncSourceHost" : "db-mongo-prda-canais20.agiplan.aws.local:27017",
    "syncSourceId" : 0,
    "heartbeatIntervalMillis" : 2000,
    "optimes" : {
        "lastCommittedOpTime" : {
            "ts" : Timestamp(1656509153, 38),
            "t" : 16
        },
        "readConcernMajorityOpTime" : {
            "ts" : Timestamp(1656509153, 38),
            "t" : 16
        },
        "appliedOpTime" : {
            "ts" : Timestamp(1656509153, 38),
            "t" : 16
        },
        "durableOpTime" : {
            "ts" : Timestamp(1656509153, 38),
            "t" : 16
{{        }}}
    },
    "members" : [
        {
            "_id" : 0,
            "name" : "db-mongo-prda-canais20.agiplan.aws.local:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 6010698,
            "optime" : {
                "ts" : Timestamp(1656509152, 83),
                "t" : 16
            },
            "optimeDurable" : {
                "ts" : Timestamp(1656509152, 83),
                "t" : 16
            },
            "optimeDate" : ISODate("2022-06-29T10:25:52.000-03:00"),
            "optimeDurableDate" : ISODate("2022-06-29T10:25:52.000-03:00"),
            "lastHeartbeat" : ISODate("2022-06-29T10:25:52.441-03:00"),
            "lastHeartbeatRecv" : ISODate("2022-06-29T10:25:52.627-03:00"),
            "pingMs" : 0,
            "lastHeartbeatMessage" : "",
            "syncingTo" : "",
            "syncSourceHost" : "",
            "syncSourceId" : -1,
            "infoMessage" : "",
            "electionTime" : Timestamp(1650497816, 31),
            "electionDate" : ISODate("2022-04-20T20:36:56.000-03:00"),
            "configVersion" : 9
        },
        {
            "_id" : 1,
            "name" : "db-mongo-prdb-canais20.agiplan.aws.local:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 6010701,
            "optime" : {
                "ts" : Timestamp(1656509153, 38),
                "t" : 16
            },
            "optimeDate" : ISODate("2022-06-29T10:25:53.000-03:00"),
            "syncingTo" : "db-mongo-prda-canais20.agiplan.aws.local:27017",
            "syncSourceHost" : "db-mongo-prda-canais20.agiplan.aws.local:27017",
            "syncSourceId" : 0,
            "infoMessage" : "",
            "configVersion" : 9,
            "self" : true,
            "lastHeartbeatMessage" : ""
        },
        {
            "_id" : 2,
            "name" : "db-mongo-prd-arbiter.agiplan.aws.local:27019",
            "health" : 1,
            "state" : 7,
            "stateStr" : "ARBITER",
            "uptime" : 2293,
            "lastHeartbeat" : ISODate("2022-06-29T10:25:52.371-03:00"),
            "lastHeartbeatRecv" : ISODate("2022-06-29T10:25:52.704-03:00"),
            "pingMs" : 114,
            "lastHeartbeatMessage" : "",
            "syncingTo" : "",
            "syncSourceHost" : "",
            "syncSourceId" : -1,
            "infoMessage" : "",
            "configVersion" : 9
{{        }}}
    ],
    "ok" : 1,
    "operationTime" : Timestamp(1656509153, 38),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1656509153, 38),
        "signature" : {
            "hash" : BinData(0,"kw3MZZFfQCXdcLhsJ1dshb2/ShY="),
            "keyId" : NumberLong("7088229749746565133")
{{        }}}
{{    }}}
}

rs.conf()

{
    "_id" : "RS",
    "version" : 9,
    "protocolVersion" : 1,
    "members" : [
        {
            "_id" : 0,
            "host" : "db-mongo-prda-canais20.agiplan.aws.local:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 2,
            "tags" :

Unknown macro: {                              }

,
            "slaveDelay" : 0,
            "votes" : 1
        },
        {
            "_id" : 1,
            "host" : "db-mongo-prdb-canais20.agiplan.aws.local:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" :

,
            "slaveDelay" : 0,
            "votes" : 1
        },
        {
            "_id" : 2,
            "host" : "db-mongo-prd-arbiter.agiplan.aws.local:27019",
            "arbiterOnly" : true,
            "buildIndexes" : true,
            "hidden" : true,
            "priority" : 0,
            "tags" :

Unknown macro: {                              }

,
            "slaveDelay" : 0,
            "votes" : 1
        }
    ],
    "settings" : {
        "chainingAllowed" : true,
        "heartbeatIntervalMillis" : 2000,
        "heartbeatTimeoutSecs" : 10,
        "electionTimeoutMillis" : 10000,
        "catchUpTimeoutMillis" : -1,
        "catchUpTakeoverDelayMillis" : 30000,
        "getLastErrorModes" :

Unknown macro: {                      }

,
        "getLastErrorDefaults" :

Unknown macro: {             "w" }

,
        "replicaSetId" : ObjectId("5bac3393ba858f462d11d9f4")
    }
}



 Comments   
Comment by Chris Kelly [ 09/Aug/22 ]

We haven’t heard back from you for some time, so I’m going to close this ticket. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Comment by Chris Kelly [ 20/Jul/22 ]

Hi William,

Just wanted to check in - if you upgraded to a newer version, has your issue persisted?

Christopher

Comment by Chris Kelly [ 01/Jul/22 ]

Hi William,

MongoDB v3.6 reached end of life in April 2022 and is no longer supported. I would recommend upgrading to version v4.2 or newer first and foremost.

I'm not finding much information around this besides SERVER-50949, which may or may not be related currently. It looks like that was resolved in v5.0, but it may not be exactly what you're experiencing. I wouldn't expect this to be happening on an arbiter though, so I'm going to check up more on that.

Still, I would recommend upgrading if possible just in case your issue is resolved already, and 5.0 seems like it may have a resolution to this. Also, I was having a hard time parsing your stack trace as it was copied into the Jira ticket - could you instead upload the following to the ticket (including the stack trace in the logs):

  • the mongo logs
  • the $dbpath/diagnostic.data directory (the contents are described here)

This would just help me verify whether it is related to SERVER-50949, but keep in mind that I am still going to recommend an upgrade first on your end.

Christopher

 

Generated at Thu Feb 08 06:08:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.