[SERVER-53537] MongoDB crashed after state change Created: 29/Dec/20 Updated: 27/Jan/21 Resolved: 27/Jan/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.4.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Konstantin Krasnov | Assignee: | Dmitry Agranat |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Operating System: | ALL | ||||||||||||||||
| Participants: | |||||||||||||||||
| Case: | (copied to CRM) | ||||||||||||||||
| Description |
|
Hello, I need help as I don't understand why mongod crashed. Mongod crashed after state change from Secondary to Primary. What could be the reason for the crash? Log Error formatting macro: code: java.lang.StackOverflowError
{"t": {"$date":"2020-12-28T00:33:36.420+03:00"},"s":"I", "c":"ELECTION", "id":21450, "ctx":"ReplCoord-4690","msg":"Election succeeded, assuming primary role","attr":{"term":182}} ,"s":"I", "c":"REPL", "id":21358, "ctx":"ReplCoord-4690","msg":"Replica set state transition","attr":{"newState":"PRIMARY","oldState":"SECONDARY"}} ,"s":"I", "c":"REPL", "id":21106, "ctx":"ReplCoord-4690","msg":"Resetting sync source to empty","attr":{"previousSyncSource":"mongo46:27017"}} ,"s":"I", "c":"CONNPOOL", "id":22566, "ctx":"ReplNetwork","msg":"Ending connection due to bad connection status","attr":{"hostAndPort":"mongo46:27017","error":"CallbackCanceled: Callback was canceled","numOpenConns":2}} ,"s":"I", "c":"REPL", "id":21359, "ctx":"ReplCoord-4690","msg":"Entering primary catch-up mode"} ,"s":"F", "c":"-", "id":23079, "ctx":"TopologyVersionObserver","msg":"Invariant failure","attr":{"expr":"opCtx != nullptr && _opCtx == nullptr","file":"src/mongo/db/client.cpp","line":126}} ,"s":"F", "c":"-", "id":23080, "ctx":"TopologyVersionObserver","msg":"\n\n***aborting after invariant() failure\n\n"} ,"s":"F", "c":"CONTROL", "id":4757800, "ctx":"TopologyVersionObserver","msg":"Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}} ,"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn28221","msg":"Slow query","attr":{"type":"command","ns":"irkkt.STATS.med_care","command":{"aggregate":"STATS.med_care","pipeline":[{"$match":{"$and":[{"federal_subject_code":{"$in":["19"]}},{"op_date":{"$gte": {"$date":"2020-11-01T00:00:00.000Z"},"$lt": {"$date":"2020-11-30T23:59:59.000Z"}}}]}},{"$group":{"_id":null,"sum": {"$sum":"$kiz_count"}}}],"cursor":{},"lsid":{"id":{"$uuid":"82fd217c-2909-4f04-875d-295d32e75068"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1609104803,"i":50}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"irkkt","$readPreference":{"mode":"secondary"}},"planSummary":"IXSCAN { op_date: -1, federal_subject_code: 1 }","keysExamined":277077,"docsExamined":1618,"cursorExhausted":true,"numYields":277,"nreturned":1,"queryHash":"AB897A3D","planCacheKey":"FA757DED","reslen":256,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":279}},"Global":{"acquireCount":{"r":279}},"Database":{"acquireCount":{"r":279}},"Collection":{"acquireCount":{"r":279}},"Mutex":{"acquireCount": {"r":2}}},"storage":{"data":{"bytesRead":17324253,"timeReadingMicros":22990}},"protocol":"op_msg","durationMillis":511}} ,"s":"I", "c":"NETWORK", "id":4333213, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"RSM Topology Change","attr":{"replicaSet":"testing","newTopologyDescription":"{ id: \"7714ed9e-5260-4338-87d8-c2f62995024e\", topologyType: \"ReplicaSetWithPrimary\", servers: { mongo44:27017: { address: \"mongo44:27017\", topologyVersion: { processId: ObjectId('5fde58d937bb66a7a0a75c61'), counter: 5 }, roundTripTime: 841, lastWriteDate: new Date(1609104803000), opTime: { ts: Timestamp(1609104803, 50), t: 181 }, type: \"RSSecondary\", minWireVersion: 9, maxWireVersion: 9, me: \"mongo44:27017\", setName: \"testing\", setVersion: 86588, primary: \"mongo46:27017\", lastUpdateTime: new Date(1609104812544), logicalSessionTimeoutMinutes: 30, hosts: { 0: \"mongo44:27017\", 1: \"mongo45:27017\", 2: \"mongo46:27017\", 3: \"mongo55:27017\" }, arbiters: {}, passives: {} }, mongo45:27017: { address: \"mongo45:27017\", topologyVersion: { processId: ObjectId('5fde5a21f8d2dbb73e37809d'), counter: 8 }, roundTripTime: 481, lastWriteDate: new Date(1609104803000), opTime: { ts: Timestamp(1609104803, 50), t: 181 }, type: \"RSSecondary\", minWireVersion: 9, maxWireVersion: 9, me: \"mongo45:27017\", setName: \"testing\", setVersion: 86588, primary: \"mongo46:27017\", lastUpdateTime: new Date(1609104809561), logicalSessionTimeoutMinutes: 30, hosts: { 0: \"mongo44:27017\", 1: \"mongo45:27017\", 2: \"mongo46:27017\", 3: \"mongo55:27017\" }, arbiters: {}, passives: {} }, mongo46:27017: { address: \"mongo46:27017\", topologyVersion: { processId: ObjectId('5fde5af01fe8658300d3f47f'), counter: 7 }, roundTripTime: 490, lastWriteDate: new Date(1609104799000), opTime: { ts: Timestamp(1609104799, 353), t: 181 }, type: \"RSPrimary\", minWireVersion: 9, maxWireVersion: 9, me: \"mongo46:27017\", setName: \"testing\", setVersion: 86588, electionId: ObjectId('7fffffff00000000000000b5'), primary: \"mongo46:27017\", lastUpdateTime: new Date(1609104799952), logicalSessionTimeoutMinutes: 30, hosts: { 0: \"mongo44:27017\", 1: \"mongo45:27017\", 2: \"mongo46:27017\", 3: \"mongo55:27017\" }, arbiters: {}, passives: {} }, mongo55:27017: { address: \"mongo55:27017\", topologyVersion: { processId: ObjectId('5fde5985b762cb0d7b23e740'), counter: 6 }, roundTripTime: 353, lastWriteDate: new Date(1609104803000), opTime: { ts: Timestamp(1609104803, 50), t: 181 }, type: \"RSSecondary\", minWireVersion: 9, maxWireVersion: 9, me: \"mongo55:27017\", setName: \"testing\", setVersion: 86588, electionId: ObjectId('7fffffff00000000000000b6'), primary: \"mongo55:27017\", lastUpdateTime: new Date(1609104816513), logicalSessionTimeoutMinutes: 30, hosts: { 0: \"mongo44:27017\", 1: \"mongo45:27017\", 2: \"mongo46:27017\", 3: \"mongo55:27017\" }, arbiters: {}, passives: {} } }, logicalSessionTimeoutMinutes: 30, setName: \"testing\", compatible: true, maxSetVersion: 86588, maxElectionId: ObjectId('7fffffff00000000000000b5') }","previousTopologyDescription":"{ id: \"7714ed9e-5260-4338-87d8-c2f62995024e\", topologyType: \"ReplicaSetWithPrimary\", servers: { mongo44:27017: { address: \"mongo44:27017\", topologyVersion: { processId: ObjectId('5fde58d937bb66a7a0a75c61'), counter: 5 }, roundTripTime: 841, lastWriteDate: new Date(1609104803000), opTime: { ts: Timestamp(1609104803, 50), t: 181 }, type: \"RSSecondary\", minWireVersion: 9, maxWireVersion: 9, me: \"mongo44:27017\", setName: \"testing\", setVersion: 86588, primary: \"mongo46:27017\", lastUpdateTime: new Date(1609104812544), logicalSessionTimeoutMinutes: 30, hosts: { 0: \"mongo44:27017\", 1: \"mongo45:27017\", 2: \"mongo46:27017\", 3: \"mongo55:27017\" }, arbiters: {}, passives: {} }, mongo45:27017: { address: \"mongo45:27017\", topologyVersion: { processId: ObjectId('5fde5a21f8d2dbb73e37809d'), counter: 8 }, roundTripTime: 481, lastWriteDate: new Date(1609104803000), opTime: { ts: Timestamp(1609104803, 50), t: 181 }, type: \"RSSecondary\", minWireVersion: 9, maxWireVersion: 9, me: \"mongo45:27017\", setName: \"testing\", setVersion: 86588, primary: \"mongo46:27017\", lastUpdateTime: new Date(1609104809561), logicalSessionTimeoutMinutes: 30, hosts: { 0: \"mongo44:27017\", 1: \"mongo45:27017\", 2: \"mongo46:27017\", 3: \"mongo55:27017\" }, arbiters: {}, passives: {} }, mongo46:27017: { address: \"mongo46:27017\", topologyVersion: { processId: ObjectId('5fde5af01fe8658300d3f47f'), counter: 7 }, roundTripTime: 490, lastWriteDate: new Date(1609104799000), opTime: { ts: Timestamp(1609104799, 353), t: 181 }, type: \"RSPrimary\", minWireVersion: 9, maxWireVersion: 9, me: \"mongo46:27017\", setName: \"testing\", setVersion: 86588, electionId: ObjectId('7fffffff00000000000000b5'), primary: \"mongo46:27017\", lastUpdateTime: new Date(1609104799952), logicalSessionTimeoutMinutes: 30, hosts: { 0: \"mongo44:27017\", 1: \"mongo45:27017\", 2: \"mongo46:27017\", 3: \"mongo55:27017\" }, arbiters: {}, passives: {} }, mongo55:27017: { address: \"mongo55:27017\", topologyVersion: { processId: ObjectId('5fde5985b762cb0d7b23e740'), counter: 5 }, roundTripTime: 353, lastWriteDate: new Date(1609104803000), opTime: { ts: Timestamp(1609104803, 50), t: 181 }, type: \"RSSecondary\", minWireVersion: 9, maxWireVersion: 9, me: \"mongo55:27017\", setName: \"testing\", setVersion: 86588, primary: \"mongo46:27017\", lastUpdateTime: new Date(1609104809783), logicalSessionTimeoutMinutes: 30, hosts: { 0: \"mongo44:27017\", 1: \"mongo45:27017\", 2: \"mongo46:27017\", 3: \"mongo55:27017\" }, arbiters: {}, passives: {} } }, logicalSessionTimeoutMinutes: 30, setName: \"testing\", compatible: true, maxSetVersion: 86588, maxElectionId: ObjectId('7fffffff00000000000000b5') }"}} ,"s":"I", "c":"CONTROL", "id":31431, "ctx":"TopologyVersionObserver","msg":"BACKTRACE: {bt}","attr":{"bt":{"backtrace":[ {"a":"56047F7DA2C1","b":"56047CAF6000","o":"2CE42C1","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.606","s+":"1E1"}, {"a":"56047F7DB8F9","b":"56047CAF6000","o":"2CE58F9","s":"_ZN5mongo15printStackTraceEv","s+":"29"}, {"a":"56047F7D9156","b":"56047CAF6000","o":"2CE3156","s":"_ZN5mongo12_GLOBAL__N_116abruptQuitActionEiP9siginfo_tPv","s+":"66"}, {"a":"7F65EDB41630","b":"7F65EDB32000","o":"F630","s":"_L_unlock_13","s+":"34"}, {"a":"7F65ED79A387","b":"7F65ED764000","o":"36387","s":"gsignal","s+":"37"}, {"a":"7F65ED79BA78","b":"7F65ED764000","o":"37A78","s":"abort","s+":"148"}, {"a":"56047DA0B0E9","b":"56047CAF6000","o":"F150E9","s":"_ZN5mongo15invariantFailedEPKcS1_j","s+":"12C"}, {"a":"56047D9DE7FA","b":"56047CAF6000","o":"EE87FA","s":"_ZN5mongo6Client19setOperationContextEPNS_16OperationContextE.cold.135","s+":"18"}, {"a":"56047F69B819","b":"56047CAF6000","o":"2BA5819","s":"_ZN5mongo14ServiceContext20makeOperationContextEPNS_6ClientE","s+":"129"}, {"a":"56047F6902C7","b":"56047CAF6000","o":"2B9A2C7","s":"_ZN5mongo6Client20makeOperationContextEv","s+":"27"}, {"a":"56047DF113A1","b":"56047CAF6000","o":"141B3A1","s":"_ZN5mongo4repl23TopologyVersionObserver17_workerThreadBodyEv","s+":"251"}, {"a":"56047DF11A4C","b":"56047CAF6000","o":"141BA4C","s":"_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN5mongo4stdx6threadC4IZNS3_4repl23TopologyVersionObserver4initEPNS3_14ServiceContextEPNS7_22ReplicationCoordinatorEEUlvE0_JELi0EEET_DpOT0_EUlvE_EEEEE6_M_runEv","s+":"5C"}, {"a":"56047F97FFAF","b":"56047CAF6000","o":"2E89FAF","s":"execute_native_thread_routine","s+":"F"}, {"a":"7F65EDB39EA5","b":"7F65EDB32000","o":"7EA5","s":"start_thread","s+":"C5"}, {"a":"7F65ED86296D","b":"7F65ED764000","o":"FE96D","s":"clone","s+":"6D"}],"processInfo":{"mongodbVersion":"4.4.2","gitVersion":"15e73dc5738d2278b688f8929aee605fe4279b0e","compiledModules":[],"uname": {"sysname":"Linux","release":"3.10.0-1160.6.1.el7.x86_64","version":"#1 SMP Tue Nov 17 13:59:11 UTC 2020","machine":"x86_64"},"somap":[ {"b":"56047CAF6000","elfType":3,"buildId":"80A90728DC9FDCCF9246FBEBAEB8F415D7FA8827"}, {"b":"7F65EDB32000","path":"/lib64/libpthread.so.0","elfType":3,"buildId":"2B482B3BAE79DEF4E5BC9791BC6BBDAE0E93E359"}, {"b":"7F65ED764000","path":"/lib64/libc.so.6","elfType":3,"buildId":"F9FAFDE281E0E0E2AF45911AD0FA115B64C2CEA8"}]}}}} ,"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame}","attr":{"frame":{"a":"56047F7DA2C1","b":"56047CAF6000","o":"2CE42C1","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.606","s+":"1E1"}}}{"t":{"$date":"2020-12-28T00:33:36.820+03:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame} ","attr":{"frame": {"a":"56047F7DB8F9","b":"56047CAF6000","o":"2CE58F9","s":"_ZN5mongo15printStackTraceEv","s+":"29"}}} ,"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame}","attr":{"frame":{"a":"56047F7D9156","b":"56047CAF6000","o":"2CE3156","s":"_ZN5mongo12_GLOBAL__N_116abruptQuitActionEiP9siginfo_tPv","s+":"66"}}}{"t":{"$date":"2020-12-28T00:33:36.820+03:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame} ","attr":{"frame": {"a":"7F65EDB41630","b":"7F65EDB32000","o":"F630","s":"_L_unlock_13","s+":"34"}}} ,"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame}","attr":{"frame":{"a":"7F65ED79A387","b":"7F65ED764000","o":"36387","s":"gsignal","s+":"37"}}}{"t":{"$date":"2020-12-28T00:33:36.820+03:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame} ","attr":{"frame": {"a":"7F65ED79BA78","b":"7F65ED764000","o":"37A78","s":"abort","s+":"148"}}} ,"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame}","attr":{"frame":{"a":"56047DA0B0E9","b":"56047CAF6000","o":"F150E9","s":"_ZN5mongo15invariantFailedEPKcS1_j","s+":"12C"}}}{"t":{"$date":"2020-12-28T00:33:36.820+03:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame} ","attr":{"frame": {"a":"56047D9DE7FA","b":"56047CAF6000","o":"EE87FA","s":"_ZN5mongo6Client19setOperationContextEPNS_16OperationContextE.cold.135","s+":"18"}}} ,"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame}","attr":{"frame":{"a":"56047F69B819","b":"56047CAF6000","o":"2BA5819","s":"_ZN5mongo14ServiceContext20makeOperationContextEPNS_6ClientE","s+":"129"}}}{"t":{"$date":"2020-12-28T00:33:36.820+03:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame} ","attr":{"frame": {"a":"56047F6902C7","b":"56047CAF6000","o":"2B9A2C7","s":"_ZN5mongo6Client20makeOperationContextEv","s+":"27"}}} ,"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame}","attr":{"frame":{"a":"56047DF113A1","b":"56047CAF6000","o":"141B3A1","s":"_ZN5mongo4repl23TopologyVersionObserver17_workerThreadBodyEv","s+":"251"}}}{"t":{"$date":"2020-12-28T00:33:36.820+03:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame} ","attr":{"frame": {"a":"56047DF11A4C","b":"56047CAF6000","o":"141BA4C","s":"_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN5mongo4stdx6threadC4IZNS3_4repl23TopologyVersionObserver4initEPNS3_14ServiceContextEPNS7_22ReplicationCoordinatorEEUlvE0_JELi0EEET_DpOT0_EUlvE_EEEEE6_M_runEv","s+":"5C"}}} ,"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame}","attr":{"frame":{"a":"56047F97FFAF","b":"56047CAF6000","o":"2E89FAF","s":"execute_native_thread_routine","s+":"F"}}}{"t":{"$date":"2020-12-28T00:33:36.820+03:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame} ","attr":{"frame": {"a":"7F65EDB39EA5","b":"7F65EDB32000","o":"7EA5","s":"start_thread","s+":"C5"}}} ,"s":"I", "c":"CONTROL", "id":31427, "ctx":"TopologyVersionObserver","msg":" Frame: {frame}","attr":{"frame": {"a":"7F65ED86296D","b":"7F65ED764000","o":"FE96D","s":"clone","s+":"6D"}}}
|
| Comments |
| Comment by Dmitry Agranat [ 27/Jan/21 ] |
|
Hi kkrasnov@gmail.com, as this issue is now being investigated in |
| Comment by Dmitry Agranat [ 20/Jan/21 ] |
|
Hi kkrasnov@gmail.com, as an update, this issue is now being investigated in Thanks, |
| Comment by Dmitry Agranat [ 19/Jan/21 ] |
|
Hi pbailey@boomtownroi.com, thank you for opening a separate support case, we are still investigating this issue. |
| Comment by Philip Bailey [ 19/Jan/21 ] |
|
@dmitry.agranat I have created a MongoDB support case with all the relevant logs and diagnostic.data, case. Case 00736988. This bug is rough, because it wrecks the whole replica set when it shows up, we can't reliably do work to administer our cluster during normal business hours for fear of causing a major outage. |
| Comment by Philip Bailey [ 14/Jan/21 ] |
|
We just experienced this same issue. invariant failure after performing an rs.stepDown() on the primary for regular maintenance. Same version of mongod, 4.4.2. I did get the diagnostic.data and have relevant mongodb log snippets from both primary and secondary, both of which mongod crashed after the stepdown command was issues. I would prefer not to post that information in the customer portal, but I can upload these documents via support case or similar to support this ticket. |
| Comment by Konstantin Krasnov [ 07/Jan/21 ] |
|
Hi Dmitry, the core dump doesn't exist. I'll try to save it next time. |
| Comment by Dmitry Agranat [ 06/Jan/21 ] |
|
kkrasnov@gmail.com, do you happen to have a core dump from the reported event? If not, would it be possible to save one if this failure reoccurs? |
| Comment by Dmitry Agranat [ 04/Jan/21 ] |
|
Hi kkrasnov@gmail.com, we can try to figure out what's going on based on the provided backtrace. We will keep you updated on our progress. If this happens again, please save aside mongod logs and diagnostic.data and we'll figure out a way about sensitive data. |
| Comment by Konstantin Krasnov [ 31/Dec/20 ] |
|
Hi Dmitry, The diagnostic.data is already overwritten. Is there a way to remove sensitive data (data from documents) from logs?
Best regards, Konstantin |
| Comment by Dmitry Agranat [ 31/Dec/20 ] |
|
Hi kkrasnov@gmail.com, thank you for the report. In order to investigate this issue, we will need full mongod logs and diagnostic.data from all members of this replica set. You can upload this data into this secure uploader Thanks, |