[SERVER-52735] mongodb crash with "Invariant failure" opCtx != nullptr && _opCtx == nullptr Created: 10/Nov/20  Updated: 06/Dec/22  Resolved: 24/Mar/21

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 4.4.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Andrew Wason Assignee: Backlog - Service Architecture
Resolution: Duplicate Votes: 10
Labels: sa-groomed
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screenshot from 2021-02-16 14-59-15.png    
Issue Links:
Duplicate
duplicates SERVER-53537 MongoDB crashed after state change Closed
duplicates SERVER-53566 Investigate and reproduce "opCtx != n... Closed
duplicates SERVER-53482 Invariant failure: opCtx != nullptr &... Closed
is duplicated by SERVER-53747 replica set down after setFeatureComp... Closed
Related
is related to SERVER-53537 MongoDB crashed after state change Closed
is related to SERVER-53566 Investigate and reproduce "opCtx != n... Closed
is related to SERVER-53482 Invariant failure: opCtx != nullptr &... Closed
is related to SERVER-53747 replica set down after setFeatureComp... Closed
is related to SERVER-54445 Compact in Secondary ReplicaSet 4.4.0... Closed
Assigned Teams:
Service Arch
Operating System: ALL
Sprint: Security 2020-11-30
Participants:
Case:

 Description   

Running MongoDB 4.4.0 on Ubuntu 20.04.1, one of our replicas crashed with an invariant failure. How do we diagnose this?

 

{"t":{"$date":"2020-11-10T14:07:13.371+00:00"},"s":"F",  "c":"-",        "id":23079,   "ctx":"monitoring-keys-for-HMAC","msg":"Invariant failure","attr":{"expr":"opCtx != nullptr && _opCtx == nullptr","file":"src/mongo/db/client.cpp","line":126}}
{"t":{"$date":"2020-11-10T14:07:13.371+00:00"},"s":"F",  "c":"-",        "id":23080,   "ctx":"monitoring-keys-for-HMAC","msg":"\n\n***aborting after invariant() failure\n\n"}
{"t":{"$date":"2020-11-10T14:07:13.373+00:00"},"s":"F",  "c":"CONTROL",  "id":4757800, "ctx":"monitoring-keys-for-HMAC","msg":"Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}}
{"t":{"$date":"2020-11-10T14:07:13.480+00:00"},"s":"I",  "c":"CONTROL",  "id":31431,   "ctx":"monitoring-keys-for-HMAC","msg":"BACKTRACE: {bt}","attr":{"bt":{"backtrace":[{"a":"556342B47811","b":"55633FE8C000","o":"2CBB811","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.606","s+":"1E1"},{"a":"556342B48EB9","b":"55633FE8C000","o":"2CBCEB9","s":"_ZN5mongo15printStackTraceEv","s+":"29"},{"a":"556342B466A6","b":"55633FE8C000","o":"2CBA6A6","s":"_ZN5mongo12_GLOBAL__N_116abruptQuitActionEiP9siginfo_tPv","s+":"66"},{"a":"7F8DEAD7C3C0","b":"7F8DEAD67000","o":"153C0","s":"funlockfile","s+":"60"},{"a":"7F8DEABBB18B","b":"7F8DEAB75000","o":"4618B","s":"gsignal","s+":"CB"},{"a":"7F8DEAB9A859","b":"7F8DEAB75000","o":"25859","s":"abort","s+":"12B"},{"a":"556340D8FBE1","b":"55633FE8C000","o":"F03BE1","s":"_ZN5mongo15invariantFailedEPKcS1_j","s+":"12C"},{"a":"556340D65270","b":"55633FE8C000","o":"ED9270","s":"_ZN5mongo6Client19setOperationContextEPNS_16OperationContextE.cold.135","s+":"18"},{"a":"556342A15BE9","b":"55633FE8C000","o":"2B89BE9","s":"_ZN5mongo14ServiceContext20makeOperationContextEPNS_6ClientE","s+":"129"},{"a":"556342A0A697","b":"55633FE8C000","o":"2B7E697","s":"_ZN5mongo6Client20makeOperationContextEv","s+":"27"},{"a":"556342809712","b":"55633FE8C000","o":"297D712","s":"_ZN5mongo21KeysCollectionManager14PeriodicRunner18_doPeriodicRefreshEPNS_14ServiceContextENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_8DurationISt5ratioILl1ELl1000EEEE","s+":"162"},{"a":"55634280B6C3","b":"55633FE8C000","o":"297F6C3","s":"_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN5mongo4stdx6threadC4IZNS3_21KeysCollectionManager14PeriodicRunner5startEPNS3_14ServiceContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS3_8DurationISt5ratioILl1ELl1000EEEEEUlvE_JELi0EEET_DpOT0_EUlvE_EEEEE6_M_runEv","s+":"93"},{"a":"556342CF78FF","b":"55633FE8C000","o":"2E6B8FF","s":"execute_native_thread_routine","s+":"F"},{"a":"7F8DEAD70609","b":"7F8DEAD67000","o":"9609","s":"start_thread","s+":"D9"},{"a":"7F8DEAC97293","b":"7F8DEAB75000","o":"122293","s":"clone","s+":"43"}],"processInfo":{"mongodbVersion":"4.4.0","gitVersion":"563487e100c4215e2dce98d0af2a6a5a2d67c5cf","compiledModules":[],"uname":{"sysname":"Linux","release":"5.4.0-1025-aws","version":"#25-Ubuntu SMP Fri Sep 11 09:37:24 UTC 2020","machine":"x86_64"},"somap":[{"b":"55633FE8C000","elfType":3,"buildId":"77B6A138746C90015067F12B963853BD51DAA5A6"},{"b":"7F8DEAD67000","path":"/lib/x86_64-linux-gnu/libpthread.so.0","elfType":3,"buildId":"4FC5FC33F4429136A494C640B113D76F610E4ABC"},{"b":"7F8DEAB75000","path":"/lib/x86_64-linux-gnu/libc.so.6","elfType":3,"buildId":"F3FF3FDA80B817C464A56EED59FF09DC864EAEB0"}]}}}}
{"t":{"$date":"2020-11-10T14:07:13.480+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"monitoring-keys-for-HMAC","msg":"  Frame: {frame}","attr":{"frame":{"a":"556342B47811","b":"55633FE8C000","o":"2CBB811","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.606","s+":"1E1"}}}
{"t":{"$date":"2020-11-10T14:07:13.480+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"monitoring-keys-for-HMAC","msg":"  Frame: {frame}","attr":{"frame":{"a":"556342B48EB9","b":"55633FE8C000","o":"2CBCEB9","s":"_ZN5mongo15printStackTraceEv","s+":"29"}}}
{"t":{"$date":"2020-11-10T14:07:13.480+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"monitoring-keys-for-HMAC","msg":"  Frame: {frame}","attr":{"frame":{"a":"556342B466A6","b":"55633FE8C000","o":"2CBA6A6","s":"_ZN5mongo12_GLOBAL__N_116abruptQuitActionEiP9siginfo_tPv","s+":"66"}}}
{"t":{"$date":"2020-11-10T14:07:13.480+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"monitoring-keys-for-HMAC","msg":"  Frame: {frame}","attr":{"frame":{"a":"7F8DEAD7C3C0","b":"7F8DEAD67000","o":"153C0","s":"funlockfile","s+":"60"}}}
{"t":{"$date":"2020-11-10T14:07:13.480+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"monitoring-keys-for-HMAC","msg":"  Frame: {frame}","attr":{"frame":{"a":"7F8DEABBB18B","b":"7F8DEAB75000","o":"4618B","s":"gsignal","s+":"CB"}}}
{"t":{"$date":"2020-11-10T14:07:13.480+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"monitoring-keys-for-HMAC","msg":"  Frame: {frame}","attr":{"frame":{"a":"7F8DEAB9A859","b":"7F8DEAB75000","o":"25859","s":"abort","s+":"12B"}}}
{"t":{"$date":"2020-11-10T14:07:13.480+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"monitoring-keys-for-HMAC","msg":"  Frame: {frame}","attr":{"frame":{"a":"556340D8FBE1","b":"55633FE8C000","o":"F03BE1","s":"_ZN5mongo15invariantFailedEPKcS1_j","s+":"12C"}}}
{"t":{"$date":"2020-11-10T14:07:13.480+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"monitoring-keys-for-HMAC","msg":"  Frame: {frame}","attr":{"frame":{"a":"556340D65270","b":"55633FE8C000","o":"ED9270","s":"_ZN5mongo6Client19setOperationContextEPNS_16OperationContextE.cold.135","s+":"18"}}}
{"t":{"$date":"2020-11-10T14:07:13.480+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"monitoring-keys-for-HMAC","msg":"  Frame: {frame}","attr":{"frame":{"a":"556342A15BE9","b":"55633FE8C000","o":"2B89BE9","s":"_ZN5mongo14ServiceContext20makeOperationContextEPNS_6ClientE","s+":"129"}}}
{"t":{"$date":"2020-11-10T14:07:13.480+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"monitoring-keys-for-HMAC","msg":"  Frame: {frame}","attr":{"frame":{"a":"556342A0A697","b":"55633FE8C000","o":"2B7E697","s":"_ZN5mongo6Client20makeOperationContextEv","s+":"27"}}}
{"t":{"$date":"2020-11-10T14:07:13.480+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"monitoring-keys-for-HMAC","msg":"  Frame: {frame}","attr":{"frame":{"a":"556342809712","b":"55633FE8C000","o":"297D712","s":"_ZN5mongo21KeysCollectionManager14PeriodicRunner18_doPeriodicRefreshEPNS_14ServiceContextENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_8DurationISt5ratioILl1ELl1000EEEE","s+":"162"}}}
{"t":{"$date":"2020-11-10T14:07:13.480+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"monitoring-keys-for-HMAC","msg":"  Frame: {frame}","attr":{"frame":{"a":"55634280B6C3","b":"55633FE8C000","o":"297F6C3","s":"_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN5mongo4stdx6threadC4IZNS3_21KeysCollectionManager14PeriodicRunner5startEPNS3_14ServiceContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS3_8DurationISt5ratioILl1ELl1000EEEEEUlvE_JELi0EEET_DpOT0_EUlvE_EEEEE6_M_runEv","s+":"93"}}}
{"t":{"$date":"2020-11-10T14:07:13.480+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"monitoring-keys-for-HMAC","msg":"  Frame: {frame}","attr":{"frame":{"a":"556342CF78FF","b":"55633FE8C000","o":"2E6B8FF","s":"execute_native_thread_routine","s+":"F"}}}
{"t":{"$date":"2020-11-10T14:07:13.480+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"monitoring-keys-for-HMAC","msg":"  Frame: {frame}","attr":{"frame":{"a":"7F8DEAD70609","b":"7F8DEAD67000","o":"9609","s":"start_thread","s+":"D9"}}}
{"t":{"$date":"2020-11-10T14:07:13.480+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"monitoring-keys-for-HMAC","msg":"  Frame: {frame}","attr":{"frame":{"a":"7F8DEAC97293","b":"7F8DEAB75000","o":"122293","s":"clone","s+":"43"}}}
{"t":{"$date":"2020-11-10T14:13:25.412+00:00"},"s":"W",  "c":"CONTROL",  "id":20698,   "ctx":"main","msg":"***** SERVER RESTARTED *****","tags":["startupWarnings"]}
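
As a first diagnostic step, the structured log lines above can be reduced to the invariant that failed and the chain of functions leading to it. The following is a minimal sketch, assuming each log entry is one JSON document per line; the function name `summarize_crash` and the log path in the usage comment are hypothetical, and the ids 23079 and 31431 are simply the ones that appear in the log above.

```python
import json


def summarize_crash(lines):
    """Return (invariant_info, symbols) from an iterable of mongod JSON log lines.

    invariant_info is a dict describing the last "Invariant failure" entry seen
    (or None); symbols is the list of mangled symbol names from the last
    BACKTRACE entry seen (or an empty list).
    """
    invariant = None
    symbols = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip lines that are not complete JSON documents
        if not isinstance(entry, dict):
            continue
        attr = entry.get("attr") or {}
        if entry.get("id") == 23079:  # "Invariant failure" in the log above
            invariant = {
                "time": entry.get("t", {}).get("$date"),
                "ctx": entry.get("ctx"),
                "expr": attr.get("expr"),
                "file": attr.get("file"),
                "line": attr.get("line"),
            }
        elif entry.get("id") == 31431:  # "BACKTRACE: {bt}" in the log above
            frames = attr.get("bt", {}).get("backtrace", [])
            symbols = [frame.get("s", "?") for frame in frames]
    return invariant, symbols


# Usage sketch (the path is an assumption; adjust it to your deployment):
# with open("/var/log/mongodb/mongod.log", encoding="utf-8") as fh:
#     info, symbols = summarize_crash(fh)
#     print(info)
#     print("\n".join(symbols))
```

The mangled names can then be passed through a demangler such as `c++filt` to make the call chain readable.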



 Comments   
Comment by Tyler Seip (Inactive) [ 24/Mar/21 ]

Actually, closing as a duplicate of SERVER-53566.

Comment by Tyler Seip (Inactive) [ 24/Mar/21 ]

This should be resolved by SERVER-53566, where it was backported to 4.0, 4.2, and 4.4. Closing as fixed.

Comment by Luca - [ 09/Mar/21 ]

Hi, 

We also encountered the same issue ( "ctx":"monitoring-keys-for-HMAC","msg":"Invariant failure","attr":{"expr":"opCtx != nullptr && _opCtx == nullptr","file":"src/mongo/db/client.cpp","line":126}} ) on the replicas of one of our MongoDB databases, two days apart between replica 1 (crash on March 8, at 23:56:15 UTC) and replica 2 (crash this morning, March 9 at 00:04:25 UTC), with exactly the same log and stack trace as in the description.

mongodb Version: 4.4.3
gitVersion: 913d6b62acfbb344dde1b116f4161360acd8fd13
system: Linux, Debian 4.19.171-2 (2021-01-30) x86_64
special settings: transparent_hugepage disabled, XFS file system, unlimited ulimits

I hope this will help and that a fix / workaround can be found quickly for version 4.4.

Comment by Francesco Rivola [ 08/Mar/21 ]

We experienced the same issue in production again (after we initially reported it in SERVER-53747). This time it happened in 5 of our production replica sets, likely due to some network issues in the Azure data center. The chain of failure is exactly the same as described by Adrien Jarthon in his comment here: https://jira.mongodb.org/browse/SERVER-53566?focusedCommentId=3643746&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-3643746

I have a couple of questions:

  • What is the official recommendation so far? Is it recommended to downgrade to 4.2 (our FCV is still on 4.2) until a fix is provided?
  • The Jira field "Fix Version/s" is set to 5.0 Required. Does this mean that the fix will be included only in 5.0 and above?

Comment by Ratika Gandhi [ 02/Mar/21 ]

We are investigating this in SERVER-53566. Please follow that ticket to track the work.

Comment by Flavio Liroa Jr [ 23/Feb/21 ]

Same problem here. In my case it happened after running a "compact" in one of the secondaries. (SERVER-54445).

When will a solution be available, please?

Comment by Юрий Соколов [ 21/Feb/21 ]

It looks like the logs do not contain enough information: if they included a thread id, it would be clearer who is setting opCtx.
But maybe opCtx was simply not cleared due to https://github.com/mongodb/mongo/blob/c076d1e58878fa2461e0798d97b58d3042e3a33d/src/mongo/db/service_context.cpp#L374-L380 ?
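
To make the hypothesis above concrete, here is a toy model of the logged invariant `opCtx != nullptr && _opCtx == nullptr`. It is only a sketch in Python, not MongoDB's C++ code: the names `ToyClient`, `set_operation_context`, `reset_operation_context` and `InvariantFailure` are invented for illustration. It assumes, based on the logged expression and the `setOperationContext` frame in the backtrace, that a client may hold at most one operation context at a time.

```python
class InvariantFailure(Exception):
    """Raised where the real server would abort on a failed invariant."""


class ToyClient:
    """Toy stand-in for a client that may own at most one operation context."""

    def __init__(self):
        self._op_ctx = None

    def set_operation_context(self, op_ctx):
        # Mirrors the logged expression: opCtx != nullptr && _opCtx == nullptr
        if not (op_ctx is not None and self._op_ctx is None):
            raise InvariantFailure("opCtx != nullptr && _opCtx == nullptr")
        self._op_ctx = op_ctx

    def reset_operation_context(self):
        # Clearing the previous context is what allows the next one to be set.
        self._op_ctx = None


client = ToyClient()
client.set_operation_context(object())
client.reset_operation_context()
client.set_operation_context(object())  # fine: the previous context was cleared

try:
    # The previous context was never cleared, so the invariant fails.
    client.set_operation_context(object())
except InvariantFailure as exc:
    print("invariant failed:", exc)
```

In this model the failure appears only when a new context is created while the old one is still attached, which is the situation the comment above suspects.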

Comment by Юрий Соколов [ 21/Feb/21 ]

Same here. In a cluster of 8 shards, every shard failed within a short time of the others. Some shards had 2 replicas fail, some had 3. Some of them failed in `"ctx":"monitoring-keys-for-HMAC"`, some in `"ctx":"TopologyVersionObserver"`.
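
When many nodes fail like this, it can help to tally which thread context hit the invariant on each node. The following is a minimal sketch, assuming one JSON log document per line; the function name `invariant_failures_by_ctx` is hypothetical, and the match on severity `"F"` with message `"Invariant failure"` is taken from the log entries quoted in this ticket.

```python
import json
from collections import Counter


def invariant_failures_by_ctx(lines):
    """Count fatal "Invariant failure" log entries per "ctx" value."""
    counts = Counter()
    for raw in lines:
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore blank, truncated or non-JSON lines
        if not isinstance(entry, dict):
            continue
        if entry.get("s") == "F" and entry.get("msg") == "Invariant failure":
            counts[entry.get("ctx", "?")] += 1
    return counts
```

Feeding it the concatenated logs of all affected nodes gives a per-context count, for example how many crashes came from `monitoring-keys-for-HMAC` versus `TopologyVersionObserver`.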

Comment by Rob Gillan [ 18/Feb/21 ]

We've just had the same crash, 3 shard cluster, each triple replicaset in different geographic region on AWS, Ubuntu 16.04, Mongo 4.4.1, Feature Set 4.4

{"t":{"$date":"2021-02-18T04:01:01.722+00:00"},"s":"I", "c":"NETWORK", "id":22943, "ctx":"listener","msg":"Connection accepted","attr":{"remote":"10.101.0.41:14444","connectionId":586461,"connectionCount":363}}
{"t":{"$date":"2021-02-18T04:01:01.723+00:00"},"s":"I", "c":"NETWORK", "id":51800, "ctx":"conn586461","msg":"client metadata","attr":{"remote":"10.101.0.41:14444","client":"conn586461","doc":{"driver":{"name":"NetworkInterfaceTL","version":"4.4.1"},"os":{"type":"Linux","name":"Ubuntu","architecture":"x86_64","version":"16.04"}}}}
{"t":{"$date":"2021-02-18T04:01:01.811+00:00"},"s":"F", "c":"-", "id":23079, "ctx":"monitoring-keys-for-HMAC","msg":"Invariant failure","attr":{"expr":"opCtx != nullptr && _opCtx == nullptr","file":"src/mongo/db/client.cpp","line":126}}
{"t":{"$date":"2021-02-18T04:01:01.811+00:00"},"s":"F", "c":"-", "id":23080, "ctx":"monitoring-keys-for-HMAC","msg":"\n\n***aborting after invariant() failure\n\n"}
{"t":{"$date":"2021-02-18T04:01:01.811+00:00"},"s":"F", "c":"CONTROL", "id":4757800, "ctx":"monitoring-keys-for-HMAC","msg":"Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}}
{"t":{"$date":"2021-02-18T04:01:01.899+00:00"},"s":"I", "c":"CONTROL", "id":31431, "ctx":"monitoring-keys-for-HMAC","msg":"BACKTRACE: {bt}","attr":{"bt":{"backtrace":[{"a":"561EC0094501","b":"561EBD3A5000","o":"2CEF501","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.606","s+":"1E1"},{"a":"561EC0095B39","b":"561EBD3A5000","o":"2CF0B39","s":"_ZN5mongo15printStackTraceEv","s+":"29"},{"a":"561EC0093396","b":"561EBD3A5000","o":"2CEE396","s":"_ZN5mongo12_GLOBAL__N_116abruptQuitActionEiP9siginfo_tPv","s+":"66"},{"a":"7F3C86F20390","b":"7F3C86F0F000","o":"11390","s":"funlockfile","s+":"50"},{"a":"7F3C86B7A438","b":"7F3C86B45000","o":"35438","s":"gsignal","s+":"38"},{"a":"7F3C86B7C03A","b":"7F3C86B45000","o":"3703A","s":"abort","s+":"16A"},{"a":"561EBE2BBB81","b":"561EBD3A5000","o":"F16B81","s":"_ZN5mongo15invariantFailedEPKcS1_j","s+":"12C"},{"a":"561EBE28F138","b":"561EBD3A5000","o":"EEA138","s":"_ZN5mongo6Client19setOperationContextEPNS_16OperationContextE.cold.135","s+":"18"},{"a":"561EBFF55AF9","b":"561EBD3A5000","o":"2BB0AF9","s":"_ZN5mongo14ServiceContext20makeOperationContextEPNS_6ClientE","s+":"129"},{"a":"561EBFF4A477","b":"561EBD3A5000","o":"2BA5477","s":"_ZN5mongo6Client20makeOperationContextEv","s+":"27"},{"a":"561EBFD530F2","b":"561EBD3A5000","o":"29AE0F2","s":"_ZN5mongo21KeysCollectionManager14PeriodicRunner18_doPeriodicRefreshEPNS_14ServiceContextENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_8DurationISt5ratioILl1ELl1000EEEE","s+":"162"},{"a":"561EBFD55053","b":"561EBD3A5000","o":"29B0053","s":"_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN5mongo4stdx6threadC4IZNS3_21KeysCollectionManager14PeriodicRunner5startEPNS3_14ServiceContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS3_8DurationISt5ratioILl1ELl1000EEEEEUlvE_JELi0EEET_DpOT0_EUlvE_EEEEE6_M_runEv","s+":"93"},{"a":"561EC023A06F","b":"561EBD3A5000","o":"2E9506F","s":"execute_native_thread_routine","s+":"F"},{"a":"7F3C86F166BA","b":"7F3C86F0F000","o":"76BA","s":"start_thread","s+":"CA"},{"a":"7F3C86C4C4DD","b":"7F3C86B45000","o":"1074DD","s":"clone","s+":"6D"}],"processInfo":{"mongodbVersion":"4.4.3","gitVersion":"913d6b62acfbb344dde1b116f4161360acd8fd13","compiledModules":[],"uname":{"sysname":"Linux","release":"4.4.0-1119-aws","version":"#133-Ubuntu SMP Tue Dec 1 19:04:22 UTC 2020","machine":"x86_64"},"somap":[{"b":"561EBD3A5000","elfType":3,"buildId":"A10486C83C8EC5FAF5B7CA522D222F982F51F0C6"},{"b":"7F3C86F0F000","path":"/lib/x86_64-linux-gnu/libpthread.so.0","elfType":3,"buildId":"3DB0B0EE6244F5B89AB1D535F91B17D162CC1701"},{"b":"7F3C86B45000","path":"/lib/x86_64-linux-gnu/libc.so.6","elfType":3,"buildId":"C4FD86EC1EED57A09C79CE601F6C6E3796F574DF"}]}}}}
{"t":{"$date":"2021-02-18T04:01:01.899+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"561EC0094501","b":"561EBD3A5000","o":"2CEF501","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.606","s+":"1E1"}}}
{"t":{"$date":"2021-02-18T04:01:01.899+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"561EC0095B39","b":"561EBD3A5000","o":"2CF0B39","s":"_ZN5mongo15printStackTraceEv","s+":"29"}}}
{"t":{"$date":"2021-02-18T04:01:01.899+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"561EC0093396","b":"561EBD3A5000","o":"2CEE396","s":"_ZN5mongo12_GLOBAL__N_116abruptQuitActionEiP9siginfo_tPv","s+":"66"}}}
{"t":{"$date":"2021-02-18T04:01:01.899+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"7F3C86F20390","b":"7F3C86F0F000","o":"11390","s":"funlockfile","s+":"50"}}}
{"t":{"$date":"2021-02-18T04:01:01.899+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"7F3C86B7A438","b":"7F3C86B45000","o":"35438","s":"gsignal","s+":"38"}}}
{"t":{"$date":"2021-02-18T04:01:01.899+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"7F3C86B7C03A","b":"7F3C86B45000","o":"3703A","s":"abort","s+":"16A"}}}
{"t":{"$date":"2021-02-18T04:01:01.899+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"561EBE2BBB81","b":"561EBD3A5000","o":"F16B81","s":"_ZN5mongo15invariantFailedEPKcS1_j","s+":"12C"}}}
{"t":{"$date":"2021-02-18T04:01:01.899+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"561EBE28F138","b":"561EBD3A5000","o":"EEA138","s":"_ZN5mongo6Client19setOperationContextEPNS_16OperationContextE.cold.135","s+":"18"}}}
{"t":{"$date":"2021-02-18T04:01:01.899+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"561EBFF55AF9","b":"561EBD3A5000","o":"2BB0AF9","s":"_ZN5mongo14ServiceContext20makeOperationContextEPNS_6ClientE","s+":"129"}}}
{"t":{"$date":"2021-02-18T04:01:01.899+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"561EBFF4A477","b":"561EBD3A5000","o":"2BA5477","s":"_ZN5mongo6Client20makeOperationContextEv","s+":"27"}}}
{"t":{"$date":"2021-02-18T04:01:01.899+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"561EBFD530F2","b":"561EBD3A5000","o":"29AE0F2","s":"_ZN5mongo21KeysCollectionManager14PeriodicRunner18_doPeriodicRefreshEPNS_14ServiceContextENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_8DurationISt5ratioILl1ELl1000EEEE","s+":"162"}}}
{"t":{"$date":"2021-02-18T04:01:01.899+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"561EBFD55053","b":"561EBD3A5000","o":"29B0053","s":"_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN5mongo4stdx6threadC4IZNS3_21KeysCollectionManager14PeriodicRunner5startEPNS3_14ServiceContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS3_8DurationISt5ratioILl1ELl1000EEEEEUlvE_JELi0EEET_DpOT0_EUlvE_EEEEE6_M_runEv","s+":"93"}}}
{"t":{"$date":"2021-02-18T04:01:01.899+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"561EC023A06F","b":"561EBD3A5000","o":"2E9506F","s":"execute_native_thread_routine","s+":"F"}}}
{"t":{"$date":"2021-02-18T04:01:01.899+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"7F3C86F166BA","b":"7F3C86F0F000","o":"76BA","s":"start_thread","s+":"CA"}}}
{"t":{"$date":"2021-02-18T04:01:01.899+00:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"7F3C86C4C4DD","b":"7F3C86B45000","o":"1074DD","s":"clone","s+":"6D"}}}

Comment by Mirek Sosna [ 16/Feb/21 ]

I have this issue too. I manage a replica set with 4 servers and an arbiter; 3 days ago one slave crashed with:

{"t":{"$date":"2021-02-13T22:10:45.038+01:00"},"s":"I", "c":"-", "id":4333222, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"RSM received failed isMaster","attr":{"host":"172.16.3.114:27017","error":"NetworkInterfaceExceededTimeLimit: Couldn't get a connection within the time limit of 1000ms","replicaSet":"packets-prod","isMasterReply":"{}"}}
{"t":{"$date":"2021-02-13T22:10:45.038+01:00"},"s":"I", "c":"NETWORK", "id":4712102, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"Host failed in replica set","attr":{"replicaSet":"packets-prod","host":"172.16.3.114:27017","error":{"code":202,"codeName":"NetworkInterfaceExceededTimeLimit","errmsg":"Couldn't get a connection within the time limit of 1000ms"},"action":{"dropConnections":false,"requestImmediateCheck":false,"outcome":{"host":"172.16.3.114:27017","success":false,"errorMessage":"NetworkInterfaceExceededTimeLimit: Couldn't get a connection within the time limit of 1000ms"}}}}
{"t":{"$date":"2021-02-13T22:10:48.933+01:00"},"s":"F", "c":"-", "id":23079, "ctx":"monitoring-keys-for-HMAC","msg":"Invariant failure","attr":{"expr":"opCtx != nullptr && _opCtx == nullptr","file":"src/mongo/db/client.cpp","line":126}}
{"t":{"$date":"2021-02-13T22:10:48.933+01:00"},"s":"F", "c":"-", "id":23080, "ctx":"monitoring-keys-for-HMAC","msg":"\n\n***aborting after invariant() failure\n\n"}
{"t":{"$date":"2021-02-13T22:10:48.933+01:00"},"s":"F", "c":"CONTROL", "id":4757800, "ctx":"monitoring-keys-for-HMAC","msg":"Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}}
{"t":{"$date":"2021-02-13T22:10:49.160+01:00"},"s":"I", "c":"CONTROL", "id":31431, "ctx":"monitoring-keys-for-HMAC","msg":"BACKTRACE: {bt}","attr":{"bt":{"backtrace":[{"a":"55957A7DFC41","b":"559577AF1000","o":"2CEEC41","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.606","s+":"1E1"},{"a":"55957A7E1279","b":"559577AF1000","o":"2CF0279","s":"_ZN5mongo15printStackTraceEv","s+":"29"},{"a":"55957A7DEAD6","b":"559577AF1000","o":"2CEDAD6","s":"_ZN5mongo12_GLOBAL__N_116abruptQuitActionEiP9siginfo_tPv","s+":"66"},{"a":"7FF1A3156730","b":"7FF1A3144000","o":"12730","s":"funlockfile","s+":"50"},{"a":"7FF1A2FBA7BB","b":"7FF1A2F83000","o":"377BB","s":"gsignal","s+":"10B"},{"a":"7FF1A2FA5535","b":"7FF1A2F83000","o":"22535","s":"abort","s+":"121"},{"a":"559578A07BF3","b":"559577AF1000","o":"F16BF3","s":"_ZN5mongo15invariantFailedEPKcS1_j","s+":"12C"},{"a":"5595789DB1AA","b":"559577AF1000","o":"EEA1AA","s":"_ZN5mongo6Client19setOperationContextEPNS_16OperationContextE.cold.135","s+":"18"},{"a":"55957A6A1309","b":"559577AF1000","o":"2BB0309","s":"_ZN5mongo14ServiceContext20makeOperationContextEPNS_6ClientE","s+":"129"},{"a":"55957A695C87","b":"559577AF1000","o":"2BA4C87","s":"_ZN5mongo6Client20makeOperationContextEv","s+":"27"},{"a":"55957A49E932","b":"559577AF1000","o":"29AD932","s":"_ZN5mongo21KeysCollectionManager14PeriodicRunner18_doPeriodicRefreshEPNS_14ServiceContextENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_8DurationISt5ratioILl1ELl1000EEEE","s+":"162"},{"a":"55957A4A0893","b":"559577AF1000","o":"29AF893","s":"_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN5mongo4stdx6threadC4IZNS3_21KeysCollectionManager14PeriodicRunner5startEPNS3_14ServiceContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS3_8DurationISt5ratioILl1ELl1000EEEEEUlvE_JELi0EEET_DpOT0_EUlvE_EEEEE6_M_runEv","s+":"93"},{"a":"55957A9857BF","b":"559577AF1000","o":"2E947BF","s":"execute_native_thread_routine","s+":"F"},{"a":"7FF1A314BFA3","b":"7FF1A3144000","o":"7FA3","s":"start_thread","s+":"F3"},{"a":"7FF1A307C4CF","b":"7FF1A2F83000","o":"F94CF","s":"clone","s+":"3F"}],"processInfo":{"mongodbVersion":"4.4.3","gitVersion":"913d6b62acfbb344dde1b116f4161360acd8fd13","compiledModules":[],"uname":{"sysname":"Linux","release":"4.19.0-10-amd64","version":"#1 SMP Debian 4.19.132-1 (2020-07-24)","machine":"x86_64"},"somap":[{"b":"559577AF1000","elfType":3,"buildId":"6C8A93F8D2B544901FC58C1CCD203AEA182627B5"},{"b":"7FF1A3144000","path":"/lib/x86_64-linux-gnu/libpthread.so.0","elfType":3,"buildId":"E91114987A0147BD050ADDBD591EB8994B29F4B3"},{"b":"7FF1A2F83000","path":"/lib/x86_64-linux-gnu/libc.so.6","elfType":3,"buildId":"18B9A9A8C523E5CFE5B5D946D605D09242F09798"}]}}}}
{"t":{"$date":"2021-02-13T22:10:49.161+01:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"55957A7DFC41","b":"559577AF1000","o":"2CEEC41","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.606","s+":"1E1"}}}
{"t":{"$date":"2021-02-13T22:10:49.161+01:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"55957A7E1279","b":"559577AF1000","o":"2CF0279","s":"_ZN5mongo15printStackTraceEv","s+":"29"}}}
{"t":{"$date":"2021-02-13T22:10:49.161+01:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"55957A7DEAD6","b":"559577AF1000","o":"2CEDAD6","s":"_ZN5mongo12_GLOBAL__N_116abruptQuitActionEiP9siginfo_tPv","s+":"66"}}}
{"t":{"$date":"2021-02-13T22:10:49.161+01:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"7FF1A3156730","b":"7FF1A3144000","o":"12730","s":"funlockfile","s+":"50"}}}
{"t":{"$date":"2021-02-13T22:10:49.161+01:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"7FF1A2FBA7BB","b":"7FF1A2F83000","o":"377BB","s":"gsignal","s+":"10B"}}}
{"t":{"$date":"2021-02-13T22:10:49.161+01:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"7FF1A2FA5535","b":"7FF1A2F83000","o":"22535","s":"abort","s+":"121"}}}
{"t":{"$date":"2021-02-13T22:10:49.161+01:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"559578A07BF3","b":"559577AF1000","o":"F16BF3","s":"_ZN5mongo15invariantFailedEPKcS1_j","s+":"12C"}}}
{"t":{"$date":"2021-02-13T22:10:49.161+01:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"5595789DB1AA","b":"559577AF1000","o":"EEA1AA","s":"_ZN5mongo6Client19setOperationContextEPNS_16OperationContextE.cold.135","s+":"18"}}}
{"t":{"$date":"2021-02-13T22:10:49.161+01:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"55957A6A1309","b":"559577AF1000","o":"2BB0309","s":"_ZN5mongo14ServiceContext20makeOperationContextEPNS_6ClientE","s+":"129"}}}
{"t":{"$date":"2021-02-13T22:10:49.161+01:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"55957A695C87","b":"559577AF1000","o":"2BA4C87","s":"_ZN5mongo6Client20makeOperationContextEv","s+":"27"}}}
{"t":{"$date":"2021-02-13T22:10:49.161+01:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"55957A49E932","b":"559577AF1000","o":"29AD932","s":"_ZN5mongo21KeysCollectionManager14PeriodicRunner18_doPeriodicRefreshEPNS_14ServiceContextENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_8DurationISt5ratioILl1ELl1000EEEE","s+":"162"}}}
{"t":{"$date":"2021-02-13T22:10:49.161+01:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"55957A4A0893","b":"559577AF1000","o":"29AF893","s":"_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN5mongo4stdx6threadC4IZNS3_21KeysCollectionManager14PeriodicRunner5startEPNS3_14ServiceContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS3_8DurationISt5ratioILl1ELl1000EEEEEUlvE_JELi0EEET_DpOT0_EUlvE_EEEEE6_M_runEv","s+":"93"}}}
{"t":{"$date":"2021-02-13T22:10:49.161+01:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"55957A9857BF","b":"559577AF1000","o":"2E947BF","s":"execute_native_thread_routine","s+":"F"}}}
{"t":{"$date":"2021-02-13T22:10:49.161+01:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"7FF1A314BFA3","b":"7FF1A3144000","o":"7FA3","s":"start_thread","s+":"F3"}}}
{"t":{"$date":"2021-02-13T22:10:49.161+01:00"},"s":"I", "c":"CONTROL", "id":31427, "ctx":"monitoring-keys-for-HMAC","msg":" Frame: {frame}","attr":{"frame":{"a":"7FF1A307C4CF","b":"7FF1A2F83000","o":"F94CF","s":"clone","s+":"3F"}}}

The next day the second slave crashed, and yesterday the third slave and the master crashed at the same moment. All servers show a chart like this:

 

Comment by Mitereiter Balazs Zoltan [ 09/Feb/21 ]

Does this issue affect version 4.4 only? Would a downgrade solve the problem?

Comment by Konstantin Krasnov [ 03/Feb/21 ]

We are not saving core dumps yet. Currently only diagnostic.data is available.

We will consider the option of enabling core dumps.

Comment by Matthew Tretin (Inactive) [ 02/Feb/21 ]

We're tracking this in SERVER-53566 – it's at the top of our list to look at next, any core dumps would be very helpful! 

Comment by Konstantin Krasnov [ 31/Jan/21 ]

We just experienced the invariant failure. Are you looking for diagnostic.data?

Comment by Brian Granetzke [ 22/Dec/20 ]

I've opened an issue with symptoms that look similar to the above, SERVER-53482, except that it is on v4.4.1 and 4.4.2-ent on Amazon Linux 2. Also, twice, it crashed all three nodes in the cluster within seconds.

Comment by yang jianghua [ 03/Dec/20 ]

I have run into the same problem as the reporter. I hope a solution will be available as soon as possible.

Comment by Dmitry Agranat [ 15/Nov/20 ]

Thanks, rectalogic, for providing the requested information. We are looking into this and will provide updates based on our findings.

Comment by Andrew Wason [ 10/Nov/20 ]

Uploaded

Comment by Dmitry Agranat [ 10/Nov/20 ]

Hi rectalogic,

Thank you for the report.

Would you please archive (tar or zip) the full mongod.log files and the $dbpath/diagnostic.data directory (the contents are described here) and upload them to this support uploader location?

Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.

Thanks,
Dima
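
The archiving step requested in the comment above can be sketched as follows. This is only an illustration using Python's standard `tarfile` module; the function name `archive_diagnostics`, the archive layout and the paths in the usage comment are assumptions, not part of any MongoDB tooling.

```python
import tarfile
from pathlib import Path


def archive_diagnostics(log_paths, dbpath, out_path):
    """Bundle mongod log files and <dbpath>/diagnostic.data into one .tar.gz."""
    out_path = Path(out_path)
    with tarfile.open(out_path, "w:gz") as tar:
        for log in map(Path, log_paths):
            # Store logs under logs/ so rotated files stay grouped together.
            tar.add(log, arcname=f"logs/{log.name}")
        diag = Path(dbpath) / "diagnostic.data"
        if diag.is_dir():
            # Adds the directory and everything inside it.
            tar.add(diag, arcname="diagnostic.data")
    return out_path


# Usage sketch (paths are assumptions; use your own log location and dbpath):
# archive_diagnostics(
#     ["/var/log/mongodb/mongod.log"],
#     "/var/lib/mongodb",
#     "mongod-diagnostics.tar.gz",
# )
```

Writing the archive outside the dbpath avoids adding files to the data directory while the server is running.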

Generated at Thu Feb 08 05:28:51 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.