[SERVER-52735] mongodb crash with "Invariant failure" opCtx != nullptr && _opCtx == nullptr Created: 10/Nov/20 Updated: 06/Dec/22 Resolved: 24/Mar/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Stability |
| Affects Version/s: | 4.4.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Andrew Wason | Assignee: | Backlog - Service Architecture |
| Resolution: | Duplicate | Votes: | 10 |
| Labels: | sa-groomed | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Issue Links: |
|
| Assigned Teams: | Service Arch |
| Operating System: | ALL |
| Sprint: | Security 2020-11-30 |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
Running mongodb 4.4.0 on Ubuntu 20.04.1, one of our replicas crashed with an assertion. How do we diagnose this?
|
| Comments |
| Comment by Tyler Seip (Inactive) [ 24/Mar/21 ] | |
|
Actually, closing as a duplicate of | |
| Comment by Tyler Seip (Inactive) [ 24/Mar/21 ] | |
|
This should be resolved by | |
| Comment by Luca - [ 09/Mar/21 ] | |
|
Hi, we also encountered the same issue ( `"ctx":"monitoring-keys-for-HMAC","msg":"Invariant failure","attr":{"expr":"opCtx != nullptr && _opCtx == nullptr","file":"src/mongo/db/client.cpp","line":126}}` ) on the replicas of one of our mongo databases, two days apart: replica 1 crashed on March 8 at 23:56:15 UTC, and replica 2 crashed this morning, March 9, at 00:04:25 UTC, with exactly the same log/stacktrace as the one in the description. mongodb version: 4.4.3. I hope this helps and that a fix/workaround can be found quickly for version 4.4. | |
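For anyone triaging from that log line: the failing expression is the guard at the top of `Client::setOperationContext` in `src/mongo/db/client.cpp`. A minimal sketch of what the check enforces, reconstructed from the expr/file/line fields in the log (the surrounding code here is an approximation, not a verbatim copy of the server source):

```cpp
// Sketch of the guard reported at src/mongo/db/client.cpp:126. A Client may
// own at most one OperationContext at a time: callers must hand in a non-null
// context, and any previously attached context must already have been
// detached. Violating either half aborts the process via invariant().
void Client::setOperationContext(OperationContext* opCtx) {
    invariant(opCtx != nullptr && _opCtx == nullptr);
    _opCtx = opCtx;
}
```

So the abort means the "monitoring-keys-for-HMAC" thread tried to attach a fresh OperationContext to a Client that still had one attached (or was handed a null one), which points at an operation-context lifecycle problem on that thread rather than at the data itself.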
| Comment by Francesco Rivola [ 08/Mar/21 ] | |
|
We experienced the same issue again in production (after we initially reported it in ). I have a couple of questions:
| |
| Comment by Ratika Gandhi [ 02/Mar/21 ] | |
|
We are investigating this; the work is tracked on this ticket - | |
| Comment by Flavio Liroa Jr [ 23/Feb/21 ] | |
|
Same problem here. In my case it happened after running a "compact" on one of the secondaries. ( Please, when will you have a solution? | |
| Comment by Юрий Соколов [ 21/Feb/21 ] | |
|
It looks like the logs don't contain enough information: if the thread id were logged, it would be clearer which thread is setting the opCtx. | |
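A standalone sketch of the kind of instrumentation being suggested here (all names are stand-ins, not server APIs): remember which thread attached the current context, so the invariant-failure message can say who left it attached.

```cpp
#include <cstdlib>
#include <iostream>
#include <thread>

struct OperationContext {};

struct Client {
    OperationContext* _opCtx = nullptr;
    std::thread::id _opCtxSetter;  // hypothetical extra bookkeeping

    void setOperationContext(OperationContext* opCtx) {
        if (!(opCtx != nullptr && _opCtx == nullptr)) {
            // Report both the failing thread and the thread that attached
            // the lingering context, then abort as the real invariant does.
            std::cerr << "Invariant failure: opCtx != nullptr && _opCtx == nullptr"
                      << "; current thread " << std::this_thread::get_id()
                      << ", context was attached by thread " << _opCtxSetter
                      << '\n';
            std::abort();
        }
        _opCtx = opCtx;
        _opCtxSetter = std::this_thread::get_id();
    }
};

int main() {
    Client c;
    OperationContext a, b;
    c.setOperationContext(&a);
    c.setOperationContext(&b);  // fires: prints both thread ids, then aborts
}
```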
| Comment by Юрий Соколов [ 21/Feb/21 ] | |
|
Same here. In a cluster of 8 shards, every shard failed at close to the same time. Some shards had 2 replicas fail, some 3. Some failed in `"ctx":"monitoring-keys-for-HMAC"`, some in `"ctx":"TopologyVersionObserver"`. | |
| Comment by Rob Gillan [ 18/Feb/21 ] | |
|
We've just had the same crash: 3-shard cluster, each shard a triple replica set in a different geographic region on AWS; Ubuntu 16.04, Mongo 4.4.1, feature set 4.4.
| |
| Comment by Mirek Sosna [ 16/Feb/21 ] | |
|
I have this issue. I manage a replicaSet with 4 servers and an arbiter; 3 days ago one slave crashed with:

{"t":{"$date":"2021-02-13T22:10:45.038+01:00"},"s":"I", "c":"-", "id":4333222, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"RSM received failed isMaster","attr":{"host":"172.16.3.114:27017","error":"NetworkInterfaceExceededTimeLimit: Couldn't get a connection within the time limit of 1000ms","replicaSet":"packets-prod","isMasterReply":"{}"}}
{"t":{"$date":"2021-02-13T22:10:45.038+01:00"},"s":"I", "c":"NETWORK", "id":4712102, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"Host failed in replica set","attr":{"replicaSet":"packets-prod","host":"172.16.3.114:27017","error":{"code":202,"codeName":"NetworkInterfaceExceededTimeLimit","errmsg":"Couldn't get a connection within the time limit of 1000ms"},"action":{"dropConnections":false,"requestImmediateCheck":false,"outcome":{"host":"172.16.3.114:27017","success":false,"errorMessage":"NetworkInterfaceExceededTimeLimit: Couldn't get a connection within the time limit of 1000ms"}}}}
{"t":{"$date":"2021-02-13T22:10:48.933+01:00"},"s":"F", "c":"-", "id":23079, "ctx":"monitoring-keys-for-HMAC","msg":"Invariant failure","attr":{"expr":"opCtx != nullptr && _opCtx == nullptr","file":"src/mongo/db/client.cpp","line":126}}
{"t":{"$date":"2021-02-13T22:10:48.933+01:00"},"s":"F", "c":"-", "id":23080, "ctx":"monitoring-keys-for-HMAC","msg":"\n\n***aborting after invariant() failure\n\n"}
{"t":{"$date":"2021-02-13T22:10:48.933+01:00"},"s":"F", "c":"CONTROL", "id":4757800, "ctx":"monitoring-keys-for-HMAC","msg":"Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}}

BACKTRACE (id 31431, ctx "monitoring-keys-for-HMAC"; mongodbVersion 4.4.3, gitVersion 913d6b62acfbb344dde1b116f4161360acd8fd13, Linux 4.19.0-10-amd64 #1 SMP Debian 4.19.132-1 (2020-07-24) x86_64), one frame per line:

{"a":"55957A7DFC41","b":"559577AF1000","o":"2CEEC41","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.606","s+":"1E1"}
{"a":"55957A7E1279","b":"559577AF1000","o":"2CF0279","s":"_ZN5mongo15printStackTraceEv","s+":"29"}
{"a":"55957A7DEAD6","b":"559577AF1000","o":"2CEDAD6","s":"_ZN5mongo12_GLOBAL__N_116abruptQuitActionEiP9siginfo_tPv","s+":"66"}
{"a":"7FF1A3156730","b":"7FF1A3144000","o":"12730","s":"funlockfile","s+":"50"}
{"a":"7FF1A2FBA7BB","b":"7FF1A2F83000","o":"377BB","s":"gsignal","s+":"10B"}
{"a":"7FF1A2FA5535","b":"7FF1A2F83000","o":"22535","s":"abort","s+":"121"}
{"a":"559578A07BF3","b":"559577AF1000","o":"F16BF3","s":"_ZN5mongo15invariantFailedEPKcS1_j","s+":"12C"}
{"a":"5595789DB1AA","b":"559577AF1000","o":"EEA1AA","s":"_ZN5mongo6Client19setOperationContextEPNS_16OperationContextE.cold.135","s+":"18"}
{"a":"55957A6A1309","b":"559577AF1000","o":"2BB0309","s":"_ZN5mongo14ServiceContext20makeOperationContextEPNS_6ClientE","s+":"129"}
{"a":"55957A695C87","b":"559577AF1000","o":"2BA4C87","s":"_ZN5mongo6Client20makeOperationContextEv","s+":"27"}
{"a":"55957A49E932","b":"559577AF1000","o":"29AD932","s":"_ZN5mongo21KeysCollectionManager14PeriodicRunner18_doPeriodicRefreshEPNS_14ServiceContextENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_8DurationISt5ratioILl1ELl1000EEEE","s+":"162"}
{"a":"55957A4A0893","b":"559577AF1000","o":"29AF893","s":"_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN5mongo4stdx6threadC4IZNS3_21KeysCollectionManager14PeriodicRunner5startEPNS3_14ServiceContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS3_8DurationISt5ratioILl1ELl1000EEEEEUlvE_JELi0EEET_DpOT0_EUlvE_EEEEE6_M_runEv","s+":"93"}
{"a":"55957A9857BF","b":"559577AF1000","o":"2E947BF","s":"execute_native_thread_routine","s+":"F"}
{"a":"7FF1A314BFA3","b":"7FF1A3144000","o":"7FA3","s":"start_thread","s+":"F3"}
{"a":"7FF1A307C4CF","b":"7FF1A2F83000","o":"F94CF","s":"clone","s+":"3F"}

somap: {"b":"559577AF1000","elfType":3,"buildId":"6C8A93F8D2B544901FC58C1CCD203AEA182627B5"}, {"b":"7FF1A3144000","path":"/lib/x86_64-linux-gnu/libpthread.so.0","elfType":3,"buildId":"E91114987A0147BD050ADDBD591EB8994B29F4B3"}, {"b":"7FF1A2F83000","path":"/lib/x86_64-linux-gnu/libc.so.6","elfType":3,"buildId":"18B9A9A8C523E5CFE5B5D946D605D09242F09798"}

The next day the second slave crashed, and yesterday the third slave and the master crashed at the same moment. All servers show a similar chart:
| |
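The backtraces collected so far all share the shape above: a background periodic thread (`KeysCollectionManager::PeriodicRunner::_doPeriodicRefresh` here, `TopologyVersionObserver` in other reports) calls `Client::makeOperationContext()` and dies in the invariant. A standalone sketch of that lifecycle, with stand-in types rather than the server's actual implementation, showing why a context left attached from a previous iteration trips the `_opCtx == nullptr` half of the check:

```cpp
#include <cassert>
#include <memory>

struct OperationContext {};

struct Client {
    OperationContext* _opCtx = nullptr;

    void setOperationContext(OperationContext* opCtx) {
        assert(opCtx != nullptr && _opCtx == nullptr);  // the failing invariant
        _opCtx = opCtx;
    }

    void resetOperationContext() {
        _opCtx = nullptr;
    }

    // Stand-in for Client::makeOperationContext(): attach a fresh context and
    // detach it again when the returned owner is destroyed.
    auto makeOperationContext() {
        auto deleter = [this](OperationContext* p) {
            resetOperationContext();
            delete p;
        };
        std::unique_ptr<OperationContext, decltype(deleter)> opCtx(
            new OperationContext(), deleter);
        setOperationContext(opCtx.get());
        return opCtx;
    }
};

int main() {
    Client client;
    for (int i = 0; i < 3; ++i) {
        auto opCtx = client.makeOperationContext();
        // ... the periodic key refresh would run here ...
    }  // opCtx destroyed: the deleter detaches it before the next iteration;
       // keeping it alive across iterations would abort on the second pass
}
```

Under that reading, the RAII owner's deleter is what normally guarantees the reset, and a crash like this suggests some path attached a second context before the first was detached.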
| Comment by Mitereiter Balazs Zoltan [ 09/Feb/21 ] | |
|
Does this issue affect version 4.4 only? Would a downgrade solve the problem? | |
| Comment by Konstantin Krasnov [ 03/Feb/21 ] | |
|
We are not saving core dumps yet. Currently only diagnostic.data is available. We will consider the option of enabling core dumps. | |
| Comment by Matthew Tretin (Inactive) [ 02/Feb/21 ] | |
|
We're tracking this in | |
| Comment by Konstantin Krasnov [ 31/Jan/21 ] | |
|
We just experienced the invariant failure. Are you looking for diagnostic.data? | |
| Comment by Brian Granetzke [ 22/Dec/20 ] | |
|
I've opened an issue with symptoms that look similar to the above: | |
| Comment by yang jianghua [ 03/Dec/20 ] | |
|
| |
| Comment by Dmitry Agranat [ 15/Nov/20 ] | |
|
Thanks rectalogic for providing the requested information. We are looking into this and will provide updates based on our findings. | |
| Comment by Andrew Wason [ 10/Nov/20 ] | |
|
Uploaded | |
| Comment by Dmitry Agranat [ 10/Nov/20 ] | |
|
Hi rectalogic, Thank you for the report. Would you please archive (tar or zip) the full mongod.log files and the $dbpath/diagnostic.data directory (the contents are described here) and upload them to this support uploader location? Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time. Thanks, |