[SERVER-51386] Mongo 4.4.1 Crashes often Created: 06/Oct/20  Updated: 01/Feb/21  Resolved: 01/Feb/21

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 4.4.1
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Michael Moore Assignee: Jonathan Streets (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File SERVER-51386.png     PNG File image-2020-10-08-13-22-04-886.png     PNG File screenshot-1.png    
Issue Links:
Related
related to SERVER-50971 Invariant failure, WT_NOTFOUND: item ... Closed
related to SERVER-50880 Mongod Server Failed with signal 6 Closed
Sprint: Storage - Ra 2020-10-19, Storage - Ra 2020-11-02
Participants:
Story Points: 0

 Description   

I have a replica set where one or more members crash almost daily.  I finally caught a stack trace of the crash.  I'm not sure how to fix this, or interpret the stack trace.

This is 4.4.1 on Ubuntu 20, with proper ulimits set.  I also have the following sysctls set:

vm.swappiness=1
net.ipv4.tcp_keepalive_probes = 6
net.core.somaxconn=4096
net.ipv4.tcp_fin_timeout=30
net.ipv4.tcp_keepalive_intvl=30
net.ipv4.tcp_keepalive_time=120
net.ipv4.tcp_max_syn_backlog=4096
vm.max_map_count=9999999
net.ipv4.ip_local_port_range = 1024 65530
vm.zone_reclaim_mode=0
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
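When comparing hosts in a replica set, it can help to confirm that the sysctl values listed above are actually in effect on each member. The following is a minimal sketch, not part of the original report: the helper names (`parse_sysctl_lines`, `proc_path`, `find_mismatches`) are hypothetical, and it assumes a Linux host where each sysctl key maps to a file under `/proc/sys`.

```python
from pathlib import Path


def parse_sysctl_lines(text):
    """Parse 'key = value' lines into a dict, ignoring blanks and comments."""
    settings = {}
    for raw in text.splitlines():
        # Tolerate stray wiki-markup braces around a pasted line.
        line = raw.strip().strip("{}").strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def proc_path(key, root="/proc/sys"):
    """Map a sysctl key such as 'vm.swappiness' to its file under root."""
    return Path(root) / key.replace(".", "/")


def find_mismatches(expected, root="/proc/sys"):
    """Return {key: (expected, actual)} for keys whose live value differs.

    'actual' is None when the file cannot be read.
    """
    mismatches = {}
    for key, want in expected.items():
        try:
            # Collapse whitespace: multi-value files are tab-separated.
            have = " ".join(proc_path(key, root).read_text().split())
        except OSError:
            have = None
        if have != " ".join(want.split()):
            mismatches[key] = (want, have)
    return mismatches
```

For example, `find_mismatches(parse_sysctl_lines(open("/etc/sysctl.conf").read()))` would report settings that differ from the file, assuming the settings live in `/etc/sysctl.conf` on that host.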

 
Backtrace:

{"t":\{"$date":"2020-10-05T22:25:46.854+00:00"},"s":"F",  "c":"-",        "id":23083,   "ctx":"conn503635","msg":"Invariant failure","attr":\{"expr":"ret","error":"UnknownError: -31803: WT_NOTFOUND: item not found","file":"src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp","line":1600}}
 
{"t":\{"$date":"2020-10-05T22:25:46.856+00:00"},"s":"F",  "c":"-",        "id":23084,   "ctx":"conn503635","msg":"\n\n***aborting after invariant() failure\n\n"}
 
{"t":\{"$date":"2020-10-05T22:25:46.857+00:00"},"s":"F",  "c":"CONTROL",  "id":4757800, "ctx":"conn503635","msg":"Writing fatal message","attr":\{"message":"Got signal: 6 (Aborted).\n"}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31431,   "ctx":"conn503635","msg":"BACKTRACE: \{bt}","attr":\{"bt":{"backtrace":[{"a":"55FF176683B1","b":"55FF14869000","o":"2DFF3B1","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.606","s+":"1E1"},\{"a":"55FF176699E9","b":"55FF14869000","o":"2E009E9","s":"_ZN5mongo15printStackTraceEv","s+":"29"},\{"a":"55FF17667246","b":"55FF14869000","o":"2DFE246","s":"_ZN5mongo12_GLOBAL__N_116abruptQuitActionEiP9siginfo_tPv","s+":"66"},\{"a":"7F938DF733C0","b":"7F938DF5E000","o":"153C0","s":"funlockfile","s+":"60"},\{"a":"7F938DDB218B","b":"7F938DD6C000","o":"4618B","s":"gsignal","s+":"CB"},\{"a":"7F938DD91859","b":"7F938DD6C000","o":"25859","s":"abort","s+":"12B"},\{"a":"55FF157CCEE1","b":"55FF14869000","o":"F63EE1","s":"_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j","s+":"179"},\{"a":"55FF154EEB5D","b":"55FF14869000","o":"C85B5D","s":"_ZN5mongo21WiredTigerRecordStore12updateRecordEPNS_16OperationContextERKNS_8RecordIdEPKci.cold.1749","s+":"99"},\{"a":"55FF161F4E5F","b":"55FF14869000","o":"198BE5F","s":"_ZN5mongo14CollectionImpl14updateDocumentEPNS_16OperationContextENS_8RecordIdERKNS_11SnapshottedINS_7BSONObjEEERKS5_bPNS_7OpDebugEPNS_20CollectionUpdateArgsE","s+":"1DF"},\{"a":"55FF1639A4BF","b":"55FF14869000","o":"1B314BF","s":"_ZN5mongo11UpdateStage18transformAndUpdateERKNS_11SnapshottedINS_7BSONObjEEERNS_8RecordIdE","s+":"A3F"},\{"a":"55FF1639AEC1","b":"55FF14869000","o":"1B31EC1","s":"_ZN5mongo11UpdateStage6doWorkEPm","s+":"361"},\{"a":"55FF1639D06B","b":"55FF14869000","o":"1B3406B","s":"_ZN5mongo11UpsertStage6doWorkEPm","s+":"7B"},\{"a":"55FF1637FBA8","b":"55FF14869000","o":"1B16BA8","s":"_ZN5mongo9PlanStage4workEPm","s+":"68"},\{"a":"55FF163C41C2","b":"55FF14869000","o":"1B5B1C2","s":"_ZN5mongo16PlanExecutorImpl12_getNextImplEPNS_11SnapshottedINS_8DocumentEEEPNS_8RecordIdE","s+":"222"},\{"a":"55FF163C4C0B","b":"55FF1486900
0","o":"1B5BC0B","s":"_ZN5mongo16PlanExecutorImpl7getNextEPNS_8DocumentEPNS_8RecordIdE","s+":"4B"},\{"a":"55FF163C4D4D","b":"55FF14869000","o":"1B5BD4D","s":"_ZN5mongo16PlanExecutorImpl11executePlanEv","s+":"4D"},\{"a":"55FF160E8E80","b":"55FF14869000","o":"187FE80","s":"_ZN5mongo14performUpdatesEPNS_16OperationContextERKNS_9write_ops6UpdateE","s+":"E60"},\{"a":"55FF1604E3A6","b":"55FF14869000","o":"17E53A6","s":"_ZNK5mongo12_GLOBAL__N_19CmdUpdate10Invocation7runImplEPNS_16OperationContextERNS_14BSONObjBuilderE","s+":"46"},\{"a":"55FF1604BEED","b":"55FF14869000","o":"17E2EED","s":"_ZN5mongo12_GLOBAL__N_112WriteCommand14InvocationBase3runEPNS_16OperationContextEPNS_3rpc21ReplyBuilderInterfaceE","s+":"15D"},\{"a":"55FF165AA59F","b":"55FF14869000","o":"1D4159F","s":"_ZN5mongo14CommandHelpers20runCommandInvocationEPNS_16OperationContextERKNS_12OpMsgRequestEPNS_17CommandInvocationEPNS_3rpc21ReplyBuilderInterfaceE","s+":"7F"},\{"a":"55FF15D5FA81","b":"55FF14869000","o":"14F6A81","s":"_ZN5mongo12_GLOBAL__N_127invokeWithSessionCheckedOutEPNS_16OperationContextERKNS_12OpMsgRequestEPNS_17CommandInvocationERKNS_30OperationSessionInfoFromClientEPNS_3rpc21ReplyBuilderInterfaceE","s+":"221"},\{"a":"55FF15D609F8","b":"55FF14869000","o":"14F79F8","s":"_ZN5mongo12_GLOBAL__N_114runCommandImplEPNS_16OperationContextEPNS_17CommandInvocationERKNS_12OpMsgRequestEPNS_3rpc21ReplyBuilderInterfaceENS_11LogicalTimeERKNS_23ServiceEntryPointCommon5HooksEPNS_14BSONObjBuilderERKNS_30OperationSessionInfoFromClientE","s+":"7C8"},\{"a":"55FF15D631C9","b":"55FF14869000","o":"14FA1C9","s":"_ZN5mongo12_GLOBAL__N_119execCommandDatabaseEPNS_16OperationContextEPNS_7CommandERKNS_12OpMsgRequestEPNS_3rpc21ReplyBuilderInterfaceERKNS_23ServiceEntryPointCommon5HooksE","s+":"11B9"},\{"a":"55FF15D6448B","b":"55FF14869000","o":"14FB48B","s":"_ZN5mongo12_GLOBAL__N_116receivedCommandsEPNS_16OperationContextERKNS_7MessageERKNS_23ServiceEntryPointCommon5HooksE","s+":"62B"},\{"a":"55FF15D650CD","b":"55FF14869000","o":"
14FC0CD","s":"_ZN5mongo23ServiceEntryPointCommon13handleRequestEPNS_16OperationContextERKNS_7MessageERKNS0_5HooksE","s+":"50D"},\{"a":"55FF15D5314C","b":"55FF14869000","o":"14EA14C","s":"_ZN5mongo23ServiceEntryPointMongod13handleRequestEPNS_16OperationContextERKNS_7MessageE","s+":"3C"},\{"a":"55FF15D5D65C","b":"55FF14869000","o":"14F465C","s":"_ZN5mongo19ServiceStateMachine15_processMessageENS0_11ThreadGuardE","s+":"FC"},\{"a":"55FF15D5B505","b":"55FF14869000","o":"14F2505","s":"_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE","s+":"125"},\{"a":"55FF15D5C4B6","b":"55FF14869000","o":"14F34B6","s":"_ZNSt17_Function_handlerIFvvEZN5mongo19ServiceStateMachine22_scheduleNextWithGuardENS2_11ThreadGuardENS1_9transport15ServiceExecutor13ScheduleFlagsENS4_23ServiceExecutorTaskNameENS2_9OwnershipEEUlvE_E9_M_invokeERKSt9_Any_data","s+":"56"},\{"a":"55FF1702E042","b":"55FF14869000","o":"27C5042","s":"_ZN5mongo9transport26ServiceExecutorSynchronous8scheduleESt8functionIFvvEENS0_15ServiceExecutor13ScheduleFlagsENS0_23ServiceExecutorTaskNameE","s+":"182"},\{"a":"55FF15D5A91B","b":"55FF14869000","o":"14F191B","s":"_ZN5mongo19ServiceStateMachine22_scheduleNextWithGuardENS0_11ThreadGuardENS_9transport15ServiceExecutor13ScheduleFlagsENS2_23ServiceExecutorTaskNameENS0_9OwnershipE","s+":"DB"},\{"a":"55FF15D5BC7D","b":"55FF14869000","o":"14F2C7D","s":"_ZN5mongo19ServiceStateMachine15_sourceCallbackENS_6StatusE","s+":"6AD"},\{"a":"55FF15D5BD60","b":"55FF14869000","o":"14F2D60","s":"_ZN5mongo14future_details4callIRZNS_19ServiceStateMachine14_sourceMessageENS2_11ThreadGuardEEUlNS_10StatusWithINS_7MessageEEEE0_S6_EEDaOT_OT0_","s+":"60"},\{"a":"55FF15D5BFE5","b":"55FF14869000","o":"14F2FE5","s":"_ZN5mongo19ServiceStateMachine14_sourceMessageENS0_11ThreadGuardE","s+":"145"},\{"a":"55FF15D5B57A","b":"55FF14869000","o":"14F257A","s":"_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE","s+":"19A"},\{"a":"55FF15D5C4B6","b":"55FF14869000","o":"14F34B6","s":"_ZNSt17
_Function_handlerIFvvEZN5mongo19ServiceStateMachine22_scheduleNextWithGuardENS2_11ThreadGuardENS1_9transport15ServiceExecutor13ScheduleFlagsENS4_23ServiceExecutorTaskNameENS2_9OwnershipEEUlvE_E9_M_invokeERKSt9_Any_data","s+":"56"},\{"a":"55FF1702E6A8","b":"55FF14869000","o":"27C56A8","s":"_ZNSt17_Function_handlerIFvvEZN5mongo9transport26ServiceExecutorSynchronous8scheduleESt8functionIS0_ENS2_15ServiceExecutor13ScheduleFlagsENS2_23ServiceExecutorTaskNameEEUlvE0_E9_M_invokeERKSt9_Any_data","s+":"B8"},\{"a":"55FF173828A6","b":"55FF14869000","o":"2B198A6","s":"_ZNSt17_Function_handlerIFvvEZN5mongo25launchServiceWorkerThreadESt8functionIS0_EEUlvE1_E9_M_invokeERKSt9_Any_data","s+":"56"},\{"a":"55FF17382914","b":"55FF14869000","o":"2B19914","s":"_ZN5mongo12_GLOBAL__N_17runFuncEPv","s+":"14"},\{"a":"7F938DF67609","b":"7F938DF5E000","o":"9609","s":"start_thread","s+":"D9"},\{"a":"7F938DE8E293","b":"7F938DD6C000","o":"122293","s":"clone","s+":"43"}],"processInfo":\{"mongodbVersion":"4.4.1","gitVersion":"ad91a93a5a31e175f5cbf8c69561e788bbc55ce1","compiledModules":["enterprise"],"uname":{"sysname":"Linux","release":"5.4.0-48-generic","version":"#52-Ubuntu SMP Thu Sep 10 10:58:49 UTC 2020","machine":"x86_64"},"somap":[\{"b":"55FF14869000","elfType":3,"buildId":"475B9C841EE61B5065AA30B170D19A8F7C3680D8"},\{"b":"7F938DF5E000","path":"/lib/x86_64-linux-gnu/libpthread.so.0","elfType":3,"buildId":"4FC5FC33F4429136A494C640B113D76F610E4ABC"},\{"b":"7F938DD6C000","path":"/lib/x86_64-linux-gnu/libc.so.6","elfType":3,"buildId":"F3FF3FDA80B817C464A56EED59FF09DC864EAEB0"}]}}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF176683B1","b":"55FF14869000","o":"2DFF3B1","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.606","s+":"1E1"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF176699E9","b":"55FF14869000","o":"2E009E9","s":"_ZN5mongo15printStackTraceEv","s+":"29"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF17667246","b":"55FF14869000","o":"2DFE246","s":"_ZN5mongo12_GLOBAL__N_116abruptQuitActionEiP9siginfo_tPv","s+":"66"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"7F938DF733C0","b":"7F938DF5E000","o":"153C0","s":"funlockfile","s+":"60"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"7F938DDB218B","b":"7F938DD6C000","o":"4618B","s":"gsignal","s+":"CB"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"7F938DD91859","b":"7F938DD6C000","o":"25859","s":"abort","s+":"12B"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF157CCEE1","b":"55FF14869000","o":"F63EE1","s":"_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j","s+":"179"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF154EEB5D","b":"55FF14869000","o":"C85B5D","s":"_ZN5mongo21WiredTigerRecordStore12updateRecordEPNS_16OperationContextERKNS_8RecordIdEPKci.cold.1749","s+":"99"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF161F4E5F","b":"55FF14869000","o":"198BE5F","s":"_ZN5mongo14CollectionImpl14updateDocumentEPNS_16OperationContextENS_8RecordIdERKNS_11SnapshottedINS_7BSONObjEEERKS5_bPNS_7OpDebugEPNS_20CollectionUpdateArgsE","s+":"1DF"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF1639A4BF","b":"55FF14869000","o":"1B314BF","s":"_ZN5mongo11UpdateStage18transformAndUpdateERKNS_11SnapshottedINS_7BSONObjEEERNS_8RecordIdE","s+":"A3F"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF1639AEC1","b":"55FF14869000","o":"1B31EC1","s":"_ZN5mongo11UpdateStage6doWorkEPm","s+":"361"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF1639D06B","b":"55FF14869000","o":"1B3406B","s":"_ZN5mongo11UpsertStage6doWorkEPm","s+":"7B"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF1637FBA8","b":"55FF14869000","o":"1B16BA8","s":"_ZN5mongo9PlanStage4workEPm","s+":"68"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF163C41C2","b":"55FF14869000","o":"1B5B1C2","s":"_ZN5mongo16PlanExecutorImpl12_getNextImplEPNS_11SnapshottedINS_8DocumentEEEPNS_8RecordIdE","s+":"222"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF163C4C0B","b":"55FF14869000","o":"1B5BC0B","s":"_ZN5mongo16PlanExecutorImpl7getNextEPNS_8DocumentEPNS_8RecordIdE","s+":"4B"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF163C4D4D","b":"55FF14869000","o":"1B5BD4D","s":"_ZN5mongo16PlanExecutorImpl11executePlanEv","s+":"4D"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF160E8E80","b":"55FF14869000","o":"187FE80","s":"_ZN5mongo14performUpdatesEPNS_16OperationContextERKNS_9write_ops6UpdateE","s+":"E60"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF1604E3A6","b":"55FF14869000","o":"17E53A6","s":"_ZNK5mongo12_GLOBAL__N_19CmdUpdate10Invocation7runImplEPNS_16OperationContextERNS_14BSONObjBuilderE","s+":"46"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF1604BEED","b":"55FF14869000","o":"17E2EED","s":"_ZN5mongo12_GLOBAL__N_112WriteCommand14InvocationBase3runEPNS_16OperationContextEPNS_3rpc21ReplyBuilderInterfaceE","s+":"15D"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF165AA59F","b":"55FF14869000","o":"1D4159F","s":"_ZN5mongo14CommandHelpers20runCommandInvocationEPNS_16OperationContextERKNS_12OpMsgRequestEPNS_17CommandInvocationEPNS_3rpc21ReplyBuilderInterfaceE","s+":"7F"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF15D5FA81","b":"55FF14869000","o":"14F6A81","s":"_ZN5mongo12_GLOBAL__N_127invokeWithSessionCheckedOutEPNS_16OperationContextERKNS_12OpMsgRequestEPNS_17CommandInvocationERKNS_30OperationSessionInfoFromClientEPNS_3rpc21ReplyBuilderInterfaceE","s+":"221"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF15D609F8","b":"55FF14869000","o":"14F79F8","s":"_ZN5mongo12_GLOBAL__N_114runCommandImplEPNS_16OperationContextEPNS_17CommandInvocationERKNS_12OpMsgRequestEPNS_3rpc21ReplyBuilderInterfaceENS_11LogicalTimeERKNS_23ServiceEntryPointCommon5HooksEPNS_14BSONObjBuilderERKNS_30OperationSessionInfoFromClientE","s+":"7C8"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF15D631C9","b":"55FF14869000","o":"14FA1C9","s":"_ZN5mongo12_GLOBAL__N_119execCommandDatabaseEPNS_16OperationContextEPNS_7CommandERKNS_12OpMsgRequestEPNS_3rpc21ReplyBuilderInterfaceERKNS_23ServiceEntryPointCommon5HooksE","s+":"11B9"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF15D6448B","b":"55FF14869000","o":"14FB48B","s":"_ZN5mongo12_GLOBAL__N_116receivedCommandsEPNS_16OperationContextERKNS_7MessageERKNS_23ServiceEntryPointCommon5HooksE","s+":"62B"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF15D650CD","b":"55FF14869000","o":"14FC0CD","s":"_ZN5mongo23ServiceEntryPointCommon13handleRequestEPNS_16OperationContextERKNS_7MessageERKNS0_5HooksE","s+":"50D"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF15D5314C","b":"55FF14869000","o":"14EA14C","s":"_ZN5mongo23ServiceEntryPointMongod13handleRequestEPNS_16OperationContextERKNS_7MessageE","s+":"3C"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF15D5D65C","b":"55FF14869000","o":"14F465C","s":"_ZN5mongo19ServiceStateMachine15_processMessageENS0_11ThreadGuardE","s+":"FC"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF15D5B505","b":"55FF14869000","o":"14F2505","s":"_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE","s+":"125"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF15D5C4B6","b":"55FF14869000","o":"14F34B6","s":"_ZNSt17_Function_handlerIFvvEZN5mongo19ServiceStateMachine22_scheduleNextWithGuardENS2_11ThreadGuardENS1_9transport15ServiceExecutor13ScheduleFlagsENS4_23ServiceExecutorTaskNameENS2_9OwnershipEEUlvE_E9_M_invokeERKSt9_Any_data","s+":"56"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF1702E042","b":"55FF14869000","o":"27C5042","s":"_ZN5mongo9transport26ServiceExecutorSynchronous8scheduleESt8functionIFvvEENS0_15ServiceExecutor13ScheduleFlagsENS0_23ServiceExecutorTaskNameE","s+":"182"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF15D5A91B","b":"55FF14869000","o":"14F191B","s":"_ZN5mongo19ServiceStateMachine22_scheduleNextWithGuardENS0_11ThreadGuardENS_9transport15ServiceExecutor13ScheduleFlagsENS2_23ServiceExecutorTaskNameENS0_9OwnershipE","s+":"DB"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF15D5BC7D","b":"55FF14869000","o":"14F2C7D","s":"_ZN5mongo19ServiceStateMachine15_sourceCallbackENS_6StatusE","s+":"6AD"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF15D5BD60","b":"55FF14869000","o":"14F2D60","s":"_ZN5mongo14future_details4callIRZNS_19ServiceStateMachine14_sourceMessageENS2_11ThreadGuardEEUlNS_10StatusWithINS_7MessageEEEE0_S6_EEDaOT_OT0_","s+":"60"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF15D5BFE5","b":"55FF14869000","o":"14F2FE5","s":"_ZN5mongo19ServiceStateMachine14_sourceMessageENS0_11ThreadGuardE","s+":"145"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF15D5B57A","b":"55FF14869000","o":"14F257A","s":"_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE","s+":"19A"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF15D5C4B6","b":"55FF14869000","o":"14F34B6","s":"_ZNSt17_Function_handlerIFvvEZN5mongo19ServiceStateMachine22_scheduleNextWithGuardENS2_11ThreadGuardENS1_9transport15ServiceExecutor13ScheduleFlagsENS4_23ServiceExecutorTaskNameENS2_9OwnershipEEUlvE_E9_M_invokeERKSt9_Any_data","s+":"56"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF1702E6A8","b":"55FF14869000","o":"27C56A8","s":"_ZNSt17_Function_handlerIFvvEZN5mongo9transport26ServiceExecutorSynchronous8scheduleESt8functionIS0_ENS2_15ServiceExecutor13ScheduleFlagsENS2_23ServiceExecutorTaskNameEEUlvE0_E9_M_invokeERKSt9_Any_data","s+":"B8"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF173828A6","b":"55FF14869000","o":"2B198A6","s":"_ZNSt17_Function_handlerIFvvEZN5mongo25launchServiceWorkerThreadESt8functionIS0_EEUlvE1_E9_M_invokeERKSt9_Any_data","s+":"56"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"55FF17382914","b":"55FF14869000","o":"2B19914","s":"_ZN5mongo12_GLOBAL__N_17runFuncEPv","s+":"14"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"7F938DF67609","b":"7F938DF5E000","o":"9609","s":"start_thread","s+":"D9"}}}
 
{"t":\{"$date":"2020-10-05T22:25:47.177+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn503635","msg":"  Frame: \{frame}","attr":\{"frame":{"a":"7F938DE8E293","b":"7F938DD6C000","o":"122293","s":"clone","s+":"43"}}}
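To make a backtrace like the one above easier to read, the per-frame log entries can be reduced to offset and symbol pairs and then demangled. The sketch below is an illustration under stated assumptions rather than an official tool: the function names are hypothetical, it relies on the `Frame` entries carrying log id 31427 as they do in this log, it undoes the backslash-brace escaping that this ticket export adds, and `demangle` assumes the external `c++filt` utility is installed.

```python
import json
import subprocess


def parse_log_line(line):
    """Parse one structured log line, removing the backslash before '{'
    that the ticket export inserts (raw mongod logs do not contain it)."""
    return json.loads(line.replace("\\{", "{"))


def frame_symbols(lines):
    """Yield (offset, mangled_symbol) for every 'Frame' entry (id 31427)."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            entry = parse_log_line(line)
        except json.JSONDecodeError:
            continue  # wrapped or truncated lines are skipped
        if entry.get("id") != 31427:
            continue
        frame = entry["attr"]["frame"]
        yield frame["o"], frame.get("s", "")


def demangle(symbols):
    """Demangle C++ names by piping them through c++filt (assumed present)."""
    result = subprocess.run(
        ["c++filt"],
        input="\n".join(symbols),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

A typical use would be `pairs = list(frame_symbols(open("mongod.log")))` followed by `demangle([sym for _, sym in pairs])`, where `mongod.log` is a placeholder for the actual log file.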



 Comments   
Comment by Le Rela [ 01/Feb/21 ]

Hi Jon,

On our side the problem has not reappeared since the upgrade to 4.4.2.

Best

Comment by Jonathan Streets (Inactive) [ 01/Feb/21 ]

Hi michael.moore@jhuapl.edu and lerela.mongo@lio.re

We haven’t heard back from you for some time, so I’m going to close this ticket. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Regards,

Jon

Comment by Le Rela [ 29/Dec/20 ]

We've encountered this issue 3 times in 4 weeks. We've upgraded to 4.4.2 and will let you know if this happens again.

Comment by Alexander Gorrod [ 29/Dec/20 ]

Hi lerela.mongo@lio.re - I'm sorry to hear that you are encountering this issue.

Have you seen the failure more than once? If so, about how often is it happening in your environment?

We have not heard any reports of customers running 4.4.2 encountering this issue - my recommendation is that you upgrade. I'd appreciate it if you can let us know if you do encounter the issue when running 4.4.2.

Comment by Luke Pearson [ 08/Nov/20 ]

Hi michael.moore@jhuapl.edu.

I'm sorry to hear you're experiencing this crash even more frequently now. Would you be able to upload the diagnostic data from the instance, which is found in $dbpath/diagnostic.data, to the support uploader? Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.
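For anyone preparing such an upload, the directory can be packed into a single archive first. This is a minimal sketch, not a MongoDB-provided procedure: the function name `archive_diagnostic_data` and the example paths are hypothetical, and it assumes only that the data sits in a `diagnostic.data` directory under the dbpath, as stated above.

```python
import tarfile
from pathlib import Path


def archive_diagnostic_data(dbpath, out_file):
    """Pack <dbpath>/diagnostic.data into a gzip-compressed tar archive."""
    src = Path(dbpath) / "diagnostic.data"
    if not src.is_dir():
        raise FileNotFoundError(f"no diagnostic.data directory under {dbpath}")
    with tarfile.open(str(out_file), "w:gz") as tar:
        # Store entries under a relative name instead of the absolute path.
        tar.add(str(src), arcname="diagnostic.data")
    return Path(out_file)
```

For example, `archive_diagnostic_data("/var/lib/mongodb", "/tmp/diagnostic-data.tar.gz")`, where both paths are placeholders for the real dbpath and a writable output location.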

Given the increased frequency of the crash, I'm interested in whether a consistent pattern can be seen in the diagnostic data.

Thanks,
Luke

Comment by Michael Moore [ 06/Nov/20 ]

Thanks!

It's now crashing every few minutes.  Not sure if it's the same stack trace.  I've also removed it from a replica set, so this is a standalone machine now.

{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31431,   "ctx":"conn963","msg":"BACKTRACE: {bt}","attr":{"bt":{"backtrace":[{"a":"55E1F66F53B1","b":"55E1F38F6000","o":"2DFF3B1","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.606","s+":"1E1"},{"a":"55E1F66F69E9","b":"55E1F38F6000","o":"2E009E9","s":"_ZN5mongo15printStackTraceEv","s+":"29"},{"a":"55E1F66F4246","b":"55E1F38F6000","o":"2DFE246","s":"_ZN5mongo12_GLOBAL__N_116abruptQuitActionEiP9siginfo_tPv","s+":"66"},{"a":"7FDD11C0A3C0","b":"7FDD11BF5000","o":"153C0","s":"funlockfile","s+":"60"},{"a":"7FDD11A4918B","b":"7FDD11A03000","o":"4618B","s":"gsignal","s+":"CB"},{"a":"7FDD11A28859","b":"7FDD11A03000","o":"25859","s":"abort","s+":"12B"},{"a":"55E1F4859EE1","b":"55E1F38F6000","o":"F63EE1","s":"_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j","s+":"179"},{"a":"55E1F457BB5D","b":"55E1F38F6000","o":"C85B5D","s":"_ZN5mongo21WiredTigerRecordStore12updateRecordEPNS_16OperationContextERKNS_8RecordIdEPKci.cold.1749","s+":"99"},{"a":"55E1F5281E5F","b":"55E1F38F6000","o":"198BE5F","s":"_ZN5mongo14CollectionImpl14updateDocumentEPNS_16OperationContextENS_8RecordIdERKNS_11SnapshottedINS_7BSONObjEEERKS5_bPNS_7OpDebugEPNS_20CollectionUpdateArgsE","s+":"1DF"},{"a":"55E1F54274BF","b":"55E1F38F6000","o":"1B314BF","s":"_ZN5mongo11UpdateStage18transformAndUpdateERKNS_11SnapshottedINS_7BSONObjEEERNS_8RecordIdE","s+":"A3F"},{"a":"55E1F5427EC1","b":"55E1F38F6000","o":"1B31EC1","s":"_ZN5mongo11UpdateStage6doWorkEPm","s+":"361"},{"a":"55E1F540CBA8","b":"55E1F38F6000","o":"1B16BA8","s":"_ZN5mongo9PlanStage4workEPm","s+":"68"},{"a":"55E1F54511C2","b":"55E1F38F6000","o":"1B5B1C2","s":"_ZN5mongo16PlanExecutorImpl12_getNextImplEPNS_11SnapshottedINS_8DocumentEEEPNS_8RecordIdE","s+":"222"},{"a":"55E1F5451C0B","b":"55E1F38F6000","o":"1B5BC0B","s":"_ZN5mongo16PlanExecutorImpl7getNextEPNS_8DocumentEPNS_8RecordIdE","s+":"4B"},{"a":"55E1F5451D4D","b":
"55E1F38F6000","o":"1B5BD4D","s":"_ZN5mongo16PlanExecutorImpl11executePlanEv","s+":"4D"},{"a":"55E1F5175E80","b":"55E1F38F6000","o":"187FE80","s":"_ZN5mongo14performUpdatesEPNS_16OperationContextERKNS_9write_ops6UpdateE","s+":"E60"},{"a":"55E1F50DB3A6","b":"55E1F38F6000","o":"17E53A6","s":"_ZNK5mongo12_GLOBAL__N_19CmdUpdate10Invocation7runImplEPNS_16OperationContextERNS_14BSONObjBuilderE","s+":"46"},{"a":"55E1F50D8EED","b":"55E1F38F6000","o":"17E2EED","s":"_ZN5mongo12_GLOBAL__N_112WriteCommand14InvocationBase3runEPNS_16OperationContextEPNS_3rpc21ReplyBuilderInterfaceE","s+":"15D"},{"a":"55E1F563759F","b":"55E1F38F6000","o":"1D4159F","s":"_ZN5mongo14CommandHelpers20runCommandInvocationEPNS_16OperationContextERKNS_12OpMsgRequestEPNS_17CommandInvocationEPNS_3rpc21ReplyBuilderInterfaceE","s+":"7F"},{"a":"55E1F4DEDC1F","b":"55E1F38F6000","o":"14F7C1F","s":"_ZN5mongo12_GLOBAL__N_114runCommandImplEPNS_16OperationContextEPNS_17CommandInvocationERKNS_12OpMsgRequestEPNS_3rpc21ReplyBuilderInterfaceENS_11LogicalTimeERKNS_23ServiceEntryPointCommon5HooksEPNS_14BSONObjBuilderERKNS_30OperationSessionInfoFromClientE","s+":"9EF"},{"a":"55E1F4DF01C9","b":"55E1F38F6000","o":"14FA1C9","s":"_ZN5mongo12_GLOBAL__N_119execCommandDatabaseEPNS_16OperationContextEPNS_7CommandERKNS_12OpMsgRequestEPNS_3rpc21ReplyBuilderInterfaceERKNS_23ServiceEntryPointCommon5HooksE","s+":"11B9"},{"a":"55E1F4DF148B","b":"55E1F38F6000","o":"14FB48B","s":"_ZN5mongo12_GLOBAL__N_116receivedCommandsEPNS_16OperationContextERKNS_7MessageERKNS_23ServiceEntryPointCommon5HooksE","s+":"62B"},{"a":"55E1F4DF20CD","b":"55E1F38F6000","o":"14FC0CD","s":"_ZN5mongo23ServiceEntryPointCommon13handleRequestEPNS_16OperationContextERKNS_7MessageERKNS0_5HooksE","s+":"50D"},{"a":"55E1F4DE014C","b":"55E1F38F6000","o":"14EA14C","s":"_ZN5mongo23ServiceEntryPointMongod13handleRequestEPNS_16OperationContextERKNS_7MessageE","s+":"3C"},{"a":"55E1F4DEA65C","b":"55E1F38F6000","o":"14F465C","s":"_ZN5mongo19ServiceStateMachine15_processMessageENS0
_11ThreadGuardE","s+":"FC"},{"a":"55E1F4DE8505","b":"55E1F38F6000","o":"14F2505","s":"_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE","s+":"125"},{"a":"55E1F4DE94B6","b":"55E1F38F6000","o":"14F34B6","s":"_ZNSt17_Function_handlerIFvvEZN5mongo19ServiceStateMachine22_scheduleNextWithGuardENS2_11ThreadGuardENS1_9transport15ServiceExecutor13ScheduleFlagsENS4_23ServiceExecutorTaskNameENS2_9OwnershipEEUlvE_E9_M_invokeERKSt9_Any_data","s+":"56"},{"a":"55E1F60BB042","b":"55E1F38F6000","o":"27C5042","s":"_ZN5mongo9transport26ServiceExecutorSynchronous8scheduleESt8functionIFvvEENS0_15ServiceExecutor13ScheduleFlagsENS0_23ServiceExecutorTaskNameE","s+":"182"},{"a":"55E1F4DE791B","b":"55E1F38F6000","o":"14F191B","s":"_ZN5mongo19ServiceStateMachine22_scheduleNextWithGuardENS0_11ThreadGuardENS_9transport15ServiceExecutor13ScheduleFlagsENS2_23ServiceExecutorTaskNameENS0_9OwnershipE","s+":"DB"},{"a":"55E1F4DE8C7D","b":"55E1F38F6000","o":"14F2C7D","s":"_ZN5mongo19ServiceStateMachine15_sourceCallbackENS_6StatusE","s+":"6AD"},{"a":"55E1F4DE8D60","b":"55E1F38F6000","o":"14F2D60","s":"_ZN5mongo14future_details4callIRZNS_19ServiceStateMachine14_sourceMessageENS2_11ThreadGuardEEUlNS_10StatusWithINS_7MessageEEEE0_S6_EEDaOT_OT0_","s+":"60"},{"a":"55E1F4DE8FE5","b":"55E1F38F6000","o":"14F2FE5","s":"_ZN5mongo19ServiceStateMachine14_sourceMessageENS0_11ThreadGuardE","s+":"145"},{"a":"55E1F4DE857A","b":"55E1F38F6000","o":"14F257A","s":"_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE","s+":"19A"},{"a":"55E1F4DE94B6","b":"55E1F38F6000","o":"14F34B6","s":"_ZNSt17_Function_handlerIFvvEZN5mongo19ServiceStateMachine22_scheduleNextWithGuardENS2_11ThreadGuardENS1_9transport15ServiceExecutor13ScheduleFlagsENS4_23ServiceExecutorTaskNameENS2_9OwnershipEEUlvE_E9_M_invokeERKSt9_Any_data","s+":"56"},{"a":"55E1F60BB6A8","b":"55E1F38F6000","o":"27C56A8","s":"_ZNSt17_Function_handlerIFvvEZN5mongo9transport26ServiceExecutorSynchronous8scheduleESt8functionIS0_ENS2_15ServiceExe
cutor13ScheduleFlagsENS2_23ServiceExecutorTaskNameEEUlvE0_E9_M_invokeERKSt9_Any_data","s+":"B8"},{"a":"55E1F640F8A6","b":"55E1F38F6000","o":"2B198A6","s":"_ZNSt17_Function_handlerIFvvEZN5mongo25launchServiceWorkerThreadESt8functionIS0_EEUlvE1_E9_M_invokeERKSt9_Any_data","s+":"56"},{"a":"55E1F640F914","b":"55E1F38F6000","o":"2B19914","s":"_ZN5mongo12_GLOBAL__N_17runFuncEPv","s+":"14"},{"a":"7FDD11BFE609","b":"7FDD11BF5000","o":"9609","s":"start_thread","s+":"D9"},{"a":"7FDD11B25293","b":"7FDD11A03000","o":"122293","s":"clone","s+":"43"}],"processInfo":{"mongodbVersion":"4.4.1","gitVersion":"ad91a93a5a31e175f5cbf8c69561e788bbc55ce1","compiledModules":["enterprise"],"uname":{"sysname":"Linux","release":"5.4.0-52-generic","version":"#57-Ubuntu SMP Thu Oct 15 10:57:00 UTC 2020","machine":"x86_64"},"somap":[{"b":"55E1F38F6000","elfType":3,"buildId":"475B9C841EE61B5065AA30B170D19A8F7C3680D8"},{"b":"7FDD11BF5000","path":"/lib/x86_64-linux-gnu/libpthread.so.0","elfType":3,"buildId":"4FC5FC33F4429136A494C640B113D76F610E4ABC"},{"b":"7FDD11A03000","path":"/lib/x86_64-linux-gnu/libc.so.6","elfType":3,"buildId":"F3FF3FDA80B817C464A56EED59FF09DC864EAEB0"}]}}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F66F53B1","b":"55E1F38F6000","o":"2DFF3B1","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.606","s+":"1E1"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F66F69E9","b":"55E1F38F6000","o":"2E009E9","s":"_ZN5mongo15printStackTraceEv","s+":"29"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F66F4246","b":"55E1F38F6000","o":"2DFE246","s":"_ZN5mongo12_GLOBAL__N_116abruptQuitActionEiP9siginfo_tPv","s+":"66"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"7FDD11C0A3C0","b":"7FDD11BF5000","o":"153C0","s":"funlockfile","s+":"60"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"7FDD11A4918B","b":"7FDD11A03000","o":"4618B","s":"gsignal","s+":"CB"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"7FDD11A28859","b":"7FDD11A03000","o":"25859","s":"abort","s+":"12B"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F4859EE1","b":"55E1F38F6000","o":"F63EE1","s":"_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j","s+":"179"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F457BB5D","b":"55E1F38F6000","o":"C85B5D","s":"_ZN5mongo21WiredTigerRecordStore12updateRecordEPNS_16OperationContextERKNS_8RecordIdEPKci.cold.1749","s+":"99"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F5281E5F","b":"55E1F38F6000","o":"198BE5F","s":"_ZN5mongo14CollectionImpl14updateDocumentEPNS_16OperationContextENS_8RecordIdERKNS_11SnapshottedINS_7BSONObjEEERKS5_bPNS_7OpDebugEPNS_20CollectionUpdateArgsE","s+":"1DF"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F54274BF","b":"55E1F38F6000","o":"1B314BF","s":"_ZN5mongo11UpdateStage18transformAndUpdateERKNS_11SnapshottedINS_7BSONObjEEERNS_8RecordIdE","s+":"A3F"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F5427EC1","b":"55E1F38F6000","o":"1B31EC1","s":"_ZN5mongo11UpdateStage6doWorkEPm","s+":"361"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F540CBA8","b":"55E1F38F6000","o":"1B16BA8","s":"_ZN5mongo9PlanStage4workEPm","s+":"68"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F54511C2","b":"55E1F38F6000","o":"1B5B1C2","s":"_ZN5mongo16PlanExecutorImpl12_getNextImplEPNS_11SnapshottedINS_8DocumentEEEPNS_8RecordIdE","s+":"222"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F5451C0B","b":"55E1F38F6000","o":"1B5BC0B","s":"_ZN5mongo16PlanExecutorImpl7getNextEPNS_8DocumentEPNS_8RecordIdE","s+":"4B"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F5451D4D","b":"55E1F38F6000","o":"1B5BD4D","s":"_ZN5mongo16PlanExecutorImpl11executePlanEv","s+":"4D"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F5175E80","b":"55E1F38F6000","o":"187FE80","s":"_ZN5mongo14performUpdatesEPNS_16OperationContextERKNS_9write_ops6UpdateE","s+":"E60"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F50DB3A6","b":"55E1F38F6000","o":"17E53A6","s":"_ZNK5mongo12_GLOBAL__N_19CmdUpdate10Invocation7runImplEPNS_16OperationContextERNS_14BSONObjBuilderE","s+":"46"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F50D8EED","b":"55E1F38F6000","o":"17E2EED","s":"_ZN5mongo12_GLOBAL__N_112WriteCommand14InvocationBase3runEPNS_16OperationContextEPNS_3rpc21ReplyBuilderInterfaceE","s+":"15D"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F563759F","b":"55E1F38F6000","o":"1D4159F","s":"_ZN5mongo14CommandHelpers20runCommandInvocationEPNS_16OperationContextERKNS_12OpMsgRequestEPNS_17CommandInvocationEPNS_3rpc21ReplyBuilderInterfaceE","s+":"7F"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F4DEDC1F","b":"55E1F38F6000","o":"14F7C1F","s":"_ZN5mongo12_GLOBAL__N_114runCommandImplEPNS_16OperationContextEPNS_17CommandInvocationERKNS_12OpMsgRequestEPNS_3rpc21ReplyBuilderInterfaceENS_11LogicalTimeERKNS_23ServiceEntryPointCommon5HooksEPNS_14BSONObjBuilderERKNS_30OperationSessionInfoFromClientE","s+":"9EF"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F4DF01C9","b":"55E1F38F6000","o":"14FA1C9","s":"_ZN5mongo12_GLOBAL__N_119execCommandDatabaseEPNS_16OperationContextEPNS_7CommandERKNS_12OpMsgRequestEPNS_3rpc21ReplyBuilderInterfaceERKNS_23ServiceEntryPointCommon5HooksE","s+":"11B9"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F4DF148B","b":"55E1F38F6000","o":"14FB48B","s":"_ZN5mongo12_GLOBAL__N_116receivedCommandsEPNS_16OperationContextERKNS_7MessageERKNS_23ServiceEntryPointCommon5HooksE","s+":"62B"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F4DF20CD","b":"55E1F38F6000","o":"14FC0CD","s":"_ZN5mongo23ServiceEntryPointCommon13handleRequestEPNS_16OperationContextERKNS_7MessageERKNS0_5HooksE","s+":"50D"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F4DE014C","b":"55E1F38F6000","o":"14EA14C","s":"_ZN5mongo23ServiceEntryPointMongod13handleRequestEPNS_16OperationContextERKNS_7MessageE","s+":"3C"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F4DEA65C","b":"55E1F38F6000","o":"14F465C","s":"_ZN5mongo19ServiceStateMachine15_processMessageENS0_11ThreadGuardE","s+":"FC"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F4DE8505","b":"55E1F38F6000","o":"14F2505","s":"_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE","s+":"125"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F4DE94B6","b":"55E1F38F6000","o":"14F34B6","s":"_ZNSt17_Function_handlerIFvvEZN5mongo19ServiceStateMachine22_scheduleNextWithGuardENS2_11ThreadGuardENS1_9transport15ServiceExecutor13ScheduleFlagsENS4_23ServiceExecutorTaskNameENS2_9OwnershipEEUlvE_E9_M_invokeERKSt9_Any_data","s+":"56"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F60BB042","b":"55E1F38F6000","o":"27C5042","s":"_ZN5mongo9transport26ServiceExecutorSynchronous8scheduleESt8functionIFvvEENS0_15ServiceExecutor13ScheduleFlagsENS0_23ServiceExecutorTaskNameE","s+":"182"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F4DE791B","b":"55E1F38F6000","o":"14F191B","s":"_ZN5mongo19ServiceStateMachine22_scheduleNextWithGuardENS0_11ThreadGuardENS_9transport15ServiceExecutor13ScheduleFlagsENS2_23ServiceExecutorTaskNameENS0_9OwnershipE","s+":"DB"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F4DE8C7D","b":"55E1F38F6000","o":"14F2C7D","s":"_ZN5mongo19ServiceStateMachine15_sourceCallbackENS_6StatusE","s+":"6AD"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F4DE8D60","b":"55E1F38F6000","o":"14F2D60","s":"_ZN5mongo14future_details4callIRZNS_19ServiceStateMachine14_sourceMessageENS2_11ThreadGuardEEUlNS_10StatusWithINS_7MessageEEEE0_S6_EEDaOT_OT0_","s+":"60"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F4DE8FE5","b":"55E1F38F6000","o":"14F2FE5","s":"_ZN5mongo19ServiceStateMachine14_sourceMessageENS0_11ThreadGuardE","s+":"145"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F4DE857A","b":"55E1F38F6000","o":"14F257A","s":"_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE","s+":"19A"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F4DE94B6","b":"55E1F38F6000","o":"14F34B6","s":"_ZNSt17_Function_handlerIFvvEZN5mongo19ServiceStateMachine22_scheduleNextWithGuardENS2_11ThreadGuardENS1_9transport15ServiceExecutor13ScheduleFlagsENS4_23ServiceExecutorTaskNameENS2_9OwnershipEEUlvE_E9_M_invokeERKSt9_Any_data","s+":"56"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F60BB6A8","b":"55E1F38F6000","o":"27C56A8","s":"_ZNSt17_Function_handlerIFvvEZN5mongo9transport26ServiceExecutorSynchronous8scheduleESt8functionIS0_ENS2_15ServiceExecutor13ScheduleFlagsENS2_23ServiceExecutorTaskNameEEUlvE0_E9_M_invokeERKSt9_Any_data","s+":"B8"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F640F8A6","b":"55E1F38F6000","o":"2B198A6","s":"_ZNSt17_Function_handlerIFvvEZN5mongo25launchServiceWorkerThreadESt8functionIS0_EEUlvE1_E9_M_invokeERKSt9_Any_data","s+":"56"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F640F914","b":"55E1F38F6000","o":"2B19914","s":"_ZN5mongo12_GLOBAL__N_17runFuncEPv","s+":"14"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"7FDD11BFE609","b":"7FDD11BF5000","o":"9609","s":"start_thread","s+":"D9"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"7FDD11B25293","b":"7FDD11A03000","o":"122293","s":"clone","s+":"43"}}}
{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"F",  "c":"CONTROL",  "id":4757800, "ctx":"conn1409","msg":"Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}}
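
For anyone reading the trace above, here is a minimal Python sketch that pulls the frame objects out of the structured "Frame: {frame}" log entries. Reading the frame keys "a", "b", "o", "s" and "s+" as address, module base, offset, symbol and offset within the symbol is an assumption inferred from the values (a = b + o holds for the frames shown), not something stated in this ticket. The helper names extract_frames and address_is_consistent are made up for this illustration.

```python
import json


def extract_frames(log_lines):
    """Collect the frame objects from structured 'Frame: {frame}' log entries."""
    frames = []
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # wrapped or truncated lines are skipped, not repaired
        if not isinstance(entry, dict):
            continue
        attr = entry.get("attr")
        frame = attr.get("frame") if isinstance(attr, dict) else None
        if isinstance(frame, dict):
            frames.append(frame)
    return frames


def address_is_consistent(frame):
    """Check that the absolute address equals module base plus offset (all hex).

    Assumes "a", "b" and "o" mean address, base and offset.
    """
    return int(frame["a"], 16) == int(frame["b"], 16) + int(frame["o"], 16)


# Two frames copied from the trace above (whitespace trimmed), plus one
# fragment of a wrapped line that is not valid JSON on its own.
sample = [
    '{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I","c":"CONTROL","id":31427,"ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"55E1F4859EE1","b":"55E1F38F6000","o":"F63EE1","s":"_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j","s+":"179"}}}',
    '{"t":{"$date":"2020-11-06T18:42:22.854+00:00"},"s":"I","c":"CONTROL","id":31427,"ctx":"conn963","msg":"  Frame: {frame}","attr":{"frame":{"a":"7FDD11A28859","b":"7FDD11A03000","o":"25859","s":"abort","s+":"12B"}}}',
    'cutor13ScheduleFlagsENS2_23ServiceExecutorTaskNameE',
]

for frame in extract_frames(sample):
    print(frame["s"], "+0x" + frame["s+"], address_is_consistent(frame))
```

The symbol names are mangled C++ names, so the printed list can be passed through a demangler such as c++filt to make the call chain easier to read.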

Comment by Jonathan Streets (Inactive) [ 06/Nov/20 ]

hi michael.moore@jhuapl.edu,

Just as an FYI, the team is still actively investigating this issue. As mentioned above, we are working with a customer who is running a custom build which will provide more logging information.

 jon

Comment by Michael Moore [ 12/Oct/20 ]

Unfortunately, the collection names and such contain proprietary info so we can't run a custom binary.  We will continue waiting.

If it's of any help, we are running this in a VMware environment, with the storage volumes for the VM mounted over an iSCSI connection.  The storage is mounted over a 100 gigabit connection to a large NVMe RAID array, which is extremely fast for transfers and very low-latency.  This could be exposing a bug that wasn't expected in your tests.

Comment by Luke Pearson [ 11/Oct/20 ]

Hi michael.moore@jhuapl.edu

We are currently working with a different customer facing a similar issue. As part of that work, we are planning on providing them with a custom binary which logs further details about the error.

Would you be interested in running this binary yourself? Given that your logs contain proprietary information, I understand if that isn't an option. The log lines themselves will not contain any information about the keys or values being accessed. Otherwise, please wait for us to continue to debug the issue.

Comment by Luke Pearson [ 08/Oct/20 ]

Currently I don't have any major theories as to what is causing this failure. One statistic suggests that an eviction of a history page is causing the issue, and that page may also be getting split at the same time. A statistic of note is the maximum page size at eviction, which was 35MB at the time the FTDC data stopped. We also see an increase in modifies, and the checkpoint finishes shortly before this.

Comment by Brian Lane [ 08/Oct/20 ]

Hi michael.moore@jhuapl.edu - it is just a way we schedule issues that transition to in-progress. We are investigating this issue now, and if there are code changes involved, we will create a follow-up backport ticket to backport the changes to 4.4.

Comment by Michael Moore [ 08/Oct/20 ]

I see this has been tagged 5.0 Required.  Am I to understand we are to continue to experience this behavior until 5.0 is released at a future date, or is that just a development priority?

Comment by Michael Moore [ 06/Oct/20 ]

I've uploaded the diagnostic.data information, but I can't upload mongod.log as it contains proprietary information.  Let me know if there's something I can look for in there for you, aside from the trace I attached above.

Comment by Dmitry Agranat [ 06/Oct/20 ]

Hi michael.moore@jhuapl.edu,

Would you please archive (tar or zip) the mongod.log files and the $dbpath/diagnostic.data directory (the contents are described here) and upload them to this support uploader location?

Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.

Thanks,
Dima
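
The archiving step requested above could be sketched in Python as follows. This is only an illustration using the standard tarfile module. The function name archive_diagnostics and the file names in the demo are hypothetical, and the demo runs on throwaway files in a temporary directory because the real log and dbpath locations depend on the deployment.

```python
import os
import tarfile
import tempfile


def archive_diagnostics(log_paths, diagnostic_dir, out_path):
    """Write a gzip-compressed tar with the log files and the diagnostic.data directory."""
    with tarfile.open(out_path, "w:gz") as tar:
        for path in log_paths:
            # Store each log under its file name only, not its full path.
            tar.add(path, arcname=os.path.basename(path))
        # Directories are added recursively by default.
        tar.add(diagnostic_dir, arcname="diagnostic.data")
    return out_path


# Demo on placeholder files; substitute the real mongod.log and
# $dbpath/diagnostic.data locations when using this for an upload.
with tempfile.TemporaryDirectory() as tmp:
    log_file = os.path.join(tmp, "mongod.log")
    with open(log_file, "w") as fh:
        fh.write("placeholder log line\n")

    diag_dir = os.path.join(tmp, "diagnostic.data")
    os.mkdir(diag_dir)
    with open(os.path.join(diag_dir, "metrics.placeholder"), "w") as fh:
        fh.write("placeholder metrics\n")

    archive = archive_diagnostics([log_file], diag_dir, os.path.join(tmp, "upload.tar.gz"))
    with tarfile.open(archive) as tar:
        names = sorted(tar.getnames())
    print(names)
```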

Comment by Michael Moore [ 06/Oct/20 ]

I forgot to mention this is Enterprise 4.4.1, not Community.  We are currently evaluating Enterprise for an on-premises deployment.

Comment by Michael Moore [ 06/Oct/20 ]

Ulimits:

$ sudo systemctl show mongod|grep Limit
MemoryLimit=infinity
LimitCPU=infinity
LimitCPUSoft=infinity
LimitFSIZE=infinity
LimitFSIZESoft=infinity
LimitDATA=infinity
LimitDATASoft=infinity
LimitSTACK=infinity
LimitSTACKSoft=8388608
LimitCORE=infinity
LimitCORESoft=0
LimitRSS=infinity
LimitRSSSoft=infinity
LimitNOFILE=infinity
LimitNOFILESoft=infinity
LimitAS=infinity
LimitASSoft=infinity
LimitNPROC=infinity
LimitNPROCSoft=infinity
LimitMEMLOCK=infinity
LimitMEMLOCKSoft=infinity
LimitLOCKS=infinity
LimitLOCKSSoft=infinity
LimitSIGPENDING=2062895
LimitSIGPENDINGSoft=2062895
LimitMSGQUEUE=819200
LimitMSGQUEUESoft=819200
LimitNICE=0
LimitNICESoft=0
LimitRTPRIO=0
LimitRTPRIOSoft=0
LimitRTTIME=infinity
LimitRTTIMESoft=infinity
LogRateLimitIntervalUSec=0
LogRateLimitBurst=0
StartLimitIntervalUSec=10s
StartLimitBurst=5
StartLimitAction=none
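
A listing like the one above can also be checked mechanically. The following Python sketch parses the `systemctl show` output into a dictionary and reports any limits that fall below a chosen minimum. The thresholds in ASSUMED_MINIMUMS are illustrative assumptions and should be checked against the MongoDB documentation for the version in use. The helper names parse_limits and below_minimum are made up for this example.

```python
def parse_limits(text):
    """Parse 'Limit*=value' lines from `systemctl show <unit>` output."""
    limits = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep or not key.startswith("Limit"):
            continue  # skips MemoryLimit, StartLimitBurst, LogRateLimitBurst, ...
        if value == "infinity":
            limits[key] = float("inf")
        else:
            try:
                limits[key] = int(value)
            except ValueError:
                limits[key] = value  # keep unexpected values as plain text
    return limits


def below_minimum(limits, minimums):
    """Return the names of limits that are numeric and lower than the given minimum."""
    return sorted(
        name
        for name, minimum in minimums.items()
        if isinstance(limits.get(name), (int, float)) and limits[name] < minimum
    )


# Illustrative thresholds only (an assumption, not taken from this ticket).
ASSUMED_MINIMUMS = {"LimitNOFILESoft": 64000, "LimitNPROCSoft": 64000}

# A subset of the output posted above.
sample = """MemoryLimit=infinity
LimitSTACK=infinity
LimitSTACKSoft=8388608
LimitCORESoft=0
LimitNOFILE=infinity
LimitNOFILESoft=infinity
LimitNPROC=infinity
LimitNPROCSoft=infinity
StartLimitBurst=5"""

limits = parse_limits(sample)
print(below_minimum(limits, ASSUMED_MINIMUMS))
```

With the values posted above, both soft limits are reported as infinity, so nothing is flagged under these assumed thresholds.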

Generated at Thu Feb 08 05:25:20 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.