[SERVER-34426] mongod crashes - Got signal: 11 (Segmentation fault) Created: 12/Apr/18  Updated: 16/May/18  Resolved: 16/May/18

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.2.18
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: GeorgeT [X] Assignee: Donald Anderson
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Amazon, Ubuntu 16.04


Operating System: ALL
Sprint: Storage Non-NYC 2018-05-07, Storage Non-NYC 2018-05-21
Participants:

 Description   

mongod crashed with "Got signal: 11 (Segmentation fault)".
We use sharding with 3 replica sets; one primary mongod instance crashed.
How can this be fixed?

Stack trace:

2018-04-11T16:53:01.034+0000 F -        [conn1569845] Invalid access at address: 0x10
2018-04-11T16:53:01.098+0000 F -        [conn1569845] Got signal: 11 (Segmentation fault).
 
 0x1559492 0x1558439 0x1558e17 0x7f6911254390 0x126f638 0x126da43 0x124882b 0xd91769 0xcc18ec 0xcc31df 0xcb15f4 0xce2589 0xcec058 0xce511f 0xcd42b1 0xca74b8 0xf5a3d5 0xf5c277 0xf5ce44 0xf15558 0xf1625c 0xbc570e 0xc7a1d6 0xc7b4cb 0xb8659b 0xdb8146 0x9c6d20 0x14ff3c1 0x7f691124a6ba 0x7f6910f803dd
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"1159492","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"1158439"},{"b":"400000","o":"1158E17"},{"b":"7F6911243000","o":"11390"},{"b":"400000","o":"E6F638","s":"_ZN5mongo17WiredTigerSession9getCursorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEmb"},{"b":"400000","o":"E6DA43","s":"_ZN5mongo16WiredTigerCursorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEmbPNS_16OperationContextE"},{"b":"400000","o":"E4882B","s":"_ZNK5mongo23WiredTigerIndexStandard9newCursorEPNS_16OperationContextEb"},{"b":"400000","o":"991769","s":"_ZNK5mongo17IndexAccessMethod9newCursorEPNS_16OperationContextEb"},{"b":"400000","o":"8C18EC","s":"_ZN5mongo9IndexScan13initIndexScanEv"},{"b":"400000","o":"8C31DF","s":"_ZN5mongo9IndexScan4workEPm"},{"b":"400000","o":"8B15F4","s":"_ZN5mongo10FetchStage4workEPm"},{"b":"400000","o":"8E2589","s":"_ZN5mongo16ShardFilterStage4workEPm"},{"b":"400000","o":"8EC058","s":"_ZN5mongo21SortKeyGeneratorStage4workEPm"},{"b":"400000","o":"8E511F","s":"_ZN5mongo9SortStage4workEPm"},{"b":"400000","o":"8D42B1","s":"_ZN5mongo15ProjectionStage4workEPm"},{"b":"400000","o":"8A74B8","s":"_ZN5mongo15CachedPlanStage12pickBestPlanEPNS_15PlanYieldPolicyE"},{"b":"400000","o":"B5A3D5","s":"_ZN5mongo12PlanExecutor12pickBestPlanENS0_11YieldPolicyE"},{"b":"400000","o":"B5C277","s":"_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextESt10unique_ptrINS_10WorkingSetESt14default_deleteIS4_EES3_INS_9PlanStageES5_IS8_EES3_INS_13QuerySolutionES5_ISB_EES3_INS_14CanonicalQueryES5_ISE_EEPKNS_10CollectionERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS0_11YieldPolicyE"},{"b":"400000","o":"B5CE44","s":"_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextESt10unique_ptrINS_10WorkingSetESt14default_deleteIS4_EES3_INS_9PlanStageES5_IS8_EES3_INS_13QuerySolutionES5_ISB_EES3_INS_14CanonicalQueryES5_ISE_EEPKNS_10CollectionENS0_11YieldPolicyE"},{"b":"400000","o":"B15558","s":"_ZN5mongo11getExecutorEPNS_16OperationContextEPNS_10CollectionE
St10unique_ptrINS_14CanonicalQueryESt14default_deleteIS5_EENS_12PlanExecutor11YieldPolicyEm"},{"b":"400000","o":"B1625C","s":"_ZN5mongo15getExecutorFindEPNS_16OperationContextEPNS_10CollectionERKNS_15NamespaceStringESt10unique_ptrINS_14CanonicalQueryESt14default_deleteIS8_EENS_12PlanExecutor11YieldPolicyE"},{"b":"400000","o":"7C570E","s":"_ZN5mongo7FindCmd3runEPNS_16OperationContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERNS_7BSONObjEiRS8_RNS_14BSONObjBuilderE"},{"b":"400000","o":"87A1D6","s":"_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE"},{"b":"400000","o":"87B4CB","s":"_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE"},{"b":"400000","o":"78659B","s":"_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE"},{"b":"400000","o":"9B8146","s":"_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE"},{"b":"400000","o":"5C6D20"},{"b":"400000","o":"10FF3C1","s":"_ZN5mongo17PortMessageServer17handleIncomingMsgEPv"},{"b":"7F6911243000","o":"76BA"},{"b":"7F6910E79000","o":"1073DD","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.18", "gitVersion" : "4c1bae566c0c00f996a2feb16febf84936ecaf6f", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.4.0-1041-aws", "version" : "#50-Ubuntu SMP Wed Nov 15 22:18:17 UTC 2017", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "3C61A0DBFA07F1A1069E23BC3A1ADF2E695605CE" }, { "b" : "7FFDE85EB000", "elfType" : 3, "buildId" : "481727DDA3C187EE33287AC58BBED9D7EB68AC89" }, { "b" : "7F69121CF000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "DCF10134B91ED2139E3E8C72564668F5CDBA8522" }, { "b" : "7F6911D8B000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : 
"1649272BE0CA9FA22F082DC86372B6C9959779B0" }, { "b" : "7F6911B83000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "F951C1E0765FCAE48F82CAFE35D1ADD36D6C9AF9" }, { "b" : "7F691197F000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "0FC788F0861846257B5F1773FBD438E95DFC1032" }, { "b" : "7F6911676000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "FF7A33D389E756CA381A8189291A968EA5E1F4F8" }, { "b" : "7F6911460000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "68220AE2C65D65C1B6AAA12FA6765A6EC2F5F434" }, { "b" : "7F6911243000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "27F189EF8DB8C3734C6A678E6EF3CB0B206D58B2" }, { "b" : "7F6910E79000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "088A6E00A1814622219F346B41E775B8DD46C518" }, { "b" : "7F6912438000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9157F205547F0EB588E2AB1F2F120B74253A43EA" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x1559492]
 mongod(+0x1158439) [0x1558439]
 mongod(+0x1158E17) [0x1558e17]
 libpthread.so.0(+0x11390) [0x7f6911254390]
 mongod(_ZN5mongo17WiredTigerSession9getCursorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEmb+0x28) [0x126f638]
 mongod(_ZN5mongo16WiredTigerCursorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEmbPNS_16OperationContextE+0x53) [0x126da43]
 mongod(_ZNK5mongo23WiredTigerIndexStandard9newCursorEPNS_16OperationContextEb+0x15B) [0x124882b]
 mongod(_ZNK5mongo17IndexAccessMethod9newCursorEPNS_16OperationContextEb+0x19) [0xd91769]
 mongod(_ZN5mongo9IndexScan13initIndexScanEv+0x4C) [0xcc18ec]
 mongod(_ZN5mongo9IndexScan4workEPm+0x15F) [0xcc31df]
 mongod(_ZN5mongo10FetchStage4workEPm+0x164) [0xcb15f4]
 mongod(_ZN5mongo16ShardFilterStage4workEPm+0x59) [0xce2589]
 mongod(_ZN5mongo21SortKeyGeneratorStage4workEPm+0x48) [0xcec058]
 mongod(_ZN5mongo9SortStage4workEPm+0x34F) [0xce511f]
 mongod(_ZN5mongo15ProjectionStage4workEPm+0x51) [0xcd42b1]
 mongod(_ZN5mongo15CachedPlanStage12pickBestPlanEPNS_15PlanYieldPolicyE+0x188) [0xca74b8]
 mongod(_ZN5mongo12PlanExecutor12pickBestPlanENS0_11YieldPolicyE+0xC5) [0xf5a3d5]
 mongod(_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextESt10unique_ptrINS_10WorkingSetESt14default_deleteIS4_EES3_INS_9PlanStageES5_IS8_EES3_INS_13QuerySolutionES5_ISB_EES3_INS_14CanonicalQueryES5_ISE_EEPKNS_10CollectionERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS0_11YieldPolicyE+0x287) [0xf5c277]
 mongod(_ZN5mongo12PlanExecutor4makeEPNS_16OperationContextESt10unique_ptrINS_10WorkingSetESt14default_deleteIS4_EES3_INS_9PlanStageES5_IS8_EES3_INS_13QuerySolutionES5_ISB_EES3_INS_14CanonicalQueryES5_ISE_EEPKNS_10CollectionENS0_11YieldPolicyE+0xC4) [0xf5ce44]
 mongod(_ZN5mongo11getExecutorEPNS_16OperationContextEPNS_10CollectionESt10unique_ptrINS_14CanonicalQueryESt14default_deleteIS5_EENS_12PlanExecutor11YieldPolicyEm+0x108) [0xf15558]
 mongod(_ZN5mongo15getExecutorFindEPNS_16OperationContextEPNS_10CollectionERKNS_15NamespaceStringESt10unique_ptrINS_14CanonicalQueryESt14default_deleteIS8_EENS_12PlanExecutor11YieldPolicyE+0x7C) [0xf1625c]
 mongod(_ZN5mongo7FindCmd3runEPNS_16OperationContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERNS_7BSONObjEiRS8_RNS_14BSONObjBuilderE+0x8EE) [0xbc570e]
 mongod(_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE+0x676) [0xc7a1d6]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE+0x85B) [0xc7b4cb]
 mongod(_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE+0x25B) [0xb8659b]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xC36) [0xdb8146]
 mongod(+0x5C6D20) [0x9c6d20]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x311) [0x14ff3c1]
 libpthread.so.0(+0x76BA) [0x7f691124a6ba]
 libc.so.6(clone+0x6D) [0x7f6910f803dd]
-----  END BACKTRACE  -----
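
Note on reading the trace: in the JSON block, each frame's "b" appears to be the load base of the module and "o" the offset into it, so base plus offset should equal the address printed in brackets in the text frames. A minimal Python sketch of that check, using three frames copied from the trace above (the helper name absolute_address is made up for illustration and is not part of any MongoDB tooling):

```python
def absolute_address(frame):
    """Return load base + offset for one backtrace frame (both are hex strings)."""
    return int(frame["b"], 16) + int(frame["o"], 16)


# Three frames copied from the JSON backtrace in this ticket.
frames = [
    {"b": "400000", "o": "1159492"},       # mongo::printStackTrace
    {"b": "7F6911243000", "o": "11390"},   # libpthread.so.0 signal frame
    {"b": "400000", "o": "E6F638"},        # mongo::WiredTigerSession::getCursor
]

# Addresses as they would appear in the bracketed text frames.
addresses = [hex(absolute_address(f)) for f in frames]

for frame, addr in zip(frames, addresses):
    print(frame["b"], "+", frame["o"], "=", addr)
```

The third result corresponds to the getCursor frame listed directly below the libpthread signal frame, which is the frame the 11/May/18 comment identifies as the point of failure.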



 Comments   
Comment by Donald Anderson [ 16/May/18 ]

I'm marking this closed ('cannot reproduce') for now.  If the customer has more information or a test case, it can be reopened.

Comment by GeorgeT [X] [ 14/May/18 ]

@donald.anderson, yes, it has happened since. Unfortunately, the logs were not saved. When it happens again, we will be sure to send the logs.
We use AWS EC2 instances for MongoDB and have 3 shards with 3 replica sets. Usually it is the primary mongod that crashes, when load increases, but this happens once a month or less. Consequently, it is unclear what a test case would look like.

Comment by Donald Anderson [ 11/May/18 ]

The point of failure is an access to an iterator on _cursors, a member of the WiredTigerSession object.  The access address is 0x10, and _cursors looks to be 0x18 bytes into the object.  So it seems that the WiredTigerSession object is NULL, or nearly so, possibly -8 or -16, depending on the implementation of iterators. That's all to say, my best guess is that the session pointer is unexpectedly NULL or otherwise corrupt. I've looked at the related code, and I don't see any direct way that this would happen.

GeorgeT, has this happened since? Any ideas about the conditions that led to this failure? If so, any chance you have a test case?  Thanks!
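
The offset reasoning in this comment can be written out as arithmetic. This is only a sketch: the fault address 0x10 and the 0x18 offset of _cursors are taken from the comment, while the inner offsets 0 and 8 (how far into the container or iterator the faulting load lands) are assumptions chosen to match the "-8 or -16" guess, and implied_this is a hypothetical helper name.

```python
FAULT_ADDRESS = 0x10    # "Invalid access at address: 0x10" from the log
CURSORS_OFFSET = 0x18   # offset of _cursors inside WiredTigerSession, per the comment


def implied_this(fault, member_offset, inner_offset):
    """Solve fault = this + member_offset + inner_offset for the object pointer."""
    return fault - member_offset - inner_offset


# Assumed inner offsets: 0 (first word of the container) and 8 (second word).
candidates = {k: implied_this(FAULT_ADDRESS, CURSORS_OFFSET, k) for k in (0, 8)}

for inner, this_ptr in candidates.items():
    # Show the value both as a signed number and as a 64-bit pointer pattern.
    print(inner, this_ptr, hex(this_ptr & (2**64 - 1)))
```

Under these assumptions the session pointer comes out as -8 or -16 rather than exactly NULL, which is consistent with the "NULL, or nearly so" reading in the comment.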

Generated at Thu Feb 08 04:36:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.