[SERVER-31284] Server crash - Need help
Created: 27/Sep/17  Updated: 28/Nov/18  Resolved: 10/Jan/18

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.4.9
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Govind Agarwal
Assignee: Kelsey Schubert
Resolution: Cannot Reproduce
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-32170 tcmalloc segfault at time of first sw... Closed
Participants:

 Description   

2017-09-27T09:17:13.815+0000 F -        [conn339221] Got signal: 11 (Segmentation fault).
 
 0xf83bf6a551 0xf83bf69769 0xf83bf69dd6 0x7f2fbeac3390 0xf83c12ace3 0xf83c12adcc 0xf83ca5e9da 0xf83ba6b86b 0xf83b9db502 0xf83ba16a6c 0xf83ba1ab0f 0xf83b4690ff 0xf83b46a7ca 0xf83ba7fc30 0xf83b685112 0xf83b687116 0xf83b286a8d 0xf83b2873bd 0xf83bed2ae1 0x7f2fbeab96ba 0x7f2fbe7ef3dd
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"F83A9FC000","o":"156E551","s":"_ZN5mongo15printStackTraceERSo"},{"b":"F83A9FC000","o":"156D769"},{"b":"F83A9FC000","o":"156DDD6"},{"b":"7F2FBEAB2000","o":"11390"},{"b":"F83A9FC000","o":"172ECE3","s":"_ZN8tcmalloc11ThreadCache21ReleaseToCentralCacheEPNS0_8FreeListEmi"},{"b":"F83A9FC000","o":"172EDCC","s":"_ZN8tcmalloc11ThreadCache11ListTooLongEPNS0_8FreeListEm"},{"b":"F83A9FC000","o":"20629DA","s":"_ZdlPvRKSt9nothrow_t"},{"b":"F83A9FC000","o":"106F86B","s":"_ZN5mongo4repl23TopologyCoordinatorImpl22fillIsMasterForReplSetEPNS0_16IsMasterResponseE"},{"b":"F83A9FC000","o":"FDF502","s":"_ZN5mongo4repl26ReplicationCoordinatorImpl22fillIsMasterForReplSetEPNS0_16IsMasterResponseE"},{"b":"F83A9FC000","o":"101AA6C","s":"_ZN5mongo4repl21appendReplicationInfoEPNS_16OperationContextERNS_14BSONObjBuilderEi"},{"b":"F83A9FC000","o":"101EB0F","s":"_ZN5mongo4repl11CmdIsMaster3runEPNS_16OperationContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERNS_7BSONObjEiRS9_RNS_14BSONObjBuilderE"},{"b":"F83A9FC000","o":"A6D0FF","s":"_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE"},{"b":"F83A9FC000","o":"A6E7CA","s":"_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE"},{"b":"F83A9FC000","o":"1083C30","s":"_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE"},{"b":"F83A9FC000","o":"C89112"},{"b":"F83A9FC000","o":"C8B116","s":"_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE"},{"b":"F83A9FC000","o":"88AA8D","s":"_ZN5mongo23ServiceEntryPointMongod12_sessionLoopERKSt10shared_ptrINS_9transport7SessionEE"},{"b":"F83A9FC000","o":"88B3BD"},{"b":"F83A9FC000","o":"14D6AE1"},{"b":"7F2FBEAB2000","o":"76BA"},{"b":"7F2FBE6E8000","o":"1073DD","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.4.9", "gitVersion" : "876ebee8c7dd0e2d992f36a848ff4dc50ee6603e", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.4.0-93-generic", "version" : "#116-Ubuntu SMP Fri Aug 11 21:17:51 UTC 2017", "machine" : "x86_64" }, "somap" : [ { "b" : "F83A9FC000", "elfType" : 3, "buildId" : "A97737C3A7656E2EFF91EE8950B4DE177415887F" }, { "b" : "7FFE8B42C000", "elfType" : 3, "buildId" : "54890F8C663DC6CF6219ED314BEF17BC92A67C93" }, { "b" : "7F2FBFA3E000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "675F454AD6FD0B6CA2E41127C7B98079DA37F7B6" }, { "b" : "7F2FBF5FA000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "2DA08A7E5BF610030DD33B70DB951399626B7496" }, { "b" : "7F2FBF3F2000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "F951C1E0765FCAE48F82CAFE35D1ADD36D6C9AF9" }, { "b" : "7F2FBF1EE000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "0FC788F0861846257B5F1773FBD438E95DFC1032" }, { "b" : "7F2FBEEE5000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "FF7A33D389E756CA381A8189291A968EA5E1F4F8" }, { "b" : "7F2FBECCF000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "68220AE2C65D65C1B6AAA12FA6765A6EC2F5F434" }, { "b" : "7F2FBEAB2000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "27F189EF8DB8C3734C6A678E6EF3CB0B206D58B2" }, { "b" : "7F2FBE6E8000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "088A6E00A1814622219F346B41E775B8DD46C518" }, { "b" : "7F2FBFCA7000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9157F205547F0EB588E2AB1F2F120B74253A43EA" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0xf83bf6a551]
 mongod(+0x156D769) [0xf83bf69769]
 mongod(+0x156DDD6) [0xf83bf69dd6]
 libpthread.so.0(+0x11390) [0x7f2fbeac3390]
 mongod(_ZN8tcmalloc11ThreadCache21ReleaseToCentralCacheEPNS0_8FreeListEmi+0xF3) [0xf83c12ace3]
 mongod(_ZN8tcmalloc11ThreadCache11ListTooLongEPNS0_8FreeListEm+0x1C) [0xf83c12adcc]
 mongod(_ZdlPvRKSt9nothrow_t+0x25A) [0xf83ca5e9da]
 mongod(_ZN5mongo4repl23TopologyCoordinatorImpl22fillIsMasterForReplSetEPNS0_16IsMasterResponseE+0x2FB) [0xf83ba6b86b]
 mongod(_ZN5mongo4repl26ReplicationCoordinatorImpl22fillIsMasterForReplSetEPNS0_16IsMasterResponseE+0x82) [0xf83b9db502]
 mongod(_ZN5mongo4repl21appendReplicationInfoEPNS_16OperationContextERNS_14BSONObjBuilderEi+0x6EC) [0xf83ba16a6c]
 mongod(_ZN5mongo4repl11CmdIsMaster3runEPNS_16OperationContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERNS_7BSONObjEiRS9_RNS_14BSONObjBuilderE+0x13F) [0xf83ba1ab0f]
 mongod(_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE+0x4FF) [0xf83b4690ff]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE+0xF6A) [0xf83b46a7ca]
 mongod(_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE+0x240) [0xf83ba7fc30]
 mongod(+0xC89112) [0xf83b685112]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x746) [0xf83b687116]
 mongod(_ZN5mongo23ServiceEntryPointMongod12_sessionLoopERKSt10shared_ptrINS_9transport7SessionEE+0x1FD) [0xf83b286a8d]
 mongod(+0x88B3BD) [0xf83b2873bd]
 mongod(+0x14D6AE1) [0xf83bed2ae1]
 libpthread.so.0(+0x76BA) [0x7f2fbeab96ba]
 libc.so.6(clone+0x6D) [0x7f2fbe7ef3dd]
-----  END BACKTRACE  -----
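Editorial note: in the backtrace above, the JSON record and the human-readable frames appear to describe the same addresses. Each JSON frame carries a module load base ("b") and an offset ("o"), and their sum should equal the absolute address printed in brackets. The sketch below checks that for a few frames copied from this trace. The function names are illustrative and not part of any MongoDB tooling.

```python
# Frames copied from the backtrace above: (load base "b", offset "o", symbol or None).
FRAMES = [
    ("F83A9FC000", "156E551", "_ZN5mongo15printStackTraceERSo"),
    ("7F2FBEAB2000", "11390", None),  # libpthread frame, no symbol recorded
    ("F83A9FC000", "172ECE3",
     "_ZN8tcmalloc11ThreadCache21ReleaseToCentralCacheEPNS0_8FreeListEmi"),
    ("7F2FBE6E8000", "1073DD", "clone"),
]


def absolute_address(base_hex: str, offset_hex: str) -> str:
    """Return base + offset as a lowercase, 0x-prefixed hex string."""
    return hex(int(base_hex, 16) + int(offset_hex, 16))


def resolve(frames):
    """Pair each frame's absolute address with its symbol (or a placeholder)."""
    return [(absolute_address(b, o), s or "<no symbol>") for b, o, s in frames]


if __name__ == "__main__":
    for address, symbol in resolve(FRAMES):
        print(address, symbol)
```

The computed addresses can be compared against the bracketed values in the frame list, for example the `ReleaseToCentralCache` frame where the fault was raised.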
2017-09-27T09:21:47.645+0000 I CONTROL  [main] ***** SERVER RESTARTED *****
2017-09-27T09:21:47.695+0000 I CONTROL  [initandlisten] MongoDB starting : pid=1361 port=27017 dbpath=/mnt/md0/mongodb 64-bit host=mongo-06
2017-09-27T09:21:47.695+0000 I CONTROL  [initandlisten] db version v3.4.9
2017-09-27T09:21:47.695+0000 I CONTROL  [initandlisten] git version: 876ebee8c7dd0e2d992f36a848ff4dc50ee6603e
2017-09-27T09:21:47.695+0000 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.2g  1 Mar 2016
2017-09-27T09:21:47.695+0000 I CONTROL  [initandlisten] allocator: tcmalloc
2017-09-27T09:21:47.695+0000 I CONTROL  [initandlisten] modules: none
2017-09-27T09:21:47.695+0000 I CONTROL  [initandlisten] build environment:
2017-09-27T09:21:47.695+0000 I CONTROL  [initandlisten]     distmod: ubuntu1604
2017-09-27T09:21:47.695+0000 I CONTROL  [initandlisten]     distarch: x86_64
2017-09-27T09:21:47.695+0000 I CONTROL  [initandlisten]     target_arch: x86_64
2017-09-27T09:21:47.695+0000 I CONTROL  [initandlisten] options: { config: "/etc/mongod.conf", net: { bindIp: "127.0.0.1,10.130.59.191", port: 27017 }, replication: { replSetName: "ls" }, sharding: { clusterRole: "shardsvr" }, storage: { dbPath: "/mnt/md0/mongodb", journal: { enabled: true } }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/mongod.log" } }
2017-09-27T09:21:47.695+0000 W -        [initandlisten] Detected unclean shutdown - /mnt/md0/mongodb/mongod.lock is not empty.
2017-09-27T09:21:47.723+0000 I -        [initandlisten] Detected data files in /mnt/md0/mongodb created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2017-09-27T09:21:47.723+0000 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2017-09-27T09:21:47.724+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=31702M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2017-09-27T09:21:54.368+0000 E STORAGE  [initandlisten] WiredTiger error (0) [1506504114:367747][1361:0x7fe07d3f6d00], file:collection-15-2414246657800812719.wt, WT_CURSOR.insert: read checksum error for 12288B block at offset 23007911936: calculated block checksum of 3725859502 doesn't match expected checksum of 2047809465
2017-09-27T09:21:54.369+0000 E STORAGE  [initandlisten] WiredTiger error (0) [1506504114:368581][1361:0x7fe07d3f6d00], file:collection-15-2414246657800812719.wt, WT_CURSOR.insert: collection-15-2414246657800812719.wt: encountered an illegal file format or internal value
2017-09-27T09:21:54.370+0000 E STORAGE  [initandlisten] WiredTiger error (-31804) [1506504114:369659][1361:0x7fe07d3f6d00], file:collection-15-2414246657800812719.wt, WT_CURSOR.insert: the process must exit and restart: WT_PANIC: WiredTiger library panic
2017-09-27T09:21:54.370+0000 I -        [initandlisten] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 361
2017-09-27T09:21:54.370+0000 I -        [initandlisten] 
 
***aborting after fassert() failure
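
Editorial note: the log lines above follow the pattern timestamp, severity letter, component, bracketed context, message. A minimal sketch for pulling out the error (E) and fatal (F) lines, and for extracting the numbers from the WiredTiger checksum message, might look like the following. The regular expressions are assumptions based only on the lines shown in this ticket and are not an official log grammar.

```python
import re

# Assumed layout: "<timestamp> <severity> <component>  [<context>] <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\S+) (?P<severity>[DIWEF]) (?P<component>\S+)\s+"
    r"\[(?P<context>[^\]]+)\] (?P<message>.*)$"
)

# Matches the "read checksum error" message as it appears in this ticket.
CHECKSUM = re.compile(
    r"file:(?P<file>[^,]+), .*read checksum error for (?P<size>\d+)B block "
    r"at offset (?P<offset>\d+): calculated block checksum of "
    r"(?P<calculated>\d+) doesn't match expected checksum of (?P<expected>\d+)"
)

# Three lines copied from the log above.
SAMPLE = [
    "2017-09-27T09:17:13.815+0000 F -        [conn339221] Got signal: 11 (Segmentation fault).",
    "2017-09-27T09:21:47.695+0000 I CONTROL  [initandlisten] db version v3.4.9",
    "2017-09-27T09:21:54.368+0000 E STORAGE  [initandlisten] WiredTiger error (0) "
    "[1506504114:367747][1361:0x7fe07d3f6d00], file:collection-15-2414246657800812719.wt, "
    "WT_CURSOR.insert: read checksum error for 12288B block at offset 23007911936: "
    "calculated block checksum of 3725859502 doesn't match expected checksum of 2047809465",
]


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def serious_lines(lines):
    """Keep only parsed lines with severity E (error) or F (fatal)."""
    parsed = (parse_line(line) for line in lines)
    return [p for p in parsed if p and p["severity"] in ("E", "F")]


def checksum_details(message):
    """Extract file name and numeric fields from a checksum error message."""
    match = CHECKSUM.search(message)
    if not match:
        return None
    details = match.groupdict()
    for key in ("size", "offset", "calculated", "expected"):
        details[key] = int(details[key])
    return details


if __name__ == "__main__":
    for entry in serious_lines(SAMPLE):
        print(entry["ts"], entry["severity"], entry["context"])
        print("  ", checksum_details(entry["message"]))
```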



 Comments   
Comment by Kelsey Schubert [ 28/Sep/17 ]

Hi govind@gameberrylabs.com,

Thank you for reporting this issue. From the stack trace, I believe bad memory is the most likely explanation for this behavior. Would you please test and confirm the health of your memory (e.g. run memtest)?

If memtest does not identify any issues with the memory on the host machine, please provide the following additional information so we can continue to investigate:

  • The complete log files covering at least an hour preceding the segmentation fault
  • The following data files:
    • WiredTiger.*
    • _mdb_catalog.wt
    • sizeStorer.wt
    • An archive of the journal directory
    • An archive of diagnostic.data directory
    • collection-15-2414246657800812719.wt

The logs and diagnostic.data will help us determine whether there was any unusual activity preceding the event. The data files I've requested will allow us to manually inspect them to better understand what happened here.
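
Editorial note: one way to gather the requested data files into a single archive is sketched below using Python's standard `glob` and `tarfile` modules. The dbpath default is taken from the startup log in the description. The archive name and function names are hypothetical. The sketch assumes mongod is not running while the files are read, and it does not cover the log files, which live outside the dbpath.

```python
import glob
import os
import tarfile

# Names and patterns from the list above, relative to the dbpath.
REQUESTED = [
    "WiredTiger.*",
    "_mdb_catalog.wt",
    "sizeStorer.wt",
    "journal",
    "diagnostic.data",
    "collection-15-2414246657800812719.wt",
]


def collect_paths(dbpath, patterns=REQUESTED):
    """Expand each pattern under dbpath; patterns with no match are skipped."""
    found = []
    for pattern in patterns:
        found.extend(sorted(glob.glob(os.path.join(dbpath, pattern))))
    return found


def build_archive(dbpath, archive_path):
    """Write a gzip-compressed tar of the requested files and directories.

    Returns the archived top-level names, relative to dbpath. Directories
    such as journal are added recursively by tarfile.
    """
    paths = collect_paths(dbpath)
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            tar.add(path, arcname=os.path.relpath(path, dbpath))
    return [os.path.relpath(path, dbpath) for path in paths]


if __name__ == "__main__":
    # Hypothetical output name; the dbpath comes from the log in the description.
    names = build_archive("/mnt/md0/mongodb", "server-31284-files.tar.gz")
    print("\n".join(names))
```

Relative archive names are used so the archive does not expose the host's absolute directory layout.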

Should you need it, I've created a secure upload portal for you to provide these files. Files uploaded to this portal are only visible to MongoDB employees investigating this issue and are routinely deleted after some time.

Thank you for your help,
Kelsey
