[SERVER-36139] Cluster-wide crash due to segfaults in LockPinger thread when an SCCC member is started with replication.replSetName option Created: 16/Jul/18  Updated: 06/Dec/22  Resolved: 16/Jul/18

Status: Closed
Project: Core Server
Component/s: Sharding, Stability
Affects Version/s: 3.2.17
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Dmitry Ryabtsev Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-32446 Mongod crashes when config servers up... Closed
Assigned Teams:
Sharding
Operating System: ALL
Steps To Reproduce:
  1. Deploy a 3.2.17 sharded cluster with 3 SCCC config servers
  2. Add the replication.replSetName option to the configuration file of one of the config servers and restart it so that the change takes effect
  3. Observe the mongos and the shard processes crash with a segfault
Participants:
Case:

 Description   

With the legacy SCCC configuration of the config servers, if one of the config servers is accidentally restarted with the replication.replSetName option set, the LockPinger thread on the shards and on the mongos will segfault, apparently due to an unexpected notMaster outcome when trying to work with the config server that has the replSetName option set.

This will likely cause cluster-wide crashes, i.e. an outage!
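
As a rough illustration of the condition described above, here is a minimal sketch of a pre-restart check. It assumes the config file has already been parsed into a nested dict; the function name is hypothetical and not part of any MongoDB tooling. It only flags the combination of a config server role with replication.replSetName set, and cannot tell whether the cluster is actually running SCCC.

```python
def sets_repl_set_name_on_config_server(config: dict) -> bool:
    """Return True if this node is a config server with replSetName set.

    Hypothetical helper: `config` is assumed to be the parsed mongod
    configuration file (nested dicts keyed by section name).
    """
    sharding = config.get("sharding") or {}
    replication = config.get("replication") or {}
    is_config_server = sharding.get("clusterRole") == "configsvr"
    has_repl_set_name = bool(replication.get("replSetName"))
    return is_config_server and has_repl_set_name


# A config server that was given a replSetName (the case from this ticket).
# The replica set name "csrs0" is made up for the example.
changed = {
    "sharding": {"clusterRole": "configsvr"},
    "replication": {"replSetName": "csrs0"},
}
# The same config server without the option.
unchanged = {"sharding": {"clusterRole": "configsvr"}}

print(sets_repl_set_name_on_config_server(changed))    # True
print(sets_repl_set_name_on_config_server(unchanged))  # False
```

On an SCCC 3.2.17 cluster, a True result would be a reason to stop before restarting the node.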

The backtrace looks like this:

"The backtrace"


2018-07-16T00:53:01.415+0000 F - [LockPinger] Invalid access at address: 0x20b7000
2018-07-16T00:53:01.436+0000 F - [LockPinger] Got signal: 11 (Segmentation fault).
 
0x13c6902 0x13c5a59 0x13c5dd8 0x7fa978a4e330 0x7fa97870fb10 0x142a92e 0x1bfe00d 0xa2ec08 0xa2ecae 0x1288c5b 0x12c3c0c 0xa9fc8d 0xa40825 0xaa5a2e 0xaa6132 0xa2fdfb 0xa33005 0xa334c1 0x1202eaf 0x1206d1a 0x120789a 0x1bf42e0 0x7fa978a46184 0x7fa97877303d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"FC6902","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"FC5A59"},{"b":"400000","o":"FC5DD8"},{"b":"7FA978A3E000","o":"10330"},{"b":"7FA978675000","o":"9AB10"},{"b":"400000","o":"102A92E","s":"_ZNSs12_S_constructIPKcEEPcT_S3_RKSaIcESt20forward_iterator_tag"},{"b":"400000","o":"17FE00D","s":"_ZNSsC1EPKcmRKSaIcE"},{"b":"400000","o":"62EC08","s":"_ZN5mongo16ConnectionStringC1ENS_10StringDataESt6vectorINS_11HostAndPortESaIS3_EE"},{"b":"400000","o":"62ECAE","s":"_ZN5mongo16ConnectionString13forReplicaSetENS_10StringDataESt6vectorINS_11HostAndPortESaIS3_EE"},{"b":"400000","o":"E88C5B","s":"_ZN5mongo29ShardingNetworkConnectionHook16validateHostImplERKNS_11HostAndPortERKNS_8executor21RemoteCommandResponseEb"},{"b":"400000","o":"EC3C0C"},{"b":"400000","o":"69FC8D"},{"b":"400000","o":"640825","s":"_ZN5mongo18DBClientConnection7connectERKNS_11HostAndPortE"},{"b":"400000","o":"6A5A2E","s":"_ZN5mongo21SyncClusterConnection8_connectERKSs"},{"b":"400000","o":"6A6132","s":"_ZN5mongo21SyncClusterConnectionC1ERKSt4listINS_11HostAndPortESaIS2_EEd"},{"b":"400000","o":"62FDFB","s":"_ZNK5mongo16ConnectionString7connectERSsd"},{"b":"400000","o":"633005","s":"_ZN5mongo16DBConnectionPool3getERKSsd"},{"b":"400000","o":"6334C1","s":"_ZN5mongo18ScopedDbConnectionC2ERKSsd"},{"b":"400000","o":"E02EAF","s":"_ZN5mongo20LegacyDistLockPinger19_distLockPingThreadENS_16ConnectionStringERKSsNSt6chrono8durationIlSt5ratioILl1ELl1000EEEE"},{"b":"400000","o":"E06D1A","s":"_ZN5mongo20LegacyDistLockPinger18distLockPingThreadENS_16ConnectionStringExRKSsNSt6chrono8durationIlSt5ratioILl1ELl1000EEEE"},{"b":"400000","o":"E0789A","s":"_ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFSt7_Mem_fnIMN5mongo20LegacyDistLockPingerEFvNS4_16ConnectionStringExRKSsNSt6chrono8durationIlSt5ratioILl1ELl1000EEEEEEPS5_S6_xSsSD_EEvEEE6_M_runEv"},{"b":"400000","o":"17F42E0","s":"execute_native_thread_routine"},{"b":"7FA978A3E000","o":"8184"},{"b":"7FA978675000","o":"FE03D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.17", "gitVersion" : "186656d79574f7dfe0831a7e7821292ab380f667", "compiledModules" : [ "enterprise" ], "uname" : { "sysname" : "Linux", "release" : "3.13.0-105-generic", "version" : "#152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "C64FB7E39A41884970087FED27370726E8FF84C6" }, { "b" : "7FFC238BE000", "elfType" : 3, "buildId" : "9C7CBCF6C957D8FC8E55B45A3C7A1556B38A3097" }, { "b" : "7FA97ABCF000", "path" : "/usr/lib/x86_64-linux-gnu/libsasl2.so.2", "elfType" : 3, "buildId" : "666B276BD134B0E9579B67D4EE333F2D0FB813CD" }, { "b" : "7FA97A762000", "path" : "/usr/lib/x86_64-linux-gnu/libnetsnmpmibs.so.30", "elfType" : 3, "buildId" : "8047EB46F312235A7AD5E88665194B9B79823731" }, { "b" : "7FA97A553000", "path" : "/usr/lib/x86_64-linux-gnu/libsensors.so.4", "elfType" : 3, "buildId" : "859FDBFDD82F0EFDEB44A433D9D8020A232A35E2" }, { "b" : "7FA97A34F000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "034D6A4EE9DCAB4A34ABD644345CBBB42DC63088" }, { "b" : "7FA97A0E6000", "path" : "/usr/lib/x86_64-linux-gnu/libnetsnmpagent.so.30", "elfType" : 3, "buildId" : "440F4DBA9B84E851695DA5087266A215A17F05AF" }, { "b" : "7FA979EDC000", "path" : "/lib/x86_64-linux-gnu/libwrap.so.0", "elfType" : 3, "buildId" : "54FCBC5B0F994A13A9B0EAD46F23E7DA7F7FE75B" }, { "b" : "7FA979C02000", "path" : "/usr/lib/x86_64-linux-gnu/libnetsnmp.so.30", "elfType" : 3, "buildId" : "3FA90E3998BC0E2B00C1E751A3690FE919E12042" }, { "b" : "7FA979826000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "CE5EE930D4F0B1F47EDFDACC388EAC6C4DE5CDD2" }, { "b" : "7FA9795DF000", "path" : "/usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "55F72A23CB9C0F7529F0E0BEE43981864B74C4FE" }, { "b" : "7FA9792D9000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "300C7884CDEB5667BEA2357D2B8E7A76397562D6" }, { "b" : "7FA97907A000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "920BD37B19B7BD04CA38CE35155D6CDCD744EAB5" }, { "b" : "7FA978E72000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "4F930712D3609C93E380E5BE5DF73E7AD273531C" }, { "b" : "7FA978C5C000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "36311B4457710AE5578C4BF00791DED7359DBB92" }, { "b" : "7FA978A3E000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "F64B8AD471FBA1B7A3A64EFB01551E694975E1F7" }, { "b" : "7FA978675000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "D9A10B8EF90300628DD0A3A535106967714D7328" }, { "b" : "7FA97ADEA000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "2CA513EDC89C7BC06EC183D1A3A03CC0F606319C" }, { "b" : "7FA9782EC000", "path" : "/usr/lib/libperl.so.5.18", "elfType" : 3, "buildId" : "C0DB67A9F9ACDD77265A72E03557AC3AF3DCB362" }, { "b" : "7FA9780D2000", "path" : "/lib/x86_64-linux-gnu/libnsl.so.1", "elfType" : 3, "buildId" : "77E8046EDCD924AF0081170F3E3BDC4317CCE6A0" }, { "b" : "7FA977E07000", "path" : "/usr/lib/x86_64-linux-gnu/libkrb5.so.3", "elfType" : 3, "buildId" : "77287B3AF8DD293D7367EEF27C652C04353752EC" }, { "b" : "7FA977BD8000", "path" : "/usr/lib/x86_64-linux-gnu/libk5crypto.so.3", "elfType" : 3, "buildId" : "49E3D743C2B3741229AD3892B22C4794C646E1F2" }, { "b" : "7FA9779D4000", "path" : "/lib/x86_64-linux-gnu/libcom_err.so.2", "elfType" : 3, "buildId" : "8D56938ABD6462C4C29822D8E48A131BE1C61F6A" }, { "b" : "7FA9777C9000", "path" : "/usr/lib/x86_64-linux-gnu/libkrb5support.so.0", "elfType" : 3, "buildId" : "0B3ABC152466DE0C69954405A0E980B6E0D4B78F" }, { "b" : "7FA977590000", "path" : "/lib/x86_64-linux-gnu/libcrypt.so.1", "elfType" : 3, "buildId" : "A2CA559CCEB691EF8623361D52671E146DC0B06C" }, { "b" : "7FA97738C000", "path" : "/lib/x86_64-linux-gnu/libkeyutils.so.1", "elfType" : 3, "buildId" : "0F03635F97B93D3DACD84F0ED363C56BD266044F" }, { "b" : "7FA977171000", "path" : "/lib/x86_64-linux-gnu/libresolv.so.2", "elfType" : 3, "buildId" : "AD304AFCE6847F7A4D66D22853E87CCBF5A66966" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x13c6902]
 mongod(+0xFC5A59) [0x13c5a59]
 mongod(+0xFC5DD8) [0x13c5dd8]
 libpthread.so.0(+0x10330) [0x7fa978a4e330]
 libc.so.6(+0x9AB10) [0x7fa97870fb10]
 mongod(_ZNSs12_S_constructIPKcEEPcT_S3_RKSaIcESt20forward_iterator_tag+0x7E) [0x142a92e]
 mongod(_ZNSsC1EPKcmRKSaIcE+0x1D) [0x1bfe00d]
 mongod(_ZN5mongo16ConnectionStringC1ENS_10StringDataESt6vectorINS_11HostAndPortESaIS3_EE+0x78) [0xa2ec08]
 mongod(_ZN5mongo16ConnectionString13forReplicaSetENS_10StringDataESt6vectorINS_11HostAndPortESaIS3_EE+0x4E) [0xa2ecae]
 mongod(_ZN5mongo29ShardingNetworkConnectionHook16validateHostImplERKNS_11HostAndPortERKNS_8executor21RemoteCommandResponseEb+0x83B) [0x1288c5b]
 mongod(+0xEC3C0C) [0x12c3c0c]
 mongod(+0x69FC8D) [0xa9fc8d]
 mongod(_ZN5mongo18DBClientConnection7connectERKNS_11HostAndPortE+0x765) [0xa40825]
 mongod(_ZN5mongo21SyncClusterConnection8_connectERKSs+0x2EE) [0xaa5a2e]
 mongod(_ZN5mongo21SyncClusterConnectionC1ERKSt4listINS_11HostAndPortESaIS2_EEd+0x2A2) [0xaa6132]
 mongod(_ZNK5mongo16ConnectionString7connectERSsd+0xEB) [0xa2fdfb]
 mongod(_ZN5mongo16DBConnectionPool3getERKSsd+0x145) [0xa33005]
 mongod(_ZN5mongo18ScopedDbConnectionC2ERKSsd+0x61) [0xa334c1]
 mongod(_ZN5mongo20LegacyDistLockPinger19_distLockPingThreadENS_16ConnectionStringERKSsNSt6chrono8durationIlSt5ratioILl1ELl1000EEEE+0x2BF) [0x1202eaf]
 mongod(_ZN5mongo20LegacyDistLockPinger18distLockPingThreadENS_16ConnectionStringExRKSsNSt6chrono8durationIlSt5ratioILl1ELl1000EEEE+0x10A) [0x1206d1a]
 mongod(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFSt7_Mem_fnIMN5mongo20LegacyDistLockPingerEFvNS4_16ConnectionStringExRKSsNSt6chrono8durationIlSt5ratioILl1ELl1000EEEEEEPS5_S6_xSsSD_EEvEEE6_M_runEv+0x11A) [0x120789a]
 mongod(execute_native_thread_routine+0x20) [0x1bf42e0]
 libpthread.so.0(+0x8184) [0x7fa978a46184]
 libc.so.6(clone+0x6D) [0x7fa97877303d]
----- END BACKTRACE -----
2018-07-16T00:53:01.436+0000 F - [LockPinger] /proc/self/maps:
00400000-020b7000 r-xp 00000000 ca:01 394237 /var/lib/mongodb-mms-automation/mongodb-linux-x86_64-3.2.17-ent/bin/mongod
2018-07-16T00:53:01.436+0000 F - [LockPinger] 020b8000-02186000 r--p 01cb7000 ca:01 394237 /var/lib/mongodb-mms-automation/mongodb-linux-x86_64-3.2.17-ent/bin/mongod
2018-07-16T00:53:01.436+0000 F - [LockPinger] 02186000-0218e000 rw-p 01d85000 ca:01 394237 /var/lib/mongodb-mms-automation/mongodb-linux-x86_64-3.2.17-ent/bin/mongod
2018-07-16T00:53:01.436+0000 F - [LockPinger] 0218e000-021ff000 rw-p 00000000 00:00 0 
2018-07-16T00:53:01.436+0000 F - [LockPinger] 037da000-043da000 rw-p 00000000 00:00 0 [heap]
2018-07-16T00:53:01.436+0000 F - [LockPinger] 043da000-54b9c000 rw-p 00000000 00:00 0 [heap]
2018-07-16T00:53:01.436+0000 F - [LockPinger] 7fa645c1a000-7fa6c5b1a000 rw-p 00000000 ca:50 100663430 /data/test.5
<...>
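
To help read the log above, here is a small sketch of two checks that can be done by hand. It assumes that each frame's absolute address is the module base ("b") plus the offset ("o") from the JSON backtrace, which matches the bracketed addresses in the symbolized frames. It also assumes that the /proc/self/maps ranges are half-open, as is usual on Linux. The helper names are hypothetical, and the maps list contains only the first three mappings from the excerpt with the log prefix stripped.

```python
import re


def frame_address(base_hex: str, offset_hex: str) -> int:
    """Absolute frame address: module load base plus offset (both hex strings)."""
    return int(base_hex, 16) + int(offset_hex, 16)


# start-end and permissions at the beginning of a /proc/<pid>/maps line
MAPS_LINE = re.compile(r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S+)")


def find_mapping(address: int, maps_lines):
    """Return (start, end, perms) of the mapping containing address, or None."""
    for line in maps_lines:
        match = MAPS_LINE.match(line.strip())
        if not match:
            continue
        start, end = int(match.group(1), 16), int(match.group(2), 16)
        if start <= address < end:  # the end address is exclusive
            return start, end, match.group(3)
    return None


# First three mappings from the log above (timestamp/[LockPinger] prefix removed).
maps_excerpt = [
    "00400000-020b7000 r-xp 00000000 ca:01 394237 /var/lib/mongodb-mms-automation/mongodb-linux-x86_64-3.2.17-ent/bin/mongod",
    "020b8000-02186000 r--p 01cb7000 ca:01 394237 /var/lib/mongodb-mms-automation/mongodb-linux-x86_64-3.2.17-ent/bin/mongod",
    "02186000-0218e000 rw-p 01d85000 ca:01 394237 /var/lib/mongodb-mms-automation/mongodb-linux-x86_64-3.2.17-ent/bin/mongod",
]

# Top frame: {"b":"400000","o":"FC6902"} -> printStackTrace at [0x13c6902]
print(hex(frame_address("400000", "FC6902")))  # 0x13c6902

# "Invalid access at address: 0x20b7000"
print(find_mapping(0x20B7000, maps_excerpt))   # None
```

Among the mappings shown, the faulting address 0x20b7000 falls in none of them: it is exactly the exclusive end of the mongod r-xp mapping, and the next mapping starts at 0x20b8000. The excerpt is truncated, so this says nothing about the mappings that were elided.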

It appears that this problem has already been fixed in v3.2.20, where LockPinger instead logs a warning that the CSRS replica set is not initialized:

"Works fine on 3.2.20"


2018-07-16T01:44:02.021+0000 W SHARDING [LockPinger] distributed lock pinger 'myCluster-config-0.dryabtsev-test.4125.mongodbdns.com:27017,myCluster-config-1.dryabtsev-test.4125.mongodbdns.com:27017,myCluster-config-2.dryabtsev-test.4125.mongodbdns.com:27017/dmn-apple-test-4:27017:1531705231:-1629418921' detected an exception while pinging. :: caused by :: CSRS replica set is not initialized


Generated at Thu Feb 08 04:42:09 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.