[SERVER-20689] Shutting down config replica set can cause mongos to call terminate() when exception of type std::bad_function_call is active
Created: 29/Sep/15  Updated: 07/Oct/15  Resolved: 05/Oct/15

Status: Closed
Project: Core Server
Component/s: Networking, Sharding
Affects Version/s: None
Fix Version/s: 3.1.9

Type: Bug Priority: Major - P3
Reporter: J Rassi Assignee: Adam Midvidy
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Platform A (10/09/15)
Participants:
Linked BF Score: 0

 Description   

As evidenced by recent failures in the sharding jscore passthrough suites, shutting down the config servers can cause mongos to call terminate() while an exception of type std::bad_function_call is active.

Example failure of sharded_collections_jscore_passthrough on Linux:

Example failure of sharding_jscore_passthrough on SSL RHEL 5.5:

The sequence of relevant events seems to be as follows:

  • resmoke.py sends a SIGTERM to all three config servers
  • one of the mongos client threads prints the warning "failed to close stream:Transport endpoint is not connected"
  • shortly after, the same client thread calls terminate()
  • resmoke.py attempts to send a SIGTERM to mongos, but discovers that it has crashed

Excerpt from first failure above:

[ShardedClusterFixture:job0:mongos] 2015-09-29T19:23:07.571+0000 F -        [thread2] terminate() called. An exception is active; attempting to gather more information
[ShardedClusterFixture:job0:mongos] 2015-09-29T19:23:07.571+0000 F -        [thread2] std::exception::what(): bad_function_call
[ShardedClusterFixture:job0:mongos] Actual exception type: std::bad_function_call
[ShardedClusterFixture:job0:mongos] 
[ShardedClusterFixture:job0:mongos]  0xba63e2 0xba5d12 0x2b9e0fc9ae46 0x2b9e0fc9ae73 0xd0a795 0x2b9e1037683d 0x2b9e10661fdd
[ShardedClusterFixture:job0:mongos] ----- BEGIN BACKTRACE -----
[ShardedClusterFixture:job0:mongos] {"backtrace":[{"b":"400000","o":"7A63E2"},{"b":"400000","o":"7A5D12"},{"b":"2B9E0FBDE000","o":"BCE46"},{"b":"2B9E0FBDE000","o":"BCE73"},{"b":"400000","o":"90A795"},{"b":"2B9E10370000","o":"683D"},{"b":"2B9E1058D000","o":"D4FDD"}],"processInfo":{ "mongodbVersion" : "3.1.9-pre-", "gitVersion" : "05e9080be96f05fcf4ed74b996308edd14f2cec1", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "2.6.18-194.el5xen", "version" : "#1 SMP Tue Mar 16 22:01:26 EDT 2010", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "2B9E0F7D1000", "path" : "/lib64/librt.so.1", "elfType" : 3 }, { "b" : "2B9E0F9DA000", "path" : "/lib64/libdl.so.2", "elfType" : 3 }, { "b" : "2B9E0FBDE000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3 }, { "b" : "2B9E0FEDF000", "path" : "/lib64/libm.so.6", "elfType" : 3 }, { "b" : "2B9E10162000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3 }, { "b" : "2B9E10370000", "path" : "/lib64/libpthread.so.0", "elfType" : 3 }, { "b" : "2B9E1058D000", "path" : "/lib64/libc.so.6", "elfType" : 3 }, { "b" : "2B9E0F5B3000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
[ShardedClusterFixture:job0:mongos]  mongos(_ZN5mongo15printStackTraceERSo+0x32) [0xba63e2]
[ShardedClusterFixture:job0:mongos]  mongos(+0x7A5D12) [0xba5d12]
[ShardedClusterFixture:job0:mongos]  libstdc++.so.6(+0xBCE46) [0x2b9e0fc9ae46]
[ShardedClusterFixture:job0:mongos]  libstdc++.so.6(+0xBCE73) [0x2b9e0fc9ae73]
[ShardedClusterFixture:job0:mongos]  mongos(+0x90A795) [0xd0a795]
[ShardedClusterFixture:job0:mongos]  libpthread.so.0(+0x683D) [0x2b9e1037683d]
[ShardedClusterFixture:job0:mongos]  libc.so.6(clone+0x6D) [0x2b9e10661fdd]
[ShardedClusterFixture:job0:mongos] -----  END BACKTRACE  -----
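
For context: std::bad_function_call is the exception the C++ standard library throws when an empty std::function is invoked, and an exception that leaves a thread's entry function uncaught makes the runtime call std::terminate(), which is consistent with the log above. A minimal sketch of the first half; the helper name throwsBadFunctionCall is hypothetical and does not come from the MongoDB source:

```cpp
#include <functional>

// Hypothetical helper, for illustration only.
// Returns true if invoking `fn` raised std::bad_function_call,
// which happens when `fn` holds no callable target.
bool throwsBadFunctionCall(const std::function<void()>& fn) {
    try {
        fn();
    } catch (const std::bad_function_call&) {
        return true;
    }
    return false;
}
```

A default-constructed std::function<void()> passed to this helper yields true; one holding a lambda yields false.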

Assigning to adam.midvidy for triage.



 Comments   
Comment by Githook User [ 05/Oct/15 ]

Author: Adam Midvidy (amidvidy) <amidvidy@gmail.com>

Message: SERVER-20689 pull refresh callback out of member variable so we don't use a moved-from closure
Branch: master
https://github.com/mongodb/mongo/commit/bd76599a88e8062c634eeb6918bb30f868bc6042
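
The pattern named in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the code from the commit: the Connection class, its members, and the string status argument are all hypothetical. The idea is to move the callback out of the member into a local, leave the member explicitly empty, and invoke only the local, so nothing ever calls a moved-from std::function (whose state after a move is valid but unspecified, and empty in common implementations).

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for a pooled connection; not MongoDB's actual class.
class Connection {
public:
    using Callback = std::function<void(const std::string&)>;

    void setRefreshCallback(Callback cb) { _refreshCallback = std::move(cb); }

    // Pulls the callback out of the member before invoking it, so the member
    // is never called after it has been moved from.
    bool finishRefresh(const std::string& status) {
        Callback cb = std::move(_refreshCallback);
        _refreshCallback = nullptr;  // make the post-move state explicit: empty
        if (!cb) {
            return false;  // nothing to call; avoids std::bad_function_call
        }
        cb(status);
        return true;
    }

    bool hasRefreshCallback() const { return static_cast<bool>(_refreshCallback); }

private:
    Callback _refreshCallback;
};
```

With this shape, a second finishRefresh() call without a newly installed callback returns false instead of throwing.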

Comment by Adam Midvidy [ 03/Oct/15 ]

The failure seems to have occurred again.

Comment by Adam Midvidy [ 02/Oct/15 ]

I think we have fixed this issue. I will leave the BF ticket open until the passthrough suites have run for a while without failure, as this was non-deterministic.

Comment by Githook User [ 02/Oct/15 ]

Author: Adam Midvidy (amidvidy) <amidvidy@gmail.com>

Message: SERVER-20689 onFinish should be set when ConnectionPool refreshes a connection
Branch: master
https://github.com/mongodb/mongo/commit/375f918358956a11b977b3fb7dc60079ce8f0218

Comment by J Rassi [ 02/Oct/15 ]

adam.midvidy, here's a failed sharding_jscore_passthrough_WT run with the additional diagnostics output: task, logs.

Comment by Githook User [ 02/Oct/15 ]

Author: Adam Midvidy (amidvidy) <amidvidy@gmail.com>

Message: SERVER-20689 improve diagnostics in the case that we leak an exception from an IO worker thread
Branch: master
https://github.com/mongodb/mongo/commit/466f05960f115b0bb5e565d88cedbed9d7b47328
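
One way diagnostics of this kind might look, as a sketch only: the body of a worker thread is wrapped so that an exception which would otherwise escape (and trigger terminate()) is described first. The function runWorkerBody and its message format are hypothetical and are not taken from the commit.

```cpp
#include <exception>
#include <functional>
#include <sstream>
#include <string>
#include <typeinfo>

// Hypothetical wrapper for a worker thread's body; not the code from the commit.
// Runs `body` and returns an empty string on success, or a description of the
// exception that would otherwise have escaped the thread.
std::string runWorkerBody(const std::function<void()>& body) {
    try {
        body();
    } catch (const std::exception& ex) {
        std::ostringstream os;
        // typeid(ex).name() is implementation-defined (often a mangled name).
        os << "worker leaked exception (" << typeid(ex).name() << "): " << ex.what();
        return os.str();
    } catch (...) {
        return "worker leaked an exception of unknown type";
    }
    return std::string();
}
```

A caller could log the returned string before deciding whether to abort, which gives more to go on than a bare backtrace.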
