[SERVER-25029] Segmentation fault in mongos when config servers not available Created: 13/Jul/16 Updated: 18/Dec/17 Resolved: 22/Jul/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.2.8 |
| Fix Version/s: | 3.2.9 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Aristarkh Zagorodnikov | Assignee: | Misha Tyulenev |
| Resolution: | Done | Votes: | 0 |
| Labels: | code-and-test | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Ubuntu 14.04 x64 on Hyper-V 2012R2 w/Docker 1.12 |
||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||
| Steps To Reproduce: | Create a CSRS and start mongos: |
||||||||||||||||||||
| Sprint: | Sharding 18 (08/05/16) | ||||||||||||||||||||
| Participants: | |||||||||||||||||||||
| Description |
|
Mongos crashes on startup occasionally (3 out of 10 times so far). |
| Comments |
| Comment by Githook User [ 22/Jul/16 ] | |||||||||||||||||||||||||||||
|
Author: Misha Tyulenev (mikety, misha@mongodb.com) Message: | |||||||||||||||||||||||||||||
| Comment by Misha Tyulenev [ 21/Jul/16 ] | |||||||||||||||||||||||||||||
|
Pushed the testcase to master, but the fix is applicable only to v3.2 | |||||||||||||||||||||||||||||
| Comment by Githook User [ 21/Jul/16 ] | |||||||||||||||||||||||||||||
|
Author: Misha Tyulenev (mikety, misha@mongodb.com) Message: | |||||||||||||||||||||||||||||
| Comment by Misha Tyulenev [ 20/Jul/16 ] | |||||||||||||||||||||||||||||
|
The code that read the isMaster response for the CSRS is not in the master branch: it went away with the removal of SCCC. Hence the issue exists only in 3.2. | |||||||||||||||||||||||||||||
| Comment by Andy Schwerin [ 20/Jul/16 ] | |||||||||||||||||||||||||||||
|
Also, misha, your diagnosis appears to be correct. What I don't know right now is why the crash isn't happening on master. | |||||||||||||||||||||||||||||
| Comment by Andy Schwerin [ 20/Jul/16 ] | |||||||||||||||||||||||||||||
|
misha.tyulenev, perhaps you were trying to reproduce this on the master branch, where it does not appear. I can definitely reproduce this on the 3.2 branch using onyxmaster's described reproduction. Actually, when you launch the mongos, you can just pass --configdb config/localhost:10001. | |||||||||||||||||||||||||||||
| Comment by Aristarkh Zagorodnikov [ 20/Jul/16 ] | |||||||||||||||||||||||||||||
|
I wonder what the problem with reproduction is: it's 100% reproducible in virtual, physical, or containerized environments, at least on Ubuntu 14.04 and 16.04. You don't even need any real servers with data (in fact, even one server is enough). Preparation:
Repro:
| |||||||||||||||||||||||||||||
| Comment by Misha Tyulenev [ 19/Jul/16 ] | |||||||||||||||||||||||||||||
|
I could not reproduce it, but by analyzing the code I think I found the issue. When mongos tries to connect to the config servers, it expects the isMaster response to contain a "setName" field, which is present only for initiated replica sets; if the CSRS is still initiating, the missing field causes a segfault. | |||||||||||||||||||||||||||||
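The diagnosis above can be illustrated with a minimal sketch. This is Python, not the mongos C++ code, and it is not the actual fix: the helper name `connection_string_from_ismaster` and the sample response dictionaries are hypothetical, and only the "setName" field name and the `config/localhost:10001` form come from this ticket. It shows checking both presence and type of the field before building a connection string, instead of assuming the field is there.

```python
from typing import Mapping, Optional


def connection_string_from_ismaster(
    response: Mapping[str, object], host: str
) -> Optional[str]:
    """Return 'setName/host' if the response names a replica set, else None.

    Hypothetical helper: a node that has not been initiated yet is assumed
    to reply without a "setName" field, so the field is validated before use.
    """
    set_name = response.get("setName")
    # Reject a missing field, a non-string value, and an empty string alike.
    if not isinstance(set_name, str) or not set_name:
        return None
    return f"{set_name}/{host}"


# Illustrative replies (field values are made up for the sketch).
initiated = {"ismaster": True, "setName": "config"}
still_initiating = {"ismaster": False}

ready = connection_string_from_ismaster(initiated, "localhost:10001")
not_ready = connection_string_from_ismaster(still_initiating, "localhost:10001")
```

With these inputs, `ready` is a usable connection string and `not_ready` is `None`, so the caller can report an error or retry rather than dereference a missing value.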
| Comment by Ramon Fernandez Marina [ 13/Jul/16 ] | |||||||||||||||||||||||||||||
|
I'm using mtools:
mongos indeed tries to connect before the replica set is ready. | |||||||||||||||||||||||||||||
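Since the crash depends on mongos starting before the replica set is ready, a launcher script could wait for readiness first. The following is only a sketch of that idea under stated assumptions, not something proposed in this ticket: `wait_for_set_name` is a hypothetical helper, and `probe` stands in for whatever call returns a node's isMaster reply as a dictionary.

```python
import time
from typing import Callable, Mapping


def wait_for_set_name(
    probe: Callable[[], Mapping[str, object]],
    attempts: int = 10,
    delay_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `probe` until its reply carries a non-empty string "setName".

    Raises TimeoutError if no such reply arrives within `attempts` polls.
    """
    for attempt in range(attempts):
        set_name = probe().get("setName")
        if isinstance(set_name, str) and set_name:
            return set_name
        # Sleep between polls, but not after the final failed attempt.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    raise TimeoutError(f"replica set not initiated after {attempts} attempts")
```

The `sleep` parameter is injectable so the loop can be exercised without real delays; in a real script, `probe` would need to be backed by an actual driver call.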
| Comment by Andy Schwerin [ 13/Jul/16 ] | |||||||||||||||||||||||||||||
|
ramon.fernandez, could you attach instructions for your reproduction, or better yet, a script? | |||||||||||||||||||||||||||||
| Comment by Kaloian Manassiev [ 13/Jul/16 ] | |||||||||||||||||||||||||||||
|
This was very likely introduced by this commit: https://github.com/mongodb/mongo/commit/ab8867291f5da498ab96fbc850db51d573bd0c2b On this line we try to construct a ConnectionString with the response of isMaster. The comment for valueStringData says that it does not check the type:
So the hypothesis is that we somehow got an empty setName due to the CSRS nodes still starting up and tried to cast that empty BSON to string. | |||||||||||||||||||||||||||||
| Comment by Ramon Fernandez Marina [ 13/Jul/16 ] | |||||||||||||||||||||||||||||
|
Here's the parsed stack trace:
| |||||||||||||||||||||||||||||
| Comment by Ramon Fernandez Marina [ 13/Jul/16 ] | |||||||||||||||||||||||||||||
|
Thanks for your report onyxmaster, we're investigating. EDIT: I can reproduce this easily; it is indeed the case that it happens when the config servers are not yet available when mongos starts. | |||||||||||||||||||||||||||||
| Comment by Aristarkh Zagorodnikov [ 13/Jul/16 ] | |||||||||||||||||||||||||||||
|
It appears to fail when the CSRS is not available yet at startup time. |