[SERVER-14910] mongos crash after temporary connect error to a config server Created: 15/Aug/14 Updated: 15/Jan/15 Resolved: 15/Jan/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.4.10 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Taha Jahangir | Assignee: | Unassigned |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
|
| Operating System: | ALL |
| Participants: |
| Description |
|
After a temporary connection error to a config server, mongos crashed with signal 11:
Balancing is off (and not running) on the mongos servers. |
| Comments |
| Comment by Ramon Fernandez Marina [ 15/Jan/15 ] |
|
Thanks for uploading the logs, and apologies for the late reply taha_jahangir. Unfortunately we have been unable to reproduce the issue and we haven't seen any more cases of it, so I'm resolving this ticket. Feel free to reopen it if this happens again. Regards, |
| Comment by Taha Jahangir [ 27/Sep/14 ] |
|
The crash happened only once. The log file of the mongos server is attached. |
| Comment by Ramon Fernandez Marina [ 25/Sep/14 ] |
|
taha_jahangir, is this still an issue for you? We'd like to examine the full logs from the time you start a mongos until it crashes with the errors above. Hopefully this will give us more information to help us understand what's going on. If any of your mongos crashes again, can you please upload the full logs to this ticket? Thanks, |
| Comment by Taha Jahangir [ 04/Sep/14 ] |
|
This error occurred on a production server (under relatively high load), and we have not tried to reproduce it (on a production server!). Here is part of the full log file: (the first 5 lines are not related to this problem). There is no line like `reason: errno`.
The MongoDB setup contains 4 shards (each with 2 data nodes and 1 arbiter), 3 config servers and 2 mongos. I wrote details about the config servers in my previous note. I think the only information available for debugging is the stack trace provided in the logs. |
| Comment by Ramon Fernandez Marina [ 02/Sep/14 ] |
|
I tried to reproduce this behavior by killing one of my config servers in a test setup, but I was not able to:
I agree that mongos should not go belly up in these circumstances, but in order to track down the problem we'll need more information. Are you able to reliably reproduce this behavior? Can you send us more detailed logs? Perhaps the line with reason: errno like the one above can provide a useful hint. Are you able to provide more details about your setup? Thanks, |
| Comment by Taha Jahangir [ 27/Aug/14 ] |
|
There are 3 config servers: two in the same DC and one in another DC (this is the `servername`). The config server at servername:27001 is always up, but network errors are not unusual (it is in another DC). I think the problem is not whether a config server is or is not listening on that socket. `mongos` should not die when a config server is not accessible. |
| Comment by Ramon Fernandez Marina [ 25/Aug/14 ] |
|
How many config servers do you have? Also, can you check whether there's a mongod process listening on servername:27001? |
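One possible way to run such a check from the mongos host is sketched below, using only the Python standard library. This is an illustration, not part of the original exchange: the helper name `is_port_open` and the 3-second timeout are assumptions, and `servername:27001` is the placeholder address used in this ticket. A successful TCP connection only shows that something is accepting connections on that port, not that it is a healthy mongod.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration; it does not speak the MongoDB wire
    protocol, so it cannot confirm that the listener is actually mongod.
    """
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example call (placeholder address from this ticket):
# is_port_open("servername", 27001)
```

Since the reported network errors are intermittent, a single check like this would only describe reachability at the moment it runs.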
| Comment by Taha Jahangir [ 15/Aug/14 ] |
|
mongos version: 2.4.10, on (updated) Ubuntu 14.04 |