[SERVER-27863] Reschedule early alarms in NetworkInterfaceASIO to avoid mongos crash Created: 31/Jan/17 Updated: 17/Jul/17 Resolved: 27/Feb/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Shell |
| Affects Version/s: | 3.4.1 |
| Fix Version/s: | 3.4.3, 3.5.4 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Edik Mkoyan | Assignee: | Jonathan Reams |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v3.4 |
| Steps To Reproduce: | It happens during non-busy hours. |
| Participants: |
| Description |
|
Mongos servers are crashing with the following log messages
|
| Comments |
| Comment by Githook User [ 06/Mar/17 ] |
|
Author: Jonathan Reams (jbreams) <jbreams@mongodb.com>
Message: (cherry picked from commit f725e5137561ba5a521d0f5eb6a60bdeebf34c24) |
| Comment by Jonathan Reams [ 24/Feb/17 ] |
|
edikmkoyan, there may be a deeper issue here, but we should definitely not crash the server if an alarm fires early, so I've just pushed a fix that logs when an alarm fires early and reschedules it. I'll get it back-ported to 3.4 and it should be in the next release. |
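The behaviour described in the comment above (log an early alarm and reschedule it instead of crashing) might look roughly like the sketch below. This is not the code from the actual commit: `FakeTimer`, `Alarm`, and `onAlarmFired` are hypothetical names invented for illustration, and the timer only records requested waits rather than using a real clock or the server's networking layer.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <iostream>
#include <vector>

using Millis = std::chrono::milliseconds;

// Hypothetical stand-in for a timer service. It records the waits that were
// requested instead of sleeping, so the control flow is easy to follow.
struct FakeTimer {
    Millis now{0};                  // current time as seen by the handler
    std::vector<Millis> scheduled;  // every wait that has been requested

    void scheduleAfter(Millis wait) { scheduled.push_back(wait); }
};

// Hypothetical alarm: run `action` once `deadline` has been reached.
struct Alarm {
    Millis deadline;
    std::function<void()> action;
};

// Called whenever the timer fires for `alarm`. Returns true if the action ran.
// If the timer fired before the deadline, log it and re-arm the timer for the
// remaining time instead of treating the early wake-up as a fatal error.
bool onAlarmFired(FakeTimer& timer, const Alarm& alarm) {
    assert(alarm.action && "an alarm needs an action to run");
    if (timer.now < alarm.deadline) {
        const Millis remaining = alarm.deadline - timer.now;
        std::clog << "Alarm fired " << remaining.count()
                  << "ms early; rescheduling\n";
        timer.scheduleAfter(remaining);
        return false;
    }
    alarm.action();
    return true;
}
```

The point of the sketch is that the early wake-up is an expected, recoverable condition: the handler compares the current time with the deadline, reschedules for the difference, and only runs the action once the deadline has really passed.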
| Comment by Githook User [ 24/Feb/17 ] |
|
Author: Jonathan Reams (jbreams) <jbreams@mongodb.com>
Message: |
| Comment by Edik Mkoyan [ 09/Feb/17 ] |
|
I have upgraded my cluster to 3.4.2 and it happened today, too.
|
| Comment by Andy Schwerin [ 31/Jan/17 ] |
|
I don't think it matters too much what the stack trace is, but I would look for evidence in the logs of the clock jumping around. I have three hypotheses:
I think we should
|
| Comment by Ramon Fernandez Marina [ 31/Jan/17 ] |
|
Thanks for your report edikmkoyan, and sorry you're having issues with mongos. We're investigating this issue and will post updates in this ticket. Regards, |