[SERVER-17043] Replication code does not retry interrupted system call Created: 25/Jan/15 Updated: 18/Sep/15 Resolved: 17/Feb/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Networking, Replication |
| Affects Version/s: | 3.0.0-rc6 |
| Fix Version/s: | 3.0.0-rc9, 3.1.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | Andrew Morrow (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Completed: | |
| Participants: |
| Description |
|
While debugging an issue involving a replica set using gdb, I got the following error:
|
| Comments |
| Comment by Githook User [ 19/Feb/15 ] |
|
Author: Andrew Morrow (acmorrow, acm@mongodb.com) Message: (cherry picked from commit 490b0b2b14fa14b463ef612b79386d20d95b4057) |
| Comment by Githook User [ 17/Feb/15 ] |
|
Author: Andrew Morrow (acmorrow, acm@mongodb.com) Message: |
| Comment by Bruce Lucas (Inactive) [ 10/Feb/15 ] |
|
The problem is in ConnectBG::run in sock.cpp - the call to ::connect() isn't retried in case of EINTR. |
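A minimal sketch of the kind of retry loop this comment implies, assuming a POSIX system. The helper names `retryOnEintr` and `connectRetrying` are hypothetical and are not taken from sock.cpp; this illustrates the general EINTR-retry idiom, not the committed fix.

```cpp
#include <cerrno>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper (not from the MongoDB source): invokes `call`
// repeatedly until it either succeeds or fails with an errno other
// than EINTR. `call` must return -1 and set errno on failure, as
// POSIX system calls do.
template <typename Call>
int retryOnEintr(Call call) {
    int res;
    do {
        res = call();
    } while (res == -1 && errno == EINTR);
    return res;
}

// Hypothetical wrapper: a ::connect() that is not abandoned just
// because a signal (for example, one delivered while a debugger is
// attached) interrupted the call.
inline int connectRetrying(int fd, const sockaddr* addr, socklen_t len) {
    return retryOnEintr([&] { return ::connect(fd, addr, len); });
}
```

One caveat with this idiom: POSIX specifies that a connect() interrupted by a signal continues asynchronously, so on some platforms the retried call may fail with EALREADY or EISCONN. A more portable variant would wait for the socket to become writable with poll() and then read SO_ERROR via getsockopt().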
| Comment by Bruce Lucas (Inactive) [ 10/Feb/15 ] |
|
Log attached. There's nothing in the log prior to the complaint about the interrupted system call that causes the primary to step down. A couple of other system calls report being interrupted and retried. |
| Comment by Scott Hernandez (Inactive) [ 25/Jan/15 ] |
|
Please include the logs from before this point. If this happened after the timeout period, it will not be retried; that behavior is designed and intended.
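One way to read this comment is that an interrupted call is retried only while a timeout has not yet elapsed. The sketch below illustrates that policy under that assumption; the helper name `retryOnEintrUntil` and the use of `std::chrono::steady_clock` are hypothetical and do not describe the server's actual implementation.

```cpp
#include <cerrno>
#include <chrono>

// Hypothetical helper: retries `call` on EINTR, but only while the
// deadline has not passed. Once the deadline has passed, the EINTR
// failure is returned to the caller (-1 with errno still EINTR).
template <typename Call>
int retryOnEintrUntil(Call call,
                      std::chrono::steady_clock::time_point deadline) {
    for (;;) {
        const int res = call();
        if (res != -1 || errno != EINTR) {
            return res;  // success, or a failure unrelated to signals
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            return -1;   // timed out: give up instead of retrying
        }
    }
}
```

Under this policy, an "interrupted system call" error reaching the log would be expected only after the deadline had passed, which is why the earlier log lines matter for telling the two cases apart.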