[SERVER-22794] Add retry to continuous config primary step-down thread when primary steps down and closes all connections
Created: 22/Feb/16 | Updated: 21/Nov/16 | Resolved: 04/Mar/16 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 3.2.5, 3.3.3 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Dianna Hohensee (Inactive) | Assignee: | Dianna Hohensee (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | test-only | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Backport Completed: | |||||
| Sprint: | Sharding 11 (03/11/16) | ||||
| Participants: | |||||
| Linked BF Score: | 0 | ||||
| Description |
|
The continuous config primary step-down thread first tries to contact a config server. The C++ code that is eventually called opens a connection to the specified config server and follows up with an isMaster command. However, if the config server is a primary, it may step down between the creation of the connection and the isMaster call. When a primary steps down it closes all of its connections, so the follow-up isMaster command throws an error. In this case, the error should be caught and the command retried, rather than the thread quitting with an error. |
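A minimal sketch of the retry described above, not the actual fix. It uses stand-in classes instead of a real driver: `FakeConfigServer`, `FakeConnection`, `ConnectionClosedError`, and `connect_with_retry` are all hypothetical names introduced for illustration, and the attempt limit is an assumption.

```python
class ConnectionClosedError(Exception):
    """Hypothetical error raised when the server closes the connection."""


class FakeConfigServer:
    """Stand-in for a config server that steps down a fixed number of times."""

    def __init__(self, step_downs_pending=1):
        self.step_downs_pending = step_downs_pending
        self.connects = 0

    def connect(self):
        self.connects += 1
        return FakeConnection(self)


class FakeConnection:
    def __init__(self, server):
        self._server = server

    def is_master(self):
        # Simulate a step-down between connect() and isMaster:
        # the server closes the connection, so the command fails.
        if self._server.step_downs_pending > 0:
            self._server.step_downs_pending -= 1
            raise ConnectionClosedError("connection closed by step-down")
        return {"ismaster": False, "ok": 1}


def connect_with_retry(server, max_attempts=3):
    """Connect and run isMaster, reconnecting if the connection was closed."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        conn = server.connect()
        try:
            return conn, conn.is_master()
        except ConnectionClosedError as err:
            # The old connection is dead; retry with a fresh one.
            last_error = err
    raise last_error


server = FakeConfigServer(step_downs_pending=1)
conn, reply = connect_with_retry(server)
print(server.connects, reply["ok"])  # prints "2 1"
```

The retry is bounded so that a server that keeps closing connections still surfaces an error instead of looping forever.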
| Comments |
| Comment by Githook User [ 22/Mar/16 ] |
|
Author: Dianna Hohensee (DiannaHohensee, dianna.hohensee@10gen.com)
Message: (cherry picked from commit 1e7fd17ee33d8c7c8e6c49e590c722ff71c7079e) Conflicts: |
| Comment by Githook User [ 04/Mar/16 ] |
|
Author: Dianna Hohensee (DiannaHohensee, dianna.hohensee@10gen.com)
Message: |
| Comment by Dianna Hohensee (Inactive) [ 22/Feb/16 ] |
|
I'm not sure this case is generic enough to handle lower down. Here, nothing important is affected if a step-down occurs, and retrying will definitely work (a config server can't step down again immediately after just doing so). But in most cases, is a mongod closing its connections this benign, and does reconnecting work identically? |
| Comment by Scott Hernandez (Inactive) [ 22/Feb/16 ] |
|
This seems like it should be handled at the network layer, not this high up. |