[SERVER-22794] Add retry to continuous config primary step-down thread when primary steps down and closes all connections
Created: 22/Feb/16  Updated: 21/Nov/16  Resolved: 04/Mar/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 3.2.5, 3.3.3

Type: Bug Priority: Major - P3
Reporter: Dianna Hohensee (Inactive) Assignee: Dianna Hohensee (Inactive)
Resolution: Done Votes: 0
Labels: test-only
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Sprint: Sharding 11 (03/11/16)
Participants:
Linked BF Score: 0

 Description   

The continuous config primary step-down thread first tries to contact a config server. The C++ code that is eventually called opens a connection to the specified config server and follows up with an isMaster command. However, if that config server is the primary, it may step down between the connection being created and isMaster being called. When a primary steps down it closes all of its connections, so the follow-up isMaster command throws an error.

In this case, the error should be caught and the command retried, rather than quitting with an error.
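
The catch-and-retry behavior described above could be sketched roughly as follows. This is an illustration only, not the code from the actual commit: retryOnNetworkError, makeFlakyContact, and looksLikeNetworkError are hypothetical names, the error message is invented, and the config server is simulated by a function that throws on its first call, as if the primary had stepped down and closed the connection before isMaster ran.

```javascript
// Hypothetical helper (not from the MongoDB codebase): run `operation`,
// retrying only when it fails with what the caller considers a network error.
function retryOnNetworkError(operation, isNetworkError, maxAttempts) {
    let lastError;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            return operation();
        } catch (e) {
            if (!isNetworkError(e)) {
                throw e;  // Unrelated failures should still surface immediately.
            }
            lastError = e;  // Connection was closed; try again with a fresh attempt.
        }
    }
    throw lastError;  // Every attempt failed with a network error.
}

// Simulated config server contact: the first `failures` calls throw, as if the
// primary stepped down between connecting and running isMaster.
function makeFlakyContact(failures) {
    let calls = 0;
    return function contact() {
        calls++;
        if (calls <= failures) {
            throw new Error("network error: connection closed by peer");  // invented message
        }
        return {ok: 1, attempts: calls};
    };
}

// Assumed predicate for this sketch; a real test would use whatever check
// the shell provides for network errors.
const looksLikeNetworkError = (e) => /network error/.test(e.message);

const result = retryOnNetworkError(makeFlakyContact(1), looksLikeNetworkError, 3);
```

Bounding the number of attempts keeps a genuinely unreachable server from hanging the thread, while rethrowing non-network errors avoids masking unrelated failures.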



 Comments   
Comment by Githook User [ 22/Mar/16 ]

Author: Dianna Hohensee (DiannaHohensee) <dianna.hohensee@10gen.com>

Message: SERVER-22794 fixing network error in continuous step down thread due to unaccounted for primary step down closing connections

(cherry picked from commit 1e7fd17ee33d8c7c8e6c49e590c722ff71c7079e)

Conflicts:
jstests/libs/override_methods/sharding_continuous_config_stepdown.js
Branch: v3.2
https://github.com/mongodb/mongo/commit/53aebe079abfe52a4db1a2414c2b1be11834e5ea

Comment by Githook User [ 04/Mar/16 ]

Author: Dianna Hohensee (DiannaHohensee) <dianna.hohensee@10gen.com>

Message: SERVER-22794 fixing network error in continuous step down thread due to unaccounted for primary step down closing connections
Branch: master
https://github.com/mongodb/mongo/commit/1e7fd17ee33d8c7c8e6c49e590c722ff71c7079e

Comment by Dianna Hohensee (Inactive) [ 22/Feb/16 ]

kaloian.manassiev

Comment by Dianna Hohensee (Inactive) [ 22/Feb/16 ]

I'm not sure this case is generic enough to handle lower down. Here, a step-down doesn't affect anything important, and retrying will definitely work, since a config server can't step down again immediately after just doing so. But in most cases, does a mongod closing its connections indicate something this benign, and does reconnecting work identically?

Comment by Scott Hernandez (Inactive) [ 22/Feb/16 ]

This seems like it should be handled at the network layer, not this high up.
