[SERVER-22107] Improve error message when ReplicaSetMonitor cannot connect to a replSet node in mongos Created: 08/Jan/16 Updated: 06/Dec/22 Resolved: 12/Dec/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.6.11 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Emily Stolfo | Assignee: | [DO NOT USE] Backlog - Sharding Team |
| Resolution: | Done | Votes: | 0 |
| Labels: | PM550 | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Sharding |
| Operating System: | ALL |
| Description |
|
When mongos cannot connect to any member of a shard replica set for an extended period (more than 5 minutes), it removes the in-memory ReplicaSetMonitor for that set. As a consequence, it starts returning an "unknown replica set" error instead of the usual "cannot connect to host X" error. Original summary:
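To make the failure mode concrete, here is a minimal Python sketch of a monitor registry that forgets a set after 5 minutes without contact. It is not mongos code (mongos is C++); `MonitorRegistrySketch`, `UnknownReplicaSetError`, and `FORGET_AFTER_SECS` are hypothetical names chosen for illustration. It shows why, once the entry is evicted, the only error the registry can produce is "unknown replica set", because the context about which hosts were unreachable has been discarded along with the monitor.

```python
import time
from typing import Callable, Dict, List

# Hypothetical threshold mirroring the "> 5 minutes" behaviour in the description.
FORGET_AFTER_SECS = 5 * 60


class UnknownReplicaSetError(Exception):
    """Hypothetical error raised when no monitor exists for the requested set."""


class ReplicaSetMonitorSketch:
    """Hypothetical stand-in for a per-set monitor; not the real mongos class."""

    def __init__(self, set_name: str, seeds: List[str], now: float) -> None:
        self.set_name = set_name
        self.seeds = seeds
        self.last_contact = now  # last time any member of the set answered


class MonitorRegistrySketch:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so the eviction can be tested
        self._monitors: Dict[str, ReplicaSetMonitorSketch] = {}

    def add(self, set_name: str, seeds: List[str]) -> None:
        self._monitors[set_name] = ReplicaSetMonitorSketch(set_name, seeds, self._clock())

    def record_contact(self, set_name: str) -> None:
        self._monitors[set_name].last_contact = self._clock()

    def evict_unreachable(self) -> None:
        # Drop every set whose members have all been silent for too long.
        now = self._clock()
        stale = [name for name, m in self._monitors.items()
                 if now - m.last_contact > FORGET_AFTER_SECS]
        for name in stale:
            del self._monitors[name]

    def get(self, set_name: str) -> ReplicaSetMonitorSketch:
        monitor = self._monitors.get(set_name)
        if monitor is None:
            # After eviction the seeds and last errors are gone, so the
            # registry can only say that it does not know the name.
            raise UnknownReplicaSetError(f"unknown replica set {set_name}")
        return monitor
```

With an injected clock, adding `shard0` at t=0 and calling `evict_unreachable()` at t=301 makes a later `get("shard0")` raise `UnknownReplicaSetError`, even though the real problem is that no host could be contacted.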
|
| Comments |
| Comment by Andy Schwerin [ 12/Dec/16 ] |
|
As of 3.2.11 and 3.4.0, mongos no longer forgets about replica sets it has had trouble contacting, so the message should no longer appear. |
| Comment by Bernie Hackett [ 13/Jan/16 ] |
|
I see. I misunderstood and thought you were saying it returned this error if it couldn't communicate with any individual seed. The problem is when it can't talk to the shard at all. In that case, I think the error message needs some work. The current message makes it sound like the replica set's setName changed or something. |
| Comment by Randolph Tan [ 13/Jan/16 ] |
|
I believe that is what is happening - mongos can't complete the request because it can't connect to the shard. |
| Comment by Bernie Hackett [ 13/Jan/16 ] |
|
But why is that reported back to the client? This seems like something that should be logged by mongos. It should only cause a client side error if mongos can't complete the request. |
| Comment by Randolph Tan [ 13/Jan/16 ] |
|
Mongos can return this error message if it cannot connect to any node in the seed list (for mongos, the seed list is extracted from config.shards, which is updated whenever mongos detects membership changes). I propose that we change the message to something that gives more context. Note that mongos also 'forgets' cached replica sets if it cannot contact any of their members for 5 minutes. The cache entry is repopulated when mongos next needs to talk to the replica set and at least one member in the seed list can be reached. |
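A rough Python sketch of the two behaviours described in this comment: repopulating the cache once any seed answers, and, when none does, reporting an error that names the set, the seed list, and each host's failure instead of "unknown replica set". Everything here is hypothetical: `get_or_recreate_monitor`, `unreachable_message`, `ShardUnreachableError`, and the `probe` callback are illustrative names, the cache stores a plain seed list as a stand-in for a real monitor, and the message wording is only one possible form of the "more context" proposed above.

```python
from typing import Callable, Dict, List


class ShardUnreachableError(Exception):
    """Hypothetical error type carrying the extra context proposed above."""


def unreachable_message(set_name: str, seeds: List[str], errors: Dict[str, str]) -> str:
    # Name the set, list every seed that was tried, and say why each failed.
    details = "; ".join(f"{host}: {errors.get(host, 'no response')}" for host in seeds)
    return (f"could not connect to any member of replica set {set_name} "
            f"(seed list from config.shards: {','.join(seeds)}): {details}")


def get_or_recreate_monitor(cache: Dict[str, List[str]], set_name: str,
                            seeds: List[str],
                            probe: Callable[[str], str]) -> List[str]:
    """Return the cached entry, recreating it if any seed is reachable.

    probe(host) is a hypothetical callback returning "" on success or an
    error description on failure. The cached value (a copy of the seed
    list) stands in for a real ReplicaSetMonitor.
    """
    if set_name in cache:
        return cache[set_name]
    errors: Dict[str, str] = {}
    for host in seeds:
        err = probe(host)
        if not err:
            cache[set_name] = list(seeds)  # one reachable seed is enough
            return cache[set_name]
        errors[host] = err
    # No seed answered: fail with context instead of "unknown replica set".
    raise ShardUnreachableError(unreachable_message(set_name, seeds, errors))
```

Keeping the per-host errors in the exception means a client sees "could not connect to any member of replica set shard0 ..." with the reasons attached, which addresses the concern that the old message reads as if the set's name had changed.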