[SERVER-22485] ShardNotFound error when looking up replica set with hosts in a different order than is stored in the ShardRegistry Created: 05/Feb/16 Updated: 06/Dec/22 Resolved: 15/Nov/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.2.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Shakir Sadikali | Assignee: | [DO NOT USE] Backlog - Sharding EMEA |
| Resolution: | Done | Votes: | 3 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
| Issue Links: |
| Assigned Teams: | Sharding EMEA |
| Operating System: | ALL |
| Sprint: | Sharding 10 (02/19/16), Sharding 11 (03/11/16), Sharding 12 (04/01/16), Sharding 16 (06/24/16), Sharding 18 (08/05/16) |
| Participants: |
| Case: | (copied to CRM) |
| Description |
|
We have a 4 shard cluster. We added 4 new shards. All operations that need to go against the entire cluster fail with errors of the following form.
Bouncing the mongos does not resolve the issue. |
| Comments |
| Comment by Kaloian Manassiev [ 15/Nov/21 ] |
|
With the removal of the legacy shard versioning path in 4.0 and later, this reverse lookup no longer happens, so the host-ordering problem has gone away. |
| Comment by Andy Schwerin [ 14/Jul/16 ] |
|
I'm putting this into "debugging with submitter", while misha.tyulenev investigates the risk of a fix. |
| Comment by Spencer Brody (Inactive) [ 12/Jul/16 ] |
|
Attached a test that reproduces the issue on 3.2. |
| Comment by Spencer Brody (Inactive) [ 14/Apr/16 ] |
|
Haven't seen this happening on 3.2 since |
| Comment by Randolph Tan [ 10/Feb/16 ] |
|
Note: The recent changes in master ( |
| Comment by Randolph Tan [ 09/Feb/16 ] |
|
Note: it looks like this only affects code that uses ParallelSortClusteredCursor (most commands) and not the new AsyncResultsMerger (new find command). |
| Comment by Randolph Tan [ 09/Feb/16 ] |
|
The issue is that calls to the _shardingRequestMetadataWriter/_shardingReplyMetadataReader pass the full connection string here: https://github.com/mongodb/mongo/blob/r3.3.1/src/mongo/client/dbclientcursor.cpp#L81 This is problematic if the connection string refers to a replica set, because the internal map does not contain every possible ordering of the replica set nodes in connection string format. This means that if the string was stored in the map as "set/host1,host2", a lookup with "set/host2,host1" will not find the desired entry. |
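
The lookup failure described in the comment above can be illustrated with a minimal sketch. It is written in Python rather than the server's C++, and it is not the ShardRegistry implementation: the names `raw_registry`, `canonical_registry`, `canonical_key`, and the shard id `shard0000` are all hypothetical. It only shows why a map keyed by the raw "set/host1,host2" string misses a lookup with the hosts reordered, and how an order-independent key would avoid that. Per the closing comment, the actual resolution was the removal of the legacy lookup path, not a normalization like this one.

```python
def canonical_key(conn_str: str) -> str:
    """Return an order-independent key for a 'set/host1,host2' string.

    Hypothetical helper for illustration; not a MongoDB API.
    """
    set_name, sep, hosts = conn_str.partition("/")
    if not sep:
        # No replica set name present (standalone host): use the string as-is.
        return conn_str
    sorted_hosts = sorted(h.strip() for h in hosts.split(","))
    return set_name + "/" + ",".join(sorted_hosts)


# Map keyed by the raw connection string, as described in the comment.
raw_registry = {"set/host1,host2": "shard0000"}

# Same entries, keyed by the canonical (sorted-host) form instead.
canonical_registry = {canonical_key(k): v for k, v in raw_registry.items()}


def lookup_raw(conn_str: str):
    # Exact string match: fails when the hosts arrive in a different order.
    return raw_registry.get(conn_str)


def lookup_canonical(conn_str: str):
    # Normalize the query the same way the keys were normalized.
    return canonical_registry.get(canonical_key(conn_str))


if __name__ == "__main__":
    print(lookup_raw("set/host2,host1"))        # no entry found
    print(lookup_canonical("set/host2,host1"))  # entry found
```

Sorting the host list is only one possible canonical form; keying the map by the replica set name alone would be another, under the assumption that set names are unique within the cluster.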