[SERVER-48201] ShardRegistry::reload() competes with updates to the ShardRegistry through the RSM |
| Created: | 13/May/20 | Updated: | 27/Oct/23 | Resolved: | 09/Oct/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Matthew Saltz (Inactive) | Assignee: | Lamont Nelson |
| Resolution: | Gone away | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Operating System: | ALL |
| Sprint: | Sharding 2020-09-21, Sharding 2020-10-05, Sharding 2020-10-19 |
| Participants: | |
| Linked BF Score: | 13 |
| Description |
|
After addShard, we call ShardRegistry::reload() on the router so that the next request from the same client can target the new shard without receiving a ShardNotFound error. However, with the streamable replica set monitor, concurrent calls to onPossibleSet can then overwrite the host data in the ShardRegistry, leading to a ShardNotFound error on a subsequent request. Shard add/remove operations do not appear to have ever been guaranteed to be causally consistent with CRUD ops in any meaningful way, but this breaks tests that relied on the shard being available after addShard. |
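The interleaving described above can be sketched with a toy model. This is only an illustration under assumptions: `ToyRegistry`, `ShardHosts`, `shardVisibleAfterInterleaving`, and the host strings are hypothetical names invented here, and the ticket does not say how the real ShardRegistry stores or swaps its data. The sketch assumes each writer installs a complete replacement map, and replays the steps on a single thread so the outcome is deterministic.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the registry's shard-to-hosts data; not the server's real type.
using ShardHosts = std::map<std::string, std::string>;

// Toy registry (assumption): every writer installs a complete replacement map.
class ToyRegistry {
public:
    ShardHosts snapshot() const { return _data; }
    void install(ShardHosts next) { _data = std::move(next); }
    bool hasShard(const std::string& name) const { return _data.count(name) > 0; }

private:
    ShardHosts _data;
};

// Replays one problematic interleaving and reports whether "shard1" is still visible.
bool shardVisibleAfterInterleaving() {
    ToyRegistry registry;
    registry.install({{"shard0", "rs0/a:27017"}});

    // 1. A monitor callback (modeled loosely on onPossibleSet) copies the current
    //    data and updates the host list of an existing shard.
    ShardHosts monitorCopy = registry.snapshot();
    monitorCopy["shard0"] = "rs0/a:27017,b:27017";

    // 2. addShard completes and the reload installs data that includes the new shard.
    registry.install({{"shard0", "rs0/a:27017"}, {"shard1", "rs1/c:27017"}});
    assert(registry.hasShard("shard1"));

    // 3. The monitor installs its copy, which predates the reload.
    registry.install(std::move(monitorCopy));

    return registry.hasShard("shard1");
}
```

In this model, `shardVisibleAfterInterleaving()` returns `false`: the last writer wins, so the shard added by the reload disappears and a lookup for it would fail, which is the toy analogue of the ShardNotFound error.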
| Comments |
| Comment by Lamont Nelson [ 09/Oct/20 ] |
|
This is fixed with |
| Comment by Kaloian Manassiev [ 18/Sep/20 ] |
|
I believe that this should no longer be the case (in master) after the changes in |