[SERVER-35252] All config server metadata commands that read from ShardRegistry might read stale data
Created: 25/May/18 Updated: 26/Oct/23
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Matthew Saltz (Inactive) | Assignee: | Backlog - Catalog and Routing |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | oldshardingemea |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
| Issue Links: |
| Assigned Teams: | Catalog and Routing |
| Operating System: | ALL |
| Sprint: | Sharding 2018-07-02, Sharding 2018-07-16, Sharding 2018-07-30, Sharding 2018-08-13, Sharding 2018-09-24, Sharding 2018-11-05, Sharding 2018-11-19, Sharding 2018-12-03, Sharding 2018-12-17, Sharding 2018-12-31, Sharding 2019-01-14 |
| Participants: |
| Linked BF Score: | 20 |
| Description |
|
If a thread running removeShard on the config server calls shardRegistry->getShard() to check whether a shard exists, it may not see that another thread has already removed that shard, because the ShardRegistry is a cache and can be stale. If we instead perform a local read and then wait for majority write concern, a concurrent remove that has not yet replicated will correctly cause the command to error out. A repro is attached for the test configsvr_metadata_commands_require_majority_write_concern.js. |
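The race described above can be illustrated with a small toy model. This is only a sketch, not the server's code: ConfigCatalog, CachedRegistry, and both remove_shard_* functions are hypothetical names invented for the illustration, and the wait for majority write concern is not modeled at all. It shows only why an existence check against a cached view can miss a removal that a direct local read would see.

```python
class ConfigCatalog:
    """Hypothetical stand-in for the config server's local shard metadata."""

    def __init__(self, shard_ids):
        self._shards = set(shard_ids)

    def local_read(self, shard_id):
        # Direct read of the current local state.
        return shard_id in self._shards

    def remove(self, shard_id):
        self._shards.discard(shard_id)

    def snapshot(self):
        return frozenset(self._shards)


class CachedRegistry:
    """Hypothetical cache that only reflects the catalog as of its last reload."""

    def __init__(self, catalog):
        self._catalog = catalog
        self._cached = catalog.snapshot()

    def get_shard(self, shard_id):
        # Reads the cached copy, which may be stale.
        return shard_id in self._cached

    def reload(self):
        self._cached = self._catalog.snapshot()


def remove_shard_via_cache(registry, catalog, shard_id):
    # Existence check goes through the cache.
    if not registry.get_shard(shard_id):
        return "ShardNotFound"
    catalog.remove(shard_id)
    return "ok"


def remove_shard_via_local_read(catalog, shard_id):
    # Existence check reads the local state directly.
    if not catalog.local_read(shard_id):
        return "ShardNotFound"
    catalog.remove(shard_id)
    return "ok"


def demo():
    catalog = ConfigCatalog(["shard0", "shard1"])
    registry = CachedRegistry(catalog)

    # Another thread removes shard0 before the cache is reloaded.
    catalog.remove("shard0")

    stale_result = remove_shard_via_cache(registry, catalog, "shard0")
    fresh_result = remove_shard_via_local_read(catalog, "shard0")
    return stale_result, fresh_result


if __name__ == "__main__":
    print(demo())
```

In this model the cache-based check reports success for a shard that is already gone, while the direct read reports that the shard was not found.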
| Comments |
| Comment by Kaloian Manassiev [ 21/Oct/21 ] |
|
Leaving this ticket on the add/removeShard epic. Commands running on the config server should do direct local reads/writes instead of going through the caches, the same way __configsvrCommitChunkMove does, for example. |