[SERVER-35252] All config server metadata commands that read from ShardRegistry might read stale data Created: 25/May/18  Updated: 26/Oct/23

Status: Backlog
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Matthew Saltz (Inactive) Assignee: Backlog - Catalog and Routing
Resolution: Unresolved Votes: 0
Labels: oldshardingemea
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File removeshardrepro.patch    
Issue Links:
Depends
Related
related to SERVER-48201 ShardRegistry::reload() competes with... Closed
is related to SERVER-33797 sharding metadata command should make... Closed
Assigned Teams:
Catalog and Routing
Operating System: ALL
Sprint: Sharding 2018-07-02, Sharding 2018-07-16, Sharding 2018-07-30, Sharding 2018-08-13, Sharding 2018-09-24, Sharding 2018-11-05, Sharding 2018-11-19, Sharding 2018-12-03, Sharding 2018-12-17, Sharding 2018-12-31, Sharding 2019-01-14
Participants:
Linked BF Score: 20

 Description   

If a thread running removeShard on the config server calls shardRegistry->getShard() to check whether a shard exists, it may not see that another thread has already removed that shard, because the ShardRegistry is a cache. If we instead perform a local read and then wait for majority write concern, a concurrent removal that has been applied locally but not yet replicated is still visible, so the command correctly errors out. A repro for the test configsvr_metadata_commands_require_majority_write_concern.js is attached.
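The race described above can be modeled with a small single-process sketch. It is not MongoDB code: LocalShardsCollection (standing in for the config server's local config.shards data), CachedShardRegistry (standing in for a ShardRegistry-style cache), RemoveResult, and both remove functions are hypothetical names. The sketch only shows why an existence check against a cache that has not been reloaded lets a second removeShard "succeed", while a check against the local data reports that the shard is gone. Replication and the wait for majority write concern are not modeled.

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <string>

// Hypothetical stand-in for the config server's local config.shards data:
// the authoritative state a config server command can read directly.
class LocalShardsCollection {
public:
    void insert(const std::string& shardId) {
        std::lock_guard<std::mutex> lock(mutex_);
        shards_.insert(shardId);
    }
    // Checks for and deletes the shard under one lock; returns false if it was already gone.
    bool erase(const std::string& shardId) {
        std::lock_guard<std::mutex> lock(mutex_);
        return shards_.erase(shardId) > 0;
    }
    bool contains(const std::string& shardId) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return shards_.count(shardId) > 0;
    }
    std::set<std::string> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return shards_;
    }

private:
    mutable std::mutex mutex_;
    std::set<std::string> shards_;
};

// Hypothetical stand-in for a ShardRegistry-style cache: it reflects the
// collection only as of the last reload(), so it can be stale.
class CachedShardRegistry {
public:
    explicit CachedShardRegistry(const LocalShardsCollection& source) : source_(source) { reload(); }
    void reload() { cached_ = source_.snapshot(); }
    bool hasShard(const std::string& shardId) const { return cached_.count(shardId) > 0; }

private:
    const LocalShardsCollection& source_;
    std::set<std::string> cached_;
};

enum class RemoveResult { kOk, kShardNotFound };

// Problematic pattern: the existence check goes through the (possibly stale) cache.
RemoveResult removeShardCheckingCache(LocalShardsCollection& coll,
                                      const CachedShardRegistry& registry,
                                      const std::string& shardId) {
    if (!registry.hasShard(shardId)) {
        return RemoveResult::kShardNotFound;
    }
    // If another thread already removed the shard, this erase is a no-op,
    // but the command still reports success.
    coll.erase(shardId);
    return RemoveResult::kOk;
}

// Suggested pattern: check against the local, authoritative data instead.
RemoveResult removeShardCheckingLocalData(LocalShardsCollection& coll,
                                          const std::string& shardId) {
    if (!coll.erase(shardId)) {
        return RemoveResult::kShardNotFound;
    }
    assert(!coll.contains(shardId));
    // Not modeled: a real command would wait for majority write concern here
    // before replying.
    return RemoveResult::kOk;
}
```

For example, with "shard1" registered and the registry loaded once, a first removeShardCheckingLocalData(coll, "shard1") returns kOk. A second removal through removeShardCheckingCache with the unreloaded registry also returns kOk, even though nothing was removed. removeShardCheckingLocalData returns kShardNotFound for the same call.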



 Comments   
Comment by Kaloian Manassiev [ 21/Oct/21 ]

Leaving this ticket on the add/removeShard epic. Commands running on the config server should do direct local reads/writes instead of going through the caches, as __configsvrCommitChunkMove does, for example.

Generated at Thu Feb 08 04:39:17 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.