[SERVER-48537] addShard is not idempotent for retries Created: 02/Jun/20  Updated: 26/Oct/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Pierlauro Sciarelli Assignee: Backlog - Catalog and Routing
Resolution: Unresolved Votes: 0
Labels: oldshardingemea, sharding-csrs-stepdown-only
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-48538 Blacklist conversion_of_replica_set_t... Closed
Assigned Teams:
Catalog and Routing
Operating System: ALL
Participants:
Linked BF Score: 14

 Description   

At a high level, the addShard method triggered by a _configsvrAddShard command executes the following steps:

1) Check if the shard exists in config.shards (if yes, return).
2) Write a new document representing the shard into config.shards.
3) For each database on the shard, write a new document representing it into config.databases.

If there is an interruption between steps 2 and 3, the shard document already exists in config.shards, so any addShard retry returns at step 1 and never executes step 3. As a result, config.databases may be left in an inconsistent or incomplete state.
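
The failure mode can be reproduced with a small in-memory model. This is only an illustrative sketch, not the server code: the `config` dictionary stands in for the config.shards and config.databases collections, and `add_shard`, `Interrupted`, `fail_after_step_2`, and the shard and database names are all hypothetical.

```python
class Interrupted(Exception):
    """Simulated interruption (hypothetical, e.g. a stepdown mid-operation)."""


def add_shard(config, shard_name, shard_dbs, fail_after_step_2=False):
    """Model of the three steps described above, using plain dicts."""
    # Step 1: return early if the shard is already registered.
    if shard_name in config["shards"]:
        return
    # Step 2: write the shard document.
    config["shards"][shard_name] = {"_id": shard_name}
    if fail_after_step_2:
        raise Interrupted("interrupted between steps 2 and 3")
    # Step 3: write one document per database found on the shard.
    for db in shard_dbs:
        config["databases"][db] = {"_id": db, "primary": shard_name}


def demo_lost_databases():
    """First attempt is interrupted; the retry returns at step 1."""
    config = {"shards": {}, "databases": {}}
    try:
        add_shard(config, "shard0", ["app", "logs"], fail_after_step_2=True)
    except Interrupted:
        pass
    add_shard(config, "shard0", ["app", "logs"])  # retry skips step 3
    return config
```

In this model, `demo_lost_databases()` ends with the shard recorded but no database entries, which mirrors the incomplete config.databases state described above.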



 Comments   
Comment by Pierlauro Sciarelli [ 24/Jul/20 ]

I rewrote the description to give more context, as the original text was very sketchy.

Here are some possible ways to solve the problem:

  • With a transaction: on any failure, all the modifications to config.shards and config.databases should be rolled back.
  • With a sentinel value: a sentinel document or field could be written somewhere once all the entries in config.databases have been persisted. Any addShard call should check for the sentinel and, if it is missing, rewrite all the database entries for the shard.
  • Relying on idempotent writes: writes to config.databases are performed through update operations (upsert=true), so updating the same entry several times would not cause any problem. Step 3 could therefore always be executed, regardless of whether a shard entry already exists in config.shards.
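
The third option can be sketched with an in-memory model. This is a hedged illustration under stated assumptions, not the server implementation: the `config` dictionary stands in for config.shards and config.databases, and `upsert`, `add_shard_idempotent`, `Interrupted`, and `fail_after_step_2` are hypothetical names. The only change from the flow in the description is that finding an existing shard entry no longer causes an early return.

```python
class Interrupted(Exception):
    """Simulated interruption (hypothetical, e.g. a stepdown mid-operation)."""


def upsert(collection, doc):
    """Insert or overwrite by _id, modelling an update with upsert=true."""
    collection[doc["_id"]] = doc


def add_shard_idempotent(config, shard_name, shard_dbs, fail_after_step_2=False):
    """Variant where step 3 always runs, even if the shard entry exists."""
    # Steps 1 and 2: write the shard entry only if it is missing (no early return).
    if shard_name not in config["shards"]:
        upsert(config["shards"], {"_id": shard_name})
    if fail_after_step_2:
        raise Interrupted("interrupted between steps 2 and 3")
    # Step 3: upserting the same database entry twice leaves one document.
    for db in shard_dbs:
        upsert(config["databases"], {"_id": db, "primary": shard_name})


def demo_retry_converges():
    """An interrupted attempt followed by retries ends in a complete state."""
    config = {"shards": {}, "databases": {}}
    try:
        add_shard_idempotent(config, "shard0", ["app", "logs"], fail_after_step_2=True)
    except Interrupted:
        pass
    add_shard_idempotent(config, "shard0", ["app", "logs"])  # retry completes step 3
    add_shard_idempotent(config, "shard0", ["app", "logs"])  # repeating changes nothing
    return config
```

The sketch ignores any validation the real command performs before writing; it only shows why repeating step 3 is harmless when every write is an upsert keyed by `_id`.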