[SERVER-30483] Write FSM concurrency workload for removeShard and movePrimary Created: 02/Aug/17  Updated: 06/Dec/22  Resolved: 01/Dec/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Hugh Han Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Won't Do Votes: 0
Labels: PM-1017, sharding
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Sharding
Sprint: Sharding 2017-08-21
Participants:

 Description   

Consider the case where we have two shards, A and B, where A is the primary shard for a database. If we call movePrimary to move the primary from A to B and at the same time call removeShard on B, then the following interleaving might occur.

  1. movePrimary copies unsharded collections from A to B.
  2. removeShard checks if B is the primary, which it is not.
  3. movePrimary updates the config server, which now thinks B is the primary.
  4. removeShard removes B.
  5. movePrimary deletes original unsharded collections from A.
If the above were to happen, all unsharded collections would be deleted (the copies on B go away with the shard, and the originals on A are dropped), and the database would be left with no primary shard.
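
The five steps above can be replayed on a toy in-memory model to make the end state concrete. This is only an illustrative sketch, not MongoDB code: `ToyCluster`, `replay_interleaving`, and the collection names are hypothetical, and the model assumes neither command re-validates its earlier check.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical in-memory stand-in for a sharded cluster; not MongoDB code."""
    # shard name -> set of unsharded collections stored on that shard
    shards: dict = field(default_factory=lambda: {"A": {"coll1", "coll2"}, "B": set()})
    # what the config server records as the database's primary shard
    config_primary: str = "A"


def replay_interleaving() -> ToyCluster:
    c = ToyCluster()
    # 1. movePrimary copies unsharded collections from A to B.
    c.shards["B"] |= c.shards["A"]
    # 2. removeShard checks whether B is the primary; it is not (yet).
    b_is_primary = c.config_primary == "B"
    # 3. movePrimary updates the config server: B is now the primary.
    c.config_primary = "B"
    # 4. removeShard acts on its now-stale check and removes B.
    if not b_is_primary:
        del c.shards["B"]
    # 5. movePrimary deletes the original unsharded collections from A.
    c.shards["A"].clear()
    return c


cluster = replay_interleaving()
# Union of the collections held by every shard that still exists.
surviving = set().union(*cluster.shards.values())
print("surviving collections:", surviving)
print("primary still exists:", cluster.config_primary in cluster.shards)
```

In this model no collection survives, and the config server points at a shard that no longer exists, which matches the outcome described above.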

An FSM concurrency workload should be written to exercise this interleaving of movePrimary and removeShard and to determine how the cluster actually behaves.
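
To illustrate what such a workload would be probing for, the sketch below enumerates every interleaving of the two commands on a toy model and reports which schedules break the invariant "the primary shard exists and still holds the collection". It is not an FSM workload for the real test framework. The step names, the `run` and `all_schedules` helpers, and the assumption that movePrimary aborts only when the target shard is already gone at copy time are all hypothetical.

```python
from itertools import combinations

MOVE_PRIMARY = ["copy", "commit", "cleanup"]   # three movePrimary steps
REMOVE_SHARD = ["check", "remove"]             # two removeShard steps


def run(schedule: str) -> bool:
    """Replay one schedule ('M' = next movePrimary step, 'R' = next removeShard
    step) on a toy cluster; return True if the invariant holds afterwards."""
    shards = {"A": {"coll"}, "B": set()}   # shard -> unsharded collections
    primary = "A"                           # primary recorded on the config server
    aborted = False                         # movePrimary gave up (assumed behavior)
    may_remove = False                      # result of removeShard's check
    m_steps, r_steps = iter(MOVE_PRIMARY), iter(REMOVE_SHARD)

    for actor in schedule:
        step = next(m_steps if actor == "M" else r_steps)
        if step == "copy":
            if "B" in shards:
                shards["B"] |= shards["A"]
            else:
                aborted = True              # assumed: target shard is already gone
        elif step == "commit" and not aborted:
            primary = "B"
        elif step == "cleanup" and not aborted:
            shards["A"].clear()
        elif step == "check":
            may_remove = primary != "B"
        elif step == "remove" and may_remove:
            shards.pop("B", None)

    # Invariant: the recorded primary exists and still holds the collection.
    return "coll" in shards.get(primary, set())


def all_schedules():
    """Yield every ordering that keeps each command's own steps in order."""
    n = len(MOVE_PRIMARY) + len(REMOVE_SHARD)
    for r_slots in combinations(range(n), len(REMOVE_SHARD)):
        yield "".join("R" if i in r_slots else "M" for i in range(n))


violations = [s for s in all_schedules() if not run(s)]
print("schedules that violate the invariant:", violations)
```

The schedule described in this ticket corresponds to `"MRMRM"` in this encoding. A real workload would issue the actual commands concurrently and assert a similar invariant against the cluster's metadata.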



 Comments   
Comment by Hugh Han [ 25/Aug/17 ]

docs: https://paper.dropbox.com/doc/Metadata-Command-FSM-Workloads-1pbVk0Ao7R0PPCmoCTbyE

Comment by Asya Kamsky [ 03/Aug/17 ]

Wouldn't SERVER-30404 prevent that?
