[SERVER-68541] Concurrent removeShard and movePrimary may delete unsharded collections Created: 03/Aug/22  Updated: 29/Oct/23  Resolved: 31/Aug/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.1.1, 6.0.3, 6.2.0-rc0

Type: Bug Priority: Major - P3
Reporter: Silvia Surroca Assignee: Antonio Fuschetto
Resolution: Fixed Votes: 0
Labels: data-loss
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File repro-undesired-unsharded-collections-remove.patch    
Issue Links:
Backports
Related
is related to SERVER-69890 Concurrent movePrimary and removeShar... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.1, v6.0, v5.0, v4.4, v4.2
Steps To Reproduce:

repro-undesired-unsharded-collections-remove.patch
Apply the provided patch on top of commit r6.1.0-alpha-1938-gfe099ee11c9 and run jstests/sharding/remove_shard_and_move_primary.js in the sharding suite.
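
For readers without the patch, here is a minimal sketch of what such a repro test might look like. The failpoint name 'hangRemoveShardAfterDrainingCheck' is hypothetical (the actual patch may hook a different point); the rest uses standard jstest helpers.

    // Minimal repro sketch, not the attached patch. It assumes a HYPOTHETICAL
    // failpoint 'hangRemoveShardAfterDrainingCheck' pausing removeShard between
    // the drained check (step 2 in the description) and the commit (step 4).
    load("jstests/libs/fail_point_util.js");

    const st = new ShardingTest({shards: 2});
    assert.commandWorked(st.s.adminCommand({enableSharding: "myDB"}));
    st.ensurePrimaryShard("myDB", st.shard0.shardName);
    assert.commandWorked(st.s.getDB("myDB").collA.insert({x: 1}));  // unsharded

    // Park removeShard right after it has observed zero databases on shard1.
    const fp = configureFailPoint(st.configRS.getPrimary(),
                                  "hangRemoveShardAfterDrainingCheck");

    const awaitRemoveShard = startParallelShell(
        `assert.soon(() => {
             const res = assert.commandWorked(
                 db.adminCommand({removeShard: '${st.shard1.shardName}'}));
             return res.state === 'completed';
         });`,
        st.s.port);
    fp.wait();

    // While removeShard is parked, move the primary of 'myDB' (and with it all
    // of its unsharded collections) onto the shard being removed.
    assert.commandWorked(
        st.s.adminCommand({movePrimary: "myDB", to: st.shard1.shardName}));

    fp.off();
    awaitRemoveShard();

    // On affected versions this assertion fails: 'myDB.collA' was lost when
    // 'shard1' was removed from the cluster topology.
    assert.eq(1, st.s.getDB("myDB").collA.countDocuments({}));
    st.stop();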

Sprint: Sharding EMEA 2022-08-22, Sharding EMEA 2022-09-05
Participants:

 Description   

Concurrent removeShard and movePrimary commands may end up with an undesired deletion of unsharded collections.

Bug description
Imagine the following scenario:

  • There are two shards: 'shard0' and 'shard1'
  • The primary shard of database 'myDB' is 'shard0'
  • Collection 'myDB.collA' is unsharded, so it resides on 'shard0'

At some point, someone concurrently issues these two commands:

  • { removeShard:'shard1' }
  • { movePrimary:'myDB', to: 'shard1'}


Then, if the internal execution follows the sequence below, the cluster ends up with an undesired deletion of all the unsharded collections of 'myDB'. (The repro sketch above pins exactly this interleaving.)

1. The removeShard command is sent to the config server.
2. The config server, as part of the removeShard logic, checks whether the number of unsharded databases on the shard is zero. Since it is, the process continues.
3. At that point, movePrimary executes, moving all the unsharded collections of 'myDB' to 'shard1'.
4. The removeShard commit phase starts and 'shard1' is removed from the cluster topology.

A note to better understand step 2: the removeShard command returns a non-completed status if the shard still has unsharded databases, and it notifies the user that those must be moved explicitly using movePrimary. A better explanation can be found here.
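
For illustration, the response in that case has roughly the following shape (field values are examples; the exact fields vary slightly across versions):

    {
      "msg" : "draining ongoing",
      "state" : "ongoing",
      "remaining" : { "chunks" : NumberLong(0), "dbs" : NumberLong(1), "jumboChunks" : NumberLong(0) },
      "note" : "you need to drop or movePrimary these databases",
      "dbsToMove" : [ "myDB" ],
      "ok" : 1
    }

Only once 'dbsToMove' is empty can a subsequent removeShard call reach the commit phase; the bug arises because movePrimary can repopulate the shard between that emptiness check and the commit.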



 Comments   
Comment by Githook User [ 08/Nov/22 ]

Author:

{'name': 'Antonio Fuschetto', 'email': 'antonio.fuschetto@mongodb.com', 'username': 'afuschetto'}

Message: SERVER-68541 Serialize the removeShard and commitMovePrimary commands to prevent the loss of moved collections
Branch: v6.0
https://github.com/mongodb/mongo/commit/50e767b1dbbd5104959989e056d3bd04b6119748

Comment by Githook User [ 08/Nov/22 ]

Author:

{'name': 'Antonio Fuschetto', 'email': 'antonio.fuschetto@mongodb.com', 'username': 'afuschetto'}

Message: SERVER-68541 Serialize the removeShard and commitMovePrimary commands to prevent the loss of moved collections
Branch: v6.1
https://github.com/mongodb/mongo/commit/eb90b652aa02c9f6b1d805abfd6956ba7b312f60

Comment by Githook User [ 29/Aug/22 ]

Author:

{'name': 'Antonio Fuschetto', 'email': 'antonio.fuschetto@mongodb.com', 'username': 'afuschetto'}

Message: SERVER-68541 Concurrent removeShard and movePrimary may delete unsharded collections
Branch: master
https://github.com/mongodb/mongo/commit/cf84bec54627ba1efb6aebd829f6b3d11aaf112e

Comment by Githook User [ 29/Aug/22 ]

Author:

{'name': 'Antonio Fuschetto', 'email': 'antonio.fuschetto@mongodb.com', 'username': 'afuschetto'}

Message: SERVER-68541 Concurrent removeShard and movePrimary may delete unsharded collections
Branch: master
https://github.com/mongodb/mongo/commit/107d9c38caae897b7e99af3db4ec429039936c87

Comment by Antonio Fuschetto [ 09/Aug/22 ]

Proposed solution

Following the logic currently implemented to commit chunk migrations, it seems natural to adopt the same approach, consisting of 1) exposing a new config server command (i.e., _configsvrCommitMovePrimary) to atomically commit the configuration changes required by the movePrimary command, and 2) synchronizing these configuration changes with the removeShard command (reusing an existing mutex).

This solution serializes the configuration changes of concurrent removeShard and movePrimary invocations and thereby resolves the bug in question.
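
As a sketch of the guarantee this provides (assuming the serialized commits are in place; this is not the committed patch), a test could race the two commands freely and assert that the unsharded collection survives whichever command wins:

    // Sketch only: with the commits serialized there are exactly two legal
    // outcomes, and the unsharded collection survives in both.
    const st = new ShardingTest({shards: 2});
    assert.commandWorked(st.s.adminCommand({enableSharding: "myDB"}));
    st.ensurePrimaryShard("myDB", st.shard0.shardName);
    assert.commandWorked(st.s.getDB("myDB").collA.insert({x: 1}));

    const awaitRemoveShard = startParallelShell(
        `assert.soon(() => {
             const res = assert.commandWorked(
                 db.adminCommand({removeShard: '${st.shard1.shardName}'}));
             // Stop when the removal commits (movePrimary lost or failed) or
             // when it stalls because 'myDB' now lives on the draining shard
             // (movePrimary won).
             return res.state === 'completed' || (res.dbsToMove || []).length > 0;
         });`,
        st.s.port);

    // May succeed (committed before the removal) or fail (shard already
    // gone), but it can no longer interleave with the removal commit.
    st.s.adminCommand({movePrimary: "myDB", to: st.shard1.shardName});
    awaitRemoveShard();

    // Must hold in every interleaving once the fix is in place.
    assert.eq(1, st.s.getDB("myDB").collA.countDocuments({}));
    st.stop();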

Backward compatibility

Depending on the versions to which the fix needs to be back-ported (potentially all), the donor shard could fall back to the current logic (finding the current primary shard of the given database and then committing the changes) if the new config server command is not exposed (e.g., in a multiversion deployment).

Also, up to version 5.0 the config server already exposes the _configsvrCommitMovePrimary command, and it would be interesting to understand why that logic was changed. Was the goal to decentralize the config server logic? In any case, the idea (to be validated) is to reuse the same command so that the solution is compatible with version 5.0 and earlier.

Comment by Kaloian Manassiev [ 04/Aug/22 ]

Hmm, will this actually get fixed just by the movePrimary coordinator implementation by itself? Even at commit time of the shard removal, nothing prevents the commit of the new placement from happening after the shard has already been removed.

I don't think we need to wait until the Add/Remove Shard project, but the movePrimary commit needs to become a command on the CSRS which serialises with the shard removal lock. CC antonio.fuschetto@mongodb.com to keep this in mind.

Comment by Cris Insignares Cuello [ 04/Aug/22 ]

kaloian.manassiev@mongodb.com antonio.fuschetto@mongodb.com Considering that, as part of Sharding First, we are going to rewrite the MovePrimary coordinator, we should also fix this one.
