[SERVER-69181] MovePrimary followed by dropDatabase and recreate on the original shard can lose data
Created: 26/Aug/22  Updated: 10/Nov/23  Resolved: 06/Feb/23

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.2.0, 4.4.0, 5.0.0, 6.0.0
Fix Version/s: 6.3.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Jordi Serra Torrens
Assignee: Antonio Fuschetto
Resolution: Fixed
Votes: 0
Labels: PM-2144-Milestone-0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File 0001-Repro-SERVER-69181.patch    
Issue Links:
Depends
depends on SERVER-71201 Prevent operations on the recipient w... Closed
Assigned Teams:
Sharding EMEA
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

0001-Repro-SERVER-69181.patch

Sprint: Sharding EMEA 2023-02-20
Participants:

 Description   

Consider the following interleaving:

  1. 'shard1' is the db-primary shard for database 'dbA'. There is an unsharded collection 'dbA.coll'.
  2. User runs movePrimary(dbA, to: 'shard2'). On 'shard1', the MovePrimary coordinator commits the operation and releases the critical section, but hangs before cleaning up the stale database data.
  3. User runs dropDatabase on 'dbA'. It executes on 'shard2', which is now the db-primary shard.
  4. User recreates 'dbA' on 'shard1' and performs some writes on 'dbA.coll'.
  5. The MovePrimary coordinator from step 2 resumes and cleans up the stale data. It drops 'dbA.coll' on 'shard1', so the writes from step 4 are lost!
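
The interleaving above can be replayed with a small toy model. This is only an illustrative sketch, not MongoDB code: the dictionaries `shards` and `primary` and the helpers `move_primary_commit`, `move_primary_cleanup` and `drop_database` are hypothetical names invented for this example, and the model assumes dropDatabase only touches the current db-primary shard.

```python
# Toy model only: plain dicts stand in for shards and the routing table.
# None of these names correspond to real MongoDB internals.
shards = {"shard1": {"dbA.coll": [{"_id": 1}]}, "shard2": {}}
primary = {"dbA": "shard1"}  # step 1: shard1 is the db-primary shard for dbA


def namespaces(shard, db):
    """Return the namespaces of database `db` stored on `shard`."""
    return [ns for ns in shards[shard] if ns.startswith(db + ".")]


def move_primary_commit(db, src, dst):
    """Clone the data, flip the primary, and return the stale namespaces."""
    stale = namespaces(src, db)
    for ns in stale:
        shards[dst][ns] = list(shards[src][ns])
    primary[db] = dst
    return stale  # cleanup of these namespaces on src is still pending


def move_primary_cleanup(src, stale):
    """Deferred cleanup: drop the namespaces recorded at commit time."""
    for ns in stale:
        shards[src].pop(ns, None)


def drop_database(db):
    """Simplification: drop the database only on its current primary."""
    owner = primary.pop(db)
    for ns in namespaces(owner, db):
        del shards[owner][ns]


stale = move_primary_commit("dbA", "shard1", "shard2")  # step 2 (cleanup hangs)
drop_database("dbA")                                    # step 3, runs on shard2
primary["dbA"] = "shard1"                               # step 4: recreate dbA
shards["shard1"]["dbA.coll"] = [{"_id": 2}]             # step 4: new writes
move_primary_cleanup("shard1", stale)                   # step 5: late cleanup

writes_lost = "dbA.coll" not in shards["shard1"]
print("writes lost:", writes_lost)
```

In this model the late cleanup cannot tell the recreated 'dbA.coll' apart from the stale copy it recorded at commit time, which is the essence of the reported data loss.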


 Comments   
Comment by Antonio Fuschetto [ 06/Feb/23 ]

This bug has been fixed by SERVER-71201. The recipient shard of a movePrimary operation now enters the critical section of the database, preventing any new operations (such as a dropDatabase) from refreshing the database version. The critical section is exited when movePrimary completes, allowing any pending operations to continue. The described data loss can no longer occur because the dropDatabase operation is processed only after movePrimary has completed.
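
The ordering guarantee described in this comment can be illustrated with a toy model. This is a hedged sketch under simplifying assumptions, not the actual implementation from SERVER-71201: the class `ToyCluster`, its attributes `critical_section` and `pending`, and all method names are hypothetical, and the real critical section blocks operations rather than queueing callbacks.

```python
# Toy model only: illustrates the ordering, not real MongoDB internals.
class ToyCluster:
    def __init__(self):
        self.shards = {"shard1": {"dbA.coll": [{"_id": 1}]}, "shard2": {}}
        self.primary = {"dbA": "shard1"}
        self.critical_section = set()  # databases currently blocked
        self.pending = []              # operations deferred by the block

    def _namespaces(self, shard, db):
        return [ns for ns in self.shards[shard] if ns.startswith(db + ".")]

    def move_primary_commit(self, db, src, dst):
        """Enter the critical section, clone the data, flip the primary."""
        self.critical_section.add(db)
        stale = self._namespaces(src, db)
        for ns in stale:
            self.shards[dst][ns] = list(self.shards[src][ns])
        self.primary[db] = dst
        return stale

    def move_primary_finish(self, db, src, stale):
        """Clean up stale data, exit the critical section, run pending ops."""
        for ns in stale:
            self.shards[src].pop(ns, None)
        self.critical_section.discard(db)
        pending, self.pending = self.pending, []
        for op in pending:
            op()

    def drop_database(self, db):
        """Return False if deferred by the critical section, True if applied."""
        if db in self.critical_section:
            self.pending.append(lambda: self.drop_database(db))
            return False
        owner = self.primary.pop(db)
        for ns in self._namespaces(owner, db):
            del self.shards[owner][ns]
        return True


cluster = ToyCluster()
stale = cluster.move_primary_commit("dbA", "shard1", "shard2")
applied = cluster.drop_database("dbA")            # deferred, not applied yet
cluster.move_primary_finish("dbA", "shard1", stale)  # cleanup, then the drop
cluster.primary["dbA"] = "shard1"                 # recreate dbA on shard1
cluster.shards["shard1"]["dbA.coll"] = [{"_id": 2}]  # new writes survive
print("drop applied immediately:", applied)
print("shard1 contents:", cluster.shards["shard1"])
```

Because the drop is held back until the cleanup has finished, the recreation of 'dbA' and its writes can only happen after the stale data is already gone, so nothing is left to delete them later.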

Generated at Thu Feb 08 06:12:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.