MovePrimary followed by dropDatabase and recreate on the original shard can lose data


    • Sharding EMEA
    • Fully Compatible
    • ALL
    • 0001-Repro-SERVER-69181.patch
    • Sharding EMEA 2023-02-20
    • 3

      Consider the following interleaving:

      1. 'shard1' is the db-primary shard for database 'dbA', which contains an unsharded collection 'dbA.coll'.
      2. The user runs movePrimary(dbA, to: 'shard2'). On 'shard1', the MovePrimary coordinator commits the operation and releases the critical section, but hangs before cleaning up the "stale db data".
      3. The user runs dropDatabase, which executes on 'shard2', now the db-primary shard.
      4. The user recreates 'dbA' on 'shard1' and performs some writes on 'dbA.coll'.
      5. The MovePrimary coordinator from step 2 resumes and cleans up the "stale data". It drops 'dbA.coll', so the writes from step 4 are lost.
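
The interleaving above can be reproduced with a toy in-memory model. This is only a sketch, not server code: the `Shard` class, `run_interleaving`, and the `guard_by_uuid` flag are hypothetical names introduced for illustration. It assumes that dropDatabase removes the leftover copy of 'dbA.coll' from 'shard1' as well, so the recreated collection gets a fresh UUID. The UUID check is one possible guard, not necessarily the fix adopted for this ticket.

```python
import uuid


class Shard:
    """Toy stand-in for a shard: maps namespace -> (collection UUID, documents)."""

    def __init__(self, name):
        self.name = name
        self.collections = {}

    def insert(self, ns, doc):
        # Implicitly create the collection (with a fresh UUID) on first write.
        if ns not in self.collections:
            self.collections[ns] = (uuid.uuid4(), [])
        self.collections[ns][1].append(doc)

    def drop(self, ns, expected_uuid=None):
        """Drop ns. If expected_uuid is given, drop only when the UUID still matches."""
        entry = self.collections.get(ns)
        if entry is None:
            return False
        if expected_uuid is not None and entry[0] != expected_uuid:
            return False  # a different incarnation of the collection: leave it alone
        del self.collections[ns]
        return True


def run_interleaving(guard_by_uuid):
    shard1, shard2 = Shard("shard1"), Shard("shard2")

    # Step 1: shard1 is the db-primary and holds the unsharded dbA.coll.
    shard1.insert("dbA.coll", {"_id": 1})
    old_uuid = shard1.collections["dbA.coll"][0]

    # Step 2: movePrimary commits (data cloned to shard2), but the cleanup
    # of the stale copy on shard1 is still pending.
    shard2.collections["dbA.coll"] = (old_uuid, list(shard1.collections["dbA.coll"][1]))
    pending_cleanup = [("dbA.coll", old_uuid)]

    # Step 3: dropDatabase, coordinated by shard2 (assumed to clear both shards).
    shard2.collections.clear()
    shard1.collections.clear()

    # Step 4: the database is recreated on shard1 and receives new writes.
    shard1.insert("dbA.coll", {"_id": 2})

    # Step 5: the delayed cleanup from step 2 finally runs on shard1.
    for ns, coll_uuid in pending_cleanup:
        shard1.drop(ns, expected_uuid=coll_uuid if guard_by_uuid else None)

    entry = shard1.collections.get("dbA.coll")
    return entry[1] if entry else []


print(run_interleaving(guard_by_uuid=False))  # [] : the step 4 write is lost
print(run_interleaving(guard_by_uuid=True))   # [{'_id': 2}] : the write survives
```

Dropping by namespace alone cannot tell the stale collection apart from the recreated one. Matching on the collection UUID recorded before the cleanup was scheduled is what lets the model skip the new incarnation.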

              Assignee:
              Antonio Fuschetto
              Reporter:
              Jordi Serra Torrens
              Votes:
              0
              Watchers:
              7

                Created:
                Updated:
                Resolved: