  1. Core Server
  2. SERVER-77309

An interleaving might cause a migration to continue when it shouldn't

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 7.1.0-rc0, 6.0.7, 7.0.0-rc3
    • Affects Version/s: 6.0.0, 6.0.1, 6.0.2, 6.0.3, 6.0.4, 6.3.0, 7.0 Required, 6.0.5, 6.0.6, 6.3.1
    • Component/s: Sharding
    • Labels:
      None
    • Fully Compatible
    • ALL
    • v7.0, v6.0
    • Sharding EMEA 2023-05-29
    • 135

      Currently, the migration takes an exclusive CSR lock at its start to atomically check the allowMigrations metadata flag and then set the ScopedRegisterer for the migration. Additionally, the refresh code takes a shared CSR lock to abort any ongoing migration registered in the migration's decoration. However, that shared lock goes out of scope before the refresh takes the lock again in exclusive mode to install the new metadata, which makes the following interleaving possible:

      Suppose we have two threads, thread1 and thread2. thread1 starts executing a migration command, and thread2 starts a refresh triggered as part of the setAllowMigrations code (which could be the result of a DDL operation that used the stopMigration helper).

      1. thread1 executes the migration's refresh, but does not see the setAllowMigrations commit
      2. A race for the CSR lock happens: thread1 goes for the migration CSR lock and thread2 goes for the refresh CSR lock, and thread2 wins
      3. In the refresh, thread2 checks the migration decoration but does not find any migration to abort
      4. A second race for the CSR lock happens between thread1, which goes again for the migration CSR lock, and thread2, which goes for the metadata installation CSR lock. thread1 wins the lock and, because of step 1, the allowMigrations check passes, allowing the migration to continue
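
The four steps can be replayed deterministically in a small model. This is only an illustrative sketch, not the server code: `CollectionState`, `register_migration`, `abort_check`, and `install_metadata` are hypothetical names, and each function call stands in for one lock acquisition, executed in the order the races resolve.

```python
class CollectionState:
    """Hypothetical stand-in for the per-collection state guarded by the CSR lock."""
    def __init__(self):
        self.allow_migrations = True   # flag in the currently installed metadata
        self.active_migration = None   # slot a migration registers itself in


def register_migration(state, name):
    # Models the exclusive lock at migration start: check the flag, then register.
    if not state.allow_migrations:
        return False
    state.active_migration = name
    return True


def abort_check(state):
    # Models the shared lock in the refresh: abort a registered migration, if any.
    aborted = state.active_migration
    state.active_migration = None
    return aborted


def install_metadata(state, allow_migrations):
    # Models the separate exclusive lock that installs the refreshed metadata.
    state.allow_migrations = allow_migrations


def replay_interleaving():
    state = CollectionState()
    # Step 1: thread1's refresh ran before the commit, so the flag is still True.
    # Steps 2-3: thread2 wins the first race and finds nothing to abort.
    aborted = abort_check(state)
    # Step 4: thread1 wins the second race and passes the stale check...
    registered = register_migration(state, "migration-1")
    # ...and only then does thread2 install allowMigrations = false.
    install_metadata(state, False)
    return state, aborted, registered


state, aborted, registered = replay_interleaving()
print(aborted, registered, state.allow_migrations, state.active_migration)
```

In this model the replay ends with migrations disallowed in the installed metadata while a migration is still registered, which is the inconsistent state the steps describe.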

      The condition described in step 4 could cause a migration to acquire the critical section while a DDL operation requires it (for example, a rename participant might try to acquire the critical section when the migration already holds it).

      We could leave the initial migration check in the refresh as an optimistic verification, but we need to re-check for migrations while holding the exclusive lock and before installing the new metadata.
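
A minimal sketch of that proposal, assuming a simplified model rather than the actual server implementation: all names are hypothetical, a single `threading.Lock` stands in for the CSR lock (the shared versus exclusive distinction is not modeled), and the `between_locks` hook exists only to force a migration into the window where the refresh holds no lock.

```python
import threading


class CollectionState:
    """Hypothetical stand-in for the state guarded by the CSR lock."""
    def __init__(self):
        self.lock = threading.Lock()   # models the CSR lock (mode not modeled)
        self.allow_migrations = True
        self.active_migration = None


def try_register_migration(state, name):
    # Migration start: check the flag and register under the same lock hold.
    with state.lock:
        if not state.allow_migrations:
            return False
        state.active_migration = name
        return True


def abort_if_disallowed(state, new_allow):
    # Caller must hold state.lock. Returns the aborted migration, if any.
    if new_allow:
        return None
    aborted = state.active_migration
    state.active_migration = None
    return aborted


def refresh(state, new_allow, between_locks=lambda: None):
    with state.lock:
        # Optimistic check, kept as an early verification.
        early = abort_if_disallowed(state, new_allow)
    between_locks()  # window in which the refresh holds no lock
    with state.lock:
        # Re-check while holding the lock used to install the metadata.
        late = abort_if_disallowed(state, new_allow)
        state.allow_migrations = new_allow
    return early, late


state = CollectionState()
results = []
early, late = refresh(
    state,
    new_allow=False,
    between_locks=lambda: results.append(
        try_register_migration(state, "migration-1")),
)
print(results, early, late, state.active_migration)
```

Here the migration still registers inside the unlocked window, but the second check aborts it before the new metadata is installed, and any later registration attempt sees the updated flag.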

            Assignee:
            marcos.grillo@mongodb.com Marcos José Grillo Ramirez
            Reporter:
            marcos.grillo@mongodb.com Marcos José Grillo Ramirez
            Votes:
            0
            Watchers:
            3

              Created:
              Updated:
              Resolved: