Project: Core Server, Issue: SERVER-77748

movePrimary coordinator does not clear database metadata in case of stepdown


Details

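    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 7.1.0-rc0, 7.0.0-rc4
    • Affects Version/s: 7.0.0-rc2
    • Component/s: None
    • Labels: None
    • Assigned Teams: Sharding EMEA
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v7.0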
    • Steps To Reproduce: Step down the primary of the coordinator shard while the movePrimary coordinator is running, after the kCommit phase has completed and before the kExitCriticalSection phase begins.
    • Sprint: Sharding EMEA 2023-06-12, Sharding EMEA 2023-06-26
    • Linked BF Score: 113

    Description

      If a primary failover happens during a movePrimary operation, the database metadata on the newly elected primary node of the coordinator shard may never be cleared, which can lead to data loss.

      As part of the movePrimary coordinator, database metadata is cleared explicitly on the primary node during the kCommit phase, whereas on secondary nodes it is cleared indirectly when they exit the database recoverable critical section during the kExitCriticalSection phase.
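      The two clearing paths can be modeled with a toy sketch. Everything in it is hypothetical: `Node`, `commit_phase`, `exit_critical_section_phase` and the `db_metadata_cached` flag are illustrative names, not MongoDB server code, and the sketch assumes a two-node replica set with no failover. It only shows that, while roles stay fixed, every node is cleared by exactly one of the two phases.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical model of one replica set member of the coordinator shard.
    name: str
    is_primary: bool
    db_metadata_cached: bool = True  # True = stale database metadata still cached


def commit_phase(node: Node) -> None:
    # kCommit: only the primary clears its cached metadata explicitly.
    if node.is_primary:
        node.db_metadata_cached = False


def exit_critical_section_phase(node: Node) -> None:
    # kExitCriticalSection: secondaries clear metadata as a side effect of
    # exiting the recoverable critical section; the primary does nothing here.
    if not node.is_primary:
        node.db_metadata_cached = False


def run_without_failover(nodes):
    for n in nodes:
        commit_phase(n)
    for n in nodes:
        exit_critical_section_phase(n)
    return {n.name: n.db_metadata_cached for n in nodes}


print(run_without_failover([Node("N1", is_primary=True), Node("N2", is_primary=False)]))
# {'N1': False, 'N2': False}
```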

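      If a stepdown happens between these two phases and a new primary is elected on the coordinator shard, the new primary's metadata is never cleared: it was a secondary during kCommit, so it skipped the explicit clear, and it is the primary during kExitCriticalSection, so it skips the indirect clear.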

      Consider the following scenario:

      • kCommit
        • N1 (primary) -> db metadata cleared
        • N2 (secondary) -> db metadata not cleared
      • kExitCriticalSection
        • N1 (secondary) -> db metadata cleared
        • N2 (primary) -> db metadata not cleared
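      The scenario can be replayed with a toy model. All names are hypothetical and not server code. The `clear_on_primary` flag models one possible mitigation, an idempotent clear on whichever node is primary during kExitCriticalSection; it is an assumption for illustration and not necessarily how SERVER-77748 was actually fixed.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical model of one replica set member of the coordinator shard.
    name: str
    is_primary: bool
    db_metadata_cached: bool = True  # True = stale database metadata still cached


def commit_phase(node: Node) -> None:
    # kCommit: only the current primary clears its metadata explicitly.
    if node.is_primary:
        node.db_metadata_cached = False


def exit_critical_section_phase(node: Node, clear_on_primary: bool = False) -> None:
    # Secondaries always clear when exiting the critical section.
    # clear_on_primary=True is a hypothetical mitigation: the (possibly new)
    # primary also clears its metadata, which is harmless if already cleared.
    if not node.is_primary or clear_on_primary:
        node.db_metadata_cached = False


def step_down(old_primary: Node, new_primary: Node) -> None:
    old_primary.is_primary = False
    new_primary.is_primary = True


def run_with_failover(clear_on_primary: bool = False):
    n1, n2 = Node("N1", True), Node("N2", False)
    for n in (n1, n2):
        commit_phase(n)  # kCommit: N1 clears, N2 keeps stale metadata
    step_down(n1, n2)  # failover between kCommit and kExitCriticalSection
    for n in (n1, n2):
        exit_critical_section_phase(n, clear_on_primary)
    return {n.name: n.db_metadata_cached for n in (n1, n2)}


print(run_with_failover())                       # {'N1': False, 'N2': True}
print(run_with_failover(clear_on_primary=True))  # {'N1': False, 'N2': False}
```

      Without the mitigation, N2 ends up as primary with stale metadata, matching the last line of the scenario.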

          People

            Enrico Golfieri (enrico.golfieri@mongodb.com)
            Tommaso Tocci (tommaso.tocci@mongodb.com)
            Votes: 0
            Watchers: 6
