  1. Core Server
  2. SERVER-41934

Filtering metadata could be stale and serve queries if stepdown happens during migration

    • Type: Task
    • Resolution: Gone away
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Sharding
    • Labels:
      None
    • Sharding

      During a migration, we persist a critical section counter that is replicated to secondaries so that they clear their filtering metadata and their next refresh sees the result of the migration. The idea is that a secondary refreshes by calling forceRoutingTableRefresh on the primary and waiting for the result to replicate; on the primary, that call waits for the critical section before refreshing.

      However, when we persist that critical section counter, we don't use majority write concern, and we never wait for majority before we commit the migration on the config server.

      This means that the following sequence can lead to stale reads:
      1. A migration starts.
      2. The critical section counter is written. Suppose it doesn't get replicated at all.
      3. The migration is committed on the config server.
      4. A failover occurs.
      5. A new primary is elected which does not know that a migration has occurred. It could continue serving requests for a router which is just as stale as that former secondary, leading to stale data being read.

      We should verify this with a jstest and then fix it by persisting the critical section counter with majority write concern.
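
      The sequence above, and the effect of the proposed fix, can be sketched as a toy simulation. This is not server code and not the jstest: the Node class, the chunk names, the three-node set, and the simulate_migration function are all hypothetical, and the election is simplified to "the surviving node with the highest counter wins" as a stand-in for "the most up-to-date node wins". It only illustrates why a counter write that never replicates leaves the new primary with stale filtering metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One shard replica-set member in this toy model (hypothetical, not server code)."""
    name: str
    critical_section_counter: int = 0
    # Cached filtering metadata: the chunks this node believes the shard owns.
    filtering_metadata: set = field(default_factory=lambda: {"chunk_a", "chunk_b"})
    metadata_cleared: bool = False

    def apply_counter_write(self, value: int) -> None:
        # Applying the counter write marks the cached metadata as cleared,
        # so the node's next access has to refresh before serving.
        self.critical_section_counter = value
        self.metadata_cleared = True


def simulate_migration(majority_write: bool) -> bool:
    """Return True if the new primary would serve reads with stale metadata."""
    primary = Node("n0")
    secondaries = [Node("n1"), Node("n2")]

    # Steps 1-2: the migration starts and the primary writes the counter.
    primary.apply_counter_write(1)
    if majority_write:
        # Assumed fix: wait until a majority (2 of 3 nodes) has the write.
        secondaries[0].apply_counter_write(1)
    # Otherwise the write stays on the primary only (the "not replicated" case).

    # Step 3: the migration commits on the config server; chunk_b moves away.
    config_owned = {"chunk_a"}

    # Step 4: failover. The old primary is gone; the most up-to-date
    # surviving node (approximated by the highest counter) is elected.
    new_primary = max(secondaries, key=lambda n: n.critical_section_counter)

    # Step 5: the new primary refreshes only if it saw the counter write.
    if new_primary.metadata_cleared:
        new_primary.filtering_metadata = set(config_owned)
        new_primary.metadata_cleared = False

    # Stale if it still believes it owns the chunk that was migrated away.
    return "chunk_b" in new_primary.filtering_metadata


if __name__ == "__main__":
    print("w:1 write, stale reads possible:", simulate_migration(False))
    print("majority write, stale reads possible:", simulate_migration(True))
```

      In this model, simulate_migration(False) reports that the new primary still considers chunk_b its own, while simulate_migration(True) does not, because the node that wins the election has already applied the counter write.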

            Assignee:
            backlog-server-sharding [DO NOT USE] Backlog - Sharding Team
            Reporter:
            matthew.saltz@mongodb.com Matthew Saltz (Inactive)
            Votes:
            0
            Watchers:
            4

              Created:
              Updated:
              Resolved: