Core Server / SERVER-48679

flushRoutingTableCacheUpdates should block on critical section with kWrite, not kRead

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.4.1, 4.7.0, 4.2.10, 4.0.22
    • Affects Version/s: 3.6.18, 4.5.1, 4.0.18, 4.2.7, 4.4.0-rc8
    • Component/s: Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v4.4, v4.2, v4.0
    • Sprint: Sharding 2020-06-15, Sharding 2020-06-29

      The donor writes the enterCriticalSectionCounter flag
      -> which causes secondaries to clear their filtering metadata
      -> which causes the next versioned request on the secondary to throw StaleConfig and trigger the secondary to refresh
      -> which causes the secondary to send flushRoutingTableCacheUpdates to the primary
      -> which blocks behind the critical section only if reads are being blocked

      In 4.4 and earlier versions, if reads have not started being blocked yet, the secondary finishes the refresh before the migration commits and then keeps serving reads for stale mongoses even after the commit.
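
The gating described above can be sketched as a small model. This is a simplified illustration, not the server's code: the `Phase` names and the `must_wait` helper are hypothetical, and it assumes the critical section has one stage in which only writes wait and a later stage in which reads wait as well.

```python
from enum import Enum


class Op(Enum):
    READ = "kRead"
    WRITE = "kWrite"


class Phase(Enum):
    NOT_ENTERED = 0     # no migration critical section is active
    WRITES_BLOCKED = 1  # critical section entered, only writes wait
    READS_BLOCKED = 2   # commit in progress, reads and writes wait


def must_wait(phase: Phase, op: Op) -> bool:
    """Return True if a caller waiting with `op` blocks in `phase`."""
    if phase is Phase.NOT_ENTERED:
        return False
    if op is Op.WRITE:
        return True  # a kWrite waiter blocks as soon as the section is entered
    return phase is Phase.READS_BLOCKED  # a kRead waiter blocks only later


# The window in question: writes are blocked, reads are not yet.
window = Phase.WRITES_BLOCKED
print(must_wait(window, Op.READ))   # False: the flush returns before the commit
print(must_wait(window, Op.WRITE))  # True: the flush waits for the outcome
```

In this model, a flush that waits with kRead slips through the window, while one that waits with kWrite does not return until the migration has been resolved.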

      For example:

      • Donor writes enterCriticalSectionSignal at T90
      • Secondary sees the flag, invalidates its filtering metadata
      • Secondary gets a versioned read, sends flushRoutingTableCacheUpdates, gets back success
      • Donor starts blocking writes
      • Donor commits the migration, which succeeds at T100
      • Client does a write from mongos1, which contacts donor and gets back StaleConfig, then retries write on recipient, which succeeds at T101
      • Client does afterClusterTime: T101 read from mongos2, which is stale and contacts the donor secondary. >>> That secondary will wait until T101, then serve the read <<<
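
Why the last step returns stale data can be sketched with a toy version check. Everything here is an assumption made for illustration: shard versions are plain integers, a shard's data is a dict, and `versioned_read` is a hypothetical stand-in for the shard-side check, which in the real server is considerably more involved.

```python
class StaleConfig(Exception):
    """Raised when the sender's shard version differs from the installed one."""


def versioned_read(sent_version, installed_version, local_docs, key):
    # The shard only checks that the two versions agree; it cannot tell
    # that both sides are equally out of date.
    if sent_version != installed_version:
        raise StaleConfig(f"sent {sent_version}, installed {installed_version}")
    return local_docs.get(key)


# Hypothetical state after the T101 write went to the recipient.
donor_secondary_docs = {"x": "value before T101"}
PRE_COMMIT, POST_COMMIT = 1, 2  # made-up shard versions

# The secondary refreshed before the commit and mongos2 is stale: versions agree.
stale = versioned_read(PRE_COMMIT, PRE_COMMIT, donor_secondary_docs, "x")
print(stale)  # value before T101

# Had the refresh waited for the commit, the mismatch would surface instead.
try:
    versioned_read(PRE_COMMIT, POST_COMMIT, donor_secondary_docs, "x")
except StaleConfig as exc:
    print("StaleConfig:", exc)
```

The first call mirrors the example: both parties hold the pre-commit version, so the read is served from the donor secondary and misses the write made at T101.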

      In 4.5, that happens not to be an issue, since the refresh is done by calling onShardVersionMismatch, which waits for the critical section as long as writes are already being blocked.

      Despite that, we want to change flushRoutingTableCacheUpdates in all versions to block behind the critical section with kWrite instead of kRead, which is what it uses today.

            luis.osta@mongodb.com Luis Osta (Inactive)
            esha.maharishi@mongodb.com Esha Maharishi (Inactive)