Core Server: SERVER-31521

Shard primaries that step down can't force the new primary to refresh its routing table

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 3.6.0-rc1
    • Affects Version/s: None
    • Component/s: Sharding
    • Labels:
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Sprint: Sharding 2017-10-23
    • 0

      If a shard primary steps down, it triggers the ShardServerCatalogCacheLoader::onStepDown function, which interrupts the OperationContextGroup used by the ShardServerCatalogCacheLoader with the PrimarySteppedDown error code. From then on, every operation context added to this group is interrupted automatically, until resetInterrupt() is called in the ShardServerCatalogCacheLoader::onStepUp hook when (and if) the node transitions back to primary.
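The interrupt semantics described above can be illustrated with a small, self-contained C++ model. This is a sketch under stated assumptions, not MongoDB's implementation: `ModelOpCtx`, `ModelOpCtxGroup`, and `ModelCatalogCacheLoader` are hypothetical stand-ins, and only the behavior this report describes is modeled, namely a sticky interrupt code that applies to every context in the group, including contexts added later, until `resetInterrupt()` clears it.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical error codes; OK means "not interrupted".
enum class ErrorCode { OK, PrimarySteppedDown };

// Hypothetical stand-in for an operation context: it only records whether
// (and with which code) it has been interrupted.
struct ModelOpCtx {
    ErrorCode killCode = ErrorCode::OK;
    bool isInterrupted() const { return killCode != ErrorCode::OK; }
};

// Hypothetical model of a group with a sticky interrupt code.
class ModelOpCtxGroup {
public:
    // Creates a context that belongs to the group. If the group is currently
    // interrupted, the new context starts out interrupted with the same code.
    std::shared_ptr<ModelOpCtx> makeOperationContext() {
        std::lock_guard<std::mutex> lk(_mutex);
        auto opCtx = std::make_shared<ModelOpCtx>();
        opCtx->killCode = _interruptCode;
        _members.push_back(opCtx);  // expired entries are not pruned, for brevity
        return opCtx;
    }

    // Interrupts every live member and remembers the code for future members.
    void interrupt(ErrorCode code) {
        std::lock_guard<std::mutex> lk(_mutex);
        _interruptCode = code;
        for (auto& weak : _members) {
            if (auto opCtx = weak.lock()) {
                opCtx->killCode = code;
            }
        }
    }

    // Stops interrupting newly added contexts; in this model, contexts that
    // were already interrupted stay interrupted.
    void resetInterrupt() {
        std::lock_guard<std::mutex> lk(_mutex);
        _interruptCode = ErrorCode::OK;
    }

private:
    std::mutex _mutex;
    ErrorCode _interruptCode = ErrorCode::OK;
    std::vector<std::weak_ptr<ModelOpCtx>> _members;
};

// Hypothetical loader whose step-down/step-up hooks drive the group.
class ModelCatalogCacheLoader {
public:
    void onStepDown() { _contexts.interrupt(ErrorCode::PrimarySteppedDown); }
    void onStepUp() { _contexts.resetInterrupt(); }
    std::shared_ptr<ModelOpCtx> newOperationContext() {
        return _contexts.makeOperationContext();
    }

private:
    ModelOpCtxGroup _contexts;
};
```

In this model, after `onStepDown()` every call to `newOperationContext()` returns a context that is already interrupted with `PrimarySteppedDown`; only `onStepUp()` makes newly created contexts usable again.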

      The problem is that a secondary that used to be primary can still receive a command that forces it to refresh its routing table. To handle it, the node adds an operation context to the ShardServerCatalogCacheLoader's OperationContextGroup and calls _runSecondaryGetChunksSince, which attempts to send a "forceRoutingTableRefresh" command to the primary. Because the operation context was interrupted as soon as it was added to the group, this fails with PrimarySteppedDown, and the primary never receives the refresh command.

      I think SERVER-30148 will fix this, because with that change the ShardingState's refresh logic should no longer use an operation context from the ShardServerCatalogCacheLoader's interrupted OperationContextGroup.
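The failure described above, and the direction of the proposed fix, can be sketched with a similar hypothetical model. `ModelPrimary`, `sendForceRoutingTableRefresh`, `runSecondaryRefresh`, and its `useLoaderGroup` switch are illustrative assumptions, not MongoDB code; neither the real `_runSecondaryGetChunksSince` nor the SERVER-30148 change is reproduced here. The sketch only shows why a context drawn from an interrupted group can never deliver `forceRoutingTableRefresh`, while a context created outside that group can.

```cpp
#include <cassert>
#include <memory>

enum class ErrorCode { OK, PrimarySteppedDown };

struct ModelOpCtx {
    ErrorCode killCode = ErrorCode::OK;
};

// Minimal, single-threaded group model: it tracks only the sticky interrupt
// code, and new contexts inherit that code when they are created.
class ModelOpCtxGroup {
public:
    std::shared_ptr<ModelOpCtx> makeOperationContext() {
        auto opCtx = std::make_shared<ModelOpCtx>();
        opCtx->killCode = _interruptCode;
        return opCtx;
    }
    void interrupt(ErrorCode code) { _interruptCode = code; }
    void resetInterrupt() { _interruptCode = ErrorCode::OK; }

private:
    ErrorCode _interruptCode = ErrorCode::OK;
};

// Hypothetical primary that counts the refresh commands it receives.
struct ModelPrimary {
    int forceRoutingTableRefreshCount = 0;
    void receiveForceRoutingTableRefresh() { ++forceRoutingTableRefreshCount; }
};

// Hypothetical "remote command": it is only delivered if the operation
// context has not been interrupted.
ErrorCode sendForceRoutingTableRefresh(const ModelOpCtx& opCtx, ModelPrimary& primary) {
    if (opCtx.killCode != ErrorCode::OK) {
        return opCtx.killCode;  // fails before anything reaches the primary
    }
    primary.receiveForceRoutingTableRefresh();
    return ErrorCode::OK;
}

// useLoaderGroup == true models the buggy path (context taken from the
// loader's group); false models a context that is not a member of that group.
ErrorCode runSecondaryRefresh(ModelOpCtxGroup& loaderGroup, ModelPrimary& primary,
                              bool useLoaderGroup) {
    auto opCtx = useLoaderGroup ? loaderGroup.makeOperationContext()
                                : std::make_shared<ModelOpCtx>();
    return sendForceRoutingTableRefresh(*opCtx, primary);
}
```

While the group stays interrupted, the refresh path that uses it can never succeed: as the report notes, the interrupt is only cleared in the step-up hook, which a node that remains secondary never runs.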

    • Assignee: Dianna Hohensee (Inactive)
    • Reporter: Jack Mulrow