Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-31521

Shard primaries that step down can't force the new primary to refresh its routing table

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6.0-rc1
    • Component/s: Sharding
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Sprint:
      Sharding 2017-10-23
    • Linked BF Score:
      0

      Description

      If a shard primary steps down, it will trigger the ShardServerCatalogCacheLoader::onStepDown function which sets the PrimarySteppedDown error code interrupt on the OperationContextGroup used by the ShardServerCatalogCacheLoader. This means that all operation contexts added to this group will be automatically interrupted until resetInterrupt() is called in the ShardServerCatalogCacheLoader::onStepUp hook, when/if the node transitions back to primary.

      The problem is that when a secondary that used to be primary receives a command that forces it to refresh its routing table, it will add an operation context to its OperationContextGroup and call _runSecondaryGetChunksSince, which will attempt to send a "forceRoutingTableRefresh" command to the primary, but because the operation context was automatically interrupted upon being added to the group, this will fail with PrimarySteppedDown, and the primary will never actually receive the refresh command.

      I think SERVER-30148 will fix this, because the ShardingState's refresh logic shouldn't use an operation context from the ShardServerCatalogCacheLoader's interrupted OperationContextGroup.

        Attachments

          Activity

            People

            Assignee:
            dianna.hohensee Dianna Hohensee
            Reporter:
            jack.mulrow Jack Mulrow
            Participants:
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved: