Core Server / SERVER-30714

Handle step-down error in ReplicationCoordinatorExternalStateImpl::_shardingOnTransitionToPrimaryHook

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v4.0, v3.6
    • Sprint:
      Sharding 2018-10-08
    • Linked BF Score:
      25

      Description

      The _shardingOnTransitionToPrimaryHook callback is invoked when a node becomes primary. If the node is part of a sharded cluster, the hook runs the "ShardingStateRecovery" step, which reads from disk the optime of the last write the node performed against the config server (that write being the chunk migration commit).

      The _shardingOnTransitionToPrimaryHook step runs after replMutex has been unlocked, so in the window between the unlock and the hook the node can lose its majority quorum and never actually become primary. Because the "ShardingStateRecovery" step performs majority reads, it fails in this case, and that failure in turn crashes replication step-up with assertion 40107.

      Losing the quorum during step-up is an expected situation, so the sharding code should handle the resulting step-down error gracefully instead of crashing the node.

            People

            • Votes: 0
            • Watchers: 6
