- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: 3.5.11
- Component/s: Sharding
- Fully Compatible
- ALL
- v4.0, v3.6
- Sharding 2018-10-08
- (copied to CRM)
- 25
The _shardingOnTransitionToPrimaryHook callback is invoked when a node becomes primary. If that node is part of a sharded cluster, the callback executes the "ShardingStateRecovery" step, which reads from disk the optime of the last write that the node performed against the config server (such a write being a chunk migration commit).
The _shardingOnTransitionToPrimaryHook step runs after the replMutex has been unlocked, so the node can lose its majority quorum in the meantime and never actually become primary. Because the "ShardingStateRecovery" step performs majority reads, it fails in that case, which in turn crashes the replication step-up with assert 40107.
Since this is an expected situation, the sharding code should handle it appropriately.
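One way to picture the requested handling is the minimal, self-contained sketch below. It is not the server's code: `Status`, `ErrorCode`, `StepUpOutcome` and `onTransitionToPrimary` are hypothetical stand-ins invented for illustration, and the `recover` callback merely models the majority read done by "ShardingStateRecovery". The idea is to classify a "no longer primary" failure as benign instead of asserting on every non-OK result.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only; these are not MongoDB's types.
enum class ErrorCode { OK, NotPrimary, Other };

struct Status {
    ErrorCode code;
    std::string reason;

    Status() : code(ErrorCode::OK) {}
    Status(ErrorCode c, std::string r) : code(c), reason(std::move(r)) {}

    bool isOK() const { return code == ErrorCode::OK; }
};

enum class StepUpOutcome {
    RecoveryDone,         // recovery succeeded, node continues as primary
    LostPrimaryDuringHook, // expected race: quorum lost after the mutex was released
    Fatal                  // any other failure is still treated as fatal
};

// Models the transition-to-primary hook. `recover` stands in for the
// sharding recovery step and its majority read (an assumption of this sketch).
StepUpOutcome onTransitionToPrimary(const std::function<Status()>& recover) {
    const Status status = recover();
    if (status.isOK()) {
        return StepUpOutcome::RecoveryDone;
    }
    // The node may have lost its majority between unlocking the replication
    // mutex and issuing the majority read. That is expected, so report it
    // to the caller instead of crashing the step-up.
    if (status.code == ErrorCode::NotPrimary) {
        return StepUpOutcome::LostPrimaryDuringHook;
    }
    return StepUpOutcome::Fatal;
}
```

In this sketch the caller would abandon the step-up quietly on `LostPrimaryDuringHook` and reserve the assert for `Fatal`; which error codes the real server should treat as benign is left open here.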