Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.2.6, 4.4.0-rc0, 4.7.0
Affects Version/s: None
Component/s: Replication
Labels:
- safe-reconfig-related

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:
v4.4, v4.2
Sprint:
Repl 2020-03-23
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Two stepdown paths, the unconditional stepdown on learning of a higher term and relinquishing primary after a failed liveness check, change _leaderMode to kSteppingDown and then release the replCoord mutex before continuing the stepdown. A concurrent reconfig may acquire the mutex in that window and call _updateMemberStateFromTopologyCoordinator, which sets canAcceptNonLocalWrites to the result of the topology coordinator's canAcceptWrites():

bool TopologyCoordinator::canAcceptWrites() const {
    return _leaderMode == LeaderMode::kMaster;
}

Because _leaderMode has already changed, the reconfig thread picks up the half-finished work of the stepdown and sets canAcceptNonLocalWrites to false without holding the RSTL in X mode.

The contract is that canAcceptNonLocalWrites must only be updated while the RSTL is held in X mode. The reconfig path violates that contract, and an invariant fails.
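The interleaving described above can be replayed deterministically in a toy model. This is only a sketch: `ToyCoordinator`, `rstlHeldInX`, `contractViolated`, and the helper functions are hypothetical stand-ins, not the server's real classes or lock manager, and the two "threads" are simply called in the problematic order.

```cpp
#include <cassert>

// Toy model only; these are illustrative stand-ins, not the server's real types.
enum class LeaderMode { kMaster, kSteppingDown };

struct ToyCoordinator {
    LeaderMode leaderMode = LeaderMode::kMaster;
    bool canAcceptNonLocalWrites = true;
    bool rstlHeldInX = false;       // hypothetical: "caller holds the RSTL in X mode"
    bool contractViolated = false;  // stands in for the failing invariant

    bool canAcceptWrites() const {
        return leaderMode == LeaderMode::kMaster;
    }

    // Mirrors the shape of _updateMemberStateFromTopologyCoordinator:
    // it copies canAcceptWrites() into canAcceptNonLocalWrites.
    void updateMemberStateFromTopologyCoordinator() {
        const bool next = canAcceptWrites();
        if (next != canAcceptNonLocalWrites) {
            if (!rstlHeldInX) {
                contractViolated = true;  // flag changed outside RSTL X
            }
            canAcceptNonLocalWrites = next;
        }
    }
};

// Stepdown thread, first half: change the mode under the mutex, then release
// the mutex without having called the update function.
inline void stepdownFirstHalf(ToyCoordinator& c) {
    c.leaderMode = LeaderMode::kSteppingDown;
}

// Reconfig thread: takes the mutex next and calls the update function
// while not holding the RSTL in X mode.
inline void concurrentReconfig(ToyCoordinator& c) {
    c.rstlHeldInX = false;
    c.updateMemberStateFromTopologyCoordinator();
}

// Replays the two steps in the order that triggers the bug.
inline bool raceViolatesContract() {
    ToyCoordinator c;
    stepdownFirstHalf(c);
    concurrentReconfig(c);
    return c.contractViolated;
}
```

In this model `raceViolatesContract()` returns true: the reconfig call is the one that flips the flag, and it does so without RSTL X.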

~~SERVER-45081~~ works around this by updating canAcceptNonLocalWrites only when the RSTL is held in X mode, which leaves the update to the stepdown thread.
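The shape of that workaround might look roughly like the following. This is a minimal sketch, not the actual SERVER-45081 patch: `GuardedCoordinator` and the `rstlHeldInX` parameter are hypothetical, and real code would query the lock state rather than take a boolean.

```cpp
#include <cassert>

// Toy stand-ins; none of these are the server's real types.
enum class LeaderMode { kMaster, kSteppingDown };

struct GuardedCoordinator {
    LeaderMode leaderMode = LeaderMode::kMaster;
    bool canAcceptNonLocalWrites = true;

    bool canAcceptWrites() const {
        return leaderMode == LeaderMode::kMaster;
    }

    // rstlHeldInX is a hypothetical parameter standing in for a check that
    // the calling operation holds the RSTL in X mode.
    // Returns true if canAcceptNonLocalWrites was written.
    bool updateMemberState(bool rstlHeldInX) {
        if (!rstlHeldInX) {
            return false;  // leave the flag for the thread that holds RSTL X
        }
        canAcceptNonLocalWrites = canAcceptWrites();
        return true;
    }
};
```

With this guard, a reconfig calling `updateMemberState(false)` after the mode has changed leaves the flag untouched, and the stepdown thread's later `updateMemberState(true)` is what sets it to false.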

There are several ways to fix the issue holistically:

- Move the update of readWriteAbility out of _updateMemberStateFromTopologyCoordinator, so that the update runs only when the value changes.
- Don't change _leaderMode to kSteppingDown before acquiring the RSTL. This would require rethinking the concurrency of stepdown.
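The first option could be sketched as follows, assuming the split is a dedicated setter that only state transitions holding RSTL X call. `SplitCoordinator`, `setWriteAbility`, and the counters are hypothetical names for illustration; the ticket does not specify the final design.

```cpp
#include <cassert>

// Toy stand-ins; none of these are the server's real types.
enum class LeaderMode { kMaster, kSteppingDown };

struct SplitCoordinator {
    LeaderMode leaderMode = LeaderMode::kMaster;
    bool canAcceptNonLocalWrites = true;
    int memberStateRefreshes = 0;  // stands in for the function's other bookkeeping
    int writeAbilityChanges = 0;

    // General refresh: safe for any caller holding the mutex (for example a
    // reconfig), because it no longer touches canAcceptNonLocalWrites.
    void updateMemberState() {
        ++memberStateRefreshes;
    }

    // Hypothetical dedicated setter: called only from transitions that
    // already hold the RSTL in X mode, and only writes on a real change.
    void setWriteAbility(bool canAccept) {
        if (canAccept != canAcceptNonLocalWrites) {
            canAcceptNonLocalWrites = canAccept;
            ++writeAbilityChanges;
        }
    }
};
```

Under this split, a concurrent reconfig can refresh member state as often as it likes without ever writing the flag; only the stepdown path, once it holds RSTL X, calls `setWriteAbility(false)`.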

The concurrency rule for _updateMemberStateFromTopologyCoordinator is that whenever topology coordinator state that the function depends on changes, the function must be called within the same lock scope. The stepdown paths described here violate this rule.
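A minimal illustration of that rule, using a plain `std::mutex` in place of the replCoord mutex: the state change and the dependent update happen under one lock acquisition, so no other thread can observe the gap between them. `ScopedCoordinator` and its members are hypothetical, and the sketch ignores the RSTL entirely.

```cpp
#include <cassert>
#include <mutex>

// Toy stand-ins; none of these are the server's real types.
enum class LeaderMode { kMaster, kSteppingDown };

class ScopedCoordinator {
public:
    // The mode change and the dependent update share one lock scope.
    void beginStepdown() {
        std::lock_guard<std::mutex> lk(_mutex);
        _leaderMode = LeaderMode::kSteppingDown;
        _updateMemberState_inlock();
    }

    // What another thread (for example a reconfig) would see after
    // acquiring the mutex: the derived flag always matches the mode.
    bool isConsistent() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _canAcceptNonLocalWrites == (_leaderMode == LeaderMode::kMaster);
    }

    bool canAcceptNonLocalWrites() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _canAcceptNonLocalWrites;
    }

private:
    // Caller must hold _mutex.
    void _updateMemberState_inlock() {
        _canAcceptNonLocalWrites = (_leaderMode == LeaderMode::kMaster);
    }

    std::mutex _mutex;
    LeaderMode _leaderMode = LeaderMode::kMaster;
    bool _canAcceptNonLocalWrites = true;
};
```

Because every observer takes the same mutex, `isConsistent()` holds both before and after `beginStepdown()`. In the real server the flag update additionally requires RSTL X, which is why following this rule means deciding where the RSTL is acquired relative to the mode change.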

Assignee: Lingzhi Deng
Reporter: Siyuan Zhou
Participants: A. Jesse Jiryu Davis, Githook User, Lingzhi Deng, Siyuan Zhou
Votes: 0
Watchers: 7

Created: Mar 01 2020 10:09:44 PM UTC
Updated: Oct 29 2023 10:11:30 PM UTC
Resolved: Mar 19 2020 02:32:53 PM UTC
Confidence Status Last Update: 11/Mar/20 5:11 PM
