[SERVER-46517] Stepdown changes the underlying state of canAcceptWrites() out of RSTL X mode Created: 01/Mar/20 Updated: 29/Oct/23 Resolved: 19/Mar/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.2.6, 4.4.0-rc0, 4.7.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Siyuan Zhou | Assignee: | Lingzhi Deng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | safe-reconfig-related | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Backport Requested: | v4.4, v4.2 | ||||
| Sprint: | Repl 2020-03-23 | ||||
| Participants: | |||||
| Description |
|
An unconditional stepdown, triggered either by learning of a higher term or by relinquishing primary after a liveness check, can set _leaderMode to kSteppingDown and then release the replCoord mutex before continuing the stepdown. A concurrent reconfig may acquire the mutex after that and call _updateMemberStateFromTopologyCoordinator, which sets canAcceptNonLocalWrites to the topology coordinator's canAcceptWrites():
Because _leaderMode has already changed, the reconfig thread picks up the half-finished work of the stepdown and sets canAcceptNonLocalWrites to false without holding the RSTL in X mode. The contract is that canAcceptNonLocalWrites must only be updated while the RSTL is held in X mode; that contract is violated here, which fails an invariant.
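The interleaving described above can be replayed deterministically with a toy model. This is only a sketch under stated assumptions: ToyReplCoord, rstlHeldInX, contractViolated and replayRace are hypothetical names invented for illustration, the real classes are far more involved, and the model assumes canAcceptWrites() is true only while the leader mode is kMaster. The sketch records the contract violation in a flag instead of aborting on an invariant.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical, heavily simplified stand-ins; not the real server classes.
enum class LeaderMode { kMaster, kSteppingDown, kNotLeader };

struct ToyReplCoord {
    std::mutex mutex;                     // stands in for the replCoord mutex
    bool rstlHeldInX = false;             // stands in for "RSTL is held in X mode"
    LeaderMode leaderMode = LeaderMode::kMaster;
    bool canAcceptNonLocalWrites = true;  // derived flag guarded by the contract
    bool contractViolated = false;        // recorded here instead of crashing

    // Assumption: writes are accepted only while fully primary.
    bool canAcceptWrites() const {
        return leaderMode == LeaderMode::kMaster;
    }

    // Caller must hold `mutex`. Copies the topology coordinator's answer
    // into the derived flag and notes whether the change happened outside
    // RSTL X mode.
    void updateMemberStateFromTopologyCoordinator() {
        const bool next = canAcceptWrites();
        if (next != canAcceptNonLocalWrites) {
            if (!rstlHeldInX) {
                contractViolated = true;
            }
            canAcceptNonLocalWrites = next;
        }
    }
};

// Replays the two steps in order on one thread, so the outcome is
// deterministic. Returns true if the flag changed outside RSTL X mode.
bool replayRace() {
    ToyReplCoord rc;
    {
        // Stepdown: flip the leader mode, then drop the mutex before the
        // RSTL has been acquired in X mode.
        std::lock_guard<std::mutex> lk(rc.mutex);
        rc.leaderMode = LeaderMode::kSteppingDown;
    }
    {
        // Reconfig: takes the mutex and refreshes the derived state,
        // completing the stepdown's work without holding RSTL X.
        std::lock_guard<std::mutex> lk(rc.mutex);
        rc.updateMemberStateFromTopologyCoordinator();
    }
    return rc.contractViolated;
}
```

In this model the violation disappears if the leader mode change and the refresh of the derived flag happen within one critical section that also satisfies the RSTL requirement, so no other thread can observe the intermediate state.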
There are several solutions to fix the issue in a holistic way:
The concurrency rule for _updateMemberStateFromTopologyCoordinator is that whenever the topology coordinator state it depends on changes, the function should be called within the same lock scope. This issue violates that rule. |
| Comments |
| Comment by Githook User [ 07/Apr/20 ] |
|
Author: {'name': 'Lingzhi Deng', 'email': 'lingzhi.deng@mongodb.com', 'username': 'ldennis'} Message: (cherry picked from commit 6d0a10abd1e6f222bc16c59afc28dcfb9613b86f) |
| Comment by Githook User [ 25/Mar/20 ] |
|
Author: {'email': 'lingzhi.deng@mongodb.com', 'name': 'Lingzhi Deng', 'username': 'ldennis'} Message: (cherry picked from commit 994c78a1a36c006ad659983e2f0a3cba7a6dea41) |
| Comment by Githook User [ 25/Mar/20 ] |
|
Author: {'email': 'lingzhi.deng@mongodb.com', 'name': 'Lingzhi Deng', 'username': 'ldennis'} Message: (cherry picked from commit 6d0a10abd1e6f222bc16c59afc28dcfb9613b86f) |
| Comment by Githook User [ 19/Mar/20 ] |
|
Author: {'email': 'lingzhi.deng@mongodb.com', 'name': 'Lingzhi Deng', 'username': 'ldennis'}Message: |
| Comment by Githook User [ 19/Mar/20 ] |
|
Author: {'email': 'lingzhi.deng@mongodb.com', 'name': 'Lingzhi Deng', 'username': 'ldennis'}Message: |
| Comment by A. Jesse Jiryu Davis [ 11/Mar/20 ] |
|
I agree, Option 1 looks like a general improvement. |
| Comment by A. Jesse Jiryu Davis [ 11/Mar/20 ] |
|
|