[SERVER-37574] Force reconfig should kill user operations Created: 11/Oct/18  Updated: 29/Oct/23  Resolved: 20/May/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.1.12

Type: Bug Priority: Major - P3
Reporter: Judah Schvimer Assignee: Suganthi Mani
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-41037 Stepup should kill all user operation... Closed
is depended on by SERVER-40487 Stop running the RstlKillOpthread whe... Backlog
Documented
is documented by DOCS-12717 Docs for SERVER-37574: Force reconfig... Closed
Related
related to SERVER-41035 Rollback should kill all user operati... Closed
related to SERVER-37381 Allow prepared transactions to surviv... Closed
is related to SERVER-27534 All writing operations must fail if t... Closed
is related to SERVER-41234 Concurrent step downs due to step dow... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0, v3.6
Sprint: Repl 2018-10-22, Repl 2019-03-11, Repl 2019-03-25, Repl 2019-04-08, Repl 2019-04-22, Repl 2019-05-06, Repl 2019-05-20, Repl 2019-06-03
Participants:
Linked BF Score: 12

 Description   

Force reconfig from the command and from a heartbeat can lead to a stepdown and thus needs to kill user operations.
Here: https://github.com/mongodb/mongo/blob/a222ef5e647ac527f7d4f8636bcacd6cc0ae6b8e/src/mongo/db/repl/replication_coordinator_impl.cpp#L2329-L2335
And here:
https://github.com/mongodb/mongo/blob/a222ef5e647ac527f7d4f8636bcacd6cc0ae6b8e/src/mongo/db/repl/replication_coordinator_impl_heartbeat.cpp#L559-L568
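The intended behavior can be illustrated with a minimal sketch (hypothetical types and names, not the actual `ReplicationCoordinatorImpl` code): when a reconfig forces the primary to step down, in-progress user writes should be interrupted, while reads may continue.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model of an in-progress operation.
struct OpCtx {
    bool isWrite = false;
    bool killed = false;
};

// Sketch of the kill pass a reconfig-driven stepdown should perform:
// interrupt every in-progress write; leave reads running.
// Returns the number of operations killed.
int killConflictingOperations(std::vector<OpCtx>& ops) {
    int killed = 0;
    for (auto& op : ops) {
        if (op.isWrite && !op.killed) {
            op.killed = true;
            ++killed;
        }
    }
    return killed;
}
```

This is only a model of the policy under discussion; the real implementation lives in the linked `replication_coordinator_impl` files.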



 Comments   
Comment by Githook User [ 20/May/19 ]

Author:

{'name': 'Suganthi Mani', 'email': 'suganthi.mani@mongodb.com', 'username': 'smani87'}

Message: SERVER-37574 Force reconfig should kill operations that conflict state
transition (primary to secondary/removed).
Branch: master
https://github.com/mongodb/mongo/commit/87c4ff3e8f939593c2ab0101f930648d8d208904

Comment by Suganthi Mani [ 20/May/19 ]

If the node has already stepped down via the reconfig code path, then stepdown via heartbeat should simply return. I just noticed while reading the code that, before returning, it should also signal the waiters waiting on the stepdown event. https://github.com/mongodb/mongo/commit/8e6ad096f8a8b81e1be01d012920f52332650d6f didn't address this, so I reverted the commit.

Comment by Githook User [ 20/May/19 ]

Author:

{'name': 'Suganthi Mani', 'email': 'suganthi.mani@mongodb.com', 'username': 'smani87'}

Message: Revert "SERVER-37574 Force reconfig should kill operations that conflict state"

This reverts commit 8e6ad096f8a8b81e1be01d012920f52332650d6f.
Branch: master
https://github.com/mongodb/mongo/commit/674276b0149d3b77b51ac46adc53b11c47a26519

Comment by Githook User [ 19/May/19 ]

Author:

{'email': 'suganthi.mani@mongodb.com', 'name': 'Suganthi Mani', 'username': 'smani87'}

Message: SERVER-37574 Force reconfig should kill operations that conflict state
transition (primary to secondary/removed).
Branch: master
https://github.com/mongodb/mongo/commit/8e6ad096f8a8b81e1be01d012920f52332650d6f

Comment by Suganthi Mani [ 07/May/19 ]

This ticket should handle the following two things.

1) Unconditional stepdown, after acquiring the mutex, should check whether the member state is still primary and return if not. Otherwise, this invariant can fail:

  • A concurrent stepdown might have happened via reconfig or reconfig via heartbeat, which means the leader mode is no longer kSteppingDown.

2) The reconfig command and reconfig via heartbeat should take care of updating the term and resetting _pendingTermUpdateDuringStepDown (as we do in unconditional stepdown) if they cause a stepdown while _pendingTermUpdateDuringStepDown is set.

This is a 4.0 bug and the fix must be backported.

EDIT: Problem #2 will be addressed by SERVER-41234

 CC judah.schvimer
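Point #1 above amounts to a re-check-under-lock pattern. A minimal sketch (hypothetical names, using a plain `std::mutex` in place of the replication coordinator's mutex):

```cpp
#include <cassert>
#include <mutex>

enum class MemberState { Primary, Secondary, Removed };

// Hypothetical, simplified model of the re-check described above.
struct ReplCoordModel {
    std::mutex mtx;
    MemberState state = MemberState::Primary;

    // Returns true if this call performed the stepdown, false if another
    // code path (e.g. a concurrent reconfig) already stepped the node down.
    bool stepDownIfPrimary() {
        std::lock_guard<std::mutex> lk(mtx);
        if (state != MemberState::Primary) {
            // A concurrent reconfig already stepped us down; return early
            // instead of tripping an invariant on the leader mode.
            return false;
        }
        state = MemberState::Secondary;
        return true;
    }
};
```

The key point is that the state check happens after the mutex is acquired, so a stepdown that raced in via the reconfig path is observed rather than asserted away.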

Comment by Ravind Kumar (Inactive) [ 16/Apr/19 ]

Note for docs - we should also update the following text for replSetReconfig

replSetReconfig obtains a special mutually exclusive lock to prevent more than one replSetReconfig operation from occurring at the same time.

To be more specific. Via slack:

Reads and writes to the admin database won't be blocked by the replication coordinator mutex lock. But if we take the RSTL lock (the node steps down or is removed because of reconfig), then reads will be blocked during that period and writes will be killed (after this ticket, SERVER-37574).

1) With and without the SERVER-37574 fix, during the period where the RSTL lock is held by reconfig, new incoming reads and writes are blocked. After that period, writes are able to acquire the lock and then fail with a NotMaster error code.

2) Without SERVER-37574, in-progress read and write operations are not killed and are allowed to complete while the reconfig is trying to acquire the RSTL lock. After SERVER-37574, in-progress write operations are killed, while reads are allowed to proceed.

Both of these cases result in retryable errors and should be transparent to applications. Some of this information is not strictly related to SERVER-37574, but we're touching at least a bit on locking behavior and on which scenarios trigger the mutex vs. the RSTL lock.

Comment by Judah Schvimer [ 19/Mar/19 ]

While I agree that we will not roll back w:majority writes that get acknowledged (since we close the connections), we could still write ordered operations out of order as described in SERVER-27534. I suspect that is almost as bad for some client applications.

Comment by Suganthi Mani [ 18/Mar/19 ]

judah.schvimer, in 4.0, we don't kill user operations on reconfig via command/heartbeat. But we still close connections on stepdown, so the current request's result won't reach the client.

I feel it's good to backport SERVER-27534 (kill user operations) to 4.0 just to make reconfig faster, but I don't see any correctness or deadlock issue. In my view, the backport work is not that urgent.

Comment by Tess Avitabile (Inactive) [ 18/Mar/19 ]

Got it. In that case, we should backport this work.

Comment by Judah Schvimer [ 18/Mar/19 ]

I do think we need to backport it due to SERVER-27534. If we do not kill the operations and the node steps down and then back up, when we check that we can accept writes it will return true, but we could have rolled back in the meantime.

Alternatively, we could check in the OpObserver's logOp that the term hasn't changed since the operation started, and backport that instead, which might be easier.
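The alternative Judah describes can be sketched as follows (hypothetical names; this is not the actual OpObserver code): record the term when the operation starts, and fail the write in the logOp path if the term has since changed, which implies a stepdown happened in between.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the node's current replication term.
std::atomic<long long> g_currentTerm{1};

// Each operation snapshots the term at the time it starts.
struct Operation {
    long long termAtStart = g_currentTerm.load();
};

// Sketch of the logOp-time check: if the term changed while the
// operation was running, refuse to log the write instead of
// silently recording it under the new term.
void logOp(const Operation& op) {
    if (g_currentTerm.load() != op.termAtStart) {
        throw std::runtime_error("InterruptedDueToReplStateChange");
    }
    // ... append the oplog entry ...
}
```

This trades killing operations eagerly at stepdown for a lazy check at write-logging time, which is why it might be a smaller backport.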

Comment by Tess Avitabile (Inactive) [ 18/Mar/19 ]

suganthi.mani, judah.schvimer, do you think we need to backport this change? I'm not certain we need to. On older versions, we do not need to kill operations on stepdown in order to avoid deadlock; it just may take us longer to acquire the global lock. Additionally, it's okay if we don't kill write operations, since they will check if the node can accept writes after their next lock acquisition. I would prefer not to backport this if we don't need to, since the code has changed so much in this area.

Comment by Suganthi Mani [ 14/Mar/19 ]

Note: when this ticket is closed, we need to file a DOCS ticket to update the https://docs.mongodb.com/manual/reference/method/rs.reconfig/#availability section:

The rs.reconfig() shell method can trigger the current primary to step down in some situations. When the primary steps down, it forcibly closes all client connections. Primary step-down triggers an election to select a new primary.

After PM-639, we no longer close all client connections on primary stepdown.

Generated at Thu Feb 08 04:46:23 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.