[SERVER-59108] Resolve race with transaction operation not killed after step down
Created: 04/Aug/21  Updated: 29/Oct/23  Resolved: 11/Oct/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.4.11, 4.2.18, 5.0.4, 5.1.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Xuerui Fa
Assignee: Vesselina Ratcheva (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Problem/Incident
Related
is related to SERVER-50486 invokeWithSessionCheckedOut being cal... Closed
is related to SERVER-66351 Audit uses of OperationContext::setAl... Open
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.0, v4.4, v4.2
Sprint: Repl 2021-09-06, Repl 2021-09-20, Repl 2021-10-04, Repl 2021-10-18
Participants:
Linked BF Score: 70

 Description   

In SERVER-50486, we added a flag on the opCtx of transaction operations to ensure that these operations are interrupted on step down. After setting the flag, we check that we are still the primary: the commandCanRunHere function returns true if we can accept non-local writes.

In the stepDown code path, we first attempt to acquire the RSTL, during which we run the killOps thread to kill the opCtx of any command that has the flag set. Only after acquiring the RSTL do we update whether we can accept non-local writes. As a result, the following interleaving seems possible:

  1. In the user thread t1, we add a user command to the _clients vector in ServiceContext. However, we haven't yet hit ExecCommandDatabase::_initiateCommand() and set the flag.
  2. In the stepDown thread t2, we attempt to acquire the RSTL and loop through all commands. Since the flag is not yet set for the command in t1, it is not killed.
  3. In t1, we now set the flag and check whether we can still service non-local writes. Since we still can, the command proceeds.
  4. In t2, we acquire the RSTL and record that we can no longer service non-local writes. The command in t1 keeps running after the step down because it was never killed.
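
The interleaving above can be replayed deterministically in a small model. This is only an illustrative sketch, not the server code: Op, Node, kill_flagged_ops, and run_interleaving are hypothetical names, the two threads are replaced by sequential calls in the order of steps 1 to 4, and the RSTL itself is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical stand-in for the opCtx of a transaction operation."""
    name: str
    interrupt_on_stepdown: bool = False  # models the flag added in SERVER-50486
    killed: bool = False


@dataclass
class Node:
    """Hypothetical stand-in for the state the two threads share."""
    accepts_writes: bool = True
    clients: list = field(default_factory=list)

    def kill_flagged_ops(self):
        # t2: kill only the ops that already carry the flag.
        for op in self.clients:
            if op.interrupt_on_stepdown:
                op.killed = True

    def command_can_run_here(self):
        # Models the "can we accept non-local writes" check.
        return self.accepts_writes


def run_interleaving():
    node = Node()
    op = Op("txn-op")

    # Step 1 (t1): the command is registered, but the flag is not set yet.
    node.clients.append(op)

    # Step 2 (t2): stepDown scans the registered ops and skips the unflagged one.
    node.kill_flagged_ops()

    # Step 3 (t1): the flag is set and the primary check still passes.
    op.interrupt_on_stepdown = True
    proceeds = node.command_can_run_here() and not op.killed

    # Step 4 (t2): only now does the node stop accepting writes.
    node.accepts_writes = False

    return node, op, proceeds
```

In this model the operation passes its check, is never killed, and the node ends up no longer accepting writes, which is the state the ticket title describes.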


 Comments   
Comment by Githook User [ 21/Oct/21 ]

Author: Vesselina Ratcheva <vesselina.ratcheva@10gen.com> (vessy-mongodb)

Message: SERVER-59108 Resolve race with transaction operation not killed after stepdown

(cherry picked from commit 1b31e6ca3d25a35f31f48547aafe0ec33c8c9bfd)
Branch: v4.2
https://github.com/mongodb/mongo/commit/5bb885c0fd06484ca4b3a596b1959d64e3080e8b

Comment by Githook User [ 21/Oct/21 ]

Author: Vesselina Ratcheva <vesselina.ratcheva@10gen.com> (vessy-mongodb)

Message: SERVER-59108 Resolve race with transaction operation not killed after stepdown

(cherry picked from commit 1b31e6ca3d25a35f31f48547aafe0ec33c8c9bfd)
Branch: v4.4
https://github.com/mongodb/mongo/commit/bba630c895a90a59b78d12105424575b2f847b91

Comment by Githook User [ 21/Oct/21 ]

Author: Vesselina Ratcheva <vesselina.ratcheva@10gen.com> (vessy-mongodb)

Message: SERVER-59108 Resolve race with transaction operation not killed after stepdown

(cherry picked from commit 1b31e6ca3d25a35f31f48547aafe0ec33c8c9bfd)
Branch: v5.0
https://github.com/mongodb/mongo/commit/af958aeb4c242065d33fe0831eac0368fa3e3f21

Comment by Githook User [ 11/Oct/21 ]

Author: Vesselina Ratcheva <vesselina.ratcheva@10gen.com> (vessy-mongodb)

Message: SERVER-59108 Resolve race with transaction operation not killed after stepdown
Branch: master
https://github.com/mongodb/mongo/commit/1b31e6ca3d25a35f31f48547aafe0ec33c8c9bfd
