[SERVER-77633] Calling withTransaction with a checked out session may end up in a deadlock (on stepdown) Created: 31/May/23  Updated: 22/Jun/23  Resolved: 22/Jun/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Silvia Surroca Assignee: Silvia Surroca
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-77634 withTransaction must yield the sessio... Closed
Related
related to SERVER-71011 ShardingCatalogManager::withTransacti... Backlog
is related to SERVER-78318 ConfigsvrCollMod command should not s... Closed
Assigned Teams:
Sharding EMEA
Sprint: Sharding EMEA 2023-06-12, Sharding EMEA 2023-06-26
Participants:
Linked BF Score: 135

 Description   

Any code running `withTransaction` may deadlock if the given OperationContext holds a SessionId and a stepdown occurs while the transaction is in progress. Right now no thread holds a session when `withTransaction` is called; however, this should be fixed to avoid hitting the error in the future.

The sequence of events leading to a deadlock is the following:

  • (A): withTransaction thread
  • (B): step-down thread

1. (A) checks out a SessionId
2. (A) runs `withTransaction`
3. (B) step-down thread starts and an interruption is sent to all threads
4. (A) executes abortTransaction
5. (B) step-down thread acquires the RSTL lock
6. (B) tries to check out all sessions to kill them
7. (B) gets blocked trying to check out the session held by thread A
8. (A) gets blocked trying to acquire the RSTL lock to abort the transaction
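
The final state of this sequence can be checked with a toy wait-for model. This is only an illustrative sketch, not server code: `find_deadlock`, `holds`, and `waits` are hypothetical names, "A" stands for the withTransaction thread, and "B" stands for the step-down thread.

```python
def find_deadlock(holds, waits):
    """Return a list of threads that wait on each other in a cycle, or [].

    holds: resource name -> thread currently holding it
    waits: thread -> resource name it is blocked on
    Assumption of this sketch: each thread waits on at most one resource.
    """
    for start in waits:
        path = [start]
        current = start
        while current in waits:
            owner = holds.get(waits[current])
            if owner == start:
                return path          # followed the chain back to where we began
            if owner is None or owner in path:
                break                # resource is free, or a cycle not through start
            path.append(owner)
            current = owner
    return []


# State after the last step: A holds the session, B holds the RSTL,
# B is blocked on the session and A is blocked on the RSTL.
deadlock = find_deadlock(
    holds={"session": "A", "RSTL": "B"},
    waits={"B": "session", "A": "RSTL"},
)

# Same state, except A no longer holds the session when it blocks on the RSTL.
no_deadlock = find_deadlock(
    holds={"RSTL": "B"},
    waits={"A": "RSTL"},
)
```

In this model `deadlock` contains both threads, while `no_deadlock` is empty: once A stops holding the session, B can check it out and make progress, so A is merely waiting rather than deadlocked.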

withTransaction is a utility method implemented for the ShardingCatalogManager before the new transaction API existed.

The new transaction API yields the session attached to the thread to avoid this scenario, so I suggest getting rid of the withTransaction code and using the new transaction API instead. This is an example of an implementation using the new transaction API.

This issue was discovered when a sessionId was attached to the ConfigsvrCollMod request. The sessionId was ultimately removed from the request as a quick fix for the bug.



 Comments   
Comment by Silvia Surroca [ 22/Jun/23 ]

We've decided not to address this ticket since no operation currently uses the withTransaction utility while holding a session.

On the one hand, any caller of withTransaction should yield the resources it holds before calling it.

On the other hand, we are migrating the code toward the new transaction API, where this problem no longer exists.

Generated at Thu Feb 08 06:36:09 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.