[SERVER-77667] Prevent mongos from starting new transactions at shutdown Created: 31/May/23  Updated: 16/Jan/24  Resolved: 10/Jan/24

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 7.2.1, 7.3.0-rc0

Type: Bug Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Wenqin Ye
Resolution: Fixed Votes: 0
Labels: sharding-nyc-subteam3
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Assigned Teams:
Cluster Scalability
Backwards Compatibility: Fully Compatible
Backport Requested:
v7.2
Sprint: Cluster Scalability 2024-1-8, Cluster Scalability 2024-1-22
Participants:
Linked BF Score: 135
Story Points: 3

 Description   

At shutdown, the mongos process makes a best-effort attempt to abort any transactions it may have started. This frees up transaction resources more quickly, because the client/driver must in any case retry every in-progress transaction whose commit coordination hasn't already been handed off. The retry is required because the transaction protocol does not support committing a multi-statement transaction through a different mongos from the one which originally ran the read/write operations. A different mongos can only be used to recover the original commit-or-abort decision for the transaction.

The implicitlyAbortAllTransactions() function performs this best-effort attempt to abort any transactions the mongos process may have started. However, it doesn't prevent TransactionRouter from being used by a not-yet-interrupted OperationContext and starting a new transaction on a shard. Ordinarily this wouldn't be an issue because mongos shutting down is rare and the transaction would eventually be aborted on the shard after transactionLifetimeLimitSeconds (= 60 seconds by default). In testing, however, the transactionLifetimeLimitSeconds server parameter is set to 24 hours to catch cases where a transaction is unintentionally "leaked" by the system. While the system has liveness through the PeriodicThreadToAbortExpiredTransactions job, a stall of this kind would still be undesirable in production.

One place in testing where we've seen new transactions being started while the mongos process is shutting down is the ClusterServerParameterRefresher thread, which reads from the config server primary in a multi-statement transaction. The MODE_IX lock that transaction holds on the config server primary prevents the testing infrastructure from running its data consistency checks before shutting down the config server replica set.

One idea to improve implicitlyAbortAllTransactions() is to set a flag on the SessionCatalog indicating that process shutdown has begun. TransactionRouter instances obtained from the SessionCatalog can check whether this flag has been set and throw an InterruptedAtShutdown or equivalent error to prevent the mongos process from starting any new transactions.
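
The flag idea described above can be illustrated with a minimal, self-contained C++ sketch. It is not the server's code: SessionCatalogSketch, TransactionRouterSketch, InterruptedAtShutdownSketch, markShutdownStarted(), beginTransaction(), and the helper functions are all hypothetical names chosen for illustration, and the real SessionCatalog and TransactionRouter interfaces are not reproduced here.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the InterruptedAtShutdown error named in the ticket.
struct InterruptedAtShutdownSketch : std::runtime_error {
    InterruptedAtShutdownSketch()
        : std::runtime_error("shutdown has begun; refusing to start a new transaction") {}
};

// Hypothetical, heavily simplified catalog that only tracks the shutdown flag.
class SessionCatalogSketch {
public:
    // Assumed to be called once when process shutdown begins.
    void markShutdownStarted() { _shutdownStarted.store(true); }
    bool shutdownStarted() const { return _shutdownStarted.load(); }

private:
    std::atomic<bool> _shutdownStarted{false};
};

// Hypothetical router that consults the catalog before starting a transaction.
class TransactionRouterSketch {
public:
    explicit TransactionRouterSketch(const SessionCatalogSketch& catalog) : _catalog(catalog) {}

    // Throws instead of starting a transaction once the shutdown flag is set.
    void beginTransaction() {
        if (_catalog.shutdownStarted()) {
            throw InterruptedAtShutdownSketch();
        }
        ++_started;
    }

    int transactionsStarted() const { return _started; }

private:
    const SessionCatalogSketch& _catalog;
    int _started = 0;
};

// Returns true if beginTransaction() was rejected with the shutdown error.
inline bool rejectedAtShutdown(TransactionRouterSketch& router) {
    try {
        router.beginTransaction();
        return false;
    } catch (const InterruptedAtShutdownSketch&) {
        return true;
    }
}

// Usage example: a transaction is allowed before the flag is set and refused afterwards.
inline void runSketch() {
    SessionCatalogSketch catalog;
    TransactionRouterSketch router(catalog);
    assert(!rejectedAtShutdown(router));
    catalog.markShutdownStarted();
    assert(rejectedAtShutdown(router));
}
```

The sketch only shows the check itself. An operation could still pass the check just before the flag is set, so an actual change would presumably need to set the flag before the best-effort abort pass runs, so that any transaction started in that window is still covered by it.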



 Comments   
Comment by Githook User [ 12/Jan/24 ]

Author:

{'name': 'Wenqin Ye', 'email': 'wenqin908@gmail.com', 'username': 'wenqinYe'}

Message: SERVER-77667: Prevent mongos from starting new transactions at shutdown
Branch: v7.2
https://github.com/mongodb/mongo/commit/4eff4f8ec43badf02328a8a598e9ea4747fac226

Comment by Githook User [ 09/Jan/24 ]

Author:

{'name': 'Wenqin', 'email': 'wenqinYe@users.noreply.github.com', 'username': 'wenqinYe'}

Message: SERVER-77667: Prevent mongos from starting new transactions at shutdown (#17886)

GitOrigin-RevId: 9d3e5e69b0623a89e357e2ed0967b9011ca8f74c
Branch: master
https://github.com/mongodb/mongo/commit/fdd105b625ffb52cfd1747776c9b62733cd62eeb

Comment by Max Hirschhorn [ 31/May/23 ]

Relatedly, we should consider whether mongod shutdown would benefit from logic similar to implicitlyAbortAllTransactions() from mongos_main.cpp. I can imagine we would move this logic from mongos_main.cpp into something common between mongos and a router-mongod process as part of PM-635. Yet it may also be that the introduction of internal transactions, and mongod already being able to act as a router, means there'd be benefit in earlier binVersions as well. CC antonio.fuschetto@mongodb.com

Generated at Thu Feb 08 06:36:14 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.