[SERVER-70131] setFeatureCompatibilityVersion() causes local transactions to be aborted Created: 30/Sep/22 Updated: 11/Jan/23 Resolved: 11/Jan/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Paolo Polato | Assignee: | Adrian Gonzalez Montemayor |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | sharding-nyc-subteam1 | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Assigned Teams: | Sharding NYC |
||||
| Sprint: | Sharding NYC 2022-12-26, Sharding NYC 2023-01-09, Sharding NYC 2023-01-23 | ||||
| Participants: | |||||
| Linked BF Score: | 150 | ||||
| Story Points: | 3 | ||||
| Description |
|
Executing the setFCV command interrupts sessions that involve nodes with different binary versions and aborts any in-progress transactions associated with them. It has been observed that transactions running on a local client (such as the one introduced with The objective of this ticket is
|
| Comments |
| Comment by Adrian Gonzalez Montemayor [ 11/Jan/23 ] |
|
If setFCV did not abort transactions, it would have to wait for the global S lock while uncommitted transactions, which hold the IX lock, are still in progress. That wait would stall all other operations in the system, because they would be enqueued behind the FCV change. For this reason we decided not to complete this ticket. |
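The stall described in this comment can be illustrated with a toy model. The sketch below is not MongoDB's lock manager: `ToyGlobalLock`, its strict first-in-first-out queue, and the owner names are all hypothetical, and only the S/IX compatibility rule is taken from standard multi-granularity locking. Under those assumptions, it shows a pending S request blocking a later IX request that would otherwise be compatible with the open transaction.

```python
from collections import deque

# Compatibility between the two modes mentioned in the comment,
# keyed by (held mode, requested mode).
COMPATIBLE = {
    ("IX", "IX"): True,
    ("IX", "S"): False,
    ("S", "IX"): False,
    ("S", "S"): True,
}


class ToyGlobalLock:
    """Hypothetical FIFO lock; an assumption, not MongoDB's actual LockManager."""

    def __init__(self):
        self.holders = []       # (owner, mode) pairs currently granted
        self.waiters = deque()  # (owner, mode) pairs queued in arrival order

    def request(self, owner, mode):
        """Grant immediately if possible, otherwise enqueue. Returns True if granted."""
        fits = all(COMPATIBLE[(held, mode)] for _, held in self.holders)
        if fits and not self.waiters:
            self.holders.append((owner, mode))
            return True
        # Either the mode conflicts with a holder or someone is already waiting.
        self.waiters.append((owner, mode))
        return False

    def release(self, owner):
        """Drop the owner's lock, then grant waiters from the front of the queue."""
        self.holders = [(o, m) for o, m in self.holders if o != owner]
        while self.waiters:
            _, mode = self.waiters[0]
            if not all(COMPATIBLE[(held, mode)] for _, held in self.holders):
                break
            self.holders.append(self.waiters.popleft())


lock = ToyGlobalLock()
txn_granted = lock.request("open_txn", "IX")     # uncommitted transaction holds IX
fcv_granted = lock.request("setFCV", "S")        # conflicts with IX, so it waits
write_granted = lock.request("new_write", "IX")  # compatible with open_txn, but queued behind setFCV
```

In this model `new_write` stays queued until the open transaction releases its lock and setFCV has taken and released S, which is the queueing effect that aborting the transaction up front avoids.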
| Comment by Githook User [ 11/Jan/23 ] |
|
Author: {'name': 'Adrian Gonzalez', 'email': 'adriangonzalezmontemayor@gmail.com', 'username': 'adriangzz'} Message: |
| Comment by Randolph Tan [ 08/Nov/22 ] |
|
Had a discussion with max.hirschhorn@mongodb.com, and the question came up of why we kill transactions but not normal operations. It looks like the setFCV abort-transaction logic was added as a safety precaution for certain older versions, and we kept it around. I think we can remove it. |
| Comment by Paolo Polato [ 13/Oct/22 ] |
|
max.hirschhorn@mongodb.com, Thanks for the thorough explanation. Yes, BF-26501 was the failure that led to the creation of this ticket (I have updated the ticket accordingly). |
| Comment by Max Hirschhorn [ 12/Oct/22 ] |
|
paolo.polato@mongodb.com, do you have server logs and a resmoke invocation for reproducing this problem? Is it BF-26501? The code you linked to, where the feature compatibility version change calls ServiceEntryPoint::endAllSessions(), concerns transport::Session, which means there is a network connection associated with the Client object. A withTransaction() block running on behalf of an internal thread wouldn't have a transport::Session associated with it. What may be happening instead is that the local replica set transaction was started prior to the FCV downgrade and was then interrupted by the killSessionsAbortUnpreparedTransactions() logic. I would have expected the kill to result in a transient transaction error, so that the client (including an internal thread using withTransaction()) would retry automatically. However, my mental model may be inaccurate, because ObservableSession::kill() calls through to ServiceContext::killOperation(), which calls through to OperationContext::markKilled(ErrorCodes::Interrupted). I wonder if the reason we haven't seen this be a problem for external clients is that the network connection is closed before their OperationContext is interrupted? |
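The retry behavior this comment reasons about can be sketched as a toy model. Everything below is hypothetical: `ToyTransactionError`, `with_transaction`, and the `AbortedByFCVChange` code name are illustrative stand-ins, not the server's withTransaction() API. The sketch assumes the retry loop retries only errors carrying a `TransientTransactionError` label, and that a kill surfaced as a bare `Interrupted` carries no such label; the latter is the hypothesis raised in the comment, not a confirmed diagnosis.

```python
class ToyTransactionError(Exception):
    """Hypothetical error type carrying a code name and a set of error labels."""

    def __init__(self, code_name, labels=()):
        super().__init__(code_name)
        self.code_name = code_name
        self.labels = frozenset(labels)


def with_transaction(body, max_attempts=3):
    """Run body(attempt), retrying only errors labeled TransientTransactionError."""
    for attempt in range(1, max_attempts + 1):
        try:
            return body(attempt)
        except ToyTransactionError as err:
            retryable = "TransientTransactionError" in err.labels
            if not retryable or attempt == max_attempts:
                raise  # unlabeled errors and exhausted retries reach the caller


def killed_with_label(attempt):
    # Assumed case: the first attempt is aborted and the error carries the label.
    if attempt == 1:
        raise ToyTransactionError("AbortedByFCVChange", ["TransientTransactionError"])
    return "committed on attempt %d" % attempt


def killed_without_label(attempt):
    # Assumed case: the operation is marked killed and surfaces a bare Interrupted.
    raise ToyTransactionError("Interrupted")


labeled_result = with_transaction(killed_with_label)

try:
    with_transaction(killed_without_label)
    unlabeled_outcome = "committed"
except ToyTransactionError as err:
    unlabeled_outcome = "failed with " + err.code_name
```

In this model the labeled abort is retried and commits on the second attempt, while the unlabeled interruption propagates to the caller on the first attempt, which would match an internal thread seeing its local transaction fail rather than retry.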