[SERVER-62479] [Retryability] Investigate the lifetime of TransactionParticipants stored in RetryableTransactionParticipantCatalog Created: 10/Jan/22  Updated: 29/Oct/23  Resolved: 13/Apr/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.0.0-rc1, 6.1.0-rc0

Type: Task Priority: Major - P3
Reporter: Cheahuychou Mao Assignee: Cheahuychou Mao
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Problem/Incident
Related
related to SERVER-65496 Test that the SessionCatalog does not... Closed
related to SERVER-65505 Make the gdb pretty printers for Sess... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v6.0, v5.3
Sprint: Sharding 2022-02-07, Sharding 2022-02-21, Sharding 2022-03-07, Sharding NYC 2022-03-21, Sharding NYC 2022-04-04, Sharding NYC 2022-04-18
Participants:
Linked BF Score: 65
Story Points: 3

 Description   

SERVER-62020 introduced execution history checks across an external (parent) session and its internal (child) sessions. It introduced the notion of a RetryableTransactionParticipantCatalog, which exists as a decoration on the Session object for the external session. The catalog stores the TransactionParticipant::Participant objects for any active retryable write on the session, and allows for cross-session write history lookup and state validation. This ticket is to investigate whether those TransactionParticipant::Participant objects can become invalid, given that the lifetime of each TransactionParticipant object is tied to that of its owning Session. Preliminary investigation suggests that this can occur after the external session/transactions expire. The reasoning is below.
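
To make the lifetime concern concrete, here is a minimal, hypothetical model of the ownership described above: each session owns its participant, while the catalog only holds non-owning references to the participants of the parent and child sessions. ModelSession, ModelParticipant and ModelRetryableCatalog are illustrative names, not server classes. The sketch uses std::weak_ptr so that an entry whose owning session has already been destroyed can be detected; that choice is made for illustration and is not a claim about how the real catalog stores its entries.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a TransactionParticipant (not the server's class).
struct ModelParticipant {
    std::string sessionName;
};

// Hypothetical stand-in for a Session: it is the only owner of its
// participant, so the participant lives exactly as long as the session.
struct ModelSession {
    explicit ModelSession(std::string name)
        : participant(std::make_shared<ModelParticipant>(ModelParticipant{std::move(name)})) {}
    std::shared_ptr<ModelParticipant> participant;
};

// Hypothetical stand-in for the RetryableTransactionParticipantCatalog kept on
// the parent session: it references, but does not own, the participants.
class ModelRetryableCatalog {
public:
    void addParticipant(const ModelSession& session) {
        _participants.emplace_back(session.participant);
    }

    // Counts entries whose owning session has already been destroyed.
    std::size_t countInvalid() const {
        std::size_t invalid = 0;
        for (const auto& weak : _participants) {
            if (weak.expired()) {
                ++invalid;
            }
        }
        return invalid;
    }

private:
    std::vector<std::weak_ptr<ModelParticipant>> _participants;
};
```

Destroying one child session while the catalog is still in use leaves exactly one invalid entry, which is the situation this ticket sets out to rule in or out.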

When we check out a child Session, we update both the lastCheckOut time of that child Session and the lastCheckOut time of its parent Session. Therefore, a parent Session can only expire after all of its child Sessions have expired, that is, TransactionRecordMinimumLifetimeMinutes (default 30) after the parent or any of its children was last checked out. Its config.system.sessions document is guaranteed to exist as long as no more than localLogicalSessionTimeoutMinutes (default 30) have passed since the last checkout time.

Additionally, SERVER-59506 made it so that child Sessions are only reaped when the parent Session is reaped, which can only happen after the config.system.sessions entry for the parent session no longer exists (i.e. it has been deleted by the TTL monitor). These Sessions are removed atomically while holding the SessionCatalog mutex. Despite this, the lifetime of child Sessions is not completely tied to that of the parent Session because:

  1. _shouldBeReaped() returns false if the Session is still checked out by some opCtx.
  2. The destruction of the SessionRuntimeInfo objects (i.e. the Sessions) occurs asynchronously after scanSessions() returns.

Therefore, if a session is checked out by some opCtx when reaping occurs, the TransactionParticipant for that session will outlive the remaining TransactionParticipants in the catalog. This ticket is to verify this hypothesis with a test and, if needed, find a solution.
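
The two points above can be illustrated with a small, hypothetical model: each entry is reaped only if it is not checked out (the analogue of _shouldBeReaped()), and the removed entries are handed back to the caller so that they are destroyed only after the scan returns (the analogue of the deferred destruction after scanSessions()). ModelSessionEntry, shouldBeReaped and reapNonAtomically are illustrative names, not server functions, and the model deliberately ignores expiry so that it isolates the per-entry reaping decision.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a session catalog entry; not the server's types.
struct ModelSessionEntry {
    bool checkedOut = false;
};

using ModelEntries = std::map<std::string, std::shared_ptr<ModelSessionEntry>>;

// Analogue of point 1: an entry is skipped while some opCtx has it checked out.
bool shouldBeReaped(const ModelSessionEntry& entry) { return !entry.checkedOut; }

// Decides entry by entry, so sessions of the same family can be split. The
// removed entries are returned to the caller, so (analogue of point 2) they
// are destroyed only when the caller drops the returned vector.
std::vector<std::shared_ptr<ModelSessionEntry>> reapNonAtomically(ModelEntries& entries) {
    std::vector<std::shared_ptr<ModelSessionEntry>> removed;
    for (auto it = entries.begin(); it != entries.end();) {
        if (shouldBeReaped(*it->second)) {
            removed.push_back(std::move(it->second));
            it = entries.erase(it);
        } else {
            ++it;
        }
    }
    return removed;
}
```

With a parent and two children where one child is checked out, the scan removes the parent and the other child but leaves the checked-out child in place, and the removed entries stay alive until the caller releases them. That split is the situation the hypothesis describes.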



 Comments   
Comment by Githook User [ 18/Apr/22 ]

Author: Cheahuychou Mao (mao.cheahuychou@gmail.com, username: cheahuychou)

Message: SERVER-62479 Reap sessions for the same retryable write atomically

(cherry picked from commit 87393ce9bcfe06f8aa93b856474fb77bfb3a5267)
Branch: v6.0
https://github.com/mongodb/mongo/commit/845602af9c2f05f33fe03ec4f0b49f843bb81740

Comment by Githook User [ 13/Apr/22 ]

Author: Cheahuychou Mao (mao.cheahuychou@gmail.com, username: cheahuychou)

Message: SERVER-62479 Reap sessions for the same retryable write atomically
Branch: master
https://github.com/mongodb/mongo/commit/87393ce9bcfe06f8aa93b856474fb77bfb3a5267
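
The commit message names the fix: reap the sessions for the same retryable write atomically. The diff is not reproduced in this ticket, so the following is only a hypothetical sketch of that all-or-nothing idea, not the actual change. ModelWriteFamily and reapFamiliesAtomically are illustrative names, and the string key stands in for whatever identifies a retryable write together with its sessions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: the sessions (parent and children) that belong to the
// same retryable write, described only by their checked-out state.
struct ModelWriteFamily {
    bool parentCheckedOut = false;
    std::vector<bool> childCheckedOut;

    bool anyCheckedOut() const {
        return parentCheckedOut ||
               std::any_of(childCheckedOut.begin(), childCheckedOut.end(),
                           [](bool checkedOut) { return checkedOut; });
    }
};

// All-or-nothing reaping: a family is removed only when none of its sessions
// is checked out, so no participant in a family can outlive the others.
// Returns the number of families removed.
std::size_t reapFamiliesAtomically(std::map<std::string, ModelWriteFamily>& families) {
    std::size_t reaped = 0;
    for (auto it = families.begin(); it != families.end();) {
        if (it->second.anyCheckedOut()) {
            ++it;
        } else {
            it = families.erase(it);
            ++reaped;
        }
    }
    return reaped;
}
```

Deciding per family rather than per session trades some delay in reclaiming idle sessions for the guarantee that a checked-out session keeps its whole family, and therefore the catalog entries it depends on, alive.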

Generated at Thu Feb 08 05:55:15 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.