Core Server / SERVER-62479

[Retryability] Investigate the lifetime of TransactionParticipants stored in RetryableTransactionParticipantCatalog

    • Type: Task
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 6.0.0-rc1, 6.1.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Fully Compatible
    • v6.0, v5.3
    • Sprint: Sharding 2022-02-07, Sharding 2022-02-21, Sharding 2022-03-07, Sharding NYC 2022-03-21, Sharding NYC 2022-04-04, Sharding NYC 2022-04-18
    • 65
    • 3

      SERVER-62020 introduced an execution history check across an external/parent session and its internal/child sessions. It introduced the notion of a RetryableTransactionParticipantCatalog, which exists as a decoration on the Session object for the external session. The catalog stores references to the TransactionParticipant::Participants for any active retryable write on the session, and allows for cross-session write history lookup and state validation. This ticket is to investigate whether those TransactionParticipant::Participants can become invalid, given that the lifetime of each TransactionParticipant object is tied to that of its owning Session. Preliminary investigation suggests that this can occur after the external session/transactions expire. The reasoning is below.
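
      The lifetime concern can be sketched in C++ with hypothetical stand-in types (ToySession, ToyParticipant, ToyParticipantCatalog) that are not the server's classes. The catalog observes participants it does not own, so an entry can outlive the session that owns the participant. std::weak_ptr is used here only to make a stale entry observable; that choice is an assumption for illustration, not a statement about how the real catalog stores its entries.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for illustration; not the server's real classes.
struct ToyParticipant {
    int txnNumber;
};

struct ToySession {
    std::string id;
    std::shared_ptr<ToyParticipant> participant;  // owned by the session
};

// Catalog kept on the parent session. It observes participants without owning them.
class ToyParticipantCatalog {
public:
    void add(const ToySession& child) {
        assert(child.participant && "a session must own a participant");
        _participants[child.id] = child.participant;  // stored as a weak_ptr
    }

    std::size_t size() const {
        return _participants.size();
    }

    // Counts entries whose owning session has already been destroyed.
    std::size_t countExpired() const {
        std::size_t expired = 0;
        for (const auto& entry : _participants) {
            if (entry.second.expired()) {
                ++expired;
            }
        }
        return expired;
    }

private:
    std::map<std::string, std::weak_ptr<ToyParticipant>> _participants;
};

// Registers two child sessions, destroys one, and returns the number of
// catalog entries that no longer refer to a live participant.
std::size_t demoStaleEntries() {
    ToyParticipantCatalog catalog;
    auto childA = std::make_unique<ToySession>(
        ToySession{"child-a", std::make_shared<ToyParticipant>(ToyParticipant{5})});
    auto childB = std::make_unique<ToySession>(
        ToySession{"child-b", std::make_shared<ToyParticipant>(ToyParticipant{5})});
    catalog.add(*childA);
    catalog.add(*childB);

    childA.reset();  // child-a's session is destroyed, and its participant with it
    return catalog.countExpired();
}
```

      In this sketch the catalog still has two entries after child-a is destroyed, and one of them is stale. With raw pointers or references instead of weak_ptr, that stale entry would be a dangling reference.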

      When we check out a child Session, we update both the lastCheckOut time of that child Session and the lastCheckOut time of its parent Session. Therefore, a parent Session can only become expired after all of its child Sessions have expired, which is TransactionRecordMinimumLifetimeMinutes (defaults to 30) after it or any of its children was last checked out. Its config.system.sessions doc is guaranteed to exist as long as it hasn't been more than localLogicalSessionTimeoutMinutes (defaults to 30) since the last checkout time. Additionally, SERVER-59506 made it such that child Sessions are only reaped when the parent Session is reaped, which can only occur after the config.system.sessions entry for the parent Session no longer exists (i.e. it has been deleted by the TTL monitor). The removal of those Sessions is done atomically while holding the SessionCatalog mutex. Despite this, the lifetime of child Sessions is not completely tied to that of the parent Session because:

      1. _shouldBeReaped() returns false if the Session is still checked out by some opCtx.
      2. The destruction of the SessionRuntimeInfos (i.e. Sessions) occurs asynchronously after scanSessions() returns.

      Therefore, if a session is checked out by some opCtx when reaping occurs, the TransactionParticipant for that session will outlive the other TransactionParticipants in the catalog. This ticket is to verify this hypothesis through a test and to find a solution for the issue if required.
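
      The hypothesis can be illustrated with a toy model in C++. Everything here is an assumption made for the sketch: ToyReapSession, ToySessionCatalog, and the reap() method are hypothetical names, the expiry rule is simplified to a single 30-minute lifetime, and the config.system.sessions check is left out. The model keeps three behaviours from the description: checking out a child refreshes the parent's last checkout time, a checked-out session is skipped by the reaper, and reaped sessions are destroyed only after the scan returns.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Minutes = std::chrono::minutes;

// Hypothetical stand-in for a session; not the server's real class.
struct ToyReapSession {
    std::string id;
    std::string parentId;  // empty for a parent/external session
    Minutes lastCheckout{0};
    bool checkedOut = false;
};

class ToySessionCatalog {
public:
    void add(const std::string& id, const std::string& parentId = "") {
        auto session = std::make_unique<ToyReapSession>();
        session->id = id;
        session->parentId = parentId;
        _sessions[id] = std::move(session);
    }

    // Checking out a session also refreshes its parent's last checkout time.
    void checkOut(const std::string& id, Minutes now) {
        ToyReapSession& session = *_sessions.at(id);
        session.checkedOut = true;
        session.lastCheckout = now;
        auto parent = _sessions.find(session.parentId);
        if (parent != _sessions.end()) {
            parent->second->lastCheckout = now;
        }
    }

    void checkIn(const std::string& id) {
        _sessions.at(id)->checkedOut = false;
    }

    bool contains(const std::string& id) const {
        return _sessions.count(id) > 0;
    }

    // Detaches expired parents and their children, skipping checked-out sessions.
    // Ownership moves to the caller, so destruction happens after the scan.
    std::vector<std::unique_ptr<ToyReapSession>> reap(Minutes now,
                                                      Minutes lifetime = Minutes{30}) {
        // Pass 1: find parent sessions idle for longer than `lifetime`.
        std::set<std::string> expiredParents;
        for (const auto& entry : _sessions) {
            const ToyReapSession& s = *entry.second;
            if (s.parentId.empty() && now - s.lastCheckout > lifetime) {
                expiredParents.insert(s.id);
            }
        }
        // Pass 2: detach every session that belongs to an expired parent.
        std::vector<std::unique_ptr<ToyReapSession>> reaped;
        for (auto it = _sessions.begin(); it != _sessions.end();) {
            const ToyReapSession& s = *it->second;
            const std::string& root = s.parentId.empty() ? s.id : s.parentId;
            if (expiredParents.count(root) == 0 || s.checkedOut) {
                ++it;  // not expired, or still checked out by an operation
                continue;
            }
            reaped.push_back(std::move(it->second));
            it = _sessions.erase(it);
        }
        return reaped;
    }

private:
    std::map<std::string, std::unique_ptr<ToyReapSession>> _sessions;
};

// Returns true if a checked-out child is the only session left after reaping.
bool demoCheckedOutChildSurvivesReap() {
    ToySessionCatalog catalog;
    catalog.add("parent");
    catalog.add("child-1", "parent");
    catalog.add("child-2", "parent");

    catalog.checkOut("child-1", Minutes{0});
    catalog.checkIn("child-1");
    catalog.checkOut("child-2", Minutes{5});  // held by a long-running operation

    auto reaped = catalog.reap(Minutes{60});
    assert(reaped.size() == 2);
    reaped.clear();  // deferred destruction of "parent" and "child-1"

    return catalog.contains("child-2") && !catalog.contains("parent") &&
        !catalog.contains("child-1");
}
```

      In this model the parent and child-1 are destroyed while child-2 remains in the catalog, so anything that still refers from child-2 to its siblings' state, or the reverse, would now refer to destroyed objects.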

            Assignee:
            cheahuychou.mao@mongodb.com Cheahuychou Mao
            Reporter:
            cheahuychou.mao@mongodb.com Cheahuychou Mao
            Votes:
            0
            Watchers:
            4
