The SessionCatalogMongoD reaper deletes the on-disk state for every expired internal transaction session for retryable writes that it thinks is no longer in use, regardless of whether the corresponding logical session has expired. To determine whether an expired transaction session is still in use, it relies on the set of expired transaction sessions that were not reaped from the SessionCatalog. The limitation is that this set is only populated with expired transaction session ids whose logical session has expired and been removed from the config.system.sessions collection. So if a logical session hasn't expired but its latest internal transaction session for retryable writes has expired (*), the reaper deletes the on-disk state of that transaction session, which causes any operation running on that transaction session to get interrupted. One of the rare cases where (*) can happen is when the transaction has committed or aborted but is stuck waiting for write concern. This showed up in BF-25724 because the affected test sets TransactionRecordMinimumLifetimeMinutes to 0 (the default is 30 minutes).
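The failure mode can be illustrated with a toy model. The Python sketch below is not the server code: the names `TxnSession`, `reap_current`, `removed_logical_ids`, and `not_reaped_from_catalog` are hypothetical, and the decision rule is a simplification that only assumes what is described above, namely that the "still in use" set never contains transaction sessions whose logical session is still alive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxnSession:
    """Hypothetical stand-in for an on-disk transaction session record."""
    txn_session_id: str
    logical_session_id: str
    internal_for_retryable_write: bool
    expired: bool


def reap_current(on_disk, removed_logical_ids, not_reaped_from_catalog):
    """Return the ids whose on-disk state this toy reaper would delete.

    removed_logical_ids: logical sessions already removed from
        config.system.sessions.
    not_reaped_from_catalog: expired transaction sessions still held by the
        in-memory catalog. By assumption it is only ever populated for
        logical sessions that have already expired.
    """
    deleted = []
    for s in on_disk:
        if not s.expired:
            continue
        if s.txn_session_id in not_reaped_from_catalog:
            continue  # treated as still in use
        # Internal sessions for retryable writes are deleted even when the
        # logical session is alive; other kinds wait for the logical session.
        if s.internal_for_retryable_write or s.logical_session_id in removed_logical_ids:
            deleted.append(s.txn_session_id)
    return deleted


# Case (*): the logical session "lsid-A" is alive, but its internal
# transaction session has expired while waiting for write concern.
stuck = TxnSession("txn-1", "lsid-A", internal_for_retryable_write=True, expired=True)
deleted_now = reap_current([stuck], removed_logical_ids=set(), not_reaped_from_catalog=set())
```

In this model `deleted_now` contains `"txn-1"` even though the session is still in use, because nothing ever adds it to `not_reaped_from_catalog` while `lsid-A` is alive.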
One solution is to make the SessionCatalogMongoD reaper not delete the on-disk state for any expired internal transaction session for retryable writes until its logical session has expired, just as it does for all other kinds of transaction sessions. That is, we will rely completely on the best-effort eager reaping in the SessionCatalog to delete the on-disk state for internal transaction sessions for old retryable writes.
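A minimal sketch of the proposed rule, again as a toy Python model rather than the actual implementation. The names `TxnSession`, `reap_proposed`, `removed_logical_ids`, and `not_reaped_from_catalog` are hypothetical; the only assumption is that every kind of transaction session is now gated on its logical session having been removed from config.system.sessions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxnSession:
    """Hypothetical stand-in for an on-disk transaction session record."""
    txn_session_id: str
    logical_session_id: str
    internal_for_retryable_write: bool
    expired: bool


def reap_proposed(on_disk, removed_logical_ids, not_reaped_from_catalog):
    """Return the ids whose on-disk state the toy reaper would delete.

    No session is deleted before its logical session has been removed from
    config.system.sessions, whatever kind of transaction session it is.
    """
    deleted = []
    for s in on_disk:
        if not s.expired:
            continue
        if s.logical_session_id not in removed_logical_ids:
            continue  # logical session still alive: leave on-disk state alone
        if s.txn_session_id in not_reaped_from_catalog:
            continue  # still held by the in-memory catalog
        deleted.append(s.txn_session_id)
    return deleted


stuck = TxnSession("txn-1", "lsid-A", internal_for_retryable_write=True, expired=True)

# While "lsid-A" is alive, nothing is deleted.
while_alive = reap_proposed([stuck], removed_logical_ids=set(), not_reaped_from_catalog=set())

# Once "lsid-A" has been removed and the catalog no longer holds "txn-1",
# the on-disk state becomes eligible for deletion.
after_expiry = reap_proposed([stuck], removed_logical_ids={"lsid-A"}, not_reaped_from_catalog=set())
```

The trade-off in this model is that on-disk state for old retryable writes can live longer, since only the best-effort eager reaping in the SessionCatalog removes it before the logical session expires.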