Priority: Major - P3
Affects Version/s: 4.0.0, 4.2.0, 4.4.0
Fix Version/s: 5.0.0-rc0
Backport Requested:v4.4, v4.2, v4.0
Sprint:Repl 2021-04-19, Repl 2021-05-03, Repl 2021-05-17
Linked BF Score:70
SessionUpdateTracker::_updateSessionInfo() is used by secondary oplog application to coalesce multiple updates to the same config.transactions record into a single update of the most recent retryable write statement. The changes from 02020fa as part of SERVER-47844 made it possible for a secondary to choose its stable_timestamp as a majority-committed timestamp from within an oplog batch rather than always being on a batch boundary. The combination of these two can lead to the following sequence:
- During a single batch of oplog application:
  - User data write for stmtId=0 at t=10.
  - User data write for stmtId=1 at t=11.
  - User data write for stmtId=2 at t=12.
  - Session txn record write at t=12 with stmtId=2 as lastWriteOpTime.
  - In particular, no session txn record write for t=10 with stmtId=0 as lastWriteOpTime or for t=11 with stmtId=1 as lastWriteOpTime, because they were coalesced by the SessionUpdateTracker.
- Rollback to stable timestamp t=10.
- The session txn record won't exist with stmtId=0 as lastWriteOpTime (because that write was entirely skipped by oplog application) despite the user data write for stmtId=0 being reflected on disk. This allows stmtId=0 to be re-executed by this node if it later becomes primary.
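The sequence above can be sketched as a small simulation. This is only an illustrative model, not the server's implementation: the names OplogEntry, apply_batch, rollback_to, and reexecutable_stmts are hypothetical, and the model assumes a statement counts as "already executed" only if a surviving session txn record has a lastWriteOpTime at or after that statement's timestamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OplogEntry:
    ts: int       # timestamp of the user data write
    stmt_id: int  # retryable write statement id


# The batch from the sequence above: stmtId=0..2 at t=10..12.
BATCH = [OplogEntry(10, 0), OplogEntry(11, 1), OplogEntry(12, 2)]


def apply_batch(batch, coalesce):
    """Return (data_writes, session_writes) as lists of (ts, stmt_id)."""
    data_writes = [(e.ts, e.stmt_id) for e in batch]
    if coalesce and batch:
        # Assumption: coalescing keeps only the record for the last statement.
        last = batch[-1]
        session_writes = [(last.ts, last.stmt_id)]
    else:
        # One session txn record write per statement.
        session_writes = list(data_writes)
    return data_writes, session_writes


def rollback_to(writes, stable_ts):
    """Keep only the writes at or before the stable timestamp."""
    return [w for w in writes if w[0] <= stable_ts]


def reexecutable_stmts(batch, coalesce, stable_ts):
    """Statements whose data write survived rollback but which the
    surviving session txn record (if any) does not cover."""
    data, session = apply_batch(batch, coalesce)
    data = rollback_to(data, stable_ts)
    session = rollback_to(session, stable_ts)
    if session:
        last_ts = max(ts for ts, _ in session)
        known = {stmt for ts, stmt in data if ts <= last_ts}
    else:
        known = set()
    return {stmt for _, stmt in data} - known


if __name__ == "__main__":
    print(reexecutable_stmts(BATCH, coalesce=True, stable_ts=10))
    print(reexecutable_stmts(BATCH, coalesce=False, stable_ts=10))
```

In this model, coalescing combined with a stable timestamp inside the batch (t=10) leaves stmtId=0 applied but unrecorded, while a stable timestamp on the batch boundary (t=12) or one session record write per statement leaves nothing re-executable.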
The stable optime candidates list prevents this issue for retryable inserts, updates, and deletes applied during secondary oplog application.
However, retryable inserts on primaries also coalesce multiple updates to the same config.transactions record into a single update of the most recent retryable write statement. This happens through OpObserverImpl::onInserts() calling TransactionParticipant::onWriteOpCompletedOnPrimary() once for a batch of insert statements (aka vectored insert).
- Only retryable inserts are impacted.
- A retry attempt fails with a DuplicateKey error so long as the document wasn't deleted by another client in the meantime. (The document is re-inserted otherwise.)
The stable optime candidates list was removed, so this issue exists for retryable inserts, updates, and deletes applied during secondary oplog application. Retryable inserts on primaries continue to coalesce multiple updates to the same config.transactions record into a single update of the most recent retryable write statement.
- All of retryable inserts, updates, and deletes are impacted.
- A retry attempt for an update can execute more than once (e.g. double increment a counter).
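The difference in user-visible impact between a retried insert and a retried update can be illustrated with a toy model. This is a sketch under stated assumptions rather than server code: the collection is a plain dict keyed by _id, known_stmt_ids stands in for what the node can recover from its session txn record, and DuplicateKey, retry_insert, and retry_inc are hypothetical names.

```python
class DuplicateKey(Exception):
    """Stand-in for the server's DuplicateKey error."""


def retry_insert(collection, known_stmt_ids, stmt_id, doc):
    """Retry an insert; it is skipped only if stmt_id is known to have executed."""
    if stmt_id in known_stmt_ids:
        return "skipped"
    if doc["_id"] in collection:
        # The earlier execution is still on disk, so the retry collides with it.
        raise DuplicateKey(doc["_id"])
    collection[doc["_id"]] = dict(doc)
    known_stmt_ids.add(stmt_id)
    return "executed"


def retry_inc(collection, known_stmt_ids, stmt_id, doc_id, amount):
    """Retry an increment; it is skipped only if stmt_id is known to have executed."""
    if stmt_id in known_stmt_ids:
        return "skipped"
    # Nothing stops the increment from being applied a second time.
    collection[doc_id]["counter"] += amount
    known_stmt_ids.add(stmt_id)
    return "executed"


if __name__ == "__main__":
    # State after rollback: the data write for stmtId=0 survived,
    # but the session txn record covering it did not.
    coll = {1: {"_id": 1, "counter": 1}}
    lost_record = set()
    print(retry_inc(coll, lost_record, 0, 1, 1), coll[1]["counter"])
```

With an intact session record (known_stmt_ids containing the statement) both retries are no-ops. With the record lost, the insert retry surfaces as an error unless the document was deleted in the meantime, whereas the update retry silently changes the data again.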
This issue was discovered while reasoning through why the atClusterTime read on config.transactions to fix SERVER-54626 was insufficient (hence SERVER-55214). Shout out to Daniel Gottlieb for the assist!