Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 5.0.0-rc0
Affects Version/s: None
Component/s: Replication, Sharding
Labels:
None

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:
v4.4, v4.2, v4.0
Sprint:
Repl 2021-04-19, Repl 2021-05-03, Repl 2021-05-17
Linked BF Score:
200
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

SessionUpdateTracker::_updateSessionInfo() is used by secondary oplog application to coalesce multiple updates to the same config.transactions record into a single update of the most recent retryable write statement. The changes from 02020fa as part of SERVER-47844 made it possible for a secondary to choose its stable_timestamp as a majority-committed timestamp from within an oplog batch rather than always being on a batch boundary. The combination of these two can lead to the following sequence:

During a single batch of oplog application:
1. User data write for stmtId=0 at t=10.
2. User data write for stmtId=1 at t=11.
3. User data write for stmtId=2 at t=12.
4. Session txn record write at t=12 with stmtId=2 as lastWriteOpTime.
  - In particular, no session txn record write for t=10 with stmtId=0 as lastWriteOpTime or for t=11 with stmtId=1 as lastWriteOpTime because they were coalesced by the SessionUpdateTracker.
The node then rolls back to stable timestamp t=10.
After the rollback, the session txn record doesn't exist with stmtId=0 as lastWriteOpTime (because that write was entirely skipped by oplog application) even though the user data write for stmtId=0 is reflected on-disk. This allows stmtId=0 to be re-executed by this node if it later becomes primary.
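The sequence above can be reproduced with a small toy model. This is only a sketch, not MongoDB code: `OplogEntry`, `apply_batch`, and `rollback_to` are hypothetical names, and rollback is modeled simply as discarding every write with a timestamp greater than the stable timestamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OplogEntry:
    ts: int       # oplog timestamp of the user data write
    stmt_id: int  # retryable write statement id


def apply_batch(batch, coalesce=True):
    """Return (data_writes, session_writes) as lists of (ts, stmt_id).

    With coalesce=True only the last statement in the batch produces a
    session txn record write, mimicking what the ticket describes.
    """
    data_writes = [(e.ts, e.stmt_id) for e in batch]
    if coalesce:
        last = batch[-1]
        session_writes = [(last.ts, last.stmt_id)]
    else:
        session_writes = list(data_writes)
    return data_writes, session_writes


def rollback_to(writes, stable_ts):
    """Toy rollback: keep only writes at or before the stable timestamp."""
    return [w for w in writes if w[0] <= stable_ts]


batch = [OplogEntry(10, 0), OplogEntry(11, 1), OplogEntry(12, 2)]

# Coalesced application followed by rollback to t=10.
data, session = apply_batch(batch, coalesce=True)
data_after = rollback_to(data, 10)
session_after = rollback_to(session, 10)

# Same batch without coalescing, for comparison.
_, session_uncoalesced = apply_batch(batch, coalesce=False)
session_uncoalesced_after = rollback_to(session_uncoalesced, 10)
```

In this model `data_after` still contains the stmtId=0 write while `session_after` is empty, which is the inconsistency described above. Without coalescing, the session write at t=10 survives the rollback.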

Impact on 4.0, 4.2, and 4.4 branches

The stable optime candidates list prevents this issue for retryable inserts, updates, and deletes applied during secondary oplog application.

However, retryable inserts on primaries also coalesce multiple updates to the same config.transactions record into a single update of the most recent retryable write statement. This happens through OpObserverImpl::onInserts() calling TransactionParticipant::onWriteOpCompletedOnPrimary() once for a batch of insert statements (aka vectored insert).

Only retryable inserts are impacted.
A retry attempt fails with a DuplicateKey error so long as the document wasn't deleted by another client in the meantime. (The document is re-inserted otherwise.)
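The retry behavior for inserts can be illustrated with a minimal sketch. It assumes a collection modeled as a dict keyed by `_id` and a set of statement ids standing in for what the session txn record remembers; `retry_insert` and `DuplicateKeyError` are hypothetical names used only for this illustration.

```python
class DuplicateKeyError(Exception):
    """Stand-in for the server's DuplicateKey error."""


def retry_insert(collection, recorded_stmt_ids, stmt_id, doc):
    """Toy retry of a retryable insert statement."""
    if stmt_id in recorded_stmt_ids:
        # The session record remembers the statement: do not re-execute.
        return "skipped"
    if doc["_id"] in collection:
        raise DuplicateKeyError(doc["_id"])
    collection[doc["_id"]] = doc
    return "inserted"


# State after rollback: the document is on disk, but the session record
# no longer remembers stmtId=0.
collection = {1: {"_id": 1}}
recorded = set()

try:
    retry_insert(collection, recorded, 0, {"_id": 1})
    outcome_present = "inserted"
except DuplicateKeyError:
    outcome_present = "DuplicateKey"

# Another client deletes the document before the retry arrives.
del collection[1]
outcome_deleted = retry_insert(collection, recorded, 0, {"_id": 1})
```

While the document is still present the retry surfaces a DuplicateKey error; once another client has deleted it, the retry silently re-inserts the document.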

Impact on 4.9 and master branches

The stable optime candidates list was removed and so this issue exists for retryable inserts, updates, and deletes applied during secondary oplog application. Retryable inserts on primaries continue to coalesce multiple updates to the same config.transactions record into a single update of the most recent retryable write statement.

Retryable inserts, updates, and deletes are all impacted.
A retry attempt for an update can execute more than once (e.g. double increment a counter).
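The double-increment case can be sketched the same way. This is a toy model under the assumption that a retry is suppressed only when the statement id is still recorded; `run_retryable_inc` is a hypothetical helper, not a server function.

```python
def run_retryable_inc(doc, recorded_stmt_ids, stmt_id, amount):
    """Apply an increment unless the statement is recorded as executed.

    Returns True if the increment was applied, False if it was skipped.
    """
    if stmt_id in recorded_stmt_ids:
        return False
    doc["count"] += amount
    recorded_stmt_ids.add(stmt_id)
    return True


doc = {"_id": 1, "count": 0}
recorded = set()

# Initial execution of stmtId=0.
run_retryable_inc(doc, recorded, 0, 1)

# Healthy case: the retry is recognized and skipped.
healthy_retry_applied = run_retryable_inc(doc, recorded, 0, 1)
count_after_healthy_retry = doc["count"]

# Buggy case: the session record for stmtId=0 is lost in rollback while
# the data write survives, so the retry executes a second time.
recorded.clear()
lost_record_retry_applied = run_retryable_inc(doc, recorded, 0, 1)
count_after_lost_record_retry = doc["count"]
```

With the record intact the counter stays at 1 after the retry; with the record lost it advances to 2, even though the client issued a single logical increment.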

This issue was discovered while reasoning through why the atClusterTime read on config.transactions to fix SERVER-54626 was insufficient (hence SERVER-55214). Shout out to daniel.gottlieb for the assist!

is related to

SERVER-54626 Retryable writes may execute more than once in resharding if statements straddle the fetchTimestamp

Closed

SERVER-55214 Resharding txn cloner can miss config.transactions entry when fetching

Closed

SERVER-56631 Retryable write pre-fetch phase could miss entry from config.transactions when reading from donor secondaries

Closed

SERVER-56796 Support atClusterTime snapshot reads on config.transactions

Backlog

SERVER-47844 Update _setStableTimestampForStorage to set the stable timestamp without using the stable optime candidates set when EMRC=true

Closed

SERVER-47845 Remove obsolete code related to storing and updating stable optime candidates

Closed

related to

SERVER-99185 Handle transactionally replicated vectored inserts when restoring config.transactions during rollback

Closed

SERVER-55578 Disallow atClusterTime reads on the config.transactions collection

Closed


Assignee: Jason Chan
Reporter: Max Hirschhorn
Participants: Githook User, Jason Chan, Max Hirschhorn
Votes: 0
Watchers: 18

Created: Mar 18 2021 04:46:25 PM UTC
Updated: Jan 09 2025 05:42:33 PM UTC
Resolved: May 05 2021 01:10:43 PM UTC
Confidence Status Last Update: 22/Apr/21 9:00 PM
