Type: Bug
Resolution: Fixed
Priority: Critical - P2
Fix Version/s: 7.1.0-rc0, 6.0.7, 5.0.19, 4.4.23, 7.0.0-rc4
Affects Version/s: 4.4.19, 5.0.15, 6.3.0-rc0, 6.0.5, 7.1.0-rc0, 7.0.0-rc4
Component/s: Sharding
Labels:
- sharding-nyc-subteam1

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:

v7.0
Sprint:
Sharding NYC 2023-06-26
Linked BF Score:
135
Confidence Status:
None
Work Order:
0
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

If an update is performed in a transaction, we can't immediately read back the document if we don't already have the post-image, so we instead insert it into a list of deferred updates. These deferred updates will be processed in nextModsBatch by reading the latest version of the document when the recipient shard calls _transferMods.

Next, the ids of the documents that have changed will be pulled from the update list, and those documents will be read in order to get the latest state to transfer to the recipient.

However, if we have already read documents while processing the deferred updates, a snapshot will be opened and pinned to the operation context, continuing to be used when we read later to get the latest state of the documents in the update list.

This means that it is possible for the read of an updated document to use a stale snapshot, and for the update to be lost, during the following sequence of events:

1. Deferred updates are processed, and a snapshot is opened
2. Another thread updates a document in the chunk being moved, adding its id to the update list
3. The update list is spliced in nextModsBatch
4. The state of the documents in the update list is read using the same snapshot as in step 1, which predates the update made in step 2
5. The update is lost
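
The sequence above can be sketched with a toy model. This is only an illustration, not the server code: `Store`, `OperationContext`, and `reproduce_lost_update` are hypothetical names, and a snapshot is modeled as a plain copy of the documents.

```python
import copy


class Store:
    """Toy document store (hypothetical); a snapshot is a frozen copy."""

    def __init__(self, docs):
        self.docs = dict(docs)

    def snapshot(self):
        return copy.deepcopy(self.docs)


class OperationContext:
    """Toy operation context: pins the first snapshot it opens and
    keeps using it for every later read."""

    def __init__(self, store):
        self.store = store
        self.pinned = None

    def read(self, doc_id):
        if self.pinned is None:
            self.pinned = self.store.snapshot()
        return self.pinned[doc_id]


def reproduce_lost_update():
    store = Store({1: {"_id": 1, "v": "old"}, 2: {"_id": 2, "v": "old"}})
    deferred = [1]
    update_list = []
    op_ctx = OperationContext(store)

    # Step 1: processing deferred updates reads a document,
    # which opens and pins a snapshot.
    for doc_id in deferred:
        op_ctx.read(doc_id)
        update_list.append(doc_id)

    # Step 2: another thread updates document 2 and records its id.
    store.docs[2] = {"_id": 2, "v": "new"}
    update_list.append(2)

    # Step 3: the update list is spliced into a batch.
    batch = list(update_list)
    update_list.clear()

    # Step 4: documents are read through the snapshot pinned in step 1.
    transferred = {doc_id: op_ctx.read(doc_id) for doc_id in batch}

    # Step 5: the recipient would receive the stale version of document 2.
    return transferred, store.docs


transferred, current = reproduce_lost_update()
print(transferred[2]["v"], current[2]["v"])
```

In this model the transferred copy of document 2 comes from the snapshot opened before the concurrent write, so it differs from the current state of the store.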

Calling abandonSnapshot on the operation context's recovery unit after splicing the update list should be sufficient to ensure that we will read from a snapshot at least as recent as the updates in the list, though it's not clear whether this is the best long-term solution to the problem.
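
A minimal sketch of the proposed mitigation, again as a toy model rather than the actual implementation: `RecoveryUnit`, `next_mods_batch`, and `run` are hypothetical stand-ins, and the `abandon` flag only exists here to compare the two behaviours side by side.

```python
class RecoveryUnit:
    """Toy stand-in: lazily opens a snapshot and keeps it until abandoned."""

    def __init__(self, store):
        self._store = store
        self._snapshot = None

    def read(self, doc_id):
        if self._snapshot is None:
            # Open a snapshot (modeled as a point-in-time copy).
            self._snapshot = dict(self._store)
        return self._snapshot[doc_id]

    def abandon_snapshot(self):
        # The next read will open a fresh snapshot.
        self._snapshot = None


def next_mods_batch(store, deferred, update_list, concurrent_write, abandon):
    ru = RecoveryUnit(store)

    # Processing deferred updates reads documents and opens the snapshot.
    for doc_id in deferred:
        ru.read(doc_id)
        update_list.append(doc_id)

    # Simulate a writer that runs before the update list is spliced.
    concurrent_write(store, update_list)

    # Splice the update list.
    batch = list(update_list)
    update_list.clear()

    # Proposed mitigation: drop the old snapshot after splicing, so the
    # reads below see everything recorded in the spliced list.
    if abandon:
        ru.abandon_snapshot()

    return {doc_id: ru.read(doc_id) for doc_id in batch}


def run(abandon):
    store = {1: "a1", 2: "b0"}

    def writer(s, ids):
        s[2] = "b1"
        ids.append(2)

    return next_mods_batch(store, [1], [], writer, abandon)


print(run(abandon=False))
print(run(abandon=True))
```

Abandoning the snapshot only after the splice matters in this model: any id already in the spliced batch was written before the new snapshot is opened, so the subsequent reads cannot predate it.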

is caused by: SERVER-71219 Migration can miss writes from prepared transactions (Closed)

Assignee: Brett Nawrocki
Reporter: Brett Nawrocki
Participants: Brett Nawrocki, Githook User
Votes: 0
Watchers: 11

Created: Jun 13 2023 06:48:01 PM UTC
Updated: Oct 29 2023 09:20:06 PM UTC
Resolved: Jun 15 2023 10:22:55 PM UTC
Confidence Status Last Update: 13/Jun/23 6:49 PM
