[SERVER-53544] Verify prevOpTime chain for transactions is intact in resharding oplog fetcher Created: 30/Dec/20  Updated: 05/Jun/21  Resolved: 05/Jun/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Max Hirschhorn
Resolution: Duplicate Votes: 0
Labels: PM-234-M3, PM-234-T-oplog-fetch, post-rc0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Duplicate
duplicates SERVER-57303 Create transaction history iterator s... Closed
Related
is related to SERVER-49892 Continued testing for aggregation pip... Closed
Backport Requested:
v5.0
Sprint: Sharding 2021-06-14, Sharding 2021-05-31
Participants:
Story Points: 2

 Description   

ReshardingOplogFetcher relies on {$_requestReshardingResumeToken: true} enabling QueryPlannerParams::ASSERT_MIN_TS_HAS_NOT_FALLEN_OFF_OPLOG and causing the aggregation pipeline to fail with an OplogQueryMinTsMissing error response if the donor's oplog rolls over. However, it is possible for the $graphLookup in the resharding oplog fetcher's aggregation pipeline to read from a later snapshot than the one used for the collection scan that populates the DocumentSourceCursor. This means the following sequence could happen (a simplified sketch of the pipeline's shape follows the list):

  1. ReshardingOplogFetcher runs [{$match: {ts: {$gte: Timestamp(10, 1)}}}, ..., {$graphLookup: ...}, ...].
  2. CollectionScan verifies the {ts: Timestamp(10, 1)} document is still present in the oplog. The {ts: Timestamp(10, 1)} document has {prevOpTime: {ts: Timestamp(5, 1), t: 1}}.
  3. DocumentSourceCursor buffers a batch of documents.
  4. Oplog truncation occurs and now {ts: Timestamp(6, 1)} is the oldest oplog entry.
  5. $graphLookup finds the {ts: Timestamp(10, 1)} document but fails to find a {ts: Timestamp(5, 1)} document.
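
For illustration, a simplified sketch of the pipeline's shape in mongo shell syntax. The server builds the real pipeline internally; the stage bodies below are assumptions for illustration only, not the actual pipeline definition.

    // Simplified sketch (mongo shell syntax) of the fetcher's pipeline shape.
    db.getSiblingDB("local").runCommand({
        aggregate: "oplog.rs",
        pipeline: [
            // Resume from the fetcher's last fetched timestamp. The collection
            // scan backing this $match verifies that Timestamp(10, 1) is still
            // in the oplog (ASSERT_MIN_TS_HAS_NOT_FALLEN_OFF_OPLOG).
            {$match: {ts: {$gte: Timestamp(10, 1)}}},
            // Pull in the earlier oplog entries of a multi-document
            // transaction by walking the prevOpTime chain. This stage may read
            // from a later storage snapshot than the $match's collection scan.
            {$graphLookup: {
                from: "oplog.rs",
                startWith: "$prevOpTime.ts",
                connectFromField: "prevOpTime.ts",
                connectToField: "ts",
                as: "history",
            }},
        ],
        cursor: {},
        // Enables the oplog rollover check described above.
        $_requestReshardingResumeToken: true,
    });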

The ReshardingOplogFetcher could confirm the prevOpTime value is {prevOpTime: {ts: Timestamp(0, 0), t: -1}} whenever it sees a new transaction. The aggregation pipeline that resharding's oplog fetching uses guarantees that all of the oplog entries from the same multi-document transaction are contiguous, so the fetcher only needs to track this for a single multi-document transaction at a time. We may as well verify the prevOpTime values of the later entries in the chain too.
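
A minimal sketch of that check in mongo shell syntax, assuming the fetcher has the transaction's oplog entries in hand. The null optime sentinel {ts: Timestamp(0, 0), t: -1} comes from the description above; the helper name is invented for illustration.

    // Hypothetical helper: verify the prevOpTime chain for the oplog entries
    // of one multi-document transaction, which the pipeline returns
    // contiguously and in order.
    function verifyPrevOpTimeChain(txnEntries) {
        // The transaction's first oplog entry must point at the null optime.
        const first = txnEntries[0].prevOpTime;
        assert(bsonWoCompare({x: first},
                             {x: {ts: Timestamp(0, 0), t: NumberLong(-1)}}) === 0,
               "first entry's prevOpTime is not the null optime");

        // Every later entry must point back at the entry immediately before
        // it; a gap means part of the chain fell off the donor's oplog.
        for (let i = 1; i < txnEntries.length; ++i) {
            assert.eq(txnEntries[i].prevOpTime.ts, txnEntries[i - 1].ts,
                      "prevOpTime chain is not intact");
        }
    }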



 Comments   
Comment by Max Hirschhorn [ 05/Jun/21 ]

This issue has been addressed by the changes from b4c7db3 as part of SERVER-57303, by having the TransactionHistoryIterator propagate the IncompleteTransactionHistory exception when the prevOpTime chain isn't intact. Moreover, resharding's oplog fetching pipeline no longer uses $lookup or $graphLookup, so non-transaction oplog entries are no longer at risk of being elided by reads from different snapshots between stages within the aggregation pipeline.
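
Conceptually, the iterator walks the prevOpTime chain backwards one entry at a time. The rough mongo shell sketch below shows where that exception arises; the real TransactionHistoryIterator is C++, so the query shape here is an assumption for illustration.

    // Rough sketch of the chain walk, expressed as shell queries against the
    // local oplog, to illustrate where the IncompleteTransactionHistory case
    // comes from.
    function walkTransactionHistory(lastOpTime) {
        const oplog = db.getSiblingDB("local").oplog.rs;
        const entries = [];
        let optime = lastOpTime;
        // {ts: Timestamp(0, 0), t: -1} is the null optime terminating the chain.
        while (bsonWoCompare({x: optime.ts}, {x: Timestamp(0, 0)}) !== 0) {
            const entry = oplog.findOne({ts: optime.ts});
            if (entry === null) {
                // The chain points at an entry that has been truncated away;
                // this is the IncompleteTransactionHistory case.
                throw Error("IncompleteTransactionHistory: no entry at " +
                            tojson(optime.ts));
            }
            entries.push(entry);
            optime = entry.prevOpTime;
        }
        return entries.reverse();
    }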

Comment by Max Hirschhorn [ 30/Dec/20 ]

I think there's a related edge case where the $graphLookup and $lookup fail to find the {ts: Timestamp(10, 1)} document itself.

  1. ReshardingOplogFetcher runs [{$match: {ts: {$gte: Timestamp(10, 1)}}}, ..., {$graphLookup: ...}, ...].
  2. CollectionScan verifies the {ts: Timestamp(10, 1)} document is still present in the oplog.
  3. DocumentSourceCursor buffers a batch of documents.
  4. Oplog truncation occurs and now {ts: Timestamp(11, 1)} is the oldest oplog entry.
  5. $graphLookup fails to find a {ts: Timestamp(10, 1)} document.

I suspect using {preserveNullAndEmptyArrays: true} with the $unwind stages that follow the $graphLookup and the $lookup would pass through enough information to detect that we "lost" an oplog entry from the donor shard.
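
For example, a sketch in mongo shell syntax; the stage bodies are illustrative assumptions, not the pipeline's actual definition.

    // With preserveNullAndEmptyArrays, a document whose $graphLookup found no
    // matches still flows through $unwind, just with the "history" field
    // missing, instead of being dropped entirely.
    {$graphLookup: {
        from: "oplog.rs",
        startWith: "$prevOpTime.ts",
        connectFromField: "prevOpTime.ts",
        connectToField: "ts",
        as: "history",
    }},
    {$unwind: {path: "$history", preserveNullAndEmptyArrays: true}},
    // A later stage (or the fetcher itself) could then flag the loss: a
    // missing "history" field on a transaction oplog entry means the
    // looked-up entry fell off the donor's oplog.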

We could also consider a different approach to address both scenarios: having recipient shards prevent donor shards from truncating portions of their oplog, or having donor shards attach the timestamp of their oldest oplog entry to the cursor response and doing the necessary plumbing in PlanExecutorPipeline (see SERVER-53534) so the ReshardingOplogFetcher can error out itself.
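
A sketch of the second idea in mongo shell syntax, assuming a hypothetical minOplogTimestamp field on the donor's cursor response. That field name is invented for illustration; the plumbing to surface it is what SERVER-53534 would cover.

    // Hypothetical recipient-side check, assuming the donor attached the
    // timestamp of its oldest oplog entry to the cursor response under an
    // invented "minOplogTimestamp" field.
    function assertResumePointStillInDonorOplog(cursorResponse, resumeTs) {
        if (bsonWoCompare({x: cursorResponse.minOplogTimestamp},
                          {x: resumeTs}) > 0) {
            // Entries at or after the resume point may already be truncated,
            // so the ReshardingOplogFetcher should error out itself.
            throw Error("resume point " + tojson(resumeTs) +
                        " has fallen off the donor's oplog");
        }
    }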
