[SERVER-55214] Resharding txn cloner can miss config.transactions entry when fetching
Created: 15/Mar/21  Updated: 29/Oct/23  Resolved: 09/Apr/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 5.0.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Randolph Tan
Assignee: Randolph Tan
Resolution: Fixed
Votes: 0
Labels: PM-234-M2.5, PM-234-T-config-txn-clone
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File test.js    
Issue Links:
Depends
Related
related to SERVER-55305 Retryable write may execute more than... Closed
related to SERVER-55578 Disallow atClusterTime reads on the c... Closed
related to SERVER-55873 Force secondaries to apply each write... Closed
is related to SERVER-54626 Retryable writes may execute more tha... Closed
is related to SERVER-54681 Resharding recipient shards which are... Closed
is related to SERVER-56631 Retryable write pre-fetch phase could... Closed
is related to SERVER-52921 Integrate config.transactions cloner ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 2021-03-22, Sharding 2021-04-05, Sharding 2021-04-19
Participants:
Linked BF Score: 42
Story Points: 2

 Description   

This happens because the cloner reads with readConcern snapshot + atClusterTime and readPreference nearest, so the read can be served by a secondary.

The issue is that secondaries squash multiple updates to config.transactions into one and use the newest timestamp when calling setTimestamp. So if we have 3 writes from a retryable write that correspond to ts1, ts2, and ts3, the secondary will only have ts3 set properly, and you will not be able to see the config.transactions document when reading with atClusterTime ts1 or ts2.
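
The visibility gap described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: timestamps are plain integers, and `TimestampedDoc`, `apply_on_primary`, and `apply_on_secondary` are hypothetical names invented for illustration, not server code. It models the squashing as keeping only the last update of a batch, stamped with the newest timestamp.

```python
from bisect import bisect_right


class TimestampedDoc:
    """Toy model of one config.transactions document's timestamped history."""

    def __init__(self):
        self._timestamps = []  # commit timestamps, in increasing order
        self._values = []      # document value as of each timestamp

    def write(self, ts, value):
        # Assumes writes arrive in increasing timestamp order.
        self._timestamps.append(ts)
        self._values.append(value)

    def read_at(self, cluster_time):
        # Return the newest version with timestamp <= cluster_time, or None.
        idx = bisect_right(self._timestamps, cluster_time)
        return self._values[idx - 1] if idx else None


def apply_on_primary(writes):
    # Every update keeps its own timestamp.
    doc = TimestampedDoc()
    for ts, value in writes:
        doc.write(ts, value)
    return doc


def apply_on_secondary(batch):
    # Assumption: all updates in the batch are squashed into one write
    # that carries only the newest timestamp.
    doc = TimestampedDoc()
    if batch:
        ts, value = batch[-1]
        doc.write(ts, value)
    return doc


# Three updates at ts1=1, ts2=2, ts3=3 (values are illustrative only).
writes = [(1, {"lastStmt": 0}), (2, {"lastStmt": 1}), (3, {"lastStmt": 2})]
primary = apply_on_primary(writes)
secondary = apply_on_secondary(writes)
```

In this model, `primary.read_at(1)` returns the first version, while `secondary.read_at(1)` and `secondary.read_at(2)` return `None`; only a read at ts3 or later sees the document on the secondary.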



 Comments   
Comment by Githook User [ 08/Apr/21 ]

Author:

{'name': 'Randolph Tan', 'email': 'randolph@10gen.com', 'username': 'renctan'}

Message: SERVER-55214 Make resharding recipient shards use fetchTimestamp from each donor shard when fetching config.transactions and the oplog

Also force the no-op oplog write that is being used as the minFetchTimestamp marker for resharding into its own batch when replicating.
Branch: master
https://github.com/mongodb/mongo/commit/aa401671d769f20e98f40b864338c7bd1c14d292

Comment by Max Hirschhorn [ 26/Mar/21 ]

The work plan for this ticket is to do the following:

  1. Change the no-op oplog entry written by donor shards as part of calculating their minFetchTimestamp value to be applied by secondaries in its own batch. This ensures that any retryable write statements committed before the minFetchTimestamp are visible in the config.transactions collection when using {atClusterTime: <minFetchTimestamp>} (and any retryable write statements committed afterwards are not visible).
  2. Change recipient shards to clone the config.transactions collection using {atClusterTime: <minFetchTimestamp>} with the individual minFetchTimestamp value corresponding to the donor shard rather than the single fetchTimestamp value. Note that recipient shards must continue to clone the collection being resharded using {atClusterTime: <fetchTimestamp>} to assert the non-existence of documents with duplicate _ids.
  3. Change recipient shards to start fetching oplog entries using {ts: {$gte: <minFetchTimestamp>}} with the individual minFetchTimestamp value corresponding to the donor shard rather than the single fetchTimestamp value. This is necessary for the config.transactions collection to become correct via idempotency of oplog application, because (2) clones it at an earlier timestamp than the fetchTimestamp value.
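
As a rough illustration of steps (2) and (3), the sketch below builds the read parameters a recipient would use for each source. It is not the server implementation: `DonorInfo` and `recipient_read_plan` are hypothetical names, timestamps are simplified to integers, and the dictionary layout is invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DonorInfo:
    # Hypothetical container; not a type from the server code base.
    shard_id: str
    min_fetch_timestamp: int  # simplified: real timestamps are not plain ints


def recipient_read_plan(donors, fetch_timestamp):
    """Return the read parameters a recipient would use for each source."""
    plan = {
        # The collection being resharded is still cloned at the single
        # fetchTimestamp shared by all donors.
        "collection_clone": {"atClusterTime": fetch_timestamp},
        "donors": {},
    }
    for donor in donors:
        plan["donors"][donor.shard_id] = {
            # Step (2): clone config.transactions at this donor's own
            # minFetchTimestamp.
            "config_transactions_clone": {
                "atClusterTime": donor.min_fetch_timestamp
            },
            # Step (3): start fetching oplog entries from the same point.
            "oplog_fetch_filter": {"ts": {"$gte": donor.min_fetch_timestamp}},
        }
    return plan


# Example values are made up for illustration.
donors = [DonorInfo("donor0", 10), DonorInfo("donor1", 14)]
plan = recipient_read_plan(donors, fetch_timestamp=14)
```

Here `plan["donors"]["donor0"]` uses timestamp 10 for both the config.transactions clone and the oplog filter, while `plan["collection_clone"]` still uses the shared value 14.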

Comment by Randolph Tan [ 15/Mar/21 ]

Attaching js test that demonstrates the secondary read behavior for config.transactions

Generated at Thu Feb 08 05:35:50 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.