[SERVER-55391] Command with snapshot read concern without 'atClusterTime' can fail with SnapshotTooOld Created: 22/Mar/21  Updated: 27/Oct/23  Resolved: 30/Mar/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Jordi Serra Torrens Assignee: Lingzhi Deng
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Sprint: Repl 2021-04-05
Participants:

 Description   

With snapshot read concern, when the client does not specify an 'atClusterTime', the server chooses one at the command entry point: the CurrentCommittedSnapshotOpTime. However, by the time the command executes, the node's currentCommittedSnapshotOpTime may already have advanced so far that the chosen opTime falls outside the minSnapshotHistoryWindowInSeconds window, so the command fails to open the cursor and returns SnapshotTooOld. This happens because selecting the opTime to read at and opening the cursor do not happen atomically.
The client can retry on that error, but the same situation can recur on every retry, so the client has no guarantee that the query will eventually succeed.
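
The race described above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions, not the server's implementation: ToyNode, run_snapshot_read, and the integer "seconds" clock are hypothetical names and simplifications introduced here, and the window check is a stand-in for whatever the server actually does when it opens the cursor.

```python
from dataclasses import dataclass


class SnapshotTooOld(Exception):
    """Stand-in for the server's SnapshotTooOld error."""


@dataclass
class ToyNode:
    # Hypothetical model of one node; times are plain integers in seconds.
    committed_time: int = 100  # stand-in for the current committed snapshot opTime
    history_window: int = 5    # stand-in for minSnapshotHistoryWindowInSeconds

    def choose_read_time(self) -> int:
        # Step 1 (command entry point): pick the timestamp to read at.
        return self.committed_time

    def advance(self, seconds: int) -> None:
        # The committed snapshot keeps moving between step 1 and step 2.
        self.committed_time += seconds

    def open_cursor(self, read_time: int) -> int:
        # Step 2: fail if the chosen time has fallen out of the history window.
        if read_time < self.committed_time - self.history_window:
            raise SnapshotTooOld(f"read time {read_time} is outside the window")
        return read_time


def run_snapshot_read(node: ToyNode, delay: int, max_attempts: int = 3):
    """Return the attempt number that succeeded, or None if every attempt failed.

    `delay` models how far the committed snapshot advances between choosing
    the read time and opening the cursor (the two steps are not atomic).
    """
    for attempt in range(1, max_attempts + 1):
        read_time = node.choose_read_time()
        node.advance(delay)
        try:
            node.open_cursor(read_time)
            return attempt
        except SnapshotTooOld:
            continue  # a retry picks a fresh read time but hits the same race
    return None
```

In this model, a retry helps only if the gap between the two steps happens to be shorter than the window on that attempt; when the gap is consistently larger, every retry fails the same way, which matches the concern raised in the description.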



 Comments   
Comment by Lingzhi Deng [ 30/Mar/21 ]

Closing as "Works as Designed". Sharding might need to consider having a larger snapshot window on configsvr for catalog cache refreshes (which will also depend on SERVER-47855).

Generated at Thu Feb 08 05:36:21 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.