[SERVER-85869] Exhaustive find on config shard can return stale data Created: 29/Jan/24  Updated: 06/Feb/24

Status: In Code Review
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Jordi Olivares Provencio Assignee: Jordi Olivares Provencio
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-86278 Investigate discrepancy between Shard... Needs Scheduling
is related to SERVER-64863 Interoperability issues between the V... Blocked
Operating System: ALL
Backport Requested:
v7.3
Sprint: CAR Team 2024-02-05, CAR Team 2024-02-19
Participants:
Linked BF Score: 105

 Description   

To populate the sharding caches, we perform exhaustive find commands on the config shard. These finds are performed with majority readConcern and a minimum cluster time to ensure causality.

However, on particularly slow machines this can cause the query to return inconsistent results: an internal snapshot refresh in the middle of the scan moves the read to a later point in time, so documents that have not been scanned yet are read at the new snapshot, and are skipped entirely if they were deleted in the meantime. This is an issue because we transactionally modify multiple documents, and because of the refresh the exhaustive find can return a partially applied commit.
For example:

  • Say we have two documents, A0 and B0.
  • We start a scan, read A0, and yield.
  • A transaction updates both documents, to A1 and B1.
  • The scan resumes, refreshes its snapshot, and now reads B1.
  • The end result is A0 and B1, a combination that never existed at any single point in time.

If the update to B is a deletion (so there is no B1 to read) and we expected to read B0, then the query returns only A0.
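The scenario above can be reproduced with a small toy model. This is only a sketch under stated assumptions, not the server's actual storage or query code: the multi-version `history` map, the integer timestamps, and the helpers `read_at`, `scan`, and `demo` are all hypothetical names invented for illustration. A scan that keeps one snapshot returns A0 and B0, while a scan that refreshes its snapshot after yielding returns a torn result.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical multi-version store: key -> [(commit_ts, value), ...] in commit order.
# A value of None marks a deletion.
History = Dict[str, List[Tuple[int, Optional[str]]]]


def read_at(history: History, key: str, ts: int) -> Optional[str]:
    """Return the newest value of `key` committed at or before `ts`."""
    visible = None
    for commit_ts, value in history[key]:
        if commit_ts <= ts:
            visible = value
    return visible


def scan(history: History, keys: List[str], start_ts: int,
         refresh: Optional[Tuple[int, int]] = None) -> List[str]:
    """Scan `keys` in order, reading each at the scan's current snapshot.

    `refresh` is (index, new_ts): just before reading keys[index] the scan
    switches to snapshot `new_ts`, modelling a yield followed by a refresh.
    """
    ts = start_ts
    results = []
    for i, key in enumerate(keys):
        if refresh is not None and i == refresh[0]:
            ts = refresh[1]  # snapshot refreshed mid-scan
        value = read_at(history, key, ts)
        if value is not None:  # deleted documents are simply not returned
            results.append(value)
    return results


def demo(b_after: Optional[str]) -> Tuple[List[str], List[str]]:
    """A0 and B0 commit at ts=1; one transaction at ts=2 writes A1 and `b_after`."""
    history: History = {
        "A": [(1, "A0"), (2, "A1")],
        "B": [(1, "B0"), (2, b_after)],
    }
    stable = scan(history, ["A", "B"], start_ts=1)
    torn = scan(history, ["A", "B"], start_ts=1, refresh=(1, 2))
    return stable, torn


if __name__ == "__main__":
    print(demo("B1"))   # B updated by the transaction
    print(demo(None))   # B deleted by the transaction
```

In this model, `demo("B1")` gives `(['A0', 'B0'], ['A0', 'B1'])` and `demo(None)` gives `(['A0', 'B0'], ['A0'])`: the refreshed scan mixes the pre-transaction version of A with the post-transaction state of B, and in the deletion case B is missing from the result altogether.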



 Comments   
Comment by Jordi Olivares Provencio [ 30/Jan/24 ]

Linking with SERVER-64863: we rolled that change back to prevent the timeInStore in the config shards from advancing too fast.

That work should be done at the same time as this ticket.

Generated at Thu Feb 08 06:58:49 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.