In order to populate the sharding caches we perform exhaustive find commands on the config shard. These finds are performed with majority readConcern and a minimum cluster time to ensure causality.
However, on particularly slow machines this can return invalid results: an internal snapshot refresh in the middle of the scan can move it to a newer snapshot, so documents later in the scan are missed if they were deleted in the meantime. This is an issue because we modify multiple documents transactionally, and after a refresh the exhaustive find can return a partially applied commit.
For example:
- Say we have two documents A0 and B0.
- We start a scan, read A0, and yield.
- A concurrent transaction updates both documents, to A1 and B1.
- We refresh the scan snapshot and now read B1.
- The end result is A0 and B1, a mix of two different snapshots.
If B1 is a deletion, the scan never sees B at all: where we expected to read B0, the query returns only A0.
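
The interleaving above can be reproduced with a small toy model. This is only an illustrative sketch, not the server's implementation: `VersionedStore`, `scan`, and the `refresh_on_yield` flag are hypothetical names invented for this example, and the "yield" is simulated by committing the concurrent transaction right after the first document is read.

```python
from typing import Dict, List, Optional


class VersionedStore:
    """Toy multi-version store (hypothetical): every commit creates a new snapshot."""

    def __init__(self, initial: Dict[str, str]) -> None:
        self._snapshots: List[Dict[str, str]] = [dict(initial)]

    def latest(self) -> int:
        """Return the id of the most recent snapshot."""
        return len(self._snapshots) - 1

    def commit(self, changes: Dict[str, Optional[str]]) -> int:
        """Apply all changes atomically; a value of None deletes the key."""
        new = dict(self._snapshots[-1])
        for key, value in changes.items():
            if value is None:
                new.pop(key, None)
            else:
                new[key] = value
        self._snapshots.append(new)
        return self.latest()

    def read(self, snapshot: int, key: str) -> Optional[str]:
        """Read a key as of the given snapshot; None means it does not exist."""
        return self._snapshots[snapshot].get(key)


def scan(
    store: VersionedStore,
    keys: List[str],
    refresh_on_yield: bool,
    commit_during_yield: Dict[str, Optional[str]],
) -> List[str]:
    """Scan keys in order, yielding once after the first document.

    During the simulated yield a concurrent transaction commits. If
    refresh_on_yield is True, the scan then continues on the newest
    snapshot, which is the behaviour that produces the anomaly.
    """
    snapshot = store.latest()
    results: List[str] = []
    for i, key in enumerate(keys):
        value = store.read(snapshot, key)
        if value is not None:
            results.append(value)
        if i == 0:
            store.commit(commit_during_yield)  # concurrent transaction
            if refresh_on_yield:
                snapshot = store.latest()  # snapshot refresh after the yield
    return results


def fresh_store() -> VersionedStore:
    return VersionedStore({"A": "A0", "B": "B0"})


# Update case: the refreshed scan mixes two snapshots.
mixed = scan(fresh_store(), ["A", "B"], True, {"A": "A1", "B": "B1"})
# Deletion case: the refreshed scan silently drops B.
dropped = scan(fresh_store(), ["A", "B"], True, {"A": "A1", "B": None})
# Holding a single snapshot for the whole scan stays consistent.
stable = scan(fresh_store(), ["A", "B"], False, {"A": "A1", "B": None})
print(mixed, dropped, stable)
```

In this model, keeping one snapshot for the whole scan avoids the partially applied view; the toy says nothing about how the server itself should pin or wait for a snapshot.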
- causes
  - SERVER-86624 Make RSLocalClient also wait for a snapshot to be available (Closed)
  - SERVER-87198 [5.0] Make shard registry reads fallback to majority readConcern if snapshot reads fail (Closed)
- is related to
  - SERVER-64863 Interoperability issues between the VectorClock and the CatalogCache on secondary nodes of the CSRS (Closed)
- related to
  - SERVER-86278 Investigate discrepancy between ShardLocal and ShardRemote aggregations (Open)