[SERVER-85218] Config Shard calls do not enforce causal consistency access for local shards Created: 15/Jan/24  Updated: 01/Feb/24

Status: Needs Scheduling
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Jordi Olivares Provencio Assignee: Backlog - Catalog and Routing
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Catalog and Routing
Operating System: ALL
Sprint: CAR Team 2024-01-22
Participants:

 Description   

Calls to the config shard currently go through ShardRegistry::getConfigShard(), which returns a ConfigShardWrapper instance. This wrapper is responsible for attaching the configTime to requests sent to the config server.

However, this approach is unreliable: the causal consistency expected here is not enforced, only offered on a best-effort basis. The code relies on the minClusterTime setting in the ReadConcernPreferences, which is used exclusively to choose which node to route the request to and is ignored if no node satisfies it (code that does this).

This means that, in theory, a request routed to the wrong node could read stale data if the caller does not set a valid readConcern clusterTime.
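To illustrate the difference between best-effort targeting and an enforced read, here is a minimal, self-contained sketch. It is not the server code: Node, selectNodeBestEffort, readUnenforced and readEnforced are hypothetical names, cluster times are modelled as plain integers, and the enforced read rejects the request where a real node would more likely wait until it has caught up.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a replica set member; not a MongoDB type.
struct Node {
    int id;
    std::uint64_t lastAppliedTime;  // latest cluster time this node has applied
};

// Best-effort targeting: prefer a node at or past minClusterTime, but fall
// back to the first node when none qualifies. The preference is a hint only.
const Node& selectNodeBestEffort(const std::vector<Node>& nodes,
                                 std::uint64_t minClusterTime) {
    assert(!nodes.empty());
    for (const Node& n : nodes) {
        if (n.lastAppliedTime >= minClusterTime) {
            return n;
        }
    }
    return nodes.front();  // no node satisfies the hint; it is silently dropped
}

// Read with no cluster time attached: returns whatever the node currently has,
// even if that is older than what the caller has already observed.
std::uint64_t readUnenforced(const Node& node) {
    return node.lastAppliedTime;
}

// Read with an enforced cluster time: the node refuses to answer from a
// snapshot older than afterClusterTime (simplification: reject, do not wait).
std::optional<std::uint64_t> readEnforced(const Node& node,
                                          std::uint64_t afterClusterTime) {
    if (node.lastAppliedTime < afterClusterTime) {
        return std::nullopt;
    }
    return node.lastAppliedTime;
}
```

In this sketch, with two nodes at times 5 and 7 and a required time of 10, selectNodeBestEffort still returns a node, so readUnenforced yields stale data, while readEnforced on the same node returns no value.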

This is currently the case when the chosen node is a stale secondary accessing itself via the RSLocalClient as part of ShardLocal.



 Comments   
Comment by Jordi Olivares Provencio [ 18/Jan/24 ]

Putting this back into Needs Scheduling, as our belief that this was the root cause of the original failure turned out to be false.

Right now we only access via ShardLocal if we're the primary config shard node, and only to perform modifications. This is currently safe, but we're leaving this ticket open as a way to clean up ShardLocal from our codebase.

Generated at Thu Feb 08 06:57:09 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.