[SERVER-85218] Config Shard calls do not enforce causal consistency access for local shards Created: 15/Jan/24 Updated: 01/Feb/24 |
|
| Status: | Needs Scheduling |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jordi Olivares Provencio | Assignee: | Backlog - Catalog and Routing |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Catalog and Routing
|
| Operating System: | ALL |
| Sprint: | CAR Team 2024-01-22 |
| Participants: |
| Description |
|
Calls to the config shard are currently made through ShardRegistry::getConfigShard(), which returns a ConfigShardWrapper instance. This wrapper is in charge of attaching the configTime to the requests sent to the config server. However, this approach is fragile, since the causal consistency expected here is not enforced and is only offered on a best-effort basis. This is because the code uses the minClusterTime setting in the ReadConcernPreferences. That setting is used exclusively for choosing which node to route the request to, and it is ignored if no node satisfies it (code that does this). This means that, in theory, if the wrong node is chosen, a request could read stale data unless it also sets a valid readConcern clusterTime. This is currently the case if the node chosen is a stale secondary accessing itself via the RSLocalClient as part of ShardLocal. |
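To make the difference between the two behaviours concrete, here is a minimal, self-contained sketch. It is not the server code: Node, selectBestEffort, readUnenforced and readEnforced are hypothetical names invented for illustration, and cluster times are modelled as plain integers. It only assumes what the description states, namely that the routing hint is dropped when no node satisfies it, whereas a read-side time requirement would be checked by the node itself.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified model; none of these types exist in the server.
struct Node {
    std::string name;
    std::uint64_t lastAppliedTime;  // stand-in for the node's latest applied cluster time
};

struct ReadResult {
    bool ok;                 // false when the node refuses to serve the read
    std::uint64_t dataAsOf;  // the time the returned data reflects
};

// Best-effort routing: prefer a node at or past minClusterTime, but fall back
// to any node when none qualifies. The time constraint is silently dropped.
inline std::optional<Node> selectBestEffort(const std::vector<Node>& nodes,
                                            std::uint64_t minClusterTime) {
    for (const auto& n : nodes) {
        if (n.lastAppliedTime >= minClusterTime) return n;
    }
    if (!nodes.empty()) return nodes.front();  // fallback ignores minClusterTime
    return std::nullopt;
}

// A read with no time requirement: the node answers with whatever it has.
inline ReadResult readUnenforced(const Node& node) {
    return {true, node.lastAppliedTime};
}

// A read carrying a time requirement: the node itself checks it.
// Assumption: a real node would wait until it catches up; this sketch
// simply reports "not ready" instead of blocking.
inline ReadResult readEnforced(const Node& node, std::uint64_t afterClusterTime) {
    if (node.lastAppliedTime < afterClusterTime) return {false, node.lastAppliedTime};
    return {true, node.lastAppliedTime};
}
```

With a single stale node at time 5 and a required time of 10, selectBestEffort still returns that node and readUnenforced serves data as of 5, while readEnforced declines. That is the gap described above: the routing hint alone cannot guarantee causal consistency.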
| Comments |
| Comment by Jordi Olivares Provencio [ 18/Jan/24 ] |
|
Putting this back into Needs Scheduling, as our belief that this was the root cause of the original failure turned out to be false. Right now we only access via ShardLocal if we're the primary config shard node, and only for modifications. This is currently safe, but we're leaving this ticket open as a way to clean up ShardLocal from our codebase. |