[SERVER-82860] Local data access for aggregations should not keep retrying in case of StaleConfig Created: 07/Nov/23  Updated: 06/Feb/24

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: 5.0.0, 6.0.0, 7.0.0, 7.1.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Enrico Golfieri Assignee: Enrico Golfieri
Resolution: Unresolved Votes: 0
Labels: car-qw
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Problem/Incident
Related
related to SERVER-77402 Create ShardRole retry loop utility Backlog
Assigned Teams:
Catalog and Routing
Operating System: ALL
Sprint: CAR Team 2023-12-25, CAR Team 2024-02-05, CAR Team 2024-02-19
Participants:
Story Points: 3

 Description   

In all versions prior to 7.2, when an aggregation contains a $lookup and the user data is located on the local shard, we simply run a router loop that attempts to run the aggregation locally up to 10 times, hoping that at least one attempt will succeed.

The local access triggers a check on the local filtering metadata; if the metadata is not installed yet, the collection access returns StaleConfig. Usually it is fine to retry, since this is just a transient error that requires a refresh on the shard side. However, because the access is local, the filtering metadata is not refreshed until the error is propagated back to the entry point, which then performs the refresh and obtains the filtering metadata.

This happens only after failing 10 times, but we could simply fail on the first attempt in case of StaleConfig. In 7.2 this issue was unintentionally fixed by SERVER-74816: https://github.com/10gen/mongo/blob/ba27121ae83e40362e418f7f4b0f88ef79977765/src/mongo/db/pipeline/sharded_agg_helpers.cpp#L1822-L1862

The goal of this ticket is to backport that specific change up to 5.0 
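
To make the cost of the current behavior concrete, here is a minimal Python sketch of the retry pattern described above. It is only a simulation and not the server code: `LocalShard`, `local_read`, `entry_point`, and `MAX_LOCAL_ATTEMPTS` are hypothetical names introduced for illustration, and it assumes the entry point retries the local read once after refreshing the filtering metadata.

```python
MAX_LOCAL_ATTEMPTS = 10  # assumption: mirrors the "10 times" mentioned above


class StaleConfig(Exception):
    """Hypothetical stand-in for the server's StaleConfig error."""


class LocalShard:
    """Simulated shard whose filtering metadata is not installed yet."""

    def __init__(self):
        self.filtering_metadata_installed = False
        self.local_attempts = 0

    def run_aggregation_locally(self):
        self.local_attempts += 1
        if not self.filtering_metadata_installed:
            raise StaleConfig("filtering metadata not installed")
        return "results"

    def refresh_filtering_metadata(self):
        self.filtering_metadata_installed = True


def local_read(shard, fail_fast_on_stale_config):
    """Router-style loop around the local read."""
    for attempt in range(1, MAX_LOCAL_ATTEMPTS + 1):
        try:
            return shard.run_aggregation_locally()
        except StaleConfig:
            # Retrying here cannot help: nothing inside this loop
            # refreshes the metadata. Either give up immediately or
            # burn through the remaining attempts.
            if fail_fast_on_stale_config or attempt == MAX_LOCAL_ATTEMPTS:
                raise


def entry_point(shard, fail_fast_on_stale_config):
    """Only the entry point refreshes the metadata, once the error reaches it."""
    try:
        return local_read(shard, fail_fast_on_stale_config)
    except StaleConfig:
        shard.refresh_filtering_metadata()
        return local_read(shard, fail_fast_on_stale_config)


old = LocalShard()
entry_point(old, fail_fast_on_stale_config=False)

new = LocalShard()
entry_point(new, fail_fast_on_stale_config=True)

print(old.local_attempts, new.local_attempts)
```

In this simulation the retrying variant performs 10 failed local attempts before the refresh can happen, while the fail-fast variant performs 1; both then succeed on the next attempt.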

 



 Comments   
Comment by Jordi Serra Torrens [ 05/Dec/23 ]

One idea is to wrap that local read within a shard-role retry loop (proposed here: SERVER-77402)

Generated at Thu Feb 08 06:50:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.