[SERVER-38758] Avoid restarting $lookup from the top if a stale version error is encountered during a single lookup Created: 21/Dec/18  Updated: 06/Dec/22  Resolved: 26/Feb/20

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Charlie Swanson Assignee: Backlog - Query Team (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-38728 Allow pipeline with $lookup into a sh... Closed
Duplicate
duplicates SERVER-45538 Implement and test shard version retr... Closed
Related
related to SERVER-45538 Implement and test shard version retr... Closed
Assigned Teams:
Query
Participants:

 Description   

When a $lookup reads from a sharded collection, it goes through the normal shard versioning procedure to establish cursors on all the shards, so that everyone agrees on a point-in-time of ownership across the cluster. During this establishment, one of the shards may return a stale version error to the $lookup stage. The $lookup stage has no logic to handle that error, refresh the sharding catalog cache used for targeting, and retry the cursor establishment. That logic does exist, but only at the top of the aggregate command, so the whole aggregation is retried rather than just the $lookup. This ticket tracks the work to handle this error and retry within the $lookup itself.

In order to share the retry logic with $graphLookup, we think the best place to implement this would be somewhere inside of attachCursorSourceToPipeline.
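
To make the proposal concrete, here is a minimal sketch of the retry loop, written in Python rather than server C++. Every name in it (StaleShardVersionError, CatalogCache, Shard, attach_cursors_with_retry) is hypothetical and only stands in for the real sharding machinery. It also assumes a single collection version shared by all shards, which is a simplification. It is not the actual attachCursorSourceToPipeline code.

```python
class StaleShardVersionError(Exception):
    """Hypothetical stand-in for the server's stale version error."""


class CatalogCache:
    """Hypothetical routing cache holding the version used for targeting."""

    def __init__(self, cached_version):
        self.cached_version = cached_version
        self.refreshes = 0

    def refresh(self, authoritative_version):
        # Assumption: a refresh always learns the current version.
        self.cached_version = authoritative_version
        self.refreshes += 1


class Shard:
    """Hypothetical shard that rejects requests carrying an outdated version."""

    def __init__(self, version, docs):
        self.version = version
        self.docs = docs

    def open_cursor(self, sent_version):
        if sent_version != self.version:
            raise StaleShardVersionError(
                f"sent {sent_version}, shard is at {self.version}"
            )
        return iter(self.docs)


def attach_cursors_with_retry(shards, cache, max_attempts=3):
    """Establish cursors on every shard, retrying locally on a stale version.

    Only the cursor establishment for this one sub-lookup is retried; the
    caller (the outer aggregation) never sees the error unless every
    attempt fails.
    """
    for _ in range(max_attempts):
        try:
            return [shard.open_cursor(cache.cached_version) for shard in shards]
        except StaleShardVersionError:
            # Refresh the targeting cache, then retry the establishment.
            cache.refresh(max(shard.version for shard in shards))
    raise StaleShardVersionError("still stale after retries")


# Usage: the cache starts one version behind, so the first attempt fails,
# the cache is refreshed once, and the second attempt succeeds.
shards = [Shard(2, [{"_id": 1}]), Shard(2, [{"_id": 2}])]
cache = CatalogCache(cached_version=1)
cursors = attach_cursors_with_retry(shards, cache)
docs = [doc for cursor in cursors for doc in cursor]
```

Putting the loop in the shared cursor-attachment helper, rather than in the $lookup stage itself, is what would let $graphLookup pick up the same behaviour without duplicating it.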



 Comments   
Comment by Nicholas Zolnierz [ 20/Feb/20 ]

charlie.swanson I think we'll get this for free once SERVER-45538 lands?

Comment by Ian Boros [ 24/Jan/19 ]

[note to self] This is purely an optimization. We just want to avoid restarting an entire aggregation when one "sub lookup" fails due to a StaleShardVersion error. (3)

Comment by Charlie Swanson [ 21/Dec/18 ]

Marking this as depends on SERVER-38728 not because of a hard or logical dependency, but rather because I believe SERVER-38728 will move around a lot of code that will need to be changed for this ticket, and the merge conflicts will be nasty.
