[SERVER-38758] Avoid restarting $lookup from the top if a stale version error is encountered during a single lookup Created: 21/Dec/18 Updated: 06/Dec/22 Resolved: 26/Feb/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Charlie Swanson | Assignee: | Backlog - Query Team (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Assigned Teams: | Query |
| Participants: |
| Description |
|
When a $lookup reads from a sharded collection, it goes through the normal shard versioning procedure to establish cursors on all the shards, so that every shard agrees on a point-in-time view of ownership across the cluster. During this establishment, one of the shards may return a stale version error to the $lookup stage. The $lookup stage has no logic of its own to handle that error, refresh the sharding catalog cache used for targeting, and retry the cursor establishment. That logic does exist, but only at the top of the aggregate command, so the entire aggregation is retried rather than just the $lookup. This ticket tracks the work to handle this error and retry within the $lookup itself. To share the retry logic with $graphLookup, we think the best place to implement this would be somewhere inside of attachCursorSourceToPipeline. |
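The retry behavior described in this ticket can be illustrated with a minimal Python sketch. This is not the server's C++ code: `StaleShardVersionError`, `establish_with_local_retry`, the two callables, and `max_attempts` are all hypothetical names chosen for illustration. It only shows the intended control flow: on a stale version error, refresh the routing information and retry the cursor establishment for that one sub-lookup, and propagate the error upward only after the local attempts are exhausted.

```python
class StaleShardVersionError(Exception):
    """Hypothetical stand-in for a shard's stale version response."""


def establish_with_local_retry(establish_cursors, refresh_routing_cache,
                               max_attempts=3):
    """Retry one sub-lookup's cursor establishment locally.

    All parameters are assumptions made for this sketch:
      establish_cursors     -- callable that targets the shards and returns cursors
      refresh_routing_cache -- callable that refreshes the targeting cache
      max_attempts          -- total number of establishment attempts allowed
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return establish_cursors()
        except StaleShardVersionError:
            if attempt == max_attempts:
                # Out of local attempts: re-raise so an outer retry loop
                # (for example, one at the top of the command) can handle it.
                raise
            # Refresh the routing information before targeting again.
            refresh_routing_cache()


# Usage: the first establishment attempt is stale, the second succeeds.
calls = {"establish": 0, "refresh": 0}


def flaky_establish():
    calls["establish"] += 1
    if calls["establish"] == 1:
        raise StaleShardVersionError()
    return ["cursor-shard-a", "cursor-shard-b"]


def refresh():
    calls["refresh"] += 1


cursors = establish_with_local_retry(flaky_establish, refresh)
```

In this sketch only the failed establishment is repeated; work already done by the enclosing pipeline is untouched, which is the optimization the ticket is after.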
| Comments |
| Comment by Nicholas Zolnierz [ 20/Feb/20 ] |
|
charlie.swanson I think we'll get this for free once |
| Comment by Ian Boros [ 24/Jan/19 ] |
|
[note to self] This is purely an optimization. We just want to avoid restarting an entire aggregation when one "sub lookup" fails due to a StaleShardVersion error. (3) |
| Comment by Charlie Swanson [ 21/Dec/18 ] |
|
Marking this as depends on |