[SERVER-83614] $queue/$lookup pipeline can throw if database dropped at wrong time
Created: 27/Nov/23 | Updated: 12/Jan/24 | Resolved: 12/Jan/24
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Romans Kasperovics | Assignee: | Henri Nikku |
| Resolution: | Gone away | Votes: | 0 |
| Labels: | greenerbuild |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Query Optimization |
| Operating System: | ALL |
| Sprint: | QO 2023-12-11, QO 2023-12-25, QO 2024-01-08, QO 2024-01-22 |
| Participants: | |
| Linked BF Score: | 135 |
| Description |
|
An aggregation pipeline like
should run on mongos if 'some_db.test_coll' does not exist, or on a shard otherwise. Currently, it throws if the database is dropped at the wrong moment during query execution. The cause is the deferred construction of CollectionRoutingInfo in ClusterAggregation::runAggregate().

One possible solution is to acquire CollectionRoutingInfo only once, during query optimization, and rewrite the pipeline accordingly. For instance, if we know 'some_db.test_coll' does not exist, we can remove the '$lookup' stage. Once this is done, we should consider replacing the uassert with a tassert in runPipelineOnMongoS(), so that the fuzzer tests can discover unexpected issues.
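
The "acquire once, then rewrite" idea can be illustrated with a toy model. This is only a sketch in Python, not server code: ToyCatalog, RoutingSnapshot and plan() are hypothetical names, the stage contents are placeholders, and the real CollectionRoutingInfo carries far more state. The point it shows is that both decisions (where to run, and whether to keep '$lookup') are derived from the same snapshot, so a concurrent drop cannot make them disagree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingSnapshot:
    """Hypothetical stand-in for routing info captured at a single point in time."""
    existing: frozenset

    def exists(self, namespace: str) -> bool:
        return namespace in self.existing


class ToyCatalog:
    """Hypothetical mutable catalog; databases can be dropped at any moment."""

    def __init__(self, namespaces):
        self._namespaces = set(namespaces)

    def drop_database(self, db: str) -> None:
        prefix = db + "."
        self._namespaces = {ns for ns in self._namespaces if not ns.startswith(prefix)}

    def snapshot(self) -> RoutingSnapshot:
        return RoutingSnapshot(frozenset(self._namespaces))


def plan(pipeline, db: str, snap: RoutingSnapshot):
    """Decide where to run and rewrite the pipeline, using only `snap`.

    Following the example in the description, a '$lookup' whose foreign
    collection does not exist in the snapshot is simply removed.
    """
    rewritten = []
    needs_shard = False
    for stage in pipeline:
        lookup = stage.get("$lookup")
        if lookup is not None:
            if not snap.exists(f"{db}.{lookup['from']}"):
                continue  # foreign collection absent in the snapshot: drop the stage
            needs_shard = True
        rewritten.append(stage)
    return ("shard" if needs_shard else "mongos"), rewritten


# Usage: the snapshot is taken once, before the concurrent drop.
catalog = ToyCatalog({"some_db.test_coll"})
pipeline = [{"$queue": []}, {"$lookup": {"from": "test_coll", "as": "joined"}}]
snap = catalog.snapshot()
catalog.drop_database("some_db")  # simulated drop "at the wrong moment"
target, stages = plan(pipeline, "some_db", snap)
```

In this model a drop that lands after the snapshot does not change the plan; whether the later execution on the shard then fails cleanly is a separate question that the sketch does not address.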
| Comments |
| Comment by Henri Nikku [ 12/Jan/24 ] |
Closing this because SERVER-83658 tracks the refactoring work around acquiring CollectionRoutingInfo. Generational fuzzers can't reproduce this issue, as the aggregation grammars don't contain $out or $_internalSplitPipeline. Mutational fuzzers already tolerate the existing behavior, which is to uassert.
| Comment by Romans Kasperovics [ 27/Nov/23 ] |
Here is the script to reproduce the bug (inspired by Mihai's script to reproduce
... and we need to add some sleeps to the server code:
The command to run the test:
|