[SERVER-39109] mongod crash: Invariant failure !_exec src/mongo/db/pipeline/document_source_cursor.cpp 295 Created: 21/Jan/19 Updated: 29/Oct/23 Resolved: 23/Jan/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | 3.6.10, 4.0.5, 4.1.7 |
| Fix Version/s: | 3.6.11, 4.0.6, 4.1.8 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Ivica Hrg | Assignee: | Ian Boros |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: |
v4.0, v3.6
|
||||||||
| Steps To Reproduce: | I was writing an aggregation and, after adding one stage, the server crashed. The aggregation that worked fine is in the attached file "aggregation - 01.txt". The aggregation that caused the server to crash is in the attached file "aggregation - 02.txt".
I've also attached the "server build info" and "server status" files (captured after reboot). |
||||||||
| Sprint: | Query 2019-01-28 | ||||||||
| Participants: | |||||||||
| Description |
|
Aggregation execution produced the following error
|
| Comments |
| Comment by Githook User [ 31/Jan/19 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'name': 'Ian Boros', 'email': 'ian.boros@10gen.com'}Message: | |||||||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 23/Jan/19 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'email': 'ian.boros@10gen.com', 'name': 'Ian Boros'}Message: | |||||||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 23/Jan/19 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'email': 'ian.boros@10gen.com', 'name': 'Ian Boros'}Message: | |||||||||||||||||||||||||||||||||||||||||||||
| Comment by David Storch [ 22/Jan/19 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
charlie.swanson agreed. Let's fix the crash first and then separately decide whether to prioritize the more complete fix. | |||||||||||||||||||||||||||||||||||||||||||||
| Comment by Charlie Swanson [ 22/Jan/19 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
david.storch that's a good point. However, keep in mind that in version 3.4 we would reject this pipeline with the error I mentioned above. We should consider adding a mechanism to allow this pipeline but at first glance it seems like that may take significantly more engineering effort because it will involve either adding an escape-hatch from the dependency analysis if an invalid path is detected, or it will involve a more significant change to allow such paths inside the projection or more generally. I would advocate for splitting that into a separate ticket. | |||||||||||||||||||||||||||||||||||||||||||||
| Comment by David Storch [ 22/Jan/19 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
charlie.swanson, it seems to me that in addition to the exception-safety problem you diagnosed above, there is an additional bug. Namely, this query seems entirely legal under the current semantics of MQL. Therefore, the query should succeed rather than throwing an exception. | |||||||||||||||||||||||||||||||||||||||||||||
| Comment by Charlie Swanson [ 22/Jan/19 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
Ok I think I've figured out what's going on here. This pipeline uses the path "parentOffer.$ref" which is not valid in the context of the aggregation system. Using it in a stage other than $match would result in "Location16410: FieldPath field names may not start with '$'.". Consider this pipeline:
This has a couple things to notice:
What ends up happening to cause this crash is that we compute that "parentOffer.$ref" is a dependency of the pipeline and needed from storage, then we successfully build a collection scan and a DocumentSourceCursor ($cursor) before realizing that "parentOffer.$ref" is invalid within the aggregation system and throwing an exception. Specifically, we throw an exception here during 'toParsedDeps()':
This exception happens at the critical time after we've established the cursor but before we've put the cursor inside the final Pipeline. Both the PlanExecutor and the Pipeline have a mechanism to ensure they are properly disposed, but the DocumentSourceCursor does not. During this method 'cursor' has ownership over the PlanExecutor. This disables the auto-disposal from a unique_ptr<PlanExecutor, PlanExecutor::Deleter> we have before creating 'cursor'. After this method, 'cursor' is owned by 'pipeline', which will ensure we dispose of it correctly. So it is only during this patch of code that encountering an exception is a problem. I have two ideas for how we could fix this: either (1) immediately add 'cursor' to 'pipeline' to ensure disposal, or (2) add an ON_BLOCK_EXIT to ensure 'cursor' is properly destroyed if an exception happens during this method. Neither seems bulletproof, but this whole disposal mechanism should become unnecessary soon on the master branch due to work in PM-1081 to make all cursors globally managed and easier to clean up. | |||||||||||||||||||||||||||||||||||||||||||||
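For readers unfamiliar with the scope-guard idea behind option (2), here is a minimal, self-contained sketch. It is not the actual MongoDB patch and does not use the real ON_BLOCK_EXIT macro. `FakeCursor`, `FakePipeline`, `ScopeGuard`, `validateFieldPath` and `attachCursor` are all hypothetical names invented for illustration. The sketch assumes a cursor that must be explicitly disposed, a validation step that may throw after the cursor exists, and a pipeline that takes over disposal once the cursor is attached.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stage that must be disposed explicitly.
struct FakeCursor {
    bool disposed = false;
    void dispose() { disposed = true; }
};

// Hypothetical pipeline: once a cursor is attached, the pipeline disposes it.
struct FakePipeline {
    std::vector<std::shared_ptr<FakeCursor>> sources;
    ~FakePipeline() {
        for (auto& source : sources) source->dispose();
    }
};

// Hand-written scope guard: runs the callback on scope exit unless dismissed.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> fn) : _fn(std::move(fn)) {}
    ~ScopeGuard() {
        if (_active) _fn();
    }
    void dismiss() { _active = false; }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    std::function<void()> _fn;
    bool _active = true;
};

// Hypothetical validation: rejects any dotted-path component starting with '$'.
void validateFieldPath(const std::string& path) {
    std::size_t start = 0;
    while (true) {
        const std::size_t dot = path.find('.', start);
        const std::string part = path.substr(
            start, dot == std::string::npos ? std::string::npos : dot - start);
        if (!part.empty() && part[0] == '$') {
            throw std::invalid_argument(
                "FieldPath field names may not start with '$'.");
        }
        if (dot == std::string::npos) break;
        start = dot + 1;
    }
}

// The window between creating the cursor and handing it to the pipeline is
// covered by the guard, so an exception thrown here still disposes the cursor.
void attachCursor(FakePipeline& pipeline,
                  const std::shared_ptr<FakeCursor>& cursor,
                  const std::string& dependencyPath) {
    ScopeGuard guard([&cursor] { cursor->dispose(); });
    validateFieldPath(dependencyPath);  // may throw
    pipeline.sources.push_back(cursor);
    guard.dismiss();  // the pipeline is now responsible for disposal
}
```

Calling `attachCursor` with the path "parentOffer.$ref" throws, and the guard disposes the cursor during stack unwinding. With a valid path the guard is dismissed and disposal is deferred to the pipeline's destructor.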
| Comment by Ivica Hrg [ 22/Jan/19 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
Hi Charlie, Thanks for the quick reaction, looking forward to updates. | |||||||||||||||||||||||||||||||||||||||||||||
| Comment by Charlie Swanson [ 22/Jan/19 ] | |||||||||||||||||||||||||||||||||||||||||||||
|
Hi ihrg. Thanks for filing this report. I can reproduce the issue and I'm marking this for triage by the query team. Stay tuned for updates. |