[SERVER-59754] Incorrect logging of queryHash/planCacheKey for operations that share the same $lookup shape Created: 02/Sep/21 Updated: 29/Oct/23 Resolved: 13/Jan/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Logging, Querying |
| Affects Version/s: | None |
| Fix Version/s: | 5.3.0, 5.0.6, 4.2.19, 5.2.1, 4.4.13 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Alex Bevilacqua | Assignee: | Jennifer Peshansky (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | query-director-triage | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Backport Requested: |
v5.2, v5.1, v5.0, v4.4, v4.2
|
||||
| Sprint: | QE 2021-10-04, QE 2021-10-18, QE 2021-11-01, QE 2021-11-15, QE 2021-11-29, QE 2021-12-13, QE 2021-12-27, QE 2022-01-10, QE 2022-01-24 | ||||
| Participants: | |||||
| Case: | (copied to CRM) | ||||
| Description |
|
When multiple aggregation operations with different compositions share an identical $lookup stage, the log entry that is generated for the aggregate command can incorrectly list the queryHash and planCacheKey. It appears to be showing the values from the $lookup collection's plan cache; not the collection the operation was targeting.
The output of the above will be similar to the following, which each list those fields as "queryHash":"ABFD1233","planCacheKey":"80A4A233" though the operations are clearly different:
Checking the plan caches for the two collections will show the duplicate values above belong to test.bar; not test.foo. If you run with explain however you get the actual values (and what should likely be logged):
This has been tested and verified on 5.0.2-ent and 4.2.15-ent and appears to only affect the logged representation of these values. |
| Comments |
| Comment by Githook User [ 24/Jan/22 ] | ||||
|
Author: {'name': 'Jennifer Peshansky', 'email': 'jennifer.peshansky@mongodb.com', 'username': 'jenniferpeshansky'}Message: (cherry picked from commit 4970d9626853e3b916aab3d911bcf4aba23e8c88) | ||||
| Comment by Githook User [ 21/Jan/22 ] | ||||
|
Author: {'name': 'Jennifer Peshansky', 'email': 'jennifer.peshansky@mongodb.com', 'username': 'jenniferpeshansky'}Message: (cherry picked from commit 4970d9626853e3b916aab3d911bcf4aba23e8c88) | ||||
| Comment by Githook User [ 21/Jan/22 ] | ||||
|
Author: {'name': 'Jennifer Peshansky', 'email': 'jennifer.peshansky@mongodb.com', 'username': 'jenniferpeshansky'}Message: (cherry picked from commit 4970d9626853e3b916aab3d911bcf4aba23e8c88) | ||||
| Comment by Githook User [ 18/Jan/22 ] | ||||
|
Author: {'name': 'Jennifer Peshansky', 'email': 'jennifer.peshansky@mongodb.com', 'username': 'jenniferpeshansky'}Message: (cherry picked from commit 4970d9626853e3b916aab3d911bcf4aba23e8c88) | ||||
| Comment by Githook User [ 13/Jan/22 ] | ||||
|
Author: {'name': 'Jennifer Peshansky', 'email': 'jennifer.peshansky@mongodb.com', 'username': 'jenniferpeshansky'}Message: | ||||
| Comment by Jennifer Peshansky (Inactive) [ 15/Dec/21 ] | ||||
|
I have a preliminary/naive solution to this that I have confirmed solves the issue: in get_executor.cpp, simply wrapping
with a check to see if the value is already filled, like:
and doing the same for PlanCacheKey. But this brings up a more fundamental problem: it is not just the query hash and plan cache key that are being overwritten; any information that get_executor adds to OpDebug will be overwritten by the inner pipeline. Does the information of the inner $lookup pipeline matter? Should we wrap all calls to set an OpDebug variable in a check to see that the variable was previously empty? Or, perhaps, should the $lookup pipeline not be using the same OpCtx? | ||||
| Comment by Jennifer Peshansky (Inactive) [ 14/Dec/21 ] | ||||
|
Logging has confirmed get_executor.cpp to be the place where the queryHash is overwritten. Now this becomes a question of policy - when a nested stage has its own queryHash and planCacheKey, that query may be the one that's being slow, so that information may still be relevant. Do we want to re-overwrite it with the queryHash of the outer query? Or do we want to add some possibility of storing this information in a nested format? | ||||
| Comment by Jennifer Peshansky (Inactive) [ 14/Dec/21 ] | ||||
|
There are three places where both queryHash and planCacheKey are set in OpDebug. These are: src/mongo/db/clientcursor.cpp I'll investigate more, but I suspect the culprit is get_executor.cpp. | ||||
| Comment by Jennifer Peshansky (Inactive) [ 30/Sep/21 ] | ||||
|
I was able to repro this on the latest master as well as 5.0.2. I'm getting the exact same values for queryHash and planCacheKey for all three queries: |