[SERVER-59754] Incorrect logging of queryHash/planCacheKey for operations that share the same $lookup shape Created: 02/Sep/21  Updated: 29/Oct/23  Resolved: 13/Jan/22

Status: Closed
Project: Core Server
Component/s: Logging, Querying
Affects Version/s: None
Fix Version/s: 5.3.0, 5.2.1, 5.0.6, 4.4.13, 4.2.19

Type: Bug Priority: Major - P3
Reporter: Alex Bevilacqua Assignee: Jennifer Peshansky (Inactive)
Resolution: Fixed Votes: 0
Labels: query-director-triage
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.2, v5.1, v5.0, v4.4, v4.2
Sprint: QE 2021-10-04, QE 2021-10-18, QE 2021-11-01, QE 2021-11-15, QE 2021-11-29, QE 2021-12-13, QE 2021-12-27, QE 2022-01-10, QE 2022-01-24
Participants:
Case:

 Description   

When multiple aggregation operations with different compositions share an identical $lookup stage, the log entry generated for the aggregate command can list an incorrect queryHash and planCacheKey. The logged values appear to come from the plan cache of the $lookup (foreign) collection, not from the collection the operation was targeting.

db.setProfilingLevel(0, { slowms: 0 }) // profiler off; slowms 0 so every operation is logged as a "Slow query"
 
db.foo.drop(); db.foo.insert({ a: 1, b: 1, c: 1 }); 
db.foo.createIndexes([{ a: 1, b: 1 }, { b: 1, a: 1 }]);
db.bar.drop(); db.bar.insert({ a: 1, b: 1, c: 1 }); 
db.bar.createIndexes([{ a: 1, b: 1, c: 1 }, { b: 1, a: 1, c: 1 }])
 
 
db.foo.aggregate([
{"$match":{ $or: [ { a: 1 }, { b: 1 } ]}},
{"$lookup":{"from":"bar","let":{"b":{"$ifNull":["$b",null]}},"pipeline":[{"$match":{"$or":[{"a":{"$exists":false}},{"a":1}]}},{"$match":{"$expr":{"$eq":["$b","$$b"]}}}],"as":"bar"}},
])
 
db.foo.aggregate([
{"$match":{ a: { $in: [ 1, 2, 3, 4, 5 ] } } },
{"$lookup":{"from":"bar","let":{"b":{"$ifNull":["$b",null]}},"pipeline":[{"$match":{"$or":[{"a":{"$exists":false}},{"a":1}]}},{"$match":{"$expr":{"$eq":["$b","$$b"]}}}],"as":"bar"}},
])
 
db.foo.aggregate([
{"$match":{ b: 1 }},
{"$lookup":{"from":"bar","let":{"b":{"$ifNull":["$b",null]}},"pipeline":[{"$match":{"$or":[{"a":{"$exists":false}},{"a":1}]}},{"$match":{"$expr":{"$eq":["$b","$$b"]}}}],"as":"bar"}},
])

The output of the above will be similar to the following log entries, each of which reports "queryHash":"ABFD1233","planCacheKey":"80A4A233" even though the operations are clearly different:

{"t":{"$date":"2021-09-02T12:19:47.015-04:00"},"s":"I","c":"COMMAND","id":51803,"ctx":"conn4","msg":"Slow query","attr":{"type":"command","ns":"test.foo","command":{"aggregate":"foo","pipeline":[{"$match":{"$or":[{"a":1},{"b":1}]}},{"$lookup":{"from":"bar","let":{"b":{"$ifNull":["$b",null]}},"pipeline":[{"$match":{"$or":[{"a":{"$exists":false}},{"a":1}]}},{"$match":{"$expr":{"$eq":["$b","$$b"]}}}],"as":"bar"}}],"cursor":{},"$db":"test","lsid":{"id":{"$uuid":"8ad43435-01e8-4984-9f78-98cb37c1e463"}}},"planSummary":"IXSCAN { a: 1, b: 1 }, IXSCAN { b: 1, a: 1 }","keysExamined":3,"docsExamined":2,"cursorExhausted":true,"numYields":0,"nreturned":1,"queryHash":"ABFD1233","planCacheKey":"80A4A233","reslen":223,"locks":{"Global":{"acquireCount":{"r":4}},"Mutex":{"acquireCount":{"r":4}}},"storage":{"data":{"bytesRead":412}},"remote":"127.0.0.1:59396","protocol":"op_msg","durationMillis":12}}

{"t":{"$date":"2021-09-02T12:19:47.450-04:00"},"s":"I","c":"COMMAND","id":51803,"ctx":"conn4","msg":"Slow query","attr":{"type":"command","ns":"test.foo","command":{"aggregate":"foo","pipeline":[{"$match":{"a":{"$in":[1,2,3,4,5]}}},{"$lookup":{"from":"bar","let":{"b":{"$ifNull":["$b",null]}},"pipeline":[{"$match":{"$or":[{"a":{"$exists":false}},{"a":1}]}},{"$match":{"$expr":{"$eq":["$b","$$b"]}}}],"as":"bar"}}],"cursor":{},"$db":"test","lsid":{"id":{"$uuid":"8ad43435-01e8-4984-9f78-98cb37c1e463"}}},"planSummary":"IXSCAN { a: 1, b: 1 }","keysExamined":2,"docsExamined":2,"cursorExhausted":true,"numYields":0,"nreturned":1,"queryHash":"ABFD1233","planCacheKey":"80A4A233","reslen":223,"locks":{"Global":{"acquireCount":{"r":4}},"Mutex":{"acquireCount":{"r":4}}},"storage":{},"remote":"127.0.0.1:59396","protocol":"op_msg","durationMillis":1}}

{"t":{"$date":"2021-09-02T12:19:47.917-04:00"},"s":"I","c":"COMMAND","id":51803,"ctx":"conn4","msg":"Slow query","attr":{"type":"command","ns":"test.foo","command":{"aggregate":"foo","pipeline":[{"$match":{"b":1}},{"$lookup":{"from":"bar","let":{"b":{"$ifNull":["$b",null]}},"pipeline":[{"$match":{"$or":[{"a":{"$exists":false}},{"a":1}]}},{"$match":{"$expr":{"$eq":["$b","$$b"]}}}],"as":"bar"}}],"cursor":{},"$db":"test","lsid":{"id":{"$uuid":"8ad43435-01e8-4984-9f78-98cb37c1e463"}}},"planSummary":"IXSCAN { b: 1, a: 1 }","keysExamined":2,"docsExamined":2,"cursorExhausted":true,"numYields":0,"nreturned":1,"queryHash":"ABFD1233","planCacheKey":"80A4A233","reslen":223,"locks":{"Global":{"acquireCount":{"r":4}},"Mutex":{"acquireCount":{"r":4}}},"storage":{},"remote":"127.0.0.1:59396","protocol":"op_msg","durationMillis":0}}

Checking the plan caches for the two collections shows that the duplicated values above belong to test.bar, not test.foo (a sketch of that check follows the explain output below). Running the same pipelines with explain, however, returns the actual values for test.foo (and likely what should be logged):

db.foo.explain().aggregate([
{"$match":{ $or: [ { a: 1 }, { b: 1 } ]}},
{"$lookup":{"from":"bar","let":{"b":{"$ifNull":["$b",null]}},"pipeline":[{"$match":{"$or":[{"a":{"$exists":false}},{"a":1}]}},{"$match":{"$expr":{"$eq":["$b","$$b"]}}}],"as":"bar"}},
]).stages[0].$cursor.queryPlanner.planCacheKey
// 070E843B
 
db.foo.explain().aggregate([
{"$match":{ a: { $in: [ 1, 2, 3, 4, 5 ] } } },
{"$lookup":{"from":"bar","let":{"b":{"$ifNull":["$b",null]}},"pipeline":[{"$match":{"$or":[{"a":{"$exists":false}},{"a":1}]}},{"$match":{"$expr":{"$eq":["$b","$$b"]}}}],"as":"bar"}},
]).stages[0].$cursor.queryPlanner.planCacheKey
// 5BD7613D
 
db.foo.explain().aggregate([
{"$match":{ b: 1 }},
{"$lookup":{"from":"bar","let":{"b":{"$ifNull":["$b",null]}},"pipeline":[{"$match":{"$or":[{"a":{"$exists":false}},{"a":1}]}},{"$match":{"$expr":{"$eq":["$b","$$b"]}}}],"as":"bar"}},
]).stages[0].$cursor.queryPlanner.planCacheKey
// 3C84EBC6
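
For reference, the plan caches themselves can be inspected to confirm where the duplicated values come from; a minimal sketch using the $planCacheStats aggregation stage (available in 4.2+), where the reported hashes will vary by deployment:

db.foo.aggregate([{ $planCacheStats: {} }, { $project: { queryHash: 1, planCacheKey: 1 } }])
db.bar.aggregate([{ $planCacheStats: {} }, { $project: { queryHash: 1, planCacheKey: 1 } }])
// The "ABFD1233"/"80A4A233" pair from the log entries above appears among
// test.bar's entries (created by the shared $lookup sub-pipeline), not test.foo's.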

This has been tested and verified on 5.0.2-ent and 4.2.15-ent, and it appears to affect only the logged representation of these values.



 Comments   
Comment by Githook User [ 24/Jan/22 ]

Author:

{'name': 'Jennifer Peshansky', 'email': 'jennifer.peshansky@mongodb.com', 'username': 'jenniferpeshansky'}

Message: SERVER-59754 Prevent get_executor from overwriting information set by an outer pipeline

(cherry picked from commit 4970d9626853e3b916aab3d911bcf4aba23e8c88)
Branch: v5.2
https://github.com/mongodb/mongo/commit/fd8c42a780e11463fe109ba18d2bf44197699167

Comment by Githook User [ 21/Jan/22 ]

Author:

{'name': 'Jennifer Peshansky', 'email': 'jennifer.peshansky@mongodb.com', 'username': 'jenniferpeshansky'}

Message: SERVER-59754 Prevent get_executor from overwriting information set by an outer pipeline

(cherry picked from commit 4970d9626853e3b916aab3d911bcf4aba23e8c88)
(cherry picked from commit da0cacc0c3f3aa6e8b3405443aa2011c5e6d8220)
Branch: v4.4
https://github.com/mongodb/mongo/commit/f348e7d2b2bd9f4a5c45ced40362cbe707db7453

Comment by Githook User [ 21/Jan/22 ]

Author:

{'name': 'Jennifer Peshansky', 'email': 'jennifer.peshansky@mongodb.com', 'username': 'jenniferpeshansky'}

Message: SERVER-59754 Prevent get_executor from overwriting information set by an outer pipeline

(cherry picked from commit 4970d9626853e3b916aab3d911bcf4aba23e8c88)
(cherry picked from commit da0cacc0c3f3aa6e8b3405443aa2011c5e6d8220)
(cherry picked from commit b8698458dd5420162e0dec1f5615669ad1c137b1)
Branch: v4.2
https://github.com/mongodb/mongo/commit/0efeb8cb201e48dca15459c863decacd0ac847e7

Comment by Githook User [ 18/Jan/22 ]

Author:

{'name': 'Jennifer Peshansky', 'email': 'jennifer.peshansky@mongodb.com', 'username': 'jenniferpeshansky'}

Message: SERVER-59754 Prevent get_executor from overwriting information set by an outer pipeline

(cherry picked from commit 4970d9626853e3b916aab3d911bcf4aba23e8c88)
Branch: v5.0
https://github.com/mongodb/mongo/commit/8cff596482240e32e1d8249b6043bc75987541ca

Comment by Githook User [ 13/Jan/22 ]

Author:

{'name': 'Jennifer Peshansky', 'email': 'jennifer.peshansky@mongodb.com', 'username': 'jenniferpeshansky'}

Message: SERVER-59754 Prevent get_executor from overwriting information set by an outer pipeline
Branch: master
https://github.com/mongodb/mongo/commit/f4a8099767f70e32751de6c0c63b5a80fba8c381

Comment by Jennifer Peshansky (Inactive) [ 15/Dec/21 ]

I have a preliminary/naive solution to this that I have confirmed solves the issue: in get_executor.cpp, simply wrapping

CurOp::get(_opCtx)->debug().queryHash = planCacheKey.queryHash();

with a check to see if the value is already filled, like:

if (!CurOp::get(_opCtx)->debug().queryHash) {
    CurOp::get(_opCtx)->debug().queryHash = planCacheKey.queryHash();
}

and doing the same for PlanCacheKey.

But this brings up a more fundamental problem: it is not just the query hash and plan cache key that are being overwritten; any information that get_executor adds to OpDebug will be overwritten by the inner pipeline.

Does the information of the inner $lookup pipeline matter? Should we wrap all calls to set an OpDebug variable in a check to see that the variable was previously empty? Or, perhaps, should the $lookup pipeline not be using the same OpCtx?

Comment by Jennifer Peshansky (Inactive) [ 14/Dec/21 ]

Logging has confirmed get_executor.cpp to be the place where the queryHash is overwritten. Now this becomes a question of policy: when a nested stage has its own queryHash and planCacheKey, that inner query may be the one that is actually slow, so its information may still be relevant. Do we want to re-overwrite it with the queryHash of the outer query? Or do we want to add some way of storing this information in a nested format?

Comment by Jennifer Peshansky (Inactive) [ 14/Dec/21 ]

There are three places where both queryHash and planCacheKey are set in OpDebug. These are:

src/mongo/db/clientcursor.cpp
src/mongo/db/cursor_manager.cpp
src/mongo/db/query/get_executor.cpp

I'll investigate more, but I suspect the culprit is get_executor.cpp.

Comment by Jennifer Peshansky (Inactive) [ 30/Sep/21 ]

I was able to repro this on the latest master as well as 5.0.2. I'm getting the exact same values for queryHash and planCacheKey for all three queries:
"queryHash":"ABFD1233","planCacheKey":"80A4A233"
And if I change the $match stage slightly, the queryHash and planCacheKey change but stay consistent across all three queries.
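
For illustration (an assumed variation, not taken from the ticket): since the logged values appear to track the shared inner $match against test.bar, changing only its shape should produce a different queryHash/planCacheKey pair in the log, and that new pair should again be identical regardless of which outer $match is used:

db.foo.aggregate([
{"$match":{ b: 1 }},
{"$lookup":{"from":"bar","let":{"b":{"$ifNull":["$b",null]}},"pipeline":[{"$match":{"$or":[{"a":{"$exists":false}},{"a":{"$gte":1}}]}},{"$match":{"$expr":{"$eq":["$b","$$b"]}}}],"as":"bar"}},
])
// The logged pair differs from "ABFD1233"/"80A4A233", but the same new pair is
// reported no matter which of the three outer $match stages is used.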
