Type: Bug
Resolution: Duplicate
Priority: Major - P3
Affects Version/s: None
Component/s: None
Query Optimization
ALL
QO 2023-01-09, QO 2023-01-23, QO 2023-02-06
I tried out a simple example of using $telemetry at version 038c67d99cda1fb242ce3b4dcaf331e459f3ff41 of the master branch. First, in order to enable workload telemetry collection in the server, I started it like so:
./mongod --setParameter internalQueryConfigureTelemetrySamplingRate=1000000
Here's a snippet from the mongo shell which reproduces the problem:
MongoDB Enterprise > db.c.find({a: {$gt: 3}})
MongoDB Enterprise > db.getSiblingDB("admin").aggregate([{$telemetry: {}}]).pretty()
{
    "key" : {
        "find" : {
            "find" : "###",
            "filter" : {
                "###" : {
                    "###" : "###"
                }
            }
        },
        "namespace" : "test.c",
        "applicationName" : "MongoDB Shell"
    },
    "metrics" : {
        "lastExecutionMicros" : NumberLong(1961),
        "execCount" : NumberLong(1),
        "queryOptMicros" : {
            "sum" : NumberLong(287),
            "max" : NumberLong(287),
            "min" : NumberLong(287),
            "sumOfSquares" : NumberLong(82369)
        },
        "queryExecMicros" : {
            "sum" : NumberLong(1961),
            "max" : NumberLong(1961),
            "min" : NumberLong(1961),
            "sumOfSquares" : NumberLong(3845521)
        },
        "docsReturned" : {
            "sum" : NumberLong(0),
            "max" : NumberLong(0),
            "min" : NumberLong(0),
            "sumOfSquares" : NumberLong(0)
        },
        "docsScanned" : {
            "sum" : NumberLong(1),
            "max" : NumberLong(1),
            "min" : NumberLong(1),
            "sumOfSquares" : NumberLong(1)
        },
        "keysScanned" : {
            "sum" : NumberLong(0),
            "max" : NumberLong(0),
            "min" : NumberLong(0),
            "sumOfSquares" : NumberLong(0)
        },
        "firstSeenTimestamp" : Timestamp(1668636882, 0)
    },
    "asOf" : Timestamp(1668636883, 0)
}
...
The bug pertains to the value of the key field. As you can see, all field names and values are redacted, including the $gt operator. We know that we need to redact constants in the query, since they may contain PII or raise data security/privacy concerns. I believe there is an active discussion about whether we should redact or anonymize field names. But there is no doubt that we should include the $gt in the output. Otherwise, we know nothing about what the query actually was.
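To make the expected behavior concrete, here is a minimal sketch in plain JavaScript of the redaction described above: literals become "###", operator names such as $gt are kept, and field names are optionally passed through a caller-supplied function. This is not the server's implementation. The names redactFilter, redactValue, and LITERAL_PLACEHOLDER are hypothetical, and the sketch treats every nested object as part of the match expression, which is a simplification (a real implementation would need to tell an operator expression apart from a literal subdocument).

```javascript
// Hypothetical sketch of the expected redaction; not the server's code.
const LITERAL_PLACEHOLDER = "###";

// Redact a filter document. Keys starting with "$" are treated as operator
// names and kept verbatim; all other keys are treated as field names and
// passed through redactFieldName (identity by default).
function redactFilter(filter, redactFieldName = (name) => name) {
  const shape = {};
  for (const [key, value] of Object.entries(filter)) {
    const outKey = key.startsWith("$") ? key : redactFieldName(key);
    shape[outKey] = redactValue(value, redactFieldName);
  }
  return shape;
}

// Redact a value: recurse into arrays and objects, replace any other value
// (number, string, boolean, null, ...) with the placeholder.
// Simplification: nested objects are always recursed into, never treated
// as literal subdocuments.
function redactValue(value, redactFieldName) {
  if (Array.isArray(value)) {
    return value.map((v) => redactValue(v, redactFieldName));
  }
  if (value !== null && typeof value === "object") {
    return redactFilter(value, redactFieldName);
  }
  return LITERAL_PLACEHOLDER;
}

// The filter from the repro above, with field names left intact.
console.log(JSON.stringify(redactFilter({a: {$gt: 3}})));
// prints {"a":{"$gt":"###"}}

// The same filter with field names redacted as well; $gt still survives.
console.log(JSON.stringify(redactFilter({a: {$gt: 3}}, () => "###")));
// prints {"###":{"$gt":"###"}}
```

Either way, the operator name remains visible in the shape, which is the part missing from the $telemetry output above.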
Note that the same problem occurs even if I set internalQueryConfigureTelemetryFieldNameRedactionStrategy=sha256. In that case, the key looks like this:
"key" : {
    "find" : {
        "find" : "###",
        "filter" : {
            "ypeBEsobvcr6" : {
                "Jt74XIwL/Ngm" : "###"
            }
        }
    },
    "namespace" : "test.c",
    "applicationName" : "MongoDB Shell"
},
Another note: Are there any end-to-end tests which show that redaction is working as expected?
depends on: SERVER-73141 Generate query shape (literal redaction) for expressions in expression_leaf.h (Closed)
related to: SERVER-71427 $telemetry returns multiple entries with the same key even though the corresponding queries were distinct shapes (Closed)