[SERVER-71426] Redaction for $telemetry redacts not only field names and values, but also MQL operators Created: 16/Nov/22  Updated: 27/Jan/23  Resolved: 24/Jan/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: David Storch Assignee: Jennifer Peshansky (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-73141 Generate query shape (literal redacti... Closed
Related
related to SERVER-71427 $telemetry returns multiple entries w... Closed
Assigned Teams:
Query Optimization
Operating System: ALL
Sprint: QO 2023-01-09, QO 2023-01-23, QO 2023-02-06
Participants:

 Description   

I tried out a simple example of using $telemetry at commit 038c67d99cda1fb242ce3b4dcaf331e459f3ff41 of the master branch. First, to enable workload telemetry collection in the server, I started mongod like so:

./mongod --setParameter internalQueryConfigureTelemetrySamplingRate=1000000

Here's a snippet from the mongo shell which reproduces the problem:

MongoDB Enterprise > db.c.find({a: {$gt: 3}})
MongoDB Enterprise > db.getSiblingDB("admin").aggregate([{$telemetry: {}}]).pretty()
{
	"key" : {
		"find" : {
			"find" : "###",
			"filter" : {
				"###" : {
					"###" : "###"
				}
			}
		},
		"namespace" : "test.c",
		"applicationName" : "MongoDB Shell"
	},
	"metrics" : {
		"lastExecutionMicros" : NumberLong(1961),
		"execCount" : NumberLong(1),
		"queryOptMicros" : {
			"sum" : NumberLong(287),
			"max" : NumberLong(287),
			"min" : NumberLong(287),
			"sumOfSquares" : NumberLong(82369)
		},
		"queryExecMicros" : {
			"sum" : NumberLong(1961),
			"max" : NumberLong(1961),
			"min" : NumberLong(1961),
			"sumOfSquares" : NumberLong(3845521)
		},
		"docsReturned" : {
			"sum" : NumberLong(0),
			"max" : NumberLong(0),
			"min" : NumberLong(0),
			"sumOfSquares" : NumberLong(0)
		},
		"docsScanned" : {
			"sum" : NumberLong(1),
			"max" : NumberLong(1),
			"min" : NumberLong(1),
			"sumOfSquares" : NumberLong(1)
		},
		"keysScanned" : {
			"sum" : NumberLong(0),
			"max" : NumberLong(0),
			"min" : NumberLong(0),
			"sumOfSquares" : NumberLong(0)
		},
		"firstSeenTimestamp" : Timestamp(1668636882, 0)
	},
	"asOf" : Timestamp(1668636883, 0)
}
...

The bug pertains to the value of the key field. As you can see, all field names and values are redacted, including the $gt operator. We know that we need to redact constants in the query, since they may contain PII or be subject to data security/privacy considerations. I believe there is an active discussion about whether we should redact or anonymize field names. But there is no doubt that the output should include the $gt. Otherwise the key tells us nothing about what the query actually was.
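To make the expected behavior concrete, here is a minimal sketch in JavaScript of a redaction pass that keeps MQL operators. It assumes the filter is a plain object, treats every key that starts with "$" as an operator, and replaces every other key and every literal with "###". The names redactFilter, redactValue, and isPlainObject are hypothetical; this is an illustration of the rule argued for above, not the server's implementation.

```javascript
// Hypothetical helper: true only for plain {...} objects, so Dates and other
// wrapper values are treated as literals and redacted rather than walked.
function isPlainObject(value) {
  return value !== null && typeof value === "object" &&
    Object.getPrototypeOf(value) === Object.prototype;
}

// Hypothetical sketch: redact a find() filter for a telemetry key.
// Field names become "###", but operators ($-prefixed keys) are preserved.
function redactFilter(filter) {
  const out = {};
  for (const [key, value] of Object.entries(filter)) {
    const redactedKey = key.startsWith("$") ? key : "###";
    out[redactedKey] = redactValue(value);
  }
  return out;
}

// Arrays (e.g. $and operands, $in values) and nested objects are walked
// recursively; every remaining literal may be PII, so it is replaced.
function redactValue(value) {
  if (Array.isArray(value)) return value.map(redactValue);
  if (isPlainObject(value)) return redactFilter(value);
  return "###";
}

// The query from this report: db.c.find({a: {$gt: 3}})
console.log(JSON.stringify(redactFilter({ a: { $gt: 3 } })));
// prints {"###":{"$gt":"###"}}
```

Because every field name maps to the same placeholder, sibling fields such as {a: 1, b: 2} collapse into a single "###" key in this sketch; it only illustrates the rule that $gt and other operators should survive redaction.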

Note that the same problem occurs even if I enable internalQueryConfigureTelemetryFieldNameRedactionStrategy=sha256. In that case, both the field name a and the operator $gt are hashed, and the key looks like this:

	"key" : {
		"find" : {
			"find" : "###",
			"filter" : {
				"ypeBEsobvcr6" : {
					"Jt74XIwL/Ngm" : "###"
				}
			}
		},
		"namespace" : "test.c",
		"applicationName" : "MongoDB Shell"
	},
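A similar sketch for the sha256 strategy hashes field names while still passing operators through unchanged. ASSUMPTION: the hash here is SHA-256, base64-encoded and truncated to 12 characters, chosen only because the hashed values above are 12 characters long; it is not claimed to reproduce the server's actual values. The names hashFieldName, redactWithHashedNames, redactLiteral, and isPlainObject are hypothetical.

```javascript
const crypto = require("crypto");

// ASSUMPTION: SHA-256 -> base64 -> first 12 characters. This only mimics the
// look of the output above; it is not the server's hashing algorithm.
function hashFieldName(name) {
  return crypto.createHash("sha256").update(name).digest("base64").slice(0, 12);
}

// Hypothetical helper: only plain {...} objects are walked recursively.
function isPlainObject(value) {
  return value !== null && typeof value === "object" &&
    Object.getPrototypeOf(value) === Object.prototype;
}

// Field names are hashed; operators ($-prefixed keys) are kept verbatim.
function redactWithHashedNames(filter) {
  const out = {};
  for (const [key, value] of Object.entries(filter)) {
    out[key.startsWith("$") ? key : hashFieldName(key)] = redactLiteral(value);
  }
  return out;
}

// Literals are still replaced with "###", as in the key shown above.
function redactLiteral(value) {
  if (Array.isArray(value)) return value.map(redactLiteral);
  if (isPlainObject(value)) return redactWithHashedNames(value);
  return "###";
}

// For db.c.find({a: {$gt: 3}}) the inner key stays "$gt" instead of a hash.
console.log(JSON.stringify(redactWithHashedNames({ a: { $gt: 3 } })));
```

Base64 output never contains "$", so under this assumed encoding a hashed field name cannot be mistaken for an operator.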

Another note: Are there any end-to-end tests which show that redaction is working as expected?



 Comments   
Comment by Jennifer Peshansky (Inactive) [ 05/Jan/23 ]

I've typed up a document summarizing the types of redaction and the different purposes they serve, and discussing implementation options.

Generated at Thu Feb 08 06:18:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.