[DOCS-14205] [SERVER] Bug in docs - _id queries on sharded collections are not actually covered Created: 20/Jan/21  Updated: 30/Oct/23  Resolved: 01/Aug/22

Status: Closed
Project: Documentation
Component/s: Server
Affects Version/s: 4.0.0, 4.2.0, 4.4.0, 5.0.0, 6.0.0
Fix Version/s: Server_Docs_20231030

Type: Task
Priority: Major - P3
Reporter: Esha Maharishi (Inactive)
Assignee: Dave Cuthbert (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:
Days since reply: 1 year, 27 weeks, 2 days ago
Epic Link: DOCSP-11701

 Description   

Our docs say:

Starting in MongoDB 3.0, an index cannot cover a query on a sharded collection when run against a mongos if the index does not contain the shard key, with the following exception for the _id index: If a query on a sharded collection only specifies a condition on the _id field and returns only the _id field, the _id index can cover the query when run against a mongos even if the _id field is not the shard key.

Either the docs are wrong and queries on the _id index are not covered, because they actually fetch the full document through the IDHACK stage (rather than the normal FETCH stage) and perform orphan filtering based on the shard key from the document,

or there is a server bug: queries on the _id index are covered, but orphan documents are not filtered out.
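
One way to tell the two cases apart is to look at the counters in explain("executionStats"): a covered query examines index keys but no documents. The following is a minimal sketch of that check, not server code. The function name classifyCoverage is hypothetical, and the sample counter values are made up for illustration rather than taken from a real run.

```javascript
// Classify a query from two explain("executionStats") counters.
// totalKeysExamined and totalDocsExamined are field names from explain output;
// classifyCoverage itself is a hypothetical helper for illustration.
function classifyCoverage(stats) {
  if (stats.totalDocsExamined > 0) {
    // At least one full document was read, so the index did not cover the query.
    return "not covered";
  }
  if (stats.totalKeysExamined > 0) {
    // Index keys were read and no documents were read.
    return "covered";
  }
  // Nothing was examined (for example, no matching document), so the
  // counters alone cannot decide.
  return "inconclusive";
}

// Illustrative (made-up) counter values.
const fetchedOneDocument = { totalKeysExamined: 1, totalDocsExamined: 1 };
const indexOnly = { totalKeysExamined: 1, totalDocsExamined: 0 };

console.log(classifyCoverage(fetchedOneDocument));
console.log(classifyCoverage(indexOnly));
```

If the first hypothesis holds, the _id query would be expected to fall in the "not covered" bucket; this sketch only classifies counters and does not check orphan filtering.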



 Comments   
Comment by Githook User [ 01/Aug/22 ]

Author:

{'name': 'Dave Cuthbert', 'email': '69165704+davemungo@users.noreply.github.com', 'username': 'davemungo'}

Message: DOCS-14205 BACKPORT (#1547)
Branch: v4.2
https://github.com/10gen/docs-mongodb-internal/commit/89397d05be0f54d7027a726d622e32ffa6298e97

Comment by Githook User [ 01/Aug/22 ]

Author:

{'name': 'Dave Cuthbert', 'email': '69165704+davemungo@users.noreply.github.com', 'username': 'davemungo'}

Message: DOCS-14205 BACKPORT (#1546)
Branch: v4.4
https://github.com/10gen/docs-mongodb-internal/commit/49e19482ed9ff005e27b361640f513fc0b93108b

Comment by Githook User [ 01/Aug/22 ]

Author:

{'name': 'Dave Cuthbert', 'email': '69165704+davemungo@users.noreply.github.com', 'username': 'davemungo'}

Message: DOCS-14205 BACKPORT (#1545)
Branch: v5.0
https://github.com/10gen/docs-mongodb-internal/commit/9dcd551fddeaba3272455aab9ef8c7652d44fec4

Comment by Githook User [ 01/Aug/22 ]

Author:

{'name': 'Dave Cuthbert', 'email': '69165704+davemungo@users.noreply.github.com', 'username': 'davemungo'}

Message: DOCS-14205-docs-bug (#1481)
Branch: master
https://github.com/10gen/docs-mongodb-internal/commit/e0eaf039a6b53f9bb86655a20d12458d4d95ae61

Comment by Charlie Swanson [ 09/Feb/21 ]

Yes that's a fair point. Re-reading that after diagnosing what's going on, I think it is misleading and better left unsaid. I'll move this ticket over to DOCS.

Comment by Esha Maharishi (Inactive) [ 09/Feb/21 ]

Thanks charlie.swanson for investigating! Good to know that the orphan docs are being filtered out because the full document is being fetched.

Does this mean the docs should be updated to remove the part in bold (marked between asterisks below)?

Starting in MongoDB 3.0, an index cannot cover a query on a sharded collection when run against a mongos if the index does not contain the shard key*, with the following exception for the _id index: If a query on a sharded collection only specifies a condition on the _id field and returns only the _id field, the _id index can cover the query when run against a mongos even if the _id field is not the shard key.*

Comment by Charlie Swanson [ 09/Feb/21 ]

Ah! esha.maharishi I had a misunderstanding which explains this. The IDHACK stage will always perform a fetch of the full document. I'm actually a little surprised, but it does explain how we're able to get away with orphan filtering on IDHACK stages. I tested this out and we do manage to avoid returning orphan documents, even for the following queries:

db.foo.find({_id: "target"}, {_id: 1})
db.foo.find({_id: {$gt: 0}}, {_id: 1})

The second one can't use IDHACK, but we add a FETCH and a SHARDING_FILTER stage there. For the first one, you can see the SHARDING_FILTER stage above the IDHACK stage in the explain output:

mongos> db.foo.explain().find({_id: 1}, {_id: 1})
{
	"queryPlanner" : {
		"mongosPlannerVersion" : 1,
		"winningPlan" : {
			"stage" : "SHARD_MERGE",
			"shards" : [
				{
					"shardName" : "__unknown_name__-rs0",
					"connectionString" : "__unknown_name__-rs0/franklinia:20000",
					"serverInfo" : {
						"host" : "franklinia",
						"port" : 20000,
						"version" : "4.9.0-alpha4",
						"gitVersion" : "unknown"
					},
					"namespace" : "test.foo",
					"indexFilterSet" : false,
					"parsedQuery" : {
						"_id" : {
							"$eq" : 1
						}
					},
					"queryHash" : "18D22A43",
					"planCacheKey" : "21ED6765",
					"maxIndexedOrSolutionsReached" : false,
					"maxIndexedAndSolutionsReached" : false,
					"maxScansToExplodeReached" : false,
					"winningPlan" : {
						"stage" : "PROJECTION_SIMPLE",
						"transformBy" : {
							"_id" : 1
						},
						"inputStage" : {
							"stage" : "SHARDING_FILTER",
							"inputStage" : {
								"stage" : "IDHACK"
							}
						}
					},
					"rejectedPlans" : [ ]
				},
				{
					"shardName" : "__unknown_name__-rs1",
					"connectionString" : "__unknown_name__-rs1/franklinia:20001",
					"serverInfo" : {
						"host" : "franklinia",
						"port" : 20001,
						"version" : "4.9.0-alpha4",
						"gitVersion" : "unknown"
					},
					"namespace" : "test.foo",
					"indexFilterSet" : false,
					"parsedQuery" : {
						"_id" : {
							"$eq" : 1
						}
					},
					"queryHash" : "18D22A43",
					"planCacheKey" : "21ED6765",
					"maxIndexedOrSolutionsReached" : false,
					"maxIndexedAndSolutionsReached" : false,
					"maxScansToExplodeReached" : false,
					"winningPlan" : {
						"stage" : "PROJECTION_SIMPLE",
						"transformBy" : {
							"_id" : 1
						},
						"inputStage" : {
							"stage" : "SHARDING_FILTER",
							"inputStage" : {
								"stage" : "IDHACK"
							}
						}
					},
					"rejectedPlans" : [ ]
				}
			]
		}
	},
	"serverInfo" : {
		"host" : "franklinia",
		"port" : 20003,
		"version" : "4.9.0-alpha4",
		"gitVersion" : "unknown"
	},
	"command" : {
		"find" : "foo",
		"filter" : {
			"_id" : 1
		},
		"projection" : {
			"_id" : 1
		},
		"lsid" : {
			"id" : UUID("9918f1e2-416e-4853-8fd3-e7ab23a2ff40")
		},
		"$clusterTime" : {
			"clusterTime" : Timestamp(1612877043, 1),
			"signature" : {
				"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
				"keyId" : NumberLong(0)
			}
		},
		"$db" : "test"
	},
	"ok" : 1,
	"$clusterTime" : {
		"clusterTime" : Timestamp(1612877047, 1),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	},
	"operationTime" : Timestamp(1612877010, 1)
}
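
The per-shard plans in the explain output above can also be checked programmatically. The sketch below walks each shard's winningPlan and reports its stage names. It assumes the document shape shown above (queryPlanner.winningPlan.shards with nested inputStage fields); the helper names stageNames and summarizeShardPlans are hypothetical, and trimmedExplain is a hand-trimmed copy of the output above. It uses console.log, so it is meant for Node.js or mongosh rather than the legacy shell.

```javascript
// Collect stage names from one plan node, outermost stage first.
// Follows "inputStage" (one child) and "inputStages" (several children).
function stageNames(node) {
  if (!node || typeof node !== "object") {
    return [];
  }
  let names = [node.stage];
  const children = [];
  if (node.inputStage) {
    children.push(node.inputStage);
  }
  if (Array.isArray(node.inputStages)) {
    children.push(...node.inputStages);
  }
  for (const child of children) {
    names = names.concat(stageNames(child));
  }
  return names;
}

// Summarize each shard's winning plan from a mongos explain document.
function summarizeShardPlans(explainDoc) {
  const shards = explainDoc.queryPlanner.winningPlan.shards || [];
  return shards.map(function (shard) {
    const stages = stageNames(shard.winningPlan);
    return {
      shardName: shard.shardName,
      stages: stages,
      // SHARDING_FILTER is the stage that drops orphan documents.
      filtersOrphans: stages.includes("SHARDING_FILTER"),
      // Per this ticket, IDHACK always fetches the full document, so a plan
      // with IDHACK or FETCH is treated here as not covered.
      fetchesDocument: stages.includes("IDHACK") || stages.includes("FETCH"),
    };
  });
}

// Hand-trimmed copy of the explain output above (only the fields used here).
const idhackPlan = {
  stage: "PROJECTION_SIMPLE",
  transformBy: { _id: 1 },
  inputStage: { stage: "SHARDING_FILTER", inputStage: { stage: "IDHACK" } },
};
const trimmedExplain = {
  queryPlanner: {
    winningPlan: {
      stage: "SHARD_MERGE",
      shards: [
        { shardName: "__unknown_name__-rs0", winningPlan: idhackPlan },
        { shardName: "__unknown_name__-rs1", winningPlan: idhackPlan },
      ],
    },
  },
};

console.log(JSON.stringify(summarizeShardPlans(trimmedExplain), null, 2));
```

For both shards this reports the stages PROJECTION_SIMPLE, SHARDING_FILTER, IDHACK, with filtersOrphans and fetchesDocument both true, which matches the conclusion above: the query is not covered, and orphans are filtered.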
 

Feel free to re-open if you have any questions! 

 
