[SERVER-14944] Use of Indexes for Query Created: 18/Aug/14  Updated: 03/Sep/14  Resolved: 03/Sep/14

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.6.4
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Michael De Lorenzo Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-15086 Allow for efficient range queries ove... Closed
Participants:

 Description   

I have a large collection of documents that look like:

// Example Document 1
{
  _id: 1,
  authorized_advertisers: [1,2,3],
  timestamp: 1406865600
}
 
// Example Document 2
{
  _id: 2,
  authorized_advertisers: [1],
  timestamp: 1406865600
}
// Example Document 3
{
  _id: 3,
  authorized_advertisers: [],
  timestamp: 1406865600
}

I have indexes on the following two indexes defined on this collection (among others):

{
		"v" : 1,
		"key" : {
			"authorized_advertisers" : 1
		},
		"name" : "authorized_advertisers_1",
		"ns" : "cmply_app_engage_production.social_engagements",
		"sparse" : true
	}

{
		"v" : 1,
		"key" : {
			"authorized_advertisers" : 1,
			"timestamp" : 1
		},
		"name" : "authorized_advertisers_1_timestamp_1",
		"ns" : "cmply_app_engage_production.social_engagements"
	}

When I execute queries against this collection, a larger than (seemingly) necessary number of documents are scanned to produce the results. If I provide a hint to the query engine to use the authorized_advertisers_1_timestamp_1 index, the lower bound for the timestamp seems to be ignored.

Criteria

{
	"authorized_advertisers" : ObjectId("4e6906f2dd06fd1f50000007"),
	"timestamp" : {
		"$lte" : 1408371025,
		"$gt" : 1406865600
	}
}

Query using $hint

db.social_engagements.find(q).hint("authorized_advertisers_1_timestamp_1").explain()
{
	"cursor" : "BtreeCursor authorized_advertisers_1_timestamp_1",
	"isMultiKey" : true,
	"n" : 148823,
	"nscannedObjects" : 1274509,
	"nscanned" : 1274509,
	"nscannedObjectsAllPlans" : 1274509,
	"nscannedAllPlans" : 1274509,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 9959,
	"nChunkSkips" : 0,
	"millis" : 6447,
	"indexBounds" : {
		"authorized_advertisers" : [
			[
				ObjectId("4e6906f2dd06fd1f50000007"),
				ObjectId("4e6906f2dd06fd1f50000007")
			]
		],
		"timestamp" : [
			[
				-Infinity,
				1408371025
			]
		]
	},
	"server" : "mydb.server.com:27017",
	"filterSet" : false
}

Query withOUT $hint

db.social_engagements.find(q).explain()
{
	"cursor" : "BtreeCursor authorized_advertisers_1",
	"isMultiKey" : true,
	"n" : 148823,
	"nscannedObjects" : 1275405,
	"nscanned" : 1275405,
	"nscannedObjectsAllPlans" : 1275975,
	"nscannedAllPlans" : 1275976,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 9969,
	"nChunkSkips" : 0,
	"millis" : 7162,
	"indexBounds" : {
		"authorized_advertisers" : [
			[
				ObjectId("4e6906f2dd06fd1f50000007"),
				ObjectId("4e6906f2dd06fd1f50000007")
			]
		]
	},
	"server" : "mydb.server.com:27017",
	"filterSet" : false
}

In each case, the same number of documents is returned. I don't understand why the lower bound for the timestamp is ignored, this would seem like it would reduce the number of documents the query would need to scan (in the collection or in the index) to produce a result.

I've also tried having the index have the timestamp attribute as the first key in the index, but performance was significantly worse than my current configuration.

These same criteria are used in Aggregation Framework queries, too. The queries take on average 7-10s to complete.



 Comments   
Comment by Thomas Rueckstiess [ 02/Sep/14 ]

Hi Michael,

This is a known limitation of multikey indices. Bounds intersection cannot currently be used in the case of indexed arrays (multikey), even if the range is on a non-array field of the index (timestamp in your case). See the linked tickets SERVER-15086, SERVER-7959 and their further duplicates, with more information and some workarounds.

Regards,
Thomas

Comment by Michael De Lorenzo [ 18/Aug/14 ]

Just a note, that the authorized_advertisers attribute is an array of BSON ObjectIds, not the Integers I used for simplicity.

Generated at Thu Feb 08 03:36:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.