[SERVER-9925] Index covered binary prefix search Created: 13/Jun/13  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Index Maintenance, Querying
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Critical - P2
Reporter: Nicholas Tang Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Assigned Teams:
Query Optimization
Participants:

 Description   

We need a way to do a binary prefix search on an existing field that's covered by an index so it can happen quickly in RAM.



 Comments   
Comment by Eliot Horowitz (Inactive) [ 15/Jan/14 ]

To do for binary, need to change sort order of bindata

Comment by Yuri Finkelstein [ 05/Jul/13 ]

Yes, the performance of this query is very poor at the moment. If indeed $or will be as performant as $in that could be an acceptable solution. Please note that if the following would be supported

db.bincov.find({_id: {$in: [{$prefix: BinData(5,"MD5_1")}, {$prefix: BinData(5,"MD5_1")}, ...]}}

it would more more concise and readable.

What I also asked in relaed ticket is for the ability to specify limit for each $prefix term match.

Example:

db.bincov.find({_id: {$in: [{$prefix: {$value: BinData(5,"MD5_1"}, $limit: 1)}, {$prefix: {$value: BinData(5,"MD5_1"), $limit: 1}}, ...]}}

This would return at most one document with _id having a binary prefix MD5_1, at most one document with _id having a binary prefix MD5_2, etc. Without the $limit feature the query would first return all documents with _id having a binary prefix MD5_1. If the number of them is high and exceed the limit on the cursor we would have a logical problem of how to resume the query on the same cursor.

Again, the syntax is just schematic and is meant only to help express the point. How you folks end up expressing it is not important as long as the same capabilities are supported.

Does it make sense?
Yuri

Comment by Asya Kamsky [ 03/Jul/13 ]

This already works in current version.

> db.bincov.find()
{ "_id" : BinData(5,"1295AB3E") }
{ "_id" : BinData(5,"389AAB3E") }
{ "_id" : BinData(5,"A56AAB3E") }
{ "_id" : BinData(5,"12900000") }
{ "_id" : BinData(5,"38FFFFFF") }
{ "_id" : BinData(5,"38FFFF00") }
{ "_id" : BinData(5,"1290FFFF") }
{ "_id" : BinData(5,"00000000") }
{ "_id" : BinData(5,"11111111") }
{ "_id" : BinData(5,"AAAAAAAA") }
{ "_id" : BinData(5,"AAAA0000") }
{ "_id" : BinData(5,"0000AAAA") }
{ "_id" : BinData(5,"FFFFAAAA") }
{ "_id" : BinData(5,"MD5HHXWW") }
{ "_id" : BinData(5,"MD512X16") }
{ "_id" : BinData(5,"MD520X24") }
{ "_id" : BinData(5,"MD536X40") }
{ "_id" : BinData(5,"MD812X16") }
{ "_id" : BinData(5,"MD120X24") }
{ "_id" : BinData(5,"MD220X24") }
> > db.bincov.find({$or : [ {_id: {$gt:BinData(5,"MD5AAAAA"), $lt:BinData(5,"MD599999")} } , {_id: {$gt:BinData(5,"MD8AAAAA"), $lt:BinData(5,"MD899999")}}, {_id: {$gt:BinData(5,"MD1AAAAA"), $lt:BinData(5,"MD199999")} } ]  },{_id:1}).explain()
{
	"clauses" : [
		{
			"cursor" : "BtreeCursor _id_",
			"isMultiKey" : false,
			"n" : 4,
			"nscannedObjects" : 0,
			"nscanned" : 4,
			"nscannedObjectsAllPlans" : 0,
			"nscannedAllPlans" : 4,
			"scanAndOrder" : false,
			"indexOnly" : true,
			"nYields" : 0,
			"nChunkSkips" : 0,
			"millis" : 0,
			"indexBounds" : {
				"_id" : [
					[
						BinData(5,"MD5AAAAA"),
						BinData(5,"MD599999")
					]
				]
			}
		},
		{
			"cursor" : "BtreeCursor _id_",
			"isMultiKey" : false,
			"n" : 1,
			"nscannedObjects" : 0,
			"nscanned" : 1,
			"nscannedObjectsAllPlans" : 0,
			"nscannedAllPlans" : 1,
			"scanAndOrder" : false,
			"indexOnly" : true,
			"nYields" : 0,
			"nChunkSkips" : 0,
			"millis" : 0,
			"indexBounds" : {
				"_id" : [
					[
						BinData(5,"MD8AAAAA"),
						BinData(5,"MD899999")
					]
				]
			}
		},
		{
			"cursor" : "BtreeCursor _id_",
			"isMultiKey" : false,
			"n" : 1,
			"nscannedObjects" : 0,
			"nscanned" : 1,
			"nscannedObjectsAllPlans" : 0,
			"nscannedAllPlans" : 1,
			"scanAndOrder" : false,
			"indexOnly" : true,
			"nYields" : 0,
			"nChunkSkips" : 0,
			"millis" : 0,
			"indexBounds" : {
				"_id" : [
					[
						BinData(5,"MD1AAAAA"),
						BinData(5,"MD199999")
					]
				]
			}
		}
	],
	"n" : 6,
	"nscannedObjects" : 0,
	"nscanned" : 6,
	"nscannedObjectsAllPlans" : 0,
	"nscannedAllPlans" : 6,
	"millis" : 1,
	"server" : "asyasmacbook.local:24444"
}

Is the issue that $in can't be used and it has to be an $or operator? There is a separate ticket for making $or as performant as $or but this ticket is specifically about covered index queryability which seems to work.

Generated at Thu Feb 08 03:21:49 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.