[SERVER-2381] Cannot use compound of 2d and multikey index Created: 20/Jan/11  Updated: 12/Jul/16  Resolved: 21/Apr/11

Status: Closed
Project: Core Server
Component/s: Geo, Index Maintenance
Affects Version/s: 1.6.5, 1.7.4
Fix Version/s: 1.8.0

Type: Improvement Priority: Major - P3
Reporter: Andreas Kalsch Assignee: Greg Studer
Resolution: Done Votes: 8
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Debian Linux, x64


Attachments: File geog.js     File geoh.js    
Issue Links:
Related
is related to SERVER-2391 Cannot use compound of 2d and date va... Closed
Participants:

 Description   

2 examples with unexpected results:

db.test.insert({
point:[1,10],
tags:[

{k:'key',v:'value'}

,

{k:'key2',v:123}

]
})

db.test.insert({
point:[1,10],
tags:

{k:'key',v:'value'}


})

db.test.ensureIndex(

{point:"2d","tags.k":1,"tags.v":1}

)
db.test.find({
point:{"$within":{"$box":[[0,0],[12,12]]}},
"tags.k":"key"
})

Only 1 result:
{ "_id" : ObjectId("4d373f753ffe71e78a71bfc0"), "point" : [ 1, 10 ],
"tags" :

{ "k" : "key", "v" : "value" }

}

2)
db.test.drop()
db.test.insert({p:[1112,3473],t:[

{k:'a',v:'b'}

,

{k:'c',v:'d'}

]})
db.test.ensureIndex(

{p:'2d','t.k':1,'t.v':1}

,

{min:0,max:10000}

)
db.test.find(

{p:[1112,3473],'t.k':'a'}

)
No result.

Some discussion here: http://groups.google.com/group/mongodb-user/browse_thread/thread/8fa375ba78081b2c/fd9967f89ff758da



 Comments   
Comment by Greg Studer [ 07/Mar/11 ]

Unfortunately breaking up your elements won't speed things up (and may slow things down), as the lookup logic will effectively still be geo-filter then t-filter individually. The benefit of additional criteria in a geo-index is that the specified fields get added to the index keys themselves (including multi-key fields, as was fixed earlier), making the filtering faster as there is no additional object lookup required (as in the case with covered indices).

http://www.mongodb.org/display/DOCS/Geospatial+Indexing#GeospatialIndexing-CompoundIndexes mentions this, though not as clearly as one might hope.

Understand you're looking for a general multi-d indexing solution, but it's not quite there yet.

Comment by Andreas Kalsch [ 05/Mar/11 ]

Now I will create several documents to index one document, by creating one for every array element and not using arrays any more. In this case I need to add an additional grouping to guarantee unique results.

db.bri.drop()

db.bri.insert(

{ "_id" : NumberLong("13500582142917"), "p" : "n", "id" : NumberLong(582142917), "c" : [ 34378, 21334 ], "t" : 4294968392 }

)
db.bri.insert(

{ "_id" : NumberLong("13500582142917"), "p" : "n", "id" : NumberLong(582142917), "c" : [ 34378, 21334 ], "t" : 4715874093102 }

)
db.bri.insert(

{ "_id" : NumberLong("13500836578420"), "p" : "n", "id" : NumberLong(836578420), "c" : [ 34378, 21339 ], "t" : 315540509818880 }

)
db.bri.insert(

{ "_id" : NumberLong("13500836578420"), "p" : "n", "id" : NumberLong(836578420), "c" : [ 34378, 21339 ], "t" : 794568949810 }

)
db.bri.insert(

{ "_id" : NumberLong("13500836578420"), "p" : "n", "id" : NumberLong(836578420), "c" : [ 34378, 21339 ], "t" : 2104533975531 }

)
db.bri.insert(

{ "_id" : NumberLong("13500836578420"), "p" : "n", "id" : NumberLong(836578420), "c" : [ 34378, 21339 ], "t" : 592705486860 }

)
db.bri.insert(

{ "_id" : NumberLong("13500836578420"), "p" : "n", "id" : NumberLong(836578420), "c" : [ 34378, 21339 ], "t" : 803158886206 }

)
db.bri.insert(

{ "_id" : NumberLong("13500836578420"), "p" : "n", "id" : NumberLong(836578420), "c" : [ 34378, 21339 ], "t" : 4294967999 }

)
db.bri.insert(

{ "_id" : NumberLong("13500836578420"), "p" : "n", "id" : NumberLong(836578420), "c" : [ 34378, 21339 ], "t" : 1327144898739 }

)
db['bri'].ensureIndex(

{ c: '2d', t: 1 }

,

{ min: 0, max: 66000 }

)
db.bri.find({c:{$within:{$box:[[34373, 21334],[34378, 21339]]}}, t:{ "$gte" : 201863462912, "$lte" : 206158430207 }}).limit(200).toArray()

I am right that, in this case (WITHOUT the multikey design), the full index will be used? (Else, it would make no sense to combine 2d + 1.)

Comment by Greg Studer [ 04/Mar/11 ]

Think the operator you're looking for is $elemMatch, it supports the array-search semantics you're looking for. As far as efficiency, the additional criteria will work, but won't be index-accelerated, which may still be fast enough for what you need if your additional t-arrays do not grow too large.

Comment by Andreas Kalsch [ 03/Mar/11 ]

"so if an element is > 201 billion and another element is < 201 billion, it is added"
That's the point and it seems awkward to me that it is interpreted this way. I would expect checks always for one element. In this case, I need to remove the multikey design.

"If there are additional criteria, the geo-index can't be used for that query (though you can create an additional regular index to quickly look up exact values). It's a little bit awkward, but the geo-indexes store additional details differently (and in a way not really very suitable for further filtering)."
I am waiting for nd (at least 3d) indexing ( http://jira.mongodb.org/browse/SERVER-691 ). This would allow much more flexibility. So there is no scalable way to search for 2d (big box) + 1d range in 1 query?

Comment by Greg Studer [ 01/Mar/11 ]

Sorry about the delayed response to the previous question also:

> So is there a guarantee, that in both cases the index is used?

Exact lookups can use the geo-index in the current implementation only if there are no other criteria (i.e. find(

{ $within : <blah> }

) and not find(

{ $within : <blah>, x : 3 }

) ). If there are additional criteria, the geo-index can't be used for that query (though you can create an additional regular index to quickly look up exact values). It's a little bit awkward, but the geo-indexes store additional details differently (and in a way not really very suitable for further filtering).

You can always force the geo-index, however, by doing a $within query and specifying very small bounds.

The test case geoh.js shows how the indices are used with different query types.

Comment by Greg Studer [ 01/Mar/11 ]

Indexes used example

Comment by Greg Studer [ 28/Feb/11 ]

Looked at the code, think it's working as-designed. The numbers are a bit hard to interpret, but if you think of everything as divided by a billion, you're basically saying get everything >201 billion and <201 billion (the second "$lte" just overrides the first). For individual numbers this will not be true unless the number is actually 201 billion, but for multi-element arrays in mongodb this is interpreted differently, so if an element is > 201 billion and another element is < 201 billion, it is added.

The test case geog.js above is a simplified example, with smaller #s, comparing the $gte/$lte query with and without the geoindexing.

Also pretty sure geo-queries exactly at the boundaries of regions can be a bit finicky, but that doesn't seem to affect the script you've posted.

Comment by Greg Studer [ 28/Feb/11 ]

Example of array gte/lte logic

Comment by Andreas Kalsch [ 08/Feb/11 ]

Additional bug:

db.bri.insert(

{ "_id" : NumberLong("13500582142917"), "p" : "n", "id" : NumberLong(582142917), "c" : [ 34378, 21334 ], "t" : [ 4294968392, 4715874093102 ] }

)
db.bri.insert(

{ "_id" : NumberLong("13500836578420"), "p" : "n", "id" : NumberLong(836578420), "c" : [ 34378, 21339 ], "t" : [ 315540509818880, 794568949810, 2104533975531, 592705486860, 803158886206, 4294967999, 1327144898739 ] }

)
db['bri'].ensureIndex(

{ c: '2d', t: 1 }

,

{ min: 0, max: 66000 }

)
db.bri.find({c:{$within:{$box:[[34373, 21334],[34378, 21339]]}}, t:{ "$gte" : 201863462912, "$lte" : 201863462912, "$lte" : 206158430207 }}).limit(200).toArray()

There should be 0 results, but I get both objects.

Comment by Andreas Kalsch [ 07/Feb/11 ]

The result is correct, but explain() is still irritating me:

db.br.ensureIndex(

{ '_i.c': '2d', '_i.t': 1 }

,

{min:0,max:66000}

)
db.br.find(

{'_i.c':[ 34399, 21331 ],'_i.t':4294967473}

).explain()
{
"cursor" : "BasicCursor",
"nscanned" : 260296,
"nscannedObjects" : 260296,
"n" : 1,
"millis" : 315,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {

}
}
db.br.find({'_i.c':{ $within:

{ $box: [[ 34399, 21331 ], [34400, 21332]] }

},'_i.t':4294967473}).explain()
{
"cursor" : "GeoBrowse-box",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {

}
}

I usually see "cursor: BasicCursor" in all queries that do not use an index. I cannot interprete "cursor: GeoBrowse-box", but one of the developers told me that the reporting for 2d indexes is still not complete.

So is there a guarantee, that in both cases the index is used?

Comment by Eliot Horowitz (Inactive) [ 03/Feb/11 ]

Don't think this was actually fixed, was it?

Comment by Harsh Waghmare [ 29/Jan/11 ]

Would this fix be back ported to any of the 1.8-rc releases ?

Comment by Greg Studer [ 28/Jan/11 ]

Fixed by...

https://github.com/mongodb/mongo/commit/7fa627b35627e94c94aafe9b0275de665129acd2

For $near/$center queries, indexing should work as expected. Geo-indexing disabled for exact lookups of positions with additional criteria, as the additional criteria are stored in a diff way from ordinary indices.

Comment by Greg Studer [ 25/Jan/11 ]

Second issue as well:

t = db.geo_circle2;
t.drop();
t.ensureIndex(

{loc : "2d", categories:1}

,

{"name":"placesIdx", "min": -100, "max": 100}

);
t.insert({ "uid" : 368900 , "loc" :

{ "x" : -36 , "y" : -8}

,"categories" : [ "sports" , "hotel" , "restaurant"]});
t.find({ "loc" :

{ "x" : -36, "y" : -8}

, "categories" : "sports" })

Multikeys are stored differently in geoindexes, so normal indexed lookups fail. Should be addressed at same time.

Comment by Greg Studer [ 24/Jan/11 ]

Scratch that... looking into a better fix.

Comment by Greg Studer [ 24/Jan/11 ]

Looked into this... reclassifying as an enhancement. Explicitly disallowing multikey and geoindexing for now, since with the current matching algorithm used we probably won't get too much speedup here in any case aside from the lookup from a covered index.

Generated at Thu Feb 08 02:59:48 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.