[SERVER-13830] Problem with 2dsphere index and compound index Created: 05/May/14  Updated: 10/Dec/14  Resolved: 07/May/14

Status: Closed
Project: Core Server
Component/s: Geo, Querying
Affects Version/s: 2.6.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Ulrich VANDENHEKKE Assignee: David Storch
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-13701 Query using 2d index throws exception... Closed
Operating System: ALL
Participants:

 Description   

Hi

I have just tested the 2.6 and i always have the problem describe in teh Jira SERVER-9257 for my case. In this comment i use the query and index detailed in the description of the jira SERVER-9257.

Case 1 : old index

With the old "2d" index the query on position+date don't use the 2d index but only the index on creationDate.

db.ntr_tracks.find({"detection.position": { $geoWithin: { $box: [ [ 0.7724536441932998, 50.61956018018893 ], [ 2.483031806726593, 51.42851043057254 ] ] } }, creationDate: { $gt: new ISODate("20120801T000000"), $lt: ISODate("20120802T000000") } }).sort({ creationDate: 1 }).limit(1000).explain();
{
    "cursor" : "BtreeCursor creationDate",
    "isMultiKey" : false,
    "n" : 1000,
    "nscannedObjects" : 47625,
    "nscanned" : 47626,
    "nscannedObjectsAllPlans" : 52620,
    "nscannedAllPlans" : 52621,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 411,
    "nChunkSkips" : 0,
    "millis" : 216,
    "indexBounds" : {
        "creationDate" : [ 
            [ 
                ISODate("2012-08-01T00:00:00.000Z"), 
                ISODate("2012-08-02T00:00:00.000Z")
            ]
        ]
    },
    "server" : "fmwd0012:27017",
    "filterSet" : false,
    "stats" : {
        "type" : "LIMIT",
        "works" : 47625,
        "yields" : 411,
        "unyields" : 411,
        "invalidates" : 0,
        "advanced" : 1000,
        "needTime" : 46625,
        "needFetch" : 0,
        "isEOF" : 1,
        "children" : [ 
            {
                "type" : "FETCH",
                "works" : 47625,
                "yields" : 411,
                "unyields" : 411,
                "invalidates" : 0,
                "advanced" : 1000,
                "needTime" : 46625,
                "needFetch" : 0,
                "isEOF" : 0,
                "alreadyHasObj" : 0,
                "forcedFetches" : 0,
                "matchTested" : 1000,
                "children" : [ 
                    {
                        "type" : "IXSCAN",
                        "works" : 47625,
                        "yields" : 411,
                        "unyields" : 411,
                        "invalidates" : 0,
                        "advanced" : 47625,
                        "needTime" : 0,
                        "needFetch" : 0,
                        "isEOF" : 0,
                        "keyPattern" : "{ creationDate: 1 }",
                        "boundsVerbose" : "field #0['creationDate']: (new Date(1343779200000), new Date(1343865600000))",
                        "isMultiKey" : 0,
                        "yieldMovedCursor" : 0,
                        "dupsTested" : 0,
                        "dupsDropped" : 0,
                        "seenInvalidated" : 0,
                        "matchTested" : 0,
                        "keysExamined" : 47626,
                        "children" : []
                    }
                ]
            }
        ]
    }
}

If i force using the 2d index with hint the query doesn't respond (i have +16Go of data). I think that the index isn't used.

db.ntr_tracks.find({"detection.position": { $geoWithin: { $box: [ [ 0.7724536441932998, 50.61956018018893 ], [ 2.483031806726593, 51.42851043057254 ] ] } }, creationDate: { $gt: new ISODate("20120801T000000"), $lt: ISODate("20120802T000000") } }).sort({ creationDate: 1 }).hint('detection.position_2d_creationDate_1').limit(1000).explain()

Edit: i finally have a result:

{
    "cursor" : "GeoBrowse-box",
    "isMultiKey" : false,
    "n" : 1000,
    "nscannedObjects" : 400694,
    "nscanned" : 400694,
    "nscannedObjectsAllPlans" : 400694,
    "nscannedAllPlans" : 400694,
    "scanAndOrder" : true,
    "indexOnly" : false,
    "nYields" : 6618,
    "nChunkSkips" : 0,
    "millis" : 1903392,
    "indexBounds" : {
        "detection.position" : [ 
            [ 
                [ 
                    1.40625, 
                    50.625
                ], 
                [ 
                    2.8125, 
                    52.03125
                ]
            ], 
            [ 
                [ 
                    1.0546875, 
                    50.2734375
                ], 
                [ 
                    1.40625, 
                    50.625
                ]
            ], 
            [ 
                [ 
                    0.703125, 
                    50.2734375
                ], 
                [ 
                    1.0546875, 
                    50.625
                ]
            ], 
            [ 
                [ 
                    0.703125, 
                    50.625
                ], 
                [ 
                    1.40625, 
                    51.328125
                ]
            ], 
            [ 
                [ 
                    1.0546875, 
                    51.328125
                ], 
                [ 
                    1.40625, 
                    51.6796875
                ]
            ], 
            [ 
                [ 
                    0.703125, 
                    51.328125
                ], 
                [ 
                    1.0546875, 
                    51.6796875
                ]
            ], 
            [ 
                [ 
                    2.4609375, 
                    50.2734375
                ], 
                [ 
                    2.8125, 
                    50.625
                ]
            ], 
            [ 
                [ 
                    2.109375, 
                    50.2734375
                ], 
                [ 
                    2.4609375, 
                    50.625
                ]
            ], 
            [ 
                [ 
                    1.7578125, 
                    50.2734375
                ], 
                [ 
                    2.109375, 
                    50.625
                ]
            ], 
            [ 
                [ 
                    1.40625, 
                    50.2734375
                ], 
                [ 
                    1.7578125, 
                    50.625
                ]
            ]
        ]
    },
    "server" : "fmwd0012:27017",
    "filterSet" : false,
    "stats" : {
        "type" : "SORT",
        "works" : 401697,
        "yields" : 6618,
        "unyields" : 6618,
        "invalidates" : 0,
        "advanced" : 1000,
        "needTime" : 400696,
        "needFetch" : 0,
        "isEOF" : 1,
        "forcedFetches" : 0,
        "memUsage" : 580000,
        "memLimit" : 33554432,
        "children" : [ 
            {
                "type" : "KEEP_MUTATIONS",
                "works" : 400696,
                "yields" : 6618,
                "unyields" : 6618,
                "invalidates" : 0,
                "advanced" : 1786,
                "needTime" : 398909,
                "needFetch" : 0,
                "isEOF" : 1,
                "children" : [ 
                    {
                        "type" : "FETCH",
                        "works" : 400696,
                        "yields" : 6618,
                        "unyields" : 6618,
                        "invalidates" : 0,
                        "advanced" : 1786,
                        "needTime" : 398909,
                        "needFetch" : 0,
                        "isEOF" : 1,
                        "alreadyHasObj" : 400694,
                        "forcedFetches" : 0,
                        "matchTested" : 1786,
                        "children" : [ 
                            {
                                "type" : "GEO_2D",
                                "works" : 400694,
                                "yields" : 0,
                                "unyields" : 0,
                                "invalidates" : 0,
                                "advanced" : 400694,
                                "needTime" : 0,
                                "needFetch" : 0,
                                "isEOF" : 1,
                                "geometryType" : "box",
                                "field" : "detection.position",
                                "boundsVerbose" : [ 
                                    "(1.40625,50.625) -->> (2.8125,52.0313)", 
                                    "(1.05469,50.2734) -->> (1.40625,50.625)", 
                                    "(0.703125,50.2734) -->> (1.05469,50.625)", 
                                    "(0.703125,50.625) -->> (1.40625,51.3281)", 
                                    "(1.05469,51.3281) -->> (1.40625,51.6797)", 
                                    "(0.703125,51.3281) -->> (1.05469,51.6797)", 
                                    "(2.46094,50.2734) -->> (2.8125,50.625)", 
                                    "(2.10938,50.2734) -->> (2.46094,50.625)", 
                                    "(1.75781,50.2734) -->> (2.10938,50.625)", 
                                    "(1.40625,50.2734) -->> (1.75781,50.625)"
                                ],
                                "children" : []
                            }
                        ]
                    }
                ]
            }
        ]
    }
}

Case 2 : new index, date first - query on date

With a second test where i have the following index only :

{ "creationDate": 1, "detection.position": "2dsphere" }

The following query

db.ntr_tracks.find({creationDate: { $gt: new ISODate("20120801T000000"), $lt: ISODate("20120802T000000") }}).sort({ creationDate: 1 }).limit(1000).explain()

Take 1168608 ms .... (but if i create an index on creationDate the query take less than 100ms).

The explain show that we use BtreeCursor on 2dsphere is used :

{
    "cursor" : "BtreeCursor 2dsphere",
    "isMultiKey" : false,
    "n" : 1000,
    "nscannedObjects" : 11629283,
    "nscanned" : 11629284,
    "nscannedObjectsAllPlans" : 11629283,
    "nscannedAllPlans" : 11629284,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 433063,
    "nChunkSkips" : 0,
    "millis" : 118608,
    "indexBounds" : {
        "creationDate" : [ 
            [ 
                {
                    "$minElement" : 1
                }, 
                {
                    "$maxElement" : 1
                }
            ]
        ],
        "detection.position" : [ 
            [ 
                {
                    "$minElement" : 1
                }, 
                {
                    "$maxElement" : 1
                }
            ]
        ]
    },
    "server" : "fmwd0012:27017",
    "filterSet" : false,
    "stats" : {
        "type" : "LIMIT",
        "works" : 12062212,
        "yields" : 433063,
        "unyields" : 433063,
        "invalidates" : 0,
        "advanced" : 1000,
        "needTime" : 11628283,
        "needFetch" : 432929,
        "isEOF" : 1,
        "children" : [ 
            {
                "type" : "FETCH",
                "works" : 12062212,
                "yields" : 433063,
                "unyields" : 433063,
                "invalidates" : 0,
                "advanced" : 1000,
                "needTime" : 11628283,
                "needFetch" : 432929,
                "isEOF" : 0,
                "alreadyHasObj" : 0,
                "forcedFetches" : 0,
                "matchTested" : 1000,
                "children" : [ 
                    {
                        "type" : "IXSCAN",
                        "works" : 11629283,
                        "yields" : 433063,
                        "unyields" : 433063,
                        "invalidates" : 0,
                        "advanced" : 11629283,
                        "needTime" : 0,
                        "needFetch" : 0,
                        "isEOF" : 0,
                        "keyPattern" : "{ creationDate: 1, detection.position: \"2dsphere\" }",
                        "boundsVerbose" : "field #0['creationDate']: [MinKey, MaxKey], field #1['detection.position']: [MinKey, MaxKey]",
                        "isMultiKey" : 0,
                        "yieldMovedCursor" : 0,
                        "dupsTested" : 0,
                        "dupsDropped" : 0,
                        "seenInvalidated" : 0,
                        "matchTested" : 0,
                        "keysExamined" : 11629284,
                        "children" : []
                    }
                ]
            }
        ]
    }
}

Case 3 : new index, date first - query on date and position

If i make query that search by position, the query is very quick (so no problem, it is better than 2.4)

db.ntr_tracks.find({"detection.position": { $geoWithin: { $geometry: { type: "Polygon", coordinates: [ [ [ 0.7724536441932998, 50.61956018018893 ], [ 0.7724536441932998, 51.42851043057254 ], [ 2.483031806726593, 51.42851043057254 ], [ 2.483031806726593, 50.61956018018893 ], [ 0.7724536441932998, 50.61956018018893 ] ] ] } } }, creationDate: { $gt: new ISODate("20121001T000000"), $lt: ISODate("20121002T000000") } }).sort({ creationDate: 1 }).limit(1000).explain()
 
{
    "cursor" : "BtreeCursor 2dsphere",
    "isMultiKey" : false,
    "n" : 1000,
    "nscannedObjects" : 1752,
    "nscanned" : 47634,
    "nscannedObjectsAllPlans" : 1752,
    "nscannedAllPlans" : 47634,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 14,
    "nChunkSkips" : 0,
    "millis" : 184,
    "indexBounds" : {
        "creationDate" : [ 
            [ 
                ISODate("2012-08-01T00:00:00.000Z"), 
                ISODate("2012-08-02T00:00:00.000Z")
            ]
        ],
        "detection.position" : [ 
            [ 
                "2f0332302", 
                "2f0332303"
            ], 
            [ 
                "2f0332312", 
                "2f0332312"
            ], 
            [ 
                "2f03323123", 
                "2f03323123"
            ], 
            [ 
                "2f033231233", 
                "2f033231234"
            ], 
            [ 
                "2f0332313", 
                "2f0332314"
            ], 
            [ 
                "2f0332320", 
                "2f0332321"
            ], 
            [ 
                "2f0332321", 
                "2f0332322"
            ], 
            [ 
                "2f0332322", 
                "2f0332322"
            ], 
            [ 
                "2f03323220", 
                "2f03323221"
            ], 
            [ 
                "2f0332323", 
                "2f0332324"
            ], 
            [ 
                "2f0332330", 
                "2f0332331"
            ], 
            [ 
                "2f0332331", 
                "2f0332332"
            ]
        ]
    },
    "server" : "fmwd0012:27017",
    "filterSet" : false,
    "stats" : {
        "type" : "LIMIT",
        "works" : 1752,
        "yields" : 14,
        "unyields" : 14,
        "invalidates" : 0,
        "advanced" : 1000,
        "needTime" : 752,
        "needFetch" : 0,
        "isEOF" : 1,
        "children" : [ 
            {
                "type" : "FETCH",
                "works" : 1752,
                "yields" : 14,
                "unyields" : 14,
                "invalidates" : 0,
                "advanced" : 1000,
                "needTime" : 752,
                "needFetch" : 0,
                "isEOF" : 0,
                "alreadyHasObj" : 0,
                "forcedFetches" : 0,
                "matchTested" : 1000,
                "children" : [ 
                    {
                        "type" : "IXSCAN",
                        "works" : 1752,
                        "yields" : 14,
                        "unyields" : 14,
                        "invalidates" : 0,
                        "advanced" : 1752,
                        "needTime" : 0,
                        "needFetch" : 0,
                        "isEOF" : 0,
                        "keyPattern" : "{ creationDate: 1, detection.position: \"2dsphere\" }",
                        "boundsVerbose" : "field #0['creationDate']: (new Date(1343779200000), new Date(1343865600000)), field #1['detection.position']: [\"2f0332302\", \"2f0332303\"), [\"2f0332312\", \"2f0332312\"], [\"2f03323123\", \"2f03323123\"], [\"2f033231233\", \"2f033231234\"), [\"2f0332313\", \"2f0332314\"), [\"2f0332320\", \"2f0332321\"), [\"2f0332321\", \"2f0332322\"), [\"2f0332322\", \"2f0332322\"], [\"2f03323220\", \"2f03323221\"), [\"2f0332323\", \"2f0332324\"), [\"2f0332330\", \"2f0332331\"), [\"2f0332331\", \"2f0332332\")",
                        "isMultiKey" : 0,
                        "yieldMovedCursor" : 0,
                        "dupsTested" : 0,
                        "dupsDropped" : 0,
                        "seenInvalidated" : 0,
                        "matchTested" : 0,
                        "keysExamined" : 47634,
                        "children" : []
                    }
                ]
            }
        ]
    }
}

Thanks for the help.



 Comments   
Comment by Ulrich VANDENHEKKE [ 12/May/14 ]

Thanks for the response,

I now understand the result. Is it possible to force sparse to false for the new version of 2dsphere index ? A position is always defined for each documents.

Thanks.

Comment by David Storch [ 07/May/14 ]

Hi phoenix741,

Thanks for the detailed ticket. I'll try to answer your questions below. Please do let me know if I've missed anything or if you have any further questions. Note that I am answering all of your questions based on the behavior of MongoDB version 2.6.1, the latest stable production release.

Case 1

With the old "2d" index the query on position+date don't use the 2d index but only the index on creationDate.

In order to investigate this, I began by constructing the following indices against an empty collection:

> db.case1.drop();
> db.case1.ensureIndex({"detection.position": "2d", "creationDate": 1});
> db.case1.ensureIndex({"creationDate": 1});

With these indices, I ran the following query:

db.case1.find({"detection.position": { $geoWithin: { $box: [ [ 0.7724536441932998, 50.61956018018893 ], [ 2.483031806726593, 51.42851043057254 ] ] } }, creationDate: { $gt: new ISODate("20120801T000000"), $lt: ISODate("20120802T000000") } }).sort({ creationDate: 1 }).limit(1000)

I confirmed from looking at verbose logs that the query optimizer generates two query plans: one using {"detection.position": "2d", "creationDate": 1} and one using {"creationDate": 1}. The plan using the 2d index has the advantage of using an index to answer the $geoWithin. However, the {"creationDate": 1} index will return results in the requested sort order, whereas the plan using the 2d index will have to perform an explicit in-memory sort. Therefore, depending on your data, the optimizer could very well prefer using {"creationDate": 1} instead of {"detection.position": "2d", "creationDate": 1}.

Note also that in nightly unstable builds, this query fails with the following assertion:

Assertion failure bits <= 32

This is a known issue being tracked in SERVER-13701, which should be fixed soon.

If i force using the 2d index with hint the query doesn't respond (i have +16Go of data). I think that the index isn't used.

I ran the query again with the hint, in a collection with the same two indices:

db.case1.find({"detection.position": { $geoWithin: { $box: [ [ 0.7724536441932998, 50.61956018018893 ], [ 2.483031806726593, 51.42851043057254 ] ] } }, creationDate: { $gt: new ISODate("20120801T000000"), $lt: ISODate("20120802T000000") } }).sort({ creationDate: 1 }).hint('detection.position_2d_creationDate_1').limit(1000).explain()

The explain and log output shows that this query is indeed using the compound 2d index on "detection.position" and "creationDate", as specified by the hint. The query is very slow in this case because the plan has to do an explicit in memory-sort, as I described above. Even though there is a limit of 1000 applied, the query must continue scanning through all matching documents in order to find the first 1000 according to the sort order.

To understand this better, say that we have a big array of numbers, such as [1, 9, 8, 3, 4, 5, ... ]. If you were asked to find the 1000 smallest numbers in the array that are greater than 3, you would have look at all numbers greater than 3 in order to be sure that you have found the 1000 smallest. On the other hand, if you are getting the numbers in sorted order, e.g. [1, 3, 4, 5, 8, 9, ...], then you would be able to stop scanning as soon as you hit 1000 matches.

Case 2

I created a collection with the "2dsphere" index you describe:

> db.case2.drop();
> db.case2.ensureIndex({"creationDate": 1, "detection.position": "2dsphere"});

2.6.1 of the database will construct a Version 2 2dsphere index. (The documentation for 2dsphere indices is here: http://docs.mongodb.org/manual/core/2dsphere/.) A Version 2 2dsphere index is always sparse. This means that documents which do not have the 2dsphere field, namely "detection.position", will not be inserted into the index. 2dsphere indices in previous versions would simply reject such documents, which is why the sparseness property is desirable. A consequence of the sparseness property is that a Version 2 2dsphere index cannot be used to answer queries which are missing a geo predicate over the 2dsphere field. Your query has predicates over "creationDate", but does not have a geo predicate over "detection.position":

db.case2.find({creationDate: { $gt: new ISODate("20120801T000000"), $lt: ISODate("20120802T000000") }}).sort({ creationDate: 1 }).limit(1000).explain()

As such, it cannot use the 2dsphere index in order to answer the predicates over "creationDate". This is why your query takes a very long time---it has to examine the entire collection! The explain() output does show that the query is using the 2dsphere index, but this is only because the query engine is scanning the entire index in order to get the documents back in sorted order (again, to avoid an in-memory sort). In order to support queries over "creationDate" alone, I would recommend rebuilding a non-geo index over "creationDate", e.g. {"creationDate": 1}.

Let me provide an example. Suppose that we allowed the query {"creationDate": "tuesday"} to use the index {"creationDate": 1, "detection.position": "2dsphere"}. Also suppose that the collection has the following documents:

{_id: 1, "creationDate": "wednesday"}
{_id: 2, "creationDate": "tuesday"}

Neither of these two documents will be present in the index, due to the sparseness property of Version 2 2dsphere indices. This means that if we tried to lookup the value "tuesday" in the index, we would return zero results, rather than properly returning the document {_id: 2, "creationDate": "tuesday"}.

One last thing: keep in mind that "2d" and "2dsphere" are used for different purposes. You should use "2d" for data on a flat, two-dimensional plane. On the other hand, "2dsphere" is used for positions on a sphere such as (latitude, longitude) data.

Case 3

If i make query that search by position, the query is very quick (so no problem, it is better than 2.4)

Great, I'm glad that things are working well in this case.

Generated at Thu Feb 08 03:33:01 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.