[SERVER-14264] Compound index on 2dsphere and datetime very slow in 2.6.1 and different result than in 2.4.10 Created: 16/Jun/14  Updated: 14/Apr/16  Resolved: 13/Feb/15

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.6.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Fabian Fülling [X] Assignee: Siyuan Zhou
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-13687 Results of $near query on compound mu... Closed
duplicates SERVER-17279 Intersect bounds over non-geo field o... Closed
Related
is related to SERVER-16042 Optimise $all/$and to select smallest... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: RPL 0 3/13/15
Participants:

 Description   

I have a collection with documents, which contain a datetime field 'from' (and also 'to', but that was not used in queries) and multiple polygons.

Until Version 2.4.10 I solved that by having an array of geometry subdocuments in "location.geometry".
The index looked like that and worked very well. I had query times of ~50ms

{from:1, 'location.geometry' : '2dsphere'}

Version 2.6 supports MultiPolygons and it seems, that my workaround of manual "multi polygons" does not work anymore. The query times are much slower now.
I converted my polygons to use MultiPolygons in the field "geometry" and now I try to get the queries as fast as in 2.4.

The pure geo query: (on 2.6.1 the geo field name is "geometry")

{
	"location.geometry" : {
		"$near" : {
			"$geometry" : {
				"type" : "Point",
				"coordinates" : [
					13.4059717,
					52.5208876
				]
			},
			"$maxDistance" : 1
		}
	}
}

Both versions return 4599 results.
It's faster on 2.6.1. Here it takes 548 ms against 1691 ms on 2.4.10

But when I query for the date and the location, it get's problematic. The full query is: (on 2.6.1 the geo field name is "geometry")

{
	"from" : {
		"$gte" : ISODate("2014-06-19T00:00:00Z"),
		"$lt" : ISODate("2014-06-20T00:00:00Z")
	},
	"location.geometry" : {
		"$near" : {
			"$geometry" : {
				"type" : "Point",
				"coordinates" : [
					13.4059717,
					52.5208876
				]
			},
			"$maxDistance" : 1
		}
	}
}

On 2.4.10 the query takes only 49 ms:

{
	"cursor" : "S2NearCursor",
	"isMultiKey" : true,
	"n" : 55,
	"nscannedObjects" : 55,
	"nscanned" : 6226,
	"nscannedObjectsAllPlans" : 55,
	"nscannedAllPlans" : 6226,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 49,
	"indexBounds" : {
 
	},
	"nscanned" : 6226,
	"matchTested" : NumberLong(3189),
	"geoMatchTested" : NumberLong(56),
	"numShells" : NumberLong(2),
	"keyGeoSkip" : NumberLong(3027),
	"returnSkip" : NumberLong(0),
	"btreeDups" : NumberLong(10),
	"inAnnulusTested" : NumberLong(56)
}

But on 2.6.1 the query takes ~200ms and the number of results differs a lot:

{
	"cursor" : "BtreeCursor from_1_geometry_2dsphere",
	"isMultiKey" : true,
	"n" : 209,
	"nscannedObjects" : 6805,
	"nscanned" : 262252,
	"nscannedObjectsAllPlans" : 6805,
	"nscannedAllPlans" : 262252,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 2048,
	"nChunkSkips" : 0,
	"millis" : 205,
	"indexBounds" : {
		"from" : [
			[
				true,
				ISODate("2014-06-20T00:00:00Z")
			]
		],
		"geometry" : [
			[
				{
					"$minElement" : 1
				},
				{
					"$maxElement" : 1
				}
			]
		]
	},
	"filterSet" : false
}

The indexBounds look a bit strange. If I instead query just for a date (with an index on

{from:1}

the bounds look correct:

	"indexBounds" : {
		"from" : [
			[
				ISODate("2014-06-19T00:00:00Z"),
				ISODate("2014-06-20T00:00:00Z")
			]
		]
	}

What is the correct way, to index and query a collection for a specific date and geo location?
Or how can I optimize the data structure?

Thanks a lot!
Fabian



 Comments   
Comment by Siyuan Zhou [ 13/Feb/15 ]

Closing as a dup of SERVER-17279.

Comment by Siyuan Zhou [ 13/Feb/15 ]

FabFuel,

Thank you very much for pointing out this problem and providing the data. We're able to reproduce this issue and we can confirm that this is caused by the different behavior of 2.4 and 2.6 (also 3.0).

The bounds of "from" can be improved to intersect "$gte" and "$lt", as you mentioned. Unfortunately, we don't have that yet. SERVER-17279 has been filed for the performance improvement. There is no workaround currently.

Thanks,
Siyuan

Comment by David Storch [ 18/Sep/14 ]

Hi FabFuel,

My apologies for the delay in getting to this. My colleague siyuan.zhou@10gen.com is currently working on performance of geo queries, so I'm reassigning this to him. Please continue to watch this ticket for updates, as we hope to make some progress soon.

Best,
Dave

Comment by David Storch [ 11/Aug/14 ]

Thanks FabFuel. We have received the data successfully, and will update you when we have more info.

Comment by Fabian Fülling [X] [ 11/Aug/14 ]

Hi david.storch,

I have uploaded the dataset.

Best
Fabian

Comment by David Storch [ 11/Aug/14 ]

Hi FabFuel,

I have re-opened the ticket and sent you instructions for transferring the data set. I'm going to put this in the "Waiting for User Input" state pending the data. Thanks for your help on this!

Best,
Dave

Comment by Fabian Fülling [X] [ 09/Aug/14 ]

Hi david.storch,

sorry for the delay!

The issue is still present, it has nothing to do with the write command, it's just about reading or using the indexes.
I have prepared a dataset with 36.415 documents to reproduce the issue. How can I share it with you privately?

I have downloaded 2.6.3 and 2.4.10, started fresh mongod instances and did a benchmark with this query:

{
	"from" : {
		"$gte" : ISODate("2014-08-19T00:00:00Z"),
		"$lt" : ISODate("2014-08-20T00:00:00Z")
	},
	"location.geometry" : {
		"$near" : {
			"$geometry" : {
				"type" : "Point",
				"coordinates" : [
					13.4059717,
					52.5208876
				]
			},
			"$maxDistance" : 1
		}
	}
}

These are the results, which can be reproduced:

on 2.6.3

> db.timeslots.find(q).explain()
{
	"cursor" : "S2NearCursor",
	"isMultiKey" : false,
	"n" : 78,
	"nscannedObjects" : 11626,
	"nscanned" : 11626,
	"nscannedObjectsAllPlans" : 11626,
	"nscannedAllPlans" : 11626,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 90,
	"nChunkSkips" : 0,
	"millis" : 312,
	"indexBounds" : {
 
	},
	"filterSet" : false
}

and 2.4.10 is upto 7x faster:

> db.timeslots.find(q).explain()
{
	"cursor" : "S2NearCursor",
	"isMultiKey" : true,
	"n" : 78,
	"nscannedObjects" : 78,
	"nscanned" : 10694,
	"nscannedObjectsAllPlans" : 78,
	"nscannedAllPlans" : 10694,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 45,
	"indexBounds" : {
 
	},
	"nscanned" : 10694,
	"matchTested" : NumberLong(4076),
	"geoMatchTested" : NumberLong(81),
	"numShells" : NumberLong(2),
	"keyGeoSkip" : NumberLong(6585),
	"returnSkip" : NumberLong(0),
	"btreeDups" : NumberLong(33),
	"inAnnulusTested" : NumberLong(81)
}

For me it seems, that the "from" condition is not used to reduce the dataset before using the S2NearCursor.

Best
Fabian

PS: I'm not allowed to reopen the ticket

Comment by David Storch [ 08/Aug/14 ]

Hi FabFuel,

I have marked this ticket as resolved due to inactivity. If you would like to keep working with us to diagnose this issue, please do not hesitate to reopen.

Best,
Dave

Comment by David Storch [ 07/Aug/14 ]

Hi FabFuel,

We haven't heard from you in a while, so I just wanted to check in. Are you still affected by this issue? Would you be able to share a dataset with us as described above by my colleague Thomas?

Best,
Dave

Comment by Thomas Rueckstiess [ 24/Jul/14 ]

Hi Fabian,

I've discussed this with David and we came to the conclusion that the repro script he posted is actually not a valid reproduction of the issue. The reason why the above script is slower in 2.6. is related to the introduction of the new write commands, which always wait for completion. This means the loop runs slower and 2.6 creates more distinct different date values.

We confirmed this with the distinct command.

In 2.4. we get

> t.distinct("from").length
306

And in 2.6 we get

> t.distinct("from").length
4038

When we add a sleep(1) to the code after each insert, or randomize the dates, the difference is no longer there and we get roughly the same query times for both versions. This means we do not have a way to reproduce the insert difference you are describing.

Is this still an issue for you? Are you able to share a test dataset with us that shows the slower queries on 2.6? We can arrange for the dataset to be shared privately if you like.

Regards,
Thomas

Comment by David Storch [ 18/Jun/14 ]

Hi FabFuel,

Thanks for providing the additional information. I can reproduce the slower execution in 2.6.2 as opposed to 2.4.10 with the following simple script:

var t = db.t;
t.drop();
 
t.ensureIndex({from: 1, geometry: "2dsphere"});
 
var start = new Date();
 
for (var i = 0; i < 10000; i++) {
    var inc = 0.001 * i;
    var x = 13.4 + inc;
    var y = 52.5 + inc;
    t.insert({from: new Date(), geometry: {type: "Point", coordinates: [x, y]}});
}
 
var end = new Date();
 
printjson(t.find({
    "from" : {
        "$gte" : start,
        "$lt" : end
    },
    "geometry" : {
        "$near" : {
            "$geometry" : {
                "type" : "Point",
                "coordinates" : [
                    13.4,
                    52.5
                ]
            }
        }
    }
}).explain());

I can only reproduce the problem when the "from" values are dates---the repro does not seem to work when they are numbers. I am going to do some further investigation and get back to you when I have more info.

Thanks,
Dave

Comment by Fabian Fülling [X] [ 17/Jun/14 ]

Hi Dave,

thank you very much for the quick reply!
I upgraded to 2.6.2 and now the result count is correct again!

But the query is still 4x slower than in 2.4.10 and it seems like the btree is not used at all.

This is the explain:

{
	"cursor" : "S2NearCursor",
	"isMultiKey" : false,
	"n" : 55,
	"nscannedObjects" : 6550,
	"nscanned" : 6550,
	"nscannedObjectsAllPlans" : 6550,
	"nscannedAllPlans" : 6550,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 51,
	"nChunkSkips" : 0,
	"millis" : 223,
	"indexBounds" : {
 
	},
	"filterSet" : false
}

With the MultiPolygon structure form 2.6, the index is not multiKey. This is the index:

	{
		"v" : 1,
		"key" : {
			"from" : 1,
			"geometry" : "2dsphere"
		},
		"name" : "from_1_geometry_2dsphere",
		"ns" : "de_live.timeslots",
		"2dsphereIndexVersion" : 2
	}

There is no other 2dsphere index in this collection.

This is the query:

{
	"from" : {
		"$gte" : ISODate("2014-06-19T00:00:00Z"),
		"$lt" : ISODate("2014-06-20T00:00:00Z")
	},
	"geometry" : {
		"$near" : {
			"$geometry" : {
				"type" : "Point",
				"coordinates" : [
					13.4059717,
					52.5208876
				]
			},
			"$maxDistance" : 1
		}
	}
}

Thanks in advance
Fabian

Comment by David Storch [ 17/Jun/14 ]

Hi FabFuel,

Thanks very much for the bug report. It looks like you are running into a known issue that affects versions 2.6.0 and 2.6.1: see SERVER-13687. The fix is already available in the 2.6.2 release. Would you be able to test 2.6.2 and confirm that the issue is fixed? I'm going to resolve as a duplicate, but please reopen if you experience further related problems in 2.6.2.

Just to confirm, the {from:1, 'location.geometry' : '2dsphere'} index is multikey, correct?

Best,
Dave

Generated at Thu Feb 08 03:34:19 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.