[SERVER-4565] Count() takes exceptionally long time/hangs Created: 27/Dec/11  Updated: 06/Apr/23  Resolved: 17/Jan/12

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.0.2
Fix Version/s: None

Type: Question Priority: Minor - P4
Reporter: Brett Goldstein Assignee: Bill Hayward
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Red Hat Linux Enterprise


Attachments: File server-4565.js    
Participants:

 Description   

Having a problem with following query:

db.test.count({"location" : {"$within" : {"$center" : [[ -87, 41], 5]}}, "initial_type" : "x"})

I built an index for geospatial and the above call type. Coordinates are reduced for privacy reasons fyi...

The find() returns exceptionally fast...but the count above just hangs. Any quick thoughts? thx



 Comments   
Comment by Richard Kreuter (Inactive) [ 28/Dec/11 ]

Yes, please make a new issue for the separate topic.

Comment by Brett Goldstein [ 28/Dec/11 ]

i have some questions about faults....should i open a new thread?

Comment by Brett Goldstein [ 28/Dec/11 ]

got the diff between the count and find ...the issue was too much data within the search radius.

what we are doing is lab testing mongodo to be the aggregation point of very large sets of disparate spatial data. the intent would be to run a mapping layer on top to pass a coordinate, data para, radius to return everything in that area. we may also layer on some real-time data...whether it goes to mongo first or comes from a geospatial SDE is TBD. make sense?

Comment by Richard Kreuter (Inactive) [ 28/Dec/11 ]

@Brett: are you saying that find() returns a cursor immediately, or that iterating through all the documents that match is seemingly immediate? The count and the query both traverse a subset of the index to see what documents match the query, but count has to do so immediately, whereas the query only steps through the index as the client requests documents from the server by iterating the cursor. So if you're comparing the performance of

var cnt = db.test.count(...);

and

var cur = db.test.find(...);

then you're comparing very different things.

Anyway, I'm still curious what you're trying to do with this geo query. To reiterate, that radius of 5 in the $within part of your query specifies a circle of about 300-350 miles around your center point. That's a pretty big space, relatively speaking; how much of your data set falls within that area?

Comment by Eliot Horowitz (Inactive) [ 28/Dec/11 ]

Are these lat/long?
If so - 5 is very large - 300 miles or so.

Comment by Brett Goldstein [ 27/Dec/11 ]

ok...been playing with this. find() returns almost instantly...and count() does not (and i am just adding .count() to the end of the query. however, when i radically reduce the radius, the return improves dramatically. my first move was from 5 -> 1 and that did not impact much. However,moving from1 -> .001 has a 1 second return with count() = ~ 11k points.

Comment by Richard Kreuter (Inactive) [ 27/Dec/11 ]

Attempt to reproduce problem described, with randomly-generated data. Possibly meaningless.

Comment by Richard Kreuter (Inactive) [ 27/Dec/11 ]

Have you compared how long it takes to iterate through all the documents that match the find() to how long it takes for count() to return? In local testing (with a randomly generated data set but identical geospatial index), the count() takes a similar amount of time as iterating through all the documents from the find().

(Additionally, note that the geo part of your query searches 5 degrees of arc over a hypothetical great circle on the earth, or about a 350 mile radius around [-87, 41]. If you have a lot of points within that radius and if any specific value for initial_type is relatively sparsely distributed within that search space, the "haystack" index may be more suitable. See:

http://www.mongodb.org/display/DOCS/Geospatial+Haystack+Indexing

But it's also possible that the geo search isn't necessary at all. Could you say any more about what you're trying to find?)

Comment by Brett Goldstein [ 27/Dec/11 ]

originally when i first saw the count() issue it was 24 million...now 44808486

Comment by Dwight Merriman [ 27/Dec/11 ]

what does db.test.count() return?

Generated at Thu Feb 08 03:06:21 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.