[SERVER-16505] .explain on query with $gt never finish Created: 11/Dec/14 Updated: 15/May/15 Resolved: 15/May/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance, Querying |
| Affects Version/s: | 2.6.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Hendrik Swanepoel | Assignee: | Ramon Fernandez Marina |
| Resolution: | Cannot Reproduce | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Operating System: | ALL |
| Participants: |
| Description |
|
We have a collection with approximately 5 million records. When we query the collection with
we get the cursor back very quickly. Similar with a count. When we add a .explain() to the query, it never completes (we waited for 30 minutes) and our disks go beserk with read activity. When we change the query to
and run it with .explain(), it behaves as expected. As an aside: We had issues creating that index in production, where we've got replica sets and sharding (and 10 mil records). We pulled some of our data into a standalone instance in order to play around with the index without affecting production. When we had issues in production, it had similar symptoms (disks would be slammed with read activity). We didn't run .explain() to trigger our production issue, but it seems spookily similar to this issue. Just deleting the index would cause the system to operate normally again. |
| Comments |
| Comment by Ramon Fernandez Marina [ 20/Mar/15 ] | |||||||||||||||||||||||||||||||||||
|
The "explain" output with limit(1) is not very useful as it shows that the query is doing a collection scan (which is fast enough for one document). As for the "query_plan.json" output, it shows:
This looks like a large index scan was performed, followed like lots of fetching from the disk; in other words, the data was "cold", and the server yielded many times while retrieving data from disk. The part that I haven't seen before is the "Date()" in the index bounds:
I'll look into this, but so far I haven't had any luck reproducing. hendrik@22seven.com, does a repeated run of the query with explain(true) show the same problem after the data has been paged into memory? | |||||||||||||||||||||||||||||||||||
| Comment by Hendrik Swanepoel [ 13/Jan/15 ] | |||||||||||||||||||||||||||||||||||
|
I attached a file containing the output after letting it run for 50 minutes. It seems like it's hitting the index, but then scanning all the documents? (query_plan.json) | |||||||||||||||||||||||||||||||||||
| Comment by Mor G [ 11/Jan/15 ] | |||||||||||||||||||||||||||||||||||
|
Happens to me too. db.collection_name.find() - returns normally Attached the output of the explain result with limit(1) | |||||||||||||||||||||||||||||||||||
| Comment by Mor G [ 11/Jan/15 ] | |||||||||||||||||||||||||||||||||||
|
Explained output | |||||||||||||||||||||||||||||||||||
| Comment by Ramon Fernandez Marina [ 11/Dec/14 ] | |||||||||||||||||||||||||||||||||||
|
hendrikswan, can you clarify whether this issue appears on a standalone instance? I'm unable to reproduce this behavior on my end:
How do your documents look like and what other indexes do you have? My guess is that the query planner is using an index that, while it appears it's a better choice, ends up being sub-optimal and causes a table scan somewhere. Can you try adding a limit(1) to the query with explain() and post the output? Here's the equivalent for the example above:
|