[SERVER-6099] Advanced Queries $gt and $lt take longer in 'indexed' collection Created: 15/Jun/12 Updated: 11/Jul/16 Resolved: 26/Jun/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | 2.0.5 |
| Fix Version/s: | None |
| Type: | Question | Priority: | Critical - P2 |
| Reporter: | btd5nerds | Assignee: | Kristina Chodorow (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | query | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
linux |
||
| Attachments: |
|
| Participants: |
| Description |
|
My collection has 1 million data. I am performing this query to retrieve In case 1, the pid (player id) of the collection isn't indexed. Everything is still ok. BUT in case 2, i indexed pid, which supposed to provide a faster query. However, I have 2 questions here: |
| Comments |
| Comment by Kristina Chodorow (Inactive) [ 26/Jun/12 ] | |
|
Great! | |
| Comment by btd5nerds [ 26/Jun/12 ] | |
|
oO. This makes everything clearer! Kristina, thanks a lot! This can be considered resolved. | |
| Comment by Kristina Chodorow (Inactive) [ 26/Jun/12 ] | |
|
You're welcome! Sorry it's taken me a while to get back to you, I've been trying to figure this out. The likelihood of a query yielding is proportional to how long it runs and the two $or clauses are treated as separate queries. Later $or clauses can be slower than earlier ones because they do the de-duping. So, if >700k is slower and second, it is more likely to hit the yield threshold. Also, yielding will only occur when there's a write waiting to happen. I'm not sure what your system is doing, but that might be a factor. | |
| Comment by btd5nerds [ 22/Jun/12 ] | |
|
oO. That's a great info. Anyway, thanks for the explanation. | |
| Comment by Kristina Chodorow (Inactive) [ 22/Jun/12 ] | |
|
Sure! MongoDB has fairly coarse locking (in 2.0, one read/write lock per process, in 2.2 it'll be one R/W lock per database). To prevent writes from getting stuck behind long-running reads, queries occasionally yield the read lock so some writes can go, then they'll grab it again. The nyields field in explain() output shows you how many times a query yielded the read lock. | |
| Comment by btd5nerds [ 22/Jun/12 ] | |
|
oO. I think you have just mentioned the term that's unfamiliar to me: yield. Could you elaborate a little bit what is yield? I have searched around but didn't get a clear picture about it. Thanks in advance. | |
| Comment by Kristina Chodorow (Inactive) [ 22/Jun/12 ] | |
|
> "2%-40%" Yeah, it varies on data/access patterns/query/indexes, sorry :-/ I'd generally time both for anything returning more than 20%. > I thought they are supposed to consume the similar period of time, but they don't (and the difference is 2 > times). Is this expected? At least in the example you gave, (2) yields once, which would explain the difference in time. Does (2) consistently yield (nyields>0) and (1) does not? If it does not always yield, is the difference in query time present when nyields=0? | |
| Comment by btd5nerds [ 22/Jun/12 ] | |
|
First, thanks for the reply. "2%-40%" | |
| Comment by Kristina Chodorow (Inactive) [ 21/Jun/12 ] | |
|
It looks like the issue is that using an index is less efficient that a table scan when you're returning a large hunk of your data set, which is a known limitation of databases in general (relational and non-relational). The overhead of going from index entry to doc for each element match is not worth the overhead once you're returning a certain percent of your data (estimates vary from 2%-40% of your data... kind of depends on what you're doing). You can use .hint({$natural:1}) to force a table scan. | |
| Comment by btd5nerds [ 18/Jun/12 ] | |
|
fyi, the 'gcid' in the image is actually the 'pid' (i just rename it) | |
| Comment by btd5nerds [ 18/Jun/12 ] | |
|
Hi Kristina, I have uploaded 3 images:
b) indexOr.bmp
c) indexOr2.bmp
The difference of b) and c) can grow very large if collection is expanded. | |
| Comment by Kristina Chodorow (Inactive) [ 15/Jun/12 ] | |
|
Can you do an explain on each of the options? E.g.,
and so on? |