[SERVER-54797] Consider behaving similarly for non-positive limits before and after optimizations Created: 25/Feb/21 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Question | Priority: | Major - P3 |
| Reporter: | Mathias Stearn | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Query Optimization
|
| Sprint: | Query Execution 2021-03-08, Query Execution 2021-03-22 |
| Participants: |
| Description |
|
We currently reject {$limit: 0}:
But we allow it to happen after optimizations:
This seems inconsistent. Either we should reject everything like this since it is likely to indicate user error, or we should allow both. I haven't tested on a sharded system, but I could imagine the current behavior causing an issue there since we would serialize a {$limit:0} stage after optimization on mongos. |
| Comments |
| Comment by Mathias Stearn [ 04/Jan/22 ] | |||||
|
kevin.morte@hotmail.com {$limit: 0} meaning "no limit" seems quite wrong to me. If we are to allow 0, it should mean the same as any other number in that position, namely that at most 0 (or 1, 2, 3, etc) documents are allowed to progress from the $limit stage into the next one. Anything else would be inconsistent. If we really need to allow users to express "no limit" explicitly when using a $limit stage, I think {$limit: null} is a better way to express that, since null is a better way to express the lack of a quantity, while zero is still a specific quantity. Using the bank account example, balance: €0 means that you have a an account with 0 Euro in it. But you still have an account, and fees may drag it negative. balance: null on the other hand may be a way to signal that you don't even have an account (Not actually suggesting that this is a good schema though). | |||||
| Comment by kevin Morte i Piferrer [ 20/Dec/21 ] | |||||
|
I'm probably the least knowlegable in the conversation, but I sincerly believe than any thing else as allowing limit: 0 to operate as no limit was placed is wrong. Apart from the great convenience from a development perspective, I truely believe it has great semantics. The number zero is equiparable for almost every culture in the world as the concept of "nothing", "nada", "nichts", "res", etc. So when we state on a logic, limit(0) is very understandable that the writter means "no limits". And regarding a query would perfectly convey the idea of "perform no limits to this query". The only way a can understand others disagreeing is if somehow the concept of the method "limit" is misunderstood for a concept like batch or bulk. Only in that case, a batch of zero would correctly mean "I want to fetch zero items" or "I want to fetch nothing", which, obviously, would have not much sense. We would all agree that If I say I have a banc account with zero euros means that I don't have money in that account. In our case, a limit of zero would mean that there is that I don't have a limit for the query. Finally, I would also thing that anything different from a positive integer should mean no limit. Just as a saveguard of a retrieval. But allowing zero as no limit should be enough for an efficient writting. Thanks | |||||
| Comment by Mathias Stearn [ 06/Apr/21 ] | |||||
|
I think part of the problem is that we are lacking a mechanism for warnings and/or hints. So for every possible pipeline we need to decide between accept or reject, even when we know that something is probably a typo or thinko. Ideally, we would have a way of saying to the user "hey, I know what your request was syntactically valid, but are you really sure that is what you mean to do because it seems kinda silly." I know we can print to the logs, but that seems like a poor solution for this kind of thing since A) the person reading the log may not be the person writing the query, and even if it was, it would be better if they get the feedback "inline" rather than somewhere else and B) this query could be run 1000 times per second and we definitely don't want to log every time. {$limit:0} Seems like something that falls in that category of being likely an indicator of a bug, but also, something that seems like it should be syntactically valid. The case for {$limit: -1} being valid is a bit weaker, but I'm also not 100% convinced that it should be syntactically invalid any more than {$limit:2}, {$skip:3} should be. Both seem like things that would be handled as warnings if we had a warning mechanism. And it feels more natural to have warnings that depend on optimizations than errors. But since we only have errors, it doesn't seems completely wrong to have errors that trigger or not based on optimizations. I'm pretty sure we have some cases like this already, or at least some cases where internal implementation details decide whether you see an error or not based on the order that we happen to evaluate things. | |||||
| Comment by David Percy [ 01/Apr/21 ] | |||||
If we decided that {$limit:2}, {$skip:2} should be an error, then should {$limit:2}, {$match:{x:'abc'}}, {$skip:2} also be an error? Is it ok for the error to depend on how much optimization happens?
{$limit: 0} seems like a useful tool for machine-generated queries. For example:
If the application plugs in 0 for one of those $limits, the query overall can still be useful. | |||||
| Comment by Anton Korshunov [ 18/Mar/21 ] | |||||
|
I can't say I agree with that. In case of {$limit: 0} it's a straight validation error because we don't want users to be able to set 0 limit (mainly due to semantic concerns, as described in What's interesting, we do allow limit(0) in find, which means no limit, and this is where the real inconsistency is, but this is just one of many MQL quirks. | |||||
| Comment by Mathias Stearn [ 18/Mar/21 ] | |||||
|
To me, the inconsistency is that {$limit:2}, {$skip:2} seems like exactly as much an indication of user-error as {$limit: 0} is. So I think we should either error on both or neither. I suppose there is an argument either for only locally rather than globally checking for likely errors, or that this is simply the input validation of $limit, but both feel a bit like arguing from implementation that rational design. | |||||
| Comment by Anton Korshunov [ 18/Mar/21 ] | |||||
|
I don't think there is any inconsistency here, but there is definitely a room for improvement. Firstly, when [{$limit:0}] is specified explicitly by the user, we consider it as an error, as we expect only positive values. In the same time, there is nothing wrong from a user perspective with specifying [{$limit:2}, {$skip:3}]. In other words, explicitly setting zero limit, and issuing a contradictory query is not the same thing. We could go unoptimized and do a LIMIT 2 followed by SKIP 3 and still would get the same empty result. At present, when we push the limit/skip down into the find layer, we translate it into a LIMIT 0/SKIP N plan. However, we could, and perhaps should, generate an EOF sub-plan instead, so that there no LIMIT 0 at all. We can also be smarter on mongos and avoid establishing cursors to the shards if we know upfront no result will be returned. I'm not sure though what it would involve to produce an EOF-style pipeline on mongos though. | |||||
| Comment by Charlie Swanson [ 25/Feb/21 ] | |||||
|
anton.korshunov can you or someone on your team see if there's a real bug here with sharding? |