[SERVER-20614] Add a new method to "quickly jump" documents in a cursor (that has been skipped and already iterated). Created: 22/Sep/15 Updated: 06/Oct/15 Resolved: 05/Oct/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | 3.0.4 |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Andrea Di Cesare | Assignee: | Unassigned |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | feature | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Participants: |
| Description |
|
Add a new method to "quickly jump" documents in a cursor (that has been skipped and already iterated). This might be done either allowing to invoke skip() twice (in a way that the second time it starts skipping from current position) or providing a method similar to next() that does not actually return data (at least avoiding network traffic between MongoDB and the client). The usage scenario is the following: We have a huge collection and created a cursor skipping N documents (being N big, say 10 millions). We retrieve some documents, and now we need to jump another M documents. We have two options: Being N big, option 1 results slow. On the other side, next() is less efficient than skip() to "just jump" documents. Our tests suggest that using next() M times (option 2) is worthwhile only if M is less than 10% of N. I'm from the RESTHeart team (http://restheart.org), and we met this need implementing a DBCursor Pool to speedup queries on collections (that already speedups things up to 1000% according to performance tests). For some reference about the idea, have a look at: https://softinstigate.atlassian.net/wiki/x/h4CM Performance test results: For some code: |
| Comments |
| Comment by Andrea Di Cesare [ 06/Oct/15 ] | |||||||||||||||
|
Hi Jason I think you misunderstood the proposal: it is not about implementing a complex "mechanism to advance a cursor by an arbitrary number of results". It is rather to have a method similar to OP_NEXT that "advance the cursor by one result at a time" but avoiding to actually send data back to the client over the wire. This way, there is no need for a complex development, and we can save bandwidth usage (that can be the most expensive task) in the scenarios that I have described. | |||||||||||||||
| Comment by J Rassi [ 05/Oct/15 ] | |||||||||||||||
|
Hi, If I understand correctly, this feature request is for an efficient mechanism to advance a cursor by an arbitrary number of results. By design, MongoDB cursors are only allowed to be advanced by one result at a time; implementing this feature would be a large undertaking, and I am of the opinion that it's the application's job to implement optimizations such as the ones discussed in the ticket description. As such, I am resolving this ticket as "Won't Fix". ~ Jason Rassi | |||||||||||||||
| Comment by Andrea Di Cesare [ 24/Sep/15 ] | |||||||||||||||
|
Jeff, not decoding BSON would be indeed an improvement, even if small compared to the network transfer time. I really think that what I propose would be a great improvement to MongoDB, since we all know that skip() can have low performance (even the official documentation suggests to avoid it in favor of range-based pagination) and this way we better support optimization strategies such as our cursor pools.. Do you thing it would be possible extending the wire protocol with the OP_JUMP message: OP_JUMP
The response would be a OP_REPLY message sent by the database will null documents field and numberReturned being the actual number of jumped documents.
Then implementing cursor.jump() on the Java driver should be easy.... Thanks for your time | |||||||||||||||
| Comment by Jeffrey Yemin [ 24/Sep/15 ] | |||||||||||||||
|
Andrea, I don't see how it's possible to avoid the network data transfer time. MongoDB cursors just don't work that way. After the skip on the initial query, there is no way to ask the server to skip again. A driver has to read all documents in order off the wire through a series of get-more messages and responses. The best we could do would be to avoid the BSON decoding costs by just reading the bytes off the wire but not turning them into DBObject instances. | |||||||||||||||
| Comment by Andrea Di Cesare [ 24/Sep/15 ] | |||||||||||||||
|
Since invoking skip() twice on the same cursor is a no option, I propose to implement the method: jump(int n) Moves the cursor ahead by n. as opposed to: next() Returns the object the cursor is at and moves the cursor ahead by one. jump() basically does everything next() does but actually retrieving the DBObject to save network data transfer time. Would this be possible? | |||||||||||||||
| Comment by Jeffrey Yemin [ 23/Sep/15 ] | |||||||||||||||
|
How do you propose that a subsequent skip would be implemented by the driver? Consider that the skip is part of the OP_QUERY message that the driver sends to the server. See for details. But OP_GET_MORE messages has no such thing, so once the initial skip is done, the driver must retrieve the remaining query results in order. Let me know if that doesn't jibe with your understanding. |