[SERVER-20614] Add a new method to "quickly jump" documents in a cursor (that has been skipped and already iterated). Created: 22/Sep/15  Updated: 06/Oct/15  Resolved: 05/Oct/15

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 3.0.4
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Andrea Di Cesare Assignee: Unassigned
Resolution: Won't Fix Votes: 0
Labels: feature
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

Add a new method to "quickly jump" documents in a cursor (that has been skipped and already iterated).

This might be done either allowing to invoke skip() twice (in a way that the second time it starts skipping from current position) or providing a method similar to next() that does not actually return data (at least avoiding network traffic between MongoDB and the client).

The usage scenario is the following:

We have a huge collection and created a cursor skipping N documents (being N big, say 10 millions). We retrieve some documents, and now we need to jump another M documents.

We have two options:
1- Create another cursor and skipping it N+M times;
2- Invoke next() M times on the original cursor.

Being N big, option 1 results slow.

On the other side, next() is less efficient than skip() to "just jump" documents. Our tests suggest that using next() M times (option 2) is worthwhile only if M is less than 10% of N.

I'm from the RESTHeart team (http://restheart.org), and we met this need implementing a DBCursor Pool to speedup queries on collections (that already speedups things up to 1000% according to performance tests).

For some reference about the idea, have a look at: https://softinstigate.atlassian.net/wiki/x/h4CM

Performance test results:
https://softinstigate.atlassian.net/wiki/x/gICM

For some code:
https://github.com/SoftInstigate/restheart/blob/develop/src/main/java/org/restheart/db/DBCursorPool.java



 Comments   
Comment by Andrea Di Cesare [ 06/Oct/15 ]

Hi Jason

I think you misunderstood the proposal: it is not about implementing a complex "mechanism to advance a cursor by an arbitrary number of results".

It is rather to have a method similar to OP_NEXT that "advance the cursor by one result at a time" but avoiding to actually send data back to the client over the wire.

This way, there is no need for a complex development, and we can save bandwidth usage (that can be the most expensive task) in the scenarios that I have described.

Comment by J Rassi [ 05/Oct/15 ]

Hi,

If I understand correctly, this feature request is for an efficient mechanism to advance a cursor by an arbitrary number of results. By design, MongoDB cursors are only allowed to be advanced by one result at a time; implementing this feature would be a large undertaking, and I am of the opinion that it's the application's job to implement optimizations such as the ones discussed in the ticket description.

As such, I am resolving this ticket as "Won't Fix".

~ Jason Rassi

Comment by Andrea Di Cesare [ 24/Sep/15 ]

Jeff,

not decoding BSON would be indeed an improvement, even if small compared to the network transfer time.

I really think that what I propose would be a great improvement to MongoDB, since we all know that skip() can have low performance (even the official documentation suggests to avoid it in favor of range-based pagination) and this way we better support optimization strategies such as our cursor pools..

Do you thing it would be possible extending the wire protocol with the OP_JUMP message:

OP_JUMP

struct {
    MsgHeader header;              // standard message header
    int32     ZERO;                      // 0 - reserved for future use
    cstring   fullCollectionName; // "dbname.collectionname"
    int32     *numberToJump;*    // number of documents to jump
    int64     cursorID;                  // cursorID from the OP_REPLY
}

The response would be a OP_REPLY message sent by the database will null documents field and numberReturned being the actual number of jumped documents.

struct {
    MsgHeader header;          // standard message header
    int32     responseFlags;    // bit vector - see details below
    int64     cursorID;              // cursor id if client needs to do get more's
    int32     startingFrom;        // where in the cursor this reply is starting
    int32     numberReturned; // number of documents in the reply
    document* documents;      // documents
}

Then implementing cursor.jump() on the Java driver should be easy....

Thanks for your time

Comment by Jeffrey Yemin [ 24/Sep/15 ]

Andrea,

I don't see how it's possible to avoid the network data transfer time. MongoDB cursors just don't work that way. After the skip on the initial query, there is no way to ask the server to skip again. A driver has to read all documents in order off the wire through a series of get-more messages and responses.

The best we could do would be to avoid the BSON decoding costs by just reading the bytes off the wire but not turning them into DBObject instances.

Comment by Andrea Di Cesare [ 24/Sep/15 ]

Since invoking skip() twice on the same cursor is a no option, I propose to implement the method:

jump(int n) Moves the cursor ahead by n.

as opposed to:

next() Returns the object the cursor is at and moves the cursor ahead by one.

jump() basically does everything next() does but actually retrieving the DBObject to save network data transfer time.

Would this be possible?

Comment by Jeffrey Yemin [ 23/Sep/15 ]

How do you propose that a subsequent skip would be implemented by the driver?

Consider that the skip is part of the OP_QUERY message that the driver sends to the server. See for details. But OP_GET_MORE messages has no such thing, so once the initial skip is done, the driver must retrieve the remaining query results in order.

Let me know if that doesn't jibe with your understanding.

Generated at Thu Feb 08 03:54:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.