[JAVA-344] Cursor.skip should return the number of documents actually skipped Created: 02/May/11  Updated: 07/Mar/14  Resolved: 07/Mar/14

Status: Closed
Project: Java Driver
Component/s: API
Affects Version/s: 2.5.3
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Tal Liron Assignee: Unassigned
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Example: If I call Cursor.skip(100), but there are only 84 documents in the cursor, I would want to be sure that indeed only 84 were skipped.

As it stands now, without this information, Cursor.skip is of limited use, and in many cases documents will have to be read one by one in order to count them.
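
For illustration, the one-by-one workaround mentioned above can be sketched as a small helper that reports how many elements were actually skipped. This is only a client-side sketch over a plain java.util.Iterator; the class and method names (CountingSkip, skip) are hypothetical and not part of the driver API, and every skipped element is still fetched, which is exactly the cost this issue hopes to avoid.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CountingSkip {
    // Hypothetical helper: advances the iterator by at most n elements
    // and returns the number of elements that were actually skipped.
    public static <T> int skip(Iterator<T> it, int n) {
        int skipped = 0;
        while (skipped < n && it.hasNext()) {
            it.next();
            skipped++;
        }
        return skipped;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a cursor holding five documents.
        List<Integer> docs = Arrays.asList(1, 2, 3, 4, 5);
        Iterator<Integer> it = docs.iterator();
        System.out.println(skip(it, 3));   // all 3 requested were skipped
        System.out.println(skip(it, 100)); // only 2 remained to skip
    }
}
```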



 Comments   
Comment by Jeffrey Yemin [ 07/Mar/14 ]

This can't be fixed by the driver, as it would require a wire protocol change to implement. See http://docs.mongodb.org/meta-driver/latest/legacy/mongodb-wire-protocol/#op-reply.

Comment by Tal Liron [ 02/May/11 ]

OK, I do understand that the server has to iterate the documents, but if they are not being retrieved, then it is an efficient implementation, and it's preferable to use skip() over next().

And though your suggestion is good, as I said it's not always possible. It also doesn't address the first use case I presented, which I actually use a lot in one of my applications.

I would still strongly recommend implementing the enhancement as I suggested it: it's a rather small feature, but can go a long way towards optimizing many cases in which skip() is used. Without it reporting how many documents it skipped, skip() cannot be used in exactly those situations in which its optimized performance is desired.

Comment by Antoine Girbal [ 02/May/11 ]

skip() is still faster than iterating through docs.
If you do iterate, the driver will actually retrieve all these docs from the server.
With skip, the driver tells the server to omit these documents from the result, which is considerably faster.
But mongod still has to iterate through documents on server side (either on disk or in index) until it finds the 1st doc to return.
Say if you do skip(1000) and use an index, it will have to iterate through 1000 index entries before returning data.
Try to use a query condition if you can.

Comment by Tal Liron [ 02/May/11 ]

Hm, I actually did not realize that skip() was handled so inefficiently. So, in fact, it seems that it is a driver-made convenience method, and not a supported feature in MongoDB. Am I correct? In that case, there is no advantage to using skip() over iterating documents in the cursor one by one using next(): they are the same.

Please confirm that I am correct! If so, I will open a new issue in the Core component to ask for an efficient skip() feature to be implemented for cursors.

Great point about an alternative paging implementation, by the way! You are right that it is more efficient to have the query in advance be limited to the page sought, rather than using skip() or even limit(). However, it's not always possible, easy, or even desirable to implement such an optimization. An efficient skip() implementation would be more generally useful.

Comment by Antoine Girbal [ 02/May/11 ]

Note that, if possible, it is better to avoid skip() for any sort of efficient paging.
While it is a convenient solution, the DB must iterate through these objects before returning the 1st document.
This means that if you have thousands of documents or more, showing the skip result can be quite slow.
The best way to implement paging is to order documents according to a key, and request keys higher than a certain value with a limit.
This way the 1st document can be found using the index.
You do need to remember the highest returned key for a given page, and if a user skips forward you may need to request each page.
Less intuitive but also more efficient.
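
A rough sketch of the key-based paging described above, using an in-memory sorted set as a stand-in for an indexed key. The names (KeysetPaging, nextPage) are hypothetical. Against a real collection, the equivalent would presumably be a query with a "greater than" condition on the key, sorted by that key and combined with a limit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class KeysetPaging {
    // Returns up to pageSize keys strictly greater than lastKey, in ascending
    // order. A null lastKey means "start from the first page".
    public static List<Integer> nextPage(NavigableSet<Integer> keys, Integer lastKey, int pageSize) {
        // tailSet seeks directly to the first key after lastKey, much as an
        // index lookup would, instead of walking past earlier entries.
        NavigableSet<Integer> tail = (lastKey == null) ? keys : keys.tailSet(lastKey, false);
        List<Integer> page = new ArrayList<>();
        for (Integer k : tail) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(k);
        }
        return page;
    }

    public static void main(String[] args) {
        NavigableSet<Integer> keys = new TreeSet<>(Arrays.asList(10, 20, 30, 40, 50));
        List<Integer> first = nextPage(keys, null, 2);
        // Remember the highest key returned; it is the bookmark for the next page.
        Integer bookmark = first.get(first.size() - 1);
        List<Integer> second = nextPage(keys, bookmark, 2);
        System.out.println(first + " then " + second);
    }
}
```

The caller has to carry the bookmark from page to page, which is why jumping several pages ahead may require requesting each intermediate page.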

Comment by Tal Liron [ 02/May/11 ]

1) It is useful to chain cursors together, for example to combine documents from various collections, or from different queries in the same collection (like UNION in SQL). However, in such cases, we cannot use Cursor.skip at all anymore. For example, imagine that we want to skip our chained cursor by 200 documents. But if we don't know how many documents were actually skipped in the first cursor of the chain, we cannot reliably skip the remaining documents in the next cursor. The only solution is to use next() and read documents one at a time: very inefficient.

2) Also, many applications that involve optimized paging could use this feature. For example, imagine a UI widget that shows only 20 documents at a time, allowing the user to page forward or backward. If the user says "go to page 10", that would mean 200 documents forward. But if only 84 documents were skipped, then it would be reasonable to show the page indicator as page 4. As it stands, we can only give the user an error message, and the user can continue guessing until they reach the last page.
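
To make the chained-cursor case in 1) concrete, here is a minimal sketch of how a skip that reports its count could be carried across a chain. Plain iterators stand in for cursors, and the names (ChainedSkip, skip, skipChained) are hypothetical. The per-iterator skip below counts by calling next(), whereas the request in this issue is for the cursor itself to report that number.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ChainedSkip {
    // Stand-in for the requested behavior: skip up to n elements and
    // return how many were actually skipped.
    public static <T> int skip(Iterator<T> it, int n) {
        int skipped = 0;
        while (skipped < n && it.hasNext()) {
            it.next();
            skipped++;
        }
        return skipped;
    }

    // Skips up to n elements across the chain, in order. Each iterator's
    // reported count determines how much is left for the next one.
    public static <T> int skipChained(List<Iterator<T>> chain, int n) {
        int remaining = n;
        for (Iterator<T> it : chain) {
            if (remaining == 0) {
                break;
            }
            remaining -= skip(it, remaining);
        }
        return n - remaining;
    }

    public static void main(String[] args) {
        Iterator<String> first = Arrays.asList("a", "b", "c").iterator();
        Iterator<String> second = Arrays.asList("d", "e", "f").iterator();
        int skipped = skipChained(Arrays.asList(first, second), 4);
        // Three elements come from the first iterator and one from the second,
        // so the second iterator is now positioned at "e".
        System.out.println(skipped + " skipped, next is " + second.next());
    }
}
```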

In both these cases, you can argue that count() can be used to calculate the size, but this is an extra, non-atomic operation (thus possibly already returning the wrong number), and also seems wasteful considering that the skip() operation should know what is being skipped and can report it.

Comment by Antoine Girbal [ 02/May/11 ]

thread info on this:
http://groups.google.com/group/mongodb-user/browse_thread/thread/9b200bab47d9497e

Tal,
could you give a precise example of an application for this?
Why does it matter that your app needs to know how many documents were actually skipped?
thanks
